[issue13703] Hash collision security issue

Jim Jewett Wed, 25 Jan 2012 10:14:20 -0800

Jim Jewett <jimjjew...@gmail.com> added the comment:

On Wed, Jan 25, 2012 at 6:06 AM, Dave Malcolm <dmalc...@redhat.com>
added the comment:


>  hybrid-approach-dmalcolm-2012-01-25-001.patch

> As per haypo's random-8.patch, a randomization seed is read at startup.

Why not wait until it is needed?  I suspect a lot of scripts will
never need it for any dict, so why add the overhead to startup?

> Once a dict has transitioned to paranoid mode, it isn't using
> PyObject_Hash anymore, and thus isn't using cached object values

The alternative hashes could be stored in an id-keyed dict

 performing a more expensive calculation, but I believe this
calculation is essentially constant-time.
>
> This preserves hash() and dict order for the cases where you're not under 
> attack, and gracefully handles the attack without having to raise an 
> exception: it doesn't introduce any new exception types.
>
> It preserves ABI, assuming no-one else is reusing ma_smalltable.
>
> It is suitable for backporting to 3.2, 2.7, and earlier (I'm investigating 
> fixing this going all the way back to Python 2.2)
>
> Under the old implementation, there were 4 types of PyDictObject, given these 
> two booleans:
>  * "small vs large" i.e ma_table == ma_smalltable vs ma_table != ma_smalltable
>  * "all keys are str" vs arbitary keys i.e ma_lookdict == lookdict_unicode vs 
> lookdict
>
> Under this implementation, this doubles to 8 kinds, adding the boolean:
>  * normal hash vs randomized hash (normal vs "paranoid").
>
> This is expressed via the ma_lookdict callback, adding two new variants, 
> lookdict_unicode_paranoid and lookdict_paranoid
>
> Note that if a paranoid dict goes small again (ma_table == ma_smalltable), it 
> stays paranoid.  This is for simplicity: it avoids having to rebuild all of 
> the non-randomized me_hash values again (which could fail).
>
> Naturally the patch adds selftests.  I had to add some diagnostic methods to 
> support them; dict gains _stats() and _make_paranoid() methods, and sys gains 
> a _getrandomizedhash() method.  These could be hidden more thoroughly if need 
> be (see DICT_PROTECTION_TRACKING in dictobject.c).  Amongst other things, the 
> selftests measure wallclock time taken for various dict operations (and so 
> might introduce failures on a heavily-loaded machine, I guess).
>
> Hopefully this approach is a viable way forward.
>
> Caveats and TODO items:
>
> TODO: I haven't yet tuned the safety threshold.  According to 
> http://bugs.python.org/issue13703#msg151850:
>> slot collisions are much more frequent than
>> hash collisions, which only account for less than 0.01% of all
>> collisions.
>>
>> It also shows that slot collisions in the low 1-10 range are
>> most frequent, with very few instances of a dict lookup
>> reaching 20 slot collisions (less than 0.0002% of all
>> collisions).
>
> This suggests that the threshold of 32 slot/hash collisions per lookup may 
> already be high enough.
>
> TODO: in a review of an earlier version of the complexity detection idea, 
> Antoine Pitrou suggested that make the protection scale factor be a run-time 
> configurable value, rather than a #define.  This isn't done yet.
>
> TODO: run more extensive tests (e.g. Django and Twisted), monitoring the 
> worst-case complexity that's encountered
>
> TODO: not yet benchmarked and optimized.  I want to get feedback on the 
> approach before I go in and hand-optimize things (e.g. by hand-inlining 
> check_iter_count, and moving the calculations out of the loop etc).  I 
> believe any performance issues ought to be fixable, in that the we can get 
> the cost of this for the "we're not under attack" case to be negligible, and 
> the "under attack" case should transition from O(N^2) to O(N), albeit it with 
> a larger constant factor.
>
> TODO: this doesn't cover sets, but assuming this approach works, the patch 
> can be extended to cover it in an analogous way.
>
> TODO: should it cover PyMemoryViewObject, buffer object, etc?
>
> TODO: should it cover the hashing in Modules/expat/xmlparse.c?  FWIW I rip 
> this code out when doing my downstream builds in RHEL and Fedora, and instead 
> dynamically link against a system copy of expat
>
> TODO: only tested on Linux so far (which is all I've got).  Fedora 15 x86_64 
> fwiw
>
>  Doc/using/cmdline.rst     |    6
>  Include/bytesobject.h     |    2
>  Include/object.h          |    8
>  Include/pythonrun.h       |    2
>  Include/unicodeobject.h   |    2
>  Lib/os.py                 |   17 --
>  Lib/test/regrtest.py      |    5
>  Lib/test/test_dict.py     |  298 +++++++++++++++++++++++++++++++++++++
>  Lib/test/test_hash.py     |   53 ++++++
>  Lib/test/test_os.py       |   35 +++-
>  Makefile.pre.in           |    1
>  Modules/posixmodule.c     |  126 ++-------------
>  Objects/bytesobject.c     |    7
>  Objects/dictobject.c      |  369 
> +++++++++++++++++++++++++++++++++++++++++++++-
>  Objects/object.c          |   37 ++++
>  Objects/unicodeobject.c   |   51 ++++++
>  PCbuild/pythoncore.vcproj |    4
>  Python/pythonrun.c        |    3
>  Python/sysmodule.c        |   16 +
>  b/Python/random.c         |  268 +++++++++++++++++++++++++++++++++
>  20 files changed, 1173 insertions(+), 137 deletions(-)
>
> ----------
> Added file: 
> http://bugs.python.org/file24320/hybrid-approach-dmalcolm-2012-01-25-001.patch
>
> _______________________________________
> Python tracker <rep...@bugs.python.org>
> <http://bugs.python.org/issue13703>
> _______________________________________

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue13703>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue13703] Hash collision security issue

Reply via email to