Jim Jewett <jimjjew...@gmail.com> added the comment: On Wed, Jan 25, 2012 at 6:06 AM, Dave Malcolm <dmalc...@redhat.com> added the comment:
> hybrid-approach-dmalcolm-2012-01-25-001.patch > As per haypo's random-8.patch, a randomization seed is read at startup. Why not wait until it is needed? I suspect a lot of scripts will never need it for any dict, so why add the overhead to startup? > Once a dict has transitioned to paranoid mode, it isn't using > PyObject_Hash anymore, and thus isn't using cached object values The alternative hashes could be stored in an id-keyed dict performing a more expensive calculation, but I believe this calculation is essentially constant-time. > > This preserves hash() and dict order for the cases where you're not under > attack, and gracefully handles the attack without having to raise an > exception: it doesn't introduce any new exception types. > > It preserves ABI, assuming no-one else is reusing ma_smalltable. > > It is suitable for backporting to 3.2, 2.7, and earlier (I'm investigating > fixing this going all the way back to Python 2.2) > > Under the old implementation, there were 4 types of PyDictObject, given these > two booleans: > * "small vs large" i.e ma_table == ma_smalltable vs ma_table != ma_smalltable > * "all keys are str" vs arbitary keys i.e ma_lookdict == lookdict_unicode vs > lookdict > > Under this implementation, this doubles to 8 kinds, adding the boolean: > * normal hash vs randomized hash (normal vs "paranoid"). > > This is expressed via the ma_lookdict callback, adding two new variants, > lookdict_unicode_paranoid and lookdict_paranoid > > Note that if a paranoid dict goes small again (ma_table == ma_smalltable), it > stays paranoid. This is for simplicity: it avoids having to rebuild all of > the non-randomized me_hash values again (which could fail). > > Naturally the patch adds selftests. I had to add some diagnostic methods to > support them; dict gains _stats() and _make_paranoid() methods, and sys gains > a _getrandomizedhash() method. These could be hidden more thoroughly if need > be (see DICT_PROTECTION_TRACKING in dictobject.c). Amongst other things, the > selftests measure wallclock time taken for various dict operations (and so > might introduce failures on a heavily-loaded machine, I guess). > > Hopefully this approach is a viable way forward. > > Caveats and TODO items: > > TODO: I haven't yet tuned the safety threshold. According to > http://bugs.python.org/issue13703#msg151850: >> slot collisions are much more frequent than >> hash collisions, which only account for less than 0.01% of all >> collisions. >> >> It also shows that slot collisions in the low 1-10 range are >> most frequent, with very few instances of a dict lookup >> reaching 20 slot collisions (less than 0.0002% of all >> collisions). > > This suggests that the threshold of 32 slot/hash collisions per lookup may > already be high enough. > > TODO: in a review of an earlier version of the complexity detection idea, > Antoine Pitrou suggested that make the protection scale factor be a run-time > configurable value, rather than a #define. This isn't done yet. > > TODO: run more extensive tests (e.g. Django and Twisted), monitoring the > worst-case complexity that's encountered > > TODO: not yet benchmarked and optimized. I want to get feedback on the > approach before I go in and hand-optimize things (e.g. by hand-inlining > check_iter_count, and moving the calculations out of the loop etc). I > believe any performance issues ought to be fixable, in that the we can get > the cost of this for the "we're not under attack" case to be negligible, and > the "under attack" case should transition from O(N^2) to O(N), albeit it with > a larger constant factor. > > TODO: this doesn't cover sets, but assuming this approach works, the patch > can be extended to cover it in an analogous way. > > TODO: should it cover PyMemoryViewObject, buffer object, etc? > > TODO: should it cover the hashing in Modules/expat/xmlparse.c? FWIW I rip > this code out when doing my downstream builds in RHEL and Fedora, and instead > dynamically link against a system copy of expat > > TODO: only tested on Linux so far (which is all I've got). Fedora 15 x86_64 > fwiw > > Doc/using/cmdline.rst | 6 > Include/bytesobject.h | 2 > Include/object.h | 8 > Include/pythonrun.h | 2 > Include/unicodeobject.h | 2 > Lib/os.py | 17 -- > Lib/test/regrtest.py | 5 > Lib/test/test_dict.py | 298 +++++++++++++++++++++++++++++++++++++ > Lib/test/test_hash.py | 53 ++++++ > Lib/test/test_os.py | 35 +++- > Makefile.pre.in | 1 > Modules/posixmodule.c | 126 ++------------- > Objects/bytesobject.c | 7 > Objects/dictobject.c | 369 > +++++++++++++++++++++++++++++++++++++++++++++- > Objects/object.c | 37 ++++ > Objects/unicodeobject.c | 51 ++++++ > PCbuild/pythoncore.vcproj | 4 > Python/pythonrun.c | 3 > Python/sysmodule.c | 16 + > b/Python/random.c | 268 +++++++++++++++++++++++++++++++++ > 20 files changed, 1173 insertions(+), 137 deletions(-) > > ---------- > Added file: > http://bugs.python.org/file24320/hybrid-approach-dmalcolm-2012-01-25-001.patch > > _______________________________________ > Python tracker <rep...@bugs.python.org> > <http://bugs.python.org/issue13703> > _______________________________________ ---------- _______________________________________ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue13703> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com