On 17.01.2012 22:26, Antoine Pitrou wrote:
> On Tue, 17 Jan 2012 21:59:28 +0100
> "Martin v. Löwis" <mar...@v.loewis.de> wrote:
>> I'd like to propose a different approach to seeding the string hashes:
>> only do so for dictionaries involving only strings, and leave the
>> tp_hash slot of strings unchanged.
>
> I think Python 3 would be better with a clean fix (all hashes
> randomized).
> Now for Python 2... The problem with this idea is that it only
> addresses str dicts. Unicode dicts, and any other dicts, are left
> vulnerable.

No, you misunderstood. I meant to propose that this applies to both
kinds of string (unicode and byte strings); for 2.x, also to
dictionaries containing a mix of them.

> Only 2 bits are used in ob_sstate, meaning 30 are left. These 30 bits
> could cache a "hash perturbation" computed from the string and the
> random bits:
>
> - hash() would use ob_shash
> - dict_lookup() would use ((ob_shash * 1000003) ^ (ob_sstate & ~3))
>
> This way, you cache almost all computations, adding only a computation
> and a couple logical ops when looking up a string in a dict.

That's a good idea. For Unicode, it might be best to add another slot
into the object, even though this increases the object size.

Regards,
Martin
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
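
For illustration, here is a rough model of the ob_sstate caching scheme quoted above, written in Python rather than C to keep it short. It is only a sketch under several assumptions: the thread does not say how the perturbation is derived from the string and the random bits, so `perturbation()` and its multiplier are hypothetical, as are `HASH_SECRET`, `StrObject`, `public_hash()` and `dict_lookup_hash()`. `string_hash()` is a simplified 32-bit stand-in loosely modeled on the classic CPython string hash. Only the field names `ob_shash` and `ob_sstate` and the lookup expression come from the message.

```python
import secrets

MASK32 = 0xFFFFFFFF
STATE_BITS = 0x3  # the 2 low bits of ob_sstate that are already in use

# Hypothetical per-process random bits; how they are generated is not
# specified in the thread.
HASH_SECRET = secrets.randbits(32)


def string_hash(data: bytes) -> int:
    """Unseeded 32-bit string hash (simplified stand-in)."""
    if not data:
        return 0
    x = (data[0] << 7) & MASK32
    for byte in data:
        x = ((x * 1000003) ^ byte) & MASK32
    return x ^ len(data)


def perturbation(h: int, secret: int) -> int:
    """Hypothetical mixing of the string hash with the random bits."""
    return ((h ^ secret) * 2654435761) & MASK32


class StrObject:
    """Toy model of a string object carrying ob_shash and ob_sstate."""

    def __init__(self, data: bytes, secret: int = HASH_SECRET, state: int = 0):
        self.data = data
        self.ob_shash = string_hash(data)  # cached, unseeded hash
        perturb = perturbation(self.ob_shash, secret)
        # Upper 30 bits cache the perturbation; low 2 bits keep the state.
        self.ob_sstate = (perturb & ~STATE_BITS) | (state & STATE_BITS)


def public_hash(s: StrObject) -> int:
    """What hash() would return: the unchanged, unseeded value."""
    return s.ob_shash


def dict_lookup_hash(s: StrObject) -> int:
    """What dict_lookup() would use, per the expression in the message."""
    return ((s.ob_shash * 1000003) ^ (s.ob_sstate & ~STATE_BITS)) & MASK32
```

In this model, `public_hash()` gives the same value whatever the secret is, while `dict_lookup_hash()` changes with the secret and costs one multiplication, one mask and one XOR per lookup once the two fields are filled in. The Unicode case mentioned in the reply would store the perturbation in a separate slot instead of sharing bits with the state.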