Re: [Python-Dev] Hashing proposal: change only string-only dicts

Martin v. Löwis Tue, 17 Jan 2012 17:00:39 -0800

On 17.01.2012 22:26, Antoine Pitrou wrote:
> On Tue, 17 Jan 2012 21:59:28 +0100
> "Martin v. Löwis" <[email protected]> wrote:
>> I'd like to propose a different approach to seeding the string hashes:
>> only do so for dictionaries involving only strings, and leave the
>> tp_hash slot of strings unchanged.
> 
> I think Python 3 would be better with a clean fix (all hashes
> randomized).
> Now for Python 2... The problem with this idea is that it only
> addresses str dicts. Unicode dicts, and any other dicts, are left
> vulnerable.


No, you misunderstood. I meant to propose that this apply to both
kinds of string (unicode and byte strings); in 2.x, it would also
cover dictionaries containing a mix of them.
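A minimal C sketch of the proposal follows. Several pieces are assumptions not in the thread: a toy fixed-size open-addressing table with only string and integer keys (one string kind stands in for both str and unicode), FNV-1a as a stand-in for both the plain and the seeded hash, and hypothetical names such as `ToyDict`, `hash_seed`, `dict_hash` and `rehash_plain`. While every key is a string, lookups use the seeded hash; the first non-string key switches the table back to the plain, unchanged hash and reinserts the existing entries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical key type: a single string kind stands in for str/unicode. */
typedef enum { KEY_STR, KEY_INT } KeyKind;
typedef struct {
    KeyKind kind;
    const char *s; /* used when kind == KEY_STR */
    long i;        /* used when kind == KEY_INT */
} Key;

#define CAPACITY 16 /* fixed-size table; no resizing in this sketch */
#define FNV_BASIS 14695981039346656037ULL

typedef struct {
    Key keys[CAPACITY];
    int used[CAPACITY];
    size_t count;
    int string_only; /* 1 while every stored key is a string */
} ToyDict;

/* Hypothetical per-process random seed, chosen once at startup. */
static uint64_t hash_seed = 0x243f6a8885a308d3ULL;

/* FNV-1a over a C string, starting from basis h (illustration only). */
static uint64_t fnv1a(const char *s, uint64_t h) {
    for (const unsigned char *p = (const unsigned char *)s; *p; p++) {
        h ^= *p;
        h *= 1099511628211ULL;
    }
    return h;
}

/* Stand-in for the unchanged tp_hash: identical in every process. */
static uint64_t plain_hash(const Key *k) {
    return k->kind == KEY_INT ? (uint64_t)k->i : fnv1a(k->s, FNV_BASIS);
}

/* Seeded hash only for string keys while the dict is string-only. */
static uint64_t dict_hash(const ToyDict *d, const Key *k) {
    if (d->string_only && k->kind == KEY_STR)
        return fnv1a(k->s, FNV_BASIS ^ hash_seed);
    return plain_hash(k);
}

static int key_equal(const Key *a, const Key *b) {
    if (a->kind != b->kind)
        return 0;
    return a->kind == KEY_STR ? strcmp(a->s, b->s) == 0 : a->i == b->i;
}

/* Linear probing: slot holding k, or the first free slot on its path. */
static size_t find_slot(const ToyDict *d, const Key *k) {
    size_t i = (size_t)(dict_hash(d, k) % CAPACITY);
    while (d->used[i] && !key_equal(&d->keys[i], k))
        i = (i + 1) % CAPACITY;
    return i;
}

static void dict_init(ToyDict *d) {
    memset(d, 0, sizeof *d);
    d->string_only = 1;
}

/* Leave string-only mode: reinsert every key under its plain hash. */
static void rehash_plain(ToyDict *d) {
    Key old[CAPACITY];
    size_t n = 0;
    for (size_t i = 0; i < CAPACITY; i++)
        if (d->used[i])
            old[n++] = d->keys[i];
    memset(d->used, 0, sizeof d->used);
    d->string_only = 0;
    for (size_t j = 0; j < n; j++) {
        size_t slot = find_slot(d, &old[j]);
        d->keys[slot] = old[j];
        d->used[slot] = 1;
    }
}

static void dict_insert(ToyDict *d, Key k) {
    if (d->string_only && k.kind != KEY_STR)
        rehash_plain(d);
    size_t slot = find_slot(d, &k);
    if (!d->used[slot]) {
        assert(d->count + 1 < CAPACITY); /* keep a free slot so probing stops */
        d->keys[slot] = k;
        d->used[slot] = 1;
        d->count++;
    }
}

static int dict_contains(const ToyDict *d, Key k) {
    return d->used[find_slot(d, &k)];
}
```

The point of the design is that `plain_hash`, the analogue of a string's tp_hash, never depends on the seed, so hash() stays stable and only lookups inside string-only dictionaries are randomized. CPython 2.x dicts already switch from the string-specialized lookdict_string to the generic lookdict once a non-string key is inserted; a seeded variant would additionally have to rehash at that point, as `rehash_plain` does in this sketch.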

> Only 2 bits are used in ob_sstate, meaning 30 are left. These 30 bits
> could cache a "hash perturbation" computed from the string and the
> random bits:
> 
> - hash() would use ob_shash
> - dict_lookup() would use ((ob_shash * 1000003) ^ (ob_sstate & ~3))
> 
> This way, you cache almost all computations, adding only a computation
> and a couple of logical ops when looking up a string in a dict.

That's a good idea. For Unicode objects, it might be best to add another
field to the object, even though this increases its size.
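Antoine's caching scheme can be sketched in C as below. `ToyString` is a hypothetical model of the two PyStringObject fields he mentions, using unsigned types so that wrapping arithmetic is well defined and assuming a 32-bit ob_sstate (hence 30 free bits). The thread does not say how the perturbation is derived from the string and the random bits, so the seeded FNV-1a in `perturbation` and the value of `hash_seed` are illustrative assumptions only. hash() keeps returning the cached ob_shash, while the dict lookup hash costs one multiplication and two logical operations on cached fields.

```c
#include <assert.h>
#include <stdint.h>

#define SSTATE_MASK 3u /* low 2 bits of ob_sstate: interning state */

/* Hypothetical model of the 2.x PyStringObject fields discussed above.
   CPython declares them as `long ob_shash; int ob_sstate;`; unsigned
   types are used here so that wrapping arithmetic is well defined. */
typedef struct {
    const char *ob_sval;
    unsigned long ob_shash; /* plain hash, returned unchanged by hash() */
    uint32_t ob_sstate;     /* bits 0-1: intern state; bits 2-31: perturbation */
} ToyString;

/* Hypothetical per-process random seed, chosen once at startup. */
static uint32_t hash_seed = 0x5bd1e995u;

/* Unseeded string hash, modelled on the 2.x string_hash() loop. */
static unsigned long plain_hash(const char *s) {
    const unsigned char *p = (const unsigned char *)s;
    unsigned long x = (unsigned long)*p << 7;
    unsigned long len = 0;
    while (*p) {
        x = (1000003UL * x) ^ *p++;
        len++;
    }
    return x ^ len;
}

/* Hypothetical perturbation derived from the string and the seed.
   Seeded FNV-1a is an illustration only, not a vetted keyed hash. */
static uint32_t perturbation(const char *s, uint32_t seed) {
    uint32_t h = 2166136261u ^ seed;
    for (const unsigned char *p = (const unsigned char *)s; *p; p++) {
        h ^= *p;
        h *= 16777619u;
    }
    return h & ~SSTATE_MASK; /* clear the 2 bits reserved for interning */
}

/* Compute both cached values once, when the string is created. */
static void toy_string_init(ToyString *s, const char *text, uint32_t intern_state) {
    assert(intern_state <= SSTATE_MASK);
    s->ob_sval = text;
    s->ob_shash = plain_hash(text);
    s->ob_sstate = perturbation(text, hash_seed) | intern_state;
}

/* hash(): unchanged, independent of the seed. */
static unsigned long toy_hash(const ToyString *s) {
    return s->ob_shash;
}

/* dict_lookup(): one multiplication plus two logical ops on cached fields. */
static unsigned long toy_dict_hash(const ToyString *s) {
    return (s->ob_shash * 1000003UL) ^ (unsigned long)(s->ob_sstate & ~SSTATE_MASK);
}

/* The interning state is still available from the low bits. */
static uint32_t toy_intern_state(const ToyString *s) {
    return s->ob_sstate & SSTATE_MASK;
}
```

For Unicode objects, the dedicated field suggested above would correspond to storing the perturbation in its own struct member instead of the upper bits of ob_sstate, trading a larger object for not sharing bits with the interning state.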

Regards,
Martin
_______________________________________________
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
