On Tue, Jan 17, 2012 at 12:59 PM, "Martin v. Löwis" <mar...@v.loewis.de> wrote:
> I'd like to propose a different approach to seeding the string hashes:
> only do so for dictionaries involving only strings, and leave the
> tp_hash slot of strings unchanged.
>
> Each string would get two hashes: the "public" hash, which is constant
> across runs and bugfix releases, and the dict-hash, which is only used
> by the dictionary implementation, and only if all keys to the dict are
> strings. In order to allow caching of the hash, all dicts should use
> the same hash (if caching wasn't necessary, each dict could use its own
> seed).
>
> There are several variants of that approach wrt. caching of the hash
> 1. add an additional field to all string objects, to cache the second
>    hash value.
>

Yuck, our objects are large enough as it is.

>    a) variant: in 3.3, drop the extra field, and declare that hashes
>       may change across runs
>

+1 Absolutely. We can and should make 3.3 change hashes across runs
(behavior that can be disabled via a flag or environment variable).

I think the issue of doctests and such breaking even in 2.7 due to hash
order changes is being overblown. Code like that has already needed to
fix its tests at least once if it wants them to pass on both 32-bit and
64-bit Python VMs (they have different hashes). Do we have _any_ measure
of how big a deal this will be before going too far here?

-gps
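
To make the point about order-dependent tests concrete, here is a minimal sketch of the kind of code that breaks when string hashes change, next to the usual fix. The function names are hypothetical and not taken from any real test suite. It assumes output built by iterating over a set of strings, whose order follows the string hashes.

```python
def show_names_fragile(names):
    # Hypothetical helper: iterating over a set of strings follows the
    # string hashes, so the order of the output can differ between
    # 32-bit and 64-bit builds, or between runs if hashes are seeded.
    return " ".join(set(names))


def show_names_stable(names):
    # Sorting before formatting removes the dependence on hash order,
    # so a doctest can compare against one fixed expected string.
    return " ".join(sorted(set(names)))


if __name__ == "__main__":
    sample = ["spam", "eggs", "ham", "spam"]
    print(show_names_fragile(sample))  # order not guaranteed
    print(show_names_stable(sample))
```

A test that compares the output of the first function against a literal string only passes by accident of the hash values; the second passes whatever the hashes are.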