Marc-Andre Lemburg <m...@egenix.com> added the comment: Some comments:
1. The security implications in all this is being somewhat overemphasized. There are many ways you can do a DoS attack on web servers. It's the responsibility of the used web frameworks and servers to deal with the possible cases. It's a good idea to provide some way to protect against hash collision attacks, but that will only solve one possible way of causing a resource attack on a server. There are other ways you can generate lots of CPU overhead with little data input (e.g. think of targeting the search feature on many Zope/Plone sites). In order to protect against such attacks in general, we'd have to provide a way to control CPU time and e.g. raise an exception if too much time is being spent on a simple operation such as a key insertion. This can be done using timers, signals or even under OS control. The easiest way to protect against the hash collision attack is by limiting the POST/GET/HEAD request size. The second best way would be to limit the number of parameters that a web framework accepts for POST/GET/HEAD request. 2. Changing the semantics of hashing in a dot release is not allowed. If randomization of the hash start vector or some other method is enabled by default in a dot release, this will change the semantics of any application switching to that dot release. The hash values of Python objects are not only used by the Python dictionary implementation, but also by other storage mechanisms such as on-disk dictionaries, inter-process object exchange via share memory, memcache, etc. Hence, if changed, the hash change should be disabled per default for dot releases and enabled for 3.3. 3. Changing the way strings are hashed doesn't solve the problem. Hash values of other types can easily be guessed as well, e.g. take integers which use a trivial hash function. We'd have to adapt all hash functions of the basic types in Python or come up with a generic solution using e.g. double-hashing in the dictionary/set implementations. 4. By just using a random start vector you change the absolute hash values for specific objects, but not the overall hash sequence or its period. An attacker only needs to create many hash collisions, not specific ones. It's the period of the hash function that's important in such attacks and that doesn't change when moving to a different start vector. 5. Hashing needs to be fast. It's one of the most used operations in Python. Please get experts into the boat like Tim Peters and Christian Tismer, who both have worked on the dict implementation and the hash functions, before experimenting with ad-hoc fixes. 6. Counting collisions could solve the issue without having to change hashing. Another idea would be counting the collisions and raising an exception if the number of collisions exceed a certain threshold. Such a change would work for all hashable Python objects and protect against the attack without changing any hash function. Thanks, -- Marc-Andre Lemburg eGenix.com ________________________________________________________________________ ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ ---------- nosy: +lemburg _______________________________________ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue13703> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com