Re: [Python-Dev] Hash collision security issue (now public)
On Thu, Dec 29, 2011 at 8:19 AM, Christian Heimes li...@cheimes.de wrote: Persistence layers like ZODB and cross interpreter communication channels used by multiprocessing may (!) rely on the fact that the hash of a string is fixed. ZODB does not rely on a fixed hash function for strings; for any application to rely on a stable hash would cause problems when updating Python versions. -Fred -- Fred L. Drake, Jr. fdrake at acm.org A person who won't read has no advantage over one who can't read. --Samuel Langhorne Clemens ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
Hi, Just some extra thoughts about the whole topic in the light of web applications (since this was hinted in the talk) running on Python: Yes, you can limit the number of maximum allowed parameters for post data but really there are so many places where data is parsed into hashing containers that it's quite a worthless task. Here a very brief list of things usually parsed into a dict or set and where it happens: - URL parameters and url encoded form data Generally this happens somewhere in a framework but typically also in utility libraries that deal with URLs. For instance the stdlib's cgi.parse_qs or urllib.parse.parse_qs on Python 3 do just that and that code is used left and right. Even if a framework would start limiting it's own URL parsing there is still a lot of code that does not do that the stdlib does that as well. With form data it's worse because you have multipart headers that need parsing and that is usually abstracted away so far from the user that they do not do that. Many frameworks just use the cgi module's parsing functions which also just directly feed into a dictionary. - HTTP headers. There is zero a WSGI framework can do about that since the headers are parsed into a dictionary by the WSGI server. - Incoming JSON data. Again outside of what the framework can do for the most part. simplejson can be modified to stop parsing with the hook stuff but nobody does that and since users invoke simplejson's parsing routines themselves most webapps would still be vulnerable even if all frameworks would fix the problem. - Hidden dict parameters. Things like the parameter part of content-type or the content-disposition headers are generally also just parsed into a dictionary. Likewise many frameworks parse things into set headers (for instance incoming etags). The cookie header is usually parsed into a dictionary as well. The issue is nothing new and at least my current POV on this topic was that your server should be guarded and shoot handlers of requests going rogue. Dictionaries are not the only thing that has a worst case performance that could be triggered by user input. That said. Considering that there are so many different places where things are probably close to arbitrarily long that is parsed into a dictionary or other hashing structure it's hard for a web application developer or framework to protect itself against. In case the watchdog is not a viable solution as I had assumed it was, I think it's more reasonable to indeed consider adding a flag to Python that allows randomization of hashes optionally before startup. However as it was said earlier, the attack is a lot more complex to carry out on a 64bit environment that it's probably (as it stands right now!) safe to ignore. The main problem there however is not that it's a new attack but that some dickheads could now make prebaked attacks against websites to disrupt them that might cause some negative publicity. In general though there are so many more ways to DDOS a website than this that I would rate the whole issue very low. Regards, Armin ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
Hi, Something I should add to this now that I thought about it a bit more: Assuming this should be fixed on a language level the solution would probably be to salt hashes. The most common hash to salt here is the PyUnicode hash for obvious reasons. - Option a: Compiled in Salt + Easy to implement - Breaks unittests most likely (those were broken in the first place but that's still a very annoying change to make) - Might cause problems with interoperability of Pythons compiled with different hash salts - You're not really solving the problem because each linux distribution (besides Gentoo I guess) would have just one salt compiled in and that would be popular enough to have the same issue. - Option b: Environment variable for the salt + Easy-ish to implement + Easy to synchronize over different machines - initialization for base types happens early and unpredictive which makes it hard for embedded Python interpreters (think mod_wsgi and other things) to specify the salt - Option c: Random salt at runtime + Easy to implement - impossible to synchronize - breaks unittests in the same way as a compiled in salt would do Where to add the salt to? Unicode strings and bytestrings (byte objects) I guess since those are the most common offenders. Sometimes tuples are keys of dictionaries but in that case a contributing factor to the hash is the string in the tuple anyways. Also related: since this is a security related issue, would this be something that goes into Python 2? Does that affect how a fix would look like? Regards, Armin ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
On 12/28/2011 9:09 PM, Raymond Hettinger wrote: Also, randomizing the hash wreaks havoc on doctests, book examples not matching actual dict reprs, and on efforts by users to optimize the insertion order into dicts with frequent lookups. I don't have a strong opinion about what to do about this vulnerability, but I know that none of these three reasons are a good reason to not change anything. Dictionary key order has never been guaranteed, and changes from time to time. Any code relying on it is broken to begin with. This is one of the reasons not to use doctests in the first place: comparing dicts textually has always been silly. --Ned. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
Michael Foord wrote: Hello all, A paper (well, presentation) has been published highlighting security problems with the hashing algorithm (exploiting collisions) in many programming languages Python included: http://events.ccc.de/congress/2011/Fahrplan/attachments/2007_28C3_Effective_DoS_on_web_application_platforms.pdf Although it's a security issue I'm posting it here because it is now public and seems important. The issue they report can cause (for example) handling an http post to consume horrible amounts of cpu. For Python the figures they quoted: reasonable-sized attack strings only for 32 bits Plone has max. POST size of 1 MB 7 minutes of CPU usage for a 1 MB request ~20 kbits/s → keep one Core Duo core busy This was apparently reported to the security list, but hasn't been responded to beyond an acknowledgement on November 24th (the original report didn't make it onto the security list because it was held in a moderation queue). The same vulnerability was reported against various languages and web frameworks, and is already fixed in some of them. Their recommended fix is to randomize the hash function. The attack relies on being able to predict the hash value for a given string. Randomising the string hash function is quite straightforward. There is no need to change the dictionary code. A possible (*untested*) patch is attached. I'll leave it for those more familiar with unicodeobject.c to do properly. Cheers, Mark diff -r f1298f4ec638 Objects/unicodeobject.c --- a/Objects/unicodeobject.c Wed Dec 28 14:59:40 2011 + +++ b/Objects/unicodeobject.c Thu Dec 29 11:11:09 2011 + @@ -41,6 +41,7 @@ #define PY_SSIZE_T_CLEAN #include Python.h #include ucnhash.h +#include time.h /* for seeding of hash value */ #ifdef MS_WINDOWS #include windows.h @@ -184,6 +185,9 @@ /* Single character Unicode strings in the Latin-1 range are being shared as well. */ static PyObject *unicode_latin1[256]; + +/* Base hash value (hash of empty string) */ +static Py_uhash_t base_hash; /* Fast detection of the most frequent whitespace characters */ const unsigned char _Py_ascii_whitespace[] = { @@ -11107,7 +1,7 @@ /* The hash function as a macro, gets expanded three times below. */ #define HASH(P) \ -x = (Py_uhash_t)*P 7; \ +x = base_hash + (Py_uhash_t)*P 7; \ while (--len = 0) \ x = (103*x) ^ (Py_uhash_t)*P++; @@ -13869,6 +13873,14 @@ int _PyUnicode_Init(void) { int i; +time_t now; + +time(now); +base_hash = (Py_uhash_t)now; +/* Avoid overly large numbers to reduce the chance of small strings + * hashing to -1 */ +if (base_hash == (Py_uhash_t)-1) +base_hash = (Py_uhash_t)-2; /* XXX - move this array to unicodectype.c ? */ Py_UCS2 linebreak[] = { ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
On Wed, Dec 28, 2011 at 8:49 PM, Eric Snow ericsnowcurren...@gmail.com wrote: On Wed, Dec 28, 2011 at 6:28 PM, Michael Foord fuzzy...@voidspace.org.uk wrote: Hello all, A paper (well, presentation) has been published highlighting security problems with the hashing algorithm (exploiting collisions) in many programming languages Python included: http://events.ccc.de/congress/2011/Fahrplan/attachments/2007_28C3_Effective_DoS_on_web_application_platforms.pdf Although it's a security issue I'm posting it here because it is now public and seems important. The issue they report can cause (for example) handling an http post to consume horrible amounts of cpu. For Python the figures they quoted: reasonable-sized attack strings only for 32 bits Plone has max. POST size of 1 MB 7 minutes of CPU usage for a 1 MB request ~20 kbits/s → keep one Core Duo core busy This was apparently reported to the security list, but hasn't been responded to beyond an acknowledgement on November 24th (the original report didn't make it onto the security list because it was held in a moderation queue). The same vulnerability was reported against various languages and web frameworks, and is already fixed in some of them. Their recommended fix is to randomize the hash function. Ironically, this morning I ran across a discussion from about 8 years ago on basically the same thing: http://mail.python.org/pipermail/python-dev/2003-May/035874.html From what I read in the thread, it didn't seem like anyone here was too worried about it. Does this new research change anything? Not really. It's actually somewhat behind previous work in that it doesn't exploit the timing deltas, just generates very large ones. Geremy Condra ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
On Thu, 29 Dec 2011 03:55:22 +0100 Christian Heimes li...@cheimes.de wrote: I've been dealing with web stuff and security for almost a decade. I've seen far worse attack vectors. This one can easily be solved with a couple of lines of Python code. For example Application developers can limit the maximum amount of POST parameters to a sensible amount and limit the length of each key, too. Shouldn't the setting be implemented by frameworks? CPython could aid developers with a special subclass of dict. The crucial lookup function is already overwrite-able per dict instance and on subclasses of dict through PyDictObj's struct member PyDictEntry *(*ma_lookup)(PyDictObject *mp, PyObject *key, long hash). For example specialized subclass could limit the seach for a free slot to n recursions or choose to ignore the hash argument and calculate its own hash of the key. Or, rather, the specialized subclass could implement hash randomization. Regards Antoine. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
On Thu, 29 Dec 2011 04:04:17 +0100 Christian Heimes li...@cheimes.de wrote: Am 29.12.2011 02:37, schrieb Jesse Noller: Back up link for the PDF: http://dl.dropbox.com/u/1374/2007_28C3_Effective_DoS_on_web_application_platforms.pdf Ocert disclosure: http://www.ocert.org/advisories/ocert-2011-003.html From http://www.nruns.com/_downloads/advisory28122011.pdf --- Python uses a hash function which is very similar to DJBX33X, which can be broken using a meet-in-the-middle attack. It operates on register size and is thus different for 64 and 32 bit machines. While generating multi-collisions efficiently is also possible for the 64 bit version of the function, the resulting colliding strings are too large to be relevant for anything more than an academic attack. Plone as the most prominent Python web framework accepts 1 MB of POST data, which it parses in about 7 minutes of CPU time in the worst case. This gives an attacker with about 20 kbit/s the possibility to keep one Core Duo core constantly busy. If the attacker is in the position to have a Gigabit line available, he can keep about 50.000 Core Duo cores busy. --- If I remember correctly CPython uses the long for its hash function so 64bit Windows uses a 32bit hash. Not anymore, Py_hash_t is currently aligned with Py_ssize_t. Regards Antoine. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
Raymond Hettinger wrote: FWIW, Uncle Timmy considers the non-randomized hashes to be a virtue. It is believed that they give us better-than-random results for commonly encountered datasets. A change to randomized hashes would have a negative performance impact on those cases. Tim Peter's analysis applies mainly to ints which would be unchanged. A change to the hash function for strings would make no difference to the performance of the dict, as the ordering of the hash values is already quite different from the ordering of the strings for any string of more than 3 characters. Also, randomizing the hash wreaks havoc on doctests, book examples not matching actual dict reprs, and on efforts by users to optimize the insertion order into dicts with frequent lookups. The docs clearly state that the ordering of iteration over dicts is arbitrary. Perhaps changing it once in a while might be a good thing :) Cheers, Mark. Raymond On Dec 28, 2011, at 5:28 PM, Michael Foord wrote: Hello all, A paper (well, presentation) has been published highlighting security problems with the hashing algorithm (exploiting collisions) in many programming languages Python included: http://events.ccc.de/congress/2011/Fahrplan/attachments/2007_28C3_Effective_DoS_on_web_application_platforms.pdf Although it's a security issue I'm posting it here because it is now public and seems important. The issue they report can cause (for example) handling an http post to consume horrible amounts of cpu. For Python the figures they quoted: reasonable-sized attack strings only for 32 bits Plone has max. POST size of 1 MB 7 minutes of CPU usage for a 1 MB request ~20 kbits/s → keep one Core Duo core busy This was apparently reported to the security list, but hasn't been responded to beyond an acknowledgement on November 24th (the original report didn't make it onto the security list because it was held in a moderation queue). The same vulnerability was reported against various languages and web frameworks, and is already fixed in some of them. Their recommended fix is to randomize the hash function. All the best, Michael -- http://www.voidspace.org.uk/ May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give. -- the sqlite blessing http://www.sqlite.org/different.html ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/raymond.hettinger%40gmail.com ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/mark%40hotpy.org ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
On Thu, 29 Dec 2011 14:32:21 +0100 Christian Heimes li...@cheimes.de wrote: Am 29.12.2011 13:57, schrieb Armin Ronacher: Hi, Something I should add to this now that I thought about it a bit more: Assuming this should be fixed on a language level the solution would probably be to salt hashes. The most common hash to salt here is the PyUnicode hash for obvious reasons. - Option a: Compiled in Salt + Easy to implement - Breaks unittests most likely (those were broken in the first place but that's still a very annoying change to make) - Might cause problems with interoperability of Pythons compiled with different hash salts - You're not really solving the problem because each linux distribution (besides Gentoo I guess) would have just one salt compiled in and that would be popular enough to have the same issue. - Option b: Environment variable for the salt + Easy-ish to implement + Easy to synchronize over different machines - initialization for base types happens early and unpredictive which makes it hard for embedded Python interpreters (think mod_wsgi and other things) to specify the salt - Option c: Random salt at runtime + Easy to implement - impossible to synchronize - breaks unittests in the same way as a compiled in salt would do - Option d: Don't change __hash__ but only use randomized hash for PyDictEntry lookup + Easy to implement - breaks only software to relies on a fixed order of dict keys - breaks only a few to no unit tests IMHO we don't have to alter the outcome of hash(some string), hash(1) and all other related types. We just need to reduce the change the an attacker can produce collisions in the dict (and set?) code that looks up the slot (PyDictEntry). How about adding the random value in Object/dictobject.c:lookdict() and lookdict_str() (Python 2.x) / lookdict_unicode() (Python 3.x)? With this approach the hash of all our objects stay the same and just the dict code needs to be altered. The approach has also the benefit that all possible objects gain a randomized hash. I basically agree with your proposal. The only downside is that custom hashed containers (such as _pickle.c's memotable) don't automatically benefit. That said, I think it would be difficult to craft an attack against the aforementioned memotable (you would have to basically choose the addresses of pickled objects). Regards Antoine. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
Hi, how about Option d: Host based salt + Easy-ish to implement – how about basing it on the hostname for example? + transparent for all processes on the same host - probably unit test breakage In fact, we could use host based as default with the option to specify own which would solve the sync problems. That said, I agree with Armin that fixing this in the frameworks isn't an option. Regards, Hynek Am Donnerstag, 29. Dezember 2011 um 13:57 schrieb Armin Ronacher: Hi, Something I should add to this now that I thought about it a bit more: Assuming this should be fixed on a language level the solution would probably be to salt hashes. The most common hash to salt here is the PyUnicode hash for obvious reasons. - Option a: Compiled in Salt + Easy to implement - Breaks unittests most likely (those were broken in the first place but that's still a very annoying change to make) - Might cause problems with interoperability of Pythons compiled with different hash salts - You're not really solving the problem because each linux distribution (besides Gentoo I guess) would have just one salt compiled in and that would be popular enough to have the same issue. - Option b: Environment variable for the salt + Easy-ish to implement + Easy to synchronize over different machines - initialization for base types happens early and unpredictive which makes it hard for embedded Python interpreters (think mod_wsgi and other things) to specify the salt - Option c: Random salt at runtime + Easy to implement - impossible to synchronize - breaks unittests in the same way as a compiled in salt would do Where to add the salt to? Unicode strings and bytestrings (byte objects) I guess since those are the most common offenders. Sometimes tuples are keys of dictionaries but in that case a contributing factor to the hash is the string in the tuple anyways. Also related: since this is a security related issue, would this be something that goes into Python 2? Does that affect how a fix would look like? Regards, Armin ___ Python-Dev mailing list Python-Dev@python.org (mailto:Python-Dev@python.org) http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/hs%40ox.cx ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
On Thu, 29 Dec 2011 11:25:03 + Mark Shannon m...@hotpy.org wrote: Also, randomizing the hash wreaks havoc on doctests, book examples not matching actual dict reprs, and on efforts by users to optimize the insertion order into dicts with frequent lookups. The docs clearly state that the ordering of iteration over dicts is arbitrary. Perhaps changing it once in a while might be a good thing :) We already change it once in a while. http://twistedmatrix.com/trac/ticket/5352 ;) Regards Antoine. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
Am 29.12.2011 12:13, schrieb Mark Shannon: The attack relies on being able to predict the hash value for a given string. Randomising the string hash function is quite straightforward. There is no need to change the dictionary code. A possible (*untested*) patch is attached. I'll leave it for those more familiar with unicodeobject.c to do properly. I'm worried that hash randomization of str is going to break 3rd party software that rely on a stable hash across multiple Python instances. Persistence layers like ZODB and cross interpreter communication channels used by multiprocessing may (!) rely on the fact that the hash of a string is fixed. Perhaps the dict code is a better place for randomization. The code in lookdict() and lookdict_unicode() could add a value to the hash. My approach is less intrusive and also closes the attack vector for all possible objects including str, byte, int and so on. I like also Armin's idea of an optional hash randomization. Christian ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
Am 29.12.2011 12:10, schrieb Antoine Pitrou: I've been dealing with web stuff and security for almost a decade. I've seen far worse attack vectors. This one can easily be solved with a couple of lines of Python code. For example Application developers can limit the maximum amount of POST parameters to a sensible amount and limit the length of each key, too. Shouldn't the setting be implemented by frameworks? Web framework like Django or CherryPy can be considered an application from the CPython core's point of view. ;) You are right. The term framework is a better word. CPython could aid developers with a special subclass of dict. The crucial lookup function is already overwrite-able per dict instance and on subclasses of dict through PyDictObj's struct member PyDictEntry *(*ma_lookup)(PyDictObject *mp, PyObject *key, long hash). For example specialized subclass could limit the seach for a free slot to n recursions or choose to ignore the hash argument and calculate its own hash of the key. Or, rather, the specialized subclass could implement hash randomization. Yeah! I was thinking about the same when I wrote calculate its own hash but I was too sloppy to carry on my argument. Please take 3am as my excuse. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
Mark Shannon wrote: Michael Foord wrote: Hello all, A paper (well, presentation) has been published highlighting security problems with the hashing algorithm (exploiting collisions) in many programming languages Python included: http://events.ccc.de/congress/2011/Fahrplan/attachments/2007_28C3_Effective_DoS_on_web_application_platforms.pdf Although it's a security issue I'm posting it here because it is now public and seems important. The issue they report can cause (for example) handling an http post to consume horrible amounts of cpu. For Python the figures they quoted: reasonable-sized attack strings only for 32 bits Plone has max. POST size of 1 MB 7 minutes of CPU usage for a 1 MB request ~20 kbits/s → keep one Core Duo core busy This was apparently reported to the security list, but hasn't been responded to beyond an acknowledgement on November 24th (the original report didn't make it onto the security list because it was held in a moderation queue). The same vulnerability was reported against various languages and web frameworks, and is already fixed in some of them. Their recommended fix is to randomize the hash function. The attack relies on being able to predict the hash value for a given string. Randomising the string hash function is quite straightforward. There is no need to change the dictionary code. A possible (*untested*) patch is attached. I'll leave it for those more familiar with unicodeobject.c to do properly. The paper mentions that several web frameworks work around this by limiting the number of parameters per GET/POST/HEAD request. This sounds like a better alternative than randomizing the hash function of strings. Uncontrollable randomization has issues when you work with multi-process setups, since the processes would each use different hash values for identical strings. Putting the base_hash value under application control could be done to solve this problem, making sure that all processes use the same random base value. BTW: Since your randomization trick uses the current time, it would also be rather easy to tune an attack to find the currently used base_hash. To make this safe, you'd have to use a more random source for initializing the base_hash. Note that the same hash collision attack can be used for other key types as well, e.g. integers (where it's very easy to find hash collisions), so this kind of randomization would have to be applied to other basic types too. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Dec 29 2011) Python/Zope Consulting and Support ...http://www.egenix.com/ mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ mxODBC, mxDateTime, mxTextTools ...http://python.egenix.com/ ::: Try our new mxODBC.Connect Python Database Interface for free ! eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
On Thu, 29 Dec 2011 13:57:07 +0100 Armin Ronacher armin.ronac...@active-4.com wrote: - Option c: Random salt at runtime + Easy to implement - impossible to synchronize - breaks unittests in the same way as a compiled in salt would do This option would have my preference. I don't think hash() was ever meant to be synchronizable. Already using a 32-bit Python will give you different results from a 64-bit Python. As for breaking unittests, these tests were broken in the first place. hash() does change from time to time. Where to add the salt to? Unicode strings and bytestrings (byte objects) I guess since those are the most common offenders. Sometimes tuples are keys of dictionaries but in that case a contributing factor to the hash is the string in the tuple anyways. Or it could be a process-wide constant for all dicts. If the constant is additive as proposed by Mark the impact should be negligible. (but the randomness must be good enough) Regards Antoine. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
Am 29.12.2011 11:32, schrieb Antoine Pitrou: If I remember correctly CPython uses the long for its hash function so 64bit Windows uses a 32bit hash. Not anymore, Py_hash_t is currently aligned with Py_ssize_t. Thanks for the update. Python 2.x still uses long and several large frameworks like Zope/Plone require 2.x. Christian ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
Am 29.12.2011 13:57, schrieb Armin Ronacher: Hi, Something I should add to this now that I thought about it a bit more: Assuming this should be fixed on a language level the solution would probably be to salt hashes. The most common hash to salt here is the PyUnicode hash for obvious reasons. - Option a: Compiled in Salt + Easy to implement - Breaks unittests most likely (those were broken in the first place but that's still a very annoying change to make) - Might cause problems with interoperability of Pythons compiled with different hash salts - You're not really solving the problem because each linux distribution (besides Gentoo I guess) would have just one salt compiled in and that would be popular enough to have the same issue. - Option b: Environment variable for the salt + Easy-ish to implement + Easy to synchronize over different machines - initialization for base types happens early and unpredictive which makes it hard for embedded Python interpreters (think mod_wsgi and other things) to specify the salt - Option c: Random salt at runtime + Easy to implement - impossible to synchronize - breaks unittests in the same way as a compiled in salt would do - Option d: Don't change __hash__ but only use randomized hash for PyDictEntry lookup + Easy to implement - breaks only software to relies on a fixed order of dict keys - breaks only a few to no unit tests IMHO we don't have to alter the outcome of hash(some string), hash(1) and all other related types. We just need to reduce the change the an attacker can produce collisions in the dict (and set?) code that looks up the slot (PyDictEntry). How about adding the random value in Object/dictobject.c:lookdict() and lookdict_str() (Python 2.x) / lookdict_unicode() (Python 3.x)? With this approach the hash of all our objects stay the same and just the dict code needs to be altered. The approach has also the benefit that all possible objects gain a randomized hash. Also related: since this is a security related issue, would this be something that goes into Python 2? Does that affect how a fix would look like? IMHO it does affect the fix. A changed and randomized hash function may break software that relies on a stable hash() function. Christian ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
+1 to option d (host-based salt) but would need to consistently order the hostnames/addresses to guarantee that all processes on the same machine got the same salt by default. +1 to option c (environment variable) as an override. And/or maybe an override on the command line. +1 to implementing the salt in the dictionary hash as an additive value. +0 to exposing the salt as a constant (3.3+ only) - or alternatively expose a hash function that just takes an existing hash and returns the salted hash. That would make it very easy for anything that wanted a salted hash to get one. For choosing the default salt, I think something like: a. If IPv6 is enabled, take the link-local address of the interface with the default route. Pretty much guaranteed not to change, can't be determined externally (salting doesn't need a secret, but it doesn't hurt), large number so probably a good salt. (If it is likely to change, a salt override should be being used instead). Don't use any other IPv6 address. In particular, never use a temporary IPv6 address like Windows assigns - multiprocessing could end up with instances with different salts. b. Take the FQDN of the machine. Tim Delaney ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
+1 to option c (environment variable) as an override. And/or maybe an override on the command line. That obviously should have said option b (environment variable) ... Tim Delaney ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
On Thu, Dec 29, 2011 at 8:32 AM, Christian Heimes li...@cheimes.de wrote: IMHO we don't have to alter the outcome of hash(some string), hash(1) and all other related types. We just need to reduce the change the an attacker can produce collisions in the dict (and set?) code that looks up the slot (PyDictEntry). How about adding the random value in Object/dictobject.c:lookdict() and lookdict_str() (Python 2.x) / lookdict_unicode() (Python 3.x)? With this approach the hash of all our objects stay the same and just the dict code needs to be altered. I don't understand how that helps a collision attack. If you can still generate two strings with the same (pre-randomized) hash, what difference does it make that the dict adds a random number? The post-randomized number will still be the same, no? Or does this attack just rely on the hash *remainders* being the same? If so, I can see how hashing the hash would help. But since the attacker doesn't know the modulus, and it can change as the dictionary grows, I would expect the attack to require matching hashes, not just matching hash remainders... unless I'm just completely off base here. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
It's worth pointing out that if the salt is somehow exposed to an attacker, or is guessable, much of the benefit goes away. It's likely that a timing attack could be used to discover the salt if it is fixed per machine or process over a long period of time. If a salt is generally fixed per machine, but varies from machine-to-machine, I think we'll see an influx of frustrated devs who have something that works perfectly on their machine but not for others. It doesn't matter that they're doing it wrong, we'll still have to deal with them as a community. This seems like an argument in favor of randomizing it at runtime by default, so it fails early for them. Allowing an environment and command line override makes sense, as it allows users to rotate the salt as frequently as their needs dictate. -Paul ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
Am 29.12.2011 21:07, schrieb PJ Eby: I don't understand how that helps a collision attack. If you can still generate two strings with the same (pre-randomized) hash, what difference does it make that the dict adds a random number? The post-randomized number will still be the same, no? Or does this attack just rely on the hash *remainders* being the same? If so, I can see how hashing the hash would help. But since the attacker doesn't know the modulus, and it can change as the dictionary grows, I would expect the attack to require matching hashes, not just matching hash remainders... unless I'm just completely off base here. The attack doesn't need perfect collisions. The attacker calculates strings in a way so that their hashes results in as many collision as possible in the dict code. An attacker succeeds when the initial slot for an hash is filled and as many subsequent slots of the perturbed masked hash, too. Also an attacker can easily predict the size and therefore the mask for the hash remainder. A POST request parser usually starts with an empty dict and the growth rate of Python's dicts is well documented. The changing mask makes the attack just a tiny bit more challenging. The hash randomization idea adds a salt to throw the attacker of course. Instead of position = hash mask it's now hash = salt + hash position = hash mask where salt is a random, process global value that is fixed for the life time of the program. The salt also affects the perturbance during the search for new slots. As you already stated this salt won't be affective against full hash collisions. The attack needs A LOT of problematic strings to become an issue, possible hundred of thousands or even millions of keys in a very large POST request. In reality an attacker won't reach the full theoretical O(n^2) performance degradation for a hash table. But even more than O(n) instead of O(1) for a million keys in each request has some impact on your servers' CPUs. Some vendors have limited to POST request to 1 MB or the amount of keys to 10,000 to work around the issue. One paper also states that attacks on Python with 64bit is just academical for now. Christian ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
The talk was a presentation yesterday by Alexander Klink and Julian Wälde at the Chaos Communication Congress in Germany hash...@alech.de I read the non-technical summary at http://arstechnica.com/business/news/2011/12/huge-portions-of-web-vulnerable-to-hashing-denial-of-service-attack.ars and watched the video of the talk at https://www.youtube.com/watch?feature=player_embeddedv=_EEhviEO1Vo# My summary: hash table creation with N keys changes from amortized O(N) to O(N**2) time if the hash values of all the keys are the same. This should only happen for large N if done intentionally. This is easy to accomplish with a linear multiply and add hash function, such as used in PHP4 (but nowhere else that the authors found). A nonlinear multiply and xor hash function, used in one form or another by everything else, is much harder to break. It is *theoretically* vulnerable to brute-force search and this has been known for years. With a more cleaver meet-in-the-middle strategy, that builds a dict of suffixes and then searches for matching prefixes, 32-bit hashes are *practically* vulnerable. The attack depends on, for instance, 2**16 (64K) being 1/64K of 2**32. (I did not hear when this strategy was developed, but it is certainly more practical on a desktop now than even 8 years ago.) [64-bit hashes are much, much less vulnerable to attack, at least for now. So it seems to me that anyone who hashes potential attack data can avoid the problem by using 64-bit Python with 64-bit hash values. If I understood Antoine, that should be all 64-bit builds.] More summary: Perl added an #define option to start the hash calculation with non-zero value instead of 0 years ago to avoid algorithmic complexity attacks. The patch is at 47:20 in the video. The authors believe all should do similarly. [The change amounts to adding a char, unknown to attackers, to the beginning of every string before hashing. So it adds a small bit of time. The code patch shown did not show the source of the non-zero seed or the timing and scope of any randomization. As the discussion here has shown, this is an important issue to applications. So 'do the same' is inadequate and over-simplified advice. I believe Armin's patch is similar to the Perl patch.] Since the authors sent out CERT alert about Nov 1, PHP has added to PHP5 a new function to limit the number of vars hashed. Microsoft will do something similar now with hash randomization to follow (maybe?). JRuby is going to do something. Java does not think it needs to change Java itself, but will leave all to the frameworks. [The discussion here suggests that this is an inadequate response for 32-bit systems like Java since one person/group may not control all the pieces of a server system. However, a person or group can run all pieces on a system Python with an option turned on.] -- Terry Jan Reedy ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
On 12/29/2011 4:31 PM, Christian Heimes wrote: The hash randomization idea adds a salt to throw the attacker of course. Instead of position = hash mask it's now hash = salt + hash As I understood the talk (actually, the bit of Perl interpreter C code shown), the randomization is to change hash(s) to hash(salt+s) so that the salt is completely mixed into the hash from the beginning, rather than just tacked on at the end. -- Terry Jan Reedy ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
Am 29.12.2011 23:28, schrieb Terry Reedy: As I understood the talk (actually, the bit of Perl interpreter C code shown), the randomization is to change hash(s) to hash(salt+s) so that the salt is completely mixed into the hash from the beginning, rather than just tacked on at the end. Yes, the Perl and Ruby code uses a random seed as IV for hash generation. It's the best way to create randomized hashes but it might not be a feasible fix for Python 2.x. I'm worried that it might break applications that rely on stable hash values. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
On 30 December 2011 06:59, Tim Delaney timothy.c.dela...@gmail.com wrote: +0 to exposing the salt as a constant (3.3+ only) - or alternatively expose a hash function that just takes an existing hash and returns the salted hash. That would make it very easy for anything that wanted a salted hash to get one. Sorry - brain fart on my part there - the salt needs to be included right from the start. Tim Delaney ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com