Re: [Python-Dev] Hash collision security issue (now public)
> That said, I don't think smallest-format is actually enforced with > anything stronger than comments (such as in unicodeobject.h struct > PyASCIIObject) and asserts (mostly calling > _PyUnicode_CheckConsistency). I don't have any insight on how > prevalent non-conforming strings will be in practice, or whether > supporting their equality will be required as a bugfix. If you are only Python, you cannot create a string in a non canonical form. If you use the C API, you can create a string in a non canonical form using PyUnicode_New() + PyUnicode_WRITE, or PyUnicode_FromUnicode(NULL, length) (or PyUnicode_FromStringAndSize(NULL, length)) + direct access to the Py_UNICODE* string. If you create strings in a non canonical form, it is a bug in your application and Python doesn't help you. But how could Python help you? Expose a function to check your newly creating string? There is already _PyUnicode_CheckConsistency() which is slow (O(n)) because it checks each character, it is only used in debug mode. Victor ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
Jim Jewett, 08.01.2012 23:33: > Stefan Behnel wrote: >> Admittedly, this may require some adaptation for the PEP393 unicode memory >> layout in order to produce identical hashes for all three representations >> if they represent the same content. > > They SHOULD NOT represent the same content; comparing two strings > currently requires converting them to canonical form, which means the > smallest format (of those three) that works. > [...] > That said, I don't think smallest-format is actually enforced with > anything stronger than comments (such as in unicodeobject.h struct > PyASCIIObject) and asserts (mostly calling > _PyUnicode_CheckConsistency). That's what I meant. AFAIR, the PEP393 discussions at some point brought up the suspicion that third party code may end up generating Unicode strings that do not comply with that "invariant". So internal code shouldn't strictly rely on it when it deals with user provided data. One example is the "unequal kinds" optimisation in equality comparison, which, if I'm not mistaken, wasn't implemented, due to exactly this reasoning. The same applies to hashing then. Stefan ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
On Sun, Jan 8, 2012 at 16:33, Jim Jewett wrote: > In http://mail.python.org/pipermail/python-dev/2012-January/115368.html > Stefan Behnel wrote: Can you please configure your mail client to not create new threads like this? As if this topic wasn't already hard enough to follow, it now exists across handfuls of threads with the same title. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
On 1/7/2012 12:57 PM, Christian Heimes wrote: Am 07.01.2012 12:02, schrieb Stefan Behnel: Admittedly, this may require some adaptation for the PEP393 unicode memory layout in order to produce identical hashes for all three representations if they represent the same content. So it's not a drop-in replacement. Is this condition required and implemented at the moment? If o1 == o2, then hash(o1) == hash(o2) is an unstated requirement implied by "They [hash values] are used to quickly compare dictionary keys during a dictionary lookup." since hash(o1) != hash(o2) is taken to mean o1 != o2 (whereas hash(o1) == hash(o2) is taken to mean o1 == o2 is possible but must be checked). Hashing should be a coarsening of == as an equivalence relationship. -- Terry Jan Reedy ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
Am 07.01.2012 12:02, schrieb Stefan Behnel: > Wouldn't Bob Jenkins' "lookup3" hash function fit in here? After all, it's > portable, known to provide a very good distribution for different string > values and is generally fast on both 32 and 64 bit architectures. > > http://burtleburtle.net/bob/c/lookup3.c > > The analysis is here: > > http://burtleburtle.net/bob/hash/doobs.html > > It seems that there's also support for generating 64bit hash values > (actually 2x32bits) efficiently. This thread as well as the ticket is getting so long that people barely have a chance to catch up ... Guido has stated that he doesn't want a completely new hash algorithm for Python 2.x to 3.2. A new hash algorithm for 3.3 needs a PEP, too. I've done some experiments with FNV and Murmur3. With Murmur3 128bit I've seen some minor speed improvements on 64bit platforms. At first I was surprised but it makes sense. Murmur3 operates on uint32_t blocks while Python's hash algorithm iterates over 1 byte (bytes, ASCII), 2 bytes (USC2) or 4 bytes (USC4) types. Since most strings are either ASCII or UCS2, the inner loop of the current algorithm is more tight. > Admittedly, this may require some adaptation for the PEP393 unicode memory > layout in order to produce identical hashes for all three representations > if they represent the same content. So it's not a drop-in replacement. Is this condition required and implemented at the moment? Christian ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
Christian Heimes, 31.12.2011 04:59: > Am 31.12.2011 03:22, schrieb Victor Stinner: > The unique structure of CPython's dict implementation makes it harder to > get the number of values with equal hash. The academic hash map (the one > I learnt about at university) uses a bucket to store all elements with > equal hash (more precise hash: mod mask). However Python's dict however > perturbs the hash until it finds a free slot its array. The second, > third ... collision can be caused by a legit and completely different > (!) hash. > >> The last choice is to change the hash algorithm. The *idea* is the same >> than adding salt to hashed password (in practice it will be a little bit >> different): if a pseudo-random salt is added, the attacker cannot >> prepare a single dataset, he/she will have to regenerate a new dataset >> for each possible salt value. If the salt is big enough (size in bits), >> the attacker will need too much CPU to generate the dataset (compute N >> keys with the same hash value). Basically, it slows down the attack by >> 2^(size of the salt). > > That's the idea of randomized hashing functions as implemented by Ruby > 1.8, Perl and others. The random seed is used as IV. Multiple rounds of > multiply, XOR and MOD (integer overflows) cause a deviation. In your > other posting you were worried about the performance implication. A > randomized hash function just adds a single ADD operation, that's all. > > Downside: With randomization all hashes are unpredictable and change > after every restart of the interpreter. This has some subtle side > effects like a different outcome of {a:1, b:1, c:1}.keys() after a > restart of the interpreter. > >> Another possibility would be to replace our fast hash function by a >> better hash function like MD5 or SHA1 (so the creation of the dataset >> would be too slow in practice = too expensive), but cryptographic hash >> functions are much slower (and so would slow down Python too much). > > I agree with your analysis. Cryptographic hash functions are far too > slow for our use case. During my research I found another hash function > that claims to be fast and that may not be vulnerable to this kind of > attack: http://isthe.com/chongo/tech/comp/fnv/ Wouldn't Bob Jenkins' "lookup3" hash function fit in here? After all, it's portable, known to provide a very good distribution for different string values and is generally fast on both 32 and 64 bit architectures. http://burtleburtle.net/bob/c/lookup3.c The analysis is here: http://burtleburtle.net/bob/hash/doobs.html It seems that there's also support for generating 64bit hash values (actually 2x32bits) efficiently. Admittedly, this may require some adaptation for the PEP393 unicode memory layout in order to produce identical hashes for all three representations if they represent the same content. So it's not a drop-in replacement. Stefan ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
On 1/5/2012 4:10 PM, Nick Coghlan wrote: On Fri, Jan 6, 2012 at 8:15 AM, Serhiy Storchaka wrote: 05.01.12 21:14, Glenn Linderman написав(ла): So, fixing the vulnerable packages could be a sufficient response, rather than changing the hash function. How to fix? Each of those above allocates and returns a dict. Simply have each of those allocate and return and wrapped dict, which has the following behaviors: i) during __init__, create a local, random, string. ii) for all key values, prepend the string, before passing it to the internal dict. Good idea. Thanks for the implementation, Serhiy. That is the sort of thing I had in mind, indeed. Not a good idea - a lot of the 3rd party tests that depend on dict ordering are going to be using those modules anyway, Stats? Didn't someone post a list of tests that fail when changing the hash? Oh, those were stdlib tests, not 3rd party tests. I'm not sure how to gather the stats, then, are you? so scattering our solution across half the standard library is needlessly creating additional work without really reducing the incompatibility problem. Half the standard library? no one has cared to augment my list of modules, but I have seen reference to JSON in addition to cgi and urllib.parse. I think there are more than 6 modules in the standard library... If we're going to change anything, it may as well be the string hashing algorithm itself. Changing the string hashing algorithm is known (or at least no one has argued otherwise) to be a source of backward incompatibility that will break programs. My proposal (and Serhiy's implementation, assuming it works, or can be easily tweaked to work, I haven't reviewed it in detail or attempted to test it) will only break programs that have vulnerabilities. I failed to mention one other benefit of my proposal: every web request would have a different random prefix, so attempting to gather info is futile: the next request has a different random prefix, so different strings would collide. Cheers, Nick. Indeed it is nice when we can be cheery even when arguing, for the most part :) I've enjoyed reading the discussions in this forum because most folks have respect for other people's opinions, even when they differ. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
On 6 January 2012 20:25, Mark Shannon wrote: > Hi, > > It seems to me that half the folk discussing this issue want a super-strong, > resist-all-hypothetical-attacks hash with little regard to performance. The > other half want no change or a change that will have no observable effect. > (I may be exaggerating a little.) > > Can I propose the following, half-way proposal: > > 1. Since there is a published vulnerability, > that we fix it with the most efficient solution proposed so far: > http://bugs.python.org/file24143/random-2.patch > > 2. Decide which versions of Python this should be applied to. > 3.3 seems a given, the other are open to debate. > > 3. If and only if (and I think this unlikely) the solution chosen is shown > to be vulnerable to a more sophisticated attack then a new issue should be > opened and dealt with separately. +1 Paul ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
Hi, It seems to me that half the folk discussing this issue want a super-strong, resist-all-hypothetical-attacks hash with little regard to performance. The other half want no change or a change that will have no observable effect. (I may be exaggerating a little.) Can I propose the following, half-way proposal: 1. Since there is a published vulnerability, that we fix it with the most efficient solution proposed so far: http://bugs.python.org/file24143/random-2.patch 2. Decide which versions of Python this should be applied to. 3.3 seems a given, the other are open to debate. 3. If and only if (and I think this unlikely) the solution chosen is shown to be vulnerable to a more sophisticated attack then a new issue should be opened and dealt with separately. Cheers, Mark. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
Using my patch (random-2.patch), the overhead is 0%. I cannot see a difference with and without my patch. Numbers: --- unpatched: == 3 characters == 1 loops, best of 3: 459 usec per loop == 10 characters == 1 loops, best of 3: 575 usec per loop == 500 characters == 1 loops, best of 3: 1.36 msec per loop patched: == 3 characters == 1 loops, best of 3: 458 usec per loop == 10 characters == 1 loops, best of 3: 575 usec per loop == 500 characters == 1 loops, best of 3: 1.36 msec per loop --- (the patched version looks faster just because the timer is not reliable enough for such fast test) Script: --- echo "== 3 characters ==" ./python -m timeit -n 1 -s 'text=(("%03i" % x) for x in range(1,1000))' 'sum(hash(x) for x in text)' ./python -m timeit -n 1 -s 'text=(("%03i" % x) for x in range(1,1000))' 'sum(hash(x) for x in text)' ./python -m timeit -n 1 -s 'text=(("%03i" % x) for x in range(1,1000))' 'sum(hash(x) for x in text)' echo "== 10 characters ==" ./python -m timeit -n 1 -s 'text=(("%010i" % x) for x in range(1,1000))' 'sum(hash(x) for x in text)' ./python -m timeit -n 1 -s 'text=(("%010i" % x) for x in range(1,1000))' 'sum(hash(x) for x in text)' ./python -m timeit -n 1 -s 'text=(("%010i" % x) for x in range(1,1000))' 'sum(hash(x) for x in text)' echo "== 500 characters ==" ./python -m timeit -n 1 -s 'text=(("%0500i" % x) for x in range(1,1000))' 'sum(hash(x) for x in text)' ./python -m timeit -n 1 -s 'text=(("%0500i" % x) for x in range(1,1000))' 'sum(hash(x) for x in text)' ./python -m timeit -n 1 -s 'text=(("%0500i" % x) --- (Take the smallest timing for each test) "-n 1" is needed because the hash value is only computed once (is cached). I may be possible to have more reliable results by disabling completly the hash cache (comment "PyUnicode_HASH(self) = x;" line). Victor ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
Glenn Linderman wrote: On 1/5/2012 5:52 PM, Steven D'Aprano wrote: At some point, presuming that there is no speed penalty, the behaviour will surely become not just enabled by default but mandatory. Python has never promised that hashes must be predictable or consistent, so apart from backwards compatibility concerns for old versions, future versions of Python should make it mandatory. Presuming that there is no speed penalty, I'd argue in favour of making it mandatory for 3.3. Why do we need a flag for something that is going to be always on? I think the whole paragraph is invalid, because it presumes there is no speed penalty. I presume there will be a speed penalty, until benchmarking shows otherwise. There *may* be a speed penalty, but I draw your attention to Paul McMillian's email on 1st of January: Empirical testing shows that this unoptimized python implementation produces ~10% slowdown in the hashing of ~20 character strings. and Christian Heimes' email on 3rd of January: The changeset adds the murmur3 hash algorithm with some minor changes, for example more random seeds. At first I was worried that murmur might be slower than our old hash algorithm. But in fact it seems to be faster! So I think that it's a fairly safe bet that there will be a solution that is as fast, or at worst, trivially slower, than the current hash function. But of course, benchmarks will be needed. -- Steven ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
Serhiy Storchaka wrote: 06.01.12 02:10, Nick Coghlan написав(ла): Not a good idea - a lot of the 3rd party tests that depend on dict ordering are going to be using those modules anyway, so scattering our solution across half the standard library is needlessly creating additional work without really reducing the incompatibility problem. If we're going to change anything, it may as well be the string hashing algorithm itself. Changing the string hashing algorithm will hit the general performance and also will break down any code that depend on dict ordering. Specialized dict slow down only needed parts of some applications. The minimal proposed change of seeding the hash from a global value (a single memory read and an addition) will have such a minimal performance effect that it will be undetectable even on the most noise-free testing environment. Cheers, Mark ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
06.01.12 02:10, Nick Coghlan написав(ла): Not a good idea - a lot of the 3rd party tests that depend on dict ordering are going to be using those modules anyway, so scattering our solution across half the standard library is needlessly creating additional work without really reducing the incompatibility problem. If we're going to change anything, it may as well be the string hashing algorithm itself. Changing the string hashing algorithm will hit the general performance and also will break down any code that depend on dict ordering. Specialized dict slow down only needed parts of some applications. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
On 1/5/2012 5:52 PM, Steven D'Aprano wrote: At some point, presuming that there is no speed penalty, the behaviour will surely become not just enabled by default but mandatory. Python has never promised that hashes must be predictable or consistent, so apart from backwards compatibility concerns for old versions, future versions of Python should make it mandatory. Presuming that there is no speed penalty, I'd argue in favour of making it mandatory for 3.3. Why do we need a flag for something that is going to be always on? I think the whole paragraph is invalid, because it presumes there is no speed penalty. I presume there will be a speed penalty, until benchmarking shows otherwise. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
Benjamin Peterson wrote: 2012/1/5 Steven D'Aprano : [...] There's nothing obscure about directly testing the hash. That's about as far from obscure as it is possible to get: you are directly testing the presence of a feature by testing the feature. It's obscure because hash('') != 0 doesn't necessarily mean the hashes are randomized. A different hashing algorithm could be in effect. Fair point, but I didn't actually suggest testing hash('') != 0, that was Nick's suggestion, which he's since withdrawn. -- Steven ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
Am 06.01.2012 03:04, schrieb Benjamin Peterson: > It's obscure because hash('') != 0 doesn't necessarily mean the hashes > are randomized. A different hashing algorithm could be in effect. Also in 1 of 2**32 or 2**64 tries hash('') is 0 although randomizing is active. Christian ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
2012/1/5 Steven D'Aprano : > Benjamin Peterson wrote: >> >> 2012/1/5 Nick Coghlan : >>> >>> On Fri, Jan 6, 2012 at 10:07 AM, Steven D'Aprano >>> wrote: Surely the way to verify the behaviour is to run this from the shell: python -c print(hash("abcde")) twice, and see that the calls return different values. (Or have I misunderstood the way the fix is going to work?) In any case, I wouldn't want to rely on the presence of a flag in the sys module to verify the behaviour, I'd want to see for myself that hash collisions are no longer predictable. >>> >>> More directly, you can just check that the hash of the empty string is >>> non-zero. >>> >>> So -1 for a flag in the sys module - "hash('') != 0" should serve as a >>> sufficient check whether or not process-level string hash >>> randomisation is in effect. >> >> >> What exactly is the disadvantage of a sys attribute? That would seem >> preferable to an obscure incarnation like that. > > > There's nothing obscure about directly testing the hash. That's about as far > from obscure as it is possible to get: you are directly testing the presence > of a feature by testing the feature. It's obscure because hash('') != 0 doesn't necessarily mean the hashes are randomized. A different hashing algorithm could be in effect. -- Regards, Benjamin ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
Benjamin Peterson wrote: 2012/1/5 Nick Coghlan : On Fri, Jan 6, 2012 at 10:07 AM, Steven D'Aprano wrote: Surely the way to verify the behaviour is to run this from the shell: python -c print(hash("abcde")) twice, and see that the calls return different values. (Or have I misunderstood the way the fix is going to work?) In any case, I wouldn't want to rely on the presence of a flag in the sys module to verify the behaviour, I'd want to see for myself that hash collisions are no longer predictable. More directly, you can just check that the hash of the empty string is non-zero. So -1 for a flag in the sys module - "hash('') != 0" should serve as a sufficient check whether or not process-level string hash randomisation is in effect. What exactly is the disadvantage of a sys attribute? That would seem preferable to an obscure incarnation like that. There's nothing obscure about directly testing the hash. That's about as far from obscure as it is possible to get: you are directly testing the presence of a feature by testing the feature. Relying on a flag to tell you whether hashes are randomised adds additional complexity: now you need to care about whether hashes are randomised AND know that there is a flag you can look up and what it is called. And since the flag won't exist in all versions of Python, or even in all builds of a particular Python version, it isn't a matter of just testing the flag, but of doing the try...except or hasattr() dance to check whether it exists first. At some point, presuming that there is no speed penalty, the behaviour will surely become not just enabled by default but mandatory. Python has never promised that hashes must be predictable or consistent, so apart from backwards compatibility concerns for old versions, future versions of Python should make it mandatory. Presuming that there is no speed penalty, I'd argue in favour of making it mandatory for 3.3. Why do we need a flag for something that is going to be always on? -- Steven ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
On Fri, Jan 6, 2012 at 10:59 AM, Benjamin Peterson wrote: > What exactly is the disadvantage of a sys attribute? That would seem > preferable to an obscure incarnation like that. Adding sys attributes in maintenance (or security) releases makes me nervous. However, Victor and Christian are right about the need for a special case to avoid leaking information, so my particular suggested check won't work. The most robust check would be to run sys.executable in a subprocess and check if it gives the same hash for a non-empty string as the current process. Cheers, Nick. -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
On Fri, 06 Jan 2012 01:50:00 +0100 Christian Heimes wrote: > Am 06.01.2012 01:34, schrieb Nick Coghlan: > > On Fri, Jan 6, 2012 at 10:07 AM, Steven D'Aprano > > wrote: > >> Surely the way to verify the behaviour is to run this from the shell: > >> > >> python -c print(hash("abcde")) > >> > >> twice, and see that the calls return different values. (Or have I > >> misunderstood the way the fix is going to work?) > >> > >> In any case, I wouldn't want to rely on the presence of a flag in the sys > >> module to verify the behaviour, I'd want to see for myself that hash > >> collisions are no longer predictable. > > > > More directly, you can just check that the hash of the empty string is > > non-zero. > > > > So -1 for a flag in the sys module - "hash('') != 0" should serve as a > > sufficient check whether or not process-level string hash > > randomisation is in effect. > > This might not work as we have to special case empty strings and perhaps > \0 strings, too. The special case value doesn't have to be zero. Make it age(Barry) for example (which, I think, is still representable in a 32-bit integer!). Regards Antoine. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
2012/1/5 Nick Coghlan : > On Fri, Jan 6, 2012 at 10:07 AM, Steven D'Aprano wrote: >> Surely the way to verify the behaviour is to run this from the shell: >> >> python -c print(hash("abcde")) >> >> twice, and see that the calls return different values. (Or have I >> misunderstood the way the fix is going to work?) >> >> In any case, I wouldn't want to rely on the presence of a flag in the sys >> module to verify the behaviour, I'd want to see for myself that hash >> collisions are no longer predictable. > > More directly, you can just check that the hash of the empty string is > non-zero. > > So -1 for a flag in the sys module - "hash('') != 0" should serve as a > sufficient check whether or not process-level string hash > randomisation is in effect. What exactly is the disadvantage of a sys attribute? That would seem preferable to an obscure incarnation like that. -- Regards, Benjamin ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
Am 06.01.2012 01:34, schrieb Nick Coghlan: > On Fri, Jan 6, 2012 at 10:07 AM, Steven D'Aprano wrote: >> Surely the way to verify the behaviour is to run this from the shell: >> >> python -c print(hash("abcde")) >> >> twice, and see that the calls return different values. (Or have I >> misunderstood the way the fix is going to work?) >> >> In any case, I wouldn't want to rely on the presence of a flag in the sys >> module to verify the behaviour, I'd want to see for myself that hash >> collisions are no longer predictable. > > More directly, you can just check that the hash of the empty string is > non-zero. > > So -1 for a flag in the sys module - "hash('') != 0" should serve as a > sufficient check whether or not process-level string hash > randomisation is in effect. This might not work as we have to special case empty strings and perhaps \0 strings, too. Otherwise we would give away the random seed to an attacker if an attacker can somehow get hold of hash('') or hash(n * '\0'). Christian ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
2012/1/6 Barry Warsaw : >>Settings for PYRANDOMHASH: >> >> PYRANDOMHASH=1 >> enable randomized hashing function >> >> PYRANDOMHASH=/path/to/seed >> enable randomized hashing function and read seed from 'seed' >> >> PYRANDOMHASH=0 >> disable randomed hashing function > > Why not PYTHONHASHSEED then? See my patch attached to the issue #13703? I prepared the code to be able to set easily the hash seed (it has a LCG, it's seed can be provided by the user directly). I agree that the value 0 should give the same behaviour than the actual hash (disable the randomized hash). I will add the variable in the next version of my patch. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
On Fri, Jan 6, 2012 at 10:07 AM, Steven D'Aprano wrote: > Surely the way to verify the behaviour is to run this from the shell: > > python -c print(hash("abcde")) > > twice, and see that the calls return different values. (Or have I > misunderstood the way the fix is going to work?) > > In any case, I wouldn't want to rely on the presence of a flag in the sys > module to verify the behaviour, I'd want to see for myself that hash > collisions are no longer predictable. More directly, you can just check that the hash of the empty string is non-zero. So -1 for a flag in the sys module - "hash('') != 0" should serve as a sufficient check whether or not process-level string hash randomisation is in effect. Cheers, Nick. -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
On Jan 05, 2012, at 10:40 PM, Christian Heimes wrote: >Hey Barry, stop stealing my ideas! :) I've argued for these default >settings for days. :) >I've suggested the env var PYRANDOMHASH. It's easy to set env vars in >Apache. For example Debian/Ubuntu has /etc/apache2/envvars. For consistency, it really should be PYTHONSOMETHING. I personally don't care how long it is (e.g. PYTHONIOENCODING). >Settings for PYRANDOMHASH: > > PYRANDOMHASH=1 > enable randomized hashing function > > PYRANDOMHASH=/path/to/seed > enable randomized hashing function and read seed from 'seed' > > PYRANDOMHASH=0 > disable randomed hashing function Why not PYTHONHASHSEED then? >Since there isn't an easy way to set env vars in a shebang line since >something like > > #!/usr/bin/env PYRANDOMHASH=1 python2.7 > >doesn't work, we could come up with a solution the shebang. We have precedence for mirroring startup options and envars, so it doesn't bother me to add such a switch to Python 3.3. It *does* bother me to add a switch to any stable release. >IMHO the setting for the default setting should be a compile time >option. It's reasonable easy to extend the configure script to support >--enable-randomhash / --disable-randomhash. The MS VC build scripts can >grow a flag, too. > >I still think that the topic needs a PEP. A couple of days ago I started >with a PEP. But Guido told me that he doesn't see a point in a PEP >because he prefers a small and quick solution, so I stopped working on >it. However the arguments, worries and ideas in this enormous topic have >repeated over and over. We know from experience that a PEP is a great >way to explain the how, what and why of the change as well as the paths >we didn't take. One way to look at it is to have a quick-and-dirty solution for stable releases. It could be suboptimal from a ui point of view because of backward compatibility issues. The PEP could then outline the boffo perfect solution for Python 3.3, which a section on how it will be backported to stable releases. Cheers, -Barry signature.asc Description: PGP signature ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
David Malcolm wrote: When backporting the fix to ancient python versions, I'm inclined to turn the change *off* by default, requiring the change to be enabled via an environment variable: I want to avoid breaking existing code, even if such code is technically relying on non-guaranteed behavior. But we could potentially tweak mod_python/mod_wsgi so that it defaults to *on*. That way /usr/bin/python would default to the old behavior, but web apps would have some protection. Any such logic here also suggests the need for an attribute in the sys module so that you can verify the behavior. Surely the way to verify the behaviour is to run this from the shell: python -c print(hash("abcde")) twice, and see that the calls return different values. (Or have I misunderstood the way the fix is going to work?) In any case, I wouldn't want to rely on the presence of a flag in the sys module to verify the behaviour, I'd want to see for myself that hash collisions are no longer predictable. -- Steven ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
On 1/5/2012 3:10 PM, Ethan Furman wrote: Tres Seaver wrote: 1) the security problem is not in CPython, but rather in web servers that use dict inappropriately. Most webapp vulnerabilities are due to their use of Python's cgi module, which it uses a dict to hold the form / query string data being supplied by untrusted external users. And Glenn suggested further down that an appropriate course of action would be to fix the cgi module (and others) instead of messing with dict. I think both should be done. For web applications, it would be best to reject DOS attempts with 'random' keys in O(1) time rather than in O(n) time even with improved hash. But some other apps, like the Python interpreter itself, 'random' names may be quite normal. -- Terry Jan Reedy ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
On Fri, Jan 6, 2012 at 8:15 AM, Serhiy Storchaka wrote: > 05.01.12 21:14, Glenn Linderman написав(ла): >> >> So, fixing the vulnerable packages could be a sufficient response, >> rather than changing the hash function. How to fix? Each of those >> above allocates and returns a dict. Simply have each of those allocate >> and return and wrapped dict, which has the following behaviors: >> >> i) during __init__, create a local, random, string. >> ii) for all key values, prepend the string, before passing it to the >> internal dict. > > > Good idea. Not a good idea - a lot of the 3rd party tests that depend on dict ordering are going to be using those modules anyway, so scattering our solution across half the standard library is needlessly creating additional work without really reducing the incompatibility problem. If we're going to change anything, it may as well be the string hashing algorithm itself. Cheers, Nick. -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
05.01.12 21:14, Glenn Linderman написав(ла): So, fixing the vulnerable packages could be a sufficient response, rather than changing the hash function. How to fix? Each of those above allocates and returns a dict. Simply have each of those allocate and return and wrapped dict, which has the following behaviors: i) during __init__, create a local, random, string. ii) for all key values, prepend the string, before passing it to the internal dict. Good idea. # -*- coding: utf-8 -*- from collections import MutableMapping import random class SafeDict(dict, MutableMapping): def __init__(self, *args, **kwds): dict.__init__(self) self._prefix = str(random.getrandbits(64)) self.update(*args, **kwds) def clear(self): dict.clear(self) self._prefix = str(random.getrandbits(64)) def _safe_key(self, key): return self._prefix + repr(key), key def __getitem__(self, key): try: return dict.__getitem__(self, self._safe_key(key)) except KeyError as e: e.args = (key,) raise e def __setitem__(self, key, value): dict.__setitem__(self, self._safe_key(key), value) def __delitem__(self, key): try: dict.__delitem__(self, self._safe_key(key)) except KeyError as e: e.args = (key,) raise e def __iter__(self): for skey, key in dict.__iter__(self): yield key def __contains__(self, key): return dict.__contains__(self, self._safe_key(key)) setdefault = MutableMapping.setdefault update = MutableMapping.update pop = MutableMapping.pop popitem = MutableMapping.popitem keys = MutableMapping.keys values = MutableMapping.values items = MutableMapping.items def __repr__(self): return '{%s}' % ', '.join('%s: %s' % (repr(k), repr(v)) for k, v in self.items()) def copy(self): return self.__class__(self) @classmethod def fromkeys(cls, iterable, value=None): d = cls() for key in iterable: d[key] = value return d def __eq__(self, other): return all(k in other and other[k] == v for k, v in self.items()) and \ all(k in self and self[k] == v for k, v in other.items()) def __ne__(self, other): return not self == other ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
Am 05.01.2012 22:59, schrieb Antoine Pitrou: > I don't think we (python-dev) are really concerned with 2.3, 2.4, > 2.5 and 3.0. They're all unsupported, and people do what they want > with their local source trees. Let me reply with a quote from Barry: > Correct, although there's no reason why a patch for versions > older than 2.6 couldn't be included on a python.org security > page for reference in CVE or other security notifications. > Distros that care about versions older than Python 2.6 will > basically be back-porting the patch anyway. Christian ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
On Thu, 05 Jan 2012 22:40:58 +0100 Christian Heimes wrote: > Am 05.01.2012 21:45, schrieb Barry Warsaw: > > This sounds like a reasonable compromise for all stable Python releases. It > > can be turned on by default for Python 3.3. If you also make the default > > setting easy to change (i.e. parameterized in one place), then distros can > > make their own decision about the default, although I'd argue for the above > > default approach for Debian/Ubuntu. > > Hey Barry, stop stealing my ideas! :) I've argued for these default > settings for days. > > ver deliveryrandomized hashing > == > 2.3 patch disabled by default > 2.4 patch disabled > 2.5 patch disabled > 2.6 release disabled > 2.7 release disabled > 3.0 ignore? disabled > 3.1 release disabled > 3.2 release disabled > 3.3 n/a yet enabled by default I don't think we (python-dev) are really concerned with 2.3, 2.4, 2.5 and 3.0. They're all unsupported, and people do what they want with their local source trees. Regards Antoine. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
Am 05.01.2012 21:10, schrieb Ethan Furman: > Tres Seaver wrote: >> -BEGIN PGP SIGNED MESSAGE- >> Hash: SHA1 >> >> On 01/05/2012 02:14 PM, Glenn Linderman wrote: >>> 1) the security problem is not in CPython, but rather in web servers >>> that use dict inappropriately. >> >> Most webapp vulnerabilities are due to their use of Python's cgi module, >> which it uses a dict to hold the form / query string data being supplied >> by untrusted external users. > > And Glenn suggested further down that an appropriate course of action > would be to fix the cgi module (and others) instead of messing with dict. You'd have to fix any Python core module that may handle data from untrusted sources. The issue isn't limited to web apps and POST requests. It's possible to trigger the DoS from JSON, a malicious PDF, JPEG's EXIF metadata or any other data. Oh, and somebody has to fix all 3rd party modules, too. Christian ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
Am 05.01.2012 21:45, schrieb Barry Warsaw: > This sounds like a reasonable compromise for all stable Python releases. It > can be turned on by default for Python 3.3. If you also make the default > setting easy to change (i.e. parameterized in one place), then distros can > make their own decision about the default, although I'd argue for the above > default approach for Debian/Ubuntu. Hey Barry, stop stealing my ideas! :) I've argued for these default settings for days. ver deliveryrandomized hashing == 2.3 patch disabled by default 2.4 patch disabled 2.5 patch disabled 2.6 release disabled 2.7 release disabled 3.0 ignore? disabled 3.1 release disabled 3.2 release disabled 3.3 n/a yet enabled by default 2.3 to 2.5 are still used in production (RHEL, Ubuntu LTS). Guido has stated that he needs a patch for 2.4, too. I think we may safely ignore Python 3.0. Nobody should use Python 3.0 on a production system. I've suggested the env var PYRANDOMHASH. It's easy to set env vars in Apache. For example Debian/Ubuntu has /etc/apache2/envvars. Settings for PYRANDOMHASH: PYRANDOMHASH=1 enable randomized hashing function PYRANDOMHASH=/path/to/seed enable randomized hashing function and read seed from 'seed' PYRANDOMHASH=0 disable randomed hashing function Since there isn't an easy way to set env vars in a shebang line since something like #!/usr/bin/env PYRANDOMHASH=1 python2.7 doesn't work, we could come up with a solution the shebang. IMHO the setting for the default setting should be a compile time option. It's reasonable easy to extend the configure script to support --enable-randomhash / --disable-randomhash. The MS VC build scripts can grow a flag, too. I still think that the topic needs a PEP. A couple of days ago I started with a PEP. But Guido told me that he doesn't see a point in a PEP because he prefers a small and quick solution, so I stopped working on it. However the arguments, worries and ideas in this enormous topic have repeated over and over. We know from experience that a PEP is a great way to explain the how, what and why of the change as well as the paths we didn't take. Christian ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
On 01/05/2012 09:45 PM, Barry Warsaw wrote: > On Jan 05, 2012, at 02:33 PM, David Malcolm wrote: > >>We have similar issues in RHEL, with the Python versions going much >>further back (e.g. 2.3) >> >>When backporting the fix to ancient python versions, I'm inclined to >>turn the change *off* by default, requiring the change to be enabled via >>an environment variable: I want to avoid breaking existing code, even if >>such code is technically relying on non-guaranteed behavior. But we >>could potentially tweak mod_python/mod_wsgi so that it defaults to *on*. >>That way /usr/bin/python would default to the old behavior, but web apps >>would have some protection. > > This sounds like a reasonable compromise for all stable Python releases. It > can be turned on by default for Python 3.3. If you also make the default > setting easy to change (i.e. parameterized in one place), then distros can > make their own decision about the default, although I'd argue for the above > default approach for Debian/Ubuntu. Agreed. Georg ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
On Thu, 2012-01-05 at 20:35 +, Paul Moore wrote: > On 5 January 2012 19:33, David Malcolm wrote: > > We have similar issues in RHEL, with the Python versions going much > > further back (e.g. 2.3) > > > > When backporting the fix to ancient python versions, I'm inclined to > > turn the change *off* by default, requiring the change to be enabled via > > an environment variable: I want to avoid breaking existing code, even if > > such code is technically relying on non-guaranteed behavior. But we > > could potentially tweak mod_python/mod_wsgi so that it defaults to *on*. > > That way /usr/bin/python would default to the old behavior, but web apps > > would have some protection. Any such logic here also suggests the need > > for an attribute in the sys module so that you can verify the behavior. > > Uh, surely no-one is suggesting backporting to "ancient" versions? I > couldn't find the statement quickly on the python.org website (so this > is via google), but isn't it true that 2.6 is in security-only mode > and 2.5 and earlier will never get the fix? Having a source-only > release for 2.6 means the fix is "off by default" in the sense that > you can choose not to build it. Or add a #ifdef to the source if it > really matters. Sorry, if I was unclear. I don't expect python-dev to do this backporting, but those of us who do maintain such ancient pythons via Linux distributions may want to do the backport for our users. My email was to note that it may make sense to pick more conservative defaults for such a scenario, as compared to 2.6 onwards. [snip] Hope this is helpful Dave ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
On Thu, Jan 05, 2012 at 08:35:57PM +, Paul Moore wrote: > On 5 January 2012 19:33, David Malcolm wrote: > > We have similar issues in RHEL, with the Python versions going much > > further back (e.g. 2.3) > > > > When backporting the fix to ancient python versions, I'm inclined to > > turn the change *off* by default, requiring the change to be enabled via > > an environment variable: I want to avoid breaking existing code, even if > > such code is technically relying on non-guaranteed behavior. But we > > could potentially tweak mod_python/mod_wsgi so that it defaults to *on*. > > That way /usr/bin/python would default to the old behavior, but web apps > > would have some protection. Any such logic here also suggests the need > > for an attribute in the sys module so that you can verify the behavior. > > Uh, surely no-one is suggesting backporting to "ancient" versions? I > couldn't find the statement quickly on the python.org website (so this > is via google), but isn't it true that 2.6 is in security-only mode > and 2.5 and earlier will never get the fix? > I think when dmalcolm says "backporting" he means that he'll have to backport the fix from modern, supported-by-python.org python to the ancient python's that he's supporting as part of the Linux distributions where he's the python package maintainer. I'm thinking he's mentioning it here mainly to see if someone thinks that his approach for those distributions causes anyone to point out a reason not to diverge from upstream in that manner. > Having a source-only > release for 2.6 means the fix is "off by default" in the sense that > you can choose not to build it. Or add a #ifdef to the source if it > really matters. > I don't think that this would satisfy dmalcolm's needs. What he's talking about sounds more like a runtime switch (possibly only when initializing, though, not on-the-fly). -Toshio pgp7qk95cGJ9b.pgp Description: PGP signature ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
On Jan 05, 2012, at 08:35 PM, Paul Moore wrote: >Uh, surely no-one is suggesting backporting to "ancient" versions? I >couldn't find the statement quickly on the python.org website (so this >is via google), but isn't it true that 2.6 is in security-only mode >and 2.5 and earlier will never get the fix? Having a source-only >release for 2.6 means the fix is "off by default" in the sense that >you can choose not to build it. Or add a #ifdef to the source if it >really matters. Correct, although there's no reason why a patch for versions older than 2.6 couldn't be included on a python.org security page for reference in CVE or other security notifications. Distros that care about versions older than Python 2.6 will basically be back-porting the patch anyway. >My feeling is that it should go into 2.7, 3.2, and 3.3+, but with no >bells and whistles to switch it off or the like. I like David Malcolm's suggestion, but I have no problem applying it to 3.3, enabled by default with no way to turn it off. The off-by-default on-switch policy for stable releases would be justified by maximum backward compatibility conservativeness. -Barry ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
On Jan 05, 2012, at 02:33 PM, David Malcolm wrote: >We have similar issues in RHEL, with the Python versions going much >further back (e.g. 2.3) > >When backporting the fix to ancient python versions, I'm inclined to >turn the change *off* by default, requiring the change to be enabled via >an environment variable: I want to avoid breaking existing code, even if >such code is technically relying on non-guaranteed behavior. But we >could potentially tweak mod_python/mod_wsgi so that it defaults to *on*. >That way /usr/bin/python would default to the old behavior, but web apps >would have some protection. This sounds like a reasonable compromise for all stable Python releases. It can be turned on by default for Python 3.3. If you also make the default setting easy to change (i.e. parameterized in one place), then distros can make their own decision about the default, although I'd argue for the above default approach for Debian/Ubuntu. >Any such logic here also suggests the need for an attribute in the sys module >so that you can verify the behavior. That would be read-only though, right? -Barry ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
On 5 January 2012 19:33, David Malcolm wrote: > We have similar issues in RHEL, with the Python versions going much > further back (e.g. 2.3) > > When backporting the fix to ancient python versions, I'm inclined to > turn the change *off* by default, requiring the change to be enabled via > an environment variable: I want to avoid breaking existing code, even if > such code is technically relying on non-guaranteed behavior. But we > could potentially tweak mod_python/mod_wsgi so that it defaults to *on*. > That way /usr/bin/python would default to the old behavior, but web apps > would have some protection. Any such logic here also suggests the need > for an attribute in the sys module so that you can verify the behavior. Uh, surely no-one is suggesting backporting to "ancient" versions? I couldn't find the statement quickly on the python.org website (so this is via google), but isn't it true that 2.6 is in security-only mode and 2.5 and earlier will never get the fix? Having a source-only release for 2.6 means the fix is "off by default" in the sense that you can choose not to build it. Or add a #ifdef to the source if it really matters. Personally, I find it hard to see this as a Python security hole, but I can sympathise with the idea that it would be nice to make dict "safer by default". (Although the benefit for me personally would be zero, so I'm reluctant for the change to have a detectable cost...) My feeling is that it should go into 2.7, 3.2, and 3.3+, but with no bells and whistles to switch it off or the like. If it's not suitable to go in on that basis, restrict it to 3.3+ (where it's certainly OK) and advise users of earlier versions to either upgrade or code defensively to avoid hitting the pathological case. Surely that sort of defensive code should be second nature to the people who might be affected by the issue? Paul. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
Tres Seaver wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 01/05/2012 02:14 PM, Glenn Linderman wrote: 1) the security problem is not in CPython, but rather in web servers that use dict inappropriately. Most webapp vulnerabilities are due to their use of Python's cgi module, which it uses a dict to hold the form / query string data being supplied by untrusted external users. And Glenn suggested further down that an appropriate course of action would be to fix the cgi module (and others) instead of messing with dict. ~Ethan~ ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
On 1/5/2012 11:49 AM, Tres Seaver wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 01/05/2012 02:14 PM, Glenn Linderman wrote: 1) the security problem is not in CPython, but rather in web servers that use dict inappropriately. Most webapp vulnerabilities are due to their use of Python's cgi module, which it uses a dict to hold the form / query string data being supplied by untrusted external users. Yes, I understand that (and have some such web apps in production). In fact, I pointed out urllib.parse and cgi as specific modules for which a proposed fix could be made without impacting the Python hash function. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 01/05/2012 02:14 PM, Glenn Linderman wrote: > 1) the security problem is not in CPython, but rather in web servers > that use dict inappropriately. Most webapp vulnerabilities are due to their use of Python's cgi module, which it uses a dict to hold the form / query string data being supplied by untrusted external users. Tres. - -- === Tres Seaver +1 540-429-0999 tsea...@palladion.com Palladion Software "Excellence by Design"http://palladion.com -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAk8F/uEACgkQ+gerLs4ltQ679QCgqKPYYwEetKR3bEMVh5eukLin cA8An3XJMYWhK5MutjbOCxCfYzKXmDzc =V3lh -END PGP SIGNATURE- ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
On Thu, 2012-01-05 at 19:34 +0200, Maciej Fijalkowski wrote: > On Thu, Jan 5, 2012 at 3:39 PM, Antoine Pitrou wrote: > > On Thu, 5 Jan 2012 15:26:27 +1100 > > Andrew Bennetts wrote: > >> > >> I don't think that's news either. > >> http://mail.python.org/pipermail/python-dev/2003-May/035907.html and > >> http://twistedmatrix.com/pipermail/twisted-python/2003-June/004339.html for > >> instance show that in 2003 it was clearly known to at least be likely to > >> be an > >> exploitable DoS in common code (a dict of HTTP headers or HTTP form keys). > >> > >> There was debate about whether it's the language's responsibility to > >> mitigate > >> the problem or if apps should use safer designs for handling untrusted > >> input > >> (e.g. limit the number of keys input is allowed to create, or use something > >> other than dicts), and debate about just how practical an effective exploit > >> would be. But I think it was understood to be a real concern 8 years ago, > >> so > >> not exactly sudden. > > > > That's not news indeed, but that doesn't make it less of a problem, > > especially now that the issue has been widely publicized through a > > conference and announcements on several widely-read Web sites. > > > > That said, only doing the security fix in 3.3 would have the nice side > > effect of pushing people towards Python 3, so perhaps I'm for it after > > all. > > > > Half-jokingly, > > > > Antoine. > > Just to make things clear - stdlib itself has 1/64 of tests relying on > dict order. Changing dict order in *older* pythons will break > everyone's tests and some peoples code. Making this new 2.6.x release > would mean that people using new python 2.6 would have to upgrade an > unspecified amount of their python packages, that does not sound very > cool. Also consider that new 2.6.x would go as a security fix to old > ubuntu, but all other packages won't, because they'll not contain > security fixes. Just so you know We have similar issues in RHEL, with the Python versions going much further back (e.g. 2.3) When backporting the fix to ancient python versions, I'm inclined to turn the change *off* by default, requiring the change to be enabled via an environment variable: I want to avoid breaking existing code, even if such code is technically relying on non-guaranteed behavior. But we could potentially tweak mod_python/mod_wsgi so that it defaults to *on*. That way /usr/bin/python would default to the old behavior, but web apps would have some protection. Any such logic here also suggests the need for an attribute in the sys module so that you can verify the behavior. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
On Thu, 5 Jan 2012 19:34:13 +0200 Maciej Fijalkowski wrote: > > Just to make things clear - stdlib itself has 1/64 of tests relying on > dict order. Changing dict order in *older* pythons will break > everyone's tests and some peoples code. Breaking tests is not a problem: they are typically not run by production code and so people can take the time to fix them. Breaking other code is a problem if it is legitimate. Relying on dict ordering is totally wrong and I don't think we should care about such cases. The only issue is when relying on hash() being stable accross runs. But hashing already varies from build to build (32-bit vs. 64-bit) and I think that anyone seriously relying on it should already have been bitten. > Making this new 2.6.x release > would mean that people using new python 2.6 would have to upgrade an > unspecified amount of their python packages, that does not sound very > cool. How about 2.7? Do you think it should also remain untouched? I am ok for leaving 2.6 alone (that's Barry's call anyway) but 2.7 is another matter - should people migrate to 3.x to get the security fix? As for 3.2, it should certainly get the fix IMO. There are not many Python 3 legacy applications relying on hash() stability, I think. > Also consider that new 2.6.x would go as a security fix to old > ubuntu, but all other packages won't, because they'll not contain > security fixes. Ubuntu can decide *not* to ship the fix if they prefer it like that. Their policies and decisions, though, should not taint ours. Regards Antoine. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
On 1/5/2012 9:34 AM, Maciej Fijalkowski wrote: Also consider that new 2.6.x would go as a security fix to old ubuntu, but all other packages won't, because they'll not contain security fixes. Just so you know Why should CPython by constrained by broken policies of Ubuntu? If the other packages must be fixed so they work correctly with a security fix in Python, then they should be considered as containing a security fix. If they aren't, then that is a broken policy. On the other hand, it is very true that the seductive convenience of dict (readily available, good performance) in normal cases have created the vulnerability because its characteristics are a function of the data inserted, and when used for data that is from unknown, possibly malicious sources, that is a bug in the program that uses dict, not in dict itself. So it seems to me that: 1) the security problem is not in CPython, but rather in web servers that use dict inappropriately. 2) changing CPython in a way that breaks code is not a security fix to CPython, but rather a gratuitous breakage of compatibility promises, wrapped in a security-fix lie. The problem for CPython here can be summarized as follows: a) it is being blamed for problems in web servers that are not problems in CPython b) perhaps dict documentation is a bit too seductive, in not declaring that data from malicious sources could cause its performance to degrade significantly (but then, any programmer who has actually taken a decent set of programming classes should understand that, but on the other hand, there are programmers who have not taken such classes). c) CPython provides no other mapping data structures that rival the performance and capabilities of dict as an alternative, nor can such data structures be written in CPython, as the performance of dict comes not only from hashing, but also from being written in C. The solutions could be: A) push back on the blame: it is not a CPython problem B) perhaps add a warning to the documentation for the naïve, untrained programmers C) consider adding an additional data structure to the language, and mention it in the B warning for versions 3.3+. On the other hand, the web server vulnerability could be blamed on CPython in another way: identify vulnerable packages in the stdlib that are likely the be used during the parsing of user-supplied data. Ones that come to mind (Python 3.2) are: urllib.parse (various parse* functions) (package names different in Python 2.x) cgi (parse_multipart, FieldStorage) So, fixing the vulnerable packages could be a sufficient response, rather than changing the hash function. How to fix? Each of those above allocates and returns a dict. Simply have each of those allocate and return and wrapped dict, which has the following behaviors: i) during __init__, create a local, random, string. ii) for all key values, prepend the string, before passing it to the internal dict. Changing these vulnerable packages rather than the hash function is a much more constrained change, and wouldn't create bugs in programs that erroneously depend on the current hash function directly or indirectly. This would not fix web servers that use their own parsing and storage mechanism for fields, if they have also inappropriately used a dict as their storage mechanism for user supplied data. However, a similar solution could be similarly applied by the authors of those web servers, and would be a security fix to such packages, so should be applied to Ubuntu, if available there, or other systems with security-only fix acceptance. This solution does not require changes to the hash, does not require a cryptographicly secure hash, and does not require code to be added to the initialization of Python before normal objects and mappings can be created. If a port doesn't contain a good random number generator, a weak one can be subsitituted, but such decisions can be made in Python code after the interpreter is initialized, and use of stdlib packages is available. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
On Thu, Jan 5, 2012 at 3:39 PM, Antoine Pitrou wrote: > On Thu, 5 Jan 2012 15:26:27 +1100 > Andrew Bennetts wrote: >> >> I don't think that's news either. >> http://mail.python.org/pipermail/python-dev/2003-May/035907.html and >> http://twistedmatrix.com/pipermail/twisted-python/2003-June/004339.html for >> instance show that in 2003 it was clearly known to at least be likely to be >> an >> exploitable DoS in common code (a dict of HTTP headers or HTTP form keys). >> >> There was debate about whether it's the language's responsibility to mitigate >> the problem or if apps should use safer designs for handling untrusted input >> (e.g. limit the number of keys input is allowed to create, or use something >> other than dicts), and debate about just how practical an effective exploit >> would be. But I think it was understood to be a real concern 8 years ago, so >> not exactly sudden. > > That's not news indeed, but that doesn't make it less of a problem, > especially now that the issue has been widely publicized through a > conference and announcements on several widely-read Web sites. > > That said, only doing the security fix in 3.3 would have the nice side > effect of pushing people towards Python 3, so perhaps I'm for it after > all. > > Half-jokingly, > > Antoine. > > > ___ > Python-Dev mailing list > Python-Dev@python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/fijall%40gmail.com Just to make things clear - stdlib itself has 1/64 of tests relying on dict order. Changing dict order in *older* pythons will break everyone's tests and some peoples code. Making this new 2.6.x release would mean that people using new python 2.6 would have to upgrade an unspecified amount of their python packages, that does not sound very cool. Also consider that new 2.6.x would go as a security fix to old ubuntu, but all other packages won't, because they'll not contain security fixes. Just so you know Cheers, fijal ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
On Thu, 5 Jan 2012 15:26:27 +1100 Andrew Bennetts wrote: > > I don't think that's news either. > http://mail.python.org/pipermail/python-dev/2003-May/035907.html and > http://twistedmatrix.com/pipermail/twisted-python/2003-June/004339.html for > instance show that in 2003 it was clearly known to at least be likely to be an > exploitable DoS in common code (a dict of HTTP headers or HTTP form keys). > > There was debate about whether it's the language's responsibility to mitigate > the problem or if apps should use safer designs for handling untrusted input > (e.g. limit the number of keys input is allowed to create, or use something > other than dicts), and debate about just how practical an effective exploit > would be. But I think it was understood to be a real concern 8 years ago, so > not exactly sudden. That's not news indeed, but that doesn't make it less of a problem, especially now that the issue has been widely publicized through a conference and announcements on several widely-read Web sites. That said, only doing the security fix in 3.3 would have the nice side effect of pushing people towards Python 3, so perhaps I'm for it after all. Half-jokingly, Antoine. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
On Wed, Jan 04, 2012 at 11:55:13AM +0100, Antoine Pitrou wrote: > On Wed, 4 Jan 2012 09:59:15 +0200 > Maciej Fijalkowski wrote: > > > > Is it *really* a security issue? We knew all along that dicts are > > O(n^2) in worst case scenario, how is this suddenly a security > > problem? > > Because it has been shown to be exploitable for malicious purposes? I don't think that's news either. http://mail.python.org/pipermail/python-dev/2003-May/035907.html and http://twistedmatrix.com/pipermail/twisted-python/2003-June/004339.html for instance show that in 2003 it was clearly known to at least be likely to be an exploitable DoS in common code (a dict of HTTP headers or HTTP form keys). There was debate about whether it's the language's responsibility to mitigate the problem or if apps should use safer designs for handling untrusted input (e.g. limit the number of keys input is allowed to create, or use something other than dicts), and debate about just how practical an effective exploit would be. But I think it was understood to be a real concern 8 years ago, so not exactly sudden. Just because it's old news doesn't make it not a security problem, of course. -Andrew. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
On Wed, Jan 4, 2012 at 12:59 AM, Maciej Fijalkowski wrote: > On Wed, Jan 4, 2012 at 12:02 AM, Bill Janssen wrote: >> Christian Heimes wrote: >> >>> Am 29.12.2011 12:13, schrieb Mark Shannon: >>> > The attack relies on being able to predict the hash value for a given >>> > string. Randomising the string hash function is quite straightforward. >>> > There is no need to change the dictionary code. >>> > >>> > A possible (*untested*) patch is attached. I'll leave it for those more >>> > familiar with unicodeobject.c to do properly. >>> >>> I'm worried that hash randomization of str is going to break 3rd party >>> software that rely on a stable hash across multiple Python instances. >>> Persistence layers like ZODB and cross interpreter communication >>> channels used by multiprocessing may (!) rely on the fact that the hash >>> of a string is fixed. >> >> Software that depends on an undefined hash function for synchronization >> and persistence deserves to break, IMO. There are plenty of >> well-defined hash functions available for this purpose. >> >> Bill >> ___ >> Python-Dev mailing list >> Python-Dev@python.org >> http://mail.python.org/mailman/listinfo/python-dev >> Unsubscribe: >> http://mail.python.org/mailman/options/python-dev/fijall%40gmail.com > > A lot of software will break their tests, because dict ordering would > depend on the particular run. I know, because some of them break on > pypy which has a different dict ordering. This is probably a good > thing in general, but is it really worth it? People will install > python 2.6.newest and stuff *will* break. So if we're making the new hashing the default and giving an option to use the old, we should make it _really_ clear in the release notes/announcement about how to revert the behavior. -eric > > Is it *really* a security issue? We knew all along that dicts are > O(n^2) in worst case scenario, how is this suddenly a security > problem? > > Cheers, > fijal > ___ > Python-Dev mailing list > Python-Dev@python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/ericsnowcurrently%40gmail.com ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
Am 04.01.2012 08:59, schrieb Maciej Fijalkowski: > Is it *really* a security issue? We knew all along that dicts are > O(n^2) in worst case scenario, how is this suddenly a security > problem? For example Microsoft has released an extraordinary and unscheduled security patch for the issue between Christmas and New Year. I don't normally use MS as reference but this should give you a hint about the severity. Have you watched the talk yet? http://www.youtube.com/watch?v=R2Cq3CLI6H8 Christian ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
On Wed, 4 Jan 2012 09:59:15 +0200 Maciej Fijalkowski wrote: > > Is it *really* a security issue? We knew all along that dicts are > O(n^2) in worst case scenario, how is this suddenly a security > problem? Because it has been shown to be exploitable for malicious purposes? Regards Antoine. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
On Wed, Jan 4, 2012 at 12:02 AM, Bill Janssen wrote: > Christian Heimes wrote: > >> Am 29.12.2011 12:13, schrieb Mark Shannon: >> > The attack relies on being able to predict the hash value for a given >> > string. Randomising the string hash function is quite straightforward. >> > There is no need to change the dictionary code. >> > >> > A possible (*untested*) patch is attached. I'll leave it for those more >> > familiar with unicodeobject.c to do properly. >> >> I'm worried that hash randomization of str is going to break 3rd party >> software that rely on a stable hash across multiple Python instances. >> Persistence layers like ZODB and cross interpreter communication >> channels used by multiprocessing may (!) rely on the fact that the hash >> of a string is fixed. > > Software that depends on an undefined hash function for synchronization > and persistence deserves to break, IMO. There are plenty of > well-defined hash functions available for this purpose. > > Bill > ___ > Python-Dev mailing list > Python-Dev@python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/fijall%40gmail.com A lot of software will break their tests, because dict ordering would depend on the particular run. I know, because some of them break on pypy which has a different dict ordering. This is probably a good thing in general, but is it really worth it? People will install python 2.6.newest and stuff *will* break. Is it *really* a security issue? We knew all along that dicts are O(n^2) in worst case scenario, how is this suddenly a security problem? Cheers, fijal ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
On 1/3/2012 5:02 PM, Bill Janssen wrote: Software that depends on an undefined hash function for synchronization and persistence deserves to break, IMO. There are plenty of well-defined hash functions available for this purpose. The doc for id() now says "This is an integer which is guaranteed to be unique and constant for this object during its lifetime." Since the default 3.2.2 hash for my win7 64bit CPython is id-address // 16, it can have no longer guarantee. I suggest that hash() doc say something similar: http://bugs.python.org/issue13707 -- Terry Jan Reedy ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
Christian Heimes wrote: > Am 29.12.2011 12:13, schrieb Mark Shannon: > > The attack relies on being able to predict the hash value for a given > > string. Randomising the string hash function is quite straightforward. > > There is no need to change the dictionary code. > > > > A possible (*untested*) patch is attached. I'll leave it for those more > > familiar with unicodeobject.c to do properly. > > I'm worried that hash randomization of str is going to break 3rd party > software that rely on a stable hash across multiple Python instances. > Persistence layers like ZODB and cross interpreter communication > channels used by multiprocessing may (!) rely on the fact that the hash > of a string is fixed. Software that depends on an undefined hash function for synchronization and persistence deserves to break, IMO. There are plenty of well-defined hash functions available for this purpose. Bill ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
On Dec 31, 2011, at 04:56 PM, Guido van Rossum wrote: >Is there a tracker issue yet? The discussion should probably move there. I think the answer to this was "no"... until now. http://bugs.python.org/issue13703 Proposed patches should be linked to this issue now. Please nosy yourself if you want to follow the progress. Cheers, -Barry signature.asc Description: PGP signature ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
On Jan 02, 2012, at 06:38 PM, Georg Brandl wrote: >I wouldn't expect too much -- they seem rather keen on cheap laughs: > >http://twitter.com/#!/bk3n/status/152068096448921600/photo/1/large Heh, so yeah, it won't be me contacting them. -Barry ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
On Mon, Jan 2, 2012 at 10:53 PM, Alexey Borzenkov wrote: > On Mon, Jan 2, 2012 at 7:18 PM, Christian Heimes wrote: >> Am 02.01.2012 06:55, schrieb Paul McMillan: >>> I think Ruby uses FNV-1 with a salt, making it less vulnerable to >>> this. FNV is otherwise similar to our existing hash function. >>> >>> For the record, cryptographically strong hash functions are in the >>> neighborhood of 400% slower than our existing hash function. >> >> I've pushed a new patch >> http://hg.python.org/features/randomhash/rev/0a65d2462e0c > > It seems for 32-bit version you are using pid for the two constants. > Also, it's unclear why you even need to use a random constant for the > final pass, you already use random constant as an initial h1, and it > should be enough, no need to use for k1. Same for 128-bit: k1, k2, k3, > k4 should be initialized to zero, these are key data, they don't need > to be mixed with anything. Sorry, sent too soon. What I mean is that you're initializing a pretty big array of values when you only need a 32-bit value. Pid, in my opinion might be too predictable, it would be a lot better to simply hash pid and gettimeofday bytes to produce this single 32-bit value and use it for h1, h2, h3 and h4 in both 32-bit and 128-bit versions. > Also, I'm not sure how portable is the always_inline attribute, is it > supported on all compilers and all platforms? ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
On Mon, Jan 2, 2012 at 7:18 PM, Christian Heimes wrote: > Am 02.01.2012 06:55, schrieb Paul McMillan: >> I think Ruby uses FNV-1 with a salt, making it less vulnerable to >> this. FNV is otherwise similar to our existing hash function. >> >> For the record, cryptographically strong hash functions are in the >> neighborhood of 400% slower than our existing hash function. > > I've pushed a new patch > http://hg.python.org/features/randomhash/rev/0a65d2462e0c It seems for 32-bit version you are using pid for the two constants. Also, it's unclear why you even need to use a random constant for the final pass, you already use random constant as an initial h1, and it should be enough, no need to use for k1. Same for 128-bit: k1, k2, k3, k4 should be initialized to zero, these are key data, they don't need to be mixed with anything. Also, I'm not sure how portable is the always_inline attribute, is it supported on all compilers and all platforms? ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
On Mon, Jan 2, 2012 at 7:47 AM, Christian Heimes wrote: > Am 01.01.2012 19:45, schrieb Terry Reedy: > > On 1/1/2012 10:13 AM, Guido van Rossum wrote: > >> PS. Is the collision-generator used in the attack code open source? > > > > As I posted before, Alexander Klink and Julian Wälde gave their project > > email as hash...@alech.de. Since they indicated disappointment in not > > hearing from Python, I presume they would welcome engagement. > > Somebody should contact Alexander and Julian to let them know, that we > are working on the matter. It should be somebody "official" for the > initial contact, too. I've included Guido (BDFL), Barry (their initial > security contact) and MvL (most prominent German core dev) in CC, as > they are the logical choice for me. > > I'm willing to have a phone call with them once the contact has been > established. IMHO it's slightly easier to talk in native tongue -- > Alexander and Julian are German, too. > I'm not sure I see the point -- just give them a link to the python-dev archives. -- --Guido van Rossum (python.org/~guido) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
On 01/02/2012 04:47 PM, Christian Heimes wrote: > Am 01.01.2012 19:45, schrieb Terry Reedy: >> On 1/1/2012 10:13 AM, Guido van Rossum wrote: >>> PS. Is the collision-generator used in the attack code open source? >> >> As I posted before, Alexander Klink and Julian Wälde gave their project >> email as hash...@alech.de. Since they indicated disappointment in not >> hearing from Python, I presume they would welcome engagement. > > Somebody should contact Alexander and Julian to let them know, that we > are working on the matter. It should be somebody "official" for the > initial contact, too. I've included Guido (BDFL), Barry (their initial > security contact) and MvL (most prominent German core dev) in CC, as > they are the logical choice for me. > > I'm willing to have a phone call with them once the contact has been > established. IMHO it's slightly easier to talk in native tongue -- > Alexander and Julian are German, too. I wouldn't expect too much -- they seem rather keen on cheap laughs: http://twitter.com/#!/bk3n/status/152068096448921600/photo/1/large Georg ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
Am 01.01.2012 19:45, schrieb Terry Reedy: > On 1/1/2012 10:13 AM, Guido van Rossum wrote: >> PS. Is the collision-generator used in the attack code open source? > > As I posted before, Alexander Klink and Julian Wälde gave their project > email as hash...@alech.de. Since they indicated disappointment in not > hearing from Python, I presume they would welcome engagement. Somebody should contact Alexander and Julian to let them know, that we are working on the matter. It should be somebody "official" for the initial contact, too. I've included Guido (BDFL), Barry (their initial security contact) and MvL (most prominent German core dev) in CC, as they are the logical choice for me. I'm willing to have a phone call with them once the contact has been established. IMHO it's slightly easier to talk in native tongue -- Alexander and Julian are German, too. Christian ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
> On 1/1/2012 12:28 PM, Christian Heimes wrote: > I understood Alexander Klink and Julian Wälde, hash...@alech.de, as > saying that they consider that using a random non-zero start value is > sufficient to make the hash non-vulnerable. Sufficient against their current attack. But will it last? For a long-running server, there must be plenty of ways information can leak that will help guessing that start value. The alternative, to provide a dict-like datastructure for use with untrusted input, deserves consideration. Perhaps something simpler than a balanced tree would do? How about a dict-like class that is built on a lazily sorted list? Insertions basically just do list.append and set a dirty-flag, and lookups use bisect - sorting first if the dirty-flag is set. It wouldn't be complete dict replacement by any means, mixing insertions and lookups would have terrible performance, but for something like POST parameters it should be good enough. I half expected to find something like that on activestate recipes already, but couldn't find any. regards, Anders ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
Am 02.01.2012 06:55, schrieb Paul McMillan: > I think Ruby uses FNV-1 with a salt, making it less vulnerable to > this. FNV is otherwise similar to our existing hash function. > > For the record, cryptographically strong hash functions are in the > neighborhood of 400% slower than our existing hash function. I've pushed a new patch http://hg.python.org/features/randomhash/rev/0a65d2462e0c The changeset adds the murmur3 hash algorithm with some minor changes, for example more random seeds. At first I was worried that murmur might be slower than our old hash algorithm. But in fact it seems to be faster! Pybench 10 rounds on my Core2 Duo 2.60: py3k: 3.230 sec randomahash: 3.182 sec Christian ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
Christian Heimes writes: > Am 31.12.2011 13:03, schrieb Stephen J. Turnbull: > > I don't know the implementation issues well enough to claim it is a > > solution, but this hasn't been mentioned before AFAICS: > > > > While the dictionary probe has to start with a hash for backward > > compatibility reasons, is there a reason the overflow strategy for > > insertion has to be buckets containing lists? How about > > double-hashing, etc? > > Python's dict implementation doesn't use bucket but open addressing (aka > closed hashed table). The algorithm for conflict resolution doesn't use > double hashing. Instead it takes the original and (in most cases) cached > hash and perturbs the hash with a series of add, multiply and bit shift ops. In an attack, this is still O(collisions) per probe (as any scheme where the address of the nth collision is a function of only the hash), where double hashing should be "roughly" O(1) (with double the coefficient). But that evidently imposes too large a performance burden on non-evil users, so it's not worth thinking about whether "roughly O(1)" is close enough to O(1) to deter or exhaust attackers. I withdraw the suggestion. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
On Sun, 1 Jan 2012 21:55:52 -0800 Paul McMillan wrote: > > This is similar to the change proposed by Christian Heimes. > > Most importantly, I moved the xor with r[x % len_r] down a line. > Before, it wasn't being applied to the last character. Shouldn't it be r[i % len(r)] instead? (refer to yesterday's #python-dev discussion) > I think Ruby uses FNV-1 with a salt, making it less vulnerable to > this. FNV is otherwise similar to our existing hash function. Again, we could re-use FNV-1's primes, since they claim they have better dispersion properties than the average prime. Regards Antoine. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
On 1/2/2012 12:55 AM, Paul McMillan wrote: Terry Reedy said: I understood Alexander Klink and Julian Wälde, hash...@alech.de, as saying that they consider that using a random non-zero start value is sufficient to make the hash non-vulnerable. I've been talking to them. They're happy to look at our proposed changes. They indicate that a non-zero start value is sufficient to prevent the attack, but D. J. Bernstein disagrees with them. He also has indicated a willingness to look at our solution. Great. My main concern currently is that there should be no noticeable slowdown for 64 bit builds which are apparently not vulnerable and which therefore would get no benefit. Terry Jan Reedy ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
I fixed a couple things in my proposed algorithm: https://gist.github.com/0a91e52efa74f61858b5 I had a typo, and used 21 instead of 12 for the size multiplier. We definitely don't need 2MB random data. The initialization of r was broken. Now it is an array of ints, so there's no conversion when it's used. I've adjusted it so there's 8k of random data, broken into 2048 ints. I added a length-based seed to the initial value of x. This prevents single-characters from being used to enumerate raw values from r. This is similar to the change proposed by Christian Heimes. Most importantly, I moved the xor with r[x % len_r] down a line. Before, it wasn't being applied to the last character. > Christian Heimes said: > We also need to special case short strings. The above routine hands over > the seed to attackers, if he is able to retrieve lots of single > character hashes. The updated code always includes at least 2 lookups from r, which I believe solves the single-character enumeration problem. If we special-case part of our hash function for short strings, we may get suboptimal collisions between the two types of hashes. I think Ruby uses FNV-1 with a salt, making it less vulnerable to this. FNV is otherwise similar to our existing hash function. For the record, cryptographically strong hash functions are in the neighborhood of 400% slower than our existing hash function. > Terry Reedy said: > I understood Alexander Klink and Julian Wälde, hash...@alech.de, as saying > that they consider that using a random non-zero start value is sufficient to > make the hash non-vulnerable. I've been talking to them. They're happy to look at our proposed changes. They indicate that a non-zero start value is sufficient to prevent the attack, but D. J. Bernstein disagrees with them. He also has indicated a willingness to look at our solution. -Paul ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
> Different concern. What if someone were to have code implementing an > external, persistent hash table, using Python's hash() function? They might > have a way to rehash everything when a new version of Python comes along, > but they would not be happy if hash() is different in each process. I > somehow vaguely remember possibly having seen such code, or something else > where a bit of random data was needed and hash() was used since it's so > easily available. I agree that there are use cases for allowing users to choose the random seed, in much the same way it's helpful to be able to set it for the random number generator. This should probably be something that can be passed in at runtime. This feature would also be useful for users who want to synchronize the hashes of multiple independent processes, for whatever reason. For the general case though, randomization should be on by default. -Paul ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
> But how about precomputing the intermediate value (x)? The hash is (mostly) > doing x = f(x, c) for each c in the input. That's a fair point. If we go down that avenue, I think simply choosing a random fixed starting value for x is the correct choice, rather than computing an intermediate value. > I sort of see your point, but I still think that if we could add as little > per-character overhead as possible it would be best -- sometimes people *do* > hash very long strings. Yep, agreed. My original proposal did not adequately address this. >> Another option to consider would be to apply this change to some but >> not all of the rounds. If we added the seed lookup xor operation for >> only the first and last 5 values of x, we would still retain much of >> the benefit without adding much computational overhead for very long >> strings. > > I like that. I believe this is a reasonable solution. An attacker could still manipulate the internal state of long strings, but the additional information at both ends should make that difficult to exploit. I'll talk it over with the reviewers. >> We could also consider a less computationally expensive operation than >> the modulo for calculating the lookup index, like simply truncating to >> the correct number of bits. > > Sure. Thanks for thinking about all the details here!! Again, I'll talk to the reviewers (and run the randomness test battery) to be double-check that this doesn't affect the distribution in some unexpected way, but I think it should be fine. > PS. Is the collision-generator used in the attack code open source? Not in working form, and they've turned down requests for it from other projects that want to check their work. If it's imperative that we have one, I can write one, but I'd rather not spend the effort if we don't need it. -Paul ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
On 1/1/2012 12:28 PM, Christian Heimes wrote: Am 01.01.2012 17:54, schrieb Antoine Pitrou: I don't understand. FNV-1 multiplies the current running result with a prime and then xors it with the following byte. This is also what we do. (I'm assuming 103 is prime) There must be a major difference somewhere inside the algorithm. The talk at the CCC conference in Berlin mentions that Ruby 1.9 is not vulnerable to meet-in-the-middle attacks and Ruby 1.9 uses FNV. The C code of FNV is more complex than our code, too. I understood Alexander Klink and Julian Wälde, hash...@alech.de, as saying that they consider that using a random non-zero start value is sufficient to make the hash non-vulnerable. -- Terry Jan Reedy ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
On 1/1/2012 10:13 AM, Guido van Rossum wrote: PS. Is the collision-generator used in the attack code open source? As I posted before, Alexander Klink and Julian Wälde gave their project email as hash...@alech.de. Since they indicated disappointment in not hearing from Python, I presume they would welcome engagement. -- Terry Jan Reedy ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
Le 01/01/2012 04:29, Paul McMillan a écrit : This is incorrect. Once an attacker has guessed the random seed, any operation which reveals the ordering of hashed objects can be used to verify the answer. JSON responses would be ideal. In fact, an attacker can do a brute-force attack of the random seed offline. Once they have the seed, generating collisions is a fast process. If we want to protect a website against this attack for example, we must suppose that the attacker can inject arbitrary data and can get (indirectly) the result of hash(str) (e.g. with the representation of a dict in a traceback, with a JSON output, etc.). The goal isn't perfection, but we need to do better than a simple salt. I disagree. I don't want to break backward compatibility and have a hash() function different for each process, if the change is not an effective protection against the "hash vulnerability". It's really hard to write a good (secure) hash function: see for example the recent NIST competition (started in 2008, will end this year). Even good security researcher are unable to write a strong and fast hash function. It's easy to add a weakness in the function if you don't have a good background in cryptography. The NIST competition gives 4 years to analyze new hash functions. We should not rush to add a quick "hack" if it doesn't solve correctly the problem (protect against a collision attack and preimage attack). http://en.wikipedia.org/wiki/NIST_hash_function_competition http://en.wikipedia.org/wiki/Collision_attack Runtime performance does matter, I'm not completly sure that changing Python is the best place to add a countermeasure against a vulnerability. I don't want to slow down numpy for a web vulnerability. Because there are different use cases, a better compromise is maybe to add a runtime option to use a secure hash function, and keep the unsafe but fast hash function by default. I propose we modify the string hash function like this: https://gist.github.com/0a91e52efa74f61858b5 Always allocate 2**21 bytes just to workaround one specific kind of attack is not acceptable. I suppose that the maximum acceptable is 4096 bytes (or better 256 bytes). Crytographic hash functions don't need random data, why would Python need 2 MB (!) for its hash function? Victor ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
Am 01.01.2012 01:11, schrieb Guido van Rossum: > FWIW I managed to build Python 2.6, and a trivial mutation of the > string/unicode hash function (add 1 initially) made only three tests > fail; test_symtable and test_json both have a dependency on dictionary > order, test_ctypes I can't quite figure out what's going on. In my fork, these tests are failing: test_dbm test_dis test_gdb test_inspect test_packaging test_set test_symtable test_urllib test_userdict test_collections test_json ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
Am 01.01.2012 17:54, schrieb Antoine Pitrou: > I don't understand. FNV-1 multiplies the current running result with a > prime and then xors it with the following byte. This is also what we do. > (I'm assuming 103 is prime) There must be a major difference somewhere inside the algorithm. The talk at the CCC conference in Berlin mentions that Ruby 1.9 is not vulnerable to meet-in-the-middle attacks and Ruby 1.9 uses FNV. The C code of FNV is more complex than our code, too. Christian ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
Le dimanche 01 janvier 2012 à 17:34 +0100, Christian Heimes a écrit : > Am 01.01.2012 17:09, schrieb Antoine Pitrou: > > On Sun, 01 Jan 2012 16:48:32 +0100 > > Christian Heimes wrote: > >> The talkers claim and have shown that it's too easy to pre-calculate > >> collisions with hashing algorithms similar to DJBX33X / DJBX33A. It > >> might be a good idea to change the hashing algorithm, too. Paul as > >> listed some new algorithms. Ruby 1.9 is using FNV > >> http://isthe.com/chongo/tech/comp/fnv/ which promises to be fast with a > >> good dispersion pattern. > > > > We already seem to be using a FNV-alike, is it just a matter of > > changing the parameters? > > No, we are using something similar to DJBX33X. FNV is a completely > different type of hash algorithm. I don't understand. FNV-1 multiplies the current running result with a prime and then xors it with the following byte. This is also what we do. (I'm assuming 103 is prime) I see two differences: - FNV starts with a non-zero constant offset basis - FNV uses a different prime than ours (as a side note, FNV operates on bytes, but for unicode we must operate on code points in [0, 1114111]: although arguably the common case is hashing ASCII substrings (protocol tokens etc.)) Regards Antoine. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
Am 01.01.2012 17:09, schrieb Antoine Pitrou: > On Sun, 01 Jan 2012 16:48:32 +0100 > Christian Heimes wrote: >> The talkers claim and have shown that it's too easy to pre-calculate >> collisions with hashing algorithms similar to DJBX33X / DJBX33A. It >> might be a good idea to change the hashing algorithm, too. Paul as >> listed some new algorithms. Ruby 1.9 is using FNV >> http://isthe.com/chongo/tech/comp/fnv/ which promises to be fast with a >> good dispersion pattern. > > We already seem to be using a FNV-alike, is it just a matter of > changing the parameters? No, we are using something similar to DJBX33X. FNV is a completely different type of hash algorithm. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
Am 01.01.2012 06:57, schrieb Paul McMillan: > I agree that doing anything is better than doing nothing. If we use > the earlier suggestion and prepend everything with a fixed-length > seed, we need quite a bit of entropy (and so a fairly long string) to > make a lasting difference. Your code at https://gist.github.com/0a91e52efa74f61858b5 reads about 2 MB (2**21 - 1) data from urandom. I'm worried that this is going to exhaust the OS's random pool and suck it dry. We shouldn't forget that Python is used for long running processes as well as short scripts. Your suggestion also increases the process size by 2 MB which is going to be an issue for mobile and embedded platforms. How about this: r = [ord(i) for i in os.urandom(256)] rs = os.urandom(4) # or 8 ? seed = rs[-1] + (rs[-2] << 8) + (rs[-3] << 16) + (rs[-4] << 24) def _hash_string(s): """The algorithm behind compute_hash() for a string or a unicode.""" from pypy.rlib.rarithmetic import intmask length = len(s) if length == 0: return -1 x = intmask(seed + (ord(s[0]) << 7)) i = 0 while i < length: o = ord(s[i]) x = intmask((103*x) ^ o ^ r[o % 0xff] i += 1 x ^= length return intmask(x) This combines a random seed for the hash with your suggestion. We also need to special case short strings. The above routine hands over the seed to attackers, if he is able to retrieve lots of single character hashes. The randomization shouldn't be used if we can prove that it's not possible to create hash collisions for strings shorter than X. For example 64bit FNV-1 has no collisions for 8 chars or less, 32bit FNV has no collisions for 4 or less cars. Christian Christian ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
On Sun, 01 Jan 2012 16:56:19 +0100 Christian Heimes wrote: > Am 01.01.2012 05:11, schrieb Antoine Pitrou: > > On Sat, 31 Dec 2011 16:56:00 -0700 > > Guido van Rossum wrote: > >> ISTM the only reasonable thing is to have a random seed picked very early > >> in the process, to be used to change the hash() function of > >> str/bytes/unicode (in a way that they are still compatible with each > >> other). > > > > Do str and bytes still have to be compatible with each other in 3.x? > > py3k has tests for hash("ascii") == hash(b"ascii"). Are you talking > about this invariant? Yes. It doesn't seem to have any point anymore. Regards Antoine. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
On Sun, 01 Jan 2012 16:48:32 +0100 Christian Heimes wrote: > The talkers claim and have shown that it's too easy to pre-calculate > collisions with hashing algorithms similar to DJBX33X / DJBX33A. It > might be a good idea to change the hashing algorithm, too. Paul as > listed some new algorithms. Ruby 1.9 is using FNV > http://isthe.com/chongo/tech/comp/fnv/ which promises to be fast with a > good dispersion pattern. We already seem to be using a FNV-alike, is it just a matter of changing the parameters? > A hashing algorithm without a > meet-in-the-middle vulnerability would reduce the pressure on a good and > secure seed, too. > > > We need to fix this as far back as Python 2.6, and it would be nice if a > > source patch was available that works on Python 2.5 -- personally I do > > have a need for a 2.5 fix and if nobody creates one I will probably end > > up backporting the fix from 2.6 to 2.5. > > +1 > > Should the randomization be disabled on 2.5 to 3.2 by default to reduce > backward compatibility issues? Isn't 2.5 already EOL'ed? As for 3.2, I'd say no. I don't know about 2.6 and 2.7. Regards Antoine. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
Am 01.01.2012 05:11, schrieb Antoine Pitrou: > On Sat, 31 Dec 2011 16:56:00 -0700 > Guido van Rossum wrote: >> ISTM the only reasonable thing is to have a random seed picked very early >> in the process, to be used to change the hash() function of >> str/bytes/unicode (in a way that they are still compatible with each other). > > Do str and bytes still have to be compatible with each other in 3.x? py3k has tests for hash("ascii") == hash(b"ascii"). Are you talking about this invariant? Christian ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
Am 01.01.2012 00:56, schrieb Guido van Rossum: > ISTM the only reasonable thing is to have a random seed picked very > early in the process, to be used to change the hash() function of > str/bytes/unicode (in a way that they are still compatible with each other). > > The seed should be unique per process except it should survive fork() > (but not exec()). I'm not worried about unrelated processes needing to > have the same hash(), but I'm not against offering an env variable or > command line flag to force the seed. I've created a clone at http://hg.python.org/features/randomhash/ as a testbed. The code creates the seed very early in PyInitializeEx(). The method isn't called on fork() but on exec(). > I'm not too concerned about a 3rd party being able to guess the random > seed -- this would require much more effort on their part, since they > would have to generate a new set of colliding keys each time they think > they have guessed the hash (as long as they can't force the seed -- this > actually argues slightly *against* offering a way to force the seed, > except that we have strong backwards compatibility requirements). The talkers claim and have shown that it's too easy to pre-calculate collisions with hashing algorithms similar to DJBX33X / DJBX33A. It might be a good idea to change the hashing algorithm, too. Paul as listed some new algorithms. Ruby 1.9 is using FNV http://isthe.com/chongo/tech/comp/fnv/ which promises to be fast with a good dispersion pattern. A hashing algorithm without a meet-in-the-middle vulnerability would reduce the pressure on a good and secure seed, too. > We need to fix this as far back as Python 2.6, and it would be nice if a > source patch was available that works on Python 2.5 -- personally I do > have a need for a 2.5 fix and if nobody creates one I will probably end > up backporting the fix from 2.6 to 2.5. +1 Should the randomization be disabled on 2.5 to 3.2 by default to reduce backward compatibility issues? Christian ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
Am 31.12.2011 23:38, schrieb Terry Reedy: > On 12/31/2011 4:43 PM, PJ Eby wrote: > >> Here's an idea. Suppose we add a sys.hash_seed or some such, that's >> settable to an int, and defaults to whatever we're using now. Then >> programs that want a fix can just set it to a random number, > > I do not think we can allow that to change once there are hashed > dictionaries existing. Me, too. Armin suggested to use an env var as random. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
Am 01.01.2012 16:13, schrieb Guido van Rossum: > Different concern. What if someone were to have code implementing an > external, persistent hash table, using Python's hash() function? They > might have a way to rehash everything when a new version of Python comes > along, but they would not be happy if hash() is different in each > process. I somehow vaguely remember possibly having seen such code, or > something else where a bit of random data was needed and hash() was used > since it's so easily available. I had the same concern as you and was worried that projects like ZODB might require a stable hash function. Fred already stated that ZODB doesn't use the hash in its btree structures. Possible solutions: * make it possible to provide the seed as an env var * disable randomizing as default setting or at least add an option to disable randomization IMHO the issue needs a PEP that explains the issue, shows all possible solutions and describes how we have solved the issue. I'm willing to start a PEP. Who likes to be the co-author? Christian ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
PS. Is the collision-generator used in the attack code open source? -- --Guido van Rossum (python.org/~guido) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
Different concern. What if someone were to have code implementing an external, persistent hash table, using Python's hash() function? They might have a way to rehash everything when a new version of Python comes along, but they would not be happy if hash() is different in each process. I somehow vaguely remember possibly having seen such code, or something else where a bit of random data was needed and hash() was used since it's so easily available. -- --Guido van Rossum (python.org/~guido) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
On Sat, Dec 31, 2011 at 10:57 PM, Paul McMillan wrote: > > Still, it would represent an effort for the attacker of a much greater > > magnitude than the current attack. It's all a trade-off -- at some point > > it'll just be easier for the attacker to use some other vulnerability. > Also > > the class of vulnerable servers would be greatly reduced. > > I agree that doing anything is better than doing nothing. If we use > the earlier suggestion and prepend everything with a fixed-length > seed, we need quite a bit of entropy (and so a fairly long string) to > make a lasting difference. > Ah, but the effect of that long string is summarized in a single (32- or 64-bit) integer. > > I'm not sure I understand this. What's the worry about "a different > class of > > hash functions"? (It may be clear that I do not have a deep mathematical > > understanding of hash functions.) > > This was mostly in reference to earlier suggestions of switching to > cityhash, or using btrees, or other more invasive changes. Python 2.X > is pretty stable and making large changes like that to the codebase > can have unpredictable effects. Agreed. > We know that the current hash function > works well (except for this specific problem), so it seems like the > best fix will be as minimal a modification as possible, to avoid > introducing bugs. > Yup. > > I forget -- what do we do on systems without urandom()? (E.g. Windows?) > Windows has CryptGenRandom which is approximately equivalent. > > > Let's worry about timing attacks another time okay? > Agreed. As long as there isn't a gaping hole, I'm fine with that. > > > Hm. I'm not sure I like the idea of extra arithmetic for every character > > being hashed. > > From a performance standpoint, this may still be better than adding 8 > or 10 characters to every single hash operation, since most hashes are > over short strings. But how about precomputing the intermediate value (x)? The hash is (mostly) doing x = f(x, c) for each c in the input. It is important that this function touches every > character - if it only interacts with a subset of them, an attacker > can fix that subset and vary the rest. > I sort of see your point, but I still think that if we could add as little per-character overhead as possible it would be best -- sometimes people *do* hash very long strings. > > But I like the idea of a bigger random seed from which we > > deterministically pick some part. > > Yeah. This makes it much harder to attack, since it very solidly > places the attacker outside the realm of "just brute force the key". > > > How about just initializing x to some > > subsequence of the seed determined by e.g. the length of the hashed > string > > plus a few characters from it? > > We did consider this, and if performance is absolutely the prime > directive, this (or a variant) may be the best option. Unfortunately, > the collision generator doesn't necessarily vary the length of the > string. Additionally, if we don't vary based on all the letters in the > string, an attacker can fix the characters that we do use and generate > colliding strings around them. > Still, much more work for the attacker. > Another option to consider would be to apply this change to some but > not all of the rounds. If we added the seed lookup xor operation for > only the first and last 5 values of x, we would still retain much of > the benefit without adding much computational overhead for very long > strings. > I like that. > We could also consider a less computationally expensive operation than > the modulo for calculating the lookup index, like simply truncating to > the correct number of bits. > Sure. Thanks for thinking about all the details here!! -- --Guido van Rossum (python.org/~guido) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
> Still, it would represent an effort for the attacker of a much greater > magnitude than the current attack. It's all a trade-off -- at some point > it'll just be easier for the attacker to use some other vulnerability. Also > the class of vulnerable servers would be greatly reduced. I agree that doing anything is better than doing nothing. If we use the earlier suggestion and prepend everything with a fixed-length seed, we need quite a bit of entropy (and so a fairly long string) to make a lasting difference. > I'm not sure I understand this. What's the worry about "a different class of > hash functions"? (It may be clear that I do not have a deep mathematical > understanding of hash functions.) This was mostly in reference to earlier suggestions of switching to cityhash, or using btrees, or other more invasive changes. Python 2.X is pretty stable and making large changes like that to the codebase can have unpredictable effects. We know that the current hash function works well (except for this specific problem), so it seems like the best fix will be as minimal a modification as possible, to avoid introducing bugs. > I forget -- what do we do on systems without urandom()? (E.g. Windows?) Windows has CryptGenRandom which is approximately equivalent. > Let's worry about timing attacks another time okay? Agreed. As long as there isn't a gaping hole, I'm fine with that. > Hm. I'm not sure I like the idea of extra arithmetic for every character > being hashed. >From a performance standpoint, this may still be better than adding 8 or 10 characters to every single hash operation, since most hashes are over short strings. It is important that this function touches every character - if it only interacts with a subset of them, an attacker can fix that subset and vary the rest. > But I like the idea of a bigger random seed from which we > deterministically pick some part. Yeah. This makes it much harder to attack, since it very solidly places the attacker outside the realm of "just brute force the key". > How about just initializing x to some > subsequence of the seed determined by e.g. the length of the hashed string > plus a few characters from it? We did consider this, and if performance is absolutely the prime directive, this (or a variant) may be the best option. Unfortunately, the collision generator doesn't necessarily vary the length of the string. Additionally, if we don't vary based on all the letters in the string, an attacker can fix the characters that we do use and generate colliding strings around them. Another option to consider would be to apply this change to some but not all of the rounds. If we added the seed lookup xor operation for only the first and last 5 values of x, we would still retain much of the benefit without adding much computational overhead for very long strings. We could also consider a less computationally expensive operation than the modulo for calculating the lookup index, like simply truncating to the correct number of bits. -Paul ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
On Sat, Dec 31, 2011 at 8:29 PM, Paul McMillan wrote: > > I'm not too concerned about a 3rd party being able to guess the random > seed > > -- this would require much more effort on their part, since they would > have > > to generate a new set of colliding keys each time they think they have > > guessed the hash > > This is incorrect. Once an attacker has guessed the random seed, any > operation which reveals the ordering of hashed objects can be used to > verify the answer. JSON responses would be ideal. In fact, an attacker > can do a brute-force attack of the random seed offline. Once they have > the seed, generating collisions is a fast process. > Still, it would represent an effort for the attacker of a much greater magnitude than the current attack. It's all a trade-off -- at some point it'll just be easier for the attacker to use some other vulnerability. Also the class of vulnerable servers would be greatly reduced. > The goal isn't perfection, but we need to do better than a simple > salt. Perhaps. > I propose we modify the string hash function like this: > > https://gist.github.com/0a91e52efa74f61858b5 > > This code is based on PyPy's implementation, but the concept is > universal. Rather than choosing a single short random seed per > process, we generate a much larger random seed (r). As we hash, we > deterministically choose a portion of that seed and incorporate it > into the hash process. This modification is a minimally intrusive > change to the existing hash function, and so should not introduce > unexpected side effects which might come from switching to a different > class of hash functions. > I'm not sure I understand this. What's the worry about "a different class of hash functions"? (It may be clear that I do not have a deep mathematical understanding of hash functions.) > I've worked through this code with Alex Gaynor, Antoine Pitrou, and > Victor Stinner, and have asked several mathematicians and security > experts to review the concept. The reviewers who have gotten back to > me thus far have agreed that if the initial random seed is not flawed, I forget -- what do we do on systems without urandom()? (E.g. Windows?) > this should not overly change the properties of the hash function, but > should make it quite difficult for an attacker to deduce the necessary > information to predictably cause hash collisions. This function is not > designed to protect against timing attacks, but should be nontrivial > to reverse even with access to timing data. > Let's worry about timing attacks another time okay? > Empirical testing shows that this unoptimized python implementation > produces ~10% slowdown in the hashing of ~20 character strings. This > is probably an acceptable trade off, and actually provides better > performance in the case of short strings than a high-entropy > fixed-length seed prefix. > Hm. I'm not sure I like the idea of extra arithmetic for every character being hashed. But I like the idea of a bigger random seed from which we deterministically pick some part. How about just initializing x to some subsequence of the seed determined by e.g. the length of the hashed string plus a few characters from it? -- --Guido van Rossum (python.org/~guido) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
On Sat, Dec 31, 2011 at 9:11 PM, Antoine Pitrou wrote: > On Sat, 31 Dec 2011 16:56:00 -0700 > Guido van Rossum wrote: > > ISTM the only reasonable thing is to have a random seed picked very early > > in the process, to be used to change the hash() function of > > str/bytes/unicode (in a way that they are still compatible with each > other). > > Do str and bytes still have to be compatible with each other in 3.x? > Hm, you're right, that's no longer a concern. (Though ATM the hashes still *are* compatible.) > Merry hashes, weakrefs and thread-local memoryviews to everyone! > :-) -- --Guido van Rossum (python.org/~guido) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
On Sat, 31 Dec 2011 16:56:00 -0700 Guido van Rossum wrote: > ISTM the only reasonable thing is to have a random seed picked very early > in the process, to be used to change the hash() function of > str/bytes/unicode (in a way that they are still compatible with each other). Do str and bytes still have to be compatible with each other in 3.x? Merry hashes, weakrefs and thread-local memoryviews to everyone! cheers Antoine. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
(Well, technically, you could use trees or some other O log n data structure as a fallback once you have too many collisions, for some value of "too many". Seems a bit wasteful for the purpose, though.) I don't think that would be wasteful. You wouldn't just use the tree for the case of too many collisions, but for any collision. You might special-case the case of a single key, i.e. start using the tree only if there is a collision. The issue is not the effort, but the need to support ordering if you want to use trees. So you might restrict this to dicts that have only str keys (which in practice should never have any collision, unless it's a deliberate setup). I'd use the tagged-pointer trick to determine whether a key is an object pointer (tag 0) or an AVL tree (tag 1). So in the common case of interned strings, the comparison for pointer equality (which is the normal case if the keys are interned) will succeed quickly; if pointer comparison fails, check the tag bit. Regards, Martin ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
> I'm not too concerned about a 3rd party being able to guess the random seed > -- this would require much more effort on their part, since they would have > to generate a new set of colliding keys each time they think they have > guessed the hash This is incorrect. Once an attacker has guessed the random seed, any operation which reveals the ordering of hashed objects can be used to verify the answer. JSON responses would be ideal. In fact, an attacker can do a brute-force attack of the random seed offline. Once they have the seed, generating collisions is a fast process. The goal isn't perfection, but we need to do better than a simple salt. I propose we modify the string hash function like this: https://gist.github.com/0a91e52efa74f61858b5 This code is based on PyPy's implementation, but the concept is universal. Rather than choosing a single short random seed per process, we generate a much larger random seed (r). As we hash, we deterministically choose a portion of that seed and incorporate it into the hash process. This modification is a minimally intrusive change to the existing hash function, and so should not introduce unexpected side effects which might come from switching to a different class of hash functions. I've worked through this code with Alex Gaynor, Antoine Pitrou, and Victor Stinner, and have asked several mathematicians and security experts to review the concept. The reviewers who have gotten back to me thus far have agreed that if the initial random seed is not flawed, this should not overly change the properties of the hash function, but should make it quite difficult for an attacker to deduce the necessary information to predictably cause hash collisions. This function is not designed to protect against timing attacks, but should be nontrivial to reverse even with access to timing data. Empirical testing shows that this unoptimized python implementation produces ~10% slowdown in the hashing of ~20 character strings. This is probably an acceptable trade off, and actually provides better performance in the case of short strings than a high-entropy fixed-length seed prefix. -Paul ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
On Sat, Dec 31, 2011 at 4:56 PM, Guido van Rossum wrote: > PS. I would propose a specific fix but I can't seem to build a working > CPython from the trunk on my laptop (OS X 10.6, Xcode 4.1). I get this > error late in the build: > > ./python.exe -SE -m sysconfig --generate-posix-vars > Fatal Python error: Py_Initialize: can't initialize sys standard streams > Traceback (most recent call last): > File "/Users/guido/cpython/Lib/io.py", line 60, in > make: *** [Lib/_sysconfigdata.py] Abort trap > FWIW I managed to build Python 2.6, and a trivial mutation of the string/unicode hash function (add 1 initially) made only three tests fail; test_symtable and test_json both have a dependency on dictionary order, test_ctypes I can't quite figure out what's going on. Oh, and an unrelated failure in test_sqlite: File "/Users/guido/pythons/p26/Lib/sqlite3/test/types.py", line 355, in CheckSqlTimestamp self.failUnlessEqual(ts.year, now.year) AssertionError: 2012 != 2011 I betcha that's because it's still 2011 here in Texas but already 2012 in UTC-land. Happy New Year everyone! :-) -- --Guido van Rossum (python.org/~guido) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
ISTM the only reasonable thing is to have a random seed picked very early in the process, to be used to change the hash() function of str/bytes/unicode (in a way that they are still compatible with each other). The seed should be unique per process except it should survive fork() (but not exec()). I'm not worried about unrelated processes needing to have the same hash(), but I'm not against offering an env variable or command line flag to force the seed. I'm not too concerned about a 3rd party being able to guess the random seed -- this would require much more effort on their part, since they would have to generate a new set of colliding keys each time they think they have guessed the hash (as long as they can't force the seed -- this actually argues slightly *against* offering a way to force the seed, except that we have strong backwards compatibility requirements). We need to fix this as far back as Python 2.6, and it would be nice if a source patch was available that works on Python 2.5 -- personally I do have a need for a 2.5 fix and if nobody creates one I will probably end up backporting the fix from 2.6 to 2.5. Is there a tracker issue yet? The discussion should probably move there. PS. I would propose a specific fix but I can't seem to build a working CPython from the trunk on my laptop (OS X 10.6, Xcode 4.1). I get this error late in the build: ./python.exe -SE -m sysconfig --generate-posix-vars Fatal Python error: Py_Initialize: can't initialize sys standard streams Traceback (most recent call last): File "/Users/guido/cpython/Lib/io.py", line 60, in make: *** [Lib/_sysconfigdata.py] Abort trap -- --Guido van Rossum (python.org/~guido) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
On 12/31/2011 4:43 PM, PJ Eby wrote: Here's an idea. Suppose we add a sys.hash_seed or some such, that's settable to an int, and defaults to whatever we're using now. Then programs that want a fix can just set it to a random number, I do not think we can allow that to change once there are hashed dictionaries existing. -- Terry Jan Reedy ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
On Sat, Dec 31, 2011 at 4:04 PM, Jeffrey Yasskin wrote: > Hash functions are already unstable across Python versions. Making > them unstable across interpreter processes (multiprocessing doesn't > share dicts, right?) doesn't sound like a big additional problem. > Users who want a distributed hash table will need to pull their own > hash function out of hashlib or re-implement a non-cryptographic hash > instead of using the built-in one, but they probably need to do that > already to allow themselves to upgrade Python. > Here's an idea. Suppose we add a sys.hash_seed or some such, that's settable to an int, and defaults to whatever we're using now. Then programs that want a fix can just set it to a random number, and on Python versions that support it, it takes effect. Everywhere else it's a silent no-op. Downside: sys has to have slots for this to work; does sys actually have slots? My memory's hazy on that. I guess actually it'd have to be sys.set_hash_seed(). But same basic idea. Anyway, this would make fixing the problem *possible*, while still pushing off the hard decisions to the app/framework developers. ;-) Downside: every hash operation includes one extra memory access, but strings only compute their hash once anyway.) Given that changing dict won't help, and changing the default hash is a non-starter, an option to set the seed is probably the way to go. (Maybe with an environment variable and/or command line option so users can work around old code.) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
On Wed, Dec 28, 2011 at 5:37 PM, Jesse Noller wrote: > > > On Wednesday, December 28, 2011 at 8:28 PM, Michael Foord wrote: > >> Hello all, >> >> A paper (well, presentation) has been published highlighting security >> problems with the hashing algorithm (exploiting collisions) in many >> programming languages Python included: >> >> http://events.ccc.de/congress/2011/Fahrplan/attachments/2007_28C3_Effective_DoS_on_web_application_platforms.pdf >> >> Although it's a security issue I'm posting it here because it is now public >> and seems important. >> >> The issue they report can cause (for example) handling an http post to >> consume horrible amounts of cpu. For Python the figures they quoted: >> >> reasonable-sized attack strings only for 32 bits Plone has max. POST size of >> 1 MB >> 7 minutes of CPU usage for a 1 MB request >> ~20 kbits/s → keep one Core Duo core busy >> >> This was apparently reported to the security list, but hasn't been responded >> to beyond an acknowledgement on November 24th (the original report didn't >> make it onto the security list because it was held in a moderation queue). >> >> The same vulnerability was reported against various languages and web >> frameworks, and is already fixed in some of them. >> >> Their recommended fix is to randomize the hash function. >> >> All the best, >> >> Michael >> > Back up link for the PDF: > http://dl.dropbox.com/u/1374/2007_28C3_Effective_DoS_on_web_application_platforms.pdf > > Ocert disclosure: > http://www.ocert.org/advisories/ocert-2011-003.html Discussion of hash functions in general: http://burtleburtle.net/bob/hash/doobs.html Two of the best hash functions that currently exist: http://code.google.com/p/cityhash/ and http://code.google.com/p/smhasher/wiki/MurmurHash. I'm not sure exactly what problem the paper is primarily complaining about: 1. Multiply+add and multiply+xor hashes are weak: this would be solved by changing to either of the better-and-faster hashes I linked to above. On the other hand: http://mail.python.org/pipermail/python-3000/2007-September/010327.html 2. It's possible to find collisions in any hash function in a 32-bit space: only solved by picking a varying seed at startup or compile time. If you decide to change to a stronger hash function overall, it might also be useful to change the advice "to somehow mix together (e.g. using exclusive or) the hash values for the components" in http://docs.python.org/py3k/reference/datamodel.html#object.__hash__. hash(tuple(components)) will likely be better if tuple's hash is improved. Hash functions are already unstable across Python versions. Making them unstable across interpreter processes (multiprocessing doesn't share dicts, right?) doesn't sound like a big additional problem. Users who want a distributed hash table will need to pull their own hash function out of hashlib or re-implement a non-cryptographic hash instead of using the built-in one, but they probably need to do that already to allow themselves to upgrade Python. Jeffrey ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
On Sat, Dec 31, 2011 at 7:03 AM, Stephen J. Turnbull wrote: > While the dictionary probe has to start with a hash for backward > compatibility reasons, is there a reason the overflow strategy for > insertion has to be buckets containing lists? How about > double-hashing, etc? > This won't help, because the keys still have the same hash value. ANYTHING you do to them after they're generated will result in them still colliding. The *only* thing that works is to change the hash function in such a way that the strings end up with different hashes in the first place. Otherwise, you'll still end up with (deliberate) collisions. (Well, technically, you could use trees or some other O log n data structure as a fallback once you have too many collisions, for some value of "too many". Seems a bit wasteful for the purpose, though.) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
Zitat von Victor Stinner : The current implementation of dict uses a linked-list. [...] Tell me if I am wrong. At least with this statement, you are wrong: the current implementation does *not* use a linked-list. Instead, it uses open addressing. Regards, Martin ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
Am 31.12.2011 13:03, schrieb Stephen J. Turnbull: > I don't know the implementation issues well enough to claim it is a > solution, but this hasn't been mentioned before AFAICS: > > While the dictionary probe has to start with a hash for backward > compatibility reasons, is there a reason the overflow strategy for > insertion has to be buckets containing lists? How about > double-hashing, etc? Python's dict implementation doesn't use bucket but open addressing (aka closed hashed table). The algorithm for conflict resolution doesn't use double hashing. Instead it takes the original and (in most cases) cached hash and perturbs the hash with a series of add, multiply and bit shift ops. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com