Re: [Python-Dev] Hash collision security issue (now public)

2011-12-29 Thread Fred Drake
On Thu, Dec 29, 2011 at 8:19 AM, Christian Heimes li...@cheimes.de wrote:
 Persistence layers like ZODB and cross interpreter communication
 channels used by multiprocessing may (!) rely on the fact that the hash
 of a string is fixed.

ZODB does not rely on a fixed hash function for strings; for any application
to rely on a stable hash would cause problems when updating Python versions.


  -Fred

-- 
Fred L. Drake, Jr.    fdrake at acm.org
A person who won't read has no advantage over one who can't read.
   --Samuel Langhorne Clemens
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2011-12-29 Thread Armin Ronacher
Hi,

Just some extra thoughts about the whole topic in the light of web
applications (since this was hinted in the talk) running on Python:

   Yes, you can limit the number of maximum allowed parameters for post
   data but really there are so many places where data is parsed into
   hashing containers that it's quite a worthless task.  Here a very
   brief list of things usually parsed into a dict or set and where it
   happens:

   - URL parameters and url encoded form data
 Generally this happens somewhere in a framework but typically also
 in utility libraries that deal with URLs.  For instance the
 stdlib's cgi.parse_qs or urllib.parse.parse_qs on Python 3 do
 just that and that code is used left and right.

 Even if a framework would start limiting it's own URL parsing there
 is still a lot of code that does not do that the stdlib does that
 as well.

 With form data it's worse because you have multipart headers that
 need parsing and that is usually abstracted away so far from the
 user that they do not do that.  Many frameworks just use the cgi
 module's parsing functions which also just directly feed into a
 dictionary.

   - HTTP headers.
 There is zero a WSGI framework can do about that since the headers
 are parsed into a dictionary by the WSGI server.

   - Incoming JSON data.
 Again outside of what the framework can do for the most part.
 simplejson can be modified to stop parsing with the hook stuff
 but nobody does that and since users invoke simplejson's parsing
 routines themselves most webapps would still be vulnerable even
 if all frameworks would fix the problem.

   - Hidden dict parameters.
 Things like the parameter part of content-type or the
 content-disposition headers are generally also just parsed into a
 dictionary.  Likewise many frameworks parse things into set headers
 (for instance incoming etags).  The cookie header is usually parsed
 into a dictionary as well.

The issue is nothing new and at least my current POV on this topic was
that your server should be guarded and shoot handlers of requests going
rogue.  Dictionaries are not the only thing that has a worst case
performance that could be triggered by user input.

That said.  Considering that there are so many different places where
things are probably close to arbitrarily long that is parsed into a
dictionary or other hashing structure it's hard for a web application
developer or framework to protect itself against.

In case the watchdog is not a viable solution as I had assumed it was, I
think it's more reasonable to indeed consider adding a flag to Python
that allows randomization of hashes optionally before startup.

However as it was said earlier, the attack is a lot more complex to
carry out on a 64bit environment that it's probably (as it stands right
now!) safe to ignore.

The main problem there however is not that it's a new attack but that
some dickheads could now make prebaked attacks against websites to
disrupt them that might cause some negative publicity.  In general
though there are so many more ways to DDOS a website than this that I
would rate the whole issue very low.


Regards,
Armin

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2011-12-29 Thread Armin Ronacher
Hi,

Something I should add to this now that I thought about it a bit more:

Assuming this should be fixed on a language level the solution would
probably be to salt hashes.  The most common hash to salt here is the
PyUnicode hash for obvious reasons.

- Option a: Compiled in Salt
  + Easy to implement
  - Breaks unittests most likely (those were broken in the first place
but that's still a very annoying change to make)
  - Might cause problems with interoperability of Pythons compiled with
different hash salts
  - You're not really solving the problem because each linux
distribution (besides Gentoo I guess) would have just one salt
compiled in and that would be popular enough to have the same
issue.

- Option b: Environment variable for the salt
  + Easy-ish to implement
  + Easy to synchronize over different machines
  - initialization for base types happens early and unpredictive which
makes it hard for embedded Python interpreters (think mod_wsgi and
other things) to specify the salt

- Option c: Random salt at runtime
  + Easy to implement
  - impossible to synchronize
  - breaks unittests in the same way as a compiled in salt would do

Where to add the salt to?  Unicode strings and bytestrings (byte
objects) I guess since those are the most common offenders.  Sometimes
tuples are keys of dictionaries but in that case a contributing factor
to the hash is the string in the tuple anyways.

Also related: since this is a security related issue, would this be
something that goes into Python 2?  Does that affect how a fix would
look like?


Regards,
Armin
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2011-12-29 Thread Ned Batchelder

On 12/28/2011 9:09 PM, Raymond Hettinger wrote:

Also, randomizing the hash wreaks havoc on doctests, book examples
not matching actual dict reprs, and on efforts by users to optimize
the insertion order into dicts with frequent lookups.
I don't have a strong opinion about what to do about this vulnerability, 
but I know that none of these three reasons are a good reason to not 
change anything.  Dictionary key order has never been guaranteed, and 
changes from time to time.  Any code relying on it is broken to begin 
with. This is one of the reasons not to use doctests in the first place: 
comparing dicts textually has always been silly.


--Ned.
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2011-12-29 Thread Mark Shannon

Michael Foord wrote:

Hello all,

A paper (well, presentation) has been published highlighting security problems 
with the hashing algorithm (exploiting collisions) in many programming 
languages Python included:

 
http://events.ccc.de/congress/2011/Fahrplan/attachments/2007_28C3_Effective_DoS_on_web_application_platforms.pdf

Although it's a security issue I'm posting it here because it is now public and 
seems important.

The issue they report can cause (for example) handling an http post to consume 
horrible amounts of cpu. For Python the figures they quoted:

reasonable-sized attack strings only for 32 bits Plone has max. POST 
size of 1 MB
7 minutes of CPU usage for a 1 MB request
~20 kbits/s → keep one Core Duo core busy

This was apparently reported to the security list, but hasn't been responded to beyond an acknowledgement on November 24th (the original report didn't make it onto the security list because it was held in a moderation queue). 


The same vulnerability was reported against various languages and web 
frameworks, and is already fixed in some of them.

Their recommended fix is to randomize the hash function.



The attack relies on being able to predict the hash value for a given 
string. Randomising the string hash function is quite straightforward.

There is no need to change the dictionary code.

A possible (*untested*) patch is attached. I'll leave it for those more 
familiar with unicodeobject.c to do properly.


Cheers,
Mark
diff -r f1298f4ec638 Objects/unicodeobject.c
--- a/Objects/unicodeobject.c	Wed Dec 28 14:59:40 2011 +
+++ b/Objects/unicodeobject.c	Thu Dec 29 11:11:09 2011 +
@@ -41,6 +41,7 @@
 #define PY_SSIZE_T_CLEAN
 #include Python.h
 #include ucnhash.h
+#include time.h   /* for seeding of hash value */
 
 #ifdef MS_WINDOWS
 #include windows.h
@@ -184,6 +185,9 @@
 /* Single character Unicode strings in the Latin-1 range are being
shared as well. */
 static PyObject *unicode_latin1[256];
+
+/* Base hash value (hash of empty string) */
+static Py_uhash_t base_hash;
 
 /* Fast detection of the most frequent whitespace characters */
 const unsigned char _Py_ascii_whitespace[] = {
@@ -11107,7 +1,7 @@
 
 /* The hash function as a macro, gets expanded three times below. */
 #define HASH(P) \
-x = (Py_uhash_t)*P  7; \
+x = base_hash + (Py_uhash_t)*P  7; \
 while (--len = 0) \
 x = (103*x) ^ (Py_uhash_t)*P++;
 
@@ -13869,6 +13873,14 @@
 int _PyUnicode_Init(void)
 {
 int i;
+time_t now;
+
+time(now);
+base_hash = (Py_uhash_t)now;
+/* Avoid overly large numbers to reduce the chance of small strings
+ * hashing to -1 */
+if (base_hash == (Py_uhash_t)-1)
+base_hash = (Py_uhash_t)-2;
 
 /* XXX - move this array to unicodectype.c ? */
 Py_UCS2 linebreak[] = {
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2011-12-29 Thread geremy condra
On Wed, Dec 28, 2011 at 8:49 PM, Eric Snow ericsnowcurren...@gmail.com wrote:
 On Wed, Dec 28, 2011 at 6:28 PM, Michael Foord
 fuzzy...@voidspace.org.uk wrote:
 Hello all,

 A paper (well, presentation) has been published highlighting security 
 problems with the hashing algorithm (exploiting collisions) in many 
 programming languages Python included:

         
 http://events.ccc.de/congress/2011/Fahrplan/attachments/2007_28C3_Effective_DoS_on_web_application_platforms.pdf

 Although it's a security issue I'm posting it here because it is now public 
 and seems important.

 The issue they report can cause (for example) handling an http post to 
 consume horrible amounts of cpu. For Python the figures they quoted:

        reasonable-sized attack strings only for 32 bits Plone has max. POST 
 size of 1 MB
        7 minutes of CPU usage for a 1 MB request
        ~20 kbits/s → keep one Core Duo core busy

 This was apparently reported to the security list, but hasn't been responded 
 to beyond an acknowledgement on November 24th (the original report didn't 
 make it onto the security list because it was held in a moderation queue).

 The same vulnerability was reported against various languages and web 
 frameworks, and is already fixed in some of them.

 Their recommended fix is to randomize the hash function.

 Ironically, this morning I ran across a discussion from about 8 years
 ago on basically the same thing:

 http://mail.python.org/pipermail/python-dev/2003-May/035874.html

  From what I read in the thread, it didn't seem like anyone here was
 too worried about it.  Does this new research change anything?

Not really. It's actually somewhat behind previous work in that it
doesn't exploit the timing deltas, just generates very large ones.

Geremy Condra
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2011-12-29 Thread Antoine Pitrou
On Thu, 29 Dec 2011 03:55:22 +0100
Christian Heimes li...@cheimes.de wrote:
 
 I've been dealing with web stuff and security for almost a decade. I've
 seen far worse attack vectors. This one can easily be solved with a
 couple of lines of Python code. For example Application developers can
 limit the maximum amount of POST parameters to a sensible amount and
 limit the length of each key, too.

Shouldn't the setting be implemented by frameworks?

 CPython could aid developers with a special subclass of dict. The
 crucial lookup function is already overwrite-able per dict instance and
 on subclasses of dict through PyDictObj's struct member PyDictEntry
 *(*ma_lookup)(PyDictObject *mp, PyObject *key, long hash). For example
 specialized subclass could limit the seach for a free slot to n
 recursions or choose to ignore the hash argument and calculate its own
 hash of the key.

Or, rather, the specialized subclass could implement hash randomization.

Regards

Antoine.


___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2011-12-29 Thread Antoine Pitrou
On Thu, 29 Dec 2011 04:04:17 +0100
Christian Heimes li...@cheimes.de wrote:
 Am 29.12.2011 02:37, schrieb Jesse Noller:
  Back up link for the PDF:
  http://dl.dropbox.com/u/1374/2007_28C3_Effective_DoS_on_web_application_platforms.pdf
  
  Ocert disclosure:
  http://www.ocert.org/advisories/ocert-2011-003.html
 
 From http://www.nruns.com/_downloads/advisory28122011.pdf
 
 ---
 Python uses a hash function which is very similar to DJBX33X, which can
 be broken using a
 meet-in-the-middle attack. It operates on register size and is thus
 different for 64 and 32 bit
 machines. While generating multi-collisions efficiently is also possible
 for the 64 bit version
 of the function, the resulting colliding strings are too large to be
 relevant for anything more
 than an academic attack.
 
 Plone as the most prominent Python web framework accepts 1 MB of POST
 data, which it
 parses in about 7 minutes of CPU time in the worst case.
 This gives an attacker with about 20 kbit/s the possibility to keep one
 Core Duo core
 constantly busy. If the attacker is in the position to have a Gigabit
 line available, he can keep
 about 50.000 Core Duo cores busy.
 ---
 
 If I remember correctly CPython uses the long for its hash function so
 64bit Windows uses a 32bit hash.

Not anymore, Py_hash_t is currently aligned with Py_ssize_t.

Regards

Antoine.


___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2011-12-29 Thread Mark Shannon

Raymond Hettinger wrote:

FWIW, Uncle Timmy considers the non-randomized hashes to be a virtue.
It is believed that they give us better-than-random results for commonly
encountered datasets.  A change to randomized hashes would have a
negative performance impact on those cases.


Tim Peter's analysis applies mainly to ints which would be unchanged.

A change to the hash function for strings would make no difference to 
the performance of the dict, as the ordering of the hash values is 
already quite different from the ordering of the strings for any string 
of more than 3 characters.




Also, randomizing the hash wreaks havoc on doctests, book examples
not matching actual dict reprs, and on efforts by users to optimize
the insertion order into dicts with frequent lookups.


The docs clearly state that the ordering of iteration over dicts is 
arbitrary. Perhaps changing it once in a while might be a good thing :)



Cheers,
Mark.




Raymond





On Dec 28, 2011, at 5:28 PM, Michael Foord wrote:


Hello all,

A paper (well, presentation) has been published highlighting security problems 
with the hashing algorithm (exploiting collisions) in many programming 
languages Python included:

 
http://events.ccc.de/congress/2011/Fahrplan/attachments/2007_28C3_Effective_DoS_on_web_application_platforms.pdf

Although it's a security issue I'm posting it here because it is now public and 
seems important.

The issue they report can cause (for example) handling an http post to consume 
horrible amounts of cpu. For Python the figures they quoted:

reasonable-sized attack strings only for 32 bits Plone has max. POST 
size of 1 MB
7 minutes of CPU usage for a 1 MB request
~20 kbits/s → keep one Core Duo core busy

This was apparently reported to the security list, but hasn't been responded to beyond an acknowledgement on November 24th (the original report didn't make it onto the security list because it was held in a moderation queue). 


The same vulnerability was reported against various languages and web 
frameworks, and is already fixed in some of them.

Their recommended fix is to randomize the hash function.

All the best,

Michael


--
http://www.voidspace.org.uk/


May you do good and not evil
May you find forgiveness for yourself and forgive others
May you share freely, never taking more than you give.
-- the sqlite blessing 
http://www.sqlite.org/different.html






___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/raymond.hettinger%40gmail.com


___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/mark%40hotpy.org


___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2011-12-29 Thread Antoine Pitrou
On Thu, 29 Dec 2011 14:32:21 +0100
Christian Heimes li...@cheimes.de wrote:
 Am 29.12.2011 13:57, schrieb Armin Ronacher:
  Hi,
  
  Something I should add to this now that I thought about it a bit more:
  
  Assuming this should be fixed on a language level the solution would
  probably be to salt hashes.  The most common hash to salt here is the
  PyUnicode hash for obvious reasons.
  
  - Option a: Compiled in Salt
+ Easy to implement
- Breaks unittests most likely (those were broken in the first place
  but that's still a very annoying change to make)
- Might cause problems with interoperability of Pythons compiled with
  different hash salts
- You're not really solving the problem because each linux
  distribution (besides Gentoo I guess) would have just one salt
  compiled in and that would be popular enough to have the same
  issue.
  
  - Option b: Environment variable for the salt
+ Easy-ish to implement
+ Easy to synchronize over different machines
- initialization for base types happens early and unpredictive which
  makes it hard for embedded Python interpreters (think mod_wsgi and
  other things) to specify the salt
  
  - Option c: Random salt at runtime
+ Easy to implement
- impossible to synchronize
- breaks unittests in the same way as a compiled in salt would do
 
 - Option d: Don't change __hash__ but only use randomized hash for
 PyDictEntry lookup
   + Easy to implement
   - breaks only software to relies on a fixed order of dict keys
   - breaks only a few to no unit tests
 
 IMHO we don't have to alter the outcome of hash(some string), hash(1)
 and all other related types. We just need to reduce the change the an
 attacker can produce collisions in the dict (and set?) code that looks
 up the slot (PyDictEntry). How about adding the random value in
 Object/dictobject.c:lookdict() and lookdict_str() (Python 2.x) /
 lookdict_unicode() (Python 3.x)? With this approach the hash of all our
 objects stay the same and just the dict code needs to be altered. The
 approach has also the benefit that all possible objects gain a
 randomized hash.

I basically agree with your proposal. The only downside is that custom
hashed containers (such as _pickle.c's memotable) don't
automatically benefit. That said, I think it would be difficult to
craft an attack against the aforementioned memotable (you would have
to basically choose the addresses of pickled objects).

Regards

Antoine.


___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2011-12-29 Thread Hynek Schlawack
Hi,  

how about

Option d: Host based salt
 + Easy-ish to implement – how about basing it on the hostname for example?
 + transparent for all processes on the same host
 - probably unit test breakage

In fact, we could use host based as default with the option to specify own 
which would solve the sync problems.

That said, I agree with Armin that fixing this in the frameworks isn't an 
option.

Regards,
Hynek


Am Donnerstag, 29. Dezember 2011 um 13:57 schrieb Armin Ronacher:

 Hi,
  
 Something I should add to this now that I thought about it a bit more:
  
 Assuming this should be fixed on a language level the solution would
 probably be to salt hashes. The most common hash to salt here is the
 PyUnicode hash for obvious reasons.
  
 - Option a: Compiled in Salt
 + Easy to implement
 - Breaks unittests most likely (those were broken in the first place
 but that's still a very annoying change to make)
 - Might cause problems with interoperability of Pythons compiled with
 different hash salts
 - You're not really solving the problem because each linux
 distribution (besides Gentoo I guess) would have just one salt
 compiled in and that would be popular enough to have the same
 issue.
  
 - Option b: Environment variable for the salt
 + Easy-ish to implement
 + Easy to synchronize over different machines
 - initialization for base types happens early and unpredictive which
 makes it hard for embedded Python interpreters (think mod_wsgi and
 other things) to specify the salt
  
 - Option c: Random salt at runtime
 + Easy to implement
 - impossible to synchronize
 - breaks unittests in the same way as a compiled in salt would do
  
 Where to add the salt to? Unicode strings and bytestrings (byte
 objects) I guess since those are the most common offenders. Sometimes
 tuples are keys of dictionaries but in that case a contributing factor
 to the hash is the string in the tuple anyways.
  
 Also related: since this is a security related issue, would this be
 something that goes into Python 2? Does that affect how a fix would
 look like?
  
  
 Regards,
 Armin
 ___
 Python-Dev mailing list
 Python-Dev@python.org (mailto:Python-Dev@python.org)
 http://mail.python.org/mailman/listinfo/python-dev
 Unsubscribe: http://mail.python.org/mailman/options/python-dev/hs%40ox.cx



___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2011-12-29 Thread Antoine Pitrou
On Thu, 29 Dec 2011 11:25:03 +
Mark Shannon m...@hotpy.org wrote:
  
  Also, randomizing the hash wreaks havoc on doctests, book examples
  not matching actual dict reprs, and on efforts by users to optimize
  the insertion order into dicts with frequent lookups.
 
 The docs clearly state that the ordering of iteration over dicts is 
 arbitrary. Perhaps changing it once in a while might be a good thing :)

We already change it once in a while.
http://twistedmatrix.com/trac/ticket/5352
;)

Regards

Antoine.


___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2011-12-29 Thread Christian Heimes
Am 29.12.2011 12:13, schrieb Mark Shannon:
 The attack relies on being able to predict the hash value for a given
 string. Randomising the string hash function is quite straightforward.
 There is no need to change the dictionary code.
 
 A possible (*untested*) patch is attached. I'll leave it for those more 
 familiar with unicodeobject.c to do properly.

I'm worried that hash randomization of str is going to break 3rd party
software that rely on a stable hash across multiple Python instances.
Persistence layers like ZODB and cross interpreter communication
channels used by multiprocessing may (!) rely on the fact that the hash
of a string is fixed.

Perhaps the dict code is a better place for randomization. The code in
lookdict() and lookdict_unicode() could add a value to the hash. My
approach is less intrusive and also closes the attack vector for all
possible objects including str, byte, int and so on. I like also Armin's
idea of an optional hash randomization.

Christian
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2011-12-29 Thread Christian Heimes
Am 29.12.2011 12:10, schrieb Antoine Pitrou:
 I've been dealing with web stuff and security for almost a decade. I've
 seen far worse attack vectors. This one can easily be solved with a
 couple of lines of Python code. For example Application developers can
 limit the maximum amount of POST parameters to a sensible amount and
 limit the length of each key, too.
 
 Shouldn't the setting be implemented by frameworks?

Web framework like Django or CherryPy can be considered an application
from the CPython core's point of view. ;)
You are right. The term framework is a better word.

 CPython could aid developers with a special subclass of dict. The
 crucial lookup function is already overwrite-able per dict instance and
 on subclasses of dict through PyDictObj's struct member PyDictEntry
 *(*ma_lookup)(PyDictObject *mp, PyObject *key, long hash). For example
 specialized subclass could limit the seach for a free slot to n
 recursions or choose to ignore the hash argument and calculate its own
 hash of the key.
 
 Or, rather, the specialized subclass could implement hash randomization.

Yeah! I was thinking about the same when I wrote calculate its own
hash but I was too sloppy to carry on my argument. Please take 3am as
my excuse.
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2011-12-29 Thread M.-A. Lemburg
Mark Shannon wrote:
 Michael Foord wrote:
 Hello all,

 A paper (well, presentation) has been published highlighting security 
 problems with the hashing
 algorithm (exploiting collisions) in many programming languages Python 
 included:

 
 http://events.ccc.de/congress/2011/Fahrplan/attachments/2007_28C3_Effective_DoS_on_web_application_platforms.pdf


 Although it's a security issue I'm posting it here because it is now public 
 and seems important.

 The issue they report can cause (for example) handling an http post to 
 consume horrible amounts of
 cpu. For Python the figures they quoted:

 reasonable-sized attack strings only for 32 bits Plone has max. POST 
 size of 1 MB
 7 minutes of CPU usage for a 1 MB request
 ~20 kbits/s → keep one Core Duo core busy

 This was apparently reported to the security list, but hasn't been responded 
 to beyond an
 acknowledgement on November 24th (the original report didn't make it onto 
 the security list
 because it was held in a moderation queue).
 The same vulnerability was reported against various languages and web 
 frameworks, and is already
 fixed in some of them.

 Their recommended fix is to randomize the hash function.

 
 The attack relies on being able to predict the hash value for a given string. 
 Randomising the string
 hash function is quite straightforward.
 There is no need to change the dictionary code.
 
 A possible (*untested*) patch is attached. I'll leave it for those more 
 familiar with
 unicodeobject.c to do properly.

The paper mentions that several web frameworks work around this by
limiting the number of parameters per GET/POST/HEAD request.

This sounds like a better alternative than randomizing the hash
function of strings.

Uncontrollable randomization has issues when you work with
multi-process setups, since the processes would each use different
hash values for identical strings. Putting the base_hash value
under application control could be done to solve this problem,
making sure that all processes use the same random base value.

BTW: Since your randomization trick uses the current time, it would
also be rather easy to tune an attack to find the currently
used base_hash. To make this safe, you'd have to use a more
random source for initializing the base_hash.

Note that the same hash collision attack can be used for
other key types as well, e.g. integers (where it's very easy
to find hash collisions), so this kind of randomization
would have to be applied to other basic types too.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Dec 29 2011)
 Python/Zope Consulting and Support ...http://www.egenix.com/
 mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
 mxODBC, mxDateTime, mxTextTools ...http://python.egenix.com/


::: Try our new mxODBC.Connect Python Database Interface for free ! 


   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
   Registered at Amtsgericht Duesseldorf: HRB 46611
   http://www.egenix.com/company/contact/
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2011-12-29 Thread Antoine Pitrou
On Thu, 29 Dec 2011 13:57:07 +0100
Armin Ronacher armin.ronac...@active-4.com wrote:
 
 - Option c: Random salt at runtime
   + Easy to implement
   - impossible to synchronize
   - breaks unittests in the same way as a compiled in salt would do

This option would have my preference. I don't think hash() was ever
meant to be synchronizable. Already using a 32-bit Python will give
you different results from a 64-bit Python.

As for breaking unittests, these tests were broken in the first place.
hash() does change from time to time.

 Where to add the salt to?  Unicode strings and bytestrings (byte
 objects) I guess since those are the most common offenders.  Sometimes
 tuples are keys of dictionaries but in that case a contributing factor
 to the hash is the string in the tuple anyways.

Or it could be a process-wide constant for all dicts. If the constant
is additive as proposed by Mark the impact should be negligible.
(but the randomness must be good enough)

Regards

Antoine.


___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2011-12-29 Thread Christian Heimes
Am 29.12.2011 11:32, schrieb Antoine Pitrou:
 If I remember correctly CPython uses the long for its hash function so
 64bit Windows uses a 32bit hash.
 
 Not anymore, Py_hash_t is currently aligned with Py_ssize_t.

Thanks for the update. Python 2.x still uses long and several large
frameworks like Zope/Plone require 2.x.

Christian
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2011-12-29 Thread Christian Heimes
Am 29.12.2011 13:57, schrieb Armin Ronacher:
 Hi,
 
 Something I should add to this now that I thought about it a bit more:
 
 Assuming this should be fixed on a language level the solution would
 probably be to salt hashes.  The most common hash to salt here is the
 PyUnicode hash for obvious reasons.
 
 - Option a: Compiled in Salt
   + Easy to implement
   - Breaks unittests most likely (those were broken in the first place
 but that's still a very annoying change to make)
   - Might cause problems with interoperability of Pythons compiled with
 different hash salts
   - You're not really solving the problem because each linux
 distribution (besides Gentoo I guess) would have just one salt
 compiled in and that would be popular enough to have the same
 issue.
 
 - Option b: Environment variable for the salt
   + Easy-ish to implement
   + Easy to synchronize over different machines
   - initialization for base types happens early and unpredictive which
 makes it hard for embedded Python interpreters (think mod_wsgi and
 other things) to specify the salt
 
 - Option c: Random salt at runtime
   + Easy to implement
   - impossible to synchronize
   - breaks unittests in the same way as a compiled in salt would do

- Option d: Don't change __hash__ but only use randomized hash for
PyDictEntry lookup
  + Easy to implement
  - breaks only software to relies on a fixed order of dict keys
  - breaks only a few to no unit tests

IMHO we don't have to alter the outcome of hash(some string), hash(1)
and all other related types. We just need to reduce the change the an
attacker can produce collisions in the dict (and set?) code that looks
up the slot (PyDictEntry). How about adding the random value in
Object/dictobject.c:lookdict() and lookdict_str() (Python 2.x) /
lookdict_unicode() (Python 3.x)? With this approach the hash of all our
objects stay the same and just the dict code needs to be altered. The
approach has also the benefit that all possible objects gain a
randomized hash.

 Also related: since this is a security related issue, would this be
 something that goes into Python 2?  Does that affect how a fix would
 look like?

IMHO it does affect the fix. A changed and randomized hash function may
break software that relies on a stable hash() function.

Christian
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2011-12-29 Thread Tim Delaney
+1 to option d (host-based salt) but would need to consistently order the
hostnames/addresses to guarantee that all processes on the same machine got
the same salt by default.

+1 to option c (environment variable) as an override. And/or maybe an
override on the command line.

+1 to implementing the salt in the dictionary hash as an additive value.

+0 to exposing the salt as a constant (3.3+ only) - or alternatively expose
a hash function that just takes an existing hash and returns the salted
hash. That would make it very easy for anything that wanted a salted hash
to get one.

For choosing the default salt, I think something like:

a. If IPv6 is enabled, take the link-local address of the interface with
the default route. Pretty much guaranteed not to change, can't be
determined externally (salting doesn't need a secret, but it doesn't hurt),
large number so probably a good salt. (If it is likely to change, a salt
override should be being used instead). Don't use any other IPv6 address.
In particular, never use a temporary IPv6 address like Windows assigns -
multiprocessing could end up with instances with different salts.

b. Take the FQDN of the machine.

Tim Delaney
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2011-12-29 Thread Tim Delaney

 +1 to option c (environment variable) as an override. And/or maybe an
 override on the command line.


That obviously should have said option b (environment variable) ...

Tim Delaney
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2011-12-29 Thread PJ Eby
On Thu, Dec 29, 2011 at 8:32 AM, Christian Heimes li...@cheimes.de wrote:

 IMHO we don't have to alter the outcome of hash(some string), hash(1)
 and all other related types. We just need to reduce the change the an
 attacker can produce collisions in the dict (and set?) code that looks
 up the slot (PyDictEntry). How about adding the random value in
 Object/dictobject.c:lookdict() and lookdict_str() (Python 2.x) /
 lookdict_unicode() (Python 3.x)? With this approach the hash of all our
 objects stay the same and just the dict code needs to be altered.


I don't understand how that helps a collision attack.  If you can still
generate two strings with the same (pre-randomized) hash, what difference
does it make that the dict adds a random number?  The post-randomized
number will still be the same, no?

Or does this attack just rely on the hash *remainders* being the same?  If
so, I can see how hashing the hash would help.  But since the attacker
doesn't know the modulus, and it can change as the dictionary grows, I
would expect the attack to require matching hashes, not just matching hash
remainders...  unless I'm just completely off base here.
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2011-12-29 Thread Paul McMillan
It's worth pointing out that if the salt is somehow exposed to an
attacker, or is guessable, much of the benefit goes away. It's likely
that a timing attack could be used to discover the salt if it is fixed
per machine or process over a long period of time.

If a salt is generally fixed per machine, but varies from
machine-to-machine, I think we'll see an influx of frustrated devs who
have something that works perfectly on their machine but not for
others. It doesn't matter that they're doing it wrong, we'll still
have to deal with them as a community. This seems like an argument in
favor of randomizing it at runtime by default, so it fails early for
them.

Allowing an environment and command line override makes sense, as it
allows users to rotate the salt as frequently as their needs dictate.

-Paul
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2011-12-29 Thread Christian Heimes
Am 29.12.2011 21:07, schrieb PJ Eby:
 I don't understand how that helps a collision attack.  If you can still
 generate two strings with the same (pre-randomized) hash, what
 difference does it make that the dict adds a random number?  The
 post-randomized number will still be the same, no?
 
 Or does this attack just rely on the hash *remainders* being the same?
  If so, I can see how hashing the hash would help.  But since the
 attacker doesn't know the modulus, and it can change as the dictionary
 grows, I would expect the attack to require matching hashes, not just
 matching hash remainders...  unless I'm just completely off base here.

The attack doesn't need perfect collisions. The attacker calculates
strings in a way so that their hashes results in as many collision as
possible in the dict code. An attacker succeeds when the initial slot
for an hash is filled and as many subsequent slots of the perturbed
masked hash, too. Also an attacker can easily predict the size and
therefore the mask for the hash remainder. A POST request parser usually
starts with an empty dict and the growth rate of Python's dicts is well
documented. The changing mask makes the attack just a tiny bit more
challenging.

The hash randomization idea adds a salt to throw the attacker of course.
Instead of

  position = hash  mask

it's now

  hash = salt + hash
  position = hash  mask

where salt is a random, process global value that is fixed for the life
time of the program. The salt also affects the perturbance during the
search for new slots. As you already stated this salt won't be affective
against full hash collisions.

The attack needs A LOT of problematic strings to become an issue,
possible hundred of thousands or even millions of keys in a very large
POST request. In reality an attacker won't reach the full theoretical
O(n^2) performance degradation for a hash table. But even more than O(n)
instead of O(1) for a million keys in each request has some impact on
your servers' CPUs. Some vendors have limited to POST request to 1 MB or
the amount of keys to 10,000 to work around the issue. One paper also
states that attacks on Python with 64bit is just academical for now.

Christian
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2011-12-29 Thread Terry Reedy
The talk was a presentation yesterday by Alexander Klink and Julian 
Wälde at the Chaos Communication Congress in Germany hash...@alech.de

I read the non-technical summary at
http://arstechnica.com/business/news/2011/12/huge-portions-of-web-vulnerable-to-hashing-denial-of-service-attack.ars 


and watched the video of the talk at
https://www.youtube.com/watch?feature=player_embeddedv=_EEhviEO1Vo#

My summary: hash table creation with N keys changes from amortized O(N)
to O(N**2) time if the hash values of all the keys are the same. This 
should only happen for large N if done intentionally. This is easy to 
accomplish with a linear multiply and add hash function, such as used in 
PHP4 (but nowhere else that the authors found). A nonlinear multiply and 
xor hash function, used in one form or another by everything else, is 
much harder to break. It is *theoretically* vulnerable to brute-force 
search and this has been known for years. With a more cleaver 
meet-in-the-middle strategy, that builds a dict of suffixes and then 
searches for matching prefixes, 32-bit hashes are *practically* 
vulnerable. The attack depends on, for instance, 2**16 (64K) being 1/64K 
of 2**32. (I did not hear when this strategy was developed, but it is 
certainly more practical on a desktop now than even 8 years ago.)


[64-bit hashes are much, much less vulnerable to attack, at least for 
now. So it seems to me that anyone who hashes potential attack data can 
avoid the problem by using 64-bit Python with 64-bit hash values. If I 
understood Antoine, that should be all 64-bit builds.]


More summary: Perl added an #define option to start the hash calculation 
with non-zero value instead of 0 years ago to avoid algorithmic 
complexity attacks. The patch is at 47:20 in the video. The authors 
believe all should do similarly.


[The change amounts to adding a char, unknown to attackers, to the 
beginning of every string before hashing. So it adds a small bit of 
time. The code patch shown did not show the source of the non-zero seed 
or the timing and scope of any randomization. As the discussion here has 
shown, this is an important issue to applications. So 'do the same' is 
inadequate and over-simplified advice. I believe Armin's patch is 
similar to the Perl patch.]


Since the authors sent out CERT alert about Nov 1, PHP has added to PHP5 
a new function to limit the number of vars hashed. Microsoft will do 
something similar now with hash randomization to follow (maybe?). JRuby 
is going to do something. Java does not think it needs to change Java 
itself, but will leave all to the frameworks.


[The discussion here suggests that this is an inadequate response for 
32-bit systems like Java since one person/group may not control all the 
pieces of a server system. However, a person or group can run all pieces 
on a system Python with an option turned on.]


--
Terry Jan Reedy


___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2011-12-29 Thread Terry Reedy

On 12/29/2011 4:31 PM, Christian Heimes wrote:


The hash randomization idea adds a salt to throw the attacker of course.
Instead of

   position = hash  mask

it's now

   hash = salt + hash


As I understood the talk (actually, the bit of Perl interpreter C code 
shown), the randomization is to change hash(s) to hash(salt+s) so that 
the salt is completely mixed into the hash from the beginning, rather 
than just tacked on at the end.


--
Terry Jan Reedy

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2011-12-29 Thread Christian Heimes
Am 29.12.2011 23:28, schrieb Terry Reedy:
 As I understood the talk (actually, the bit of Perl interpreter C code 
 shown), the randomization is to change hash(s) to hash(salt+s) so that 
 the salt is completely mixed into the hash from the beginning, rather 
 than just tacked on at the end.

Yes, the Perl and Ruby code uses a random seed as IV for hash
generation. It's the best way to create randomized hashes but it might
not be a feasible fix for Python 2.x. I'm worried that it might break
applications that rely on stable hash values.
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2011-12-29 Thread Tim Delaney
On 30 December 2011 06:59, Tim Delaney timothy.c.dela...@gmail.com wrote:

 +0 to exposing the salt as a constant (3.3+ only) - or alternatively
 expose a hash function that just takes an existing hash and returns the
 salted hash. That would make it very easy for anything that wanted a salted
 hash to get one.


Sorry - brain fart on my part there - the salt needs to be included right
from the start.

Tim Delaney
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com