[Python-Dev] Counting collisions for the win

2012-02-16 Thread Jim J. Jewett


In http://mail.python.org/pipermail/python-dev/2012-January/115715.html
Frank Sievertsen wrote:

On 20.01.2012 13:08, Victor Stinner wrote:
 I'm surprised we haven't seen bug reports about it from users
 of 64-bit Pythons long ago
 A Python dictionary only uses the lower bits of a hash value. If your
 dictionary has less than 2**32 items, the dictionary order is exactly
 the same on 32 and 64 bits system: hash32(str) & mask == hash64(str)
 & mask for mask = 2**32-1.

 No, that's not true.
 Whenever a collision happens, other bits are mixed in very fast.

 Frank

Bits are mixed in quickly from a denial-of-service standpoint, but
Victor is correct from a "Why don't the tests already fail?" standpoint.

A dict with 2**12 slots, holding over 2700 entries, will be far larger
than most test cases -- particularly those with visible output.  In a
dict that size, 32-bit and 64-bit machines will still probe the same
first, second, third, fourth, fifth, and sixth slots.  Even in the
rare cases when there are at least 6 collisions, the next slots may
well be either the same, or close enough that it doesn't show up in a
changed iteration order.
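
To make the probe arithmetic concrete, here is a small pure-Python sketch;
the recurrence mirrors the pre-3.3 lookdict code, and the two hash values
are made up purely for illustration:

PERTURB_SHIFT = 5

def probe_slots(h, mask, n):
    # Mirror CPython's open-addressing recurrence:
    #   i = i*5 + perturb + 1;  perturb >>= PERTURB_SHIFT
    # The slot actually inspected is always i & mask.
    i = h & mask
    perturb = h
    slots = [i]
    while len(slots) < n:
        i = i * 5 + perturb + 1
        perturb >>= PERTURB_SHIFT
        slots.append(i & mask)
    return slots

mask = 2**12 - 1                      # a dict with 4096 slots
h32 = 0x9e3779b9                      # hypothetical 32-bit hash
h64 = (0xdeadbeef << 32) | h32        # 64-bit hash sharing the low 32 bits

print(probe_slots(h32, mask, 7))
print(probe_slots(h64, mask, 7))
# The first six slots are identical; only the seventh probe, which pulls
# perturb bits above bit 31 into play, can differ.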

-jJ

-- 

If there are still threading problems with my replies, please 
email me with details, so that I can try to resolve them.  -jJ



Re: [Python-Dev] Counting collisions for the win

2012-01-24 Thread Frank Sievertsen


Interesting idea, and I see it would avoid conversions.  What happens
if the data are also removed from the hash?  So you enter 20
colliding keys, then 20 more that get randomized, then delete 18
of the first 20.  Can you still find the second 20 keys?  Does it take
two sets of probes, somehow?



That's no problem, because the dict doesn't really free a slot; it
replaces the value with a dummy value.

These slots are later reused for new values, or the whole dict is
recreated and resized.
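
For readers unfamiliar with the dummy-slot trick, here is a toy
open-addressing table (not CPython's implementation, only an illustration
of the idea): deleted entries become a DUMMY marker, so later lookups keep
probing past them, and insertions may reuse the slot.

DUMMY = object()        # marks a deleted slot
EMPTY = None            # marks a never-used slot
PERTURB_SHIFT = 5

class ToyDict:
    """Toy open-addressing table; no resizing, for illustration only."""

    def __init__(self, size=8):
        self.slots = [EMPTY] * size
        self.mask = size - 1

    def _probe(self, key):
        h = hash(key) & 0xFFFFFFFFFFFFFFFF
        i = perturb = h
        while True:
            yield i & self.mask
            i = i * 5 + perturb + 1
            perturb >>= PERTURB_SHIFT

    def __setitem__(self, key, value):
        reuse = None
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is EMPTY:
                # Prefer reusing the first dummy slot seen along the way.
                self.slots[i if reuse is None else reuse] = (key, value)
                return
            if slot is DUMMY:
                if reuse is None:
                    reuse = i
            elif slot[0] == key:
                self.slots[i] = (key, value)
                return

    def __getitem__(self, key):
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is EMPTY:
                raise KeyError(key)       # a truly empty slot ends the search
            if slot is not DUMMY and slot[0] == key:
                return slot[1]

    def __delitem__(self, key):
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is EMPTY:
                raise KeyError(key)
            if slot is not DUMMY and slot[0] == key:
                self.slots[i] = DUMMY      # keep the probe chain intact
                return

d = ToyDict()
d["a"] = 1
d["b"] = 2
del d["a"]
print(d["b"])   # keys inserted after (or probing past) "a" are still found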

Frank


Re: [Python-Dev] Counting collisions for the win

2012-01-23 Thread Frank Sievertsen

Hello,

I'd still prefer to see a randomized hash()-function (at least for 3.3).

But to protect against the attacks it would be sufficient to use
randomization for collision resolution in dicts (and sets).

What if we use a second (randomized) hash-function in case there
are many collisions in ONE lookup. This hash-function is used only
for collision resolution and is not cached.

The benefits:

* protection against the known attacks
* hash(X) stays stable and the same
* dict order is only changed when there are many collisions
* doctests will not break
* enhanced collision resolution
* RNG doesn't have to be initialized in smaller programs
* nearly no slowdown of most dicts
* second hash-function is only used for keys with higher collision-rate
* lower probability to leak secrets
* possibility to use different secrets for each dict

The drawback:

* need to add a second hash-function
* slower than using one hash-function only, when > 20 collisions
* need to add this to container-types? (if used for py3.3)
* need to expose this to the user? (if used for py3.3)
* works only for datatypes with this new function
* possible to implement without breaking ABI?

The following code is meant for explanation purposes only:

for (perturb = hash; ; perturb >>= 5) {
    i = (i << 2) + i + perturb + 1;

    if ((collisions++) == 20) {
        // perturb is already zero after 13 rounds.
        // 20 collisions are rare.

        // you can add "&& (ma_mask > 256)" to make 100% sure
        // that it's not used for smaller dicts.

        if (Py_TYPE(key)->tp_flags & Py_TPFLAGS_HAVE_RANDOMIZED_HASH) {
            // If the type has a randomized hash, use it now for the lookup
            i = perturb = PyObject_RandomizedHash(key);
        }
        ...


If I got this right we could add a new function tp_randomized_hash
to 3.3 release. But can we also add functions in older releases, without
breaking ABI?

If not, can we implement this somehow using a flag?

FOR OLDER RELEASES < 3.3:

Py_hash_t
PyObject_RandomizedHash(PyObject *v) {
    PyTypeObject *tp = Py_TYPE(v);
    if (!(tp->tp_flags & Py_TPFLAGS_HAVE_RANDOMIZED_HASH))
        return -1;

    global_flags_somewhere->USE_RANDOMIZED_HASH = 1;
    return (*tp->tp_hash)(v);
}

... and in unicodeobject.c (and wherever we need randomization):

static Py_hash_t
unicode_hash(PyUnicodeObject *self)
{
    Py_ssize_t len;
    Py_UNICODE *p;
    Py_hash_t x;
    Py_hash_t prefix = 0;
    Py_hash_t suffix = 0;

    if (global_flags_somewhere->USE_RANDOMIZED_HASH) {
        global_flags_somewhere->USE_RANDOMIZED_HASH = 0;
        initialize_rng_if_not_already_done_and_return_seed(&prefix, &suffix);
    }

    ... (and don't cache in this case) ...


It's ugly, but if I understand this correctly, the GIL will
protect us against race-conditions, right?

Hello, internals experts: Would this work or is there a better way to do
this without breaking the ABI?

Frank


Re: [Python-Dev] Counting collisions for the win

2012-01-23 Thread Glenn Linderman

On 1/23/2012 12:53 AM, Frank Sievertsen wrote:


What if we use a second (randomized) hash-function in case there
are many collisions in ONE lookup. This hash-function is used only
for collision resolution and is not cached. 


So this sounds like SafeDict, but putting it under the covers and 
automatically converting from dict to SafeDict after a collision 
threshold has been reached.  Let's call it fallback-dict.


Compared to SafeDict as a programmer tool, fallback-dict has these benefits:

* No need to change program (or library) source to respond to an attack
* Order is preserved until the collision threshold has been reached
* Performance is preserved until the collision threshold has been reached

and costs:

* converting the dict from one hash to the other by rehashing all the keys.

Compared to always randomizing the hash, fallback-dict has these benefits:

* hash (and performance) is deterministic: programs running on the same 
data set will have the same performance characteristic, unless the 
collision threshold is reached for that data set.
* lower probability to leak secrets, because each attacked set/dict can 
have its own secret, randomized hash seed
* patch would not need to include RNG initialization during startup, 
lowering the impact on short-running programs.


What is not clear is how much SafeDict degrades performance when it is
used; non-cached hashes will definitely have an impact.  I'm not sure
whether an implementation of fallback-dict in C code would be
significantly faster than the implementation of SafeDict in Python, so
I don't know whether comparing the performance of SafeDict and dict
would be representative of the two stages of fallback-dict performance.
Certainly the performance cost of SafeDict would be an upper bound on
the performance cost of fallback-dict once conversion takes place, but
that would not measure the conversion cost itself.  The performance of
fallback-dict does have to be significantly better than the performance
of dict with collisions to be beneficial, and if the conversion cost is
significant, triggering conversions could itself become an attack vector.


Re: [Python-Dev] Counting collisions for the win

2012-01-23 Thread Frank Sievertsen



On 23.01.2012 19:25, Glenn Linderman wrote:
So this sounds like SafeDict, but putting it under the covers and 
automatically converting from dict to SafeDict after a collision 
threshold has been reached.  Let's call it fallback-dict.


and costs:

* converting the dict from one hash to the other by rehashing all the 
keys.


That's not exactly what it does: it calls the randomized hash-function
only for those keys that didn't find a free slot after 20 collisions,
and it uses this value only for the further collision resolution.

So the value of hash() is used for the first 20 slots; randomized_hash()
is used after that.

1st try: slot[i = perturb = hash];
2nd try: slot[i = i * 5 + 1 + (perturb >>= 5)]
3rd try: slot[i = i * 5 + 1 + (perturb >>= 5)]
...
20th try: slot[i = i * 5 + 1 + (perturb >>= 5)]
21st try: slot[i = perturb = randomized_hash(key)]   <== HERE
22nd try: slot[i = i * 5 + 1 + (perturb >>= 5)]

This is also why there is no conversion needed. It's a
per-key/per-lookup rule.
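
A pure-Python sketch of this probe sequence (assuming a hypothetical
randomized_hash(); the per-process salt below is only for illustration,
not part of the proposal):

import random

_SALT = random.getrandbits(64)         # stand-in for the secret

def randomized_hash(key):
    # hypothetical second hash function; not the real patch
    return hash((_SALT, key)) & 0xFFFFFFFFFFFFFFFF

def probe_slots(key, mask, tries):
    h = hash(key) & 0xFFFFFFFFFFFFFFFF
    i = perturb = h                    # 1st try: driven by the cached hash()
    slots = []
    for attempt in range(1, tries + 1):
        slots.append(i & mask)
        if attempt == 20:              # 20 collisions: switch hash functions
            i = perturb = randomized_hash(key)
        else:                          # 2nd..20th and 22nd.. tries
            perturb >>= 5
            i = i * 5 + 1 + perturb
    return slots

print(probe_slots("example", 2**10 - 1, 25))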

Frank


Re: [Python-Dev] Counting collisions for the win

2012-01-23 Thread Glenn Linderman

On 1/23/2012 10:58 AM, Frank Sievertsen wrote:



On 23.01.2012 19:25, Glenn Linderman wrote:
So this sounds like SafeDict, but putting it under the covers and 
automatically converting from dict to SafeDict after a collision 
threshold has been reached.  Let's call it fallback-dict.


and costs:

* converting the dict from one hash to the other by rehashing all the 
keys.


That's not exactly what it does: it calls the randomized hash-function
only for those keys that didn't find a free slot after 20 collisions,
and it uses this value only for the further collision resolution.

So the value of hash() is used for the first 20 slots; randomized_hash()
is used after that.

1st try: slot[i = perturb = hash];
2nd try: slot[i = i * 5 + 1 + (perturb >>= 5)]
3rd try: slot[i = i * 5 + 1 + (perturb >>= 5)]
...
20th try: slot[i = i * 5 + 1 + (perturb >>= 5)]
21st try: slot[i = perturb = randomized_hash(key)]   <== HERE
22nd try: slot[i = i * 5 + 1 + (perturb >>= 5)]

This is also why there is no conversion needed. It's a
per-key/per-lookup rule.

Frank


Interesting idea, and I see it would avoid conversions.  What happens if
the data are also removed from the hash?  So you enter 20 colliding
keys, then 20 more that get randomized, then delete 18 of the first
20.  Can you still find the second 20 keys?  Does it take two sets of
probes, somehow?


Re: [Python-Dev] Counting collisions for the win

2012-01-23 Thread M.-A. Lemburg
Frank Sievertsen wrote:
 Hello,
 
 I'd still prefer to see a randomized hash()-function (at least for 3.3).
 
 But to protect against the attacks it would be sufficient to use
 randomization for collision resolution in dicts (and sets).
 
 What if we use a second (randomized) hash-function in case there
 are many collisions in ONE lookup. This hash-function is used only
 for collision resolution and is not cached.

This sounds a lot like what I'm referring to as "universal hash function"
in the discussion on the ticket:

http://bugs.python.org/issue13703#msg150724
http://bugs.python.org/issue13703#msg150795
http://bugs.python.org/issue13703#msg151813

However, I don't like the term "random" in there. It's better to make
the approach deterministic to avoid issues with not being able
to easily reproduce Python application runs for debugging purposes.

If you find that the data is manipulated, simply incrementing the
universal hash parameter and rehashing the dict with that parameter
should be enough to solve the issue (if not, which is highly unlikely,
the dict will simply reapply the fix). No randomness needed.
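
As a rough illustration of the idea (a toy parametrized hash family,
not the algorithm from the ticket):

def parametrized_hash(s, k=0):
    # A toy member of a hash family selected by the integer parameter k.
    # Changing k changes which strings collide, deterministically.
    h = 5381 + k
    for ch in s:
        h = (h * (33 + 2 * k) + ord(ch)) & 0xFFFFFFFF
    return h

keys = ["abc", "abd", "xyz"]
for k in (0, 1, 2):
    print(k, [parametrized_hash(s, k) & 0xFF for s in keys])

# If a dict detects too many collisions under parameter k, it can rehash
# itself with k+1: reproducible across runs, no random seed required.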

BTW: I attached a demo script to the ticket which demonstrates both
types of collisions using integers.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Jan 23 2012)
 Python/Zope Consulting and Support ...http://www.egenix.com/
 mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
 mxODBC, mxDateTime, mxTextTools ...http://python.egenix.com/


::: Try our new mxODBC.Connect Python Database Interface for free ! 


   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
   Registered at Amtsgericht Duesseldorf: HRB 46611
   http://www.egenix.com/company/contact/


Re: [Python-Dev] Counting collisions for the win

2012-01-23 Thread Janzert

On 1/23/2012 1:25 PM, Glenn Linderman wrote:

On 1/23/2012 12:53 AM, Frank Sievertsen wrote:


What if we use a second (randomized) hash-function in case there
are many collisions in ONE lookup. This hash-function is used only
for collision resolution and is not cached.


So this sounds like SafeDict, but putting it under the covers and
automatically converting from dict to SafeDict after a collision
threshold has been reached.  Let's call it fallback-dict.



If you're going to essentially switch data structures dynamically
anyway, why not just switch to something that doesn't have n**2 worst
case performance?


Janzert



Re: [Python-Dev] Counting collisions for the win

2012-01-22 Thread Victor Stinner
 This seed is chosen randomly at runtime, but cannot
 change once chosen.

The hash is used to compare objects: if hash(obj1) != hash(obj2),
objects are considered different. So two strings must have the same
hash if their value is the same.
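
A small example of this invariant (equal objects must hash equal, or dict
lookups silently miss):

class BadKey:
    def __init__(self, name):
        self.name = name
    def __eq__(self, other):
        return isinstance(other, BadKey) and self.name == other.name
    def __hash__(self):
        return id(self)          # wrong: equal objects get different hashes

d = {BadKey("a"): 1}
print(BadKey("a") in d)          # False: the lookup probes the wrong slots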

 Salt could also be an appropriate term here, but since salt is
 generally changed on a per-use basis (a single process may use many
 different salts), seed is more correct, since this value is only
 chosen once per process.

We may use a different salt per dictionary.

Victor


Re: [Python-Dev] Counting collisions for the win

2012-01-22 Thread Lennart Regebro
On Sun, Jan 22, 2012 at 11:11, Victor Stinner
victor.stin...@haypocalc.com wrote:
 This seed is chosen randomly at runtime, but cannot
 change once chosen.

 The hash is used to compare objects: if hash(obj1) != hash(obj2),
 objects are considered different. So two strings must have the same
 hash if their value is the same.

 Salt could also be an appropriate term here, but since salt is
 generally changed on a per-use basis (a single process may use many
 different salts), seed is more correct, since this value is only
 chosen once per process.

 We may use a different salt per dictionary.

Can we do that? I was thinking of ways to not raise errors when we get
over a collision count, but instead somehow change the way the
dictionary behaves when we get over the collision count, but I
couldn't come up with something. Somehow adding a salt would be one
possibility. But I don't see how it's doable except for the
string-keys only case mentioned before.

But I might just be lacking imagination. :-)

//Lennart


Re: [Python-Dev] Counting collisions for the win

2012-01-22 Thread Antoine Pitrou

I think this thread is approaching the recursion limit.
Be careful not to blow the stack :)

Regards

Antoine.


On Sun, 22 Jan 2012 20:53:41 +0100
Lennart Regebro rege...@gmail.com wrote:
 On Sun, Jan 22, 2012 at 11:11, Victor Stinner
 victor.stin...@haypocalc.com wrote:
  This seed is chosen randomly at runtime, but cannot
  change once chosen.
 
  The hash is used to compare objects: if hash(obj1) != hash(obj2),
  objects are considered different. So two strings must have the same
  hash if their value is the same.
 
  Salt could also be an appropriate term here, but since salt is
  generally changed on a per-use basis (a single process may use many
  different salts), seed is more correct, since this value is only
  chosen once per process.
 
  We may use a different salt per dictionary.
 
 Can we do that? I was thinking of ways to not raise errors when we get
 over a collision count, but instead somehow change the way the
 dictionary behaves when we get over the collision count, but I
 couldn't come up with something. Somehow adding a salt would be one
 possibility. But I don't see how it's doable except for the
 string-keys only case mentioned before.
 
 But I might just be lacking imagination. :-)
 
 //Lennart




Re: [Python-Dev] Counting collisions for the win

2012-01-22 Thread Paul McMillan
 We may use a different salt per dictionary.

If we're willing to re-hash everything on a per-dictionary basis. That
doesn't seem reasonable given our existing usage.


Re: [Python-Dev] Counting collisions for the win

2012-01-22 Thread Lennart Regebro
On Mon, Jan 23, 2012 at 06:02, Paul McMillan p...@mcmillan.ws wrote:
 We may use a different salt per dictionary.

 If we're willing to re-hash everything on a per-dictionary basis. That
 doesn't seem reasonable given our existing usage.

Well, if we get crazy amounts of collisions, re-hashing with a new
salt to get rid of those collisions seems quite reasonable to me...

//Lennart


Re: [Python-Dev] Counting collisions for the win

2012-01-22 Thread Stephen J. Turnbull
Lennart Regebro writes:
  On Mon, Jan 23, 2012 at 06:02, Paul McMillan p...@mcmillan.ws wrote:
   We may use a different salt per dictionary.
  
   If we're willing to re-hash everything on a per-dictionary basis. That
   doesn't seem reasonable given our existing usage.
  
  Well, if we get crazy amounts of collisions, re-hashing with a new
  salt to get rid of those collisions seems quite reasonable to me...

But doesn't the whole idea of a hash table fall flat on its face if
you need to worry about crazy amounts of collisions (outside of
deliberate attacks)?



Re: [Python-Dev] Counting collisions for the win

2012-01-22 Thread Tim Delaney
On 23 January 2012 16:49, Lennart Regebro rege...@gmail.com wrote:

 On Mon, Jan 23, 2012 at 06:02, Paul McMillan p...@mcmillan.ws wrote:
  We may use a different salt per dictionary.
 
  If we're willing to re-hash everything on a per-dictionary basis. That
  doesn't seem reasonable given our existing usage.

 Well, if we get crazy amounts of collisions, re-hashing with a new
 salt to get rid of those collisions seems quite reasonable to me...


Actually, this looks like it has the seed of a solution in it. I haven't
scrutinised the following beyond "it sounds like it could work" - it could
well contain nasty flaws.

Assumption: We only get an excessive number of collisions during an attack
(directly or indirectly).
Assumption: Introducing a salt into hashes will change those hashes
sufficiently to mitigate the attack (all discussion of randomising hashes
makes this assumption).

1. Keep the current hashing (for all dictionaries) i.e. just using
hash(key).

2. Count collisions.

3. If any key hits X collisions, change that dictionary to use a random salt
for hashes (at least for str and unicode keys). This salt would be
remembered for the dictionary (see the sketch after this list).

Consequence: The dictionary would need to be rebuilt when an attack was
detected.
Consequence: Hash caching would no longer occur for this dictionary, making
most operations more expensive.
Consequence: Anything relying on the iteration order of a dictionary which
has suffered excessive conflicts would fail.

4. (Optional) in 3.3, provide a way to get a dictionary with random salt
(i.e. not wait for attack).
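
A rough pure-Python sketch of step 3 above (the collision trigger itself
would live in C; here rebuild_with_salt() has to be called explicitly, and
_SaltedKey is a hypothetical helper, not part of any actual patch):

import random

class _SaltedKey:
    """Wrapper that hashes a str key together with a per-dict salt."""
    __slots__ = ("key", "salt")

    def __init__(self, key, salt):
        self.key, self.salt = key, salt

    def __hash__(self):
        return hash((self.salt, self.key))   # precomputed collisions no longer collide

    def __eq__(self, other):
        return (isinstance(other, _SaltedKey)
                and self.key == other.key and self.salt == other.salt)

class SaltableDict(dict):
    _salt = None

    def _wrap(self, key):
        if self._salt is not None and isinstance(key, str):
            return _SaltedKey(key, self._salt)
        return key

    def rebuild_with_salt(self):
        # Step 3: remember a random salt for this dict and rehash every key.
        items = [(k.key if isinstance(k, _SaltedKey) else k, v)
                 for k, v in self.items()]
        self.clear()
        self._salt = random.getrandbits(64)
        for k, v in items:
            self[k] = v

    def __setitem__(self, key, value):
        super().__setitem__(self._wrap(key), value)

    def __getitem__(self, key):
        return super().__getitem__(self._wrap(key))

d = SaltableDict()
d["spam"] = 1
d.rebuild_with_salt()        # what the C code would do after X collisions
print(d["spam"])             # still 1, but now stored under a salted hash

As the consequences above note, the wrapped keys no longer benefit from the
cached string hash, and the dict's iteration order changes after the rebuild.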

Tim Delaney


Re: [Python-Dev] Counting collisions for the win

2012-01-21 Thread Matthew Woodcraft
Victor Stinner  victor.stin...@haypocalc.com wrote:
 I propose to solve the hash collision vulnerability by counting
 collisions [...]

 We now know all issues of the randomized hash solution, and I
 think that there are more drawbacks than advantages. IMO the
 randomized hash is overkill to fix the hash collision issue.


For web frameworks, forcing an exception is less harmful than forcing a
many-second delay, but I think it's hard to be confident that there
aren't other vulnerable applications where it's the other way round.


Web frameworks like the exception because they already have backstop
exception handlers, and anyway they use short-lived processes and keep
valuable data in databases rather than process memory.

Web frameworks don't like the delay because they allow unauthenticated
users to submit many requests (including multiple requests in parallel),
and they normally expect each response to take little cpu time.


But many programs are not like this.

What about a log analyser or a mailing list archiver or a web crawler or
a game server or some other kind of program we haven't considered?

-M-



Re: [Python-Dev] Counting collisions for the win

2012-01-21 Thread Guido van Rossum
On Sat, Jan 21, 2012 at 7:50 AM, Matthew Woodcraft
matt...@woodcraft.me.uk wrote:
 Victor Stinner  victor.stin...@haypocalc.com wrote:
 I propose to solve the hash collision vulnerability by counting
 collisions [...]

 We now know all issues of the randomized hash solution, and I
 think that there are more drawbacks than advantages. IMO the
 randomized hash is overkill to fix the hash collision issue.


 For web frameworks, forcing an exception is less harmful than forcing a
 many-second delay, but I think it's hard to be confident that there
 aren't other vulnerable applications where it's the other way round.


 Web frameworks like the exception because they already have backstop
 exception handlers, and anyway they use short-lived processes and keep
 valuable data in databases rather than process memory.

 Web frameworks don't like the delay because they allow unauthenticated
 users to submit many requests (including multiple requests in parallel),
 and they normally expect each response to take little cpu time.


 But many programs are not like this.

 What about a log analyser or a mailing list archiver or a web crawler or
 a game server or some other kind of program we haven't considered?

If my log crawler ended up taking minutes per log entry instead of
milliseconds I'd have to kill it anyway. Web crawlers are huge
multi-process systems that are as robust as web servers, or more. Game
servers are just web apps.

-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] Counting collisions for the win

2012-01-21 Thread David Malcolm
On Fri, 2012-01-20 at 16:55 +0100, Frank Sievertsen wrote:
 Hello,
 
 I still see at least two ways to create a DOS attack even with the
 collision-counting-patch.

[snip description of two types of attack on the collision counting
approach]

 What to do now?
 I think it's not smart to reduce the number of allowed collisions
 dramatically AND count all slot-collisions at the same time.

Frank: did you see the new approach I proposed in:
http://bugs.python.org/issue13703#msg151735
http://bugs.python.org/file24289/amortized-probe-counting-dmalcolm-2012-01-21-003.patch

(repurposes the ma_smalltable region of large dictionaries to add
tracking of each such dict's average iterations taken per modification,
and raise an exception when it exceeds a particular ratio)

I'm interested in hearing how it holds up against your various test
cases, or what flaws there are in it.

Thanks!
Dave



Re: [Python-Dev] Counting collisions for the win

2012-01-21 Thread Jared Grubb
On 20 Jan 2012, at 10:49, Brett Cannon wrote:
 Why can't we have our cake and eat it too?
 
 Can we do hash randomization in 3.3 and use the hash count solution for 
 bugfix releases? That way we get a basic fix into the bugfix releases that 
 won't break people's code (hopefully) but we go with a more thorough (and IMO 
 correct) solution of hash randomization starting with 3.3 and moving forward. 
 We aren't breaking compatibility in any way by doing this since it's a 
 feature release anyway where we change tactics. And it can't be that much 
 work since we seem to have patches for both solutions. At worst it will make 
 merging commits for those files affected by the patches, but that will most 
 likely be isolated and not a common collision (and less of any issue once 3.3 
 is released later this year).
 
 I understand the desire to keep backwards-compatibility, but collision 
 counting could cause an error in some random input that someone didn't expect 
 to cause issues whether they were under a DoS attack or just had some 
 unfortunate input from private data. The hash randomization, though, is only 
 weak if someone is attacked, not if they are just using Python with their own 
 private data.

I agree; it sounds really odd to throw an exception since nothing is actually 
wrong and there's nothing the caller would do about it to recover anyway. 
Rather than throwing an exception, maybe you just reseed the random value for 
the hash:
 * this would solve the security issue that someone mentioned about being able 
to deduce the hash because if they keep being mean it'll change anyway
 * for bugfix, start off without randomization (seed==0) and start to use it 
only when the collision count hits the threshold
 * for release, reseeding when you hit a certain threshold still seems like a 
good idea as it will make lookups/insertions better in the long-run

AFAIUI, Python already doesn't guarantee order stability when you insert 
something into a dictionary, as in the worst case the dictionary has to resize 
its hash table, and then the order is freshly jumbled again.

Just my two cents.

Jared


Re: [Python-Dev] Counting collisions for the win

2012-01-21 Thread Paul McMillan
On Sat, Jan 21, 2012 at 4:19 PM, Jared Grubb jared.gr...@gmail.com wrote:
 I agree; it sounds really odd to throw an exception since nothing is actually 
 wrong and there's nothing the caller would do about it to recover anyway. 
 Rather than throwing an exception, maybe you just reseed the random value for 
 the hash

This is nonsense. You have to determine the random seed at startup,
and it has to be uniform for the entire life of the process. You can't
change it after Python has started.

-Paul


Re: [Python-Dev] Counting collisions for the win

2012-01-21 Thread Steven D'Aprano

Paul McMillan wrote:

On Sat, Jan 21, 2012 at 4:19 PM, Jared Grubb jared.gr...@gmail.com wrote:

I agree; it sounds really odd to throw an exception since nothing is actually 
wrong and there's nothing the caller would do about it to recover anyway. 
Rather than throwing an exception, maybe you just reseed the random value for 
the hash


This is nonsense. You have to determine the random seed at startup,
and it has to be uniform for the entire life of the process. You can't
change it after Python has started.


I may have a terminology problem here. I expect that a random seed must change 
every time it is used, otherwise the pseudorandom number generator using it 
just returns the same value each time. Should we be talking about a salt 
rather than a seed?




--
Steven



Re: [Python-Dev] Counting collisions for the win

2012-01-21 Thread Paul McMillan
 I may have a terminology problem here. I expect that a random seed must
 change every time it is used, otherwise the pseudorandom number generator
 using it just returns the same value each time. Should we be talking about a
 salt rather than a seed?

You should read the several other threads, the bug, as well as the
implementation and patch under discussion. Briefly, Python string
hashes are calculated once per string, and then used in many places.
You can't change the hash value for a string during program execution
without breaking everything. The proposed change modifies the starting
value of the hash function to include a process-wide randomly
generated seed. This seed is chosen randomly at runtime, but cannot
change once chosen. Using the seed changes the final output of the
hash to be unpredictable to an attacker, solving the underlying
problem.

Salt could also be an appropriate term here, but since salt is
generally changed on a per-use basis (a single process may use many
different salts), seed is more correct, since this value is only
chosen once per process.
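
A minimal sketch of what that means, using a toy string hash rather than
CPython's real one (the constants and structure here are illustrative only):

import os

_SEED = int.from_bytes(os.urandom(8), "little")   # chosen once, at process start

def seeded_hash(s):
    h = _SEED                          # the seed replaces the fixed start value
    for ch in s:
        h = ((h * 1000003) ^ ord(ch)) & 0xFFFFFFFFFFFFFFFF
    return h

# Within one process, equal strings still hash equal (dicts keep working),
# but an attacker who cannot learn _SEED cannot precompute colliding keys.
print(seeded_hash("collision") == seeded_hash("collision"))   # True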

-Paul


Re: [Python-Dev] Counting collisions for the win

2012-01-20 Thread Martin v. Löwis
 The last solution is very simple: count collisions and raise an
 exception if it hits a limit. The patch is something like 10 lines
 whereas the randomized hash is closer to 500 lines, add a new
 file, change Visual Studio project file, etc. First I thought that it
 would break more applications than the randomized hash

The main issue with that approach is that it allows a new kind of attack.

An attacker now needs to find 1000 colliding keys, and submit them
one-by-one into a database. The limit will not trigger, as those are
just database insertions.

Now, if the application also has a need to read the entire database
table into a dictionary, that will suddenly break, and not for the
attacker (which would be ok), but for the regular user of the
application or the site administrator.

So it may be that this approach actually simplifies the attack, making
the cure worse than the disease.

Regards,
Martin


Re: [Python-Dev] Counting collisions for the win

2012-01-20 Thread Frank Sievertsen

The main issue with that approach is that it allows a new kind of attack.


Indeed, I posted another example: http://bugs.python.org/msg151677

This kind of fix can be used in a specific application or maybe in a
special-purpose framework, but not on the level of a general-purpose
language.

Frank


Re: [Python-Dev] Counting collisions for the win

2012-01-20 Thread Nick Coghlan
On Fri, Jan 20, 2012 at 7:34 PM, Martin v. Löwis mar...@v.loewis.de wrote:
 The main issue with that approach is that it allows a new kind of attack.

 An attacker now needs to find 1000 colliding keys, and submit them
 one-by-one into a database. The limit will not trigger, as those are
 just database insertions.

 Now, if the application also has a need to read the entire database
 table into a dictionary, that will suddenly break, and not for the
 attacker (which would be ok), but for the regular user of the
 application or the site administrator.

 So it may be that this approach actually simplifies the attack, making
 the cure worse than the disease.

Ouch, I think you're right. So hash randomisation may be the best
option, and admins will need to test for themselves to see if it
breaks things...

Regards,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Counting collisions for the win

2012-01-20 Thread Victor Stinner
2012/1/20 Ivan Kozik i...@ludios.org:
 I'd like to point out that an attacker is not limited to sending just
 one dict full of colliding keys.  Given a 22ms stall ...

The presented attack produces a stall of at least 30 seconds (5
minutes or more if there is no time limit in the application), not
0.022 seconds. You have to send a lot of requests to produce a DoS if a
single request just eats 22 ms. I suppose that there are a lot of
other kinds of requests that take much longer than 22 ms, even valid
requests.

 Another issue is that even with a configurable limit, different
 modules can't have their own limits.  One module might want a
 relatively safe raise-at-100, and another module creating massive
 dicts might want raise-at-1000.  How does a developer know whether
 they can raise or lower the limit, given that they use a bunch of
 different modules?

Python becomes really slow when you have more than N collisions
(O(n^2) problem). If an application hits this limit with valid data,
it is time to use another data structure or use a different hash
function. We have to do more tests to choose N correctly, but
according to my first tests, it looks like N=1000 is a safe limit.

Marc-Andre's patch doesn't count all collisions, but only
collisions that require comparing objects. When two objects have the
same hash value, the open addressing algorithm searches for a free bucket.
If a bucket is not free but has a different hash value, the objects
are not compared and the collision counter is not incremented. The
limit is only reached when you have N objects having the same hash
value modulo the size of the hash table (hash(str) & DICT_MASK).

When there are not enough empty buckets (it comes before all buckets
are full), Python resizes the dictionary (it does something like size
= size * 2) and so it uses at least one more bit of the hash each time
the dictionary is resized. Collisions are very likely with a small
dictionary, but become rarer each time the dictionary is resized. This
means that the number of potential collisions (with valid data)
decreases when the dictionary grows. Tell me if I am wrong.
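
A quick way to see this effect with non-malicious data (a sketch using
random 64-bit values in place of real string hashes):

import random

def largest_bucket(hashes, table_size):
    # Group hashes by their masked (low) bits, the way a dict of this size
    # would, and report the worst pile-up.
    mask = table_size - 1
    groups = {}
    for h in hashes:
        groups.setdefault(h & mask, []).append(h)
    return max(len(g) for g in groups.values())

hashes = [random.getrandbits(64) for _ in range(100000)]
for size in (2**8, 2**12, 2**16, 2**20):
    print(size, largest_bucket(hashes, size))

# With valid (random-looking) data the largest group shrinks quickly as the
# table grows, which is why a limit like N=1000 is hard to hit by accident.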


Re: [Python-Dev] Counting collisions for the win

2012-01-20 Thread Victor Stinner
 I'm surprised we haven't seen bug reports about it from users
 of 64-bit Pythons long ago

A Python dictionary only uses the lower bits of a hash value. If your
dictionary has less than 2**32 items, the dictionary order is exactly
the same on 32 and 64 bits system: hash32(str) & mask == hash64(str) &
mask for mask = 2**32-1.


Re: [Python-Dev] Counting collisions for the win

2012-01-20 Thread Frank Sievertsen

No, that's not true.
Whenever a collision happens, other bits are mixed in very fast.

Frank

On 20.01.2012 13:08, Victor Stinner wrote:

I'm surprised we haven't seen bug reports about it from users
of 64-bit Pythons long ago

A Python dictionary only uses the lower bits of a hash value. If your
dictionary has less than 2**32 items, the dictionary order is exactly
the same on 32 and 64 bits system: hash32(str) & mask == hash64(str)
& mask for mask = 2**32-1.


Re: [Python-Dev] Counting collisions for the win

2012-01-20 Thread Victor Stinner
2012/1/20 Frank Sievertsen fr...@sievertsen.de:
 No, that's not true.
 Whenever a collision happens, other bits are mixed in very fast.

Oh, I didn't know that. So the dict order is only the same if there is
no collision.

Victor


Re: [Python-Dev] Counting collisions for the win

2012-01-20 Thread Victor Stinner
 The main issue with that approach is that it allows a new kind of attack.

 An attacker now needs to find 1000 colliding keys, and submit them
 one-by-one into a database. The limit will not trigger, as those are
 just database insertions.

 Now, if the application also has a need to read the entire database
 table into a dictionary, that will suddenly break, and not for the
 attacker (which would be ok), but for the regular user of the
 application or the site administrator.

Oh, good catch. But I would not call it a new kind of attack, it is
just a particular case of the hash collision vulnerability.

Counting collisions doesn't solve this case, but it doesn't make the
situation worse than before. Raising an exception quickly is better
than stalling for minutes, even if I agree that it is not the best
behaviour. The best would be to answer quickly with the expected
result :-) (using a different data structure or a different hash
function?)

Right now, I don't see any counter measure against this case.

Victor


Re: [Python-Dev] Counting collisions for the win

2012-01-20 Thread Barry Warsaw
On Jan 20, 2012, at 01:50 PM, Victor Stinner wrote:

Counting collisions doesn't solve this case, but it doesn't make the
situation worse than before. Raising an exception quickly is better
than stalling for minutes, even if I agree that it is not the best
behaviour.

ISTM that adding the possibility of raising a new exception on dictionary
insertion is *more* backward incompatible than changing dictionary order,
which for a very long time has been known to not be guaranteed.  You're
running some application, you upgrade Python because you apply all security
fixes, and suddenly you're starting to get exceptions in places you can't
really do anything about.  Yet those exceptions are now part of the documented
public API for dictionaries.  This is asking for trouble.  Bugs will suddenly
start appearing in that application's tracker and they will seem to the
application developer like Python just added a new public API in a security
release.

OTOH, if you change dictionary order and *that* breaks the application, then
the bugs submitted to the application's tracker will be legitimate bugs that
have to be fixed even if nothing else changed.

So I still think we should ditch the paranoia about dictionary order changing,
and fix this without counting.  A little bit of paranoia could creep back in
by disabling the hash fix by default in stable releases, but I think it would
be fine to make that a compile-time option.

-Barry


Re: [Python-Dev] Counting collisions for the win

2012-01-20 Thread Barry Warsaw
On Jan 20, 2012, at 03:18 PM, Nick Coghlan wrote:

On Fri, Jan 20, 2012 at 2:54 PM, Carl Meyer c...@oddbird.net wrote:
 I don't have the expertise to speak otherwise to the alternatives for
 fixing the collisions vulnerability, but I don't believe it's accurate
 to presume that Django would not want to fix a dict-ordering dependency,
 and use that as a justification for one approach over another.

It's more a matter of wanting deployment of a security fix to be as
painless as possible - a security fix that system administrators can't
deploy because it breaks critical applications may as well not exist.

True, but collision counting is worse IMO.  It's just as likely (maybe) that
an application would start getting new exceptions on dictionary insertion as
it would failures due to dictionary order changes.  Unfortunately, in the
former case it's because Python just added a new public API in a security
release (the new exception *is* public API).  In the latter case, no new API
was added, but something exposed an already existing bug in the application.
That's still a bug in the application even if counting was added.  It's also a
bug that any number of changes in the environment, or OS vendor deployment,
could have triggered.

-1 for collision counting.

-Barry


Re: [Python-Dev] Counting collisions for the win

2012-01-20 Thread Barry Warsaw
On Jan 20, 2012, at 03:15 PM, Nick Coghlan wrote:

With the 1000 collision limit in place, the attacker sends their
massive request, the affected dict quickly hits the limit, throws an
unhandled exception which is then caught by the web framework and
turned into a 500 Error response (or whatever's appropriate for the
protocol being attacked).

Let's just be clear about it: this exception is new public API.  Changing
dictionary order is not.

For me, that comes down firmly on the side of the latter rather than the
former for stable releases.

-Barry


Re: [Python-Dev] Counting collisions for the win

2012-01-20 Thread Antoine Pitrou
On Fri, 20 Jan 2012 13:50:18 +0100
Victor Stinner victor.stin...@haypocalc.com wrote:

  The main issue with that approach is that it allows a new kind of attack.
 
  An attacker now needs to find 1000 colliding keys, and submit them
  one-by-one into a database. The limit will not trigger, as those are
  just database insertions.
 
  Now, if the application also has a need to read the entire database
  table into a dictionary, that will suddenly break, and not for the
  attacker (which would be ok), but for the regular user of the
  application or the site administrator.
 
 Oh, good catch. But it would not call it a new kind of attack, it is
 just a particular case of the hash collision vulnerability.
 
 Counting collisions doesn't solve this case, but it doesn't make the
 situation worse than before. Raising an exception quickly is better
 than stalling for minutes, even if I agree that it is not the best
 behaviour.

Actually, it *is* worse because stalling for seconds or minutes may not
be a problem in some cases (e.g. some batch script that gets run
overnight).

Regards

Antoine.




Re: [Python-Dev] Counting collisions for the win

2012-01-20 Thread Guido van Rossum
On Fri, Jan 20, 2012 at 1:34 AM, Martin v. Löwis mar...@v.loewis.de wrote:

  The last solution is very simple: count collisions and raise an
  exception if it hits a limit. The patch is something like 10 lines
  whereas the randomized hash is closer to 500 lines, add a new
  file, change Visual Studio project file, etc. First I thought that it
  would break more applications than the randomized hash

 The main issue with that approach is that it allows a new kind of attack.

 An attacker now needs to find 1000 colliding keys, and submit them
 one-by-one into a database. The limit will not trigger, as those are
 just database insertions.

 Now, if the application also has a need to read the entire database
 table into a dictionary, that will suddenly break, and not for the
 attacker (which would be ok), but for the regular user of the
 application or the site administrator.

 So it may be that this approach actually simplifies the attack, making
 the cure worse than the disease.


It would be a pretty lousy app that tried to load the contents of an entire
database into a dict. It seems that this would require much more knowledge
of what the app is trying to do before a successful attack can be mounted.
So I don't think this is worse than the original attack -- I think it
requires much more ingenuity of an attacker. (I'm thinking that the
original attack is trivial once the set of 65000 colliding keys is public
knowledge, which must be only a matter of time.)

-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] Counting collisions for the win

2012-01-20 Thread Frank Sievertsen

Hello,

I still see at least two ways to create a DOS attack even with the
collison-counting-patch.

I assumed that it's possible to send ~500KB of payload to the
application.

1. It's fully deterministic which slots the dict will look up.
Since we don't count slot-collisions, but only hash-value-collisions,
this can be exploited easily by creating strings with hash-values
along the lookup path of an arbitrary (short) string.

So first we pick an arbitrary string. Then calculate which slots it will
visit on the way to the first empty slot. Then we create strings with
hash-values for these slots.

This attack first injects the strings to fill all the slots that the
one short string will want to visit. Then it adds THE SAME string
again and again. Since the entry is already there, nothing will be
added, no additional collisions happen, and no exception is raised.

$ ls -l super.txt
-rw-r--r-- 1 fx5 fx5 52 20. Jan 10:19 super.txt
$ tail -n3 super.txt
FX5
FX5
FX5
$ wc -l super.txt
9 super.txt
$ time python -c 'dict((unicode(l[:-1]), 0) for l in open("super.txt"))'
real    0m52.724s
user    0m51.543s
sys     0m0.028s

2. The second attack exploits the fact that 1000 allowed string
comparisons are still a lot of work.
First I added 999 strings that collide with the one-byte string "a". In
some applications a zero-byte string might work even better. Then I
can add many thousands of "a"s, just like in the first attack.

$ ls -l 1000.txt
-rw-r--r-- 1 fx5 fx5 50 20. Jan 16:15 1000.txt
$ head -n 3 1000.txt
7hLci00
4wVFm10
_rZJU50
$ wc -l 1000.txt
247000 1000.txt
$ tail -n 3 1000.txt
a
a
a
$ time python -c 'dict((unicode(l[:-1]), 0) for l in open("1000.txt"))'
real    0m17.408s
user    0m15.897s
sys     0m0.008s

Of course the first attack is far more efficient. One could argue
that 16 seconds is not enough for an attack. But maybe it's possible
to send 1MB, have zero-byte strings, and since, for example, Django
does 5 lookups per query-string, this will keep it busy for ~80 seconds on
my PC.

What to do now?
I think it's not smart to reduce the number of allowed collisions
dramatically AND count all slot-collisions at the same time.

Frank


Re: [Python-Dev] Counting collisions for the win

2012-01-20 Thread Victor Stinner
 (I'm thinking that the original
 attack is trivial once the set of 65000 colliding keys is public knowledge,
 which must be only a matter of time.)

I have a program able to generate collisions: it takes 1 second to
compute 60,000 colliding strings on a desktop computer. So the
security of the randomized hash is based on the fact that the attacker
cannot compute the secret.

Victor


Re: [Python-Dev] Counting collisions for the win

2012-01-20 Thread Victor Stinner
 So I still think we should ditch the paranoia about dictionary order changing,
 and fix this without counting.

The randomized hash has other issues:

 - its security is based on its secret, whereas it looks to be easy to
compute it (see more details in the issue)
 - my patch only changes hash(str), whereas other developers asked me
to patch also bytes, int and other types

hash(bytes) can be changed. But changing hash(int) may easily leak the
secret. We may use a different secret for each type, but if it is easy
to compute the int hash secret, dictionaries using int keys are still
vulnerable.
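
To see why hash(int) is the delicate case: CPython's integer hash is
essentially the value itself (hash(1) == 1), so a naive randomization
scheme would hand the secret straight to an attacker. A hypothetical
illustration (not any proposed patch):

secret = 0x1234ABCD                   # hypothetical per-process secret

def randomized_int_hash(n):
    # naive scheme: just mix the secret into the integer hash
    return n ^ secret

# Anyone who can observe one hash value recovers the secret immediately:
print(hex(randomized_int_hash(0)))    # prints 0x1234abcd, i.e. the secret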

--

There is no perfect solutions, drawbacks of each solution should be compared.

Victor


Re: [Python-Dev] Counting collisions for the win

2012-01-20 Thread Antoine Pitrou
On Fri, 20 Jan 2012 17:17:24 +0100
Victor Stinner victor.stin...@haypocalc.com wrote:
  So I still think we should ditch the paranoia about dictionary order changing,
  and fix this without counting.
 
 The randomized hash has other issues:
 
  - its security is based on its secret, whereas it looks to be easy to
 compute it (see more details in the issue)

How do you compute the secret? I see two possibilities:

- the application leaks the hash() values: this sounds unlikely since I
  don't see the use case for it;

- the application shows the dict iteration order (e.g. order of HTML
  attributes): then we could add a second per-dictionary secret so that
  the iteration order of a single dict doesn't give any useful
  information about the hash function.

But the bottom line for me is the following:

- randomized hashes eliminate the possibility to use a single exploit
  for all Python-powered applications: for each application, the
  attacker now has to find a way to extract the secret;

- collision counting doesn't eliminate the possibility of generic
  exploits, as Frank Sievertsen has just shown in
  http://mail.python.org/pipermail/python-dev/2012-January/115726.html

Regards

Antoine.




Re: [Python-Dev] Counting collisions for the win

2012-01-20 Thread Lennart Regebro
On Fri, Jan 20, 2012 at 01:48, Victor Stinner
victor.stin...@haypocalc.com wrote:
  - the limit should be configurable: a new function in the sys module
 should be enough. It may be private (or replaced by an environment
 variable?) in stable versions

I'd like to see both. I would like both the programmer and the user
to be able to control what the limit is.

//Lennart


Re: [Python-Dev] Counting collisions for the win

2012-01-20 Thread Guido van Rossum
On Thu, Jan 19, 2012 at 8:06 PM, Ivan Kozik i...@ludios.org wrote:

 No, I wasn't happy with termination.  I wanted to treat it just like a
 JSON decoding error, and send the appropriate response.


So was this attack actually being mounted on your service regularly? I'd
think it would be sufficient to treat it as a MemoryError -- unavoidable,
if it happens things are really bad, and hopefully you'll crash quickly and
some monitor process restarts your service. That's a mechanism that you
should have anyway.


 I actually forgot to mention the main reason I abandoned the
 stop-at-N-collisions approach.  I had a server with a dict that stayed
 in memory, across many requests.  It was being populated with
 identifiers chosen by clients.  I couldn't have my server stay broken
 if this dict filled up with a bunch of colliding keys.  (I don't think
 I could have done another thing either, like nuke the dict or evict
 some keys.)


What would your service do if it ran out of memory?

Maybe one tweak to the collision counting would be that the exception needs
to inherit from BaseException (like MemoryError) so most generic exception
handlers don't actually handle it. (Style note: never use except:, always
use except Exception:.)
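
A tiny illustration of why deriving from BaseException matters here (the
exception name is made up; KeyboardInterrupt and SystemExit behave the
same way):

class TooManyCollisions(BaseException):   # hypothetical name, for illustration
    pass

def insert_under_attack():
    try:
        raise TooManyCollisions("dict under attack")
    except Exception:
        # Generic handlers like this one do NOT catch BaseException
        # subclasses, so the error propagates instead of being swallowed.
        print("swallowed")

try:
    insert_under_attack()
except TooManyCollisions:
    print("escaped the generic handler; only an explicit handler catches it")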

-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] Counting collisions for the win

2012-01-20 Thread Guido van Rossum
On Fri, Jan 20, 2012 at 1:57 AM, Frank Sievertsen py...@sievertsen.de wrote:

  The main issue with that approach is that it allows a new kind of attack.


 Indeed, I posted another example: http://bugs.python.org/msg151677

 This kind of fix can be used in a specific application or maybe in a
 special-purpose framework, but not on the level of a general-purpose
 language.


Right. We are discussing this issue (for weeks now...) because it makes
pretty much any Python app that takes untrusted data vulnerable, especially
web apps, and after extensive analysis we came to the conclusion that
defenses in the framework or in the app are really hard to do, very
disruptive for developers, whereas preventing the attack by a modification
of the dict or hash algorithms would fix it for everybody. And moreover,
the attack would work against pretty much any Python web app using a set of
evil strings computed once (hence encouraging script kiddies to just fire
their fully-automatic weapon at random websites).

The new attacks that are now being considered require analysis of how the
website is implemented, how it uses and stores data, etc. So an attacker
has to sit down and come up with an attack tailored to a specific website.
That can be dealt with on an ad-hoc basis.

-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] Counting collisions for the win

2012-01-20 Thread Guido van Rossum
On Fri, Jan 20, 2012 at 5:10 AM, Barry Warsaw ba...@python.org wrote:

 On Jan 20, 2012, at 01:50 PM, Victor Stinner wrote:

 Counting collisions doesn't solve this case, but it doesn't make the
 situation worse than before. Raising an exception quickly is better
 than stalling for minutes, even if I agree that it is not the best
 behaviour.

 ISTM that adding the possibility of raising a new exception on dictionary
 insertion is *more* backward incompatible than changing dictionary order,
 which for a very long time has been known to not be guaranteed.  You're
 running some application, you upgrade Python because you apply all security
 fixes, and suddenly you're starting to get exceptions in places you can't
 really do anything about.  Yet those exceptions are now part of the
 documented
 public API for dictionaries.  This is asking for trouble.  Bugs will
 suddenly
 start appearing in that application's tracker and they will seem to the
 application developer like Python just added a new public API in a security
 release.


Dict insertion can already raise an exception: MemoryError. I think we
should be safe if the new exception also derives from BaseException. We
should actually seriously consider just raising MemoryException, since
introducing a new built-in exception in a bugfix release is also very
questionable: code explicitly catching or raising it would not work on
previous bugfix releases of the same feature release.

OTOH, if you change dictionary order and *that* breaks the application, then
 the bugs submitted to the application's tracker will be legitimate bugs
 that
 have to be fixed even if nothing else changed.


There are lots of things that are undefined according to the language spec
(and quite possibly known to vary between versions or platforms or
implementations like PyPy or Jython) but which we would never change in a
bugfix release.

So I still think we should ditch the paranoia about dictionary order
 changing,
 and fix this without counting.  A little bit of paranoia could creep back
 in
 by disabling the hash fix by default in stable releases, but I think it
 would
 be fine to make that a compile-time option.


I'm sorry, but I don't want to break a user's app with a bugfix release and
say haha your code was already broken you just didn't know it.

Sure, the dict order already varies across Python implementations, possibly
across 32/64 bits or operating systems. But many organizations (I know a
few :-) have a very large installed software base, created over many years
by many people with varying skills, that is kept working in part by very
carefully keeping the environment as constant as possible. This means that
the target environment is much more predictable than it is for the typical
piece of open source software.

Sure, a good Python developer doesn't write apps or tests that depend on
dict order. But time and again we see that not everybody writes perfect
code every time. Especially users writing in-house apps (as opposed to
frameworks shared as open source) are less likely to always use the most
robust, portable algorithms in existence, because they may know with much
more certainty that their code will never be used on certain combinations
of platforms. For example, I rarely think  about whether code I write might
not work on IronPython or Jython, or even CPython on Windows. And if
something I wrote suddenly needs to be ported to one of those, well, that's
considered a port and I'll just accept that it might mean changing a few
things.

The time to break a dependency on dict order is not with a bugfix release
but with a feature release: those are more likely to break other things as
well anyway, and users are well aware that they have to test everything and
anticipate having to fix some fraction of their code for each feature
release. OTOH we have established a long and successful track record of
conservative bugfix releases that don't break anything. (I am aware of
exactly one thing that was broken by a bugfix release in application code I
am familiar with.)

-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] Counting collisions for the win

2012-01-20 Thread Guido van Rossum
On Fri, Jan 20, 2012 at 5:20 AM, Barry Warsaw ba...@python.org wrote:

 Let's just be clear about it: this exception is new public API.  Changing
 dictionary order is not.


Not if you raise MemoryError or BaseException.

-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] Counting collisions for the win

2012-01-20 Thread Guido van Rossum
This is the first objection I have seen against collision-counting that
might stand.

On Fri, Jan 20, 2012 at 7:55 AM, Frank Sievertsen py...@sievertsen.de wrote:

 Hello,

 I still see at least two ways to create a DOS attack even with the
 collision-counting patch.

 I assumed that it's possible to send ~500KB of payload to the
 application.

 1. It's fully deterministic which slots the dict will lookup.
 Since we don't count slot-collisions, but only hash-value-collisions
 this can be exploited easily by creating strings with the hash-values
 along the lookup-way of an arbitrary (short) string.

 So first we pick an arbitrary string. Then calculate which slots it will
 visit on the way to the first empty slot. Then we create strings with
 hash-values for these slots.

 This attack first injects the strings to fill all the slots that the
 one short string will want to visit. Then it adds THE SAME string
 again and again. Since the entry is already there, nothing will be added
 and no additional collisions happen, no exception raised.

 $ ls -l super.txt
 -rw-r--r-- 1 fx5 fx5 52 20. Jan 10:19 super.txt
 $ tail -n3 super.txt
 FX5
 FX5
 FX5
 $ wc -l super.txt
 9 super.txt
 $ time python -c 'dict((unicode(l[:-1]), 0) for l in open("super.txt"))'
 real    0m52.724s
 user    0m51.543s
 sys     0m0.028s
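
For readers unfamiliar with the dict internals being discussed, here is a
rough sketch of the probe order, adapted from the comments in CPython's
Objects/dictobject.c. It only illustrates how the slots visited by a lookup
can be predicted once the hash values are known; it is not Frank's attack
code, and the table size, example key, and unsigned-hash masking are
illustrative simplifications.

    # Sketch of CPython's open-addressing probe order (see Objects/dictobject.c):
    # j = 5*j + perturb + 1; perturb >>= PERTURB_SHIFT; next slot is j & mask.
    # Knowing a key's hash therefore tells you exactly which slots a lookup of
    # that key will visit, which is what the attack pre-fills.
    PERTURB_SHIFT = 5

    def probe_sequence(h, table_size, count):
        mask = table_size - 1            # table sizes are powers of two
        h &= (1 << 64) - 1               # treat the hash as unsigned (64-bit assumed)
        j = h & mask
        perturb = h
        slots = [j]
        for _ in range(count - 1):
            j = 5 * j + perturb + 1
            perturb >>= PERTURB_SHIFT
            slots.append(j & mask)
        return slots

    # First ten slots probed for an arbitrary short string in a 32-slot table:
    print(probe_sequence(hash("FX5"), 32, 10))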

 2. The second attack actually attacks that 1000 allowed string
 comparisons are still a lot of work.
 First I added 999 strings that collide with a one-byte string "a". In
 some applications a zero-byte string might work even better. Then I
 can add many thousands of the "a"s, just like the first attack.

 $ ls -l 1000.txt
 -rw-r--r-- 1 fx5 fx5 50 20. Jan 16:15 1000.txt
 $ head -n 3 1000.txt
 7hLci00
 4wVFm10
 _rZJU50
 $ wc -l 1000.txt
 247000 1000.txt
 $ tail -n 3 1000.txt
 a
 a
 a
 $ time python -c 'dict((unicode(l[:-1]), 0) for l in open("1000.txt"))'
 real    0m17.408s
 user    0m15.897s
 sys     0m0.008s
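
A toy model of the cost being described here, using a class with a fixed
__hash__ in place of real colliding strings: the repeated key is already
present, so no new entry is ever created (and nothing trips an insert-time
counter), yet every duplicate insert still compares against a long chain of
equal-hash entries. The numbers and names are illustrative only.

    # Toy cost model for the second attack: 999 keys sharing one hash value,
    # then the same key inserted over and over.
    import time

    class Collider(object):
        def __init__(self, name):
            self.name = name
        def __hash__(self):
            return 42                    # every Collider lands in the same probe chain
        def __eq__(self, other):
            return isinstance(other, Collider) and self.name == other.name

    d = {}
    for i in range(999):
        d[Collider("key%d" % i)] = 0
    d[Collider("a")] = 0                 # the key that will be re-inserted

    start = time.time()
    for _ in range(10000):
        d[Collider("a")] = 0             # duplicate insert: walks the chain, adds nothing
    print("10000 duplicate inserts: %.2f seconds" % (time.time() - start))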

 Of course the first attack is far more efficient. One could argue
 that 16 seconds is not enough for an attack. But maybe it's possible
 to send 1MB, have zero-byte strings, and since, for example, django
 does 5 lookups per query-string, this will keep it busy for ~80 seconds on
 my pc.

 What to do now?
 I think it's not smart to reduce the number of allowed collisions
 dramatically
 AND count all slot-collisions at the same time.

 Frank





-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] Counting collisions for the win

2012-01-20 Thread Georg Brandl
On 20.01.2012 19:15, Guido van Rossum wrote:

 OTOH, if you change dictionary order and *that* breaks the application, 
 then
 the bugs submitted to the application's tracker will be legitimate bugs 
 that
 have to be fixed even if nothing else changed.
 
 
 There are lots of things that are undefined according to the language spec 
 (and
 quite possibly known to vary between versions or platforms or implementations
 like PyPy or Jython) but which we would never change in a bugfix release.
 
 So I still think we should ditch the paranoia about dictionary order 
 changing,
 and fix this without counting.  A little bit of paranoia could creep back 
 in
 by disabling the hash fix by default in stable releases, but I think it 
 would
 be fine to make that a compile-time option.
 
 
 I'm sorry, but I don't want to break a user's app with a bugfix release and 
 say
 haha your code was already broken you just didn't know it.
 
 Sure, the dict order already varies across Python implementations, possibly
 across 32/64 bits or operating systems. But many organizations (I know a few 
 :-)
 have a very large installed software base, created over many years by many
 people with varying skills, that is kept working in part by very carefully
 keeping the environment as constant as possible. This means that the target
 environment is much more predictable than it is for the typical piece of open
 source software.

I agree.  This applies to 3.2 and 2.7, but even more to 3.1 and 2.6, which are
in security-fix mode.

Even if relying on dict order is a bug right now, I believe it happens many
times more often in code bases out there than dicts that are filled with many
many colliding keys.  So even if we can honestly blame the programmer in the
former case, the users applying the security fix will have the same bad
experience and won't likely care if we claim undefined behavior.  This means
that it seems preferable to go with the situation where you have fewer breakages
in total.

Not to mention that changing dict order is likely to lead to much more subtle
bugs than a straight MemoryError on a dict insert.

Also, another advantage of collision counting I haven't seen in the discussion
yet is that it's quite trivial to provide an API in sys to turn it on or off --
while turning on or off randomized hashes has to be done before Python starts
up, i.e. at build time or with an environment variable or flag.
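
The sys functions below are purely hypothetical -- nothing like them exists in
any Python release -- and are shown only to sketch the kind of runtime switch
being described, guarded so the snippet still runs today.

    import sys

    # Hypothetical API, shown only to illustrate the idea of a runtime toggle.
    if hasattr(sys, "setcollisionlimit"):
        sys.setcollisionlimit(1000)      # enable counting, trigger at 1000 collisions
        # sys.setcollisionlimit(0)       # or disable the check entirely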

Georg



Re: [Python-Dev] Counting collisions for the win

2012-01-20 Thread Brett Cannon
On Fri, Jan 20, 2012 at 13:15, Guido van Rossum gu...@python.org wrote:

 On Fri, Jan 20, 2012 at 5:10 AM, Barry Warsaw ba...@python.org wrote:

 On Jan 20, 2012, at 01:50 PM, Victor Stinner wrote:

 Counting collision doesn't solve this case, but it doesn't make the
 situation worse than before. Raising quickly an exception is better
 than stalling for minutes, even if I agree than it is not the best
 behaviour.

 ISTM that adding the possibility of raising a new exception on dictionary
 insertion is *more* backward incompatible than changing dictionary order,
 which for a very long time has been known to not be guaranteed.  You're
 running some application, you upgrade Python because you apply all
 security
 fixes, and suddenly you're starting to get exceptions in places you can't
 really do anything about.  Yet those exceptions are now part of the
 documented
 public API for dictionaries.  This is asking for trouble.  Bugs will
 suddenly
 start appearing in that application's tracker and they will seem to the
 application developer like Python just added a new public API in a
 security
 release.


 Dict insertion can already raise an exception: MemoryError. I think we
 should be safe if the new exception also derives from BaseException. We
 should actually eriously consider just raising MemoryException, since
 introducing a new built-in exception in a bugfix release is also very
 questionable: code explicitly catching or raising it would not work on
 previous bugfix releases of the same feature release.

 OTOH, if you change dictionary order and *that* breaks the application,
 then
 the bugs submitted to the application's tracker will be legitimate bugs
 that
 have to be fixed even if nothing else changed.


 There are lots of things that are undefined according to the language spec
 (and quite possibly known to vary between versions or platforms or
 implementations like PyPy or Jython) but which we would never change in a
 bugfix release.

 So I still think we should ditch the paranoia about dictionary order
 changing,
 and fix this without counting.  A little bit of paranoia could creep back
 in
 by disabling the hash fix by default in stable releases, but I think it
 would
 be fine to make that a compile-time option.


 I'm sorry, but I don't want to break a user's app with a bugfix release
 and say haha your code was already broken you just didn't know it.

 Sure, the dict order already varies across Python implementations,
 possibly across 32/64 bits or operating systems. But many organizations (I
 know a few :-) have a very large installed software base, created over many
 years by many people with varying skills, that is kept working in part by
 very carefully keeping the environment as constant as possible. This means
 that the target environment is much more predictable than it is for the
 typical piece of open source software.

 Sure, a good Python developer doesn't write apps or tests that depend on
 dict order. But time and again we see that not everybody writes perfect
 code every time. Especially users writing in-house apps (as opposed to
 frameworks shared as open source) are less likely to always use the most
 robust, portable algorithms in existence, because they may know with much
 more certainty that their code will never be used on certain combinations
 of platforms. For example, I rarely think  about whether code I write might
 not work on IronPython or Jython, or even CPython on Windows. And if
 something I wrote suddenly needs to be ported to one of those, well, that's
 considered a port and I'll just accept that it might mean changing a few
 things.

 The time to break a dependency on dict order is not with a bugfix release
 but with a feature release: those are more likely to break other things as
 well anyway, and uses are well aware that they have to test everything and
 anticipate having to fix some fraction of their code for each feature
 release. OTOH we have established a long and successful track record of
 conservative bugfix releases that don't break anything. (I am aware of
 exactly one thing that was broken by a bugfix release in application code I
 am familiar with.)


Why can't we have our cake and eat it too?

Can we do hash randomization in 3.3 and use the hash count solution for
bugfix releases? That way we get a basic fix into the bugfix releases that
won't break people's code (hopefully) but we go with a more thorough (and
IMO correct) solution of hash randomization starting with 3.3 and moving
forward. We aren't breaking compatibility in any way by doing this since
it's a feature release anyway where we change tactics. And it can't be that
much work since we seem to have patches for both solutions. At worst it
will complicate merging commits for those files affected by the patches, but that
will most likely be isolated and not a common collision (and less of an
issue once 3.3 is released later this year).

I understand the desire to keep backwards-compatibility, 

Re: [Python-Dev] Counting collisions for the win

2012-01-20 Thread Terry Reedy

On 1/20/2012 11:17 AM, Victor Stinner wrote:


There is no perfect solutions, drawbacks of each solution should be compared.


Amen.

One possible attack that has been described for a collision counting 
dict depends on knowing precisely the trigger point. So let 
MAXCOLLISIONS either be configurable or just choose a random count 
between M and N, say 700 and 999.


It would not hurt to have alternate patches available in case a 
particular Python-powered site comes under prolonged attack. Though 
given our minuscule share of the market, that is much less likely than 
an attack on a PHP- or MS-powered site.


--
Terry Jan Reedy



Re: [Python-Dev] Counting collisions for the win

2012-01-20 Thread Donald Stufft
Even if a MemoryException is raised I believe that is still a fundamental 
change in the documented contract of the dictionary API. I don't believe there is a 
way to fix this without breaking someone's application. The major difference I 
see between the two solutions is that counting will break people's applications 
who are otherwise following the documented API contract of dictionaries, and 
randomization will break people's applications who are violating the documented 
API contract of dictionaries. 

Personally I feel that the lesser of two evils is to reward those who followed 
the documentation, and not reward those who didn't.

So +1 for Randomization as the only option in 3.3, and off by default with a 
flag or environment variable in bug fixes. I think it's the only way to proceed 
that won't hurt people who have followed the documented behavior. 


On Friday, January 20, 2012 at 1:49 PM, Brett Cannon wrote:

 
 
 On Fri, Jan 20, 2012 at 13:15, Guido van Rossum gu...@python.org 
 (mailto:gu...@python.org) wrote:
  On Fri, Jan 20, 2012 at 5:10 AM, Barry Warsaw ba...@python.org 
  (mailto:ba...@python.org) wrote:
   On Jan 20, 2012, at 01:50 PM, Victor Stinner wrote:
   
   Counting collision doesn't solve this case, but it doesn't make the
   situation worse than before. Raising quickly an exception is better
   than stalling for minutes, even if I agree than it is not the best
   behaviour.
   
   ISTM that adding the possibility of raising a new exception on dictionary
   insertion is *more* backward incompatible than changing dictionary order,
   which for a very long time has been known to not be guaranteed.  You're
   running some application, you upgrade Python because you apply all 
   security
   fixes, and suddenly you're starting to get exceptions in places you can't
   really do anything about.  Yet those exceptions are now part of the 
   documented
   public API for dictionaries.  This is asking for trouble.  Bugs will 
   suddenly
   start appearing in that application's tracker and they will seem to the
   application developer like Python just added a new public API in a 
   security
   release.
  
  Dict insertion can already raise an exception: MemoryError. I think we 
  should be safe if the new exception also derives from BaseException. We 
  should actually eriously consider just raising MemoryException, since 
  introducing a new built-in exception in a bugfix release is also very 
  questionable: code explicitly catching or raising it would not work on 
  previous bugfix releases of the same feature release.
  
   OTOH, if you change dictionary order and *that* breaks the application, 
   then
   the bugs submitted to the application's tracker will be legitimate bugs 
   that
   have to be fixed even if nothing else changed.
  
  There are lots of things that are undefined according to the language spec 
  (and quite possibly known to vary between versions or platforms or 
  implementations like PyPy or Jython) but which we would never change in a 
  bugfix release.
  
   So I still think we should ditch the paranoia about dictionary order 
   changing,
   and fix this without counting.  A little bit of paranoia could creep back 
   in
   by disabling the hash fix by default in stable releases, but I think it 
   would
   be fine to make that a compile-time option.
  
  I'm sorry, but I don't want to break a user's app with a bugfix release and 
  say haha your code was already broken you just didn't know it.
  
  Sure, the dict order already varies across Python implementations, possibly 
  across 32/64 bits or operating systems. But many organizations (I know a 
  few :-) have a very large installed software base, created over many years 
  by many people with varying skills, that is kept working in part by very 
  carefully keeping the environment as constant as possible. This means that 
  the target environment is much more predictable than it is for the typical 
  piece of open source software.
  
  Sure, a good Python developer doesn't write apps or tests that depend on 
  dict order. But time and again we see that not everybody writes perfect 
  code every time. Especially users writing in-house apps (as opposed to 
  frameworks shared as open source) are less likely to always use the most 
  robust, portable algorithms in existence, because they may know with much 
  more certainty that their code will never be used on certain combinations 
  of platforms. For example, I rarely think  about whether code I write might 
  not work on IronPython or Jython, or even CPython on Windows. And if 
  something I wrote suddenly needs to be ported to one of those, well, that's 
  considered a port and I'll just accept that it might mean changing a few 
  things.
  
  The time to break a dependency on dict order is not with a bugfix release 
  but with a feature release: those are more likely to break other things as 
  well anyway, and uses are well aware that they have to test 

Re: [Python-Dev] Counting collisions for the win

2012-01-20 Thread Case Van Horsen
On Fri, Jan 20, 2012 at 8:17 AM, Victor Stinner
victor.stin...@haypocalc.com wrote:
 So I still think we should ditch the paranoia about dictionary order 
 changing,
 and fix this without counting.

 The randomized hash has other issues:

  - its security is based on its secret, whereas it looks to be easy to
 compute it (see more details in the issue)
  - my patch only changes hash(str), whereas other developers asked me
 to patch also bytes, int and other types

Changing hash(int) on a bugfix release will cause issues with
extensions (gmpy, sage, probably others) that calculate the hash of
numerical objects.


 hash(bytes) can be changed. But changing hash(int) may leak easily the
 secret. We may use a different secret for each type, but if it is easy
 to compute int hash secret, dictionaries using int are still
 vulnerable.

 --

 There is no perfect solutions, drawbacks of each solution should be compared.

 Victor


Re: [Python-Dev] Counting collisions for the win

2012-01-20 Thread Terry Reedy

On 1/20/2012 10:55 AM, Frank Sievertsen wrote:

Hello,

I still see at least two ways to create a DOS attack even with the
collison-counting-patch.



2. The second attack actually attacks that 1000 allowed string
comparisons are still a lot of work.
First I added 999 strings that collide with a one-byte string a. In
some applications a zero-byte string might work even better. Then I
can add a many thousand of the a's, just like the first attack.


If 1000 were replaced by, for instance, random.randint(700,1000) the 
dict could not be set to have an exception triggered with one other 
entry (which I believe was Martin's idea). But I suppose you would say 
that 699 entries would still make for much work.


The obvious defense for this particular attack is to reject duplicate 
keys. Perhaps there should be write-once string sets and dicts available.
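
A rough sketch of that write-once idea (not a proposed API): a dict that
refuses duplicate keys makes the repeat-the-same-key trick fail on the first
repetition instead of being ground through thousands of times.

    class WriteOnceDict(dict):
        """Sketch only: reject any attempt to store a key that already exists."""
        def __setitem__(self, key, value):
            if key in self:
                raise ValueError("duplicate key: %r" % (key,))
            dict.__setitem__(self, key, value)

    d = WriteOnceDict()
    d["a"] = 1
    try:
        d["a"] = 2
    except ValueError as exc:
        print(exc)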


This gets to the point that there is no best blind defense to all 
possible attacks.


--
Terry Jan Reedy



Re: [Python-Dev] Counting collisions for the win

2012-01-20 Thread Tres Seaver

On 01/20/2012 02:04 PM, Donald Stufft wrote:

 Even if a MemoryException is raised I believe that is still a 
 fundamental change in the documented contract of dictionary API.

How so?  Dictionary inserts can *already* raise that error.

 I don't believe there is a way to fix this without breaking someones 
 application. The major differences I see between the two solutions is
  that counting will break people's applications who are otherwise 
 following the documented api contract of dictionaries,

Do you have a case in mind where legitimate user data (not crafted as
part of a DoS attack) would trip the 1000-collision limit?  How likely is
it that such cases exist in already-deployed applications, compared to
the known breakage in existing applications due to hash randomization?

 and randomization will break people's applications who are violating 
 the documented api contract of dictionaries.
 
 Personally I feel that the lesser of two evils is to reward those who
  followed the documentation, and not reward those who didn't.

Except that I think your set is purely hypothetical, while the second set
is *lots* of deployed applications.


Tres.
- -- 
===
Tres Seaver  +1 540-429-0999  tsea...@palladion.com
Palladion Software   Excellence by Designhttp://palladion.com



Re: [Python-Dev] Counting collisions for the win

2012-01-20 Thread Donald Stufft


On Friday, January 20, 2012 at 2:36 PM, Tres Seaver wrote:

 
 On 01/20/2012 02:04 PM, Donald Stufft wrote:
 
  Even if a MemoryException is raised I believe that is still a 
  fundamental change in the documented contract of dictionary API.
  
 
 
 How so? Dictionary inserts can *already* raise that error.
Because it's raising it for a fundamentally different thing. You have plenty 
of memory, but we decided to add an arbitrary limit that has nothing to do with 
memory and pretend you are out of memory anyways.
 
  I don't believe there is a way to fix this without breaking someones 
  application. The major differences I see between the two solutions is
  that counting will break people's applications who are otherwise 
  following the documented api contract of dictionaries,
  
 
 
 Do you have a case in mind where legitimate user data (not crafted as
 part of a DoS attack) would trip the 1000-collision limit? How likely is
 it that such cases exist in already-deployed applications, compared to
 the known breakage in existing applications due to hash randomization?
 
 

I don't, but as there's never been a limit on how many collisions a dictionary 
can have, this would be a fundamental change in the documented (and 
undocumented) abilities of a dictionary. Dictionary key order has never been 
guaranteed, is documented to not be relied on, and already changes depending on 
whether you are using 32-bit, 64-bit, Jython, PyPy, etc., or, as someone else pointed out, due to 
any number of possible improvements to dict. The counting solution violates the 
existing contract in order to serve people who themselves are violating the 
contract. Even with their violation the method that I +1'd still serves to not 
break existing applications by default.
 
  and randomization will break people's applications who are violating 
  the documented api contract of dictionaries.
  
  Personally I feel that the lesser of two evils is to reward those who
  followed the documentation, and not reward those who didn't.
  
 
 
 Except that I think your set is purely hypothetical, while the second set
 is *lots* of deployed applications.
 
 

Which is why I believe that it should be off by default on the bugfix, but 
easily enabled. (Flag, env var, whatever). That allows people to upgrade to a 
bugfix without breaking their application, and if this vulnerability affects 
them, they can enable it.

I think collision counting is at best a bandaid and not a proper fix, 
stemming from a desire not to break existing applications in a bugfix release, 
a problem better solved by implementing the real fix and allowing people to 
control (only on the bugfix; on 3.3+ it should be forced on always) whether they 
have it enabled or not.
 
 
 Tres.
 - -- 
 ===
 Tres Seaver +1 540-429-0999 tsea...@palladion.com 
 (mailto:tsea...@palladion.com)
 Palladion Software Excellence by Design http://palladion.com
 


Re: [Python-Dev] Counting collisions for the win

2012-01-20 Thread Ethan Furman

Donald Stufft wrote:
Even if a MemoryException is raised I believe that is still a 
fundamental change in the documented contract of dictionary API. I don't 
believe there is a way to fix this without breaking someones 
application. The major differences I see between the two solutions is 
that counting will break people's applications who are otherwise 
following the documented api contract of dictionaries, and randomization 
will break people's applications who are violating the documented api 
contract of dictionaries. 

Personally I feel that the lesser of two evils is to reward those who 
followed the documentation, and not reward those who didn't.


So +1 for Randomization as the only option in 3.3, and off by default 
with a flag or environment variable in bug fixes. I think it's the only 
way to proceed that won't hurt people who have followed the documented 
behavior.


+1

~Ethan~


Re: [Python-Dev] Counting collisions for the win

2012-01-20 Thread Guido van Rossum
On Fri, Jan 20, 2012 at 11:51 AM, Donald Stufft donald.stu...@gmail.com wrote:

  On Friday, January 20, 2012 at 2:36 PM, Tres Seaver wrote:

 On 01/20/2012 02:04 PM, Donald Stufft wrote:

 Even if a MemoryException is raised I believe that is still a
 fundamental change in the documented contract of dictionary API.

 How so? Dictionary inserts can *already* raise that error.

 Because it's raising it for a fundamentally different thing. You have
 plenty of memory, but we decided to add an arbitrary limit that has nothing
 to do with memory and pretend you are out of memory anyways.


Actually due to fragmentation that can already happen.

-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] Counting collisions for the win

2012-01-20 Thread Terry Reedy

On 1/20/2012 2:51 PM, Donald Stufft wrote:


I think the counting collision is at best a bandaid and not a proper fix
stemmed from a desire to not break existing applications on a bugfix
release ...


My opinion of counting is better than yours, but even conceding the 
theoretical, purity argument, our release process is practical as well. 
There have been a few occasions when fixes to bugs in our code have been 
delayed from a bugfix release to the next feature release -- because the 
fix would break too much code depending on the bug.


Some years ago there was a proposal that we should deliberately tweak 
hash() to break 'buggy' code that depended on it not changing. This 
never happened. So it has been left de facto constant, to the extent it 
is, for some years.


--
Terry Jan Reedy



Re: [Python-Dev] Counting collisions for the win

2012-01-20 Thread Ben Wolfson
On Fri, Jan 20, 2012 at 2:11 PM, Terry Reedy tjre...@udel.edu wrote:
 On 1/20/2012 2:51 PM, Donald Stufft wrote:

 I think the counting collision is at best a bandaid and not a proper fix
 stemmed from a desire to not break existing applications on a bugfix
 release ...

 My opinion of counting is better than yours, but even conceding the
 theoretical, purity argument, our release process is practical as well.
 There have been a few occasions when fixes to bugs in our code have been
 delayed from a bugfix release to the next feature release -- because the fix
 would break too much code depending on the bug.

AFAICT Brett's suggestion (which had occurred to me as well, but I'm
not a core developer by any stretch) seemed to get lost in the debate:
would it be possible to go with collision counting for bugfix releases
and hash randomization for new feature releases? (Brett made it here:
http://mail.python.org/pipermail/python-dev/2012-January/115740.html.)

-- 
Ben Wolfson
Human kind has used its intelligence to vary the flavour of drinks,
which may be sweet, aromatic, fermented or spirit-based. ... Family
and social life also offer numerous other occasions to consume drinks
for pleasure. [Larousse, Drink entry]


Re: [Python-Dev] Counting collisions for the win

2012-01-20 Thread Donald Stufft
I believe that either solution has the potential to break existing applications, 
so to ensure that no applications are broken there will need to be a method of 
disabling it. I also believe that, to maintain the backwards compatibility that 
Python has traditionally had in bugfix releases, either solution will need 
to default to off. 

Given those 2 things that I believe, I don't think that the argument should be 
which solution will break less, because I believe both will need to be off by 
default, but which solution more adequately solves the underlying problem. 


On Friday, January 20, 2012 at 5:11 PM, Terry Reedy wrote:

 On 1/20/2012 2:51 PM, Donald Stufft wrote:
 
  I think the counting collision is at best a bandaid and not a proper fix
  stemmed from a desire to not break existing applications on a bugfix
  release ...
  
 
 
 My opinion of counting is better than yours, but even conceding the 
 theoretical, purity argument, our release process is practical as well. 
 There have been a few occasions when fixes to bugs in our code have been 
 delayed from a bugfix release to the next feature release -- because the 
 fix would break too much code depending on the bug.
 
 Some years ago there was a proposal that we should deliberately tweak 
 hash() to break 'buggy' code that depended on it not changing. This 
 never happened. So it has been left de facto constant, to the extent it 
 is, for some years.
 
 -- 
 Terry Jan Reedy
 


Re: [Python-Dev] Counting collisions for the win

2012-01-20 Thread Frank Sievertsen

On 20.01.2012 16:33, Guido van Rossum wrote:
(I'm thinking that the original attack is trivial once the set of 
65000 colliding keys is public knowledge, which must be only a matter 
of time.



I think it's very likely that this will happen soon.

For ASP and PHP, attack payloads are publicly available.
PHP and ASP have patches to limit the number of query variables.

We're very lucky that there's no public payload for python yet,
and all non-public software and payload I'm aware of is based
upon my software.

But this can change at any moment. It's not really difficult to
write software to create 32-bit collisions.

Frank


Re: [Python-Dev] Counting collisions for the win

2012-01-20 Thread Guido van Rossum
On Fri, Jan 20, 2012 at 2:33 PM, Ben Wolfson wolf...@gmail.com wrote:
 On Fri, Jan 20, 2012 at 2:11 PM, Terry Reedy tjre...@udel.edu wrote:
 On 1/20/2012 2:51 PM, Donald Stufft wrote:

 I think the counting collision is at best a bandaid and not a proper fix
 stemmed from a desire to not break existing applications on a bugfix
 release ...

 My opinion of counting is better than yours, but even conceding the
 theoretical, purity argument, our release process is practical as well.
 There have been a few occasions when fixes to bugs in our code have been
 delayed from a bugfix release to the next feature release -- because the fix
 would break too much code depending on the bug.

 AFAICT Brett's suggestion (which had occurred to me as well, but I'm
 not a core developer by any stretch) seemed to get lost in the debate:
 would it be possible to go with collision counting for bugfix releases
 and hash randomization for new feature releases? (Brett made it here:
 http://mail.python.org/pipermail/python-dev/2012-January/115740.html.)

I made it earlier.

-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] Counting collisions for the win

2012-01-20 Thread Guido van Rossum
On Fri, Jan 20, 2012 at 2:35 PM, Frank Sievertsen py...@sievertsen.de wrote:
 Am 20.01.2012 16:33, schrieb Guido van Rossum:

 (I'm thinking that the original attack is trivial once the set of 65000
 colliding keys is public knowledge, which must be only a matter of time.



 I think it's very likely that this will happen soon.

 For ASP and PHP there is attack-payload publicly available.
 PHP and ASP have patches to limit the number of query-variables.

 We're very lucky that there's no public payload for python yet,
 and all non-public software and payload I'm aware of is based
 upon my software.

 But this can change any moment. It's not really difficult to
 write software to create 32bit-collisions.

While we're debating the best fix, could we allow people to at least
protect themselves against script-kiddies by offering fixes to cgi.py,
django, webob and a few other popular frameworks that limits forms to
1000 keys? (I suppose it's really only POST requests that are
vulnerable to script kiddies, because of the length restriction on
URLs.)
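
A sketch of the kind of framework-level guard being suggested, independent of
any particular framework; the 1000-field cap and the function name are
arbitrary choices for illustration. Counting separators in the raw urlencoded
body rejects an oversized request before a single key is hashed into a dict.

    MAX_FORM_FIELDS = 1000               # illustrative limit, mirroring the suggestion above

    def check_field_count(body):
        # An application/x-www-form-urlencoded body has one field per
        # '&'-separated chunk, so counting separators is a cheap pre-check.
        if body.count("&") + 1 > MAX_FORM_FIELDS:
            raise ValueError("too many form fields")

    check_field_count("a=1&b=2&c=3")         # passes
    # check_field_count("x=1&" * 100000)     # would raise ValueError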

-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] Counting collisions for the win

2012-01-20 Thread Steven D'Aprano

Guido van Rossum wrote:

On Fri, Jan 20, 2012 at 11:51 AM, Donald Stufft donald.stu...@gmail.com wrote:


 On Friday, January 20, 2012 at 2:36 PM, Tres Seaver wrote:

On 01/20/2012 02:04 PM, Donald Stufft wrote:

Even if a MemoryException is raised I believe that is still a
fundamental change in the documented contract of dictionary API.

How so? Dictionary inserts can *already* raise that error.

Because it's raising it for a fundamentally different thing. You have
plenty of memory, but we decided to add an arbitrary limit that has nothing
to do with memory and pretend you are out of memory anyways.



Actually due to fragmentation that can already happen.


Whether you have run out of total memory, or a single contiguous block, it is 
still a memory error.


An arbitrary limit on collisions is not a memory error. If we were designing 
this API from scratch, would anyone propose using MemoryError for "you have 
reached a limit on collisions"? It has nothing to do with memory. Using 
MemoryError for something which isn't a memory error is ugly.


How about RuntimeError?



--
Steven



Re: [Python-Dev] Counting collisions for the win

2012-01-20 Thread Guido van Rossum
It should derive from BaseException.

--Guido van Rossum (sent from Android phone)
On Jan 20, 2012 4:59 PM, Steven D'Aprano st...@pearwood.info wrote:

 Guido van Rossum wrote:

 On Fri, Jan 20, 2012 at 11:51 AM, Donald Stufft donald.stu...@gmail.com wrote:

   On Friday, January 20, 2012 at 2:36 PM, Tres Seaver wrote:

 On 01/20/2012 02:04 PM, Donald Stufft wrote:

 Even if a MemoryException is raised I believe that is still a
 fundamental change in the documented contract of dictionary API.

 How so? Dictionary inserts can *already* raise that error.

 Because it's raising it for a fundamentally different thing. You have
 plenty of memory, but we decided to add an arbitrary limit that has
 nothing
 to do with memory and pretend you are out of memory anyways.


 Actually due to fragmentation that can already happen.


 Whether you have run out of total memory, or a single contiguous block, it
 is still a memory error.

 An arbitrary limit on collisions is not a memory error. If we were
 designing this API from scratch, would anyone propose using MemoryError for
 you have reached a limit on collisions? It has nothing to do with memory.
 Using MemoryError for something which isn't a memory error is ugly.

 How about RuntimeError?



 --
 Steven



Re: [Python-Dev] Counting collisions for the win

2012-01-20 Thread Steven D'Aprano

Terry Reedy wrote:

On 1/20/2012 11:17 AM, Victor Stinner wrote:

There is no perfect solutions, drawbacks of each solution should be 
compared.


Amen.

One possible attack that has been described for a collision counting 
dict depends on knowing precisely the trigger point. So let 
MAXCOLLISIONS either be configureable or just choose a random count 
between M and N, say 700 and 999.


Have I missed something? Why wouldn't the attacker simply target 1000 
collisions, and if the collision triggers at 700 instead of 1000, that's a bonus?



--
Steven



Re: [Python-Dev] Counting collisions for the win

2012-01-20 Thread Steven D'Aprano

Guido van Rossum wrote:

It should derive from BaseException.


RuntimeError meets that requirement, and it is an existing exception so there 
are no issues with introducing a new built-in exception to a point release.


py issubclass(RuntimeError, BaseException)
True


--
Steven


Re: [Python-Dev] Counting collisions for the win

2012-01-20 Thread Benjamin Peterson
2012/1/20 Steven D'Aprano st...@pearwood.info:
 Guido van Rossum wrote:

 It should derive from BaseException.


 RuntimeError meets that requirement, and it is an existing exception so
 there are no issues with introducing a new built-in exception to a point
 release.

 py issubclass(RuntimeError, BaseException)
 True

Guido meant a direct subclass.



-- 
Regards,
Benjamin


Re: [Python-Dev] Counting collisions for the win

2012-01-20 Thread Guido van Rossum
On Fri, Jan 20, 2012 at 6:33 PM, Steven D'Aprano st...@pearwood.info wrote:
 Guido van Rossum wrote:

 It should derive from BaseException.

 RuntimeError meets that requirement, and it is an existing exception so
 there are no issues with introducing a new built-in exception to a point
 release.

 py issubclass(RuntimeError, BaseException)
 True

Sorry, I was ambiguous. I meant it should not derive from Exception.
It goes RuntimeError -> StandardError -> Exception -> BaseException.

-- 
--Guido van Rossum (python.org/~guido)


[Python-Dev] Counting collisions for the win

2012-01-19 Thread Victor Stinner
Hi,

I've been working on the hash collision issue for 2 or 3 weeks now. I have
evaluated all the solutions and I think that I now have a good knowledge
of the problem and how it should be solved. The major issue is to have
a minor or no impact on applications (don't break backward
compatibility). I saw three major solutions:

 - use a randomized hash
 - use two hashes, a randomized hash and the actual hash kept for
backward compatibility
 - count collisions on dictionary lookup

Using a randomized hash does break a lot of tests (e.g. tests relying
on the representation of a dictionary). The patch is huge, too big to
backport it directly on stable versions. Using a randomized hash may
also break (indirectly) real applications because the application
output is also somehow randomized. For example, in the Django test
suite, the HTML output is different at each run. Web browsers may
render the web page differently, or crash, or ... I don't think that
Django would like to sort attributes of each HTML tag, just because we
wanted to fix a vulnerability.

Randomized hashing also has a major issue: if the attacker is able to
compute the secret, (s)he can easily compute collisions and exploit
the hash collision vulnerability again. I don't know exactly how
complex it is to compute the secret, but our hash function is weak (it
is far from being cryptographic, it is really simple to run it
backward). If someone writes a fast function to compute the secret, we
will go back to the same point.

IMO using two hashes has the same disadvantages as the randomized hash
solution, while being more complex to implement.

The last solution is very simple: count collisions and raise an
exception if the count hits a limit. The patch is something like 10 lines,
whereas the randomized hash is closer to 500 lines, adds a new
file, changes the Visual Studio project file, etc. First I thought that it
would break more applications than the randomized hash, but I tried on
Django: the test suite fails with a limit of 20 collisions, but not
with a limit of 50 collisions, whereas the patch uses a limit of 1000
collisions. According to my basic tests, a limit of 35 collisions
requires a dictionary with more than 10,000,000 integer keys to raise
an error. I am not talking about the attack, but valid data.
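
A pure-Python sketch of the counting idea (the real patch sits in the C lookup
routine in Objects/dictobject.c). The linear probing, the 1000 limit, and the
choice of RuntimeError are simplifications; the sketch only illustrates the
control flow of counting probes that hit a different key and bailing out past
the limit.

    COLLISION_LIMIT = 1000               # the limit discussed above; arbitrary here

    def insert(table, key, value):
        # Simplified open addressing with linear probing (not CPython's
        # perturb-based scheme); the point is only where the counter lives.
        mask = len(table) - 1            # table size is a power of two
        i = hash(key) & mask
        collisions = 0
        while table[i] is not None:
            stored_key, _ = table[i]
            if stored_key == key:        # same key: overwrite, nothing to count
                break
            collisions += 1
            if collisions > COLLISION_LIMIT:
                raise RuntimeError("too many hash collisions")
            i = (i + 1) & mask
        table[i] = (key, value)

    table = [None] * 8
    insert(table, "spam", 1)
    insert(table, "eggs", 2)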

More details about my tests on the Django test suite:
http://bugs.python.org/issue13703#msg151620

--

I propose to solve the hash collision vulnerability by counting
collisions because it does fix the vulnerability with a minor or no
impact on applications or backward compatibility. I don't see why we
should use a different fix for Python 3.3. If counting collisions
solves the issue for stable versions, it is also enough for Python
3.3. We now know all issues of the randomized hash solution, and I
think that there are more drawbacks than advantages. IMO the
randomized hash is overkill to fix the hash collision issue.

I just have some requests on Marc-Andre Lemburg's patch:

 - the limit should be configurable: a new function in the sys module
should be enough. It may be private (or replaced by an environment
variable?) in stable versions
 - the set type should also be patched (I didn't check if it is
vulnerable or not using the patch)
 - the patch has no test! (a class with a fixed hash should be enough
to write a test)
 - the limit must be documented somewhere
 - the exception type should be different than KeyError

Victor


Re: [Python-Dev] Counting collisions for the win

2012-01-19 Thread Guido van Rossum
On Thu, Jan 19, 2012 at 4:48 PM, Victor Stinner 
victor.stin...@haypocalc.com wrote:

 Hi,

 I'm working on the hash collision issue since 2 or 3 weeks. I
 evaluated all solutions and I think that I have now a good knowledge
 of the problem and how it should be solved. The major issue is to have
 a minor or no impact on applications (don't break backward
 compatibility). I saw three major solutions:

  - use a randomized hash
  - use two hashes, a randomized hash and the actual hash kept for
 backward compatibility
  - count collisions on dictionary lookup

 Using a randomized hash does break a lot of tests (e.g. tests relying
 on the representation of a dictionary). The patch is huge, too big to
 backport it directly on stable versions. Using a randomized hash may
 also break (indirectly) real applications because the application
 output is also somehow randomized. For example, in the Django test
 suite, the HTML output is different at each run. Web browsers may
 render the web page differently, or crash, or ... I don't think that
 Django would like to sort attributes of each HTML tag, just because we
 wanted to fix a vulnerability.

 Randomized hash has also a major issue: if the attacker is able to
 compute the secret, (s)he can easily compute collisions and exploit
 the hash collision vulnerability again. I don't know exactly how
 complex it is to compute the secret, but our hash function is weak (it
 is far from being cryptographic, it is really simple to run it
 backward). If someone writes a fast function to compute the secret, we
 will go back to the same point.

 IMO using two hashes has the same disavantages of the randomized hash
 solution, whereas it is more complex to implement.

 The last solution is very simple: count collision and raise an
 exception if it hits a limit. The path is something like 10 lines
 whereas the randomized hash is more close to 500 lines, add a new
 file, change Visual Studio project file, etc. First I thaught that it
 would break more applications than the randomized hash, but I tried on
 Django: the test suite fails with a limit of 20 collisions, but not
 with a limit of 50 collisions, whereas the patch uses a limit of 1000
 collisions. According to my basic tests, a limit of 35 collisions
 requires a dictionary with more than 10,000,000 integer keys to raise
 an error. I am not talking about the attack, but valid data.

 More details about my tests on the Django test suite:
 http://bugs.python.org/issue13703#msg151620

 --

 I propose to solve the hash collision vulnerability by counting
 collisions because it does fix the vulnerability with a minor or no
 impact on applications or backward compatibility. I don't see why we
 should use a different fix for Python 3.3. If counting collisons
 solves the issue for stable versions, it is also enough for Python
 3.3. We now know all issues of the randomized hash solution, and I
 think that there are more drawbacks than advantages. IMO the
 randomized hash is overkill to fix the hash collision issue.


+1


 I just have some requests on Marc Andre Lemburg patch:

  - the limit should be configurable: a new function in the sys module
 should be enough. It may be private (or replaced by an environment
 variable?) in stable versions
  - the set type should also be patched (I didn't check if it is
 vulnerable or not using the patch)
  - the patch has no test! (a class with a fixed hash should be enough
 to write a test)
  - the limit must be documented somwhere
  - the exception type should be different than KeyError

 Victor




-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] Counting collisions for the win

2012-01-19 Thread Ivan Kozik
On Fri, Jan 20, 2012 at 00:48, Victor Stinner
victor.stin...@haypocalc.com wrote:
 I propose to solve the hash collision vulnerability by counting
 collisions because it does fix the vulnerability with a minor or no
 impact on applications or backward compatibility. I don't see why we
 should use a different fix for Python 3.3. If counting collisons
 solves the issue for stable versions, it is also enough for Python
 3.3. We now know all issues of the randomized hash solution, and I
 think that there are more drawbacks than advantages. IMO the
 randomized hash is overkill to fix the hash collision issue.

I'd like to point out that an attacker is not limited to sending just
one dict full of colliding keys.  Given a 22ms stall for a dict full
of 1000 colliding keys, and 100 such objects inside a parent object
(perhaps JSON), you can stall a server for 2.2+ seconds.  Going with
the raise-at-1000 approach doesn't solve the problem for everyone.

In addition, because the raise-at-N-collisions approach raises an
exception, everyone who wants to handle this error condition properly
has to change their code to catch a previously-unexpected exception.
(I know they're usually still better off with the fix, but why force
many people to change code when you can actually fix the hashing
problem?)

Another issue is that even with a configurable limit, different
modules can't have their own limits.  One module might want a
relatively safe raise-at-100, and another module creating massive
dicts might want raise-at-1000.  How does a developer know whether
they can raise or lower the limit, given that they use a bunch of
different modules?

I actually went with this stop-at-N-collisions approach by patching my
CPython a few years ago, where I limiting dictobject and setobject's
critical `for` loop to 100 iterations (I realize this might handle
fewer than 100 collisions.)  This worked fine until I tried to compile
PyPy, where the translator blew up due to a massive dict.  This,
combined with the second problem (needing to catch an exception), led
me to abandon this approach and write Securetypes, which has a
securedict that uses SHA-1.  Not that I like this either; I think I'm
happy with the randomize-hash() approach.
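
Not Ivan's actual Securetypes code -- just a sketch of the general approach he
names: give each mapping its own random salt and derive key hashes from SHA-1
over salt plus key, so colliding keys cannot be precomputed without knowing
the salt. All class and attribute names are illustrative.

    import hashlib, os, struct

    class _SaltedKey(object):
        # Wraps a string key so its hash comes from salted SHA-1 rather than
        # the predictable built-in string hash.
        def __init__(self, salt, key):
            self.salt = salt
            self.key = key
        def __hash__(self):
            digest = hashlib.sha1(self.salt + self.key.encode("utf-8")).digest()
            return struct.unpack("<q", digest[:8])[0]
        def __eq__(self, other):
            return isinstance(other, _SaltedKey) and self.key == other.key

    class SecureDict(object):
        # Minimal illustration, not a drop-in dict replacement.
        def __init__(self):
            self._salt = os.urandom(16)
            self._data = {}
        def __setitem__(self, key, value):
            self._data[_SaltedKey(self._salt, key)] = value
        def __getitem__(self, key):
            return self._data[_SaltedKey(self._salt, key)]

    d = SecureDict()
    d["user"] = "frank"
    print(d["user"])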

Ivan


Re: [Python-Dev] Counting collisions for the win

2012-01-19 Thread Guido van Rossum
On Thu, Jan 19, 2012 at 7:32 PM, Ivan Kozik i...@ludios.org wrote:

 On Fri, Jan 20, 2012 at 00:48, Victor Stinner
 victor.stin...@haypocalc.com wrote:
  I propose to solve the hash collision vulnerability by counting
  collisions because it does fix the vulnerability with a minor or no
  impact on applications or backward compatibility. I don't see why we
  should use a different fix for Python 3.3. If counting collisons
  solves the issue for stable versions, it is also enough for Python
  3.3. We now know all issues of the randomized hash solution, and I
  think that there are more drawbacks than advantages. IMO the
  randomized hash is overkill to fix the hash collision issue.

 I'd like to point out that an attacker is not limited to sending just
 one dict full of colliding keys.  Given a 22ms stall for a dict full
 of 1000 colliding keys, and 100 such objects inside a parent object
 (perhaps JSON), you can stall a server for 2.2+ seconds.  Going with
 the raise-at-1000 approach doesn't solve the problem for everyone.


It's just a DoS attack. Those won't go away. We just need to raise the
effort needed for the attacker. The original attack would cause something
like 5 minutes of CPU usage per request (with a set of colliding keys that
could be computed once and used to attack every Python-run website in the
world). That's at least 2 orders of magnitude worse.

In addition, because the raise-at-N-collisions approach raises an
 exception, everyone who wants to handle this error condition properly
 has to change their code to catch a previously-unexpected exception.
 (I know they're usually still better off with the fix, but why force
 many people to change code when you can actually fix the hashing
 problem?)


Why would anybody need to change their code? Every web framework worth its
salt has a top-level error catcher that logs the error, serves a 500
response, and possibly does other things like email the admin.
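
(As a sketch of such a top-level catcher, not any particular framework's
implementation: WSGI middleware along these lines would turn any unexpected
exception, including whatever a collision limit might raise, into a logged
500 with no per-application code changes.)

    import logging
    import sys

    def catch_all(app):
        """Wrap a WSGI app so any unexpected exception, including a hypothetical
        'too many collisions' error, becomes a logged 500 instead of a crash."""
        def wrapped(environ, start_response):
            try:
                return app(environ, start_response)
            except Exception:
                logging.exception("unhandled error serving %s",
                                  environ.get("PATH_INFO", "?"))
                start_response("500 Internal Server Error",
                               [("Content-Type", "text/plain")],
                               sys.exc_info())
                return [b"internal server error\n"]
        return wrapped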


 Another issue is that even with a configurable limit, different
 modules can't have their own limits.  One module might want a
 relatively safe raise-at-100, and another module creating massive
 dicts might want raise-at-1000.  How does a developer know whether
 they can raise or lower the limit, given that they use a bunch of
 different modules?


I don't think it needs to be configurable. There just needs to be a way to
turn it off.


 I actually went with this stop-at-N-collisions approach by patching my
 CPython a few years ago, where I limited dictobject and setobject's
 critical `for` loop to 100 iterations (I realize this might handle
 fewer than 100 collisions.)  This worked fine until I tried to compile
 PyPy, where the translator blew up due to a massive dict.


I think that's because your collision-counting algorithm was much more
primitive than MAL's.


 This,
 combined with the second problem (needing to catch an exception), led
 me to abandon this approach and write Securetypes, which has a
 securedict that uses SHA-1.  Not that I like this either; I think I'm
 happy with the randomize-hash() approach.


Why did you need to catch the exception? Were you not happy with the
program simply terminating with a traceback when it got attacked?

-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] Counting collisions for the win

2012-01-19 Thread Steven D'Aprano

Victor Stinner wrote:


The last solution is very simple: count collisions and raise an
exception if it hits a limit. ...
According to my basic tests, a limit of 35 collisions
requires a dictionary with more than 10,000,000 integer keys to raise
an error. I am not talking about the attack, but valid data.
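
(The actual proposal targets CPython's C dict implementation; purely as an
illustration of the mechanism, a simplified pure-Python open-addressing table
that counts probes per insert could look like the following. It is not
CPython's real dict and omits resizing.)

    COLLISION_LIMIT = 35    # the limit mentioned above

    class CountingTable:
        """Simplified open-addressing table (NOT CPython's real dict: no
        resizing, fixed power-of-two size) that counts probes per insert and
        raises once the limit is exceeded."""

        def __init__(self, size=64):                 # size must be a power of two
            self.mask = size - 1
            self.slots = [None] * size               # slot: None or (hash, key, value)

        def insert(self, key, value):
            h = hash(key)
            perturb = h & 0xFFFFFFFFFFFFFFFF         # treat the hash as unsigned
            i = h & self.mask
            collisions = 0
            while True:
                entry = self.slots[i]
                if entry is None:                    # empty slot: store here
                    self.slots[i] = (h, key, value)
                    return
                if entry[0] == h and entry[1] == key:
                    self.slots[i] = (h, key, value)  # same key: overwrite
                    return
                collisions += 1
                if collisions > COLLISION_LIMIT:
                    raise RuntimeError("too many hash collisions inserting %r" % (key,))
                i = (5 * i + 1 + perturb) & self.mask   # CPython-style probe sequence
                perturb >>= 5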


You might think that 10 million keys is a lot of data, but that's only about 
100 MB worth. I already see hardware vendors advertising computers with 6 GB 
RAM as entry level, e.g. the HP Pavilion starts with 6GB expandable to 16GB. 
I expect that there are already people using Python who will unpredictably hit 
that limit by accident, and the number will only grow as computers get more 
memory.


With a limit of 35 collisions, it only takes 35 keys to force a dict to 
raise an exception, if you are an attacker able to select colliding keys. 
We're trying to defend against an attacker who is able to force collisions, 
not one who is waiting for accidental collisions. I don't see that causing the 
dict to raise an exception helps matters: it just changes the attack from 
"keep the dict busy indefinitely" to "cause an exception and crash the 
application."
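
(Producing such colliding keys is indeed cheap. On CPython builds where
sys.hash_info.modulus is 2**61 - 1, which is the usual 64-bit case, integers
that differ by that modulus hash identically, and integer hashing is not
affected by string-hash randomization.)

    import sys

    M = sys.hash_info.modulus           # 2**61 - 1 on typical 64-bit CPython builds

    # 35 distinct integer keys sharing one hash value, generated for free.
    colliding_keys = [1 + i * M for i in range(35)]

    assert len(set(colliding_keys)) == 35
    assert len({hash(k) for k in colliding_keys}) == 1

    d = dict.fromkeys(colliding_keys)   # every insert probes the same slot chain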


This moves responsibility for dealing with collisions out of the dict and into the 
application code. Instead of solving the problem in one place (the built-in 
dict), now every application that uses dicts has to identify which dicts can be 
attacked, and deal with the exception.


That pushes the responsibility for security onto people who are the least 
willing or able to deal with it: the average developer, who neither 
understands nor cares about security, or if they do care, they can't convince 
their manager to care.


I suppose an exception is an improvement over the application hanging 
indefinitely, but I'd hardly call it a fix.


Ruby uses randomized hashes. Are there any other languages with a dict or 
mapping class that raises an exception on too many collisions?



--
Steven


Re: [Python-Dev] Counting collisions for the win

2012-01-19 Thread Ivan Kozik
On Fri, Jan 20, 2012 at 03:48, Guido van Rossum gu...@python.org wrote:
 I think that's because your collision-counting algorithm was much more
 primitive than MAL's.

Conceded.

 This,
 combined with the second problem (needing to catch an exception), led
 me to abandon this approach and write Securetypes, which has a
 securedict that uses SHA-1.  Not that I like this either; I think I'm
 happy with the randomize-hash() approach.


 Why did you need to catch the exception? Were you not happy with the program
 simply terminating with a traceback when it got attacked?

No, I wasn't happy with termination.  I wanted to treat it just like a
JSON decoding error, and send the appropriate response.

I actually forgot to mention the main reason I abandoned the
stop-at-N-collisions approach.  I had a server with a dict that stayed
in memory, across many requests.  It was being populated with
identifiers chosen by clients.  I couldn't have my server stay broken
if this dict filled up with a bunch of colliding keys.  (I don't think
I could have done anything else either, like nuking the dict or evicting
some keys.)

Ivan


Re: [Python-Dev] Counting collisions for the win

2012-01-19 Thread Carl Meyer

Hi Victor,

On 01/19/2012 05:48 PM, Victor Stinner wrote:
[snip]
 Using a randomized hash may
 also break (indirectly) real applications because the application
 output is also somehow randomized. For example, in the Django test
 suite, the HTML output is different at each run. Web browsers may
 render the web page differently, or crash, or ... I don't think that
 Django would like to sort attributes of each HTML tag, just because we
 wanted to fix a vulnerability.

I'm a Django core developer, and if it is true that our test-suite has a
dictionary-ordering dependency that is expressed via HTML attribute
ordering, I consider that a bug and would like to fix it. I'd be
grateful for, not resentful of, a change in CPython that revealed the
bug and prompted us to fix it. (I presume that it is true, as it sounds
like you experienced it directly; I don't have time to play around at
the moment, but I'm surprised we haven't seen bug reports about it from
users of 64-bit Pythons long ago). I can't speak for the core team, but
I doubt there would be much disagreement on this point: ideally Django
would run equally well on any implementation of Python, and as far as I
know none of the alternative implementations guarantee hash or
dict-ordering compatibility with CPython.
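
(As one illustration, not Django's actual approach, of how such a test
dependency can be removed: HTML attribute order can be normalized before
comparison.)

    from html.parser import HTMLParser

    class TagCollector(HTMLParser):
        """Collect (tag, attributes-sorted-by-name) pairs so a comparison does
        not depend on the order in which attributes happened to be emitted."""
        def __init__(self):
            super().__init__()
            self.tags = []
        def handle_starttag(self, tag, attrs):
            self.tags.append((tag, sorted(attrs, key=lambda item: item[0])))

    def normalized(html):
        collector = TagCollector()
        collector.feed(html)
        return collector.tags

    # Two renderings that differ only in attribute order compare equal:
    assert (normalized('<input type="text" name="q">')
            == normalized('<input name="q" type="text">'))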

I don't have the expertise to speak otherwise to the alternatives for
fixing the collisions vulnerability, but I don't believe it's accurate
to presume that Django would not want to fix a dict-ordering dependency,
and use that as a justification for one approach over another.

Carl


Re: [Python-Dev] Counting collisions for the win

2012-01-19 Thread Nick Coghlan
On Fri, Jan 20, 2012 at 2:00 PM, Steven D'Aprano st...@pearwood.info wrote:
 With a limit of 35 collisions, it only takes 35 keys to force a dict to
 raise an exception, if you are an attacker able to select colliding keys.
 We're trying to defend against an attacker who is able to force collisions,
 not one who is waiting for accidental collisions. I don't see that causing
 the dict to raise an exception helps matters: it just changes the attack
 from "keep the dict busy indefinitely" to "cause an exception and crash the
 application."

No, that's fundamentally misunderstanding the nature of the attack.
The reason the hash collision attack is a problem is that it allows
you to DoS a web service in a way that requires minimal client side
resources but can have a massive effect on the server. The attacker is
making a single request that takes the server an inordinately long
time to process, consuming CPU resources all the while, and likely
preventing the handling of any other requests (especially for an
event-based server, since the attack is CPU based, bypassing all use
of asynchronous IO).

With the 1000 collision limit in place, the attacker sends their
massive request, the affected dict quickly hits the limit, throws an
unhandled exception which is then caught by the web framework and
turned into a 500 Error response (or whatever's appropriate for the
protocol being attacked).

If a given web service doesn't *already* have a catch-all handler to
keep an unexpected exception from bringing the entire service down,
then DoS attacks like this one are the least of its worries.

As for why other languages haven't gone this way, I have no idea.
There are lots of details relating to a language's hash and hash map
design that will drive how suitable randomisation is as an answer, and
it also depends greatly on how you decide to characterise the threat.

FWIW, Victor's analysis in the opening post of this thread matches the
conclusions I came to a few days ago, although he's been over the
alternatives far more thoroughly than I have.

Regards,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Counting collisions for the win

2012-01-19 Thread Nick Coghlan
On Fri, Jan 20, 2012 at 2:54 PM, Carl Meyer c...@oddbird.net wrote:
 I don't have the expertise to speak otherwise to the alternatives for
 fixing the collisions vulnerability, but I don't believe it's accurate
 to presume that Django would not want to fix a dict-ordering dependency,
 and use that as a justification for one approach over another.

It's more a matter of wanting deployment of a security fix to be as
painless as possible - a security fix that system administrators can't
deploy because it breaks critical applications may as well not exist.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Counting collisions for the win

2012-01-19 Thread Glenn Linderman

On 1/19/2012 8:54 PM, Carl Meyer wrote:

[snip]

I'm a Django core developer, and if it is true that our test-suite has a
dictionary-ordering dependency that is expressed via HTML attribute
ordering, I consider that a bug and would like to fix it. I'd be
grateful for, not resentful of, a change in CPython that revealed the
bug and prompted us to fix it.
[snip]

Carl


It might be a good idea to have a way to seed the hash with some value 
to allow testing with different dict orderings. That would let tests be 
developed on one Python implementation and still be immune to the 
different orderings of other implementations. However, randomizing the 
hash not only fails to solve the problem for long-running applications, 
it also makes performance non-deterministic from one run to the next, 
even with exactly the same data: a different (random) seed could 
sporadically cause collisions in data that usually performs well, with 
little explanation for it and little way to reproduce the problem in 
order to report or understand it.
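
(CPython's hash randomization exposes exactly this kind of knob through the
PYTHONHASHSEED environment variable: with randomization enabled, a fixed seed
makes the ordering reproducible from run to run, which addresses the
reproducibility concern above. A minimal sketch, checked from a parent
process via sys.executable:)

    import os
    import subprocess
    import sys

    def child_hash(seed):
        """hash('abc') as seen by a child interpreter started with a fixed seed."""
        env = dict(os.environ, PYTHONHASHSEED=str(seed))
        out = subprocess.check_output(
            [sys.executable, "-c", "print(hash('abc'))"], env=env)
        return int(out.decode().strip())

    assert child_hash(12345) == child_hash(12345)   # same seed: reproducible ordering
    print(child_hash(1), child_hash(2))             # different seeds: (almost surely) different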