Hi! >> Lets [try] to quantify the probability of reaching the collision limit C > with a hashtable of size N and assuming a random hash distribution. The > upper bound for this should be (N choose C) * (1/N)^(C-1), with (1/N)^(C-1) > being the probability that C elements hash to one value and (N over C) the > number of C element subsets of an N element set. For large N and N >> C we > approximate (N choose C) to (e*N/C)^C / sqrt(2pi*C). As such our upper > bound becomes N * (e/C)^C / sqrt(2pi*C). Choosing N = 2^31 (largest > possible hashtable) we get for C = 20 probability 8.9E-10 and for C = 100 > probability 2.3E-149. The patch uses C = 1000.
I assume you're talking here about per-hashtable limit? Here I notice two things: 1. Hash keys are not really random. Which can both improve and make it worse, but we have a lot of keys named "key" and not a lot of keys named "\x07\xF5\xDD". 2. This does not account for the fact that if we have two keys used by app colliding, we'll probably see this collision a lot. Of course, if we're counting per-hashtable, that would make is somewhat less, but still. To validate this theory, I applied the following patch: https://gist.github.com/smalyshev/2c3c085998ba81d6c163a27386aaacd8 And run composer update on one of my projects. I got these lines: Destroyed hash with coll = 48454 Destroyed hash with coll = 41871 Destroyed hash with coll = 41828 Destroyed hash with coll = 40535 Destroyed hash with coll = 32774 Destroyed hash with coll = 22782 Destroyed hash with coll = 21381 Destroyed hash with coll = 15232 Destroyed hash with coll = 14745 Destroyed hash with coll = 12216 Destroyed hash with coll = 11742 Destroyed hash with coll = 11625 Destroyed hash with coll = 8916 Destroyed hash with coll = 6820 Destroyed hash with coll = 5911 Destroyed hash with coll = 5429 Destroyed hash with coll = 3390 Destroyed hash with coll = 1188 I.e. there were 18 hashes with more than 1000 collisions on the course of the run. Is there something wrong with my count or am I counting wrong things (I didn't see the patch for limiting counts so maybe I'm still counting wrong things). > In other words, it is extremely unlikely that you hit the collision limit > by accident, with non-malicious data. So no, the user does not have to > discriminate normal from malicious causes. If the limit triggers, it's > malicious. I'm not so sure of this now - unless, again, I misunderstand what we're counting. -- Stas Malyshev smalys...@gmail.com -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php