Hello! On Fri, Sep 04, 2015 at 11:00:58PM +0200, Sergey Brester wrote:
> On 04.09.2015 21:43, Maxim Dounin wrote: > > >No one yet happened. And likely won't ever happen, as md5 is a > >good hash function 128 bits wide, and it took many years to find > >even a single collision of md5. > > You confuse good for "collision-search algorithms" with a good in the sense > of the "probability the collision can occur". A estimation of collision in > sence of "collision-search algorithm" and co. implies the hashed string is > unknown and for example it estimates attacks to find that (like brute, > chosen-prefix etc). > > I'm talking about the probability of incidence the same hash for two > different cache keys. > In addition, because of so-called birthday problem > (https://en.wikipedia.org/wiki/Birthday_problem) we can increase this > probability with at least comparable 64 bit for real random data (different > length). Well, not, I don't confuse anything. For sure, brute force attack on a 128 bit hash requires approximately 2^64 attempts. That is, a single nginx instance with 2^64 cached resources will likely show up a collision. But that's not a number of resources you'll be able to store on a single node - in particular, because 64-bit address space wouldn't be enough to address that many cached items. To obtain a collision of a 128-bit hash with at least 1% probability, you'll need more than 10^18 resources cached on a single node, which is not even close to a something possible as well. Assuming 1 billion of keys (which is way more than a single nginx node can handle now, and will require about 125G of memory for a cache keys zone), probability of a collision is less than 10^(-20). Quoting https://en.wikipedia.org/wiki/Birthday_attack: : For comparison, 10^(−18) to 10^(−15) is the uncorrectable bit : error rate of a typical hard disk. That is, you are trying to fight a problem which is less probable than the chance that you'll get wrong data from your hard disk. > Don't forget our keys, that will be hashed, are not really any "random" data > - most of the time it contains only specified characters and/or has > specified length. > > So the probability that the collision will occur is still significant larger > (a billion billion times larger). This is not true as long as you are using a good enough hash function. See https://en.wikipedia.org/wiki/Hash_function#Uniformity for details. > >And even if it'll happen, we have > >crc32 check in place to protect us. > > Very funny... You make such conclusions based on what? > So last but not least, if you still haven't seen the collision in sence of > md5 "protected" crc32, how can you be sure, that this is still not occurred? > > For example, how large you will estimate the probability that the collision > will occur, if my keys will contain only exact 32 characters in range > [0-9A-Za-z]? And it frequency? Just approximately dimension... See above for detailed numbers. -- Maxim Dounin http://nginx.org/ _______________________________________________ nginx-devel mailing list nginx-devel@nginx.org http://mailman.nginx.org/mailman/listinfo/nginx-devel