On Wed, Sep 21, 2016 at 4:05 PM, Tom Worster <f...@thefsb.org> wrote:
> On 9/21/16 8:37 AM, Rowan Collins wrote: > >> On 21 September 2016 13:02:20 BST, Glenn Eggleton <geggl...@gmail.com> >> wrote: >> >>> What if we had some sort of configuration limit on collision length? >>> >> >> Previous discussions have come to the conclusion that the difference >> between normal collision frequency and sufficient for a DoS is so large >> that the only meaningful settings would be on or off. e.g. the proposed >> limit is 1000, and randomly inserting millions of rows produces about 12. >> >> The problem with long running applications is not that they need to raise >> the limit, it's that they need to handle the error gracefully if they are >> in fact under attack. Because hash tables are so ubiquitous in the engine, >> there's no guarantee that that's possible, so an attacker would have the >> ability to crash the process with the limit turned on, or hang the CPU with >> the limit turned off. >> > > Right. It seems like count-and-limit pushes the problem onto the user who > then has to discriminate normal from malicious causes for rising counters > and find appropriate actions for each. > > Even a sophisticated user who understands hash collision counters may not > welcome this since it adds complexity that's hard to test and involves > questionable heuristics. > > Tom > Quoting a relevant part of the previous discussion: > Lets [try] to quantify the probability of reaching the collision limit C with a hashtable of size N and assuming a random hash distribution. The upper bound for this should be (N choose C) * (1/N)^(C-1), with (1/N)^(C-1) being the probability that C elements hash to one value and (N over C) the number of C element subsets of an N element set. For large N and N >> C we approximate (N choose C) to (e*N/C)^C / sqrt(2pi*C). As such our upper bound becomes N * (e/C)^C / sqrt(2pi*C). Choosing N = 2^31 (largest possible hashtable) we get for C = 20 probability 8.9E-10 and for C = 100 probability 2.3E-149. The patch uses C = 1000. In other words, it is extremely unlikely that you hit the collision limit by accident, with non-malicious data. So no, the user does not have to discriminate normal from malicious causes. If the limit triggers, it's malicious. Nikita