The first value is taken about 75% of the time for N=1000 and s=2.5, which
means that a non-deterministic implementation would still pass by chance
about 0.75² ≈ 56% of the time on that one value.
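For reference, both numbers can be checked directly against the zipfian
probability mass function; a quick sketch in plain Python (not pgbench code):

```python
# Bounded zipfian PMF: P(k) ~ 1/k^s for k = 1..N, normalized by the
# generalized harmonic number H(N, s).
N, s = 1000, 2.5

H = sum(k ** -s for k in range(1, N + 1))   # normalization constant
p1 = 1.0 / H                                # probability of the first value

print(f"P(first value) = {p1:.3f}")             # ~0.745
print(f"P(both draws hit it) = {p1 ** 2:.3f}")  # ~0.556
```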

Right, that's about what we've been seeing on OpenBSD.

Also, the drawing procedure is less efficient when the parameter is close
to 1, because it is more likely to loop.
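The looping comes from the rejection step: for s > 1 the usual approach is
Devroye's rejection method, which I believe is what pgbench's iterative
zipfian draw is based on. A minimal Python sketch of that method (the
helper name is hypothetical, this is not the pgbench source):

```python
import random

def zipf_draw(n, s, rng=random.random):
    """Bounded zipfian draw on 1..n for s > 1, via Devroye's
    rejection method (sketch, not the pgbench source)."""
    b = 2.0 ** (s - 1.0)
    while True:
        u = 1.0 - rng()                       # uniform in (0, 1]
        v = rng()
        x = int(u ** (-1.0 / (s - 1.0)))      # floor(u^(-1/(s-1)))
        t = (1.0 + 1.0 / x) ** (s - 1.0)
        # As s approaches 1, b approaches 1 and the acceptance test
        # fails more often, hence the extra looping mentioned above.
        if v * x * (t - 1.0) / (b - 1.0) <= t / b and x <= n:
            return x
```

With s = 2.5 the first candidate is accepted almost every time; with s
near 1, both the acceptance test and the x <= n bound (the tail gets
heavier) reject more often.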

That might be something to fix, but I agree it's a reason not to go
overboard trying to flatten the test case's distribution right now.

Probably you would have to invent a new method for drawing from a zipfian distribution to fix that, which would be nice.

If you want something more drastic, using 1.5 instead of 2.5 would reduce
the probability of accidentally passing the test by chance to about 20%, so
it would fail 80% of the time.

I think your math is off;

Argh. Although I confirm my computation, ISTM that with 1.5 the first value has a 39% chance of being drawn, so a collision in about 15% of cases; the second value 14%, so a collision in about 2%; ... for a total cumulative probability of about 18%.
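For completeness, summing the squared probabilities over all N values
reproduces this (same plain-Python setup as above):

```python
N, s = 1000, 1.5

H = sum(k ** -s for k in range(1, N + 1))
p = [k ** -s / H for k in range(1, N + 1)]

print(f"P(1) = {p[0]:.2f}, collision {p[0] ** 2:.2f}")  # 0.39, 0.15
print(f"P(2) = {p[1]:.2f}, collision {p[1] ** 2:.2f}")  # 0.14, 0.02
print(f"total collision probability = {sum(q * q for q in p):.3f}")  # ~0.185
```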

1.5 works quite well here. I saw one failure to produce distinct values in 20 attempts.

With 3 to 4 failures expected in 20 attempts (20 × 0.18 ≈ 3.6), seeing one is possible.

It's not demonstrably slower than 2.5 either. (1.1 is measurably slower; probably not by enough for anyone to care, but 1.5 is good enough for me.)

Good, if it fails quickly enough for you.

--
Fabien.
