Re: rand48 replacement

Fabien COELHO Thu, 01 Jul 2021 14:19:04 -0700


Hello Dean,

I haven't looked at the patch in detail, but one thing I object to is
the code to choose a random integer in an arbitrary range.


Thanks for bringing up this interesting question!

Currently, this is done in pgbench by getrand(), which has its
problems.

Yes. That is one of the motivation for providing something hopefullybetter.

However, this patch seems to be replacing that with a simple
modulo operation, which is perhaps the worst possible way to do it.


I did it knowing this issue. Here is why:

The modulo operation is biased for large ranges close to the limit, sure.Also, the bias is somehow of the same magnitude as the FP multiplicationapproach used previously, so the "worst" has not changed much, it isreally the same as before.

I thought it is not such an issue because for typical uses we are unlikelyto be in these conditions, so the one-operation no branching approachseemed like a good precision vs performance compromise: I'd expect thetypical largest ranges to be well below 40 bits (eg a key in a prettylarge table in pgbench), which makes the bias well under 1/2**24 and ISTMthat I can live with that. With the initial 48 bits state, obviously thesituation was not the same.

There's plenty of research out there on how to do it better -- see,
for example, [1] for a nice summary.

Rejection methods include branches, thus may cost significantly more, asshow by the performance figures in blog.

Also, it somehow breaks the sequence determinism when using range, which Ifound quite annoying: ISTM desirable that when generating a number thestate advances once, and just once.

Also some methods have higher costs depending on the actual range, eg thebitmask approach: for range 129 the bitmask is 0xff and you have a nearly50% probability of iterating once, nearly 25% of iterating twice, and soon… I like performance to be uniform, not to depend on actual values.

Given these arguments I'd be inclined to keep the bias, but I'm open tomore discussion.

Also, I'd say that functions to choose random integers in an arbitraryrange ought to be part of the common API, as they are in almost everylanguage's random API.


That is a good point.

--
Fabien.

Re: rand48 replacement

Reply via email to