Hello Dean,

I haven't looked at the patch in detail, but one thing I object to is
the code to choose a random integer in an arbitrary range.

Thanks for bringing up this interesting question!

Currently, this is done in pgbench by getrand(), which has its
problems.

Yes. That is one of the motivation for providing something hopefully better.

However, this patch seems to be replacing that with a simple
modulo operation, which is perhaps the worst possible way to do it.

I did it knowing this issue. Here is why:

The modulo operation is biased for large ranges close to the limit, sure. Also, the bias is somehow of the same magnitude as the FP multiplication approach used previously, so the "worst" has not changed much, it is really the same as before.

I thought it is not such an issue because for typical uses we are unlikely to be in these conditions, so the one-operation no branching approach seemed like a good precision vs performance compromise: I'd expect the typical largest ranges to be well below 40 bits (eg a key in a pretty large table in pgbench), which makes the bias well under 1/2**24 and ISTM that I can live with that. With the initial 48 bits state, obviously the situation was not the same.

There's plenty of research out there on how to do it better -- see,
for example, [1] for a nice summary.

Rejection methods include branches, thus may cost significantly more, as show by the performance figures in blog.

Also, it somehow breaks the sequence determinism when using range, which I found quite annoying: ISTM desirable that when generating a number the state advances once, and just once.

Also some methods have higher costs depending on the actual range, eg the bitmask approach: for range 129 the bitmask is 0xff and you have a nearly 50% probability of iterating once, nearly 25% of iterating twice, and so on… I like performance to be uniform, not to depend on actual values.

Given these arguments I'd be inclined to keep the bias, but I'm open to more discussion.

Also, I'd say that functions to choose random integers in an arbitrary range ought to be part of the common API, as they are in almost every language's random API.

That is a good point.

--
Fabien.

Reply via email to