Hello Yura,
>>> Given 99.99% of cases will be in the likely case, the branch predictor
>>> should eliminate the decision cost.

>> Hmmm. ISTM that a branch predictor should predict that unknown < small
>> should probably be false, so a hint should be given that it is really
>> true.

> Why? Branch predictor is history based:

Hmmm. This means running the compiler with some special options, running
the code on significant and representative data, then recompiling based on
the collected branch stats. This is not the usual way pg is built.

> if it were usually true here, then it will be true this time as well.
> unknown < small is usually true, therefore the branch predictor will
> assume it is true.

> I put `likely` for the compiler: the compiler then puts the `likely`
> path closer.

Yes, an explicit hint is needed.
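For the record, such a hint boils down to gcc/clang's __builtin_expect
(pg's c.h already provides likely()/unlikely() macros along these lines);
a minimal sketch, with a made-up bounded() function just for illustration:

  #include <stdint.h>

  #if defined(__GNUC__)
  #define likely(x)   __builtin_expect((x) != 0, 1)
  #else
  #define likely(x)   (x)
  #endif

  /*
   * The hint makes the compiler lay out the hinted path as the
   * straight-line fall-through, so the hot case takes no jump.
   */
  static uint64_t
  bounded(uint64_t val, uint64_t range)
  {
      if (likely(val < range))
          return val;          /* hot path */
      return val % range;      /* cold path, illustration only */
  }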
> And as Dean Rasheed correctly mentioned, the mask method will have a
> really bad pattern for the branch predictor if the range is not just
> below or equal to a power of two.
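To make that worst case concrete (a quick back-of-the-envelope, not pg
code): when the range is just above a power of two, the mask keeps one
extra bit, so nearly half of the draws are rejected and the accept/reject
branch behaves like a coin flip:

  #include <stdio.h>
  #include <stdint.h>

  int
  main(void)
  {
      /* range just above a power of two: worst case for the mask method */
      uint64_t  range = (UINT64_C(1) << 10) + 1;  /* 1025 values */
      uint64_t  mask = (UINT64_C(1) << 11) - 1;   /* smallest covering mask */
      double    reject = 1.0 - (double) range / (double) (mask + 1);

      printf("reject = %.4f\n", reject);  /* 0.4995: unpredictable branch */
      return 0;
  }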
On average the bitmask is the better unbiased method, if the online
figures are to be trusted. Also, as already said, I do not really want
to add code complexity, especially to get lower average performance, and
especially with code like "threshold = -range % range": both variables
are unsigned, and I get a headache just looking at it :-)
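For context, that line comes from Lemire's debiased multiply method, as
benchmarked in the post you mention below; roughly this (my sketch, using
gcc-style __uint128_t and an assumed 64-bit rand64() source, not pg code):

  #include <stdint.h>

  extern uint64_t rand64(void);    /* assumed 64-bit PRNG */

  static uint64_t
  bounded_rand(uint64_t range)
  {
      uint64_t     x = rand64();
      __uint128_t  m = (__uint128_t) x * range;
      uint64_t     low = (uint64_t) m;

      if (low < range)
      {
          /* threshold = 2^64 mod range, via unsigned wrap-around */
          uint64_t  threshold = -range % range;

          while (low < threshold)
          {
              x = rand64();
              m = (__uint128_t) x * range;
              low = (uint64_t) m;
          }
      }
      return (uint64_t) (m >> 64);
  }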
> If you mention https://www.pcg-random.org/posts/bounded-rands.html then

Indeed, these are the figures I was referring to when saying that the
bitmask looks like the best method.

> 1. the first graphs are made with code that is not exactly Lemire's;
>    his latest code is from
>    https://lemire.me/blog/2016/06/30/fast-random-shuffling/

Ok, other figures; however, there is no comparison with the mask method in
this post, and it mostly argues against division/modulo.
> By the way, we have a 64-bit random. If we use 44 bits of it for range <=
> (1<<20), then the bias will be less than 1/(2**24). Could we just ignore
> it (given it is not a crypto-strength random)?
That was my initial opinion, but Dean insists on an unbiased method. I
agree with Dean that performance, if it is not too bad, does not matter
that much, so I'm trying to keep the code simple as the main objective.
You do not seem ready to buy this argument. Note that even though my
research is about compiler optimizations, I did buy it :-)
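To put a number on the bias at stake: with b input bits, outputs computed
as (val * range) >> b hit each value either floor(2^b / range) or
ceil(2^b / range) times, so each probability is within 1/2^b of 1/range,
i.e. a relative bias of at most range / 2^b (which is 1/2^24 for b = 44
and ranges up to 2^20, as you say). A scaled-down brute-force check, 16
bits instead of 44 (just a sketch):

  #include <stdio.h>
  #include <stdint.h>

  int
  main(void)
  {
      const uint32_t  bits = 16;
      const uint32_t  range = 9;     /* just above a power of two */
      uint32_t        counts[9] = {0};

      for (uint32_t val = 0; val < (1u << bits); val++)
          counts[(val * range) >> bits]++;

      /* every count is 7281 or 7282, i.e. within 1 of 65536/9 */
      for (uint32_t i = 0; i < range; i++)
          printf("%u: %u\n", i, counts[i]);
      return 0;
  }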
Given the overheads involved in pgbench, the performance impact of the
best vs worst case scenario is minimal:

  \set i random(0, 7) -- 8 values, good for mask: 4.515 Mtps
  \set i random(0, 8) -- 9 values, bad for mask:  4.151 Mtps

so the performance penalty is about 8%.
> if ((range & (range - 1)) == 0)    /* handle all power-of-2 cases */
>     return range != 0 ? val & (range - 1) : 0;
> if (likely(range < (1 << 20)))
>     /*
>      * While the multiply method is biased, the bias will be smaller than
>      * 1/(1<<24) for such small ranges. Let's ignore it.
>      */
>     return ((val >> 20) * range) >> 44;
> /* Apple's mask method */
> m = mask_u64(range - 1);
> val &= m;
> while (val >= range)
>     val = xoroshiro128ss(state) & m;
> return val;
> }
Hmmm. The code looks "simple" enough, but I do not think optimizing for
20-bit values is worth it, especially as the bitmask method seems the best
anyway. We were discussing 32 bits before.
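As an aside, mask_u64() is not shown in the snippet; I assume it is the
usual bit-smearing helper, something like:

  /* assumed helper: smallest all-ones mask covering x */
  static uint64_t
  mask_u64(uint64_t x)
  {
      x |= x >> 1;
      x |= x >> 2;
      x |= x >> 4;
      x |= x >> 8;
      x |= x >> 16;
      x |= x >> 32;
      return x;
  }

so the rejection loop keeps at most one bit more than the range needs.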
> Anyway, excuse me for heating up this discussion because of such a
> non-essential issue.
Well, I like to discuss things!
> I'll try to control myself and not pursue it further.
Yep. We have to compromise at some point. The majority opinion seems to be
that we want code simplicity more, so the bitmask it is. I've posted a
v10.
Thanks for the interesting discussion and arguments!
--
Fabien.