Hello Yura,

Given that 99.99% of cases will fall in the likely branch, the branch
predictor should eliminate the decision cost.

Hmmm. ISTM that a branch predictor would guess that "unknown < small"
is probably false, so a hint should be given that it is really true
here.

Why? Branch predictor is history based:

Hmmm. This means running the compiler with some special options, running the code on significant and representative data, then recompiling based on the collected branch stats, i.e. profile-guided optimization. This is not the usual way pg is built.

If it was usually true here, then it will probably be true this time as well: "unknown < small" is usually true, therefore the branch predictor will assume it is true.

I put `likely` in for the compiler: the compiler then lays out the `likely` path closer.

Yes, an explicit hint is needed.
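
For reference, PostgreSQL already provides such hints: src/include/c.h
defines likely()/unlikely() on top of GCC's __builtin_expect, roughly as:

  #if defined(__GNUC__) || defined(__INTEL_COMPILER)
  #define likely(x)    __builtin_expect((x) != 0, 1)
  #define unlikely(x)  __builtin_expect((x) != 0, 0)
  #else
  #define likely(x)    ((bool) (x))
  #define unlikely(x)  ((bool) (x))
  #endif

With the hint, gcc places the likely path on the straight-line
fall-through side, which also helps static prediction on the first
execution.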

And as Dean Rasheed correctly mentioned, the mask method will have a really bad pattern for the branch predictor if the range is not just below or equal to a power of two.

On average the bitmask is the better unbiased method, if the online
figures are to be trusted. Also, as already said, I do not really want
to add code complexity, especially to get lower average performance,
and especially with code like "threshold = -range % range", where
both variables are unsigned; I have a headache just looking at it :-)
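
For the record, that headache-inducing line comes from Lemire's unbiased
multiply method: since range is unsigned, -range wraps around to
2^64 - range, so "-range % range" computes (2^64 - range) % range, the
size of the biased low slice that must be rejected. A minimal sketch,
assuming a next_u64() PRNG call and 128-bit multiply support:

  /* Sketch of Lemire's unbiased multiply method, not the patch's code. */
  static uint64_t
  bounded_multiply(uint64_t range)    /* returns a value in [0, range) */
  {
      uint64_t    x = next_u64();     /* assumed 64-bit PRNG call */
      __uint128_t m = (__uint128_t) x * range;
      uint64_t    low = (uint64_t) m;

      if (low < range)
      {
          /* -range wraps to 2^64 - range: (2^64 - range) % range */
          uint64_t threshold = -range % range;

          while (low < threshold)
          {
              x = next_u64();
              m = (__uint128_t) x * range;
              low = (uint64_t) m;
          }
      }
      return (uint64_t) (m >> 64);
  }

The rejection loop runs with probability below range / 2^64, so it is
essentially never taken, but the code is indeed not obvious at first
sight.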

If you mention https://www.pcg-random.org/posts/bounded-rands.html then

Indeed, these are the figures I was referring to when saying that bitmask looks like the best method.

1. The first graphs are not made with Lemire's exact code.
 The latest code is from https://lemire.me/blog/2016/06/30/fast-random-shuffling/

Ok, other figures; however, there is no comparison with the mask method in this post, which mostly argues against division/modulo.

By the way, we have a 64-bit random value. If we use 44 bits of it for range <= (1<<20), then the bias will be less than 1/(2**24): each of the range possible outputs is produced by either floor(2^44/range) or ceil(2^44/range) of the 2^44 inputs, so the relative deviation is at most range/2^44 <= 1/2^24. Could we just ignore it (given it is not a crypto-strength random)?

That was my initial opinion, but Dean insists on an unbiased method. I agree with Dean that performance, as long as it is not too bad, does not matter that much, so I'm trying to keep the code simple as the main objective.

You do not seem ready to buy this argument. Note that even though my research is about compiler optimizations, I did buy it :-)

Given the overheads involved in pgbench, the performance impact of best vs worst case scenario is minimal:

  \set i random(0, 7) -- 8 values, good for mask: 4.515 Mtps
  \set i random(0, 8) -- 9 values, bad for mask: 4.151 Mtps

so the performance penalty is about 8%.
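
(For context, such figures can be obtained by timing a script that
contains only the \set line, along these lines; the script name and
options here are illustrative, not the actual run:

  $ echo '\set i random(0, 7)' > rand.sql
  $ pgbench -n -T 10 -f rand.sql

pgbench reports the number of script executions per second, hence the
Mtps figures.)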

  /* Sketch as proposed; the signature and declarations are assumed,
   * since the quoted excerpt started mid-function. */
  static uint64_t
  bounded_random(xoroshiro128ss_state *state, uint64_t range)
  {
      uint64_t val = xoroshiro128ss(state);
      uint64_t m;

      if ((range & (range - 1)) == 0)   /* handle all power-of-2 cases */
          return range != 0 ? val & (range - 1) : 0;
      if (likely(range < (1 << 20)))
          /*
           * While the multiply method is biased, the bias will be smaller
           * than 1/(1<<24) for such small ranges. Let's ignore it.
           */
          return ((val >> 20) * range) >> 44;
      /* Apple's mask method */
      m = mask_u64(range - 1);
      val &= m;
      while (val >= range)
          val = xoroshiro128ss(state) & m;
      return val;
  }
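
mask_u64 is not defined in the excerpt; presumably it is the usual
bit-smearing idiom that returns the smallest all-ones mask covering its
argument, something like:

  /* Smallest 2^k - 1 mask covering v; a guess at mask_u64's definition. */
  static uint64_t
  mask_u64(uint64_t v)
  {
      v |= v >> 1;
      v |= v >> 2;
      v |= v >> 4;
      v |= v >> 8;
      v |= v >> 16;
      v |= v >> 32;
      return v;
  }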

Hmmm. The code looks "simple" enough, but I do not think optimizing for 20-bit values is worth it, especially as the bitmask method seems the best anyway. We were discussing 32 bits before.

Anyway, excuse me for heating up this discussion over such a non-essential issue.

Well, I like to discuss things!

I'll try to control myself and not pursue it further.

Yep. We have to compromise at some point. The majority opinion seems to be that we want code simplicity more, so the bitmask it is. I've posted a v10.

Thanks for the interesting discussion and arguments!

--
Fabien.

