Hello Yura,
> I believe most "range" values are small, much smaller than UINT32_MAX.
> In this case, according to [1] fastest method is Lemire's one (I'd take
> original version from [2]) [...]
Yep.
I share your point that the range more often fits in 32 bits.
However, I'm not enthusiastic about combining two methods depending on
the range: the function looks complex enough without that, so I would
suggest not taking this option. Also, the decision step adds to the
average cost, which is undesirable. I would certainly select the unbiased
multiply method if we want a u32 range variant.
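
For illustration only, here is a minimal sketch of what a u32 variant
based on the unbiased multiply (Lemire) method could look like. It is not
taken from the patch: the names rng32_state, rng32_next and
bounded_rand_u32 are hypothetical, and the xorshift32 generator is just a
stand-in for whatever 32-bit source would actually be used. It assumes
range > 0 and a non-zero seed.

```c
#include <stdint.h>

/* Hypothetical 32-bit generator state; xorshift32 is only a placeholder. */
typedef struct
{
	uint32_t	s;				/* must be non-zero */
} rng32_state;

static uint32_t
rng32_next(rng32_state *st)
{
	uint32_t	x = st->s;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	st->s = x;
	return x;
}

/*
 * Return a value in [0, range) using the multiply-and-reject method.
 * Assumes range > 0.
 */
static uint32_t
bounded_rand_u32(rng32_state *st, uint32_t range)
{
	uint64_t	m = (uint64_t) rng32_next(st) * range;
	uint32_t	low = (uint32_t) m;

	/* Only when the low half is small can the draw be a biased one. */
	if (low < range)
	{
		/* (2^32 - range) % range, computed in 32-bit unsigned arithmetic */
		uint32_t	threshold = (0u - range) % range;

		while (low < threshold)
		{
			m = (uint64_t) rng32_next(st) * range;
			low = (uint32_t) m;
		}
	}

	/* The high 32 bits of the product are the result. */
	return (uint32_t) (m >> 32);
}
```

In the common case this costs one multiplication and one comparison; the
modulo is only computed when the low half of the product is below range,
which is rare for small ranges.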
--
Fabien.