On Tuesday, 3 July 2012 at 17:25:18 UTC, bearophile wrote:
ixid:
In any case, with large values of k the branch prediction will
be right almost all of the time. That explains why this form is
faster than modulo: modulo is fairly slow, while this is a
correctly predicted branch doing an addition, provided the
compiler doesn't make it branchless.
That seems the explanation.
The branchless version gives the same timing as the branched one.
Is there a way to force that line not to be optimized, so I can
compare it against the predicted-branch version?
I don't fully understand the question. Do you mean annotations
like the __builtin_expect of GCC?
Bye,
bearophile
If
uint iter_next = iter + 1 > k? 0 : iter + 1;
is getting optimized to
uint iter_next = (iter + 1) * !(iter + 1 > k);
or something like it by the compiler, then it would be nice to be
able to test the branched code without the rest of the program
losing its speed optimizations. As I said, for large k the branch
will almost always be predicted correctly, which makes me think
it would be faster than the branchless version.
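
For reference, here is a minimal sketch of the three forms being compared, to check that they compute the same value before timing them. The function names (nextBranched, nextBranchless, nextModulo) are made up for illustration, and the modulo form assumes the counter wraps over the range 0 .. k inclusive, which is what the ternary implies. It says nothing about which form the compiler actually emits; only the generated assembly can show that.

```d
import std.stdio : writeln;

// Branched form from the post: wrap to 0 once iter + 1 exceeds k.
uint nextBranched(uint iter, uint k)
{
    return iter + 1 > k ? 0 : iter + 1;
}

// Branchless form the optimizer is suspected of producing:
// multiply by 1 while in range, by 0 when wrapping.
uint nextBranchless(uint iter, uint k)
{
    return (iter + 1) * !(iter + 1 > k);
}

// Modulo form (assumed equivalent: the counter runs over 0 .. k inclusive).
uint nextModulo(uint iter, uint k)
{
    return (iter + 1) % (k + 1);
}

void main()
{
    // Exhaustively compare the three forms for small k.
    foreach (uint k; 0 .. 50)
    {
        foreach (uint iter; 0 .. k + 1)
        {
            immutable expected = nextBranched(iter, k);
            assert(nextBranchless(iter, k) == expected);
            assert(nextModulo(iter, k) == expected);
        }
    }
    writeln("all three forms agree");
}
```

With the three forms in separate functions, each can be timed on its own, though the optimizer may still turn the ternary into branchless code.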