On Tuesday, 3 July 2012 at 17:25:18 UTC, bearophile wrote:
ixid:

In any case, with large values of k the branch prediction will be right almost all of the time. That would explain why this form is faster than modulo: modulo is fairly slow, while this is a correctly predicted branch doing an addition (assuming the compiler doesn't make it branchless).

That seems the explanation.


The branchless version gives the same timing as the branched one. Is there a way to force that line not to be optimized, so that I can compare it against the predicted-branch version?

I don't fully understand the question. Do you mean annotations like the __builtin_expect of GCC?

Bye,
bearophile

If

uint iter_next = iter + 1 > k? 0 : iter + 1;

is getting optimized to

uint iter_next = (iter + 1) * !(iter + 1 > k);

or something like it by the compiler, then it would be nice to be able to test the branched code without the rest of the program losing its speed optimizations. As I said, for large k the branch will almost always be predicted correctly, which makes me think it would be faster than the branchless version.
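
For reference, here is a minimal sketch that puts the two forms above next to a modulo version and checks that they produce the same values. The function names are made up for illustration, and the modulo form (iter + 1) % (k + 1) is my assumption about what the modulo version looks like; it also assumes k < uint.max so that k + 1 does not wrap. This only checks equivalence. Whether the compiler actually emits a branch for the ternary has to be checked in the generated assembly.

```d
import std.stdio : writeln;

// Ternary form: wrap to 0 once iter + 1 exceeds k.
uint nextBranched(uint iter, uint k)
{
    return iter + 1 > k ? 0 : iter + 1;
}

// Multiply-by-condition form: the bool is promoted to 0 or 1.
uint nextBranchless(uint iter, uint k)
{
    return (iter + 1) * !(iter + 1 > k);
}

// Assumed modulo form; requires k < uint.max.
uint nextModulo(uint iter, uint k)
{
    return (iter + 1) % (k + 1);
}

void main()
{
    enum uint k = 1000;
    uint mismatches = 0;

    // Walk every valid position 0 .. k and compare the three results.
    foreach (uint iter; 0 .. k + 1)
    {
        immutable a = nextBranched(iter, k);
        if (a != nextBranchless(iter, k) || a != nextModulo(iter, k))
            ++mismatches;
    }
    writeln("mismatches: ", mismatches);
}
```

Keeping each form in its own function also makes it easier to find the corresponding instructions in a disassembly and see which ones ended up with a conditional jump.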
