ixid:

In any case, with large values of k the branch prediction will be right almost all of the time. That explains why this form is faster than modulo: modulo is fairly slow, while this is a correctly predicted branch doing an addition, if the compiler doesn't make it branchless.

That seems to be the explanation.
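
To make the comparison concrete, here is a minimal sketch in C of the two forms being discussed. The original code is not shown in this message, so the function names and the exact shape of the wrap-around are assumptions for illustration only.

```c
#include <stdint.h>

/* Hypothetical modulo form: one integer division (remainder) per step. */
static uint32_t next_mod(uint32_t i, uint32_t k) {
    return (i + 1) % k;
}

/* Hypothetical branch form: an addition plus a compare. The branch is
   taken only once every k steps, so for large k the predictor is right
   almost every time. */
static uint32_t next_branch(uint32_t i, uint32_t k) {
    ++i;
    if (i == k)
        i = 0;
    return i;
}
```

Both functions return the same value for any 0 <= i < k with k > 0. An optimizing compiler may turn the `if` into a conditional move, in which case no branch is left to predict.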


The branchless version gives the same timing as the branched one. Is there a way to force that line not to be optimized, so I can compare it with the predicted version?

I don't fully understand the question. Do you mean annotations like the __builtin_expect of GCC?
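
For reference, this is roughly what such an annotation looks like in C with GCC (Clang also accepts it). The `UNLIKELY` macro and the `next_hinted` function are hypothetical names used only for this sketch.

```c
#include <stdint.h>

/* __builtin_expect(expr, expected) tells the compiler which value expr
   usually has. The !! normalizes any nonzero value to 1. */
#define UNLIKELY(x) __builtin_expect(!!(x), 0)

/* Wrap-around increment with the reset marked as the rare case. */
static uint32_t next_hinted(uint32_t i, uint32_t k) {
    ++i;
    if (UNLIKELY(i == k))
        i = 0;
    return i;
}
```

Note that this is only a hint about the likely outcome. It affects code layout, but it does not guarantee that a real branch is emitted: the compiler may still choose a branchless conditional move.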

Bye,
bearophile
