https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88760

--- Comment #16 from Wilco <wilco at gcc dot gnu.org> ---
(In reply to rguent...@suse.de from comment #15)

> which is what I referred to for branch prediction.  Your & prompts me
> to a way to do something similar to Duff's device, turning the loop into a nest.
> 
>  head:
>    if (n == 0) exit;
>    <1 iteration>
>    if (n & 1)
>      n -= 1, goto head;
>    <1 iteration>
>    if (n & 2)
>      n -= 2, goto head;
>    <2 iterations>
>    if (n & 4)
>      n -= 4, goto head;
>    <4 iterations>
>    n -= 8, goto head;
> 
> the inner loop tests should become well-predicted quickly.
> 
> But as always - more branches make it more likely that one is
> mispredicted.  For a single non-unrolled loop you usually only
> get the actual exit mispredicted.

Yes, overlapping the branches for the tail loop and the main loop will
result in more mispredictions. And there are still 4 branches in an 8x
unrolled loop, which block optimizations and scheduling. So Duff's device is
always inefficient - the above loop is much faster like this:

if (n & 1)
 do 1 iteration
if (n & 2)
 do 2 iterations
if (n >= 4)
 do
  4 iterations
 while ((n -= 4) >= 4)
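
For illustration, here is a minimal C sketch of this remainder-first structure. The function name sum_unrolled and the array-summing loop body are hypothetical and not taken from the bug report. The sketch tracks an index rather than decrementing n, so the main loop runs only while at least four elements remain.

```c
#include <stddef.h>

/* Hypothetical example: sum n ints. The remainder (n % 4) is peeled
   off first with two straight-line tests, then a 4x unrolled main
   loop runs with a single backward branch. */
long sum_unrolled(const int *a, size_t n)
{
    long s = 0;
    size_t i = 0;

    if (n & 1) {                /* peel 1 iteration */
        s += a[i];
        i += 1;
    }
    if (n & 2) {                /* peel 2 iterations */
        s += a[i];
        s += a[i + 1];
        i += 2;
    }
    /* n - i is now a multiple of 4, so the unrolled body never
       reads past the end of the array. */
    if (i < n) {
        do {
            s += a[i];
            s += a[i + 1];
            s += a[i + 2];
            s += a[i + 3];
            i += 4;
        } while (i < n);
    }
    return s;
}
```

The two remainder tests execute once each and are not part of any loop, so the only branch inside the unrolled body is the loop-back test.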
