https://gcc.gnu.org/bugzilla/show_bug.cgi?id=42108

--- Comment #62 from Richard Biener <rguenth at gcc dot gnu.org> ---
Created attachment 34238
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34238&action=edit
frontend patch

So the issue is that the division is executed conditionally because it is
placed
after the loop entry check.  The frontend generates

      if (step > 0)
        {
         if (to < from)
           goto exit_label;
         countm1 = (to - from) / step;
        }
      else
        {
         if (to > from)
           goto exit_label;
         countm1 = (from - to) / -step;
        }

which ends up optimized to (as step is known to be positive here):

    if (to < from)
     {
       countm1 = (to - from) / step;
       ...loop...

if we instead generate

      if (step > 0)
        {
         countm1 = (to - from) / step;
         if (to < from)
           goto exit_label;
        }
      else
        {
         countm1 = (from - to) / -step;
         if (to > from)
           goto exit_label;
        }

the division can be hoisted out of the outer loop (at the cost of computing
countm1 even when the loop is not entered -- but that would have happened
if the desired transform happened anyway).

Note that if step > 0 cannot be optimized we still have the same issue,
so in theory we could generate

   if (step > 0)
     {
       tem1 = (to - from);
       tem2 = step;
     }
   else
     {
       tem1 = (from - to);
       tem2 = -step;
     }
   countm1 = tem1 / tem2;
   if (step > 0)
    {
      if (to < from)
        goto exit_label;
    }
   else
    {
      if (to > from)
        goto exit_label;
    }

but hoisting the division with conditional computed operands is a lot more
expensive (but possible for exactly two ops, and likely done).  The
runtime overhead from the above is one extra branch.

If we do that then unfortunately jump threading undoes that change creating
the first variant again.

Reply via email to