https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108840

            Bug ID: 108840
           Summary: AArch64 doesn't optimize away shift counter masking
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jakub at gcc dot gnu.org
  Target Milestone: ---

As mentioned in
https://gcc.gnu.org/pipermail/gcc-patches/2023-February/612214.html
the AArch64 back end doesn't optimize away the AND instructions masking the
shift count when there is more than one shift with the same count.  Consider
the following, compiled with -O2 -fno-tree-vectorize:
int
foo (int x, int y)
{
  return x << (y & 31);
}

void
bar (int x[3], int y)
{
  x[0] <<= (y & 31);
  x[1] <<= (y & 31);
  x[2] <<= (y & 31);
}

void
baz (int x[3], int y)
{
  y &= 31;
  x[0] <<= y;
  x[1] <<= y;
  x[2] <<= y;
}

void corge (int, int, int);

void
qux (int x, int y, int z, int n)
{
  n &= 31;
  corge (x << n, y << n, z >> n);
}
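
For reference, the masking is redundant at the instruction level because
AArch64's variable-shift instructions (LSLV and friends) already take the
count modulo the register width.  A minimal illustration with inline asm (the
lsl_masked helper below is hypothetical and aarch64-specific, not part of the
report):

static inline int
lsl_masked (int x, int y)
{
  int r;
  /* LSL (register), i.e. LSLV: the count is taken modulo 32 for
     32-bit operands, so no separate AND is needed.  */
  __asm__ ("lsl %w0, %w1, %w2" : "=r" (r) : "r" (x), "r" (y));
  return r;
}

An ideal compilation of bar, baz and qux would therefore use one such shift
per element, with no separate AND on the count.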

foo is optimized correctly: combine matches the shift together with the
masking.  In the remaining cases, however, the desirable combination is
rejected on cost grounds.  A shift with embedded masking of the count should
have the same rtx_cost as a plain shift, since under the hood it is just the
shift itself.
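
A minimal sketch of the kind of cost adjustment this suggests, using GCC's
internal RTL accessors; the surrounding target rtx_costs context is elided,
and the exact placement and variable names are assumptions, not the committed
fix:

/* Sketch only: in the target's rtx_costs handling for shifts, treat a
   count operand of the form (and COUNT (BITSIZE - 1)) as if it were
   just COUNT, because the AND is subsumed by the hardware shift.
   Assumes `x` is the shift rtx and `mode` is its mode.  */
rtx count = XEXP (x, 1);
if (GET_CODE (count) == AND
    && CONST_INT_P (XEXP (count, 1))
    && UINTVAL (XEXP (count, 1)) == GET_MODE_UNIT_BITSIZE (mode) - 1)
  /* The embedded mask is free: cost the pattern as a plain shift
     with the mask stripped.  */
  count = XEXP (count, 0);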
