https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105219

Tamar Christina <tnfchris at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |tnfchris at gcc dot gnu.org
   Last reconfirmed|                            |2022-4-11

--- Comment #1 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
Smaller reproducer, getting rid of the loop nest and simplifying the inner
condition:

int a;
char b[60];
short d[19];
long long f;

__attribute__ ((noinline, noipa))
void e(int g, int h, short k[19]) {
    for (signed j = 1; j < h + 14; j++) {
      int i = 0;
      b[j] = 1;
      i = 2;
      b[i * 14 + j] = 1;
      a = g ? k[j] : 0;
    }
}

__attribute__ ((optimize("O0")))
int main() {
  e(9, 1, d);
  for (long l = 0; l < 6; ++l)
    for (long m = 0; m < 4; ++m)
      f ^= b[l + m * 4];
  if (f)
    __builtin_abort ();
}

The -mtune=thunderx option doesn't seem to be needed either. I can reproduce
with just -O3 -march=armv8.2-a+sve -msve-vector-bits=128.

The write to `a` is significant not for the result itself, but because it
blocks the loop from being optimized into a memset.
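
For reference, the expected value of `f` can be worked out by hand. With
h = 1 the loop in e() runs for j = 1..14, so it should set b[1..14] and
b[29..42] to 1, and main() then XORs b[l + m * 4] over indices 0..17. Below is
a minimal scalar sketch of that calculation. The helper name `expected_f` and
the local array `model` are hypothetical and not part of the reproducer; the
sketch only models the stores and skips the write to `a`.

```c
/* Hypothetical helper, not part of the reproducer: models the stores that
   e(9, 1, d) should perform and the reduction that main() computes. */
long long expected_f(void) {
    char model[60] = {0};   /* stands in for the global b[] */
    int h = 1;              /* second argument passed to e() */

    /* Same bounds and indices as the loop in e(). */
    for (int j = 1; j < h + 14; j++) {
        model[j] = 1;
        model[2 * 14 + j] = 1;
    }

    /* Same reduction as in main(). */
    long long f = 0;
    for (long l = 0; l < 6; ++l)
        for (long m = 0; m < 4; ++m)
            f ^= model[l + m * 4];
    return f;
}
```

Of the 24 reads, 20 land on elements in b[1..14], so the XOR should come out
as 0 and a correctly compiled reproducer should not reach __builtin_abort ().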
