[Bug tree-optimization/109885] gcc does not generate movmskps and testps instructions (clang does)

2024-02-17 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109885

--- Comment #4 from Hongtao Liu ---
int sum() {
   int ret = 0;
   for (int i=0; i<8; ++i) ret +=(0==v[i]);
   return ret;
}

int sum2() {
   int ret = 0;
   auto m = v==0;
   for (int i=0; i<8; ++i) ret += m[i];
   return ret;
}

For sum, GCC tries to reduce a {0/1, 0/1, ...} vector; for sum2, it tries to
reduce a {0/-1, 0/-1, ...} vector. LLVM, however, tries to reduce a
{0/1, 0/1, ...} vector for both sum and sum2. Not sure which is correct?
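
To make the two reductions concrete, here is a minimal sketch, assuming `v` is an 8-lane vector of 32-bit ints declared with the GCC `vector_size` attribute (the comment does not show the declaration, so the `v8si` typedef and the function names below are illustrative only). As far as the GCC vector extension documentation goes, a vector comparison yields 0 for false and -1 for true in each lane, so the mask-based loop would count downwards:

```cpp
#include <cstdint>

// Assumption: the report's `v` has this shape; the real declaration is not
// shown in the comment above.
typedef std::int32_t v8si __attribute__((vector_size(32)));

// Like sum(): a scalar comparison per lane, so each match adds +1.
int count_zero_scalar(const v8si &v) {
    int ret = 0;
    for (int i = 0; i < 8; ++i) ret += (0 == v[i]);
    return ret;
}

// Like sum2(): the vector comparison runs first. Under the GCC vector
// extensions each true lane of the mask holds -1, so each match adds -1.
int count_zero_mask(const v8si &v) {
    int ret = 0;
    auto m = (v == 0);
    for (int i = 0; i < 8; ++i) ret += m[i];
    return ret;
}
```

With three zero lanes in the input, the first function would return 3 and the second -3 under those documented semantics, which is the {0/1} versus {0/-1} difference described above.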

[Bug tree-optimization/109885] gcc does not generate movmskps and testps instructions (clang does)

2024-02-10 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109885

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|target                      |tree-optimization
                 CC|                            |pinskia at gcc dot gnu.org
             Blocks|                            |53947

--- Comment #3 from Andrew Pinski ---
What is even funnier on the LLVM side is if we have:
```
void f(unsigned int * __restrict a, unsigned int * __restrict b)
{
  unsigned int t = 0;
  t += (a[0] == b[0]);
  t += (a[1] == b[1])<<1;
  t += (a[2] == b[2])<<2;
  t += (a[3] == b[3])<<3;
  *a = t;
}
```
LLVM can produce movmskps for x86_64, but for aarch64 it does a trick similar
to the one it did for the sum.
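
For reference, the movmskps form of that function can be sketched with SSE2 intrinsics. This is only an illustration of the pattern, not what either compiler emits verbatim; the names `eq_mask_scalar` and `eq_mask_simd` are made up here, and the functions return the mask instead of storing it through `a`. The scalar version uses `|=` where the original uses `+=`, which gives the same result because each term sets a different bit.

```cpp
#include <cstdint>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

// Scalar reference: bit i of the result is set when a[i] == b[i].
unsigned eq_mask_scalar(const unsigned *a, const unsigned *b) {
    unsigned t = 0;
    for (int i = 0; i < 4; ++i)
        t |= static_cast<unsigned>(a[i] == b[i]) << i;
    return t;
}

// SIMD sketch: compare four 32-bit lanes at once, then collect the sign bit
// of each lane into the low four bits of an integer (the movmskps step).
// Falls back to the scalar version when SSE2 is not available.
unsigned eq_mask_simd(const unsigned *a, const unsigned *b) {
#if defined(__SSE2__)
    __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i *>(a));
    __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i *>(b));
    __m128i eq = _mm_cmpeq_epi32(va, vb);  // each lane: all ones or zero
    return static_cast<unsigned>(_mm_movemask_ps(_mm_castsi128_ps(eq)));
#else
    return eq_mask_scalar(a, b);
#endif
}
```

Both functions should agree on any input; for example, when only lanes 0 and 2 match, each returns 5.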

Note GCC does not handle reductions that well for SLP either.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations