[Bug tree-optimization/109885] gcc does not generate movmskps and testps instructions (clang does)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109885

--- Comment #4 from Hongtao Liu ---
int sum() {
    int ret = 0;
    for (int i=0; i<8; ++i) ret +=(0==v[i]);
    return ret;
}

int sum2() {
    int ret = 0;
    auto m = v==0;
    for (int i=0; i<8; ++i) ret += m[i];
    return ret;
}

For sum, gcc tries to reduce for an {0/1, 0/1, ...} vector, for sum2, it tries
to reduce {0/-1,0/-1,...} vector.
But LLVM tries to reduce {0/1, 0/1, ... } vector for both sum and sum2. Not
sure which is correct?
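To make the 0/1 versus 0/-1 distinction in Comment #4 concrete, here is a minimal sketch. The comment does not show the declaration of `v`; the `v8si` typedef and the sample lane values below are assumptions chosen for illustration. The GCC vector-extension documentation specifies that a vector comparison yields 0 for false and -1 for true in each lane, so under that assumption the two functions are expected to return results of opposite sign. This only illustrates the source-level semantics, not what either vectorizer does internally.

```cpp
// Assumption: v is an 8-lane int vector built with the GCC/Clang
// vector extension; the original comment does not show its declaration.
typedef int v8si __attribute__((vector_size(32)));

// Hypothetical sample data with exactly three zero lanes.
v8si v = {0, 5, 0, 7, 0, 1, 2, 3};

// Scalar comparison per lane: each (0 == v[i]) is a bool, promoted to 0 or 1.
int sum() {
    int ret = 0;
    for (int i = 0; i < 8; ++i) ret += (0 == v[i]);
    return ret;
}

// Vector comparison: each lane of m is 0 (false) or -1 (true, all bits set).
int sum2() {
    int ret = 0;
    auto m = v == 0;
    for (int i = 0; i < 8; ++i) ret += m[i];
    return ret;
}
```

With the sample data above, `sum()` counts the zero lanes as a positive number, while `sum2()` accumulates -1 per zero lane, so `sum() == -sum2()`.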
[Bug tree-optimization/109885] gcc does not generate movmskps and testps instructions (clang does)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109885

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|target                      |tree-optimization
                 CC|                            |pinskia at gcc dot gnu.org
             Blocks|                            |53947

--- Comment #3 from Andrew Pinski ---
What is even funnier on the LLVM side is if we have:
```
void f(unsigned int * __restrict a, unsigned int * __restrict b)
{
  unsigned int t = 0;
  t += (a[0] == b[0]);
  t += (a[1] == b[1])<<1;
  t += (a[2] == b[2])<<2;
  t += (a[3] == b[3])<<3;
  *a = t;
}
```
LLVM can produce movmskps for x86_64 but then does do a similar trick that it
did for the sum for aarch64.

Note GCC does not handle reductions that well for SLP either.

Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations
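As a rough illustration of why the bit-packing function in Comment #3 maps onto a single mask-extraction instruction, the sketch below computes the same 4-bit value two ways: lane by lane, and from one vector comparison whose 0/-1 lanes are each reduced to a single bit. The names `mask_scalar`, `mask_vector`, and `v4su` are hypothetical, and both functions return the value instead of storing it through `a` as the original `f` does. Whether a given compiler lowers the vector form to movmskps is not guaranteed.

```cpp
#include <cstring>

// Assumption: 4-lane unsigned vector via the GCC/Clang vector extension.
typedef unsigned int v4su __attribute__((vector_size(16)));

// Lane-by-lane form, equivalent to the body of f() in Comment #3.
unsigned int mask_scalar(const unsigned int *a, const unsigned int *b) {
    unsigned int t = 0;
    t += (a[0] == b[0]);
    t += (a[1] == b[1]) << 1;
    t += (a[2] == b[2]) << 2;
    t += (a[3] == b[3]) << 3;
    return t;
}

// Vector form: one comparison yields 0/-1 per lane; masking lane i with
// (1 << i) keeps exactly the bit that movmskps would place at position i.
unsigned int mask_vector(const unsigned int *a, const unsigned int *b) {
    v4su va, vb;
    std::memcpy(&va, a, sizeof va);  // unaligned-safe loads
    std::memcpy(&vb, b, sizeof vb);
    auto eq = (va == vb);            // signed lanes: 0 or -1
    return static_cast<unsigned int>(
        (eq[0] & 1) + (eq[1] & 2) + (eq[2] & 4) + (eq[3] & 8));
}
```

For any pair of 4-element inputs the two functions should agree; for example, if only lanes 0 and 2 match, both return 5.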