https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97770
            Bug ID: 97770
           Summary: Missing vectorization for vpopcnt
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: crazylht at gmail dot com
  Target Milestone: ---

cat test.c
---
void foo(int* __restrict dest, int* src, int n)
{
  for (int i = 0; i != 8; i++)
    dest[i] = __builtin_popcount (src[i]);
}
---

gcc -O3 -march=icelake-server -S -fopt-info-all

Inlined 0 calls, eliminated 0 functions
test.c:4:3: missed: couldn't vectorize loop
test.c:5:15: missed: not vectorized: relevant stmt not supported: _7 = __builtin_popcount (_5);
test.c:2:1: note: vectorized 0 loops in function.
test.c:4:3: note: ***** Analysis failed with vector mode VOID
test.c:4:3: note: ***** Analysis failed with vector mode V8SI
test.c:4:3: note: ***** Skipping vector mode V32QI, which would repeat the analysis for V8SI
test.c:6:1: note: ***** Analysis failed with vector mode VOID

This loop could be vectorized by ICC and Clang:

foo(int*, int*, int):
        vpopcntd   ymm0, YMMWORD PTR [rsi]    #5.15
        vmovdqu    YMMWORD PTR [rdi], ymm0    #5.5
        vzeroupper                            #6.1
        ret
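For reference, the operation the vectorizer is missing here is a per-lane population count over eight 32-bit ints, which is exactly what the single `vpopcntd ymm` above computes. The sketch below writes that out by hand. It assumes the GCC/Clang vector extension (`vector_size`). `foo_vec` is a hypothetical name. The bit-parallel (SWAR) popcount is a swapped-in technique used only to show the per-lane semantics. It is not what GCC is expected to emit when AVX512_VPOPCNTDQ is available.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Eight 32-bit lanes: one 256-bit (ymm-sized) vector. Uses the GCC/Clang
   vector extension; a scalar operand in a binary operation is broadcast
   to every lane. */
typedef uint32_t v8su __attribute__((vector_size(32)));

static_assert(sizeof(v8su) == 8 * sizeof(int), "v8su must hold exactly 8 ints");

/* Hypothetical hand-vectorized equivalent of foo() from the report.
   Computes, for all 8 lanes at once, the popcount that vpopcntd produces,
   using only shifts, masks, adds and one multiply (SWAR popcount). */
void foo_vec(int *__restrict dest, const int *src)
{
    v8su x;
    memcpy(&x, src, sizeof x);                        /* load src[0..7]     */
    x = x - ((x >> 1) & 0x55555555u);                 /* 2-bit lane counts  */
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u); /* 4-bit lane counts  */
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;                 /* 8-bit lane counts  */
    x = (x * 0x01010101u) >> 24;                      /* sum the 4 bytes    */
    memcpy(dest, &x, sizeof x);                       /* store dest[0..7]   */
}
```

The SWAR sequence produces the same results as `__builtin_popcount` on each element. With vpopcntd, all five arithmetic steps collapse into one instruction. That gap is why the missed vectorization matters on `-march=icelake-server`. How these generic vector operations are lowered (one ymm operation or several narrower ones) depends on the target flags.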