https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102473
--- Comment #2 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Richard Biener from comment #1)
> Looks like at least on Zen movs[hl]dup is on the integer domain so we'll see
> a domain crossing penalty here(?). But since this is a generic arch/tuning
> regression the SSE2 code path should be what matters - on the committed
> testcase I see
>
> foo:
> .LFB572:
>         .cfi_startproc
>         pxor    %xmm0, %xmm0
>         addss   (%rdi), %xmm0
>         addss   4(%rdi), %xmm0
>         addss   8(%rdi), %xmm0
>         addss   12(%rdi), %xmm0
>         ret
>
> where it seems that the vectorizer doesn't pick up the reduction pattern.
>

I guess you're using -O3; -ffast-math is needed for the v4sf reduction:
https://godbolt.org/z/sjf4Pncna

The original code also has movhlps.

BTW: I can't reproduce the regression on CLX/Coffee Lake for a one-copy run.
The options are below:

521.wrf_r: "gfortran -m64" (in FC)
           "gcc -m64" (in CC)
           "gfortran -m64" (in LD)
           "-fconvert=big-endian -std=legacy -fno-inline-arg-packing" (in FPORTABILITY)
           "-mtune=generic -Ofast -mfpmath=sse -fno-associative-math" (in OPTIMIZE)
           "-fno-stack-arrays" (in EXTRA_OPTIMIZE)
           "-Wl,-z,muldefs" (in EXTRA_LDFLAGS)

> /home/rguenther/src/gcc2/gcc/testsuite/gcc.target/i386/sse2-pr101059.c:20:21:
> note: vect_is_simple_use: vectype vector(4) float
> /home/rguenther/src/gcc2/gcc/testsuite/gcc.target/i386/sse2-pr101059.c:20:21:
> missed: reduc op not supported by target.
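
For reference, a minimal sketch of the kind of loop that would produce the scalar addss sequence quoted above. The name foo matches the assembly label, but the body is an assumption inferred from the assembly and is not copied from sse2-pr101059.c. Without -ffast-math (or at least -fassociative-math) GCC has to add the four floats in source order, so the loop can only become a vector reduction if the target supports an in-order reduction, which would be consistent with the "reduc op not supported by target" note.

```c
/* Hypothetical reduction, inferred from the quoted assembly;
   not the actual contents of sse2-pr101059.c.  */
float
foo (const float *p)
{
  float sum = 0.0f;              /* corresponds to pxor %xmm0, %xmm0 */
  for (int i = 0; i != 4; i++)
    sum += p[i];                 /* one addss per element when not vectorized */
  return sum;
}
```

Comparing the output of -O3 against -O3 -ffast-math for such a function should show whether the v4sf reduction is picked up once reassociation is allowed.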