https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94077
Kewen Lin <linkw at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2020-08-12
     Ever confirmed|0                           |1
                 CC|                            |linkw at gcc dot gnu.org
             Status|UNCONFIRMED                 |ASSIGNED

--- Comment #1 from Kewen Lin <linkw at gcc dot gnu.org> ---
This issue only exists on gcc8 and gcc9; it is gone with gcc10 and trunk.

The main difference is as follows. With gcc8/gcc9, the cost model says vectorization is not profitable because of the high cost of the realigned vector loads/stores in the vectorized loop body:

  gcc/testsuite/gcc.dg/gomp/pr82374.c:27:3: note: Cost model analysis:
    Vector inside of loop cost: 32
    Vector prologue cost: 6
    Vector epilogue cost: 0
    Scalar iteration cost: 4
    Scalar outside cost: 0
    Vector outside cost: 6
    prologue iterations: 0
    epilogue iterations: 0
  gcc/testsuite/gcc.dg/gomp/pr82374.c:27:3: note: cost model: the vector iteration cost = 32 divided by the scalar iteration cost = 4 is greater or equal to the vectorization factor = 4.
  gcc/testsuite/gcc.dg/gomp/pr82374.c:27:3: note: not vectorized: vectorization not profitable.
  gcc/testsuite/gcc.dg/gomp/pr82374.c:27:3: note: not vectorized: vector version will never be profitable.
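As a rough illustration of why the gcc8/gcc9 dump concludes "will never be profitable": the message compares the cost of one vector iteration against VF scalar iterations. The sketch below restates that comparison as a standalone C function. The function name is hypothetical and this is a simplification of the dump message, not the vectorizer's actual code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: simplified form of the check reported in the dump.
   If one vector iteration costs at least as much as the VF scalar
   iterations it replaces, the vector loop can never win, no matter how
   many iterations the loop runs.  Not GCC's actual implementation.  */
static bool vector_never_profitable (int vec_inside_cost,
                                     int scalar_iter_cost, int vf)
{
  assert (scalar_iter_cost > 0 && vf > 0);
  /* "vector iteration cost / scalar iteration cost >= VF", rewritten
     as a multiplication to avoid integer-division truncation.  */
  return vec_inside_cost >= scalar_iter_cost * vf;
}
```

With the gcc8/gcc9 numbers, vector_never_profitable (32, 4, 4) returns true, because 32 >= 4 * 4 = 16: one vector iteration, which covers four scalar iterations, costs twice as much as those four scalar iterations.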
With gcc10 and trunk, by contrast, the dump looks like this:

  gcc/testsuite/gcc.dg/gomp/pr82374.c:27:3: note: Cost model analysis:
    Vector inside of loop cost: 6
    Vector prologue cost: 0
    Vector epilogue cost: 0
    Scalar iteration cost: 6
    Scalar outside cost: 0
    Vector outside cost: 0
    prologue iterations: 0
    epilogue iterations: 0
    Calculated minimum iters for profitability: 0
  gcc/testsuite/gcc.dg/gomp/pr82374.c:27:3: note: Runtime profitability threshold = 4
  gcc/testsuite/gcc.dg/gomp/pr82374.c:27:3: note: Static estimate profitability threshold = 4

Tracing back, I found the difference comes from the alignment handling of the load:

  gcc8/gcc9:   can't force alignment of ref: a[i_12]
  gcc10/trunk: force alignment of a[i_12]

I guess it's not a good idea to backport whatever patch made the alignment get forced (probably risky?). Instead, I think we can either:

  - append -mefficient-unaligned-vsx together with -mvsx, to ensure we can use unaligned vector loads/stores, or
  - change the target requirement to powerpc_vsx_ok && vect_hw_misalign.

Both still meet the original purpose of the test.

Hi @Jakub, what do you think of this?
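For concreteness, here is roughly how the two alternatives could be written as DejaGnu directives. This is a hypothetical reduced testcase, not the actual contents of pr82374.c: the copy loop, the array names, N and the dg-options line are assumptions. Both alternatives appear together only for comparison; a real patch would pick one of them.

```c
/* Hypothetical reduced testcase; NOT the real pr82374.c.  */
/* { dg-do compile } */
/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details" } */

/* Alternative 1: request efficient unaligned VSX accesses explicitly,
   so the vectorizer does not need realigned loads/stores.  */
/* { dg-additional-options "-mvsx -mefficient-unaligned-vsx" { target powerpc_vsx_ok } } */

/* Alternative 2: keep plain -mvsx, but only expect vectorization on
   targets that support misaligned vector accesses in hardware.  */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { powerpc_vsx_ok && vect_hw_misalign } } } } */

#define N 1024

float a[N];
float b[N];

/* Assumed stand-in for the loop at pr82374.c:27.  */
void copy_loop (void)
{
  for (int i = 0; i < N; i++)
    b[i] = a[i];
}
```

The first alternative changes what the compiler is allowed to emit; the second leaves the options alone and narrows where the scan is expected to pass.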