https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94077

Kewen Lin <linkw at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2020-08-12
     Ever confirmed|0                           |1
                 CC|                            |linkw at gcc dot gnu.org
             Status|UNCONFIRMED                 |ASSIGNED

--- Comment #1 from Kewen Lin <linkw at gcc dot gnu.org> ---
This issue exists only on gcc8 and gcc9; it's gone with gcc10 and trunk.

The main difference is listed below:

With gcc8/gcc9, the cost model says vectorization is not profitable because of
the high cost of the realigned vector load/store in the vectorized loop body:

gcc/testsuite/gcc.dg/gomp/pr82374.c:27:3: note: Cost model analysis:
  Vector inside of loop cost: 32
  Vector prologue cost: 6
  Vector epilogue cost: 0
  Scalar iteration cost: 4
  Scalar outside cost: 0
  Vector outside cost: 6
  prologue iterations: 0
  epilogue iterations: 0
gcc/testsuite/gcc.dg/gomp/pr82374.c:27:3: note: cost model: the vector
iteration cost = 32 divided by the scalar iteration cost = 4 is greater or
equal to the vectorization factor = 4.
gcc/testsuite/gcc.dg/gomp/pr82374.c:27:3: note: not vectorized: vectorization
not profitable.
gcc/testsuite/gcc.dg/gomp/pr82374.c:27:3: note: not vectorized: vector version
will never be profitable.


With gcc10 and trunk, the dump instead looks like:

gcc/testsuite/gcc.dg/gomp/pr82374.c:27:3: note:  Cost model analysis:
  Vector inside of loop cost: 6
  Vector prologue cost: 0
  Vector epilogue cost: 0
  Scalar iteration cost: 6
  Scalar outside cost: 0
  Vector outside cost: 0
  prologue iterations: 0
  epilogue iterations: 0
  Calculated minimum iters for profitability: 0
gcc/testsuite/gcc.dg/gomp/pr82374.c:27:3: note:    Runtime profitability
threshold = 4
gcc/testsuite/gcc.dg/gomp/pr82374.c:27:3: note:    Static estimate
profitability threshold = 4
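
To make the arithmetic behind the two dumps explicit, here is a minimal sketch
of the check that the gcc8/gcc9 note describes. The helper name
never_profitable is hypothetical and this is not the vectorizer's actual code;
it only restates "vector iteration cost divided by scalar iteration cost is
greater or equal to the vectorization factor" without the division. The gcc10
dump does not print the vectorization factor, so using 4 there is an
assumption.

```c
#include <stdbool.h>

/* Hypothetical helper: returns true when the dump's rejection condition
   holds, i.e. vec_inside_cost / scalar_iter_cost >= vf.  Rewritten as a
   multiplication so that integer division does not round the result.  */
static bool
never_profitable (int vec_inside_cost, int scalar_iter_cost, int vf)
{
  return scalar_iter_cost * vf <= vec_inside_cost;
}
```

With the gcc8/gcc9 numbers, never_profitable (32, 4, 4) is true because
4 * 4 = 16 does not exceed 32. With the gcc10/trunk numbers (and vf assumed
to be 4), never_profitable (6, 6, 4) is false because 6 * 4 = 24 exceeds 6.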

By tracing back, I noticed the difference comes from:

gcc8/gcc9:
  can't force alignment of ref: a[i_12]

gcc10/trunk:
  force alignment of a[i_12]

I guess it's not a good idea to backport a patch just to get the alignment
forced (probably risky?). Instead, I think we can either append the option
-mefficient-unaligned-vsx together with -mvsx, to ensure we can use unaligned
vector load/store, or change the target requirement to powerpc_vsx_ok &&
vect_hw_misalign. Both meet the original testing purpose.
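
Roughly, the two alternatives might look like the directives below. The
directive lines currently in pr82374.c are not quoted in this comment, so the
exact form (dg-additional-options and the selector braces) is an assumption,
and only one of the two would be applied:

```c
/* Alternative 1 (assumed form): also request efficient unaligned VSX
   access wherever -mvsx is usable.  */
/* { dg-additional-options "-mvsx -mefficient-unaligned-vsx" { target powerpc_vsx_ok } } */

/* Alternative 2 (assumed form): add -mvsx only on targets that also
   support misaligned vector access in hardware.  */
/* { dg-additional-options "-mvsx" { target { powerpc_vsx_ok && vect_hw_misalign } } } */
```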

Hi @Jakub, what do you think of this?
