[Bug target/80834] PowerPC gcc -mcpu=power9 seems to turn off vectorization that -mcpu=power8 enables

helijia at gcc dot gnu.org Sun, 18 Aug 2019 19:27:35 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80834


Li Jia He <helijia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |helijia at gcc dot gnu.org

--- Comment #4 from Li Jia He <helijia at gcc dot gnu.org> ---
This issue may be simpler than described.  It appears to depend mainly on the
cost values returned by rs6000_builtin_vectorization_cost, which feed the
vect-cost-model.  The loop is already vectorized on the current trunk
(subversion id 274560).

First, take the revision that Mike mentioned (subversion id 248266) and compile
with:
```
-mcpu=power9 -O3 -ffast-math -fdump-tree-vect-details-all
-fdump-tree-slp-details-all
```
We can see the following vect-cost-model analysis:
```
m_amatvec.c:114:5: note: density 96%, cost 87 exceeds threshold, penalizing
loop body cost by 10%m_amatvec.c:114:5: note: Cost model analysis:
  Vector inside of loop cost: 92
  Vector prologue cost: 5
  Vector epilogue cost: 36
  Scalar iteration cost: 36
  Scalar outside cost: 1
  Vector outside cost: 41
  prologue iterations: 0
  epilogue iterations: 1
m_amatvec.c:114:5: note: cost model: the vector iteration cost = 92 divided by
the scalar iteration cost = 36 is greater or equal to the vectorization factor
= 2.
m_amatvec.c:114:5: note: not vectorized: vectorization not profitable.
m_amatvec.c:114:5: note: not vectorized: vector version will never be
profitable.
```
Here 'Vector inside of loop cost' is 92.  With integer division 92 / 36 = 2,
which is >= the vectorization factor of 2, so the vect-cost-model concludes
that the vector version will never be profitable.

Next, take the current trunk (subversion id 274560) and compile with the same
options:
```
-mcpu=power9 -O3 -ffast-math -fdump-tree-vect-details-all
-fdump-tree-slp-details-all
```
We can see the following vect-cost-model analysis:
```
m_amatvec.c:114:5: note:  Cost model analysis:
  Vector inside of loop cost: 60
  Vector prologue cost: 5
  Vector epilogue cost: 36
  Scalar iteration cost: 36
  Scalar outside cost: 1
  Vector outside cost: 41
  prologue iterations: 0
  epilogue iterations: 1
  Calculated minimum iters for profitability: 2
m_amatvec.c:114:5: note:    Runtime profitability threshold = 2
m_amatvec.c:114:5: note:    Static estimate profitability threshold = 2
```
Now 'Vector inside of loop cost' is 60.  With integer division 60 / 36 = 1,
which is < 2, so the vect-cost-model considers vectorization profitable.
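
The comparison behind the two dumps can be sketched as below.  This is only a
simplified illustration of the check as the dump message words it, not the
vectorizer's actual code; the function and parameter names are made up for the
example.

```c
#include <stdbool.h>

/* Simplified sketch of the "never profitable" test quoted in the dump.
   The function and parameter names are illustrative, not GCC's own.  */
static bool
vector_never_profitable (int vec_inside_cost, int scalar_iter_cost, int vf)
{
  /* Integer division truncates: 92 / 36 == 2 and 60 / 36 == 1.  */
  return vec_inside_cost / scalar_iter_cost >= vf;
}
```

With the numbers from the dumps, vector_never_profitable (92, 36, 2) is true
for revision 248266 and vector_never_profitable (60, 36, 2) is false for trunk.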

The change in 'Vector inside of loop cost' (92 - 60 = 32) consists of 2 parts:
  (1) The unaligned_load cost is reduced by ((3-1)*12)=24.
  (2) The rs6000_density_test penalty of 8 is no longer applied.

The unaligned_load part was fixed by the following patch.
```
commit 01cabe21e4ecae1e9c53fe12d7c0aa654143a3d2
Author: pthaugen <pthaugen@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Fri Oct 13 16:05:53 2017 +0000

            * config/rs6000/rs6000.c (rs6000_builtin_vectorization_cost): Remove
            TARGET_P9_VECTOR code for unaligned_load case.

    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@253731 138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index fefac6e0c95..00be94fe349 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,8 @@
+2017-10-13  Pat Haugen  <pthau...@us.ibm.com>
+
+       * config/rs6000/rs6000.c (rs6000_builtin_vectorization_cost): Remove
+       TARGET_P9_VECTOR code for unaligned_load case.
+
 2017-10-13  Jan Hubicka  <hubi...@ucw.cz>

        * cfghooks.c (verify_flow_info): Check that edge probabilities are
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index e6e254ac041..b08cd316e68 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -5419,9 +5419,6 @@ rs6000_builtin_vectorization_cost (enum vect_cost_for_stmt type_of_cost,
         return 3;

       case unaligned_load:
-       if (TARGET_P9_VECTOR)
-         return 3;
-
        if (TARGET_EFFICIENT_UNALIGNED_VSX)
          return 1;

```
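
The effect of this hunk on the loop body cost can be sketched as follows.  This
is a minimal model, not the rs6000.c code: the struct and function names are
hypothetical stand-ins for the target macros, the path that the diff does not
show is left unmodelled, and the count of 12 statements is taken from the
((3-1)*12) figure above.  It assumes TARGET_EFFICIENT_UNALIGNED_VSX holds for
the -mcpu=power9 build, which the (3-1) figure implies.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for TARGET_P9_VECTOR and
   TARGET_EFFICIENT_UNALIGNED_VSX.  */
struct target_flags
{
  bool p9_vector;
  bool efficient_unaligned_vsx;
};

/* unaligned_load cost before the patch: the P9 check comes first.  */
static int
unaligned_load_cost_before (const struct target_flags *t)
{
  if (t->p9_vector)
    return 3;
  if (t->efficient_unaligned_vsx)
    return 1;
  return -1; /* Remaining cases are not shown in the diff; not modelled.  */
}

/* unaligned_load cost after the patch: the P9 check is gone.  */
static int
unaligned_load_cost_after (const struct target_flags *t)
{
  if (t->efficient_unaligned_vsx)
    return 1;
  return -1; /* Remaining cases are not shown in the diff; not modelled.  */
}

/* Reduction in loop body cost for COUNT affected statements.  */
static int
body_cost_reduction (const struct target_flags *t, int count)
{
  return (unaligned_load_cost_before (t) - unaligned_load_cost_after (t))
         * count;
}
```

For a target with both flags set, body_cost_reduction with a count of 12 gives
(3 - 1) * 12 = 24.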
The change in the rs6000_density_test part can be analysed as follows.  As the
code below shows, the density penalty **depends on** vec_cost.
```
  if (density_pct > DENSITY_PCT_THRESHOLD
      && vec_cost + not_vec_cost > DENSITY_SIZE_THRESHOLD)
    {
      data->cost[vect_body] = vec_cost * (100 + DENSITY_PENALTY) / 100;
      if (dump_enabled_p ())
        dump_printf_loc (MSG_NOTE, vect_location,
                         "density %d%%, cost %d exceeds threshold, penalizing "
                         "loop body cost by %d%%", density_pct,
                         vec_cost + not_vec_cost, DENSITY_PENALTY);
    }
```
With commit 253731 applied, vec_cost is reduced by 24 as noted above, so
`vec_cost + not_vec_cost` is no longer greater than DENSITY_SIZE_THRESHOLD and
the penalty is not applied.  (not_vec_cost can be calculated as 3 from the
first dump.)
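
The arithmetic can be checked with a small sketch of the density test.  The
three threshold values and the density_pct formula are assumptions here, since
the quoted snippet does not show them; only the 10% penalty is confirmed by the
dump.  The vec_cost of 84 is inferred from the first dump (87 - 3), and the
helper names are illustrative.

```c
/* Assumed values; only the 10% penalty is confirmed by the dump above.  */
enum
{
  DENSITY_PCT_THRESHOLD = 85,
  DENSITY_SIZE_THRESHOLD = 70,
  DENSITY_PENALTY = 10
};

/* Assumed formula: share of vector statements in the loop body cost.  */
static int
density_pct (int vec_cost, int not_vec_cost)
{
  return vec_cost * 100 / (vec_cost + not_vec_cost);
}

/* Loop body cost after the (possible) density penalty.  */
static int
penalized_body_cost (int vec_cost, int not_vec_cost)
{
  if (density_pct (vec_cost, not_vec_cost) > DENSITY_PCT_THRESHOLD
      && vec_cost + not_vec_cost > DENSITY_SIZE_THRESHOLD)
    return vec_cost * (100 + DENSITY_PENALTY) / 100;
  return vec_cost;
}
```

Before the patch, vec_cost = 84 and not_vec_cost = 3 give a density of 96% and
a total of 87, so the body cost becomes 84 * 110 / 100 = 92.  After the patch,
vec_cost = 60 gives a total of 63, which under the assumed size threshold is
not penalized, so the body cost stays at 60.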

By the way, the -fvect-cost-model=unlimited option is also relevant here.  With
the 'unlimited' model the vectorized code-path is assumed to be profitable,
while with the 'dynamic' model a runtime check guards the vectorized code-path
and enables it only for iteration counts that will likely execute faster than
the original scalar loop.

Therefore, this issue has been resolved on the trunk.
