https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80834

Li Jia He <helijia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |helijia at gcc dot gnu.org

--- Comment #4 from Li Jia He <helijia at gcc dot gnu.org> ---
This question may not be as complicated as described.  May only have a more
important relationship with the setting of the vect-cost-model value
(rs6000_builtin_vectorization_cost).  And it has been vectorized on the current
trunk(subversion id 274560).

If we use the code that mike said(subversion id 248266), and compile option is 
```
-mcpu=power9 -O3 -ffast-math -fdump-tree-vect-details-all
-fdump-tree-slp-details-all
```
We can see the following analysis of vect-cost-model
```
m_amatvec.c:114:5: note: density 96%, cost 87 exceeds threshold, penalizing
loop body cost by 10%m_amatvec.c:114:5: note: Cost model analysis:
  Vector inside of loop cost: 92
  Vector prologue cost: 5
  Vector epilogue cost: 36
  Scalar iteration cost: 36
  Scalar outside cost: 1
  Vector outside cost: 41
  prologue iterations: 0
  epilogue iterations: 1
m_amatvec.c:114:5: note: cost model: the vector iteration cost = 92 divided by
the scalar iteration cost = 36 is greater or equal to the vectorization factor
= 2.
m_amatvec.c:114:5: note: not vectorized: vectorization not profitable.
m_amatvec.c:114:5: note: not vectorized: vector version will never be
profitable.
```
We can see that the value of ‘Vector inside of loop cost’ is 92, however (92 /
36 = 2) >= 2, which causes vect-cost-model to think that vector version will
never be profitable.

If we use the current trunk code(subversion id 274560), and compile option is 
```
-mcpu=power9 -O3 -ffast-math -fdump-tree-vect-details-all
-fdump-tree-slp-details-all
```
We can see the following analysis of vect-cost-model
```
m_amatvec.c:114:5: note:  Cost model analysis:
  Vector inside of loop cost: 60
  Vector prologue cost: 5
  Vector epilogue cost: 36
  Scalar iteration cost: 36
  Scalar outside cost: 1
  Vector outside cost: 41
  prologue iterations: 0
  epilogue iterations: 1
  Calculated minimum iters for profitability: 2
m_amatvec.c:114:5: note:    Runtime profitability threshold = 2
m_amatvec.c:114:5: note:    Static estimate profitability threshold = 2
```
At this point, we can see that the value of 'Vector inside of loop cost' is 60.
At this time (60 / 36 = 1) < 2, we think that vectorization can be profitable
at this time.

‘Vector inside of loop cost’ value change consists of 2 parts
  (1) The value of unaligned_store is reduced by ((3-1)*12)=24.
  (2) rs6000_density_test value is reduced by 8.

The change in the unaligned_store partial value fixed by the following patch.
```
commit 01cabe21e4ecae1e9c53fe12d7c0aa654143a3d2
Author: pthaugen <pthaugen@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Fri Oct 13 16:05:53 2017 +0000

            * config/rs6000/rs6000.c (rs6000_builtin_vectorization_cost):
Remove
            TARGET_P9_VECTOR code for unaligned_load case.

    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@253731
138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index fefac6e0c95..00be94fe349 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,8 @@
+2017-10-13  Pat Haugen  <pthau...@us.ibm.com>
+
+       * config/rs6000/rs6000.c (rs6000_builtin_vectorization_cost): Remove
+       TARGET_P9_VECTOR code for unaligned_load case.
+
 2017-10-13  Jan Hubicka  <hubi...@ucw.cz>

        * cfghooks.c (verify_flow_info): Check that edge probabilities are
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index e6e254ac041..b08cd316e68 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -5419,9 +5419,6 @@ rs6000_builtin_vectorization_cost (enum
vect_cost_for_stmt type_of_cost,
         return 3;

       case unaligned_load:
-       if (TARGET_P9_VECTOR)
-         return 3;
-
        if (TARGET_EFFICIENT_UNALIGNED_VSX)
          return 1;

```
The analysis of the changes in the rs6000_density_test part of the data is as
follows:
As the code below, the density penalty fixup **depends on** the vec_cost.
```
  if (density_pct > DENSITY_PCT_THRESHOLD
      && vec_cost + not_vec_cost > DENSITY_SIZE_THRESHOLD)
    {
      data->cost[vect_body] = vec_cost * (100 + DENSITY_PENALTY) / 100;
      if (dump_enabled_p ())
        dump_printf_loc (MSG_NOTE, vect_location,
                         "density %d%%, cost %d exceeds threshold, penalizing "
                         "loop body cost by %d%%", density_pct,
                         vec_cost + not_vec_cost, DENSITY_PENALTY);
    }
```
With the commit 253731, the vec_cost is reduced by 24 as you mentioned, the
`vec_cost + not_vec_cost` is less than DENSITY_SIZE_THRESHOLD, so it's fine.
(btw, not_vec_cost can be calculated as 3 from the previous dump.)

By the way, if we use this option -fvect-cost-model=unlimited, with the
‘unlimited’ model the vectorized code-path is assumed to be profitable while
with the ‘dynamic’ model a runtime check guards the vectorized code-path to
enable it only for iteration counts that will likely execute faster than when
executing the original scalar loop.

Therefore, this issue has been resolved on the trunk.

Reply via email to