https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80834
Li Jia He <helijia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |helijia at gcc dot gnu.org

--- Comment #4 from Li Jia He <helijia at gcc dot gnu.org> ---
This problem may not be as complicated as described. It may mainly depend on the vect-cost-model values (rs6000_builtin_vectorization_cost). The loop is already vectorized on the current trunk (Subversion revision 274560).

If we use the code that Mike mentioned (Subversion revision 248266) with the compile options
```
-mcpu=power9 -O3 -ffast-math -fdump-tree-vect-details-all -fdump-tree-slp-details-all
```
we see the following vect-cost-model analysis:
```
m_amatvec.c:114:5: note: density 96%, cost 87 exceeds threshold, penalizing loop body cost by 10%m_amatvec.c:114:5: note: Cost model analysis:
  Vector inside of loop cost: 92
  Vector prologue cost: 5
  Vector epilogue cost: 36
  Scalar iteration cost: 36
  Scalar outside cost: 1
  Vector outside cost: 41
  prologue iterations: 0
  epilogue iterations: 1
m_amatvec.c:114:5: note: cost model: the vector iteration cost = 92 divided by the scalar iteration cost = 36 is greater or equal to the vectorization factor = 2.
m_amatvec.c:114:5: note: not vectorized: vectorization not profitable.
m_amatvec.c:114:5: note: not vectorized: vector version will never be profitable.
```
Here 'Vector inside of loop cost' is 92. Because (92 / 36 = 2) >= 2 in integer arithmetic, the vect-cost-model concludes that the vector version will never be profitable.
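The check quoted in the dump can be reproduced with a few lines of C. This is only a sketch of the arithmetic described above, not the vectorizer's actual code: the function name `vector_version_can_be_profitable` is hypothetical, and the sketch assumes the comparison is an integer division of the vector iteration cost by the scalar iteration cost against the vectorization factor, as the dump message words it.

```c
#include <stdbool.h>

/* Hypothetical helper, not a GCC function.  Mirrors the wording of the
   dump message: the vector version is rejected when the vector iteration
   cost divided by the scalar iteration cost is greater than or equal to
   the vectorization factor.  Assumes scalar_iter_cost > 0.  */
bool
vector_version_can_be_profitable (int vec_inside_cost, int scalar_iter_cost,
                                  int vf)
{
  /* Integer division truncates, so 92 / 36 evaluates to 2.  */
  return vec_inside_cost / scalar_iter_cost < vf;
}
```

With the values from the dump (vector cost 92, scalar cost 36, vectorization factor 2) the quotient is 2, which is not less than 2, so the function returns false.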
If we use the current trunk code (Subversion revision 274560) with the same compile options
```
-mcpu=power9 -O3 -ffast-math -fdump-tree-vect-details-all -fdump-tree-slp-details-all
```
we see the following vect-cost-model analysis:
```
m_amatvec.c:114:5: note: Cost model analysis:
  Vector inside of loop cost: 60
  Vector prologue cost: 5
  Vector epilogue cost: 36
  Scalar iteration cost: 36
  Scalar outside cost: 1
  Vector outside cost: 41
  prologue iterations: 0
  epilogue iterations: 1
  Calculated minimum iters for profitability: 2
m_amatvec.c:114:5: note: Runtime profitability threshold = 2
m_amatvec.c:114:5: note: Static estimate profitability threshold = 2
```
Now 'Vector inside of loop cost' is 60. Because (60 / 36 = 1) < 2, vectorization is considered profitable.

The change in 'Vector inside of loop cost' from 92 to 60 consists of two parts:
(1) The cost of unaligned_load is reduced by ((3-1)*12)=24.
(2) The rs6000_density_test penalty of 8 is no longer applied.

The unaligned_load part was fixed by the following patch.
```
commit 01cabe21e4ecae1e9c53fe12d7c0aa654143a3d2
Author: pthaugen <pthaugen@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Fri Oct 13 16:05:53 2017 +0000

    * config/rs6000/rs6000.c (rs6000_builtin_vectorization_cost): Remove
    TARGET_P9_VECTOR code for unaligned_load case.

    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@253731 138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index fefac6e0c95..00be94fe349 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,8 @@
+2017-10-13  Pat Haugen  <pthau...@us.ibm.com>
+
+        * config/rs6000/rs6000.c (rs6000_builtin_vectorization_cost): Remove
+        TARGET_P9_VECTOR code for unaligned_load case.
+
 2017-10-13  Jan Hubicka  <hubi...@ucw.cz>

         * cfghooks.c (verify_flow_info): Check that edge probabilities are
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index e6e254ac041..b08cd316e68 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -5419,9 +5419,6 @@ rs6000_builtin_vectorization_cost (enum vect_cost_for_stmt type_of_cost,
         return 3;

       case unaligned_load:
-        if (TARGET_P9_VECTOR)
-          return 3;
-
         if (TARGET_EFFICIENT_UNALIGNED_VSX)
           return 1;
```

The change in the rs6000_density_test part is explained as follows. As the code below shows, the density penalty fixup **depends on** the vec_cost.
```
  if (density_pct > DENSITY_PCT_THRESHOLD
      && vec_cost + not_vec_cost > DENSITY_SIZE_THRESHOLD)
    {
      data->cost[vect_body] = vec_cost * (100 + DENSITY_PENALTY) / 100;
      if (dump_enabled_p ())
        dump_printf_loc (MSG_NOTE, vect_location,
                         "density %d%%, cost %d exceeds threshold, penalizing "
                         "loop body cost by %d%%", density_pct,
                         vec_cost + not_vec_cost, DENSITY_PENALTY);
    }
```
With revision 253731, vec_cost is reduced by 24, as noted above, so `vec_cost + not_vec_cost` is less than DENSITY_SIZE_THRESHOLD and the penalty is not applied. (By the way, not_vec_cost can be calculated as 3 from the previous dump.)

The -fvect-cost-model option is also relevant here: with the 'unlimited' model (-fvect-cost-model=unlimited) the vectorized code path is assumed to be profitable, while with the 'dynamic' model a runtime check guards the vectorized code path so that it is used only for iteration counts that will likely execute faster than the original scalar loop.

Therefore, this issue has been resolved on the trunk.
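The density penalty arithmetic discussed above can be checked with a small standalone sketch. It is not the rs6000 back end's code: `penalized_body_cost` is a hypothetical name, the density formula (vec_cost as a percentage of the total) is an assumption, and so are the threshold values 85 and 70. Only the 10% penalty is confirmed by the dump. The pre-penalty vec_cost of 84 is inferred from the dump as the reported cost 87 minus not_vec_cost 3.

```c
/* Assumed values.  Only DENSITY_PENALTY = 10 is confirmed by the dump
   ("penalizing loop body cost by 10%"); the two thresholds are
   assumptions made for this sketch.  */
enum
{
  DENSITY_PCT_THRESHOLD = 85,
  DENSITY_SIZE_THRESHOLD = 70,
  DENSITY_PENALTY = 10
};

/* Hypothetical helper, not a GCC function.  Returns the loop body cost
   after the density fixup quoted above.  Assumes vec_cost + not_vec_cost
   is positive.  */
int
penalized_body_cost (int vec_cost, int not_vec_cost)
{
  int total = vec_cost + not_vec_cost;
  /* Assumed density formula: share of vector cost in the total cost.  */
  int density_pct = vec_cost * 100 / total;

  if (density_pct > DENSITY_PCT_THRESHOLD && total > DENSITY_SIZE_THRESHOLD)
    return vec_cost * (100 + DENSITY_PENALTY) / 100;

  return vec_cost;
}
```

Under these assumptions, `penalized_body_cost (84, 3)` gives 92, matching the old dump (density 96%, cost 87), and `penalized_body_cost (60, 3)` gives 60, matching the trunk dump, because 63 no longer exceeds the assumed size threshold.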