Hi Richi,

on 2021/5/19 4:15 PM, Richard Biener wrote:
> On Wed, May 19, 2021 at 8:20 AM Kewen.Lin <li...@linux.ibm.com> wrote:
>>
>> Hi,
>>
>> This patch is to replace the current hardcoded weight factor 50
>> for those statements in an inner loop relative to the loop being
>> vectorized with a specific parameter vect-inner-loop-weight-factor.
>>
>> The motivation behind this change is: if targets want to have one
>> unique function to gather some information in each add_stmt_cost
>> call, no matter that it's put before or after the cost tweaking
>> part for inner loop, it may have the need to adjust (expand or
>> shrink) the gathered data as the factor.  Now the factor is
>> hardcoded, it's not easily maintained.  Since it's possible that
>> targets have their own decisions on this costing like the others,
>> I used parameter instead of one unique macro here.
>>
>> Testing is ongoing, is it ok for trunk if everything goes well?
> 
> Certainly an improvement.  I suppose we might want to put
> the factor into vinfo->inner_loop_cost_factor.  That way
> we could adjust it easily in common code in the vectorizer
> when we for example have (non-guessed) profile data.
> 
> "weight_factor" is kind-of double-speak and I'm missing 'cost' ...
> so, bike-shedding to vect_inner_loop_cost_factor?
> 
> Just suggestions - as said, the patch is an improvement already.
> 

Thanks for your nice suggestions!  I've updated the patch accordingly
and attached it.  Does it look better to you?

By the way, testing of the previous patch passed; a new round of testing
for this version has just been kicked off.
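
For illustration only, the effect of the change can be modelled outside GCC
as in the sketch below.  The names loop_info_model, cost_location and
model_add_stmt_cost are hypothetical stand-ins, not the real vectorizer
types or hooks; only the default of 50 and the "scale count for inner-loop
statements in the body" rule are taken from the patch.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for vect_cost_model_location.
enum class cost_location { prologue, body, epilogue };

// Hypothetical stand-in for _loop_vec_info: it only carries the new field,
// initialized to the parameter's default value of 50.
struct loop_info_model
{
  unsigned int inner_loop_cost_factor = 50;
};

// Models the part of the add_stmt_cost hooks touched by the patch:
// statements in the inner loop have their count scaled by the per-loop
// factor instead of a hardcoded 50.
inline unsigned int
model_add_stmt_cost (const loop_info_model *loop_vinfo, int count,
                     int stmt_cost, cost_location where, bool in_inner_loop)
{
  if (where == cost_location::body && in_inner_loop)
    {
      // Mirrors the gcc_assert (loop_vinfo) added in the hooks.
      assert (loop_vinfo != nullptr);
      count *= loop_vinfo->inner_loop_cost_factor;
    }
  return (unsigned) (count * stmt_cost);
}
```

With the default factor, a call such as model_add_stmt_cost (&v, 2, 3,
cost_location::body, true) scales the count by 50 before multiplying by the
statement cost; lowering v.inner_loop_cost_factor changes the result without
touching the hook itself, which is the point of moving the factor into the
loop info.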

BR,
Kewen
------
gcc/ChangeLog:

        * doc/invoke.texi (vect-inner-loop-cost-factor): Document new
        parameter.
        * params.opt (vect-inner-loop-cost-factor): New.
        * targhooks.c (default_add_stmt_cost): Replace hardcoded factor
        50 with LOOP_VINFO_INNER_LOOP_COST_FACTOR, include header file
        tree-vectorizer.h and the headers it requires.
        * config/aarch64/aarch64.c (aarch64_add_stmt_cost): Replace
        hardcoded factor 50 with LOOP_VINFO_INNER_LOOP_COST_FACTOR.
        * config/arm/arm.c (arm_add_stmt_cost): Likewise.
        * config/i386/i386.c (ix86_add_stmt_cost): Likewise.
        * config/rs6000/rs6000.c (rs6000_add_stmt_cost): Likewise.
        * tree-vect-loop.c (vect_compute_single_scalar_iteration_cost):
        Likewise.
        (_loop_vec_info::_loop_vec_info): Init inner_loop_cost_factor.
        * tree-vectorizer.h (_loop_vec_info): Add inner_loop_cost_factor.
        (LOOP_VINFO_INNER_LOOP_COST_FACTOR): New macro.

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 12625a4bee3..be883b61059 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -15437,7 +15437,10 @@ aarch64_add_stmt_cost (class vec_info *vinfo, void *data, int count,
         arbitrary and could potentially be improved with analysis.  */
       if (where == vect_body && stmt_info
          && stmt_in_inner_loop_p (vinfo, stmt_info))
-       count *= 50; /*  FIXME  */
+       {
+         gcc_assert (loop_vinfo);
+         count *= LOOP_VINFO_INNER_LOOP_COST_FACTOR (loop_vinfo); /*  FIXME  */
+       }
 
       retval = (unsigned) (count * stmt_cost);
       costs->region[where] += retval;
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 340f7c95d76..223faa49b11 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -12201,7 +12201,11 @@ arm_add_stmt_cost (vec_info *vinfo, void *data, int count,
         arbitrary and could potentially be improved with analysis.  */
       if (where == vect_body && stmt_info
          && stmt_in_inner_loop_p (vinfo, stmt_info))
-       count *= 50;  /* FIXME.  */
+       {
+         loop_vec_info loop_vinfo = dyn_cast<loop_vec_info> (vinfo);
+         gcc_assert (loop_vinfo);
+         count *= LOOP_VINFO_INNER_LOOP_COST_FACTOR (loop_vinfo); /* FIXME.  */
+       }
 
       retval = (unsigned) (count * stmt_cost);
       cost[where] += retval;
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 7c41302c75b..43b1fb0de0b 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -22396,7 +22396,11 @@ ix86_add_stmt_cost (class vec_info *vinfo, void *data, int count,
      arbitrary and could potentially be improved with analysis.  */
   if (where == vect_body && stmt_info
       && stmt_in_inner_loop_p (vinfo, stmt_info))
-    count *= 50;  /* FIXME.  */
+    {
+      loop_vec_info loop_vinfo = dyn_cast<loop_vec_info> (vinfo);
+      gcc_assert (loop_vinfo);
+      count *= LOOP_VINFO_INNER_LOOP_COST_FACTOR (loop_vinfo); /* FIXME.  */
+    }
 
   retval = (unsigned) (count * stmt_cost);
 
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 48b8efd732b..859da8bd0ed 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -5348,7 +5348,11 @@ rs6000_add_stmt_cost (class vec_info *vinfo, void *data, int count,
         arbitrary and could potentially be improved with analysis.  */
       if (where == vect_body && stmt_info
          && stmt_in_inner_loop_p (vinfo, stmt_info))
-       count *= 50;  /* FIXME.  */
+       {
+         loop_vec_info loop_vinfo = dyn_cast<loop_vec_info> (vinfo);
+         gcc_assert (loop_vinfo);
+         count *= LOOP_VINFO_INNER_LOOP_COST_FACTOR (loop_vinfo); /* FIXME.  */
+       }
 
       retval = (unsigned) (count * stmt_cost);
       cost_data->cost[where] += retval;
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 8b70fdf580d..2234801cab4 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -14221,6 +14221,10 @@ code to iterate.  2 allows partial vector loads and stores in all loops.
 The parameter only has an effect on targets that support partial
 vector loads and stores.
 
+@item vect-inner-loop-cost-factor
+The factor which loop vectorizer uses to over weight those statements in
+an inner loop relative to the loop being vectorized.
+
 @item avoid-fma-max-bits
 Maximum number of bits for which we avoid creating FMAs.
 
diff --git a/gcc/params.opt b/gcc/params.opt
index 7c7aa78992a..a35c2abe359 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -1089,4 +1089,8 @@ Bound on number of runtime checks inserted by the vectorizer's loop versioning f
 Common Joined UInteger Var(param_vect_partial_vector_usage) Init(2) IntegerRange(0, 2) Param Optimization
 Controls how loop vectorizer uses partial vectors.  0 means never, 1 means only for loops whose need to iterate can be removed, 2 means for all loops.  The default value is 2.
 
+-param=vect-inner-loop-cost-factor=
+Common Joined UInteger Var(param_vect_inner_loop_cost_factor) Init(50) IntegerRange(1, 999999) Param Optimization
+Indicates the factor which loop vectorizer uses to over weight those statements in an inner loop relative to the loop being vectorized.  The default value is 50.
+
 ; This comment is to ensure we retain the blank line above.
diff --git a/gcc/targhooks.c b/gcc/targhooks.c
index 952fad422eb..b595b7838af 100644
--- a/gcc/targhooks.c
+++ b/gcc/targhooks.c
@@ -90,6 +90,9 @@ along with GCC; see the file COPYING3.  If not see
 #include "attribs.h"
 #include "asan.h"
 #include "emit-rtl.h"
+#include "gimple.h"
+#include "cfgloop.h"
+#include "tree-vectorizer.h"
 
 bool
 default_legitimate_address_p (machine_mode mode ATTRIBUTE_UNUSED,
@@ -1400,7 +1403,11 @@ default_add_stmt_cost (class vec_info *vinfo, void *data, int count,
       arbitrary and could potentially be improved with analysis.  */
   if (where == vect_body && stmt_info
       && stmt_in_inner_loop_p (vinfo, stmt_info))
-    count *= 50;  /* FIXME.  */
+    {
+      loop_vec_info loop_vinfo = dyn_cast<loop_vec_info> (vinfo);
+      gcc_assert (loop_vinfo);
+      count *= LOOP_VINFO_INNER_LOOP_COST_FACTOR (loop_vinfo);
+    }
 
   retval = (unsigned) (count * stmt_cost);
   cost[where] += retval;
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 2aba503fef7..106c91964b5 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -836,6 +836,7 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, vec_info_shared *shared)
     single_scalar_iteration_cost (0),
     vec_outside_cost (0),
     vec_inside_cost (0),
+    inner_loop_cost_factor (param_vect_inner_loop_cost_factor),
     vectorizable (false),
     can_use_partial_vectors_p (param_vect_partial_vector_usage != 0),
     using_partial_vectors_p (false),
@@ -1237,7 +1238,7 @@ vect_compute_single_scalar_iteration_cost (loop_vec_info loop_vinfo)
   /* FORNOW.  */
   innerloop_iters = 1;
   if (loop->inner)
-    innerloop_iters = 50; /* FIXME */
+    innerloop_iters = LOOP_VINFO_INNER_LOOP_COST_FACTOR (loop_vinfo);
 
   for (i = 0; i < nbbs; i++)
     {
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 9861d9e8810..b8ba63cc8e2 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -689,6 +689,10 @@ public:
   /* The cost of the vector loop body.  */
   int vec_inside_cost;
 
+  /* The factor used to over weight those statements in an inner loop
+     relative to the loop being vectorized.  */
+  unsigned int inner_loop_cost_factor;
+
   /* Is the loop vectorizable? */
   bool vectorizable;
 
@@ -807,6 +811,7 @@ public:
 #define LOOP_VINFO_SINGLE_SCALAR_ITERATION_COST(L) (L)->single_scalar_iteration_cost
 #define LOOP_VINFO_ORIG_LOOP_INFO(L)       (L)->orig_loop_info
 #define LOOP_VINFO_SIMD_IF_COND(L)         (L)->simd_if_cond
+#define LOOP_VINFO_INNER_LOOP_COST_FACTOR(L) (L)->inner_loop_cost_factor
 
 #define LOOP_VINFO_FULLY_MASKED_P(L)           \
   (LOOP_VINFO_USING_PARTIAL_VECTORS_P (L)      \
