On Fri, 8 Jun 2012, William J. Schmidt wrote:

> This patch adds a heuristic to the vectorizer when estimating the
> minimum profitable number of iterations.  The heuristic is
> target-dependent, and is currently disabled for all targets except
> PowerPC.  However, the intent is to make it general enough to be useful
> for other targets that want to opt in.
> 
> A previous patch addressed some PowerPC SPEC degradations by modifying
> the vector cost model values for vec_perm and vec_promote_demote.  The
> values were set a little higher than their natural values because the
> natural values were not sufficient to prevent a poor vectorization
> choice.  However, this is not the right long-term solution, since it can
> unnecessarily constrain other vectorization choices involving permute
> instructions.
> 
> Analysis of the badly vectorized loop (in sphinx3) showed that the
> problem was overcommitment of vector resources -- too many vector
> instructions issued without enough non-vector instructions available to
> cover the delays.  The vector cost model assumes that instructions
> always have a constant cost, and doesn't have a way of judging this kind
> of "density" of vector instructions.
> 
> The present patch adds a heuristic to recognize when a loop is likely to
> overcommit resources, and adds a small penalty to the inside-loop cost
> to account for the expected stalls.  The heuristic is parameterized with
> three target-specific values:
> 
>  * Density threshold: The heuristic will apply only when the
>    percentage of inside-loop cost attributable to vectorized
>    instructions exceeds this value.
> 
>  * Size threshold: The heuristic will apply only when the
>    inside-loop cost exceeds this value.
> 
>  * Penalty: The inside-loop cost will be increased by this
>    percentage value when the heuristic applies.
> 
> Thus only reasonably large loop bodies that are mostly vectorized
> instructions will be affected.
> 
> By applying only a small percentage bump to the inside-loop cost, the
> heuristic will only turn off vectorization for loops that were
> considered "barely profitable" to begin with (such as the sphinx3 loop).
> So the heuristic is quite conservative and should not affect the vast
> majority of vectorization decisions.
> 
> Together with the new heuristic, this patch reduces the vec_perm and
> vec_promote_demote costs for PowerPC to their natural values.
> 
> I've regstrapped this with no regressions on powerpc64-unknown-linux-gnu
> and verified that no performance regressions occur on SPEC cpu2006.  Is
> this ok for trunk?

Hmm.  I don't like this patch or its general idea too much.  Instead
I'd like us to move more of the cost model detail to the target, giving
it a chance to look at the whole loop before deciding on a cost.  ISTR
posting the overall idea at some point, but let me repeat it here instead
of trying to find that e-mail.

The basic interface of the cost model should be, in targetm.vectorize:

  /* Tell the target to start cost analysis of a loop or a basic-block
     (if the loop argument is NULL).  Returns an opaque pointer to
     target-private data.  */
  void *init_cost (struct loop *loop);

  /* Add cost for N vectorized-stmt-kind statements in vector_mode.  */
  void add_stmt_cost (void *data, unsigned n,
                      vectorized-stmt-kind,
                      enum machine_mode vector_mode);

  /* Tell the target to compute and return the cost of the accumulated
     statements and free any target-private data.  */
  unsigned finish_cost (void *data);

with possibly slightly different signatures for add_stmt_cost
(for example, also passing in the original scalar stmt?).

It allows the target, at finish_cost time, to evaluate things like
register pressure and resource utilization.
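
For illustration only, here is a minimal, self-contained C sketch of how
a target might use these three hooks to apply a density-style penalty at
finish_cost time rather than inside the vectorizer.  It is not GCC code:
the stmt_kind enum, the cost_data struct, the unit_cost parameter
(standing in for the vectorized-stmt-kind and vector_mode arguments), the
dropped loop argument, and the constants (borrowed from the rs6000 values
in the quoted patch) are all assumptions made for the example.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for "vectorized-stmt-kind"; not a GCC type.  */
enum stmt_kind { KIND_SCALAR, KIND_VECTOR };

/* Target-private data handed back by init_cost (assumed layout).  */
struct cost_data
{
  unsigned vec_cost;     /* accumulated cost of vectorized statements */
  unsigned scalar_cost;  /* accumulated cost of statements left scalar */
};

/* Assumed tuning values, mirroring the rs6000 numbers in the patch.  */
static const unsigned density_pct_threshold = 85;
static const unsigned density_size_threshold = 70;
static const unsigned density_penalty = 10;

/* Start cost analysis; the loop argument is omitted in this sketch.
   Returns zero-initialized target-private data (NULL on failure).  */
void *
init_cost (void)
{
  return calloc (1, sizeof (struct cost_data));
}

/* Record N statements of the given kind, each costing UNIT_COST.  */
void
add_stmt_cost (void *data, unsigned n, enum stmt_kind kind,
               unsigned unit_cost)
{
  struct cost_data *d = data;
  assert (d != NULL);
  if (kind == KIND_VECTOR)
    d->vec_cost += n * unit_cost;
  else
    d->scalar_cost += n * unit_cost;
}

/* Return the vector inside-loop cost, bumped by the penalty when the
   body is both large and dominated by vector work, then free DATA.  */
unsigned
finish_cost (void *data)
{
  struct cost_data *d = data;
  assert (d != NULL);
  unsigned total = d->vec_cost + d->scalar_cost;
  unsigned cost = d->vec_cost;

  /* The size check comes first, so TOTAL is nonzero when we divide.  */
  if (total > density_size_threshold
      && d->vec_cost * 100 / total > density_pct_threshold)
    cost = cost * (100 + density_penalty) / 100;

  free (d);
  return cost;
}
```

With these assumed numbers, a body that accumulates 90 units of vector
cost and 10 units of scalar cost has density 90% and size 100, so
finish_cost returns 99 rather than 90; a body with 80 vector and 20
scalar units has density 80% and is returned unpenalized as 80.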

Thanks,
Richard.

> Thanks,
> Bill
> 
> 
> 2012-06-08  Bill Schmidt  <wschm...@linux.ibm.com>
> 
>       * doc/tm.texi.in: Add vectorization density hooks.
>       * doc/tm.texi: Regenerate.
>       * targhooks.c (default_density_pct_threshold): New.
>       (default_density_size_threshold): New.
>       (default_density_penalty): New.
>       * targhooks.h: New decls for new targhooks.c functions.
>       * target.def (density_pct_threshold): New DEF_HOOK.
>       (density_size_threshold): Likewise.
>       (density_penalty): Likewise.
>       * tree-vect-loop.c (accum_stmt_cost): New.
>       (vect_estimate_min_profitable_iters): Perform density test.
>       * config/rs6000/rs6000.c (TARGET_VECTORIZE_DENSITY_PCT_THRESHOLD):
>       New macro definition.
>       (TARGET_VECTORIZE_DENSITY_SIZE_THRESHOLD): Likewise.
>       (TARGET_VECTORIZE_DENSITY_PENALTY): Likewise.
>       (rs6000_builtin_vectorization_cost): Reduce costs of vec_perm and
>       vec_promote_demote to correct values.
>       (rs6000_density_pct_threshold): New.
>       (rs6000_density_size_threshold): New.
>       (rs6000_density_penalty): New.
> 
> 
> Index: gcc/doc/tm.texi
> ===================================================================
> --- gcc/doc/tm.texi   (revision 188305)
> +++ gcc/doc/tm.texi   (working copy)
> @@ -5798,6 +5798,27 @@ The default is @code{NULL_TREE} which means to not
>  loads.
>  @end deftypefn
>  
> +@deftypefn {Target Hook} int TARGET_VECTORIZE_DENSITY_PCT_THRESHOLD (void)
> +This hook should return the maximum density, expressed in percent, for
> +which autovectorization of loops with large bodies should be constrained.
> +See also @code{TARGET_VECTORIZE_DENSITY_SIZE_THRESHOLD}.  The default
> +is to return 100, which disables the density test.
> +@end deftypefn
> +
> +@deftypefn {Target Hook} int TARGET_VECTORIZE_DENSITY_SIZE_THRESHOLD (void)
> +This hook should return the minimum estimated size of a vectorized
> +loop body for which the density test should apply.  See also
> +@code{TARGET_VECTORIZE_DENSITY_PCT_THRESHOLD}.  The default is set
> +to the unreasonable value of 1000000, which effectively disables 
> +the density test.
> +@end deftypefn
> +
> +@deftypefn {Target Hook} int TARGET_VECTORIZE_DENSITY_PENALTY (void)
> +This hook should return the penalty, expressed in percent, to be applied
> +to the inside-of-loop vectorization costs for a loop failing the density
> +test.  The default is 10.
> +@end deftypefn
> +
>  @node Anchored Addresses
>  @section Anchored Addresses
>  @cindex anchored addresses
> Index: gcc/doc/tm.texi.in
> ===================================================================
> --- gcc/doc/tm.texi.in        (revision 188305)
> +++ gcc/doc/tm.texi.in        (working copy)
> @@ -5730,6 +5730,27 @@ The default is @code{NULL_TREE} which means to not
>  loads.
>  @end deftypefn
>  
> +@hook TARGET_VECTORIZE_DENSITY_PCT_THRESHOLD
> +This hook should return the maximum density, expressed in percent, for
> +which autovectorization of loops with large bodies should be constrained.
> +See also @code{TARGET_VECTORIZE_DENSITY_SIZE_THRESHOLD}.  The default
> +is to return 100, which disables the density test.
> +@end deftypefn
> +
> +@hook TARGET_VECTORIZE_DENSITY_SIZE_THRESHOLD
> +This hook should return the minimum estimated size of a vectorized
> +loop body for which the density test should apply.  See also
> +@code{TARGET_VECTORIZE_DENSITY_PCT_THRESHOLD}.  The default is set
> +to the unreasonable value of 1000000, which effectively disables 
> +the density test.
> +@end deftypefn
> +
> +@hook TARGET_VECTORIZE_DENSITY_PENALTY
> +This hook should return the penalty, expressed in percent, to be applied
> +to the inside-of-loop vectorization costs for a loop failing the density
> +test.  The default is 10.
> +@end deftypefn
> +
>  @node Anchored Addresses
>  @section Anchored Addresses
>  @cindex anchored addresses
> Index: gcc/targhooks.c
> ===================================================================
> --- gcc/targhooks.c   (revision 188305)
> +++ gcc/targhooks.c   (working copy)
> @@ -990,6 +990,33 @@ default_autovectorize_vector_sizes (void)
>    return 0;
>  }
>  
> +/* By default, the density test for autovectorization is disabled by
> +   setting the minimum percentage to 100.  */
> +
> +int
> +default_density_pct_threshold (void)
> +{
> +  return 100;
> +}
> +
> +/* By default, the density size threshold for autovectorization is
> +   meaningless since the density test is disabled.  An unreasonably
> +   large number is used to further inhibit the density test.  */
> +
> +int
> +default_density_size_threshold (void)
> +{
> +  return 1000000;
> +}
> +
> +/* By default, the density penalty for autovectorization is set to 10%.  */
> +
> +int
> +default_density_penalty (void)
> +{
> +  return 10;
> +}
> +
>  /* Determine whether or not a pointer mode is valid. Assume defaults
>     of ptr_mode or Pmode - can be overridden.  */
>  bool
> Index: gcc/targhooks.h
> ===================================================================
> --- gcc/targhooks.h   (revision 188305)
> +++ gcc/targhooks.h   (working copy)
> @@ -90,6 +90,9 @@ default_builtin_support_vector_misalignment (enum
>                                            int, bool);
>  extern enum machine_mode default_preferred_simd_mode (enum machine_mode mode);
>  extern unsigned int default_autovectorize_vector_sizes (void);
> +extern int default_density_pct_threshold (void);
> +extern int default_density_size_threshold (void);
> +extern int default_density_penalty (void);
>  
>  /* These are here, and not in hooks.[ch], because not all users of
>     hooks.h include tm.h, and thus we don't have CUMULATIVE_ARGS.  */
> Index: gcc/target.def
> ===================================================================
> --- gcc/target.def    (revision 188305)
> +++ gcc/target.def    (working copy)
> @@ -1054,6 +1054,32 @@ DEFHOOK
>   (const_tree mem_vectype, const_tree index_type, int scale),
>   NULL)
>  
> +/* Return the maximum density in percent for loop vectorization.  */
> +DEFHOOK
> +(density_pct_threshold,
> +"",
> +int,
> +(void),
> +default_density_pct_threshold)
> +
> +/* Return the minimum size of a loop iteration for applying the density
> +   test for loop vectorization.  */
> +DEFHOOK
> +(density_size_threshold,
> +"",
> +int,
> +(void),
> +default_density_size_threshold)
> +
> +/* Return the penalty in percent for vectorizing a loop failing the
> +   density test.  */
> +DEFHOOK
> +(density_penalty,
> +"",
> +int,
> +(void),
> +default_density_penalty)
> +
>  HOOK_VECTOR_END (vectorize)
>  
>  #undef HOOK_PREFIX
> Index: gcc/tree-vect-loop.c
> ===================================================================
> --- gcc/tree-vect-loop.c      (revision 188305)
> +++ gcc/tree-vect-loop.c      (working copy)
> @@ -2485,6 +2485,58 @@ vect_get_known_peeling_cost (loop_vec_info loop_vi
>             + peel_guard_costs;
>  }
>  
> +/* Add the inside-loop cost of STMT to either *REL_COST or *IRREL_COST,
> +   depending on whether or not STMT will be vectorized.  For vectorized
> +   statements, the inside-loop cost is as already computed.  For other
> +   statements, assume a cost of one.  */
> +
> +static void
> +accum_stmt_cost (gimple stmt, int *rel_cost, int *irrel_cost)
> +{
> +  stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
> +  gimple pattern_stmt = STMT_VINFO_RELATED_STMT (stmt_info);
> +  gimple_seq pattern_def_seq;
> +
> +  /* If the statement is irrelevant, but it has a related pattern
> +     statement that is relevant, process just the related statement.
> +     If the statement is relevant and it has a related pattern
> +     statement that is also relevant, process them both.  */
> +  if (!STMT_VINFO_RELEVANT_P (stmt_info)
> +      && !STMT_VINFO_LIVE_P (stmt_info))
> +    {
> +      if (STMT_VINFO_IN_PATTERN_P (stmt_info)
> +       && pattern_stmt
> +       && (STMT_VINFO_RELEVANT_P (vinfo_for_stmt (pattern_stmt))
> +           || STMT_VINFO_LIVE_P (vinfo_for_stmt (pattern_stmt))))
> +     accum_stmt_cost (pattern_stmt, rel_cost, irrel_cost);
> +      else
> +     (*irrel_cost)++;
> +    }
> +  else if (STMT_VINFO_IN_PATTERN_P (stmt_info)
> +        && pattern_stmt
> +        && (STMT_VINFO_RELEVANT_P (vinfo_for_stmt (pattern_stmt))
> +            || STMT_VINFO_LIVE_P (vinfo_for_stmt (pattern_stmt))))
> +    accum_stmt_cost (pattern_stmt, rel_cost, irrel_cost);
> +
> +  /* If we're looking at a pattern that has additional statements,
> +     count them as well.  */
> +  if (is_pattern_stmt_p (stmt_info)
> +      && (pattern_def_seq = STMT_VINFO_PATTERN_DEF_SEQ (stmt_info)))
> +    {
> +      gimple_stmt_iterator gsi;
> +      for (gsi = gsi_start (pattern_def_seq); !gsi_end_p (gsi); gsi_next (&gsi))
> +     {
> +       gimple pattern_def_stmt = gsi_stmt (gsi);
> +       if (STMT_VINFO_RELEVANT_P (vinfo_for_stmt (pattern_def_stmt))
> +           || STMT_VINFO_LIVE_P (vinfo_for_stmt (pattern_def_stmt)))
> +         accum_stmt_cost (pattern_def_stmt, rel_cost, irrel_cost);
> +     }
> +    }
> +
> +  /* Accumulate the inside-loop cost of this vectorizable statement.  */
> +  *rel_cost += STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info);
> +}
> +
>  /* Function vect_estimate_min_profitable_iters
>  
>     Return the number of iterations required for the vector version of the
> @@ -2743,6 +2795,45 @@ vect_estimate_min_profitable_iters (loop_vec_info
>        vec_inside_cost += SLP_INSTANCE_INSIDE_OF_LOOP_COST (instance);
>      }
>  
> +  /* Test for likely overcommitment of vector hardware resources.  If a
> +     loop iteration is relatively large, and too large a percentage of
> +     instructions in the loop are vectorized, the cost model may not
> +     adequately reflect delays from unavailable vector resources.
> +     Penalize vec_inside_cost for this case, using target-specific
> +     parameters.  */
> +  if (targetm.vectorize.density_pct_threshold () < 100)
> +    {
> +      int rel_cost = 0, irrel_cost = 0;
> +      int density_pct;
> +
> +      for (i = 0; i < nbbs; i++)
> +     {
> +       basic_block bb = bbs[i];
> +       gimple_stmt_iterator gsi;
> +
> +       for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> +         {
> +           gimple stmt = gsi_stmt (gsi);
> +           accum_stmt_cost (stmt, &rel_cost, &irrel_cost);
> +         }
> +     }
> +
> +      density_pct = (rel_cost * 100) / (rel_cost + irrel_cost);
> +
> +      if (density_pct > targetm.vectorize.density_pct_threshold ()
> +       && (rel_cost + irrel_cost
> +           > targetm.vectorize.density_size_threshold ()))
> +     {
> +       int penalty = targetm.vectorize.density_penalty ();
> +       vec_inside_cost = vec_inside_cost * (100 + penalty) / 100;
> +       if (vect_print_dump_info (REPORT_DETAILS))
> +         fprintf (vect_dump,
> +                  "density %d%%, cost %d exceeds threshold"
> +                  ", penalizing inside-loop cost by %d%%.",
> +                  density_pct, rel_cost + irrel_cost, penalty);
> +     }
> +    }
> +
>    /* Calculate number of iterations required to make the vector version
>       profitable, relative to the loop bodies only.  The following condition
>       must hold true:
> Index: gcc/config/rs6000/rs6000.c
> ===================================================================
> --- gcc/config/rs6000/rs6000.c        (revision 188305)
> +++ gcc/config/rs6000/rs6000.c        (working copy)
> @@ -1289,6 +1289,15 @@ static const struct attribute_spec rs6000_attribut
>  #undef TARGET_VECTORIZE_PREFERRED_SIMD_MODE
>  #define TARGET_VECTORIZE_PREFERRED_SIMD_MODE \
>    rs6000_preferred_simd_mode
> +#undef TARGET_VECTORIZE_DENSITY_PCT_THRESHOLD
> +#define TARGET_VECTORIZE_DENSITY_PCT_THRESHOLD \
> +  rs6000_density_pct_threshold
> +#undef TARGET_VECTORIZE_DENSITY_SIZE_THRESHOLD
> +#define TARGET_VECTORIZE_DENSITY_SIZE_THRESHOLD \
> +  rs6000_density_size_threshold
> +#undef TARGET_VECTORIZE_DENSITY_PENALTY
> +#define TARGET_VECTORIZE_DENSITY_PENALTY \
> + rs6000_density_penalty
>  
>  #undef TARGET_INIT_BUILTINS
>  #define TARGET_INIT_BUILTINS rs6000_init_builtins
> @@ -3421,13 +3430,13 @@ rs6000_builtin_vectorization_cost (enum vect_cost_
>  
>        case vec_perm:
>       if (TARGET_VSX)
> -       return 4;
> +       return 3;
>       else
>         return 1;
>  
>        case vec_promote_demote:
>          if (TARGET_VSX)
> -          return 5;
> +          return 4;
>          else
>            return 1;
>  
> @@ -3551,6 +3560,30 @@ rs6000_preferred_simd_mode (enum machine_mode mode
>    return word_mode;
>  }
>  
> +/* Implement targetm.vectorize.density_pct_threshold.  */
> +
> +static int
> +rs6000_density_pct_threshold (void)
> +{
> +  return 85;
> +}
> +
> +/* Implement targetm.vectorize.density_size_threshold.  */
> +
> +static int
> +rs6000_density_size_threshold (void)
> +{
> +  return 70;
> +}
> +
> +/* Implement targetm.vectorize.density_penalty.  */
> +
> +static int
> +rs6000_density_penalty (void)
> +{
> +  return 10;
> +}
> +
>  /* Handler for the Mathematical Acceleration Subsystem (mass) interface to a
>     library with vectorized intrinsics.  */
>  
> 
> 
> 

-- 
Richard Guenther <rguent...@suse.de>
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer
