[Bug target/109747] [12/13/14 Regression] SLP cost of constructors is off

rguenth at gcc dot gnu.org via Gcc-bugs Fri, 05 May 2023 05:03:04 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109747


Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |12.3
             Target|                            |x86_64-*-* i?86-*-*
           Assignee|unassigned at gcc dot gnu.org      |rguenth at gcc dot gnu.org
             Status|UNCONFIRMED                 |ASSIGNED
                 CC|                            |rsandifo at gcc dot gnu.org
     Ever confirmed|0                           |1
           Keywords|                            |missed-optimization
   Last reconfirmed|                            |2023-05-05

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
A fix, and perhaps also a step in the right direction, would be to construct
an individual new SLP node for each call to record_stmt_cost from
vect_prologue_cost_for_slp:

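  /* ???  We're just tracking whether vectors in a single node are the same.
     Ideally we'd do something more global.  */
  for (unsigned int start : starts)
    {
      vect_cost_for_stmt kind;
      if (SLP_TREE_DEF_TYPE (node) == vect_constant_def)
        /* Constant vector: costed as a load (e.g. from the constant pool).  */
        kind = vector_load;
      else if (vect_scalar_ops_slice { ops, start, nelt_limit }.all_same_p ())
        /* Every lane in this slice is the same scalar: a splat.  */
        kind = scalar_to_vec;
      else
        /* Distinct lanes: the vector is built element by element.  */
        kind = vec_construct;
      /* One record per vector, but every record names the same NODE and
         START is not passed on, so the target cannot tell which lanes
         this record covers.  */
      record_stmt_cost (cost_vec, 1, kind, node, vectype, 0, vect_prologue);
    }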

Alternatively, we could pass down 'start' as well.  The x86 backend code
could also detect a mismatch between TYPE_VECTOR_SUBPARTS * count and
the number of SLP lanes (though I'm not sure what it should do in that case).

Note that we currently can't meaningfully put such a split set of SLP nodes
into the SLP graph, but eventually we might want to move in the direction
of splitting it into individual vector ops, especially for load/store
vectorization and interleaving.

In the short term, passing down 'start' (and only interpreting it when count
is one?) might be easiest.

Any opinions?

