https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110381

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rsandifo at gcc dot gnu.org

--- Comment #9 from Richard Biener <rguenth at gcc dot gnu.org> ---
I think the transform phase is correct but the analysis phase fails to reject
this case because there's a permutation we elide even though that will not
preserve the fold-left reduction semantics.  We analyze the SLP node to

t.c:12:23: note:   Final SLP tree for instance 0x4c90840:
t.c:12:23: note:   node 0x4d57380 (max_nunits=2, refcnt=3) vector(2) double
t.c:12:23: note:   op template: sum_13 = foo$c_8 + sum_22;
t.c:12:23: note:        stmt 0 sum_13 = foo$c_8 + sum_22;
t.c:12:23: note:        stmt 1 sum_14 = foo$b_9 + sum_13;
t.c:12:23: note:        stmt 2 sum_15 = foo$a_11 + sum_14;
t.c:12:23: note:        children 0x4d57408 0x4d57490
t.c:12:23: note:   node 0x4d57408 (max_nunits=2, refcnt=2) vector(2) double
t.c:12:23: note:   op template: foo$c_8 = _3->c;
t.c:12:23: note:        stmt 0 foo$c_8 = _3->c;
t.c:12:23: note:        stmt 1 foo$b_9 = _3->b;
t.c:12:23: note:        stmt 2 foo$a_11 = _3->a;
t.c:12:23: note:        load permutation { 2 1 0 }
t.c:12:23: note:   node 0x4d57490 (max_nunits=2, refcnt=2) vector(2) double
t.c:12:23: note:   op template: sum_22 = PHI <sum_15(5), 0.0(2)>
t.c:12:23: note:        stmt 0 sum_22 = PHI <sum_15(5), 0.0(2)>
t.c:12:23: note:        stmt 1 sum_22 = PHI <sum_15(5), 0.0(2)>
t.c:12:23: note:        stmt 2 sum_22 = PHI <sum_15(5), 0.0(2)>
t.c:12:23: note:        children 0x4d57380 (nil)

but optimize_slp mangles things here.
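
To illustrate why eliding the { 2 1 0 } load permutation matters for a
fold-left reduction, here is a minimal sketch in plain C.  It is not the
testcase from this PR: the struct layout, the function names and the input
values are assumptions chosen only to make the effect visible.  Floating-point
addition is not associative, so accumulating c, b, a (the order in the dump)
and accumulating a, b, c (the order after dropping the permutation) can give
different results.

```c
/* Hypothetical stand-in for the object loaded through _3 in the dump.  */
struct foo { double a, b, c; };

/* In-order (fold-left) reduction as in the SLP node above:
   sum = c + sum; sum = b + sum; sum = a + sum.  */
static double sum_in_order (const struct foo *p, int n)
{
  double sum = 0.0;
  for (int i = 0; i < n; i++)
    {
      sum = p[i].c + sum;
      sum = p[i].b + sum;
      sum = p[i].a + sum;
    }
  return sum;
}

/* The same additions with the load permutation dropped, i.e. the
   elements are accumulated in memory order a, b, c.  */
static double sum_permuted (const struct foo *p, int n)
{
  double sum = 0.0;
  for (int i = 0; i < n; i++)
    {
      sum = p[i].a + sum;
      sum = p[i].b + sum;
      sum = p[i].c + sum;
    }
  return sum;
}

/* Assumed input: a large value and its negation cancel exactly only
   when they are added before the small value.  */
static const struct foo example[1] = { { 1.0, -1e100, 1e100 } };
```

For the assumed input, sum_in_order (example, 1) yields 1.0 while
sum_permuted (example, 1) yields 0.0 when compiled without -ffast-math, so
the two orders are not interchangeable.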

We have

  /* We have to mark outgoing permutations facing non-reduction graph
     entries that are not represented as to be materialized.  */
  for (slp_instance instance : m_vinfo->slp_instances)
    if (SLP_INSTANCE_KIND (instance) == slp_inst_kind_ctor)
      {
        unsigned int node_i = SLP_INSTANCE_TREE (instance)->vertex;
        m_partitions[m_vertices[node_i].partition].layout = 0;
      }

Unfortunately, this all happens before we determine that the reduction is
in-order.  The only thing we can do here is check
needs_fold_left_reduction_p directly.

I'm testing a patch.
