https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110381
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rsandifo at gcc dot gnu.org

--- Comment #9 from Richard Biener <rguenth at gcc dot gnu.org> ---
I think the transform phase is correct but the analysis phase fails to reject
this case because there's a permutation we elide even though that will not
preserve the fold-left reduction semantics.  We analyze the SLP node to

t.c:12:23: note:   Final SLP tree for instance 0x4c90840:
t.c:12:23: note:   node 0x4d57380 (max_nunits=2, refcnt=3) vector(2) double
t.c:12:23: note:   op template: sum_13 = foo$c_8 + sum_22;
t.c:12:23: note:        stmt 0 sum_13 = foo$c_8 + sum_22;
t.c:12:23: note:        stmt 1 sum_14 = foo$b_9 + sum_13;
t.c:12:23: note:        stmt 2 sum_15 = foo$a_11 + sum_14;
t.c:12:23: note:        children 0x4d57408 0x4d57490
t.c:12:23: note:   node 0x4d57408 (max_nunits=2, refcnt=2) vector(2) double
t.c:12:23: note:   op template: foo$c_8 = _3->c;
t.c:12:23: note:        stmt 0 foo$c_8 = _3->c;
t.c:12:23: note:        stmt 1 foo$b_9 = _3->b;
t.c:12:23: note:        stmt 2 foo$a_11 = _3->a;
t.c:12:23: note:        load permutation { 2 1 0 }
t.c:12:23: note:   node 0x4d57490 (max_nunits=2, refcnt=2) vector(2) double
t.c:12:23: note:   op template: sum_22 = PHI <sum_15(5), 0.0(2)>
t.c:12:23: note:        stmt 0 sum_22 = PHI <sum_15(5), 0.0(2)>
t.c:12:23: note:        stmt 1 sum_22 = PHI <sum_15(5), 0.0(2)>
t.c:12:23: note:        stmt 2 sum_22 = PHI <sum_15(5), 0.0(2)>
t.c:12:23: note:        children 0x4d57380 (nil)

but optimize_slp mangles things here.  We have

  /* We have to mark outgoing permutations facing non-reduction graph
     entries that are not represented as to be materialized.  */
  for (slp_instance instance : m_vinfo->slp_instances)
    if (SLP_INSTANCE_KIND (instance) == slp_inst_kind_ctor)
      {
        unsigned int node_i = SLP_INSTANCE_TREE (instance)->vertex;
        m_partitions[m_vertices[node_i].partition].layout = 0;
      }

unfortunately this all happens before we determine the reduction is in-order.
The only thing we can do here is check needs_fold_left_reduction_p directly. I'm testing a patch.