https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113205

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org,
                   |                            |rsandifo at gcc dot gnu.org
           See Also|                            |https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110935

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
OK, so this should already reproduce before the change when removing the
invariant add (p + 8000).

The issue seems to be that SLP build ends up with an unsupported load
permutation when we try with V2SImode vectorization after V4SImode is
scrapped because of cost issues. We have

t.c:18:10: note:   node 0x6471a48 (max_nunits=2, refcnt=2) vector(2) int
t.c:18:10: note:   op template: _3 = MEM[(int *)i.0_1 + 4B];
t.c:18:10: note:         stmt 0 _3 = MEM[(int *)i.0_1 + 4B];
t.c:18:10: note:         stmt 1 _5 = MEM[(int *)i.0_1 + 12B];
t.c:18:10: note:         stmt 2 _4 = MEM[(int *)i.0_1 + 8B];
t.c:18:10: note:         stmt 3 _2 = *i.0_1;
t.c:18:10: note:         load permutation { 1 3 2 0 }

I'm not sure whether that's a supported situation.
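
For reference, the load permutation in the node dump can be read as the
element index each scalar load picks out of the contiguous group starting at
i.0_1. A minimal sketch, assuming 4-byte int elements; load_permutation and
is_identity are hypothetical helpers written for illustration, not GCC
functions:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of GCC): turn the byte offset of each
// scalar load, relative to the start of the group, into an element index.
std::vector<unsigned> load_permutation(const std::vector<std::size_t>& byte_offsets,
                                       std::size_t elem_size) {
  assert(elem_size != 0);
  std::vector<unsigned> perm;
  perm.reserve(byte_offsets.size());
  for (std::size_t off : byte_offsets) {
    // This sketch only handles loads that start on an element boundary.
    assert(off % elem_size == 0);
    perm.push_back(static_cast<unsigned>(off / elem_size));
  }
  return perm;
}

// True when lane k reads element k for every k, i.e. no permute is needed.
bool is_identity(const std::vector<unsigned>& perm) {
  for (std::size_t k = 0; k < perm.size(); ++k)
    if (perm[k] != k)
      return false;
  return true;
}
```

Feeding in the offsets of stmt 0 to stmt 3 from the dump (4, 12, 8 and 0
bytes) gives the element order 1 3 2 0, which is not the identity, so the
loaded vectors need a permute before use.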

Changing the code to be more graceful, like

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index b6cce55ce90..a12214bc1ad 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -5343,8 +5343,8 @@ vect_optimize_slp_pass::backward_pass ()
            }
        }
 
-      gcc_assert (min_layout_cost.is_possible ());
-      partition.layout = min_layout_i;
+      if (min_layout_cost.is_possible ())
+       partition.layout = min_layout_i;
     }
 }

then yields

t.c:18:10: note: SLP optimize permutations:
t.c:18:10: note:   1: { 1, 3, 2, 0 }
t.c:18:10: note: SLP optimize partitions:
t.c:18:10: note:   -------------
t.c:18:10: note:   partition 0 (layout 0):
t.c:18:10: note:     nodes:
t.c:18:10: note:       - 0x5f0d9b0:
t.c:18:10: note:           weight: 1.000000
t.c:18:10: note:           out weight: 1.000000 (degree 1)
t.c:18:10: note:           op template: _20 = (int) _19;
t.c:18:10: note:     edges:
t.c:18:10: note:       - 0x5f0d9b0 --> [2] 0x5f0d928
t.c:18:10: note:     layout 0: rejected
t.c:18:10: note:     layout 1: rejected
t.c:18:10: note:   -------------
t.c:18:10: note:   partition 1 (layout 1):
t.c:18:10: note:     nodes:
t.c:18:10: note:       - 0x5f0da38:
t.c:18:10: note:           weight: 1.000000
t.c:18:10: note:           out weight: 1.000000 (degree 1)
t.c:18:10: note:           op template: _3 = MEM[(int *)i.0_1 + 4B];
t.c:18:10: note:     edges:
t.c:18:10: note:       - 0x5f0da38 --> [2] 0x5f0d928
t.c:18:10: note:     layout 0: rejected
t.c:18:10: note:     layout 1: rejected
t.c:18:10: note:   -------------
t.c:18:10: note:   partition 2 (layout 1):
t.c:18:10: note:     nodes:
t.c:18:10: note:       - 0x5f0d928:
t.c:18:10: note:           weight: 1.000000
t.c:18:10: note:           out weight: 1.000000 (degree 1)
t.c:18:10: note:           op template: _21 = _3 * _20;
t.c:18:10: note:     edges:
t.c:18:10: note:       - 0x5f0d928 --> [3] 0x5f0d8a0
t.c:18:10: note:       - 0x5f0d9b0 [0] --> 0x5f0d928
t.c:18:10: note:       - 0x5f0da38 [1] --> 0x5f0d928
t.c:18:10: note:     layout 0: rejected
t.c:18:10: note:     layout 1: rejected
t.c:18:10: note:   -------------
t.c:18:10: note:   partition 3 (layout 1):
t.c:18:10: note:     nodes:
t.c:18:10: note:       - 0x5f0d8a0:
t.c:18:10: note:           weight: 1.000000
t.c:18:10: note:           op template: _22 = (unsigned int) _21;
t.c:18:10: note:     edges:
t.c:18:10: note:       - 0x5f0d928 [2] --> 0x5f0d8a0
t.c:18:10: note:     layout 0:
t.c:18:10: note:         {depth: 1.000000, total: 1.000000}
t.c:18:10: note:       + {depth: 0.000000, total: 0.000000}
t.c:18:10: note:       + {depth: 0.000000, total: 0.000000}
t.c:18:10: note:       = {depth: 1.000000, total: 1.000000}
t.c:18:10: note:     layout 1: (*)
t.c:18:10: note:         {depth: 0.000000, total: 0.000000}
t.c:18:10: note:       + {depth: 0.000000, total: 0.000000}
t.c:18:10: note:       + {depth: 0.000000, total: 0.000000}
t.c:18:10: note:       = {depth: 0.000000, total: 0.000000}
t.c:18:10: note: inserting permutation node in place of 0x5f0d9b0
t.c:18:10: note: recording new base alignment for i.0_1
...
t.c:18:10: note: vectorizing permutation op0[3] op0[0] op0[2] op0[1]
t.c:18:10: note: vectorizing permutation op0[3] op0[0] op0[2] op0[1]
t.c:18:10: note: as vops0[1][1] vops0[0][0], vops0[1][0] vops0[0][1]
t.c:18:10: missed: unsupported vect permute { 1 2 }
t.c:18:10: note: Building vector operands of 0x5f0db48 from scalars instead
...
t.c:18:10: note: removing SLP instance operations starting from: _25 = _24 + _40;
t.c:18:10: missed: not vectorized: bad operation in basic block.
t.c:18:10: note: ***** Analysis failed with vector mode V8QI
t.c:18:10: note: ***** Re-trying analysis with vector mode V4QI

and the ICE is gone. I'm not sure whether we can "recover" in this way, or
whether leaving partition.layout unchanged could lead to wrong-code if the
layout could actually be code generated; that is, whether it's really the
inability to generate the permute that triggers this issue.

Related to PR110935; with -Ofast we should elide the unsupported permute.
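
To make the behavioural difference of the hunk concrete, here is a toy model
of the selection step, not the code in tree-vect-slp.cc. LayoutCost and
choose_layout are made-up names; the only thing taken from the patch is the
idea that the partition keeps its current layout when no candidate cost is
possible, instead of asserting:

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Toy stand-in for a per-layout cost: either "rejected" or a total cost.
struct LayoutCost {
  bool possible;
  double total;
};

// Pick the cheapest possible layout. If every layout is rejected, keep
// current_layout, mirroring the "if (...is_possible ())" form of the patch
// rather than the original gcc_assert.
int choose_layout(const std::vector<LayoutCost>& costs, int current_layout) {
  assert(current_layout >= 0);
  int best = -1;
  double best_total = std::numeric_limits<double>::infinity();
  for (std::size_t i = 0; i < costs.size(); ++i) {
    if (costs[i].possible && costs[i].total < best_total) {
      best = static_cast<int>(i);
      best_total = costs[i].total;
    }
  }
  return best >= 0 ? best : current_layout;
}
```

With the totals shown for partition 3 (1.0 for layout 0, 0.0 for layout 1)
this picks layout 1. With both layouts rejected, as for partitions 0 to 2,
it returns whatever layout the partition already had, which is exactly the
case where it is unclear whether the result is still safe to code generate.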