https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113205

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org,
                   |                            |rsandifo at gcc dot gnu.org
           See Also|                            |https://gcc.gnu.org/bugzill
                   |                            |a/show_bug.cgi?id=110935

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
OK, so this should already reproduce before the change when removing the
invariant add (p + 8000).  The issue seems to be that the SLP build ends up
with an unsupported load permutation when we try V2SImode vectorization
after V4SImode is scrapped because of cost issues.  We have

t.c:18:10: note:   node 0x6471a48 (max_nunits=2, refcnt=2) vector(2) int
t.c:18:10: note:   op template: _3 = MEM[(int *)i.0_1 + 4B];
t.c:18:10: note:        stmt 0 _3 = MEM[(int *)i.0_1 + 4B];
t.c:18:10: note:        stmt 1 _5 = MEM[(int *)i.0_1 + 12B];
t.c:18:10: note:        stmt 2 _4 = MEM[(int *)i.0_1 + 8B];
t.c:18:10: note:        stmt 3 _2 = *i.0_1;
t.c:18:10: note:        load permutation { 1 3 2 0 }

I'm not sure whether that's a supported situation.  Making the code
handle it more gracefully, like

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index b6cce55ce90..a12214bc1ad 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -5343,8 +5343,8 @@ vect_optimize_slp_pass::backward_pass ()
            }
        }

-      gcc_assert (min_layout_cost.is_possible ());
-      partition.layout = min_layout_i;
+      if (min_layout_cost.is_possible ())
+       partition.layout = min_layout_i;
     }
 }

then yields

t.c:18:10: note:  SLP optimize permutations:
t.c:18:10: note:    1: { 1, 3, 2, 0 }
t.c:18:10: note:  SLP optimize partitions:
t.c:18:10: note:    -------------
t.c:18:10: note:    partition 0 (layout 0):
t.c:18:10: note:      nodes:
t.c:18:10: note:        - 0x5f0d9b0:
t.c:18:10: note:            weight: 1.000000
t.c:18:10: note:            out weight: 1.000000 (degree 1)
t.c:18:10: note:            op template: _20 = (int) _19;
t.c:18:10: note:      edges:
t.c:18:10: note:        - 0x5f0d9b0 --> [2] 0x5f0d928
t.c:18:10: note:      layout 0: rejected
t.c:18:10: note:      layout 1: rejected
t.c:18:10: note:    -------------
t.c:18:10: note:    partition 1 (layout 1):
t.c:18:10: note:      nodes:
t.c:18:10: note:        - 0x5f0da38:
t.c:18:10: note:            weight: 1.000000
t.c:18:10: note:            out weight: 1.000000 (degree 1)
t.c:18:10: note:            op template: _3 = MEM[(int *)i.0_1 + 4B];
t.c:18:10: note:      edges:
t.c:18:10: note:        - 0x5f0da38 --> [2] 0x5f0d928
t.c:18:10: note:      layout 0: rejected
t.c:18:10: note:      layout 1: rejected
t.c:18:10: note:    -------------
t.c:18:10: note:    partition 2 (layout 1):
t.c:18:10: note:      nodes:
t.c:18:10: note:        - 0x5f0d928:
t.c:18:10: note:            weight: 1.000000
t.c:18:10: note:            out weight: 1.000000 (degree 1)
t.c:18:10: note:            op template: _21 = _3 * _20;
t.c:18:10: note:      edges:
t.c:18:10: note:        - 0x5f0d928 --> [3] 0x5f0d8a0
t.c:18:10: note:        - 0x5f0d9b0 [0] --> 0x5f0d928
t.c:18:10: note:        - 0x5f0da38 [1] --> 0x5f0d928
t.c:18:10: note:      layout 0: rejected
t.c:18:10: note:      layout 1: rejected
t.c:18:10: note:    -------------
t.c:18:10: note:    partition 3 (layout 1):
t.c:18:10: note:      nodes:
t.c:18:10: note:        - 0x5f0d8a0:
t.c:18:10: note:            weight: 1.000000
t.c:18:10: note:            op template: _22 = (unsigned int) _21;
t.c:18:10: note:      edges:
t.c:18:10: note:        - 0x5f0d928 [2] --> 0x5f0d8a0
t.c:18:10: note:      layout 0:
t.c:18:10: note:          {depth: 1.000000, total: 1.000000}
t.c:18:10: note:        + {depth: 0.000000, total: 0.000000}
t.c:18:10: note:        + {depth: 0.000000, total: 0.000000}
t.c:18:10: note:        = {depth: 1.000000, total: 1.000000}
t.c:18:10: note:      layout 1: (*)
t.c:18:10: note:          {depth: 0.000000, total: 0.000000}
t.c:18:10: note:        + {depth: 0.000000, total: 0.000000}
t.c:18:10: note:        + {depth: 0.000000, total: 0.000000}
t.c:18:10: note:        = {depth: 0.000000, total: 0.000000}
t.c:18:10: note:  inserting permutation node in place of 0x5f0d9b0
t.c:18:10: note:  recording new base alignment for i.0_1
...
t.c:18:10: note:   vectorizing permutation op0[3] op0[0] op0[2] op0[1]
t.c:18:10: note:   vectorizing permutation op0[3] op0[0] op0[2] op0[1]
t.c:18:10: note:   as vops0[1][1] vops0[0][0], vops0[1][0] vops0[0][1]
t.c:18:10: missed:   unsupported vect permute { 1 2 }
t.c:18:10: note:   Building vector operands of 0x5f0db48 from scalars instead
...
t.c:18:10: note:   removing SLP instance operations starting from: _25 = _24 + _40;
t.c:18:10: missed:  not vectorized: bad operation in basic block.
t.c:18:10: note: ***** Analysis failed with vector mode V8QI
t.c:18:10: note: ***** Re-trying analysis with vector mode V4QI

and the ICE is gone.
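
To make the "vectorizing permutation ... as vops0[1][1] vops0[0][0], ..." lines
in the dump easier to read, here is a small standalone sketch of the index
arithmetic for two-lane vectors.  It is not GCC code: lane_to_vec_elt and
two_input_mask are hypothetical helpers, and the rule that the first source
vector encountered becomes permute input 0 is an assumption chosen because it
is consistent with the "{ 1 2 }" mask printed above.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper (not a GCC function): split a lane index of the SLP
// group into (vector index, element index) for vectors of `nunits` lanes.
std::pair<unsigned, unsigned> lane_to_vec_elt(unsigned lane, unsigned nunits)
{
  assert(nunits > 0);
  return std::make_pair(lane / nunits, lane % nunits);
}

// Hypothetical helper: build a two-input permute mask for one output vector.
// Assumption: the first source vector seen is input 0 (mask values
// 0..nunits-1), the other one is input 1 (mask values nunits..2*nunits-1).
// Returns an empty mask if more than two source vectors would be needed.
std::vector<unsigned> two_input_mask(const std::vector<unsigned> &lanes,
                                     unsigned nunits)
{
  assert(nunits > 0 && lanes.size() == nunits);
  std::vector<unsigned> mask;
  int first = -1, second = -1;
  for (unsigned lane : lanes)
    {
      std::pair<unsigned, unsigned> ve = lane_to_vec_elt(lane, nunits);
      int vec = static_cast<int>(ve.first);
      if (first < 0 || first == vec)
        {
          first = vec;
          mask.push_back(ve.second);
        }
      else if (second < 0 || second == vec)
        {
          second = vec;
          mask.push_back(nunits + ve.second);
        }
      else
        return std::vector<unsigned>();
    }
  return mask;
}
```

With nunits = 2, the first output vector takes lanes op0[3] and op0[0], that
is vops0[1][1] and vops0[0][0], and under the stated assumption
two_input_mask({3, 0}, 2) gives the mask { 1 2 } that the target rejects.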

I'm not sure whether we can "recover" in this way, or whether leaving
partition.layout unchanged could lead to wrong-code if the result could
actually be code generated; that is, whether it is really only the
inability to generate the permute that triggers this issue.
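
For illustration only, the selection behaviour the patch above moves to can be
modelled outside of GCC as follows.  layout_cost_sketch and choose_layout are
hypothetical names, and the cost is reduced to a possible flag plus a single
total, which is much simpler than the real cost type.  The sketch only shows
the "keep the current layout when every candidate is rejected" behaviour; it
says nothing about whether doing so is safe.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified stand-in for a per-layout cost: `possible` is
// false when the layout was rejected, `total` is only meaningful otherwise.
struct layout_cost_sketch
{
  bool possible;
  double total;
};

// Return the index of the cheapest possible layout.  If every layout was
// rejected, return `current` unchanged instead of asserting.
unsigned choose_layout(const std::vector<layout_cost_sketch> &costs,
                       unsigned current)
{
  assert(current < costs.size());
  unsigned best = current;
  bool found = false;
  for (unsigned i = 0; i < costs.size(); ++i)
    {
      if (!costs[i].possible)
        continue;
      if (!found || costs[i].total < costs[best].total)
        {
          best = i;
          found = true;
        }
    }
  return best;
}
```

Fed with the totals from partition 3 in the dump (1.0 for layout 0, 0.0 for
layout 1) this picks layout 1; for a partition where both layouts are
rejected, the layout passed in as current is returned as is.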

Related to PR110935, with -Ofast we should elide the unsupported permute.
