https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110206
--- Comment #9 from Uroš Bizjak <ubizjak at gmail dot com> ---
Some more digging through the code:

In cprop.cc/try_replace_reg, we try to simplify the source of the set given our
substitution:

Breakpoint 1, try_replace_reg (from=0x7fffe9f0b7f8, to=0x7fffe9f099e0, insn=0x7fffea01b6c0) at ../../git/gcc/gcc/cprop.cc:789
789           src = simplify_replace_rtx (SET_SRC (set), from, to);
(gdb) list
784       if (!success && set && reg_mentioned_p (from, SET_SRC (set)))
785         {
786           /* If above failed and this is a single set, try to simplify the source
787              of the set given our substitution.  We could perhaps try this for
788              multiple SETs, but it probably won't buy us anything.  */
789           src = simplify_replace_rtx (SET_SRC (set), from, to);
(gdb) p debug_rtx (set)
(set (reg:V8HI 100)
    (zero_extend:V8HI (vec_select:V8QI (subreg:V16QI (reg:V4QI 98) 0)
            (parallel [
                    (const_int 0 [0])
                    (const_int 1 [0x1])
                    (const_int 2 [0x2])
                    (const_int 3 [0x3])
                    (const_int 4 [0x4])
                    (const_int 5 [0x5])
                    (const_int 6 [0x6])
                    (const_int 7 [0x7])
                ]))))
(gdb) p debug_rtx (from)
(reg:V4QI 98)
(gdb) p debug_rtx (to)
(const_vector:V4QI [
        (const_int -52 [0xffffffffffffffcc]) repeated x4
    ])

and simplify_replace_rtx simplifies the above to:

(gdb) p debug_rtx (src)
(const_vector:V8HI [
        (const_int 204 [0xcc]) repeated x8
    ])

which is obviously wrong: we have a V4QImode input register holding a V4QImode
constant, so only four of the eight selected elements are known.

Tracing through simplify-rtx.cc brings us to a recursive
simplify_replace_fn_rtx, which gets us to:

Breakpoint 1, simplify_replace_fn_rtx (x=0x7fffe9f0b888, old_rtx=0x7fffe9f0b7f8, fn=0x0, data=0x7fffe9f099e0) at ../../git/gcc/gcc/simplify-rtx.cc:474
474           op0 = simplify_gen_subreg (GET_MODE (x), op0,
(gdb) list
469       if (code == SUBREG)
470         {
471           op0 = simplify_replace_fn_rtx (SUBREG_REG (x), old_rtx, fn, data);
472           if (op0 == SUBREG_REG (x))
473             return x;
474           op0 = simplify_gen_subreg (GET_MODE (x), op0,
475                                      GET_MODE (SUBREG_REG (x)),
476                                      SUBREG_BYTE (x));
477           return op0 ? op0 : x;
478         }
(gdb) p debug_rtx (op0)
(const_vector:V4QI [
        (const_int -52 [0xffffffffffffffcc]) repeated x4
    ])
(gdb) p debug_rtx (x)
(subreg:V16QI (reg:V4QI 98) 0)

and simplify_gen_subreg with the above arguments returns:

(gdb) p debug_rtx (op0)
(const_vector:V16QI [
        (const_int -52 [0xffffffffffffffcc]) repeated x16
    ])

No way! It is not possible to get a V16QImode vector from a V4QImode vector,
even when all elements are duplicates.

Tracing even deeper to simplify_context::simplify_subreg, we find the
following:

Breakpoint 1, simplify_context::simplify_subreg (this=0x7fffffffd528, outermode=E_V16QImode, op=0x7fffe9f099e0, innermode=E_V4QImode, byte=...) at ../../git/gcc/gcc/simplify-rtx.cc:7561
7561            return gen_vec_duplicate (outermode, elt);
(gdb) list
7556          rtx elt;
7557
7558          if (VECTOR_MODE_P (outermode)
7559              && GET_MODE_INNER (outermode) == GET_MODE_INNER (innermode)
7560              && vec_duplicate_p (op, &elt))
7561            return gen_vec_duplicate (outermode, elt);
7562
7563          if (outermode == GET_MODE_INNER (innermode)
7564              && vec_duplicate_p (op, &elt))
7565            return elt;
(gdb) p outermode
$1 = E_V16QImode
(gdb) p debug_rtx (elt)
(const_int -52 [0xffffffffffffffcc])
(gdb) fin
Run till exit from #0  simplify_context::simplify_subreg (this=0x7fffffffd528, outermode=E_V16QImode, op=0x7fffe9f099e0, innermode=E_V4QImode, byte=...) at ../../git/gcc/gcc/simplify-rtx.cc:7561
0x0000000000eb24d3 in simplify_subreg (byte=..., innermode=E_V4QImode, op=<optimized out>, outermode=<optimized out>) at ../../git/gcc/gcc/rtl.h:3513
3513      return simplify_context ().simplify_subreg (outermode, op, innermode, byte);
Value returned is $4 = (rtx_def *) 0x7fffe9f09c10
(gdb) p debug_rtx ($4)
(const_vector:V16QI [
        (const_int -52 [0xffffffffffffffcc]) repeated x16
    ])

Nope. This transformation is valid only for non-paradoxical subregs.
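To make the problem concrete, here is a minimal stand-alone sketch of the RTL from the session above. It is not GCC code: the `Lane` type and all function names are hypothetical, and it assumes the usual reading that a paradoxical subreg at byte 0 defines only the low lanes (little-endian lane order), leaving the rest undefined.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical lane model: `defined` is false when the source value says
// nothing about the lane's contents.
struct Lane {
    bool defined = false;
    std::uint16_t value = 0;
};

// Model of (subreg:V16QI (x:V4QI) 0): the four source bytes occupy the low
// lanes; the other twelve lanes are not determined by the source.
inline std::array<Lane, 16>
paradoxical_subreg_v16qi(const std::array<std::uint8_t, 4>& src) {
    std::array<Lane, 16> out{};  // every lane starts out undefined
    for (std::size_t i = 0; i < src.size(); ++i) {
        out[i].defined = true;
        out[i].value = src[i];
    }
    return out;
}

// Model of (zero_extend:V8HI (vec_select:V8QI x (parallel [0 .. 7]))).
inline std::array<Lane, 8>
select_low8_zero_extend(const std::array<Lane, 16>& x) {
    std::array<Lane, 8> out{};
    for (std::size_t i = 0; i < out.size(); ++i)
        out[i] = x[i];  // bytes are stored unsigned, so widening zero-extends
    return out;
}

// Count result lanes that the source value does not determine.
inline std::size_t undefined_lanes(const std::array<Lane, 8>& v) {
    std::size_t n = 0;
    for (const Lane& lane : v)
        if (!lane.defined)
            ++n;
    return n;
}

// Replay the values from the debug session: (const_int -52) in QImode is 0xcc.
inline void replay_pr_values() {
    const std::array<std::uint8_t, 4> src = {{0xcc, 0xcc, 0xcc, 0xcc}};
    const auto res = select_low8_zero_extend(paradoxical_subreg_v16qi(src));
    assert(res[0].value == 0x00cc);     // lanes 0..3 are known to be 204
    assert(undefined_lanes(res) == 4);  // lanes 4..7 come from undefined bytes
}
```

Under this model, folding the whole expression to a V8HI duplicate of 204 asserts a value for four lanes that the V4QI constant never defined.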
Patch is then obvious:

diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
index d7315d82aa3..87ca25086dc 100644
--- a/gcc/simplify-rtx.cc
+++ b/gcc/simplify-rtx.cc
@@ -7557,6 +7557,7 @@ simplify_context::simplify_subreg (machine_mode outermode, rtx op,

       if (VECTOR_MODE_P (outermode)
          && GET_MODE_INNER (outermode) == GET_MODE_INNER (innermode)
+         && !paradoxical_subreg_p (outermode, innermode)
          && vec_duplicate_p (op, &elt))
        return gen_vec_duplicate (outermode, elt);
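The effect of the added condition can be sketched outside GCC as follows. This is only an illustration: `VecMode`, `is_paradoxical` and `can_fold_dup_subreg` are hypothetical stand-ins, the paradoxical test is simplified to "outer mode is wider than inner mode", and the operand is assumed to already be a vector duplicate (the `vec_duplicate_p` part of the real condition).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a vector machine_mode; not GCC's real type.
struct VecMode {
    std::size_t elt_bits;  // width of one element in bits
    std::size_t nunits;    // number of elements
    constexpr std::size_t bits() const { return elt_bits * nunits; }
};

constexpr VecMode V4QI{8, 4};
constexpr VecMode V8QI{8, 8};
constexpr VecMode V16QI{8, 16};

// Simplified model of paradoxical_subreg_p: the outer mode is wider than
// the inner mode.
constexpr bool is_paradoxical(VecMode outer, VecMode inner) {
    return outer.bits() > inner.bits();
}

// Model of the patched condition, for an operand assumed to be a duplicate:
// same element mode, and the subreg must not be paradoxical.
constexpr bool can_fold_dup_subreg(VecMode outer, VecMode inner) {
    return outer.elt_bits == inner.elt_bits && !is_paradoxical(outer, inner);
}

// Usage with the modes from the debug session.
inline void check_pr_modes() {
    assert(!can_fold_dup_subreg(V16QI, V4QI));  // the PR's case: now declined
    assert(can_fold_dup_subreg(V4QI, V16QI));   // narrowing subreg: still folded
}
```

With the guard in place, the V16QI-from-V4QI case is left alone, while subregs that only narrow a duplicate vector can still be folded.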