https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88278
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
             Target|                            |x86_64-*-* i?86-*-*
                 CC|                            |jakub at gcc dot gnu.org,
                   |                            |uros at gcc dot gnu.org

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Hmm, it looks like *movdi_internal and friends do not represent the implicit
zeroing of the upper part?  I guess RTL in general (before reload) doesn't
know that V2SI is the low part of a V4SI register.  But after reload we see
(split2)

(insn 7 5 8 2 (set (reg:V8QI 20 xmm0 [orig:88 MEM[(unsigned char *)p_1(D)] ] [88])
        (mem:V8QI (reg:DI 5 di [91]) [0 MEM[(unsigned char *)p_1(D)]+0 S8 A8])) 1078 {*movv8qi_internal}
     (nil))
(insn 8 7 9 2 (set (reg:V2SI 21 xmm1 [90])
        (const_vector:V2SI [
                (const_int 0 [0]) repeated x2
            ])) 1080 {*movv2si_internal}
     (expr_list:REG_EQUIV (const_vector:V2SI [
                (const_int 0 [0]) repeated x2
            ])
        (nil)))
(insn 9 8 16 2 (set (reg:V4SI 20 xmm0 [89])
        (vec_concat:V4SI (reg:V2SI 20 xmm0 [orig:88 MEM[(unsigned char *)p_1(D)] ] [88])
            (reg:V2SI 21 xmm1 [90]))) 3955 {*vec_concatv4si}
     (expr_list:REG_EQUAL (vec_concat:V4SI (subreg:V2SI (reg:V8QI 20 xmm0 [orig:88 MEM[(unsigned char *)p_1(D)] ] [88]) 0)
            (const_vector:V2SI [
                    (const_int 0 [0]) repeated x2
                ]))
        (nil)))

where the pattern is probably easier to optimize (but we then fail to elide
the xmm1 register, which eventually is not needed)?
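
For illustration only, a minimal C sketch of the semantics that insns 7-9
compute together: an 8-byte load into the low half of a 16-byte vector, with
the upper half zero.  This is not the testcase from the PR; the typedef and
the function name load_low_zero_high are hypothetical, and the sketch assumes
the GCC vector extensions.  On x86 a single movq load from memory into an xmm
register already zeroes the upper 64 bits, which is why the separate zero in
xmm1 and the vec_concat look redundant.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical 16-byte vector type (GCC vector extension). */
typedef unsigned char v16qi __attribute__((vector_size(16)));

/* Load 8 bytes from p into lanes 0..7 and leave lanes 8..15 zero.
   This is the value that vec_concat (low half, zero) describes. */
v16qi load_low_zero_high(const unsigned char *p)
{
    assert(p != NULL);
    v16qi r = { 0 };       /* all 16 lanes start as zero */
    memcpy(&r, p, 8);      /* overwrite only the low 8 lanes */
    return r;
}
```

Whether a given GCC version compiles this exact source to the RTL quoted
above is not something this sketch claims; it only spells out the intended
result.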