https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88278

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
             Target|                            |x86_64-*-* i?86-*-*
                 CC|                            |jakub at gcc dot gnu.org,
                   |                            |uros at gcc dot gnu.org

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Hmm, it looks like *movdi_internal and friends do not represent the implicit
zeroing of the upper part?

I guess RTL in general (before reload) doesn't know that V2SI is the low
part of a V4SI register.  But after reload we see (split2)

(insn 7 5 8 2 (set (reg:V8QI 20 xmm0 [orig:88 MEM[(unsigned char *)p_1(D)] ] [88])
        (mem:V8QI (reg:DI 5 di [91]) [0 MEM[(unsigned char *)p_1(D)]+0 S8 A8])) 1078 {*movv8qi_internal}
     (nil))
(insn 8 7 9 2 (set (reg:V2SI 21 xmm1 [90])
        (const_vector:V2SI [
                (const_int 0 [0]) repeated x2
            ])) 1080 {*movv2si_internal}
     (expr_list:REG_EQUIV (const_vector:V2SI [
                (const_int 0 [0]) repeated x2
            ])
        (nil)))
(insn 9 8 16 2 (set (reg:V4SI 20 xmm0 [89])
        (vec_concat:V4SI (reg:V2SI 20 xmm0 [orig:88 MEM[(unsigned char *)p_1(D)] ] [88])
            (reg:V2SI 21 xmm1 [90]))) 3955 {*vec_concatv4si}
     (expr_list:REG_EQUAL (vec_concat:V4SI (subreg:V2SI (reg:V8QI 20 xmm0 [orig:88 MEM[(unsigned char *)p_1(D)] ] [88]) 0)
            (const_vector:V2SI [
                    (const_int 0 [0]) repeated x2
                ]))
        (nil)))

where the pattern is probably easier to optimize (but then we eventually fail
to elide the xmm1 register, which is no longer needed)?
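
For illustration only, here is a minimal C sketch of the redundancy in
question. It is not the test case from this report; the helper names are
hypothetical, and it assumes an x86 target with SSE2. It relies on the
documented behaviour of _mm_loadl_epi64 (movq), which loads 64 bits into
the low half of the register and zeroes the upper half, so concatenating
the result with an explicit zero vector should change nothing.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: 8-byte load; the upper 64 bits are zeroed implicitly. */
static __m128i load_low8(const unsigned char *p)
{
    return _mm_loadl_epi64((const __m128i *)p);
}

/* Hypothetical helper: the same load followed by an explicit concatenation
   with zero, roughly what the vec_concat insn above expresses. */
static __m128i load_low8_explicit_zero(const unsigned char *p)
{
    __m128i lo = _mm_loadl_epi64((const __m128i *)p);
    return _mm_unpacklo_epi64(lo, _mm_setzero_si128());
}

/* Returns 1 if all 16 bytes of a and b are equal. */
static int same_bits(__m128i a, __m128i b)
{
    return _mm_movemask_epi8(_mm_cmpeq_epi8(a, b)) == 0xFFFF;
}

/* Returns 1 if the explicit zeroing did not change the loaded value. */
static int zeroing_is_redundant(const unsigned char *p)
{
    return same_bits(load_low8(p), load_low8_explicit_zero(p));
}

/* Small self-check on a fixed 8-byte buffer. */
static void check_example(void)
{
    const unsigned char bytes[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    assert(zeroing_is_redundant(bytes));
}
```

Both helpers return the same value, so ideally the second would compile to
a single movq load as well, with no separate zero register.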
