https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110372
--- Comment #1 from Uroš Bizjak <ubizjak at gmail dot com> ---
Before reload, we have this sequence:

--cut here--
(insn 34 4 2 2 (set (reg:TI 119)
        (reg:TI 20 xmm0 [ u ])) "pr110372.c":8:1 89 {*movti_internal}
     (expr_list:REG_DEAD (reg:TI 20 xmm0 [ u ])
        (nil)))
(insn 2 34 3 2 (set (reg/v:TI 98 [ u ])
        (reg:TI 119)) "pr110372.c":8:1 89 {*movti_internal}
     (expr_list:REG_DEAD (reg:TI 119)
        (nil)))
(note 3 2 7 2 NOTE_INSN_FUNCTION_BEG)
(insn 7 3 9 2 (set (reg:V4SI 83 [ _2 ])
        (and:V4SI (subreg:V4SI (reg/v:TI 98 [ u ]) 0)
            (mem/u/c:V4SI (symbol_ref/u:DI ("*.LC0") [flags 0x2]) [0 S16 A128]))) "pr110372.c":9:14 6840 {*andv4si3}
     (expr_list:REG_EQUAL (and:V4SI (subreg:V4SI (reg/v:TI 98 [ u ]) 0)
            (const_vector:V4SI [
                    (const_int 4 [0x4]) repeated x4
                ]))
        (nil)))
--cut here--

And reload tries to move xmm0 to xmm1 with:

(insn 36 3 41 2 (set (reg:DI 0 ax [123])
        (reg:DI 20 xmm0 [orig:98 u ] [98])) "pr110372.c":9:14 90 {*movdi_internal}
     (nil))
(insn 41 36 37 2 (set (mem/c:TI (plus:DI (reg/f:DI 7 sp)
                (const_int -40 [0xffffffffffffffd8])) [2 %sfp+-32 S16 A128])
        (reg/v:TI 20 xmm0 [orig:98 u ] [98])) "pr110372.c":9:14 89 {*movti_internal}
     (nil))
(insn 37 41 43 2 (set (reg:DI 24 xmm4 [124])
        (mem/c:DI (plus:DI (reg/f:DI 7 sp)
                (const_int -32 [0xffffffffffffffe0])) [2 %sfp+-24 S8 A64])) "pr110372.c":9:14 90 {*movdi_internal}
     (nil))
(insn 43 37 38 2 (set (reg:DI 23 xmm3 [122])
        (reg:DI 0 ax [123])) "pr110372.c":9:14 90 {*movdi_internal}
     (nil))
(insn 38 43 39 2 (set (reg:V2DI 23 xmm3 [122])
        (vec_concat:V2DI (reg:DI 23 xmm3 [122])
            (reg:DI 24 xmm4 [124]))) "pr110372.c":9:14 7265 {vec_concatv2di}
     (nil))
(insn 39 38 7 2 (set (reg:V4SI 21 xmm1 [orig:83 _2 ] [83])
        (reg:V4SI 23 xmm3 [122])) "pr110372.c":9:14 1869 {movv4si_internal}
     (nil))

in order to satisfy constraints of:

(insn 7 39 9 2 (set (reg:V4SI 21 xmm1 [orig:83 _2 ] [83])
        (and:V4SI (reg:V4SI 21 xmm1 [orig:83 _2 ] [83])
            (mem/u/c:V4SI (symbol_ref/u:DI ("*.LC0") [flags 0x2]) [0 S16 A128]))) "pr110372.c":9:14 6840 {*andv4si3}
     (expr_list:REG_EQUAL (and:V4SI (subreg:V4SI (reg/v:TI 20 xmm0 [orig:98 u ] [98]) 0)
            (const_vector:V4SI [
                    (const_int 4 [0x4]) repeated x4
                ]))
        (nil)))

Please note that alternative 19 of *movdi_internal from i386.md (?r,?v) is
correctly enabled only for x64_sse2 ISA, so unavailable without SSE2.

We have:

#define VALID_SSE_REG_MODE(MODE) \
  ((MODE) == V1TImode || (MODE) == TImode \
   || (MODE) == V4SFmode || (MODE) == V4SImode \
   || (MODE) == SFmode || (MODE) == SImode \
   || (MODE) == TFmode || (MODE) == TDmode)

So, TImode and V4SImode should be tieable for XMM registers and RA should just
copy the value between XMM registers.
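
To make the tieability argument concrete, here is a minimal toy sketch in C. It is not GCC code: the toy_mode enum, toy_mode_size and toy_sse_modes_tieable are hypothetical names invented for illustration, and the rule "both modes are 16 bytes wide and both pass the SSE mode check" is an assumed simplification, not a copy of the real target hook. Only the shape of the macro is taken from VALID_SSE_REG_MODE as quoted in this comment.

```c
#include <stdbool.h>

/* Toy stand-ins for a few GCC machine modes.  The real machine_mode
   enumeration is generated at build time and is much larger.  */
enum toy_mode {
  TOY_V1TImode, TOY_TImode, TOY_V4SFmode, TOY_V4SImode,
  TOY_SFmode, TOY_SImode, TOY_TFmode, TOY_TDmode,
  TOY_DImode   /* deliberately not listed in the macro below */
};

/* Same shape as the quoted VALID_SSE_REG_MODE, over the toy enum.  */
#define TOY_VALID_SSE_REG_MODE(MODE) \
  ((MODE) == TOY_V1TImode || (MODE) == TOY_TImode \
   || (MODE) == TOY_V4SFmode || (MODE) == TOY_V4SImode \
   || (MODE) == TOY_SFmode || (MODE) == TOY_SImode \
   || (MODE) == TOY_TFmode || (MODE) == TOY_TDmode)

/* Size in bytes of each toy mode.  */
static int toy_mode_size(enum toy_mode m)
{
  switch (m) {
  case TOY_SFmode:
  case TOY_SImode:
    return 4;
  case TOY_DImode:
    return 8;
  default:
    return 16;   /* V1TI, TI, V4SF, V4SI, TF, TD */
  }
}

/* Assumed simplification: two modes are treated as tieable in an XMM
   register when both are 16 bytes wide and both pass the SSE check.  */
static bool toy_sse_modes_tieable(enum toy_mode a, enum toy_mode b)
{
  return toy_mode_size(a) == 16 && toy_mode_size(b) == 16
         && TOY_VALID_SSE_REG_MODE(a) && TOY_VALID_SSE_REG_MODE(b);
}
```

Under this toy rule, toy_sse_modes_tieable(TOY_TImode, TOY_V4SImode) is true, which matches the expectation above that the value can stay in an XMM register, while pairing TOY_TImode with TOY_DImode is false.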