http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44141
--- Comment #4 from Venkataramanan <venkataramanan.kumar at amd dot com> 2012-03-22 13:17:34 UTC ---
I don't have permission to confirm this bug. Here is my analysis of the cause.

#(insn:TI 4886 4885 4888 132 (set (reg:V2DF 25 xmm4 [8797])
#        (mult:V2DF (reg:V2DF 25 xmm4 [8795])
#            (reg:V2DF 22 xmm1 [8758]))) ac.f90:499 1138 {*mulv2df3}
#     (nil))
        vmulpd  %xmm1, %xmm4, %xmm4     # 4886  *mulv2df3/2     [length = 4]

We are forcing a conversion from V2DF to V4SF mode here for unaligned moves
when TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL is set.

(-----Snip ix86_expand_vector_move_misalign-----)
        case V2DFmode:
          if (TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL)
            {
              op0 = gen_lowpart (V4SFmode, op0);
              op1 = gen_lowpart (V4SFmode, op1);
              emit_insn (gen_sse_movups (op0, op1));
              return;
            }
(-----Snip-----)

This conversion generates RTL as shown below.

#(insn:TI 4888 4886 4890 132 (set (mem/c:V4SF (plus:DI (reg/f:DI 7 sp)
#                (const_int 6136 [0x17f8])) [3 MEM[(real(kind=8)[26] *)&dclroo + 152B]+0 S16 A64])
#        (unspec:V4SF [
#                (reg:V4SF 25 xmm4 [8797])
#            ] UNSPEC_MOVU)) ac.f90:499 1104 {*sse_movups}
#     (expr_list:REG_DEAD (reg:V4SF 25 xmm4 [8797])
#        (nil)))
        vmovups %xmm4, 6136(%rsp)       # 4888  *sse_movups/2   [length = 9]

Now GCC does not know how to get back to V2DF mode. As Uros said, it reloads
through memory.
#(insn 4930 4929 8259 132 (set (reg:V4SF 23 xmm2)
#        (unspec:V4SF [
#                (mem/c:V4SF (plus:DI (reg/f:DI 7 sp)
#                        (const_int 6136 [0x17f8])) [3 MEM[(real(kind=8)[26] *)&dclroo + 152B]+0 S16 A64])
#            ] UNSPEC_MOVU)) ac.f90:503 1104 {*sse_movups}
#     (nil))
        vmovups 6136(%rsp), %xmm2       # 4930  *sse_movups/1   [length = 9]
#(insn:TI 8259 4930 8261 132 (set (mem/c:V4SF (plus:DI (reg/f:DI 7 sp)
#                (const_int 240 [0xf0])) [12 %sfp+-11184 S16 A128])
#        (reg:V4SF 23 xmm2)) ac.f90:503 1098 {*movv4sf_internal}
#     (expr_list:REG_DEAD (reg:V4SF 23 xmm2)
#        (nil)))
        vmovaps %xmm2, 240(%rsp)        # 8259  *movv4sf_internal/3     [length = 9]
#(insn 8261 8259 4931 132 (set (reg:V2DF 23 xmm2)
#        (mem/c:V2DF (plus:DI (reg/f:DI 7 sp)
#                (const_int 240 [0xf0])) [12 %sfp+-11184 S16 A128])) ac.f90:503 1100 {*movv2df_internal}
#     (nil))
        vmovaps 240(%rsp), %xmm2        # 8261  *movv2df_internal/2     [length = 9]
#(insn:TI 4931 8261 8260 132 (set (reg:V2DF 23 xmm2)
#        (div:V2DF (reg:V2DF 23 xmm2)
#            (mem/c:V2DF (plus:DI (reg/f:DI 7 sp)
#                    (const_int 6128 [0x17f0])) [3 MEM[(real(kind=8)[26] *)&dclroo + 144B]+0 S16 A128]))) ac.f90:503 1144 {sse2_divv2df3}
#     (nil))
        vdivpd  6128(%rsp), %xmm2, %xmm2        # 4931  sse2_divv2df3/2 [length = 9]
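
A side note on the mode change itself, as an illustrative sketch rather than GCC code. The V2DF to V4SF switch made by gen_lowpart only changes how the same 16 bytes are viewed, so no value is converted and the store/reload pair through 240(%rsp) in the dump above (insns 8259 and 8261) leaves the bits unchanged. The types and helper names below (v2df, v4sf, v2df_as_v4sf, v4sf_as_v2df, roundtrip_preserves_bits) are hypothetical stand-ins for this example, and the sketch assumes 8-byte double and 4-byte float.

```c
#include <string.h>

/* Hypothetical stand-ins for the two vector modes: the same 16 bytes,
   viewed either as 2 x double (V2DF) or as 4 x float (V4SF). */
typedef struct { double d[2]; } v2df;
typedef struct { float  f[4]; } v4sf;

/* Assumption of this sketch: both views occupy the same number of bytes. */
_Static_assert(sizeof(v2df) == sizeof(v4sf), "views must have equal size");

/* Reinterpret the bits without converting any value, which is what a
   same-size lowpart mode change amounts to. */
static v4sf v2df_as_v4sf(v2df x)
{
    v4sf r;
    memcpy(&r, &x, sizeof r);
    return r;
}

static v2df v4sf_as_v2df(v4sf x)
{
    v2df r;
    memcpy(&r, &x, sizeof r);
    return r;
}

/* View two doubles as four floats and back again.
   Returns 1 when the bit pattern is unchanged by the round trip. */
int roundtrip_preserves_bits(double a, double b)
{
    v2df in = { { a, b } };
    v2df out = v4sf_as_v2df(v2df_as_v4sf(in));
    return memcmp(&in, &out, sizeof in) == 0;
}
```

Calling roundtrip_preserves_bits(1.5, -2.25) exercises the round trip on two ordinary values. The point is only that the data never needs to change, so the extra memory traffic in the dump comes from the mode bookkeeping rather than from any real conversion.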