http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49133
--- Comment #3 from Uros Bizjak <ubizjak at gmail dot com> 2011-05-24 10:51:25 UTC ---
(In reply to comment #2)
> I applied the patch to the latest 4.6 snapshot. I confirm that it fixes the
> bug. Also, there are no regressions in my testsuite.
>
> Just for confirmation, the patched sse.md looks like this for me now (starting
> from line 4952):
> (define_insn "sse2_loadhpd"
>   [(set (match_operand:V2DF 0 "nonimmediate_operand" "=x,x,x,o,o,o")
>     (vec_concat:V2DF
>       (vec_select:DF
>         (match_operand:V2DF 1 "nonimmediate_operand" " 0,0,x,0,0,0")
>         (parallel [(const_int 0)]))
>       (match_operand:DF 2 "nonimmediate_operand" " m,x,0,x,*f,r")))]
>   "TARGET_SSE2 && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
>   "@
>    movhpd\t{%2, %0|%0, %2}
>    unpcklpd\t{%2, %0|%0, %2}
>    shufpd\t{$0, %1, %0|%0, %1, 0}
>    #
>
> Question, why not use unpcklpd instead of shufpd $0? On older CPUs unpcklpd
> should be slightly faster than shufpd.

OTOH, it looks like this alternative is wrong entirely. The unmodified operand
can only be passed in the lower half (operand 1 in the pattern above).

GCC will then generate unpcklpd, as suggested.
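
To illustrate why the third alternative (operand 2 tied to the destination)
cannot be right, here is a rough plain-C model of the lane semantics. It is
only a sketch: the v2df struct and all function names are hypothetical, and the
behaviour of unpcklpd and shufpd is modelled from the Intel instruction
descriptions, not taken from GCC sources. The pattern wants { op1[0], op2 }.
With operand 1 in the destination, unpcklpd and shufpd $0 both deliver that;
with operand 2 in the destination, shufpd $0 leaves op2 in the low half.

```c
#include <assert.h>

/* Hypothetical model of a V2DF register; lane[0] is the low half. */
typedef struct { double lane[2]; } v2df;

/* Exact lane-by-lane comparison (fine for the literal values used here). */
int same(v2df a, v2df b)
{
    return a.lane[0] == b.lane[0] && a.lane[1] == b.lane[1];
}

/* unpcklpd src, dst:  dst = { dst[0], src[0] } */
v2df unpcklpd(v2df dst, v2df src)
{
    v2df r = { { dst.lane[0], src.lane[0] } };
    return r;
}

/* shufpd $imm, src, dst:  low lane is selected from dst by imm bit 0,
   high lane is selected from src by imm bit 1. */
v2df shufpd(v2df dst, v2df src, int imm)
{
    v2df r = { { dst.lane[imm & 1], src.lane[(imm >> 1) & 1] } };
    return r;
}

/* What the sse2_loadhpd pattern describes: { op1[0], op2 }. */
v2df loadhpd_expected(v2df op1, double op2)
{
    v2df r = { { op1.lane[0], op2 } };
    return r;
}

/* Walk through the register alternatives discussed above. */
void check_alternatives(void)
{
    v2df op1 = { { 1.0, 2.0 } };
    double op2 = 9.0;
    v2df op2_reg = { { op2, 0.0 } };  /* a DF value sits in the low lane */
    v2df want = loadhpd_expected(op1, op2);

    /* Operand 1 tied to the destination: both instructions agree. */
    assert(same(unpcklpd(op1, op2_reg), want));
    assert(same(shufpd(op1, op2_reg, 0), want));

    /* Operand 2 tied to the destination (shufpd $0, %1, %0):
       the halves come out swapped, so the result is not what the
       pattern asks for. */
    assert(!same(shufpd(op2_reg, op1, 0), want));
}
```

Calling check_alternatives() from a main() should pass silently. In this model
the low lane of the result always comes from the destination register, which
is why the operand that stays unmodified has to be operand 1.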