http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53101
--- Comment #2 from Marc Glisse <marc.glisse at normalesup dot org> 2012-05-01 15:10:26 UTC ---
(In reply to comment #1)
> We get MEM[(T * {ref-all})&x] for the casting (not a BIT_FIELD_REF for
> example).
> This gets expanded to
>
> (insn 6 5 7 (set (reg:OI 63)
>         (subreg:OI (reg/v:V4DF 61 [ x ]) 0)) t.c:8 -1
>      (nil))
>
> (insn 7 6 8 (set (reg:V2DF 60 [ <retval> ])
>         (subreg:V2DF (reg:OI 63) 0)) t.c:8 -1
>      (nil))
>
> but that should be perfectly optimizable.

A bit hard for me (never touched those md files before)...

This obviously incorrect code does the transformation:

(define_peephole2
  [(set (match_operand:V8SF 2 "memory_operand")
        (match_operand:V8SF 1 "register_operand"))
   (set (match_operand:V4SF 0 "register_operand")
        (match_operand:V4SF 3 "memory_operand"))]
  "TARGET_AVX"
  [(const_int 0)]
{
  emit_insn (gen_vec_extract_lo_v8sf (operands[0], operands[1]));
  DONE;
})

(the code in this experiment uses __v4sf and __v8sf instead of the
__m128d/__m256d in the description above)

The problem is that operands[2] and operands[3] don't compare equal with
rtx_equal_p, and using a match_dup fails to compile because of the mode
mismatch, so I don't know how to constrain operands 2 and 3 to be "the
same" memory location. I tried adding some (subreg: ...) in there, but it
didn't match, and looking at the rtl peephole dump, there isn't any subreg
there.

So maybe peephole isn't the right place, but that's the only place where I
managed to get something that compiles and is executed by the compiler on
this testcase.
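
For reference, here is a minimal sketch of the kind of source the peephole experiment is aimed at. It is a reconstruction, not the testcase from the bug description: the typedefs v8sf/v4sf and the function name low_half are hypothetical stand-ins for __v8sf/__v4sf, written with GCC's generic vector extension so that it builds without <immintrin.h>.

```c
/* Assumed reconstruction of the pattern under discussion: read the low
   128 bits of a 256-bit vector through a pointer cast.  The names v8sf,
   v4sf and low_half are illustrative only.  */
typedef float v8sf __attribute__ ((vector_size (32)));
/* may_alias makes the cast below valid with respect to aliasing rules.  */
typedef float v4sf __attribute__ ((vector_size (16), may_alias));

v4sf
low_half (v8sf x)
{
  /* x is spilled to memory as V8SF and its low half reloaded as V4SF;
     this store/load pair is what the peephole tries to replace with a
     single low-half extract.  */
  return *(v4sf *) &x;
}
```

Compiling something like this with -O2 -mavx and looking at the generated assembly should show whether the value still goes through memory or whether a plain low-half extract is used instead.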