On Tue, Dec 13, 2011 at 9:58 AM, Richard Henderson <r...@redhat.com> wrote:
> On 12/12/2011 06:05 PM, Sriraman Tallam wrote:
>> On core2, unaligned vector load/store using movdqu is a very slow operation.
>> Experiments show it is six times slower than movdqa (aligned), and this is
>> irrespective of whether the resulting data happens to be aligned or not.
>> For Core i7 there is no performance difference between the two, and on AMDs
>> movdqu is only about 10% slower.
>>
>> This patch does not vectorize loops that need to generate the slow unaligned
>> memory loads/stores on core2.
>
> What happens if you temporarily disable
>
>   /* ??? Similar to above, only less clear because of quote
>      typeless stores unquote.  */
>   if (TARGET_SSE2 && !TARGET_SSE_TYPELESS_STORES
>       && GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
>     {
>       op0 = gen_lowpart (V16QImode, op0);
>       op1 = gen_lowpart (V16QImode, op1);
>       emit_insn (gen_sse2_movdqu (op0, op1));
>       return;
>     }
>
> so that the unaligned store happens via movlps + movhps?
Cool, this works for stores!  It generates the movlps + movhps.  I also
have to make a similar change to another call to gen_sse2_movdqu for the
load side.  Would it be OK to skip this movdqu path when tune=core2?

Thanks,
-Sri.

>
>
> r~
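
For reference, here is a minimal user-level sketch of the two strategies being
compared, written with SSE intrinsics rather than the compiler's RTL; the
helper names are illustrative only.  The *_si128 intrinsics typically compile
to movdqu, while the storel/storeh and loadl/loadh pairs typically compile to
movlps + movhps (the exact instruction selection is up to the compiler and
target flags):

  /* Sketch only, not part of the patch.  Compile with -msse2.  */
  #include <emmintrin.h>   /* SSE2: _mm_storeu_si128 and the casts     */
  #include <xmmintrin.h>   /* SSE:  _mm_storel_pi/_mm_storeh_pi, loads */

  /* One instruction: movdqu (the slow case on core2).  */
  static void
  store_unaligned_movdqu (void *p, __m128i v)
  {
    _mm_storeu_si128 ((__m128i *) p, v);
  }

  /* Two instructions: movlps + movhps, the sequence the typeless-store
     path falls back to once the movdqu shortcut above is disabled.  */
  static void
  store_unaligned_split (void *p, __m128i v)
  {
    __m128 vf = _mm_castsi128_ps (v);
    _mm_storel_pi ((__m64 *) p, vf);                 /* movlps: low 8 bytes  */
    _mm_storeh_pi ((__m64 *) ((char *) p + 8), vf);  /* movhps: high 8 bytes */
  }

  /* The analogous split for the load side mentioned above.  */
  static __m128i
  load_unaligned_split (const void *p)
  {
    __m128 vf = _mm_setzero_ps ();
    vf = _mm_loadl_pi (vf, (const __m64 *) p);                      /* movlps */
    vf = _mm_loadh_pi (vf, (const __m64 *) ((const char *) p + 8)); /* movhps */
    return _mm_castps_si128 (vf);
  }

On a core2-tuned build the split form trades one extra instruction per access
for avoiding the movdqu penalty described at the top of the thread.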