On Tue, Dec 13, 2011 at 10:56 AM, Richard Henderson <r...@redhat.com> wrote:
> On 12/13/2011 10:26 AM, Sriraman Tallam wrote:
>> Cool, this works for stores! It generates the movlps + movhps. I have
>> to also make a similar change to another call to gen_sse2_movdqu for
>> loads. Would it be ok to not do this when tune=core2?
>
> We can work something out.
>
> I'd like you to do the benchmarking to know if unaligned loads are really as
> expensive as unaligned stores, and whether there are reformatting penalties
> that make the movlps+movhps option for either load or store less attractive.

I can confirm that movhps+movlps is *not at all* a good substitute for
movdqu on core2. It makes it much worse. MOVHPS/MOVLPS has a very high
penalty (~10x) for unaligned load/stores.

>
>
> r~
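
For reference, here is a minimal sketch of the two sequences being compared, written with SSE intrinsics. It is not the benchmark and not the GCC patch; the function names (copy16_movdqu, copy16_split, split_matches_movdqu) are made up for illustration, and the instruction selection noted in the comments is what compilers typically emit, not a guarantee. It only checks that the split form moves the same 16 bytes as the single unaligned access, so timing either variant is left to the reader.

```c
#include <assert.h>
#include <string.h>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>  /* SSE2 intrinsics; also provides the SSE ones used below */

/* One 16-byte unaligned load and store; typically compiled to movdqu. */
static void copy16_movdqu(void *dst, const void *src)
{
    __m128i v = _mm_loadu_si128((const __m128i *)src);
    _mm_storeu_si128((__m128i *)dst, v);
}

/* Two 8-byte halves; typically compiled to movlps (low) + movhps (high). */
static void copy16_split(void *dst, const void *src)
{
    const char *s = (const char *)src;
    char *d = (char *)dst;
    __m128 v = _mm_setzero_ps();
    v = _mm_loadl_pi(v, (const __m64 *)s);        /* low 8 bytes  */
    v = _mm_loadh_pi(v, (const __m64 *)(s + 8));  /* high 8 bytes */
    _mm_storel_pi((__m64 *)d, v);
    _mm_storeh_pi((__m64 *)(d + 8), v);
}
#else
/* Assumption: on non-x86 targets plain memcpy stands in so the file still builds. */
static void copy16_movdqu(void *dst, const void *src) { memcpy(dst, src, 16); }
static void copy16_split(void *dst, const void *src)
{
    memcpy(dst, src, 8);
    memcpy((char *)dst + 8, (const char *)src + 8, 8);
}
#endif

/* Returns 1 if both variants write identical bytes at the given byte offset. */
static int split_matches_movdqu(size_t offset)
{
    unsigned char src[64], a[64], b[64];
    size_t i;

    assert(offset + 16 <= sizeof src);
    for (i = 0; i < sizeof src; i++)
        src[i] = (unsigned char)(i * 7 + 3);
    memset(a, 0, sizeof a);
    memset(b, 0, sizeof b);

    copy16_movdqu(a + offset, src + offset);
    copy16_split(b + offset, src + offset);
    return memcmp(a, b, sizeof a) == 0;
}
```

Calling split_matches_movdqu with an odd offset (1, 13, ...) exercises the unaligned case discussed above. To measure the penalty, each copy function would need to be timed in a loop over a misaligned buffer, with the generated assembly inspected to confirm which instructions were actually emitted.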