On 12/13/2011 10:26 AM, Sriraman Tallam wrote:
> Cool, this works for stores! It generates the movlps + movhps. I have
> to also make a similar change to another call to gen_sse2_movdqu for
> loads. Would it be ok to not do this when tune=core2?

We can work something out. I'd like you to do the benchmarking to find out whether unaligned loads are really as expensive as unaligned stores, and whether there are reformatting penalties that make the movlps+movhps option for either load or store less attractive.

r~
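
For reference, the two sequences being compared can be approximated at the source level with SSE intrinsics. This is only a sketch, not the patch itself: the function names are made up for illustration, and it assumes an x86 target with SSE2 enabled. The compiler is free to pick different instructions than the ones named in the comments, so any benchmark should check the generated assembly first.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; also provides the SSE ones */

/* Hypothetical helper: one 16-byte unaligned store.
   Compilers typically emit movdqu for this. */
static void store_unaligned(void *dst, __m128i v)
{
    _mm_storeu_si128((__m128i *)dst, v);
}

/* Hypothetical helper: the same store split into two 8-byte halves.
   Compilers typically emit movlps + movhps for this. */
static void store_split(void *dst, __m128i v)
{
    __m128 f = _mm_castsi128_ps(v);               /* reinterpret only, no conversion */
    _mm_storel_pi((__m64 *)dst, f);               /* low 8 bytes */
    _mm_storeh_pi((__m64 *)((char *)dst + 8), f); /* high 8 bytes */
}

/* Hypothetical helper: one 16-byte unaligned load (typically movdqu). */
static __m128i load_unaligned(const void *src)
{
    return _mm_loadu_si128((const __m128i *)src);
}

/* Hypothetical helper: the load split into two 8-byte halves
   (typically movlps + movhps). */
static __m128i load_split(const void *src)
{
    __m128 f = _mm_setzero_ps();
    f = _mm_loadl_pi(f, (const __m64 *)src);
    f = _mm_loadh_pi(f, (const __m64 *)((const char *)src + 8));
    return _mm_castps_si128(f);
}
```

Both variants move the same 16 bytes, so they are interchangeable for correctness. The cast between integer and float vector types in the split versions is where any reformatting penalty would show up, if there is one on the target being tuned for.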