On 12/13/2011 10:26 AM, Sriraman Tallam wrote:
> Cool, this works for stores!  It generates the movlps + movhps. I have
> to also make a similar change to another call to gen_sse2_movdqu for
> loads. Would it be ok to not do this when tune=core2?

We can work something out.  

I'd like you to do the benchmarking to find out whether unaligned loads are
really as expensive as unaligned stores, and whether there are reformatting
penalties that make the movlps+movhps option less attractive for either loads
or stores.
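
A minimal sketch of the kind of microbenchmark meant here, not a definitive
harness. It uses the float intrinsics from <xmmintrin.h> (movups versus
movlps+movhps) as a stand-in for the movdqu pattern in the patch. The helper
names (load_whole, load_split, store_whole, store_split, time_copy,
split_matches_whole), the buffer size, and the 4-byte misalignment are all
assumptions made for illustration. The compiler is free to merge or rewrite
these sequences, so check the disassembly before trusting any timing.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>
#include <xmmintrin.h>  /* SSE intrinsics; x86 only */

/* One unaligned 16-byte load (normally movups). */
static __m128 load_whole(const float *p) { return _mm_loadu_ps(p); }

/* Two 8-byte loads (normally movlps + movhps). */
static __m128 load_split(const float *p)
{
    __m128 v = _mm_setzero_ps();
    v = _mm_loadl_pi(v, (const __m64 *)p);
    v = _mm_loadh_pi(v, (const __m64 *)(p + 2));
    return v;
}

/* One unaligned 16-byte store (normally movups). */
static void store_whole(float *p, __m128 v) { _mm_storeu_ps(p, v); }

/* Two 8-byte stores (normally movlps + movhps). */
static void store_split(float *p, __m128 v)
{
    _mm_storel_pi((__m64 *)p, v);
    _mm_storeh_pi((__m64 *)(p + 2), v);
}

/* Sanity check: both variants must move the same 16 bytes. */
int split_matches_whole(void)
{
    static float src[8] __attribute__((aligned(16))) =
        { 0, 1, 2, 3, 4, 5, 6, 7 };
    float a[4], b[4];

    store_whole(a, load_whole(src + 1));   /* src + 1 is misaligned */
    store_split(b, load_split(src + 1));
    return memcmp(a, b, sizeof a) == 0 && a[0] == 1.0f && a[3] == 4.0f;
}

/* Copy n floats (rounded down to a multiple of 4, capped at 4096) between
   buffers that are deliberately 4 bytes off 16-byte alignment, reps times.
   Returns elapsed CPU seconds as reported by clock(). */
double time_copy(int split, size_t n, int reps)
{
    static float src[4096 + 4] __attribute__((aligned(16)));
    static float dst[4096 + 4] __attribute__((aligned(16)));
    const float *s = src + 1;
    float *d = dst + 1;

    if (n > 4096)
        n = 4096;
    n -= n % 4;

    clock_t t0 = clock();
    for (int r = 0; r < reps; r++)
        for (size_t i = 0; i < n; i += 4) {
            if (split)
                store_split(d + i, load_split(s + i));
            else
                store_whole(d + i, load_whole(s + i));
        }
    clock_t t1 = clock();

    volatile float sink = d[0];  /* keep the copy from being discarded */
    (void)sink;
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}
```

Comparing time_copy(0, ...) with time_copy(1, ...) exercises loads and stores
together; to separate the two questions, split only one side of the copy and
keep the other side whole.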


r~
