------- Comment #6 from scovich at gmail dot com 2007-07-11 20:27 -------
(In reply to comment #5)
> SImode moves will be a bit harder, because shufps insn pattern is involved in
> the vector expansion.

IIRC, shufps takes 3 cycles on Core2 (http://www.agner.org/optimize/instruction_tables.pdf), even without the operand type mismatch (does that still exist?). That's >=4 cycles. Storing the vector to the stack and loading the desired entry would take <=4 cycles, even without Intel's store-load optimizations, and I imagine the optimizer would be able to deal with it better.

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32661
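
For illustration only, here is a rough sketch of the two extraction strategies compared above, written with SSE intrinsics rather than as GCC's actual expansion. The function names are hypothetical, and it uses a float lane for simplicity even though the comment concerns SImode (32-bit integer) elements. An optimizing compiler is free to turn either form into the other, so this shows the intent, not the generated code.

```c
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_shuffle_ps, ... */

/* Strategy 1 (hypothetical name): stay in registers.
 * shufps copies lane 2 into every lane, then lane 0 is read out. */
static float extract_lane2_shuffle(__m128 v)
{
    __m128 s = _mm_shuffle_ps(v, v, _MM_SHUFFLE(2, 2, 2, 2));
    return _mm_cvtss_f32(s);
}

/* Strategy 2 (hypothetical name): go through memory.
 * Store the whole vector to a stack slot, then load the one entry wanted. */
static float extract_lane2_memory(__m128 v)
{
    float tmp[4];
    _mm_storeu_ps(tmp, v);  /* unaligned store, so tmp needs no special alignment */
    return tmp[2];
}
```

Both functions return the same value for the same input; which one is cheaper depends on the shuffle latency and on how well the store and reload are handled on the target CPU.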