http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54716
--- Comment #1 from Jakub Jelinek <jakub at gcc dot gnu.org> 2012-09-26 13:46:25 UTC ---
Created attachment 28282
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=28282
gcc48-pr54716.patch

Untested patch to optimize this.  Unfortunately it will also change generated
code for:
__m256d i (__m256d x, __m256d y)
{
  return (__m256d) _mm256_or_si256 ((__m256i) x, (__m256i) y);
}
Not sure if that is an issue or not.  If we wanted to emit what the user for
whatever reason asked for, the builtin expander could perhaps in those cases
copy one of the arguments into a temporary pseudo before expansion.
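
For illustration only (this is not part of the attached patch): a minimal C
sketch that compares the integer-domain OR from the function above with the
floating-point-domain _mm256_or_pd.  The helper names or_via_si256, or_via_pd,
or_forms_agree and check_or_forms are made up for this example, and the use of
__attribute__((target("avx2"))) with __builtin_cpu_supports is an assumption so
that it builds without -mavx2.  Both forms should yield identical bits; what
can differ is which instruction the compiler emits (presumably vpor versus
vorpd).

```c
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>

/* The form from the comment: OR in the integer domain on reinterpreted
   doubles.  _mm256_or_si256 needs AVX2.  */
__attribute__((target("avx2")))
static __m256d or_via_si256(__m256d x, __m256d y)
{
  return (__m256d) _mm256_or_si256((__m256i) x, (__m256i) y);
}

/* The same bitwise operation via the floating-point-domain intrinsic (AVX).  */
__attribute__((target("avx2")))
static __m256d or_via_pd(__m256d x, __m256d y)
{
  return _mm256_or_pd(x, y);
}

/* Hypothetical helper: returns 1 if both forms produce identical bits.  */
__attribute__((target("avx2")))
static int or_forms_agree(const double *x, const double *y)
{
  double a[4], b[4];
  __m256d vx = _mm256_loadu_pd(x);
  __m256d vy = _mm256_loadu_pd(y);

  _mm256_storeu_pd(a, or_via_si256(vx, vy));
  _mm256_storeu_pd(b, or_via_pd(vx, vy));
  /* Compare raw bytes, since the OR-ed bit patterns may not be ordinary
     floating-point values.  */
  return memcmp(a, b, sizeof a) == 0;
}
#endif

/* Returns 1 if the two forms agree, 0 if they differ, and -1 if AVX2 is not
   available on the machine (or the target is not x86).  */
int check_or_forms(void)
{
#if defined(__x86_64__) || defined(__i386__)
  static const double x[4] = { 1.0, -2.0, 0.0, 3.5 };
  static const double y[4] = { 0.5, 4.0, -0.0, 1.25 };

  if (!__builtin_cpu_supports("avx2"))
    return -1;
  return or_forms_agree(x, y);
#else
  return -1;
#endif
}
```

Compiling the two static functions with -O2 -S and comparing the assembly is
one way to see whether a given compiler version keeps the integer-domain
instruction the user wrote or substitutes the floating-point one.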