http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26546
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|meta-bug                    |
             Target|                            |x86_64-*-*, i?86-*-*
          Component|tree-optimization           |target
            Version|4.1.0                       |4.8.0
            Summary|[meta-bugs] couple of       |missed optimization with
                   |missed optimization with    |respect of vector
                   |respect of vector and       |intrinsics
                   |unions                      |

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> 2013-02-26 10:42:04 UTC ---
I get:

main:
.LFB518:
        .cfi_startproc
        xorps   %xmm0, %xmm0
        subq    $8, %rsp
        .cfi_def_cfa_offset 16
        movl    $.LC0, %edi
        movl    $1, %eax
        unpcklps        %xmm0, %xmm0
        cvtps2pd        %xmm0, %xmm0
        call    printf
        xorl    %eax, %eax
        addq    $8, %rsp
        .cfi_def_cfa_offset 8
        ret

with 4.8 and the asm from the description with 4.7 (with -O2).  Leaving
the union uninitialized of course makes it a bad testcase and probably
makes it optimized in the first place.  With

#include <xmmintrin.h>

typedef union {
    __m128 vec;
    float data[4];
    struct {
        float x,y,z,w;
    };
} vec4f_t;

static inline float __attribute__((__always_inline__))
acc(vec4f_t src)
{
    float a;
    src.vec = _mm_add_ps(src.vec, _mm_movehl_ps(src.vec, src.vec));
    _mm_store_ss(&a, _mm_add_ss(src.vec, _mm_shuffle_ps(src.vec, src.vec,
                 _MM_SHUFFLE(3,2,1,1))));
    return a;
}

vec4f_t b;

int main(int argc, char *argv[])
{
    __builtin_printf("%f\n", acc(b));
    return 0;
}

we are back to the unoptimized assembly.  Tree optimizers have no chance
optimizing this because they see target builtins:

  __m128 src;
  float a;
  double _2;
  __m128 _4;
  __m128 _5;
  __m128 _6;
  __m128 _7;

  <bb 2>:
  src_9 = MEM[(union *)&b];
  _4 = __builtin_ia32_movhlps (src_9, src_9);
  _5 = __builtin_ia32_addps (src_9, _4);
  _6 = __builtin_ia32_shufps (_5, _5, 229);
  _7 = __builtin_ia32_addss (_5, _6);
  a_8 = __builtin_ia32_vec_ext_v4sf (_7, 0);
  _2 = (double) a_8;
  printf ("%f\n", _2);

but in all this is now a target issue.
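For readers following the dump: the intrinsic sequence in acc() amounts to a horizontal sum of the four lanes, and the constant 229 passed to __builtin_ia32_shufps is simply the value of _MM_SHUFFLE(3,2,1,1). The sketch below spells that out on an x86 target with SSE. The names hsum_sse and hsum_scalar are made up for illustration and do not appear in the bug report, and the union is reduced to the two members the example needs.

```c
#include <xmmintrin.h>

/* Reduced form of the testcase's union: vector view and array view. */
typedef union {
    __m128 vec;
    float data[4];
} vec4f_t;

/* Same intrinsic sequence as acc() in the comment above
   (hsum_sse is an illustrative name, not from the report). */
static inline float hsum_sse(vec4f_t src)
{
    float a;
    /* movehl copies lanes 2,3 into lanes 0,1, so after the add the
       lanes hold {s0+s2, s1+s3, 2*s2, 2*s3}. */
    src.vec = _mm_add_ps(src.vec, _mm_movehl_ps(src.vec, src.vec));
    /* The shuffle places lane 1 (s1+s3) in lane 0; add_ss then leaves
       (s0+s2) + (s1+s3) in lane 0, which store_ss writes to a. */
    _mm_store_ss(&a, _mm_add_ss(src.vec,
                 _mm_shuffle_ps(src.vec, src.vec, _MM_SHUFFLE(3, 2, 1, 1))));
    return a;
}

/* Hypothetical scalar reference using the same association order,
   so both versions perform the same float additions. */
static inline float hsum_scalar(vec4f_t src)
{
    return (src.data[0] + src.data[2]) + (src.data[1] + src.data[3]);
}
```

With all four lanes of b equal to zero, as in the testcase, both functions return 0.0f. The scalar form is what one would hope the compiler could reduce the builtin sequence to, but as the comment says, the tree optimizers see only opaque target builtins here.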