http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26546



Richard Biener <rguenth at gcc dot gnu.org> changed:



           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|meta-bug                    |
             Target|                            |x86_64-*-*, i?86-*-*
          Component|tree-optimization           |target
            Version|4.1.0                       |4.8.0
            Summary|[meta-bugs] couple of       |missed optimization with
                   |missed optimization with    |respect of vector
                   |respect of vector and       |intrinsics
                   |unions                      |



--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> 2013-02-26 10:42:04 UTC ---

I get:



main:
.LFB518:
        .cfi_startproc
        xorps   %xmm0, %xmm0
        subq    $8, %rsp
        .cfi_def_cfa_offset 16
        movl    $.LC0, %edi
        movl    $1, %eax
        unpcklps        %xmm0, %xmm0
        cvtps2pd        %xmm0, %xmm0
        call    printf
        xorl    %eax, %eax
        addq    $8, %rsp
        .cfi_def_cfa_offset 8
        ret



with 4.8, and the asm from the description with 4.7 (both with -O2).  Leaving
the union uninitialized of course makes it a bad testcase, and is probably
what allows it to be optimized in the first place.



With



#include <xmmintrin.h>

typedef union
{
  __m128 vec;
  float data[4];
  struct { float x,y,z,w; };
} vec4f_t;

static inline float __attribute__((__always_inline__))
acc(vec4f_t src)
{
  float a;
  src.vec = _mm_add_ps(src.vec, _mm_movehl_ps(src.vec, src.vec));
  _mm_store_ss(&a, _mm_add_ss(src.vec, _mm_shuffle_ps(src.vec, src.vec,
                                                      _MM_SHUFFLE(3,2,1,1))));
  return a;
}

vec4f_t b;

int
main(int argc, char *argv[])
{
  __builtin_printf("%f\n", acc(b));
  return 0;
}



we are back to the unoptimized assembly.  The tree optimizers have no chance
of optimizing this because all they see are target builtins:



  __m128 src;
  float a;
  double _2;
  __m128 _4;
  __m128 _5;
  __m128 _6;
  __m128 _7;

  <bb 2>:
  src_9 = MEM[(union  *)&b];
  _4 = __builtin_ia32_movhlps (src_9, src_9);
  _5 = __builtin_ia32_addps (src_9, _4);
  _6 = __builtin_ia32_shufps (_5, _5, 229);
  _7 = __builtin_ia32_addss (_5, _6);
  a_8 = __builtin_ia32_vec_ext_v4sf (_7, 0);
  _2 = (double) a_8;
  printf ("%f\n", _2);



But all in all, this is now a target issue.
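
For illustration only, here is a rough sketch of the same reduction written
without target builtins.  It assumes GCC's generic vector extension
(vector_size attribute and vector subscripting); the names v4sf, acc_scalar
and acc_generic are made up for this example and are not part of the
testcase.  acc() above computes (s0 + s2) + (s1 + s3) for the four lanes of
src, so the sketch spells that out once in scalar form and once with generic
vector operations, which reach the tree optimizers as ordinary vector
arithmetic rather than opaque __builtin_ia32_* calls.  Whether a given GCC
version actually folds this for the zero-initialized global has not been
checked here.

```c
/* Generic 4 x float vector type (GCC vector extension), hypothetical name. */
typedef float v4sf __attribute__((vector_size(16)));

/* Scalar reference for what acc() computes on lanes s[0]..s[3]. */
static inline float
acc_scalar(const float s[4])
{
  return (s[0] + s[2]) + (s[1] + s[3]);
}

/* Same reduction with generic vector operations instead of intrinsics. */
static inline float
acc_generic(v4sf v)
{
  /* Same lane selection as _mm_movehl_ps(v, v): { v2, v3, v2, v3 }. */
  v4sf hi = { v[2], v[3], v[2], v[3] };
  /* Lane 0 holds v0 + v2, lane 1 holds v1 + v3. */
  v4sf sum = v + hi;
  /* Adding lanes 0 and 1 matches the shuffle + _mm_add_ss step. */
  return sum[0] + sum[1];
}
```

Calling acc_generic on a zero-initialized vector returns 0.0f, which is the
value the testcase passes to printf.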
