http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55448



Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |uros at gcc dot gnu.org
   Target Milestone|---                         |4.8.0



--- Comment #2 from Jakub Jelinek <jakub at gcc dot gnu.org> 2012-11-23 17:50:56 UTC ---

With -O2 -mavx -fno-ipa-sra the whole

#include <x86intrin.h>

static inline __m256
add1 (const __m256 & a, const __m256 & b)
{
  return _mm256_add_ps (a, b);
}

void
f1 (__m256 & a, const __m256 b)
{
  a = add1 (a, b);
}

static inline __m128
add2 (const __m128 & a, const __m128 & b)
{
  return _mm_add_ps (a, b);
}

void
f2 (__m128 & a, const __m128 b)
{
  a = add2 (a, b);
}

static inline __m256
add3 (const __m256 *a, const __m256 *b)
{
  return _mm256_add_ps (*a, *b);
}

void
f3 (__m256 *a, const __m256 b)
{
  *a = add3 (a, &b);
}

static inline __m128
add4 (const __m128 *a, const __m128 *b)
{
  return _mm_add_ps (*a, *b);
}

void
f4 (__m128 *a, const __m128 b)
{
  *a = add4 (a, &b);
}



testcase compiles into optimal code.  Beyond the eipa_sra issue, the point is
that for AVX/AVX2 we should generally try to combine unaligned loads with the
operations that use them (unless the load just feeds a plain move).  However,
the unaligned load patterns involve UNSPEC_LOADU (and for 256-bit values also
a vec_concat with another MEM load), so it is not clear which pass would be
the best place to handle this: some hack in the combiner, peephole2 patterns
(but we'd need many of them), or something else.
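
For illustration only (not part of the PR testcase; the function name and the
asm line below are assumptions of mine, just a minimal sketch of the combining
described above): VEX-encoded AVX arithmetic instructions do not fault on
unaligned memory operands, so an explicit unaligned load feeding an arithmetic
operation could be folded into that operation's memory operand instead of
staying a separate vmovups.

#include <x86intrin.h>

/* Hypothetical sketch, not from the PR testcase: with -O2 -mavx the
   unaligned load could in principle be folded into the add, i.e.
   something like
       vaddps (%rdi), %ymm0, %ymm0
   rather than a separate vmovups followed by vaddps, since AVX
   arithmetic instructions accept unaligned memory operands.  */
__m256
add_loadu (const float *p, __m256 x)
{
  return _mm256_add_ps (_mm256_loadu_ps (p), x);
}

Comparing the generated code for such a function with and without the folding
makes the difference visible as one instruction versus a vmovups/vaddps pair.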
