Re: [PATCH] x86: _mm512_set1_p[sd]

2014-03-25 Thread Ulrich Drepper
On Mon, Mar 24, 2014 at 9:09 AM, Jakub Jelinek ja...@redhat.com wrote:
 The following is recognized well:

 typedef char v32qi __attribute__((vector_size (32)));
 v32qi foo (char a)
 {
   return (v32qi) { a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, 
 a, a, a, a, a, a, a, a, a, a, a, a, a };
 }


Perhaps well but not optimal.  The created code is

vmovd   %edi, %xmm0
vpbroadcastb%xmm0, %xmm0
vinserti128   $1, %xmm0, %ymm0, %ymm0

It should generate for AVX2

vmovd   %edi, %xmm0
vpbroadcastb%xmm0, %ymm0


Re: [PATCH] x86: _mm512_set1_p[sd]

2014-03-24 Thread Ulrich Drepper
On Mon, Mar 24, 2014 at 1:50 AM, Kirill Yukhin kirill.yuk...@gmail.com wrote:
 Your patch is correct IMHO, but maybe it worst to add all missing
 `mm512_set1*' stuff?

 According to trunk and [1] we're still missing (beside mentioned by you)
 _mm512_set1_epi16 and  _mm512_set1_epi8 broadcasts.

Yes, more are missing, but I think those will need new builtins.  The
_ps and _pd don't require additional instructions.

_mm512_set1_epi16 might have to map to vpbroadcastw. _mm512_set1_epi8
might have to map to vpbroadcastb.  I haven't seen a way to generate
those instructions if needed and so this work was out of scope for now
due to time constraints.  I agree, they should be added as quickly as
possible to avoid releasing headers with incomplete APIs.

What is the verdict on checking these changes in?  Too late for the
next release?


Re: [PATCH] x86: _mm512_set1_p[sd]

2014-03-24 Thread Richard Biener
On Mon, Mar 24, 2014 at 12:13 PM, Ulrich Drepper drep...@gmail.com wrote:
 On Mon, Mar 24, 2014 at 1:50 AM, Kirill Yukhin kirill.yuk...@gmail.com 
 wrote:
 Your patch is correct IMHO, but maybe it worst to add all missing
 `mm512_set1*' stuff?

 According to trunk and [1] we're still missing (beside mentioned by you)
 _mm512_set1_epi16 and  _mm512_set1_epi8 broadcasts.

 Yes, more are missing, but I think those will need new builtins.  The
 _ps and _pd don't require additional instructions.

 _mm512_set1_epi16 might have to map to vpbroadcastw. _mm512_set1_epi8
 might have to map to vpbroadcastb.  I haven't seen a way to generate
 those instructions if needed and so this work was out of scope for now
 due to time constraints.  I agree, they should be added as quickly as
 possible to avoid releasing headers with incomplete APIs.

 What is the verdict on checking these changes in?  Too late for the
 next release?

This kind of changes can also be made for 4.9.1 for example.

Richard.


Re: [PATCH] x86: _mm512_set1_p[sd]

2014-03-24 Thread Uros Bizjak
Hello!

 On Mon, Mar 24, 2014 at 12:13 PM, Ulrich Drepper drep...@gmail.com wrote:
 Your patch is correct IMHO, but maybe it worst to add all missing
 `mm512_set1*' stuff?

 According to trunk and [1] we're still missing (beside mentioned by you)
 _mm512_set1_epi16 and  _mm512_set1_epi8 broadcasts.

 Yes, more are missing, but I think those will need new builtins.  The
 _ps and _pd don't require additional instructions.

 _mm512_set1_epi16 might have to map to vpbroadcastw. _mm512_set1_epi8
 might have to map to vpbroadcastb.  I haven't seen a way to generate
 those instructions if needed and so this work was out of scope for now
 due to time constraints.  I agree, they should be added as quickly as
 possible to avoid releasing headers with incomplete APIs.

 What is the verdict on checking these changes in?  Too late for the
 next release?

 This kind of changes can also be made for 4.9.1 for example.

OTOH, these changes are isolated to intrinsic header files, and we
have quite extensive testsuite for these. I see no problem to check-in
these changes even at this stage.

So, if there is no better solution I propose to check these changes
in, since the benefit to users outweight (minor) risk. Would this be
OK from RM POV, also weighting in benefits to users?

Uros


Re: [PATCH] x86: _mm512_set1_p[sd]

2014-03-24 Thread Richard Biener
On Mon, Mar 24, 2014 at 1:25 PM, Uros Bizjak ubiz...@gmail.com wrote:
 Hello!

 On Mon, Mar 24, 2014 at 12:13 PM, Ulrich Drepper drep...@gmail.com wrote:
 Your patch is correct IMHO, but maybe it worst to add all missing
 `mm512_set1*' stuff?

 According to trunk and [1] we're still missing (beside mentioned by you)
 _mm512_set1_epi16 and  _mm512_set1_epi8 broadcasts.

 Yes, more are missing, but I think those will need new builtins.  The
 _ps and _pd don't require additional instructions.

 _mm512_set1_epi16 might have to map to vpbroadcastw. _mm512_set1_epi8
 might have to map to vpbroadcastb.  I haven't seen a way to generate
 those instructions if needed and so this work was out of scope for now
 due to time constraints.  I agree, they should be added as quickly as
 possible to avoid releasing headers with incomplete APIs.

 What is the verdict on checking these changes in?  Too late for the
 next release?

 This kind of changes can also be made for 4.9.1 for example.

 OTOH, these changes are isolated to intrinsic header files, and we
 have quite extensive testsuite for these. I see no problem to check-in
 these changes even at this stage.

 So, if there is no better solution I propose to check these changes
 in, since the benefit to users outweight (minor) risk. Would this be
 OK from RM POV, also weighting in benefits to users?

Yes, sure.  I've just meant that it's ok to do more work for 4.9.1, too.

Richard.

 Uros


Re: [PATCH] x86: _mm512_set1_p[sd]

2014-03-24 Thread Jakub Jelinek
On Mon, Mar 24, 2014 at 01:33:11PM +0100, Richard Biener wrote:
 On Mon, Mar 24, 2014 at 1:25 PM, Uros Bizjak ubiz...@gmail.com wrote:
  Hello!
 
  On Mon, Mar 24, 2014 at 12:13 PM, Ulrich Drepper drep...@gmail.com wrote:
  Your patch is correct IMHO, but maybe it worst to add all missing
  `mm512_set1*' stuff?
 
  According to trunk and [1] we're still missing (beside mentioned by you)
  _mm512_set1_epi16 and  _mm512_set1_epi8 broadcasts.
 
  Yes, more are missing, but I think those will need new builtins.  The
  _ps and _pd don't require additional instructions.
 
  _mm512_set1_epi16 might have to map to vpbroadcastw. _mm512_set1_epi8
  might have to map to vpbroadcastb.  I haven't seen a way to generate
  those instructions if needed and so this work was out of scope for now
  due to time constraints.  I agree, they should be added as quickly as
  possible to avoid releasing headers with incomplete APIs.
 
  What is the verdict on checking these changes in?  Too late for the
  next release?
 
  This kind of changes can also be made for 4.9.1 for example.
 
  OTOH, these changes are isolated to intrinsic header files, and we
  have quite extensive testsuite for these. I see no problem to check-in
  these changes even at this stage.
 
  So, if there is no better solution I propose to check these changes
  in, since the benefit to users outweight (minor) risk. Would this be
  OK from RM POV, also weighting in benefits to users?
 
 Yes, sure.  I've just meant that it's ok to do more work for 4.9.1, too.

But, if for say _mm512_set1_epi8 you have no intrinsics, just do something
similar to what _mm256_set_epi8 and _mm256_set1_epi8 do, the compiler should
be smart enough to recognize those as broadcasts.

The following is recognized well:

typedef char v32qi __attribute__((vector_size (32)));
v32qi foo (char a)
{
  return (v32qi) { a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, 
a, a, a, a, a, a, a, a, a, a, a, a };
}

This isn't:

typedef char v64qi __attribute__((vector_size (64)));
v64qi foo (char a)
{
  return (v64qi) { a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, 
a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, 
a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a };
}

But I believe it has been discussed already that the V32HImode and V64QImode
support is incomplete in 4.9.  While I think there are no direct broadcasts
for these modes, one can e.g. use AVX2 broadcasts and then duplicate into
the 512-bit mode.
See http://gcc.gnu.org/ml/gcc-patches/2014-01/msg00757.html

Jakub


Re: [PATCH] x86: _mm512_set1_p[sd]

2014-03-24 Thread Uros Bizjak
Hello!

 Another set of functions missing are those to set all elements of a
 512-bit vector to the same float or double value.  I think the patch
 below uses the optimal code sequence for that.  The patch requires the
 previous patch introducing _mm*_undefined_*.


 2014-03-19  Ulrich Drepper  drep...@gmail.com

 * config/i386/avx512fintrin.h: Define _mm512_set1_ps and
 _mm512_set1_pd.

Based on Kirill's feedback, the patch is OK for mainline.

Thanks,
Uros.


Re: [PATCH] x86: _mm512_set1_p[sd]

2014-03-23 Thread Kirill Yukhin
Hello Ulrich,
On 19 Mar 22:41, Ulrich Drepper wrote:
 Another set of functions missing are those to set all elements of a
 512-bit vector to the same float or double value.  I think the patch
 below uses the optimal code sequence for that.  The patch requires the
 previous patch introducing _mm*_undefined_*.
 
 
 2014-03-19  Ulrich Drepper  drep...@gmail.com
 
   * config/i386/avx512fintrin.h: Define _mm512_set1_ps and
   _mm512_set1_pd.
Your patch is correct IMHO, but maybe it worst to add all missing
`mm512_set1*' stuff?

According to trunk and [1] we're still missing (beside mentioned by you)
_mm512_set1_epi16 and  _mm512_set1_epi8 broadcasts.

[1] - http://software.intel.com/sites/landingpage/IntrinsicsGuide/

--
Thanks, K


[PATCH] x86: _mm512_set1_p[sd]

2014-03-19 Thread Ulrich Drepper
Another set of functions missing are those to set all elements of a
512-bit vector to the same float or double value.  I think the patch
below uses the optimal code sequence for that.  The patch requires the
previous patch introducing _mm*_undefined_*.


2014-03-19  Ulrich Drepper  drep...@gmail.com

* config/i386/avx512fintrin.h: Define _mm512_set1_ps and
_mm512_set1_pd.


diff -u b/gcc/config/i386/avx512fintrin.h b/gcc/config/i386/avx512fintrin.h
--- b/gcc/config/i386/avx512fintrin.h
+++ b/gcc/config/i386/avx512fintrin.h
@@ -130,6 +130,28 @@
   return __Y;
 }
 
+extern __inline __m512d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_set1_pd (double __A)
+{
+  return (__m512d) __builtin_ia32_broadcastsd512 (__extension__
+ (__v2df) { __A, },
+ (__v8df)
+ _mm512_undefined_pd (),
+ (__mmask8) -1);
+}
+
+extern __inline __m512
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_set1_ps (float __A)
+{
+  return (__m512) __builtin_ia32_broadcastss512 (__extension__
+(__v4sf) { __A, },
+(__v16sf)
+_mm512_undefined_ps (),
+(__mmask16) -1);
+}
+
 extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_setzero_ps (void)