Re: [PATCH] x86: _mm512_set1_p[sd]
On Mon, Mar 24, 2014 at 9:09 AM, Jakub Jelinek ja...@redhat.com wrote: The following is recognized well: typedef char v32qi __attribute__((vector_size (32))); v32qi foo (char a) { return (v32qi) { a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a }; } Perhaps well but not optimal. The created code is vmovd %edi, %xmm0 vpbroadcastb%xmm0, %xmm0 vinserti128 $1, %xmm0, %ymm0, %ymm0 It should generate for AVX2 vmovd %edi, %xmm0 vpbroadcastb%xmm0, %ymm0
Re: [PATCH] x86: _mm512_set1_p[sd]
On Mon, Mar 24, 2014 at 1:50 AM, Kirill Yukhin kirill.yuk...@gmail.com wrote: Your patch is correct IMHO, but maybe it worst to add all missing `mm512_set1*' stuff? According to trunk and [1] we're still missing (beside mentioned by you) _mm512_set1_epi16 and _mm512_set1_epi8 broadcasts. Yes, more are missing, but I think those will need new builtins. The _ps and _pd don't require additional instructions. _mm512_set1_epi16 might have to map to vpbroadcastw. _mm512_set1_epi8 might have to map to vpbroadcastb. I haven't seen a way to generate those instructions if needed and so this work was out of scope for now due to time constraints. I agree, they should be added as quickly as possible to avoid releasing headers with incomplete APIs. What is the verdict on checking these changes in? Too late for the next release?
Re: [PATCH] x86: _mm512_set1_p[sd]
On Mon, Mar 24, 2014 at 12:13 PM, Ulrich Drepper drep...@gmail.com wrote: On Mon, Mar 24, 2014 at 1:50 AM, Kirill Yukhin kirill.yuk...@gmail.com wrote: Your patch is correct IMHO, but maybe it worst to add all missing `mm512_set1*' stuff? According to trunk and [1] we're still missing (beside mentioned by you) _mm512_set1_epi16 and _mm512_set1_epi8 broadcasts. Yes, more are missing, but I think those will need new builtins. The _ps and _pd don't require additional instructions. _mm512_set1_epi16 might have to map to vpbroadcastw. _mm512_set1_epi8 might have to map to vpbroadcastb. I haven't seen a way to generate those instructions if needed and so this work was out of scope for now due to time constraints. I agree, they should be added as quickly as possible to avoid releasing headers with incomplete APIs. What is the verdict on checking these changes in? Too late for the next release? This kind of changes can also be made for 4.9.1 for example. Richard.
Re: [PATCH] x86: _mm512_set1_p[sd]
Hello! On Mon, Mar 24, 2014 at 12:13 PM, Ulrich Drepper drep...@gmail.com wrote: Your patch is correct IMHO, but maybe it worst to add all missing `mm512_set1*' stuff? According to trunk and [1] we're still missing (beside mentioned by you) _mm512_set1_epi16 and _mm512_set1_epi8 broadcasts. Yes, more are missing, but I think those will need new builtins. The _ps and _pd don't require additional instructions. _mm512_set1_epi16 might have to map to vpbroadcastw. _mm512_set1_epi8 might have to map to vpbroadcastb. I haven't seen a way to generate those instructions if needed and so this work was out of scope for now due to time constraints. I agree, they should be added as quickly as possible to avoid releasing headers with incomplete APIs. What is the verdict on checking these changes in? Too late for the next release? This kind of changes can also be made for 4.9.1 for example. OTOH, these changes are isolated to intrinsic header files, and we have quite extensive testsuite for these. I see no problem to check-in these changes even at this stage. So, if there is no better solution I propose to check these changes in, since the benefit to users outweight (minor) risk. Would this be OK from RM POV, also weighting in benefits to users? Uros
Re: [PATCH] x86: _mm512_set1_p[sd]
On Mon, Mar 24, 2014 at 1:25 PM, Uros Bizjak ubiz...@gmail.com wrote: Hello! On Mon, Mar 24, 2014 at 12:13 PM, Ulrich Drepper drep...@gmail.com wrote: Your patch is correct IMHO, but maybe it worst to add all missing `mm512_set1*' stuff? According to trunk and [1] we're still missing (beside mentioned by you) _mm512_set1_epi16 and _mm512_set1_epi8 broadcasts. Yes, more are missing, but I think those will need new builtins. The _ps and _pd don't require additional instructions. _mm512_set1_epi16 might have to map to vpbroadcastw. _mm512_set1_epi8 might have to map to vpbroadcastb. I haven't seen a way to generate those instructions if needed and so this work was out of scope for now due to time constraints. I agree, they should be added as quickly as possible to avoid releasing headers with incomplete APIs. What is the verdict on checking these changes in? Too late for the next release? This kind of changes can also be made for 4.9.1 for example. OTOH, these changes are isolated to intrinsic header files, and we have quite extensive testsuite for these. I see no problem to check-in these changes even at this stage. So, if there is no better solution I propose to check these changes in, since the benefit to users outweight (minor) risk. Would this be OK from RM POV, also weighting in benefits to users? Yes, sure. I've just meant that it's ok to do more work for 4.9.1, too. Richard. Uros
Re: [PATCH] x86: _mm512_set1_p[sd]
On Mon, Mar 24, 2014 at 01:33:11PM +0100, Richard Biener wrote: On Mon, Mar 24, 2014 at 1:25 PM, Uros Bizjak ubiz...@gmail.com wrote: Hello! On Mon, Mar 24, 2014 at 12:13 PM, Ulrich Drepper drep...@gmail.com wrote: Your patch is correct IMHO, but maybe it worst to add all missing `mm512_set1*' stuff? According to trunk and [1] we're still missing (beside mentioned by you) _mm512_set1_epi16 and _mm512_set1_epi8 broadcasts. Yes, more are missing, but I think those will need new builtins. The _ps and _pd don't require additional instructions. _mm512_set1_epi16 might have to map to vpbroadcastw. _mm512_set1_epi8 might have to map to vpbroadcastb. I haven't seen a way to generate those instructions if needed and so this work was out of scope for now due to time constraints. I agree, they should be added as quickly as possible to avoid releasing headers with incomplete APIs. What is the verdict on checking these changes in? Too late for the next release? This kind of changes can also be made for 4.9.1 for example. OTOH, these changes are isolated to intrinsic header files, and we have quite extensive testsuite for these. I see no problem to check-in these changes even at this stage. So, if there is no better solution I propose to check these changes in, since the benefit to users outweight (minor) risk. Would this be OK from RM POV, also weighting in benefits to users? Yes, sure. I've just meant that it's ok to do more work for 4.9.1, too. But, if for say _mm512_set1_epi8 you have no intrinsics, just do something similar to what _mm256_set_epi8 and _mm256_set1_epi8 do, the compiler should be smart enough to recognize those as broadcasts. The following is recognized well: typedef char v32qi __attribute__((vector_size (32))); v32qi foo (char a) { return (v32qi) { a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a }; } This isn't: typedef char v64qi __attribute__((vector_size (64))); v64qi foo (char a) { return (v64qi) { a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a }; } But I believe it has been discussed already that the V32HImode and V64QImode support is incomplete in 4.9. While I think there are no direct broadcasts for these modes, one can e.g. use AVX2 broadcasts and then duplicate into the 512-bit mode. See http://gcc.gnu.org/ml/gcc-patches/2014-01/msg00757.html Jakub
Re: [PATCH] x86: _mm512_set1_p[sd]
Hello! Another set of functions missing are those to set all elements of a 512-bit vector to the same float or double value. I think the patch below uses the optimal code sequence for that. The patch requires the previous patch introducing _mm*_undefined_*. 2014-03-19 Ulrich Drepper drep...@gmail.com * config/i386/avx512fintrin.h: Define _mm512_set1_ps and _mm512_set1_pd. Based on Kirill's feedback, the patch is OK for mainline. Thanks, Uros.
Re: [PATCH] x86: _mm512_set1_p[sd]
Hello Ulrich, On 19 Mar 22:41, Ulrich Drepper wrote: Another set of functions missing are those to set all elements of a 512-bit vector to the same float or double value. I think the patch below uses the optimal code sequence for that. The patch requires the previous patch introducing _mm*_undefined_*. 2014-03-19 Ulrich Drepper drep...@gmail.com * config/i386/avx512fintrin.h: Define _mm512_set1_ps and _mm512_set1_pd. Your patch is correct IMHO, but maybe it worst to add all missing `mm512_set1*' stuff? According to trunk and [1] we're still missing (beside mentioned by you) _mm512_set1_epi16 and _mm512_set1_epi8 broadcasts. [1] - http://software.intel.com/sites/landingpage/IntrinsicsGuide/ -- Thanks, K
[PATCH] x86: _mm512_set1_p[sd]
Another set of functions missing are those to set all elements of a 512-bit vector to the same float or double value. I think the patch below uses the optimal code sequence for that. The patch requires the previous patch introducing _mm*_undefined_*. 2014-03-19 Ulrich Drepper drep...@gmail.com * config/i386/avx512fintrin.h: Define _mm512_set1_ps and _mm512_set1_pd. diff -u b/gcc/config/i386/avx512fintrin.h b/gcc/config/i386/avx512fintrin.h --- b/gcc/config/i386/avx512fintrin.h +++ b/gcc/config/i386/avx512fintrin.h @@ -130,6 +130,28 @@ return __Y; } +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_set1_pd (double __A) +{ + return (__m512d) __builtin_ia32_broadcastsd512 (__extension__ + (__v2df) { __A, }, + (__v8df) + _mm512_undefined_pd (), + (__mmask8) -1); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_set1_ps (float __A) +{ + return (__m512) __builtin_ia32_broadcastss512 (__extension__ +(__v4sf) { __A, }, +(__v16sf) +_mm512_undefined_ps (), +(__mmask16) -1); +} + extern __inline __m512 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) _mm512_setzero_ps (void)