On Mon, May 23, 2016 at 10:15 AM, Jakub Jelinek <ja...@redhat.com> wrote:
> Hi!
>
> The vbroadcastss and vpermilps insns are already in AVX512F & AVX512VL,
> so can be used with v instead of x, the splitter case where we for AVX
> emit vpermilps plus vperm2f128 is more problematic, because the latter
> insn isn't available in EVEX.  But, we can get the same effect with
> vshuff32x4 when both source operands are the same.
> Alternatively, we could replace the vpermilps and vshuff32x4 insns
> with the AVX512VL arbitrary permutations I think, the question is
> what is faster, because we'd need to load the mask from memory.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2016-05-23  Jakub Jelinek  <ja...@redhat.com>
>
>         * config/i386/sse.md
>         (<mask_codefor>avx512vl_shuf_<shuffletype>32x4_1<mask_name>): Rename
>         to ...
>         (avx512vl_shuf_<shuffletype>32x4_1<mask_name>): ... this.
>         (*avx_vperm_broadcast_v4sf): Use v constraint instead of x.  Use
>         maybe_evex prefix instead of vex.
>         (*avx_vperm_broadcast_<mode>): Use v constraint instead of x.  Handle
>         EXT_REX_SSE_REG_P (op0) case in the splitter.
>
>         * gcc.target/i386/avx512vl-vbroadcast-3.c: New test.
>
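Why vshuff32x4 with identical sources can stand in for vperm2f128 can be checked with a scalar model. This is an illustrative sketch, not GCC's splitter code: `v8sf`, the `model_*` functions and the `broadcast_*` helpers are hypothetical names, and the immediate encodings are summarized in the comments from the instruction descriptions. To broadcast element `elt` of an 8-float vector, vpermilps with immediate `(elt & 3) * 0x55` (0, 85, 170 or 255) copies that element to every position of each 128-bit lane; the lane holding it is then duplicated with vperm2f128 immediate 0x00 or 0x11, or equivalently with vshuff32x4 immediate 0 or 3 when both sources are the same register.

```c
#include <assert.h>

/* Scalar stand-in for a 256-bit register holding eight floats, viewed as
   two 128-bit lanes: f[0..3] is the low lane, f[4..7] the high lane.  */
typedef struct { float f[8]; } v8sf;

/* vpermilps with an immediate: in each lane, result element i is source
   element (imm >> (2 * i)) & 3 of that same lane.  */
static v8sf model_vpermilps (v8sf src, unsigned imm)
{
  v8sf d;
  for (int lane = 0; lane < 2; lane++)
    for (int i = 0; i < 4; i++)
      d.f[lane * 4 + i] = src.f[lane * 4 + ((imm >> (2 * i)) & 3)];
  return d;
}

/* vperm2f128 (VEX only): imm[3:0] controls the low result lane and
   imm[7:4] the high one; in each nibble, bit 1 picks src2 over src1,
   bit 0 picks that source's high lane, bit 3 zeroes the result lane.  */
static v8sf model_vperm2f128 (v8sf src1, v8sf src2, unsigned imm)
{
  v8sf d;
  for (int half = 0; half < 2; half++)
    {
      unsigned ctl = (imm >> (4 * half)) & 0xf;
      const v8sf *s = (ctl & 2) ? &src2 : &src1;
      int lane = ctl & 1;
      for (int i = 0; i < 4; i++)
        d.f[half * 4 + i] = (ctl & 8) ? 0.0f : s->f[lane * 4 + i];
    }
  return d;
}

/* 256-bit vshuff32x4 (EVEX): imm bit 0 picks which lane of src1 becomes
   the low result lane, imm bit 1 which lane of src2 becomes the high one.  */
static v8sf model_vshuff32x4 (v8sf src1, v8sf src2, unsigned imm)
{
  v8sf d;
  for (int i = 0; i < 4; i++)
    {
      d.f[i] = src1.f[(imm & 1) * 4 + i];
      d.f[4 + i] = src2.f[((imm >> 1) & 1) * 4 + i];
    }
  return d;
}

/* Broadcast element ELT of X: replicate it within its lane, then copy
   that lane into both halves with vperm2f128 (AVX-only sequence).  */
static v8sf broadcast_avx (v8sf x, int elt)
{
  assert (elt >= 0 && elt < 8);
  v8sf t = model_vpermilps (x, (unsigned) (elt & 3) * 0x55);
  return model_vperm2f128 (t, t, (elt >> 2) * 0x11u);
}

/* Same result using vshuff32x4 with both sources equal; that insn has an
   EVEX encoding, so it can also use %ymm16-%ymm31.  */
static v8sf broadcast_evex (v8sf x, int elt)
{
  assert (elt >= 0 && elt < 8);
  v8sf t = model_vpermilps (x, (unsigned) (elt & 3) * 0x55);
  return model_vshuff32x4 (t, t, (elt >> 2) * 3u);
}
```

For every element index the two helpers return the same vector; the vshuff32x4 form still takes its selector as an immediate, so unlike the AVX512VL full permutes it needs no permutation mask loaded from memory.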
The new test fails on x32 because the address there uses a 32-bit register
(%edi rather than %rdi).  This patch fixes it.  Tested on x86-64.  OK for
trunk?

Thanks.

H.J.
----
2016-05-31  H.J. Lu  <hongjiu...@intel.com>

	* gcc.target/i386/avx512vl-vbroadcast-3.c: Scan %\[re\]di
	instead of %rdi.
	* gcc.target/i386/avx512vl-vcvtps2ph-3.c: Likewise.

diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-vbroadcast-3.c b/gcc/testsuite/gcc.target/i386/avx512vl-vbroadcast-3.c
index d981fe4..7233398 100644
--- a/gcc/testsuite/gcc.target/i386/avx512vl-vbroadcast-3.c
+++ b/gcc/testsuite/gcc.target/i386/avx512vl-vbroadcast-3.c
@@ -150,9 +150,9 @@ f16 (V2 *x)
   asm volatile ("" : "+v" (a));
 }
 
-/* { dg-final { scan-assembler-times "vbroadcastss\[^\n\r]*%rdi\[^\n\r]*%xmm16" 4 } } */
+/* { dg-final { scan-assembler-times "vbroadcastss\[^\n\r]*%\[re\]di\[^\n\r]*%xmm16" 4 } } */
 /* { dg-final { scan-assembler-times "vbroadcastss\[^\n\r]*%xmm16\[^\n\r]*%ymm16" 3 } } */
-/* { dg-final { scan-assembler-times "vbroadcastss\[^\n\r]*%rdi\[^\n\r]*%ymm16" 3 } } */
+/* { dg-final { scan-assembler-times "vbroadcastss\[^\n\r]*%\[re\]di\[^\n\r]*%ymm16" 3 } } */
 /* { dg-final { scan-assembler-times "vpermilps\[^\n\r]*\\\$0\[^\n\r]*%xmm16\[^\n\r]*%xmm16" 1 } } */
 /* { dg-final { scan-assembler-times "vpermilps\[^\n\r]*\\\$85\[^\n\r]*%xmm16\[^\n\r]*%xmm16" 1 } } */
 /* { dg-final { scan-assembler-times "vpermilps\[^\n\r]*\\\$170\[^\n\r]*%xmm16\[^\n\r]*%xmm16" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-vcvtps2ph-3.c b/gcc/testsuite/gcc.target/i386/avx512vl-vcvtps2ph-3.c
index 2fd2215..c2e3f01 100644
--- a/gcc/testsuite/gcc.target/i386/avx512vl-vcvtps2ph-3.c
+++ b/gcc/testsuite/gcc.target/i386/avx512vl-vcvtps2ph-3.c
@@ -38,4 +38,4 @@ f3 (__m256 x, __v8hi *y)
   *y = (__v8hi) _mm256_cvtps_ph (a, 1);
 }
 
-/* { dg-final { scan-assembler "vcvtps2ph\[^\n\r]*\\\$1\[^\n\r]*%ymm16\[^\n\r]*%rdi" } } */
+/* { dg-final { scan-assembler "vcvtps2ph\[^\n\r]*\\\$1\[^\n\r]*%ymm16\[^\n\r]*%\[re\]di" } } */
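The effect of the `%\[re\]di` change can be tried outside DejaGnu with POSIX `<regex.h>`. This is a sketch under two assumptions: that after Tcl string substitution the directive's pattern becomes the extended regex `vbroadcastss[^\n\r]*%[re]di[^\n\r]*%xmm16` (DejaGnu itself uses Tcl's regexp engine, not POSIX), and that x32 output addresses memory as `(%edi)`. The sample assembly lines used with it are illustrative, not captured compiler output, and `line_matches`, `OLD_PATTERN` and `NEW_PATTERN` are hypothetical names.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Old directive's pattern after Tcl processing: "\[" became "[", and the
   C escapes \n and \r below are real newline/carriage-return characters,
   so [^\n\r]* can never run past the end of one assembly line.  */
static const char OLD_PATTERN[] = "vbroadcastss[^\n\r]*%rdi[^\n\r]*%xmm16";

/* New directive's pattern: %[re]di accepts %rdi (x86-64) and %edi (x32).  */
static const char NEW_PATTERN[] = "vbroadcastss[^\n\r]*%[re]di[^\n\r]*%xmm16";

/* Return 1 if TEXT contains a match for the extended regex PATTERN,
   0 if it does not, and -1 if PATTERN fails to compile.  */
static int line_matches (const char *pattern, const char *text)
{
  regex_t re;
  int rc;

  assert (pattern != NULL && text != NULL);
  if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return -1;
  rc = regexec (&re, text, 0, NULL, 0);
  regfree (&re);
  return rc == 0;
}
```

With it, a line such as `vbroadcastss (%edi), %xmm16` matches `NEW_PATTERN` but not `OLD_PATTERN`, while `(%rsi)` matches neither. `scan-assembler-times` additionally counts the matches across the whole assembly file, which this per-string check does not model.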