On Mon, Oct 29, 2018 at 2:02 PM Uros Bizjak <ubiz...@gmail.com> wrote:
>
> On Sat, Oct 27, 2018 at 8:03 AM H.J. Lu <hjl.to...@gmail.com> wrote:
> >
> > Use scalar operand in SF/DF/SI/DI vec_dup patterns which enables combiner
> > to generate
> >
> > (set (reg:V8SF 84)
> >      (vec_duplicate:V8SF (mem/c:SF (symbol_ref:DI ("y")))))
> >
> > const_vector_duplicate_operand is added for constant vector broadcast.
> > We split
> >
> > (set (reg:V16SF 86)
> >      (const_vector:V16SF
> >        [(const_double:SF 2.0e+0 [0x0.8p+2]) repeated x16])
> >
> > to
> >
> > (set (reg:V16SF 86)
> >      (vec_duplicate:V16SF (mem/u/c:SF (symbol_ref/u:DI ("*.LC1")))))
>
> Why not at the expand time? Rewrite vector constant as vec_duplicate
> from memory and combine will do the stuff for you. We do have _bcst
> instruction patterns.
>

Here is the updated patch to do that.  OK for trunk?

Thanks.


-- 
H.J.
From 0c2ffe8a627c64263805baba8c9d9754dbb30f4b Mon Sep 17 00:00:00 2001
From: "H.J. Lu" <hjl.to...@gmail.com>
Date: Tue, 2 Oct 2018 14:27:55 -0700
Subject: [PATCH] i386: Use scalar operand in SF/DF/SI/DI vec_dup patterns

Use scalar operand in SF/DF/SI/DI vec_dup patterns for AVX512 which
enables combiner to generate

(set (reg:V8SF 84)
     (vec_duplicate:V8SF (mem/c:SF (symbol_ref:DI ("y")))))

To support it, the following changes are made:

1. For AVX512 broadcast instructions from integer register operand, we
only need to broadcast integer to integer vectors.
2. Replace nonimmediate_operand with register_operand in vec_dup patterns
since memory operand size is wrong.  Add vec_dup patterns with
memory_operand of correct operand size.
3. Replace duplicated vec_dup patterns with subreg.
4. Update AVX512 broadcast expanders to optimize constant SF/DF/SI/DI
vector broadcasts.
5. Add const_vector_duplicate_operand for constant vector broadcast.
We split

(set (reg:V16SF 86)
     (const_vector:V16SF
       [(const_double:SF 2.0e+0 [0x0.8p+2]) repeated x16])

to

(set (reg:V16SF 86)
     (vec_duplicate:V16SF (mem/u/c:SF (symbol_ref/u:DI ("*.LC1")))))

before IRA so tha IRA can turn

(set (reg:V16SF 86)
     (vec_duplicate:V16SF (mem/u/c:SF (symbol_ref/u:DI ("*.LC1")))))
(set (reg:V16SF 90)
     (plus:V16SF (reg/v:V16SF 85 [ x ])
		 (reg:V16SF 86)))

into

(set (reg:V16SF 90)
     (plus:V16SF
       (vec_duplicate:V16SF (mem/u/c:SF (symbol_ref/u:DI ("*.LC1"))))
       (reg/v:V16SF 85 [ x ])))

gcc/

	PR target/87537
	PR target/87767
	* config/i386/i386-builtin-types.def: Replace
	CODE_FOR_avx2_vec_dupv4sf, CODE_FOR_avx2_vec_dupv8sf and
	CODE_FOR_avx2_vec_dupv4df with CODE_FOR_vec_dupv4sf,
	CODE_FOR_vec_dupv8sf and CODE_FOR_vec_dupv4df, respectively.
	* config/i386/i386.c (ix86_expand_args_builtin): Handle
	SF/DF/SI/DI constant vector broadcast.
	(expand_vec_perm_1): Updated.  Duplicate them from source operand.
	* config/i386/i386.md (SF to DF splitter): Replace
	gen_avx512f_vec_dupv16sf_1 with gen_avx512f_vec_dupv16sf.
	* config/i386/predicates.md (const_vector_duplicate_operand): New.
	* config/i386/sse.md (VF48_AVX512VL): New.
	(avx2_vec_dup<mode>): Removed.
	(avx2_vec_dupv8sf_1): Likewise.
	(avx512f_vec_dup<mode>_1): Likewise.
	(avx2_pbroadcast<mode>_1): Likewise.
	(avx2_vec_dupv4df): Likewise.
	(<avx512>_vec_dup<mode>_1): Likewise.
	(<avx512>_vec_dup<mode><mask_name>:V48_AVX512VL): Likewise.
	(avx2_pbroadcast<mode>): Replace nonimmediate_operand with
	register_operand.
	(<avx512>_vec_dup<mode><mask_name>:VI48_AVX512VL): Likewise.
	(<avx512>_vec_dup<mode><mask_name>:VI12_AVX512VL): Likewise.
	(<avx512>_vec_dup<mode><mask_name>:VF48_AVX512VL): New.
	(*const_vec_dup<mode>): Likewise.
	(<avx512>_vec_dup<mode><mask_name>:VI48_AVX512VL): Likewise.
	(<avx512>_vec_dup<mode><mask_name>_1:VI48_AVX512VL): Likewise.
	(<mask_codefor><avx512>_vec_dup_gpr<mode><mask_name>): Replace
	V48_AVX512VL with VI48_AVX512VL.
	(*avx_vperm_broadcast_<mode>): Replace gen_avx2_vec_dupv8sf with
	gen_vec_dupv8sf.

gcc/testsuite/

	PR target/87537
	PR target/87767
	* gcc.target/i386/avx2-vbroadcastss_ps256-1.c: Updated.
	* gcc.target/i386/avx512vl-vbroadcast-3.c: Likewise.
	* gcc.target/i386/avx512-binop-7.h: New file.
	* gcc.target/i386/avx512f-add-sf-zmm-7.c: Likewise.
	* gcc.target/i386/avx512f-add-si-zmm-7.c: Likewise.
	* gcc.target/i386/avx512vl-add-di-xmm-7.c: Likewise.
	* gcc.target/i386/avx512vl-add-sf-xmm-7.c: Likewise.
	* gcc.target/i386/avx512vl-add-sf-ymm-7.c: Likewise.
	* gcc.target/i386/avx512vl-add-si-xmm-7.c: Likewise.
	* gcc.target/i386/avx512vl-add-si-ymm-7.c: Likewise.
	* gcc.target/i386/pr87537-2.c: Likewise.
	* gcc.target/i386/pr87537-3.c: Likewise.
	* gcc.target/i386/pr87537-4.c: Likewise.
	* gcc.target/i386/pr87537-5.c: Likewise.
	* gcc.target/i386/pr87537-6.c: Likewise.
	* gcc.target/i386/pr87537-7.c: Likewise.
	* gcc.target/i386/pr87537-8.c: Likewise.
	* gcc.target/i386/pr87537-9.c: Likewise.
---
 gcc/config/i386/i386-builtin.def              |   6 +-
 gcc/config/i386/i386.c                        | 212 ++++++++++++++++--
 gcc/config/i386/i386.md                       |   2 +-
 gcc/config/i386/predicates.md                 |  13 ++
 gcc/config/i386/sse.md                        | 145 +++++-------
 .../i386/avx2-vbroadcastss_ps256-1.c          |   3 +-
 .../gcc.target/i386/avx512-binop-7.h          |  12 +
 .../gcc.target/i386/avx512f-add-sf-zmm-7.c    |  14 ++
 .../gcc.target/i386/avx512f-add-si-zmm-7.c    |  12 +
 .../gcc.target/i386/avx512vl-add-di-xmm-7.c   |  13 ++
 .../gcc.target/i386/avx512vl-add-sf-xmm-7.c   |  13 ++
 .../gcc.target/i386/avx512vl-add-sf-ymm-7.c   |  13 ++
 .../gcc.target/i386/avx512vl-add-si-xmm-7.c   |  13 ++
 .../gcc.target/i386/avx512vl-add-si-ymm-7.c   |  13 ++
 .../gcc.target/i386/avx512vl-vbroadcast-3.c   |   5 +-
 gcc/testsuite/gcc.target/i386/pr87537-2.c     |  12 +
 gcc/testsuite/gcc.target/i386/pr87537-3.c     |  12 +
 gcc/testsuite/gcc.target/i386/pr87537-4.c     |  12 +
 gcc/testsuite/gcc.target/i386/pr87537-5.c     |  12 +
 gcc/testsuite/gcc.target/i386/pr87537-6.c     |  12 +
 gcc/testsuite/gcc.target/i386/pr87537-7.c     |  12 +
 gcc/testsuite/gcc.target/i386/pr87537-8.c     |  12 +
 gcc/testsuite/gcc.target/i386/pr87537-9.c     |  12 +
 23 files changed, 462 insertions(+), 123 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512-binop-7.h
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-add-sf-zmm-7.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-add-si-zmm-7.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512vl-add-di-xmm-7.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512vl-add-sf-xmm-7.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512vl-add-sf-ymm-7.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512vl-add-si-xmm-7.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512vl-add-si-ymm-7.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr87537-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr87537-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr87537-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr87537-5.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr87537-6.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr87537-7.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr87537-8.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr87537-9.c

diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index df0f7e975ac..d217add8ee2 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -1194,9 +1194,9 @@ BDESC (OPTION_MASK_ISA_AVX2, CODE_FOR_avx2_interleave_lowv16hi, "__builtin_ia32_
 BDESC (OPTION_MASK_ISA_AVX2, CODE_FOR_avx2_interleave_lowv8si, "__builtin_ia32_punpckldq256", IX86_BUILTIN_PUNPCKLDQ256, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI)
 BDESC (OPTION_MASK_ISA_AVX2, CODE_FOR_avx2_interleave_lowv4di, "__builtin_ia32_punpcklqdq256", IX86_BUILTIN_PUNPCKLQDQ256, UNKNOWN, (int) V4DI_FTYPE_V4DI_V4DI)
 BDESC (OPTION_MASK_ISA_AVX2, CODE_FOR_xorv4di3, "__builtin_ia32_pxor256", IX86_BUILTIN_PXOR256, UNKNOWN, (int) V4DI_FTYPE_V4DI_V4DI)
-BDESC (OPTION_MASK_ISA_AVX2, CODE_FOR_avx2_vec_dupv4sf, "__builtin_ia32_vbroadcastss_ps", IX86_BUILTIN_VBROADCASTSS_PS, UNKNOWN, (int) V4SF_FTYPE_V4SF)
-BDESC (OPTION_MASK_ISA_AVX2, CODE_FOR_avx2_vec_dupv8sf, "__builtin_ia32_vbroadcastss_ps256", IX86_BUILTIN_VBROADCASTSS_PS256, UNKNOWN, (int) V8SF_FTYPE_V4SF)
-BDESC (OPTION_MASK_ISA_AVX2, CODE_FOR_avx2_vec_dupv4df, "__builtin_ia32_vbroadcastsd_pd256", IX86_BUILTIN_VBROADCASTSD_PD256, UNKNOWN, (int) V4DF_FTYPE_V2DF)
+BDESC (OPTION_MASK_ISA_AVX2, CODE_FOR_vec_dupv4sf, "__builtin_ia32_vbroadcastss_ps", IX86_BUILTIN_VBROADCASTSS_PS, UNKNOWN, (int) V4SF_FTYPE_V4SF)
+BDESC (OPTION_MASK_ISA_AVX2, CODE_FOR_vec_dupv8sf, "__builtin_ia32_vbroadcastss_ps256", IX86_BUILTIN_VBROADCASTSS_PS256, UNKNOWN, (int) V8SF_FTYPE_V4SF)
+BDESC (OPTION_MASK_ISA_AVX2, CODE_FOR_vec_dupv4df, "__builtin_ia32_vbroadcastsd_pd256", IX86_BUILTIN_VBROADCASTSD_PD256, UNKNOWN, (int) V4DF_FTYPE_V2DF)
 BDESC (OPTION_MASK_ISA_AVX2, CODE_FOR_avx2_vbroadcasti128_v4di, "__builtin_ia32_vbroadcastsi256", IX86_BUILTIN_VBROADCASTSI256, UNKNOWN, (int) V4DI_FTYPE_V2DI)
 BDESC (OPTION_MASK_ISA_AVX2, CODE_FOR_avx2_pblenddv4si, "__builtin_ia32_pblendd128", IX86_BUILTIN_PBLENDD128, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_INT)
 BDESC (OPTION_MASK_ISA_AVX2, CODE_FOR_avx2_pblenddv8si, "__builtin_ia32_pblendd256", IX86_BUILTIN_PBLENDD256, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_INT)
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 963c7fcbb34..af5bd2bebdb 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -35028,6 +35028,9 @@ ix86_expand_args_builtin (const struct builtin_description *d,
       target = lowpart_subreg (rmode, real_target, tmode);
     }
 
+  bool const_vec_dup = false;
+  bool all_1s_mask = false;
+
   for (i = 0; i < nargs; i++)
     {
       tree arg = CALL_EXPR_ARG (exp, i);
@@ -35035,6 +35038,61 @@ ix86_expand_args_builtin (const struct builtin_description *d,
       machine_mode mode = insn_p->operand[i + 1].mode;
       bool match = insn_p->operand[i + 1].predicate (op, mode);
 
+      if (!match)
+	{
+	  switch (icode)
+	    {
+	    case CODE_FOR_avx512f_vec_dupv16sf_mask:
+	    case CODE_FOR_avx512f_vec_dupv8df_mask:
+	    case CODE_FOR_avx512vl_vec_dupv8sf_mask:
+	    case CODE_FOR_avx512vl_vec_dupv4df_mask:
+	    case CODE_FOR_avx512vl_vec_dupv4sf_mask:
+	    case CODE_FOR_avx512vl_vec_dupv2df_mask:
+	      if (i == 0 && GET_CODE (op) == CONST_VECTOR)
+		{
+		  match = true;
+		  const_vec_dup = true;
+		  op = CONST_VECTOR_ELT (op, 0);
+		}
+	      break;
+	    case CODE_FOR_avx512f_vec_dup_gprv16si_mask:
+	    case CODE_FOR_avx512f_vec_dup_gprv8di_mask:
+	    case CODE_FOR_avx512vl_vec_dup_gprv8si_mask:
+	    case CODE_FOR_avx512vl_vec_dup_gprv4si_mask:
+	    case CODE_FOR_avx512vl_vec_dup_gprv4di_mask:
+	    case CODE_FOR_avx512vl_vec_dup_gprv2di_mask:
+	      if (i == 0 && CONST_INT_P (op))
+		{
+		  match = true;
+		  const_vec_dup = true;
+		}
+	      break;
+	    default:
+	      break;
+	    }
+
+	  if (i == 0)
+	    {
+	      if (match)
+		{
+		  op = force_const_mem (GET_MODE_INNER (tmode), op);
+		  op = validize_mem (op);
+		}
+	    }
+	  else if (const_vec_dup && i == 2)
+	    {
+	      if (CONST_INT_P (op))
+		{
+		  unsigned int nunits = GET_MODE_NUNITS (tmode);
+		  if (INTVAL (op) == (1 << nunits) - 1)
+		    {
+		      all_1s_mask = true;
+		      match = true;
+		    }
+		}
+	    }
+	}
+
       if (second_arg_count && i == 1)
 	{
 	  /* SIMD shift insns take either an 8-bit immediate or
@@ -35198,15 +35256,18 @@ ix86_expand_args_builtin (const struct builtin_description *d,
 
 	  op = fixup_modeless_constant (op, mode);
 
-	  if (GET_MODE (op) == mode || GET_MODE (op) == VOIDmode)
+	  if (!const_vec_dup || !match || i == 1)
 	    {
-	      if (optimize || !match || num_memory > 1)
-		op = copy_to_mode_reg (mode, op);
-	    }
-	  else
-	    {
-	      op = copy_to_reg (op);
-	      op = lowpart_subreg (mode, op, GET_MODE (op));
+	      if (GET_MODE (op) == mode || GET_MODE (op) == VOIDmode)
+		{
+		  if (optimize || !match || num_memory > 1)
+		    op = copy_to_mode_reg (mode, op);
+		}
+	      else
+		{
+		  op = copy_to_reg (op);
+		  op = lowpart_subreg (mode, op, GET_MODE (op));
+		}
 	    }
 	}
 
@@ -35223,8 +35284,82 @@ ix86_expand_args_builtin (const struct builtin_description *d,
       pat = GEN_FCN (icode) (real_target, args[0].op, args[1].op);
       break;
     case 3:
-      pat = GEN_FCN (icode) (real_target, args[0].op, args[1].op,
-			     args[2].op);
+      if (const_vec_dup)
+	{
+	  switch (icode)
+	    {
+	    case CODE_FOR_avx512f_vec_dupv16sf_mask:
+	      if (all_1s_mask)
+		icode = CODE_FOR_avx512f_vec_dupv16sf;
+	      break;
+	    case CODE_FOR_avx512f_vec_dupv8df_mask:
+	      if (all_1s_mask)
+		icode = CODE_FOR_avx512f_vec_dupv8df;
+	      break;
+	    case CODE_FOR_avx512vl_vec_dupv8sf_mask:
+	      if (all_1s_mask)
+		icode = CODE_FOR_avx512vl_vec_dupv8sf;
+	      break;
+	    case CODE_FOR_avx512vl_vec_dupv4df_mask:
+	      if (all_1s_mask)
+		icode = CODE_FOR_avx512vl_vec_dupv4df;
+	      break;
+	    case CODE_FOR_avx512vl_vec_dupv4sf_mask:
+	      if (all_1s_mask)
+		icode = CODE_FOR_avx512vl_vec_dupv4sf;
+	      break;
+	    case CODE_FOR_avx512vl_vec_dupv2df_mask:
+	      if (all_1s_mask)
+		icode = CODE_FOR_avx512vl_vec_dupv2df;
+	      break;
+	    case CODE_FOR_avx512f_vec_dup_gprv16si_mask:
+	      if (all_1s_mask)
+		icode = CODE_FOR_avx512f_vec_dupv16si_1;
+	      else
+		icode = CODE_FOR_avx512f_vec_dupv16si_mask_1;
+	      break;
+	    case CODE_FOR_avx512f_vec_dup_gprv8di_mask:
+	      if (all_1s_mask)
+		icode = CODE_FOR_avx512f_vec_dupv8di_1;
+	      else
+		icode = CODE_FOR_avx512f_vec_dupv8di_mask_1;
+	      break;
+	    case CODE_FOR_avx512vl_vec_dup_gprv8si_mask:
+	      if (all_1s_mask)
+		icode = CODE_FOR_avx512vl_vec_dupv8si_1;
+	      else
+		icode = CODE_FOR_avx512vl_vec_dupv8si_mask_1;
+	      break;
+	    case CODE_FOR_avx512vl_vec_dup_gprv4si_mask:
+	      if (all_1s_mask)
+		icode = CODE_FOR_avx512vl_vec_dupv4si_1;
+	      else
+		icode = CODE_FOR_avx512vl_vec_dupv4si_mask_1;
+	      break;
+	    case CODE_FOR_avx512vl_vec_dup_gprv4di_mask:
+	      if (all_1s_mask)
+		icode = CODE_FOR_avx512vl_vec_dupv4di_1;
+	      else
+		icode = CODE_FOR_avx512vl_vec_dupv4di_mask_1;
+	      break;
+	    case CODE_FOR_avx512vl_vec_dup_gprv2di_mask:
+	      if (all_1s_mask)
+		icode = CODE_FOR_avx512vl_vec_dupv2di_1;
+	      else
+		icode = CODE_FOR_avx512vl_vec_dupv2di_mask_1;
+	      break;
+	    default:
+	      break;
+	    }
+	  if (all_1s_mask)
+	    pat = GEN_FCN (icode) (real_target, args[0].op);
+	  else
+	    pat = GEN_FCN (icode) (real_target, args[0].op, args[1].op,
+				   args[2].op);
+	}
+      else
+	pat = GEN_FCN (icode) (real_target, args[0].op, args[1].op,
+			       args[2].op);
       break;
     case 4:
       pat = GEN_FCN (icode) (real_target, args[0].op, args[1].op,
@@ -45963,28 +46098,41 @@ expand_vec_perm_1 (struct expand_vec_perm_d *d)
 	{
 	  /* Use vpbroadcast{b,w,d}.  */
 	  rtx (*gen) (rtx, rtx) = NULL;
+	  machine_mode smode = VOIDmode;
 	  switch (d->vmode)
 	    {
 	    case E_V64QImode:
 	      if (TARGET_AVX512BW)
-		gen = gen_avx512bw_vec_dupv64qi_1;
+		{
+		  smode = V16QImode;
+		  gen = gen_avx512bw_vec_dupv64qi;
+		}
 	      break;
 	    case E_V32QImode:
-	      gen = gen_avx2_pbroadcastv32qi_1;
+	      smode = V16QImode;
+	      gen = gen_avx2_pbroadcastv32qi;
 	      break;
 	    case E_V32HImode:
 	      if (TARGET_AVX512BW)
-		gen = gen_avx512bw_vec_dupv32hi_1;
+		{
+		  smode = V8HImode;
+		  gen = gen_avx512bw_vec_dupv32hi;
+		}
 	      break;
 	    case E_V16HImode:
-	      gen = gen_avx2_pbroadcastv16hi_1;
+	      smode = V8HImode;
+	      gen = gen_avx2_pbroadcastv16hi;
 	      break;
 	    case E_V16SImode:
 	      if (TARGET_AVX512F)
-		gen = gen_avx512f_vec_dupv16si_1;
+		{
+		  smode = V4SImode;
+		  gen = gen_avx512f_vec_dupv16si;
+		}
 	      break;
 	    case E_V8SImode:
-	      gen = gen_avx2_pbroadcastv8si_1;
+	      smode = V4SImode;
+	      gen = gen_avx2_pbroadcastv8si;
 	      break;
 	    case E_V16QImode:
 	      gen = gen_avx2_pbroadcastv16qi;
@@ -45993,19 +46141,25 @@ expand_vec_perm_1 (struct expand_vec_perm_d *d)
 	      gen = gen_avx2_pbroadcastv8hi;
 	      break;
 	    case E_V16SFmode:
+	      smode = SFmode;
 	      if (TARGET_AVX512F)
-		gen = gen_avx512f_vec_dupv16sf_1;
+		gen = gen_avx512f_vec_dupv16sf;
 	      break;
 	    case E_V8SFmode:
-	      gen = gen_avx2_vec_dupv8sf_1;
+	      smode = SFmode;
+	      gen = gen_vec_dupv8sf;
 	      break;
 	    case E_V8DFmode:
+	      smode = DFmode;
 	      if (TARGET_AVX512F)
-		gen = gen_avx512f_vec_dupv8df_1;
+		gen = gen_avx512f_vec_dupv8df;
 	      break;
 	    case E_V8DImode:
 	      if (TARGET_AVX512F)
-		gen = gen_avx512f_vec_dupv8di_1;
+		{
+		  smode = V2DImode;
+		  gen = gen_avx512f_vec_dupv8di;
+		}
 	      break;
 	    /* For other modes prefer other shuffles this function creates.  */
 	    default: break;
@@ -46013,7 +46167,23 @@ expand_vec_perm_1 (struct expand_vec_perm_d *d)
 	  if (gen != NULL)
 	    {
 	      if (!d->testing_p)
-		emit_insn (gen (d->target, d->op0));
+		{
+		  if (smode == VOIDmode)
+		    emit_insn (gen (d->target, d->op0));
+		  else
+		    {
+		      rtx op = d->op0;
+		      unsigned int oppos = 0;
+		      if (SUBREG_P (op))
+			{
+			  op = SUBREG_REG (op);
+			  oppos = SUBREG_BYTE (op);
+			}
+		      emit_insn (gen (d->target,
+				      gen_rtx_SUBREG (smode, op,
+						      oppos)));
+		    }
+		}
 	      return true;
 	    }
 	}
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 7fb2b144f47..4a6fa077db5 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -4399,7 +4399,7 @@
       else
 	{
 	  rtx tmp = lowpart_subreg (V16SFmode, operands[3], V4SFmode);
-	  emit_insn (gen_avx512f_vec_dupv16sf_1 (tmp, tmp));
+	  emit_insn (gen_avx512f_vec_dupv16sf (tmp, tmp));
 	}
     }
   else
diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md
index bd262d77c6b..1d80de0634f 100644
--- a/gcc/config/i386/predicates.md
+++ b/gcc/config/i386/predicates.md
@@ -1048,6 +1048,19 @@
   (ior (match_operand 0 "nonimmediate_operand")
        (match_code "const_vector")))
 
+;; Return true when OP is CONST_VECTOR which can be represented by
+;; VEC_DUPLICATE.
+(define_predicate "const_vector_duplicate_operand"
+  (and (match_code "const_vector")
+       (match_test "!standard_sse_constant_p (op, mode)"))
+{
+  int i, nunits = GET_MODE_NUNITS (mode);
+  for (i = 1; i < nunits; i++)
+    if (CONST_VECTOR_ELT (op, i) != CONST_VECTOR_ELT (op, 0))
+     return false;
+  return true;
+})
+
 ;; Return true when OP is nonimmediate or standard SSE constant.
 (define_predicate "nonimmediate_or_sse_const_operand"
   (ior (match_operand 0 "nonimmediate_operand")
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index ee73e1fdf80..065d6ab63b6 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -304,6 +304,10 @@
 (define_mode_iterator VF_512
   [V16SF V8DF])
 
+(define_mode_iterator VF48_AVX512VL
+  [V16SF (V8SF "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL")
+   V8DF  (V4DF "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")])
+
 (define_mode_iterator VI48_AVX512VL
   [V16SI (V8SI  "TARGET_AVX512VL") (V4SI  "TARGET_AVX512VL")
    V8DI  (V4DI  "TARGET_AVX512VL") (V2DI  "TARGET_AVX512VL")])
@@ -7117,42 +7121,6 @@
    (set_attr "prefix" "orig,maybe_evex")
    (set_attr "mode" "SF")])
 
-(define_insn "avx2_vec_dup<mode>"
-  [(set (match_operand:VF1_128_256 0 "register_operand" "=v")
-	(vec_duplicate:VF1_128_256
-	  (vec_select:SF
-	    (match_operand:V4SF 1 "register_operand" "v")
-	    (parallel [(const_int 0)]))))]
-  "TARGET_AVX2"
-  "vbroadcastss\t{%1, %0|%0, %1}"
-  [(set_attr "type" "sselog1")
-    (set_attr "prefix" "maybe_evex")
-    (set_attr "mode" "<MODE>")])
-
-(define_insn "avx2_vec_dupv8sf_1"
-  [(set (match_operand:V8SF 0 "register_operand" "=v")
-	(vec_duplicate:V8SF
-	  (vec_select:SF
-	    (match_operand:V8SF 1 "register_operand" "v")
-	    (parallel [(const_int 0)]))))]
-  "TARGET_AVX2"
-  "vbroadcastss\t{%x1, %0|%0, %x1}"
-  [(set_attr "type" "sselog1")
-    (set_attr "prefix" "maybe_evex")
-    (set_attr "mode" "V8SF")])
-
-(define_insn "avx512f_vec_dup<mode>_1"
-  [(set (match_operand:VF_512 0 "register_operand" "=v")
-	(vec_duplicate:VF_512
-	  (vec_select:<ssescalarmode>
-	    (match_operand:VF_512 1 "register_operand" "v")
-	    (parallel [(const_int 0)]))))]
-  "TARGET_AVX512F"
-  "vbroadcast<bcstscalarsuff>\t{%x1, %0|%0, %x1}"
-  [(set_attr "type" "sselog1")
-    (set_attr "prefix" "evex")
-    (set_attr "mode" "<MODE>")])
-
 ;; Although insertps takes register source, we prefer
 ;; unpcklps with register source since it is shorter.
 (define_insn "*vec_concatv2sf_sse4_1"
@@ -17908,34 +17876,16 @@
   [(set (match_operand:VI 0 "register_operand" "=x,v")
 	(vec_duplicate:VI
 	  (vec_select:<ssescalarmode>
-	    (match_operand:<ssexmmmode> 1 "nonimmediate_operand" "xm,vm")
+	    (match_operand:<ssexmmmode> 1 "register_operand" "x,v")
 	    (parallel [(const_int 0)]))))]
   "TARGET_AVX2"
-  "vpbroadcast<ssemodesuffix>\t{%1, %0|%0, %<iptr>1}"
+  "vpbroadcast<ssemodesuffix>\t{%1, %0|%0, %1}"
   [(set_attr "isa" "*,<pbroadcast_evex_isa>")
    (set_attr "type" "ssemov")
    (set_attr "prefix_extra" "1")
    (set_attr "prefix" "vex,evex")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_insn "avx2_pbroadcast<mode>_1"
-  [(set (match_operand:VI_256 0 "register_operand" "=x,x,v,v")
-	(vec_duplicate:VI_256
-	  (vec_select:<ssescalarmode>
-	    (match_operand:VI_256 1 "nonimmediate_operand" "m,x,m,v")
-	    (parallel [(const_int 0)]))))]
-  "TARGET_AVX2"
-  "@
-   vpbroadcast<ssemodesuffix>\t{%1, %0|%0, %<iptr>1}
-   vpbroadcast<ssemodesuffix>\t{%x1, %0|%0, %x1}
-   vpbroadcast<ssemodesuffix>\t{%1, %0|%0, %<iptr>1}
-   vpbroadcast<ssemodesuffix>\t{%x1, %0|%0, %x1}"
-  [(set_attr "isa" "*,*,<pbroadcast_evex_isa>,<pbroadcast_evex_isa>")
-   (set_attr "type" "ssemov")
-   (set_attr "prefix_extra" "1")
-   (set_attr "prefix" "vex")
-   (set_attr "mode" "<sseinsnmode>")])
-
 (define_insn "<avx2_avx512>_permvar<mode><mask_name>"
   [(set (match_operand:VI48F_256_512 0 "register_operand" "=v")
 	(unspec:VI48F_256_512
@@ -18111,38 +18061,10 @@
    (set_attr "prefix" "vex")
    (set_attr "mode" "OI")])
 
-(define_insn "avx2_vec_dupv4df"
-  [(set (match_operand:V4DF 0 "register_operand" "=v")
-	(vec_duplicate:V4DF
-	  (vec_select:DF
-	    (match_operand:V2DF 1 "register_operand" "v")
-	    (parallel [(const_int 0)]))))]
-  "TARGET_AVX2"
-  "vbroadcastsd\t{%1, %0|%0, %1}"
-  [(set_attr "type" "sselog1")
-   (set_attr "prefix" "maybe_evex")
-   (set_attr "mode" "V4DF")])
-
-(define_insn "<avx512>_vec_dup<mode>_1"
-  [(set (match_operand:VI_AVX512BW 0 "register_operand" "=v,v")
-	(vec_duplicate:VI_AVX512BW
-	  (vec_select:<ssescalarmode>
-	    (match_operand:VI_AVX512BW 1 "nonimmediate_operand" "v,m")
-	    (parallel [(const_int 0)]))))]
-  "TARGET_AVX512F"
-  "@
-   vpbroadcast<ssemodesuffix>\t{%x1, %0|%0, %x1}
-   vpbroadcast<ssemodesuffix>\t{%x1, %0|%0, %<iptr>1}"
-  [(set_attr "type" "ssemov")
-   (set_attr "prefix" "evex")
-   (set_attr "mode" "<sseinsnmode>")])
-
 (define_insn "<avx512>_vec_dup<mode><mask_name>"
-  [(set (match_operand:V48_AVX512VL 0 "register_operand" "=v")
-	(vec_duplicate:V48_AVX512VL
-	  (vec_select:<ssescalarmode>
-	    (match_operand:<ssexmmmode> 1 "nonimmediate_operand" "vm")
-	    (parallel [(const_int 0)]))))]
+  [(set (match_operand:VF48_AVX512VL 0 "register_operand" "=v")
+	(vec_duplicate:VF48_AVX512VL
+	  (match_operand:<ssescalarmode> 1 "nonimmediate_operand" "vm")))]
   "TARGET_AVX512F"
 {
   /*  There is no DF broadcast (in AVX-512*) to 128b register.
@@ -18156,14 +18078,49 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
 
+(define_insn_and_split "*const_vec_dup<mode>"
+  [(set (match_operand:V48_AVX512VL 0 "register_operand")
+	(match_operand:V48_AVX512VL 1 "const_vector_duplicate_operand"))]
+  "TARGET_AVX512F && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(set (match_dup 0) (vec_duplicate:V48_AVX512VL (match_dup 1)))]
+{
+  rtx val = CONST_VECTOR_ELT (operands[1], 0);
+  machine_mode scalar_mode = GET_MODE_INNER (<MODE>mode);
+  operands[1] = validize_mem (force_const_mem (scalar_mode, val));
+})
+
+(define_insn "<avx512>_vec_dup<mode><mask_name>"
+  [(set (match_operand:VI48_AVX512VL 0 "register_operand" "=v")
+	(vec_duplicate:VI48_AVX512VL
+	  (vec_select:<ssescalarmode>
+	    (match_operand:<ssexmmmode> 1 "register_operand" "v")
+	    (parallel [(const_int 0)]))))]
+  "TARGET_AVX512F"
+  "v<sseintprefix>broadcast<bcstscalarsuff>\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
+  [(set_attr "type" "ssemov")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "<sseinsnmode>")])
+
+(define_insn "<avx512>_vec_dup<mode><mask_name>_1"
+  [(set (match_operand:VI48_AVX512VL 0 "register_operand" "=v")
+	(vec_duplicate:VI48_AVX512VL
+	  (match_operand:<ssescalarmode> 1 "memory_operand" "m")))]
+  "TARGET_AVX512F"
+  "v<sseintprefix>broadcast<bcstscalarsuff>\t{%1, %0<mask_operand2>|%0<mask_operand2>, %<iptr>1}"
+  [(set_attr "type" "ssemov")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "<sseinsnmode>")])
+
 (define_insn "<avx512>_vec_dup<mode><mask_name>"
   [(set (match_operand:VI12_AVX512VL 0 "register_operand" "=v")
 	(vec_duplicate:VI12_AVX512VL
 	  (vec_select:<ssescalarmode>
-	    (match_operand:<ssexmmmode> 1 "nonimmediate_operand" "vm")
+	    (match_operand:<ssexmmmode> 1 "register_operand" "v")
 	    (parallel [(const_int 0)]))))]
   "TARGET_AVX512BW"
-  "vpbroadcast<bcstscalarsuff>\t{%1, %0<mask_operand2>|%0<mask_operand2>, %<iptr>1}"
+  "vpbroadcast<bcstscalarsuff>\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "type" "ssemov")
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
@@ -18205,8 +18162,8 @@
    (set_attr "mode" "<sseinsnmode>")])
 
 (define_insn "<mask_codefor><avx512>_vec_dup_gpr<mode><mask_name>"
-  [(set (match_operand:V48_AVX512VL 0 "register_operand" "=v,v")
-	(vec_duplicate:V48_AVX512VL
+  [(set (match_operand:VI48_AVX512VL 0 "register_operand" "=v,v")
+	(vec_duplicate:VI48_AVX512VL
 	  (match_operand:<ssescalarmode> 1 "nonimmediate_operand" "vm,r")))]
   "TARGET_AVX512F"
   "v<sseintprefix>broadcast<bcstscalarsuff>\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
@@ -18215,8 +18172,7 @@
    (set_attr "mode" "<sseinsnmode>")
    (set (attr "enabled")
      (if_then_else (eq_attr "alternative" "1")
-	(symbol_ref "GET_MODE_CLASS (<ssescalarmode>mode) == MODE_INT
-		     && (<ssescalarmode>mode != DImode || TARGET_64BIT)")
+	(symbol_ref "<ssescalarmode>mode != DImode || TARGET_64BIT")
 	(const_int 1)))])
 
 (define_insn "vec_dupv4sf"
@@ -18545,8 +18501,7 @@
 	     or VSHUFF128.  */
 	  gcc_assert (<MODE>mode == V8SFmode);
 	  if ((mask & 1) == 0)
-	    emit_insn (gen_avx2_vec_dupv8sf (op0,
-					     gen_lowpart (V4SFmode, op0)));
+	    emit_insn (gen_vec_dupv8sf (op0, gen_lowpart (V4SFmode, op0)));
 	  else
 	    emit_insn (gen_avx512vl_shuf_f32x4_1 (op0, op0, op0,
 						  GEN_INT (4), GEN_INT (5),
diff --git a/gcc/testsuite/gcc.target/i386/avx2-vbroadcastss_ps256-1.c b/gcc/testsuite/gcc.target/i386/avx2-vbroadcastss_ps256-1.c
index dfac3916b08..3ff7497aa21 100644
--- a/gcc/testsuite/gcc.target/i386/avx2-vbroadcastss_ps256-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx2-vbroadcastss_ps256-1.c
@@ -1,6 +1,7 @@
 /* { dg-do compile } */
 /* { dg-options "-mavx2 -O2" } */
-/* { dg-final { scan-assembler "vbroadcastss\[ \\t\]+\[^\n\]*%xmm\[0-9\]" } } */
+/* { dg-final { scan-assembler "vbroadcastss\[ \\t\]+\[^\n\]*%ymm\[0-9\]" } } */
+/* { dg-final { scan-assembler-not "vmovaps\[\t \]*\[^,\]*,%xmm\[0-9\]" } } */
 
 #include <immintrin.h>
 
diff --git a/gcc/testsuite/gcc.target/i386/avx512-binop-7.h b/gcc/testsuite/gcc.target/i386/avx512-binop-7.h
new file mode 100644
index 00000000000..513901847a9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512-binop-7.h
@@ -0,0 +1,12 @@
+#include <immintrin.h>
+
+#define PASTER2(x,y)		x##y
+#define PASTER3(x,y,z)		_mm##x##_##y##_##z
+#define OP(vec, op, suffix)	PASTER3 (vec, op, suffix)
+#define DUP(vec, suffix, val)	PASTER3 (vec, set1, suffix) (val)
+
+type
+foo (type x)
+{
+  return OP (vec, op, op_suffix) (DUP (vec, dup_suffix, 2.1f), x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-add-sf-zmm-7.c b/gcc/testsuite/gcc.target/i386/avx512f-add-sf-zmm-7.c
new file mode 100644
index 00000000000..de23c73e71c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-add-sf-zmm-7.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vaddps\[ \\t\]+\[^\n\]*\\\{1to\[1-8\]+\\\}, %zmm\[0-9\]+, %zmm0" 1 } } */
+/* { dg-final { scan-assembler-times "long\[ \\t\]+1074161254" 1 } } */
+/* { dg-final { scan-assembler-not "vbroadcastss\[^\n\]*%zmm\[0-9\]+" } } */
+
+#define type __m512
+#define vec 512
+#define op add
+#define op_suffix ps
+#define dup_suffix ps
+#define SCALAR float
+
+#include "avx512-binop-7.h"
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-add-si-zmm-7.c b/gcc/testsuite/gcc.target/i386/avx512f-add-si-zmm-7.c
new file mode 100644
index 00000000000..9e5f800118d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-add-si-zmm-7.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vpaddd\[ \\t\]+\[^\n\]*\\\{1to\[1-8\]+\\\}, %zmm\[0-9\]+, %zmm0" 1 } } */
+
+#define type __m512i
+#define vec 512
+#define op add
+#define op_suffix epi32
+#define dup_suffix epi32
+#define SCALAR int
+
+#include "avx512-binop-7.h"
diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-add-di-xmm-7.c b/gcc/testsuite/gcc.target/i386/avx512vl-add-di-xmm-7.c
new file mode 100644
index 00000000000..7d921aded31
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512vl-add-di-xmm-7.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512vl -O2" } */
+/* { dg-final { scan-assembler-times "vpaddq\[ \\t\]+\[^\n\]*\\\{1to\[1-8\]+\\\}, %xmm\[0-9\]+, %xmm0" 1 } } */
+/* { dg-final { scan-assembler-times "(?:long|quad)\[ \\t\]+2" 1 } } */
+
+#define type __m128i
+#define vec
+#define op add
+#define op_suffix epi64
+#define dup_suffix epi64x
+#define SCALAR int
+
+#include "avx512-binop-7.h"
diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-add-sf-xmm-7.c b/gcc/testsuite/gcc.target/i386/avx512vl-add-sf-xmm-7.c
new file mode 100644
index 00000000000..2fc1d5c4824
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512vl-add-sf-xmm-7.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512vl -O2" } */
+/* { dg-final { scan-assembler-times "vaddps\[ \\t\]+\[^\n\]*\\\{1to\[1-8\]+\\\}, %xmm\[0-9\]+, %xmm0" 1 } } */
+/* { dg-final { scan-assembler-times "long\[ \\t\]+1074161254" 1 } } */
+
+#define type __m128
+#define vec
+#define op add
+#define op_suffix ps
+#define dup_suffix ps
+#define SCALAR float
+
+#include "avx512-binop-7.h"
diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-add-sf-ymm-7.c b/gcc/testsuite/gcc.target/i386/avx512vl-add-sf-ymm-7.c
new file mode 100644
index 00000000000..436aae757ca
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512vl-add-sf-ymm-7.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512vl -O2" } */
+/* { dg-final { scan-assembler-times "vaddps\[ \\t\]+\[^\n\]*\\\{1to\[1-8\]+\\\}, %ymm\[0-9\]+, %ymm0" 1 } } */
+/* { dg-final { scan-assembler-times "long\[ \\t\]+1074161254" 1 } } */
+
+#define type __m256
+#define vec 256
+#define op add
+#define op_suffix ps
+#define dup_suffix ps
+#define SCALAR float
+
+#include "avx512-binop-7.h"
diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-add-si-xmm-7.c b/gcc/testsuite/gcc.target/i386/avx512vl-add-si-xmm-7.c
new file mode 100644
index 00000000000..0bd7a0c5e96
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512vl-add-si-xmm-7.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512vl -O2" } */
+/* { dg-final { scan-assembler-times "vpaddd\[ \\t\]+\[^\n\]*\\\{1to\[1-8\]+\\\}, %xmm\[0-9\]+, %xmm0" 1 } } */
+/* { dg-final { scan-assembler-times "(?:long|quad)\[ \\t\]+2" 1 } } */
+
+#define type __m128i
+#define vec
+#define op add
+#define op_suffix epi32
+#define dup_suffix epi32
+#define SCALAR int
+
+#include "avx512-binop-7.h"
diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-add-si-ymm-7.c b/gcc/testsuite/gcc.target/i386/avx512vl-add-si-ymm-7.c
new file mode 100644
index 00000000000..fdde09fca1e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512vl-add-si-ymm-7.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512vl -O2" } */
+/* { dg-final { scan-assembler-times "vpaddd\[ \\t\]+\[^\n\]*\\\{1to\[1-8\]+\\\}, %ymm\[0-9\]+, %ymm0" 1 } } */
+/* { dg-final { scan-assembler-times "(?:long|quad)\[ \\t\]+2" 1 } } */
+
+#define type __m256i
+#define vec 256
+#define op add
+#define op_suffix epi32
+#define dup_suffix epi32
+#define SCALAR int
+
+#include "avx512-binop-7.h"
diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-vbroadcast-3.c b/gcc/testsuite/gcc.target/i386/avx512vl-vbroadcast-3.c
index 7233398cd64..1c62364dac4 100644
--- a/gcc/testsuite/gcc.target/i386/avx512vl-vbroadcast-3.c
+++ b/gcc/testsuite/gcc.target/i386/avx512vl-vbroadcast-3.c
@@ -151,8 +151,8 @@ f16 (V2 *x)
 }
 
 /* { dg-final { scan-assembler-times "vbroadcastss\[^\n\r]*%\[re\]di\[^\n\r]*%xmm16" 4 } } */
-/* { dg-final { scan-assembler-times "vbroadcastss\[^\n\r]*%xmm16\[^\n\r]*%ymm16" 3 } } */
-/* { dg-final { scan-assembler-times "vbroadcastss\[^\n\r]*%\[re\]di\[^\n\r]*%ymm16" 3 } } */
+/* { dg-final { scan-assembler-times "vbroadcastss\[^\n\r]*%xmm16\[^\n\r]*%ymm16" 1 } } */
+/* { dg-final { scan-assembler-times "vbroadcastss\[^\n\r]*%\[re\]di\[^\n\r]*%ymm16" 4 } } */
 /* { dg-final { scan-assembler-times "vpermilps\[^\n\r]*\\\$0\[^\n\r]*%xmm16\[^\n\r]*%xmm16" 1 } } */
 /* { dg-final { scan-assembler-times "vpermilps\[^\n\r]*\\\$85\[^\n\r]*%xmm16\[^\n\r]*%xmm16" 1 } } */
 /* { dg-final { scan-assembler-times "vpermilps\[^\n\r]*\\\$170\[^\n\r]*%xmm16\[^\n\r]*%xmm16" 1 } } */
@@ -160,3 +160,4 @@ f16 (V2 *x)
 /* { dg-final { scan-assembler-times "vpermilps\[^\n\r]*\\\$0\[^\n\r]*%ymm16\[^\n\r]*%ymm16" 1 } } */
 /* { dg-final { scan-assembler-times "vpermilps\[^\n\r]*\\\$85\[^\n\r]*%ymm16\[^\n\r]*%ymm16" 2 } } */
 /* { dg-final { scan-assembler-times "vshuff32x4\[^\n\r]*\\\$3\[^\n\r]*%ymm16\[^\n\r]*%ymm16\[^\n\r]*%ymm16" 2 } } */
+/* { dg-final { scan-assembler-times "vshuff32x4\[^\n\r]*\\\$0\[^\n\r]*%ymm16\[^\n\r]*%ymm16\[^\n\r]*%ymm16" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr87537-2.c b/gcc/testsuite/gcc.target/i386/pr87537-2.c
new file mode 100644
index 00000000000..19ded7e64b2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr87537-2.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2 -mtune=skylake" } */
+/* { dg-final { scan-assembler-times "vbroadcastss\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]" 1 } } */
+/* { dg-final { scan-assembler-not "vmovss" } } */
+
+#include <immintrin.h>
+
+__m512
+foo (float *x)
+{
+  return _mm512_broadcastss_ps (_mm_load_ss(x));
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr87537-3.c b/gcc/testsuite/gcc.target/i386/pr87537-3.c
new file mode 100644
index 00000000000..ee7781a69e4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr87537-3.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2 -mtune=skylake" } */
+/* { dg-final { scan-assembler-times "vbroadcastss\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]" 1 } } */
+/* { dg-final { scan-assembler-not "vmovss" } } */
+
+#include <immintrin.h>
+
+__m512
+foo (void)
+{
+  return _mm512_set1_ps (2.0f);
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr87537-4.c b/gcc/testsuite/gcc.target/i386/pr87537-4.c
new file mode 100644
index 00000000000..c5bfef1366e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr87537-4.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2 -mtune=skylake" } */
+/* { dg-final { scan-assembler-times "vbroadcastsd\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]" 1 } } */
+/* { dg-final { scan-assembler-not "vmovsd" } } */
+
+#include <immintrin.h>
+
+__m512d
+foo (double *x)
+{
+  return _mm512_broadcastsd_pd (_mm_load_sd(x));
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr87537-5.c b/gcc/testsuite/gcc.target/i386/pr87537-5.c
new file mode 100644
index 00000000000..4f806f4fbf3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr87537-5.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2 -mtune=skylake" } */
+/* { dg-final { scan-assembler-times "vbroadcastsd\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]" 1 } } */
+/* { dg-final { scan-assembler-not "vmovsd" } } */
+
+#include <immintrin.h>
+
+__m512d
+foo (void)
+{
+  return _mm512_set1_pd (2.0f);
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr87537-6.c b/gcc/testsuite/gcc.target/i386/pr87537-6.c
new file mode 100644
index 00000000000..b53588b907b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr87537-6.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512vl -O2 -mtune=skylake" } */
+/* { dg-final { scan-assembler-times "vbroadcastss\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]" 1 } } */
+/* { dg-final { scan-assembler-not "vmovss" } } */
+
+#include <immintrin.h>
+
+__m256
+foo (float *x)
+{
+  return _mm256_broadcastss_ps (_mm_load_ss(x));
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr87537-7.c b/gcc/testsuite/gcc.target/i386/pr87537-7.c
new file mode 100644
index 00000000000..d07a1e3de55
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr87537-7.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512vl -O2 -mtune=skylake" } */
+/* { dg-final { scan-assembler-times "vbroadcastss\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]" 1 } } */
+/* { dg-final { scan-assembler-not "vmovss" } } */
+
+#include <immintrin.h>
+
+__m256
+foo (void)
+{
+  return _mm256_set1_ps (2.0f);
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr87537-8.c b/gcc/testsuite/gcc.target/i386/pr87537-8.c
new file mode 100644
index 00000000000..dbf4ee3551d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr87537-8.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512vl -O2 -mtune=skylake" } */
+/* { dg-final { scan-assembler-times "vbroadcastss\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]" 1 } } */
+/* { dg-final { scan-assembler-not "vmovss" } } */
+
+#include <immintrin.h>
+
+__m128
+foo (float *x)
+{
+  return _mm_broadcastss_ps (_mm_load_ss(x));
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr87537-9.c b/gcc/testsuite/gcc.target/i386/pr87537-9.c
new file mode 100644
index 00000000000..8e09382d876
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr87537-9.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512vl -O2 -mtune=skylake" } */
+/* { dg-final { scan-assembler-times "vbroadcastss\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]" 1 } } */
+/* { dg-final { scan-assembler-not "vmovss" } } */
+
+#include <immintrin.h>
+
+__m128
+foo (void)
+{
+  return _mm_set1_ps (2.0f);
+}
-- 
2.17.2

Reply via email to