On Mon, Oct 29, 2018 at 2:02 PM Uros Bizjak <ubiz...@gmail.com> wrote: > > On Sat, Oct 27, 2018 at 8:03 AM H.J. Lu <hjl.to...@gmail.com> wrote: > > > > Use scalar operand in SF/DF/SI/DI vec_dup patterns which enables combiner > > to generate > > > > (set (reg:V8SF 84) > > (vec_duplicate:V8SF (mem/c:SF (symbol_ref:DI ("y"))))) > > > > const_vector_duplicate_operand is added for constant vector broadcast. > > We split > > > > (set (reg:V16SF 86) > > (const_vector:V16SF > > [(const_double:SF 2.0e+0 [0x0.8p+2]) repeated x16]) > > > > to > > > > (set (reg:V16SF 86) > > (vec_duplicate:V16SF (mem/u/c:SF (symbol_ref/u:DI ("*.LC1"))))) > > Why not at the expand time? Rewrite vector constant as vec_duplicate > from memory and combine will do the stuff for you. We do have _bcst > instruction patterns. >
Here is the updated patch to do that. OK for trunk? Thanks. -- H.J.
From 0c2ffe8a627c64263805baba8c9d9754dbb30f4b Mon Sep 17 00:00:00 2001 From: "H.J. Lu" <hjl.to...@gmail.com> Date: Tue, 2 Oct 2018 14:27:55 -0700 Subject: [PATCH] i386: Use scalar operand in SF/DF/SI/DI vec_dup patterns Use scalar operand in SF/DF/SI/DI vec_dup patterns for AVX512 which enables combiner to generate (set (reg:V8SF 84) (vec_duplicate:V8SF (mem/c:SF (symbol_ref:DI ("y"))))) To support it, the following changes are made: 1. For AVX512 broadcast instructions from integer register operand, we only need to broadcast integer to integer vectors. 2. Replace nonimmediate_operand with register_operand in vec_dup patterns since memory operand size is wrong. Add vec_dup patterns with memory_operand of correct operand size. 3. Replace duplicated vec_dup patterns with subreg. 4. Update AVX512 broadcast expanders to optimize constant SF/DF/SI/DI vector broadcasts. 5. Add const_vector_duplicate_operand for constant vector broadcast. We split (set (reg:V16SF 86) (const_vector:V16SF [(const_double:SF 2.0e+0 [0x0.8p+2]) repeated x16]) to (set (reg:V16SF 86) (vec_duplicate:V16SF (mem/u/c:SF (symbol_ref/u:DI ("*.LC1"))))) before IRA so tha IRA can turn (set (reg:V16SF 86) (vec_duplicate:V16SF (mem/u/c:SF (symbol_ref/u:DI ("*.LC1"))))) (set (reg:V16SF 90) (plus:V16SF (reg/v:V16SF 85 [ x ]) (reg:V16SF 86))) into (set (reg:V16SF 90) (plus:V16SF (vec_duplicate:V16SF (mem/u/c:SF (symbol_ref/u:DI ("*.LC1")))) (reg/v:V16SF 85 [ x ]))) gcc/ PR target/87537 PR target/87767 * config/i386/i386-builtin-types.def: Replace CODE_FOR_avx2_vec_dupv4sf, CODE_FOR_avx2_vec_dupv8sf and CODE_FOR_avx2_vec_dupv4df with CODE_FOR_vec_dupv4sf, CODE_FOR_vec_dupv8sf and CODE_FOR_vec_dupv4df, respectively. * config/i386/i386.c (ix86_expand_args_builtin): Handle SF/DF/SI/DI constant vector broadcast. (expand_vec_perm_1): Updated. Duplicate them from source operand. * config/i386/i386.md (SF to DF splitter): Replace gen_avx512f_vec_dupv16sf_1 with gen_avx512f_vec_dupv16sf. * config/i386/predicates.md (const_vector_duplicate_operand): New. * config/i386/sse.md (VF48_AVX512VL): New. (avx2_vec_dup<mode>): Removed. (avx2_vec_dupv8sf_1): Likewise. (avx512f_vec_dup<mode>_1): Likewise. (avx2_pbroadcast<mode>_1): Likewise. (avx2_vec_dupv4df): Likewise. (<avx512>_vec_dup<mode>_1): Likewise. (<avx512>_vec_dup<mode><mask_name>:V48_AVX512VL): Likewise. (avx2_pbroadcast<mode>): Replace nonimmediate_operand with register_operand. (<avx512>_vec_dup<mode><mask_name>:VI48_AVX512VL): Likewise. (<avx512>_vec_dup<mode><mask_name>:VI12_AVX512VL): Likewise. (<avx512>_vec_dup<mode><mask_name>:VF48_AVX512VL): New. (*const_vec_dup<mode>): Likewise. (<avx512>_vec_dup<mode><mask_name>:VI48_AVX512VL): Likewise. (<avx512>_vec_dup<mode><mask_name>_1:VI48_AVX512VL): Likewise. (<mask_codefor><avx512>_vec_dup_gpr<mode><mask_name>): Replace V48_AVX512VL with VI48_AVX512VL. (*avx_vperm_broadcast_<mode>): Replace gen_avx2_vec_dupv8sf with gen_vec_dupv8sf. gcc/testsuite/ PR target/87537 PR target/87767 * gcc.target/i386/avx2-vbroadcastss_ps256-1.c: Updated. * gcc.target/i386/avx512vl-vbroadcast-3.c: Likewise. * gcc.target/i386/avx512-binop-7.h: New file. * gcc.target/i386/avx512f-add-sf-zmm-7.c: Likewise. * gcc.target/i386/avx512f-add-si-zmm-7.c: Likewise. * gcc.target/i386/avx512vl-add-di-xmm-7.c: Likewise. * gcc.target/i386/avx512vl-add-sf-xmm-7.c: Likewise. * gcc.target/i386/avx512vl-add-sf-ymm-7.c: Likewise. * gcc.target/i386/avx512vl-add-si-xmm-7.c: Likewise. * gcc.target/i386/avx512vl-add-si-ymm-7.c: Likewise. * gcc.target/i386/pr87537-2.c: Likewise. * gcc.target/i386/pr87537-3.c: Likewise. * gcc.target/i386/pr87537-4.c: Likewise. * gcc.target/i386/pr87537-5.c: Likewise. * gcc.target/i386/pr87537-6.c: Likewise. * gcc.target/i386/pr87537-7.c: Likewise. * gcc.target/i386/pr87537-8.c: Likewise. * gcc.target/i386/pr87537-9.c: Likewise. --- gcc/config/i386/i386-builtin.def | 6 +- gcc/config/i386/i386.c | 212 ++++++++++++++++-- gcc/config/i386/i386.md | 2 +- gcc/config/i386/predicates.md | 13 ++ gcc/config/i386/sse.md | 145 +++++------- .../i386/avx2-vbroadcastss_ps256-1.c | 3 +- .../gcc.target/i386/avx512-binop-7.h | 12 + .../gcc.target/i386/avx512f-add-sf-zmm-7.c | 14 ++ .../gcc.target/i386/avx512f-add-si-zmm-7.c | 12 + .../gcc.target/i386/avx512vl-add-di-xmm-7.c | 13 ++ .../gcc.target/i386/avx512vl-add-sf-xmm-7.c | 13 ++ .../gcc.target/i386/avx512vl-add-sf-ymm-7.c | 13 ++ .../gcc.target/i386/avx512vl-add-si-xmm-7.c | 13 ++ .../gcc.target/i386/avx512vl-add-si-ymm-7.c | 13 ++ .../gcc.target/i386/avx512vl-vbroadcast-3.c | 5 +- gcc/testsuite/gcc.target/i386/pr87537-2.c | 12 + gcc/testsuite/gcc.target/i386/pr87537-3.c | 12 + gcc/testsuite/gcc.target/i386/pr87537-4.c | 12 + gcc/testsuite/gcc.target/i386/pr87537-5.c | 12 + gcc/testsuite/gcc.target/i386/pr87537-6.c | 12 + gcc/testsuite/gcc.target/i386/pr87537-7.c | 12 + gcc/testsuite/gcc.target/i386/pr87537-8.c | 12 + gcc/testsuite/gcc.target/i386/pr87537-9.c | 12 + 23 files changed, 462 insertions(+), 123 deletions(-) create mode 100644 gcc/testsuite/gcc.target/i386/avx512-binop-7.h create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-add-sf-zmm-7.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-add-si-zmm-7.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512vl-add-di-xmm-7.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512vl-add-sf-xmm-7.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512vl-add-sf-ymm-7.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512vl-add-si-xmm-7.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512vl-add-si-ymm-7.c create mode 100644 gcc/testsuite/gcc.target/i386/pr87537-2.c create mode 100644 gcc/testsuite/gcc.target/i386/pr87537-3.c create mode 100644 gcc/testsuite/gcc.target/i386/pr87537-4.c create mode 100644 gcc/testsuite/gcc.target/i386/pr87537-5.c create mode 100644 gcc/testsuite/gcc.target/i386/pr87537-6.c create mode 100644 gcc/testsuite/gcc.target/i386/pr87537-7.c create mode 100644 gcc/testsuite/gcc.target/i386/pr87537-8.c create mode 100644 gcc/testsuite/gcc.target/i386/pr87537-9.c diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def index df0f7e975ac..d217add8ee2 100644 --- a/gcc/config/i386/i386-builtin.def +++ b/gcc/config/i386/i386-builtin.def @@ -1194,9 +1194,9 @@ BDESC (OPTION_MASK_ISA_AVX2, CODE_FOR_avx2_interleave_lowv16hi, "__builtin_ia32_ BDESC (OPTION_MASK_ISA_AVX2, CODE_FOR_avx2_interleave_lowv8si, "__builtin_ia32_punpckldq256", IX86_BUILTIN_PUNPCKLDQ256, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI) BDESC (OPTION_MASK_ISA_AVX2, CODE_FOR_avx2_interleave_lowv4di, "__builtin_ia32_punpcklqdq256", IX86_BUILTIN_PUNPCKLQDQ256, UNKNOWN, (int) V4DI_FTYPE_V4DI_V4DI) BDESC (OPTION_MASK_ISA_AVX2, CODE_FOR_xorv4di3, "__builtin_ia32_pxor256", IX86_BUILTIN_PXOR256, UNKNOWN, (int) V4DI_FTYPE_V4DI_V4DI) -BDESC (OPTION_MASK_ISA_AVX2, CODE_FOR_avx2_vec_dupv4sf, "__builtin_ia32_vbroadcastss_ps", IX86_BUILTIN_VBROADCASTSS_PS, UNKNOWN, (int) V4SF_FTYPE_V4SF) -BDESC (OPTION_MASK_ISA_AVX2, CODE_FOR_avx2_vec_dupv8sf, "__builtin_ia32_vbroadcastss_ps256", IX86_BUILTIN_VBROADCASTSS_PS256, UNKNOWN, (int) V8SF_FTYPE_V4SF) -BDESC (OPTION_MASK_ISA_AVX2, CODE_FOR_avx2_vec_dupv4df, "__builtin_ia32_vbroadcastsd_pd256", IX86_BUILTIN_VBROADCASTSD_PD256, UNKNOWN, (int) V4DF_FTYPE_V2DF) +BDESC (OPTION_MASK_ISA_AVX2, CODE_FOR_vec_dupv4sf, "__builtin_ia32_vbroadcastss_ps", IX86_BUILTIN_VBROADCASTSS_PS, UNKNOWN, (int) V4SF_FTYPE_V4SF) +BDESC (OPTION_MASK_ISA_AVX2, CODE_FOR_vec_dupv8sf, "__builtin_ia32_vbroadcastss_ps256", IX86_BUILTIN_VBROADCASTSS_PS256, UNKNOWN, (int) V8SF_FTYPE_V4SF) +BDESC (OPTION_MASK_ISA_AVX2, CODE_FOR_vec_dupv4df, "__builtin_ia32_vbroadcastsd_pd256", IX86_BUILTIN_VBROADCASTSD_PD256, UNKNOWN, (int) V4DF_FTYPE_V2DF) BDESC (OPTION_MASK_ISA_AVX2, CODE_FOR_avx2_vbroadcasti128_v4di, "__builtin_ia32_vbroadcastsi256", IX86_BUILTIN_VBROADCASTSI256, UNKNOWN, (int) V4DI_FTYPE_V2DI) BDESC (OPTION_MASK_ISA_AVX2, CODE_FOR_avx2_pblenddv4si, "__builtin_ia32_pblendd128", IX86_BUILTIN_PBLENDD128, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_INT) BDESC (OPTION_MASK_ISA_AVX2, CODE_FOR_avx2_pblenddv8si, "__builtin_ia32_pblendd256", IX86_BUILTIN_PBLENDD256, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_INT) diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 963c7fcbb34..af5bd2bebdb 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -35028,6 +35028,9 @@ ix86_expand_args_builtin (const struct builtin_description *d, target = lowpart_subreg (rmode, real_target, tmode); } + bool const_vec_dup = false; + bool all_1s_mask = false; + for (i = 0; i < nargs; i++) { tree arg = CALL_EXPR_ARG (exp, i); @@ -35035,6 +35038,61 @@ ix86_expand_args_builtin (const struct builtin_description *d, machine_mode mode = insn_p->operand[i + 1].mode; bool match = insn_p->operand[i + 1].predicate (op, mode); + if (!match) + { + switch (icode) + { + case CODE_FOR_avx512f_vec_dupv16sf_mask: + case CODE_FOR_avx512f_vec_dupv8df_mask: + case CODE_FOR_avx512vl_vec_dupv8sf_mask: + case CODE_FOR_avx512vl_vec_dupv4df_mask: + case CODE_FOR_avx512vl_vec_dupv4sf_mask: + case CODE_FOR_avx512vl_vec_dupv2df_mask: + if (i == 0 && GET_CODE (op) == CONST_VECTOR) + { + match = true; + const_vec_dup = true; + op = CONST_VECTOR_ELT (op, 0); + } + break; + case CODE_FOR_avx512f_vec_dup_gprv16si_mask: + case CODE_FOR_avx512f_vec_dup_gprv8di_mask: + case CODE_FOR_avx512vl_vec_dup_gprv8si_mask: + case CODE_FOR_avx512vl_vec_dup_gprv4si_mask: + case CODE_FOR_avx512vl_vec_dup_gprv4di_mask: + case CODE_FOR_avx512vl_vec_dup_gprv2di_mask: + if (i == 0 && CONST_INT_P (op)) + { + match = true; + const_vec_dup = true; + } + break; + default: + break; + } + + if (i == 0) + { + if (match) + { + op = force_const_mem (GET_MODE_INNER (tmode), op); + op = validize_mem (op); + } + } + else if (const_vec_dup && i == 2) + { + if (CONST_INT_P (op)) + { + unsigned int nunits = GET_MODE_NUNITS (tmode); + if (INTVAL (op) == (1 << nunits) - 1) + { + all_1s_mask = true; + match = true; + } + } + } + } + if (second_arg_count && i == 1) { /* SIMD shift insns take either an 8-bit immediate or @@ -35198,15 +35256,18 @@ ix86_expand_args_builtin (const struct builtin_description *d, op = fixup_modeless_constant (op, mode); - if (GET_MODE (op) == mode || GET_MODE (op) == VOIDmode) + if (!const_vec_dup || !match || i == 1) { - if (optimize || !match || num_memory > 1) - op = copy_to_mode_reg (mode, op); - } - else - { - op = copy_to_reg (op); - op = lowpart_subreg (mode, op, GET_MODE (op)); + if (GET_MODE (op) == mode || GET_MODE (op) == VOIDmode) + { + if (optimize || !match || num_memory > 1) + op = copy_to_mode_reg (mode, op); + } + else + { + op = copy_to_reg (op); + op = lowpart_subreg (mode, op, GET_MODE (op)); + } } } @@ -35223,8 +35284,82 @@ ix86_expand_args_builtin (const struct builtin_description *d, pat = GEN_FCN (icode) (real_target, args[0].op, args[1].op); break; case 3: - pat = GEN_FCN (icode) (real_target, args[0].op, args[1].op, - args[2].op); + if (const_vec_dup) + { + switch (icode) + { + case CODE_FOR_avx512f_vec_dupv16sf_mask: + if (all_1s_mask) + icode = CODE_FOR_avx512f_vec_dupv16sf; + break; + case CODE_FOR_avx512f_vec_dupv8df_mask: + if (all_1s_mask) + icode = CODE_FOR_avx512f_vec_dupv8df; + break; + case CODE_FOR_avx512vl_vec_dupv8sf_mask: + if (all_1s_mask) + icode = CODE_FOR_avx512vl_vec_dupv8sf; + break; + case CODE_FOR_avx512vl_vec_dupv4df_mask: + if (all_1s_mask) + icode = CODE_FOR_avx512vl_vec_dupv4df; + break; + case CODE_FOR_avx512vl_vec_dupv4sf_mask: + if (all_1s_mask) + icode = CODE_FOR_avx512vl_vec_dupv4sf; + break; + case CODE_FOR_avx512vl_vec_dupv2df_mask: + if (all_1s_mask) + icode = CODE_FOR_avx512vl_vec_dupv2df; + break; + case CODE_FOR_avx512f_vec_dup_gprv16si_mask: + if (all_1s_mask) + icode = CODE_FOR_avx512f_vec_dupv16si_1; + else + icode = CODE_FOR_avx512f_vec_dupv16si_mask_1; + break; + case CODE_FOR_avx512f_vec_dup_gprv8di_mask: + if (all_1s_mask) + icode = CODE_FOR_avx512f_vec_dupv8di_1; + else + icode = CODE_FOR_avx512f_vec_dupv8di_mask_1; + break; + case CODE_FOR_avx512vl_vec_dup_gprv8si_mask: + if (all_1s_mask) + icode = CODE_FOR_avx512vl_vec_dupv8si_1; + else + icode = CODE_FOR_avx512vl_vec_dupv8si_mask_1; + break; + case CODE_FOR_avx512vl_vec_dup_gprv4si_mask: + if (all_1s_mask) + icode = CODE_FOR_avx512vl_vec_dupv4si_1; + else + icode = CODE_FOR_avx512vl_vec_dupv4si_mask_1; + break; + case CODE_FOR_avx512vl_vec_dup_gprv4di_mask: + if (all_1s_mask) + icode = CODE_FOR_avx512vl_vec_dupv4di_1; + else + icode = CODE_FOR_avx512vl_vec_dupv4di_mask_1; + break; + case CODE_FOR_avx512vl_vec_dup_gprv2di_mask: + if (all_1s_mask) + icode = CODE_FOR_avx512vl_vec_dupv2di_1; + else + icode = CODE_FOR_avx512vl_vec_dupv2di_mask_1; + break; + default: + break; + } + if (all_1s_mask) + pat = GEN_FCN (icode) (real_target, args[0].op); + else + pat = GEN_FCN (icode) (real_target, args[0].op, args[1].op, + args[2].op); + } + else + pat = GEN_FCN (icode) (real_target, args[0].op, args[1].op, + args[2].op); break; case 4: pat = GEN_FCN (icode) (real_target, args[0].op, args[1].op, @@ -45963,28 +46098,41 @@ expand_vec_perm_1 (struct expand_vec_perm_d *d) { /* Use vpbroadcast{b,w,d}. */ rtx (*gen) (rtx, rtx) = NULL; + machine_mode smode = VOIDmode; switch (d->vmode) { case E_V64QImode: if (TARGET_AVX512BW) - gen = gen_avx512bw_vec_dupv64qi_1; + { + smode = V16QImode; + gen = gen_avx512bw_vec_dupv64qi; + } break; case E_V32QImode: - gen = gen_avx2_pbroadcastv32qi_1; + smode = V16QImode; + gen = gen_avx2_pbroadcastv32qi; break; case E_V32HImode: if (TARGET_AVX512BW) - gen = gen_avx512bw_vec_dupv32hi_1; + { + smode = V8HImode; + gen = gen_avx512bw_vec_dupv32hi; + } break; case E_V16HImode: - gen = gen_avx2_pbroadcastv16hi_1; + smode = V8HImode; + gen = gen_avx2_pbroadcastv16hi; break; case E_V16SImode: if (TARGET_AVX512F) - gen = gen_avx512f_vec_dupv16si_1; + { + smode = V4SImode; + gen = gen_avx512f_vec_dupv16si; + } break; case E_V8SImode: - gen = gen_avx2_pbroadcastv8si_1; + smode = V4SImode; + gen = gen_avx2_pbroadcastv8si; break; case E_V16QImode: gen = gen_avx2_pbroadcastv16qi; @@ -45993,19 +46141,25 @@ expand_vec_perm_1 (struct expand_vec_perm_d *d) gen = gen_avx2_pbroadcastv8hi; break; case E_V16SFmode: + smode = SFmode; if (TARGET_AVX512F) - gen = gen_avx512f_vec_dupv16sf_1; + gen = gen_avx512f_vec_dupv16sf; break; case E_V8SFmode: - gen = gen_avx2_vec_dupv8sf_1; + smode = SFmode; + gen = gen_vec_dupv8sf; break; case E_V8DFmode: + smode = DFmode; if (TARGET_AVX512F) - gen = gen_avx512f_vec_dupv8df_1; + gen = gen_avx512f_vec_dupv8df; break; case E_V8DImode: if (TARGET_AVX512F) - gen = gen_avx512f_vec_dupv8di_1; + { + smode = V2DImode; + gen = gen_avx512f_vec_dupv8di; + } break; /* For other modes prefer other shuffles this function creates. */ default: break; @@ -46013,7 +46167,23 @@ expand_vec_perm_1 (struct expand_vec_perm_d *d) if (gen != NULL) { if (!d->testing_p) - emit_insn (gen (d->target, d->op0)); + { + if (smode == VOIDmode) + emit_insn (gen (d->target, d->op0)); + else + { + rtx op = d->op0; + unsigned int oppos = 0; + if (SUBREG_P (op)) + { + op = SUBREG_REG (op); + oppos = SUBREG_BYTE (op); + } + emit_insn (gen (d->target, + gen_rtx_SUBREG (smode, op, + oppos))); + } + } return true; } } diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index 7fb2b144f47..4a6fa077db5 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -4399,7 +4399,7 @@ else { rtx tmp = lowpart_subreg (V16SFmode, operands[3], V4SFmode); - emit_insn (gen_avx512f_vec_dupv16sf_1 (tmp, tmp)); + emit_insn (gen_avx512f_vec_dupv16sf (tmp, tmp)); } } else diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md index bd262d77c6b..1d80de0634f 100644 --- a/gcc/config/i386/predicates.md +++ b/gcc/config/i386/predicates.md @@ -1048,6 +1048,19 @@ (ior (match_operand 0 "nonimmediate_operand") (match_code "const_vector"))) +;; Return true when OP is CONST_VECTOR which can be represented by +;; VEC_DUPLICATE. +(define_predicate "const_vector_duplicate_operand" + (and (match_code "const_vector") + (match_test "!standard_sse_constant_p (op, mode)")) +{ + int i, nunits = GET_MODE_NUNITS (mode); + for (i = 1; i < nunits; i++) + if (CONST_VECTOR_ELT (op, i) != CONST_VECTOR_ELT (op, 0)) + return false; + return true; +}) + ;; Return true when OP is nonimmediate or standard SSE constant. (define_predicate "nonimmediate_or_sse_const_operand" (ior (match_operand 0 "nonimmediate_operand") diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index ee73e1fdf80..065d6ab63b6 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -304,6 +304,10 @@ (define_mode_iterator VF_512 [V16SF V8DF]) +(define_mode_iterator VF48_AVX512VL + [V16SF (V8SF "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL") + V8DF (V4DF "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")]) + (define_mode_iterator VI48_AVX512VL [V16SI (V8SI "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL") V8DI (V4DI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL")]) @@ -7117,42 +7121,6 @@ (set_attr "prefix" "orig,maybe_evex") (set_attr "mode" "SF")]) -(define_insn "avx2_vec_dup<mode>" - [(set (match_operand:VF1_128_256 0 "register_operand" "=v") - (vec_duplicate:VF1_128_256 - (vec_select:SF - (match_operand:V4SF 1 "register_operand" "v") - (parallel [(const_int 0)]))))] - "TARGET_AVX2" - "vbroadcastss\t{%1, %0|%0, %1}" - [(set_attr "type" "sselog1") - (set_attr "prefix" "maybe_evex") - (set_attr "mode" "<MODE>")]) - -(define_insn "avx2_vec_dupv8sf_1" - [(set (match_operand:V8SF 0 "register_operand" "=v") - (vec_duplicate:V8SF - (vec_select:SF - (match_operand:V8SF 1 "register_operand" "v") - (parallel [(const_int 0)]))))] - "TARGET_AVX2" - "vbroadcastss\t{%x1, %0|%0, %x1}" - [(set_attr "type" "sselog1") - (set_attr "prefix" "maybe_evex") - (set_attr "mode" "V8SF")]) - -(define_insn "avx512f_vec_dup<mode>_1" - [(set (match_operand:VF_512 0 "register_operand" "=v") - (vec_duplicate:VF_512 - (vec_select:<ssescalarmode> - (match_operand:VF_512 1 "register_operand" "v") - (parallel [(const_int 0)]))))] - "TARGET_AVX512F" - "vbroadcast<bcstscalarsuff>\t{%x1, %0|%0, %x1}" - [(set_attr "type" "sselog1") - (set_attr "prefix" "evex") - (set_attr "mode" "<MODE>")]) - ;; Although insertps takes register source, we prefer ;; unpcklps with register source since it is shorter. (define_insn "*vec_concatv2sf_sse4_1" @@ -17908,34 +17876,16 @@ [(set (match_operand:VI 0 "register_operand" "=x,v") (vec_duplicate:VI (vec_select:<ssescalarmode> - (match_operand:<ssexmmmode> 1 "nonimmediate_operand" "xm,vm") + (match_operand:<ssexmmmode> 1 "register_operand" "x,v") (parallel [(const_int 0)]))))] "TARGET_AVX2" - "vpbroadcast<ssemodesuffix>\t{%1, %0|%0, %<iptr>1}" + "vpbroadcast<ssemodesuffix>\t{%1, %0|%0, %1}" [(set_attr "isa" "*,<pbroadcast_evex_isa>") (set_attr "type" "ssemov") (set_attr "prefix_extra" "1") (set_attr "prefix" "vex,evex") (set_attr "mode" "<sseinsnmode>")]) -(define_insn "avx2_pbroadcast<mode>_1" - [(set (match_operand:VI_256 0 "register_operand" "=x,x,v,v") - (vec_duplicate:VI_256 - (vec_select:<ssescalarmode> - (match_operand:VI_256 1 "nonimmediate_operand" "m,x,m,v") - (parallel [(const_int 0)]))))] - "TARGET_AVX2" - "@ - vpbroadcast<ssemodesuffix>\t{%1, %0|%0, %<iptr>1} - vpbroadcast<ssemodesuffix>\t{%x1, %0|%0, %x1} - vpbroadcast<ssemodesuffix>\t{%1, %0|%0, %<iptr>1} - vpbroadcast<ssemodesuffix>\t{%x1, %0|%0, %x1}" - [(set_attr "isa" "*,*,<pbroadcast_evex_isa>,<pbroadcast_evex_isa>") - (set_attr "type" "ssemov") - (set_attr "prefix_extra" "1") - (set_attr "prefix" "vex") - (set_attr "mode" "<sseinsnmode>")]) - (define_insn "<avx2_avx512>_permvar<mode><mask_name>" [(set (match_operand:VI48F_256_512 0 "register_operand" "=v") (unspec:VI48F_256_512 @@ -18111,38 +18061,10 @@ (set_attr "prefix" "vex") (set_attr "mode" "OI")]) -(define_insn "avx2_vec_dupv4df" - [(set (match_operand:V4DF 0 "register_operand" "=v") - (vec_duplicate:V4DF - (vec_select:DF - (match_operand:V2DF 1 "register_operand" "v") - (parallel [(const_int 0)]))))] - "TARGET_AVX2" - "vbroadcastsd\t{%1, %0|%0, %1}" - [(set_attr "type" "sselog1") - (set_attr "prefix" "maybe_evex") - (set_attr "mode" "V4DF")]) - -(define_insn "<avx512>_vec_dup<mode>_1" - [(set (match_operand:VI_AVX512BW 0 "register_operand" "=v,v") - (vec_duplicate:VI_AVX512BW - (vec_select:<ssescalarmode> - (match_operand:VI_AVX512BW 1 "nonimmediate_operand" "v,m") - (parallel [(const_int 0)]))))] - "TARGET_AVX512F" - "@ - vpbroadcast<ssemodesuffix>\t{%x1, %0|%0, %x1} - vpbroadcast<ssemodesuffix>\t{%x1, %0|%0, %<iptr>1}" - [(set_attr "type" "ssemov") - (set_attr "prefix" "evex") - (set_attr "mode" "<sseinsnmode>")]) - (define_insn "<avx512>_vec_dup<mode><mask_name>" - [(set (match_operand:V48_AVX512VL 0 "register_operand" "=v") - (vec_duplicate:V48_AVX512VL - (vec_select:<ssescalarmode> - (match_operand:<ssexmmmode> 1 "nonimmediate_operand" "vm") - (parallel [(const_int 0)]))))] + [(set (match_operand:VF48_AVX512VL 0 "register_operand" "=v") + (vec_duplicate:VF48_AVX512VL + (match_operand:<ssescalarmode> 1 "nonimmediate_operand" "vm")))] "TARGET_AVX512F" { /* There is no DF broadcast (in AVX-512*) to 128b register. @@ -18156,14 +18078,49 @@ (set_attr "prefix" "evex") (set_attr "mode" "<sseinsnmode>")]) +(define_insn_and_split "*const_vec_dup<mode>" + [(set (match_operand:V48_AVX512VL 0 "register_operand") + (match_operand:V48_AVX512VL 1 "const_vector_duplicate_operand"))] + "TARGET_AVX512F && can_create_pseudo_p ()" + "#" + "&& 1" + [(set (match_dup 0) (vec_duplicate:V48_AVX512VL (match_dup 1)))] +{ + rtx val = CONST_VECTOR_ELT (operands[1], 0); + machine_mode scalar_mode = GET_MODE_INNER (<MODE>mode); + operands[1] = validize_mem (force_const_mem (scalar_mode, val)); +}) + +(define_insn "<avx512>_vec_dup<mode><mask_name>" + [(set (match_operand:VI48_AVX512VL 0 "register_operand" "=v") + (vec_duplicate:VI48_AVX512VL + (vec_select:<ssescalarmode> + (match_operand:<ssexmmmode> 1 "register_operand" "v") + (parallel [(const_int 0)]))))] + "TARGET_AVX512F" + "v<sseintprefix>broadcast<bcstscalarsuff>\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}" + [(set_attr "type" "ssemov") + (set_attr "prefix" "evex") + (set_attr "mode" "<sseinsnmode>")]) + +(define_insn "<avx512>_vec_dup<mode><mask_name>_1" + [(set (match_operand:VI48_AVX512VL 0 "register_operand" "=v") + (vec_duplicate:VI48_AVX512VL + (match_operand:<ssescalarmode> 1 "memory_operand" "m")))] + "TARGET_AVX512F" + "v<sseintprefix>broadcast<bcstscalarsuff>\t{%1, %0<mask_operand2>|%0<mask_operand2>, %<iptr>1}" + [(set_attr "type" "ssemov") + (set_attr "prefix" "evex") + (set_attr "mode" "<sseinsnmode>")]) + (define_insn "<avx512>_vec_dup<mode><mask_name>" [(set (match_operand:VI12_AVX512VL 0 "register_operand" "=v") (vec_duplicate:VI12_AVX512VL (vec_select:<ssescalarmode> - (match_operand:<ssexmmmode> 1 "nonimmediate_operand" "vm") + (match_operand:<ssexmmmode> 1 "register_operand" "v") (parallel [(const_int 0)]))))] "TARGET_AVX512BW" - "vpbroadcast<bcstscalarsuff>\t{%1, %0<mask_operand2>|%0<mask_operand2>, %<iptr>1}" + "vpbroadcast<bcstscalarsuff>\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}" [(set_attr "type" "ssemov") (set_attr "prefix" "evex") (set_attr "mode" "<sseinsnmode>")]) @@ -18205,8 +18162,8 @@ (set_attr "mode" "<sseinsnmode>")]) (define_insn "<mask_codefor><avx512>_vec_dup_gpr<mode><mask_name>" - [(set (match_operand:V48_AVX512VL 0 "register_operand" "=v,v") - (vec_duplicate:V48_AVX512VL + [(set (match_operand:VI48_AVX512VL 0 "register_operand" "=v,v") + (vec_duplicate:VI48_AVX512VL (match_operand:<ssescalarmode> 1 "nonimmediate_operand" "vm,r")))] "TARGET_AVX512F" "v<sseintprefix>broadcast<bcstscalarsuff>\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}" @@ -18215,8 +18172,7 @@ (set_attr "mode" "<sseinsnmode>") (set (attr "enabled") (if_then_else (eq_attr "alternative" "1") - (symbol_ref "GET_MODE_CLASS (<ssescalarmode>mode) == MODE_INT - && (<ssescalarmode>mode != DImode || TARGET_64BIT)") + (symbol_ref "<ssescalarmode>mode != DImode || TARGET_64BIT") (const_int 1)))]) (define_insn "vec_dupv4sf" @@ -18545,8 +18501,7 @@ or VSHUFF128. */ gcc_assert (<MODE>mode == V8SFmode); if ((mask & 1) == 0) - emit_insn (gen_avx2_vec_dupv8sf (op0, - gen_lowpart (V4SFmode, op0))); + emit_insn (gen_vec_dupv8sf (op0, gen_lowpart (V4SFmode, op0))); else emit_insn (gen_avx512vl_shuf_f32x4_1 (op0, op0, op0, GEN_INT (4), GEN_INT (5), diff --git a/gcc/testsuite/gcc.target/i386/avx2-vbroadcastss_ps256-1.c b/gcc/testsuite/gcc.target/i386/avx2-vbroadcastss_ps256-1.c index dfac3916b08..3ff7497aa21 100644 --- a/gcc/testsuite/gcc.target/i386/avx2-vbroadcastss_ps256-1.c +++ b/gcc/testsuite/gcc.target/i386/avx2-vbroadcastss_ps256-1.c @@ -1,6 +1,7 @@ /* { dg-do compile } */ /* { dg-options "-mavx2 -O2" } */ -/* { dg-final { scan-assembler "vbroadcastss\[ \\t\]+\[^\n\]*%xmm\[0-9\]" } } */ +/* { dg-final { scan-assembler "vbroadcastss\[ \\t\]+\[^\n\]*%ymm\[0-9\]" } } */ +/* { dg-final { scan-assembler-not "vmovaps\[\t \]*\[^,\]*,%xmm\[0-9\]" } } */ #include <immintrin.h> diff --git a/gcc/testsuite/gcc.target/i386/avx512-binop-7.h b/gcc/testsuite/gcc.target/i386/avx512-binop-7.h new file mode 100644 index 00000000000..513901847a9 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512-binop-7.h @@ -0,0 +1,12 @@ +#include <immintrin.h> + +#define PASTER2(x,y) x##y +#define PASTER3(x,y,z) _mm##x##_##y##_##z +#define OP(vec, op, suffix) PASTER3 (vec, op, suffix) +#define DUP(vec, suffix, val) PASTER3 (vec, set1, suffix) (val) + +type +foo (type x) +{ + return OP (vec, op, op_suffix) (DUP (vec, dup_suffix, 2.1f), x); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512f-add-sf-zmm-7.c b/gcc/testsuite/gcc.target/i386/avx512f-add-sf-zmm-7.c new file mode 100644 index 00000000000..de23c73e71c --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512f-add-sf-zmm-7.c @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512f -O2" } */ +/* { dg-final { scan-assembler-times "vaddps\[ \\t\]+\[^\n\]*\\\{1to\[1-8\]+\\\}, %zmm\[0-9\]+, %zmm0" 1 } } */ +/* { dg-final { scan-assembler-times "long\[ \\t\]+1074161254" 1 } } */ +/* { dg-final { scan-assembler-not "vbroadcastss\[^\n\]*%zmm\[0-9\]+" } } */ + +#define type __m512 +#define vec 512 +#define op add +#define op_suffix ps +#define dup_suffix ps +#define SCALAR float + +#include "avx512-binop-7.h" diff --git a/gcc/testsuite/gcc.target/i386/avx512f-add-si-zmm-7.c b/gcc/testsuite/gcc.target/i386/avx512f-add-si-zmm-7.c new file mode 100644 index 00000000000..9e5f800118d --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512f-add-si-zmm-7.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512f -O2" } */ +/* { dg-final { scan-assembler-times "vpaddd\[ \\t\]+\[^\n\]*\\\{1to\[1-8\]+\\\}, %zmm\[0-9\]+, %zmm0" 1 } } */ + +#define type __m512i +#define vec 512 +#define op add +#define op_suffix epi32 +#define dup_suffix epi32 +#define SCALAR int + +#include "avx512-binop-7.h" diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-add-di-xmm-7.c b/gcc/testsuite/gcc.target/i386/avx512vl-add-di-xmm-7.c new file mode 100644 index 00000000000..7d921aded31 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512vl-add-di-xmm-7.c @@ -0,0 +1,13 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512vl -O2" } */ +/* { dg-final { scan-assembler-times "vpaddq\[ \\t\]+\[^\n\]*\\\{1to\[1-8\]+\\\}, %xmm\[0-9\]+, %xmm0" 1 } } */ +/* { dg-final { scan-assembler-times "(?:long|quad)\[ \\t\]+2" 1 } } */ + +#define type __m128i +#define vec +#define op add +#define op_suffix epi64 +#define dup_suffix epi64x +#define SCALAR int + +#include "avx512-binop-7.h" diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-add-sf-xmm-7.c b/gcc/testsuite/gcc.target/i386/avx512vl-add-sf-xmm-7.c new file mode 100644 index 00000000000..2fc1d5c4824 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512vl-add-sf-xmm-7.c @@ -0,0 +1,13 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512vl -O2" } */ +/* { dg-final { scan-assembler-times "vaddps\[ \\t\]+\[^\n\]*\\\{1to\[1-8\]+\\\}, %xmm\[0-9\]+, %xmm0" 1 } } */ +/* { dg-final { scan-assembler-times "long\[ \\t\]+1074161254" 1 } } */ + +#define type __m128 +#define vec +#define op add +#define op_suffix ps +#define dup_suffix ps +#define SCALAR float + +#include "avx512-binop-7.h" diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-add-sf-ymm-7.c b/gcc/testsuite/gcc.target/i386/avx512vl-add-sf-ymm-7.c new file mode 100644 index 00000000000..436aae757ca --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512vl-add-sf-ymm-7.c @@ -0,0 +1,13 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512vl -O2" } */ +/* { dg-final { scan-assembler-times "vaddps\[ \\t\]+\[^\n\]*\\\{1to\[1-8\]+\\\}, %ymm\[0-9\]+, %ymm0" 1 } } */ +/* { dg-final { scan-assembler-times "long\[ \\t\]+1074161254" 1 } } */ + +#define type __m256 +#define vec 256 +#define op add +#define op_suffix ps +#define dup_suffix ps +#define SCALAR float + +#include "avx512-binop-7.h" diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-add-si-xmm-7.c b/gcc/testsuite/gcc.target/i386/avx512vl-add-si-xmm-7.c new file mode 100644 index 00000000000..0bd7a0c5e96 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512vl-add-si-xmm-7.c @@ -0,0 +1,13 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512vl -O2" } */ +/* { dg-final { scan-assembler-times "vpaddd\[ \\t\]+\[^\n\]*\\\{1to\[1-8\]+\\\}, %xmm\[0-9\]+, %xmm0" 1 } } */ +/* { dg-final { scan-assembler-times "(?:long|quad)\[ \\t\]+2" 1 } } */ + +#define type __m128i +#define vec +#define op add +#define op_suffix epi32 +#define dup_suffix epi32 +#define SCALAR int + +#include "avx512-binop-7.h" diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-add-si-ymm-7.c b/gcc/testsuite/gcc.target/i386/avx512vl-add-si-ymm-7.c new file mode 100644 index 00000000000..fdde09fca1e --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512vl-add-si-ymm-7.c @@ -0,0 +1,13 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512vl -O2" } */ +/* { dg-final { scan-assembler-times "vpaddd\[ \\t\]+\[^\n\]*\\\{1to\[1-8\]+\\\}, %ymm\[0-9\]+, %ymm0" 1 } } */ +/* { dg-final { scan-assembler-times "(?:long|quad)\[ \\t\]+2" 1 } } */ + +#define type __m256i +#define vec 256 +#define op add +#define op_suffix epi32 +#define dup_suffix epi32 +#define SCALAR int + +#include "avx512-binop-7.h" diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-vbroadcast-3.c b/gcc/testsuite/gcc.target/i386/avx512vl-vbroadcast-3.c index 7233398cd64..1c62364dac4 100644 --- a/gcc/testsuite/gcc.target/i386/avx512vl-vbroadcast-3.c +++ b/gcc/testsuite/gcc.target/i386/avx512vl-vbroadcast-3.c @@ -151,8 +151,8 @@ f16 (V2 *x) } /* { dg-final { scan-assembler-times "vbroadcastss\[^\n\r]*%\[re\]di\[^\n\r]*%xmm16" 4 } } */ -/* { dg-final { scan-assembler-times "vbroadcastss\[^\n\r]*%xmm16\[^\n\r]*%ymm16" 3 } } */ -/* { dg-final { scan-assembler-times "vbroadcastss\[^\n\r]*%\[re\]di\[^\n\r]*%ymm16" 3 } } */ +/* { dg-final { scan-assembler-times "vbroadcastss\[^\n\r]*%xmm16\[^\n\r]*%ymm16" 1 } } */ +/* { dg-final { scan-assembler-times "vbroadcastss\[^\n\r]*%\[re\]di\[^\n\r]*%ymm16" 4 } } */ /* { dg-final { scan-assembler-times "vpermilps\[^\n\r]*\\\$0\[^\n\r]*%xmm16\[^\n\r]*%xmm16" 1 } } */ /* { dg-final { scan-assembler-times "vpermilps\[^\n\r]*\\\$85\[^\n\r]*%xmm16\[^\n\r]*%xmm16" 1 } } */ /* { dg-final { scan-assembler-times "vpermilps\[^\n\r]*\\\$170\[^\n\r]*%xmm16\[^\n\r]*%xmm16" 1 } } */ @@ -160,3 +160,4 @@ f16 (V2 *x) /* { dg-final { scan-assembler-times "vpermilps\[^\n\r]*\\\$0\[^\n\r]*%ymm16\[^\n\r]*%ymm16" 1 } } */ /* { dg-final { scan-assembler-times "vpermilps\[^\n\r]*\\\$85\[^\n\r]*%ymm16\[^\n\r]*%ymm16" 2 } } */ /* { dg-final { scan-assembler-times "vshuff32x4\[^\n\r]*\\\$3\[^\n\r]*%ymm16\[^\n\r]*%ymm16\[^\n\r]*%ymm16" 2 } } */ +/* { dg-final { scan-assembler-times "vshuff32x4\[^\n\r]*\\\$0\[^\n\r]*%ymm16\[^\n\r]*%ymm16\[^\n\r]*%ymm16" 1 } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr87537-2.c b/gcc/testsuite/gcc.target/i386/pr87537-2.c new file mode 100644 index 00000000000..19ded7e64b2 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr87537-2.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512f -O2 -mtune=skylake" } */ +/* { dg-final { scan-assembler-times "vbroadcastss\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]" 1 } } */ +/* { dg-final { scan-assembler-not "vmovss" } } */ + +#include <immintrin.h> + +__m512 +foo (float *x) +{ + return _mm512_broadcastss_ps (_mm_load_ss(x)); +} diff --git a/gcc/testsuite/gcc.target/i386/pr87537-3.c b/gcc/testsuite/gcc.target/i386/pr87537-3.c new file mode 100644 index 00000000000..ee7781a69e4 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr87537-3.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512f -O2 -mtune=skylake" } */ +/* { dg-final { scan-assembler-times "vbroadcastss\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]" 1 } } */ +/* { dg-final { scan-assembler-not "vmovss" } } */ + +#include <immintrin.h> + +__m512 +foo (void) +{ + return _mm512_set1_ps (2.0f); +} diff --git a/gcc/testsuite/gcc.target/i386/pr87537-4.c b/gcc/testsuite/gcc.target/i386/pr87537-4.c new file mode 100644 index 00000000000..c5bfef1366e --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr87537-4.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512f -O2 -mtune=skylake" } */ +/* { dg-final { scan-assembler-times "vbroadcastsd\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]" 1 } } */ +/* { dg-final { scan-assembler-not "vmovsd" } } */ + +#include <immintrin.h> + +__m512d +foo (double *x) +{ + return _mm512_broadcastsd_pd (_mm_load_sd(x)); +} diff --git a/gcc/testsuite/gcc.target/i386/pr87537-5.c b/gcc/testsuite/gcc.target/i386/pr87537-5.c new file mode 100644 index 00000000000..4f806f4fbf3 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr87537-5.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512f -O2 -mtune=skylake" } */ +/* { dg-final { scan-assembler-times "vbroadcastsd\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]" 1 } } */ +/* { dg-final { scan-assembler-not "vmovsd" } } */ + +#include <immintrin.h> + +__m512d +foo (void) +{ + return _mm512_set1_pd (2.0f); +} diff --git a/gcc/testsuite/gcc.target/i386/pr87537-6.c b/gcc/testsuite/gcc.target/i386/pr87537-6.c new file mode 100644 index 00000000000..b53588b907b --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr87537-6.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512vl -O2 -mtune=skylake" } */ +/* { dg-final { scan-assembler-times "vbroadcastss\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]" 1 } } */ +/* { dg-final { scan-assembler-not "vmovss" } } */ + +#include <immintrin.h> + +__m256 +foo (float *x) +{ + return _mm256_broadcastss_ps (_mm_load_ss(x)); +} diff --git a/gcc/testsuite/gcc.target/i386/pr87537-7.c b/gcc/testsuite/gcc.target/i386/pr87537-7.c new file mode 100644 index 00000000000..d07a1e3de55 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr87537-7.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512vl -O2 -mtune=skylake" } */ +/* { dg-final { scan-assembler-times "vbroadcastss\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]" 1 } } */ +/* { dg-final { scan-assembler-not "vmovss" } } */ + +#include <immintrin.h> + +__m256 +foo (void) +{ + return _mm256_set1_ps (2.0f); +} diff --git a/gcc/testsuite/gcc.target/i386/pr87537-8.c b/gcc/testsuite/gcc.target/i386/pr87537-8.c new file mode 100644 index 00000000000..dbf4ee3551d --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr87537-8.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512vl -O2 -mtune=skylake" } */ +/* { dg-final { scan-assembler-times "vbroadcastss\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]" 1 } } */ +/* { dg-final { scan-assembler-not "vmovss" } } */ + +#include <immintrin.h> + +__m128 +foo (float *x) +{ + return _mm_broadcastss_ps (_mm_load_ss(x)); +} diff --git a/gcc/testsuite/gcc.target/i386/pr87537-9.c b/gcc/testsuite/gcc.target/i386/pr87537-9.c new file mode 100644 index 00000000000..8e09382d876 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr87537-9.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512vl -O2 -mtune=skylake" } */ +/* { dg-final { scan-assembler-times "vbroadcastss\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]" 1 } } */ +/* { dg-final { scan-assembler-not "vmovss" } } */ + +#include <immintrin.h> + +__m128 +foo (void) +{ + return _mm_set1_ps (2.0f); +} -- 2.17.2