On Fri, Oct 25, 2019 at 1:23 PM Hongtao Liu <crazy...@gmail.com> wrote:
>
> On Fri, Oct 25, 2019 at 2:39 AM Uros Bizjak <ubiz...@gmail.com> wrote:
> >
> > On Wed, Oct 23, 2019 at 7:48 AM Hongtao Liu <crazy...@gmail.com> wrote:
> > >
> > > Update patch:
> > > Add m constraint to define_insn (sse_1_round<ssescalaemodesuffix,
> > > *sse_1_round<ssescalaemodesuffix)  to fix some unrecognized insn error
> > > when under sse4 but not avx512f.
> >
> > It looks to me that the original insn is incompletely defined. It
> > should use nonimmediate_operand, "m" constraint and <iptr> pointer
> > size modifier. Something like:
> >
> > (define_insn "sse4_1_round<ssescalarmodesuffix>"
> >   [(set (match_operand:VF_128 0 "register_operand" "=Yr,*x,x,v")
> >     (vec_merge:VF_128
> >       (unspec:VF_128
> >         [(match_operand:VF_128 2 "nonimmediate_operand" "Yrm,*xm,xm,vm")
> >          (match_operand:SI 3 "const_0_to_15_operand" "n,n,n,n")]
> >         UNSPEC_ROUND)
> >       (match_operand:VF_128 1 "register_operand" "0,0,x,v")
> >       (const_int 1)))]
> >   "TARGET_SSE4_1"
> >   "@
> >    round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %<iptr>2, %3}
> >    round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %<iptr>2, %3}
> >    vround<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %<iptr>2, %3}
> >    vrndscale<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %<iptr>2, %3}"
> >
> > >
> > > Changelog:
> > > gcc/
> > >         * config/i386/sse.md:  (sse4_1_round<ssescalarmodesuffix>):
> > > Change constraint x to xm
> > >         since vround support memory operand.
> > >         * (*sse4_1_round<ssescalarmodesuffix>): Ditto.
> > >
> > > Bootstrap and regression test ok.
> > >
> > > On Wed, Oct 23, 2019 at 9:56 AM Hongtao Liu <crazy...@gmail.com> wrote:
> > > >
> > > > Hi uros:
> > > >   This patch fixes false dependence of scalar operations
> > > > vrcp/vsqrt/vrsqrt/vrndscale.
> > > >   Bootstrap ok, regression test on i386/x86 ok.
> > > >
> > > >   It does something like this:
> > > > -----
> > > > For scalar instructions with both xmm operands:
> > > >
> > > > op %xmmN,%xmmQ,%xmmQ ----> op %xmmN, %xmmN, %xmmQ
> > > >
> > > > for scalar instructions with one mem  or gpr operand:
> > > >
> > > > op mem/gpr, %xmmQ, %xmmQ
> > > >
> > > > --->  using pass rpad ---->
> > > >
> > > > xorps %xmmN, %xmmN, %xxN
> > > > op mem/gpr, %xmmN, %xmmQ
> > > >
> > > > Performance influence of SPEC2017 fprate which is tested on SKX
> > > >
> > > > 503.bwaves_r -0.03%
> > > > 507.cactuBSSN_r -0.22%
> > > > 508.namd_r -0.02%
> > > > 510.parest_r 0.37%
> > > > 511.povray_r 0.74%
> > > > 519.lbm_r 0.24%
> > > > 521.wrf_r 2.35%
> > > > 526.blender_r 0.71%
> > > > 527.cam4_r 0.65%
> > > > 538.imagick_r 0.95%
> > > > 544.nab_r -0.37
> > > > 549.fotonik3d_r 0.24%
> > > > 554.roms_r 0.90%
> > > > fprate geomean 0.50%
> > > > -----
> > > >
> > > > Changelog
> > > > gcc/
> > > >         * config/i386/i386.md (*rcpsf2_sse): Add
> > > >         avx_partial_xmm_update, prefer m constraint for TARGET_AVX.
> > > >         (*rsqrtsf2_sse): Ditto.
> > > >         (*sqrt<mode>2_sse): Ditto.
> > > >         (sse4_1_round<mode>2): separate constraint vm, add
> > > >         avx_partail_xmm_update, prefer m constraint for TARGET_AVX.
> > > >         * config/i386/sse.md (*sse_vmrcpv4sf2"): New define_insn used
> > > >         by pass rpad.
> > > >         (*<sse>_vmsqrt<mode>2<mask_scalar_name><round_scalar_name>*):
> > > >         Ditto.
> > > >         (*sse_vmrsqrtv4sf2): Ditto.
> > > >         (*avx512f_rndscale<mode><round_saeonly_name>): Ditto.
> > > >         (*sse4_1_round<ssescalarmodesuffix>): Ditto.
> > > >
> > > > gcc/testsuite
> > > >         * gcc.target/i386/pr87007-4.c: New test.
> > > >         * gcc.target/i386/pr87007-5.c: Ditto.
> > > >
> > > >
> > > > --
> > > > BR,
> > > > Hongtao
> >
> >     (set (attr "preferred_for_speed")
> >       (cond [(eq_attr "alternative" "1")
> >            (symbol_ref "TARGET_AVX || !TARGET_SSE_PARTIAL_REG_DEPENDENCY")
> >         (eq_attr "alternative" "2")
> > -          (symbol_ref "!TARGET_SSE_PARTIAL_REG_DEPENDENCY")
> > +          (symbol_ref "TARGET_AVX || !TARGET_SSE_PARTIAL_REG_DEPENDENCY")
> >         ]
> >         (symbol_ref "true")))])
> >
> > This can be written as:
> >
> >     (set (attr "preferred_for_speed")
> >       (cond [(match_test "TARGET_AVX")
> >            (symbol_ref "true")
> >         (eq_attr "alternative" "1,2")
> >           (symbol_ref "!TARGET_SSE_PARTIAL_REG_DEPENDENCY")
> >         ]
> >         (symbol_ref "true")))])
> >
> > Uros.
>
> Yes, after these fixed, i'll upstream to trunk, ok?
Update patch.
> --
> BR,
> Hongtao



-- 
BR,
Hongtao
From 1892f7b52ea0c5b59d3d0c9e50330f70712fc9cc Mon Sep 17 00:00:00 2001
From: liuhongt <hongtao....@intel.com>
Date: Wed, 9 Oct 2019 11:21:25 +0800
Subject: [PATCH] Fix false dependence of scalar operation
 vrcp/vsqrt/vrsqrt/vrndscale

For instructions with xmm operand:

op %xmmN,%xmmQ,%xmmQ ----> op %xmmN, %xmmN, %xmmQ

for instruction with mem operand or gpr operand:

op mem/gpr, %xmmQ, %xmmQ

--->  using pass rpad ---->

xorps %xmmN, %xmmN, %xxN
op mem/gpr, %xmmN, %xmmQ

Performance influence of SPEC2017 fprate which is tested on SKX
----
503.bwaves_r	-0.03%
507.cactuBSSN_r -0.22%
508.namd_r	-0.02%
510.parest_r	0.37%
511.povray_r	0.74%
519.lbm_r	0.24%
521.wrf_r	2.35%
526.blender_r	0.71%
527.cam4_r	0.65%
538.imagick_r	0.95%
544.nab_r	-0.37
549.fotonik3d_r 0.24%
554.roms_r	0.90%
fprate geomean	0.50%
-----

Changelog
gcc/
	* config/i386/i386.md (*rcpsf2_sse): Add
	avx_partial_xmm_update, prefer m constraint for TARGET_AVX.
	(*rsqrtsf2_sse): Ditto.
	(*sqrt<mode>2_sse): Ditto.
	(sse4_1_round<mode>2): separate constraint vm, add
	avx_partail_xmm_update, prefer m constraint for TARGET_AVX.
	* config/i386/sse.md (*sse_vmrcpv4sf2"): New define_insn used
	by pass rpad.
	(*<sse>_vmsqrt<mode>2<mask_scalar_name><round_scalar_name>*):
	Ditto.
	(*sse_vmrsqrtv4sf2): Ditto.
	(*avx512f_rndscale<mode><round_saeonly_name>): Ditto.
	(*sse4_1_round<ssescalarmodesuffix>): Ditto.
	(sse4_1_round<ssescalarmodesuffix>): Add m constraint and
	<iptr> pointer size modifier since vround support memory operand.

gcc/testsuite
	* gcc.target/i386/pr87007-4.c: New test.
	* gcc.target/i386/pr87007-5.c: Ditto.
---
 gcc/config/i386/i386.md                   |  67 +++++++-------
 gcc/config/i386/sse.md                    | 105 ++++++++++++++++++++--
 gcc/testsuite/gcc.target/i386/pr87007-4.c |  18 ++++
 gcc/testsuite/gcc.target/i386/pr87007-5.c |  18 ++++
 4 files changed, 172 insertions(+), 36 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr87007-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr87007-5.c

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 5e0795953d8..fb2235a5e34 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -14843,13 +14843,14 @@
    (set_attr "btver2_sse_attr" "rcp")
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "SF")
+   (set_attr "avx_partial_xmm_update" "false,false,true")
    (set (attr "preferred_for_speed")
-     (cond [(eq_attr "alternative" "1")
-	      (symbol_ref "TARGET_AVX || !TARGET_SSE_PARTIAL_REG_DEPENDENCY")
-	   (eq_attr "alternative" "2")
-	      (symbol_ref "!TARGET_SSE_PARTIAL_REG_DEPENDENCY")
-	   ]
-	   (symbol_ref "true")))])
+      (cond [(match_test "TARGET_AVX")
+	       (symbol_ref "true")
+	     (eq_attr "alternative" "1,2")
+	       (symbol_ref "!TARGET_SSE_PARTIAL_REG_DEPENDENCY")
+	    ]
+	    (symbol_ref "true")))])
 
 (define_insn "*fop_xf_1_i387"
   [(set (match_operand:XF 0 "register_operand" "=f,f")
@@ -15089,13 +15090,14 @@
    (set_attr "btver2_sse_attr" "rcp")
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "SF")
+   (set_attr "avx_partial_xmm_update" "false,false,true")
    (set (attr "preferred_for_speed")
-     (cond [(eq_attr "alternative" "1")
-	      (symbol_ref "TARGET_AVX || !TARGET_SSE_PARTIAL_REG_DEPENDENCY")
-	   (eq_attr "alternative" "2")
-	      (symbol_ref "!TARGET_SSE_PARTIAL_REG_DEPENDENCY")
-	   ]
-	   (symbol_ref "true")))])
+      (cond [(match_test "TARGET_AVX")
+	       (symbol_ref "true")
+	     (eq_attr "alternative" "1,2")
+	       (symbol_ref "!TARGET_SSE_PARTIAL_REG_DEPENDENCY")
+	    ]
+	    (symbol_ref "true")))])
 
 (define_expand "rsqrtsf2"
   [(set (match_operand:SF 0 "register_operand")
@@ -15120,14 +15122,15 @@
    (set_attr "atom_sse_attr" "sqrt")
    (set_attr "btver2_sse_attr" "sqrt")
    (set_attr "prefix" "maybe_vex")
+   (set_attr "avx_partial_xmm_update" "false,false,true")
    (set_attr "mode" "<MODE>")
    (set (attr "preferred_for_speed")
-     (cond [(eq_attr "alternative" "1")
-	      (symbol_ref "TARGET_AVX || !TARGET_SSE_PARTIAL_REG_DEPENDENCY")
-	   (eq_attr "alternative" "2")
-	      (symbol_ref "!TARGET_SSE_PARTIAL_REG_DEPENDENCY")
-	   ]
-	   (symbol_ref "true")))])
+      (cond [(match_test "TARGET_AVX")
+	       (symbol_ref "true")
+	     (eq_attr "alternative" "1,2")
+	       (symbol_ref "!TARGET_SSE_PARTIAL_REG_DEPENDENCY")
+	    ]
+	    (symbol_ref "true")))])
 
 (define_expand "sqrt<mode>2"
   [(set (match_operand:MODEF 0 "register_operand")
@@ -16261,30 +16264,32 @@
 
 
 (define_insn "sse4_1_round<mode>2"
-  [(set (match_operand:MODEF 0 "register_operand" "=x,x,x,v")
+  [(set (match_operand:MODEF 0 "register_operand" "=x,x,x,v,v")
 	(unspec:MODEF
-	  [(match_operand:MODEF 1 "nonimmediate_operand" "0,x,m,vm")
-	   (match_operand:SI 2 "const_0_to_15_operand" "n,n,n,n")]
+	  [(match_operand:MODEF 1 "nonimmediate_operand" "0,x,m,v,m")
+	   (match_operand:SI 2 "const_0_to_15_operand" "n,n,n,n,n")]
 	  UNSPEC_ROUND))]
   "TARGET_SSE4_1"
   "@
    %vround<ssemodesuffix>\t{%2, %d1, %0|%0, %d1, %2}
    %vround<ssemodesuffix>\t{%2, %d1, %0|%0, %d1, %2}
    %vround<ssemodesuffix>\t{%2, %1, %d0|%d0, %1, %2}
+   vrndscale<ssemodesuffix>\t{%2, %d1, %0|%0, %d1, %2}
    vrndscale<ssemodesuffix>\t{%2, %1, %d0|%d0, %1, %2}"
   [(set_attr "type" "ssecvt")
-   (set_attr "prefix_extra" "1,1,1,*")
-   (set_attr "length_immediate" "*,*,*,1")
-   (set_attr "prefix" "maybe_vex,maybe_vex,maybe_vex,evex")
-   (set_attr "isa" "noavx512f,noavx512f,noavx512f,avx512f")
+   (set_attr "prefix_extra" "1,1,1,*,*")
+   (set_attr "length_immediate" "*,*,*,1,1")
+   (set_attr "prefix" "maybe_vex,maybe_vex,maybe_vex,evex,evex")
+   (set_attr "isa" "noavx512f,noavx512f,noavx512f,avx512f,avx512f")
+   (set_attr "avx_partial_xmm_update" "false,false,true,false,true")
    (set_attr "mode" "<MODE>")
    (set (attr "preferred_for_speed")
-     (cond [(eq_attr "alternative" "1")
-	      (symbol_ref "TARGET_AVX || !TARGET_SSE_PARTIAL_REG_DEPENDENCY")
-	   (eq_attr "alternative" "2")
-	      (symbol_ref "!TARGET_SSE_PARTIAL_REG_DEPENDENCY")
-	   ]
-	   (symbol_ref "true")))])
+      (cond [(match_test "TARGET_AVX")
+	       (symbol_ref "true")
+	     (eq_attr "alternative" "1,2")
+	       (symbol_ref "!TARGET_SSE_PARTIAL_REG_DEPENDENCY")
+	    ]
+	    (symbol_ref "true")))])
 
 (define_insn "rintxf2"
   [(set (match_operand:XF 0 "register_operand" "=f")
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 403e91d4b17..a0d611c5128 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -2035,6 +2035,25 @@
    (set_attr "prefix" "orig,vex")
    (set_attr "mode" "SF")])
 
+(define_insn "*sse_vmrcpv4sf2"
+  [(set (match_operand:V4SF 0 "register_operand" "=x,x")
+	(vec_merge:V4SF
+	  (vec_duplicate:V4SF
+	    (unspec:SF [(match_operand:SF 1 "nonimmediate_operand" "xm,xm")]
+		         UNSPEC_RCP))
+	  (match_operand:V4SF 2 "register_operand" "0,x")
+	  (const_int 1)))]
+  "TARGET_SSE"
+  "@
+   rcpss\t{%1, %0|%0, %k1}
+   vrcpss\t{%1, %2, %0|%0, %2, %k1}"
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "type" "sse")
+   (set_attr "atom_sse_attr" "rcp")
+   (set_attr "btver2_sse_attr" "rcp")
+   (set_attr "prefix" "orig,vex")
+   (set_attr "mode" "SF")])
+
 (define_insn "<mask_codefor>rcp14<mode><mask_name>"
   [(set (match_operand:VF_AVX512VL 0 "register_operand" "=v")
 	(unspec:VF_AVX512VL
@@ -2130,6 +2149,25 @@
    (set_attr "btver2_sse_attr" "sqrt")
    (set_attr "mode" "<ssescalarmode>")])
 
+(define_insn "*<sse>_vmsqrt<mode>2<mask_scalar_name><round_scalar_name>"
+  [(set (match_operand:VF_128 0 "register_operand" "=x,v")
+	(vec_merge:VF_128
+	  (vec_duplicate:VF_128
+	    (sqrt:<ssescalarmode>
+	      (match_operand:<ssescalarmode> 1 "vector_operand" "xBm,<round_scalar_constraint>")))
+	  (match_operand:VF_128 2 "register_operand" "0,v")
+	  (const_int 1)))]
+  "TARGET_SSE"
+  "@
+   sqrt<ssescalarmodesuffix>\t{%1, %0|%0, %<iptr>1}
+   vsqrt<ssescalarmodesuffix>\t{<round_scalar_mask_op3>%1, %2, %0<mask_scalar_operand3>|%0<mask_scalar_operand3>, %2, %<iptr>1<round_scalar_mask_op3>}"
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "type" "sse")
+   (set_attr "atom_sse_attr" "sqrt")
+   (set_attr "prefix" "<round_scalar_prefix>")
+   (set_attr "btver2_sse_attr" "sqrt")
+   (set_attr "mode" "<ssescalarmode>")])
+
 (define_expand "rsqrt<mode>2"
   [(set (match_operand:VF1_128_256 0 "register_operand")
 	(unspec:VF1_128_256
@@ -2219,6 +2257,23 @@
    (set_attr "prefix" "orig,vex")
    (set_attr "mode" "SF")])
 
+(define_insn "*sse_vmrsqrtv4sf2"
+  [(set (match_operand:V4SF 0 "register_operand" "=x,x")
+	(vec_merge:V4SF
+	  (vec_duplicate:V4SF
+	    (unspec:SF [(match_operand:SF 1 "nonimmediate_operand" "xm,xm")]
+		         UNSPEC_RSQRT))
+	  (match_operand:V4SF 2 "register_operand" "0,x")
+	  (const_int 1)))]
+  "TARGET_SSE"
+  "@
+   rsqrtss\t{%1, %0|%0, %k1}
+   vrsqrtss\t{%1, %2, %0|%0, %2, %k1}"
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "type" "sse")
+   (set_attr "prefix" "orig,vex")
+   (set_attr "mode" "SF")])
+
 (define_expand "<code><mode>3<mask_name><round_saeonly_name>"
   [(set (match_operand:VF 0 "register_operand")
 	(smaxmin:VF
@@ -9709,6 +9764,22 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "<MODE>")])
 
+(define_insn "*avx512f_rndscale<mode><round_saeonly_name>"
+  [(set (match_operand:VF_128 0 "register_operand" "=v")
+	(vec_merge:VF_128
+	  (vec_duplicate:VF_128
+	    (unspec:<ssescalarmode>
+	      [(match_operand:<ssescalarmode> 2 "<round_saeonly_nimm_predicate>" "<round_saeonly_constraint>")
+	       (match_operand:SI 3 "const_0_to_255_operand")]
+	      UNSPEC_ROUND))
+          (match_operand:VF_128 1 "register_operand" "v")
+	  (const_int 1)))]
+  "TARGET_AVX512F"
+  "vrndscale<ssescalarmodesuffix>\t{%3, <round_saeonly_op4>%2, %1, %0|%0, %1, %<iptr>2<round_saeonly_op4>, %3}"
+  [(set_attr "length_immediate" "1")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "<MODE>")])
+
 ;; One bit in mask selects 2 elements.
 (define_insn "avx512f_shufps512_1<mask_name>"
   [(set (match_operand:V16SF 0 "register_operand" "=v")
@@ -17954,17 +18025,41 @@
   [(set (match_operand:VF_128 0 "register_operand" "=Yr,*x,x,v")
 	(vec_merge:VF_128
 	  (unspec:VF_128
-	    [(match_operand:VF_128 2 "register_operand" "Yr,*x,x,v")
+	    [(match_operand:VF_128 2 "nonimmediate_operand" "Yrm,*xm,xm,vm")
 	     (match_operand:SI 3 "const_0_to_15_operand" "n,n,n,n")]
 	    UNSPEC_ROUND)
 	  (match_operand:VF_128 1 "register_operand" "0,0,x,v")
 	  (const_int 1)))]
   "TARGET_SSE4_1"
   "@
-   round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %2, %3}
-   round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %2, %3}
-   vround<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}
-   vrndscale<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+   round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %<iptr>2, %3}
+   round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %<iptr>2, %3}
+   vround<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %<iptr>2, %3}
+   vrndscale<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %<iptr>2, %3}"
+  [(set_attr "isa" "noavx,noavx,avx,avx512f")
+   (set_attr "type" "ssecvt")
+   (set_attr "length_immediate" "1")
+   (set_attr "prefix_data16" "1,1,*,*")
+   (set_attr "prefix_extra" "1")
+   (set_attr "prefix" "orig,orig,vex,evex")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "*sse4_1_round<ssescalarmodesuffix>"
+  [(set (match_operand:VF_128 0 "register_operand" "=Yr,*x,x,v")
+	(vec_merge:VF_128
+	  (vec_duplicate:VF_128
+	    (unspec:<ssescalarmode>
+	      [(match_operand:<ssescalarmode> 2 "nonimmediate_operand" "Yrm,*xm,xm,vm")
+	       (match_operand:SI 3 "const_0_to_15_operand" "n,n,n,n")]
+	      UNSPEC_ROUND))
+	  (match_operand:VF_128 1 "register_operand" "0,0,x,v")
+	  (const_int 1)))]
+  "TARGET_SSE4_1"
+  "@
+   round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %<iptr>2, %3}
+   round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %<iptr>2, %3}
+   vround<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %<iptr>2, %3}
+   vrndscale<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %<iptr>2, %3}"
   [(set_attr "isa" "noavx,noavx,avx,avx512f")
    (set_attr "type" "ssecvt")
    (set_attr "length_immediate" "1")
diff --git a/gcc/testsuite/gcc.target/i386/pr87007-4.c b/gcc/testsuite/gcc.target/i386/pr87007-4.c
new file mode 100644
index 00000000000..e91bdcbac44
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr87007-4.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast -march=skylake-avx512 -mfpmath=sse" } */
+
+
+#include<math.h>
+
+extern double d1, d2, d3;
+void
+foo (int n, int k)
+{
+  for (int i = 0; i != n; i++)
+    if(i < k)
+      d1 = floor (d2);
+    else
+      d1 = ceil (d3);
+}
+
+/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr87007-5.c b/gcc/testsuite/gcc.target/i386/pr87007-5.c
new file mode 100644
index 00000000000..20d13cf650b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr87007-5.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast -march=skylake-avx512 -mfpmath=sse" } */
+
+
+#include<math.h>
+
+extern double d1, d2, d3;
+void
+foo (int n, int k)
+{
+  for (int i = 0; i != n; i++)
+    if(i < k)
+      d1 = sqrt (d2);
+    else
+      d1 = sqrt (d3);
+}
+
+/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 1 } } */
-- 
2.19.1

Reply via email to