Re: [PATCH, i386]: Fix PR85950, Unsafe-math-optimizations regresses optimization using SSE4.1 roundss

2018-06-02 Thread H.J. Lu
On Tue, May 29, 2018 at 11:38 AM, Uros Bizjak  wrote:
> Hello!
>
> The attached patch enables l<rounding_insn><MODEF:mode><SWI48:mode>2 for
> TARGET_SSE4_1 and, while there, also corrects the operand 1 predicate of
> the rounds{s,d} instruction.
>
> 2018-05-29  Uros Bizjak  
>
> PR target/85950
> * config/i386/i386.md (l<rounding_insn><MODEF:mode><SWI48:mode>2):
> Enable for TARGET_SSE4_1 and generate rounds{s,d} and cvtts{s,d}2si{,q}
> sequence.
> (sse4_1_round<mode>2): Use nonimmediate_operand
> for operand 1 predicate.
>
> testsuite/ChangeLog:
>
> 2018-05-29  Uros Bizjak  
>
> PR target/85950
> * gcc.target/i386/pr85950.c: New test.
>
> Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
>
> Committed to mainline SVN.
>

The testcase needs -mno-avx512f.  Otherwise, -march=native on an AVX512
machine will generate

vrndscalesd $9, %xmm0, %xmm0, %xmm0
vcvttsd2siq %xmm0, %rax

instead of

vroundsd $9, %xmm0, %xmm0, %xmm0
vcvttsd2siq %xmm0, %rax
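
One way to apply this suggestion is to add the flag to the test's
dg-options line. The scan for "roundsd" cannot match "vrndscalesd", so
without the flag the scan-assembler-times counts come up short on such a
host. This is only a sketch; the exact form of any follow-up change to
pr85950.c is an assumption.

```c
/* { dg-do compile } */
/* { dg-options "-O2 -msse4.1 -mfpmath=sse -mno-avx512f" } */
```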


-- 
H.J.


[PATCH, i386]: Fix PR85950, Unsafe-math-optimizations regresses optimization using SSE4.1 roundss

2018-05-29 Thread Uros Bizjak
Hello!

The attached patch enables l<rounding_insn><MODEF:mode><SWI48:mode>2 for
TARGET_SSE4_1 and, while there, also corrects the operand 1 predicate of
the rounds{s,d} instruction.

2018-05-29  Uros Bizjak  

PR target/85950
* config/i386/i386.md (l<rounding_insn><MODEF:mode><SWI48:mode>2):
Enable for TARGET_SSE4_1 and generate rounds{s,d} and cvtts{s,d}2si{,q}
sequence.
(sse4_1_round<mode>2): Use nonimmediate_operand
for operand 1 predicate.

testsuite/ChangeLog:

2018-05-29  Uros Bizjak  

PR target/85950
* gcc.target/i386/pr85950.c: New test.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Committed to mainline SVN.

Uros.
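
For readers who want to see the emitted sequence outside the compiler, the
rounds{s,d} plus cvtts{s,d}2si pairing can be approximated with SSE4.1
intrinsics. This is a hedged sketch, not code from the patch: the helper
names ifloor_sse41 and iceil_sse41 are hypothetical, and it assumes an x86
host with SSE4.1 and a GCC-compatible compiler that accepts the target
attribute.

```c
#include <emmintrin.h>  /* SSE2: __m128d, _mm_set_sd, _mm_cvttsd_si32 */
#include <smmintrin.h>  /* SSE4.1: _mm_round_sd, _MM_FROUND_* */

/* Hypothetical helper: round toward -inf, then truncating convert.
   The rounding immediate is 1 | 8 = 9, matching "$9" in the assembly
   quoted in the reply.  */
__attribute__ ((target ("sse4.1")))
int
ifloor_sse41 (double x)
{
  __m128d v = _mm_set_sd (x);
  v = _mm_round_sd (v, v, _MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC);
  return _mm_cvttsd_si32 (v);
}

/* Hypothetical helper: round toward +inf, then truncating convert.  */
__attribute__ ((target ("sse4.1")))
int
iceil_sse41 (double x)
{
  __m128d v = _mm_set_sd (x);
  v = _mm_round_sd (v, v, _MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC);
  return _mm_cvttsd_si32 (v);
}
```

Because the value is already integral after the rounding step, the
truncating conversion does not change it further.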
Index: config/i386/i386.md
===
--- config/i386/i386.md (revision 260850)
+++ config/i386/i386.md (working copy)
@@ -16655,7 +16655,7 @@
 
 (define_insn "sse4_1_round<mode>2"
   [(set (match_operand:MODEF 0 "register_operand" "=x,v")
-        (unspec:MODEF [(match_operand:MODEF 1 "register_operand" "x,v")
+        (unspec:MODEF [(match_operand:MODEF 1 "nonimmediate_operand" "xm,vm")
                        (match_operand:SI 2 "const_0_to_15_operand" "n,n")]
                       UNSPEC_ROUND))]
   "TARGET_SSE4_1"
@@ -17251,12 +17251,19 @@
                     FIST_ROUNDING))
               (clobber (reg:CC FLAGS_REG))])]
   "SSE_FLOAT_MODE_P (<MODEF:MODE>mode) && TARGET_SSE_MATH
-   && !flag_trapping_math"
+   && (TARGET_SSE4_1 || !flag_trapping_math)"
 {
-  if (TARGET_64BIT && optimize_insn_for_size_p ())
-    FAIL;
+  if (TARGET_SSE4_1)
+    {
+      rtx tmp = gen_reg_rtx (<MODEF:MODE>mode);
 
-  if (ROUND_<ROUNDING> == ROUND_FLOOR)
+      emit_insn (gen_sse4_1_round<MODEF:mode>2
+                 (tmp, operands[1], GEN_INT (ROUND_<ROUNDING>
+                                             | ROUND_NO_EXC)));
+      emit_insn (gen_fix_trunc<MODEF:mode><SWI48:mode>2
+                 (operands[0], tmp));
+    }
+  else if (ROUND_<ROUNDING> == ROUND_FLOOR)
     ix86_expand_lfloorceil (operands[0], operands[1], true);
   else if (ROUND_<ROUNDING> == ROUND_CEIL)
     ix86_expand_lfloorceil (operands[0], operands[1], false);
Index: testsuite/gcc.target/i386/pr85950.c
===
--- testsuite/gcc.target/i386/pr85950.c (nonexistent)
+++ testsuite/gcc.target/i386/pr85950.c (working copy)
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse4.1 -mfpmath=sse" } */
+
+double floor (double);
+double ceil (double);
+
+int ifloor (double x) { return floor (x); }
+int iceil (double x) { return ceil (x); }
+
+#ifdef __x86_64__
+long long llfloor (double x) { return floor (x); }
+long long llceil (double x) { return ceil (x); }
+#endif
+  
+/* { dg-final { scan-assembler-times "roundsd" 2 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "roundsd" 4 { target { ! ia32 } } } } */
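
For targets without SSE4.1 the expander still falls back to
ix86_expand_lfloorceil. As far as can be told from that function, the idea
is to truncate first and then compensate by one when truncation moved the
value in the wrong direction. The portable C below is a sketch of that idea
only; the function names are hypothetical and it is not a transcription of
GCC's RTL expansion.

```c
/* Hypothetical helper: floor to integer without a rounding instruction.  */
long long
lfloor_fallback (double x)
{
  long long xi = (long long) x;   /* truncate toward zero, as cvttsd2si does */
  if ((double) xi > x)            /* negative non-integers were rounded up */
    xi -= 1;
  return xi;
}

/* Hypothetical helper: ceil to integer without a rounding instruction.  */
long long
lceil_fallback (double x)
{
  long long xi = (long long) x;   /* truncate toward zero */
  if ((double) xi < x)            /* positive non-integers were rounded down */
    xi += 1;
  return xi;
}
```

With SSE4.1 available, the compare-and-adjust step is unnecessary because
the rounding instruction already yields an integral value before the
conversion.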