[PATCH, i386]: Fix PR79593: Poor/Worse code generation for FPU on versions after 6

2017-02-21 Thread Uros Bizjak
Hello!

The attached patch fixes an oversight in the standard_x87sse_constant_load
splitter and its float-extend counterpart: an FP reg-reg move insn can be
tagged with a REG_EQUIV or REG_EQUAL note that holds a const_double RTX,
but the splitters only accepted a memory operand as the source.

find_constant_src and the ix86_standard_x87sse_constant_load_p predicate
already handle this situation, so the patched splitters emit a direct
constant load instead of a reg-reg move. This also lowers regstack
register pressure, as is evident from the testcase:

--- pr79593.s_  2017-02-21 19:41:36.615740647 +0100
+++ pr79593.s   2017-02-21 19:41:47.251622966 +0100
@@ -15,21 +15,16 @@
    fldz
 .L2:
    fld1
-   fld     %st(0)
-   fcomp   %st(2)
+   fcomp   %st(1)
    fnstsw  %ax
    sahf
-   jnb     .L5
-   fstp    %st(1)
-   jmp     .L3
-   .p2align 4,,10
-   .p2align 3
-.L5:
+   jnb     .L3
    fstp    %st(0)
+   fld1
 .L3:
    rep ret
    .cfi_endproc
 .LFE2:
    .size   bar, .-bar
-   .ident  "GCC: (GNU) 7.0.0 20170117 (experimental) [trunk revision 244540]"
+   .ident  "GCC: (GNU) 7.0.1 20170221 (experimental) [trunk revision 245630]"
    .section        .note.GNU-stack,"",@progbits

The patched compiler also removes a jump to a BB in which only a
compensating regstack pop was emitted.
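
As a rough illustration of why the note matters, here is a toy model in C.
It is not GCC code: struct toy_move and toy_select_load are hypothetical
names, and only the two constants seen in the listing above (0.0 via fldz,
1.0 via fld1) are modelled. A move whose source is known, through a
REG_EQUAL/REG_EQUIV-like note, to equal one of these constants gets a
direct constant load; anything else remains a register copy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model, not GCC internals: a register-to-register FP move that may
   carry a note recording the constant value of its source.  */
struct toy_move {
  int src_reg;            /* x87 stack slot being copied */
  int has_equal_note;     /* nonzero if a REG_EQUAL/REG_EQUIV-like note exists */
  long double note_value; /* constant recorded by the note */
};

/* Return the mnemonic a splitter could emit for MOVE.  Only 0.0 and 1.0
   are modelled, and signed zero is ignored; everything else stays a
   plain register copy.  */
static const char *
toy_select_load (const struct toy_move *move)
{
  assert (move != NULL);
  if (move->has_equal_note)
    {
      if (move->note_value == 0.0L)
        return "fldz";
      if (move->note_value == 1.0L)
        return "fld1";
    }
  return "fld %st(N)";
}
```

In this model the old behaviour corresponds to ignoring the note on
reg-reg moves and always returning the register copy.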

2017-02-21  Uros Bizjak  

PR target/79593
* config/i386/i386.md (standard_x87sse_constant_load splitter):
Use nonimmediate_operand instead of memory_operand for operand 1.
(float-extend standard_x87sse_constant_load splitter): Ditto.

testsuite/ChangeLog:

2017-02-21  Uros Bizjak  

PR target/79593
* gcc.target/i386/pr79593.c: New test.

The patch was bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Committed to mainline SVN.

Uros.


Re: [PATCH, i386]: Fix PR79593: Poor/Worse code generation for FPU on versions after 6

2017-02-23 Thread Uros Bizjak

Now with a patch.

Uros.
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index cfbe0b0..23f2ea0 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -3660,7 +3660,7 @@
 
 (define_split
   [(set (match_operand 0 "any_fp_register_operand")
-   (match_operand 1 "memory_operand"))]
+   (match_operand 1 "nonimmediate_operand"))]
   "reload_completed
    && (GET_MODE (operands[0]) == TFmode
        || GET_MODE (operands[0]) == XFmode
@@ -3672,7 +3672,7 @@
 
 (define_split
   [(set (match_operand 0 "any_fp_register_operand")
-   (float_extend (match_operand 1 "memory_operand")))]
+   (float_extend (match_operand 1 "nonimmediate_operand")))]
   "reload_completed
    && (GET_MODE (operands[0]) == TFmode
        || GET_MODE (operands[0]) == XFmode

New test, gcc.target/i386/pr79593.c:

/* PR target/79593 */
/* { dg-do compile } */
/* { dg-options "-Ofast -mfpmath=387" } */

extern float global_data[1024];

static long double MIN (long double a, long double b) { return a < b ? a : b; }
static long double MAX (long double a, long double b) { return a > b ? a : b; }

float bar (void)
{
  long double delta = (global_data[0]);

  return (MIN (MAX (delta, 0.0l), 1.0l));
}

/* { dg-final { scan-assembler-not "fld\[ \t\]+%st" } } */
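
For experimenting outside the testsuite, the clamp that bar performs can be
exercised on its own. The following is only a sketch: clamp01, min_ld and
max_ld are hypothetical names, and the dg- directives above remain what the
testsuite actually checks.

```c
#include <assert.h>

/* Same shape as the MIN/MAX helpers in the test, renamed to avoid
   clashing with macros that some headers define.  */
static long double min_ld (long double a, long double b) { return a < b ? a : b; }
static long double max_ld (long double a, long double b) { return a > b ? a : b; }

/* Hypothetical helper: clamp DELTA to [0, 1], as bar does for
   global_data[0].  */
static float
clamp01 (float delta)
{
  long double d = delta;
  long double r = min_ld (max_ld (d, 0.0L), 1.0L);
  /* The result always lies in the closed interval [0, 1].  */
  assert (r >= 0.0L && r <= 1.0L);
  return (float) r;
}
```

Compiling a file that contains this helper with the options from the test
(-Ofast -mfpmath=387) and -S makes it possible to inspect the x87 code
generated by a given compiler version.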