Re: inhibit the sincos optimization when the target has sin and cos instructions

2016-05-19 Thread Nathan Sidwell

On 05/19/16 14:42, Cesar Philippidis wrote:


+  "operands[2] = make_safe_from (operands[2], operands[0]);"



Please use { ... } rather than "" for readability. Ok  with that change.

nathan


Re: inhibit the sincos optimization when the target has sin and cos instructions

2016-05-19 Thread Nathan Sidwell

On 05/18/16 23:42, Cesar Philippidis wrote:


+(define_expand "sincossf3"
+  [(set (match_operand:SF 0 "nvptx_register_operand" "=R")
+(unspec:SF [(match_operand:SF 2 "nvptx_register_operand" "R")]
+   UNSPEC_COS))
+   (set (match_operand:SF 1 "nvptx_register_operand" "=R")
+(unspec:SF [(match_dup 2)] UNSPEC_SIN))]
+  "flag_unsafe_math_optimizations"
+{
+  emit_insn (gen_sinsf2 (operands[1], operands[2]));
+  emit_insn (gen_cossf2 (operands[0], operands[2]));
+
+  DONE;
+})


Why the emit_insn code?  that seems to be replicating the RTL
representation -- you're saying the same thing twice.


You've not answered this.  I always find it confusing when insn expansions say 
the same thing in the RTL represntation as the C fragment is generating.



My intent was to verify that I got the sin and cos arguments right,
i.e., make sure that this sincos expansion didn't mix up sin(x) with
cos(x). I guess I can create a test that uses vprintf and scans


That's a good idea, but I think you need  a much clearer testcase. It'd be 
better to check both the sin and cos outputs (and, that's going to be 
'interesting' making sure only one path gets to be seen by the sincos optimizer 
-- but not difficult.


nathan


Re: inhibit the sincos optimization when the target has sin and cos instructions

2016-05-19 Thread Cesar Philippidis
On 05/19/2016 04:29 AM, Alexander Monakov wrote:
> On Wed, 18 May 2016, Cesar Philippidis wrote:

> Note that the documentation suggests using 'make_safe_from' to concisely
> express conflict resolution:
> 
>> diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
>> index 33a4862..69bbb22 100644
>> --- a/gcc/config/nvptx/nvptx.md
>> +++ b/gcc/config/nvptx/nvptx.md
>> @@ -794,6 +794,24 @@
>>""
>>"%.\\tsqrt%#%t0\\t%0, %1;")
>>  
>> +(define_expand "sincossf3"
>> +  [(set (match_operand:SF 0 "nvptx_register_operand" "=R")
>> +(unspec:SF [(match_operand:SF 2 "nvptx_register_operand" "R")]
>> +   UNSPEC_COS))
>> +   (set (match_operand:SF 1 "nvptx_register_operand" "=R")
>> +(unspec:SF [(match_dup 2)] UNSPEC_SIN))]
>> +  "flag_unsafe_math_optimizations"
> 
> ... here instead of special-casing the conflict case in curly braces you can
> just write:
> 
> "operands[2] = make_safe_from (operands[2], operands[0]);"
> 
>> +{
>> +  if (REGNO (operands[0]) == REGNO (operands[2]))
>> +{
>> +  rtx tmp = gen_reg_rtx (GET_MODE (operands[2]));
>> +  emit_insn (gen_rtx_SET (tmp, operands[2]));
>> +  emit_insn (gen_sinsf2 (operands[1], tmp));
>> +  emit_insn (gen_cossf2 (operands[0], tmp));
>> +  DONE;
>> +}
>> +})

Done. Is this ok for trunk?

Cesar

2016-05-19  Cesar Philippidis  

	gcc/
	* config/nvptx/nvptx.md (sincossf3): New pattern.

	gcc/testsuite/
	* gcc.target/nvptx/sincos.c: New test.


diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index 33a4862..1dd256d 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -794,6 +794,16 @@
   ""
   "%.\\tsqrt%#%t0\\t%0, %1;")
 
+(define_expand "sincossf3"
+  [(set (match_operand:SF 0 "nvptx_register_operand" "=R")
+	(unspec:SF [(match_operand:SF 2 "nvptx_register_operand" "R")]
+	   UNSPEC_COS))
+   (set (match_operand:SF 1 "nvptx_register_operand" "=R")
+	(unspec:SF [(match_dup 2)] UNSPEC_SIN))]
+  "flag_unsafe_math_optimizations"
+  "operands[2] = make_safe_from (operands[2], operands[0]);"
+)
+
 (define_insn "sinsf2"
   [(set (match_operand:SF 0 "nvptx_register_operand" "=R")
 	(unspec:SF [(match_operand:SF 1 "nvptx_register_operand" "R")]
diff --git a/gcc/testsuite/gcc.target/nvptx/sincos.c b/gcc/testsuite/gcc.target/nvptx/sincos.c
new file mode 100644
index 000..921ec41
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/sincos.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ffast-math" } */
+
+extern float sinf (float);
+extern float cosf (float);
+
+float
+sincos_add (float x)
+{
+  float s = sinf (x);
+  float c = cosf (x);
+
+  return s + c;
+}
+
+/* { dg-final { scan-assembler-times "sin.approx.f32" 1 } } */
+/* { dg-final { scan-assembler-times "cos.approx.f32" 1 } } */


Re: inhibit the sincos optimization when the target has sin and cos instructions

2016-05-19 Thread Alexander Monakov
On Wed, 18 May 2016, Cesar Philippidis wrote:
> >> +(define_expand "sincossf3"
> >> +  [(set (match_operand:SF 0 "nvptx_register_operand" "=R")
> >> +(unspec:SF [(match_operand:SF 2 "nvptx_register_operand" "R")]
> >> +   UNSPEC_COS))
> >> +   (set (match_operand:SF 1 "nvptx_register_operand" "=R")
> >> +(unspec:SF [(match_dup 2)] UNSPEC_SIN))]
> >> +  "flag_unsafe_math_optimizations"
> >> +{
> >> +  emit_insn (gen_sinsf2 (operands[1], operands[2]));
> >> +  emit_insn (gen_cossf2 (operands[0], operands[2]));
> >> +
> >> +  DONE;
> >> +})
> > 
> > Why the emit_insn code?  that seems to be replicating the RTL
> > representation -- you're saying the same thing twice.
> > 
> > Doesn't operands[2] need (conditionally) copying to a new register --
> > what if it aliases operands[1]?
> 
> This patch does that now.

Note that the documentation suggests using 'make_safe_from' to concisely
express conflict resolution:

> diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
> index 33a4862..69bbb22 100644
> --- a/gcc/config/nvptx/nvptx.md
> +++ b/gcc/config/nvptx/nvptx.md
> @@ -794,6 +794,24 @@
>""
>"%.\\tsqrt%#%t0\\t%0, %1;")
>  
> +(define_expand "sincossf3"
> +  [(set (match_operand:SF 0 "nvptx_register_operand" "=R")
> + (unspec:SF [(match_operand:SF 2 "nvptx_register_operand" "R")]
> +UNSPEC_COS))
> +   (set (match_operand:SF 1 "nvptx_register_operand" "=R")
> + (unspec:SF [(match_dup 2)] UNSPEC_SIN))]
> +  "flag_unsafe_math_optimizations"

... here instead of special-casing the conflict case in curly braces you can
just write:

"operands[2] = make_safe_from (operands[2], operands[0]);"

> +{
> +  if (REGNO (operands[0]) == REGNO (operands[2]))
> +{
> +  rtx tmp = gen_reg_rtx (GET_MODE (operands[2]));
> +  emit_insn (gen_rtx_SET (tmp, operands[2]));
> +  emit_insn (gen_sinsf2 (operands[1], tmp));
> +  emit_insn (gen_cossf2 (operands[0], tmp));
> +  DONE;
> +}
> +})

Alexander


Re: inhibit the sincos optimization when the target has sin and cos instructions

2016-05-18 Thread Cesar Philippidis
On 05/18/2016 05:29 AM, Nathan Sidwell wrote:
> On 05/17/16 17:30, Cesar Philippidis wrote:
>> On 05/17/2016 02:22 PM, Andrew Pinski wrote:

 gcc.sum
 Tests that now fail, but worked before:

 nvptx-none-run: gcc.c-torture/execute/20100316-1.c   -Os  execution
 test
 nvptx-none-run: gcc.c-torture/execute/20100708-1.c   -O1  execution
 test
 nvptx-none-run: gcc.c-torture/execute/20100805-1.c   -O0  execution
 test
 nvptx-none-run: gcc.dg/torture/pr52028.c   -O3 -fomit-frame-pointer
 -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
 nvptx-none-run: gcc.dg/torture/pr52028.c   -O3 -g  execution test

> 
> Please determine why these now fail.

Those were failing intermittently, at least on my desktop. I'll look
into that it next.

>> +(define_expand "sincossf3"
>> +  [(set (match_operand:SF 0 "nvptx_register_operand" "=R")
>> +(unspec:SF [(match_operand:SF 2 "nvptx_register_operand" "R")]
>> +   UNSPEC_COS))
>> +   (set (match_operand:SF 1 "nvptx_register_operand" "=R")
>> +(unspec:SF [(match_dup 2)] UNSPEC_SIN))]
>> +  "flag_unsafe_math_optimizations"
>> +{
>> +  emit_insn (gen_sinsf2 (operands[1], operands[2]));
>> +  emit_insn (gen_cossf2 (operands[0], operands[2]));
>> +
>> +  DONE;
>> +})
> 
> Why the emit_insn code?  that seems to be replicating the RTL
> representation -- you're saying the same thing twice.
> 
> Doesn't operands[2] need (conditionally) copying to a new register --
> what if it aliases operands[1]?

This patch does that now.

>> +++ b/gcc/testsuite/gcc.target/nvptx/sincos-2.c
>> @@ -0,0 +1,30 @@
>> +/* { dg-do run } */
>> +/* { dg-options "-O2 -ffast-math" } */
>> +
> 
> What is this test trying to test?  I'm puzzled by it.  (btw, don't use
> assert, either abort, exit(1) or return from main.)

My intent was to verify that I got the sin and cos arguments right,
i.e., make sure that this sincos expansion didn't mix up sin(x) with
cos(x). I guess I can create a test that uses vprintf and scans
dg-output for the proper results. But in this patch I just omitted that
test case altogether.

Is this patch ok for trunk?

Cesar

2016-05-18  Cesar Philippidis  

	gcc/
	* config/nvptx/nvptx.md (sincossf3): New pattern.

	gcc/testsuite/
	* gcc.target/nvptx/sincos.c: New test.


diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index 33a4862..69bbb22 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -794,6 +794,24 @@
   ""
   "%.\\tsqrt%#%t0\\t%0, %1;")
 
+(define_expand "sincossf3"
+  [(set (match_operand:SF 0 "nvptx_register_operand" "=R")
+	(unspec:SF [(match_operand:SF 2 "nvptx_register_operand" "R")]
+	   UNSPEC_COS))
+   (set (match_operand:SF 1 "nvptx_register_operand" "=R")
+	(unspec:SF [(match_dup 2)] UNSPEC_SIN))]
+  "flag_unsafe_math_optimizations"
+{
+  if (REGNO (operands[0]) == REGNO (operands[2]))
+{
+  rtx tmp = gen_reg_rtx (GET_MODE (operands[2]));
+  emit_insn (gen_rtx_SET (tmp, operands[2]));
+  emit_insn (gen_sinsf2 (operands[1], tmp));
+  emit_insn (gen_cossf2 (operands[0], tmp));
+  DONE;
+}
+})
+
 (define_insn "sinsf2"
   [(set (match_operand:SF 0 "nvptx_register_operand" "=R")
 	(unspec:SF [(match_operand:SF 1 "nvptx_register_operand" "R")]
diff --git a/gcc/testsuite/gcc.target/nvptx/sincos.c b/gcc/testsuite/gcc.target/nvptx/sincos.c
new file mode 100644
index 000..921ec41
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/sincos.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ffast-math" } */
+
+extern float sinf (float);
+extern float cosf (float);
+
+float
+sincos_add (float x)
+{
+  float s = sinf (x);
+  float c = cosf (x);
+
+  return s + c;
+}
+
+/* { dg-final { scan-assembler-times "sin.approx.f32" 1 } } */
+/* { dg-final { scan-assembler-times "cos.approx.f32" 1 } } */


Re: inhibit the sincos optimization when the target has sin and cos instructions

2016-05-18 Thread Nathan Sidwell

On 05/17/16 17:30, Cesar Philippidis wrote:

On 05/17/2016 02:22 PM, Andrew Pinski wrote:



Good eyes, thanks! I thought I had to create a new insn, but I got away
with an expand. I attached the updated patch.

Cesar



gcc.sum
Tests that now fail, but worked before:

nvptx-none-run: gcc.c-torture/execute/20100316-1.c   -Os  execution test
nvptx-none-run: gcc.c-torture/execute/20100708-1.c   -O1  execution test
nvptx-none-run: gcc.c-torture/execute/20100805-1.c   -O0  execution test
nvptx-none-run: gcc.dg/torture/pr52028.c   -O3 -fomit-frame-pointer 
-funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
nvptx-none-run: gcc.dg/torture/pr52028.c   -O3 -g  execution test



Please determine why these now fail.


+(define_expand "sincossf3"
+  [(set (match_operand:SF 0 "nvptx_register_operand" "=R")
+   (unspec:SF [(match_operand:SF 2 "nvptx_register_operand" "R")]
+  UNSPEC_COS))
+   (set (match_operand:SF 1 "nvptx_register_operand" "=R")
+   (unspec:SF [(match_dup 2)] UNSPEC_SIN))]
+  "flag_unsafe_math_optimizations"
+{
+  emit_insn (gen_sinsf2 (operands[1], operands[2]));
+  emit_insn (gen_cossf2 (operands[0], operands[2]));
+
+  DONE;
+})


Why the emit_insn code?  that seems to be replicating the RTL representation -- 
you're saying the same thing twice.


Doesn't operands[2] need (conditionally) copying to a new register -- what if it 
aliases operands[1]?



+++ b/gcc/testsuite/gcc.target/nvptx/sincos-2.c
@@ -0,0 +1,30 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -ffast-math" } */
+


What is this test trying to test?  I'm puzzled by it.  (btw, don't use assert, 
either abort, exit(1) or return from main.)


nathan


Re: inhibit the sincos optimization when the target has sin and cos instructions

2016-05-17 Thread Cesar Philippidis
On 05/17/2016 02:22 PM, Andrew Pinski wrote:
> On Tue, May 17, 2016 at 2:10 PM, Cesar Philippidis
>  wrote:
>> On 05/13/2016 01:13 PM, Andrew Pinski wrote:
>>> On Fri, May 13, 2016 at 12:58 PM, Richard Biener
>>>  wrote:
 On May 13, 2016 9:18:57 PM GMT+02:00, Cesar Philippidis 
  wrote:
> The cse_sincos pass tries to optimize sequences such as
>
>  sin (x);
>  cos (x);
>
> into a single call to sincos, or cexpi, when available. However, the
> nvptx target has sin and cos instructions, albeit with some loss of
> precision (so it's only enabled with -ffast-math). This patch teaches
> cse_sincos pass to ignore sin, cos and cexpi instructions when the
> target can expand those calls. This yields a 6x speedup in 314.omriq
 >from spec accel when running on Nvidia accelerators.
>
> Is this OK for trunk?

 Isn't there an optab for sincos?
>>>
>>> This is exactly what I was going to suggest.  This transformation
>>> should be done in the back-end back to sin/cos instructions.
>>
>> I didn't realize that the 387 has sin, cos and sincos instructions,
>> so yeah, my original patch is bad.
>>
>> Nathan, is this patch ok for trunk and gcc-6? It adds a new sincos
>> pattern in the nvptx backend. I haven't testing a standalone nvptx
>> toolchain prior to this patch, so I'm not sure if my test results
>> look sane. I seem to be getting a different set of failures when I
>> test a clean trunk build multiple times. I attached my results
>> below for reference.
> 
> 
> UNSPEC_SINCOS is unused so why add it?

Good eyes, thanks! I thought I had to create a new insn, but I got away
with an expand. I attached the updated patch.

Cesar

>> g++.sum
>> Tests that now fail, but worked before:
>>
>> nvptx-none-run: g++.dg/abi/param1.C  -std=c++14 execution test
>>
>> Tests that now work, but didn't before:
>>
>> nvptx-none-run: g++.dg/opt/pr30590.C  -std=gnu++98 execution test
>> nvptx-none-run: g++.dg/opt/pr36187.C  -std=gnu++14 execution test
>>
>> gfortran.sum
>> Tests that now fail, but worked before:
>>
>> nvptx-none-run: gfortran.dg/alloc_comp_assign_10.f90   -O3 
>> -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions 
>>  execution test
>> nvptx-none-run: gfortran.dg/allocate_with_source_5.f90   -O1  execution test
>> nvptx-none-run: gfortran.dg/func_assign_3.f90   -O3 -g  execution test
>> nvptx-none-run: gfortran.dg/inline_sum_3.f90   -O1  execution test
>> nvptx-none-run: gfortran.dg/inline_sum_3.f90   -O3 -g  execution test
>> nvptx-none-run: gfortran.dg/internal_pack_15.f90   -O2  execution test
>> nvptx-none-run: gfortran.dg/internal_pack_8.f90   -Os  execution test
>> nvptx-none-run: gfortran.dg/intrinsic_ifunction_2.f90   -O0  execution test
>> nvptx-none-run: gfortran.dg/intrinsic_ifunction_2.f90   -O3 
>> -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions 
>>  execution test
>> nvptx-none-run: gfortran.dg/intrinsic_pack_5.f90   -O3 -g  execution test
>> nvptx-none-run: gfortran.dg/intrinsic_product_1.f90   -O1  execution test
>> nvptx-none-run: gfortran.dg/intrinsic_verify_1.f90   -O3 -g  execution test
>> nvptx-none-run: gfortran.dg/is_iostat_end_eor_1.f90   -O3 
>> -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions 
>>  execution test
>> nvptx-none-run: gfortran.dg/iso_c_binding_rename_1.f03   -O3 
>> -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions 
>>  execution test
>>
>> Tests that now work, but didn't before:
>>
>> nvptx-none-run: gfortran.dg/char_pointer_assign.f90   -O3 
>> -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions 
>>  execution test
>> nvptx-none-run: gfortran.dg/char_pointer_dummy.f90   -O1  execution test
>> nvptx-none-run: gfortran.dg/char_pointer_dummy.f90   -Os  execution test
>> nvptx-none-run: gfortran.dg/char_result_13.f90   -O3 -g  execution test
>> nvptx-none-run: gfortran.dg/char_result_2.f90   -O1  execution test
>> nvptx-none-run: gfortran.dg/char_type_len.f90   -Os  execution test
>> nvptx-none-run: gfortran.dg/character_array_constructor_1.f90   -O0  
>> execution test
>> nvptx-none-run: gfortran.dg/nested_allocatables_1.f90   -O3 
>> -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions 
>>  execution test
>>
>> gcc.sum
>> Tests that now fail, but worked before:
>>
>> nvptx-none-run: gcc.c-torture/execute/20100316-1.c   -Os  execution test
>> nvptx-none-run: gcc.c-torture/execute/20100708-1.c   -O1  execution test
>> nvptx-none-run: gcc.c-torture/execute/20100805-1.c   -O0  execution test
>> nvptx-none-run: gcc.dg/torture/pr52028.c   -O3 -fomit-frame-pointer 
>> -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
>> nvptx-none-run: gcc.dg/torture/pr52028.c   -O3 -g  execution test
>>
>> Tests that now work, but didn't before:
>>
>> nvptx-none-run: 

Re: inhibit the sincos optimization when the target has sin and cos instructions

2016-05-17 Thread Andrew Pinski
On Tue, May 17, 2016 at 2:10 PM, Cesar Philippidis
 wrote:
> On 05/13/2016 01:13 PM, Andrew Pinski wrote:
>> On Fri, May 13, 2016 at 12:58 PM, Richard Biener
>>  wrote:
>>> On May 13, 2016 9:18:57 PM GMT+02:00, Cesar Philippidis 
>>>  wrote:
 The cse_sincos pass tries to optimize sequences such as

  sin (x);
  cos (x);

 into a single call to sincos, or cexpi, when available. However, the
 nvptx target has sin and cos instructions, albeit with some loss of
 precision (so it's only enabled with -ffast-math). This patch teaches
 cse_sincos pass to ignore sin, cos and cexpi instructions when the
 target can expand those calls. This yields a 6x speedup in 314.omriq
>>> >from spec accel when running on Nvidia accelerators.

 Is this OK for trunk?
>>>
>>> Isn't there an optab for sincos?
>>
>> This is exactly what I was going to suggest.  This transformation
>> should be done in the back-end back to sin/cos instructions.
>
> I didn't realize that the 387 has sin, cos and sincos instructions,
> so yeah, my original patch is bad.
>
> Nathan, is this patch ok for trunk and gcc-6? It adds a new sincos
> pattern in the nvptx backend. I haven't testing a standalone nvptx
> toolchain prior to this patch, so I'm not sure if my test results
> look sane. I seem to be getting a different set of failures when I
> test a clean trunk build multiple times. I attached my results
> below for reference.


UNSPEC_SINCOS is unused so why add it?

Thanks,
Andrew Pinski


>
> Cesar
>
> g++.sum
> Tests that now fail, but worked before:
>
> nvptx-none-run: g++.dg/abi/param1.C  -std=c++14 execution test
>
> Tests that now work, but didn't before:
>
> nvptx-none-run: g++.dg/opt/pr30590.C  -std=gnu++98 execution test
> nvptx-none-run: g++.dg/opt/pr36187.C  -std=gnu++14 execution test
>
> gfortran.sum
> Tests that now fail, but worked before:
>
> nvptx-none-run: gfortran.dg/alloc_comp_assign_10.f90   -O3 
> -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  
> execution test
> nvptx-none-run: gfortran.dg/allocate_with_source_5.f90   -O1  execution test
> nvptx-none-run: gfortran.dg/func_assign_3.f90   -O3 -g  execution test
> nvptx-none-run: gfortran.dg/inline_sum_3.f90   -O1  execution test
> nvptx-none-run: gfortran.dg/inline_sum_3.f90   -O3 -g  execution test
> nvptx-none-run: gfortran.dg/internal_pack_15.f90   -O2  execution test
> nvptx-none-run: gfortran.dg/internal_pack_8.f90   -Os  execution test
> nvptx-none-run: gfortran.dg/intrinsic_ifunction_2.f90   -O0  execution test
> nvptx-none-run: gfortran.dg/intrinsic_ifunction_2.f90   -O3 
> -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  
> execution test
> nvptx-none-run: gfortran.dg/intrinsic_pack_5.f90   -O3 -g  execution test
> nvptx-none-run: gfortran.dg/intrinsic_product_1.f90   -O1  execution test
> nvptx-none-run: gfortran.dg/intrinsic_verify_1.f90   -O3 -g  execution test
> nvptx-none-run: gfortran.dg/is_iostat_end_eor_1.f90   -O3 
> -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  
> execution test
> nvptx-none-run: gfortran.dg/iso_c_binding_rename_1.f03   -O3 
> -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  
> execution test
>
> Tests that now work, but didn't before:
>
> nvptx-none-run: gfortran.dg/char_pointer_assign.f90   -O3 
> -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  
> execution test
> nvptx-none-run: gfortran.dg/char_pointer_dummy.f90   -O1  execution test
> nvptx-none-run: gfortran.dg/char_pointer_dummy.f90   -Os  execution test
> nvptx-none-run: gfortran.dg/char_result_13.f90   -O3 -g  execution test
> nvptx-none-run: gfortran.dg/char_result_2.f90   -O1  execution test
> nvptx-none-run: gfortran.dg/char_type_len.f90   -Os  execution test
> nvptx-none-run: gfortran.dg/character_array_constructor_1.f90   -O0  
> execution test
> nvptx-none-run: gfortran.dg/nested_allocatables_1.f90   -O3 
> -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  
> execution test
>
> gcc.sum
> Tests that now fail, but worked before:
>
> nvptx-none-run: gcc.c-torture/execute/20100316-1.c   -Os  execution test
> nvptx-none-run: gcc.c-torture/execute/20100708-1.c   -O1  execution test
> nvptx-none-run: gcc.c-torture/execute/20100805-1.c   -O0  execution test
> nvptx-none-run: gcc.dg/torture/pr52028.c   -O3 -fomit-frame-pointer 
> -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
> nvptx-none-run: gcc.dg/torture/pr52028.c   -O3 -g  execution test
>
> Tests that now work, but didn't before:
>
> nvptx-none-run: gcc.c-torture/execute/20091229-1.c   -O3 -g  execution test
> nvptx-none-run: gcc.c-torture/execute/20101013-1.c   -Os  execution test
> nvptx-none-run: gcc.c-torture/execute/20101025-1.c   -Os  execution test
> nvptx-none-run: gcc.c-torture/execute/20120105-1.c   

Re: inhibit the sincos optimization when the target has sin and cos instructions

2016-05-17 Thread Cesar Philippidis
On 05/13/2016 01:13 PM, Andrew Pinski wrote:
> On Fri, May 13, 2016 at 12:58 PM, Richard Biener
>  wrote:
>> On May 13, 2016 9:18:57 PM GMT+02:00, Cesar Philippidis 
>>  wrote:
>>> The cse_sincos pass tries to optimize sequences such as
>>>
>>>  sin (x);
>>>  cos (x);
>>>
>>> into a single call to sincos, or cexpi, when available. However, the
>>> nvptx target has sin and cos instructions, albeit with some loss of
>>> precision (so it's only enabled with -ffast-math). This patch teaches
>>> cse_sincos pass to ignore sin, cos and cexpi instructions when the
>>> target can expand those calls. This yields a 6x speedup in 314.omriq
>> >from spec accel when running on Nvidia accelerators.
>>>
>>> Is this OK for trunk?
>>
>> Isn't there an optab for sincos?
> 
> This is exactly what I was going to suggest.  This transformation
> should be done in the back-end back to sin/cos instructions.

I didn't realize that the 387 has sin, cos and sincos instructions,
so yeah, my original patch is bad.

Nathan, is this patch ok for trunk and gcc-6? It adds a new sincos 
pattern in the nvptx backend. I haven't testing a standalone nvptx 
toolchain prior to this patch, so I'm not sure if my test results 
look sane. I seem to be getting a different set of failures when I
test a clean trunk build multiple times. I attached my results
below for reference.

Cesar

g++.sum
Tests that now fail, but worked before:

nvptx-none-run: g++.dg/abi/param1.C  -std=c++14 execution test

Tests that now work, but didn't before:

nvptx-none-run: g++.dg/opt/pr30590.C  -std=gnu++98 execution test
nvptx-none-run: g++.dg/opt/pr36187.C  -std=gnu++14 execution test

gfortran.sum
Tests that now fail, but worked before:

nvptx-none-run: gfortran.dg/alloc_comp_assign_10.f90   -O3 -fomit-frame-pointer 
-funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
nvptx-none-run: gfortran.dg/allocate_with_source_5.f90   -O1  execution test
nvptx-none-run: gfortran.dg/func_assign_3.f90   -O3 -g  execution test
nvptx-none-run: gfortran.dg/inline_sum_3.f90   -O1  execution test
nvptx-none-run: gfortran.dg/inline_sum_3.f90   -O3 -g  execution test
nvptx-none-run: gfortran.dg/internal_pack_15.f90   -O2  execution test
nvptx-none-run: gfortran.dg/internal_pack_8.f90   -Os  execution test
nvptx-none-run: gfortran.dg/intrinsic_ifunction_2.f90   -O0  execution test
nvptx-none-run: gfortran.dg/intrinsic_ifunction_2.f90   -O3 
-fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  
execution test
nvptx-none-run: gfortran.dg/intrinsic_pack_5.f90   -O3 -g  execution test
nvptx-none-run: gfortran.dg/intrinsic_product_1.f90   -O1  execution test
nvptx-none-run: gfortran.dg/intrinsic_verify_1.f90   -O3 -g  execution test
nvptx-none-run: gfortran.dg/is_iostat_end_eor_1.f90   -O3 -fomit-frame-pointer 
-funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
nvptx-none-run: gfortran.dg/iso_c_binding_rename_1.f03   -O3 
-fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  
execution test

Tests that now work, but didn't before:

nvptx-none-run: gfortran.dg/char_pointer_assign.f90   -O3 -fomit-frame-pointer 
-funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
nvptx-none-run: gfortran.dg/char_pointer_dummy.f90   -O1  execution test
nvptx-none-run: gfortran.dg/char_pointer_dummy.f90   -Os  execution test
nvptx-none-run: gfortran.dg/char_result_13.f90   -O3 -g  execution test
nvptx-none-run: gfortran.dg/char_result_2.f90   -O1  execution test
nvptx-none-run: gfortran.dg/char_type_len.f90   -Os  execution test
nvptx-none-run: gfortran.dg/character_array_constructor_1.f90   -O0  execution 
test
nvptx-none-run: gfortran.dg/nested_allocatables_1.f90   -O3 
-fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  
execution test

gcc.sum
Tests that now fail, but worked before:

nvptx-none-run: gcc.c-torture/execute/20100316-1.c   -Os  execution test
nvptx-none-run: gcc.c-torture/execute/20100708-1.c   -O1  execution test
nvptx-none-run: gcc.c-torture/execute/20100805-1.c   -O0  execution test
nvptx-none-run: gcc.dg/torture/pr52028.c   -O3 -fomit-frame-pointer 
-funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
nvptx-none-run: gcc.dg/torture/pr52028.c   -O3 -g  execution test

Tests that now work, but didn't before:

nvptx-none-run: gcc.c-torture/execute/20091229-1.c   -O3 -g  execution test
nvptx-none-run: gcc.c-torture/execute/20101013-1.c   -Os  execution test
nvptx-none-run: gcc.c-torture/execute/20101025-1.c   -Os  execution test
nvptx-none-run: gcc.c-torture/execute/20120105-1.c   -O0  execution test
nvptx-none-run: gcc.c-torture/execute/20120111-1.c   -O0  execution test

New tests that PASS:

nvptx-none-run: gcc.target/nvptx/sincos-1.c (test for excess errors)
nvptx-none-run: gcc.target/nvptx/sincos-1.c scan-assembler-times cos.approx.f32 
1
nvptx-none-run: gcc.target/nvptx/sincos-1.c 

Re: inhibit the sincos optimization when the target has sin and cos instructions

2016-05-13 Thread Andrew Pinski
On Fri, May 13, 2016 at 12:58 PM, Richard Biener
 wrote:
> On May 13, 2016 9:18:57 PM GMT+02:00, Cesar Philippidis 
>  wrote:
>>The cse_sincos pass tries to optimize sequences such as
>>
>>  sin (x);
>>  cos (x);
>>
>>into a single call to sincos, or cexpi, when available. However, the
>>nvptx target has sin and cos instructions, albeit with some loss of
>>precision (so it's only enabled with -ffast-math). This patch teaches
>>cse_sincos pass to ignore sin, cos and cexpi instructions when the
>>target can expand those calls. This yields a 6x speedup in 314.omriq
>>from spec accel when running on Nvidia accelerators.
>>
>>Is this OK for trunk?
>
> Isn't there an optab for sincos?

This is exactly what I was going to suggest.  This transformation
should be done in the back-end back to sin/cos instructions.

Thanks,
Andrew

> ISTR x87 handles this pass just fine and also can do sin and cos.
>
> Richard.
>
>>Cesar
>
>


Re: inhibit the sincos optimization when the target has sin and cos instructions

2016-05-13 Thread Richard Biener
On May 13, 2016 9:18:57 PM GMT+02:00, Cesar Philippidis 
 wrote:
>The cse_sincos pass tries to optimize sequences such as
>
>  sin (x);
>  cos (x);
>
>into a single call to sincos, or cexpi, when available. However, the
>nvptx target has sin and cos instructions, albeit with some loss of
>precision (so it's only enabled with -ffast-math). This patch teaches
>cse_sincos pass to ignore sin, cos and cexpi instructions when the
>target can expand those calls. This yields a 6x speedup in 314.omriq
>from spec accel when running on Nvidia accelerators.
>
>Is this OK for trunk?

Isn't there an optab for sincos?  ISTR x87 handles this pass just fine and also 
can do sin and cos.

Richard.

>Cesar




inhibit the sincos optimization when the target has sin and cos instructions

2016-05-13 Thread Cesar Philippidis
The cse_sincos pass tries to optimize sequences such as

  sin (x);
  cos (x);

into a single call to sincos, or cexpi, when available. However, the
nvptx target has sin and cos instructions, albeit with some loss of
precision (so it's only enabled with -ffast-math). This patch teaches
cse_sincos pass to ignore sin, cos and cexpi instructions when the
target can expand those calls. This yields a 6x speedup in 314.omriq
from spec accel when running on Nvidia accelerators.

Is this OK for trunk?

Cesar
2016-05-13  Cesar Philippidis  

	gcc/
	* tree-ssa-math-opts.c (pass_cse_sincos::execute): Don't optimize
	sin and cos calls when the target has instructions for them.
	
	gcc/testsuite/
	* gcc.target/nvptx/sincos.c: New test.


diff --git a/gcc/testsuite/gcc.target/nvptx/sincos.c b/gcc/testsuite/gcc.target/nvptx/sincos.c
new file mode 100644
index 000..921ec41
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/sincos.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ffast-math" } */
+
+extern float sinf (float);
+extern float cosf (float);
+
+float
+sincos_add (float x)
+{
+  float s = sinf (x);
+  float c = cosf (x);
+
+  return s + c;
+}
+
+/* { dg-final { scan-assembler-times "sin.approx.f32" 1 } } */
+/* { dg-final { scan-assembler-times "cos.approx.f32" 1 } } */
diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
index 81688cd..38051e1 100644
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -1806,6 +1806,11 @@ pass_cse_sincos::execute (function *fun)
 		CASE_CFN_COS:
 		CASE_CFN_SIN:
 		CASE_CFN_CEXPI:
+		  /* Don't modify these calls if they can be translated
+		 directly into hardware instructions.  */
+		  if (replacement_internal_fn (as_a  (stmt))
+		  != IFN_LAST)
+		break;
 		  /* Make sure we have either sincos or cexp.  */
 		  if (!targetm.libc_has_function (function_c99_math_complex)
 		  && !targetm.libc_has_function (function_sincos))