Re: [PATCH, dataflow]: Fix PR55845, 454.calculix miscompares on x86 AVX due to movement of vzeroupper

2013-01-08 Thread Richard Henderson
On 01/07/2013 03:26 PM, Jakub Jelinek wrote:
> 2013-01-08  Jakub Jelinek  
>   Uros Bizjak  
> 
>   PR rtl-optimization/55845
>   * df-problems.c (can_move_insns_across): Stop scanning at
>   volatile_insn_p source instruction or give up if
>   across_from .. across_to range contains any volatile_insn_p
>   instructions.

Ok.


r~


Re: [PATCH, dataflow]: Fix PR55845, 454.calculix miscompares on x86 AVX due to movement of vzeroupper

2013-01-07 Thread Uros Bizjak
On Tue, Jan 8, 2013 at 12:26 AM, Jakub Jelinek  wrote:
> On Mon, Jan 07, 2013 at 05:52:23PM +0100, Uros Bizjak wrote:
>> TBH, I'm not familiar enough with the RTL infrastructure to
>> answer these questions. While I could spend some time on this problem,
>> and probably waste quite a lot of reviewers' time, the problem is not as
>> trivial as I had hoped, so I would kindly ask someone with a better
>> understanding of this part of the compiler for a proper solution.
>
> After discussion with rth on IRC, this modified patch just uses
> volatile_insn_p, making all UNSPEC_VOLATILE (wherever in insn) and asm
> volatile into a complete scheduling barrier for optimizations that use this
> function.

Thanks!

Just two little nits in the testcase:

> +foo (int size, double y[], double x[])

foo (int size, double *y, double *x)

> +  return (sum);

return sum;

Uros.


Re: [PATCH, dataflow]: Fix PR55845, 454.calculix miscompares on x86 AVX due to movement of vzeroupper

2013-01-07 Thread Jakub Jelinek
On Mon, Jan 07, 2013 at 05:52:23PM +0100, Uros Bizjak wrote:
> TBH, I'm not familiar enough with the RTL infrastructure to
> answer these questions. While I could spend some time on this problem,
> and probably waste quite a lot of reviewers' time, the problem is not as
> trivial as I had hoped, so I would kindly ask someone with a better
> understanding of this part of the compiler for a proper solution.

After discussion with rth on IRC, this modified patch just uses
volatile_insn_p, making all UNSPEC_VOLATILE (wherever in insn) and asm
volatile into a complete scheduling barrier for optimizations that use this
function.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2013-01-08  Jakub Jelinek  
Uros Bizjak  

PR rtl-optimization/55845
* df-problems.c (can_move_insns_across): Stop scanning at
volatile_insn_p source instruction or give up if
across_from .. across_to range contains any volatile_insn_p
instructions.

2013-01-08  Uros Bizjak  
Vladimir Yakovlev  

PR rtl-optimization/55845
* gcc.target/i386/pr55845.c: New test.

--- gcc/df-problems.c.jj	2012-11-19 14:41:26.181898964 +0100
+++ gcc/df-problems.c   2013-01-07 18:38:33.064974313 +0100
@@ -3858,6 +3858,8 @@ can_move_insns_across (rtx from, rtx to,
         }
       if (NONDEBUG_INSN_P (insn))
         {
+          if (volatile_insn_p (PATTERN (insn)))
+            return false;
           memrefs_in_across |= for_each_rtx (&PATTERN (insn), find_memory,
                                              NULL);
           note_stores (PATTERN (insn), find_memory_stores,
@@ -3917,7 +3919,9 @@ can_move_insns_across (rtx from, rtx to,
       if (NONDEBUG_INSN_P (insn))
         {
           if (may_trap_or_fault_p (PATTERN (insn))
-              && (trapping_insns_in_across || other_branch_live != NULL))
+              && (trapping_insns_in_across
+                  || other_branch_live != NULL
+                  || volatile_insn_p (PATTERN (insn))))
             break;
 
           /* We cannot move memory stores past each other, or move memory
--- gcc/testsuite/gcc.target/i386/pr55845.c.jj	2013-01-07 18:30:19.168801389 +0100
+++ gcc/testsuite/gcc.target/i386/pr55845.c	2013-01-07 18:30:19.168801389 +0100
@@ -0,0 +1,39 @@
+/* { dg-do run } */
+/* { dg-require-effective-target avx } */
+/* { dg-options "-O3 -ffast-math -fschedule-insns -mavx -mvzeroupper" } */
+
+#include "avx-check.h"
+
+#define N 100
+
+double
+__attribute__((noinline))
+foo (int size, double y[], double x[])
+{
+  double sum = 0.0;
+  int i;
+  for (i = 0, sum = 0.; i < size; i++)
+    sum += y[i] * x[i];
+  return (sum);
+}
+
+static void
+__attribute__ ((noinline))
+avx_test ()
+{
+  double x[N];
+  double y[N];
+  double s;
+  int i;
+
+  for (i = 0; i < N; i++)
+    {
+      x[i] = i;
+      y[i] = i;
+    }
+
+  s = foo (N, y, x);
+
+  if (s != 328350.0)
+    abort ();
+}


Jakub


Re: [PATCH, dataflow]: Fix PR55845, 454.calculix miscompares on x86 AVX due to movement of vzeroupper

2013-01-07 Thread Uros Bizjak
On Sun, Jan 6, 2013 at 5:22 PM, Jakub Jelinek  wrote:

>> --- df-problems.c (revision 194945)
>> +++ df-problems.c (working copy)
>> @@ -3916,6 +3916,10 @@ can_move_insns_across (rtx from, rtx to, rtx acros
>>   break;
>>if (NONDEBUG_INSN_P (insn))
>>   {
>> +   /* Do not move unspec_volatile insns.  */
>> +   if (GET_CODE (PATTERN (insn)) == UNSPEC_VOLATILE)
>> + break;
>> +
>
> Shouldn't UNSPEC_VOLATILE be handled similarly in the across_from ..
> across_to loop?  Both UNSPEC_VOLATILE and volatile asm are handled there
> just with
> trapping_insns_in_across |= may_trap_p (PATTERN (insn));
> but your new change doesn't prevent moving just trapping insns across
> UNSPEC_VOLATILE, but any insns whatsoever.  So supposedly for UNSPEC_VOLATILE
> the first loop should just return false; (or fail = 1; ?).
> For asm volatile I guess the code is fine as is, it must always describe
> what exactly it modifies, so supposedly non-trapping insns can be moved
> across asm volatile.
>
>> if (may_trap_or_fault_p (PATTERN (insn))
>> && (trapping_insns_in_across || other_branch_live != NULL))
>>   break;
>
> You could do the check only for may_trap_or_fault_p, all UNSPEC_VOLATILE
> may trap.
>
> BTW, can't UNSPEC_VOLATILE be embedded deeply in the pattern?
> So volatile_insn_p (insn) && asm_noperands (PATTERN (insn)) == -1?
> But perhaps you want to treat that way only UNSPEC_VOLATILE directly in the
> pattern and all other UNSPEC_VOLATILE insns must describe in detail what
> exactly they are changing?  This really needs to be better documented.

TBH, I'm not familiar enough with the RTL infrastructure to
answer these questions. While I could spend some time on this problem,
and probably waste quite a lot of reviewers' time, the problem is not as
trivial as I had hoped, so I would kindly ask someone with a better
understanding of this part of the compiler for a proper solution.

Uros.


Re: [PATCH, dataflow]: Fix PR55845, 454.calculix miscompares on x86 AVX due to movement of vzeroupper

2013-01-06 Thread Eric Botcazou
> BTW, can't UNSPEC_VOLATILE be embedded deeply in the pattern?
> So volatile_insn_p (insn) && asm_noperands (PATTERN (insn)) == -1?
> But perhaps you want to treat that way only UNSPEC_VOLATILE directly in the
> pattern and all other UNSPEC_VOLATILE insns must describe in detail what
> exactly they are changing?  This really needs to be better documented.

Yes, I think that we should document that UNSPEC_Vs are full optimization 
barriers so that the existing blockage insns of all ports really are blockages.
That's already what is implemented and seems non-controversial (unlike the 
volatile asms).  Something like:

Index: rtl.def
===
--- rtl.def (revision 194946)
+++ rtl.def (working copy)
@@ -213,7 +213,9 @@ DEF_RTL_EXPR(ASM_OPERANDS, "asm_operands
*/
 DEF_RTL_EXPR(UNSPEC, "unspec", "Ei", RTX_EXTRA)
 
-/* Similar, but a volatile operation and one which may trap.  */
+/* Similar, but a volatile operation and one which may trap.  Moreover, it's a
+   full optimization barrier, i.e. no instructions may be moved and no register
+   (hard or pseudo) or memory equivalences may be used across it.  */
 DEF_RTL_EXPR(UNSPEC_VOLATILE, "unspec_volatile", "Ei", RTX_EXTRA)
 
 /* Vector of addresses, stored as full words.  */

I'd also propose that blockage insns always be UNSPEC_Vs (that's already the 
case in practice, but the manual also lists volatile asms).

And I'm somewhat dubious about the distinction between toplevel and embedded 
UNSPEC_Vs in a pattern; IMO, that shouldn't make any difference.

-- 
Eric Botcazou


Re: [PATCH, dataflow]: Fix PR55845, 454.calculix miscompares on x86 AVX due to movement of vzeroupper

2013-01-06 Thread Jakub Jelinek
On Sun, Jan 06, 2013 at 04:48:03PM +0100, Uros Bizjak wrote:
> --- df-problems.c (revision 194945)
> +++ df-problems.c (working copy)
> @@ -3916,6 +3916,10 @@ can_move_insns_across (rtx from, rtx to, rtx acros
>   break;
>if (NONDEBUG_INSN_P (insn))
>   {
> +   /* Do not move unspec_volatile insns.  */
> +   if (GET_CODE (PATTERN (insn)) == UNSPEC_VOLATILE)
> + break;
> +

Shouldn't UNSPEC_VOLATILE be handled similarly in the across_from ..
across_to loop?  Both UNSPEC_VOLATILE and volatile asm are handled there
just with
trapping_insns_in_across |= may_trap_p (PATTERN (insn));
but your new change doesn't prevent moving just trapping insns across
UNSPEC_VOLATILE, but any insns whatsoever.  So supposedly for UNSPEC_VOLATILE
the first loop should just return false; (or fail = 1; ?).
For asm volatile I guess the code is fine as is, it must always describe
what exactly it modifies, so supposedly non-trapping insns can be moved
across asm volatile.

> if (may_trap_or_fault_p (PATTERN (insn))
> && (trapping_insns_in_across || other_branch_live != NULL))
>   break;

You could do the check only for may_trap_or_fault_p, all UNSPEC_VOLATILE
may trap.

BTW, can't UNSPEC_VOLATILE be embedded deeply in the pattern?
So volatile_insn_p (insn) && asm_noperands (PATTERN (insn)) == -1?
But perhaps you want to treat that way only UNSPEC_VOLATILE directly in the
pattern and all other UNSPEC_VOLATILE insns must describe in detail what
exactly they are changing?  This really needs to be better documented.

Jakub