Re: [PATCH, dataflow]: Fix PR55845, 454.calculix miscompares on x86 AVX due to movement of vzeroupper
On 01/07/2013 03:26 PM, Jakub Jelinek wrote:
> 2012-01-08  Jakub Jelinek
>             Uros Bizjak
>
>         PR rtl-optimization/55845
>         * df-problems.c (can_move_insns_across): Stop scanning at
>         volatile_insn_p source instruction or give up if
>         across_from .. across_to range contains any volatile_insn_p
>         instructions.

Ok.

r~
Re: [PATCH, dataflow]: Fix PR55845, 454.calculix miscompares on x86 AVX due to movement of vzeroupper
On Tue, Jan 8, 2013 at 12:26 AM, Jakub Jelinek wrote:
> On Mon, Jan 07, 2013 at 05:52:23PM +0100, Uros Bizjak wrote:
>> TBH, I'm not that familiar with the RTL infrastructure enough to
>> answer these questions. While I can spend some time on this problem,
>> and probably waste quite some reviewer's time, the problem is not that
>> trivial as I hoped to be, so I would kindly ask someone with better
>> understanding of this part of the compiler for the proper solution.
>
> After discussion with rth on IRC, this modified patch just uses
> volatile_insn_p, making all UNSPEC_VOLATILE (wherever in insn) and asm
> volatile into a complete scheduling barrier for optimizations that use this
> function.

Thanks! Just two little nits in the testcase:

> +foo (int size, double y[], double x[])

foo (int size, double *y, double *x)

> +  return (sum);

return sum;

Uros.
Re: [PATCH, dataflow]: Fix PR55845, 454.calculix miscompares on x86 AVX due to movement of vzeroupper
On Mon, Jan 07, 2013 at 05:52:23PM +0100, Uros Bizjak wrote:
> TBH, I'm not that familiar with the RTL infrastructure enough to
> answer these questions. While I can spend some time on this problem,
> and probably waste quite some reviewer's time, the problem is not that
> trivial as I hoped to be, so I would kindly ask someone with better
> understanding of this part of the compiler for the proper solution.

After discussion with rth on IRC, this modified patch just uses
volatile_insn_p, making all UNSPEC_VOLATILE (wherever in insn) and asm
volatile into a complete scheduling barrier for optimizations that use this
function.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2012-01-08  Jakub Jelinek
            Uros Bizjak

        PR rtl-optimization/55845
        * df-problems.c (can_move_insns_across): Stop scanning at
        volatile_insn_p source instruction or give up if
        across_from .. across_to range contains any volatile_insn_p
        instructions.

2012-01-08  Uros Bizjak
            Vladimir Yakovlev

        PR rtl-optimization/55845
        * gcc.target/i386/pr55845.c: New test.

--- gcc/df-problems.c.jj 2012-11-19 14:41:26.181898964 +0100
+++ gcc/df-problems.c 2013-01-07 18:38:33.064974313 +0100
@@ -3858,6 +3858,8 @@ can_move_insns_across (rtx from, rtx to,
         }
       if (NONDEBUG_INSN_P (insn))
         {
+          if (volatile_insn_p (PATTERN (insn)))
+            return false;
           memrefs_in_across |= for_each_rtx (&PATTERN (insn), find_memory,
                                              NULL);
           note_stores (PATTERN (insn), find_memory_stores,
@@ -3917,7 +3919,9 @@ can_move_insns_across (rtx from, rtx to,
       if (NONDEBUG_INSN_P (insn))
         {
           if (may_trap_or_fault_p (PATTERN (insn))
-              && (trapping_insns_in_across || other_branch_live != NULL))
+              && (trapping_insns_in_across
+                  || other_branch_live != NULL
+                  || volatile_insn_p (PATTERN (insn))))
             break;
 
           /* We cannot move memory stores past each other, or move memory
--- gcc/testsuite/gcc.target/i386/pr55845.c.jj 2013-01-07 18:30:19.168801389 +0100
+++ gcc/testsuite/gcc.target/i386/pr55845.c 2013-01-07 18:30:19.168801389 +0100
@@ -0,0 +1,39 @@
+/* { dg-do run } */
+/* { dg-require-effective-target avx } */
+/* { dg-options "-O3 -ffast-math -fschedule-insns -mavx -mvzeroupper" } */
+
+#include "avx-check.h"
+
+#define N 100
+
+double
+__attribute__((noinline))
+foo (int size, double y[], double x[])
+{
+  double sum = 0.0;
+  int i;
+  for (i = 0, sum = 0.; i < size; i++)
+    sum += y[i] * x[i];
+  return (sum);
+}
+
+static void
+__attribute__ ((noinline))
+avx_test ()
+{
+  double x[N];
+  double y[N];
+  double s;
+  int i;
+
+  for (i = 0; i < N; i++)
+    {
+      x[i] = i;
+      y[i] = i;
+    }
+
+  s = foo (N, y, x);
+
+  if (s != 328350.0)
+    abort ();
+}

        Jakub
Re: [PATCH, dataflow]: Fix PR55845, 454.calculix miscompares on x86 AVX due to movement of vzeroupper
On Sun, Jan 6, 2013 at 5:22 PM, Jakub Jelinek wrote:
>> --- df-problems.c (revision 194945)
>> +++ df-problems.c (working copy)
>> @@ -3916,6 +3916,10 @@ can_move_insns_across (rtx from, rtx to, rtx acros
>>         break;
>>        if (NONDEBUG_INSN_P (insn))
>>         {
>> +         /* Do not move unspec_volatile insns.  */
>> +         if (GET_CODE (PATTERN (insn)) == UNSPEC_VOLATILE)
>> +           break;
>> +
>
> Shouldn't UNSPEC_VOLATILE be handled similarly in the across_from ..
> across_to loop? Both UNSPEC_VOLATILE and volatile asm are handled there
> just with
> trapping_insns_in_across |= may_trap_p (PATTERN (insn));
> but your new change doesn't prevent moving just trapping insns across
> UNSPEC_VOLATILE, but any insns whatsoever. So supposedly for UNSPEC_VOLATILE
> the first loop should just return false; (or fail = 1; ?).
> For asm volatile I guess the code is fine as is, it must always describe
> what exactly it modifies, so supposedly non-trapping insns can be moved
> across asm volatile.
>
>>           if (may_trap_or_fault_p (PATTERN (insn))
>>               && (trapping_insns_in_across || other_branch_live != NULL))
>>             break;
>
> You could do the check only for may_trap_or_fault_p, all UNSPEC_VOLATILE
> may trap.
>
> BTW, can't UNSPEC_VOLATILE be embedded deeply in the pattern?
> So volatile_insn_p (insn) && asm_noperands (PATTERN (insn)) == -1?
> But perhaps you want to treat that way only UNSPEC_VOLATILE directly in the
> pattern and all other UNSPEC_VOLATILE insns must describe in detail what
> exactly they are changing? This really needs to be better documented.

TBH, I'm not that familiar with the RTL infrastructure enough to
answer these questions. While I can spend some time on this problem,
and probably waste quite some reviewer's time, the problem is not that
trivial as I hoped to be, so I would kindly ask someone with better
understanding of this part of the compiler for the proper solution.

Uros.
Re: [PATCH, dataflow]: Fix PR55845, 454.calculix miscompares on x86 AVX due to movement of vzeroupper
> BTW, can't UNSPEC_VOLATILE be embedded deeply in the pattern?
> So volatile_insn_p (insn) && asm_noperands (PATTERN (insn)) == -1?
> But perhaps you want to treat that way only UNSPEC_VOLATILE directly in the
> pattern and all other UNSPEC_VOLATILE insns must describe in detail what
> exactly they are changing? This really needs to be better documented.

Yes, I think that we should document that UNSPEC_Vs are full optimization
barriers so the existing blockage insn of all ports are really blockage.
That's already what is implemented and seems non-controversial (unlike the
volatile asms). Something like:

Index: rtl.def
===
--- rtl.def (revision 194946)
+++ rtl.def (working copy)
@@ -213,7 +213,9 @@ DEF_RTL_EXPR(ASM_OPERANDS, "asm_operands
  */
 DEF_RTL_EXPR(UNSPEC, "unspec", "Ei", RTX_EXTRA)
 
-/* Similar, but a volatile operation and one which may trap.  */
+/* Similar, but a volatile operation and one which may trap.  Moreover, it's a
+   full optimization barrier, i.e. no instructions may be moved and no register
+   (hard or pseudo) or memory equivalences may be used across it.  */
 DEF_RTL_EXPR(UNSPEC_VOLATILE, "unspec_volatile", "Ei", RTX_EXTRA)
 
 /* Vector of addresses, stored as full words.  */

I'd also propose that blockage insns always be UNSPEC_Vs (that's already the
case in practice, but the manual also lists volatile asms). And I'm somewhat
dubious about the distinction between toplevel and embedded UNSPEC_Vs in a
pattern; IMO, that shouldn't make any difference.

-- 
Eric Botcazou
Re: [PATCH, dataflow]: Fix PR55845, 454.calculix miscompares on x86 AVX due to movement of vzeroupper
On Sun, Jan 06, 2013 at 04:48:03PM +0100, Uros Bizjak wrote:
> --- df-problems.c (revision 194945)
> +++ df-problems.c (working copy)
> @@ -3916,6 +3916,10 @@ can_move_insns_across (rtx from, rtx to, rtx acros
>         break;
>        if (NONDEBUG_INSN_P (insn))
>         {
> +         /* Do not move unspec_volatile insns.  */
> +         if (GET_CODE (PATTERN (insn)) == UNSPEC_VOLATILE)
> +           break;
> +

Shouldn't UNSPEC_VOLATILE be handled similarly in the across_from ..
across_to loop? Both UNSPEC_VOLATILE and volatile asm are handled there
just with
  trapping_insns_in_across |= may_trap_p (PATTERN (insn));
but your new change doesn't prevent moving just trapping insns across
UNSPEC_VOLATILE, but any insns whatsoever. So supposedly for UNSPEC_VOLATILE
the first loop should just return false; (or fail = 1; ?).
For asm volatile I guess the code is fine as is, it must always describe
what exactly it modifies, so supposedly non-trapping insns can be moved
across asm volatile.

>           if (may_trap_or_fault_p (PATTERN (insn))
>               && (trapping_insns_in_across || other_branch_live != NULL))
>             break;

You could do the check only for may_trap_or_fault_p, all UNSPEC_VOLATILE
may trap.

BTW, can't UNSPEC_VOLATILE be embedded deeply in the pattern?
So volatile_insn_p (insn) && asm_noperands (PATTERN (insn)) == -1?
But perhaps you want to treat that way only UNSPEC_VOLATILE directly in the
pattern and all other UNSPEC_VOLATILE insns must describe in detail what
exactly they are changing? This really needs to be better documented.

        Jakub