[PATCH] rebuild frequency after vrp
This patch rebuilds frequency after vrp. Bootstrapped and testing on-going. OK for trunk if test pass? Thanks, Dehao gcc/ChangeLog: 2014-06-02 Dehao Chen PR tree-optimization/61384 * tree-vrp.c (execute_vrp): rebuild frequency after vrp. gcc/testsuite/ChangeLog: 2014-06-02 Dehao Chen PR tree-optimization/61384 * gcc.dg/pr61384.c: New testcase. Index: gcc/testsuite/gcc.dg/pr61384.c === --- gcc/testsuite/gcc.dg/pr61384.c (revision 0) +++ gcc/testsuite/gcc.dg/pr61384.c (revision 0) @@ -0,0 +1,32 @@ +/* PR tree-optimization/61384 */ +/* { dg-do compile } */ +/* { dg-options "-O3" } */ + +int a, b, d, e, g, h; +static int c = 1, f[5]; + +int +fn1 (int p1, int p2) +{ + return p1 && p2 && p2; +} + +void +fn2 () +{ + for (d = 0; d < 1; d++) +{ + g = f[0]; + e = 0; + h = fn1 (1, (a && c) ^ b); +} + for (; e < 5; e++) +f[e] = 1; +} + +int +main () +{ + fn2 (); + return 0; +} Index: gcc/tree-vrp.c === --- gcc/tree-vrp.c (revision 211137) +++ gcc/tree-vrp.c (working copy) @@ -9794,7 +9794,7 @@ execute_vrp (void) scev_finalize (); loop_optimizer_finalize (); - return 0; + return TODO_rebuild_frequencies; } namespace {
Re: [PATCH] rebuild frequency after vrp
> This patch rebuilds frequency after vrp. Why do you need to rebuild frequency after VRP? I always tought it may be useful to do VRP as early optimization (modulo to compile time costs), but I do not think we should unconditionally rebuild frequencies like this... Honza > > Bootstrapped and testing on-going. OK for trunk if test pass? > > Thanks, > Dehao > > gcc/ChangeLog: > 2014-06-02 Dehao Chen > > PR tree-optimization/61384 > * tree-vrp.c (execute_vrp): rebuild frequency after vrp. > > gcc/testsuite/ChangeLog: > 2014-06-02 Dehao Chen > > PR tree-optimization/61384 > * gcc.dg/pr61384.c: New testcase. > > Index: gcc/testsuite/gcc.dg/pr61384.c > === > --- gcc/testsuite/gcc.dg/pr61384.c (revision 0) > +++ gcc/testsuite/gcc.dg/pr61384.c (revision 0) > @@ -0,0 +1,32 @@ > +/* PR tree-optimization/61384 */ > +/* { dg-do compile } */ > +/* { dg-options "-O3" } */ > + > +int a, b, d, e, g, h; > +static int c = 1, f[5]; > + > +int > +fn1 (int p1, int p2) > +{ > + return p1 && p2 && p2; > +} > + > +void > +fn2 () > +{ > + for (d = 0; d < 1; d++) > +{ > + g = f[0]; > + e = 0; > + h = fn1 (1, (a && c) ^ b); > +} > + for (; e < 5; e++) > +f[e] = 1; > +} > + > +int > +main () > +{ > + fn2 (); > + return 0; > +} > > Index: gcc/tree-vrp.c > === > --- gcc/tree-vrp.c (revision 211137) > +++ gcc/tree-vrp.c (working copy) > @@ -9794,7 +9794,7 @@ execute_vrp (void) > >scev_finalize (); >loop_optimizer_finalize (); > - return 0; > + return TODO_rebuild_frequencies; > } > > namespace {
Re: [PATCH] rebuild frequency after vrp
We need to rebuild frequency after vrp, otherwise the following code in tree-ssa-threadupdate.c will make the frequency larger than upper-bound. /* Excessive jump threading may make frequencies large enough so the computation overflows. */ if (rd->dup_blocks[0]->frequency < BB_FREQ_MAX * 2) rd->dup_blocks[0]->frequency += EDGE_FREQUENCY (e); This is referring to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61384 Thanks, Dehao On Mon, Jun 2, 2014 at 9:13 AM, Jan Hubicka wrote: >> This patch rebuilds frequency after vrp. > > Why do you need to rebuild frequency after VRP? I always tought it may be > useful to do VRP as early optimization (modulo to compile time costs), > but I do not think we should unconditionally rebuild frequencies like this... > > Honza >> >> Bootstrapped and testing on-going. OK for trunk if test pass? >> >> Thanks, >> Dehao >> >> gcc/ChangeLog: >> 2014-06-02 Dehao Chen >> >> PR tree-optimization/61384 >> * tree-vrp.c (execute_vrp): rebuild frequency after vrp. >> >> gcc/testsuite/ChangeLog: >> 2014-06-02 Dehao Chen >> >> PR tree-optimization/61384 >> * gcc.dg/pr61384.c: New testcase. >> >> Index: gcc/testsuite/gcc.dg/pr61384.c >> === >> --- gcc/testsuite/gcc.dg/pr61384.c (revision 0) >> +++ gcc/testsuite/gcc.dg/pr61384.c (revision 0) >> @@ -0,0 +1,32 @@ >> +/* PR tree-optimization/61384 */ >> +/* { dg-do compile } */ >> +/* { dg-options "-O3" } */ >> + >> +int a, b, d, e, g, h; >> +static int c = 1, f[5]; >> + >> +int >> +fn1 (int p1, int p2) >> +{ >> + return p1 && p2 && p2; >> +} >> + >> +void >> +fn2 () >> +{ >> + for (d = 0; d < 1; d++) >> +{ >> + g = f[0]; >> + e = 0; >> + h = fn1 (1, (a && c) ^ b); >> +} >> + for (; e < 5; e++) >> +f[e] = 1; >> +} >> + >> +int >> +main () >> +{ >> + fn2 (); >> + return 0; >> +} >> >> Index: gcc/tree-vrp.c >> === >> --- gcc/tree-vrp.c (revision 211137) >> +++ gcc/tree-vrp.c (working copy) >> @@ -9794,7 +9794,7 @@ execute_vrp (void) >> >>scev_finalize (); >>loop_optimizer_finalize (); >> - return 0; >> + return TODO_rebuild_frequencies; >> } >> >> namespace {
Re: [PATCH] rebuild frequency after vrp
On 06/02/14 10:17, Dehao Chen wrote: We need to rebuild frequency after vrp, otherwise the following code in tree-ssa-threadupdate.c will make the frequency larger than upper-bound. /* Excessive jump threading may make frequencies large enough so the computation overflows. */ if (rd->dup_blocks[0]->frequency < BB_FREQ_MAX * 2) rd->dup_blocks[0]->frequency += EDGE_FREQUENCY (e); This is referring to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61384 Can you try this with Teresa's revamping of the jump threading frequency updates? It's still in my queue of things to review. Fixing this stuff in the updater would be better than rebuilding the frequencies, IMHO. Jeff
Re: [PATCH] rebuild frequency after vrp
Just tried with Teresa's patch, the ICE in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61384 is not resolved. Dehao On Mon, Jun 2, 2014 at 9:45 AM, Jeff Law wrote: > On 06/02/14 10:17, Dehao Chen wrote: >> >> We need to rebuild frequency after vrp, otherwise the following code >> in tree-ssa-threadupdate.c will make the frequency larger than >> upper-bound. >> >>/* Excessive jump threading may make frequencies large enough >> so >> the computation overflows. */ >>if (rd->dup_blocks[0]->frequency < BB_FREQ_MAX * 2) >> rd->dup_blocks[0]->frequency += EDGE_FREQUENCY (e); >> >> This is referring to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61384 > > Can you try this with Teresa's revamping of the jump threading frequency > updates? It's still in my queue of things to review. > > Fixing this stuff in the updater would be better than rebuilding the > frequencies, IMHO. > > Jeff >
Re: [PATCH] rebuild frequency after vrp
> On 06/02/14 10:17, Dehao Chen wrote: > >We need to rebuild frequency after vrp, otherwise the following code > >in tree-ssa-threadupdate.c will make the frequency larger than > >upper-bound. > > > > /* Excessive jump threading may make frequencies large enough so > > the computation overflows. */ > > if (rd->dup_blocks[0]->frequency < BB_FREQ_MAX * 2) > > rd->dup_blocks[0]->frequency += EDGE_FREQUENCY (e); > > > >This is referring to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61384 > Can you try this with Teresa's revamping of the jump threading > frequency updates? It's still in my queue of things to review. > > Fixing this stuff in the updater would be better than rebuilding the > frequencies, IMHO. Agreed. However one of problems is that the frequencies after VRP may be unfixable. If i.e. you have if (a) prob 20 else prob 80 if (a) prob 50 else prob 50 that can easily happen if the second conditional was more convoluted earlier in the optimization queue. We may track when this change and ask for frequency rebuild, but issue is that I am not completely sure if that is going to give us consistently more reliable profiles than without: the problem is that branch probabilities themselves may be misupdated by earlier passes. It may be interesting to do some analysis how well estimated profile corelate with measured profile thorough the optimization. The problem above would be less troublesome if we jump threaded at least some cases pre-branch prediction. It is one of reasions why I think it would be cool to do jump threading in early opts, too, at least in a lightweidt form. Now when VRP is able to annotate SSA names, VRP results could feed loop estimation analysis for i.e. get upper bound on iterations of: if (a<50) for (i=0;i > Jeff
Re: [PATCH] rebuild frequency after vrp
> We need to rebuild frequency after vrp, otherwise the following code > in tree-ssa-threadupdate.c will make the frequency larger than > upper-bound. > > /* Excessive jump threading may make frequencies large enough so > the computation overflows. */ > if (rd->dup_blocks[0]->frequency < BB_FREQ_MAX * 2) > rd->dup_blocks[0]->frequency += EDGE_FREQUENCY (e); > > This is referring to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61384 I see, I still think C++ conversion gives us chance to turn both counts and frequencies to types that are safe WRT such calculations. Counts can be either software floats as discussed or fixed point 64bit type (I think we only need something like 8-10 bits to get problems Teresa is running into resolved - the cases where expanding statements gets sub 1 counts that needs to be diferent from 0) that does caps instead of overflow. Frequencies can be either the same or 32bit fixed point with capping as we do, sort of, now. I would really welcome if someone worked on this rather than papering around the bugs in current hand written fixed point arithmetics. (I am currently occupied by other projects, so I am not sure I can do it myself really soon) Honza
Re: [PATCH] rebuild frequency after vrp
On 06/02/14 12:07, Jan Hubicka wrote: It is one of reasions why I think it would be cool to do jump threading in early opts, too, at least in a lightweidt form. Conceptually it's pretty easy to do during the into-ssa step. Not sure it it'd catch the cases you care about though. jeff
Re: [PATCH] rebuild frequency after vrp
On Mon, Jun 2, 2014 at 10:26 AM, Dehao Chen wrote: > Just tried with Teresa's patch, the ICE in > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61384 is not resolved. I will take a look to see why this wasn't fixed by my profile redesign. Teresa > > Dehao > > On Mon, Jun 2, 2014 at 9:45 AM, Jeff Law wrote: >> On 06/02/14 10:17, Dehao Chen wrote: >>> >>> We need to rebuild frequency after vrp, otherwise the following code >>> in tree-ssa-threadupdate.c will make the frequency larger than >>> upper-bound. >>> >>>/* Excessive jump threading may make frequencies large enough >>> so >>> the computation overflows. */ >>>if (rd->dup_blocks[0]->frequency < BB_FREQ_MAX * 2) >>> rd->dup_blocks[0]->frequency += EDGE_FREQUENCY (e); >>> >>> This is referring to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61384 >> >> Can you try this with Teresa's revamping of the jump threading frequency >> updates? It's still in my queue of things to review. >> >> Fixing this stuff in the updater would be better than rebuilding the >> frequencies, IMHO. >> >> Jeff >> -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: [PATCH] rebuild frequency after vrp
On Mon, Jun 2, 2014 at 11:04 AM, Teresa Johnson wrote: > On Mon, Jun 2, 2014 at 10:26 AM, Dehao Chen wrote: >> Just tried with Teresa's patch, the ICE in >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61384 is not resolved. > > I will take a look to see why this wasn't fixed by my profile redesign. Turns out this is a case of garbage-in, garbage-out. Jump threading is producing a block with a frequency >10K, but it is directly due to a block with an invalid frequency after the upstream ccp2 pass. After ccp2: ;; basic block 3, loop depth 1, count 0, freq 9100, maybe hot ;;prev block 2, next block 4, flags: (NEW, REACHABLE) ;;pred: 10 [91.0%] (TRUE_VALUE,EXECUTABLE) ... ;;succ: 5 [50.0%] (TRUE_VALUE,EXECUTABLE) ;;4 [50.0%] (FALSE_VALUE,EXECUTABLE) ;; basic block 4, loop depth 1, count 0, freq 6825, maybe hot ;; Invalid sum of incoming frequencies 4550, should be 6825 ;;prev block 3, next block 5, flags: (NEW, REACHABLE) ;;pred: 3 [50.0%] (FALSE_VALUE,EXECUTABLE) ;; , discriminator 4 ;;succ: 5 [100.0%] (FALLTHRU,EXECUTABLE) Jump threading introduced BB 13, with predecessors 3 and 4 shown above: ;; basic block 13, loop depth 1, count 0, freq 11375, maybe hot ;; Invalid sum of outgoing probabilities 40.0% ;;prev block 12, next block 1, flags: (NEW, REACHABLE) ;;pred: 4 [100.0%] (FALLTHRU,EXECUTABLE) ;;3 [50.0%] (TRUE_VALUE,EXECUTABLE) 11375 = 9100/2 + 6825 If bb 4 had the correct frequency of 4550 (9100/2), this block would have gotten the correct count of 9100. BB 13's successor has the correct frequency of 9100. Teresa > > Teresa > >> >> Dehao >> >> On Mon, Jun 2, 2014 at 9:45 AM, Jeff Law wrote: >>> On 06/02/14 10:17, Dehao Chen wrote: We need to rebuild frequency after vrp, otherwise the following code in tree-ssa-threadupdate.c will make the frequency larger than upper-bound. /* Excessive jump threading may make frequencies large enough so the computation overflows. */ if (rd->dup_blocks[0]->frequency < BB_FREQ_MAX * 2) rd->dup_blocks[0]->frequency += EDGE_FREQUENCY (e); This is referring to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61384 >>> >>> Can you try this with Teresa's revamping of the jump threading frequency >>> updates? It's still in my queue of things to review. >>> >>> Fixing this stuff in the updater would be better than rebuilding the >>> frequencies, IMHO. >>> >>> Jeff >>> > > > > -- > Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413 -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: [PATCH] rebuild frequency after vrp
> On 06/02/14 12:07, Jan Hubicka wrote: > > > >It is one of reasions why I think it would be cool to do jump threading in > >early opts, too, at least in a lightweidt form. > Conceptually it's pretty easy to do during the into-ssa step. Not > sure it it'd catch the cases you care about though. Yep, it may be an option, too. I always had in mind Muchnick style cheap DOM pass run during into-SSA or as very first cleanup to get rid of unnecesary code quickly and cheaply. Our DOM is bit different beast though :). Very basic jump threading may fit here - never really tought about that. You probably know better than me how many of threading oppurtunities are "obvious" and how many are the difficult ones solved by VRP&DOM. My gut feeling is that doing the obvious ones may be good enough. But I do not know. Honza > > jeff >
Re: [PATCH] rebuild frequency after vrp
On 06/02/14 14:27, Jan Hubicka wrote: On 06/02/14 12:07, Jan Hubicka wrote: It is one of reasions why I think it would be cool to do jump threading in early opts, too, at least in a lightweidt form. Conceptually it's pretty easy to do during the into-ssa step. Not sure it it'd catch the cases you care about though. Yep, it may be an option, too. I always had in mind Muchnick style cheap DOM pass run during into-SSA or as very first cleanup to get rid of unnecesary code quickly and cheaply. Our DOM is bit different beast though :). Very basic jump threading may fit here - never really tought about that. We had all this about 10 years ago ;-) DOM actually started its life a Morgan-esque bolt onto the renamer. You probably know better than me how many of threading oppurtunities are "obvious" and how many are the difficult ones solved by VRP&DOM. My gut feeling is that doing the obvious ones may be good enough. But I do not know. A huge number of them are trivially derivable by just looking at the PHI nodes in the current block and I've pondered many times having a simpler jump threading pass that runs early in the gimple pipeline. Jeff
Re: [PATCH] rebuild frequency after vrp
> On 06/02/14 14:27, Jan Hubicka wrote: > >>On 06/02/14 12:07, Jan Hubicka wrote: > >>> > >>>It is one of reasions why I think it would be cool to do jump threading in > >>>early opts, too, at least in a lightweidt form. > >>Conceptually it's pretty easy to do during the into-ssa step. Not > >>sure it it'd catch the cases you care about though. > > > >Yep, it may be an option, too. I always had in mind Muchnick style cheap > >DOM pass run during into-SSA or as very first cleanup to get rid of > >unnecesary > >code quickly and cheaply. Our DOM is bit different beast though :). > >Very basic jump threading may fit here - never really tought about that. > We had all this about 10 years ago ;-) DOM actually started its > life a Morgan-esque bolt onto the renamer. Yep, I know :)) But being one of first SSA passes, it envoled to be bit too smart for what I want. I used to have some numbers on running dom or VRP early. I suppose I can re-test them and see how much benefits that makes. This things are bit hard to get - strenghtening early opts always makes inliner to increase its activity making a mess of code size comparsions. > > > > >You probably know better than me how many of threading oppurtunities are > >"obvious" and how many are the difficult ones solved by VRP&DOM. My gut > >feeling > >is that doing the obvious ones may be good enough. But I do not know. > A huge number of them are trivially derivable by just looking at the > PHI nodes in the current block and I've pondered many times having a > simpler jump threading pass that runs early in the gimple pipeline. Yep, I think it would make sense indeed even though we currently have 3 threading passes and I am responsible for one of them already :) Honza > > Jeff