Re: [RFC] Replace VRP with EVRP passes

2021-10-14 Thread Jeff Law via Gcc-patches




On 10/13/2021 2:58 PM, Andrew MacLeod wrote:
As work has progressed, we're pretty close to being able to 
functionally replace VRP with another EVRP pass.  At least it seems 
close enough that we should discuss if thats something we might want 
to consider for this release.   Replacing just one of the 2 VRP passes 
is another option.


First, lets examine simplifications/folds.

Running over my set of 380 GCC source files, we see the following 
results for number of cases we currently get:


Number of EVRP cases :   5789
Number of VRP1 cases :   4913
Number of VRP2 cases :   279
 combined VRP1/2:   5192

The 2 passes of VRP get a total of 5192 cases.

If we run EVRP instead of each VRP pass, we get the following results:

Number of EVRP1 cases :   5789
Number of EVRP2 cases :   7521
Number of EVRP3 cases :   2240
 combined EVRP2/3:   9761

so the EVRP passes find an additional 4569 opportunities,  or 88% 
more. This initially surprised me until it occurred to me that this is 
probably due to the pruning require for ASSERT_EXPRs, which means it 
never had a chance to see some of those cases. Notice how the second 
pass appears far more effective now.
Wow.  It'd be nice to know for sure if it's the pruning, but the data is 
pretty clear, EVRP does a considerably better job.




Regarding what we would miss if we took VRP out, if we run  EVRP 
passes first then a VRP pass immediately after, we see what VRP finds 
that EVRP cannot:


Number of EVRP2 cases :  7521
Number of VRP1 cases :   11
Number of EVRP3 cases :  2269
Number of VRP2 cases :   54

I have looked at some of these, and so far they all appear to be cases 
which are solved via the iteration model VRP uses. regardless, missing 
65 cases and getting 4569 new ones would seem to be a win. I will 
continue to investigate them.
Well, that would seem to support our long-held belief that the iteration 
step just isn't that important.  Good to finally get a confirmation.





== Performance ==

The threading work has been pulled out of VRP, so we get a better idea 
of what VRPs time really is. we're showing about a 58% slowdown in VRP 
over the 2 passes.  I've begun investigating because it shouldn't be 
off by that much, Im seeing a lot of excess time being wasted with 
callback queries from the substitute_and_fold engine when processing 
PHIs.  It should be possible to bring this all back in line as that 
isnt the model ranger should be using anyway.


I figured while I'm looking into the performance side of it, maybe we 
should start talking about whether we want to replace one or both of 
the VRP passes with an EVRP instance.



I see 3 primary options:
1 - replace both VRP passes with EVRP instances.
2 - replace VRP2 with EVRP2
3 - Replace neither, leave it as is.

I figure since the second pass of VRP doesn't get a lot to start with, 
it probably doesn't make sense to replace VRP1 and not VRP2.


Option 1 is what I would expect the strategic move next release to be, 
it seems ready now, its just a matter of whether we want to give it 
more time.  It would also be trivial to turn VRP back on for one for 
both later in the cycle if we determines there was something important 
missing.


option 2 is something we ought to really consider if we don't want to 
do option 1.  There are a few PRs that are starting to open that have 
VRP not getting something due to the whims of more precise 
mutli-ranges being converted back to a value_range, and replacing VRP2 
would allow us to catch those..  plus, we pick up a lot more than  
VRP2 does.


I would propose we add a param, similar to what EVRP has which will 
allow us to choose which pass is called for VRP1 and VRP2 and set our 
defaults appropriately.  I wouldn't work with a hybrid like we did 
with EVRP... just choose which pass runs.     And we'll have to adjust 
some testcases based one whatever our default is.


Thoughts?

Personally I think we give option 1 a go and if something shows up 
over the next couple of months, or  we cant get performance in line 
with where we want it, then we can switch back to VRP for one or both 
passes.  I wouldn't  expect either, but one never knows :-)


If that isn't palatable for everyone, then I'd suggest option 2

I'd be happy with #1 or #2.   The only one I don't like is #3.

Jeff


Re: [RFC] Replace VRP with EVRP passes

2021-10-14 Thread Andrew MacLeod via Gcc-patches

On 10/14/21 8:54 AM, Richard Biener wrote:

On Thu, Oct 14, 2021 at 2:46 PM Andrew MacLeod  wrote:


I always test ada, objc.. anything that can be built as part of my
normal build :-)  so yes.

As far as I am aware, we are not missing any symbolic/relational cases,
that has all been functioning for a couple of months.



I think option 2 would be safest at least so any important iteration/symbolic
work is done early.


Certainly works as a good first step.

So let's pull the trigger on that one?


OK, we'll put something together.



Re: [RFC] Replace VRP with EVRP passes

2021-10-14 Thread Richard Biener via Gcc-patches
On Thu, Oct 14, 2021 at 2:46 PM Andrew MacLeod  wrote:
>
> On 10/14/21 4:27 AM, Richard Biener wrote:
> > On Wed, Oct 13, 2021 at 10:58 PM Andrew MacLeod  wrote:
> >> As work has progressed, we're pretty close to being able to functionally
> >> replace VRP with another EVRP pass.  At least it seems close enough that
> >> we should discuss if thats something we might want to consider for this
> >> release.   Replacing just one of the 2 VRP passes is another option.
> >>
> >> First, lets examine simplifications/folds.
> >>
> >> Running over my set of 380 GCC source files, we see the following
> >> results for number of cases we currently get:
> >>
> >> Number of EVRP cases :   5789
> >> Number of VRP1 cases :   4913
> >> Number of VRP2 cases :   279
> >>combined VRP1/2:   5192
> >>
> >> The 2 passes of VRP get a total of 5192 cases.
> >>
> >> If we run EVRP instead of each VRP pass, we get the following results:
> >>
> >> Number of EVRP1 cases :   5789
> >> Number of EVRP2 cases :   7521
> >> Number of EVRP3 cases :   2240
> >>combined EVRP2/3:   9761
> >>
> >> so the EVRP passes find an additional 4569 opportunities,  or 88% more.
> >> This initially surprised me until it occurred to me that this is
> >> probably due to the pruning require for ASSERT_EXPRs, which means it
> >> never had a chance to see some of those cases. Notice how the second
> >> pass appears far more effective now.
> >>
> >> Regarding what we would miss if we took VRP out, if we run  EVRP passes
> >> first then a VRP pass immediately after, we see what VRP finds that EVRP
> >> cannot:
> >>
> >> Number of EVRP2 cases :  7521
> >> Number of VRP1 cases :   11
> >> Number of EVRP3 cases :  2269
> >> Number of VRP2 cases :   54
> >>
> >> I have looked at some of these, and so far they all appear to be cases
> >> which are solved via the iteration model VRP uses.
> > Most should be now handled by using SCEV analysis on PHIs rather
> > than VRP iteration so can you share an example where the iteration helps?
>
> I didn't delve too deeply since I was skimming for whats missing, and
> they all seemed to be in very large programs with cyclic phis feeding
> each other.
>
> I'll pick one shortly and do a deeper dive.
>
>
> >
> >> regardless, missing
> >> 65 cases and getting 4569 new ones would seem to be a win. I will
> >> continue to investigate them.
> >>
> >> == Performance ==
> >>
> >> The threading work has been pulled out of VRP, so we get a better idea
> >> of what VRPs time really is. we're showing about a 58% slowdown in VRP
> >> over the 2 passes.  I've begun investigating because it shouldn't be off
> >> by that much, Im seeing a lot of excess time being wasted with callback
> >> queries from the substitute_and_fold engine when processing PHIs.  It
> >> should be possible to bring this all back in line as that isnt the model
> >> ranger should be using anyway.
> >>
> >> I figured while I'm looking into the performance side of it, maybe we
> >> should start talking about whether we want to replace one or both of the
> >> VRP passes with an EVRP instance.
> >>
> >>
> >> I see 3 primary options:
> >> 1 - replace both VRP passes with EVRP instances.
> >> 2 - replace VRP2 with EVRP2
> >> 3 - Replace neither, leave it as is.
> >>
> >> I figure since the second pass of VRP doesn't get a lot to start with,
> >> it probably doesn't make sense to replace VRP1 and not VRP2.
> >>
> >> Option 1 is what I would expect the strategic move next release to be,
> >> it seems ready now, its just a matter of whether we want to give it more
> >> time.  It would also be trivial to turn VRP back on for one for both
> >> later in the cycle if we determines there was something important missing.
> >>
> >> option 2 is something we ought to really consider if we don't want to do
> >> option 1.  There are a few PRs that are starting to open that have VRP
> >> not getting something due to the whims of more precise mutli-ranges
> >> being converted back to a value_range, and replacing VRP2 would allow us
> >> to catch those..  plus, we pick up a lot more than  VRP2 does.
> >>
> >> I would propose we add a param, similar to what EVRP has which will
> >> allow us to choose which pass is called for VRP1 and VRP2 and set our
> >> defaults appropriately.  I wouldn't work with a hybrid like we did with
> >> EVRP... just choose which pass runs. And we'll have to adjust some
> >> testcases based one whatever our default is.
> >>
> >> Thoughts?
> >>
> >> Personally I think we give option 1 a go and if something shows up over
> >> the next couple of months, or  we cant get performance in line with
> >> where we want it, then we can switch back to VRP for one or both
> >> passes.  I wouldn't  expect either, but one never knows :-)
> >>
> >> If that isn't palatable for everyone, then I'd suggest option 2
> > How far are you with handling the symbolic range cases VRP gets
> > with the relation work?  The 

Re: [RFC] Replace VRP with EVRP passes

2021-10-14 Thread Andrew MacLeod via Gcc-patches

On 10/14/21 4:27 AM, Richard Biener wrote:

On Wed, Oct 13, 2021 at 10:58 PM Andrew MacLeod  wrote:

As work has progressed, we're pretty close to being able to functionally
replace VRP with another EVRP pass.  At least it seems close enough that
we should discuss if thats something we might want to consider for this
release.   Replacing just one of the 2 VRP passes is another option.

First, lets examine simplifications/folds.

Running over my set of 380 GCC source files, we see the following
results for number of cases we currently get:

Number of EVRP cases :   5789
Number of VRP1 cases :   4913
Number of VRP2 cases :   279
   combined VRP1/2:   5192

The 2 passes of VRP get a total of 5192 cases.

If we run EVRP instead of each VRP pass, we get the following results:

Number of EVRP1 cases :   5789
Number of EVRP2 cases :   7521
Number of EVRP3 cases :   2240
   combined EVRP2/3:   9761

so the EVRP passes find an additional 4569 opportunities,  or 88% more.
This initially surprised me until it occurred to me that this is
probably due to the pruning require for ASSERT_EXPRs, which means it
never had a chance to see some of those cases. Notice how the second
pass appears far more effective now.

Regarding what we would miss if we took VRP out, if we run  EVRP passes
first then a VRP pass immediately after, we see what VRP finds that EVRP
cannot:

Number of EVRP2 cases :  7521
Number of VRP1 cases :   11
Number of EVRP3 cases :  2269
Number of VRP2 cases :   54

I have looked at some of these, and so far they all appear to be cases
which are solved via the iteration model VRP uses.

Most should be now handled by using SCEV analysis on PHIs rather
than VRP iteration so can you share an example where the iteration helps?


I didn't delve too deeply since I was skimming for whats missing, and 
they all seemed to be in very large programs with cyclic phis feeding 
each other.


I'll pick one shortly and do a deeper dive.





regardless, missing
65 cases and getting 4569 new ones would seem to be a win. I will
continue to investigate them.

== Performance ==

The threading work has been pulled out of VRP, so we get a better idea
of what VRPs time really is. we're showing about a 58% slowdown in VRP
over the 2 passes.  I've begun investigating because it shouldn't be off
by that much, Im seeing a lot of excess time being wasted with callback
queries from the substitute_and_fold engine when processing PHIs.  It
should be possible to bring this all back in line as that isnt the model
ranger should be using anyway.

I figured while I'm looking into the performance side of it, maybe we
should start talking about whether we want to replace one or both of the
VRP passes with an EVRP instance.


I see 3 primary options:
1 - replace both VRP passes with EVRP instances.
2 - replace VRP2 with EVRP2
3 - Replace neither, leave it as is.

I figure since the second pass of VRP doesn't get a lot to start with,
it probably doesn't make sense to replace VRP1 and not VRP2.

Option 1 is what I would expect the strategic move next release to be,
it seems ready now, its just a matter of whether we want to give it more
time.  It would also be trivial to turn VRP back on for one for both
later in the cycle if we determines there was something important missing.

option 2 is something we ought to really consider if we don't want to do
option 1.  There are a few PRs that are starting to open that have VRP
not getting something due to the whims of more precise mutli-ranges
being converted back to a value_range, and replacing VRP2 would allow us
to catch those..  plus, we pick up a lot more than  VRP2 does.

I would propose we add a param, similar to what EVRP has which will
allow us to choose which pass is called for VRP1 and VRP2 and set our
defaults appropriately.  I wouldn't work with a hybrid like we did with
EVRP... just choose which pass runs. And we'll have to adjust some
testcases based one whatever our default is.

Thoughts?

Personally I think we give option 1 a go and if something shows up over
the next couple of months, or  we cant get performance in line with
where we want it, then we can switch back to VRP for one or both
passes.  I wouldn't  expect either, but one never knows :-)

If that isn't palatable for everyone, then I'd suggest option 2

How far are you with handling the symbolic range cases VRP gets
with the relation work?  The symbolic range handling is important
for Ada IIRC.  I would of course have hoped the testsuite would
catch important regressions there (did you test Ada?)


I always test ada, objc.. anything that can be built as part of my 
normal build :-)  so yes.


As far as I am aware, we are not missing any symbolic/relational cases, 
that has all been functioning for a couple of months.




I think option 2 would be safest at least so any important iteration/symbolic
work is done early.


Certainly works as a good first step.


Re: [RFC] Replace VRP with EVRP passes

2021-10-14 Thread Richard Biener via Gcc-patches
On Wed, Oct 13, 2021 at 10:58 PM Andrew MacLeod  wrote:
>
> As work has progressed, we're pretty close to being able to functionally
> replace VRP with another EVRP pass.  At least it seems close enough that
> we should discuss if thats something we might want to consider for this
> release.   Replacing just one of the 2 VRP passes is another option.
>
> First, lets examine simplifications/folds.
>
> Running over my set of 380 GCC source files, we see the following
> results for number of cases we currently get:
>
> Number of EVRP cases :   5789
> Number of VRP1 cases :   4913
> Number of VRP2 cases :   279
>   combined VRP1/2:   5192
>
> The 2 passes of VRP get a total of 5192 cases.
>
> If we run EVRP instead of each VRP pass, we get the following results:
>
> Number of EVRP1 cases :   5789
> Number of EVRP2 cases :   7521
> Number of EVRP3 cases :   2240
>   combined EVRP2/3:   9761
>
> so the EVRP passes find an additional 4569 opportunities,  or 88% more.
> This initially surprised me until it occurred to me that this is
> probably due to the pruning require for ASSERT_EXPRs, which means it
> never had a chance to see some of those cases. Notice how the second
> pass appears far more effective now.
>
> Regarding what we would miss if we took VRP out, if we run  EVRP passes
> first then a VRP pass immediately after, we see what VRP finds that EVRP
> cannot:
>
> Number of EVRP2 cases :  7521
> Number of VRP1 cases :   11
> Number of EVRP3 cases :  2269
> Number of VRP2 cases :   54
>
> I have looked at some of these, and so far they all appear to be cases
> which are solved via the iteration model VRP uses.

Most should be now handled by using SCEV analysis on PHIs rather
than VRP iteration so can you share an example where the iteration helps?

> regardless, missing
> 65 cases and getting 4569 new ones would seem to be a win. I will
> continue to investigate them.
>
> == Performance ==
>
> The threading work has been pulled out of VRP, so we get a better idea
> of what VRPs time really is. we're showing about a 58% slowdown in VRP
> over the 2 passes.  I've begun investigating because it shouldn't be off
> by that much, Im seeing a lot of excess time being wasted with callback
> queries from the substitute_and_fold engine when processing PHIs.  It
> should be possible to bring this all back in line as that isnt the model
> ranger should be using anyway.
>
> I figured while I'm looking into the performance side of it, maybe we
> should start talking about whether we want to replace one or both of the
> VRP passes with an EVRP instance.
>
>
> I see 3 primary options:
> 1 - replace both VRP passes with EVRP instances.
> 2 - replace VRP2 with EVRP2
> 3 - Replace neither, leave it as is.
>
> I figure since the second pass of VRP doesn't get a lot to start with,
> it probably doesn't make sense to replace VRP1 and not VRP2.
>
> Option 1 is what I would expect the strategic move next release to be,
> it seems ready now, its just a matter of whether we want to give it more
> time.  It would also be trivial to turn VRP back on for one for both
> later in the cycle if we determines there was something important missing.
>
> option 2 is something we ought to really consider if we don't want to do
> option 1.  There are a few PRs that are starting to open that have VRP
> not getting something due to the whims of more precise mutli-ranges
> being converted back to a value_range, and replacing VRP2 would allow us
> to catch those..  plus, we pick up a lot more than  VRP2 does.
>
> I would propose we add a param, similar to what EVRP has which will
> allow us to choose which pass is called for VRP1 and VRP2 and set our
> defaults appropriately.  I wouldn't work with a hybrid like we did with
> EVRP... just choose which pass runs. And we'll have to adjust some
> testcases based one whatever our default is.
>
> Thoughts?
>
> Personally I think we give option 1 a go and if something shows up over
> the next couple of months, or  we cant get performance in line with
> where we want it, then we can switch back to VRP for one or both
> passes.  I wouldn't  expect either, but one never knows :-)
>
> If that isn't palatable for everyone, then I'd suggest option 2

How far are you with handling the symbolic range cases VRP gets
with the relation work?  The symbolic range handling is important
for Ada IIRC.  I would of course have hoped the testsuite would
catch important regressions there (did you test Ada?)

I think option 2 would be safest at least so any important iteration/symbolic
work is done early.

Thanks,
Richard.

> Andrew
>
>
>
>
>
>


[RFC] Replace VRP with EVRP passes

2021-10-13 Thread Andrew MacLeod via Gcc-patches
As work has progressed, we're pretty close to being able to functionally 
replace VRP with another EVRP pass.  At least it seems close enough that 
we should discuss if thats something we might want to consider for this 
release.   Replacing just one of the 2 VRP passes is another option.


First, lets examine simplifications/folds.

Running over my set of 380 GCC source files, we see the following 
results for number of cases we currently get:


Number of EVRP cases :   5789
Number of VRP1 cases :   4913
Number of VRP2 cases :   279
 combined VRP1/2:   5192

The 2 passes of VRP get a total of 5192 cases.

If we run EVRP instead of each VRP pass, we get the following results:

Number of EVRP1 cases :   5789
Number of EVRP2 cases :   7521
Number of EVRP3 cases :   2240
 combined EVRP2/3:   9761

so the EVRP passes find an additional 4569 opportunities,  or 88% more. 
This initially surprised me until it occurred to me that this is 
probably due to the pruning require for ASSERT_EXPRs, which means it 
never had a chance to see some of those cases. Notice how the second 
pass appears far more effective now.


Regarding what we would miss if we took VRP out, if we run  EVRP passes 
first then a VRP pass immediately after, we see what VRP finds that EVRP 
cannot:


Number of EVRP2 cases :  7521
Number of VRP1 cases :   11
Number of EVRP3 cases :  2269
Number of VRP2 cases :   54

I have looked at some of these, and so far they all appear to be cases 
which are solved via the iteration model VRP uses. regardless, missing 
65 cases and getting 4569 new ones would seem to be a win. I will 
continue to investigate them.


== Performance ==

The threading work has been pulled out of VRP, so we get a better idea 
of what VRPs time really is. we're showing about a 58% slowdown in VRP 
over the 2 passes.  I've begun investigating because it shouldn't be off 
by that much, Im seeing a lot of excess time being wasted with callback 
queries from the substitute_and_fold engine when processing PHIs.  It 
should be possible to bring this all back in line as that isnt the model 
ranger should be using anyway.


I figured while I'm looking into the performance side of it, maybe we 
should start talking about whether we want to replace one or both of the 
VRP passes with an EVRP instance.



I see 3 primary options:
1 - replace both VRP passes with EVRP instances.
2 - replace VRP2 with EVRP2
3 - Replace neither, leave it as is.

I figure since the second pass of VRP doesn't get a lot to start with, 
it probably doesn't make sense to replace VRP1 and not VRP2.


Option 1 is what I would expect the strategic move next release to be, 
it seems ready now, its just a matter of whether we want to give it more 
time.  It would also be trivial to turn VRP back on for one for both 
later in the cycle if we determines there was something important missing.


option 2 is something we ought to really consider if we don't want to do 
option 1.  There are a few PRs that are starting to open that have VRP 
not getting something due to the whims of more precise mutli-ranges 
being converted back to a value_range, and replacing VRP2 would allow us 
to catch those..  plus, we pick up a lot more than  VRP2 does.


I would propose we add a param, similar to what EVRP has which will 
allow us to choose which pass is called for VRP1 and VRP2 and set our 
defaults appropriately.  I wouldn't work with a hybrid like we did with 
EVRP... just choose which pass runs.     And we'll have to adjust some 
testcases based one whatever our default is.


Thoughts?

Personally I think we give option 1 a go and if something shows up over 
the next couple of months, or  we cant get performance in line with 
where we want it, then we can switch back to VRP for one or both 
passes.  I wouldn't  expect either, but one never knows :-)


If that isn't palatable for everyone, then I'd suggest option 2

Andrew