On 10/13/2021 2:58 PM, Andrew MacLeod wrote:
As work has progressed, we're pretty close to being able to
functionally replace VRP with another EVRP pass. At least it seems
close enough that we should discuss if thats something we might want
to consider for this release. Replacing just one of the 2 VRP passes
is another option.
First, lets examine simplifications/folds.
Running over my set of 380 GCC source files, we see the following
results for number of cases we currently get:
Number of EVRP cases : 5789
Number of VRP1 cases : 4913
Number of VRP2 cases : 279
combined VRP1/2: 5192
The 2 passes of VRP get a total of 5192 cases.
If we run EVRP instead of each VRP pass, we get the following results:
Number of EVRP1 cases : 5789
Number of EVRP2 cases : 7521
Number of EVRP3 cases : 2240
combined EVRP2/3: 9761
so the EVRP passes find an additional 4569 opportunities, or 88%
more. This initially surprised me until it occurred to me that this is
probably due to the pruning require for ASSERT_EXPRs, which means it
never had a chance to see some of those cases. Notice how the second
pass appears far more effective now.
Wow. It'd be nice to know for sure if it's the pruning, but the data is
pretty clear, EVRP does a considerably better job.
Regarding what we would miss if we took VRP out, if we run EVRP
passes first then a VRP pass immediately after, we see what VRP finds
that EVRP cannot:
Number of EVRP2 cases : 7521
Number of VRP1 cases : 11
Number of EVRP3 cases : 2269
Number of VRP2 cases : 54
I have looked at some of these, and so far they all appear to be cases
which are solved via the iteration model VRP uses. regardless, missing
65 cases and getting 4569 new ones would seem to be a win. I will
continue to investigate them.
Well, that would seem to support our long-held belief that the iteration
step just isn't that important. Good to finally get a confirmation.
== Performance ==
The threading work has been pulled out of VRP, so we get a better idea
of what VRPs time really is. we're showing about a 58% slowdown in VRP
over the 2 passes. I've begun investigating because it shouldn't be
off by that much, Im seeing a lot of excess time being wasted with
callback queries from the substitute_and_fold engine when processing
PHIs. It should be possible to bring this all back in line as that
isnt the model ranger should be using anyway.
I figured while I'm looking into the performance side of it, maybe we
should start talking about whether we want to replace one or both of
the VRP passes with an EVRP instance.
I see 3 primary options:
1 - replace both VRP passes with EVRP instances.
2 - replace VRP2 with EVRP2
3 - Replace neither, leave it as is.
I figure since the second pass of VRP doesn't get a lot to start with,
it probably doesn't make sense to replace VRP1 and not VRP2.
Option 1 is what I would expect the strategic move next release to be,
it seems ready now, its just a matter of whether we want to give it
more time. It would also be trivial to turn VRP back on for one for
both later in the cycle if we determines there was something important
missing.
option 2 is something we ought to really consider if we don't want to
do option 1. There are a few PRs that are starting to open that have
VRP not getting something due to the whims of more precise
mutli-ranges being converted back to a value_range, and replacing VRP2
would allow us to catch those.. plus, we pick up a lot more than
VRP2 does.
I would propose we add a param, similar to what EVRP has which will
allow us to choose which pass is called for VRP1 and VRP2 and set our
defaults appropriately. I wouldn't work with a hybrid like we did
with EVRP... just choose which pass runs. And we'll have to adjust
some testcases based one whatever our default is.
Thoughts?
Personally I think we give option 1 a go and if something shows up
over the next couple of months, or we cant get performance in line
with where we want it, then we can switch back to VRP for one or both
passes. I wouldn't expect either, but one never knows :-)
If that isn't palatable for everyone, then I'd suggest option 2
I'd be happy with #1 or #2. The only one I don't like is #3.
Jeff