On Mon, Jun 24, 2024 at 7:35 PM Andrew Pinski <pins...@gmail.com> wrote:
>
> On Mon, Jun 24, 2024 at 7:20 PM Andrew MacLeod <amacl...@redhat.com> wrote:
> >
> >
> > On 6/22/24 09:15, Richard Biener wrote:
> > > On Fri, Jun 21, 2024 at 3:02 PM Andrew MacLeod <amacl...@redhat.com> 
> > > wrote:
> > >> This patch adds
> > >>
> > >>       --param=vrp-block-limit=N
> > >>
> > >> When the basic block counter for a function exceeded 'N' , VRP is
> > >> invoked with the new fast_vrp algorithm instead.   This algorithm uses a
> > >> lot less memory and processing power, although it does get a few less
> > >> things.
> > >>
> > >> Primary motivation is cases like
> > >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114855 in which the 3  VRP
> > >> passes consume about 600 seconds of the compile time, and a lot of
> > >> memory.      With fast_vrp, it spends less than 10 seconds total in the
> > >> 3 passes of VRP.     This test case has about 400,000 basic blocks.
> > >>
> > >> The default for N in this patch is 150,000,  arbitrarily chosen.
> > >>
> > >> This bootstraps, (and I bootstrapped it with --param=vrp-block-limit=0
> > >> as well) on x86_64-pc-linux-gnu, with no regressions.
> > >>
> > >> What do you think, OK for trunk?
> > > +      if (last_basic_block_for_fn (fun) > param_vrp_block_limit ||
> > > +         &data == &pass_data_fast_vrp)
> > >
> > > || goes to the next line.
> > >
> > > Btw, we have -Wdisabled-optimization for these cases which should
> > > say sth like "function has excess of %d number of basic blocks
> > > (--param vrp-block-limit=%d), using fast VRP algorithm"
> > > or so in this case.
> > >
> > > As I wrote in the PR the priority should be -O1 compile-time
> > > performance and memory use.
> >
> >
> > Yeah, I just wanted to use it as a model for "bad" cases for ranger.
> > Adjusted patch attached which now issues the warning.
> >
> > I also found that the transitive relations were causing a small blowup
> > in time for relation processing now that I turned relations on for fast
> > VRP.  I commited a patch and fast_vrp no longer does transitives.
> >
> > If you want to experiment with enabling fast VRP at -O1, it should be
> > fast all the time now.  I think :-)    This testcase runs in about 95
> > seconds on my test machine.  if I turn on VRP, a single VRP pass takes
> > about 2.5 seconds.    Its all set up, you can just add:
> >
> > NEXT_PASS (pass_fast_vrp);
> >
> > at an appropriate spot.
> >
> > > Richard.
> > >
> > >> Andrew
> > >>
> > >> PS sorry,. it doesn't help the threader in that PR :-(
> > > It's probably one of the known bottlenecks there - IIRC the path range
> > > queries make O(n^2) work.  I can look at this at some point as I've
> > > dealt with the slowness of this pass in the past.
> > >
> > > There is param_max_jump_thread_paths that should limit searching
> > > but there is IIRC no way to limit the work path ranger actually does
> > > when resolving the query.
> > >
> > Yeah, Id like to talk to Aldy about revamping the threader now that some
> > of the newer facilities are available that fast_vrp uses.
> >
> > We can calculate all the outgoing ranges for a block at once with :
> >
> >    // Fill ssa-cache R with any outgoing ranges on edge E, using QUERY.
> >    bool gori_on_edge (class ssa_cache &r, edge e, range_query *query =
> > NULL);
> >
> > This is what the fast_vrp routines uses.  We can gather all range
> > restrictions generated from an edge efficiently just once and then
> > intersect them with a known range as we walk the different paths. We
> > don't need the gori exports , nor any of the other on-demand bits where
> > we calculate each export range dynamically.. I suspect it would reduce
> > the workload and memory impact quite a bit, but I'm not really familiar
> > with exactly how the threader uses those things.
> >
> > It'd require some minor tweaking to the lazy_ssa_cache to make the
> > bitmap of names set accessible. This  would provide similar
> > functionality to what the gori export () routine provides.  Both
> > relations and inferred ranges should only need to be calculated once per
> > block as well and could/should/would be applied the same way if they are
> > present.   I don't *think* the threader uses any of the def chains, but
> > Aldy can chip in.
>
> +   warning (OPT_Wdisabled_optimization,
> +    "Using fast VRP algorithm. %d basic blocks"
> +    " exceeds %s%d limit",
> +    n_basic_blocks_for_fn (fun),
> +    "--param=vrp-block-limit=",
> +    param_vrp_block_limit);
>
> This should be:
> warning (OPT_Wdisabled_optimization, "Using fast VRP algorithm. %d basic 
> blocks"
>     " exceeds %<%--param=vrp-block-limit=d%> limit",
> n_basic_blocks_for_fn (fun), param_vrp_block_limit);
>
> I had thought it was mentioned that options should be quoted but it is
> not mentioned in the coding conventions:
> https://gcc.gnu.org/codingconventions.html#Diagnostics
>
> But it is mentioned in
> https://inbox.sourceware.org/gcc/2d2bd844-2de4-ecff-7a07-b22350750...@gmail.com/
> ; This is why you were getting an error as you mentioned on IRC.

Note I filed https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115627 to
add it to the coding conventions. I might take a stab at adding the
rules mentioned in
https://inbox.sourceware.org/gcc-patches/8ac62fe2-e4bf-0922-4947-fca9567a0...@gmail.com/
since those are the ones which are detected right now.

Thanks,
Andrew

>
>
> >
> > Andrew

Reply via email to