> AFAICT revision 172430 fixed the original problem in pr45810:
> 
> gfc -Ofast -fwhole-program fatigue.f90       : 6.301u 0.003s 0:06.30
> gfc -Ofast -fwhole-program -flto fatigue.f90 : 6.263u 0.003s 0:06.26
> 
> However, if I play with --param max-inline-insns-auto=*, I get
> 
> gfc -Ofast -fwhole-program --param max-inline-insns-auto=124 -fstack-arrays fatigue.f90 : 4.870u 0.002s 0:04.87
> gfc -Ofast -fwhole-program --param max-inline-insns-auto=125 -fstack-arrays fatigue.f90 : 2.872u 0.002s 0:02.87
> 
> and
> 
> gfc -Ofast -fwhole-program -flto --param max-inline-insns-auto=515 -fstack-arrays fatigue.f90 : 4.965u 0.003s 0:04.97
> gfc -Ofast -fwhole-program -flto --param max-inline-insns-auto=516 -fstack-arrays fatigue.f90 : 2.732u 0.002s 0:02.73
> 
> while I get the same threshold=125 with and without -flto at revision 172429.
> Note that I get the same thresholds without -fstack-arrays; only the run
> times are larger.

Thanks for noticing. This was not really expected, but it seems to give some
insight. I just tested a new cleanup patch of mine in which I fixed a few
minor bugs in corner cases. One of those bugs, I noticed, was introduced by
this patch (an oversight while converting the code to the new accessor).

In the case of nested inlining, the stack usage was misaccounted, and
consequently we allowed more inlining than --param large-stack-frame-growth
would normally allow. With the cleanup patch, the vortex and wupwise
improvements seem to be gone, so I think they were due to this issue.
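To make the misaccounting concrete, here is a minimal C sketch of a stack-frame-growth check of this kind. It is a simplified model, not GCC's actual inliner code (which has additional conditions): `struct frame_info`, `inlined_stack_usage` and `stack_growth_ok` are hypothetical names, and the two constants are assumed to mirror the documented defaults of --param large-stack-frame (256 bytes) and --param large-stack-frame-growth (1000 percent). What it illustrates is that, for nested inlining, the offset of the enclosing inlined frames must be counted; dropping it lets a deeply nested inline slip under the limit.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of an inliner's stack-frame-growth check.
   Names are illustrative only; this is not GCC's ipa-inline code. */

#define LARGE_STACK_FRAME 256          /* assumed: --param large-stack-frame */
#define LARGE_STACK_FRAME_GROWTH 1000  /* assumed: --param large-stack-frame-growth (percent) */

struct frame_info {
  long self_stack;    /* bytes used by this body's own locals */
  long stack_offset;  /* where this body's frame starts inside the outermost
                         function after inlining (0 for the outermost) */
};

/* Stack usage at the point where the callee's frame would sit if it were
   inlined into CALLER.  For nested inlining, CALLER->stack_offset must
   already include every enclosing inlined frame; leaving it out
   underestimates the result. */
static long inlined_stack_usage(const struct frame_info *caller, long callee_stack)
{
  return caller->stack_offset + caller->self_stack + callee_stack;
}

/* Allow inlining unless the resulting usage exceeds both the percentage
   limit (relative to the outermost function's own frame) and the absolute
   "large frame" threshold. */
static bool stack_growth_ok(long outer_self_stack, const struct frame_info *caller,
                            long callee_stack)
{
  assert(outer_self_stack >= 0 && callee_stack >= 0);
  long limit = outer_self_stack
               + outer_self_stack * LARGE_STACK_FRAME_GROWTH / 100;
  long usage = inlined_stack_usage(caller, callee_stack);
  return usage <= limit || usage <= LARGE_STACK_FRAME;
}
```

For example, with an outermost frame of 100 bytes the growth limit is 1100 bytes; a 300-byte callee inlined into a 300-byte body that already sits at offset 600 needs 1200 bytes and is rejected, while the same call accounted with offset 0 appears to need only 600 bytes and is accepted.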

I never really tuned the stack frame growth heuristics, since they did not
cause any problems in the benchmarks. In Fortran this is quite different,
because the large I/O blocks hit them very commonly, so I will look into
making them more permissive. We can certainly just bump up the limits, and/or
we can teach the heuristics that if the call dominates the return, there is
not much stack usage to save by preventing inlining, since both stack frames
will end up on the stack anyway.
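A minimal sketch of the proposed relaxation, again as a simplified C model rather than GCC code: `struct call_edge`, its `dominates_exit` bit and `stack_growth_ok` are hypothetical, and the limits are assumed to mirror --param large-stack-frame and --param large-stack-frame-growth. When the call dominates the return, the callee's frame is pushed on every execution of the caller, so refusing to inline does not lower the peak stack depth at that call, and the growth check is skipped.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sketch of relaxing the stack growth check for calls that
   dominate the function's return.  Names and limits are assumptions. */

#define LARGE_STACK_FRAME 256          /* assumed: --param large-stack-frame */
#define LARGE_STACK_FRAME_GROWTH 1000  /* assumed: --param large-stack-frame-growth (percent) */

struct call_edge {
  long caller_stack;    /* caller's frame size before inlining */
  long callee_stack;    /* callee's frame size */
  bool dominates_exit;  /* hypothetical bit: call runs on every path to the return */
};

static bool stack_growth_ok(const struct call_edge *e)
{
  assert(e->caller_stack >= 0 && e->callee_stack >= 0);
  /* Both frames are on the stack at this call anyway, so preventing
     inlining would not reduce the peak depth here: do not block it. */
  if (e->dominates_exit)
    return true;
  long limit = e->caller_stack + e->caller_stack * LARGE_STACK_FRAME_GROWTH / 100;
  long usage = e->caller_stack + e->callee_stack;
  return usage <= limit || usage <= LARGE_STACK_FRAME;
}
```

Skipping the check outright is the most permissive variant; a more conservative heuristic might only raise the limit when the bit is set.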

This means adding a new bit recording whether a call edge dominates the exit,
and using this info. Simple IPA noreturn discovery can also be based on it,
and I recently noticed that this might be important for Mozilla, so I will
give it a try soonish.
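The noreturn discovery can be sketched the same way: a function cannot return normally if some call that dominates its exit targets a function that never returns, and iterating this to a fixed point propagates the property up the call graph. This is a hypothetical C model (`struct func`, `struct edge`, `discover_noreturn` and the fixed `MAX_EDGES` bound are all assumptions), not GCC's cgraph or IPA code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_EDGES 4  /* hypothetical fixed bound to keep the sketch simple */

/* Hypothetical call-graph edge and node; not GCC's cgraph structures. */
struct edge {
  int callee;           /* index of the called function */
  bool dominates_exit;  /* call is on every path from entry to return */
};

struct func {
  bool noreturn;        /* seeded with known noreturn functions (e.g. abort) */
  size_t n_edges;
  struct edge edges[MAX_EDGES];
};

/* Propagate noreturn until nothing changes.  The property only ever goes
   from false to true, so the loop terminates. */
static void discover_noreturn(struct func *fns, size_t n)
{
  bool changed = true;
  while (changed) {
    changed = false;
    for (size_t i = 0; i < n; i++) {
      if (fns[i].noreturn)
        continue;
      for (size_t j = 0; j < fns[i].n_edges; j++) {
        const struct edge *e = &fns[i].edges[j];
        assert(e->callee >= 0 && (size_t)e->callee < n);
        /* A dominating call that never returns means the exit is unreachable. */
        if (e->dominates_exit && fns[e->callee].noreturn) {
          fns[i].noreturn = true;
          changed = true;
          break;
        }
      }
    }
  }
}
```

A call that does not dominate the exit (for example one guarded by an error check) is ignored, since the function may still return along the other paths.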

I will also look into the estimate_size ICE reported today.

Honza
