[Bug tree-optimization/103409] [12 Regression] 18% SPEC2017 WRF compile-time regression with -O2 -flto since r12-5228-gb7a23949b0dcc4205fcc2be6b84b91441faa384d

2021-12-03 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103409

Jan Hubicka  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #15 from Jan Hubicka  ---
The first graph seems to be back to normal, and I think the second is within
the noise range. If not, I will try to figure out what is happening here (one
is -O2 and the other -Ofast, so there may be something in that).

2021-12-03 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103409

--- Comment #14 from Aldy Hernandez  ---
(In reply to hubicka from comment #13)
> > I've fixed the threading slowdown.  Can someone verify and close this PR
> > if all the slowdown has been accounted for?  If not, then someone needs
> > to explore any slowdown unrelated to the threader.
> The plots linked from the PR are live, so they should come back to
> original speed (so far they did not).
> 
> https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=226.548.8
> and
> https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=287.548.8

There's now a big drop for the first graph, and a small drop for the second
one.

2021-12-01 Thread hubicka at kam dot mff.cuni.cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103409

--- Comment #13 from hubicka at kam dot mff.cuni.cz ---
> I've fixed the threading slowdown.  Can someone verify and close this PR
> if all the slowdown has been accounted for?  If not, then someone needs
> to explore any slowdown unrelated to the threader.
The plots linked from the PR are live, so they should come back to
original speed (so far they did not).

https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=226.548.8
and
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=287.548.8

2021-12-01 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103409

--- Comment #12 from Aldy Hernandez  ---
I've fixed the threading slowdown.  Can someone verify and close this PR if all
the slowdown has been accounted for?  If not, then someone needs to explore any
slowdown unrelated to the threader.

2021-12-01 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103409

--- Comment #11 from CVS Commits  ---
The master branch has been updated by Aldy Hernandez :

https://gcc.gnu.org/g:54ebec35abec09a24b47b997172622ca0d8e2318

commit r12-5694-g54ebec35abec09a24b47b997172622ca0d8e2318
Author: Aldy Hernandez 
Date:   Mon Nov 29 14:49:59 2021 +0100

path solver: Use only one ssa_global_cache.

We're using a temporary range cache while computing ranges for PHIs to
make sure the real cache doesn't get set until all PHIs are computed.
With the ltrans beast in LTO mode this causes undue overhead.

Since we already have a bitmap to indicate whether there's a cache
entry, we can avoid the extra cache object by clearing it while PHIs
are being calculated.

gcc/ChangeLog:

PR tree-optimization/103409
* gimple-range-path.cc (path_range_query::compute_ranges_in_phis):
Do all the work with just one ssa_global_cache.
* gimple-range-path.h: Remove m_tmp_phi_cache.
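
A toy model of the idea described in the commit message above, written in Python rather than GCC's C++ to keep it short. The names `ToyPathCache`, `compute_phis`, and the `phis` mapping are hypothetical and are not GCC's API; the only thing taken from the commit message is the scheme itself: keep a single value store, and use the existing has-entry bitmap (modelled here as a set) to hide PHI results until all PHIs have been computed.

```python
class ToyPathCache:
    """One value store plus a 'has entry' set (stands in for the bitmap)."""

    def __init__(self):
        self.values = {}         # name -> cached value (may be hidden)
        self.has_entry = set()   # names whose cached value is visible

    def get(self, name, default):
        # A cached value is only visible while its bit is set.
        return self.values[name] if name in self.has_entry else default

    def set(self, name, value):
        self.values[name] = value
        self.has_entry.add(name)


def compute_phis(cache, phis, on_entry):
    """Resolve every PHI against on-entry values, then publish them together.

    phis maps a PHI result name to the name it copies from (a toy PHI).
    """
    pending = set()
    for result, source in phis.items():
        value = cache.get(source, on_entry[source])
        cache.set(result, value)
        pending.add(result)
        # Hide the new entry until every PHI has been computed.
        cache.has_entry.discard(result)
    # Publish all PHI results at once.
    cache.has_entry |= pending


# Two PHIs that swap a and b: each must read the other's on-entry value.
on_entry = {"a": 1, "b": 2}
cache = ToyPathCache()
compute_phis(cache, {"a": "b", "b": "a"}, on_entry)
result = (cache.get("a", None), cache.get("b", None))
print(result)
```

In this model, dropping the `discard` call would let the second PHI read the first PHI's freshly stored result instead of the on-entry value, which is the problem the separate temporary cache was guarding against.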

2021-11-29 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103409

--- Comment #10 from Aldy Hernandez  ---
Created attachment 51896
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51896&action=edit
untested patch

The threading slowdown here is due to the ssa_global_cache temporary.  It
doesn't look like ssa_global_cache was meant to be a lightweight temporary
cache ;-).

We can avoid the temporary altogether by using the bitmap already used to
determine if a cache entry is available.  With this (untested) patch the
ltrans42 unit is back to:

                                usr            sys           wall       GGC
 tree VRP                 :   13.70 (  3%)    0.04 (  2%)   13.71 (  3%)     45M (  4%)
 backwards jump threading :   13.22 (  3%)    0.01 (  0%)   13.26 (  3%)   3609k (  0%)

2021-11-29 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103409

--- Comment #9 from Aldy Hernandez  ---
There's definitely something in the threader, but I'm not sure it's the cause
of all the regression.

For the record, I've reproduced on ppc64le with a spec .cfg file having:

OPTIMIZE= -O2 -flto=100 -save-temps -ftime-report -v -fno-checking

The slow wrf_r.ltransNN.o files that dominate the compilation, each taking
more than 2-3 seconds, are 42, 76, and 24.  I've distilled -ftime-report for
VRP and jump threading, which usually go hand in hand now that VRP2 runs with
ranger:

                                           usr             sys            wall            GGC
dumping.42: tree VRP                 :   13.70 (  3%)     0.08 (  2%)    13.73 (  3%)     45M (  4%)
dumping.42: backwards jump threading :   26.68 (  5%)     0.00 (  0%)    26.72 (  5%)   3609k (  0%)
dumping.42: TOTAL                    :  524.00            3.31          527.30          1277M
dumping.76: tree VRP                 :   38.30 ( 13%)     0.03 (  2%)    38.31 ( 13%)     19M (  2%)
dumping.76: backwards jump threading :   47.38 ( 17%)     0.01 (  1%)    47.37 ( 16%)   1671k (  0%)
dumping.76: TOTAL                    :  286.03            1.79          287.82          1173M
dumping.24: tree VRP                 :   87.43 (  8%)     0.07 (  2%)    87.53 (  8%)     58M (  3%)
dumping.24: backwards jump threading :  129.81 ( 12%)     0.00 (  0%)   129.81 ( 12%)   8986k (  0%)
dumping.24: TOTAL                    : 1042.37            3.58         1045.93          2325M

Threading is usually more expensive than VRP because it tries candidates over
and over, but it's not meant to be orders of magnitude slower.  Prior to the
bisected patch in r12-5228, we had:

                                           usr             sys            wall            GGC
dumping.42: tree VRP                 :   14.58 (  3%)     0.07 (  2%)    14.62 (  3%)     45M (  4%)
dumping.42: backwards jump threading :   13.88 (  3%)     0.00 (  0%)    13.89 (  3%)   3609k (  0%)
dumping.42: TOTAL                    :  484.12            3.06          487.18          1277M
dumping.76: tree VRP                 :   37.68 ( 13%)     0.04 (  2%)    37.79 ( 13%)     19M (  2%)
dumping.76: backwards jump threading :   45.50 ( 15%)     0.03 (  2%)    45.52 ( 15%)   1671k (  0%)
dumping.76: TOTAL                    :  293.74            1.81          295.55          1173M
dumping.24: tree VRP                 :   94.27 (  9%)     0.11 (  3%)    94.39 (  9%)     58M (  3%)
dumping.24: backwards jump threading :  102.63 ( 10%)     0.02 (  0%)   102.67 ( 10%)   8986k (  0%)
dumping.24: TOTAL                    : 1021.66            4.28         1025.92          2325M

So at least for ltrans42, there's a big slowdown with this patch.  Before,
threading was 4.80% faster than VRP, whereas now it's 94.7% slower.
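
For reference, the two percentages can be re-derived from the ltrans42 rows quoted above. This is only an arithmetic check in Python; the helper name `pct_diff` is made up for illustration, and the inputs are the first time column of the "tree VRP" and "backwards jump threading" rows.

```python
def pct_diff(threading, vrp):
    """Relative difference of threading time versus VRP time, in percent."""
    return (threading - vrp) / vrp * 100.0


before = pct_diff(13.88, 14.58)   # ltrans42 prior to r12-5228
after = pct_diff(26.68, 13.70)    # ltrans42 with r12-5228
print(f"before: {before:+.1f}%  after: {after:+.1f}%")
```

A negative value means threading took less time than VRP, a positive value means it took more.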

I have a patch for the above slowdown, but I wouldn't characterize the above
difference as a "compile hog".  When I add up the 3 ltrans unit totals (which
are basically the entire compilation), the difference is a 3% slowdown.
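
The 3% figure can be checked the same way. The sketch below sums the first time column of the three TOTAL rows quoted earlier in this comment, before and after r12-5228; the variable names are arbitrary, and the units are assumed to be seconds as printed by -ftime-report.

```python
# TOTAL rows for ltrans units 42, 76 and 24, copied from the -ftime-report
# excerpts in this comment (first time column).
before = [484.12, 293.74, 1021.66]   # prior to r12-5228
after = [524.00, 286.03, 1042.37]    # with r12-5228

slowdown = (sum(after) - sum(before)) / sum(before) * 100.0
print(f"{sum(before):.2f}s -> {sum(after):.2f}s ({slowdown:+.1f}%)")
```

The result comes out just under 3%, matching the estimate above.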

If this PR is for a larger than 3-4% slowdown, I think we should look
elsewhere.  I could be wrong though ;-).

2021-11-26 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103409

Jan Hubicka  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102943

--- Comment #8 from Jan Hubicka  ---
Thanks for bisecting!
So not modref, but jump threading this time. Linking with the other PR on WRF
and threading (perhaps those are different issues).

2021-11-26 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103409

Martin Liška  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[12 Regression] 18%         |[12 Regression] 18%
                   |SPEC2017 WRF compile-time   |SPEC2017 WRF compile-time
                   |regression with -O2 -flto   |regression with -O2 -flto
                   |since                       |since
                   |r12-3903-g0288527f47cec669  |r12-5228-gb7a23949b0dcc4205fcc2be6b84b91441faa384d

--- Comment #7 from Martin Liška  ---
Ok, then it started with r12-5228-gb7a23949b0dcc4205fcc2be6b84b91441faa384d.