https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64928

--- Comment #36 from Richard Biener <rguenth at gcc dot gnu.org> ---
So the issue is still the same - one thing I noticed is that store-motion also
adds a flag for each counter update to avoid introducing store-data-races.
-fallow-store-data-races mitigates that part and speeds up the compilation
quite a bit.  In case there are threads involved you'd want
-fprofile-update=atomic
which then causes store-motion to give up and the compile-time is great
overall.

The original trigger of the regression is likely the marking of the profile
counters as to not be aliased - we might want to introduce another flag to
tell that store-data-races for the particular decl are not a consideration
(maybe even have some user-visible attribute for this).

Otherwise re-confirmed (I stripped options down to -O -fPIC -fprofile-arcs
-ftest-coverage):

rguenther@ryzen:/tmp> /usr/bin/time ~/install/gcc-11.0/usr/local/bin/gcc -S -O
-fPIC -fprofile-arcs -ftest-coverage fib-2.o1-fib-2.i
1.84user 0.05system 0:01.90elapsed 99%CPU (0avgtext+0avgdata
160764maxresident)k
0inputs+0outputs (0major+58129minor)pagefaults 0swaps
rguenther@ryzen:/tmp> /usr/bin/time ~/install/gcc-11.0/usr/local/bin/gcc -S -O
-fPIC -fprofile-arcs -ftest-coverage fib-3.o1-fib-3.i 
10.15user 0.17system 0:10.32elapsed 99%CPU (0avgtext+0avgdata
726688maxresident)k
0inputs+0outputs (0major+265008minor)pagefaults 0swaps
rguenther@ryzen:/tmp> /usr/bin/time ~/install/gcc-11.0/usr/local/bin/gcc -S -O
-fPIC -fprofile-arcs -ftest-coverage fib-4.o1-fib-4.i 
43.60user 1.06system 0:44.68elapsed 99%CPU (0avgtext+0avgdata
6107260maxresident)k
0inputs+0outputs (0major+1765217minor)pagefaults 0swaps
rguenther@ryzen:/tmp> /usr/bin/time ~/install/gcc-11.0/usr/local/bin/gcc -S -O
-fPIC -fprofile-arcs -ftest-coverage fib-5.o1-fib-5.i 
gcc: fatal error: Killed signal terminated program cc1
compilation terminated.
Command exited with non-zero status 1
143.09user 3.93system 2:28.29elapsed 99%CPU (0avgtext+0avgdata
24636148maxresident)k
37504inputs+0outputs (31major+6133278minor)pagefaults 0swaps

on the last which runs OOM adding -fallow-store-data-races does

rguenther@ryzen:/tmp> /usr/bin/time ~/install/gcc-11.0/usr/local/bin/gcc -S -O
-fPIC -fprofile-arcs -ftest-coverage fib-5.o1-fib-5.i -fallow-store-data-races
123.06user 0.45system 2:03.59elapsed 99%CPU (0avgtext+0avgdata
1777700maxresident)k
57304inputs+0outputs (68major+535127minor)pagefaults 0swaps

and -fprofile-update=atomic

rguenther@ryzen:/tmp> /usr/bin/time ~/install/gcc-11.0/usr/local/bin/gcc -S -O
-fPIC -fprofile-arcs -ftest-coverage fib-5.o1-fib-5.i -fprofile-update=atomic 
0.61user 0.02system 0:00.63elapsed 100%CPU (0avgtext+0avgdata
73236maxresident)k
72inputs+0outputs (0major+18284minor)pagefaults 0swaps

and -fno-tree-loop-im

rguenther@ryzen:/tmp> /usr/bin/time ~/install/gcc-11.0/usr/local/bin/gcc -S -O
-fPIC -fprofile-arcs -ftest-coverage fib-5.o1-fib-5.i -fno-tree-loop-im      
1.06user 0.01system 0:01.07elapsed 99%CPU (0avgtext+0avgdata 90672maxresident)k
0inputs+0outputs (0major+24331minor)pagefaults 0swaps

I still wonder if you can produce an even smaller testcase where visualizing
the CFG is possible.  Unfortunately the source is mechanically generated
and following it is hard.  Like a testcase that retains the basic structure
but ends up with just a few (2, less than 10) computed gotos?

Reply via email to