[Bug tree-optimization/79224] [7 Regression] Large C-Ray slowdown
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79224

Richard Biener changed:

           What    |Removed     |Added
   Target Milestone|---         |8.0
             Status|ASSIGNED    |RESOLVED
         Resolution|---         |FIXED

--- Comment #26 from Richard Biener ---
Fixed for GCC 8 I think.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79224

Jakub Jelinek changed:

           What    |Removed     |Added
   Target Milestone|8.5         |---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79224

Richard Biener changed:

           What    |Removed     |Added
             Status|NEW         |ASSIGNED
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79224

Jakub Jelinek changed:

           What    |Removed     |Added
   Target Milestone|8.4         |8.5

--- Comment #25 from Jakub Jelinek ---
GCC 8.4.0 has been released, adjusting target milestone.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79224

Jakub Jelinek changed:

           What    |Removed     |Added
   Target Milestone|8.3         |8.4

--- Comment #24 from Jakub Jelinek ---
GCC 8.3 has been released.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79224

Jakub Jelinek changed:

           What    |Removed     |Added
   Target Milestone|8.2         |8.3

--- Comment #23 from Jakub Jelinek ---
GCC 8.2 has been released.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79224

Jakub Jelinek changed:

           What    |Removed     |Added
   Target Milestone|8.0         |8.2

--- Comment #22 from Jakub Jelinek ---
GCC 8.1 has been released.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79224

Richard Biener changed:

           What    |Removed     |Added
           Priority|P1          |P2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79224

Richard Biener changed:

           What    |Removed     |Added
             Blocks|            |83665

--- Comment #21 from Richard Biener ---
https://gcc.opensuse.org/gcc-old/c++bench-czerny/c-ray/ indeed shows it's fixed on trunk, likely by

  2018-01-02  Richard Biener

        * ipa-inline.c (big_speedup_p): Fix expression.

so let's watch whether it regresses again once the fallout from this change is fixed (PR83665).

Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83665
[Bug 83665] [8 regression] Big code size regression and some code quality improvement at Jan 2 2018
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79224

Jeffrey A. Law changed:

           What    |Removed     |Added
            Summary|[7/8 Regression] Large C-Ray slowdown|[7 Regression] Large C-Ray slowdown

--- Comment #20 from Jeffrey A. Law ---
The GCC 8 regression marker was removed per Aldy's testing. The assumption is that Jan's and Yuri's work addressed the problem.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79224

--- Comment #12 from Jan Hubicka ---
I know; it is the problem I mentioned earlier. ray_sphere has a conditional on parameter SP that decides whether it does extra work, and in some cases ray_sphere is called with SP NULL. We compute the speedup by comparing the offline copy of ray_sphere (where we know nothing about the value of SP) to the specialized inline version (where we know that SP is NULL). This makes us account for quite a large speedup and prioritize the inline. This is wrong (and has been for a while), and I have a patch to fix it for next stage1, but it is not really stage 4 material :(

Honza
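To make the skew described above concrete, here is a toy cost model in C. It is only a sketch under stated assumptions: the struct `callee_cost`, the functions `naive_speedup` and `context_speedup`, and all numbers are hypothetical and are not taken from GCC's ipa-inline code. The callee has a cheap unconditional part plus expensive work guarded by `sp != NULL`, and the call site passes NULL.

```c
/* Toy model of comparing a callee's estimated time before and after
 * inlining. All names and numbers are hypothetical; this is NOT
 * GCC's ipa-inline cost model. */

typedef struct {
    double base_time;    /* work executed on every call */
    double guarded_time; /* extra work executed only when sp != NULL */
    double guard_prob;   /* estimated probability that sp != NULL
                            when nothing is known about the caller */
} callee_cost;

/* Relative reduction in estimated time, e.g. 0.25 means 25% faster. */
static double relative_speedup(double before, double after)
{
    return (before - after) / before;
}

/* Skewed comparison: the offline copy is costed with sp unknown, so the
 * guarded work counts with probability guard_prob, while the inlined
 * copy is specialized for sp == NULL and the guarded work vanishes. */
double naive_speedup(const callee_cost *c, double call_overhead)
{
    double offline = c->base_time + c->guard_prob * c->guarded_time
                     + call_overhead;
    double inlined = c->base_time;
    return relative_speedup(offline, inlined);
}

/* Context-aware comparison: the offline copy is costed in the same
 * context (sp == NULL), so the guarded work is counted on neither side
 * and inlining only saves the call overhead. */
double context_speedup(const callee_cost *c, double call_overhead)
{
    double offline = c->base_time + call_overhead;
    double inlined = c->base_time;
    return relative_speedup(offline, inlined);
}
```

With invented values base 10, guarded 40, guard probability 0.5 and call overhead 2, the naive comparison reports a 68.75% speedup while the context-aware one reports about 16.7%. An overestimate of that kind is what would push such an edge ahead in the inlining priority order.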
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79224

--- Comment #11 from Richard Biener ---
I still see inlining differences (compared to GCC 6). The profile looks like

  33.18%  c-ray-f-7  c-ray-f-7  [.] shade
  28.18%  c-ray-f-6  c-ray-f-6  [.] shade
  11.50%  c-ray-f-7  c-ray-f-7  [.] ray_sphere
   9.32%  c-ray-f-6  c-ray-f-6  [.] trace
   7.40%  c-ray-f-7  c-ray-f-7  [.] render
   7.26%  c-ray-f-6  c-ray-f-6  [.] render

GCC 6:

  Inlining ray_sphere.constprop to shade with frequency 10
  Inlining ray_sphere to trace with frequency 6169
  Inlining get_sample_pos to get_primary_ray with frequency 1000
  Inlining trace.constprop to render with frequency 10
  Inlining ray_sphere to render with frequency 10
  Inlining get_msec.part.0 to get_msec with frequency 390

GCC 7:

  Inlining get_sample_pos to get_primary_ray with frequency 1000
  Inlining ray_sphere.constprop to shade with frequency 36274
  Inlining trace to shade with frequency 505
  Inlining ray_sphere to trace with frequency 3059
  Inlining trace.constprop to render with frequency 10
  Inlining get_primary_ray to render with frequency 10
  Inlining get_sample_pos to render with frequency 10
  Inlining ray_sphere to render with frequency 10

So the difference is that with GCC 6 we inline ray_sphere into trace (and do not inline trace into shade), while with GCC 7 we inline trace into shade before inlining ray_sphere into trace. We know that inlining ray_sphere is critical for good performance, and for some reason it is still not prioritized on trunk. Of course it's just a benchmark, and using -fwhole-program fixes it on trunk (making it faster than GCC 6 without -fwhole-program; GCC 6 with -fwhole-program actually regresses...).
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79224

Richard Biener changed:

           What    |Removed     |Added
           Priority|P3          |P1

--- Comment #10 from Richard Biener ---
The regression isn't fully fixed yet, we're only at most half-way there.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79224

--- Comment #9 from Jan Hubicka ---
Most of the regression caused by the inlining difference is fixed now, but the solution is not ideal. According to the Czerny results (https://gcc.opensuse.org/c++bench-czerny/c-ray/) we still have a quite noticeable regression, and code size is bigger. Moreover, we get noticeable code size growth in wave (808569 -> 814865, 0.8%) and Botan (1780199 -> 1789329, 0.5%).

The other change is

  160810  0.33  30089  13360  2281
  160811  0.27  30089  12656  2560

Perhaps this patch might be a suspect, but I have no idea:

  2016-08-10  Yuri Rumyantsev

        PR tree-optimization/71734
        * tree-ssa-loop-im.c (ref_indep_loop_p): Add new argument REF_LOOP,
        invoke ref_indep_loop_p_1.
        (outermost_indep_loop): Pass LOOP argument where REF was defined to
        ref_indep_loop_p.
        (ref_indep_loop_p_1): Fix commentary, add argument REF_LOOP, combine
        it with ref_indep_loop_p_2, update SAFELEN if only REF is inside
        LOOP, do not cache dependence value for loops with non-zero SAFELEN.
        (ref_indep_loop_p_2): Delete function.
        (can_sm_ref_p): Pass LOOP as additional argument to ref_indep_loop_p.

One more important issue I noticed is that the inline metric always compares the estimated runtime of the offline copy with the runtime of the specialized copy after inlining (with known constants and other context). This is OK for size metrics, but not for speed. The offline copy runs in the same context; in particular, if some code is guarded by a conditional that is false, it is not executed and should not be accounted to the offline path. This skews the inline metric toward inlining that eliminates large conditionals. I will fix that in next stage1, but I am not sure how much we can still do in the current stage4.

One observation is that the overall runtime/size estimates after early optimization have improved quite a bit over the last two releases, during which I did not re-tune the parameters. Perhaps it is time to tune down early inlining and inline-insns-auto a bit again...
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79224

--- Comment #8 from Jan Hubicka ---
Author: hubicka
Date: Sat Feb 11 21:49:51 2017
New Revision: 245366

URL: https://gcc.gnu.org/viewcvs?rev=245366&root=gcc&view=rev
Log:
        PR ipa/79224
        * params.def (inline-min-speedup) Change from 10 to 8.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/params.def
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79224

--- Comment #7 from Jan Hubicka ---
Author: hubicka
Date: Sat Feb 11 16:11:57 2017
New Revision: 245357

URL: https://gcc.gnu.org/viewcvs?rev=245357&root=gcc&view=rev
Log:
        PR ipa/79224
        * ipa-inline-analysis.c (get_minimal_bb): New function.
        (record_modified): Use it.
        (remap_edge_change_prob): Handle also ancestor functions.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/ipa-inline-analysis.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79224

--- Comment #6 from Jan Hubicka ---
This is all somewhat a matter of luck. We have a big_speedup hack that lets us bypass inline-insns-auto when we know the combination of caller+callee improves by a given percentage. Because we inline more, the caller is now bigger and slower, and because we early-optimize better, the callee is faster, so the overall speedup is smaller. There are two extra issues with propagation. I will simply fix them and drop the big speedup percentage from 10% to 8%.

Honza
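The threshold effect described above can be sketched as a percentage test. This is a hypothetical illustration only: the function `big_speedup` and its parameters are invented for this sketch and are not GCC's `big_speedup_p`; the real knob that controls the percentage is `--param inline-min-speedup`, whose default the patch in this bug changed from 10 to 8.

```c
/* Hypothetical sketch of a "big speedup" check driven by a percentage
 * threshold, in the spirit of --param inline-min-speedup.
 * This is NOT GCC's big_speedup_p implementation. */
#include <stdbool.h>

/* time_without_inline: estimated time of caller plus offline callee.
 * time_with_inline:    estimated time of the caller after inlining.
 * Returns true when inlining saves at least min_speedup_percent percent
 * of the time, i.e. when the edge may bypass the usual size limits
 * (such as inline-insns-auto). */
bool big_speedup(double time_without_inline, double time_with_inline,
                 int min_speedup_percent)
{
    double saved = time_without_inline - time_with_inline;
    /* saved / total >= percent / 100, rewritten to avoid a division. */
    return saved * 100.0 >= time_without_inline * min_speedup_percent;
}
```

With invented estimates of 1000 time units before and 910 after inlining (a 9% improvement), the check fails at a 10% threshold but passes at 8%. When a bigger caller and a faster callee shrink the relative gain, an edge can drop just below the old threshold, which is why lowering the percentage restores the inline.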
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79224

--- Comment #5 from Jan Hubicka ---
The issue is that we no longer inline all calls to ray_sphere, which is the inlining that matters. Declaring trace noinline or ray_sphere always_inline helps.
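The workaround above relies on GCC's `noinline` and `always_inline` function attributes. Below is a hedged, self-contained sketch: `struct vec3`, `ray_sphere_hit` and `trace_count` are hypothetical, heavily simplified stand-ins for c-ray's `ray_sphere` and `trace` (the real signatures in c-ray-f.c differ), and the intersection test only checks the sign of the quadratic's discriminant. According to the comment, either attribute alone helps; both appear here only for illustration.

```c
/* Hypothetical stand-ins for c-ray's ray_sphere and trace; only the
 * attributes illustrate the workaround discussed in the bug. */

struct vec3 { double x, y, z; };

static double dot(struct vec3 a, struct vec3 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

/* always_inline makes GCC inline this at every call site regardless of
 * its inlining heuristics; GCC documents it for functions declared
 * inline, hence "static inline". */
static inline __attribute__((always_inline))
int ray_sphere_hit(struct vec3 orig, struct vec3 dir,
                   struct vec3 center, double rad)
{
    struct vec3 oc = { orig.x - center.x, orig.y - center.y,
                       orig.z - center.z };
    double a = dot(dir, dir);
    double b = 2.0 * dot(dir, oc);
    double c = dot(oc, oc) - rad * rad;
    /* Simplification: report a hit if the ray's line meets the sphere,
     * ignoring whether the intersection lies in front of the origin. */
    return b * b - 4.0 * a * c >= 0.0;
}

/* noinline keeps this function out of line, so it is not inlined into
 * its callers (in the bug, trace being inlined into shade displaced the
 * ray_sphere inlining). */
__attribute__((noinline))
int trace_count(struct vec3 orig, struct vec3 dir,
                const struct vec3 *centers, const double *radii, int n)
{
    int hits = 0;
    for (int i = 0; i < n; i++)
        hits += ray_sphere_hit(orig, dir, centers[i], radii[i]);
    return hits;
}
```

For a ray from the origin along +z, a unit sphere centred at (0, 0, 5) is hit and one centred at (10, 0, 5) is missed, so `trace_count` returns 1.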
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79224

Jakub Jelinek changed:

           What    |Removed     |Added
                 CC|            |jakub at gcc dot gnu.org

--- Comment #4 from Jakub Jelinek ---
So it seems r22 removes one stmt (e.g. in *.fnsplit dump):

--- c-ray-f.i.048t.fnsplit.21   2017-01-25 09:13:45.0 -0500
+++ c-ray-f.i.048t.fnsplit.22   2017-01-25 09:14:20.0 -0500
@@ -1124,7 +1124,6 @@ shade (struct sphere * obj, struct spoin
   [1.29%]:
   ray.orig = sp_108(D)->pos;
-  ray.dir = sp_108(D)->vref;
   ray$dir$x_154 = MEM[(struct spoint *)sp_108(D) + 48B];
   ray$dir$y_155 = MEM[(struct spoint *)sp_108(D) + 56B];
   ray$dir$z_156 = MEM[(struct spoint *)sp_108(D) + 64B];

and that in turn changes the inlining decisions. In r21:

  Inlined into render which now has time 9181 and size 89,net change of -11.
  Inlined into shade which now has time 2920 and size 183,net change of -18.
  Inlined into render which now has time 17981 and size 141,net change of +52.
  Inlined into get_primary_ray which now has time 109 and size 65,net change of +36.
  Inlined into get_primary_ray which now has time 162 and size 102,net change of +37.
  Inlined into trace which now has time 301 and size 152,net change of +102.

and in r22:

  Inlined into render which now has time 9181 and size 89,net change of -11.
  Inlined into shade which now has time 2918 and size 179,net change of -18.
  Inlined into render which now has time 17981 and size 141,net change of +52.
  Inlined into get_primary_ray which now has time 109 and size 65,net change of +36.
  Inlined into get_primary_ray which now has time 162 and size 102,net change of +37.
  Inlined into shade which now has time 2957 and size 216,net change of +37.
  Inlined into trace which now has time 301 and size 152,net change of +102.

The difference is that trace has been inlined into shade.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79224

--- Comment #3 from Martin Liška ---
Out of curiosity, all releases:

4.7.0 (93c5ebd73a4d1626)(22 Mar 2012 07:11): [took: 2.836s] result: OK Rendering took: 2 seconds (2531 milliseconds)
4.7.1 (0e3097e7d505b7be)(14 Jun 2012 08:32): [took: 2.828s] result: OK Rendering took: 2 seconds (2517 milliseconds)
4.7.2 (c9b304ada7111264)(20 Sep 2012 06:54): [took: 2.816s] result: OK Rendering took: 2 seconds (2507 milliseconds)
4.7.3 (f22940cb824859bd)(11 Apr 2013 07:57): [took: 2.835s] result: OK Rendering took: 2 seconds (2523 milliseconds)
4.7.4 (ae10eb82fe34c186)(12 Jun 2014 12:08): [took: 2.836s] result: OK Rendering took: 2 seconds (2514 milliseconds)
4.8.0 (e9c762ec4671d77e)(22 Mar 2013 10:05): [took: 2.621s] result: OK Rendering took: 2 seconds (2265 milliseconds)
4.8.1 (caa62b4636bfed71)(31 May 2013 09:02): [took: 2.619s] result: OK Rendering took: 2 seconds (2258 milliseconds)
4.8.2 (9bcca88e24e64d4e)(16 Oct 2013 07:20): [took: 2.647s] result: OK Rendering took: 2 seconds (2292 milliseconds)
4.8.3 (6bbf0dec66c0e719)(22 May 2014 09:10): [took: 2.675s] result: OK Rendering took: 2 seconds (2310 milliseconds)
4.8.4 (1a97fa0bb3fa5669)(19 Dec 2014 11:43): [took: 2.652s] result: OK Rendering took: 2 seconds (2291 milliseconds)
4.8.5 (cf82a597b0d18985)(23 Jun 2015 07:54): [took: 2.742s] result: OK Rendering took: 2 seconds (2380 milliseconds)
4.9.0 (a7aa383874520cd5)(22 Apr 2014 09:43): [took: 2.672s] result: OK Rendering took: 2 seconds (2291 milliseconds)
4.9.1 (c6fa1b4126635939)(16 Jul 2014 10:04): [took: 2.645s] result: OK Rendering took: 2 seconds (2269 milliseconds)
4.9.2 (c1283af40b65f1ad)(30 Oct 2014 08:27): [took: 2.610s] result: OK Rendering took: 2 seconds (2245 milliseconds)
4.9.3 (876d41ed80ce13e0)(26 Jun 2015 17:57): [took: 2.559s] result: OK Rendering took: 2 seconds (2198 milliseconds)
4.9.4 (d3191480f376c780)(03 Aug 2016 05:07): [took: 2.587s] result: OK Rendering took: 2 seconds (2223 milliseconds)
5.1.0 (d5ad84b309d0d97d)(22 Apr 2015 08:43): [took: 2.689s] result: OK Rendering took: 2 seconds (2282 milliseconds)
5.2.0 (7b26e3896e268cd4)(16 Jul 2015 09:13): [took: 2.695s] result: OK Rendering took: 2 seconds (2270 milliseconds)
5.3.0 (2bc376d60753a58b)(04 Dec 2015 10:45): [took: 2.635s] result: OK Rendering took: 2 seconds (2232 milliseconds)
5.4.0 (9d0507742960aa9f)(03 Jun 2016 08:41): [took: 2.650s] result: OK Rendering took: 2 seconds (2252 milliseconds)
6.1.0 (c441d9e8e0438dcf)(27 Apr 2016 08:20): [took: 2.650s] result: OK Rendering took: 2 seconds (2241 milliseconds)
6.2.0 (6ac74a62ba725829)(22 Aug 2016 08:01): [took: 2.630s] result: OK Rendering took: 2 seconds (2228 milliseconds)
6.3.0 (4b5e15daff8b5444)(21 Dec 2016 07:51): [took: 2.614s] result: OK Rendering took: 2 seconds (2214 milliseconds)

Please ignore 'took: x'; compare just 'Rendering took'.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79224

--- Comment #2 from Martin Liška ---
r244884 (current trunk): 2409 milliseconds
r240470 (25 Sep 2016):   2309 milliseconds
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79224

Martin Liška changed:

           What    |Removed     |Added
             Status|UNCONFIRMED |NEW
   Last reconfirmed|            |2017-01-25
                 CC|            |marxin at gcc dot gnu.org
     Ever confirmed|0           |1

--- Comment #1 from Martin Liška ---
On my Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz, I see:

  gcc c-ray-f.c -O3 -ffast-math -funroll-loops -march=core-avx2 -lm && cat scene | time ./a.out -s 3000x2000

r22: 2434 milliseconds
r21: 2310 milliseconds
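For reference, the relative slowdown implied by these timings can be computed as below. `slowdown_percent` is a hypothetical helper written for this sketch; its inputs are the "Rendering took" figures reported in this bug.

```c
/* Relative slowdown of a new measurement against a baseline, in percent.
 * Positive values mean the new build is slower. */
double slowdown_percent(double baseline_ms, double new_ms)
{
    return (new_ms - baseline_ms) * 100.0 / baseline_ms;
}
```

For r21 to r22 (2310 ms to 2434 ms) this gives about 5.4%; for r240470 to r244884 (2309 ms to 2409 ms) about 4.3%.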
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79224

Richard Biener changed:

           What    |Removed     |Added
   Target Milestone|---         |7.0