[Bug tree-optimization/79224] [7 Regression] Large C-Ray slowdown

2021-11-16 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79224

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |8.0
 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #26 from Richard Biener  ---
Fixed for GCC 8 I think.

[Bug tree-optimization/79224] [7 Regression] Large C-Ray slowdown

2021-05-17 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79224

Jakub Jelinek  changed:

   What|Removed |Added

   Target Milestone|8.5 |---

[Bug tree-optimization/79224] [7 Regression] Large C-Ray slowdown

2021-05-04 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79224

Richard Biener  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED

[Bug tree-optimization/79224] [7 Regression] Large C-Ray slowdown

2020-03-04 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79224

Jakub Jelinek  changed:

   What|Removed |Added

   Target Milestone|8.4 |8.5

--- Comment #25 from Jakub Jelinek  ---
GCC 8.4.0 has been released, adjusting target milestone.

[Bug tree-optimization/79224] [7 Regression] Large C-Ray slowdown

2019-02-22 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79224

Jakub Jelinek  changed:

   What|Removed |Added

   Target Milestone|8.3 |8.4

--- Comment #24 from Jakub Jelinek  ---
GCC 8.3 has been released.

[Bug tree-optimization/79224] [7 Regression] Large C-Ray slowdown

2018-07-26 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79224

Jakub Jelinek  changed:

   What|Removed |Added

   Target Milestone|8.2 |8.3

--- Comment #23 from Jakub Jelinek  ---
GCC 8.2 has been released.

[Bug tree-optimization/79224] [7 Regression] Large C-Ray slowdown

2018-05-02 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79224

Jakub Jelinek  changed:

   What|Removed |Added

   Target Milestone|8.0 |8.2

--- Comment #22 from Jakub Jelinek  ---
GCC 8.1 has been released.

[Bug tree-optimization/79224] [7 Regression] Large C-Ray slowdown

2018-01-08 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79224

Richard Biener  changed:

   What|Removed |Added

   Priority|P1  |P2

[Bug tree-optimization/79224] [7 Regression] Large C-Ray slowdown

2018-01-08 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79224

Richard Biener  changed:

   What|Removed |Added

 Blocks||83665

--- Comment #21 from Richard Biener  ---
https://gcc.opensuse.org/gcc-old/c++bench-czerny/c-ray/ indeed shows it's fixed
on trunk, likely by

2018-01-02  Richard Biener  

* ipa-inline.c (big_speedup_p): Fix expression.

so let's watch whether it regresses again once the fallout from this change is
fixed (PR83665).


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83665
[Bug 83665] [8 regression] Big code size regression and some code quality
improvement at Jan 2 2018

[Bug tree-optimization/79224] [7 Regression] Large C-Ray slowdown

2018-01-06 Thread law at redhat dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79224

Jeffrey A. Law  changed:

   What|Removed |Added

Summary|[7/8 Regression] Large  |[7 Regression] Large C-Ray
   |C-Ray slowdown  |slowdown

--- Comment #20 from Jeffrey A. Law  ---
8 regression marker removed per Aldy's testing.  Assumption is Jan and Yuri's
work addressed the problem.

[Bug tree-optimization/79224] [7 Regression] Large C-Ray slowdown

2017-04-06 Thread hubicka at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79224

--- Comment #12 from Jan Hubicka  ---
I know - it is the problem I mentioned earlier. ray_sphere has a conditional
on parameter SP that decides whether it does extra work.  In some cases
ray_sphere is called with SP NULL.  Now we compute the speedup by comparing the
offline copy of ray_sphere (where we know nothing about the value of SP) to the
specialized inline version (where we know that SP is NULL). This makes us
account for quite a large speedup and prioritize the inlining.

This is wrong (and has been for a while) and I have a patch to fix it in the
next stage1, but it is not really stage 4 material :(

Honza

[Bug tree-optimization/79224] [7 Regression] Large C-Ray slowdown

2017-03-31 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79224

--- Comment #11 from Richard Biener  ---
I still see inlining differences (comparing to GCC 6).  The profile looks like

  33.18%  c-ray-f-7  c-ray-f-7 [.] shade
  28.18%  c-ray-f-6  c-ray-f-6 [.] shade
  11.50%  c-ray-f-7  c-ray-f-7 [.] ray_sphere
   9.32%  c-ray-f-6  c-ray-f-6 [.] trace
   7.40%  c-ray-f-7  c-ray-f-7 [.] render
   7.26%  c-ray-f-6  c-ray-f-6 [.] render

GCC 6:
Inlining ray_sphere.constprop to shade with frequency 10
Inlining ray_sphere to trace with frequency 6169
Inlining get_sample_pos to get_primary_ray with frequency 1000
Inlining trace.constprop to render with frequency 10
Inlining ray_sphere to render with frequency 10
Inlining get_msec.part.0 to get_msec with frequency 390

GCC 7:
Inlining get_sample_pos to get_primary_ray with frequency 1000
Inlining ray_sphere.constprop to shade with frequency 36274
Inlining trace to shade with frequency 505
Inlining ray_sphere to trace with frequency 3059
Inlining trace.constprop to render with frequency 10
Inlining get_primary_ray to render with frequency 10
Inlining get_sample_pos to render with frequency 10
Inlining ray_sphere to render with frequency 10

so the difference is that with GCC 6 we inline ray_sphere into trace
(and do not inline trace into shade), while with GCC 7 we inline trace into
shade before inlining ray_sphere into trace.

We know that for good performance inlining ray_sphere is critical and
for some reason that's still not prioritized on trunk.

Of course it's just a benchmark, and using -fwhole-program fixes it
on trunk (making it faster than GCC 6 without -fwhole-program; GCC 6 with
-fwhole-program actually regresses...).

[Bug tree-optimization/79224] [7 Regression] Large C-Ray slowdown

2017-03-14 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79224

Richard Biener  changed:

   What|Removed |Added

   Priority|P3  |P1

--- Comment #10 from Richard Biener  ---
The regression isn't fully fixed yet, we're only at most half-way there.

[Bug tree-optimization/79224] [7 Regression] Large C-Ray slowdown

2017-02-14 Thread hubicka at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79224

--- Comment #9 from Jan Hubicka  ---
Most of the regression caused by the inlining difference is fixed now, but the
solution is not ideal.
According to Czerny we still have a quite noticeable regression
https://gcc.opensuse.org/c++bench-czerny/c-ray/ and code size is bigger.

Moreover we get noticeable code size growth in wave (808569->814865, 0.8%) and
Botan (1780199->1789329)

The other change is
160810 0.33 30089 13360 2281 
160811 0.27 30089 12656 2560 
Perhaps this patch might be a suspect, but I have no idea
2016-08-10  Yuri Rumyantsev     

PR tree-optimization/71734  
* tree-ssa-loop-im.c (ref_indep_loop_p): Add new argument   
REF_LOOP, invoke ref_indep_loop_p_1.
(outermost_indep_loop): Pass LOOP argument where REF was defined
to ref_indep_loop_p.
(ref_indep_loop_p_1): Fix commentary, add argument REF_LOOP,
combine it with ref_indep_loop_p_2, update SAFELEN if only REF
is inside LOOP, do not cache dependence value for loops with
non-zero SAFELEN.
(ref_indep_loop_p_2): Delete function.  
(can_sm_ref_p): Pass LOOP as additional argument to 
ref_indep_loop_p.   


One more important issue I noticed is that the inline metric always compares
the estimated runtime of the offline copy with the runtime of the specialized
copy after inlining (with known constants and other context).

This is OK for size metrics, but not OK for speed.  The offline copy is run in
the same context; in particular, if some code is guarded by a conditional that
is false, it is not executed and should not be accounted to the offline path.
This skews the inline metric toward inlining that eliminates large
conditionals.  I will fix that in the next stage1, but I am not sure how much
we can still do in the current stage4.

One observation is that the overall runtime/size estimates after early opt
have improved quite a bit in the last two releases, where I did not re-tune
the parameters. Perhaps it is time to tune down early inlining and
inline-insns-auto a bit again...

[Bug tree-optimization/79224] [7 Regression] Large C-Ray slowdown

2017-02-11 Thread hubicka at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79224

--- Comment #8 from Jan Hubicka  ---
Author: hubicka
Date: Sat Feb 11 21:49:51 2017
New Revision: 245366

URL: https://gcc.gnu.org/viewcvs?rev=245366&root=gcc&view=rev
Log:

PR ipa/79224
* params.def (inline-min-speedup): Change from 10 to 8.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/params.def

[Bug tree-optimization/79224] [7 Regression] Large C-Ray slowdown

2017-02-11 Thread hubicka at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79224

--- Comment #7 from Jan Hubicka  ---
Author: hubicka
Date: Sat Feb 11 16:11:57 2017
New Revision: 245357

URL: https://gcc.gnu.org/viewcvs?rev=245357&root=gcc&view=rev
Log:

PR ipa/79224
* ipa-inline-analysis.c (get_minimal_bb): New function.
(record_modified): Use it.
(remap_edge_change_prob): Handle also ancestor functions.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/ipa-inline-analysis.c

[Bug tree-optimization/79224] [7 Regression] Large C-Ray slowdown

2017-02-09 Thread hubicka at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79224

--- Comment #6 from Jan Hubicka  ---
This is all a bit about luck.  We have the big_speedup hack that lets us bypass
inline-insns-auto when we know the combination caller+callee improves by a
given percentage.  Because we inline more, the caller is now bigger and slower,
and because we early optimize better, the callee is faster, so the overall
speedup is smaller.

There are two extra issues with propagating.  I will simply fix them and drop
the big speedup percentage from 10% to 8%.

Honza

[Bug tree-optimization/79224] [7 Regression] Large C-Ray slowdown

2017-02-06 Thread hubicka at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79224

--- Comment #5 from Jan Hubicka  ---
The issue is that we no longer inline all calls to ray_sphere, which is the
inlining that matters.  Declaring trace noinline or ray_sphere always_inline
helps.

[Bug tree-optimization/79224] [7 Regression] Large C-Ray slowdown

2017-01-25 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79224

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #4 from Jakub Jelinek  ---
So it seems r22 removes one stmt (e.g. in *.fnsplit dump):
--- c-ray-f.i.048t.fnsplit.21   2017-01-25 09:13:45.0 -0500
+++ c-ray-f.i.048t.fnsplit.22   2017-01-25 09:14:20.0 -0500
@@ -1124,7 +1124,6 @@ shade (struct sphere * obj, struct spoin

[1.29%]:
   ray.orig = sp_108(D)->pos;
-  ray.dir = sp_108(D)->vref;
   ray$dir$x_154 = MEM[(struct spoint *)sp_108(D) + 48B];
   ray$dir$y_155 = MEM[(struct spoint *)sp_108(D) + 56B];
   ray$dir$z_156 = MEM[(struct spoint *)sp_108(D) + 64B];
and that in turn changes the inlining decisions.  In r21:
 Inlined into render which now has time 9181 and size 89,net change of -11.
 Inlined into shade which now has time 2920 and size 183,net change of -18.
 Inlined into render which now has time 17981 and size 141,net change of +52.
 Inlined into get_primary_ray which now has time 109 and size 65,net change of +36.
 Inlined into get_primary_ray which now has time 162 and size 102,net change of +37.
 Inlined into trace which now has time 301 and size 152,net change of +102.
and in r22:
 Inlined into render which now has time 9181 and size 89,net change of -11.
 Inlined into shade which now has time 2918 and size 179,net change of -18.
 Inlined into render which now has time 17981 and size 141,net change of +52.
 Inlined into get_primary_ray which now has time 109 and size 65,net change of +36.
 Inlined into get_primary_ray which now has time 162 and size 102,net change of +37.
 Inlined into shade which now has time 2957 and size 216,net change of +37.
 Inlined into trace which now has time 301 and size 152,net change of +102.

The difference is that trace has been inlined into shade.

[Bug tree-optimization/79224] [7 Regression] Large C-Ray slowdown

2017-01-25 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79224

--- Comment #3 from Martin Liška  ---
Just out of curiosity, all releases:

  4.7.0 (93c5ebd73a4d1626)(22 Mar 2012 07:11): [took: 2.836s] result: OK
Rendering took: 2 seconds (2531 milliseconds)
  4.7.1 (0e3097e7d505b7be)(14 Jun 2012 08:32): [took: 2.828s] result: OK
Rendering took: 2 seconds (2517 milliseconds)
  4.7.2 (c9b304ada7111264)(20 Sep 2012 06:54): [took: 2.816s] result: OK
Rendering took: 2 seconds (2507 milliseconds)
  4.7.3 (f22940cb824859bd)(11 Apr 2013 07:57): [took: 2.835s] result: OK
Rendering took: 2 seconds (2523 milliseconds)
  4.7.4 (ae10eb82fe34c186)(12 Jun 2014 12:08): [took: 2.836s] result: OK
Rendering took: 2 seconds (2514 milliseconds)
  4.8.0 (e9c762ec4671d77e)(22 Mar 2013 10:05): [took: 2.621s] result: OK
Rendering took: 2 seconds (2265 milliseconds)
  4.8.1 (caa62b4636bfed71)(31 May 2013 09:02): [took: 2.619s] result: OK
Rendering took: 2 seconds (2258 milliseconds)
  4.8.2 (9bcca88e24e64d4e)(16 Oct 2013 07:20): [took: 2.647s] result: OK
Rendering took: 2 seconds (2292 milliseconds)
  4.8.3 (6bbf0dec66c0e719)(22 May 2014 09:10): [took: 2.675s] result: OK
Rendering took: 2 seconds (2310 milliseconds)
  4.8.4 (1a97fa0bb3fa5669)(19 Dec 2014 11:43): [took: 2.652s] result: OK
Rendering took: 2 seconds (2291 milliseconds)
  4.8.5 (cf82a597b0d18985)(23 Jun 2015 07:54): [took: 2.742s] result: OK
Rendering took: 2 seconds (2380 milliseconds)
  4.9.0 (a7aa383874520cd5)(22 Apr 2014 09:43): [took: 2.672s] result: OK
Rendering took: 2 seconds (2291 milliseconds)
  4.9.1 (c6fa1b4126635939)(16 Jul 2014 10:04): [took: 2.645s] result: OK
Rendering took: 2 seconds (2269 milliseconds)
  4.9.2 (c1283af40b65f1ad)(30 Oct 2014 08:27): [took: 2.610s] result: OK
Rendering took: 2 seconds (2245 milliseconds)
  4.9.3 (876d41ed80ce13e0)(26 Jun 2015 17:57): [took: 2.559s] result: OK
Rendering took: 2 seconds (2198 milliseconds)
  4.9.4 (d3191480f376c780)(03 Aug 2016 05:07): [took: 2.587s] result: OK
Rendering took: 2 seconds (2223 milliseconds)
  5.1.0 (d5ad84b309d0d97d)(22 Apr 2015 08:43): [took: 2.689s] result: OK
Rendering took: 2 seconds (2282 milliseconds)
  5.2.0 (7b26e3896e268cd4)(16 Jul 2015 09:13): [took: 2.695s] result: OK
Rendering took: 2 seconds (2270 milliseconds)
  5.3.0 (2bc376d60753a58b)(04 Dec 2015 10:45): [took: 2.635s] result: OK
Rendering took: 2 seconds (2232 milliseconds)
  5.4.0 (9d0507742960aa9f)(03 Jun 2016 08:41): [took: 2.650s] result: OK
Rendering took: 2 seconds (2252 milliseconds)
  6.1.0 (c441d9e8e0438dcf)(27 Apr 2016 08:20): [took: 2.650s] result: OK
Rendering took: 2 seconds (2241 milliseconds)
  6.2.0 (6ac74a62ba725829)(22 Aug 2016 08:01): [took: 2.630s] result: OK
Rendering took: 2 seconds (2228 milliseconds)
  6.3.0 (4b5e15daff8b5444)(21 Dec 2016 07:51): [took: 2.614s] result: OK
Rendering took: 2 seconds (2214 milliseconds)

Please ignore 'took: x' and compare just 'Rendering took'.

[Bug tree-optimization/79224] [7 Regression] Large C-Ray slowdown

2017-01-25 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79224

--- Comment #2 from Martin Liška  ---
r244884 (current trunk): 2409 milliseconds
r240470 (25 Sep 2016): 2309 milliseconds

[Bug tree-optimization/79224] [7 Regression] Large C-Ray slowdown

2017-01-25 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79224

Martin Liška  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2017-01-25
 CC||marxin at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #1 from Martin Liška  ---
On my Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz, I see:

gcc c-ray-f.c -O3 -ffast-math -funroll-loops -march=core-avx2 -lm && cat scene
| time ./a.out -s 3000x2000:

r22: 2434 milliseconds
r21: 2310 milliseconds

[Bug tree-optimization/79224] [7 Regression] Large C-Ray slowdown

2017-01-25 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79224

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |7.0