https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79224

--- Comment #11 from Richard Biener <rguenth at gcc dot gnu.org> ---
I still see inlining differences (compared to GCC 6).  The profile looks like

  33.18%  c-ray-f-7  c-ray-f-7         [.] shade
  28.18%  c-ray-f-6  c-ray-f-6         [.] shade
  11.50%  c-ray-f-7  c-ray-f-7         [.] ray_sphere
   9.32%  c-ray-f-6  c-ray-f-6         [.] trace
   7.40%  c-ray-f-7  c-ray-f-7         [.] render
   7.26%  c-ray-f-6  c-ray-f-6         [.] render

GCC 6:
Inlining ray_sphere.constprop to shade with frequency 100000
Inlining ray_sphere to trace with frequency 6169
Inlining get_sample_pos to get_primary_ray with frequency 1000
Inlining trace.constprop to render with frequency 100000
Inlining ray_sphere to render with frequency 100000
Inlining get_msec.part.0 to get_msec with frequency 390

GCC 7:
Inlining get_sample_pos to get_primary_ray with frequency 1000
Inlining ray_sphere.constprop to shade with frequency 36274
Inlining trace to shade with frequency 505
Inlining ray_sphere to trace with frequency 3059
Inlining trace.constprop to render with frequency 100000
Inlining get_primary_ray to render with frequency 100000
Inlining get_sample_pos to render with frequency 100000
Inlining ray_sphere to render with frequency 100000

so the difference is that with GCC 6 we inline ray_sphere into trace
(and do not inline trace into shade), while with GCC 7 we inline trace
into shade before inlining ray_sphere into trace.
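
To make the comparison easier to repeat, here is a minimal sketch of a script that diffs the two lists of inliner decisions quoted above. It is not part of GCC or of the original report; the helper names (parse, order, diff) are hypothetical, and it assumes the dump lines keep the exact "Inlining X to Y with frequency N" shape shown here.

```python
import re

# Inliner decisions copied verbatim from the two dumps quoted above.
GCC6 = """\
Inlining ray_sphere.constprop to shade with frequency 100000
Inlining ray_sphere to trace with frequency 6169
Inlining get_sample_pos to get_primary_ray with frequency 1000
Inlining trace.constprop to render with frequency 100000
Inlining ray_sphere to render with frequency 100000
Inlining get_msec.part.0 to get_msec with frequency 390
"""

GCC7 = """\
Inlining get_sample_pos to get_primary_ray with frequency 1000
Inlining ray_sphere.constprop to shade with frequency 36274
Inlining trace to shade with frequency 505
Inlining ray_sphere to trace with frequency 3059
Inlining trace.constprop to render with frequency 100000
Inlining get_primary_ray to render with frequency 100000
Inlining get_sample_pos to render with frequency 100000
Inlining ray_sphere to render with frequency 100000
"""

# Assumed line shape: "Inlining <callee> to <caller> with frequency <n>".
LINE = re.compile(r"Inlining (\S+) to (\S+) with frequency (\d+)")


def parse(log):
    """Map each (callee, caller) pair to the frequency reported for it."""
    return {(m.group(1), m.group(2)): int(m.group(3))
            for m in LINE.finditer(log)}


def order(log):
    """Return the (callee, caller) pairs in the order they were logged."""
    return [(m.group(1), m.group(2)) for m in LINE.finditer(log)]


def diff(old_log, new_log):
    """Return (only in old, only in new, frequency changes) between two logs."""
    old, new = parse(old_log), parse(new_log)
    only_old = sorted(set(old) - set(new))
    only_new = sorted(set(new) - set(old))
    changed = sorted((k, old[k], new[k])
                     for k in set(old) & set(new) if old[k] != new[k])
    return only_old, only_new, changed


if __name__ == "__main__":
    only_old, only_new, changed = diff(GCC6, GCC7)
    print("only in GCC 6:", only_old)
    print("only in GCC 7:", only_new)
    for (callee, caller), f_old, f_new in changed:
        print(f"{callee} -> {caller}: frequency {f_old} -> {f_new}")
```

On the data above this reports trace -> shade as a decision that only GCC 7 makes, and order(GCC7) lists it ahead of ray_sphere -> trace, which is the ordering issue described above.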

We know that for good performance inlining ray_sphere is critical and
for some reason that's still not prioritized on trunk.

Of course it's just a benchmark, and using -fwhole-program fixes it
on trunk (making it faster than GCC 6 without -fwhole-program; GCC 6
with -fwhole-program actually regresses...).
