https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79224
--- Comment #11 from Richard Biener <rguenth at gcc dot gnu.org> ---
I still see inlining differences (comparing to GCC 6).  The profile looks like

  33.18%  c-ray-f-7  c-ray-f-7  [.] shade
  28.18%  c-ray-f-6  c-ray-f-6  [.] shade
  11.50%  c-ray-f-7  c-ray-f-7  [.] ray_sphere
   9.32%  c-ray-f-6  c-ray-f-6  [.] trace
   7.40%  c-ray-f-7  c-ray-f-7  [.] render
   7.26%  c-ray-f-6  c-ray-f-6  [.] render

GCC 6:

Inlining ray_sphere.constprop to shade with frequency 100000
Inlining ray_sphere to trace with frequency 6169
Inlining get_sample_pos to get_primary_ray with frequency 1000
Inlining trace.constprop to render with frequency 100000
Inlining ray_sphere to render with frequency 100000
Inlining get_msec.part.0 to get_msec with frequency 390

GCC 7:

Inlining get_sample_pos to get_primary_ray with frequency 1000
Inlining ray_sphere.constprop to shade with frequency 36274
Inlining trace to shade with frequency 505
Inlining ray_sphere to trace with frequency 3059
Inlining trace.constprop to render with frequency 100000
Inlining get_primary_ray to render with frequency 100000
Inlining get_sample_pos to render with frequency 100000
Inlining ray_sphere to render with frequency 100000

so the difference is that with GCC 6 we inline ray_sphere into trace (and do
not inline trace into shade), while with GCC 7 we inline trace into shade, but
before inlining ray_sphere into trace.  We know that for good performance
inlining ray_sphere is critical, and for some reason that's still not
prioritized on trunk.

Of course it's just a benchmark, and using -fwhole-program fixes it on trunk
(making it faster than GCC 6 without -fwhole-program; GCC 6 with
-fwhole-program actually regresses...).
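
For anyone wanting to compare such inlining logs mechanically, here is a minimal sketch in Python. It is not part of the original analysis: the helper names `parse` and `diff` are hypothetical, and the only input is the "Inlining X to Y with frequency N" lines quoted in this comment. It compares the sets of decisions and their frequencies, and it ignores the order in which the decisions were made.

```python
import re

# Inlining decisions copied verbatim from the comment above.
GCC6 = """\
Inlining ray_sphere.constprop to shade with frequency 100000
Inlining ray_sphere to trace with frequency 6169
Inlining get_sample_pos to get_primary_ray with frequency 1000
Inlining trace.constprop to render with frequency 100000
Inlining ray_sphere to render with frequency 100000
Inlining get_msec.part.0 to get_msec with frequency 390
"""

GCC7 = """\
Inlining get_sample_pos to get_primary_ray with frequency 1000
Inlining ray_sphere.constprop to shade with frequency 36274
Inlining trace to shade with frequency 505
Inlining ray_sphere to trace with frequency 3059
Inlining trace.constprop to render with frequency 100000
Inlining get_primary_ray to render with frequency 100000
Inlining get_sample_pos to render with frequency 100000
Inlining ray_sphere to render with frequency 100000
"""

# Assumed line shape: "Inlining <callee> to <caller> with frequency <n>".
LINE = re.compile(r"Inlining (\S+) to (\S+) with frequency (\d+)")


def parse(log):
    """Map each (callee, caller) pair in the log to its reported frequency."""
    return {(m.group(1), m.group(2)): int(m.group(3)) for m in LINE.finditer(log)}


def diff(old, new):
    """Return decisions only in old, only in new, and those whose frequency changed."""
    only_old = sorted(set(old) - set(new))
    only_new = sorted(set(new) - set(old))
    changed = sorted(
        (key, old[key], new[key])
        for key in set(old) & set(new)
        if old[key] != new[key]
    )
    return only_old, only_new, changed


if __name__ == "__main__":
    only6, only7, changed = diff(parse(GCC6), parse(GCC7))
    for callee, caller in only6:
        print(f"GCC 6 only: {callee} -> {caller}")
    for callee, caller in only7:
        print(f"GCC 7 only: {callee} -> {caller}")
    for (callee, caller), f6, f7 in changed:
        print(f"frequency changed: {callee} -> {caller}: {f6} -> {f7}")
```

On the two lists quoted here, this reports trace -> shade as a GCC 7-only decision and a lower frequency for ray_sphere -> trace, which matches the difference described above. Because ordering is discarded, it cannot by itself show that trace was inlined into shade before ray_sphere was inlined into trace.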