https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445

--- Comment #32 from Jan Hubicka <hubicka at ucw dot cz> ---
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445
> 
> --- Comment #31 from Segher Boessenkool <segher at gcc dot gnu.org> ---
> (In reply to Jan Hubicka from comment #27)
> > It is because --param inline-insns-single was reduced for -O2 from 200
> > to 70.  GCC 10 newly has a different set of parameters for -O2 and -O3
> > and enables auto-inlining at -O2.
> > 
> > The problem with inlining functions declared inline is that C++
> > codebases tend to abuse this keyword for things that are really too
> > large (and get_order would be such an example if it did not have the
> > builtin_constant_p check, which the inliner does not understand well).
> > So having the same limit at -O2 and -O3 turned out to be problematic
> > with respect to code size and especially with respect to LTO, where a
> > lot more inlining opportunities appear.
> 
> Do the heuristics account for that not inlining a "static inline" results
> in multiple copies?

The heuristics prevent inlining only when there are multiple calls in the
unit being compiled (there is no way to know that the same inline
function is duplicated in other units).
This is what happens here: there are multiple calls, so the inliner
concludes that inlining would cost too much code size, even though much
of that code would later be optimized away.
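
To make the situation concrete, here is a minimal sketch of the pattern,
not the kernel source.  The names sketch_order, SKETCH_PAGE_SHIFT,
caller_a and caller_b are made up for illustration, and 4 KiB pages are
assumed.  Both arms of the builtin_constant_p check are visible to the
size estimate, although only one of them survives at any given call
site.  Whether GCC actually inlines this depends on the version and
options; building with -Winline is one way to check.

```c
#include <limits.h>

/* Assumption for illustration: 4 KiB pages. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_BITS_PER_LONG ((int) (sizeof (unsigned long) * CHAR_BIT))

/* Hypothetical stand-in for a get_order-like helper. */
static inline int
sketch_order (unsigned long size)
{
  if (__builtin_constant_p (size))
    {
      /* Arm intended to fold to a constant when size is known.  */
      if (!size)
        return SKETCH_BITS_PER_LONG - SKETCH_PAGE_SHIFT;
      if (size < (1UL << SKETCH_PAGE_SHIFT))
        return 0;
      /* Equals ilog2 (size - 1) - SKETCH_PAGE_SHIFT + 1.  */
      return SKETCH_BITS_PER_LONG - __builtin_clzl (size - 1)
             - SKETCH_PAGE_SHIFT;
    }

  /* Arm used when size is only known at run time.  */
  size--;
  size >>= SKETCH_PAGE_SHIFT;
  return size ? SKETCH_BITS_PER_LONG - __builtin_clzl (size) : 0;
}

/* Two calls in the same unit: the case where the size limit applies.  */
int
caller_a (unsigned long n)
{
  return sketch_order (n);
}

int
caller_b (unsigned long n)
{
  return sketch_order (n + 1);
}
```

Both arms compute the same value, so the result does not depend on
which one the compiler keeps.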

get_order is a wrapper around fls64.  This can be implemented without an
asm statement as follows:
int
my_fls64 (__u64 x)
{
  if (!x)
    return 0;
  /* clzll takes unsigned long long, so this also works where long is
     32 bits wide.  */
  return 64 - __builtin_clzll (x);
}

This results in longer assembly than the kernel asm implementation.  If
that matters, I would replace the __builtin_constant_p part of get_order
with this implementation, which is more transparent to the code size
estimation, and things will get inlined.
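
For reference, a self-contained sketch of what such a get_order could
look like.  This is an illustration under assumptions, not a proposed
kernel patch: sketch_u64 stands in for the kernel's __u64,
SKETCH_PAGE_SHIFT assumes 4 KiB pages, and sketch_get_order is a
made-up name.

```c
/* Stand-in for the kernel's __u64 (assumption for illustration).  */
typedef unsigned long long sketch_u64;

/* Assumption for illustration: 4 KiB pages.  */
#define SKETCH_PAGE_SHIFT 12

/* 1-based position of the highest set bit, 0 when x is 0.  */
static inline int
my_fls64 (sketch_u64 x)
{
  if (!x)
    return 0;
  return 64 - __builtin_clzll (x);
}

/* get_order-like helper with a single code path and no
   __builtin_constant_p check.  */
static inline int
sketch_get_order (unsigned long size)
{
  size--;
  size >>= SKETCH_PAGE_SHIFT;
  return my_fls64 (size);
}
```

With these definitions sketch_get_order (4096) is 0, sketch_get_order
(4097) is 1 and sketch_get_order (8193) is 2.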

Honza
