https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108030
--- Comment #3 from Matthias Kretz (Vir) <mkretz at gcc dot gnu.org> ---
(In reply to Jakub Jelinek from comment #2)
> I bet by adding too many always_inline functions that call normal inlines
> that is what is bound to happen, one runs into inline growth limits. It is
> better to use always_inline on the leaf functions rather than on what calls
> them.

How is the inline growth limit determined? I mean, in the cases where it really
hurts, the resulting function compiles down to a single instruction (plus
parameter passing boilerplate). The optimizer cannot know about the number of
instructions, so what is the measure it uses?

Especially with the helper functions necessary to work with parameter packs /
index_sequence, it's not enough to use always_inline on the leaf functions.
E.g. any simd binary operator basically should be
[[gnu::always_inline, gnu::flatten]]. However, simd maybe shouldn't use
'flatten' for functions that call a user-provided callable (e.g. the simd
generator constructor).
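
To make the layering concrete, here is a minimal sketch of the pattern in question. It is not code from libstdc++; the names vec_add, vec_add_impl, add_one, vec_generate and vec_generate_impl are hypothetical, and std::array<int, N> stands in for a simd type. It only shows where the attributes would go: on the leaf, on the index_sequence helper in between, and always_inline plus flatten on the outer binary operation, while the function taking a user-provided callable gets always_inline without flatten.

```cpp
#include <array>
#include <cstddef>
#include <utility>

// Hypothetical leaf: the per-element operation.
[[gnu::always_inline]] inline int add_one(int a, int b) { return a + b; }

// Hypothetical helper layer that exists only to expand the index pack.
// Marking just the leaf does not cover this layer, so it is marked as well.
template <std::size_t... Is>
[[gnu::always_inline]] inline std::array<int, sizeof...(Is)>
vec_add_impl(const std::array<int, sizeof...(Is)>& a,
             const std::array<int, sizeof...(Is)>& b,
             std::index_sequence<Is...>)
{
    return {{add_one(a[Is], b[Is])...}};
}

// Outer binary operation: always_inline into its callers, and flatten asks
// the compiler to inline every call in its body where possible.
template <std::size_t N>
[[gnu::always_inline, gnu::flatten]] inline std::array<int, N>
vec_add(const std::array<int, N>& a, const std::array<int, N>& b)
{
    return vec_add_impl(a, b, std::make_index_sequence<N>{});
}

// Helper for the generator below; calls the user-provided callable per index.
template <class F, std::size_t... Is>
[[gnu::always_inline]] inline std::array<int, sizeof...(Is)>
vec_generate_impl(F& gen, std::index_sequence<Is...>)
{
    return {{static_cast<int>(gen(Is))...}};
}

// Generator-style construction: no flatten here, so whether the user's
// callable gets inlined is left to the normal inlining heuristics.
template <std::size_t N, class F>
[[gnu::always_inline]] inline std::array<int, N> vec_generate(F&& gen)
{
    return vec_generate_impl(gen, std::make_index_sequence<N>{});
}
```

Calling vec_add(a, b) on two std::array<int, 4> values returns the element-wise sum, and vec_generate<4>(f) calls f(0) through f(3). Whether this is the right attribute placement for the real simd implementation is exactly the open question here.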