Something always tells me this is the compiler's job... What
clever reasoning are you applying that the compiler's inliner
can't? It seems like a different situation from, say, SIMD code,
where structuring loops correctly can require a lot of
gymnastics that the compiler can't or won't do (e.g. for
floating-point conformance reasons). The inlining decision seems
easy to automate in comparison.
I understand that unoptimised builds for debugging are a
problem, but a sensible compiler lets you hand-pick your
optimisation passes.
In short: why are compilers not good enough at this that the
programmer needs to be involved?
No compiler gets this right 100% of the time, so if it is the
compiler's job, they are failing. With most C++ compilers you will
sometimes still need forceinline for code that uses SSE intrinsics.
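For reference, the per-function override that exists today is compiler-specific: MSVC spells it `__forceinline` / `__declspec(noinline)`, GCC and Clang spell it `__attribute__((always_inline))` / `__attribute__((noinline))`. A minimal sketch wrapping both in portable macros; the names `FORCE_INLINE`, `NEVER_INLINE`, `vec4`, `dot4` and `dot4_outlined` are hypothetical, and the SSE path is only compiled when the target is known to have SSE (otherwise a scalar fallback is used).

```cpp
#include <cassert>

// Hypothetical portable spellings of "always inline" / "never inline".
#if defined(_MSC_VER)
  #define FORCE_INLINE __forceinline
  #define NEVER_INLINE __declspec(noinline)
#elif defined(__GNUC__)  // also defined by Clang
  // GCC expects always_inline to be paired with the inline keyword.
  #define FORCE_INLINE inline __attribute__((always_inline))
  #define NEVER_INLINE __attribute__((noinline))
#else
  #define FORCE_INLINE inline
  #define NEVER_INLINE
#endif

// Only use the SSE path when the target is known to support SSE.
#if defined(__SSE__) || defined(_M_X64)
  #include <xmmintrin.h>
  #define HAVE_SSE 1
#endif

struct vec4 { float x, y, z, w; };

// A thin wrapper around intrinsics: if this ends up as a real call,
// the call overhead can swamp the handful of instructions inside it.
FORCE_INLINE float dot4(const vec4& a, const vec4& b) {
#ifdef HAVE_SSE
    // Lane-wise products p = (p0, p1, p2, p3).
    __m128 p = _mm_mul_ps(_mm_setr_ps(a.x, a.y, a.z, a.w),
                          _mm_setr_ps(b.x, b.y, b.z, b.w));
    // Horizontal sum using SSE1 shuffles only.
    __m128 shuf = _mm_shuffle_ps(p, p, _MM_SHUFFLE(2, 3, 0, 1)); // (p1, p0, p3, p2)
    __m128 sums = _mm_add_ps(p, shuf);     // (p0+p1, p0+p1, p2+p3, p2+p3)
    shuf = _mm_movehl_ps(shuf, sums);      // lanes 0-1 = p2+p3
    sums = _mm_add_ss(sums, shuf);         // lane 0 = p0+p1+p2+p3
    return _mm_cvtss_f32(sums);
#else
    return a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
#endif
}

// The opposite request: keep this function out of line even when optimising.
NEVER_INLINE float dot4_outlined(const vec4& a, const vec4& b) {
    return dot4(a, b);
}
```

GCC documents that it honours `always_inline` even when optimisation is off, which is relevant to the debug-build complaint; other compilers may treat their equivalent differently, so check your toolchain's docs.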
Unless it has profile data from PGO (profile-guided optimization),
the compiler has no idea how that code is used at runtime. It can't
know which code the program spends 90% of its time in, so it falls
back on general heuristics when deciding what to inline.
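Short of full PGO, GCC-compatible compilers also accept hand-written hints about which code is rare. A minimal sketch, assuming GCC or Clang (the macro expands to nothing elsewhere): GCC documents that a function marked `cold` is optimised for size and that paths leading to calls of it are treated as unlikely. `COLD_PATH`, `checked_div` and `report_div_by_zero` are hypothetical names.

```cpp
#include <cassert>
#include <cstdio>

#if defined(__GNUC__)
  // GCC/Clang: mark the function as rarely executed and keep it out of line.
  #define COLD_PATH __attribute__((cold, noinline))
#else
  #define COLD_PATH
#endif

// Rare error path: the cold attribute tells the compiler this almost never
// runs, so branches that lead here can be laid out as the unlikely case.
COLD_PATH int report_div_by_zero(int numerator) {
    std::fprintf(stderr, "division by zero (numerator %d)\n", numerator);
    return 0;
}

// Common path: small, and with the error handling kept out of line it
// stays cheap for the inliner to pull into callers.
inline int checked_div(int a, int b) {
    if (b == 0) {
        return report_div_by_zero(a);
    }
    return a / b;
}
```

This is a much coarser signal than a real profile, but it does feed the same heuristics the compiler would otherwise apply blindly.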
What I'd like is the ability to set an inline level per function,
something like 0 for always inline and 10 for never inline.
Unless specified otherwise, the default would be 5.
So if you want forceinline behavior:
  inline(0) float dot(vec3 a, vec3 b);   // always inlined
  inline(10) vec3 cross(vec3 a, vec3 b); // never inlined
And override it at the call site:
  inline(10) auto v = dot(a, b);         // this call is never inlined
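Nothing like `inline(n)` exists in standard C++, but the two ends of the proposed scale can be approximated today. A rough sketch under that assumption: a hypothetical `INLINE_LEVEL(n)` macro maps 0 to always-inline, 10 to never-inline, and levels 1-9 to the plain `inline` hint (the in-between levels have no portable equivalent), plus a hypothetical `call_outlined` helper that approximates the call-site `inline(10)` by routing the call through a function that is never inlined.

```cpp
#include <cassert>

// Hypothetical attribute spellings (see the compiler docs for the real ones).
#if defined(_MSC_VER)
  #define ALWAYS_INLINE_ATTR __forceinline
  #define NOINLINE_ATTR __declspec(noinline)
#elif defined(__GNUC__)
  #define ALWAYS_INLINE_ATTR inline __attribute__((always_inline))
  #define NOINLINE_ATTR __attribute__((noinline))
#else
  #define ALWAYS_INLINE_ATTR inline
  #define NOINLINE_ATTR
#endif

// Hypothetical INLINE_LEVEL(n): only the endpoints map to real attributes;
// levels 1-9 all fall back to the ordinary, non-binding inline hint.
#define INLINE_LEVEL(n) INLINE_LEVEL_##n
#define INLINE_LEVEL_0  ALWAYS_INLINE_ATTR
#define INLINE_LEVEL_1  inline
#define INLINE_LEVEL_2  inline
#define INLINE_LEVEL_3  inline
#define INLINE_LEVEL_4  inline
#define INLINE_LEVEL_5  inline
#define INLINE_LEVEL_6  inline
#define INLINE_LEVEL_7  inline
#define INLINE_LEVEL_8  inline
#define INLINE_LEVEL_9  inline
#define INLINE_LEVEL_10 inline NOINLINE_ATTR

struct vec3 { float x, y, z; };

INLINE_LEVEL(0) float dot(vec3 a, vec3 b) {   // always inlined
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

INLINE_LEVEL(10) vec3 cross(vec3 a, vec3 b) { // never inlined
    return {a.y * b.z - a.z * b.y,
            a.z * b.x - a.x * b.z,
            a.x * b.y - a.y * b.x};
}

// Rough stand-in for a call-site inline(10): the wrapper itself is never
// inlined, so the call site should keep a real call even though f() may
// still be inlined inside the wrapper.
template <typename F>
NOINLINE_ATTR auto call_outlined(F f) -> decltype(f()) {
    return f();
}
```

At the call site, `call_outlined([&] { return dot(a, b); })` plays the role of `inline(10) auto v = dot(a, b);`. Only the outer call is kept out of line; `dot` itself is still force-inlined into the wrapper, which is a cruder effect than a real per-call-site override would give.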