On Friday, 31 May 2013 at 11:49:05 UTC, Manu wrote:
I find that using templates actually makes it more likely for the compiler to properly inline. But I think the totally generic expressions produce cases where the compiler is considering too many possibilities that inhibit
many optimisations.
It might also be that the optimisations get a lot more complex when the
code fragments span across a complex call tree with optimisation
dependencies on non-deterministic inlining.

One of the most important jobs for the optimiser is code re-ordering. Generic code is often written in such a way that makes it hard/impossible
for the optimiser to reorder the flattened code properly.
Hand written code can have branches and memory accesses carefully placed at
the appropriate locations.
Generic code will usually package those sorts of operations behind little
templates that often flatten out in a different order.
The optimiser is rarely able to re-order code across if statements or pointer accesses. __restrict is very important in generic code to allow the optimiser to reorder across any indirection. Otherwise compilers typically have to be conservative, presume that something somewhere may have changed the destination of a pointer, and leave the order as the template expanded. Sadly, D doesn't even support __restrict, and nobody ever uses it in C++ anyway.
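
To make the aliasing point concrete, here is a minimal C++ sketch. It is not from the original post: the function names are hypothetical, and `__restrict` is a compiler extension (accepted by GCC, Clang and MSVC), not standard C++. In the first function the compiler has to assume that `dst` and `scale` might overlap, so it typically reloads `*scale` after every store. In the second, the no-alias promise lets it hoist the load out of the loop and reorder more freely. Whether a given compiler actually does so depends on its version and flags.

```cpp
#include <cstddef>

// dst and scale may alias: a store to dst[i] could change *scale,
// so the compiler is generally forced to reload *scale each iteration.
void scale_may_alias(float* dst, const float* scale, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        dst[i] *= *scale;
}

// __restrict (a compiler extension, assumed available here) promises that
// dst and scale do not overlap, so *scale can be loaded once and kept in a
// register, and the loop is easier to reorder or vectorise.
void scale_no_alias(float* __restrict dst, const float* __restrict scale,
                    std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        dst[i] *= *scale;
}
```

Both functions give the same result when the buffers really are distinct; calling the second one with overlapping pointers is undefined behaviour, which is the price of the promise.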

I've always had better results with writing precisely what I intend the compiler to do, and using __forceinline where it needs a little extra
encouragement.
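
As a rough illustration of that kind of encouragement, the sketch below wraps the compiler-specific spellings in a macro. The names `FORCE_INLINE`, `lerp_f` and `blend3` are hypothetical and not from the post. `__forceinline` is the MSVC keyword; on GCC and Clang the closest equivalent is the `always_inline` attribute. Either one is a strong request, and the compiler may still refuse in some cases.

```cpp
// Hypothetical portability macro: pick the strongest inline request
// the current compiler understands, falling back to plain inline.
#if defined(_MSC_VER)
#  define FORCE_INLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
#  define FORCE_INLINE inline __attribute__((always_inline))
#else
#  define FORCE_INLINE inline
#endif

// Small helper that should always be flattened into its callers.
FORCE_INLINE float lerp_f(float a, float b, float t) {
    return a + (b - a) * t;
}

// Written out explicitly so the intended evaluation order is visible.
float blend3(float a, float b, float c, float t) {
    const float ab = lerp_f(a, b, t);
    const float bc = lerp_f(b, c, t);
    return lerp_f(ab, bc, t);
}
```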

Thanks for the valuable input. I have never had the pleasure of actually trying templates in performance-critical code, and this is good stuff to remember. I have added it to my notes.
