On Mon, Feb 20, 2017 at 05:16:15AM -0800, Jonathan M Davis via Digitalmars-d-learn wrote:
[...]
> Regardless, if performance is your #1 concern, then I would suggest
> that you compile with ldc and not dmd.
[...]

+1. If you are concerned enough about performance to worry whether the compiler will inline something, it's time to use gdc or ldc. Dmd's inliner is rudimentary at best, and its optimizer, while serviceable, is not up to par with the optimizers in gdc or ldc. If you want top performance, use gdc / ldc.

IME gdc -O3 consistently produces code that runs about 20-30% faster than code produced by dmd -O (even with -inline). Sometimes I've seen performance gains of up to 40-50%. This is especially likely when your code consists of deep call trees involving small(ish) functions: I've looked at the assembly output before, and it seems that dmd's inliner just gives up too easily, thus missing opportunities for further reductions and further inlining.

Even after discounting the inliner, though, I find that gdc is simply better than dmd at loop optimizations such as hoisting, strength reduction, unrolling, etc. So if your code involves complex loops, expect gdc -O3 to produce better code than dmd. Well, "better" may be debatable, but gdc is certainly far more aggressive at optimizing loops (and at optimizing in general) than dmd. In the cases I've looked at, aggressive optimization often exposes further optimization opportunities, whereas an optimizer that is too conservative misses an opportunity that would have led to others, so the resulting code can end up vastly different in performance.

Having said all that, though, have you used a profiler to determine whether your performance bottleneck is really in the function in question? I find that 90% of the time, what I truly believe should be inlined actually doesn't make much difference; the bottleneck is usually somewhere else that I didn't expect.
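
For a quick sanity check before reaching for a full profiler, something like the following sketch can help. It is only an illustration: the functions square, sumSquaresCalls and sumSquaresManual are made up for this example, and it assumes a Phobos recent enough to ship std.datetime.stopwatch. It times a loop that goes through a small helper function against the same loop written out by hand, so you can see how much the call actually costs under each compiler.

```d
import std.datetime.stopwatch : benchmark;
import std.stdio : writefln;

// Hypothetical small helper, the kind one is tempted to force-inline.
long square(long x)
{
    return x * x;
}

// Loop that goes through the helper on every iteration.
long sumSquaresCalls(int n)
{
    long total = 0;
    foreach (i; 0 .. n)
        total += square(i);
    return total;
}

// Same computation with the helper's body written out by hand.
long sumSquaresManual(int n)
{
    long total = 0;
    foreach (i; 0 .. n)
        total += cast(long) i * i;
    return total;
}

void main()
{
    enum n = 100_000;   // arbitrary problem size for the illustration
    enum runs = 100;    // arbitrary repetition count
    long sink;          // accumulate results so the work is actually used

    void viaCalls()  { sink += sumSquaresCalls(n); }
    void viaManual() { sink += sumSquaresManual(n); }

    // benchmark runs each function `runs` times and returns one Duration each.
    auto times = benchmark!(viaCalls, viaManual)(runs);

    writefln("with helper calls: %s", times[0]);
    writefln("written by hand:   %s", times[1]);
    writefln("checksum:          %s", sink);
}
```

Build it once with dmd -O -inline and once with gdc -O3 (or ldc2 -O3) and compare the numbers. If the two timings barely differ, inlining that helper is probably not where your time is going. For whole-program measurements, dmd's -profile switch or an external profiler is the better tool; a micro-benchmark like this only answers the narrow question about one function.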

I used to spend lots of time trying to hyper-optimize everything, only to discover later that 90% of my efforts had been wasted on gaining a meager 1% of performance, whereas if I had just used a profiler in the first place, I would have gotten a 50% performance improvement with only 10% of the effort.


T

--
Tech-savvy: euphemism for nerdy.