On 12/1/2017 3:31 AM, Nicholas Wilson wrote:
On Friday, 1 December 2017 at 11:07:32 UTC, Walter Bright wrote:
Does DMD optimise for locality?
No. However, the much-despised Optlink does! It uses the trace.def output from
the profiler to set the layout of functions, so that tightly coupled functions
are co-located.
https://digitalmars.com/ctg/trace.html
It's not even just cache locality - rarely used functions can be allocated to
pages so they are never even loaded in from disk. (The executable files are
demand loaded.) The speed improvement can be dramatic, especially on program
startup times, and if the program does a lot of swapping. I don't know if the
Linux linker can accept a script file telling it the function layout.
The downside is that, because it relies on runtime profile information, it is
awkward to set up and needs a representative usage test case to drive it.
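To make the idea concrete, here is a rough sketch of profile-driven layout. It is not Optlink's actual algorithm and it does not parse the trace.def format; the function name `layout_from_profile`, the `min_calls` threshold and the sample call counts are all made up for illustration. It greedily glues together functions joined by the hottest call edges and pushes everything else to the end, where it could sit on pages that are never demand loaded.

```python
def layout_from_profile(call_counts, min_calls=10):
    """Return a function order with hot caller/callee pairs placed together.

    call_counts maps (caller, callee) -> number of calls seen in a profile run.
    min_calls is a hypothetical cut-off: edges below it are treated as cold.
    """
    # Every function starts out in a chain of its own.
    chain_of = {}
    for caller, callee in call_counts:
        chain_of.setdefault(caller, [caller])
        chain_of.setdefault(callee, [callee])

    # Hottest edges first; ties broken by name so the result is deterministic.
    edges = sorted(call_counts.items(), key=lambda kv: (-kv[1], kv[0]))

    # Merge the chains at either end of each hot edge.
    for (caller, callee), count in edges:
        if count < min_calls:
            break
        a, b = chain_of[caller], chain_of[callee]
        if a is not b:
            a.extend(b)
            for f in b:
                chain_of[f] = a

    # Emit hot chains in order of their hottest edge.
    layout, placed = [], set()
    for (caller, _callee), count in edges:
        if count < min_calls:
            break
        for f in chain_of[caller]:
            if f not in placed:
                placed.add(f)
                layout.append(f)

    # Cold functions go last, in name order.
    for f in sorted(chain_of):
        if f not in placed:
            placed.add(f)
            layout.append(f)
    return layout


# Invented profile data, standing in for what a profiler run might report.
profile = {
    ("main", "parse"): 1,
    ("parse", "lex"): 900,
    ("lex", "peek"): 850,
    ("parse", "fold"): 40,
    ("main", "report_error"): 2,
}
print(layout_from_profile(profile))
```

With this sample data the hot trio parse/lex/peek ends up adjacent and main and report_error land at the tail. The result is only as good as the profile run that produced the counts, which is exactly the downside mentioned above.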
dmd could potentially use a static call graph to take a better-than-nothing stab
at it, but it would only work on code supplied to it as a group on the command line.
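A static version might look something like the sketch below. This is an assumption about one possible approach, not anything dmd does: with no call counts available, it simply walks the call graph depth first from the entry points so that each callee is placed soon after its first caller. The name `layout_from_static_graph` and the sample graph are hypothetical.

```python
def layout_from_static_graph(calls, roots):
    """Depth-first pre-order over a static call graph.

    calls maps a function name to the list of functions it calls, in source
    order. Only functions reachable from roots are laid out, which mirrors the
    limitation that the compiler can only see the code it was handed.
    """
    layout, seen = [], set()
    stack = list(reversed(roots))
    while stack:
        f = stack.pop()
        if f in seen:
            continue  # already placed (also keeps recursion from looping)
        seen.add(f)
        layout.append(f)
        # Push callees in reverse so the first callee is visited first.
        stack.extend(reversed(calls.get(f, [])))
    return layout


# Invented call graph for illustration.
graph = {
    "main": ["parse", "report_error"],
    "parse": ["lex", "fold"],
    "lex": ["peek"],
}
print(layout_from_static_graph(graph, ["main"]))
```

Without run-time counts this cannot tell a hot callee from a cold one, so report_error is placed by position in the graph rather than by how rarely it runs.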
I would hope co-located functions are either larger than cache lines by a
reasonable amount or, if they are small enough, inlined so that the asserts can
be aggregated. It is also possible (though I can't comment on how easy it would
be to implement), if you are trying to optimise for co-location, to have the
asserts be completely out of line, so that you have:
    function1
    function2
    function3
    call asserts of function1
    call asserts of function2
    call asserts of function3
such that the calls to the asserts never appear in the icache at all, apart from
overlap (e.g. function1's asserts directly after the end of function3) or when
one of the asserts fails.
It's possible, although the jmps to the assert code would now have to be
unconditional relocatable jmps which are larger:
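        jne L1            ; assert condition holds: hop over the long jump
        jmp assertcode    ; assert failed: relocatable jump to the out-of-line code
    L1: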
Then it becomes a tradeoff, one that I'm glad the compiler is doing for me.
Everything about codegen is a tradeoff :-)