On 12/1/2017 3:31 AM, Nicholas Wilson wrote:
On Friday, 1 December 2017 at 11:07:32 UTC, Walter Bright wrote:
Does DMD optimise for locality?

No. However, the much-despised Optlink does! It uses the trace.def output from the profiler to set the layout of functions, so that tightly coupled functions are co-located.

  https://digitalmars.com/ctg/trace.html

It's not even just cache locality - rarely used functions can be allocated to pages so they are never even loaded in from disk. (The executable files are demand loaded.) The speed improvement can be dramatic, especially on program startup times, and if the program does a lot of swapping. I don't know if the Linux linker can accept a script file telling it the function layout.

The downside is that, because it relies on runtime profile information, it is awkward to set up and needs a representative usage test case to drive it.

dmd could potentially use a static call graph to do a better-than-nothing stab at it, but it would only work on code supplied to it as a group on the command line.
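As a rough illustration of what a static call graph "stab" at layout could look like, here is a minimal sketch. It is not how dmd or Optlink work; the function name `layout_order`, the dictionary representation of the call graph, and the example function names are all hypothetical. It simply emits functions in depth-first order from the entry points, so each callee lands near a caller.

```python
def layout_order(call_graph, roots):
    """Return function names in depth-first order, so callees follow a caller.

    call_graph: dict mapping a function name to the names it calls
                (hypothetical representation, chosen for this sketch).
    roots:      entry points to start the layout from.
    """
    order = []
    seen = set()

    def visit(fn):
        if fn in seen:
            return
        seen.add(fn)
        order.append(fn)
        for callee in call_graph.get(fn, ()):
            visit(callee)

    for root in roots:
        visit(root)
    # Functions not reachable from any root are appended at the end.
    for fn in call_graph:
        visit(fn)
    return order


graph = {
    "main": ["parse", "emit"],
    "parse": ["lex"],
    "emit": [],
    "lex": [],
    "usage": [],
}
print(layout_order(graph, ["main"]))
# → ['main', 'parse', 'lex', 'emit', 'usage']
```

Unlike the profile-driven trace.def approach, this has no idea how often each call actually happens, which is why it can only be a better-than-nothing guess.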


I would hope co-located functions are either larger than cache lines by a reasonable amount or, if they are small enough, inlined so that the asserts can be aggregated. It is also possible (though I can't comment on how easy it would be to implement), if you are trying to optimise for co-location, to have the asserts be completely out of line so that you have

function1
function2
function3
call asserts of function1
call asserts of function2
call asserts of function3

such that the calls to the asserts never appear in the icache at all, apart from overlap of e.g. function1's asserts after the end of function3, or when one of the asserts fails.

It's possible, although the jmps to the assert code would now have to be unconditional relocatable jmps which are larger:

    jne L1
    jmp assertcode
L1:
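
To put rough numbers on "larger", here is a small sketch of the byte counts involved. The instruction sizes are the standard x86 encodings; the assumption that the in-function assert code is close enough for a short (8-bit displacement) conditional jump is mine, and the helper name `assert_branch_bytes` is hypothetical.

```python
# Encoded sizes in bytes of the x86 jumps involved.
JCC_REL8 = 2   # conditional jump, 8-bit displacement: opcode + rel8
JMP_REL32 = 5  # unconditional jump, 32-bit displacement: E9 + rel32


def assert_branch_bytes(out_of_line):
    """Bytes of branch code per assert check on the non-failing path.

    Assumes the in-function case can reach the assert code with a short
    conditional jump, which will not hold for every function.
    """
    if out_of_line:
        # jne L1 ; jmp assertcode ; L1:
        return JCC_REL8 + JMP_REL32
    # a single short conditional jump straight to the assert code
    return JCC_REL8


print(assert_branch_bytes(False))  # → 2
print(assert_branch_bytes(True))   # → 7
```

Under these assumptions each check grows by 5 bytes in the hot path, which has to be weighed against keeping the assert failure code out of the icache.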


Then it becomes a tradeoff, one that I'm glad the compiler is doing for me.

Everything about codegen is a tradeoff :-)
