On Tue, Mar 27, 2018 at 09:52:25PM +0000, Rubn via Digitalmars-d wrote:
> On Tuesday, 27 March 2018 at 20:38:35 UTC, H. S. Teoh wrote:
> > On Tue, Mar 27, 2018 at 08:25:36PM +0000, Rubn via Digitalmars-d wrote:
> > [...]
> > > _D7example__T3fooTSQr3FooZQnFNbNiNfQrZv:
> > >   push rbp
> > >   mov rbp, rsp
> > >   sub rsp, 3104
> > >   lea rax, [rbp + 16]
> > >   lea rdi, [rbp - 2048]
> > >   lea rcx, [rbp - 1024]
> > >   mov edx, 1024
> > >   mov rsi, rcx
> > >   mov qword ptr [rbp - 2056], rdi
> > >   mov rdi, rsi
> > >   mov rsi, rax
> > >   mov qword ptr [rbp - 2064], rcx
> > >   call memcpy@PLT    <--------------------- hidden copy
> > [...]
> >
> > Is this generated by dmd, or gdc/ldc?
> >
> > Generally, when it comes to performance issues, I don't even bother
> > looking at dmd-generated code anymore.  If the extra copying is
> > still happening with gdc -O2 / ldc -O, then you have a point.
> > Otherwise, it doesn't really say very much.
> >
> >
> > T
>
> It happens with LDC too, not sure how it would be able to know to do
> any kind of optimization like that unless it was able to inline every
> single function called into one function and be able to do optimize it
> from there. I don't imagine that'll be likely though.

You'd be surprised.  Don't underestimate the power of modern
optimizers.  I've seen LDC inline so aggressively that it essentially
evaluated an entire series of function calls at compile-time (likely on
the IR) and generated a single instruction to load the answer into the
return register at runtime. :-D  Of course, it still generated the
individual functions, but those were never actually called at runtime.

(On one occasion, this produced odd-looking "benchmark" results where
the ldc executable computed the answer in exactly 0ms, whereas everyone
else took a lot longer than that. :-D  Well, it was probably a few
nanosecs while the CPU decoded and ran the instruction, but I don't
think any benchmark could measure that!)

For your code example, you might want to look at the code generated for
the *callers* of the function.  When compiling an individual function in
isolation, LDC is obligated to follow the ABI, which could include
redundant copying.  But if the call can be inlined, the optimizer is no
longer bound by the ABI at that call site and could generate very
different code.


T

-- 
Dogs have owners ... cats have staff. -- Krista Casada
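
To make the "look at the callers" suggestion concrete, here is a minimal sketch of the kind of test case one could inspect. Everything in it is an assumption for illustration: the names Foo, first and caller are hypothetical, and the 1024-byte payload is only a guess based on the `mov edx, 1024` in the quoted listing. It is not the code from the original post.

```d
// Hypothetical test case; Foo, first and caller are illustrative names,
// not taken from the original example.
struct Foo
{
    ubyte[1024] data; // large enough that passing by value implies a bulk copy
}

// Takes Foo by value.  Compiled as a standalone function, it must follow
// the ABI, so each out-of-line call involves handing over a copy of Foo.
ubyte first(Foo f) nothrow @nogc @safe
{
    return f.data[0];
}

// The interesting code to inspect: if the optimizer inlines first() here,
// it is free to drop the copy, since no real call takes place.
ubyte caller() nothrow @nogc @safe
{
    Foo f;
    f.data[0] = 42;
    return first(f);
}

void main()
{
    assert(caller() == 42);
}
```

Building this with optimization on and asking for assembly output (for instance `ldc2 -O -output-s`) lets you compare the body of caller against the standalone first. Whether the memcpy actually disappears depends on the compiler version and flags, so treat the result as something to check, not a guarantee.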