https://issues.dlang.org/show_bug.cgi?id=14943
--- Comment #1 from hst...@quickfur.ath.cx ---
Further notes:
- gdc not only inlines the call trees of .empty, .front, .popFront, it also
applies other loop optimizations such as strength reduction, refactoring the a =
a*7 into a += b; b += 7. Not sure if dmd is capable of doing this, but in any
case the opportunity is missed because .popFront was not inlined, so the
optimizer could not have applied strength reduction.
- gdc's aggressive inlining also allowed various loop counters and accumulators
to be completely held in registers, while the function calls generated by dmd
necessitated dereferencing addresses to stack variables, which is an extra
layer of indirection. Again, a missed opportunity due to not inlining
aggressively enough.
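
To make the first point concrete, here is a minimal sketch of what strength reduction does once the range primitives are inlined. It is not the test case from this bug; the Multiples struct and both function names are made up for illustration. The first function multiplies on every .front call, the second is roughly what an optimizer can reduce it to after inlining, with the multiply replaced by a running addition.

```d
// Hypothetical range, not the code from the bug report.
struct Multiples
{
    int i, n;
    @property bool empty() const { return i >= n; }
    @property int front() const { return i * 7; } // multiply on every access
    void popFront() { ++i; }
}

// Loop written against the range primitives.
int sumViaRange(int n)
{
    int total = 0;
    for (auto r = Multiples(0, n); !r.empty; r.popFront())
        total += r.front;
    return total;
}

// Hand-inlined and strength-reduced equivalent: the multiply by 7
// becomes an addition carried across iterations.
int sumStrengthReduced(int n)
{
    int total = 0;
    int b = 0; // invariant: b == i * 7 at the top of each iteration
    foreach (i; 0 .. n)
    {
        total += b;
        b += 7;
    }
    return total;
}

unittest
{
    foreach (n; [0, 1, 2, 10, 1000])
        assert(sumViaRange(n) == sumStrengthReduced(n));
}
```

The transformation only becomes visible to the optimizer when .front and .popFront are inlined into the loop body; while they remain opaque calls, the relation between i and i * 7 across iterations is hidden. The unittest block can be run with something like dmd -unittest -main.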
For reference, here's the inner loop produced by gdc:
  403b80: 89 d7                   mov    %edx,%edi
  403b82: c1 ef 1f                shr    $0x1f,%edi
  403b85: 8d 34 3a                lea    (%rdx,%rdi,1),%esi
  403b88: 83 c2 07                add    $0x7,%edx
  403b8b: 83 e6 01                and    $0x1,%esi
  403b8e: 39 fe                   cmp    %edi,%esi
  403b90: 75 1e                   jne    403bb0 <int test.fun(int)+0x80>
  403b92: 89 c6                   mov    %eax,%esi
  403b94: 8d 14 cd 00 00 00 00    lea    0x0(,%rcx,8),%edx
  403b9b: c1 ee 1f                shr    $0x1f,%esi
  403b9e: 01 f0                   add    %esi,%eax
  403ba0: 29 ca                   sub    %ecx,%edx
  403ba2: d1 f8                   sar    %eax
  403ba4: 01 d0                   add    %edx,%eax
  403ba6: 83 c2 07                add    $0x7,%edx
  403ba9: 0f 1f 80 00 00 00 00    nopl   0x0(%rax)
  403bb0: 83 c1 01                add    $0x1,%ecx
  403bb3: 39 cb                   cmp    %ecx,%ebx
  403bb5: 75 c9                   jne    403b80 <int test.fun(int)+0x50>
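
Going only by the listing above, three call-free idioms appear in the loop: a signed parity test (shr $0x1f / lea / and $0x1 / cmp), a signed divide by two (shr $0x1f / add / sar), and a multiply by 7 built from a scaled lea and a sub. The D sketch below spells each one out next to the plain expression it stands for. The function names are made up for illustration, and which source expressions these correspond to in the actual test program is an assumption.

```d
// x % 2 == 0 for signed x, without a division.
bool isEvenSigned(int x)
{
    int s = x >>> 31;           // sign bit: 1 if x is negative, else 0
    return ((x + s) & 1) == s;  // lea + and $0x1 + cmp in the listing
}

// x / 2 for signed x, rounding toward zero like D's integer division.
int halfTruncating(int x)
{
    int s = x >>> 31;           // bias negative values by one
    return (x + s) >> 1;        // add + sar in the listing
}

// x * 7 as x * 8 - x.
int timesSeven(int x)
{
    return (x << 3) - x;        // lea 0x0(,%rcx,8) + sub in the listing
}

unittest
{
    foreach (x; [int.min, -7, -4, -3, -1, 0, 1, 2, 3, 8, int.max])
    {
        assert(isEvenSigned(x) == (x % 2 == 0));
        assert(halfTruncating(x) == x / 2);
        assert(timesSeven(x) == x * 7);
    }
}
```

None of these involve a call or a stack access, which is consistent with the point above about everything staying in registers once the range primitives are inlined.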
--