https://issues.dlang.org/show_bug.cgi?id=21027
Issue ID: 21027
Summary: Backend: DMD use 'rep stosb' even for ulong arrays
Product: D
Version: D2
Hardware: x86_64
OS: Linux
Status: NEW
Keywords: performance
Severity: normal
Priority: P1
Component: dmd
Assignee: nob...@puremagic.com
Reporter: pro.mathias.l...@gmail.com

Take the following code:
```
alias Content = ulong[256];

void main ()
{
    Content v;
}
```

This is what DMD generates for it on Linux x86_64 (using `run.dlang.org`):
```
.text._Dmain    segment
        assume  CS:.text._Dmain
_Dmain:
        push    RBP
        mov     RBP,RSP
        sub     RSP,0808h
        mov     ECX,0800h
        mov     qword ptr -8[RBP],0
        lea     RAX,-8[RBP]
        mov     AL,[RAX]
        lea     RDI,0FFFFF7F8h[RBP]
        rep stosb
        xor     EAX,EAX
        leave
        ret
        add     [RAX],AL
.text._Dmain    ends
```

The best option here would be to call `memset` or `memcpy`, which is what LDC does.
The second best would be to use `rep stosq` 0x100 times (or `rep stosd` 0x200 times), as either is faster than `rep stosb` 0x800 times.

Source:
- Agner Fog, optimizing assembly (https://www.agner.org/optimize/optimizing_assembly.pdf), 16.9 String instructions (all processors):

> `REP MOVSD` and `REP STOSD` are quite fast if the repeat count is not too
> small. The largest word size (DWORD in 32-bit mode, QWORD in 64-bit mode) is
> preferred. Both source and destination should be aligned by the word size or
> better. In many cases, however, it is faster to use vector registers. Moving
> data in the largest available registers is faster than `REP MOVSD` and `REP
> STOSD` in most cases, especially on older processors. See page 150 for
> details.

Related: https://issues.dlang.org/show_bug.cgi?id=14458
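
For reference, the repeat counts for each store width follow directly from the size of `Content`. The sketch below checks them at compile time. The names `byteCount`, `dwordCount` and `qwordCount` are hypothetical and used only for illustration; they do not come from DMD or from the report.

```d
// Illustrative only: byteCount, dwordCount and qwordCount are hypothetical names.
alias Content = ulong[256];

enum size_t byteCount  = Content.sizeof;                // 1-byte stores (stosb)
enum size_t dwordCount = Content.sizeof / uint.sizeof;  // 4-byte stores (stosd)
enum size_t qwordCount = Content.sizeof / ulong.sizeof; // 8-byte stores (stosq)

static assert(byteCount  == 0x800); // the value loaded by `mov ECX,0800h`
static assert(dwordCount == 0x200);
static assert(qwordCount == 0x100);

void main()
{
    Content v; // default-initialized, so every element is zero
    foreach (x; v)
        assert(x == 0);
}
```

Whichever instruction the backend emits, the observable result is the same: all 256 elements of `v` are zero. Only the number of store iterations changes.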