https://issues.dlang.org/show_bug.cgi?id=21027

          Issue ID: 21027
           Summary: Backend: DMD use 'rep stosb' even for ulong arrays
           Product: D
           Version: D2
          Hardware: x86_64
                OS: Linux
            Status: NEW
          Keywords: performance
          Severity: normal
          Priority: P1
         Component: dmd
          Assignee: nob...@puremagic.com
          Reporter: pro.mathias.l...@gmail.com

Take the following code:
```
alias Content = ulong[256];
void main ()
{
    Content v;
}
```

This is what DMD generates for it on Linux x86_64 (compiled on `run.dlang.org`):
```
.text._Dmain    segment
        assume  CS:.text._Dmain
_Dmain:
                push    RBP
                mov     RBP,RSP
                sub     RSP,0808h
                mov     ECX,0800h
                mov     qword ptr -8[RBP],0
                lea     RAX,-8[RBP]
                mov     AL,[RAX]
                lea     RDI,0FFFFF7F8h[RBP]
                rep
                stosb
                xor     EAX,EAX
                leave
                ret
                add     [RAX],AL
.text._Dmain    ends
```

The best option here would be to call `memset` or `memcpy`, which is what LDC
does.
The second best would be to use `rep stosq` 0x100 times, as it is faster than
`rep stosb` 0x800 times.
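
To make the counts concrete, here is a minimal D sketch. It checks at compile time that the array occupies 0x800 bytes (the value loaded into `ECX` in the listing above), which corresponds to 0x100 eight-byte elements. It also shows one possible source-level workaround: skipping default initialization with `= void` and calling `memset` from `core.stdc.string` explicitly. The helper name `zeroedContent` is hypothetical, and nothing here has been measured, so whether it is actually faster depends on the compiler and the target CPU.

```d
import core.stdc.string : memset;

alias Content = ulong[256];

// The byte count matches the 0800h loaded into ECX in the listing above.
static assert(Content.sizeof == 0x800);
// One store per ulong element would need only 0x100 iterations.
static assert(Content.sizeof / ulong.sizeof == 0x100);

// Hypothetical helper: returns a zeroed Content without relying on the
// compiler-generated default initialization.
Content zeroedContent()
{
    Content v = void;          // skip default initialization
    memset(&v, 0, v.sizeof);   // zero the storage explicitly
    return v;
}

void main()
{
    Content v = zeroedContent();
    foreach (x; v)
        assert(x == 0);
}
```

This only sidesteps the issue at one call site; the report itself is about the code the DMD backend emits for ordinary default initialization.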

Source:
- Agner Fog, Optimizing subroutines in assembly language
(https://www.agner.org/optimize/optimizing_assembly.pdf), 16.9 String
instructions (all processors):
> `REP MOVSD` and `REP STOSD` are quite fast if the repeat count is not too 
> small. The largest word size (DWORD in 32-bit mode, QWORD in 64-bit mode) is 
> preferred. Both source and destination should be aligned by the word size or 
> better. In many cases, however, it is faster to use vector registers. Moving 
> data in the largest available registers is faster than `REP MOVSD` and `REP 
> STOSD` in most cases, especially on older processors. See page 150 for 
> details.

Related: https://issues.dlang.org/show_bug.cgi?id=14458
