On Tuesday, 31 January 2017 at 01:30:48 UTC, Walter Bright wrote:
Just from D's type signature, we can know a lot about memcpy():
1. There are no side effects.
2. The return value is derived from s1.
3. Nothing s2 transitively points to is altered via s2.
4. Copies of s1 or s2 are not saved.
The C declaration does not give us any of that info, although the
C description does give us 2, and the 'restrict' says that s1 and
s2 do not overlap.
The Rust declaration does not give us 1, 2 or 4 (because it is
marked as unsafe). If it was safe, the declaration does not
give us 2.
By this information being knowable from the declaration, the
compiler knows it too and can make use of it.
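For reference, the C side of that comparison can be sketched as follows. This is only an illustration: copy_bytes is a hypothetical stand-in with the same signature shape as the C99 memcpy declaration, not the libc implementation. It shows where the two facts C does provide come from: 'restrict' carries the no-overlap promise, while "the return value is s1" is visible only in the body and the documentation, not in the declaration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in with the same signature shape as C99 memcpy.
   'restrict' promises that the two regions do not overlap; 'const' on
   s2 only forbids writes through s2 itself. */
void *copy_bytes(void *restrict s1, const void *restrict s2, size_t n)
{
    assert(s1 != NULL && s2 != NULL);

    unsigned char *dst = s1;
    const unsigned char *src = s2;

    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];

    /* Point 2: the result is s1, but nothing in the declaration says so. */
    return s1;
}
```

Nothing in this signature tells a caller (or a compiler that cannot see the body) that the function has no side effects or that it keeps no copy of either pointer.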
Well, I would not have taken memcpy as an example in favor of D.
Good C compilers (like gcc) know what memcpy does and are able to
optimize it according to its arguments. DMD may know more about
memcpy through its declaration, but it does not make any use of it.
A simple example:
// cmemcpy.c
#include <string.h>
#include <stdio.h>
int main(void) {
    char a[16] = "world hello";
    char b[16] = "";
    memcpy(b, a, 12);
    memcpy(b, a + 6, 5);
    memcpy(b + 6, a, 5);
    printf("%s -> %s\n", a, b);
}
//------------
gcc -Ofast produces the following code:
main:
.LFB0:
.cfi_startproc
subq $40, %rsp
.cfi_def_cfa_offset 48
movl $.LC0, %edi
movabsq $7307126011096887159, %rax
movq %rax, (%rsp)
movq %rsp, %rdx
movq %rax, 16(%rsp)
leaq 16(%rsp), %rsi
movq $7302252, 24(%rsp)
movl 22(%rsp), %eax
movq $0, 8(%rsp)
movl $7302252, 8(%rsp)
movl %eax, (%rsp)
movzbl 26(%rsp), %eax
movb %al, 4(%rsp)
movl 16(%rsp), %eax
movl %eax, 6(%rsp)
movzbl 20(%rsp), %eax
movb %al, 10(%rsp)
xorl %eax, %eax
call printf
xorl %eax, %eax
addq $40, %rsp
.cfi_def_cfa_offset 8
ret
There is no call to memcpy: all three calls have been optimized out
by the compiler.
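To see why the compiler can do this, it helps to trace what the three copies leave in b. The sketch below wraps them in a hypothetical helper, swap_words (a name made up for this illustration, not part of cmemcpy.c), and assumes both buffers hold at least 12 bytes and do not overlap. Every size and offset is a compile-time constant, which is what lets gcc replace the calls with plain loads and stores.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: the three copies from cmemcpy.c, applied to
   caller-supplied buffers. Assumes a holds "world hello" and that
   both buffers are at least 12 bytes long and do not overlap. */
void swap_words(char *b, const char *a)
{
    assert(a != NULL && b != NULL);

    memcpy(b, a, 12);     /* b = "world hello" (11 chars + NUL) */
    memcpy(b, a + 6, 5);  /* b = "hello hello" */
    memcpy(b + 6, a, 5);  /* b = "hello world" */
}
```

Called with the a and b arrays from cmemcpy.c, this leaves "hello world" in b and a unchanged.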
Now a D equivalent:
// dmemcpy.d
module dmemcpy;
import core.stdc.string, std.stdio;
void main() {
    char[16] a_ = "world hello", b_ = "";
    void* a = &a_[0], b = &b_[0];
    memcpy(b, a, 12);
    memcpy(b, a + 6, 5);
    memcpy(b + 6, a, 5);
    writefln("%s -> %s", a_, b_);
}
//--------------------
dmd -O -release -inline -boundscheck=off produces the following
asm:
_Dmain:
push RBP
mov RBP,RSP
sub RSP,020h
lea RSI,_TMP0@PC32[RIP]
lea RDI,-020h[RBP]
movsd
movsd
lea RSI,_TMP0@PC32[RIP]
lea RDI,-010h[RBP]
movsd
movsd
mov EDX,0Ch
lea RSI,-020h[RBP]
lea RDI,-010h[RBP]
call memcpy@PLT32
mov EDX,5
lea RSI,-01Ah[RBP]
lea RDI,-010h[RBP]
call memcpy@PLT32
mov EDX,5
lea RSI,-020h[RBP]
lea RDI,-0Ah[RBP]
call memcpy@PLT32
lea RDX,_TMP0@PC32[RIP]
mov EDI,8
mov RSI,RDX
push dword ptr -018h[RBP]
push dword ptr -020h[RBP]
push dword ptr -8[RBP]
push dword ptr -010h[RBP]
call _D3std5stdio27__T8writeflnTAyaTG16aTG16aZ8writeflnFNfAyaG16aG16aZv@PLT32
add RSP,020h
xor EAX,EAX
mov RSP,RBP
pop RBP
ret
So with DMD, calls to memcpy are done verbatim, without any
optimization :-(
To be fair, gdc will optimize the memcpy call out too.
But my main argument here is that a good C compiler is able to
do a very good job of optimizing memcpy, so the extra information
brought by the D language is not so useful in practice.