Sergey Gromov wrote:
Sun, 15 Mar 2009 13:17:50 +0000 (UTC), Moritz Warning wrote:
On Sat, 14 Mar 2009 23:50:58 -0400, bearophile wrote:
While doing some string processing I've seen some unusual timings
compared to the C code, so I have written this to see the situation
better. When USE_MEMCPY is false this little benchmark runs about 3+
times slower:
I did a little benchmark:
ldc -release -O5
true: 0.51
false: 0.63
dmd -release -O
true: 4.47
false: 3.58
I don't see a very big difference between slice copying and memcpy (but
I do see one between compilers).
Btw.: http://www.digitalmars.com/pnews/read.php?server=news.digitalmars.com&group=digitalmars.D.bugs&artnum=14933
The original benchmark swapped insanely on my 1GB laptop so I've cut the
number of iterations in half, to 50_000_000. Compiled with -O -release
-inline. Results:
slice: 2.31
memcpy: 0.73
That's a threefold difference. Disassembly:
slice:
L31: mov ECX,EDX
mov EAX,6
lea ESI,010h[ESP]
mov ECX,EAX
mov EDI,EDX
rep
movsb
add EDX,6
add EBX,6
cmp EBX,011E1A300h
jb L31
memcpy:
L35: push 6
lea ECX,014h[ESP]
push ECX
push EBX
call near ptr _memcpy
add EBX,6
add ESI,6
add ESP,0Ch
cmp ESI,011E1A300h
jb L35
Seems like rep movsb is /way/ sub-optimal for copying data.
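For reference, both listings appear to do the same work: copy a 6-byte buffer from the stack to an advancing destination until the offset reaches 011E1A300h, which is 300,000,000, i.e. 50_000_000 iterations of 6 bytes. Below is a rough C sketch of the two variants. It is reconstructed from the disassembly, not taken from the original benchmark source, and the names fill_bytewise, fill_memcpy and CHUNK are made up for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Copy size seen in the disassembly (mov EAX,6 / push 6). */
enum { CHUNK = 6 };

/* Write `count` consecutive copies of the 6-byte pattern into dst,
   one byte at a time. This is roughly the work rep movsb does in the
   slice version. dst must have room for count * CHUNK bytes. */
void fill_bytewise(unsigned char *dst, const unsigned char *pattern, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        for (size_t j = 0; j < CHUNK; j++)
            dst[i * CHUNK + j] = pattern[j];
    }
}

/* Same result, but each 6-byte chunk goes through the library memcpy,
   as in the second listing. */
void fill_memcpy(unsigned char *dst, const unsigned char *pattern, size_t count)
{
    for (size_t i = 0; i < count; i++)
        memcpy(dst + i * CHUNK, pattern, CHUNK);
}
```

With count = 50_000_000 the destination is about 300 MB, which would fit with the swapping on a 1GB machine. Note that an optimizing C compiler may well turn the inner loop of fill_bytewise into something quite different from rep movsb, so this shows the shape of the benchmark rather than reproducing the timings.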
Definitely! The difference ought to be bigger than a factor of 3. Which
means that memcpy probably isn't anywhere near optimal, either.
rep movsd is always 4 times quicker than rep movsb. There's a range of
lengths for which rep movsd is optimal; outside that range, there are
other options which are even faster.
So there's a factor of 4-8 speedup available on most memory copies.
Low-hanging fruit! <g>
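To make the movsb/movsd distinction concrete, here is a small C sketch of the two strategies: moving one byte per step versus moving four bytes per step and finishing the remainder bytewise. It only illustrates the idea. It is not how DMD's runtime or any particular memcpy is implemented, and copy_bytes and copy_dwords are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One byte per step: the C analogue of rep movsb. */
void copy_bytes(unsigned char *dst, const unsigned char *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Four bytes per step (the analogue of rep movsd), then a bytewise
   tail for the remaining 0..3 bytes. The fixed-size memcpy calls are
   a portable way to express an unaligned 32-bit load and store;
   compilers typically reduce each one to a single move. */
void copy_dwords(unsigned char *dst, const unsigned char *src, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        uint32_t w;
        memcpy(&w, src + i, sizeof w);
        memcpy(dst + i, &w, sizeof w);
    }
    for (; i < n; i++)
        dst[i] = src[i];
}
```

For the 6-byte copy in the benchmark, copy_dwords does one 4-byte move plus two single-byte moves instead of six single-byte moves. Both functions assume the source and destination do not overlap.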