Sun, 15 Mar 2009 13:17:50 +0000 (UTC), Moritz Warning wrote:

> On Sat, 14 Mar 2009 23:50:58 -0400, bearophile wrote:
>
>> While doing some string processing I've seen some unusual timings
>> compared to the C code, so I have written this to see the situation
>> better. When USE_MEMCPY is false this little benchmark runs about 3+
>> times slower:
>
> I did a little benchmark:
>
> ldc -release -O5
> true: 0.51
> false: 0.63
>
> dmd -release -O
> true: 4.47
> false: 3.58
>
> I don't see a very big difference between slice copying and memcpy (but
> between compilers).
>
> Btw.: http://www.digitalmars.com/pnews/read.php?server=news.digitalmars.com&group=digitalmars.D.bugs&artnum=14933

The original benchmark swapped insanely on my 1GB laptop so I've cut the
number of iterations in half, to 50_000_000. Compiled with -O -release
-inline. Results:

slice: 2.31
memcpy: 0.73

That's a 3x difference. Disassembly:

slice:
L31:    mov     ECX,EDX
        mov     EAX,6
        lea     ESI,010h[ESP]
        mov     ECX,EAX
        mov     EDI,EDX
        rep
        movsb
        add     EDX,6
        add     EBX,6
        cmp     EBX,011E1A300h
        jb      L31

memcpy:
L35:    push    6
        lea     ECX,014h[ESP]
        push    ECX
        push    EBX
        call    near ptr _memcpy
        add     EBX,6
        add     ESI,6
        add     ESP,0Ch
        cmp     ESI,011E1A300h
        jb      L35

Seems like rep movsb is /way/ sub-optimal for copying data, at least for
6-byte copies like these.
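
For anyone who wants to poke at the same comparison from the C side, here is a rough sketch, not the original D benchmark. It fills a buffer with 6-byte chunks in two ways: a byte-at-a-time loop (a loose stand-in for rep movsb) and memcpy. The names CHUNK, copy_bytes, fill_bytewise, fill_memcpy and time_fill are all made up for illustration. One caveat: a C compiler is free to expand a constant 6-byte memcpy inline, so the second variant may not end up as a real call the way it does in the dmd output above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

enum { CHUNK = 6 };  /* bytes copied per iteration, as in the disassembly */

/* Byte-at-a-time copy, a rough stand-in for `rep movsb`. The volatile
   qualifiers keep the compiler from rewriting the loop as a memcpy call. */
static void copy_bytes(char *dst, const char *src, size_t n)
{
    volatile char *d = dst;
    const volatile char *s = src;
    for (size_t i = 0; i < n; i++)
        d[i] = s[i];
}

/* Write `count` copies of the CHUNK-byte pattern into dst, byte by byte.
   dst must have room for count * CHUNK bytes. */
static void fill_bytewise(char *dst, const char *pattern, size_t count)
{
    assert(dst != NULL && pattern != NULL);
    for (size_t i = 0; i < count; i++)
        copy_bytes(dst + i * CHUNK, pattern, CHUNK);
}

/* Same fill, but each chunk is copied with memcpy. */
static void fill_memcpy(char *dst, const char *pattern, size_t count)
{
    assert(dst != NULL && pattern != NULL);
    for (size_t i = 0; i < count; i++)
        memcpy(dst + i * CHUNK, pattern, CHUNK);
}

/* Run one fill function and return the CPU time it took, in seconds. */
static double time_fill(void (*fill)(char *, const char *, size_t),
                        char *dst, const char *pattern, size_t count)
{
    clock_t start = clock();
    fill(dst, pattern, count);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

To use it, allocate count * CHUNK bytes, call time_fill(fill_bytewise, buf, "abcdef", count) and time_fill(fill_memcpy, buf, "abcdef", count), and compare the two numbers. Pick a count that fits comfortably in RAM so the run doesn't start swapping.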