Mon, 16 Mar 2009 10:34:33 +0100, Don wrote:
> Sergey Gromov wrote:
>> Sun, 15 Mar 2009 13:17:50 +0000 (UTC), Moritz Warning wrote:
>>
>>> On Sat, 14 Mar 2009 23:50:58 -0400, bearophile wrote:
>>>
>>>> While doing some string processing I've seen some unusual timings
>>>> compared to the C code, so I have written this to see the situation
>>>> better. When USE_MEMCPY is false this little benchmark runs about 3+
>>>> times slower:
>>> I did a little benchmark:
>>>
>>> ldc -release -O5
>>> true: 0.51
>>> false: 0.63
>>>
>>> dmd -release -O
>>> true: 4.47
>>> false: 3.58
>>>
>>> I don't see a very big difference between slice copying and memcpy (but
>>> between compilers).
>>>
>>> Btw.: http://www.digitalmars.com/pnews/read.php?server=news.digitalmars.com&group=digitalmars.D.bugs&artnum=14933
>>
>> The original benchmark swapped insanely on my 1GB laptop so I've cut the
>> number of iterations in half, to 50_000_000. Compiled with -O -release
>> -inline. Results:
>>
>> slice: 2.31
>> memcpy: 0.73
>>
>> That's 3 times difference. Disassembly:
>>
>> slice:
>> L31:    mov ECX,EDX
>>         mov EAX,6
>>         lea ESI,010h[ESP]
>>         mov ECX,EAX
>>         mov EDI,EDX
>>         rep
>>         movsb
>>         add EDX,6
>>         add EBX,6
>>         cmp EBX,011E1A300h
>>         jb L31
>>
>> memcpy:
>> L35:    push 6
>>         lea ECX,014h[ESP]
>>         push ECX
>>         push EBX
>>         call near ptr _memcpy
>>         add EBX,6
>>         add ESI,6
>>         add ESP,0Ch
>>         cmp ESI,011E1A300h
>>         jb L35
>>
>> Seems like rep movsb is /way/ sub-optimal for copying data.
>
> Definitely! The difference ought to be bigger than a factor of 3. Which
> means that memcpy probably isn't anywhere near optimal, either.
> rep movsd is always 4 times quicker than rep movsb. There's a range of
> lengths for which rep movsd is optimal; outside that range, there's are
> other options which are even faster.
>
> So there's a factor of 4-8 speedup available on most memory copies.
> Low-hanging fruit! <g>

Don't disregard the function call overhead. memcpy is called 50 million times, copying only 6 bytes per call, so part of the time measured for the memcpy loop is the call itself rather than the copy.
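
To get a feel for how much of that is call overhead, here is a rough C sketch (not the original D benchmark) that copies the same 6-byte chunk many times, once through a real memcpy call per chunk and once with a plain byte loop. The names copy_with_calls, copy_inline, time_copy and call_memcpy are made up for this example, and the volatile function pointer is only a trick I am assuming is enough to stop the compiler from inlining a 6-byte memcpy.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

enum { CHUNK = 6 };  /* bytes copied per iteration, as in the thread */

/* Hypothetical helper: going through a volatile function pointer forces
   a genuine call on every iteration instead of an inlined 6-byte move. */
static void *(*volatile call_memcpy)(void *, const void *, size_t) = memcpy;

/* Fill dst with `count` copies of the CHUNK bytes at src,
   paying for one library call per chunk. */
static void copy_with_calls(char *dst, const char *src, size_t count)
{
    for (size_t i = 0; i < count; i++)
        call_memcpy(dst + i * CHUNK, src, CHUNK);
}

/* Same result with a plain byte loop and no call. The compiler is free
   to unroll or vectorize this, so check the generated code before
   reading too much into the numbers. */
static void copy_inline(char *dst, const char *src, size_t count)
{
    for (size_t i = 0; i < count; i++)
        for (size_t j = 0; j < CHUNK; j++)
            dst[i * CHUNK + j] = src[j];
}

/* CPU seconds used by one run of `copy`. */
static double time_copy(void (*copy)(char *, const char *, size_t),
                        char *dst, const char *src, size_t count)
{
    clock_t start = clock();
    copy(dst, src, count);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling time_copy(copy_with_calls, dst, src, n) and time_copy(copy_inline, dst, src, n) on a dst buffer of n * 6 bytes gives the two timings to compare. With n = 50_000_000 that buffer is 300 MB, so pick a smaller n if memory is tight.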