speed of low-level C funcs: example of memmove

2011-04-09 Thread spir

Hello,

To insert or delete an array slice, I tried to use C's memmove, thinking it 
would be far faster than manually copying element by element (by some kind of 
magic). But I still wrote a D version just to check what the actual speed gain 
is. To my great surprise, the C-memmove and D-manual versions perform *exactly* 
at the same speed (within measurement imprecision).
Note: this remains true when elements are bigger; speed degrades only slowly 
(e.g. dchars take only about 1/3 more time).


Any comment or explanation is welcome. The code is below.

Denis

= code =
import std.stdio : writeln, writefln;
import std.date : getUTCtime, d_time;
import std.c.string : memmove;
// C interface: void *memmove(void *dest, const void *src, size_t n);

void shiftEndPartC (E) (ref E[] array, size_t source, size_t dest) {
    // Record length before possible extension.
    auto length = array.length;
    ptrdiff_t offset = cast(ptrdiff_t) dest - cast(ptrdiff_t) source;

    // If move up, extend array to make place.
    if (offset > 0)
        array.length += offset;

    // Shift slice.
    auto pDest  = cast(void*) &(array[dest]);
    auto pSource= cast(void*) &(array[source]);
    size_t size = (length - source) * E.sizeof;
    memmove(pDest, pSource, size);

    // If move down, compress array.
    if (offset < 0)
        array.length += offset;
}
void shiftEndPartD (E) (ref E[] array, size_t source, size_t dest) {
    // Record length before possible extension.
    auto length = array.length;
    ptrdiff_t offset = cast(ptrdiff_t) dest - cast(ptrdiff_t) source;

    // If move up, extend array & shift backwards.
    if (offset > 0) {
        array.length += offset;
        // Walk down from the old end to 'source' (expects source > 0,
        // since i is unsigned and would wrap below zero).
        for (size_t i = length-1; i >= source; i--)
            array[i+offset] = array[i];
    }

    // If move down, shift forwards & compress array.
    if (offset < 0) {
        for (size_t i = source; i < length; i++)
            array[i+offset] = array[i];
        array.length += offset;
    }
}

void testFuncs () {
    char[] s;

    // C memmove
    s = "0123456789".dup;
    writeln(s);
    // Insert slice.
    s.shiftEndPartC(3,5);
    s[3..5] = "--";
    writeln(s);
    // Delete slice.
    s.shiftEndPartC(5,3);
    writeln(s);
    writeln();

    // D manual
    s = "0123456789".dup;
    writeln(s);
    // Insert slice.
    s.shiftEndPartD(3,5);
    s[3..5] = "--";
    writeln(s);
    // Delete slice.
    s.shiftEndPartD(5,3);
    writeln(s);
    writeln();
}
void chrono () {
    char[] s;
    d_time t;
    enum N = 1_000_000;

    // C memmove
    s = "0123456789".dup;
    t = getUTCtime();
    foreach (_ ; 0..N) {
        s.shiftEndPartC(3,5);
        s[3..5] = "--";
        s.shiftEndPartC(5,3);
    }
    t = getUTCtime() - t;
    writefln("C time: %s", t);

    // D manual
    s = "0123456789".dup;
    t = getUTCtime();
    foreach (_ ; 0..N) {
        s.shiftEndPartD(3,5);
        s[3..5] = "--";
        s.shiftEndPartD(5,3);
    }
    t = getUTCtime() - t;
    writefln("D time: %s", t);
}

unittest {
    //~ testFuncs();
    chrono();
}
void main () {}
--
_
vita es estrany
spir.wikidot.com



Re: speed of low-level C funcs: example of memmove

2011-04-09 Thread spir

On 04/09/2011 07:08 PM, spir wrote:

Hello,

To insert or delete an array slice, I tried to use C's memmove, thinking it
would be far faster than manually copying element by element (by some kind of magic).
But I still wrote a D version just to check what the actual speed gain is. To
my great surprise, the C-memmove and D-manual versions perform *exactly* at the
same speed (within measurement imprecision).
Note: this remains true when elements are bigger; speed degrades only slowly (e.g.
dchars take only about 1/3 more time).


Correction: memmove can be from 5 to 10 times faster on big arrays. For 
instance, with a 1_000_000-char array:


void chrono () {
    char[] s;
    d_time t;
    enum N = 100;

    // C memmove
    // (mult appears to be the author's string-repetition helper, not shown,
    // used here to build a 1_000_000-char array.)
    s = ("0".mult(1_000_000)).dup;
    t = getUTCtime();
    foreach (_ ; 0..N) {
        s.shiftEndPartC(3,5);
        s[3..5] = "--";
        s.shiftEndPartC(5,3);
    }
    t = getUTCtime() - t;
    writefln("C time: %s", t);

    // D manual
    s = ("0".mult(1_000_000)).dup;
    t = getUTCtime();
    foreach (_ ; 0..N) {
        s.shiftEndPartD(3,5);
        s[3..5] = "--";
        s.shiftEndPartD(5,3);
    }
    t = getUTCtime() - t;
    writefln("D time: %s", t);
}

--
_
vita es estrany
spir.wikidot.com



Re: speed of low-level C funcs: example of memmove

2011-04-09 Thread Cliff Hudson
If you take a look at the implementation of memmove (grab the std C lib
source) you'll see a rather optimized assembly loop which is very smart
about doing machine-word aligned moves, and using processor block-copy
instructions.  I suspect that is the reason you see the difference.  For
smaller data sets, you won't see as much gain, if any, because it can't take
advantage of those other semantics to the same degree.

- Cliff
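
As a rough illustration of the word-at-a-time idea Cliff describes, here is a
minimal, hypothetical sketch in the same D style as the code above. It is not
the C library's actual implementation: a real memmove also fixes up alignment,
copies backwards when the regions overlap in the "wrong" direction, and may use
specialised block-copy instructions.

void wordCopy (void* dest, const void* src, size_t n) {
    auto d = cast(ubyte*) dest;
    auto s = cast(const(ubyte)*) src;

    // Bulk: copy one machine word (size_t) per iteration.
    size_t words = n / size_t.sizeof;
    auto dw = cast(size_t*) d;
    auto sw = cast(const(size_t)*) s;
    foreach (i; 0 .. words)
        dw[i] = sw[i];

    // Tail: copy the few bytes left over after the last whole word.
    foreach (i; words * size_t.sizeof .. n)
        d[i] = s[i];
}

Copying size_t-sized chunks divides the number of loop iterations by eight on a
64-bit machine, but that only pays off once the block is large enough to amortise
the per-call overhead, which matches the small-array vs. big-array results above.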

On Sat, Apr 9, 2011 at 10:17 AM, spir denis.s...@gmail.com wrote:

 Correction: memmove can be from 5 to 10 times faster on big arrays. For
 instance, with a 1_000_000-char array: [quoted code snipped]