On Sunday, 10 June 2018 at 12:49:31 UTC, Mike Franklin wrote:
I'm not experienced with this kind of programming, so I'm doubting these results. Have I done something wrong? Am I overlooking something?

You've just discovered that one can rarely be careful enough about what exactly is being benchmarked, and about gathering enough statistics.

For example, check out the following output from running your program on macOS 10.12, compiled with LDC 1.8.0:

---
$ ./test
memcpyD: 2 ms, 570 μs, and 9 hnsecs
memcpyDstdAlg: 77 μs and 2 hnsecs
memcpyC: 74 μs and 1 hnsec
memcpyNaive: 76 μs and 4 hnsecs
memcpyASM: 145 μs and 5 hnsecs
$ ./test
memcpyD: 3 ms and 376 μs
memcpyDstdAlg: 76 μs and 9 hnsecs
memcpyC: 104 μs and 4 hnsecs
memcpyNaive: 72 μs and 2 hnsecs
memcpyASM: 181 μs and 8 hnsecs
$ ./test
memcpyD: 2 ms and 565 μs
memcpyDstdAlg: 76 μs and 9 hnsecs
memcpyC: 73 μs and 2 hnsecs
memcpyNaive: 71 μs and 9 hnsecs
memcpyASM: 145 μs and 3 hnsecs
$ ./test
memcpyD: 2 ms, 813 μs, and 8 hnsecs
memcpyDstdAlg: 81 μs and 2 hnsecs
memcpyC: 99 μs and 2 hnsecs
memcpyNaive: 74 μs and 2 hnsecs
memcpyASM: 149 μs and 1 hnsec
$ ./test
memcpyD: 2 ms, 593 μs, and 7 hnsecs
memcpyDstdAlg: 77 μs and 3 hnsecs
memcpyC: 75 μs
memcpyNaive: 77 μs and 2 hnsecs
memcpyASM: 145 μs and 5 hnsecs
---

Because of the large amount of noise, the only conclusion one can draw from this is that memcpyD is the slowest, followed by the ASM implementation.
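
To get numbers that are less sensitive to that noise, one usually repeats each measurement many times and keeps the best result, and also makes sure the destination buffer is actually used afterwards so the optimizer cannot drop the copy. Below is a minimal sketch of such a harness (not the code from the original benchmark; the function, buffer size, and run count are just placeholders):

---
import core.time : Duration;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio : writefln;

enum size = 1024 * 1024;

// Stand-in for one of the benchmarked copy variants.
void memcpyNaive(ubyte[] dst, const(ubyte)[] src)
{
    foreach (i; 0 .. src.length)
        dst[i] = src[i];
}

void main()
{
    auto src = new ubyte[](size);
    auto dst = new ubyte[](size);
    src[] = 0x2a;  // fill the source so the copy has observable content

    enum runs = 1_000;
    Duration best = Duration.max;
    foreach (_; 0 .. runs)
    {
        auto sw = StopWatch(AutoStart.yes);
        memcpyNaive(dst, src);
        sw.stop();
        if (sw.peek() < best)
            best = sw.peek();
    }

    // Print a value from dst so the optimizer cannot remove the copies entirely.
    writefln("memcpyNaive, best of %s runs: %s (dst[$ - 1] = %s)",
             runs, best, dst[$ - 1]);
}
---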

In fact, memcpyC and memcpyNaive produce exactly the same machine code (without bounds checking), as LLVM recognizes the loop and lowers it into a memcpy. memcpyDstdAlg instead gets turned into a vectorized loop, for reasons I didn't investigate any further.
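
If anyone wants to see that for themselves, a small reproduction along these lines should do; the function name, file name, and flags are just for illustration, and the exact IR of course depends on the LDC version:

---
// Compiled with bounds checks off, LLVM's loop idiom recognition typically
// turns the byte-wise loop into a single memcpy call:
//
//   ldc2 -O3 -release -boundscheck=off -output-ll naive.d
//
// naive.ll should then contain a call to llvm.memcpy instead of a loop.
void copyNaive(ubyte[] dst, const(ubyte)[] src)
{
    foreach (i; 0 .. src.length)
        dst[i] = src[i];  // recognized and lowered to memcpy by LLVM
}
---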

 — David