On 30/07/12 13:24, Andrei Alexandrescu wrote:
On 7/30/12 4:34 AM, Dmitry Olshansky wrote:
On 30-Jul-12 06:01, Andrei Alexandrescu wrote:
In fact memcpy could and should be replaced with a word-by-word copy for
almost all struct sizes up to ~32 bytes (since the size is known in advance
for this particular function pointer, i.e. handler!int).

In fact memcpy should be smart enough to do all that, but apparently it
doesn't.


I'd say array ops could and should do this (since the compiler has all the
info at compile time). On the other hand, memcpy is just a general-purpose C
function aimed at fast blitting of arbitrary memory chunks.
(Even the call/ret pair alone is too much overhead in the case of int.)
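To make the compile-time-size point concrete, here is a minimal C sketch, not dmd's actual Variant code. A hypothetical DEFINE_COPY_HANDLER macro stamps out one copy function per type, loosely analogous to handler!int, so sizeof(T) is a constant and the compiler can emit plain register moves with no call into a generic routine. By contrast, copy_generic only learns the size at run time and has to go through memcpy. The names Triple, copy_int, copy_triple and copy_generic are illustrative assumptions.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical per-type handler: sizeof(T) is fixed at compile time, so the
   typed assignment below can be lowered to a few word-sized moves. */
#define DEFINE_COPY_HANDLER(T, name)                      \
    static void name(void *dst, const void *src) {        \
        *(T *)dst = *(const T *)src;                      \
    }

/* Generic fallback: the size is only a run-time value, so it goes through memcpy. */
static void copy_generic(void *dst, const void *src, size_t size) {
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, size);
}

/* Illustrative 24-byte payload on typical LP64 targets. */
typedef struct { long a, b, c; } Triple;

DEFINE_COPY_HANDLER(int, copy_int)
DEFINE_COPY_HANDLER(Triple, copy_triple)
```

Usage: copy_int(&y, &x) copies one int with no memcpy call and no size argument; copy_generic(&v, &t, sizeof v) is the path a handler takes when it cannot specialize on the type. Optimizing compilers may also inline a constant-size memcpy, but the typed version does not rely on that.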

memcpy is implemented as an intrinsic on many platforms. I'm not sure
whether it is on dmd, but it is on dmc
(http://www.digitalmars.com/ctg/sc.html), icc, and gcc
(http://software.intel.com/en-us/articles/memcpy-performance/). That said,
using simple word assignments wherever possible clearly makes for a more
robust performance profile.

It is an intrinsic on DMD, but it isn't done optimally. Mostly it just compiles to a couple of loads plus a single
rep movsd / rep movsq
instruction. That is perfect for medium-sized lengths when everything is aligned, but once the length exceeds a few hundred bytes, it should be done as a function call (the optimal method involves cache blocking).
For very short lengths, it should be done as just a couple of loads.
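The size tiers described above can be sketched in C roughly as follows. This is an illustration under stated assumptions, not DMD's code generation: the thresholds SMALL_MAX and MEDIUM_MAX are hypothetical (real cutoffs are CPU-specific), the medium tier uses rep movsq only on GCC/Clang targeting x86-64 (elsewhere it falls back to memcpy), and the large tier simply calls the library memcpy as a stand-in for a cache-blocked routine. The short tier uses the overlapping head/tail trick, so any length up to 16 bytes costs at most two loads and two stores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SMALL_MAX  16   /* hypothetical: "very short" lengths */
#define MEDIUM_MAX 256  /* hypothetical: "a few hundred bytes" */

/* Unaligned 4/8-byte load/store; a fixed-size memcpy like this normally
   compiles to a single mov on gcc/clang. */
static uint64_t load64(const unsigned char *p) { uint64_t v; memcpy(&v, p, 8); return v; }
static void store64(unsigned char *p, uint64_t v) { memcpy(p, &v, 8); }
static uint32_t load32(const unsigned char *p) { uint32_t v; memcpy(&v, p, 4); return v; }
static void store32(unsigned char *p, uint32_t v) { memcpy(p, &v, 4); }

/* Very short copies: load the first and last chunk (they overlap when n is
   not a multiple of the chunk size), then store both. */
static void copy_small(unsigned char *d, const unsigned char *s, size_t n) {
    assert(n <= SMALL_MAX);
    if (n >= 8) {
        uint64_t head = load64(s), tail = load64(s + n - 8);
        store64(d, head);
        store64(d + n - 8, tail);
    } else if (n >= 4) {
        uint32_t head = load32(s), tail = load32(s + n - 4);
        store32(d, head);
        store32(d + n - 4, tail);
    } else {
        for (size_t i = 0; i < n; i++) d[i] = s[i];
    }
}

#if defined(__GNUC__) && defined(__x86_64__)
/* Medium copies: n/8 quadwords with rep movsq, then the 0..7 byte tail.
   rep movsq advances rdi/rsi, so dw/sw end up just past the copied words. */
static void copy_medium(unsigned char *d, const unsigned char *s, size_t n) {
    unsigned char *dw = d;
    const unsigned char *sw = s;
    size_t words = n / 8;
    __asm__ volatile ("rep movsq"
                      : "+D"(dw), "+S"(sw), "+c"(words)
                      :
                      : "memory");
    copy_small(dw, sw, n % 8);
}
#else
/* Portable fallback when inline x86-64 asm is unavailable. */
static void copy_medium(unsigned char *d, const unsigned char *s, size_t n) {
    memcpy(d, s, n);
}
#endif

/* Dispatch on length; large copies go to a real function call. */
void copy_bytes(void *dst, const void *src, size_t n) {
    unsigned char *d = dst;
    const unsigned char *s = src;
    if (n <= SMALL_MAX)
        copy_small(d, s, n);
    else if (n <= MEDIUM_MAX)
        copy_medium(d, s, n);
    else
        memcpy(d, s, n);  /* stand-in for a cache-blocked library routine */
}
```

The point of the split is that each tier avoids the other tiers' fixed costs: the short path has no call and no rep startup, and the long path pays for a call only when the copy is big enough to amortize it.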
