Don:
>Which means that memcpy probably isn't anywhere near optimal, either.<

Some time ago I read an article by AMD showing that modern CPUs can indeed copy 
much faster using vector asm instructions, loop unrolling and explicit cache 
prefetching. But this is useful only for longer arrays: for small copies such a 
routine is overkill, slower, and takes too much code space in the cache. 
Profile-driven optimization can tell whether a particular copy usually moves a 
lot of data, so the heavier copy is used only there. As an alternative, the 
programmer may add an annotation to choose the copy strategy, but this is not 
nice.
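
To make the idea concrete, here is a rough sketch in C of a copy that picks its strategy from the length. It is not the code from the AMD article, and I have not measured it against any real memcpy. The names copy_by_size, copy_large and BIG_COPY_THRESHOLD are made up for this example, and the 4096-byte cutoff and the 256-byte prefetch distance are guesses that profiling would have to replace. It assumes the two buffers do not overlap, and it falls back to plain memcpy where SSE2 is not available.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 loads/stores; also provides _mm_prefetch */
#endif

/* Hypothetical cutoff: a real value would come from profiling. */
#define BIG_COPY_THRESHOLD 4096

/* Large-copy path: unrolled 4x16 bytes per iteration, with a prefetch hint. */
static void copy_large(unsigned char *dst, const unsigned char *src, size_t n)
{
#if defined(__SSE2__)
    while (n >= 64) {
        /* Prefetch 256 bytes ahead (a guessed distance), only while that
           address is still inside the source buffer. */
        if (n >= 64 + 256)
            _mm_prefetch((const char *)(src + 256), _MM_HINT_T0);

        /* Unaligned loads and stores, so no alignment is required. */
        __m128i a = _mm_loadu_si128((const __m128i *)(src));
        __m128i b = _mm_loadu_si128((const __m128i *)(src + 16));
        __m128i c = _mm_loadu_si128((const __m128i *)(src + 32));
        __m128i d = _mm_loadu_si128((const __m128i *)(src + 48));
        _mm_storeu_si128((__m128i *)(dst), a);
        _mm_storeu_si128((__m128i *)(dst + 16), b);
        _mm_storeu_si128((__m128i *)(dst + 32), c);
        _mm_storeu_si128((__m128i *)(dst + 48), d);

        src += 64;
        dst += 64;
        n -= 64;
    }
#endif
    /* Remaining tail, or the whole copy when SSE2 is not available. */
    memcpy(dst, src, n);
}

/* Chooses the copy strategy from the length; buffers must not overlap. */
void *copy_by_size(void *dst, const void *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    if (n < BIG_COPY_THRESHOLD)
        return memcpy(dst, src, n);  /* small copy: keep the code path short */
    copy_large((unsigned char *)dst, (const unsigned char *)src, n);
    return dst;
}
```

A call looks like copy_by_size(dst, src, len), same as memcpy. Here the length is tested at run time on every call; with profile data or an annotation the compiler could instead pick one of the two paths at compile time for each call site.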

Bye,
bearophile
