Re: gcc will become the best optimizing x86 compiler

Agner Fog Wed, 30 Jul 2008 23:14:23 -0700

Denys Vlasenko wrote:

I tend to doubt that odd-byte aligned large memcpys are anywhere
near typical. malloc and mmap both return well-aligned buffers
(say, 8 byte aligned). Static and on-stack objects are also
at least word-aligned 99% of the time.


memcpy can just use "relatively simple" code for copies in which
either src or dst is not word aligned. This cuts possibilities down
from 16 to 4 (or even 2?).

The XMM code is still more than 3 times faster than rep movsl when data are aligned by 4 or 8, but not by 16.

Even if odd addresses are rare, they must be supported, but we can put the most common cases first.

strcpy and strcat can be implemented efficiently simply by calling strlen and memcpy, since both strlen and memcpy can be optimized very well. This can give unaligned addresses.
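
To make the strcpy/strcat point concrete, here is a minimal sketch in C. The names my_strcpy and my_strcat are hypothetical, chosen only for illustration, and this is not taken from any actual library. It shows why the memcpy underneath has to cope with arbitrary alignment: in my_strcat the destination pointer is dst plus the current string length, which can land on any byte boundary.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical illustration: strcpy built from strlen + memcpy.
   All the real work is delegated to the two optimized primitives. */
char *my_strcpy(char *dst, const char *src)
{
    size_t n = strlen(src) + 1;      /* +1 copies the terminating NUL */
    memcpy(dst, src, n);
    return dst;
}

/* Hypothetical illustration: strcat built the same way.
   dst + dlen is generally NOT word-aligned even when dst itself is,
   so the memcpy call here must handle unaligned destinations. */
char *my_strcat(char *dst, const char *src)
{
    size_t dlen = strlen(dst);       /* find the end of the existing string */
    memcpy(dst + dlen, src, strlen(src) + 1);
    return dst;
}
```

For example, appending to a 3-character string held in a 16-byte aligned buffer makes memcpy write to an address that is 3 bytes past the aligned boundary.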


Dennis Clarke wrote:

You forgot to look at PowerPC :

http://cvs.opensolaris.org/source/xref/ppc-dev/ppc-dev/usr/src/lib/libc/ppc/gen/memcpy.s

is that nice and small ?

.. and slow. Why doesn't it use Altivec?
