Re: Replacing C's memcpy with a D implementation

Guillaume Piolat via Digitalmars-d Mon, 11 Jun 2018 11:21:17 -0700

BTW, the way memcpy is (was?) implemented in the C runtime coming from the Intel C++ compiler was really enlightening about the sheer difficulty of such a task.

First of all, there isn't one loop but many, depending on the source and destination alignment.

- If both are aligned on 16-byte boundaries, source and destination operands would be moved with MOVAPS/MOVDQA, nothing special.
- If only the source or the destination was misaligned, the function would dispatch to a variant whose core loop loads 16-byte aligned and writes 16-byte unaligned, using the PALIGNR instruction. However, since PALIGNR can't take a runtime value for its shift count (it is an immediate), this variant was _replicated 16 times_.
- I don't remember the case where both source and destination are misaligned, but you can degenerate it to the one above.
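To make the PALIGNR point concrete, here is a minimal sketch, assuming GCC or Clang on x86 with SSSE3, of that kind of dispatch. This is not Intel's code: `sketch_memcpy`, `loop_shift_N` and the other names are hypothetical, and this variant first byte-copies until the destination is aligned, so its stores are aligned too. Because `_mm_alignr_epi8` (PALIGNR) only accepts a compile-time shift, a macro stamps out one core loop per source misalignment, and the first 16 bytes are peeled into a prelude with an unaligned load so no load reads before the source buffer.

```c
#include <stddef.h>
#include <stdint.h>

#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
#include <tmmintrin.h> /* SSSE3 intrinsics: _mm_alignr_epi8 is PALIGNR */
#define SKETCH_HAVE_SSSE3 1
#endif

/* Byte loop for the head, the tail and the non-x86 fallback. */
static void copy_bytes(unsigned char *d, const unsigned char *s, size_t n) {
    while (n--) *d++ = *s++;
}

#ifdef SKETCH_HAVE_SSSE3
/* Every core loop returns how many bytes it copied; d is 16-byte aligned. */
typedef size_t (*copy_loop_fn)(unsigned char *, const unsigned char *, size_t);

/* Source and destination both aligned: plain MOVDQA loads and stores. */
__attribute__((target("ssse3")))
static size_t loop_aligned(unsigned char *d, const unsigned char *s, size_t n) {
    size_t i = 0;
    for (; i + 16 <= n; i += 16)
        _mm_store_si128((__m128i *)(d + i),
                        _mm_load_si128((const __m128i *)(s + i)));
    return i;
}

/* Source misaligned by OFF bytes. PALIGNR's shift is an immediate, so this
   loop is replicated once per OFF. The first 16 bytes are a peeled prelude
   with an unaligned load; the core loop only does aligned loads/stores. */
#define DEFINE_SHIFTED_LOOP(OFF)                                             \
__attribute__((target("ssse3")))                                             \
static size_t loop_shift_##OFF(unsigned char *d, const unsigned char *s,     \
                               size_t n) {                                   \
    const unsigned char *next = s + (16 - (OFF)); /* first aligned block */  \
    size_t i;                                                                \
    if (n < 32) return 0;                                                    \
    _mm_store_si128((__m128i *)d, _mm_loadu_si128((const __m128i *)s));      \
    __m128i lo = _mm_load_si128((const __m128i *)next);                      \
    for (i = 16; i + 32 - (OFF) <= n; i += 16) {                             \
        __m128i hi = _mm_load_si128((const __m128i *)(next + i));            \
        /* lo[OFF..15] followed by hi[0..OFF-1] equals s[i..i+15] */         \
        _mm_store_si128((__m128i *)(d + i), _mm_alignr_epi8(hi, lo, (OFF))); \
        lo = hi;                                                             \
    }                                                                        \
    return i;                                                                \
}

DEFINE_SHIFTED_LOOP(1)  DEFINE_SHIFTED_LOOP(2)  DEFINE_SHIFTED_LOOP(3)
DEFINE_SHIFTED_LOOP(4)  DEFINE_SHIFTED_LOOP(5)  DEFINE_SHIFTED_LOOP(6)
DEFINE_SHIFTED_LOOP(7)  DEFINE_SHIFTED_LOOP(8)  DEFINE_SHIFTED_LOOP(9)
DEFINE_SHIFTED_LOOP(10) DEFINE_SHIFTED_LOOP(11) DEFINE_SHIFTED_LOOP(12)
DEFINE_SHIFTED_LOOP(13) DEFINE_SHIFTED_LOOP(14) DEFINE_SHIFTED_LOOP(15)

/* Index = source misalignment once the destination is aligned. */
static const copy_loop_fn loops[16] = {
    loop_aligned,  loop_shift_1,  loop_shift_2,  loop_shift_3,
    loop_shift_4,  loop_shift_5,  loop_shift_6,  loop_shift_7,
    loop_shift_8,  loop_shift_9,  loop_shift_10, loop_shift_11,
    loop_shift_12, loop_shift_13, loop_shift_14, loop_shift_15,
};
#endif

void *sketch_memcpy(void *dst, const void *src, size_t n) {
    unsigned char *d = dst;
    const unsigned char *s = src;
#ifdef SKETCH_HAVE_SSSE3
    if (n >= 64 && __builtin_cpu_supports("ssse3")) {
        /* Head: byte-copy until the destination is 16-byte aligned. */
        size_t head = (16 - ((uintptr_t)d & 15)) & 15;
        copy_bytes(d, s, head);
        d += head; s += head; n -= head;
        /* Dispatch to one of 16 core loops by source misalignment. */
        size_t done = loops[(uintptr_t)s & 15](d, s, n);
        d += done; s += done; n -= done;
    }
#endif
    copy_bytes(d, s, n); /* tail, or everything on the fallback path */
    return dst;
}
```

Even this toy version needs 16 core loops plus head, prelude and tail handling, which gives a feel for why the real thing is hard to write by hand.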

Each of these loops had complicated preludes that did the first iteration, and those are so hard to write by hand.

It was also the only piece of assembly I've seen that (apparently) successfully used the "prefetch" instructions.
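For readers who haven't used them, here is a minimal sketch of what issuing a software prefetch looks like from C, using the GCC/Clang builtin `__builtin_prefetch`, which on x86 compiles to one of the PREFETCH instructions. The 512-byte distance and 64-byte line size are assumptions, `copy_with_prefetch` is a hypothetical name, and nothing here says how Intel's routine chose its distances or whether this particular loop actually gets faster.

```c
#include <stddef.h>

/* Hypothetical prefetch distance; real code would tune this per CPU. */
#define PREFETCH_DISTANCE 512

void *copy_with_prefetch(void *dst, const void *src, size_t n) {
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t i = 0;
    for (; i + 64 <= n; i += 64) {
        /* Hint that the source line PREFETCH_DISTANCE bytes ahead will be
           read soon (rw = 0), with no temporal locality (locality = 0).
           Only prefetch addresses that are still inside the buffer. */
        if (i + PREFETCH_DISTANCE < n)
            __builtin_prefetch(s + i + PREFETCH_DISTANCE, 0, 0);
        for (size_t k = 0; k < 64; k++) /* copy one assumed 64-byte line */
            d[i + k] = s[i + k];
    }
    for (; i < n; i++) /* tail shorter than a line */
        d[i] = s[i];
    return dst;
}
```

The hint never changes the result, only (possibly) timing, which is part of why its benefit is so hard to demonstrate.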


This was just the SSE version; AVX was different.

I don't know if someone really wrote this code by hand, or if it was all from intrinsics.
