On Thu, Jan 24, 2013 at 11:24 PM, Stefan Hajnoczi <stefa...@gmail.com> wrote:

> On Thu, Jan 24, 2013 at 6:35 PM, Luigi Rizzo <ri...@iet.unipi.it> wrote:
>


> >> >
> >> > never mind, pilot error. in my test program i had swapped the
> >> > arguments to __builtin_memcpy(). With the correct ones,
> >> > __builtin_memcpy()  == bcopy == memcpy on both machines,
> >> > and never faster than the pkt_copy().
> >>
> >> Are the bcopy()/memcpy() calls given a length that is a multiple of 64
> >> bytes?
> >>
> >> IIUC pkt_copy() assumes 64-byte multiple lengths and that optimization
> >> can be matched with memcpy(dst, src, (len + 63) & ~63).  Maybe it helps and
> >> at least ensures they are doing equal amounts of byte copying.
> >
> > the length is a parameter from the command line.
> > For short packets, at least on the i7-2600 and freebsd the pkt_copy()
> > is only slightly faster than memcpy on multiples of 64, and *a lot*
> > faster when the length is not a multiple.
>
> How about dropping pkt_copy() and instead rounding the memcpy() length
> up to the next 64 byte multiple?
>

> Using memcpy() is more future-proof IMO, that's why I'm pushing for this.
>
>
fair enough, i'll make this conditional and enable memcpy() rounded to
64-byte multiples by default (though as i said, pkt_copy() is always at
least as fast as memcpy() on all machines i tried).
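
for reference, the rounded memcpy() discussed above could look roughly like
the sketch below. this is only an illustration, not the actual patch:
round_up_64() and copy_rounded() are hypothetical names, and the sketch
assumes both buffers have room for the rounded-up length, since up to 63
bytes beyond len are read and written.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the patch): round len up to the next
 * multiple of 64, using the (len + 63) & ~63 expression from the thread. */
static inline size_t round_up_64(size_t len)
{
    return (len + 63) & ~(size_t)63;
}

/* Hypothetical wrapper: copy len bytes, rounded up to a 64-byte multiple.
 * ASSUMPTION: src and dst each have at least round_up_64(len) bytes
 * available, otherwise this reads and writes out of bounds. */
static inline void copy_rounded(void *dst, const void *src, size_t len)
{
    memcpy(dst, src, round_up_64(len));
}
```

with this, a 60-byte packet is copied as 64 bytes and a 65-byte packet as
128 bytes, so memcpy() and pkt_copy() move the same number of bytes when
they are benchmarked against each other.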

cheers
luigi
