On Thu, Jan 24, 2013 at 11:24 PM, Stefan Hajnoczi <stefa...@gmail.com> wrote:
> On Thu, Jan 24, 2013 at 6:35 PM, Luigi Rizzo <ri...@iet.unipi.it> wrote:
> >
> >> >
> >> > never mind, pilot error. in my test program i had swapped the
> >> > arguments to __builtin_memcpy(). With the correct ones,
> >> > __builtin_memcpy() == bcopy == memcpy on both machines,
> >> > and never faster than the pkt_copy().
> >>
> >> Are the bcopy()/memcpy() calls given a length that is a multiple of 64
> >> bytes?
> >>
> >> IIUC pkt_copy() assumes 64-byte multiple lengths and that optimization
> >> can be matched with memcpy(dst, src, (len + 63) & ~63). Maybe it helps and
> >> at least ensures they are doing equal amounts of byte copying.
> >
> > the length is a parameter from the command line.
> > For short packets, at least on the i7-2600 and FreeBSD, pkt_copy()
> > is only slightly faster than memcpy on multiples of 64, and *a lot*
> > faster when the length is not a multiple.
>
> How about dropping pkt_copy() and instead rounding the memcpy() length
> up to the next 64-byte multiple?
>
> Using memcpy() is more future-proof IMO, that's why I'm pushing for this.
>

Fair enough. I'll make this conditional and enable memcpy() rounded to
64-byte multiples by default (though, as I said, pkt_copy() has always
been at least as fast as memcpy() on all machines I tried).

cheers
luigi
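For reference, here is a minimal C sketch of the two approaches being compared: a pkt_copy()-style loop that always moves whole 64-byte chunks, and memcpy() with its length rounded up via (len + 63) & ~63. The names round_up_64(), chunk_copy64() and rounded_memcpy() are hypothetical; this is not the actual pkt_copy() from the code under discussion. It assumes both buffers are allocated with at least round_up_64(len) bytes, since both variants may copy up to 63 bytes past len.

```c
#include <stddef.h>
#include <string.h>

/* Round len up to the next multiple of 64, i.e. (len + 63) & ~63.
 * The cast keeps the mask the full width of size_t. */
static size_t round_up_64(size_t len)
{
    return (len + 63) & ~(size_t)63;
}

/* Hypothetical pkt_copy()-style loop: copies whole 64-byte chunks, so it
 * touches round_up_64(len) bytes. Assumes both buffers are padded to
 * that size (e.g. packet slots allocated in 64-byte multiples). */
static void chunk_copy64(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    for (size_t off = 0; off < len; off += 64)
        memcpy(d + off, s + off, 64); /* fixed size: compiler can inline it */
}

/* The memcpy()-based alternative: same amount of data moved as
 * chunk_copy64(), but the copy strategy is left to the C library. */
static void rounded_memcpy(void *dst, const void *src, size_t len)
{
    memcpy(dst, src, round_up_64(len));
}
```

Because both functions copy exactly round_up_64(len) bytes, a benchmark of one against the other compares equal amounts of work, which is the point of rounding the memcpy() length. A 1514-byte frame, for example, is copied as 1536 bytes by either one.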