fast bcopy...

2012-05-02 Thread Luigi Rizzo
as part of my netmap investigations, I was looking at how expensive memory copies are, and here are a couple of findings (the first one is obvious, the second one less so): 1. especially on 64-bit machines, always use multiples of at least 8 bytes (possibly even larger units). The bcopy code in amd
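A minimal sketch of point 1, assuming the goal is simply to move data in 8-byte units. copy_words() and round_up8() are hypothetical helpers written for illustration, not code from the thread. Each word is moved with a fixed-size 8-byte memcpy, which optimizing compilers usually turn into a single load and store without making alignment or strict-aliasing assumptions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy len bytes, moving 8 bytes per step. */
static void copy_words(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t i = 0;

    for (; i + sizeof(uint64_t) <= len; i += sizeof(uint64_t)) {
        uint64_t w;
        /* Fixed-size 8-byte copies: usually one load plus one store, with
         * no alignment or strict-aliasing assumptions. */
        memcpy(&w, s + i, sizeof w);
        memcpy(d + i, &w, sizeof w);
    }
    /* Slow path, only reached when len is not a multiple of 8. */
    if (i < len)
        memcpy(d + i, s + i, len - i);
}

/* Hypothetical helper: round a length up to the next multiple of 8. */
static size_t round_up8(size_t len)
{
    return (len + 7) & ~(size_t)7;
}
```

Sizing buffers with round_up8() lets every copy stay on the whole-word path; the tail memcpy only runs when a caller passes a length that is not a multiple of 8.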

Re: fast bcopy...

2012-05-02 Thread Alex Dupre
Luigi Rizzo wrote: For small blocks and multiples of 32-64 bytes, I noticed that the following is a lot faster (breaking even at about 1 KByte): static inline void fast_bcopy(void *_src, void *_dst, int l) { uint64_t *src = _src;
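The quoted fast_bcopy() is cut off after its first declaration. The following is a hypothetical sketch of a routine of that kind, not the original body: it keeps the snippet's bcopy-style (src, dst, len) argument order, copies eight 64-bit words per iteration, and hands copies of 1024 bytes or more to the library, following the reported break-even point. It assumes 8-byte-aligned, non-overlapping buffers that hold uint64_t data; unlike bcopy, it does not handle overlap.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch; NOT the truncated original. Assumes 8-byte-aligned,
 * non-overlapping buffers of uint64_t data. Argument order follows bcopy. */
static inline void fast_bcopy_sketch(void *_src, void *_dst, int l)
{
    uint64_t *src = _src;
    uint64_t *dst = _dst;

    if (l >= 1024) {            /* large copies: defer to the library */
        memcpy(_dst, _src, (size_t)l);
        return;
    }
    /* Main loop: 64 bytes (eight 8-byte words) per iteration. */
    for (; l >= 64; l -= 64) {
        dst[0] = src[0]; dst[1] = src[1];
        dst[2] = src[2]; dst[3] = src[3];
        dst[4] = src[4]; dst[5] = src[5];
        dst[6] = src[6]; dst[7] = src[7];
        src += 8;
        dst += 8;
    }
    /* Tail: fewer than 64 bytes remain. */
    if (l > 0)
        memcpy(dst, src, (size_t)l);
}
```

A common variant drops the tail memcpy and always copies whole 64-byte chunks, which is only safe if every buffer is padded to a multiple of 64 bytes; the sketch above avoids that requirement at the cost of an extra call for odd lengths.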

Re: fast bcopy...

2012-05-02 Thread Steven Atreju
Luigi Rizzo wrote: > 2. apparently, bcopy is not the fastest way to copy memory. http://now.cs.berkeley.edu/Td/bcopy.html Best Regards. Steven.
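Whether bcopy is the fastest option on a given machine is easiest to settle by measuring. A rough timing harness, as a sketch: the routines share one hypothetical signature (copy_fn), calls go through a volatile function pointer so the compiler cannot merge the repeated copies, and C11 timespec_get() is used for portability (a monotonic clock such as POSIX CLOCK_MONOTONIC is preferable for serious measurements). memmove stands in for bcopy, which has the same overlap-safe semantics with the source and destination arguments swapped.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical common signature for the routines being compared. */
typedef void (*copy_fn)(void *dst, const void *src, size_t n);

static void via_memcpy(void *dst, const void *src, size_t n)
{
    memcpy(dst, src, n);
}

/* Same semantics as bcopy(src, dst, n): overlapping regions are allowed. */
static void via_memmove(void *dst, const void *src, size_t n)
{
    memmove(dst, src, n);
}

/* Average nanoseconds per call of fn copying n bytes, over iters calls. */
static double time_copy(copy_fn fn, void *dst, const void *src, size_t n,
                        int iters)
{
    /* Reading the pointer through a volatile object on every iteration stops
     * the compiler from inlining the call and collapsing identical copies. */
    copy_fn volatile vfn = fn;
    struct timespec t0, t1;

    timespec_get(&t0, TIME_UTC);
    for (int i = 0; i < iters; i++)
        vfn(dst, src, n);
    timespec_get(&t1, TIME_UTC);

    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    return iters > 0 ? ns / iters : 0.0;
}
```

For example, calling time_copy(via_memcpy, dst, src, size, 100000) and the same with via_memmove over a range of sizes gives per-call averages to compare; as later messages in the thread point out, the results vary considerably between processors and chipsets.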

Re: fast bcopy...

2012-05-02 Thread K. Macy
It's highly chipset and processor dependent what works best. Intel now has non-temporal loads and stores, which work much better in some cases but provide little benefit in others. -Kip. On Wed, May 2, 2012 at 11:52 PM, Steven Atreju wrote: > Luigi Rizzo wrote: >> 2. apparently, bcopy is not the f
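As an illustration of the non-temporal stores mentioned above (a sketch, not code from the thread), SSE2 provides _mm_stream_si128(), which writes 16 bytes while bypassing the cache, and _mm_sfence(), which orders those weakly ordered stores. The sketch assumes an x86 target with SSE2 and a 16-byte-aligned destination; other targets fall back to memcpy. It covers streaming stores only; the non-temporal load instruction (SSE4.1 MOVNTDQA) is not used here.

```c
#include <stddef.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Sketch: copy with non-temporal (streaming) stores.
 * Assumes dst is 16-byte aligned; src may be unaligned. */
static void copy_nt(void *dst, const void *src, size_t n)
{
#if defined(__SSE2__)
    __m128i *d = dst;
    const unsigned char *s = src;
    size_t i = 0;

    for (; i + 16 <= n; i += 16, d++)
        _mm_stream_si128(d, _mm_loadu_si128((const __m128i *)(s + i)));
    /* Streaming stores are weakly ordered: fence before dst is consumed. */
    _mm_sfence();
    /* Remaining 0-15 bytes go through ordinary (cached) stores. */
    if (i < n)
        memcpy((unsigned char *)dst + i, s + i, n - i);
#else
    memcpy(dst, src, n);    /* no SSE2: plain copy */
#endif
}
```

Streaming stores tend to pay off when the copied data will not be read again soon, such as large buffers that would otherwise evict useful cache lines, and can be slower for small copies whose destination is about to be used, which fits the "much better in some cases" observation.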

Re: fast bcopy...

2012-05-02 Thread Arnaud Lacombe
Hi, On Wed, May 2, 2012 at 5:52 PM, Steven Atreju wrote: > Luigi Rizzo wrote: >> 2. apparently, bcopy is not the fastest way to copy memory. > > http://now.cs.berkeley.edu/Td/bcopy.html > "Pentium 166, Triton Chipset, EDO memory"... ahem. - Arnaud

Re: fast bcopy...

2012-05-03 Thread Steven Atreju
K. Macy wrote [2012-05-03 02:58+0200]: > It's highly chipset and processor dependent what works best. Yes, of course. Though I was kind of shocked when I first saw this: http://marc.info/?l=dragonfly-commits&m=132241713812022&w=2 So we don't use our assembler version for new GCCs and

Re: fast bcopy...

2012-05-03 Thread Attilio Rao
2012/5/3, Steven Atreju : > K. Macy wrote [2012-05-03 02:58+0200]: >> It's highly chipset and processor dependent what works best. > > Yes, of course. > Though i was kinda, even shocked, once i've seen this first: > > http://marc.info/?l=dragonfly-commits&m=132241713812022&w=2 > > So we don't use

Re: fast bcopy...

2012-05-03 Thread Gabor Kovesdan
On 03-05-2012 12:28, Steven Atreju wrote: Yes, of course. Though i was kinda, even shocked, once i've seen this first: http://marc.info/?l=dragonfly-commits&m=132241713812022&w=2 I also experimented a bit with some trivial libc functions when testing a change for memcpy (still in queue, w

RE: fast bcopy...

2012-05-03 Thread rozhuk . im
> > guess this is a good time to thank the FreeBSD hackers for that FPU > > stack FILD/FISTP idea! > > I'll append the copy related notes of our doc/memperf.txt. > > Thanks, > > I made an implementation of fpu unwinding and mmx copy to see if they > were really making a difference years ago (reimp

Re: fast bcopy...

2012-05-03 Thread Andrew Reilly
On Wed, May 02, 2012 at 08:25:57PM +0200, Luigi Rizzo wrote: > as part of my netmap investigations, i was looking at how > expensive are memory copies, and here are a couple of findings > (first one is obvious, the second one less so) Most C compilers (well, the ones I regularly use) inline small,
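Andrew's message is truncated here. As a general illustration of compilers inlining small copies (not a reconstruction of his text), GCC and Clang at typical optimization levels can expand a memcpy() whose size is a small compile-time constant, or a plain struct assignment, into a few register moves, while a length known only at run time usually becomes a real library call. struct pkt_hdr and the helper functions are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 16-byte header, used only for illustration. */
struct pkt_hdr {
    uint8_t  dst_mac[6];
    uint8_t  src_mac[6];
    uint16_t ethertype;
    uint16_t pad;
};

/* Size is a compile-time constant: an optimizing compiler can expand this
 * into a couple of moves instead of calling memcpy(). */
static void copy_hdr(struct pkt_hdr *dst, const struct pkt_hdr *src)
{
    memcpy(dst, src, sizeof *dst);
}

/* Struct assignment gives the compiler the same information. */
static void copy_hdr_assign(struct pkt_hdr *dst, const struct pkt_hdr *src)
{
    *dst = *src;
}

/* Length known only at run time: usually a real call to memcpy(). */
static void copy_var(void *dst, const void *src, size_t n)
{
    memcpy(dst, src, n);
}
```

Inspecting the generated assembly (for example with -O2 -S) is the reliable way to see which form a particular compiler and target actually produce.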

Re: fast bcopy...

2012-05-03 Thread Luigi Rizzo
On Fri, May 04, 2012 at 09:44:15AM +1000, Andrew Reilly wrote: > On Wed, May 02, 2012 at 08:25:57PM +0200, Luigi Rizzo wrote: > > as part of my netmap investigations, i was looking at how > > expensive are memory copies, and here are a couple of findings > > (first one is obvious, the second one le