Hi! Can you give us actual numbers? Something that we can easily graph?
Do you have a diff against -HEAD? I'd like to stare at it a bunch and
think about merging some stuff in. :-)

How are you testing it? Is it something I can set up in our lab and
thoroughly thrash?

I'm very close to starting an mbuf batching thing to use in a few
places, like receive, transmit, and the transmit completion -> free
path. I'd be interested in your review/feedback and testing, as it
sounds like something you can easily stress test there. :)

Thanks,
-Adrian

On 24 August 2013 05:48, Alexander V. Chernikov <melif...@ipfw.ru> wrote:
> On 22.08.2013 00:51, Andre Oppermann wrote:
> > On 19.08.2013 13:42, Alexander V. Chernikov wrote:
> >> On 14.08.2013 19:48, Luigi Rizzo wrote:
> >>> On Wed, Aug 14, 2013 at 05:40:28PM +0200, Marko Zec wrote:
> >>>> On Wednesday 14 August 2013 14:40:24 Luigi Rizzo wrote:
> >>>>> On Wed, Aug 14, 2013 at 04:15:25PM +0400, Alexander V.
> >>>>> Chernikov wrote:
> >>> ...
> >>>> FWIW, apparently we already have that infrastructure in place
> >>>> - if_rele() calls if_free_internal() only when the last
> >>>> reference to the ifnet is dropped, so with a little care this
> >>>> should be usable for caching ifp pointers w/o fear of the
> >>>> kernel crashes mentioned above.
> >>> Maybe Alexander was referring to holding references to the rte
> >>> entries returned as a result of the lookup. The rte holds a
> >>> reference to the ifp.
> >>
> >> Yes, since that is the only refcount which is protected (and it
> >> is also a huge performance killer).
> >>
> >> Btw, there is a picture describing IPv4 packet flow from my
> >> still-not-written post on network stack performance; maybe it
> >> can be useful:
> >> http://static.ipfw.ru/images/freebsd_ipv4_flow.png
> >
> > Wow, that's really cool. Please note that an rmlock doesn't cost
> > anything for the read case (unless contended, of course).
> We're running this entire stack without a single rwlock (everything
> is either converted to rmlock or using lockless data copies with
> delayed GC (in_adrr_local and other similar)). It really is faster,
> but due to the current process-to-completion routing architecture
> this is limited to 5-6 Mpps for 12 cores on 2x E5645.
>
> > Whereas normal rlocks or rwlocks write to the lock memory location
> > and cause atomic bus lock cycles as well as a lot of cache line
> > invalidations across cores. The same is true for refcounts.
>
> _______________________________________________
> freebsd-net@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"