On Sun, 24 Dec 2006, Oleg Bulyzhin wrote:

We currently make this a lot worse than it needs to be by handing off the received packets one at a time, unlocking and relocking for every packet. It would be better if the driver's receive interrupt handler would harvest all of the incoming packets and queue them locally. Then, at the end, hand off the linked list of packets to the network stack wholesale, unlocking and relocking only once. (Actually, the list could probably be handed off at the very end of the interrupt service routine, after the driver has already dropped its lock.) We wouldn't even need a new primitive, if ether_input() and the other if_input() functions were enhanced to deal with a possible list of packets instead of just a single one.
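To make the idea concrete, here is a rough sketch of what a batched receive path might look like. The names em_rxeof_batch(), em_rxeof_one(), and the softc layout are invented for illustration and are not the actual if_em code:

#include <sys/param.h>
#include <sys/socket.h>
#include <sys/mbuf.h>
#include <sys/lock.h>
#include <sys/mutex.h>
#include <net/if.h>
#include <net/if_var.h>

struct em_softc {                       /* minimal stand-in for the real softc */
        struct ifnet    *ifp;
        struct mtx       mtx;
};

/* Hypothetical helper: pull the next completed rx packet off the ring. */
static struct mbuf *em_rxeof_one(struct em_softc *sc);

static void
em_rxeof_batch(struct em_softc *sc)
{
        struct ifnet *ifp = sc->ifp;
        struct mbuf *head = NULL, *tail = NULL, *m;

        /*
         * Harvest everything the hardware has posted, linking the packets
         * locally via m_nextpkt while the driver lock is held.
         */
        mtx_lock(&sc->mtx);
        while ((m = em_rxeof_one(sc)) != NULL) {
                m->m_nextpkt = NULL;
                if (tail == NULL)
                        head = m;
                else
                        tail->m_nextpkt = m;
                tail = m;
        }
        mtx_unlock(&sc->mtx);

        /*
         * Hand the list to the stack with the driver lock dropped.  This
         * still calls if_input() once per packet; a list-aware ether_input()
         * could take 'head' in a single call instead.
         */
        while ((m = head) != NULL) {
                head = m->m_nextpkt;
                m->m_nextpkt = NULL;
                (*ifp->if_input)(ifp, m);
        }
}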

I try this experiment every few years, and generally don't measure much improvement. I'll try it again with 10 Gbps early next year once I'm back in the office. The more interesting transition is between the link layer and the network layer, which is high on my list of topics to look into in the next few weeks, in particular reworking the ifqueue handoff. The tricky bit is balancing latency, overhead, and concurrency...

FYI, there are several sets of patches floating around to modify if_em to hand off queues of packets to the link layer, etc. They probably need updating, of course, since if_em has changed quite a bit in the last year. In my implementation, I add a new input routine that accepts mbuf packet queues.

I'm just curious, do you remember the average length of the mbuf queue in your tests? While experimenting with the bge(4) driver (taskqueue, interrupt moderation, bge_rxeof() converted to the above scheme), I've found it's quite easy to exhaust the available mbuf clusters under load (trying to queue hundreds of received packets), so I had to limit the rx queue to a rather low length.
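For illustration, bounding the harvest loop from the earlier sketch might look like the following; EM_RX_BATCH_MAX and the surrounding names are invented, not the actual bge(4) or if_em change:

#define EM_RX_BATCH_MAX 64              /* arbitrary example cap */

/*
 * Same harvest as before, but stop after EM_RX_BATCH_MAX packets so a
 * flooded interface cannot pin an unbounded number of mbuf clusters;
 * whatever is left stays on the ring for the next pass.
 */
static struct mbuf *
em_rxeof_bounded(struct em_softc *sc)
{
        struct mbuf *head = NULL, *tail = NULL, *m;
        int n = 0;

        mtx_lock(&sc->mtx);
        while (n < EM_RX_BATCH_MAX && (m = em_rxeof_one(sc)) != NULL) {
                m->m_nextpkt = NULL;
                if (tail == NULL)
                        head = m;
                else
                        tail->m_nextpkt = m;
                tail = m;
                n++;
        }
        mtx_unlock(&sc->mtx);

        return (head);  /* caller hands this bounded list to the stack */
}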

Off-hand, I don't remember. I do remember it being very important to maintain bounds on the size of in-flight packet sets at all levels in the stack, for the same reason the netisr dispatch queue is bounded: otherwise, if the device is able to keep the device driver entirely busy, you'll effectively live-lock, since you never dispatch to the next layer, exhaust available memory, and so on. One of the ideas I've been futzing with is "back-pressure" across the netisr and a "checkout" model in which the queue spanning the device driver and dispatch through to the protocol has a single total bound, with reservations taken by components as they process sets of packets. That way, the ithread would know the netisr was already executing and not perform a wakeup (avoiding a trip through the scheduler), avoid excessive memory consumption, etc.

Ed Maste has also suggested changing our notion of mbuf packet queues: the current model requires following linked lists, which makes inefficient use of CPU caches, and could instead use arrays of mbuf pointers. I've done a bit of experimentation along these lines, but not enough to investigate the properties well.
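To make the array idea concrete, a batch structure along those lines might look like the following; all names here are invented for illustration, and a real version would still need to sort out ownership and the bounding discussed above:

#define PKTBATCH_MAX    32              /* arbitrary example batch size */

/*
 * Fixed array of mbuf pointers instead of an m_nextpkt-linked list, so
 * consumers walk a contiguous array rather than chasing pointers.
 */
struct pktbatch {
        int              pb_count;
        struct mbuf     *pb_pkts[PKTBATCH_MAX];
};

/* Append a packet; returns 0 when the batch is full and must be flushed. */
static __inline int
pktbatch_add(struct pktbatch *pb, struct mbuf *m)
{

        if (pb->pb_count == PKTBATCH_MAX)
                return (0);
        pb->pb_pkts[pb->pb_count++] = m;
        return (1);
}

/* Flush the batch into the stack through the interface's input routine. */
static __inline void
pktbatch_input(struct pktbatch *pb, struct ifnet *ifp)
{
        int i;

        for (i = 0; i < pb->pb_count; i++)
                (*ifp->if_input)(ifp, pb->pb_pkts[i]);
        pb->pb_count = 0;
}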

Robert N M Watson
Computer Laboratory
University of Cambridge