On Sun, 24 Dec 2006, Oleg Bulyzhin wrote:
We currently make this a lot worse than it needs to be by handing off the
received packets one at a time, unlocking and relocking for every packet.
It would be better if the driver's receive interrupt handler would harvest
all of the incoming packets and queue them locally. Then, at the end, hand
off the linked list of packets to the network stack wholesale, unlocking
and relocking only once. (Actually, the list could probably be handed off
at the very end of the interrupt service routine, after the driver has
already dropped its lock.) We wouldn't even need a new primitive, if
ether_input() and the other if_input() functions were enhanced to deal
with a possible list of packets instead of just a single one.
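
To make the locking difference concrete, here is a minimal userspace sketch. It is not driver code. It assumes a hypothetical `struct pkt` standing in for an mbuf chained through `nextpkt`, a hypothetical `struct rxdev` whose pthread mutex plays the driver lock, and hypothetical `stack_input()`/`stack_input_list()` functions standing in for ether_input() and a list-aware variant of it. The demo is single-threaded, and `lock_ops` and `delivered` are illustration-only counters.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for an mbuf: just the packet-chain link. */
struct pkt {
    struct pkt *nextpkt;
};

/* Hypothetical driver state: pending receive list plus the driver lock.
 * lock_ops counts lock acquisitions so the two models can be compared. */
struct rxdev {
    pthread_mutex_t lock;
    struct pkt *head, *tail;
    int lock_ops;
    int delivered;
};

static void rxdev_init(struct rxdev *d) {
    pthread_mutex_init(&d->lock, NULL);
    d->head = d->tail = NULL;
    d->lock_ops = 0;
    d->delivered = 0;
}

static void dev_lock(struct rxdev *d) {
    pthread_mutex_lock(&d->lock);
    d->lock_ops++;
}

static void dev_unlock(struct rxdev *d) {
    pthread_mutex_unlock(&d->lock);
}

/* Simulates the hardware completing a receive (single-threaded demo). */
static void rx_enqueue(struct rxdev *d, struct pkt *p) {
    p->nextpkt = NULL;
    if (d->tail != NULL)
        d->tail->nextpkt = p;
    else
        d->head = p;
    d->tail = p;
}

/* Hypothetical stand-in for ether_input(): consumes one packet. */
static void stack_input(struct rxdev *d, struct pkt *p) {
    (void)p;
    d->delivered++;
}

/* Per-packet model: unlock and relock around every handoff. */
static void rx_per_packet(struct rxdev *d) {
    dev_lock(d);
    while (d->head != NULL) {
        struct pkt *p = d->head;
        d->head = p->nextpkt;
        if (d->head == NULL)
            d->tail = NULL;
        p->nextpkt = NULL;
        dev_unlock(d);
        stack_input(d, p);
        dev_lock(d);
    }
    dev_unlock(d);
}

/* List-aware input routine: walks a chain handed off in one call. */
static void stack_input_list(struct rxdev *d, struct pkt *list) {
    while (list != NULL) {
        struct pkt *next = list->nextpkt;
        list->nextpkt = NULL;
        stack_input(d, list);
        list = next;
    }
}

/* Batched model: detach the whole list under one lock hold, then hand
 * it off after the driver lock has been dropped. */
static void rx_batched(struct rxdev *d) {
    dev_lock(d);
    struct pkt *list = d->head;
    d->head = d->tail = NULL;
    dev_unlock(d);
    stack_input_list(d, list);
}
```

With eight pending packets, rx_per_packet() takes the lock nine times (once up front, then once per packet), while rx_batched() takes it once. Whether that saving shows up in real measurements is exactly the open question in the reply below.
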

I try this experiment every few years, and generally don't measure much
improvement. I'll try it again with 10gbps early next year once I'm back in
the office. The more interesting transition is between the link
layer and the network layer, which is high on my list of topics to look
into in the next few weeks, in particular reworking the ifqueue handoff.
The tricky bit is balancing latency, overhead, and concurrency...
FYI, there are several sets of patches floating around to modify if_em to
hand off queues of packets to the link layer, etc. They probably need
updating, of course, since if_em has changed quite a bit in the last year.
In my implementation, I add a new input routine that accepts mbuf packet
queues.

I'm just curious: do you remember the average length of the mbuf queue in your
tests? While experimenting with the bge(4) driver (taskqueue, interrupt
moderation, bge_rxeof() converted to the above scheme), I've found it's quite
easy to exhaust the available mbuf clusters under load (trying to queue
hundreds of received packets). So I had to limit the rx queue to a rather
short length.
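
A minimal sketch of that kind of limit, assuming a hypothetical `struct rxring` FIFO of hypothetical `struct pkt` entries. Each pass detaches at most `limit` packets into a local list. Anything beyond that stays on the ring for the next pass, so the number of buffers held in the local list at once is capped. The limit of 4 used in the usage check is arbitrary, since the thread does not say what bound was used.

```c
#include <stddef.h>

/* Hypothetical stand-in for an mbuf: just the packet-chain link. */
struct pkt {
    struct pkt *nextpkt;
};

/* Hypothetical receive ring modeled as a FIFO linked list. */
struct rxring {
    struct pkt *head, *tail;
};

static void ring_put(struct rxring *r, struct pkt *p) {
    p->nextpkt = NULL;
    if (r->tail != NULL)
        r->tail->nextpkt = p;
    else
        r->head = p;
    r->tail = p;
}

/* Detach at most `limit` packets from the front of the ring into a local,
 * NULL-terminated list stored in *out. Returns the number detached; any
 * excess stays on the ring for the next pass instead of piling up. */
static int rx_harvest_bounded(struct rxring *r, int limit, struct pkt **out) {
    struct pkt *first = r->head, *last = NULL;
    int n = 0;

    while (r->head != NULL && n < limit) {
        last = r->head;
        r->head = last->nextpkt;
        n++;
    }
    if (last != NULL)
        last->nextpkt = NULL;     /* terminate the detached list */
    if (r->head == NULL)
        r->tail = NULL;
    *out = (n > 0) ? first : NULL;
    return n;
}
```

With ten packets queued and a limit of 4, successive passes detach 4, 4, 2 and then 0 packets.
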

Off-hand, I don't remember. I do remember it being very important to maintain
bounds on the size of in-flight packet sets at all levels in the stack -- for
the same reason the netisr dispatch queue is bounded. Otherwise, if the device
is able to keep the device driver entirely busy, you'll effectively live-lock,
since you never dispatch to the next layer, exhaust available memory, etc.,
etc. One of the ideas I've been futzing with is "back-pressure" across the
netisr and a "checkout" model in which the queue spanning the device driver
and dispatch through to the protocol has a single total bound, with
reservations taken by components as they process sets of packets. In this
way, the ithread would know the netisr was already executing and not
perform a wakeup (which involves the scheduler), avoid excessive memory
consumption, etc. Ed Maste has also suggested changing our notion of mbuf
packet queues: our current queue model requires following linked lists,
which make inefficient use of CPU caches, so he suggests using arrays of
mbuf pointers instead. I've done a bit of experimentation along these
lines, but not enough to investigate the properties well.
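
One possible reading of the "checkout" model, sketched under stated assumptions. A single hypothetical `struct inflight_budget` bounds all packets in flight between driver and protocol. The driver reserves slots (with a C11 atomic compare-and-swap) before moving packets into a batch, and the protocol side releases them once processed. The batch is a fixed-size array of pointers in the spirit of Ed Maste's suggestion, rather than an m_nextpkt chain. All names, `BATCH_MAX`, and the ring representation are hypothetical. The wakeup-avoidance part of the idea is not modeled.

```c
#include <stdatomic.h>
#include <stddef.h>

#define BATCH_MAX 32              /* hypothetical per-batch cap */

struct pkt { int len; };          /* hypothetical stand-in for an mbuf */

/* Array-of-pointers batch: contiguous slots instead of a linked chain. */
struct pktbatch {
    struct pkt *v[BATCH_MAX];
    int n;
};

/* Hypothetical global bound on packets in flight, driver through protocol. */
struct inflight_budget {
    atomic_int used;
    int limit;
};

static void budget_init(struct inflight_budget *b, int limit) {
    atomic_init(&b->used, 0);
    b->limit = limit;
}

/* Reserve up to `want` slots; returns how many were actually granted. */
static int budget_reserve(struct inflight_budget *b, int want) {
    int cur = atomic_load(&b->used);
    for (;;) {
        int avail = b->limit - cur;
        int take = want < avail ? want : avail;
        if (take <= 0)
            return 0;
        /* On failure, cur is reloaded with the current value; retry. */
        if (atomic_compare_exchange_weak(&b->used, &cur, cur + take))
            return take;
    }
}

static void budget_release(struct inflight_budget *b, int n) {
    atomic_fetch_sub(&b->used, n);
}

/* Driver side: move only as many ring entries as were reserved into the
 * batch; the rest stay on the ring (shifted to the front) for later. */
static int driver_fill(struct inflight_budget *b, struct pkt **ring,
                       int *ring_n, struct pktbatch *batch) {
    int want = *ring_n < BATCH_MAX ? *ring_n : BATCH_MAX;
    int got = budget_reserve(b, want);
    for (int i = 0; i < got; i++)
        batch->v[i] = ring[i];
    for (int i = got; i < *ring_n; i++)
        ring[i - got] = ring[i];
    *ring_n -= got;
    batch->n = got;
    return got;
}

/* Protocol side: process the batch by indexing the array, then return
 * its slots to the shared budget. Returns total bytes seen. */
static int protocol_consume(struct inflight_budget *b, struct pktbatch *batch) {
    int bytes = 0;
    for (int i = 0; i < batch->n; i++)
        bytes += batch->v[i]->len;
    budget_release(b, batch->n);
    batch->n = 0;
    return bytes;
}
```

With a budget of 6 and ten 100-byte packets waiting, the first fill takes 6. A second fill gets nothing until the protocol side releases those slots, after which the remaining 4 go through. The driver backs off because the downstream stage has not caught up, rather than because its own queue is full.
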
Robert N M Watson
Computer Laboratory
University of Cambridge
_______________________________________________
[email protected] mailing list
http://lists.freebsd.org/mailman/listinfo/cvs-all