Re: RFC: issues concerning the next NAPI interface

Jan-Bernd Themann Fri, 24 Aug 2007 11:25:05 -0700

James Chapman wrote:

Stephen Hemminger wrote:
On Fri, 24 Aug 2007 17:47:15 +0200
Jan-Bernd Themann <[EMAIL PROTECTED]> wrote:
Hi,

On Friday 24 August 2007 17:37, [EMAIL PROTECTED] wrote:
On Fri, Aug 24, 2007 at 03:59:16PM +0200, Jan-Bernd Themann wrote:
.......
3) On modern systems the incoming packets are processed very fast. Especially on SMP systems when we use multiple queues we process only a few packets per napi poll cycle. So NAPI does not work very well here and the interrupt rate is still high. What we need would be some sort of timer polling mode which will schedule a device after a certain amount of time for high load situations. With high precision timers this could work well. Current usual timers are too slow. A finer granularity would be needed to keep the latency down (and queue length moderate).
We found the same on ia64-sn systems with tg3 a couple of years ago. Using simple interrupt coalescing ("don't interrupt until you've received N packets or M usecs have elapsed") worked reasonably well in practice. If your h/w supports that (and I'd guess it does, since it's such a simple thing), you might try it.
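
The coalescing rule quoted above ("N packets or M usecs") can be sketched as a small decision function. This is only an illustration of the rule, not how tg3 or any other device implements it; the struct, field, and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical coalescing settings; the names are illustrative only. */
struct coalesce_cfg {
    uint32_t max_frames; /* N: raise an interrupt once this many packets wait */
    uint32_t max_usecs;  /* M: ...or once the oldest packet has waited this long */
};

/*
 * Return true if the device should raise an interrupt now.
 * pending_frames:    packets received since the last interrupt
 * usecs_since_first: time elapsed since the first of those packets arrived
 */
static bool should_raise_irq(const struct coalesce_cfg *cfg,
                             uint32_t pending_frames,
                             uint64_t usecs_since_first)
{
    assert(cfg->max_frames > 0);

    if (pending_frames == 0)
        return false; /* nothing to report, stay quiet */

    return pending_frames >= cfg->max_frames ||
           usecs_since_first >= cfg->max_usecs;
}
```

The time limit bounds the latency of a lone packet, while the frame limit bounds how much work piles up under load.
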
I don't see how this should work. Our latest machines are fast enough that they simply empty the queue during the first poll iteration (in most cases). Even if you wait until X packets have been received, it does not help for the next poll cycle. The average number of packets we process per poll queue is low. So a timer would be preferable that periodically polls the queue, without the need of generating a HW interrupt. This would allow us to wait until a reasonable amount of packets have been received in the meantime to keep the poll overhead low. This would also be useful in combination with LRO.
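
To make the timer argument concrete, here is a rough arithmetic sketch, assuming a steady arrival rate: how many packets accumulate per timer period, and how long the period must be to collect a given batch. The function names and the example rate are assumptions for illustration, not figures from this thread.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Expected number of packets waiting when a periodic poll timer fires.
 * rate_pps:       average arrival rate in packets per second (assumed steady)
 * interval_usecs: poll timer period in microseconds
 */
static uint64_t packets_per_poll(uint64_t rate_pps, uint64_t interval_usecs)
{
    return rate_pps * interval_usecs / 1000000ULL;
}

/* Smallest timer period in usecs that accumulates at least `target` packets. */
static uint64_t interval_for_batch(uint64_t rate_pps, uint64_t target)
{
    assert(rate_pps > 0);

    /* Integer division rounded up. */
    return (target * 1000000ULL + rate_pps - 1) / rate_pps;
}
```

For example, at an assumed 100,000 packets per second a 100 usec timer would find about 10 packets per poll, and a batch of 16 needs a period of 160 usecs. Periods this short are well below one jiffy, which is why the proposal depends on high precision timers.
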
You need hardware support for deferred interrupts. Most devices have it (e1000, sky2, tg3) and it interacts well with NAPI. It is not a generic thing you want done by the stack, you want the hardware to hold off interrupts until X packets or Y usecs have expired.
Does hardware interrupt mitigation really interact well with NAPI? In my experience, holding off interrupts for X packets or Y usecs does more harm than good; such hardware features are useful only when the OS has no NAPI-like mechanism.
When tuning NAPI drivers for packets/sec performance (which is a good indicator of driver performance), I make sure that the driver stays in NAPI polled mode while it has any rx or tx work to do. If the CPU is fast enough that all work is always completed on each poll, I have the driver stay in polled mode until dev->poll() is called N times with no work being done. This keeps interrupts disabled for reasonable traffic levels, while minimizing packet processing latency. No need for hardware interrupt mitigation.
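
The "stay in polled mode until N idle polls" policy described above might look roughly like the following user-space sketch. It models only the counting logic; the struct and function names are hypothetical, and a real driver would make this decision inside its dev->poll() handler before re-enabling interrupts.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-device state; none of these names come from the kernel. */
struct poll_state {
    int idle_polls; /* consecutive polls that found no work */
    int idle_limit; /* N: idle polls tolerated before re-enabling interrupts */
};

/*
 * Decide what to do after one poll pass.
 * Returns true if the driver should stay in polled mode (interrupts stay
 * disabled), false if it should leave polled mode and re-enable interrupts.
 */
static bool stay_in_polled_mode(struct poll_state *st, int work_done)
{
    assert(st->idle_limit > 0);

    if (work_done > 0) {
        st->idle_polls = 0; /* traffic is flowing, reset the idle count */
        return true;
    }

    st->idle_polls++;
    if (st->idle_polls >= st->idle_limit) {
        st->idle_polls = 0; /* start fresh on the next interrupt */
        return false;
    }
    return true;
}
```

A larger idle_limit keeps interrupts off across longer gaps in traffic, at the cost of more empty polls.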

Yes, that was one idea as well. But the problem with that is that net_rx_action will call the same poll function over and over again in a row if there are no further network devices. The problem with this approach is that you always poll just a very few packets each time. This does not work well with LRO, as there are no packets to aggregate... So it would make more sense to wait for a certain time before trying it again.

Second problem: after jiffies has been incremented by one in net_rx_action (after some poll rounds), net_rx_action will quit and return control to the softIRQ handler. The poll function is called again as the softIRQ handler thinks there is more work to be done. So even then we do not wait... After some rounds in the softIRQ handler, we finally wait some time.

The parameters for controlling it are already in ethtool, the issue is finding a good default set of values for a wide range of applications and architectures. Maybe some heuristic based on processor speed would be a good starting point. The dynamic irq moderation stuff is not widely used because it is too hard to get right.
I agree. It would be nice to find a way for the typical user to derive best values for these knobs for his/her particular system. Perhaps a tool using pktgen and network device phy internal loopback could be developed?


