Some suggestions on how to proceed on the receive side:

1. Support for distributing receive processing across multiple CPUs
(using NIC hw queues).

Multiple hw queues can be used to spread receive processing across CPUs;
this removes single-CPU utilization as the bottleneck for 10GbE
performance.

Using a NIC that supports multiple hw queues and MSI-X, a network driver
can do a decent job of distributing the kernel part of receive
processing across CPUs - as long as it is not important which session
lands on which CPU. This part doesn't require any changes outside of the
driver.
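Here's a toy sketch of that hash-based distribution (plain C, made-up
names, no kernel APIs): the NIC - or the driver on its behalf - hashes
the 4-tuple to pick one of the hw rx queues, and each queue's MSI-X
vector is affinitized to a different CPU. A given session always lands
on the same CPU, but which CPU that is cannot be chosen.

#include <stdint.h>

/* toy mixing function standing in for the NIC's real hash (e.g. Toeplitz) */
static uint32_t flow_hash(uint32_t saddr, uint32_t daddr,
                          uint16_t sport, uint16_t dport)
{
        uint32_t h = saddr ^ daddr ^ ((uint32_t)sport << 16 | dport);

        h ^= h >> 16;
        h *= 0x45d9f3b;
        h ^= h >> 16;
        return h;
}

/* same flow -> same queue -> same MSI-X vector -> same CPU */
static unsigned int pick_rx_queue(uint32_t saddr, uint32_t daddr,
                                  uint16_t sport, uint16_t dport,
                                  unsigned int num_queues)
{
        return flow_hash(saddr, daddr, sport, dport) % num_queues;
}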

This scheme can be further improved if the host tells the driver
which CPU it wishes to run a particular session on.
With this information, the driver can steer a session to the same CPU
that the scheduler runs the socket reads on, achieving the best cache
locality for both kernel- and user-level rx processing.

So far, the best idea for doing this seems to be the one that Andi came
up with - adding a new callback to the netdevice structure that is
invoked every time the scheduler migrates socket reads to a different
CPU. This would allow the driver to migrate the kernel part of rx
processing to the same CPU that the reads are running on.
In addition to the CPU number, it would be beneficial to pass the
socket's priority as well, because NIC capacity for explicit "session
to cpu" steering may not be unlimited.
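To make the idea concrete, here is a rough, purely hypothetical sketch
(no such callback exists today; all names below - flow_key,
rx_flow_moved, the slot accounting - are made up): the stack notifies
the driver whenever socket reads for a flow move to a different CPU,
and the driver reprograms its hw steering table, preferring
higher-priority sockets when steering slots run out.

#include <stdint.h>
#include <stdio.h>

struct flow_key {                       /* TCP/IPv4 4-tuple for the session */
        uint32_t saddr, daddr;
        uint16_t sport, dport;
};

struct nic_dev {
        unsigned int num_steer_slots;   /* hw "session to cpu" capacity */
        unsigned int used_slots;
        /* callback the stack would invoke on socket-read migration */
        void (*rx_flow_moved)(struct nic_dev *dev, const struct flow_key *key,
                              int new_cpu, int prio);
};

/* driver side: steer the flow's hw queue to the CPU the reader runs on */
static void sample_rx_flow_moved(struct nic_dev *dev,
                                 const struct flow_key *key,
                                 int new_cpu, int prio)
{
        if (dev->used_slots >= dev->num_steer_slots && prio == 0)
                return;         /* out of slots; only steer priority flows */

        /* program a NIC filter: key -> rx queue whose MSI-X vector
         * targets new_cpu (hardware-specific, omitted here) */
        printf("steer %u:%u -> cpu %d (prio %d)\n",
               (unsigned int)key->sport, (unsigned int)key->dport,
               new_cpu, prio);
        dev->used_slots++;      /* naive accounting, good enough for a sketch */
}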

2. LRO.
This can arguably be left to a driver-only implementation for now,
since the support needed from the stack - the ability to accept a
fragmented skb that is bigger than the MTU - is already there. The only
other thing to consider may be forcing an ACK per LRO frame; not sure
if this is worthwhile...
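For what it's worth, here is an illustrative driver-only sketch of the
merge decision (standalone C, not taken from any real driver): a new
TCP segment is appended to an aggregation in progress only if it
belongs to the same flow, is the next expected segment, carries nothing
but a plain ACK, and keeps the merged frame under a size cap; anything
else flushes the aggregation up the stack as one large skb.

#include <stdbool.h>
#include <stdint.h>

struct lro_session {
        uint32_t saddr, daddr;
        uint16_t sport, dport;
        uint32_t next_seq;              /* sequence number expected next */
        unsigned int total_len;         /* payload aggregated so far */
};

static bool lro_can_merge(const struct lro_session *s,
                          uint32_t saddr, uint32_t daddr,
                          uint16_t sport, uint16_t dport,
                          uint32_t seq, uint8_t tcp_flags,
                          unsigned int payload_len, unsigned int max_aggr)
{
        if (saddr != s->saddr || daddr != s->daddr ||
            sport != s->sport || dport != s->dport)
                return false;           /* different flow */
        if (seq != s->next_seq)
                return false;           /* out of order: flush instead */
        if (tcp_flags & ~0x10)          /* anything besides a plain ACK */
                return false;
        if (s->total_len + payload_len > max_aggr)
                return false;           /* keep the merged frame bounded */
        return true;
}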

3. Additional support for multiple hw queues.

In addition to distributing rx processing across multiple CPUs (#1
above), hw queues can be used for other things, such as QoS for
incoming traffic. In this case, separate queues for higher-priority
traffic would provide lower latency, better bandwidth, better DoS
protection and more finely tuned (per queue, not per NIC) interrupt
moderation.
This part needs more discussion. Possibly NAPI could be changed to take
advantage of the feature, and some common user-level configuration
options (via do_ioctl) may be useful too.
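As a strawman for that discussion, here is a sketch of the kind of
per-queue knobs a driver could expose (plain C, made-up names; how they
would be wired up through do_ioctl is left out): each hw rx queue gets
its own priority class and its own interrupt moderation settings,
instead of one setting for the whole NIC.

#include <stdint.h>

struct rx_queue_cfg {
        uint8_t  prio;          /* 0 = best effort, higher = more important */
        uint32_t intr_usecs;    /* interrupt moderation interval for this queue */
        uint32_t max_frames;    /* ...or coalesce by frame count */
};

/* example policy: the latency-sensitive queue interrupts almost
 * immediately, the bulk queue coalesces aggressively */
static const struct rx_queue_cfg rx_queue_cfg_example[] = {
        { .prio = 7, .intr_usecs = 5,   .max_frames = 1  },
        { .prio = 0, .intr_usecs = 125, .max_frames = 64 },
};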




> -----Original Message-----
> From: David S. Miller [mailto:[EMAIL PROTECTED] 
> Sent: Tuesday, July 12, 2005 12:20 PM
> To: [EMAIL PROTECTED]
> Cc: netdev@vger.kernel.org
> Subject: Re: SKB tutorial, Blog, and NET TODO
> 
> From: "Leonid Grossman" <[EMAIL PROTECTED]>
> Date: Wed, 29 Jun 2005 14:11:13 -0700
> 
> > - TSO support for IPv6
> > - USO (UDP TSO) support
> > - support for multiple hardware queues/channels and TCP traffic 
> > steering; there are number of benefits in the ability to 
> associate TCP 
> > flows with a particular hw queue/cpu/MSI (MSI-X), one of them is 
> > improving receive bottleneck for high-speed networks at 1500mtu
> > - support for Large Receive Offload, mainly to the same end of 
> > reducing cpu utilization and solving 1500 mtu receive bottleneck
> 
> I've added entries for this stuff, thanks for the suggestions.
> 
> I've labelled the TCP flow assosciation and LRO stuff as 
> "Investigate .." because it is still unclear how exactly we 
> should proceed here.
> 
> 