This all sounds like the discussions we had within HP-UX between 10.20 and 11.0 concerning Inbound Packet Scheduling vs Thread Optimized Packet Scheduling. IPS was done by the 10.20 stack at the handoff between the driver and netisr. If the packet was not an IP datagram fragment, parts of the transport and IP headers would be hashed, and the result selected the netisr queue on which the packet was placed for further processing.
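To make the IPS idea concrete, here is a minimal sketch in Python rather than anything resembling the actual HP-UX code. The Packet tuple, the choice of CRC32 as the hash, the queue count, and sending fragments to queue 0 are all assumptions made for illustration; the only parts taken from the description above are "hash some header fields" and "fragments are not hashed".

```python
import zlib
from collections import namedtuple

# Hypothetical packet representation, just enough fields for the sketch.
Packet = namedtuple("Packet", "src_ip dst_ip src_port dst_port is_fragment")

NUM_NETISR_QUEUES = 4  # assumption: e.g. one netisr queue per CPU


def ips_pick_queue(pkt, num_queues=NUM_NETISR_QUEUES):
    """Pick a netisr queue from the packet's addressing information."""
    if pkt.is_fragment:
        # Assumption: fragments lack a full transport header, so they are
        # sent to a fixed default queue instead of being hashed.
        return 0
    key = f"{pkt.src_ip}|{pkt.dst_ip}|{pkt.src_port}|{pkt.dst_port}".encode()
    # CRC32 stands in for whatever hash the real stack used.
    return zlib.crc32(key) % num_queues


# Every packet of one connection maps to the same queue.
p = Packet("10.0.0.1", "10.0.0.2", 40000, 80, False)
same_queue = ips_pick_queue(p) == ips_pick_queue(p)
```

The point of the sketch is that the queue, and hence the CPU, is a pure function of the addressing information; nothing about where the application runs enters into it.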

It worked fine and dandy for stuff like aggregate netperf TCP_RR tests because there was a 1-1 correspondence between a connection and a process/thread. It was "OK" for the networking to dictate where the process should run. That feels rather like a NIC that would hash packets and pick the MSI-X vector based on that.

However, as Andi discusses, when a process/thread is handling more than one connection, picking a CPU based on address hashing will be like Tweedledee and Tweedledum telling Alice to go in opposite directions. Hence TOPS in 11.X. This time, at the point in the inbound path where the "normal" connection lookup happens, the stack determines where the application last accessed the socket, and processing shifts over to that CPU. This then is the process (well, actually the scheduler) telling networking where it should do its work.

That addresses the multiple connections per thread/process case and still works just as well for 1-1. There are still issues if you have multiple threads/processes concurrently accessing the same socket/connection, but that case is much rarer.
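The TOPS approach described above can be sketched the same way. Again this is only an illustration in Python: the Socket class, the last_cpu field, and the fallback to the arrival CPU when the application has not yet touched the socket are all hypothetical names and assumptions, not the HP-UX implementation.

```python
class Socket:
    """Hypothetical socket that remembers where the application last used it."""

    def __init__(self):
        self.last_cpu = None  # CPU of the most recent application access

    def app_access(self, cpu):
        # Called on send/recv/etc.; the scheduler decided which CPU this is.
        self.last_cpu = cpu


def tops_pick_cpu(sock, arrival_cpu):
    """After the normal connection lookup, choose where to continue processing."""
    if sock.last_cpu is None:
        # Assumption: with no access recorded yet, stay on the arrival CPU.
        return arrival_cpu
    return sock.last_cpu


# One thread running on CPU 2 services two connections.
conn_a, conn_b = Socket(), Socket()
conn_a.app_access(cpu=2)
conn_b.app_access(cpu=2)

# Packets arrive on different CPUs but both end up processed on CPU 2.
cpu_a = tops_pick_cpu(conn_a, arrival_cpu=0)
cpu_b = tops_pick_cpu(conn_b, arrival_cpu=3)
```

Here both connections follow the thread to the same CPU regardless of their addresses. The sketch also shows the remaining weak spot: if two threads on different CPUs call app_access on the same socket, last_cpu flips back and forth between them.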

Nirvana, I suppose, would be the addition of a field in the header which could be used to determine where to process the packet. A transport protocol option I suppose, or maybe the IPv6 flow label, but Knuth only knows if anyone would go for something along those lines. It does though mean that the "state" is per-packet without it having to be based on addressing information. Almost like RDMA arriving saying where the data goes, but this thing says where the processing should happen :)

rick jones