... we get perfectly good throughput without taking 400k interrupts a second on the ixgbe driver.
As in, I can easily saturate 2 x 10GE on ixgbe hardware with a handful of flows. That's not terribly difficult. However, there are a few interesting problems that need addressing:

* There's lock contention between the transmit side from userland and the TCP timers, and the receive side with ACK processing. Under very high traffic load all that lock contention stalls things. We (the royal "we"; I'm mostly just doing tooling at the moment) are working on that.

* There's lock contention on the ARP, routing table and PCB lookups. The latter will go away once we've finally implemented RSS for transmit and receive and then moved things over to using PCB groups on CPUs which have the NIC driver threads bound to them. (There's a rough sketch of the PCB group idea below.)

* There's increasing cache thrashing as the workload grows, which makes the already expensive lookups even more expensive.

* All the list walks suck. We need to be batching things so we use the CPU caches much more efficiently. (Sketch of that below too.)

The idea of using TSO on the transmit side and generic LRO on the receive side is to cut the per-packet overhead. I think we can be much more efficient in general in packet processing, but that's a big task. :-) So using at least TSO is a big win, if purely to avoid decomposing things into lots of smaller mbufs and then contending on those locks in a very big way.

I'm working on PMC to make it easier to use for finding these bottlenecks, so we can make the code and data layout more efficient. Then, likely, I'll end up hacking on generic TSO/LRO, TX/RX RSS queue management, and making the PCB group thing default to on for SMP machines. I may even take a knife to some of the packet processing overhead.

-adrian
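P.S. For the curious, here's a minimal sketch of the PCB group idea. All the names here (pcbgroup, pcbgroup_lookup, flow_hash) are made up for illustration; the real work is in the netinet PCB code. The point is just that if the NIC's RSS hash and the stack agree on which group a flow lives in, each group's lock only ever gets taken by threads on that group's CPU, so it's almost never contended:

    /*
     * Toy sketch of per-CPU PCB groups.  Hypothetical names, not the
     * real netinet API.  Hash the 4-tuple to a group bound to one CPU,
     * so the NIC queue delivering a flow and the stack thread looking
     * the flow up share a lock that other CPUs never touch.
     */
    #include <pthread.h>
    #include <stdint.h>
    #include <stdio.h>

    #define NGROUPS  4      /* pretend: 4 CPUs, 4 NIC queues */
    #define NBUCKETS 1024

    struct pcb {
        uint32_t laddr, faddr;
        uint16_t lport, fport;
        struct pcb *next;
    };

    struct pcbgroup {
        pthread_mutex_t lock;   /* only this group's CPU takes it */
        struct pcb *buckets[NBUCKETS];
    };

    static struct pcbgroup groups[NGROUPS];

    /* The same hash the NIC's RSS uses, so packet -> queue -> group agree. */
    static unsigned
    flow_hash(uint32_t laddr, uint32_t faddr, uint16_t lport, uint16_t fport)
    {
        return ((laddr ^ faddr ^ ((uint32_t)lport << 16 | fport)) * 2654435761u);
    }

    static struct pcb *
    pcbgroup_lookup(uint32_t laddr, uint32_t faddr, uint16_t lport, uint16_t fport)
    {
        unsigned h = flow_hash(laddr, faddr, lport, fport);
        struct pcbgroup *g = &groups[h % NGROUPS];
        struct pcb *p;

        pthread_mutex_lock(&g->lock);
        for (p = g->buckets[(h / NGROUPS) % NBUCKETS]; p != NULL; p = p->next)
            if (p->laddr == laddr && p->faddr == faddr &&
                p->lport == lport && p->fport == fport)
                break;
        pthread_mutex_unlock(&g->lock);
        return (p);
    }

    int
    main(void)
    {
        static struct pcb p = { 0x0a000001, 0x0a000002, 80, 12345, NULL };
        unsigned h = flow_hash(p.laddr, p.faddr, p.lport, p.fport);
        struct pcbgroup *g = &groups[h % NGROUPS];

        pthread_mutex_init(&g->lock, NULL);
        g->buckets[(h / NGROUPS) % NBUCKETS] = &p;
        printf("lookup: %s\n",
            pcbgroup_lookup(p.laddr, p.faddr, p.lport, p.fport) ? "hit" : "miss");
        return (0);
    }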
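The batching thing looks something like this: instead of a lock round-trip per packet, pull the whole chain off the queue in one lock/unlock and walk it with the cache hot and no lock held. Again, hypothetical names; the real driver/ifnet queues are more involved:

    /* Sketch: amortise lock cost by dequeueing whole packet chains. */
    #include <pthread.h>
    #include <stddef.h>

    struct mbuf {
        struct mbuf *m_nextpkt;
        /* ... packet data ... */
    };

    struct pktqueue {
        pthread_mutex_t lock;
        struct mbuf *head;
    };

    /* The slow way: one lock round-trip per packet. */
    struct mbuf *
    dequeue_one(struct pktqueue *q)
    {
        struct mbuf *m;

        pthread_mutex_lock(&q->lock);
        m = q->head;
        if (m != NULL) {
            q->head = m->m_nextpkt;
            m->m_nextpkt = NULL;
        }
        pthread_mutex_unlock(&q->lock);
        return (m);
    }

    /* The batched way: one lock round-trip for the whole chain; the
     * caller then walks the chain without the lock held, touching
     * each mbuf while it's still warm in cache. */
    struct mbuf *
    dequeue_all(struct pktqueue *q)
    {
        struct mbuf *chain;

        pthread_mutex_lock(&q->lock);
        chain = q->head;
        q->head = NULL;
        pthread_mutex_unlock(&q->lock);
        return (chain);
    }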
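If you want to play along at home, TSO and LRO can be toggled per-interface with ifconfig (ix0 here is just an example interface name):

    ifconfig ix0 tso lro    # enable both
    ifconfig ix0            # the options line should now show TSO4,TSO6,LRO

One caveat: don't leave LRO on if the box is forwarding packets, since coalescing frames that are just going to be routed back out breaks things.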
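And if you want to go hunting for the lock contention yourself, hwpmc/pmcstat is the tooling I'm poking at. Roughly (the exact event name depends on your CPU):

    kldload hwpmc
    pmcstat -S unhalted-core-cycles -O /tmp/sample.pmc   # sample system-wide; ^C to stop
    pmcstat -R /tmp/sample.pmc -G /tmp/callgraph.txt     # turn the log into a callgraph

The contended paths tend to show up near the top of that callgraph.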