Ed Ravin wrote:
> I'm testing out an AMD Opteron 6128 platform as a router with two igb
> interfaces. The kernel is 2.6.34.1 (32 bit), with igb-2.2.9.
>
> We're seeing lower performance than we expect when testing it with
> denial-of-service style traffic (source IPs random spoofed).
>
> I have gc_interval and gc_elasticity set to 1 to reduce the route-cache
> lookups, and have set the igb parameters to:
>
> IntMode=2,2 QueuePairs=0,0 RSS=4,4 InterruptThrottleRate=3,3
>
> The RX ring is set to the max, 4096. TX ring is set to 512.
>
> The traffic is generated with the "mz" tool - source spoofed, short UDP
> packets.
>
> The CPUs are nowhere near maxed out - I set affinity for the four RX
> queues on the interface receiving the traffic to CPUs 0,1,2,3 and they
> are around 50% utilized, and CPUs 8,9,10,11 have the queues on the
> interface forwarding the traffic and they are around 10% utilized.
> But nonetheless, we're dropping 2-3000 packets per second (rx_fifo_errors
> according to ethtool).
>
> I suspect I'm running into a NUMA issue or some other bottleneck. Any
> suggestions for how to get the maximum throughput out of this platform?
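For reference, the setup described above would typically be applied with commands along these lines (eth0 here stands in for the receiving interface, and the module-parameter syntax assumes the out-of-tree igb driver):

  # Route-cache garbage collection tuning (2.6.x route-cache sysctls)
  sysctl -w net.ipv4.route.gc_interval=1
  sysctl -w net.ipv4.route.gc_elasticity=1

  # igb loaded with the quoted parameters, one value per port
  modprobe igb IntMode=2,2 QueuePairs=0,0 RSS=4,4 InterruptThrottleRate=3,3

  # Ring sizes as described: RX at the 4096 maximum, TX at 512
  ethtool -G eth0 rx 4096 tx 512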
I am kind of suspecting NUMA issues to be a factor in this as well. The rx_fifo_errors are an indication that the RX interrupts cannot allocate buffers fast enough, so the adapter is dropping packets. My recommendations would be the following:

1. Set the RX and TX ring sizes to 256. That way all of the descriptors for each ring fit within a single 4K page.

2. You may want to stack the matching queues on the same CPU, so rx/tx 0 for both ports on CPU 0, rx/tx 1 on CPU 1, and so on. This way you can keep the memory local and reduce cross-CPU and cross-node allocation and freeing.

3. You could also try setting the RSS value to 0 and see how many queues that gives you. Depending on what hardware you have there may be more queues available, and if each CPU carries a stack of queues as suggested in item 2, then spreading the work over more CPUs would be advisable.

4. One other thing that might be useful would be to put a static entry into your ARP table for the destination IPs you are routing to. I have seen instances where a delay in obtaining the MAC address via ARP causes packets to be dropped.

A rough sketch of these steps as commands follows at the end of this message.

That is what I can think of off the top of my head. Other than that, if you could provide more information on the system it would be useful - perhaps an lspci -vvv, a dump of /proc/cpuinfo, and /proc/zoneinfo. With that I can give more detailed steps on a layout that might provide the best performance.

Thanks,

Alex
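For what it's worth, here is a minimal sketch of the steps above as commands - assuming eth0 is the receiving port, eth1 the forwarding port, 10.0.0.1 / 00:11:22:33:44:55 stand in for your next hop, and the IRQ numbers are read from /proc/interrupts on your own box:

  # 1. Shrink both rings to 256 descriptors so each ring fits in one 4K page
  ethtool -G eth0 rx 256 tx 256
  ethtool -G eth1 rx 256 tx 256

  # 2. Pin matching queue pairs from both ports to the same CPU.
  #    smp_affinity takes a hex CPU bitmask: 1 = CPU0, 2 = CPU1, 4 = CPU2, ...
  #    Substitute the IRQ numbers /proc/interrupts shows for eth0-TxRx-0 etc.
  echo 1 > /proc/irq/45/smp_affinity   # eth0 queue 0 -> CPU 0
  echo 1 > /proc/irq/49/smp_affinity   # eth1 queue 0 -> CPU 0
  echo 2 > /proc/irq/46/smp_affinity   # eth0 queue 1 -> CPU 1
  echo 2 > /proc/irq/50/smp_affinity   # eth1 queue 1 -> CPU 1

  # 3. Let the driver size the queue count itself (this reloads the driver,
  #    so the interfaces go down briefly)
  modprobe -r igb
  modprobe igb RSS=0,0 IntMode=2,2 InterruptThrottleRate=3,3

  # 4. Static ARP entry for the next hop you forward to
  arp -s 10.0.0.1 00:11:22:33:44:55
  # or, with iproute2:
  ip neigh replace 10.0.0.1 lladdr 00:11:22:33:44:55 dev eth1 nud permanent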
