Ed Ravin wrote:
> I'm testing out an AMD Opteron 6128 platform as a router with two igb
> interfaces. The kernel is 2.6.34.1 (32 bit), with igb-2.2.9.
>
> We're seeing lower performance than we expect when testing it with
> denial-of-service style traffic (source IPs random spoofed).
>
> I have gc_interval and gc_elasticity set to 1 to reduce the route-cache
> lookups, and have set the igb parameters to:
>
> IntMode=2,2 QueuePairs=0,0 RSS=4,4 InterruptThrottleRate=3,3
>
> The RX ring is set to the max, 4096. TX ring is set to 512.
>
> The traffic is generated with the "mz" tool - source spoofed, short UDP
> packets.
>
> The CPUs are nowhere near maxed out - I set affinity for the four RX
> queues on the interface receiving the traffic to CPUs 0,1,2,3 and they
> are around 50% utilized, and CPUs 8,9,10,11 have the queues on the
> interface forwarding the traffic and they are around 10% utilized.
> But nonetheless, we're dropping 2-3000 packets per second (rx_fifo_errors
> according to ethtool).
>
> I suspect I'm running into a NUMA issue or some other bottleneck. Any
> suggestions for how to get the maximum throughput out of this platform?
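For reference, the setup described above would typically be applied with commands along these lines (eth0 here stands in for the receiving interface, and the module-parameter syntax assumes the out-of-tree igb driver):

  # Route-cache garbage collection tuning (2.6.x route-cache sysctls)
  sysctl -w net.ipv4.route.gc_interval=1
  sysctl -w net.ipv4.route.gc_elasticity=1

  # igb loaded with the quoted parameters, one value per port
  modprobe igb IntMode=2,2 QueuePairs=0,0 RSS=4,4 InterruptThrottleRate=3,3

  # Ring sizes as described: RX at the 4096 maximum, TX at 512
  ethtool -G eth0 rx 4096 tx 512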
I am kind of suspecting NUMA issues to be a factor in this as well. The rx_fifo_errors are an indication that the RX interrupts cannot allocate buffers fast enough, so the adapter is dropping packets. My recommendations would be the following:

1. Set the RX and TX ring sizes to 256. That way all of the descriptors for each ring fit within a single 4K page.

2. You may want to stack the matching queues on the same CPU, so rx/tx 0 for both ports on CPU 0, rx/tx 1 on CPU 1, and so on. This way you can keep the memory local and reduce cross-CPU and cross-node allocation and freeing.

3. You could also try setting the RSS value to 0 and see how many queues that gives you. Depending on what hardware you have there may be more queues available, and if each CPU carries a stack of queues as suggested in item 2, then spreading the work over more CPUs would be advisable.

4. One other thing that might be useful would be to put a static entry into your ARP table for the destination IPs you are routing to. I have seen instances where a delay in obtaining the MAC address via ARP causes packets to be dropped.

A rough sketch of these steps as commands follows at the end of this message.

That is what I can think of off the top of my head. Other than that, if you could provide more information on the system it would be useful - perhaps an lspci -vvv, a dump of /proc/cpuinfo, and /proc/zoneinfo. With that I can give more detailed steps on a layout that might provide the best performance.

Thanks,

Alex
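For what it's worth, here is a minimal sketch of the steps above as commands - assuming eth0 is the receiving port, eth1 the forwarding port, 10.0.0.1 / 00:11:22:33:44:55 stand in for your next hop, and the IRQ numbers are read from /proc/interrupts on your own box:

  # 1. Shrink both rings to 256 descriptors so each ring fits in one 4K page
  ethtool -G eth0 rx 256 tx 256
  ethtool -G eth1 rx 256 tx 256

  # 2. Pin matching queue pairs from both ports to the same CPU.
  #    smp_affinity takes a hex CPU bitmask: 1 = CPU0, 2 = CPU1, 4 = CPU2, ...
  #    Substitute the IRQ numbers /proc/interrupts shows for eth0-TxRx-0 etc.
  echo 1 > /proc/irq/45/smp_affinity   # eth0 queue 0 -> CPU 0
  echo 1 > /proc/irq/49/smp_affinity   # eth1 queue 0 -> CPU 0
  echo 2 > /proc/irq/46/smp_affinity   # eth0 queue 1 -> CPU 1
  echo 2 > /proc/irq/50/smp_affinity   # eth1 queue 1 -> CPU 1

  # 3. Let the driver size the queue count itself (this reloads the driver,
  #    so the interfaces go down briefly)
  modprobe -r igb
  modprobe igb RSS=0,0 IntMode=2,2 InterruptThrottleRate=3,3

  # 4. Static ARP entry for the next hop you forward to
  arp -s 10.0.0.1 00:11:22:33:44:55
  # or, with iproute2:
  ip neigh replace 10.0.0.1 lladdr 00:11:22:33:44:55 dev eth1 nud permanent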
