Ed Ravin wrote:
> On Tue, Jul 27, 2010 at 09:42:16AM -0700, Alexander Duyck wrote:
>
>> The fact that the ring size seems to affect the number of packets
>> dropped per second implies that there may be some sort of latency
>> issue. One thing you might try is different values for rx-usecs via
>> ethtool -C. You may find that fixing the value at something fairly
>> low, like 33 usecs per interrupt, may help to reduce the number of
>> rx_fifo_errors.
>
> "ethtool -C eth0 rx-usecs 33" is accepted, but "ethtool -c eth0" shows
> the values unchanged. This is with igb-2.2.9.

I tried to reproduce the issue with rx-usecs here but didn't have much
luck. What version of ethtool are you currently running? Perhaps the
driver is having an issue with a specific version of ethtool.
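In the meantime, something along these lines should confirm whether the
write is actually reaching the driver, and gives a module-parameter
fallback. Treat it as a sketch: the 30000 figure is just my assumption
from 1s / 33us, so check the igb README for the accepted
InterruptThrottleRate range before relying on it.

    # confirm the ethtool userspace version in use
    ethtool --version

    # set the coalescing value and read it back
    ethtool -C eth0 rx-usecs 33
    ethtool -c eth0 | grep rx-usecs

    # rough equivalent at module load time, one value per port
    # (fixed-rate mode; ~1s / 33us is roughly 30000 interrupts/sec,
    #  which is an assumption on my part, not a tested value)
    modprobe igb InterruptThrottleRate=30000,30000

If the read-back still shows the old value with a current ethtool, that
points at the driver side rather than the tool.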
>> Another factor you will need to take into account is that the ring
>> memory should be allocated on the same node the hardware is on. You
>> should be able to accomplish that by using taskset with the correct
>> CPU mask for the physical ID you are using when calling modprobe/insmod
>> and the ifconfig commands to bring up the interfaces. This should help
>> to decrease the memory latency and increase the throughput available to
>> the adapter.
>
> The taskset/modprobe trick along with putting all the queues on the same
> physical CPU seems to provide the best performance, when using these
> igb-2.2.9 settings:
>
>   taskset 0002 modprobe igb RSS=0,0 InterruptThrottleRate=3,3
>
> r...@big-tester:~# eth_affinity_tool show eth0 eth1
> 16 CPUs detected
>
> eth0: ffff 0001 0002 0004 0008 0010 0020 0040 0080
> eth1: ffff 0001 0002 0004 0008 0010 0020 0040 0080

Based on this it sounds like the PCIe bus for the network interface is
likely connected to node 0 on your system. So for performance reasons we
will want to keep everything within CPU mask 0xFF, which keeps both the
memory and the PCIe bus local to that node.
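To spell out what I mean by keeping everything local to node 0, this is
roughly the sequence I would use. The sysfs paths assume a reasonably
recent kernel, and the interrupt layout will differ on your box, so
treat it as a sketch rather than a recipe:

    # which node the NIC's PCIe slot hangs off, and which CPUs are local
    cat /sys/class/net/eth0/device/numa_node
    cat /sys/class/net/eth0/device/local_cpulist

    # load the driver and bring the interface up with the allocating
    # process restricted to node 0 (mask 0xff = CPUs 0-7 in your layout)
    taskset 0xff modprobe igb RSS=0,0 InterruptThrottleRate=3,3
    taskset 0xff ifconfig eth0 up

    # then steer the queue interrupts onto node-0 CPUs as well
    grep eth0 /proc/interrupts
    echo 01 > /proc/irq/<vector irq>/smp_affinity   # repeat per vector

The point of running modprobe and ifconfig under taskset is only that
the ring and buffer allocations happen from a node-0 CPU; once the
interface is up, the interrupt affinity is what keeps the packet
processing on that node.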
>> Thanks for the information. One other item I would be interested in
>> seeing is the kind of numbers we are talking about. If you could
>> provide me with an ethtool -S dump from 10 seconds of one of your tests
>> that might be useful for me to better understand the kind of pressures
>> the system is under.
>
> Here's a sample 10-second run of "ethtool -S" on the receiving interface,
> after being piped through beforeafter, so these are 10 seconds of each
> counter. Counters that are missing were zero, i.e., no changes in the
> 10 second run. So dividing these numbers by 10 gives you the per-second
> rate.
>
> In this interval, we're sending far more packets than can be processed:
>
>   Interval from 20100727.174619 to 20100727.174629
>
>   NIC statistics:
>        rx_packets: 11595293
>        rx_bytes: 742098688
>        rx_long_byte_count: 742098688
>        rx_fifo_errors: 163800
>        rx_queue_0_packets: 754216
>        rx_queue_0_bytes: 45252960
>        rx_queue_0_drops: 20475
>        rx_queue_1_packets: 760734
>        rx_queue_1_bytes: 45644040
>        rx_queue_1_drops: 20475
>        rx_queue_2_packets: 736546
>        rx_queue_2_bytes: 44192760
>        rx_queue_2_drops: 20475
>        rx_queue_3_packets: 742368
>        rx_queue_3_bytes: 44542080
>        rx_queue_3_drops: 20475
>        rx_queue_4_packets: 661758
>        rx_queue_4_bytes: 39705480
>        rx_queue_4_drops: 20475
>        rx_queue_5_packets: 713095
>        rx_queue_5_bytes: 42785706
>        rx_queue_5_drops: 20475
>        rx_queue_6_packets: 696702
>        rx_queue_6_bytes: 41802120
>        rx_queue_6_drops: 20475
>        rx_queue_7_packets: 705726
>        rx_queue_7_bytes: 42343560
>        rx_queue_7_drops: 20475
>
> And in this interval, we're sending packets just a bit faster than they
> can be processed:
>
>   Interval from 20100727.175747 to 20100727.175757
>
>   NIC statistics:
>        rx_packets: 3877553
>        rx_bytes: 248163392
>        rx_long_byte_count: 248163392
>        rx_fifo_errors: 6515
>        rx_queue_0_packets: 484608
>        rx_queue_0_bytes: 29076480
>        rx_queue_0_drops: 81
>        rx_queue_1_packets: 484690
>        rx_queue_1_bytes: 29081400
>        rx_queue_2_packets: 484690
>        rx_queue_2_bytes: 29081400
>        rx_queue_3_packets: 484360
>        rx_queue_3_bytes: 29061600
>        rx_queue_3_drops: 342
>        rx_queue_4_packets: 483207
>        rx_queue_4_bytes: 28992420
>        rx_queue_4_drops: 1497
>        rx_queue_5_packets: 480183
>        rx_queue_5_bytes: 28810980
>        rx_queue_5_drops: 4521
>        rx_queue_6_packets: 484615
>        rx_queue_6_bytes: 29076900
>        rx_queue_6_drops: 74
>        rx_queue_7_packets: 484690
>        rx_queue_7_bytes: 29081400
>
> Again, to keep things readable, my counter-processing script doesn't
> list statistics that didn't increment during the 10 second interval,
> so the stats from "ethtool -S" not listed were all zero.
>
> Note that the traffic I'm using to get these numbers (lots of small UDP
> packets) is a denial-of-service scenario - we're more interested in
> real-world performance for routing, but that's harder to simulate, and
> we need to be able to handle DoS attacks, so this is the benchmark we're
> using.

Based on this I would think that a single CPU should be able to handle
routing at this rate. One thought that occurred to me is that we might be
spreading the load too wide, and that could be adding extra latency since
the queues are generating interrupts instead of polling.

One other thing to try would be reducing the number of queues to either
1 or 2 and probably disabling QueuePairs. You might try RX on CPU0 and
TX on CPU1, and if that consumes an entire CPU you could put another RX
on CPU2 and TX on CPU3.
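To make that concrete, the sort of configuration I have in mind looks
something like the following. The RSS and QueuePairs parameters are the
ones from the out-of-tree driver, but the per-vector names you will see
in /proc/interrupts may be spelled differently on your build, so adjust
as needed:

    # two queues per port, with the rx/tx pairing split so the rx and
    # tx vectors can be pinned to different CPUs
    taskset 0xff modprobe igb RSS=2,2 QueuePairs=0,0 InterruptThrottleRate=3,3

    # find the per-vector IRQ numbers, then pin rx to CPU0, tx to CPU1
    grep eth0 /proc/interrupts
    echo 1 > /proc/irq/<rx-0 irq>/smp_affinity    # CPU0
    echo 2 > /proc/irq/<tx-0 irq>/smp_affinity    # CPU1

    # if CPU0 saturates, move the second rx onto CPU2 (mask 4) and the
    # second tx onto CPU3 (mask 8)

The idea is that with fewer, busier queues each vector spends more of its
time polling and less time taking interrupts, which is where I suspect
the extra latency is coming from.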
Thanks,

Alex