Hi David,

On Fri, Jan 29, 2010 at 03:58:09PM -0800, David Birdsong wrote:
> I'm curious what others are doing to achieve high connection rates
> - say 10K connections/second.
>
> We're serving objects averaging around 100KB, so 10K/sec is a fully
> utilized 1G ethernet card.
No, at 10k/sec you're at 1 GB/s, or approx 10 Gbps. 100 kB is huge for an
average size. My experience with common web sites is in the range of a few
hundred bytes (buttons, icons, ...) to a few tens of kB (js, css, images).
The more objects you have on a page, the smaller they are and the higher
the hit rate too.

> I'd like to safely hit 7-800 Mb/sec, but
> interrupts are just eating the machine alive.

You just did not configure your e1000 properly. I'm used to forcing
InterruptThrottleRate between 5000 and 10000, not more. You have to set
the value as many times as you have NICs.

> Before adjusting the ethernet irq to allow interrupts to be delivered
> to either core instead of just cpu1, I was hitting a limit right
> around 480Mb/sec, cpu1 was taxed out servicing both hardware and
> software interrupts.

Check that you have properly disabled irqbalance. It kills network
performance because its switching rate is too low. You need something
which balances a lot faster or which does not balance at all, both of
which are achieved by default by the hardware.

> I adjusted the ethernet card's IRQ to remove its affinity to cpu1 and
> now the limit is around 560Mb/sec before the machine starts dropping
> packets. I did this against the advice that this could cause cpu
> cache misses.
>
> machine is: Intel(R) Core(TM)2 Duo CPU E7200 @ 2.53GHz

On Opterons, I'm used to binding haproxy to one core and the IRQs to the
other one. On Core2, it's generally the opposite: I bind them to the same
CPU. But at 1 Gbps you should not be saturating a core with softirqs.
Your experience sounds like a massive tx drop which causes heavy
retransmits. Maybe your outgoing bandwidth is capped by a bufferless
piece of equipment (switch...), or maybe you haven't set enough tx
descriptors for your e1000 NIC.

> os: fedora 10 2.6.27.38-170.2.113.fc10.x86_64 #1 SMP
>
> card: Intel Corporation 82573L Gigabit Ethernet Controller
>
> I've had some ideas on cutting down interrupts:
>
> - jumbo frames behind haproxy (inside my network)

Be careful with jumbos and e1000, I often get page allocation failures
with them. Setting them to 7kB is generally fine though, as the system
then only has to allocate 2 pages instead of 3.

> - LRO enabled cards (not even sure what this is yet)

It's Large Receive Offload. The Myricom 10GE NICs support it and it
considerably boosts performance at high packet rates, but that's 10 times
above your load. It consists in recomposing large incoming TCP segments
from many small ones, so that the TCP stack has fewer IP/TCP headers to
process. This is an almost absolute requirement when processing 800k
packets per second (10G @ 1.5kB). To be honest, at gig rate I don't see a
big benefit. Also, LRO cannot be enabled if you're doing IP forwarding on
the same NIC.

> I'm not even exactly sure which cards support either of these features yet.

All the e1000s that I know of support jumbo frames. Recent kernels
support GRO, which is a software version of LRO but which still improves
performance.

> Also, an msi-x card sounds like it might reduce interrupts, but I'm
> uncertain... might be trying these soonest.

Well, please adjust your NIC's settings first :-)

> Here's some net kernel settings.
> sysctl -A | grep net
> http://pastebin.com/m26e88d16

# net.ipv4.tcp_wmem = 4096 65536 16777216
# net.ipv4.tcp_rmem = 4096 87380 16777216

Try reducing the middle value 2 or 4 fold. You may very well be lacking
socket buffers.
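For example, something along these lines (a rough sketch, the exact
values are to be adjusted to what you observe) divides the default
buffers by 4 while keeping the same min and max:

    sysctl -w net.ipv4.tcp_wmem="4096 16384 16777216"
    sysctl -w net.ipv4.tcp_rmem="4096 21845 16777216"

Once you've found values that work, put them in /etc/sysctl.conf so they
survive a reboot.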
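And to sum up the NIC-side settings discussed above, here is roughly what
I would try first. The interface names, IRQ number and values are only
examples (check yours in /proc/interrupts and with "ethtool -g"), and
your 82573L may be driven by e1000e rather than e1000 depending on the
kernel:

    # stop irqbalance for good (Fedora)
    service irqbalance stop
    chkconfig irqbalance off

    # moderate the interrupt rate, one value per port, in /etc/modprobe.conf:
    #   options e1000 InterruptThrottleRate=8000,8000

    # find the NIC's IRQ and pin it to one core (mask 0x1 = cpu0;
    # the IRQ number 27 is just an example)
    grep eth0 /proc/interrupts
    echo 1 > /proc/irq/27/smp_affinity

    # on Core2, run haproxy on the same core as the NIC's IRQ
    taskset -c 0 haproxy -f /etc/haproxy/haproxy.cfg

    # raise the tx/rx descriptor counts ("ethtool -g eth0" shows the max)
    ethtool -G eth0 tx 1024 rx 1024

    # optional, on the internal side only: 7k jumbos, and GRO if your
    # kernel is recent enough to support it
    ifconfig eth1 mtu 7000
    ethtool -K eth1 gro on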
# net.netfilter.nf_conntrack_max = 1310720
# net.netfilter.nf_conntrack_buckets = 16384

Well, no wonder you're using a lot of CPU with conntrack enabled at these
session rates. Also, conntrack_buckets is low compared to conntrack_max:
I'm used to setting it between 1/16 and 1/4 of the other one to limit the
length of the hash chains. But even better would be not to load the
module at all. Also, I suspect you ran this on an idle system, since
conntrack_count is zero. On a live system it should be very high due to
the large timeouts (especially tcp_timeout_time_wait at 120 seconds).

> I also have everything out of /proc/net/netstat graphed for the last
> few weeks if anybody wants to see.
>
> Is this the best I can expect out of the card, the machine and the
> kernel? Are there any amount of tuning that can alleviate this?

Well, first please recheck your numbers, especially the average object
size. The worst case is objects between 5 and 20 kB: they produce large
numbers of sessions AND large numbers of bytes, which increases both CPU
usage and socket buffer usage. But that's not a reason for not sustaining
the gig rate :-)

Regards,
Willy
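P.S.: if you really cannot avoid loading conntrack, here is the kind of
thing I mean about the hash size; the value is only an example (1/8 of
your conntrack_max), and the 120s TIME_WAIT timeout mentioned above is
worth lowering too:

    # /etc/modprobe.conf: bigger hash table when the module is loaded
    options nf_conntrack hashsize=163840

    # shorter TIME_WAIT tracking, then watch the count on the live system
    sysctl -w net.netfilter.nf_conntrack_tcp_timeout_time_wait=30
    sysctl net.netfilter.nf_conntrack_count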