Hi David,

On Fri, Jan 29, 2010 at 03:58:09PM -0800, David Birdsong wrote:
> I'm curious what others are doing to achieve high connection rates
> -say 10Kconnections/ second.
> 
> We're serving objects averaging around 100KB, so 10K/sec is a fully
> utilized 1G ethernet card.

No, at 10k/sec with 100 kB objects you're at 1 GB/s, i.e. roughly
8-10 Gbps on the wire. 100 kB is huge for an average size. My
experience with common web sites is that objects range from a few
hundred bytes (buttons, icons, ...) to a few tens of kB (js, css,
images). The more objects you have on a page, the smaller they are
and the higher the hit rate too.

>  I'd like to safely hit 7-800 Mb/sec, but
> interrupts are just eating the machine alive.

You just did not configure your e1000 properly. I usually force
InterruptThrottleRate to a value between 5000 and 10000, not more.
You have to set the value once for each NIC you have (the parameter
takes one comma-separated value per interface).
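
For example, something like this in the modprobe configuration (the
module name and file path are guesses, check what "ethtool -i eth0"
reports; on that kernel your 82573L may use e1000e rather than e1000):

    # /etc/modprobe.d/e1000e.conf -- adjust module name to ethtool -i output
    options e1000e InterruptThrottleRate=8000,8000

You'll need to reload the module (or reboot) for it to take effect,
and "modinfo e1000e" lists the accepted parameters.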

> Before adjusting the ethernet irq to allow interrupts to be delivered
> to either core instead of just cpu1, I was hitting a limit right
> around 480Mb/sec, cpu1 was taxed out servicing both hardware and
> software interrupts.

Check that you have properly disabled irqbalance. It kills network
performance because its switching rate is too low. You need something
which balances a lot faster or does not balance at all, both of which
are achieved by default by the hardware.
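
Something like this, assuming a Fedora init setup and that eth0 is
the public NIC (take the real IRQ number from /proc/interrupts):

    service irqbalance stop
    chkconfig irqbalance off
    # find the NIC's IRQ, then pin it to one core (mask 1 = CPU0, 2 = CPU1)
    grep eth0 /proc/interrupts
    echo 1 > /proc/irq/<irq>/smp_affinity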

> I adjusted the ethernet card's IRQ to remove it's affinity to cpu1 and
> now the limit is around 560Mb/sec before the machine starts dropping
> packets.  I did this against the advice that this could cause cpu
> cache misses.
> 
> machine is: Intel(R) Core(TM)2 Duo CPU     E7200  @ 2.53GHz

On Opterons, I usually bind haproxy to one core and the IRQs to
the other one. On Core2, it's generally the opposite: I bind them
to the same CPU. But at 1 Gbps, you should not be saturating a core
with softirqs. Your experience sounds like a massive TX drop which
causes heavy retransmits. Maybe your outgoing bandwidth is capped
by a bufferless device (a switch, ...), or maybe you haven't set
enough TX descriptors for your e1000 NIC.
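
As a rough sketch (the 4096 ring size and the config path are only
assumptions, "ethtool -g" tells you what the NIC really supports):

    # check current and maximum ring sizes, then enlarge the TX ring
    ethtool -g eth0
    ethtool -G eth0 tx 4096
    # on a Core2, bind haproxy to the same core as the NIC's IRQ (CPU0 here)
    taskset -c 0 haproxy -f /etc/haproxy/haproxy.cfg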

> os: fedora 10 2.6.27.38-170.2.113.fc10.x86_64 #1 SMP
> 
> card: Intel Corporation 82573L Gigabit Ethernet Controller
> 
> I've had some ideas on cutting down interrupts:
> 
>  - jumbo frames behind haproxy (inside my network)

Be careful with jumbos and e1000, I often get page allocation
failures with them. Setting the MTU to about 7 kB is generally fine
though, as the system then only has to allocate 2 pages per frame
instead of 3.
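
Assuming eth1 faces your servers and they are configured with the
same MTU, that would simply be:

    ifconfig eth1 mtu 7000
    # or equivalently: ip link set dev eth1 mtu 7000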

>  - LRO enabled cards (not even sure what this is yet)

It's Large Receive Offload. The Myricom 10GE NICs support it and it
considerably boosts performance at high packet rates, but that's 10
times above your load. It consists of reassembling large incoming
TCP segments from many small ones, so that the TCP stack has fewer
IP/TCP headers to process. This is almost an absolute requirement
when processing 800k packets per second (10G @ 1.5 kB). To be honest,
at gigabit rate I don't see a big benefit. Also, LRO cannot be
enabled if you're doing IP forwarding on the same NIC.

> I'm not even exactly sure which cards support either of these features yet.

All the e1000 NICs that I know of support jumbo frames. Recent
kernels support GRO, which is a software equivalent of LRO that
still improves performance.
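
Note that your 2.6.27 predates GRO (it appeared around 2.6.29); on a
kernel which has it, you would check and enable it with ethtool,
e.g. (eth0 being an assumption):

    ethtool -k eth0          # shows offload settings, including GRO where supported
    ethtool -K eth0 gro on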

> Also, an msi-x card sound like it might reduce interrupts, but I'm
> uncertain....might be trying these soonest.

well, please adjust your NIC's settings first :-)

> Here's some net kernel settings.
> sysctl -A | grep net >  http://pastebin.com/m26e88d16

# net.ipv4.tcp_wmem = 4096        65536   16777216
# net.ipv4.tcp_rmem = 4096        87380   16777216

Try reducing the middle (default) value by a factor of 2 to 4. You
may very well be running out of socket buffer memory.
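
For example, a 4-fold reduction of the middle value, to be tested
live first and only then made persistent in /etc/sysctl.conf:

    sysctl -w net.ipv4.tcp_wmem="4096 16384 16777216"
    sysctl -w net.ipv4.tcp_rmem="4096 21845 16777216"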

# net.netfilter.nf_conntrack_max = 1310720
# net.netfilter.nf_conntrack_buckets = 16384

Well, no wonder you're using a lot of CPU with conntrack enabled
at these session rates. Also, conntrack_buckets is low compared
to conntrack_max. I usually set it to between 1/16 and 1/4 of
conntrack_max to limit the length of the hash chains. But even
better would be not to load the module at all.
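
If you really cannot avoid loading it, you can at least exempt the
proxied traffic via the raw table and enlarge the hash at runtime
(port 80 and the 1/4 ratio below are just examples):

    iptables -t raw -A PREROUTING -p tcp --dport 80 -j NOTRACK
    iptables -t raw -A OUTPUT -p tcp --sport 80 -j NOTRACK
    # the bucket count comes from the hashsize parameter, e.g. conntrack_max/4
    echo 327680 > /sys/module/nf_conntrack/parameters/hashsize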

Also, I suspect you ran this on an idle system, since conntrack_count
is zero. On a live system it should be very high due to the large
timeouts (especially tcp_timeout_time_wait at 120 seconds).

> I also have everything out of /proc/net/netstat graphed for the last
> few weeks if anybody wants to see.
> 
> Is this the best I can expect out of the card,  the machine and the
> kernel?  Are there any amount of tuning that can alleviate this?

Well, first please recheck your numbers, especially the average
object size. The worst case is objects between 5 and 20 kB: they
produce large numbers of sessions AND large numbers of bytes, which
increases both CPU usage and socket buffer usage. But that's not a
reason for not sustaining gigabit rate :-)

Regards,
Willy

