It's a big topic; here's the stuff to google:

- cat /proc/interrupts - look for "eth0", "eth1", etc. If you have one interrupt assigned to eth0, you have a single-queue NIC.
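As a quick way to eyeball that, something like this works (the interface name "eth0" is just an example; substitute your own, and note a plain substring match also counts the per-queue "eth0-N" lines, which is what you want here):

```shell
#!/bin/sh
# Count how many interrupt lines an interface has in /proc/interrupts.
# One line  => single-queue NIC.
# Several (eth0-0, eth0-1, ...) => multi-queue NIC.
# Usage: count_queues eth0 [file]  (file defaults to /proc/interrupts)
count_queues() {
    grep -c "$1" "${2:-/proc/interrupts}"
}
```

e.g. `count_queues eth0` returning 8 means eight queues you can spread across cores.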
If you have many interrupts that look like "eth0-0", "eth0-1", etc., you have a multi-queue NIC. These can have their interrupts spread out more. Use either irqbalance (probably a bad idea) or echo values into /proc/irq/nn/smp_affinity (google for help with this) to spread out the interrupts. You can then experiment with using `taskset` to bind memcached to the same CPUs as the interrupts, or to different CPUs, and see if the throughput changes.

- Look up "linux sysctl network tuning". This tends to give you crap like this:

net.ipv4.ip_local_port_range = 9500 65536
net.core.rmem_max = 1048576
net.core.wmem_max = 1048576
net.ipv4.tcp_rmem = 4096 87380 4194304
net.ipv4.tcp_wmem = 4096 43690 4194304
net.ipv4.tcp_fin_timeout = 30
net.ipv4.tcp_syncookies = 0
net.ipv4.tcp_max_orphans = 65536
net.ipv4.tcp_max_syn_backlog = 16384
#net.ipv4.tcp_synack_retries = 2
net.core.netdev_budget = 1000
net.ipv4.tcp_max_tw_buckets = 1512000

Which will vary by whether you're using persistent connections or not (i.e., how high the connection turnover is). Don't blindly copy/paste this stuff.

- Read up on all the "ethtool" options available for your NIC, and ensure the defaults work for you. NICs are all configured for a balance between packet latency and throughput: the more interrupts they coalesce within the driver, the higher the potential latency of packet return. This can be really hard to go through, and the settings will vary with every NIC you fiddle with. I've managed to get differences of 40k-60k pps by tuning these values; often the latency doesn't get much worse.

- Use a recent kernel. On a particular piece of recent hardware I doubled packet throughput by moving from 2.6.27 to 2.6.32. I was not able to push 2.6.18 hard without having it drop networking.

- Use an even more recent kernel: http://kernelnewbies.org/Linux_2_6_35#head-94daf753b96280181e79a71ca4bb7f7a423e302a - I haven't played with this much yet, but it looks like a BFD, especially if you're stuck with single-queue NICs.

- Get a better NIC.
10ge NICs have awesomesauce features for shoveling more packets around. There're different levels of how awesome straight-gbit NICs are as well; I like the high-end Intels more than most of the Broadcoms, for instance.

- Don't think running multiple instances of memcached will make much of a difference. Maybe run more threads, though, or try pinning them to a set of CPUs or a particular CPU.

On Tue, 28 Sep 2010, Jay Paroline wrote:

> We've run into this exact same issue and narrowed it down to the NIC,
> but don't really know where to go from there. I'm going to look up
> Dormando's suggestions but if anyone else has experience with this and
> can point us in the right direction, it would be greatly appreciated.
>
> Thanks,
>
> Jay
>
> On Sep 27, 2:34 pm, dormando <dorma...@rydia.net> wrote:
> > > We have an 2 x quad core server with 32 gb ram. If many clients
> > > connect to this server (only memcached runs on it) the first core run
> > > to nearly 100 % use by si (software interrups) and so some client
> > > can't reach the server.
> > > Memcached runs currently with 4 threads and with version (1.4.2). All
> > > other cores have 70 % idle so I ask me is there a possibility to
> > > improve the performance?
> >
> > This is an issue with how your network interrupts are being routed, not
> > with how memcached is being threaded.
> >
> > Wish I had some good links offhand for this, because it's a little obscure
> > to deal with. In short; you'll want to balance your network interrupts
> > across cores. Google for blog posts about smp_affinity for network cards
> > and irqbalance (which poorly tries to automatically do this).
> >
> > Depending on how many NIC's you have and if it's multiqueued or not you'll
> > have to tune it differently. Linux 2.6.35 has some features for extending
> > the speed of single-queued NIC's (find the pages discussing it on
> > kernelnewbies.org).
>
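To make the smp_affinity / taskset bit above concrete, here's a rough sketch. The IRQ number (40) and CPU numbers are made up; read them off /proc/interrupts on your own box:

```shell
#!/bin/sh
# smp_affinity takes a hex bitmask of CPUs: CPU n -> bit n.
cpu_mask() {
    printf '%x\n' $((1 << $1))
}

# Route IRQ 40 to CPU 2 (mask "4"); needs root. IRQ number is an example.
# echo "$(cpu_mask 2)" > /proc/irq/40/smp_affinity

# Then try memcached on the same CPU, or a different one, and compare:
# taskset -c 2 memcached -u nobody -t 4
```

With a multi-queue NIC you'd repeat the echo for each eth0-N IRQ, giving each queue its own CPU, rather than pointing them all at one core.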