Hi all,

This is not related to haproxy itself, but I am having a performance issue with the number of packets processed. I am running haproxy on a 48-core system (we have 64 such servers at present, and that number is going to increase for production testing), where CPUs 0,2,4,6,...,46 are part of NUMA node 1 and CPUs 1,3,5,7,...,47 are part of NUMA node 2. The systems are running Debian 7 with kernel 3.16.0-23 (both CONFIG_XPS and CONFIG_RPS are enabled).

nbproc is set to 12, and each haproxy process is bound to one of CPUs 0,2,4,...,22, so that they are all on the same socket, as seen here:

# ps -efF | egrep "hap|PID" | cut -c1-80
UID        PID  PPID  C    SZ    RSS PSR STIME TTY          TIME CMD
haproxy   3099     1 17 89697 324024   0 18:37 ?        00:11:19 haproxy -f hap
haproxy   3100     1 18 87171 314324   2 18:37 ?        00:12:00 haproxy -f hap
haproxy   3101     1 18 87214 305328   4 18:37 ?        00:12:00 haproxy -f hap
haproxy   3102     1 19 89215 322676   6 18:37 ?        00:12:02 haproxy -f hap
haproxy   3103     1 18 86788 310976   8 18:37 ?        00:11:59 haproxy -f hap
haproxy   3104     1 18 87197 314888  10 18:37 ?        00:12:00 haproxy -f hap
haproxy   3105     1 18 91311 319784  12 18:37 ?        00:11:59 haproxy -f hap
haproxy   3106     1 18 88785 305576  14 18:37 ?        00:12:00 haproxy -f hap
haproxy   3107     1 19 90366 326428  16 18:37 ?        00:12:09 haproxy -f hap
haproxy   3108     1 19 89758 320780  18 18:37 ?        00:12:09 haproxy -f hap
haproxy   3109     1 19 87670 314752  20 18:37 ?        00:12:07 haproxy -f hap
haproxy   3110     1 19 87763 316672  22 18:37 ?        00:12:10 haproxy -f hap

set_irq_affinity.sh was run on the ixgbe card, and /proc/irq/*/smp_affinity shows that each IRQ is bound to CPUs 0-47 correctly. However, I see that packets are being processed on CPUs of the 2nd socket too, though user/system usage is zero on those as haproxy does not run on those cores.
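
For reference, here is a minimal sketch of how a CPU bitmask covering only the haproxy CPUs could be computed, in the comma-separated 32-bit hex word form that /proc/irq/*/smp_affinity displays. The function names are hypothetical, and the sketch only computes the value; it does not write it anywhere, and whether restricting the mask this way is the right fix is an assumption rather than something verified on this setup.

```python
def cpus_to_mask(cpus):
    """Return an integer with one bit set per CPU index in `cpus`."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask


def format_affinity(mask):
    """Format a mask as comma-separated 32-bit hex words, most significant first."""
    words = []
    while True:
        words.append(format(mask & 0xFFFFFFFF, "08x"))
        mask >>= 32
        if mask == 0:
            break
    return ",".join(reversed(words))


# CPUs the haproxy processes are bound to in this post: 0, 2, 4, ..., 22.
haproxy_cpus = range(0, 24, 2)
print(format_affinity(cpus_to_mask(haproxy_cpus)))  # prints 00555555
```

The same helper applied to all 48 CPUs gives 0000ffff,ffffffff, which corresponds to the "bound to CPUs 0-47" state described above.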

The following shows the difference in the number of packets processed after 10 seconds on the different rx/tx queues:

# ./rx_tx /tmp/ethtool_start /tmp/ethtool_end
"Significant" difference in #packets processed after 10 seconds on the
various rx/tx queues:
Queue#       TX        RX
0       2623165   2826065
1       2564573   2749859
2       2901998   2801043
3       2636856   2794000
4       2892465   2742228
5       3087442   2795762
6       2936588   2760732
7       2934087   2767705
8       2260933   2767707
9       2165087   2759038
10      2144893   2814390
11      2302304   2835790
12      3037722   2748335
13      2940284   2727689
14      2348277   2830378
15      2117679   2838013
16      2679899    487703
17      2447832    438733
18      2505330    429834
19      2611643    447960
20      2595708    449729
21      2534836    447217
22      2616150    466920
23      2522947    450145

mpstat shows that the even-numbered CPUs 0-22 are heavily used, while the odd ones only do softirq processing:

Average:  CPU    %usr    %sys   %soft   %idle
Average:    0   15.47    60.0   24.47    0.00
Average:    1    0.00    0.00   12.86   87.14
Average:    2   20.32   58.49   21.19    0.00
Average:    3    0.10    0.00    2.59   97.30
Average:    4   18.20   60.87   20.93    0.00
Average:    5    0.10    0.00    4.15   95.75
Average:    6   18.75   59.37   21.88    0.00
Average:    7    0.00    0.00    3.03   96.97
Average:    8   22.75   57.71   19.55    0.00
Average:    9    0.00    0.00    2.78   97.22
Average:   10   21.87   57.67   20.47    0.00
Average:   11    0.00    0.00    2.80   97.20
Average:   12   19.48   59.84   20.68    0.00
Average:   13    0.00    0.00    1.76   98.24
Average:   14   22.58   57.16   20.25    0.00
Average:   15    0.00    0.00    1.57   98.43
Average:   16   27.00   67.00    6.00    0.00
Average:   17    0.00    0.07    0.59   99.27
Average:   18   26.17   67.84    5.93    0.07
Average:   19    0.00    0.00    0.15   99.78
Average:   20   26.52   67.36    6.13    0.00
Average:   21    0.00    0.00    0.30   99.63
Average:   22   27.69   66.71    5.60    0.00
Average:   23    0.00    0.00    0.07   99.93
Average:   24    0.00    0.00    0.00  100.00
(remaining are 100% idle)

Is there a way to make sure that tx/rx happens only on the CPUs that haproxy runs on? The reason I think this is affecting performance is locking and IPIs: cpu#0 gets skbs and is in the softirq handler.
netif_receive_skb calls get_rps_cpu() and uses the flow information to find that this skb is for cpu#1. Next, cpu#0 calls enqueue_to_backlog() with cpu#1's index as the parameter, which takes the input_pkt_queue_lock of cpu#1 (contending across nodes for a lock that should normally be used only by cpu#1) and then enqueues the skb. Finally, cpu#0 sends an IPI to cpu#1 to process its backlog, since we added skbs to it.

Thanks,
- Krishna Kumar
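
P.S. The steering path described above can be illustrated with a toy model. This is only a sketch under stated assumptions: pick_rps_cpu() is a hypothetical stand-in for get_rps_cpu() that uses a plain modulo rather than the kernel's actual hash scaling, the flow hashes are made up, and the even/odd node layout is taken from the topology in this post. It counts how many backlog enqueues from cpu#0 would land on a CPU of the other node.

```python
def numa_node(cpu):
    # Topology from this post: even CPUs on one node, odd CPUs on the other.
    return cpu % 2


def pick_rps_cpu(flow_hash, rps_cpus):
    # Toy stand-in for get_rps_cpu(): map the flow hash onto the allowed CPU list.
    return rps_cpus[flow_hash % len(rps_cpus)]


def count_cross_node(irq_cpu, flow_hashes, rps_cpus):
    """Count packets whose backlog enqueue would target a CPU on the other node."""
    return sum(
        1
        for h in flow_hashes
        if numa_node(pick_rps_cpu(h, rps_cpus)) != numa_node(irq_cpu)
    )


hashes = range(1000)               # made-up flow hashes, for illustration only
all_cpus = list(range(48))         # steering allowed to every CPU
same_node = list(range(0, 24, 2))  # steering limited to the haproxy CPUs

print(count_cross_node(0, hashes, all_cpus))   # prints 500
print(count_cross_node(0, hashes, same_node))  # prints 0
```

In this model, each cross-node enqueue corresponds to one remote lock acquisition and potentially one IPI, which is the cost I suspect is hurting throughput.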