Hi Willy,

On Thu, May 21, 2015 at 10:04 PM, Willy Tarreau <w...@1wt.eu> wrote:
> I still have them in my home lab and use them from time to time, yes.
> These cards are very interesting to test your software because they
> combine very low latency with very little CPU usage in the driver. So
> you can reach 10Gbps of forwarding rate with only 25% of one core on
> an old core-2 duo, and you're never concerned with ksoftirqd triggering
> during your tests. However I found that I was facing some packet rate
> limitations with them, meaning it's not possible to reach 10G in TCP
> with packets smaller than 1500 bytes and their respective ACKs, which
> is about 800kpps in each direction, so in our appliances we have switched
> to intel 82599 NICs whose driver is much heavier, but which can saturate
> 10G at any packet size.
>
> > Any thoughts if these would be a win with our workload? Our data rates are
> > relatively small, it's all about request rates.
>
> I know at least one site who managed to significantly increase their
> request rate by switching from gig NICs to myricom for small requests.
> If you look there :
>
>    http://www.haproxy.org/10g.html
>
> You'll see in the old benchmark (2009) that at 2kB objects we were at
> about 33k req/s on a single process on the core-2 using haproxy 1.3.
>
> On recent hardware, intel NICs can go higher than this because you
> easily have more CPU power to dedicate to the driver. Reaching 60-100k
> is not uncommon with fine tuning on such small request sizes.

OK, I'll keep these 10G NICs in mind, but based on your later comments I suspect the problem lies elsewhere.

> 1000 http 302/s is almost idle. You should barely notice haproxy in "top"
> at such a rate.
> Here's what I'm seeing on my core2quad @3 GHz at exactly
> 1000 connections per second :
>
> Tasks: 178 total,   1 running, 175 sleeping,   2 stopped,   0 zombie
> Cpu(s):  0.2%us,  1.0%sy,  0.2%ni, 98.0%id,  0.0%wa,  0.0%hi,  0.5%si,  0.0%st
> Mem:   8168416k total,  7927908k used,   240508k free,   479316k buffers
> Swap:  8393956k total,    18148k used,  8375808k free,  4886752k cached
>
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>  9253 willy     20   0  4236  512  412 S    2  0.0   0:00.22 injectl4
>  9240 willy     20   0  3096 1096  812 S    1  0.0   0:01.83 haproxy
>     1 root      20   0   828   44   20 S    0  0.0   0:50.36 init
>     2 root      20   0     0    0    0 S    0  0.0   0:00.02 kthreadd
>     3 root      20   0     0    0    0 S    0  0.0   0:23.72 ksoftirqd/0
>
> => 1% of one core for haproxy, 2% for injectl4.
>
> > For these I would love to have backend connection pooling, I've seen it
> > discussed, but based on my reading of the conversations it may be
> > problematic to implement? Is this still a likely feature?
>
> Yes it will have to be implemented for HTTP/2 otherwise we'll lose
> server-side keep-alive that took so long to have! It will be useful as
> well when haproxy is installed in front of a fast cache like varnish,
> because for small objects, most of the CPU power is lost in the
> connection setup.

Great news, I will keep an eye out for this feature.

> I think you should definitely run a benchmark of your setup, your
> CPU usage numbers still look quite high to me for the load and I'm
> still suspecting something might be wrong on the system and/or user.
> Even the 20% user for 7k req should be better, as that's what you
> could have for 50-100k conn/s. Maybe you have a complex config though,
> I don't know (eg: lots of rewrite rules or so).
>
> A benchmark would tell you how far you are from the limits you could
> reach.

I'd love to benchmark, but I'm not sure how to make it a realistic representation of our real workload, which is a little diverse (in an ideal world this would run through different haproxy instances).
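One possible starting point for a more representative benchmark is to weight synthetic traffic by the request mix haproxy has already observed. The Python sketch below uses the lifetime `req_tot` counters of the four frontends in the attached stats CSV; the helper names `request_mix` and `plan_rate` are hypothetical, and 7000 req/s is just the example rate mentioned above. It assumes the lifetime mix resembles peak traffic and ignores per-request cost differences (SSL handshakes, client keep-alive vs. close), which a realistic load test would also need to reproduce.

```python
# Lifetime request counters (req_tot) per frontend, copied from the attached stats CSV.
# Assumption: the lifetime mix is close enough to the peak-hour mix to be useful.
REQ_TOT = {
    "gw01": 316_064_813,
    "gw01-ssl": 224_753,
    "gw01-api": 1_947_610_792,
    "gw01-api-ssl": 49,
}


def request_mix(counts: dict[str, int]) -> dict[str, float]:
    """Return each frontend's share of all HTTP requests (the shares sum to 1)."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


def plan_rate(target_rps: int, counts: dict[str, int]) -> dict[str, int]:
    """Split a target request rate across frontends in proportion to the observed mix.

    Uses largest-remainder rounding so the per-frontend rates add up to target_rps.
    """
    shares = request_mix(counts)
    exact = {name: s * target_rps for name, s in shares.items()}
    plan = {name: int(x) for name, x in exact.items()}
    leftover = target_rps - sum(plan.values())
    # Give the remaining requests to the frontends with the largest fractional parts.
    for name in sorted(exact, key=lambda k: exact[k] - plan[k], reverse=True)[:leftover]:
        plan[name] += 1
    return plan


for name, share in request_mix(REQ_TOT).items():
    print(f"{name:14s} {share:7.2%}")
print(plan_rate(7000, REQ_TOT))
```

On these counters roughly 86% of requests arrive on gw01-api and about 14% on gw01, with the SSL frontends negligible, so a load generator would mostly need to mimic the API traffic.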
Our peaks are running our haproxy nodes a little hotter than I would like, though, hence the interest in whether we can optimise or need to add boxes.

I think the config and haproxy stats are relevant at this point. I inherited this config, although I've changed some things and experimented with some settings. Here's the config:

global
    cpu-map all 0
    maxconn 40960
    ulimit-n 102455
    user haproxy
    group haproxy
    daemon
    quiet
    stats socket /var/run/haproxy.sock mode 0600 level admin
    ca-base /etc/ssl/certs
    crt-base /etc/ssl/private
    ssl-default-bind-ciphers kEECDH+aRSA+AES:kRSA+AES:+AES256:RC4-SHA:!kEDH:!LOW:!EXP:!MD5:!aNULL:!eNULL

defaults
    mode http
    balance roundrobin
    retries 1
    option forceclose
    option dontlognull
    option redispatch
    option splice-auto
    option httpchk GET /ping HTTP/1.0
    timeout connect 4000
    timeout client 30000
    timeout server 7000
    maxconn 40960

frontend gw01
    bind x.x.x.1:80
    default_backend web-nodes
    http-request set-header X-domain example.com
    http-request del-header X-protocol

frontend gw01-ssl
    bind x.x.x.1:443 ssl crt web.pem no-sslv3
    default_backend web-nodes
    http-request set-header X-domain example.com
    http-request set-header X-protocol https

frontend gw01-api
    bind x.x.x.2:80
    default_backend web-nodes
    option http-keep-alive
    timeout http-keep-alive 3000
    http-request set-header X-domain api.com
    http-request del-header X-protocol
    acl is_rtb hdr(Host) -i rtb.api.com
    use_backend web-nodes-keepalive if is_rtb

frontend gw01-api-ssl
    bind x.x.x.2:443 ssl crt api.pem no-sslv3
    default_backend web-nodes
    http-request set-header X-domain api.com
    http-request set-header X-protocol https

backend web-nodes
    option forceclose
    retries 2
    timeout server 800
    option forwardfor header X-forwarded
    server websvr50 websvr50:80 check inter 3s rise 1 fall 1 slowstart 120s
    server websvr51 websvr51:80 check inter 3s rise 1 fall 1 slowstart 120s
    server websvr52 websvr52:80 check inter 3s rise 1 fall 1 slowstart 120s
    server websvr53 websvr53:80 check inter 3s rise 1 fall 1 slowstart 120s
    server websvr54 websvr54:80 check inter 3s rise 1 fall 1 slowstart 120s
    server websvr55 websvr55:80 check inter 3s rise 1 fall 1 slowstart 120s
    server websvr56 websvr56:80 check inter 3s rise 1 fall 1 slowstart 120s
    server websvr57 websvr57:80 check inter 3s rise 1 fall 1 slowstart 120s
    server websvr58 websvr58:80 check inter 3s rise 1 fall 1 slowstart 120s
    server websvr59 websvr59:80 check inter 3s rise 1 fall 1 slowstart 120s

backend web-nodes-keepalive
    retries 2
    timeout server 3000
    timeout http-keep-alive 3000
    option forwardfor header X-forwarded
    option http-keep-alive
    option prefer-last-server
    server websvr50 websvr50:81 check inter 3s rise 1 fall 1 slowstart 120s
    server websvr51 websvr51:81 check inter 3s rise 1 fall 1 slowstart 120s
    server websvr52 websvr52:81 check inter 3s rise 1 fall 1 slowstart 120s
    server websvr53 websvr53:81 check inter 3s rise 1 fall 1 slowstart 120s
    server websvr54 websvr54:81 check inter 3s rise 1 fall 1 slowstart 120s
    server websvr55 websvr55:81 check inter 3s rise 1 fall 1 slowstart 120s
    server websvr56 websvr56:81 check inter 3s rise 1 fall 1 slowstart 120s
    server websvr57 websvr57:81 check inter 3s rise 1 fall 1 slowstart 120s
    server websvr58 websvr58:81 check inter 3s rise 1 fall 1 slowstart 120s
    server websvr59 websvr59:81 check inter 3s rise 1 fall 1 slowstart 120s

Stats are attached as a CSV, as that makes the most sense.

Also, based on the top output below, I think the system time is attributable to haproxy: the process's %CPU (56) is close to %us + %sy on Cpu0 (20.9 + 34.4 = 55.3).
top - 18:05:50 up 15 days, 20:23,  3 users,  load average: 0.57, 0.59, 0.59
Tasks: 257 total,   3 running, 253 sleeping,   0 stopped,   1 zombie
Cpu0  : 20.9%us, 34.4%sy,  0.0%ni, 44.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu1  :  0.7%us,  1.7%sy,  0.0%ni, 85.6%id,  0.0%wa,  0.0%hi, 12.1%si,  0.0%st
Cpu2  :  0.3%us,  0.3%sy,  0.0%ni, 89.8%id,  0.0%wa,  0.0%hi,  9.5%si,  0.0%st
Cpu3  :  0.0%us,  2.7%sy,  0.0%ni, 76.0%id,  0.0%wa,  0.0%hi, 21.3%si,  0.0%st
Mem:   8134648k total,  2413680k used,  5720968k free,   280828k buffers
Swap:  8352252k total,        0k used,  8352252k free,   960000k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
45555 haproxy   20   0  541m 519m 1704 R   56  6.5   2743:06 haproxy

Let me know if there's more information that would help. When running strace, the only thing that stood out was a lot of connect() calls returning EINPROGRESS.

Regards,
Rob
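For context on the strace observation: on a non-blocking socket, `connect()` normally returns `EINPROGRESS` immediately and the handshake completes in the background, so it is the expected return rather than an error. Since `option forceclose` is set on the `web-nodes` backend, each request routed there needs a fresh server connection, so many such calls are to be expected. The Python sketch below reproduces the pattern (start the connect, wait for writability, read `SO_ERROR`); it is an illustration of non-blocking connects in general, not haproxy's code, and the helper name `nonblocking_connect` is hypothetical.

```python
import errno
import select
import socket


def nonblocking_connect(host: str, port: int, timeout: float = 2.0) -> tuple[int, int]:
    """Connect the way an event-driven proxy does: start the connect, then wait.

    Returns (initial connect_ex result, final SO_ERROR), where 0 means success.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setblocking(False)
    try:
        # On a non-blocking TCP socket, connect() usually returns EINPROGRESS
        # right away while the three-way handshake continues in the kernel.
        rc = sock.connect_ex((host, port))
        if rc not in (0, errno.EINPROGRESS):
            return rc, rc
        # The socket becomes writable once the handshake finishes (or fails).
        _, writable, _ = select.select([], [sock], [], timeout)
        if not writable:
            return rc, errno.ETIMEDOUT
        # SO_ERROR carries the real outcome of the attempt (0 = connected).
        return rc, sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
    finally:
        sock.close()


if __name__ == "__main__":
    # Local listener so the example needs no external server.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen()
    rc, err = nonblocking_connect("127.0.0.1", listener.getsockname()[1])
    print("connect:", errno.errorcode.get(rc, rc), "final SO_ERROR:", err)
    listener.close()
```

A connection that really failed would show up as a non-zero `SO_ERROR` (for example `ECONNREFUSED`), not as `EINPROGRESS` on the initial call.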
# pxname,svname,qcur,qmax,scur,smax,slim,stot,bin,bout,dreq,dresp,ereq,econ,eresp,wretr,wredis,status,weight,act,bck,chkfail,chkdown,lastchg,downtime,qlimit,pid,iid,sid,throttle,lbtot,tracked,type,rate,rate_lim,rate_max,check_status,check_code,check_duration,hrsp_1xx,hrsp_2xx,hrsp_3xx,hrsp_4xx,hrsp_5xx,hrsp_other,hanafail,req_rate,req_rate_max,req_tot,cli_abrt,srv_abrt,comp_in,comp_out,comp_byp,comp_rsp,lastsess,last_chk,last_agt,qtime,ctime,rtime,ttime,
gw01,FRONTEND,,,921,3937,40960,316065731,186788993188,273660973645,0,0,25260698,,,,,OPEN,,,,,,,,,1,2,0,,,,0,694,0,2157,,,,0,274312800,15196076,26549187,6484,263,,703,2165,316064813,,,0,0,0,0,,,,,,,,
gw01-ssl,FRONTEND,,,0,17,40960,224753,124005819,157919117,0,0,4641,,,,,OPEN,,,,,,,,,1,3,0,,,,0,1,0,14,,,,0,204974,14394,5385,0,0,,1,17,224753,,,0,0,0,0,,,,,,,,
gw01-api,FRONTEND,,,3078,12646,40960,29919731,2814998814871,343579807293,0,0,436060,,,,,OPEN,,,,,,,,,1,4,0,,,,0,91,0,11640,,,,0,1945665195,1447867,453105,12142,1255,,7632,14788,1947610792,,,0,0,0,0,,,,,,,,
gw01-api-ssl,FRONTEND,,,0,2,40960,49,11924,15646,0,0,21,,,,,OPEN,,,,,,,,,1,5,0,,,,0,0,0,6,,,,0,12,8,29,0,0,,0,6,49,,,0,0,0,0,,,,,,,,
web-nodes,websvr50,0,0,0,159,,29701313,19079729762,27071216769,,0,,0,108,0,0,UP,1,1,0,0,0,325114,0,,1,6,1,,29701313,,2,63,,208,L7OK,200,0,0,27904575,1666392,129426,812,0,0,,,,489,0,,,,,0,OK,,0,0,6,100,
web-nodes,websvr51,0,0,3,158,,29701313,19080587377,27065393861,,0,,0,144,0,0,UP,1,1,0,0,0,325114,0,,1,6,2,,29701313,,2,63,,208,L7OK,200,1,0,27903943,1667122,129355,746,0,0,,,,510,0,,,,,0,OK,,0,0,6,188,
web-nodes,websvr52,0,0,0,158,,29701312,19079546763,27067098521,,0,,0,115,0,0,UP,1,1,0,0,0,325114,0,,1,6,3,,29701312,,2,62,,209,L7OK,200,0,0,27906076,1665594,128778,749,0,0,,,,525,0,,,,,0,OK,,0,0,5,152,
web-nodes,websvr53,0,0,0,158,,29701312,19079499878,27071830653,,0,,0,121,0,0,UP,1,1,0,0,0,325114,0,,1,6,4,,29701312,,2,62,,209,L7OK,200,0,0,27906015,1665985,128532,659,0,0,,,,510,0,,,,,0,OK,,0,0,5,154,
web-nodes,websvr54,0,0,0,159,,29701312,19079753315,27070175936,,0,,0,111,0,0,UP,1,1,0,0,0,325114,0,,1,6,5,,29701312,,2,62,,209,L7OK,200,0,0,27903392,1668336,128724,749,0,0,,,,536,0,,,,,0,OK,,0,0,6,133,
web-nodes,websvr55,0,0,0,159,,29701312,19081196275,27069258858,,0,,0,159,0,0,UP,1,1,0,0,0,325114,0,,1,6,6,,29701312,,2,62,,209,L7OK,200,0,0,27905182,1666129,129152,690,0,0,,,,519,0,,,,,0,OK,,0,0,5,184,
web-nodes,websvr56,0,0,0,159,,29701312,19080359501,27070578815,,0,,0,186,0,0,UP,1,1,0,0,0,325114,0,,1,6,7,,29701312,,2,62,,209,L7OK,200,1,0,27906983,1664643,128765,735,0,0,,,,528,1,,,,,0,OK,,0,0,5,153,
web-nodes,websvr57,0,0,0,159,,29701312,19078879042,27067644611,,0,,0,165,0,0,UP,1,1,0,0,0,325114,0,,1,6,8,,29701312,,2,62,,209,L7OK,200,0,0,27907452,1663840,129111,744,0,0,,,,553,0,,,,,0,OK,,0,0,5,141,
web-nodes,websvr58,0,0,0,159,,29701312,19079242734,27068678181,,0,,0,189,0,0,UP,1,1,0,0,0,325114,0,,1,6,9,,29701312,,2,62,,209,L7OK,200,0,0,27906771,1664917,128714,721,0,0,,,,561,0,,,,,0,OK,,0,0,5,121,
web-nodes,websvr59,0,0,0,159,,29701312,19081163300,27069288822,,0,,0,169,0,0,UP,1,1,0,0,0,325114,0,,1,6,10,,29701312,,2,62,,209,L7OK,200,1,0,27905686,1665387,129340,730,0,0,,,,589,0,,,,,0,OK,,0,0,7,100,
web-nodes,BACKEND,0,0,3,2101,16384,297013397,190800125478,270691165027,0,0,,0,1467,0,0,UP,10,10,0,,0,325114,0,,1,6,0,,297013122,,1,626,,2088,,,,0,279056075,16658345,1289897,8801,276,,,,,5595,1,0,0,0,0,0,,,0,0,5,149,
web-nodes-keepalive,websvr50,0,0,1,765,,208269519,302973205930,36014045287,,0,,0,0,0,0,UP,1,1,0,0,0,325114,0,,1,7,1,,2349553,,2,707,,1634,L7OK,200,0,0,208263448,0,1735,988,0,0,,,,1168,0,,,,,0,OK,,0,0,2,411,
web-nodes-keepalive,websvr51,0,0,18,765,,206646889,300499516936,35854258624,,0,,0,0,0,0,UP,1,1,0,0,0,325114,0,,1,7,2,,2349553,,2,1047,,1742,L7OK,200,0,0,206640866,0,1648,999,0,0,,,,1229,0,,,,,0,OK,,0,0,5,394,
web-nodes-keepalive,websvr52,0,0,2,760,,209846832,305627194849,36275504358,,0,,0,0,0,0,UP,1,1,0,0,0,325114,0,,1,7,3,,2349553,,2,716,,1701,L7OK,200,0,0,209840749,0,1733,1009,0,0,,,,1208,0,,,,,0,OK,,0,0,2,397,
web-nodes-keepalive,websvr53,0,0,2,769,,209027148,303989521123,36170460329,,0,,0,0,0,0,UP,1,1,0,0,0,325114,0,,1,7,4,,2349553,,2,671,,1579,L7OK,200,0,0,209021088,0,1692,992,0,0,,,,1207,0,,,,,0,OK,,0,0,2,380,
web-nodes-keepalive,websvr54,0,0,2,761,,209704197,305574503404,36244787033,,0,,0,0,0,0,UP,1,1,0,0,0,325114,0,,1,7,5,,2349552,,2,913,,1932,L7OK,200,1,0,209698283,0,1603,1023,0,0,,,,1181,0,,,,,0,OK,,0,0,2,394,
web-nodes-keepalive,websvr55,0,0,0,765,,181935611,261990441649,32613877556,,0,,0,0,0,0,UP,1,1,0,0,0,325114,0,,1,7,6,,2349552,,2,620,,1482,L7OK,200,0,0,181929848,0,1423,957,0,0,,,,1246,0,,,,,0,OK,,0,0,2,395,
web-nodes-keepalive,websvr56,0,0,3,762,,176673560,254250148353,31886997462,,0,,0,0,0,0,UP,1,1,0,1,1,299506,3,,1,7,7,,2348981,,2,912,,1482,L7OK,200,0,0,176667902,0,1365,992,0,0,,,,1219,0,,,,,0,OK,,0,0,2,389,
web-nodes-keepalive,websvr57,0,0,0,760,,182422902,262729703430,32671860327,,0,,0,0,0,0,UP,1,1,0,0,0,325114,0,,1,7,8,,2349552,,2,671,,1459,L7OK,200,0,0,182417044,0,1392,936,0,0,,,,1222,0,,,,,0,OK,,0,0,2,378,
web-nodes-keepalive,websvr58,0,0,0,768,,177314674,255320996264,31920904185,,0,,0,0,0,0,UP,1,1,0,0,0,325114,0,,1,7,9,,2349552,,2,768,,1498,L7OK,200,0,0,177308904,0,1364,939,0,0,,,,1254,0,,,,,0,OK,,0,0,2,383,
web-nodes-keepalive,websvr59,0,0,2,759,,179344533,258138450093,32254581375,,0,,0,0,0,0,UP,1,1,0,0,0,325114,0,,1,7,10,,2349552,,2,597,,1467,L7OK,200,1,0,179338774,0,1384,990,0,0,,,,1279,0,,,,,0,OK,,0,0,2,377,
web-nodes-keepalive,BACKEND,0,0,30,7634,4096,1941185868,2811093687341,341907276536,0,0,,0,0,0,0,UP,10,10,0,,0,325114,0,,1,7,0,,23494953,,1,7627,,14770,,,,0,1941126906,0,16667,9825,1242,,,,,12216,0,0,0,0,0,0,,,0,0,3,389,
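The CSV above is the format haproxy returns for the `show stat` command on its stats socket, which the config declares at `/var/run/haproxy.sock`. A minimal Python sketch for pulling it and computing HTTP requests per client session (`req_tot / stot`) on each frontend follows; it assumes that socket path, Python 3.9+, and permission to read the socket, and the helper names `fetch_stats` and `requests_per_session` are hypothetical. The usage below feeds it a trimmed excerpt of the frontend rows rather than live data.

```python
import csv
import io
import socket


def fetch_stats(sock_path: str = "/var/run/haproxy.sock") -> str:
    """Send `show stat` to the haproxy stats socket and return the CSV reply."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(sock_path)
        s.sendall(b"show stat\n")
        chunks = []
        # In non-interactive mode haproxy closes the socket after answering.
        while True:
            data = s.recv(65536)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode()


def requests_per_session(csv_text: str) -> dict[str, float]:
    """Map each frontend name to req_tot / stot (HTTP requests per client session)."""
    # The header line starts with "# "; strip it so DictReader sees plain column names.
    text = csv_text.lstrip("# ")
    result = {}
    for row in csv.DictReader(io.StringIO(text)):
        if row["svname"] != "FRONTEND":
            continue
        stot = int(row["stot"] or 0)
        req_tot = int(row["req_tot"] or 0)
        if stot:
            result[row["pxname"]] = req_tot / stot
    return result


# Trimmed excerpt of the attached stats (only the columns this sketch needs).
sample = """# pxname,svname,stot,req_tot
gw01,FRONTEND,316065731,316064813
gw01-api,FRONTEND,29919731,1947610792
"""

for name, ratio in requests_per_session(sample).items():
    print(f"{name}: {ratio:.2f} requests per session")
```

On these rows it gives about 1 request per session on gw01, where `option forceclose` from `defaults` applies, and about 65 on gw01-api, which enables `option http-keep-alive`. Run against `fetch_stats()` at peak, the same calculation shows how much connection setup each frontend is paying for.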