Hi Willy,

On Thu, May 21, 2015 at 10:04 PM, Willy Tarreau <w...@1wt.eu> wrote:

> I still have them in my home lab and use them from time to time, yes.
> These cards are very interesting to test your software because they
> combine very low latency with very little CPU usage in the driver. So
> you can reach 10Gbps of forwarding rate with only 25% of one core on
> an old core-2 duo, and you're never concerned with ksoftirqd triggering
> during your tests. However I found that I was facing some packet rate
> limitations with them, meaning it's not possible to reach 10G in TCP
> with packets smaller than 1500 bytes and their respective ACKs, which
> is about 800kpps in each direction, so in our appliances we have switched
> to intel 82599 NICs whose driver is much heavier, but which can saturate
> 10G at any packet size.
>
> > Any thoughts if these would be a win with our workload? Our data rates
> > are relatively small, it's all about request rates.
>
> I know at least one site who managed to significantly increase their
> request rate by switching from gig NICs to myricom for small requests.
> If you look there :
>
>     http://www.haproxy.org/10g.html
>
> You'll see in the old benchmark (2009) that at 2kB objects we were at
> about 33k req/s on a single process on the core-2 using haproxy 1.3.
>
> On recent hardware, intel NICs can go higher than this because you
> easily have more CPU power to dedicate to the driver. Reaching 60-100k
> is not uncommon with fine tuning on such small request sizes.
>

OK, I'll keep these 10G NICs in mind, but based on your later comments I
suspect the problem lies elsewhere.


> 1000 http 302/s is almost idle. You should barely notice haproxy in "top"
> at such a rate. Here's what I'm seeing on my core2quad @3 GHz at exactly
> 1000 connections per second :
>
> Tasks: 178 total,   1 running, 175 sleeping,   2 stopped,   0 zombie
> Cpu(s):  0.2%us,  1.0%sy,  0.2%ni, 98.0%id,  0.0%wa,  0.0%hi,  0.5%si,  0.0%st
> Mem:   8168416k total,  7927908k used,   240508k free,   479316k buffers
> Swap:  8393956k total,    18148k used,  8375808k free,  4886752k cached
>
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>  9253 willy     20   0  4236  512  412 S    2  0.0   0:00.22 injectl4
>  9240 willy     20   0  3096 1096  812 S    1  0.0   0:01.83 haproxy
>     1 root      20   0   828   44   20 S    0  0.0   0:50.36 init
>     2 root      20   0     0    0    0 S    0  0.0   0:00.02 kthreadd
>     3 root      20   0     0    0    0 S    0  0.0   0:23.72 ksoftirqd/0
>
> => 1% of one core for haproxy, 2% for injectl4.
>
> > For these I would love to have backend connection pooling, I've seen it
> > discussed, but based on my reading of the conversations it may be
> > problematic to implement? Is this still a likely feature?
>
> Yes it will have to be implemented for HTTP/2 otherwise we'll lose
> server-side keep-alive that took so long to have! It will be useful as
> well when haproxy is installed in front of a fast cache like varnish,
> because for small objects, most of the CPU power is lost in the
> connection setup.
>

Great news, I will keep an eye out for this feature.


> I think you should definitely run a benchmark of your setup, your
> CPU usage numbers still look quite high to me for the load and I'm
> still suspecting something might be wrong on the system and/or user.
> Even the 20% user for 7k req should be better, as that's what you
> could have for 50-100k conn/s. Maybe you have a complex config though,
> I don't know (eg: lots of rewrite rules or so).
>
> A benchmark would tell you how far you are from the limits you could
> reach.
>

I'd love to benchmark, but I'm not sure how to make it a realistic
representation of our real workload, which is a little diverse (in an
ideal world this would run through different haproxy instances). Our peaks
are running our haproxy nodes a little hotter than I would like, though,
hence the interest in whether we can optimise or need to add boxes.

I think the config and haproxy stats are relevant at this point. I inherited
this config, although I've changed some things and experimented with
some settings.

Here's the config:

global
        cpu-map all 0
        maxconn 40960
        ulimit-n 102455
        user haproxy
        group haproxy
        daemon
        quiet
        stats socket /var/run/haproxy.sock mode 0600 level admin
        ca-base /etc/ssl/certs
        crt-base /etc/ssl/private
        ssl-default-bind-ciphers kEECDH+aRSA+AES:kRSA+AES:+AES256:RC4-SHA:!kEDH:!LOW:!EXP:!MD5:!aNULL:!eNULL

defaults
        mode            http
        balance         roundrobin
        retries         1
        option        forceclose
        option          dontlognull
        option          redispatch
        option        splice-auto
        option          httpchk GET /ping HTTP/1.0
        timeout connect 4000
        timeout client  30000
        timeout server  7000
        maxconn        40960

frontend gw01
       bind x.x.x.1:80
       default_backend web-nodes
       http-request set-header X-domain example.com
       http-request del-header X-protocol

frontend gw01-ssl
       bind x.x.x.1:443 ssl crt web.pem no-sslv3
       default_backend web-nodes
       http-request set-header X-domain example.com
       http-request set-header X-protocol https

frontend gw01-api
       bind x.x.x.2:80
       default_backend web-nodes
       option http-keep-alive
       timeout http-keep-alive 3000
       http-request set-header X-domain api.com
       http-request del-header X-protocol
       acl is_rtb hdr(Host) -i rtb.api.com
       use_backend web-nodes-keepalive if is_rtb

frontend gw01-api-ssl
       bind x.x.x.2:443 ssl crt api.pem no-sslv3
       default_backend web-nodes
       http-request set-header X-domain api.com
       http-request set-header X-protocol https

backend web-nodes
       option forceclose
       retries         2
       timeout server 800
       option forwardfor header X-forwarded
       server websvr50 websvr50:80 check inter 3s rise 1 fall 1 slowstart 120s
       server websvr51 websvr51:80 check inter 3s rise 1 fall 1 slowstart 120s
       server websvr52 websvr52:80 check inter 3s rise 1 fall 1 slowstart 120s
       server websvr53 websvr53:80 check inter 3s rise 1 fall 1 slowstart 120s
       server websvr54 websvr54:80 check inter 3s rise 1 fall 1 slowstart 120s
       server websvr55 websvr55:80 check inter 3s rise 1 fall 1 slowstart 120s
       server websvr56 websvr56:80 check inter 3s rise 1 fall 1 slowstart 120s
       server websvr57 websvr57:80 check inter 3s rise 1 fall 1 slowstart 120s
       server websvr58 websvr58:80 check inter 3s rise 1 fall 1 slowstart 120s
       server websvr59 websvr59:80 check inter 3s rise 1 fall 1 slowstart 120s

backend web-nodes-keepalive
       retries         2
       timeout server 3000
       timeout http-keep-alive 3000
       option forwardfor header X-forwarded
       option http-keep-alive
       option prefer-last-server
       server websvr50 websvr50:81 check inter 3s rise 1 fall 1 slowstart 120s
       server websvr51 websvr51:81 check inter 3s rise 1 fall 1 slowstart 120s
       server websvr52 websvr52:81 check inter 3s rise 1 fall 1 slowstart 120s
       server websvr53 websvr53:81 check inter 3s rise 1 fall 1 slowstart 120s
       server websvr54 websvr54:81 check inter 3s rise 1 fall 1 slowstart 120s
       server websvr55 websvr55:81 check inter 3s rise 1 fall 1 slowstart 120s
       server websvr56 websvr56:81 check inter 3s rise 1 fall 1 slowstart 120s
       server websvr57 websvr57:81 check inter 3s rise 1 fall 1 slowstart 120s
       server websvr58 websvr58:81 check inter 3s rise 1 fall 1 slowstart 120s
       server websvr59 websvr59:81 check inter 3s rise 1 fall 1 slowstart 120s

Stats attached as a csv, as that makes the most sense.
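
In case the raw CSV is hard to read, here is a rough sketch of how the
frontend lines could be summarised. It assumes the CSV has the same layout
as the attachment (a header line starting with "# " and a trailing comma on
every line); the function name frontend_rates is made up for illustration,
and the sample is abbreviated to a few columns.

```python
import csv
import io


def frontend_rates(csv_text):
    """Map each frontend name to (scur, rate, req_rate) from a stats CSV dump."""
    # The header line starts with "# ", which would otherwise end up in the
    # first column name, so strip it before parsing.
    text = csv_text.lstrip("# ")
    rows = csv.DictReader(io.StringIO(text))
    result = {}
    for row in rows:
        # Only FRONTEND lines carry the client-side session and request rates.
        if row["svname"] != "FRONTEND":
            continue
        result[row["pxname"]] = (
            int(row["scur"] or 0),      # current sessions
            int(row["rate"] or 0),      # new sessions per second
            int(row["req_rate"] or 0),  # HTTP requests per second
        )
    return result


if __name__ == "__main__":
    # Abbreviated sample: only a few columns, values taken from the attachment.
    sample = (
        "# pxname,svname,scur,rate,req_rate,\n"
        "gw01,FRONTEND,921,694,703,\n"
        "gw01-api,FRONTEND,3078,91,7632,\n"
        "web-nodes,BACKEND,3,626,,\n"
    )
    for name, (scur, rate, req_rate) in frontend_rates(sample).items():
        print(name, scur, rate, req_rate)
```

Read this way, gw01 shows roughly one request per new connection (694
sessions/s against 703 req/s), while gw01-api carries about 7632 req/s over
only about 91 new connections/s, which is what I would expect from the
keep-alive setting on that frontend.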

Also, based on the top output below, I think the system time is associated
with haproxy, as the %CPU for the process looks close to the %us + %sy on
Cpu0.

top - 18:05:50 up 15 days, 20:23,  3 users,  load average: 0.57, 0.59, 0.59
Tasks: 257 total,   3 running, 253 sleeping,   0 stopped,   1 zombie
Cpu0  : 20.9%us, 34.4%sy,  0.0%ni, 44.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu1  :  0.7%us,  1.7%sy,  0.0%ni, 85.6%id,  0.0%wa,  0.0%hi, 12.1%si,  0.0%st
Cpu2  :  0.3%us,  0.3%sy,  0.0%ni, 89.8%id,  0.0%wa,  0.0%hi,  9.5%si,  0.0%st
Cpu3  :  0.0%us,  2.7%sy,  0.0%ni, 76.0%id,  0.0%wa,  0.0%hi, 21.3%si,  0.0%st
Mem:   8134648k total,  2413680k used,  5720968k free,   280828k buffers
Swap:  8352252k total,        0k used,  8352252k free,   960000k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
45555 haproxy   20   0  541m 519m 1704 R   56  6.5   2743:06 haproxy
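
To spell out the arithmetic behind that comparison, here is a small sketch
using only the numbers from the top output above (the variable names are
mine):

```python
# Values copied from the top output above.
cpu0_user = 20.9      # %us on Cpu0
cpu0_system = 34.4    # %sy on Cpu0
haproxy_cpu = 56.0    # %CPU reported for the haproxy process

# haproxy is pinned to CPU 0 (cpu-map all 0), so Cpu0 is the one to compare.
cpu0_busy = cpu0_user + cpu0_system
system_share = cpu0_system / cpu0_busy

print(f"Cpu0 us+sy: {cpu0_busy:.1f}% vs haproxy %CPU: {haproxy_cpu:.0f}%")
print(f"share of that time spent in the kernel: {system_share:.0%}")
```

That is 55.3% user plus system on Cpu0 against 56% for the process, with
roughly 62% of it being system time.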

Let me know if there's more information that would help. When running
strace the only thing that stood out was a lot of connect calls returning
EINPROGRESS.
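
As far as I understand it, EINPROGRESS is what a non-blocking connect()
normally returns on Linux while the TCP handshake is still pending, so
these calls may simply reflect one new server connection per request rather
than an error. A minimal sketch of that behaviour, assuming a throwaway
listener on 127.0.0.1 (the helper name nonblocking_connect is made up for
illustration and has nothing to do with haproxy's own code):

```python
import errno
import select
import socket


def nonblocking_connect(addr, timeout=2.0):
    """Start a non-blocking connect and return (initial_rc, final_error)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setblocking(False)
    try:
        # connect_ex returns an errno value instead of raising an exception.
        rc = sock.connect_ex(addr)
        # Wait until the socket becomes writable, i.e. the connect finished.
        _, writable, _ = select.select([], [sock], [], timeout)
        if not writable:
            return rc, errno.ETIMEDOUT
        # SO_ERROR holds the real outcome of the asynchronous connect.
        err = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
        return rc, err
    finally:
        sock.close()


if __name__ == "__main__":
    # Throwaway local listener so the example needs no external server.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    rc, err = nonblocking_connect(listener.getsockname())
    print("initial:", errno.errorcode.get(rc, rc), "final SO_ERROR:", err)
    listener.close()
```

If that reading is right, the connect calls are expected with forceclose on
the web-nodes backend, and the interesting number is how many of them
happen per second.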

Regards,

Rob
# pxname,svname,qcur,qmax,scur,smax,slim,stot,bin,bout,dreq,dresp,ereq,econ,eresp,wretr,wredis,status,weight,act,bck,chkfail,chkdown,lastchg,downtime,qlimit,pid,iid,sid,throttle,lbtot,tracked,type,rate,rate_lim,rate_max,check_status,check_code,check_duration,hrsp_1xx,hrsp_2xx,hrsp_3xx,hrsp_4xx,hrsp_5xx,hrsp_other,hanafail,req_rate,req_rate_max,req_tot,cli_abrt,srv_abrt,comp_in,comp_out,comp_byp,comp_rsp,lastsess,last_chk,last_agt,qtime,ctime,rtime,ttime,
gw01,FRONTEND,,,921,3937,40960,316065731,186788993188,273660973645,0,0,25260698,,,,,OPEN,,,,,,,,,1,2,0,,,,0,694,0,2157,,,,0,274312800,15196076,26549187,6484,263,,703,2165,316064813,,,0,0,0,0,,,,,,,,
gw01-ssl,FRONTEND,,,0,17,40960,224753,124005819,157919117,0,0,4641,,,,,OPEN,,,,,,,,,1,3,0,,,,0,1,0,14,,,,0,204974,14394,5385,0,0,,1,17,224753,,,0,0,0,0,,,,,,,,
gw01-api,FRONTEND,,,3078,12646,40960,29919731,2814998814871,343579807293,0,0,436060,,,,,OPEN,,,,,,,,,1,4,0,,,,0,91,0,11640,,,,0,1945665195,1447867,453105,12142,1255,,7632,14788,1947610792,,,0,0,0,0,,,,,,,,
gw01-api-ssl,FRONTEND,,,0,2,40960,49,11924,15646,0,0,21,,,,,OPEN,,,,,,,,,1,5,0,,,,0,0,0,6,,,,0,12,8,29,0,0,,0,6,49,,,0,0,0,0,,,,,,,,
web-nodes,websvr50,0,0,0,159,,29701313,19079729762,27071216769,,0,,0,108,0,0,UP,1,1,0,0,0,325114,0,,1,6,1,,29701313,,2,63,,208,L7OK,200,0,0,27904575,1666392,129426,812,0,0,,,,489,0,,,,,0,OK,,0,0,6,100,
web-nodes,websvr51,0,0,3,158,,29701313,19080587377,27065393861,,0,,0,144,0,0,UP,1,1,0,0,0,325114,0,,1,6,2,,29701313,,2,63,,208,L7OK,200,1,0,27903943,1667122,129355,746,0,0,,,,510,0,,,,,0,OK,,0,0,6,188,
web-nodes,websvr52,0,0,0,158,,29701312,19079546763,27067098521,,0,,0,115,0,0,UP,1,1,0,0,0,325114,0,,1,6,3,,29701312,,2,62,,209,L7OK,200,0,0,27906076,1665594,128778,749,0,0,,,,525,0,,,,,0,OK,,0,0,5,152,
web-nodes,websvr53,0,0,0,158,,29701312,19079499878,27071830653,,0,,0,121,0,0,UP,1,1,0,0,0,325114,0,,1,6,4,,29701312,,2,62,,209,L7OK,200,0,0,27906015,1665985,128532,659,0,0,,,,510,0,,,,,0,OK,,0,0,5,154,
web-nodes,websvr54,0,0,0,159,,29701312,19079753315,27070175936,,0,,0,111,0,0,UP,1,1,0,0,0,325114,0,,1,6,5,,29701312,,2,62,,209,L7OK,200,0,0,27903392,1668336,128724,749,0,0,,,,536,0,,,,,0,OK,,0,0,6,133,
web-nodes,websvr55,0,0,0,159,,29701312,19081196275,27069258858,,0,,0,159,0,0,UP,1,1,0,0,0,325114,0,,1,6,6,,29701312,,2,62,,209,L7OK,200,0,0,27905182,1666129,129152,690,0,0,,,,519,0,,,,,0,OK,,0,0,5,184,
web-nodes,websvr56,0,0,0,159,,29701312,19080359501,27070578815,,0,,0,186,0,0,UP,1,1,0,0,0,325114,0,,1,6,7,,29701312,,2,62,,209,L7OK,200,1,0,27906983,1664643,128765,735,0,0,,,,528,1,,,,,0,OK,,0,0,5,153,
web-nodes,websvr57,0,0,0,159,,29701312,19078879042,27067644611,,0,,0,165,0,0,UP,1,1,0,0,0,325114,0,,1,6,8,,29701312,,2,62,,209,L7OK,200,0,0,27907452,1663840,129111,744,0,0,,,,553,0,,,,,0,OK,,0,0,5,141,
web-nodes,websvr58,0,0,0,159,,29701312,19079242734,27068678181,,0,,0,189,0,0,UP,1,1,0,0,0,325114,0,,1,6,9,,29701312,,2,62,,209,L7OK,200,0,0,27906771,1664917,128714,721,0,0,,,,561,0,,,,,0,OK,,0,0,5,121,
web-nodes,websvr59,0,0,0,159,,29701312,19081163300,27069288822,,0,,0,169,0,0,UP,1,1,0,0,0,325114,0,,1,6,10,,29701312,,2,62,,209,L7OK,200,1,0,27905686,1665387,129340,730,0,0,,,,589,0,,,,,0,OK,,0,0,7,100,
web-nodes,BACKEND,0,0,3,2101,16384,297013397,190800125478,270691165027,0,0,,0,1467,0,0,UP,10,10,0,,0,325114,0,,1,6,0,,297013122,,1,626,,2088,,,,0,279056075,16658345,1289897,8801,276,,,,,5595,1,0,0,0,0,0,,,0,0,5,149,
web-nodes-keepalive,websvr50,0,0,1,765,,208269519,302973205930,36014045287,,0,,0,0,0,0,UP,1,1,0,0,0,325114,0,,1,7,1,,2349553,,2,707,,1634,L7OK,200,0,0,208263448,0,1735,988,0,0,,,,1168,0,,,,,0,OK,,0,0,2,411,
web-nodes-keepalive,websvr51,0,0,18,765,,206646889,300499516936,35854258624,,0,,0,0,0,0,UP,1,1,0,0,0,325114,0,,1,7,2,,2349553,,2,1047,,1742,L7OK,200,0,0,206640866,0,1648,999,0,0,,,,1229,0,,,,,0,OK,,0,0,5,394,
web-nodes-keepalive,websvr52,0,0,2,760,,209846832,305627194849,36275504358,,0,,0,0,0,0,UP,1,1,0,0,0,325114,0,,1,7,3,,2349553,,2,716,,1701,L7OK,200,0,0,209840749,0,1733,1009,0,0,,,,1208,0,,,,,0,OK,,0,0,2,397,
web-nodes-keepalive,websvr53,0,0,2,769,,209027148,303989521123,36170460329,,0,,0,0,0,0,UP,1,1,0,0,0,325114,0,,1,7,4,,2349553,,2,671,,1579,L7OK,200,0,0,209021088,0,1692,992,0,0,,,,1207,0,,,,,0,OK,,0,0,2,380,
web-nodes-keepalive,websvr54,0,0,2,761,,209704197,305574503404,36244787033,,0,,0,0,0,0,UP,1,1,0,0,0,325114,0,,1,7,5,,2349552,,2,913,,1932,L7OK,200,1,0,209698283,0,1603,1023,0,0,,,,1181,0,,,,,0,OK,,0,0,2,394,
web-nodes-keepalive,websvr55,0,0,0,765,,181935611,261990441649,32613877556,,0,,0,0,0,0,UP,1,1,0,0,0,325114,0,,1,7,6,,2349552,,2,620,,1482,L7OK,200,0,0,181929848,0,1423,957,0,0,,,,1246,0,,,,,0,OK,,0,0,2,395,
web-nodes-keepalive,websvr56,0,0,3,762,,176673560,254250148353,31886997462,,0,,0,0,0,0,UP,1,1,0,1,1,299506,3,,1,7,7,,2348981,,2,912,,1482,L7OK,200,0,0,176667902,0,1365,992,0,0,,,,1219,0,,,,,0,OK,,0,0,2,389,
web-nodes-keepalive,websvr57,0,0,0,760,,182422902,262729703430,32671860327,,0,,0,0,0,0,UP,1,1,0,0,0,325114,0,,1,7,8,,2349552,,2,671,,1459,L7OK,200,0,0,182417044,0,1392,936,0,0,,,,1222,0,,,,,0,OK,,0,0,2,378,
web-nodes-keepalive,websvr58,0,0,0,768,,177314674,255320996264,31920904185,,0,,0,0,0,0,UP,1,1,0,0,0,325114,0,,1,7,9,,2349552,,2,768,,1498,L7OK,200,0,0,177308904,0,1364,939,0,0,,,,1254,0,,,,,0,OK,,0,0,2,383,
web-nodes-keepalive,websvr59,0,0,2,759,,179344533,258138450093,32254581375,,0,,0,0,0,0,UP,1,1,0,0,0,325114,0,,1,7,10,,2349552,,2,597,,1467,L7OK,200,1,0,179338774,0,1384,990,0,0,,,,1279,0,,,,,0,OK,,0,0,2,377,
web-nodes-keepalive,BACKEND,0,0,30,7634,4096,1941185868,2811093687341,341907276536,0,0,,0,0,0,0,UP,10,10,0,,0,325114,0,,1,7,0,,23494953,,1,7627,,14770,,,,0,1941126906,0,16667,9825,1242,,,,,12216,0,0,0,0,0,0,,,0,0,3,389,
