On 2017/5/19 00:35, Willy Tarreau wrote: > Hi Patrick, > > On Thu, May 18, 2017 at 05:44:30PM -0400, Patrick Hemmer wrote: >> On 2017/1/17 17:02, Willy Tarreau wrote: >>> Hi Patrick, >>> >>> On Tue, Jan 17, 2017 at 02:33:44AM +0000, Patrick Hemmer wrote: >>>> So on one of my local development machines haproxy started pegging the >>>> CPU at 100% >>>> `strace -T` on the process just shows: >>>> >>>> ... >>>> epoll_wait(0, {}, 200, 0) = 0 <0.000003> >>>> epoll_wait(0, {}, 200, 0) = 0 <0.000003> >>>> epoll_wait(0, {}, 200, 0) = 0 <0.000003> >>>> epoll_wait(0, {}, 200, 0) = 0 <0.000003> >>>> epoll_wait(0, {}, 200, 0) = 0 <0.000003> >>>> epoll_wait(0, {}, 200, 0) = 0 <0.000003> >>>> ... >>> Hmm not good. >>> >>>> Opening it up with gdb, the backtrace shows: >>>> >>>> (gdb) bt >>>> #0 0x00007f4d18ba82a3 in __epoll_wait_nocancel () from /lib64/libc.so.6 >>>> #1 0x00007f4d1a570ebc in _do_poll (p=<optimized out>, exp=-1440976915) >>>> at src/ev_epoll.c:125 >>>> #2 0x00007f4d1a4d3098 in run_poll_loop () at src/haproxy.c:1737 >>>> #3 0x00007f4d1a4cf2c0 in main (argc=<optimized out>, argv=<optimized >>>> out>) at src/haproxy.c:2097 >>> Ok so an event is not being processed correctly. >>> >>>> This is haproxy 1.7.0 on CentOS/7 >>> Ah, that could be a clue. We've had 2 or 3 very ugly bugs in 1.7.0 >>> and 1.7.1. One of them is responsible for the few outages on haproxy.org >>> (last one happened today, I left it running to get the core to confirm). >>> One of them is an issue with the condition to wake up an applet when it >>> failed to get a buffer first and it could be what you're seeing. The >>> other ones could possibly cause some memory corruption resulting in >>> anything. >>> >>> Thus I'd strongly urge you to update this one to 1.7.2 (which I'm going >>> to do on haproxy.org now that I could get a core). Continue to monitor >>> it but I'd feel much safer after this update. >>> >>> Thanks for your report! >>> Willy >>> >> So I just had this issue recur, this time on version 1.7.2. > OK. If it's still doing it, capturing the output of "show sess all" on > the CLI could help a lot. > > I've looked at the changelog since 1.7.2 and other bugs have since been > fixed possibly responsible for this : > - c691781 ("BUG/MEDIUM: stream: fix client-fin/server-fin handling") > => fixes a cause of 100% CPU when timeout client-fin/server-fin are > used > - 57393fb ("BUG/MEDIUM: buffers: Fix how input/output data are injected > into buffe > => fixes the computation of free buffer space, it's unknown if we've > ever hit this bug. It could be possible that it causes some applets > like stats to fail to write and to be called again immediately for > example. > - 57393fb ("BUG/MEDIUM: buffers: Fix how input/output data are injected > into buffe > => some filters might be woken up to do nothing. Compression might trigger > this. > > - there are a bunch of polling-related fixes which might have got rid of > such a bad situation on certain connect() cases, possibly over unix > sockets. > > There are a few other fixes in the queue that I need to backport but none > of them is related to this. Thus if you're in emergency, 1.7.5 could help > by bringing the fixes above. If you can wait a few more days, I expect to > issue 1.7.6 early next week. > > Cheers, > Willy
Ok, I'm on 1.7.5 and the issue has recurred again. Here is `show sess all`: > show sess all 0x7fe603090550: [01/Jun/2017:10:36:10.616812] id=187 proto=unix_stream source=unix:1 flags=0x88, conn_retries=0, srv_conn=(nil), pend_pos=(nil) frontend=GLOBAL (id=0 mode=tcp), listener=? (id=1) addr=unix:1 backend=<NONE> (id=-1 mode=-) server=<NONE> (id=-1) task=0x7fe6030856d0 (state=0x20 nice=-64 calls=2 exp=44s age=13s) si[0]=0x7fe603090788 (state=EST flags=0x0c endp0=CONN:0x7fe603090920 exp=<NEVER>, et=0x000) si[1]=0x7fe6030907b0 (state=EST flags=0x4018 endp1=APPCTX:0x7fe60309e790 exp=<NEVER>, et=0x000) co0=0x7fe603090920 ctrl=unix_stream xprt=RAW data=STRM target=LISTENER:0x7fe602e50310 flags=0x0020b300 fd=4 fd.state=24 fd.cache=0 updt=1 app1=0x7fe60309e790 st0=7 st1=1 st2=3 applet=<CLI> req=0x7fe603090560 (f=0xc08202 an=0x0 pipe=0 tofwd=-1 total=40) an_exp=<NEVER> rex=<NEVER> wex=<NEVER> buf=0x7fe603095b40 data=0x7fe603095b54 o=0 p=0 req.next=0 i=0 size=16384 res=0x7fe6030905a0 (f=0x80048000 an=0x0 pipe=0 tofwd=-1 total=6) an_exp=<NEVER> rex=<NEVER> wex=<NEVER> buf=0x7fe602f44de0 data=0x7fe602f44df4 o=0 p=0 rsp.next=0 i=0 size=16384 0x7fe603090c80: [01/Jun/2017:10:36:24.028362] id=0 proto=tcpv4 flags=0x6, conn_retries=0, srv_conn=(nil), pend_pos=(nil) frontend=lb1 (id=4294967295 mode=tcp), listener=? (id=0) backend=<NONE> (id=-1 mode=-) addr=172.28.128.16:58246 server=<NONE> (id=-1) addr=172.28.128.20:1029 task=0x7fe603090c00 (state=0x08 nice=0 calls=2 exp=3s age=?) si[0]=0x7fe603090eb8 (state=EST flags=0x4048 endp0=APPCTX:0x7fe603091050 exp=<NEVER>, et=0x000) si[1]=0x7fe603090ee0 (state=EST flags=0x58 endp1=CONN:0x7fe603099b60 exp=<NEVER>, et=0x000) app0=0x7fe603091050 st0=7 st1=0 st2=0 applet=<PEER> co1=0x7fe603099b60 ctrl=tcpv4 xprt=RAW data=STRM target=PROXY:0x7fe602e52050 flags=0x0020b306 fd=1 fd.state=25 fd.cache=0 updt=0 req=0x7fe603090c90 (f=0x848000 an=0x0 pipe=0 tofwd=-1 total=28) an_exp=<NEVER> rex=3s wex=<NEVER> buf=0x7fe602e016b0 data=0x7fe602e016c4 o=0 p=0 req.next=0 i=0 size=0 res=0x7fe603090cd0 (f=0x80408202 an=0x0 pipe=0 tofwd=-1 total=4) an_exp=<NEVER> rex=<NEVER> wex=<NEVER> buf=0x7fe6030911c0 data=0x7fe6030911d4 o=0 p=0 rsp.next=0 i=0 size=16384 0x7fe603099d50: [01/Jun/2017:10:36:25.050203] id=205 proto=tcpv4 source=172.28.128.18:46932 flags=0x80, conn_retries=3, srv_conn=(nil), pend_pos=(nil) frontend=lb1 (id=4294967295 mode=tcp), listener=? (id=0) addr=172.28.128.16:1029 backend=<NONE> (id=-1 mode=-) server=<NONE> (id=-1) task=0x7fe603099cd0 (state=0x0a nice=0 calls=1 exp=4s age=?) si[0]=0x7fe603099f88 (state=EST flags=0x48 endp0=CONN:0x7fe6030854d0 exp=<NEVER>, et=0x000) si[1]=0x7fe603099fb0 (state=EST flags=0x4058 endp1=APPCTX:0x7fe60309a120 exp=<NEVER>, et=0x000) co0=0x7fe6030854d0 ctrl=tcpv4 xprt=RAW data=STRM target=LISTENER:0x7fe602e53a90 flags=0x0020b306 fd=2 fd.state=25 fd.cache=0 updt=0 app1=0x7fe60309a120 st0=7 st1=0 st2=0 applet=<PEER> req=0x7fe603099d60 (f=0xc08200 an=0x0 pipe=0 tofwd=-1 total=28) an_exp=<NEVER> rex=4s wex=<NEVER> buf=0x7fe60309a290 data=0x7fe60309a2a4 o=0 p=0 req.next=0 i=0 size=16384 res=0x7fe603099da0 (f=0x80048202 an=0x0 pipe=0 tofwd=-1 total=4) an_exp=<NEVER> rex=<NEVER> wex=<NEVER> buf=0x7fe602e016b0 data=0x7fe602e016c4 o=0 p=0 rsp.next=0 i=0 size=0 -Patrick