On 2017/5/19 00:35, Willy Tarreau wrote:
> Hi Patrick,
>
> On Thu, May 18, 2017 at 05:44:30PM -0400, Patrick Hemmer wrote:
>> On 2017/1/17 17:02, Willy Tarreau wrote:
>>> Hi Patrick,
>>>
>>> On Tue, Jan 17, 2017 at 02:33:44AM +0000, Patrick Hemmer wrote:
>>>> So on one of my local development machines haproxy started pegging the
>>>> CPU at 100%
>>>> `strace -T` on the process just shows:
>>>>
>>>> ...
>>>> epoll_wait(0, {}, 200, 0)               = 0 <0.000003>
>>>> epoll_wait(0, {}, 200, 0)               = 0 <0.000003>
>>>> epoll_wait(0, {}, 200, 0)               = 0 <0.000003>
>>>> epoll_wait(0, {}, 200, 0)               = 0 <0.000003>
>>>> epoll_wait(0, {}, 200, 0)               = 0 <0.000003>
>>>> epoll_wait(0, {}, 200, 0)               = 0 <0.000003>
>>>> ...
>>> Hmm not good.
>>>
>>>> Opening it up with gdb, the backtrace shows:
>>>>
>>>> (gdb) bt
>>>> #0  0x00007f4d18ba82a3 in __epoll_wait_nocancel () from /lib64/libc.so.6
>>>> #1  0x00007f4d1a570ebc in _do_poll (p=<optimized out>, exp=-1440976915)
>>>> at src/ev_epoll.c:125
>>>> #2  0x00007f4d1a4d3098 in run_poll_loop () at src/haproxy.c:1737
>>>> #3  0x00007f4d1a4cf2c0 in main (argc=<optimized out>, argv=<optimized
>>>> out>) at src/haproxy.c:2097
>>> Ok so an event is not being processed correctly.
>>>
>>>> This is haproxy 1.7.0 on CentOS/7
>>> Ah, that could be a clue. We've had 2 or 3 very ugly bugs in 1.7.0
>>> and 1.7.1. One of them is responsible for the few outages on haproxy.org
>>> (last one happened today, I left it running to get the core to confirm).
>>> One of them is an issue with the condition to wake up an applet when it
>>> failed to get a buffer first and it could be what you're seeing. The
>>> other ones could possibly cause some memory corruption resulting in
>>> anything.
>>>
>>> Thus I'd strongly urge you to update this one to 1.7.2 (which I'm going
>>> to do on haproxy.org now that I could get a core). Continue to monitor
>>> it but I'd feel much safer after this update.
>>>
>>> Thanks for your report!
>>> Willy
>>>
>> So I just had this issue recur, this time on version 1.7.2.
> OK. If it's still doing it, capturing the output of "show sess all" on
> the CLI could help a lot.
>
> I've looked at the changelog since 1.7.2 and other bugs have since been
> fixed possibly responsible for this :
>   - c691781 ("BUG/MEDIUM: stream: fix client-fin/server-fin handling")
>     => fixes a cause of 100% CPU when timeout client-fin/server-fin are
>        used
>   - 57393fb ("BUG/MEDIUM: buffers: Fix how input/output data are injected 
> into buffe
>     => fixes the computation of free buffer space, it's unknown if we've
>        ever hit this bug. It could be possible that it causes some applets
>        like stats to fail to write and to be called again immediately for
>        example.
>   - 57393fb ("BUG/MEDIUM: buffers: Fix how input/output data are injected 
> into buffe
>     => some filters might be woken up to do nothing. Compression might trigger
>        this.
>
>   - there are a bunch of polling-related fixes which might have got rid of
>     such a bad situation on certain connect() cases, possibly over unix
>     sockets.
>
> There are a few other fixes in the queue that I need to backport but none
> of them is related to this. Thus if you're in emergency, 1.7.5 could help
> by bringing the fixes above. If you can wait a few more days, I expect to
> issue 1.7.6 early next week.
>
> Cheers,
> Willy

Ok, I'm on 1.7.5 and the issue has recurred again.

Here is `show sess all`:

> show sess all
0x7fe603090550: [01/Jun/2017:10:36:10.616812] id=187 proto=unix_stream
source=unix:1
  flags=0x88, conn_retries=0, srv_conn=(nil), pend_pos=(nil)
  frontend=GLOBAL (id=0 mode=tcp), listener=? (id=1) addr=unix:1
  backend=<NONE> (id=-1 mode=-)
  server=<NONE> (id=-1)
  task=0x7fe6030856d0 (state=0x20 nice=-64 calls=2 exp=44s age=13s)
  si[0]=0x7fe603090788 (state=EST flags=0x0c endp0=CONN:0x7fe603090920
exp=<NEVER>, et=0x000)
  si[1]=0x7fe6030907b0 (state=EST flags=0x4018
endp1=APPCTX:0x7fe60309e790 exp=<NEVER>, et=0x000)
  co0=0x7fe603090920 ctrl=unix_stream xprt=RAW data=STRM
target=LISTENER:0x7fe602e50310
      flags=0x0020b300 fd=4 fd.state=24 fd.cache=0 updt=1
  app1=0x7fe60309e790 st0=7 st1=1 st2=3 applet=<CLI>
  req=0x7fe603090560 (f=0xc08202 an=0x0 pipe=0 tofwd=-1 total=40)
      an_exp=<NEVER> rex=<NEVER> wex=<NEVER>
      buf=0x7fe603095b40 data=0x7fe603095b54 o=0 p=0 req.next=0 i=0
size=16384
  res=0x7fe6030905a0 (f=0x80048000 an=0x0 pipe=0 tofwd=-1 total=6)
      an_exp=<NEVER> rex=<NEVER> wex=<NEVER>
      buf=0x7fe602f44de0 data=0x7fe602f44df4 o=0 p=0 rsp.next=0 i=0
size=16384
0x7fe603090c80: [01/Jun/2017:10:36:24.028362] id=0 proto=tcpv4
  flags=0x6, conn_retries=0, srv_conn=(nil), pend_pos=(nil)
  frontend=lb1 (id=4294967295 mode=tcp), listener=? (id=0)
  backend=<NONE> (id=-1 mode=-) addr=172.28.128.16:58246
  server=<NONE> (id=-1) addr=172.28.128.20:1029
  task=0x7fe603090c00 (state=0x08 nice=0 calls=2 exp=3s age=?)
  si[0]=0x7fe603090eb8 (state=EST flags=0x4048
endp0=APPCTX:0x7fe603091050 exp=<NEVER>, et=0x000)
  si[1]=0x7fe603090ee0 (state=EST flags=0x58 endp1=CONN:0x7fe603099b60
exp=<NEVER>, et=0x000)
  app0=0x7fe603091050 st0=7 st1=0 st2=0 applet=<PEER>
  co1=0x7fe603099b60 ctrl=tcpv4 xprt=RAW data=STRM
target=PROXY:0x7fe602e52050
      flags=0x0020b306 fd=1 fd.state=25 fd.cache=0 updt=0
  req=0x7fe603090c90 (f=0x848000 an=0x0 pipe=0 tofwd=-1 total=28)
      an_exp=<NEVER> rex=3s wex=<NEVER>
      buf=0x7fe602e016b0 data=0x7fe602e016c4 o=0 p=0 req.next=0 i=0 size=0
  res=0x7fe603090cd0 (f=0x80408202 an=0x0 pipe=0 tofwd=-1 total=4)
      an_exp=<NEVER> rex=<NEVER> wex=<NEVER>
      buf=0x7fe6030911c0 data=0x7fe6030911d4 o=0 p=0 rsp.next=0 i=0
size=16384
0x7fe603099d50: [01/Jun/2017:10:36:25.050203] id=205 proto=tcpv4
source=172.28.128.18:46932
  flags=0x80, conn_retries=3, srv_conn=(nil), pend_pos=(nil)
  frontend=lb1 (id=4294967295 mode=tcp), listener=? (id=0)
addr=172.28.128.16:1029
  backend=<NONE> (id=-1 mode=-)
  server=<NONE> (id=-1)
  task=0x7fe603099cd0 (state=0x0a nice=0 calls=1 exp=4s age=?)
  si[0]=0x7fe603099f88 (state=EST flags=0x48 endp0=CONN:0x7fe6030854d0
exp=<NEVER>, et=0x000)
  si[1]=0x7fe603099fb0 (state=EST flags=0x4058
endp1=APPCTX:0x7fe60309a120 exp=<NEVER>, et=0x000)
  co0=0x7fe6030854d0 ctrl=tcpv4 xprt=RAW data=STRM
target=LISTENER:0x7fe602e53a90
      flags=0x0020b306 fd=2 fd.state=25 fd.cache=0 updt=0
  app1=0x7fe60309a120 st0=7 st1=0 st2=0 applet=<PEER>
  req=0x7fe603099d60 (f=0xc08200 an=0x0 pipe=0 tofwd=-1 total=28)
      an_exp=<NEVER> rex=4s wex=<NEVER>
      buf=0x7fe60309a290 data=0x7fe60309a2a4 o=0 p=0 req.next=0 i=0
size=16384
  res=0x7fe603099da0 (f=0x80048202 an=0x0 pipe=0 tofwd=-1 total=4)
      an_exp=<NEVER> rex=<NEVER> wex=<NEVER>
      buf=0x7fe602e016b0 data=0x7fe602e016c4 o=0 p=0 rsp.next=0 i=0 size=0


-Patrick

Reply via email to