Hi guys,

On Tue, Apr 19, 2016 at 02:54:35PM +0200, Lukas Tribus wrote:
> >We use haproxy 1.6.3 (latest CentOS 6.7) and experience a similar
> >situation after some reloads (-sf). The old haproxy process does not
> >exit and uses 100% CPU; strace shows:
> >epoll_wait(0, {}, 200, 0)               = 0
> >epoll_wait(0, {}, 200, 0)               = 0
> >epoll_wait(0, {}, 200, 0)               = 0
> >epoll_wait(0, {}, 200, 0)               = 0
> >epoll_wait(0, {}, 200, 0)               = 0
> >epoll_wait(0, {}, 200, 0)               = 0
> >
> >In our case, it was a TCP backend tunnelling rsyslog messages. After
> >restarting the local rsyslogd, the load was gone and the old haproxy
> >instance exited. It's hard to tell how many reloads it takes to trigger
> >this, or what an exact reproducible test would be, but it does not take
> >hundreds of reloads; 10-20 are enough (our reloads are not very
> >frequent) to make haproxy go crazy.
> 
> Also matches this report from December:
> https://www.mail-archive.com/haproxy@formilux.org/msg20772.html

Yep, very likely. The combination of the two reports is very intriguing.
The first one shows the signals being blocked, while the only place where
we block them is in __signal_process_queue(), and only while calling the
handlers or performing the wakeup() calls; both of those should be
instantaneous, and more importantly the function cannot return without
unblocking the signals.
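
For those not familiar with that part of the code, the pattern in question
is roughly the one below. This is a simplified sketch with illustrative
names (signal_queue, queue_len, handlers), not the actual source: all
signals are blocked around the draining of the queue, each queued handler
is called, and the original mask is restored before the function returns:

  #include <signal.h>

  /* illustrative sketch only, not the real haproxy code */
  #define MAX_SIG 64
  static int signal_queue[256];
  static int queue_len;
  static void (*handlers[MAX_SIG + 1])(int);

  static void process_signal_queue(void)
  {
          sigset_t all, old;

          /* block everything while the queue is being drained */
          sigfillset(&all);
          sigprocmask(SIG_SETMASK, &all, &old);

          while (queue_len) {
                  int sig = signal_queue[--queue_len];

                  /* the handler is expected to return immediately,
                   * e.g. it just wakes a task up
                   */
                  if (sig > 0 && sig <= MAX_SIG && handlers[sig])
                          handlers[sig](sig);
          }

          /* the original mask is always restored here, so signals
           * cannot remain blocked once this function returns
           */
          sigprocmask(SIG_SETMASK, &old, NULL);
  }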

I still have no idea what is going on; the code looks simple and clear,
and is certainly not compatible with such behaviours. I'm still digging.
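
By the way, a note on reading the strace output quoted above for those
trying to reproduce this: the last argument to epoll_wait() is the
timeout, and 0 means "return immediately even if nothing is ready". So
the old process is not stuck inside epoll_wait(); it keeps deciding that
it must not sleep and re-polls in a tight loop. A trivial stand-alone
loop like the one below (nothing to do with haproxy's poller, just an
illustration) produces exactly the same strace lines and the same 100%
CPU:

  #include <sys/epoll.h>

  /* stand-alone illustration of the symptom, not haproxy code:
   * polling an empty epoll set with a timeout of 0 returns 0
   * immediately, so this loop spins at 100% CPU and strace shows
   * the same endless "epoll_wait(..., 200, 0) = 0" lines
   */
  int main(void)
  {
          struct epoll_event ev[200];
          int epfd = epoll_create1(0);

          for (;;)
                  epoll_wait(epfd, ev, 200, 0);
  }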

Willy

