Re: haproxy-1.9-dev [0c026f49e]: 100% CPU when a server goes DOWN with option redispatch

Willy Tarreau Tue, 21 Aug 2018 07:53:08 -0700

On Tue, Aug 21, 2018 at 04:09:55PM +0200, Cyril Bonté wrote:
> Hi Willy,
> Here is another issue seen today with the current dev branch [tests were also 
> made after pulling recent commit 3bcc2699b].
> 
> Since 0c026f49e, when a server status is set to DOWN and option redispatch is 
> enabled, the haproxy process hits 100% CPU.
> Worse, with the latest commits, if haproxy is compiled with DEBUG_FULL,
> it simply segfaults.
> 
> Here is the minimal configuration for the test:
> listen crash
>     bind :9000
>     option redispatch
>     server non-existent 127.0.0.1:9999 check


OK, so this one is related to the first part of the problem I spotted:
pendconn_redistribute() takes the server lock, which is already held when
entering srv_update_status(). I'm currently studying the other similar
corner cases, but for now it seems to be the only callee there that tries
to take the lock, so I'll add an unlocked version.
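A minimal sketch of the "unlocked version" pattern described above, assuming the lock involved is a non-recursive spinlock. Every name here (struct srv_sketch, redistribute(), redistribute_unlocked(), update_status_down()) is hypothetical and is not HAProxy's actual code. It only shows why a function that re-takes a lock its caller already holds spins forever, which would be consistent with the 100% CPU symptom, and how splitting it into a locked wrapper plus an unlocked variant avoids that.

```c
#include <stdatomic.h>

/* Hypothetical, simplified stand-in for a server: not HAProxy's struct server. */
struct srv_sketch {
    atomic_flag lock;       /* non-recursive spinlock protecting the fields below */
    int nb_pending;         /* connections queued on this server */
    int nb_redispatched;    /* connections handed back to the backend */
};

static void srv_lock(struct srv_sketch *s)
{
    /* Spins until the flag was previously clear. If the calling thread
     * already holds it, this loops forever at 100% CPU. */
    while (atomic_flag_test_and_set_explicit(&s->lock, memory_order_acquire))
        ;
}

static void srv_unlock(struct srv_sketch *s)
{
    atomic_flag_clear_explicit(&s->lock, memory_order_release);
}

/* Unlocked variant: the caller MUST already hold s->lock. */
static int redistribute_unlocked(struct srv_sketch *s)
{
    int moved = s->nb_pending;
    s->nb_redispatched += moved;
    s->nb_pending = 0;
    return moved;
}

/* Locked wrapper for callers that do not hold the lock yet. */
static int redistribute(struct srv_sketch *s)
{
    srv_lock(s);
    int moved = redistribute_unlocked(s);
    srv_unlock(s);
    return moved;
}

/* Status-update path: it takes the lock itself, so it must call the
 * unlocked variant. Calling redistribute() here would self-deadlock. */
static int update_status_down(struct srv_sketch *s)
{
    srv_lock(s);
    /* ... other state changes done under the lock ... */
    int moved = redistribute_unlocked(s);
    srv_unlock(s);
    return moved;
}
```

The convention in this sketch is that the plain name takes the lock and the `_unlocked` suffix documents that the caller owns it, so each call site states its locking assumption explicitly.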

However, I'm more concerned about the calls to lb.set_server_{up,down},
which definitely do not expect to be called concurrently. It looks like at
least the roundrobin algorithm supports a lock that we should use there,
but I have to study the other ones as well.
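One way to picture serializing those callbacks is to have every caller take a per-backend load-balancing lock around the algorithm's set_server_up/set_server_down hooks. This is only a sketch under that assumption: struct lb_params, struct lb_server, lb_server_up(), lb_server_down() and the rr_* callbacks are hypothetical simplifications, not HAProxy's real lbprm structures or its roundrobin code.

```c
#include <stdatomic.h>

/* Hypothetical, simplified stand-ins: not HAProxy's real lbprm/server types. */
struct lb_server {
    int in_tree;                 /* 1 if the server is currently eligible */
};

struct lb_params {
    atomic_flag lock;            /* one lock per backend's LB state */
    int nb_active;               /* number of eligible servers */
    /* Algorithm callbacks: they assume lb->lock is held by the caller. */
    void (*set_server_up)(struct lb_params *lb, struct lb_server *s);
    void (*set_server_down)(struct lb_params *lb, struct lb_server *s);
};

static void lb_lock(struct lb_params *lb)
{
    while (atomic_flag_test_and_set_explicit(&lb->lock, memory_order_acquire))
        ; /* spin until the lock is free */
}

static void lb_unlock(struct lb_params *lb)
{
    atomic_flag_clear_explicit(&lb->lock, memory_order_release);
}

/* Hypothetical round-robin-like callbacks: they update shared state
 * without locking, so two threads running them at once could corrupt
 * nb_active. */
static void rr_set_server_up(struct lb_params *lb, struct lb_server *s)
{
    if (!s->in_tree) {
        s->in_tree = 1;
        lb->nb_active++;
    }
}

static void rr_set_server_down(struct lb_params *lb, struct lb_server *s)
{
    if (s->in_tree) {
        s->in_tree = 0;
        lb->nb_active--;
    }
}

/* Callers serialize every callback invocation through the LB lock. */
static void lb_server_up(struct lb_params *lb, struct lb_server *s)
{
    lb_lock(lb);
    lb->set_server_up(lb, s);
    lb_unlock(lb);
}

static void lb_server_down(struct lb_params *lb, struct lb_server *s)
{
    lb_lock(lb);
    lb->set_server_down(lb, s);
    lb_unlock(lb);
}
```

Putting the lock in the caller keeps each algorithm's callbacks simple, at the cost of serializing all up/down transitions for a backend; an algorithm with finer-grained internal locking would need a different split.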

It seems the rendez-vous point was hiding much more under the carpet than I imagined...

Willy
