On Tue, Aug 21, 2018 at 04:09:55PM +0200, Cyril Bonté wrote:
> Hi Willy,
> Here is another issue seen today with the current dev branch [tests were also
> made after pulling recent commit 3bcc2699b].
>
> Since 0c026f49e, when a server status is set to DOWN and option redispatch is
> enabled, the haproxy process hits 100% CPU.
> Even more, with the latest commits, if haproxy is compiled with DEBUG_FULL,
> it will simply segfault.
>
> Here is the minimal configuration for the test:
>   listen crash
>     bind :9000
>     option redispatch
>     server non-existent 127.0.0.1:9999 check

OK so this one is related to the first part of the problem that I spotted,
which is that pendconn_redistribute() takes the server lock, which is already
held when entering srv_update_status(). I'm currently studying the other
similar corner cases but it seems for now that it's the only one trying to
take the lock from the callees we have there, so I'll add an unlocked version.

However I'm more concerned by the calls to lb.set_server_{up,down} that
definitely do not expect to be called concurrently. It looks like at least
for the roundrobin algo it supports a lock that we should use there, but I
have to study the other ones as well.

The rendez-vous point was a much bigger carpet than I imagined it seems...

Willy
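
The locking problem described above (a callee trying to take a lock that its caller already holds) and the proposed fix (an unlocked variant for that call path) can be sketched roughly as follows. This is only an illustration under assumptions: the demo_* names and the struct are hypothetical, a plain pthread mutex stands in for the real server lock, and none of this is the actual haproxy code.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for a server that owns a lock and a pending queue. */
struct demo_server {
    pthread_mutex_t lock;
    int pending;        /* connections waiting in the queue */
    int redistributed;  /* connections moved elsewhere so far */
};

static void demo_server_init(struct demo_server *srv, int pending)
{
    pthread_mutex_init(&srv->lock, NULL);
    srv->pending = pending;
    srv->redistributed = 0;
}

/* Unlocked variant: the caller MUST already hold srv->lock. */
static int demo_redistribute_unlocked(struct demo_server *srv)
{
    int moved = srv->pending;

    srv->redistributed += moved;
    srv->pending = 0;
    return moved;
}

/* Locked wrapper, for callers that do not hold srv->lock yet. */
static int demo_redistribute(struct demo_server *srv)
{
    int moved;

    pthread_mutex_lock(&srv->lock);
    moved = demo_redistribute_unlocked(srv);
    pthread_mutex_unlock(&srv->lock);
    return moved;
}

/* Status-update path: entered with srv->lock held, so it has to use the
 * unlocked variant. Calling demo_redistribute() here would try to take
 * the same lock a second time and never make progress. */
static int demo_set_down(struct demo_server *srv)
{
    return demo_redistribute_unlocked(srv);
}

/* Probe: returns 0 if the lock was free, EBUSY if it is currently held. */
static int demo_lock_is_busy(struct demo_server *srv)
{
    int rc = pthread_mutex_trylock(&srv->lock);

    if (rc == 0)
        pthread_mutex_unlock(&srv->lock);
    return rc;
}
```

A caller would take srv->lock, call demo_set_down(), then release the lock. While the lock is held, demo_lock_is_busy() reports EBUSY, which is the condition a second blocking (or spinning) lock attempt from the same call chain would wait on forever.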