Hello Christopher,

Thanks for the follow-up patch.

On Wed, Jun 20, 2018 at 04:42:58PM +0200, Christopher Faulet wrote:
> Hum, ok, forget the previous patch. Here is a second try. It solves the same
> bug using another way. In this patch, all threads must enter in the sync
> point to exit. I hope it will do the trick.

It seems better now, but the problem is not completely gone. In a way, I
think we now have a new issue. This morning, on one test machine, I have an
old process which keeps polling traffic instead of exiting:

root 745 3.1 1.8 1007252 918760 ? Ss Jun20 30:24 /usr/sbin/haproxy -Ws -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -sf 41623 39579 30845 -x /var/lib/haproxy/stats
haproxy 30845 4.0 1.9 1277604 949508 ? Ssl 07:05 3:31 \_ /usr/sbin/haproxy -Ws -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -sf 30836 30826 29600 -x /var/lib/haproxy/stats
haproxy 39579 35.0 1.9 1283460 951520 ? Ssl 08:30 0:43 \_ /usr/sbin/haproxy -Ws -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -sf 39568 39534 30845 39439 -x /var/lib/haproxy/stats
haproxy 41623 32.3 1.9 1285932 954988 ? Ssl 08:32 0:07 \_ /usr/sbin/haproxy -Ws -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -sf 39579 30845 -x /var/lib/haproxy/stats
haproxy 44987 58.0 1.9 1282780 950584 ? Ssl 08:32 0:00 \_ /usr/sbin/haproxy -Ws -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -sf 41623 39579 30845 -x /var/lib/haproxy/stats

Process 30845 looks weird.
kill -USR1 30845 does nothing. Its signal state and backtrace:

SigQ: 2/192448
SigPnd: 0000000000000000
SigBlk: 0000000000000800
SigIgn: 0000000000001800
SigCgt: 0000000180300205

#0 0x00007f85e6bee923 in epoll_wait () from /lib64/libc.so.6
#1 0x000055c21fd048f5 in _do_poll (p=<optimized out>, exp=<optimized out>) at src/ev_epoll.c:172
#2 0x000055c21fd8570b in run_poll_loop () at src/haproxy.c:2432
#3 run_thread_poll_loop (data=data@entry=0x55c2327d3390) at src/haproxy.c:2470
#4 0x000055c21fd01856 in main (argc=<optimized out>, argv=<optimized out>) at src/haproxy.c:3072

> From c01f5636a0cbe2be18573e455370c4a47f84d59e Mon Sep 17 00:00:00 2001
> From: Christopher Faulet <[email protected]>
> Date: Wed, 20 Jun 2018 16:22:03 +0200
> Subject: [PATCH] BUG/MEDIUM: threads: Use the sync point to check active jobs
> and exit
>
> When HAProxy is shutting down, it exits the polling loop when there is no jobs
> anymore (jobs == 0). When there is no thread, it works pretty well, but when
> HAProxy is started with several threads, a thread can decide to exit because
> jobs variable reached 0 while another one is processing a task (e.g. a
> health-check). At this stage, the running thread could decide to request a
> synchronization. But because at least one of them has already gone, the others
> will wait infinitly in the sync point and the process will never die.
>
> To fix the bug, when the first thread (and only this one) detects there is no
> active jobs anymore, it requests a synchronization. And in the sync point, all
> threads will check if jobs variable reached 0 to exit the polling loop.

It does indeed explain what was going on. With nbthread = 4, the process was
using 300% CPU, which is consistent with one thread being gone and the other
three spinning while they wait for the sync.

I guess it is good progress anyway.

Thanks!

--
William
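
For reference, the exit scheme described in the quoted commit message can be sketched roughly as below. This is a minimal illustration only, not HAProxy's code: a POSIX barrier stands in for the real sync point, the names take_job, poll_loop, run_simulation and sync_point are hypothetical, and it assumes the job counter only ever decreases. The point is that no thread leaves the loop on its own when it sees jobs == 0; the first thread to notice only requests the sync, and every thread exits from inside the sync point.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

#define MAX_THREADS 16              /* hypothetical limit for this sketch */

static atomic_int jobs;             /* active jobs shared by all threads */
static atomic_bool sync_requested;  /* set once by the first thread seeing jobs == 0 */
static atomic_int exited;           /* number of threads that left the loop */
static pthread_barrier_t sync_point; /* stand-in for the real sync point */

/* Try to take one job; returns false when none is left. */
static bool take_job(void)
{
    int j = atomic_load(&jobs);
    while (j > 0) {
        /* On failure, j is reloaded with the current value. */
        if (atomic_compare_exchange_weak(&jobs, &j, j - 1))
            return true;
    }
    return false;
}

static void *poll_loop(void *arg)
{
    (void)arg;
    for (;;) {
        take_job(); /* pretend to process one job, if any */

        if (atomic_load(&jobs) == 0) {
            /* Only the first thread to notice requests the sync. */
            bool expected = false;
            atomic_compare_exchange_strong(&sync_requested, &expected, true);
        }

        if (atomic_load(&sync_requested)) {
            /* Every thread must enter the sync point before anyone exits. */
            pthread_barrier_wait(&sync_point);
            /* Holds here because jobs never increase in this sketch. */
            assert(atomic_load(&jobs) == 0);
            break;
        }
    }
    atomic_fetch_add(&exited, 1);
    return NULL;
}

/* Runs nthreads workers over initial_jobs jobs; returns how many exited. */
int run_simulation(int nthreads, int initial_jobs)
{
    pthread_t tids[MAX_THREADS];

    if (nthreads < 1 || nthreads > MAX_THREADS || initial_jobs < 0)
        return -1;

    atomic_store(&jobs, initial_jobs);
    atomic_store(&sync_requested, false);
    atomic_store(&exited, 0);
    if (pthread_barrier_init(&sync_point, NULL, (unsigned)nthreads) != 0)
        return -1;

    for (int i = 0; i < nthreads; i++) {
        /* A missing thread would leave the others stuck at the barrier. */
        if (pthread_create(&tids[i], NULL, poll_loop, NULL) != 0)
            abort();
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);

    pthread_barrier_destroy(&sync_point);
    return atomic_load(&exited);
}
```

With this shape, run_simulation(4, 1000) returns only once all four threads have passed through the barrier. If one thread broke out of the loop before the barrier instead, the remaining three would wait there forever, which matches the stuck process described in the commit message.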

