Hello Christopher,

Thanks for the follow-up patch.

On Wed, Jun 20, 2018 at 04:42:58PM +0200, Christopher Faulet wrote:
> Hum, ok, forget the previous patch. Here is a second try. It solves the same
> bug using another way. In this patch, all threads must enter in the sync
> point to exit. I hope it will do the trick.

It seems better now, but the problem is not completely gone; in a way, I
think we now have a new issue.
This morning, on one test machine, I have a process which is still polling
for traffic:

root       745  3.1  1.8 1007252 918760 ?      Ss   Jun20  30:24 /usr/sbin/haproxy -Ws -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -sf 41623 39579 30845 -x /var/lib/haproxy/stats
haproxy  30845  4.0  1.9 1277604 949508 ?      Ssl  07:05   3:31  \_ /usr/sbin/haproxy -Ws -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -sf 30836 30826 29600 -x /var/lib/haproxy/stats
haproxy  39579 35.0  1.9 1283460 951520 ?      Ssl  08:30   0:43  \_ /usr/sbin/haproxy -Ws -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -sf 39568 39534 30845 39439 -x /var/lib/haproxy/stats
haproxy  41623 32.3  1.9 1285932 954988 ?      Ssl  08:32   0:07  \_ /usr/sbin/haproxy -Ws -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -sf 39579 30845 -x /var/lib/haproxy/stats
haproxy  44987 58.0  1.9 1282780 950584 ?      Ssl  08:32   0:00  \_ /usr/sbin/haproxy -Ws -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -sf 41623 39579 30845 -x /var/lib/haproxy/stats

Process 30845 looks weird: kill -USR1 30845 does nothing.
Its signal state and backtrace:

SigQ:   2/192448
SigPnd: 0000000000000000
SigBlk: 0000000000000800
SigIgn: 0000000000001800
SigCgt: 0000000180300205
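
For anyone who wants to read those masks without counting bits by hand, here
is a small sketch. It assumes Linux x86-64 signal numbering (bit n-1 of the
mask stands for signal n); the helper names and the short name table are made
up for illustration and only cover the signals that appear here.

```python
# Assumption: Linux x86-64 signal numbers. Only the signals seen in the
# masks above are named; anything else is printed as a bare number.
LINUX_X86_64_NAMES = {
    1: "SIGHUP", 3: "SIGQUIT", 10: "SIGUSR1", 12: "SIGUSR2",
    13: "SIGPIPE", 21: "SIGTTIN", 22: "SIGTTOU",
}


def decode_sigmask(hex_mask):
    """Return the signal numbers set in a hex mask (bit n-1 = signal n)."""
    mask = int(hex_mask, 16)
    return [n for n in range(1, 65) if mask & (1 << (n - 1))]


def describe(hex_mask):
    """Same as decode_sigmask, but with names where the table has one."""
    return [LINUX_X86_64_NAMES.get(n, str(n)) for n in decode_sigmask(hex_mask)]


# Values copied from the status output of process 30845.
for field, value in [
    ("SigBlk", "0000000000000800"),
    ("SigIgn", "0000000000001800"),
    ("SigCgt", "0000000180300205"),
]:
    print(field, describe(value))
```

Read that way, SIGUSR1 (10) is in SigCgt and in neither SigBlk nor SigIgn, so
a handler is installed and the signal is not masked.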

#0  0x00007f85e6bee923 in epoll_wait () from /lib64/libc.so.6
#1  0x000055c21fd048f5 in _do_poll (p=<optimized out>, exp=<optimized out>) at src/ev_epoll.c:172
#2  0x000055c21fd8570b in run_poll_loop () at src/haproxy.c:2432
#3  run_thread_poll_loop (data=data@entry=0x55c2327d3390) at src/haproxy.c:2470
#4  0x000055c21fd01856 in main (argc=<optimized out>, argv=<optimized out>) at src/haproxy.c:3072


> From c01f5636a0cbe2be18573e455370c4a47f84d59e Mon Sep 17 00:00:00 2001
> From: Christopher Faulet <cfau...@haproxy.com>
> Date: Wed, 20 Jun 2018 16:22:03 +0200
> Subject: [PATCH] BUG/MEDIUM: threads: Use the sync point to check active jobs
>  and exit
>
> When HAProxy is shutting down, it exits the polling loop when there is no jobs
> anymore (jobs == 0). When there is no thread, it works pretty well, but when
> HAProxy is started with several threads, a thread can decide to exit because
> jobs variable reached 0 while another one is processing a task (e.g. a
> health-check). At this stage, the running thread could decide to request a
> synchronization. But because at least one of them has already gone, the others
> will wait infinitly in the sync point and the process will never die.
>
> To fix the bug, when the first thread (and only this one) detects there is no
> active jobs anymore, it requests a synchronization. And in the sync point, all
> threads will check if jobs variable reached 0 to exit the polling loop.

It does indeed explain what was going on.
With nbthread = 4, the process was using 300% CPU, which fits: one thread
is gone and the other three are waiting for the sync.
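
To make the failure mode concrete, here is a toy model of it, not HAProxy
code. It assumes the sync point behaves like a barrier sized for all threads;
the function name is hypothetical, and a timeout stands in for "waits
forever". Python's barrier blocks instead of spinning, whereas the 300% CPU
suggests the real waiters spin.

```python
import threading

NBTHREAD = 4


def run_sync_point(early_exit, timeout=0.5):
    """Return the ids of threads left stuck at the sync point.

    early_exit=True models the bug: thread 0 sees jobs == 0 and leaves
    without entering the sync point. early_exit=False models the fix:
    every thread enters the sync point before exiting.
    """
    barrier = threading.Barrier(NBTHREAD)
    stuck = []
    lock = threading.Lock()

    def worker(tid):
        if early_exit and tid == 0:
            return  # exits the polling loop on its own
        try:
            barrier.wait(timeout=timeout)
        except threading.BrokenBarrierError:
            # The barrier never filled up: this thread would wait forever.
            with lock:
                stuck.append(tid)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(NBTHREAD)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(stuck)


print("one thread left early:", run_sync_point(early_exit=True))
print("all threads sync:     ", run_sync_point(early_exit=False))
```

With one thread gone, the three others never get released, which matches
three busy threads out of four.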

I guess it is good progress anyway.
Thanks!

-- 
William
