On 19/06/2018 at 16:42, William Dauchy wrote:
On Tue, Jun 19, 2018 at 4:30 PM William Lallemand
<[email protected]> wrote:
That's interesting, we can suppose that this bug is not related anymore to the
signal problem we had previously.
Looks like it's blocking in the thread sync point.
Are you able to get a backtrace with gdb? That could help a lot.
Yes, sorry, I forgot to paste it.
#0 0x000056269318f0fb in thread_sync_barrier (barrier=0x5626933fa528 <barrier.27104>) at src/hathreads.c:112
#1 thread_enter_sync () at src/hathreads.c:125
#2 0x00005626931377a2 in sync_poll_loop () at src/haproxy.c:2376
#3 run_poll_loop () at src/haproxy.c:2433
#4 run_thread_poll_loop (data=data@entry=0x5626a7aa1000) at src/haproxy.c:2463
#5 0x00005626930b3856 in main (argc=<optimized out>, argv=<optimized out>) at src/haproxy.c:3065
Hi William(s),
Here is a patch to prevent a thread from exiting its polling loop while others
are waiting in the sync point. It is a theoretical patch because I was
not able to reproduce the bug.
Could you check whether it fixes the issue, please?
Thanks,
--
Christopher Faulet
From 3576ecdfe108b07c20173d3d82dce5370e796742 Mon Sep 17 00:00:00 2001
From: Christopher Faulet <[email protected]>
Date: Wed, 20 Jun 2018 11:05:03 +0200
Subject: [PATCH] BUG/MEDIUM: threads: Increase jobs when threads must reach
the sync point
When a thread wants to enter the sync point, the jobs number is increased. This
is only done by the first thread requesting the sync point. The jobs number is
decreased when the last thread exits the sync point.
This is mandatory to prevent a thread from stopping its polling loop while it
must still enter the sync point. It is really hard to figure out how to hit the
bug. But, in theory, it is possible for a thread to ask for a synchronization
during the HAProxy shutdown. In this case, we can imagine some threads waiting
in the sync point while others are stopping all jobs (listeners, peers...). A
thread could then exit its polling loop without passing through the sync point,
blocking all the others in the sync point and finally leaving the process
stalled and consuming all the CPU.
This patch must be backported to 1.8.
---
src/hathreads.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/src/hathreads.c b/src/hathreads.c
index 5db3c2197..295a0b304 100644
--- a/src/hathreads.c
+++ b/src/hathreads.c
@@ -71,8 +71,10 @@ void thread_want_sync()
if (all_threads_mask) {
if (threads_want_sync & tid_bit)
return;
- if (HA_ATOMIC_OR(&threads_want_sync, tid_bit) == tid_bit)
+ if (HA_ATOMIC_OR(&threads_want_sync, tid_bit) == tid_bit) {
+ HA_ATOMIC_ADD(&jobs, 1);
shut_your_big_mouth_gcc(write(threads_sync_pipe[1], "S", 1));
+ }
}
else {
threads_want_sync = 1;
@@ -142,6 +144,7 @@ void thread_exit_sync()
char c;
shut_your_big_mouth_gcc(read(threads_sync_pipe[0], &c, 1));
+ HA_ATOMIC_SUB(&jobs, 1);
fd_done_recv(threads_sync_pipe[0]);
}
--
2.17.1
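
For readers who want to see the idea from the commit message in isolation,
here is a minimal standalone sketch. It is not HAProxy code: it uses C11
atomics instead of the HA_ATOMIC_* macros, and every sketch_* name is
hypothetical. It only illustrates that the first requester of a sync takes one
job, so a polling loop that exits when the jobs number reaches zero cannot
leave while a sync is pending.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the globals touched by the patch. */
static atomic_int   sketch_jobs = 0;       /* jobs keeping the polling loops alive */
static atomic_ulong sketch_want_sync = 0;  /* bitmask of threads requesting a sync */

/* A thread asks all threads to reach the sync point. */
static void sketch_want_sync_request(unsigned long tid_bit)
{
    if (atomic_load(&sketch_want_sync) & tid_bit)
        return; /* this thread already asked */

    /* atomic_fetch_or returns the PREVIOUS mask (unlike HA_ATOMIC_OR in the
     * patch, whose result is compared to tid_bit): a previous value of 0
     * means we are the first requester, so we take one job. */
    if (atomic_fetch_or(&sketch_want_sync, tid_bit) == 0)
        atomic_fetch_add(&sketch_jobs, 1);
}

/* Called once, by the last thread leaving the sync point. */
static void sketch_sync_done(void)
{
    /* Clear the requests and release the job taken by the first requester. */
    if (atomic_exchange(&sketch_want_sync, 0) != 0)
        atomic_fetch_sub(&sketch_jobs, 1);
}

/* Assumed exit condition of a polling loop: no job left. */
static bool sketch_poll_loop_may_exit(void)
{
    return atomic_load(&sketch_jobs) == 0;
}
```

With this model, a second thread requesting the sync does not take another
job, and the loop may exit again only after sketch_sync_done() has run.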