Le 20/06/2018 à 15:11, William Dauchy a écrit :
Hello Christopher,

Thank you for the quick answer and the patch.

On Wed, Jun 20, 2018 at 11:32 AM Christopher Faulet <[email protected]> wrote:
Here is a patch to avoid a thread to exit its polling loop while others
are waiting in the sync point. It is a theoretical patch because I was
not able to reproduce the bug.
Could you check if it fixes it please ?

Unfortunately, it does not fix the issue; I am getting the same backtrace:
#0  0x000055f30960a10b in thread_sync_barrier (barrier=0x55f309875528
<barrier.27101>) at src/hathreads.c:114
#1  thread_enter_sync () at src/hathreads.c:127
#2  0x000055f3095b27a2 in sync_poll_loop () at src/haproxy.c:2376
#3  run_poll_loop () at src/haproxy.c:2433
#4  run_thread_poll_loop (data=data@entry=0x55f31c4b0c50) at src/haproxy.c:2463
#5  0x000055f30952e856 in main (argc=<optimized out>, argv=<optimized
out>) at src/haproxy.c:3065

however the backtrace is different on some machines:

#0  0x00007f31fd89ff57 in pthread_join () from /lib64/libpthread.so.0
#1  0x000055e184a5486a in main (argc=<optimized out>, argv=<optimized
out>) at src/haproxy.c:3069


Hum, ok, forget the previous patch. Here is a second try. It solves the same bug using another way. In this patch, all threads must enter in the sync point to exit. I hope it will do the trick.

--
Christopher Faulet
>From c01f5636a0cbe2be18573e455370c4a47f84d59e Mon Sep 17 00:00:00 2001
From: Christopher Faulet <[email protected]>
Date: Wed, 20 Jun 2018 16:22:03 +0200
Subject: [PATCH] BUG/MEDIUM: threads: Use the sync point to check active jobs
 and exit

When HAProxy is shutting down, it exits the polling loop when there is no jobs
anymore (jobs == 0). When there is no thread, it works pretty well, but when
HAProxy is started with several threads, a thread can decide to exit because
jobs variable reached 0 while another one is processing a task (e.g. a
health-check). At this stage, the running thread could decide to request a
synchronization. But because at least one of them has already gone, the others
will wait infinitly in the sync point and the process will never die.

To fix the bug, when the first thread (and only this one) detects there is no
active jobs anymore, it requests a synchronization. And in the sync point, all
threads will check if jobs variable reached 0 to exit the polling loop.

This patch must be backported in 1.8.
---
 src/haproxy.c | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/src/haproxy.c b/src/haproxy.c
index 1c0455c56..5906f7989 100644
--- a/src/haproxy.c
+++ b/src/haproxy.c
@@ -2372,10 +2372,12 @@ void mworker_pipe_register()
 	fd_want_recv(mworker_pipe[0]);
 }
 
-static void sync_poll_loop()
+static int sync_poll_loop()
 {
+	int stop = 0;
+
 	if (THREAD_NO_SYNC())
-		return;
+		return stop;
 
 	THREAD_ENTER_SYNC();
 
@@ -2389,7 +2391,9 @@ static void sync_poll_loop()
 
 	/* *** } */
   exit:
+	stop = (jobs == 0); /* stop when there's nothing left to do */
 	THREAD_EXIT_SYNC();
+	return stop;
 }
 
 /* Runs the polling loop */
@@ -2410,9 +2414,10 @@ static void run_poll_loop()
 		/* Check if we can expire some tasks */
 		next = wake_expired_tasks();
 
-		/* stop when there's nothing left to do */
-		if (jobs == 0)
-			break;
+		/* the first thread requests a synchronization to exit when
+		 * there is no active jobs anymore */
+		if (tid == 0 && jobs == 0)
+			THREAD_WANT_SYNC();
 
 		/* expire immediately if events are pending */
 		exp = now_ms;
@@ -2431,7 +2436,9 @@ static void run_poll_loop()
 
 
 		/* Synchronize all polling loops */
-		sync_poll_loop();
+		if (sync_poll_loop())
+			break;
+
 		activity[tid].loops++;
 	}
 }
-- 
2.17.1

Reply via email to