On 01/22/21 19:05, Max Reitz wrote: > On 22.01.21 18:09, Laszlo Ersek wrote: >> On 01/22/21 11:20, Max Reitz wrote: >>> Modifying signal handlers is a process-global operation. When two >>> threads run coroutine-sigaltstack's qemu_coroutine_new() concurrently, >>> they may interfere with each other: One of them may revert the SIGUSR2 >>> handler back to the default between the other thread setting up >>> coroutine_trampoline() as the handler and raising SIGUSR2. That SIGUSR2 >>> will then lead to the process exiting. >>> >>> Outside of coroutine-sigaltstack, qemu does not use SIGUSR2. We can >>> thus keep the signal handler installed all the time. >>> CoroutineThreadState.tr_handler tells coroutine_trampoline() whether its >>> stack is set up so a new coroutine is to be launched (i.e., it should >>> invoke sigsetjmp()), or not (i.e., the signal came from an external >>> source and we should just perform the default action, which is to exit >>> the process). >>> >>> Note that in user-mode emulation, the guest can register signal handlers >>> for any signal but SIGSEGV and SIGBUS, so if it registers a SIGUSR2 >>> handler, sigaltstack coroutines will break from then on. However, we do >>> not use coroutines for user-mode emulation, so that is fine. >>> >>> Suggested-by: Laszlo Ersek <ler...@redhat.com> >>> Signed-off-by: Max Reitz <mre...@redhat.com> >>> --- >>> util/coroutine-sigaltstack.c | 56 +++++++++++++++++++----------------- >>> 1 file changed, 29 insertions(+), 27 deletions(-) >>> >>> diff --git a/util/coroutine-sigaltstack.c b/util/coroutine-sigaltstack.c >>> index aade82afb8..2d32afc322 100644 >>> --- a/util/coroutine-sigaltstack.c >>> +++ b/util/coroutine-sigaltstack.c >>> @@ -59,6 +59,8 @@ typedef struct { >>> static pthread_key_t thread_state_key; >>> +static void coroutine_trampoline(int signal); >>> + >>> static CoroutineThreadState *coroutine_get_thread_state(void) >>> { >>> CoroutineThreadState *s = pthread_getspecific(thread_state_key); >>> @@ -80,6 +82,7 @@ static void qemu_coroutine_thread_cleanup(void >>> *opaque) >>> static void __attribute__((constructor)) coroutine_init(void) >>> { >>> + struct sigaction sa; >>> int ret; >>> ret = pthread_key_create(&thread_state_key, >>> qemu_coroutine_thread_cleanup); >>> @@ -87,6 +90,20 @@ static void __attribute__((constructor)) >>> coroutine_init(void) >>> fprintf(stderr, "unable to create leader key: %s\n", >>> strerror(errno)); >>> abort(); >>> } >>> + >>> + /* >>> + * Establish the SIGUSR2 signal handler. This is a process-wide >>> + * operation, and so will apply to all threads from here on. >>> + */ >>> + sa = (struct sigaction) { >>> + .sa_handler = coroutine_trampoline, >>> + .sa_flags = SA_ONSTACK, >>> + }; >>> + >>> + if (sigaction(SIGUSR2, &sa, NULL) != 0) { >>> + perror("Unable to install SIGUSR2 handler"); >>> + abort(); >>> + } >>> } >>> /* "boot" function >>> @@ -121,7 +138,17 @@ static void coroutine_trampoline(int signal) >>> /* Get the thread specific information */ >>> coTS = coroutine_get_thread_state(); >>> self = coTS->tr_handler; >>> + >>> + if (!self) { >>> + /* >>> + * This SIGUSR2 came from an external source, not from >>> + * qemu_coroutine_new(), so perform the default action. >>> + */ >>> + exit(0); >>> + } >>> + >>> coTS->tr_called = 1; >>> + coTS->tr_handler = NULL; >>> co = &self->base; >>> /* >> >> (8) There's a further complication here, assuming we really want to >> recognize the case when the handler is executing unexpectedly: >> >> - pthread_getspecific() is not necessarily async-signal-safe, according >> to POSIX, so calling coroutine_get_thread_state() in the "unexpected" >> case (e.g. in response to an asynchronously generated SIGUSR2) is >> problematic in its own right, > > That’s a shame. > >> - if the SIGUSR2 is delivered to a thread that has never called >> coroutine_get_thread_state() before, then we'll reach g_malloc0() inside >> coroutine_get_thread_state(), in signal handler context, which is very >> bad. > > Could be solved with a coroutine_try_get_thread_state() that will never > malloc, but return NULL then. > >> You'd have to block SIGUSR2 for the entire process (all threads) at all >> times, and only temporarily unblock it for a particular coroutine >> thread, with the sigsuspend(). The above check would suffice, that way. > > Yes, that’s what I was originally afraid of. I feel like that may be > the complexity drop that pushes this change too far out of my comfort > zone. (And as evidenced by your review, it already was pretty much > outside as it was.) > >> Such blocking is possible by calling pthread_sigmask() from the main >> thread, before any other thread is created (the signal mask is inherited >> across pthread_create()). I guess it could be done in coroutine_init() >> too. >> >> And *then* the pthread_sigmask() calls should indeed be removed from >> qemu_coroutine_new(). > > OTOH, that does sound rather simple... > >> (Apologies if my feedback is difficult to understand, it's my fault. I >> could propose a patch, if (and only if) you want that.) > > I can’t say I wouldn’t be happy with a patch for this code that doesn’t > bear my S-o-b. ;) > > I feel conflicted. I can send a v2 that addresses this (probably > consisting of multiple patches then, e.g. I’d split the SIGUSR2 blocking > off the main patch), but to me, this bug is really more of a nuisance > that just blocks me from sending a pull request for my block branch... > So I’d rather not drag it out forever. OTOH, sending a quick and bad > fix just because I can’t wait is just bad. > > I suppose I’ll have to decide over the weekend. Though if you’re > itching to write a patch yourself, I’d definitely be grateful.
OK, I'll try my hand at it; I hope I won't be eating my words. Laszlo