Hey Ludo!

Ludovic Courtès <l...@gnu.org> writes:

> Hi!
>
> Ludovic Courtès <l...@gnu.org> skribis:
>
>> Turns out that this happens when calling the ‘daemonize’ action on
>> ‘root’.  I have a reproducer now and am investigating…
>
> Good news: this is fixed in Shepherd commit
> f4272d2f0f393d2aa3e9d76b36ab6aa5f2fc72c2!
>
> The root cause is inconsistent semantics when mixing epoll, signalfd,
> and fork, specifically this part from signalfd(2):
>
>    epoll(7) semantics
>        If a process adds (via epoll_ctl(2)) a signalfd file descriptor to an
>        epoll(7) instance, then epoll_wait(2) returns events only for signals
>        sent to that process.  In particular, if the process then uses fork(2)
>        to create a child process, then the child will be able to read(2)
>        signals that are sent to it using the signalfd file descriptor, but
>        epoll_wait(2) will not indicate that the signalfd file descriptor is
>        ready.  In this scenario, a possible workaround is that after the
>        fork(2), the child process can close the signalfd file descriptor that
>        it inherited from the parent process and then create another signalfd
>        file descriptor and add it to the epoll instance.  […]
>
> The C program below illustrates this behavior:
>
> #include <stdlib.h>
> #include <stdio.h>
> #include <unistd.h>
> #include <sys/signal.h>
> #include <sys/signalfd.h>
> #include <sys/epoll.h>
>
> int
> main ()
> {
>   int ep, sfd;
>
>   sigset_t signals;
>   sigemptyset (&signals);
>   sigaddset (&signals, SIGINT);
>   sigaddset (&signals, SIGHUP);
>
>   sigprocmask (SIG_BLOCK, &signals, NULL);
>   sfd = signalfd (-1, &signals, SFD_CLOEXEC);
>
>   ep = epoll_create1 (EPOLL_CLOEXEC);
>
>   struct epoll_event events =
>     { .events = EPOLLIN | EPOLLONESHOT, .data = NULL };
>   epoll_ctl (ep, EPOLL_CTL_ADD, sfd, &events);
>
>   epoll_wait (ep, &events, 1, 123);
>
>   if (fork () == 0)
>     {
>       /* Quoth signalfd(2):
>
>          If a process adds (via epoll_ctl(2)) a signalfd file descriptor to an
>          epoll(7) instance, then epoll_wait(2) returns events only for signals
>          sent to that process.  In particular, if the process then uses fork(2)
>          to create a child process, then the child will be able to read(2)
>          signals that are sent to it using the signalfd file descriptor, but
>          epoll_wait(2) will not indicate that the signalfd file descriptor is
>          ready.  */
>
>       printf ("try this: kill -INT %i\n", getpid ());
>       while (1)
>         {
>           struct signalfd_siginfo info;
>           if (epoll_wait (ep, &events, 1, 777) > 0)
>             {
>               read (sfd, &info, sizeof info);
>               printf ("got signal %i!\n", info.ssi_signo);
>               epoll_ctl (ep, EPOLL_CTL_MOD, sfd, &events);
>             }
>         }
>     }
>
>   return 0;
> }
>
>
> Of course it took me a while to find out about this; I first looked at
> things individually and didn’t expect the mixture to behave
> inconsistently.  Tricky!

Thanks for sharing the result of your investigation, it's always
enlightening!

> Maxim, let me know if it works for you!

Better than ever!  Thanks a lot for fixing the various issues reported
here.  I'm closing this one!

-- 
Thanks,
Maxim