Race in reboot/poweroff path at init?
Hi all,

I've been debugging an issue where we can't reboot or poweroff a machine
in the early stages of busybox init. Using the poweroff case as an
example:

 - kernel starts /sbin/init

 - kernel receives a poweroff event, so calls __orderly_poweroff.
   Effectively, these will just call out to the /sbin/poweroff usermode
   helper.

 - /sbin/poweroff just does a:

     kill(1, SIGUSR2);

 - However, /sbin/init has not yet installed a signal handler for
   SIGUSR2. Because we're PID 1, this means the signal is ignored, and
   so the command to poweroff the machine is dropped.

 - init keeps booting rather than powering off.

In our particular case, the "poweroff event" is an IPMI soft shutdown
message. However, the same would apply for any other path that involves
orderly_poweroff or orderly_reboot.

Even though the signal handlers are installed fairly early in init, we
can still hit the race between this and the SIGUSR2 being sent fairly
reliably.

I see a couple of options for resolving this:

 - installing the signal handlers even earlier in init_main(). However,
   this will only reduce the window for lost events, rather than
   eliminating it; or

 - using a synchronous channel to send the shutdown/reboot message
   between the poweroff/reboot helpers, rather than an asynchronous
   signal. Say, have init listening on a socket, allowing the poweroff
   binary to wait and/or retry.

However, before I go down the wrong path here: does anyone have other
ideas that might help eliminating dropped poweroff/reboot events?

Regards,

Jeremy
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox
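For illustration, here is a minimal sketch of the first option listed above: installing a SIGUSR2 handler before any other startup work. It is not busybox's actual init_main() code. The names pending_request, record_request() and install_request_handler() are hypothetical, and the handler only records the request so that a main loop could act on it later.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Hypothetical flag: number of the last request signal seen, 0 if none. */
static volatile sig_atomic_t pending_request = 0;

/* Async-signal-safe handler: it only records which signal arrived. */
static void record_request(int sig)
{
    pending_request = sig;
}

/* Hypothetical helper, meant to be the very first call in main().
 * Returns 0 on success, -1 if sigaction() fails. */
static int install_request_handler(int sig)
{
    struct sigaction sa;

    assert(sig > 0);
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = record_request;
    sigfillset(&sa.sa_mask);    /* block other signals while recording */
    sa.sa_flags = SA_RESTART;   /* do not interrupt slow system calls */
    return sigaction(sig, &sa, NULL);
}
```

Calling install_request_handler(SIGUSR2) at the top of main() narrows the window but, as noted above, cannot close it: a signal sent before that call is still lost.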
Re: Race in reboot/poweroff path at init?
> - using a synchronous channel to send the shutdown/reboot message
>   between the poweroff/reboot helpers, rather than an asynchronous
>   signal. Say, have init listening on a socket, allowing the poweroff
>   binary to wait and/or retry.

That would not work either: you could receive the event before init
starts listening to the socket.

There will always be a window where init can't receive events. The
kernel starts it barebones, with no channel of communication to other
processes; if an event arrives before it starts establishing these
channels, you're out of luck. The best you can do is make the window as
small as possible.

If you need to be 100% safe, then you need to somehow queue the events
before init starts processing them. But that's tricky, because it's
extremely early - you have nothing but the kernel and the /sbin/poweroff
process' memory. You don't even have the guarantee that you can write to
a filesystem: you only have the rootfs and it may be read-only. You
don't even have a tmpfs yet. You can't be certain you have a devtmpfs
mounted. You don't have /dev/shm. You don't have /proc.

So it's a matter of finding a way to queue events that doesn't involve
writing to the filesystem at all. That severely restricts your options:
for instance, POSIX message queues sound like a perfect fit, but Linux
implements them via a virtual filesystem that needs to be mounted first,
so it's a no-go.

Signals are actually pretty good: all they require is that init has
installed a handler, which can be done early. The only issue is that you
can't queue them.

What I would do is add a check to /sbin/poweroff that init has
progressed to a point where its signal handlers are installed, and if
it's not there yet, poll until it is (i.e. sleep and retry).

What check to use? Well, at this point it's very hackish. The only thing
I can think of that doesn't depend on the contents of /etc/inittab is
that when init reaps zombies, we know it has its signal handlers
installed. So...

I would have poweroff double-fork a process (have the child communicate
the pid of the grandchild before dying). The grandchild dies - at this
point it's a zombie waiting for init to reap it - and poweroff
repeatedly hits the grandchild with kill(), using signal 0 just to be
safe. When kill() fails with ESRCH, it means the zombie has disappeared
and init is now ready to accept signals.

It's really ugly, but it's the best I can come up with that makes no
unsafe assumptions. Whether implementing that in /sbin/poweroff is
better than simply eating the race condition... that's your call.

--
Laurent
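The zombie-probe idea described above could be sketched roughly as follows. This is an illustration, not code from busybox: spawn_orphan_zombie() and wait_until_reaped() are hypothetical names, the pipe is just one way to pass the grandchild's pid back, and the bounded poll count is an assumption added so the loop cannot spin forever. It also assumes that the orphaned grandchild is reparented to PID 1 (no subreaper in between) and that the pid is not reused between polls.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Double-fork: the grandchild exits at once, the intermediate child
 * reports the grandchild's pid through a pipe and exits as well.
 * Returns the grandchild's pid, or -1 on error. */
static pid_t spawn_orphan_zombie(void)
{
    int fds[2];
    pid_t child, grandchild = -1;

    if (pipe(fds) < 0)
        return -1;
    child = fork();
    if (child < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (child == 0) {
        pid_t gc;

        close(fds[0]);
        gc = fork();
        if (gc == 0)
            _exit(0);                       /* grandchild: die immediately */
        /* gc is -1 if the second fork failed; report it either way */
        if (write(fds[1], &gc, sizeof(gc)) != (ssize_t)sizeof(gc))
            _exit(1);
        _exit(0);
    }
    close(fds[1]);
    if (read(fds[0], &grandchild, sizeof(grandchild))
            != (ssize_t)sizeof(grandchild))
        grandchild = -1;
    close(fds[0]);
    waitpid(child, NULL, 0);    /* reap the child; the grandchild is now orphaned */
    return grandchild;
}

/* Probe pid with signal 0 until kill() fails with ESRCH, i.e. until the
 * zombie has been reaped. Returns 0 once it is gone, -1 on timeout or on
 * any other kill() error. */
static int wait_until_reaped(pid_t pid, int max_polls, long poll_ms)
{
    struct timespec delay = {
        .tv_sec = poll_ms / 1000,
        .tv_nsec = (poll_ms % 1000) * 1000000L,
    };

    assert(pid > 0);            /* 0 and -1 would address process groups */
    for (int i = 0; i < max_polls; i++) {
        if (kill(pid, 0) < 0)
            return errno == ESRCH ? 0 : -1;
        nanosleep(&delay, NULL);
    }
    return -1;
}
```

A helper built this way would call spawn_orphan_zombie(), then wait_until_reaped() on the returned pid, and only then send the real signal with kill(1, SIGUSR2).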
Re: Race in reboot/poweroff path at init?
On October 10, 2017 4:52:17 PM GMT+08:00, Jeremy Kerr wrote:
>Hi all,
>
>I've been debugging an issue where we can't reboot or poweroff a
>machine
>in the early stages of busybox init. Using the poweroff case as an
>example:
>
> - kernel starts /sbin/init
>
> - kernel receives a poweroff event, so calls __orderly_poweroff.
>   Effectively, these will just call out to the /sbin/poweroff usermode
>   helper.
>
> - /sbin/poweroff just does a:
>
>     kill(1, SIGUSR2);
>
> - However, /sbin/init has not yet installed a signal handler for
>   SIGUSR2. Because we're PID 1, this means the signal is ignored, and
>   so the command to poweroff the machine is dropped.
>
> - init keeps booting rather than powering off.
>
>In our particular case, the "poweroff event" is an IPMI soft shutdown
>message. However, the same would apply for any other path that involves
>orderly_poweroff or orderly_reboot.
>
>Even though the signal handlers are installed fairly early in init, we
>can still hit the race between this and the SIGUSR2 being sent fairly
>reliably.
>
>I see a couple of options for resolving this:
>
> - installing the signal handlers even earlier in init_main(). However,
>   this will only reduce the window for lost events, rather than
>   eliminating it; or
>
> - using a synchronous channel to send the shutdown/reboot message
>   between the poweroff/reboot helpers, rather than an asynchronous
>   signal. Say, have init listening on a socket, allowing the poweroff
>   binary to wait and/or retry.
>
>However, before I go down the wrong path here: does anyone have other
>ideas that might help eliminating dropped poweroff/reboot events?
>
>Regards,
>
>
>Jeremy

Hi,

just a silly idea: would sending the poweroff or reboot signal in a loop
do any harm, or could it even be an improvement?

Just my 0,2 cents from vacation after a couple of beers.

Ciao,
Tito
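As a rough sketch of the "signal in a loop" idea, the helper could resend the signal a fixed number of times with a pause in between. The name send_repeatedly(), the attempt count and the delay are all hypothetical. Note that kill() returning 0 only says the signal was generated, not that init acted on it, so the sender cannot tell when to stop; whether repeated delivery is harmless once init has started shutting down is not established in this thread.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Send sig to pid `attempts` times, sleeping delay_ms between sends.
 * Returns the number of successful kill() calls, or -1 as soon as one
 * fails (for example with ESRCH or EPERM). */
static int send_repeatedly(pid_t pid, int sig, int attempts, long delay_ms)
{
    struct timespec delay = {
        .tv_sec = delay_ms / 1000,
        .tv_nsec = (delay_ms % 1000) * 1000000L,
    };
    int sent = 0;

    assert(pid > 0 && attempts > 0 && delay_ms >= 0);
    for (int i = 0; i < attempts; i++) {
        if (kill(pid, sig) < 0)
            return -1;
        sent++;
        if (i + 1 < attempts)
            nanosleep(&delay, NULL);    /* no sleep after the last send */
    }
    return sent;
}
```

In /sbin/poweroff this would look something like send_repeatedly(1, SIGUSR2, 10, 500); if the power-off succeeds, the loop simply never finishes.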
Re: Race in reboot/poweroff path at init?
Hi Laurent,

Thanks for the reply, good to get some conversation going here!

>> - using a synchronous channel to send the shutdown/reboot message
>>   between the poweroff/reboot helpers, rather than an asynchronous
>>   signal. Say, have init listening on a socket, allowing the poweroff
>>   binary to wait and/or retry.
>
> That would not work either: you could receive the event before init
> starts listening to the socket.

That's OK, as the helper (/sbin/poweroff) has the opportunity to retry
if the connect() fails (because init hasn't established the listening
socket yet). The main difference is that the sender can detect failure,
and retry if necessary.

AF_UNIX sockets in the abstract namespace don't require a path bound to
the filesystem, so perhaps they would be available early enough - or
have I missed something there?

[Non-Linux platforms may not support the same abstract namespace, so I'd
need to implement a fallback there, but I don't (yet) know if this same
race is relevant on those platforms...]

I'd rather not just wear the race, as that means we've missed shutdown
events, which is quite user-visible. The signal-after-reaped-grandchild
approach seems okay too, if other methods aren't workable. Or even
Tito's suggestion of a repeated signal, which has the advantage of a
minimal change to code.

Cheers,

Jeremy
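The abstract-socket idea above could look roughly like this on Linux. It is a sketch only: abstract_addr(), listen_abstract() and connect_abstract_retry() are hypothetical names, the socket name is invented by the caller, and nothing here is existing busybox code. The key property is the one described above: connect() fails visibly while nobody is listening, so the sender can retry.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <time.h>
#include <unistd.h>

/* Fill addr with a Linux abstract-namespace address: sun_path[0] is NUL
 * and the name follows it. Returns the address length to pass on. */
static socklen_t abstract_addr(struct sockaddr_un *addr, const char *name)
{
    size_t len = strlen(name);

    assert(len > 0 && len + 1 <= sizeof(addr->sun_path));
    memset(addr, 0, sizeof(*addr));
    addr->sun_family = AF_UNIX;
    memcpy(addr->sun_path + 1, name, len);  /* byte 0 stays '\0' */
    return (socklen_t)(offsetof(struct sockaddr_un, sun_path) + 1 + len);
}

/* init side: create the listening socket. Returns the fd, or -1. */
static int listen_abstract(const char *name)
{
    struct sockaddr_un addr;
    socklen_t alen = abstract_addr(&addr, name);
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    if (bind(fd, (struct sockaddr *)&addr, alen) < 0 || listen(fd, 1) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Helper side: try to connect up to `attempts` times, sleeping delay_ms
 * after each failure. Returns a connected fd, or -1 if every try failed. */
static int connect_abstract_retry(const char *name, int attempts, long delay_ms)
{
    struct sockaddr_un addr;
    socklen_t alen = abstract_addr(&addr, name);
    struct timespec delay = {
        .tv_sec = delay_ms / 1000,
        .tv_nsec = (delay_ms % 1000) * 1000000L,
    };

    for (int i = 0; i < attempts; i++) {
        int fd = socket(AF_UNIX, SOCK_STREAM, 0);

        if (fd < 0)
            return -1;
        if (connect(fd, (struct sockaddr *)&addr, alen) == 0)
            return fd;          /* a listener exists: request can be sent */
        close(fd);
        nanosleep(&delay, NULL);
    }
    return -1;
}
```

Unlike kill(), the helper gets an explicit failure while init is not yet listening, which is the difference pointed out above. The abstract namespace is Linux-specific, so other platforms would need the fallback mentioned in the message.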
Re: Race in reboot/poweroff path at init?
On 11 Oct 2017 at 04:43, Jeremy Kerr wrote:
> Hi Laurent,
>
> Thanks for the reply, good to get some conversation going here!
>
>>> - using a synchronous channel to send the shutdown/reboot message
>>>   between the poweroff/reboot helpers, rather than an asynchronous
>>>   signal. Say, have init listening on a socket, allowing the poweroff
>>>   binary to wait and/or retry.
>>
>> That would not work either: you could receive the event before init
>> starts listening to the socket.
>
> That's OK, as the helper (/sbin/poweroff) has the opportunity to retry
> if the connect() fails (because init hasn't established the listening
> socket yet). The main difference is that the sender can detect failure,
> and retry if necessary.
>
> AF_UNIX sockets in the abstract namespace don't require a path bound to
> the filesystem, so perhaps they would be available early enough - or
> have I missed something there?
>
> [Non-Linux platforms may not support the same abstract namespace, so I'd
> need to implement a fallback there, but I don't (yet) know if this same
> race is relevant on those platforms...]
>
> I'd rather not just wear the race, as that means we've missed shutdown
> events, which is quite user-visible. The signal-after-reaped-grandchild
> approach seems okay too, if other methods aren't workable. Or even
> Tito's suggestion of a repeated signal, which has the advantage of a
> minimal change to code.

There's the sigqueue() mechanism out there. From the man page, it seems
it's essentially dedicated to sending data together with the signal, but
it also has a queueing mechanism implemented in the kernel. Whether this
allows the message to be kept in the queue until the destination process
unmasks it isn't written explicitly in the man page, but maybe somebody
knows. Anyway, your case is a perfect test bench.

Didier
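One way to try the sigqueue() question out is a small self-test like the sketch below: block a signal, queue several instances to the current process, then count how many can be collected. The function names are hypothetical. It only covers a signal that is blocked (masked) by the receiver; the race in this thread involves a signal that is ignored because no handler is installed yet, which may well be a different situation, and this sketch does not test it.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <signal.h>
#include <time.h>
#include <unistd.h>

/* Block sig, then queue `count` instances to ourselves, each carrying its
 * index as payload. Returns 0 on success, -1 on error. */
static int queue_to_self_blocked(int sig, int count)
{
    sigset_t set;

    assert(sig > 0 && count > 0);
    sigemptyset(&set);
    sigaddset(&set, sig);
    if (sigprocmask(SIG_BLOCK, &set, NULL) < 0)
        return -1;
    for (int i = 0; i < count; i++) {
        union sigval v;

        v.sival_int = i;
        if (sigqueue(getpid(), sig, v) < 0)
            return -1;
    }
    return 0;
}

/* Collect pending instances of sig without waiting. Stores up to `max`
 * payload values and returns how many instances were collected. */
static int drain_queued(int sig, int *values, int max)
{
    sigset_t set;
    siginfo_t info;
    struct timespec zero = { .tv_sec = 0, .tv_nsec = 0 };
    int n = 0;

    sigemptyset(&set);
    sigaddset(&set, sig);
    while (n < max && sigtimedwait(&set, &info, &zero) == sig)
        values[n++] = info.si_value.sival_int;
    return n;
}
```

On Linux, running this with a real-time signal such as SIGRTMIN should collect every queued instance in order, while a standard signal such as SIGUSR2 keeps only one pending instance even when it is sent with sigqueue().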