Race in reboot/poweroff path at init?

2017-10-10 Thread Jeremy Kerr
Hi all,

I've been debugging an issue where we can't reboot or poweroff a machine
in the early stages of busybox init. Using the poweroff case as an
example:

 - kernel starts /sbin/init

 - kernel receives a poweroff event, so calls __orderly_poweroff.
   Effectively, this just calls out to the /sbin/poweroff usermode
   helper.

 - /sbin/poweroff just does a:

 kill(1, SIGUSR2);

 - However, /sbin/init has not yet installed a signal handler for
   SIGUSR2. Because the target is PID 1, a signal with no handler
   installed is ignored, and so the command to poweroff the machine is
   dropped.

 - init keeps booting rather than powering off.

In our particular case, the "poweroff event" is an IPMI soft shutdown
message. However, the same would apply for any other path that involves
orderly_poweroff or orderly_reboot.

Even though the signal handlers are installed fairly early in init, we
can still hit the race between handler installation and the SIGUSR2
being sent fairly reliably.

I see a couple of options for resolving this:

 - installing the signal handlers even earlier in init_main(). However,
   this will only reduce the window for lost events, rather than
   eliminating it; or

 - using a synchronous channel to send the shutdown/reboot message
   between the poweroff/reboot helpers and init, rather than an
   asynchronous signal. Say, have init listening on a socket, allowing
   the poweroff binary to wait and/or retry.

However, before I go down the wrong path here: does anyone have other
ideas that might help eliminate dropped poweroff/reboot events?

Regards,


Jeremy
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: Race in reboot/poweroff path at init?

2017-10-10 Thread Laurent Bercot



> - using a synchronous channel to send the shutdown/reboot message
>   between the poweroff/reboot helpers, rather than an asynchronous
>   signal. Say, have init listening on a socket, allowing the poweroff
>   binary to wait and/or retry.


 That would not work either: you could receive the event before init
starts listening to the socket.

 There will always be a window where init can't receive events. The
kernel starts it barebones, with no channel of communication to other
processes; if an event arrives before it starts establishing these
channels, you're out of luck. The best you can do is make the window as
small as possible.

 If you need to be 100% safe, then you need to somehow queue the events
before init starts processing them. But that's tricky, because it's
extremely early - you have nothing but the kernel and the /sbin/poweroff
process' memory. You don't even have the guarantee that you can write to
a filesystem: you only have the rootfs and it may be read-only. You
don't even have a tmpfs yet. You can't be certain you have a devtmpfs
mounted. You don't have /dev/shm. You don't have /proc.

 So it's a matter of finding a way to queue events that doesn't involve
writing to the filesystem at all. That severely restricts your options:
for instance, POSIX message queues sound like a perfect fit, but Linux
implements them via a virtual filesystem that needs to be mounted first,
so it's a no-go.

 Signals are actually pretty good: all they require is that init has
installed a handler, which can be done early. The only issue is that
you can't queue them.

 What I would do is add a check to /sbin/poweroff that init has
progressed to a point where its signal handlers are installed, and if
it's not there yet, poll until it is (i.e. sleep and retry).

 What check to use? Well, at this point it's very hackish. The only
thing I can think of that doesn't depend on the contents of /etc/inittab
is that when init reaps zombies, we know it has its signal handlers
installed. So... I would have poweroff double-fork a process (have the
child communicate the pid of the grandchild before dying), the grandchild
dies - at this point it's a zombie waiting for init to reap it - and
poweroff repeatedly hits the grandchild with kill(), using signal 0 just
to be safe. When kill() fails with ESRCH, it means the zombie has
disappeared and init is now ready to accept signals.
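
The double-fork probe described above might be sketched like this.
spawn_orphan_zombie() and wait_until_reaped() are hypothetical names;
the sketch assumes the orphan is reparented to PID 1 (no subreaper in
between) and bounds the polling so it cannot hang forever. It also
ignores the possibility that the pid gets reused after reaping.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Double-fork: returns the pid of an already-exited grandchild that
 * is left for init to reap, or -1 on error. */
pid_t spawn_orphan_zombie(void)
{
    int fds[2];
    pid_t grandchild = -1;

    if (pipe(fds) < 0)
        return -1;

    pid_t child = fork();
    if (child < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (child == 0) {
        close(fds[0]);
        pid_t gc = fork();
        if (gc == 0)
            _exit(0);           /* grandchild: die at once */
        /* child: report the grandchild's pid (or -1), then die so the
         * grandchild is orphaned and handed over to init */
        if (write(fds[1], &gc, sizeof gc) != (ssize_t)sizeof gc)
            _exit(1);
        _exit(0);
    }

    close(fds[1]);
    ssize_t n = read(fds[0], &grandchild, sizeof grandchild);
    close(fds[0]);
    waitpid(child, NULL, 0);    /* reap our own child */
    return n == (ssize_t)sizeof grandchild ? grandchild : -1;
}

/* Poll with signal 0 until the zombie is gone (ESRCH).
 * Returns 0 once reaped, -1 if max_tries polls were not enough. */
int wait_until_reaped(pid_t pid, int max_tries)
{
    struct timespec delay = { 0, 10 * 1000 * 1000 };    /* 10 ms */

    for (int i = 0; i < max_tries; i++) {
        if (kill(pid, 0) < 0 && errno == ESRCH)
            return 0;
        nanosleep(&delay, NULL);
    }
    return -1;
}
```

/sbin/poweroff would call spawn_orphan_zombie(), then
wait_until_reaped(), and only then send kill(1, SIGUSR2).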

 It's really ugly, but it's the best I can come up with that makes no
unsafe assumptions. Whether implementing that in /sbin/poweroff is
better than simply eating the race condition... that's your call.

--
 Laurent



Re: Race in reboot/poweroff path at init?

2017-10-10 Thread Tito


On October 10, 2017 4:52:17 PM GMT+08:00, Jeremy Kerr wrote:
>[...]
>However, before I go down the wrong path here: does anyone have other
>ideas that might help eliminating dropped poweroff/reboot events?

Hi, just a silly idea: would sending the poweroff or reboot signal in a
loop do any harm, or might it even be an improvement?
Just my 0,2 cents from vacation after a couple of beers.
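
Sending the signal in a loop could be as small as the sketch below.
send_repeatedly() is a hypothetical helper, and the count and interval
are arbitrary assumptions; it presumes init tolerates receiving the
same request more than once.

```c
#include <signal.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Send `sig` to `pid` up to `count` times, pausing `interval_ms`
 * between attempts. Stops early if kill() fails.
 * Returns the number of signals sent successfully. */
int send_repeatedly(pid_t pid, int sig, int count, long interval_ms)
{
    struct timespec delay = {
        interval_ms / 1000,
        (interval_ms % 1000) * 1000000L
    };
    int sent = 0;

    for (int i = 0; i < count; i++) {
        if (kill(pid, sig) < 0)
            break;
        sent++;
        nanosleep(&delay, NULL);
    }
    return sent;
}
```

In /sbin/poweroff this would replace the single kill(1, SIGUSR2) with
something like send_repeatedly(1, SIGUSR2, 10, 500). Once init acts on
the request the helper is killed along with everything else, so the
loop needs no explicit success check.
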
Ciao, Tito


Re: Race in reboot/poweroff path at init?

2017-10-10 Thread Jeremy Kerr
Hi Laurent,

Thanks for the reply, good to get some conversation going here!

>> - using a synchronous channel to send the shutdown/reboot message
>>   between the poweroff/reboot helpers, rather than an asynchronous
>>   signal. Say, have init listening on a socket, allowing the poweroff
>>   binary to wait and/or retry.
> 
>  That would not work either: you could receive the event before init
> starts listening to the socket.

That's OK, as the helper (/sbin/poweroff) has the opportunity to retry
if the connect() fails (because init hasn't established the listening
socket yet). The main difference is that the sender can detect failure,
and retry if necessary.

AF_UNIX sockets in the abstract namespace don't require a path bound to
the filesystem, so perhaps they would be available early enough - or
have I missed something there?
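
For reference, the abstract-namespace idea might look like this on
Linux. It is a sketch under assumptions: the socket name passed in is
made up by the caller, listen_abstract() stands for what init would do
and connect_with_retry() for what /sbin/poweroff would do, and neither
function exists in busybox.

```c
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <time.h>
#include <unistd.h>

/* Fill in an abstract-namespace address: sun_path[0] is '\0' and the
 * name follows it. Returns the address length to pass to bind/connect. */
static socklen_t abstract_addr(const char *name, struct sockaddr_un *addr)
{
    size_t len = strlen(name);

    if (len > sizeof addr->sun_path - 1)
        len = sizeof addr->sun_path - 1;
    memset(addr, 0, sizeof *addr);
    addr->sun_family = AF_UNIX;
    memcpy(addr->sun_path + 1, name, len);
    return (socklen_t)(offsetof(struct sockaddr_un, sun_path) + 1 + len);
}

/* init side: returns a listening fd, or -1 on error. */
int listen_abstract(const char *name)
{
    struct sockaddr_un addr;
    socklen_t len = abstract_addr(name, &addr);
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    if (bind(fd, (struct sockaddr *)&addr, len) < 0 || listen(fd, 4) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* helper side: retry until init is listening.
 * Returns a connected fd, or -1 after max_tries failures. */
int connect_with_retry(const char *name, int max_tries)
{
    struct sockaddr_un addr;
    socklen_t len = abstract_addr(name, &addr);
    struct timespec delay = { 0, 100 * 1000 * 1000 };   /* 100 ms */

    for (int i = 0; i < max_tries; i++) {
        int fd = socket(AF_UNIX, SOCK_STREAM, 0);
        if (fd < 0)
            return -1;
        if (connect(fd, (struct sockaddr *)&addr, len) == 0)
            return fd;
        close(fd);              /* nobody listening yet: wait, retry */
        nanosleep(&delay, NULL);
    }
    return -1;
}
```

The point of the design is the failed connect(): unlike kill(), it
tells the sender that the request did not get through.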

[Non-Linux platforms may not support the same abstract namespace, so
I'd need to implement a fallback there, but I don't (yet) know if this
same race is relevant on those platforms...]

I'd rather not just wear the race, as that means we've missed shutdown
events, which is quite user-visible. The signal-after-reaped-grandchild
approach seems okay too, if other methods aren't workable. Or even
Tito's suggestion of a repeated signal, which has the advantage of a
minimal change to code.

Cheers,


Jeremy


Re: Race in reboot/poweroff path at init?

2017-10-10 Thread Didier Kryn

On 11/10/2017 at 04:43, Jeremy Kerr wrote:
> [...]
> I'd rather not just wear the race, as that means we've missed shutdown
> events, which is quite user-visible. The signal-after-reaped-grandchild
> approach seems okay too, if other methods aren't workable. Or even
> Tito's suggestion of a repeated signal, which has the advantage of a
> minimal change to code.

There's the sigqueue() mechanism out there. From the man page, it seems
it's essentially dedicated to sending data together with the signal, but
it also has a queueing mechanism implemented in the kernel. Whether this
allows the message to be kept in the queue until the destination process
unmasks it isn't stated explicitly in the man page, but maybe
somebody knows it. Anyway your case is a perfect test bench.


Didier
