Re: How to supervise an early process [s6-svscan root pivot]
On Tue, 21 Jun 2016 13:36:38 -0400 Steve Litt wrote:
> ... doesn't the bootloader config already contain the
> info to know which device, and isn't that device mounted before the
> initramfs does the switch_root or pivot_root? What do they need udevd
> for? They have the UUID, for gosh sakes.

On Linux, the bootloader usually loads the ramdisk from /boot. On FreeBSD it loads the kernel and modules, which reside in /boot/kernel, and a ramdisk is usually not needed at all.

The thing is, though, that /boot is in many cases not directly related to the actual "cruise" root. Take ZFS for example, where the rootfs (/) can be any nested ZFS filesystem in any ZFS pool present on the machine. Although it contains /boot, it is not even a partition in such setups. There might even be multiple(!) rootfses present, with only one being active. There is no way for BIOS/UEFI to know about this. Moreover, the firmware doesn't even understand ZFS. Thus you need a kernel with drivers/modules loaded to access the root.

Fortunately, the loader on FreeBSD understands enough of ZFS and pool properties to find the "rootfs" and /boot/kernel in it, and fire the machine up. A similar dance is necessary on Linux as well. There you can also have an ext* root and use ZFS only for the rest of the system.

This is IMHO correct; I personally would rather see the "bootloader/ramdisk" state extended with features than see UEFI "inflated" with ZFS support (or any other future fs, for that matter). UEFI is big and complex enough as it is. The same situation arises with convoluted md raid setups and such. It is an unfortunate bootstrapping problem.

Regarding the rising complexity of Linux ramdisks: it seems to me that FreeBSD's reroot approach, which preserves kernel state but kills everything and respawns init, removes the need to concern oneself with pivoting. One could just exec into a new s6-svscan instance from the (finally located) actual rootfs at the end of the ramdisk stage. That way one can have supervision running both from the ramdisk and during "cruise". Maybe it's an approach worth emulating in Linux ramdisks as well?

I am not very experienced with Linux ramdisks. As a user, I personally liked Arch Linux's mkinitcpio more than dracut, but I just use what the distribution gives me. All these "frameworks" make heavy assumptions about how a system is supposed to work. Maybe we need s6-mkramdisk at some point in the future?

eto
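
The hand-off suggested above ("exec into a new s6-svscan instance from the actual rootfs at the end of the ramdisk stage") might look roughly like this on a Linux ramdisk. This is only a sketch under assumptions that are not in the message: the real root is already mounted (here at a hypothetical /new_root), s6-svscan is installed there as /bin/s6-svscan, and the scan directory is /run/service. The DRY_RUN variable is a made-up knob that prints the command instead of running it, since switch_root is only meant to be called from PID 1 of an initramfs.

```sh
#!/bin/sh
# Sketch only: hand over from the ramdisk to a fresh s6-svscan on the real root.
# All paths are hypothetical; adjust them to the actual layout.

handoff() {
    newroot=$1                      # where the real rootfs is mounted
    scandir=$2                      # scan directory as seen from the new root
    svscan=${3:-/bin/s6-svscan}     # assumed install path of s6-svscan

    # With DRY_RUN set, "echo" is prepended and the command is only printed.
    # Otherwise exec replaces this process, so the new s6-svscan keeps our PID.
    exec ${DRY_RUN:+echo} switch_root "$newroot" "$svscan" "$scandir"
}
```

Because the function ends in exec, a dry run is best done in a subshell, e.g. `(DRY_RUN=1; handoff /new_root /run/service)`. Everything started in the ramdisk before this point would still have to be stopped first, as discussed elsewhere in the thread.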
Re: How to supervise an early process [root pivot]
On Tue, 21 Jun 2016 15:18:38 + Charles Duffy wrote:
> Couldn't one play with bind mounts to keep the absolute paths consistent on
> both sides of the pivot operation?

Well, I decided to accept the "don't do that" approach. Also, bind mounts are Linux-specific. My main area of interest is s6 on my FreeBSD systems, but I try to understand the Linux side of things as well.

Moreover, I looked up the FreeBSD pivot_root-like thing once again. It is apparently called "reroot", and it seems to be already present on my testing system. Info from the reboot manpage implies that reroot kills even init:

  -r  The system kills all processes, unmounts all filesystems, mounts the new root filesystem, and begins the usual startup sequence. After changing vfs.root.mountfrom with kenv(8), reboot -r can be used to change the root filesystem while preserving kernel state.

This means a Linux-style pivot_root doesn't even happen; the rerooted init gets a clean state, which seems even better.

Also, although FreeBSD has nullfs mounts, those are not exactly the same as Linux bind mounts, and it seems FreeBSD doesn't need a (*)dev helper at all. It appears to me that /dev is always populated by the kernel beforehand. "Don't do that" makes sense under these conditions.

eto
Re: How to supervise an early process [root pivot]
On 21/06/2016 17:18, Charles Duffy wrote:
> Couldn't one play with bind mounts to keep the absolute paths consistent on
> both sides of the pivot operation?

I guess you could, but really, not starting anything that relies on absolute paths before pivot_rooting is by far the simplest solution.

--
Laurent
Re: How to supervise an early process [root pivot]
On 21/06/16 16:24, Martin "eto" Misuth wrote:
> On Tue, 21 Jun 2016 14:45:59 +0200 Laurent Bercot wrote:
> > ...
> > With udevd, the workaround is to kill it after you have performed the
> > coldplug, and only restart it as part of your normal boot sequence once
> > you have pivot_rooted. It can be supervised at this point.
>
> Thank you! Especially for mdev coldplug process description!
>
> I asked, because it seems FreeBSD will be getting pivot_root like
> capabilities soon. This makes it more similar to Linux in a way. And opens
> some weekends for tinkering. It also introduces remote possibility of
> situation like described actually happening there too.

FreeBSD 10.3 (the latest release as of writing) includes rerooting support. By passing the rerooting flag to the reboot system call, userland can tell the kernel to start the usual shutdown (kill all processes including init, unmount all filesystems including "/"). After unmounting the root filesystem, the kernel performs a "userland reboot" by mounting a new root filesystem and starting a new init process.

There are lots of use cases for this, e.g. configuring the in-kernel iSCSI initiator from a small netboot image and switching to an iSCSI LUN as the root filesystem. Another example is a system with full disk encryption and no trusted system console. In that case you can use a minimal unencrypted system to unlock the encrypted disks and reroot into your encrypted devices.

Use kenv vfs.root.mountfrom=":" to set the filesystem type and device path before you invoke "reboot -r".
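
The two steps described above (set vfs.root.mountfrom with kenv, then reboot -r) could be wrapped as follows. This is a hedged sketch, not a tested procedure: the function name reroot_to, the DRY_RUN knob and the example filesystem type and device are all made up for illustration, and the "fstype:device" form of the value is the usual one for vfs.root.mountfrom.

```sh
#!/bin/sh
# Sketch only: reroot a FreeBSD system onto another root filesystem.
# reroot_to and DRY_RUN are hypothetical names, not part of FreeBSD.

reroot_to() {
    fstype=$1       # e.g. zfs or ufs
    device=$2       # e.g. a ZFS dataset or a /dev node

    # With DRY_RUN set, "echo" is prepended and commands are only printed.
    ${DRY_RUN:+echo} kenv vfs.root.mountfrom="${fstype}:${device}" || return 1

    # Kills all processes (init included), unmounts everything, mounts the
    # new root and starts a fresh init, while keeping the running kernel.
    ${DRY_RUN:+echo} reboot -r
}
```

For example, `DRY_RUN=1; reroot_to zfs zroot/ROOT/default` only prints the two commands; the dataset name here is a placeholder.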
Re: How to supervise an early process [root pivot]
On Tue, 21 Jun 2016 17:02:42 +0200 Laurent Bercot wrote:
> ...
>
> The only sensible "protection" against pivot_root is: do your pivot_root
> very early when basically nothing is running, and start your supervision
> tree later on.

Well, I just thought an accidental rm is kind of similar to a remount, but now I see it is not. Thank you, understood.

eto
Re: How to supervise an early process [root pivot]
Couldn't one play with bind mounts to keep the absolute paths consistent on both sides of the pivot operation?

On Tue, Jun 21, 2016 at 10:02 AM Laurent Bercot wrote:
> On 21/06/2016 16:24, Martin "eto" Misuth wrote:
> > Reinterpreting based on my personal experience, situation would be basically
> > similar to - "deleting" servicedirs from "underneath" running s6-svscan one (I
> > did that one to myself due to script error, don't ask):
>
> No, it's not the same thing. When you pivot_root, everything is kept open,
> the inodes do not change, everything keeps working - except that the
> absolute paths to the files are not the same anymore. If you were referring
> to a service as /service/foo beforehand, it has to be referred to as
> /old_root_location/service/foo after a pivot_root.
>
> If you used absolute paths to link servicedirs into your scandir, and you
> pivot_root, then s6-svscan will rightfully freak out on its next scan. But
> s6-supervise should keep working - the control interface hasn't disappeared,
> it is just named differently.
>
> > Would it be possible to somehow "posixly" lock control files in such way, that
> > remount/pivot_root/unlink would fail and one could not delete them without force
> > flag, indicating indeed sysadmin error?
>
> No. Well, there are "extended attributes" that allow you to do that kind of
> thing, but I'm not sure to what extent those are portable. But they wouldn't
> protect you against pivot_root anyway, because no files are deleted or
> changed when you pivot_root, it's just a rotation in the directory tree.
> (Also, trying to protect admins against themselves is doomed to fail, and
> a sure recipe for bad design.)
>
> The only sensible "protection" against pivot_root is: do your pivot_root
> very early when basically nothing is running, and start your supervision
> tree later on.
>
> --
> Laurent
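
The point quoted above, that after a pivot_root "everything is kept open, the inodes do not change" and only the absolute paths differ, can be illustrated without root privileges. The sketch below is an analogy only: a plain directory rename stands in for pivot_root, and the directory and file names are made up.

```sh
#!/bin/sh
# Analogy only: renaming a directory stands in for pivot_root here.
tmp=$(mktemp -d)
mkdir -p "$tmp/service/foo"
: > "$tmp/service/foo/log"

exec 3>>"$tmp/service/foo/log"              # a long-lived process holds this fd
mv "$tmp/service" "$tmp/old_root_service"   # the path changes, the inode does not
echo "still writable" >&3                   # the open descriptor keeps working
exec 3>&-

# The old absolute path is gone; the same file is reachable under the new one.
if [ -e "$tmp/service/foo/log" ]; then old_path=present; else old_path=missing; fi
new_content=$(cat "$tmp/old_root_service/foo/log")
echo "old path: $old_path"
echo "new path content: $new_content"
rm -rf "$tmp"
```

Anything that had stored the old absolute path fails to find the file afterwards, which is the s6-svscan scandir situation described in the quoted reply.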
Re: How to supervise an early process [root pivot]
On Tue, 21 Jun 2016 14:45:59 +0200 Laurent Bercot wrote:
> ...
> With udevd, the workaround is to kill it after you have performed the
> coldplug, and only restart it as part of your normal boot sequence once
> you have pivot_rooted. It can be supervised at this point.

Thank you! Especially for the description of the mdev coldplug process!

I asked because it seems FreeBSD will be getting pivot_root-like capabilities soon. This makes it more similar to Linux in a way, and opens up some weekends for tinkering. It also introduces the remote possibility of a situation like the one described actually happening there too.

So I got curious what the "proper" solution to such a broken "state" is. And it seems the answer is: don't do that!

Reinterpreting based on my personal experience, the situation would be basically similar to "deleting" servicedirs from "underneath" a running s6-svscan (I did that to myself due to a script error, don't ask).

When I did that, once the tree was "wiped"/"cleaned" out, all the s6-svscan/s6-supervise special files got unlinked and disappeared from the directory tree view. Although s6-svscan/s6-supervise were holding onto those files, I was unable to control them with s6-svscanctl/s6-svc, as there were no "control points" to "connect to" in the filesystem anymore. The process tree did not dismantle, though, because unlinked files are not actually deleted while a process still holds them open. Because both runit and s6 are so robust, in this case s6 just held onto the unlinked fds for days, and it took me some time to figure this one out.

This makes me think that situations like remounting the servicedirs' root, pivot_root, unlink and such, although PEBKAC, leave a less experienced sysadmin unable to control services, at least without signals (I can attest that signals always work perfectly, even in such "broken" cases).

Would it be possible to somehow "posixly" lock the control files in such a way that remount/pivot_root/unlink would fail, and one could not delete them without a force flag, indicating indeed a sysadmin error?

eto
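
The unlink situation described above can be reproduced in miniature. This is an illustrative sketch with made-up file names, not a picture of how s6 lays out its control files: a process that holds a file open can keep using it after the name is removed, while anyone who needs the path to reach it is locked out.

```sh
#!/bin/sh
# Sketch only: a file stays usable through an open fd after its name is gone.
tmp=$(mktemp -d)
: > "$tmp/control"

exec 3>>"$tmp/control"      # the "daemon" side holds the file open
rm "$tmp/control"           # the script error: the name is unlinked

# Writing through the already-open descriptor still succeeds...
if echo ping >&3; then write_ok=yes; else write_ok=no; fi
# ...but there is no path left for another program to open.
if [ -e "$tmp/control" ]; then path=present; else path=missing; fi

echo "write through fd: $write_ok, path: $path"
exec 3>&-
rm -rf "$tmp"
```

Signals are unaffected because they address a process by PID and not by any path in the filesystem.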
Re: How to supervise an early process [s6-svscan root pivot]
On 21/06/2016 14:00, Martin "eto" Misuth wrote:
> Let's say, one mounts some tmpfs fses, containing servicedirs, and fires up
> s6-svscan as one of first binaries (when booting from ramdisk) - what is
> expected behaviour of running instance of s6-svscan, when pivot_root happens ?

Heh, that's a good point. I wouldn't try it. The supervision tree itself would keep working, but it would defeat all the normal assumptions that people make about it, e.g. "service directories can be accessed via a reliable absolute path". s6-rc would break horribly, but you'd be insane to run anything of the kind before pivot_rooting.

Generally speaking, you shouldn't run any long-lived process before pivot_rooting or switch_rooting. The structure of the filesystem is too important an assumption to be modified behind people's (or daemons') backs.

Fortunately, there's really no need to do that: the early initialization that happens in an initramfs is oneshot-only, and your real "init" is always run after the pivot_root happens; that's the moment when you can spawn long-lived processes.

There's obviously one exception: udevd. Some systems need it to coldplug devices, in order to find the correct device to pivot_root on. The answer here is that it's a design mistake of udevd (the n+1th one...) to not provide a short-lived hotplug helper for this.

With a program such as mdev, it's possible to find the correct device without running a daemon:
 - register /sbin/mdev as a hotplug helper
 - run mdev -s (the coldplug scanner)
 - unregister the hotplug helper
 - your /dev is fully populated, you can pivot_root

With udevd, the workaround is to kill it after you have performed the coldplug, and only restart it as part of your normal boot sequence once you have pivot_rooted. It can be supervised at this point.

--
Laurent
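
The mdev steps listed above could be scripted roughly as follows. This is a sketch under assumptions: the kernel exposes the hotplug helper setting at /proc/sys/kernel/hotplug, mdev is BusyBox's and lives at /sbin/mdev, and the function name coldplug plus the HOTPLUG_CTL and MDEV override variables are hypothetical knobs added so the logic can be tried outside a real initramfs.

```sh
#!/bin/sh
# Sketch only: populate /dev with mdev without leaving a daemon running.
# coldplug, HOTPLUG_CTL and MDEV are hypothetical names for illustration.

coldplug() {
    ctl=${HOTPLUG_CTL:-/proc/sys/kernel/hotplug}
    mdev=${MDEV:-/sbin/mdev}

    echo "$mdev" > "$ctl" || return 1   # register mdev as the hotplug helper
    "$mdev" -s || return 1              # coldplug scan: create nodes for present devices
    echo "" > "$ctl"                    # unregister the helper again
}
```

After `coldplug` returns successfully, /dev should be populated and nothing long-lived is left behind, so the root device can be located and pivot_root (or switch_root) can proceed.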
Re: How to supervise an early process [s6-svscan root pivot]
Let's say one mounts some tmpfs filesystems containing servicedirs, and fires up s6-svscan as one of the first binaries (when booting from a ramdisk). What is the expected behaviour of the running s6-svscan instance when pivot_root happens?

Will it detect that the servicedirs were "swapped out" in-flight? And in both cases:
 - even if the servicedirs are on the same filesystem, mounted (again) under new_root (fstat inodes are the same)?
 - or even when the new servicedirs are a completely different set of dirs?

Is this behaviour undefined?

eto
Re: How to supervise an early process
On 19/06/2016 18:02, Steve Litt wrote:
> A big objection to most supervision type init systems is that for a
> given process you must choose between early, like run from the rc
> script(s) preceding running of the supervisor, and respawning
> supervision. I just thought of a theoretical hack to have both.

Lots of people think of hacks. But hacks are a problem, not a solution; they're the very problems that supervision was made to solve.

The non-hackish solution is to have the supervisor start very early, before any service; then you don't have to make that choice, because every longrun can be supervised. That's what s6-linux-init and nosh do. It's a solved problem.

--
Laurent
How to supervise an early process
Hi all,

A big objection to most supervision type init systems is that for a given process you must choose between early, like run from the rc script(s) preceding running of the supervisor, and respawning supervision. I just thought of a theoretical hack to have both.

Symlink to give the executable a new name:

ln -s myapp myapp_sym

Run myapp_sym as early as you want in the rc file. Heck, run it in the initramfs for all I care, and let it switch_root over. Then, in the run script for any supervision suite, do this:

===
#!/bin/sh
# If the early, unsupervised copy (started under its symlinked name) is
# still running, stop it. The [m] keeps grep from matching its own command line.
if ps ax | grep -q '[m]yapp_sym'; then
    killall myapp_sym
fi
# Hand over to the supervised instance.
exec myapp
===

Obviously, for some apps, you'll need to shut down a little more gracefully than killall, but whatever way you need to shut down, you just put it in the if statement or in a shellscript called from within the if statement.

This should work on daemontools, daemontools-encore, runit and s6. It might run on more, but those four are the only ones I've used.

One of the outstanding benefits of supervision suites is how malleable they are with a little imagination.

Thanks,

SteveT

Steve Litt
June 2016 featured book: Troubleshooting: Why Bother?
http://www.troubleshooters.com/twb