Re: How to supervise an early process [s6-svscan root pivot]

2016-06-22 Thread Martin "eto" Misuth
On Tue, 21 Jun 2016 13:36:38 -0400
Steve Litt  wrote:

> ... doesn't the bootloader config already contain the
> info to know which device, and isn't that device mounted before the
> initramfs does the switch_root or pivot_root? What do they need udevd
> for? They have the UUID, for gosh sakes.

On Linux bootloader usually loads ramdisk from /boot. On
FreeBSD it loads kernel and modules which reside in /boot/kernel and ramdisk is
usually not needed at all.

The thing is though, /boot is not exactly directly related to actual "cruise"
root in many cases. 

Take for example ZFS, where rootfs (/) can be any nested ZFS filesystem in any
ZFS pool present on the machine. Although it contains /boot inside, it's not
even partition in such setups. There might be even multiple(!) rootfses present
with only one being active.

There is no way for BIOS/UEFI to know about this. Moreover, firmware doesn't
even understand ZFS. Thus you need a kernel with drivers/modules loaded to access
the root.

Hopefully loader on FreeBSD understands enough of ZFS and pool properties
to find "rootfs", /boot/kernel in it and fire the machine up.

Some similar dance is necessary on Linux as well. There you can also have ext*
root and use ZFS for rest of the system only.

This is IMHO correct. I personally would rather see "bootloader/ramdisk" state
extended with features than see UEFI "inflated" with ZFS support (or any
other future fs for that matter). UEFI is big and complex enough as it is
already.

Same situation happens with convoluted md raid setups and such. 

This is unfortunate bootstrapping problem.

Regarding rising Linux ramdisk complexity, it seems to me like FreeBSD's reroot
way of preserving kernel state, but killing everything and respawning init,
removes the need to concern oneself with pivoting. One could just exec into new
s6-svscan instance from (finally located) actual rootfs at the end of the
ramdisk stage.

That way one can have supervision running both from ramdisk and during
"cruise" as well. Maybe it's an approach worth emulating in Linux ramdisks too?

I am not very experienced with Linux ramdisks. As user, I personally liked
archlinux's mkinitcpio more than dracut. But I just use what distribution gives
me. All these "frameworks" make heavy assumptions about how system is supposed
to work.

Maybe we need s6-mkramdisk at some point in the future?

  eto


Re: How to supervise an early process [root pivot]

2016-06-21 Thread Martin "eto" Misuth
On Tue, 21 Jun 2016 15:18:38 +
Charles Duffy  wrote:
> Couldn't one play with bind mounts to keep the absolute paths consistent on
> both sides of the pivot operation?

Well I decided to accept "don't do that" approach. Also bind mounts are Linux
specific. My main area of interest is s6 on my FreeBSD systems, but I try
to understand Linux side of things as well.

Moreover, I looked up the FreeBSD pivot_root-like thing once again.

It is apparently called "reroot". It seems already present on my testing
system. 

Info from reboot manpage implies that reroot kills even init:
  -r      The system kills all processes, unmounts all filesystems, mounts
          the new root filesystem, and begins the usual startup sequence.
          After changing vfs.root.mountfrom with kenv(8), reboot -r can be
          used to change the root filesystem while preserving kernel state.

This means Linux style pivot_root doesn't even happen, rerooted init gets 
clean state which seems much better even. 

Also although FreeBSD has nullfs mounts, those are not exactly same as Linux
bind and it seems it doesn't need (*)dev helper at all. It appears to me,
that /dev is always populated by kernel beforehand.

"Don't do that" makes sense under these conditions.

  eto


Re: How to supervise an early process [root pivot]

2016-06-21 Thread Laurent Bercot

On 21/06/2016 17:18, Charles Duffy wrote:

> Couldn't one play with bind mounts to keep the absolute paths consistent on
> both sides of the pivot operation?


 I guess you could, but really, not starting anything that relies on
absolute paths before pivot_rooting is by far the simplest solution.

--
 Laurent


Re: How to supervise an early process [root pivot]

2016-06-21 Thread Jan Bramkamp



On 21/06/16 16:24, Martin "eto" Misuth wrote:
> On Tue, 21 Jun 2016 14:45:59 +0200
> Laurent Bercot wrote:
>
> > ...
> >   With udevd, the workaround is to kill it after you have performed the
> > coldplug, and only restart it as part of your normal boot sequence once
> > you have pivot_rooted. It can be supervised at this point.
>
> Thank you! Especially for mdev coldplug process description!
>
> I asked, because it seems FreeBSD will be getting pivot_root like
> capabilities soon. This makes it more similar to Linux in a way. And opens
> some weekends for tinkering. It also introduces remote possibility of situation
> like described actually happening there too.


FreeBSD 10.3 (the latest release as of writing) includes rerooting 
support. By passing the rerooting flag to the reboot system call, the
userland can tell the kernel to start the usual shutdown (kill all 
processes including init, unmount all filesystems including "/") and 
after unmounting the root filesystem the kernel performs a "userland 
reboot" by mounting a new root filesystem and starting a new init process.


There are lots of use cases for this, e.g. configuring the in-kernel iSCSI
initiator from a small netboot image and switching to an iSCSI LUN as root
file system. Another example is fully disk-encrypted systems without a
trusted system console. In that case you can use a minimal unencrypted
system to unlock the encrypted disks and reroot into your encrypted devices.


Use kenv vfs.root.mountfrom="<fstype>:<device>" to set the filesystem
type and device path before you invoke "reboot -r".
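
The two steps above can be sketched as a small shell function. This is only an illustration for FreeBSD 10.3 or later: the function name reroot_to, the DRY_RUN switch and the example device ufs:/dev/ada0p2 are assumptions made here, not something taken from the thread.

```shell
#!/bin/sh
# Sketch: reroot a FreeBSD system onto another root filesystem.
# reroot_to and DRY_RUN are hypothetical names used for illustration.

reroot_to() {
    fstype=$1   # e.g. ufs or zfs
    device=$2   # e.g. /dev/ada0p2 or a ZFS dataset name

    if [ -z "$fstype" ] || [ -z "$device" ]; then
        echo "usage: reroot_to fstype device" >&2
        return 64
    fi

    if [ -n "$DRY_RUN" ]; then
        # Only print what would be executed.
        echo "kenv vfs.root.mountfrom=$fstype:$device"
        echo "reboot -r"
        return 0
    fi

    # Tell the kernel which filesystem to mount as the new root...
    kenv vfs.root.mountfrom="$fstype:$device" || return 1
    # ...then kill all processes, unmount everything and start a fresh init.
    reboot -r
}

# Example (prints the commands instead of running them):
#   DRY_RUN=1 reroot_to ufs /dev/ada0p2
```

With DRY_RUN set, the function only prints the kenv and reboot commands, which makes it easy to check the mountfrom string before committing to a reroot.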


Re: How to supervise an early process [root pivot]

2016-06-21 Thread Martin "eto" Misuth
On Tue, 21 Jun 2016 17:02:42 +0200
Laurent Bercot  wrote:

> ...
> 
>   The only sensible "protection" against pivot_root is: do your pivot_root
> very early when basically nothing is running, and start your supervision
> tree later on.
> 

Well I just thought accidental rm is kinda similar to remount, but now I see it is
not. Thank you, understood.

  eto


Re: How to supervise an early process [root pivot]

2016-06-21 Thread Charles Duffy
Couldn't one play with bind mounts to keep the absolute paths consistent on
both sides of the pivot operation?

On Tue, Jun 21, 2016 at 10:02 AM Laurent Bercot 
wrote:

> On 21/06/2016 16:24, Martin "eto" Misuth wrote:
> > Reinterpreting based on my personal experience, situation would be basically
> > similar to - "deleting" servicedirs from "underneath" running s6-svscan one (I
> > did that one to myself due to script error, don't ask):
>
>   No, it's not the same thing. When you pivot_root, everything is kept open,
> the inodes do not change, everything keeps working - except that the
> absolute paths to the files are not the same anymore. If you were referring
> to a service as /service/foo beforehand, it has to be referred to as
> /old_root_location/service/foo after a pivot_root.
>
>   If you used absolute paths to link servicedirs into your scandir, and you
> pivot_root, then s6-svscan will rightfully freak out on its next scan. But
> s6-supervise should keep working - the control interface hasn't disappeared,
> it is just named differently.
>
>
> > Would it be possible to somehow "posixly" lock control files in such way, that
> > remount/pivot_root/unlink would fail and one could not delete them without force
> > flag, indicating indeed sysadmin error?
>
>   No. Well, there are "extended attributes" that allow you to do that kind of
> thing, but I'm not sure to what extent those are portable. But they wouldn't
> protect you against pivot_root anyway, because no files are deleted or
> changed when you pivot_root, it's just a rotation in the directory tree.
> (Also, trying to protect admins against themselves is doomed to fail, and
> a sure recipe for bad design.)
>
>   The only sensible "protection" against pivot_root is: do your pivot_root
> very early when basically nothing is running, and start your supervision
> tree later on.
>
> --
>   Laurent
>
>


Re: How to supervise an early process [root pivot]

2016-06-21 Thread Martin "eto" Misuth
On Tue, 21 Jun 2016 14:45:59 +0200
Laurent Bercot  wrote:
> ...
>   With udevd, the workaround is to kill it after you have performed the
> coldplug, and only restart it as part of your normal boot sequence once
> you have pivot_rooted. It can be supervised at this point.
> 

Thank you! Especially for mdev coldplug process description!

I asked, because it seems FreeBSD will be getting pivot_root like
capabilities soon. This makes it more similar to Linux in a way. And opens
some weekends for tinkering. It also introduces remote possibility of situation
like described actually happening there too.

So I got curious what is "proper" solution to such broken "state". And it seems
answer is: don't do that!

Reinterpreting based on my personal experience, situation would be basically
similar to - "deleting" servicedirs from "underneath" running s6-svscan one (I
did that one to myself due to script error, don't ask):

 - When I did that - once tree was "wiped"/"cleaned" out, all
s6-svscan/s6-supervise special files got unlinked() and disappeared from
directory tree view. Although s6-svscan/s6-supervise were holding onto those
files, I was unable to control them with s6-svscanctl/s6-svc, as there were no
"control points" to "connect to" in the filesystem anymore. Process tree did
not dismantle though as unlinked() files are not deleted right away. 

Because both runit and s6 are so robust, in this case s6 just held onto
unlinked fds for days and it took me some time to figure this one out.
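
The effect described above is ordinary unlink semantics and can be reproduced in a few lines of shell. The sketch below uses a scratch file instead of real s6 control files; the function name demo_unlinked_fd and the file name "control" are made up for the illustration.

```shell
#!/bin/sh
# Sketch: an open file survives unlink(), but can no longer be found by name.
# demo_unlinked_fd is a hypothetical name; it works in a scratch directory.

demo_unlinked_fd() {
    dir=$(mktemp -d) || return 1
    echo "still here" > "$dir/control"

    # Hold the file open on descriptor 3, as a long-lived daemon would.
    exec 3< "$dir/control"

    # "Wipe" the directory from underneath the open descriptor.
    rm -rf "$dir"

    # The name is gone, so nothing new can connect through the path...
    if [ -e "$dir/control" ]; then
        echo "path: present"
    else
        echo "path: gone"
    fi

    # ...but the holder of the descriptor still reads the data.
    echo "fd 3: $(cat <&3)"
    exec 3<&-
}

demo_unlinked_fd
```

The process holding the descriptor keeps working, while any tool that needs to open the file by path has nothing left to open, which matches the "no control points" symptom.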

This makes me think, that situations like remounting servicedirs
root/pivot_root/unlink and such, although PEBKAC, leave some less experienced
sysadmin unable to control services, at least without signals (I can
attest signals always work perfectly, even in such "broken" cases). 

Would it be possible to somehow "posixly" lock control files in such way, that
remount/pivot_root/unlink would fail and one could not delete them without force
flag, indicating indeed sysadmin error?

 eto


Re: How to supervise an early process [s6-svscan root pivot]

2016-06-21 Thread Laurent Bercot

On 21/06/2016 14:00, Martin "eto" Misuth wrote:

> Let's say, one mounts some tmpfs fses, containing servicedirs, and fires up
> s6-svscan as one of first binaries (when booting from ramdisk) - what is
> expected behaviour of running instance of s6-svscan, when pivot_root happens?


 Heh, that's a good point.

 I wouldn't try it. The supervision tree itself would keep working, but it
would defeat all the normal assumptions that people make about it, e.g.
"service directories can be accessed via a reliable absolute path".
s6-rc would break horribly, but you'd be insane to run anything of the kind
before pivot_rooting.

 Generally speaking, you shouldn't run any long-lived process before
pivot_rooting or switch_rooting. The structure of the filesystem is too
important an assumption to be modified behind people's (or daemons') backs.
Fortunately, there's really no need to do that: the early initialization
that happens in an initramfs is oneshot-only, and your real "init" is
always run after the pivot_root happens; that's the moment when you can
spawn long-lived processes.

 There's obviously one exception: udevd. Some systems need it to coldplug
devices, in order to find the correct device to pivot_root on. The answer
here is that it's a design mistake of udevd (the n+1th one...) to not
provide a short-lived hotplug helper for this.

 With a program such as mdev, it's possible to find the correct device
without running a daemon:
 - register /sbin/mdev as a hotplug helper
 - run mdev -s (the coldplug scanner)
 - unregister the hotplug helper
 - your /dev is fully populated, you can pivot_root
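
The steps above can be sketched in shell roughly as follows. This assumes busybox mdev and a Linux kernel that still offers the uevent helper interface at /proc/sys/kernel/hotplug; the function name and the HOTPLUG_FILE and MDEV variables are illustrative and exist only so the sequence can be tried against scratch files.

```shell
#!/bin/sh
# Sketch: populate /dev with mdev, without leaving a daemon running.
# HOTPLUG_FILE and MDEV are overridable so the steps can be tried safely;
# the defaults assume uevent helper support in the kernel and busybox mdev.

: "${HOTPLUG_FILE:=/proc/sys/kernel/hotplug}"
: "${MDEV:=/sbin/mdev}"

coldplug_with_mdev() {
    # 1. Register mdev as the hotplug helper.
    echo "$MDEV" > "$HOTPLUG_FILE" || return 1

    # 2. Coldplug: scan /sys and create nodes for devices already present.
    "$MDEV" -s || return 1

    # 3. Unregister the helper so nothing refers to the initramfs copy.
    echo "" > "$HOTPLUG_FILE" || return 1

    # 4. /dev is now populated; the caller can mount the real root and pivot.
}
```

Nothing long-lived is left behind: the helper is only invoked per event while registered, and the scanner exits once the scan is done.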

 With udevd, the workaround is to kill it after you have performed the
coldplug, and only restart it as part of your normal boot sequence once
you have pivot_rooted. It can be supervised at this point.
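
A rough sketch of that workaround, assuming udevd and udevadm are available in the initramfs. The daemon's name and location differ between distributions (systemd-udevd, eudev's udevd, and so on), so UDEVD and UDEVADM are overridable variables introduced here for illustration, as is the function name.

```shell
#!/bin/sh
# Sketch: use udevd for coldplug only, then stop it before the pivot.
# UDEVD and UDEVADM are hypothetical knobs; adjust them to the binaries
# your initramfs actually ships.

: "${UDEVD:=udevd}"
: "${UDEVADM:=udevadm}"

coldplug_with_udevd() {
    # Start the daemon in the background, unsupervised and short-lived.
    "$UDEVD" --daemon || return 1

    # Replay "add" events for devices that already exist, then wait
    # until the event queue has been processed.
    "$UDEVADM" trigger --action=add || return 1
    "$UDEVADM" settle || return 1

    # Ask the daemon to exit so nothing long-lived survives the pivot.
    "$UDEVADM" control --exit
}
```

After the pivot, udevd can be started again as a normal supervised service.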

--
 Laurent



Re: How to supervise an early process [s6-svscan root pivot]

2016-06-21 Thread Martin "eto" Misuth
Let's say, one mounts some tmpfs fses, containing servicedirs, and fires up
s6-svscan as one of first binaries (when booting from ramdisk) - what is
expected behaviour of running instance of s6-svscan, when pivot_root happens?

Will it detect that servicedirs were "swapped out" in-flight?

And in both cases: 
- even if servicedirs are on same filesystem mounted (again) under new_root
  (fstat inodes are same)?
- or even when new servicedirs are completely different set of dirs?

Is this behaviour undefined?

  eto


Re: How to supervise an early process

2016-06-19 Thread Laurent Bercot

On 19/06/2016 18:02, Steve Litt wrote:

> A big objection to most supervision type init systems is that for a
> given process you must choose between early, like run from the rc
> script(s) preceding running of the supervisor, and respawning
> supervision.
>
> I just thought of a theoretical hack to have both.


 Lots of people think of hacks. But hacks are a problem, not a solution;
they're the very problems that supervision was made to solve.

 The non-hackish solution is to have the supervisor start very early,
before any service; then you don't have to make that choice, because
every longrun can be supervised.
 That's what s6-linux-init and nosh do. It's a solved problem.

--
 Laurent



How to supervise an early process

2016-06-19 Thread Steve Litt
Hi all,

A big objection to most supervision type init systems is that for a
given process you must choose between early, like run from the rc
script(s) preceding running of the supervisor, and respawning
supervision.

I just thought of a theoretical hack to have both. 

Symlink to give the executable a new name. 

ln -s myapp myapp_sym

Run myapp_sym as early as you want in the rc file. Heck, run it in
the initramfs for all I care, and let it switch_root over. Then, in the
run script for any supervision suite, do this:

===
#!/bin/sh
# [m] keeps grep from matching its own command line; -q hides the output.
if ps ax | grep -q '[m]yapp_sym'; then
  killall myapp_sym
fi

exec myapp
===

Obviously, for some apps, you'll need to shut down a little more
gracefully than killall, but whatever way you need to shut down, you
just put it in the if statement or in a shellscript called from
within the if statement. 
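
One possible shape for such a more graceful shutdown is sketched below: send SIGTERM, wait a bounded time, and only then fall back to SIGKILL. The helper name stop_gracefully and the timeout values are assumptions made for the example; they are not part of any supervision suite.

```shell
#!/bin/sh
# Sketch: stop a process politely, escalating only if it does not exit.
# stop_gracefully is a hypothetical helper, not part of any supervision suite.

stop_gracefully() {
    pid=$1
    timeout=${2:-5}    # seconds to wait after SIGTERM

    kill -TERM "$pid" 2>/dev/null || return 0    # already gone

    while [ "$timeout" -gt 0 ]; do
        kill -0 "$pid" 2>/dev/null || return 0   # exited cleanly
        sleep 1
        timeout=$((timeout - 1))
    done

    # Still alive after the grace period: force it.
    kill -KILL "$pid" 2>/dev/null
    return 0
}

# In a run script this could replace the bare killall:
#   for pid in $(pgrep -x myapp_sym); do stop_gracefully "$pid" 10; done
#   exec myapp
```

Waiting for the early instance to exit before the exec also avoids starting myapp while the old copy still holds its sockets or lock files.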

This should work on daemontools, daemontools-encore, runit and s6. It
might run on more, but those four are the only ones I've used.

One of the outstanding benefits of supervision suites is how malleable
they are with a little imagination.

Thanks,

SteveT

Steve Litt
June 2016 featured book: Troubleshooting: Why Bother?
http://www.troubleshooters.com/twb