Re: 64-btrfs.rules and degraded boot

Austin S. Hemmelgarn Thu, 07 Jul 2016 11:24:57 -0700

On 2016-07-07 12:52, Goffredo Baroncelli wrote:

On 2016-07-06 14:48, Austin S. Hemmelgarn wrote:

On 2016-07-06 08:39, Andrei Borzenkov wrote:

[....]


To be entirely honest, if it were me, I'd want systemd to
fsck off.  If the kernel mount(2) call succeeds, then the
filesystem was ready enough to mount, and if it doesn't, then
it wasn't, end of story.


How should user space know when to try mount? What user space
is supposed to do during boot if mount fails? Do you suggest

while true; do mount /dev/foo && exit 0; done

as part of startup sequence? And note that nowhere is systemd
involved so far.

Nowhere there, except if you have a filesystem in fstab (or a
mount unit, which I hate for other reasons that I will not go
into right now), and you mount it and systemd thinks the device
isn't ready, it unmounts it _immediately_.  In the case of boot,
it's because of systemd thinking the device isn't ready that you
can't mount degraded with a missing device.  In the case of the
root filesystem at least, the initramfs is expected to handle
this, and most of them do poll in some way, or have other methods
of determining this.  I occasionally have issues with it with
dracut without systemd, but that's due to a separate bug there
involving the device mapper.


How does this systemd bashing answer my question: how does user space
know when it can call mount at startup?

You mentioned that systemd wasn't involved, which is patently false
if it's being used as your init system, and I was admittedly mostly
responding to that.

Now, to answer the primary question which I forgot to answer:
Userspace doesn't.  Systemd doesn't either but assumes it does and
checks in a flawed way.  Dracut's polling loop assumes it does but
sometimes fails in a different way.  Right now, there is no way
other than calling mount to know for sure if the mount will succeed,
and that actually applies to a certain degree to any filesystem
(because any number of things that are outside of even the kernel's
control might happen while trying to mount the device).
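
Since attempting the mount is the only reliable test, a bounded retry is
about the best user space can do. A minimal sketch, assuming a hypothetical
try_mount callable that wraps the real mount call and returns True on
success; the function name, timeout, and interval are illustrative only:

```python
import time
from typing import Callable


def mount_with_retry(try_mount: Callable[[], bool],
                     timeout: float = 30.0,
                     interval: float = 1.0) -> bool:
    """Call try_mount() until it succeeds or `timeout` seconds have passed.

    try_mount is a hypothetical wrapper around the actual mount call;
    it must return True on success and False on failure.
    """
    deadline = time.monotonic() + timeout
    while True:
        if try_mount():
            return True          # the mount itself is the readiness test
        if time.monotonic() >= deadline:
            return False         # give up and let the caller report it
        time.sleep(interval)
```

Unlike the unbounded loop quoted earlier, this gives up after a deadline, so
a device that never appears produces an error instead of a hung boot.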


I think that there is no simple answer, and the answer may depend on context.
In the past, I made a prototype of a mount helper for btrfs [1]; the aim was to:

1) get rid of the current btrfs volume discovery (udev triggering btrfs dev
scan), which has a lot of strange corner cases (what happens when a device
disappears?)
2) create a place where we develop and define strategies to handle all (or
most) of the cases of [partial] failure of a [multi-device] btrfs filesystem

By default, my mount.btrfs waited for the devices needed by a filesystem, and
mounted in degraded mode if not all devices had appeared (depending on a
switch); if a timeout was reached, an error was returned.
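
The behaviour described above can be sketched roughly as follows. This is not
the actual mount.btrfs code: the scan callable, the parameter names, and the
reading that the degraded switch only applies once the timeout has expired
are all assumptions made for illustration.

```python
import time
from typing import Callable, Set


def wait_for_devices(scan: Callable[[], Set[str]],
                     expected: int,
                     timeout: float,
                     allow_degraded: bool,
                     interval: float = 0.5) -> str:
    """Wait for `expected` member devices and pick a mount mode.

    scan is a hypothetical callable returning the member device paths
    found so far.  Returns "normal" or "degraded"; raises TimeoutError
    if devices are still missing and degraded mounts were not allowed.
    """
    deadline = time.monotonic() + timeout
    found = scan()
    while len(found) < expected and time.monotonic() < deadline:
        time.sleep(interval)
        found = scan()           # rescan: devices may appear late
    if len(found) >= expected:
        return "normal"
    if allow_degraded and found:
        return "degraded"        # the caller would add -o degraded
    raise TimeoutError(f"only {len(found)} of {expected} devices appeared")
```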

It doesn't need any special udev rule, because it performs device discovery
using libuuid. I think that mounting a filesystem and handling all the
possible cases by relying on udev and the syntax of udev rules is more a
problem than a solution. Add to that the fact that udev and the udev rules
are developed in a different project, and the difficulties increase.

I think that BTRFS, because of its complexity and peculiarities, needs a
dedicated tool like a mount helper.

My mount.btrfs is not able to solve all the problems, but it might be a start
for handling the issues.

FWIW, I've pretty much always been of the opinion that the device
discovery belongs in a mount helper.  The auto-discovery from udev (and
more importantly, how the kernel handles being told about a device) is
much of the reason that it's so inherently dangerous to do block level
copies.  There's obviously no way that can be changed now without
breaking something, but that's on the really short list of things that I
personally feel are worth breaking to fix a particularly dangerous
pitfall.  The recent discovery that device ready state is write-once
when set just reinforces this in my opinion.


Here's how I would picture the ideal situation:

* A device is processed by udev.  It detects that it's part of a BTRFS
  array, updates blkid and whatever else in userspace with this info, and
  then stops without telling the kernel.
* The kernel tracks devices until the filesystem they are part of is
  unmounted, or a mount of that FS fails.
* When the user goes to mount a BTRFS filesystem, they use a mount
  helper.
  1. This helper queries udev/blkid/whatever to see which devices are
     part of an array.
  2. Once the helper determines which devices are potentially in the
     requested FS, it checks the following things to ensure array
     integrity:
     - Does each device report the same number of component devices for
       the array?
     - Does the reported number match the number of devices found?
     - If a mount by UUID is requested, do all the labels match on each
       device?
     - If a mount by LABEL is requested, do all the UUIDs match on each
       device?
     - If a mount by path is requested, do all the component devices
       reported by that device have matching LABEL _and_ UUID?
     - Are any of the devices found already in use by another mount?
  4. If any of the above checks fails, and the user has not specified
     an option to request a mount anyway, report the error and exit with
     non-zero status _before_ even talking to the kernel.
  5. If only the second check fails (the check verifying the number of
     devices found), and it fails because the number found is less than
     required for a non-degraded mount, ignore that check if and only if
     the user specified -o degraded.
  6. If any of the other checks fail, ignore them if and only if the
     user asks to ignore that specific check.
  7. Otherwise, notify the kernel about the devices and call mount(2).

* The mount helper parses its own set of special options similar to the
  bg/fg/retry options used by mount.nfs to allow for timeouts when
  mounting, as well as asynchronous mounts in the background.

* btrfs device scan becomes a no-op

* btrfs device ready uses the above logic minus step 7 to determine if a
  filesystem is probably ready.
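
The integrity checks and override rules in steps 2 through 6 can be sketched
as follows. This is only an illustration: the Member record, the check
names, and the idea that a blkid/udev query has already filled in these
fields are assumptions, not an existing btrfs-progs interface. The sketch
stops short of step 7, which is also roughly what a "ready" query would
evaluate.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Member:
    """What a hypothetical blkid/udev query reported for one device."""
    path: str
    fs_uuid: str
    label: str
    num_devices: int   # device count recorded on this member
    in_use: bool       # already part of another mount


def failed_checks(members: List[Member], by: str) -> Set[str]:
    """Run the step 2 checks; `by` is "uuid", "label" or "path"."""
    if not members:
        raise ValueError("no candidate devices found")
    failed: Set[str] = set()
    if len({m.num_devices for m in members}) != 1:
        failed.add("count_consistent")
    if any(m.num_devices != len(members) for m in members):
        failed.add("count_found")
    if by in ("uuid", "path") and len({m.label for m in members}) != 1:
        failed.add("label")
    if by in ("label", "path") and len({m.fs_uuid for m in members}) != 1:
        failed.add("uuid")
    if any(m.in_use for m in members):
        failed.add("in_use")
    return failed


def blocking_failures(members: List[Member], by: str,
                      degraded: bool = False,
                      ignore: Iterable[str] = ()) -> Set[str]:
    """Apply steps 4 to 6; an empty result means step 7 may proceed."""
    failed = failed_checks(members, by)
    # Step 5: too few devices is waived only by -o degraded.
    too_few = all(len(members) < m.num_devices for m in members)
    if degraded and too_few:
        failed.discard("count_found")
    # Step 6: any other check is waived only if named explicitly.
    return failed - set(ignore)
```

If blocking_failures() returns a non-empty set, the helper would report
those checks and exit non-zero without ever talking to the kernel.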

Such a situation would probably eliminate or at least reduce most of our
current issues with device discovery, and provide much better error
reporting and general flexibility.
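
As an illustration of the helper-specific options mentioned above, a helper
could split its -o string into options it handles itself and options it
passes on. The option names here are borrowed from mount.nfs; the function
name, the returned keys, and the choice to leave the meaning of retry=N to
the helper are assumptions.

```python
from typing import Dict, List, Optional, Tuple, Union


def split_options(opts: str) -> Tuple[Dict[str, Union[bool, Optional[int]]],
                                      List[str]]:
    """Separate helper-only options from those passed on to mount(2)."""
    helper: Dict[str, Union[bool, Optional[int]]] = {
        "background": False,   # fg is the default, as with mount.nfs
        "retry": None,         # no explicit retry limit given
    }
    passthrough: List[str] = []
    for opt in filter(None, opts.split(",")):
        if opt == "bg":
            helper["background"] = True
        elif opt == "fg":
            helper["background"] = False
        elif opt.startswith("retry="):
            helper["retry"] = int(opt[len("retry="):])
        else:
            passthrough.append(opt)   # e.g. degraded, noatime
    return helper, passthrough
```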
