On 2016-07-07 12:52, Goffredo Baroncelli wrote:
On 2016-07-06 14:48, Austin S. Hemmelgarn wrote:
On 2016-07-06 08:39, Andrei Borzenkov wrote:
[....]

To be entirely honest, if it were me, I'd want systemd to
fsck off.  If the kernel mount(2) call succeeds, then the
filesystem was ready enough to mount, and if it doesn't, then
it wasn't, end of story.

How should user space know when to try mount? What user space
is supposed to do during boot if mount fails? Do you suggest

while true; do mount /dev/foo && exit 0; done

as part of startup sequence? And note that nowhere is systemd
involved so far.
Nowhere there, except if you have a filesystem in fstab (or a
mount unit, which I hate for other reasons that I will not go
into right now), and you mount it and systemd thinks the device
isn't ready, it unmounts it _immediately_.  In the case of boot,
it's because of systemd thinking the device isn't ready that you
can't mount degraded with a missing device.  In the case of the
root filesystem at least, the initramfs is expected to handle
this, and most of them do poll in some way, or have other methods
of determining this.  I occasionally have issues with it with
dracut without systemd, but that's due to a separate bug there
involving the device mapper.


How this systemd bashing answers my question - how user space knows
when it can call mount at startup?
You mentioned that systemd wasn't involved, which is patently false
if it's being used as your init system, and I was admittedly mostly
responding to that.

Now, to answer the primary question which I forgot to answer:
Userspace doesn't.  Systemd doesn't either but assumes it does and
checks in a flawed way.  Dracut's polling loop assumes it does but
sometimes fails in a different way.  There is no way other than
calling mount right now to know for sure if the mount will succeed,
and that actually applies to a certain degree to any filesystem
(because any number of things that are outside of even the kernel's
control might happen while trying to mount the device).
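
For illustration only: since calling mount is the one reliable test, the
unbounded loop quoted earlier can at least be given a limit, so that a
filesystem that never becomes mountable produces an error instead of a hang.
The function name retry_until and its arguments are made up for this sketch,
and /dev/foo and /mnt in the usage comment are placeholders.

```shell
# retry_until TRIES DELAY CMD [ARGS...]
# Hypothetical helper: run CMD until it succeeds, at most TRIES times,
# sleeping DELAY seconds between attempts.
# Returns 0 on the first success, 1 if every attempt failed.
retry_until() {
    tries=$1
    delay=$2
    shift 2
    while [ "$tries" -gt 0 ]; do
        "$@" && return 0
        tries=$((tries - 1))
        # Only sleep if another attempt will follow.
        if [ "$tries" -gt 0 ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# Example (placeholder device and mount point):
#   retry_until 30 1 mount /dev/foo /mnt || echo "giving up" >&2
```

This does not make the outcome any more predictable; it only bounds how long
userspace keeps asking.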

I think that there is no simple answer, and the answer may depend on context.
In the past, I made a prototype of a mount helper for btrfs [1]; the aim was to:

1) get rid of the current btrfs volume discovery (udev triggering btrfs dev
scan), which has a lot of strange corner cases (what happens when a device
disappears?)
2) create a place where we develop and define strategies to handle all (or
most) of the cases of [partial] failure of a [multi-device] btrfs filesystem

By default, my mount.btrfs waited for the devices needed by a filesystem, and
mounted in degraded mode if not all devices had appeared (depending on a
switch); if a timeout was reached, an error was returned.
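
One possible reading of that behaviour, written as a small decision function.
This is a sketch and not the actual mount.btrfs code: the name mount_action,
its arguments, and the four result words are invented here, and it assumes
that the degraded mount is attempted only once the timeout has expired.

```shell
# mount_action FOUND EXPECTED ELAPSED TIMEOUT ALLOW_DEGRADED
# Hypothetical decision function. Prints one of:
#   mount           all expected devices are present
#   wait            devices are missing, but the timeout has not expired
#   mount-degraded  timeout expired and the degraded switch is set
#   error           timeout expired and degraded mounts are not allowed
mount_action() {
    found=$1 expected=$2 elapsed=$3 timeout=$4 allow_degraded=$5
    if [ "$found" -ge "$expected" ]; then
        echo mount
    elif [ "$elapsed" -lt "$timeout" ]; then
        echo wait
    elif [ "$allow_degraded" -eq 1 ] && [ "$found" -gt 0 ]; then
        echo mount-degraded
    else
        echo error
    fi
}
```

A helper would call this once per polling interval, and act on anything other
than "wait".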

It doesn't need any special udev rule, because it performs a discovery of the
devices using libuuid. I think that mounting a filesystem and handling all the
possible cases by relying on udev and the syntax of the udev rules is more a
problem than a solution. Add to that the fact that udev and the udev rules are
developed in a different project, and the difficulties increase.

I think that BTRFS, because of its complexity and its peculiarities, needs a
dedicated tool like a mount helper.

My mount.btrfs is not able to solve all the problems, but it might be a start
for handling the issues.
FWIW, I've pretty much always been of the opinion that the device discovery belongs in a mount helper. The auto-discovery from udev (and more importantly, how the kernel handles being told about a device) is much of the reason that it's so inherently dangerous to do block level copies. There's obviously no way that can be changed now without breaking something, but that's on the really short list of things that I personally feel are worth breaking to fix a particularly dangerous pitfall. The recent discovery that device ready state is write-once when set just reinforces this in my opinion.

Here's how I would picture the ideal situation:
* A device is processed by udev. It detects that it's part of a BTRFS array, updates blkid and whatever else in userspace with this info, and then stops without telling the kernel.
* The kernel tracks devices until the filesystem they are part of is unmounted, or a mount of that FS fails.
* When the user goes to mount a BTRFS filesystem, they use a mount helper.
  1. This helper queries udev/blkid/whatever to see which devices are part of an array.
  2. Once the helper determines which devices are potentially in the requested FS, it checks the following things to ensure array integrity:
    - Does each device report the same number of component devices for the array?
    - Does the reported number match the number of devices found?
    - If a mount by UUID is requested, do all the labels match on each device?
    - If a mount by LABEL is requested, do all the UUIDs match on each device?
    - If a mount by path is requested, do all the component devices reported by that device have matching LABEL _and_ UUID?
    - Is any of the devices found already in-use by another mount?
  4. If any of the above checks fails, and the user has not specified an option to request a mount anyway, report the error and exit with non-zero status _before_ even talking to the kernel.
  5. If only the second check fails (the check verifying the number of devices found), and it fails because the number found is less than required for a non-degraded mount, ignore that check if and only if the user specified -o degraded.
  6. If any of the other checks fail, ignore them if and only if the user asks to ignore that specific check.
  7. Otherwise, notify the kernel about the devices and call mount(2).
* The mount helper parses its own set of special options similar to the bg/fg/retry options used by mount.nfs to allow for timeouts when mounting, as well as asynchronous mounts in the background.
* btrfs device scan becomes a no-op
* btrfs device ready uses the above logic minus step 7 to determine if a filesystem is probably ready.
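
A rough sketch of the checks in step 2 and the -o degraded exception in step 5, to make the idea concrete. Everything here is hypothetical: the function name check_array, the one-line-per-device input format ("PATH NUM_DEVICES UUID LABEL", with no whitespace inside a label), and the names of the reported failures. It is also simplified: it always requires both UUID and LABEL to match across devices (the mount-by-path case), and it leaves out the "already in use by another mount" check and the per-check overrides from step 6.

```shell
# check_array DEGRADED
# Hypothetical pre-mount check. Reads one line per discovered device on
# stdin, in the made-up format "PATH NUM_DEVICES UUID LABEL", and prints
# the name of each failed check. DEGRADED=1 models "-o degraded".
# Returns 0 if the mount may proceed, 1 otherwise.
check_array() {
    awk -v degraded="$1" '
        NR == 1 { num = $2; uuid = $3; label = $4 }
        $2 != num { bad_num = 1 }
        # Appending "" forces a string comparison for UUID and LABEL.
        ($3 "") != (uuid "")  { bad_uuid = 1 }
        ($4 "") != (label "") { bad_label = 1 }
        END {
            if (NR == 0)   { print "no-devices"; exit 1 }
            if (bad_num)   { print "device-count-disagrees"; fail = 1 }
            if (bad_uuid)  { print "uuid-mismatch"; fail = 1 }
            if (bad_label) { print "label-mismatch"; fail = 1 }
            # Too few devices is tolerated only with -o degraded;
            # too many devices is always an error.
            if (!bad_num && NR != num && !(degraded == 1 && NR < num)) {
                print "found-count-mismatch"; fail = 1
            }
            exit fail
        }
    '
}
```

For example, piping the single line "/dev/a 2 U1 L1" into check_array 0 reports found-count-mismatch and returns 1, while check_array 1 accepts the same input. Only after a zero return would the helper go on to tell the kernel about the devices and call mount(2).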

Such a situation would probably eliminate or at least reduce most of our current issues with device discovery, and provide much better error reporting and general flexibility.