On 2018-01-29 16:54, waxhead wrote:


Austin S. Hemmelgarn wrote:
On 2018-01-29 12:58, Andrei Borzenkov wrote:
29.01.2018 14:24, Adam Borowski wrote:
...

So any event (the user's request) has already happened.  An rc system, of which systemd is one, knows whether we reached the "want root filesystem" or "want secondary filesystems" stage.  Once you're there, you can issue the
mount() call and let the kernel do the work.

It is a btrfs choice not to expose a compound device as a separate one (like
every other device manager does)

Btrfs is not a device manager, it's a filesystem.

it is a btrfs drawback that it doesn't provide anything else except for this
IOCTL with its logic

How can it provide you with something it doesn't yet have?  If you want the information, call mount().  And as others in this thread have mentioned, what, pray tell, would you want to know "would a mount succeed?" for if you
don't want to mount?

it is a btrfs drawback that there is nothing to push assembling into "OK,
going degraded" state

The way to do so is to time out, then retry with -o degraded.
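
As a rough illustration of "time out, then retry with -o degraded", here is a minimal Python sketch. It is not code from systemd, mount.btrfs, or any other project: `mount_with_degraded_fallback` and the `try_mount` callable are hypothetical names, and `try_mount(options)` is assumed to wrap the real mount() call and return True on success.

```python
import time

def mount_with_degraded_fallback(try_mount, timeout=30.0, interval=1.0,
                                 clock=time.monotonic, sleep=time.sleep):
    """Retry a normal mount until the timeout expires, then try degraded.

    try_mount(options) is a hypothetical callable that wraps mount(2)
    and returns True on success, False on failure.
    """
    deadline = clock() + timeout
    while True:
        # Just attempt the mount; the kernel decides whether it can succeed.
        if try_mount(""):
            return "normal"
        if clock() >= deadline:
            break
        sleep(interval)
    # All devices never showed up in time: fall back to a degraded mount.
    if try_mount("degraded"):
        return "degraded"
    return "failed"
```

The timeout and retry interval are policy, which is why they are plain parameters of the caller here rather than something the filesystem decides.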


That's a possible way to solve it. This likely requires support from
mount.btrfs (or btrfs.ko) to return a proper indication that the filesystem is
incomplete so the caller can decide whether to retry or to try a degraded mount.
We already do so in the accepted standard manner.  If the mount fails because of a missing device, you get a very specific message in the kernel log about it, as is the case for most other common errors (for uncommon ones you usually just get a generic open_ctree error).  This is really the only option too, as the mount() syscall (which the mount command calls) returns only 0 on success or -1 and an appropriate errno value on failure, and we can't exactly go about creating a half dozen new error numbers just for this (well, technically we could, but I very much doubt that they would be accepted upstream, which defeats the purpose).
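
To make the point about the return value concrete, here is a minimal Python sketch that calls mount() through ctypes and reports what the caller actually gets back. It assumes Linux, `try_mount` is a hypothetical helper name, and any paths passed to it are up to the caller; nothing here is specific to btrfs.

```python
import ctypes
import os

# Load the C library with errno capture enabled (assumes Linux).
_libc = ctypes.CDLL(None, use_errno=True)
_libc.mount.argtypes = [ctypes.c_char_p, ctypes.c_char_p, ctypes.c_char_p,
                        ctypes.c_ulong, ctypes.c_char_p]
_libc.mount.restype = ctypes.c_int

def try_mount(source, target, fstype, options=""):
    """Hypothetical helper: call mount(2), return (ok, errno, message)."""
    ctypes.set_errno(0)
    rc = _libc.mount(source.encode(), target.encode(), fstype.encode(),
                     0, options.encode())
    if rc == 0:
        return True, 0, "mounted"
    # On failure the syscall wrapper returns -1; errno is all we get.
    err = ctypes.get_errno()
    return False, err, os.strerror(err)
```

All the caller sees on failure is a single errno value and its generic description; anything more specific has to come from the kernel log.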

Or maybe mount.btrfs should implement this logic internally. This would
really be the simplest way to make it acceptable to the other side by
not needing to accept anything :)
And would also be another layering violation which would require a proliferation of extra mount options to control the mount command itself and adjust the timeout handling.

This has been done before with mount.nfs, but for slightly different reasons (primarily to allow nested NFS mounts, since the local directory that the filesystem is being mounted on not being present is treated like a mount timeout), and it had near zero control.  It works there because they push the complicated policy decisions to userspace (namely, there is no support for retrying with different options or trying a different server).

I just felt like commenting a bit on this from a regular user's point of view.

Remember that at some point BTRFS will probably be the default filesystem for the average penguin. BTRFS's big selling point is redundancy and a guarantee that whatever you write is the same as what you will read sometime later.

Many users will probably build their BTRFS system on a redundant array of storage devices. As long as there are sufficient (not necessarily all) storage devices present they expect their system to come up and work. If the system is not able to come up in a fully operative state it must at least be able to limp until the issue is fixed.

Starting an argument about which init system is the most sane or most shiny is not helping. The truth is that systemd is not going away anytime soon, and one might as well try to become friends, if nothing else for the sake of having things working, which should be a common goal regardless of the religion.
FWIW, I don't care that it's systemd in this case, I care that people are arguing for the forced use of a coding anti-pattern that ends up being covered as bad practice in first year computer science courses (no, seriously, every professional programmer I've asked about this had time-of-check-time-of-use race conditions covered in one of their first-year CS classes) or the enforcement of an event-based model that really doesn't make any sense for this (OK, it makes a little sense for handling of devices reappearing, but systemd doesn't need to be involved in that beyond telling the kernel that the device reappeared, except that that's udev's job).
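
For anyone who hasn't run into the term, here is a minimal Python sketch of the time-of-check-time-of-use race I mean. `FakeVolume`, `check_then_mount`, and `just_mount` are hypothetical names made up for illustration; this is a toy model, not how btrfs or systemd are implemented.

```python
class FakeVolume:
    """Hypothetical stand-in for a multi-device volume (not a btrfs API)."""

    def __init__(self, present, required):
        self.present = present    # devices currently visible
        self.required = required  # devices needed for a normal mount

    def ready(self):
        # The "would a mount succeed?" style check.
        return self.present >= self.required

    def mount(self):
        if self.present < self.required:
            raise OSError("mount failed: device missing")
        return "mounted"


def check_then_mount(vol, between=lambda: None):
    # Racy pattern: the answer from ready() can be stale by the time
    # mount() runs, so the caller needs error handling anyway.
    if vol.ready():
        between()  # window in which a device may disappear
        return vol.mount()
    return "not ready"


def just_mount(vol):
    # Race-free pattern: attempt the operation and handle the failure.
    try:
        return vol.mount()
    except OSError:
        return "retry later"
```

If `between` makes a device disappear, `check_then_mount` still fails even though the check passed, so the check bought nothing that handling the mount failure doesn't already give you.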

I personally think the degraded mount option is a mistake, as it assumes that a lightly degraded system is not able to work, which is false. If the system can mount to some working state then it should mount regardless of whether it is fully operative or not. If the array is in a bad state you need to learn about it by issuing a command or something. The same goes for an MD array (and yes, I am aware of the block layer vs filesystem thing here).
The problem with this is that right now, it is not safe to run a BTRFS volume degraded and writable, but for an even remotely usable system with pretty much any modern distro, you need your root filesystem to be writable (or you need to have jumped through the hoops to make sure /var and /tmp are writable even if / isn't).

Long-term, yes, I do think that such behavior should be an option (yes, specifically optional; there are people out there who, like me, would rather the system just not boot so we know immediately that something is wrong and can fix it then).
