Re: degraded permanent mount option

Austin S. Hemmelgarn Tue, 30 Jan 2018 05:46:55 -0800

On 2018-01-29 16:54, waxhead wrote:

Austin S. Hemmelgarn wrote:
On 2018-01-29 12:58, Andrei Borzenkov wrote:
29.01.2018 14:24, Adam Borowski wrote:
...
So any event (the user's request) has already happened. An rc system, of which systemd is one, knows whether we have reached the "want root filesystem" or "want secondary filesystems" stage. Once you're there, you can issue the mount() call and let the kernel do the work.
It is a btrfs choice not to expose the compound device as a separate one (like every other device manager does).
Btrfs is not a device manager, it's a filesystem.
It is a btrfs drawback that it doesn't provide anything except this IOCTL with its logic.
How can it provide you with something it doesn't yet have? If you want the information, call mount(). And as others in this thread have mentioned, what, pray tell, would you want to know "would a mount succeed?" for if you don't want to mount?
It is a btrfs drawback that there is nothing to push assembly into an "OK, going degraded" state.
The way to do so is to timeout, then retry with -o degraded.
That's a possible way to solve it. This likely requires support from mount.btrfs (or btrfs.ko) to return a proper indication that the filesystem is incomplete, so the caller can decide whether to retry or to try a degraded mount.
We already do so in the accepted standard manner. If the mount fails because of a missing device, you get a very specific message in the kernel log about it, as is the case for most other common errors (for uncommon ones you usually just get a generic open_ctree error). This is really the only option too, as the mount() syscall (which the mount command calls) returns only 0 on success or -1 and an appropriate errno value on failure, and we can't exactly go about creating a half dozen new error numbers just for this (well, technically we could, but I very much doubt that they would be accepted upstream, which defeats the purpose).
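A minimal sketch of the "time out, then retry with -o degraded" approach, written in Python purely for illustration; it is not how systemd, mount.btrfs, or any other existing tool behaves. The helper names (libc_mount, mount_with_degraded_fallback), the 30-second timeout, and the 1-second poll interval are assumptions invented for this example. mount(2) is called through ctypes and needs root. Because mount(2) only reports 0, or -1 plus an errno, and the detail about a missing device goes to the kernel log, the sketch retries on any failure instead of trying to recognize a specific errno.

```python
import ctypes
import ctypes.util
import os
import time
from typing import Callable

# A mount attempt: (source, target, fstype, options) -> 0 on success, else errno.
MountFn = Callable[[str, str, str, str], int]


def libc_mount(source: str, target: str, fstype: str, options: str) -> int:
    """Call mount(2) through libc (needs root); return 0 or the errno value."""
    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    data = options.encode() if options else None  # None is passed as NULL
    rc = libc.mount(source.encode(), target.encode(), fstype.encode(),
                    ctypes.c_ulong(0), data)
    return 0 if rc == 0 else ctypes.get_errno()


def mount_with_degraded_fallback(
    source: str,
    target: str,
    timeout: float = 30.0,        # assumed value, not a btrfs default
    interval: float = 1.0,        # assumed poll interval
    allow_degraded: bool = True,  # policy knob: False means fail hard
    mount_fn: MountFn = libc_mount,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Retry a normal mount until the timeout, then try once with -o degraded."""
    deadline = clock() + timeout
    while True:
        # Let the kernel decide: the result of mount() is the authoritative answer.
        err = mount_fn(source, target, "btrfs", "")
        if err == 0:
            return "normal"
        if clock() >= deadline:
            break
        sleep(interval)  # devices may still be appearing
    if not allow_degraded:
        raise OSError(err, os.strerror(err))
    err = mount_fn(source, target, "btrfs", "degraded")
    if err == 0:
        return "degraded"
    raise OSError(err, os.strerror(err))
```

The clock, sleep, and mount_fn parameters exist only so the retry logic can be exercised without root or real devices; with the defaults it calls the real syscall.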
Or maybe mount.btrfs should implement this logic internally. This would really be the simplest way to make it acceptable to the other side, by not needing to accept anything :)
And it would also be another layering violation, which would require a proliferation of extra mount options to control the mount command itself and adjust the timeout handling.
This has been done before with mount.nfs, but for slightly different reasons (primarily to allow nested NFS mounts, since the local directory that the filesystem is being mounted on not being present is treated like a mount timeout), and it had near zero control. It works there because they push the complicated policy decisions to userspace (namely, there is no support for retrying with different options or trying a different server).
I just felt like commenting a bit on this from a regular user's point of view.
Remember that at some point BTRFS will probably be the default filesystem for the average penguin. BTRFS's big selling point is redundancy and a guarantee that whatever you write is the same as what you will read sometime later.
Many users will probably build their BTRFS system on a redundant array of storage devices. As long as there are sufficient (not necessarily all) storage devices present, they expect their system to come up and work. If the system is not able to come up in a fully operative state, it must at least be able to limp along until the issue is fixed.
Starting an argument about which init system is the most sane or most shiny is not helping. The truth is that systemd is not going away anytime soon, and one might as well try to become friends, if nothing else for the sake of having things working, which should be a common goal regardless of religion.

FWIW, I don't care that it's systemd in this case, I care that people are arguing for the forced use of a coding anti-pattern that ends up being covered as bad practice in first-year computer science courses (no, seriously, every professional programmer I've asked about this had time-of-check-time-of-use race conditions covered in one of their first-year CS classes) or the enforcement of an event-based model that really doesn't make any sense for this (OK, it makes a little sense for handling of devices reappearing, but systemd doesn't need to be involved in that beyond telling the kernel that the device reappeared, except that that's udev's job).
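To make the time-of-check-time-of-use point concrete, here is a toy Python model. Everything in it is hypothetical: FakeVolume, check_then_mount, and just_mount are invented names, and no real devices, ioctls, or kernel calls are involved. It shows that a readiness check which passes can be invalidated before the mount runs, so the caller must handle a failed mount anyway; the separate check adds a race window without removing any error handling.

```python
from typing import Callable, Set


class FakeVolume:
    """Toy model of a multi-device volume; not a real btrfs interface."""

    def __init__(self, required: Set[str], present: Set[str]) -> None:
        self.required = set(required)
        self.present = set(present)

    def ready(self) -> bool:
        # Stand-in for a "would a mount succeed?" query.
        return self.required <= self.present

    def mount(self) -> None:
        # The mount itself re-checks device presence at the moment it runs.
        if not self.ready():
            raise OSError("missing device")


def check_then_mount(vol: FakeVolume, meanwhile: Callable[[], None]) -> str:
    if not vol.ready():  # time of check
        return "not ready"
    meanwhile()          # a device can vanish in this window
    vol.mount()          # time of use: may still raise despite the check
    return "mounted"


def just_mount(vol: FakeVolume) -> str:
    # Attempt the operation and act on its own result; no separate check.
    try:
        vol.mount()
        return "mounted"
    except OSError:
        return "failed"
```

In check_then_mount the check succeeds, a device disappears in the window, and the mount still fails, so the error path is required regardless of the check.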

I personally think the degraded mount option is a mistake, as this assumes that a lightly degraded system is not able to work, which is false. If the system can mount to some working state, then it should mount regardless of whether it is fully operative or not. If the array is in a bad state, you need to learn about it by issuing a command or something. The same goes for an MD array (and yes, I am aware of the block layer vs filesystem thing here).

The problem with this is that right now, it is not safe to run a BTRFS volume degraded and writable, but for an even remotely usable system with pretty much any modern distro, you need your root filesystem to be writable (or you need to have jumped through the hoops to make sure /var and /tmp are writable even if / isn't).

Long-term, yes, I do think that such behavior should be an option (yes, specifically optional; there are people out there who, like me, would rather the system just doesn't boot so we know immediately something is wrong and can fix it then).
