On 2018-01-29 16:54, waxhead wrote:
Austin S. Hemmelgarn wrote:
On 2018-01-29 12:58, Andrei Borzenkov wrote:
29.01.2018 14:24, Adam Borowski wrote:
...
So any event (the user's request) has already happened. An rc system, of
which systemd is one, knows whether we reached the "want root filesystem" or
"want secondary filesystems" stage. Once you're there, you can issue the
mount() call and let the kernel do the work.
It is a btrfs choice not to expose the compound device as a separate one
(like every other device manager does)
Btrfs is not a device manager, it's a filesystem.
it is a btrfs drawback that it doesn't provide anything except this
IOCTL with its logic
How can it provide you with something it doesn't yet have? If you want the
information, call mount(). And as others in this thread have mentioned,
what, pray tell, would you want to know "would a mount succeed?" for if you
don't want to mount?
it is a btrfs drawback that there is nothing to push assembling into "OK,
going degraded" state
The way to do so is to time out, then retry with -o degraded.
That's a possible way to solve it. This likely requires support from
mount.btrfs (or btrfs.ko) to return a proper indication that the filesystem
is incomplete, so the caller can decide whether to retry or to try a
degraded mount.
We already do so in the accepted standard manner. If the mount fails
because of a missing device, you get a very specific message in the
kernel log about it, as is the case for most other common errors (for
uncommon ones you usually just get a generic open_ctree error). This
is really the only option too, as the mount() syscall (which the mount
command calls) returns only 0 on success or -1 and an appropriate
errno value on failure, and we can't exactly go about creating a half
dozen new error numbers just for this (well, technically we could, but
I very much doubt that they would be accepted upstream, which defeats
the purpose).
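
As a rough illustration of the "time out, then retry with -o degraded"
approach discussed above, here is a minimal Python sketch. Everything in it
is an assumption made for illustration: the try_mount callable, the function
name, and the timeout values are hypothetical, and nothing here reflects how
systemd, mount.btrfs, or any other existing tool behaves. Because mount()
only reports failure through errno, the sketch retries on any error instead
of trying to detect a missing device.

```python
import os
import time


def mount_with_degraded_fallback(try_mount, timeout_s=30.0, interval_s=1.0,
                                 clock=time.monotonic, sleep=time.sleep):
    """Retry a normal mount until a deadline, then try once with 'degraded'.

    try_mount is a hypothetical callable taking an options string and
    returning 0 on success or a positive errno value on failure. How it
    actually invokes mount() is left to the caller.
    """
    deadline = clock() + timeout_s
    while True:
        # Normal attempt: succeeds once all member devices have shown up.
        if try_mount("") == 0:
            return "normal"
        if clock() >= deadline:
            break
        sleep(interval_s)

    # Deadline passed: one last attempt, explicitly allowing a degraded mount.
    err = try_mount("degraded")
    if err == 0:
        return "degraded"
    raise OSError(err, os.strerror(err))
```

The policy (how long to wait, whether degraded is acceptable at all) stays
with the caller, which is the point being argued: the kernel only has to
answer the mount() call itself.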
Or maybe mount.btrfs should implement this logic internally. This would
really be the simplest way to make it acceptable to the other side by
not needing to accept anything :)
And would also be another layering violation which would require a
proliferation of extra mount options to control the mount command
itself and adjust the timeout handling.
This has been done before with mount.nfs, but for slightly different
reasons (primarily to allow nested NFS mounts, since a missing local
mount-point directory is treated like a mount timeout), and it had near
zero control. It works
there because they push the complicated policy decisions to userspace
(namely, there is no support for retrying with different options or
trying a different server).
I just felt like commenting a bit on this from a regular user's point of
view.
Remember that at some point BTRFS will probably be the default
filesystem for the average penguin.
BTRFS's big selling point is redundancy and a guarantee that whatever you
write is the same as what you will read sometime later.
Many users will probably build their BTRFS system on a redundant array
of storage devices. As long as there are sufficient (not necessarily
all) storage devices present they expect their system to come up and
work. If the system is not able to come up in a fully operative state it
must at least be able to limp until the issue is fixed.
Starting an argument about which init system is the most sane or most
shiny is not helping. The truth is that systemd is not going away
anytime soon, and one might as well try to become friends, if for nothing
else than for the sake of having things work, which should be a common goal
regardless of religion.
FWIW, I don't care that it's systemd in this case, I care that people
are arguing for the forced use of a coding anti-pattern that ends up
being covered as bad practice in first year computer science courses
(no, seriously, every professional programmer I've asked about this had
time-of-check-time-of-use race conditions covered in one of their
first-year CS classes) or the enforcement of an event-based model that
really doesn't make any sense for this (OK, it makes a little sense for
handling of devices reappearing, but systemd doesn't need to be involved
in that beyond telling the kernel that the device reappeared, except
that that's udev's job).
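
To make the time-of-check-time-of-use point concrete, here is a minimal
Python sketch. It uses plain files rather than block devices, and both
function names are hypothetical; it only illustrates the general pattern,
not anything btrfs or systemd actually does. The first function asks "would
this succeed?" and then acts on the answer, which can be stale by the time
it is used. The second just attempts the operation and handles the failure,
which is the analogue of calling mount() and checking the result.

```python
import os


def read_checked(path):
    """Anti-pattern: check first, then act on the result of the check."""
    if os.path.exists(path):
        # Race window: the file can disappear between the check above and
        # the open() below, so this can still raise FileNotFoundError.
        with open(path) as f:
            return f.read()
    return None


def read_attempted(path):
    """Attempt the operation and treat failure as the answer."""
    try:
        with open(path) as f:
            return f.read()
    except FileNotFoundError:
        return None
```

Both return the same results when nothing changes underneath them; only the
second one is still correct when something does.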
I personally think the degraded mount option is a mistake, as it
assumes that a lightly degraded system is not able to work, which is false.
If the system can mount to some working state then it should mount
regardless of whether it is fully operative or not. If the array is in a bad
state you need to learn about it by issuing a command or something. The
same goes for an MD array (and yes, I am aware of the block layer vs
filesystem thing here).
The problem with this is that right now, it is not safe to run a BTRFS
volume degraded and writable, but for an even remotely usable system
with pretty much any modern distro, you need your root filesystem to be
writable (or you need to have jumped through the hoops to make sure /var
and /tmp are writable even if / isn't).
Long-term, yes, I do think that such behavior should be an option (yes,
specifically optional; there are people out there who, like me, would
rather the system just not boot so we know immediately that something is
wrong and can fix it then).