Austin S. Hemmelgarn wrote:
On 2019-02-07 06:04, Stefan K wrote:
Thanks, with degraded as kernel parameter and also in the fstab it works as expected.

That should be the normal behaviour, because a server must be up and running, and I don't care about a device loss; that's why I use RAID1. The device-loss problem I can fix later, but it's important that the server is up and running. I get informed at boot time and in the log files that a device is missing, and I also see it if I use a monitoring program.
No, it shouldn't be the default, because:

* Normal desktop users _never_ look at the log files or boot info, and rarely run monitoring programs, so they as a general rule won't notice until it's already too late.  BTRFS isn't just a server filesystem, so it needs to be safe for regular users too.

I am willing to argue that whatever you refer to as normal users don't have a clue how to make a raid1 filesystem, nor do they care about what underlying filesystem their computer runs. I can't quite see how a limping system would be worse than a failing system in this case. Besides, "normal" desktop users use Windows anyway; people who run on penguin-powered stuff generally have at least some technical knowledge.

* It's easily possible to end up mounting degraded by accident if one of the constituent devices is slow to enumerate, and this can easily result in a split-brain scenario where all devices have diverged and the volume can only be repaired by recreating it from scratch.

Am I wrong, or would the remaining disk not have its generation number bumped on every commit? Would it not make sense to ignore (previously) stale disks and require a manual "re-add" of the failed disks? From a user's perspective with some C coding knowledge, this sounds (in principle) quite simple: if the superblock UUID matches for all devices and one (or more) devices has a lower generation number than the others, then the disk(s) with the newest generation number should be considered good and the disks with the lower generation number should be marked as failed.
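
To make the idea concrete, here is a rough userspace-style sketch. The struct and field names are made up for illustration; this is not the real btrfs kernel or on-disk code, just the comparison I have in mind:

/* Illustrative sketch only -- made-up structs, not actual btrfs code.
 * Assumes the devices have already been matched by filesystem UUID. */
#include <stdio.h>
#include <stdint.h>

struct dev_info {
    const char *name;       /* device node, e.g. "sda" */
    uint64_t    generation; /* superblock generation number */
    int         failed;     /* set if the device is considered stale */
};

/* Mark every device whose generation lags behind the newest one as failed,
 * so the volume can keep running on the up-to-date device(s). */
static void mark_stale_devices(struct dev_info *devs, int ndevs)
{
    uint64_t newest = 0;
    int i;

    for (i = 0; i < ndevs; i++)
        if (devs[i].generation > newest)
            newest = devs[i].generation;

    for (i = 0; i < ndevs; i++) {
        if (devs[i].generation < newest) {
            devs[i].failed = 1;
            printf("%s: generation %llu < %llu, marking as failed\n",
                   devs[i].name,
                   (unsigned long long)devs[i].generation,
                   (unsigned long long)newest);
        }
    }
}

int main(void)
{
    /* Two-device raid1 where sdb was absent for a while and fell behind. */
    struct dev_info devs[] = {
        { "sda", 1042, 0 },
        { "sdb",  987, 0 },
    };

    mark_stale_devices(devs, 2);
    return 0;
}

A real implementation would of course still have to deal with the split-brain case Austin describes above, where both halves have been mounted and written to independently; in that situation the generation numbers alone cannot tell you which copy to trust.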

* We have _ZERO_ automatic recovery from this situation.  This makes both of the above mentioned issues far more dangerous.

See above -- would this not be as simple as automatically dropping disks from the pool that have a matching UUID but a mismatching (lower) superblock generation number? Not exactly a recovery, but the system should be able to limp along.
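
For comparison, the manual "limp along now, fix later" path today is roughly the following (device names are only examples; the exact steps, in particular whether the final balance is needed, depend on the kernel and btrfs-progs versions):

# mount the surviving half read-write
mount -o degraded /dev/sda /mnt

# add a replacement disk and drop the dead one
btrfs device add /dev/sdc /mnt
btrfs device remove missing /mnt

# chunks written while degraded may have been created as 'single',
# so convert those back to raid1
btrfs balance start -dconvert=raid1,soft -mconvert=raid1,soft /mnt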

* It just plain does not work with most systemd setups, because systemd will hang waiting on all the devices to appear due to the fact that they refuse to acknowledge that the only way to correctly know if a BTRFS volume will mount is to just try and mount it.

As far as I have understood it, BTRFS refuses to mount without the degraded flag even in redundant setups. Why?! This is just plain useless. If anything, the degraded mount option should be replaced with something like failif=X, where X could range from 'never', which should get a 2-disk system with exclusively raid1 profiles up even if only one device is working, over 'always', which refuses the mount if any device has failed, to 'atrisk', which refuses when the loss of one more device would break the raid chunk profile guarantee. (This admittedly gets complex in a multi-disk raid1 setup, or when subvolumes can perhaps be mounted with different "raid" profiles....)
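
For illustration (failif= is purely hypothetical, degraded is what exists today), the difference in /etc/fstab would be something like:

# what exists today: with 'degraded' the mount proceeds even if a device is missing
UUID=<fsid>  /data  btrfs  defaults,degraded       0  0

# hypothetical: never refuse the mount because of missing devices
UUID=<fsid>  /data  btrfs  defaults,failif=never   0  0

For the root filesystem the same thing is done today by passing the option on the kernel command line (presumably rootflags=degraded), which is what makes the setup Stefan describes above work.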

* Given that new kernels still don't properly generate half-raid1 chunks when a device is missing in a two-device raid1 setup, there's a very real possibility that users will have trouble recovering filesystems with old recovery media (IOW, any recovery environment running a kernel before 4.14 will not mount the volume correctly).
Sometimes you have to break a few eggs to make an omelette, right? If people want to recover their data they should have backups, and if they are really interested in recovering their data (and don't have backups), they will probably find this on the web by searching anyway...

* You shouldn't be mounting writable and degraded for any reason other than fixing the volume (or converting it to a single profile until you can fix it), even aside from the other issues.

Well, in my opinion the degraded mount option is counter-intuitive. Unless asked otherwise, the system should mount and keep working as long as it can guarantee that data can be read and written somehow (regardless of whether any redundancy guarantee is met). If the user is willing to accept more or less risk, they should configure that!
