Austin S. Hemmelgarn wrote:
On 2019-02-08 13:10, waxhead wrote:
Austin S. Hemmelgarn wrote:
On 2019-02-07 13:53, waxhead wrote:


Austin S. Hemmelgarn wrote:

So why does BTRFS hurry to mount itself even if devices are missing? And if BTRFS can still mount, why would it blindly accept a non-existing disk as part of the pool?!
It doesn't unless you tell it to, and that behavior is exactly what I'm arguing against making the default here.
Understood, but that is not quite what I meant - let me rephrase...
If BTRFS can still mount without a device, why would it blindly accept the previously missing disk back into the pool?! E.g. you have disks A+B, and suddenly at one boot B is not there. Now you have only A, and one would think that A should register that B has been missing. On the next boot you have A+B again, in which case A and B have likely diverged, since A has been mounted without B present. So even if both devices are present, why would btrfs blindly accept that both A+B are good to go, when it should be perfectly possible to record in A that B was gone? And if you have B without A it should be the same story, right?


Realistically, we can only safely recover from divergence correctly if we can prove that all devices are true prior states of the current highest generation, which is not currently possible to do reliably because of how BTRFS operates.

So what you are saying is that the generation number does not represent a true frozen state of the filesystem at that point?
It does _only_ for those devices which were present at the time of the commit that incremented it.

So in other words devices that are not present can easily be marked / defined as such at a later time?

As an example (don't do this with any BTRFS volume you care about, it will break it), take a BTRFS volume with two devices configured for raid1.  Mount the volume with only one of the devices present, issue a single write to it, then unmount it.  Now do the same with only the other device.  Both devices will now show the same generation number (one higher than when you started), but the generation number on each device refers to a different volume state.
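
Roughly, that sequence looks like this; the device paths and mount point are just examples, and you need some way of hiding the other device between mounts (e.g. detaching it from a VM), since btrfs will otherwise pick it up automatically:

    # Throwaway two-device raid1 volume -- this WILL leave it split-brained.
    mkfs.btrfs -d raid1 -m raid1 /dev/vdb /dev/vdc

    # With only /dev/vdb attached:
    mount -o degraded /dev/vdb /mnt
    touch /mnt/written-while-vdc-was-gone
    umount /mnt

    # Now with only /dev/vdc attached:
    mount -o degraded /dev/vdc /mnt
    touch /mnt/written-while-vdb-was-gone
    umount /mnt

    # Both superblocks now report the same generation number, but each one
    # describes a different volume state:
    btrfs inspect-internal dump-super /dev/vdb | grep '^generation'
    btrfs inspect-internal dump-super /dev/vdc | grep '^generation'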

Also, LVM and MD have the exact same issue, it's just not as significant because they re-add and re-sync missing devices automatically when they reappear, which makes such split-brain scenarios much less likely.
Which means marking the entire device as invalid and then re-adding it from scratch, more or less...
Actually, it doesn't.

For LVM and MD, they track what regions of the remaining device have changed, and sync only those regions when the missing device comes back.

For MD, yes, if you have the write-intent bitmap enabled...
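
For reference, the MD version of that looks roughly like this (array and device names are examples):

    # Add a write-intent bitmap so only dirtied regions need to resync:
    mdadm --grow /dev/md0 --bitmap=internal
    # After the dropped device comes back, re-add it; only the regions marked
    # dirty in the bitmap get rewritten:
    mdadm /dev/md0 --re-add /dev/sdb1
    cat /proc/mdstat   # shows the (partial) resync progress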

For BTRFS, the same thing happens implicitly because of the COW structure, and you can manually reproduce similar behavior to LVM or MD by scrubbing the volume and then using balance with the 'soft' filter to ensure all the chunks are the correct type.
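
In practice that looks roughly like the following (mount point is an example):

    # Rewrite any stale or missing copies from the good device(s):
    btrfs scrub start -Bd /mnt
    # Convert any chunks that were created with the wrong profile while the
    # volume was degraded; 'soft' skips chunks that are already raid1:
    btrfs balance start -dconvert=raid1,soft -mconvert=raid1,soft /mnt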

Understood.

Why does systemd concern itself with what devices a btrfs volume consists of? Please educate me, I am curious.
For the same reason that it concerns itself with what devices make up an LVM volume or an MD array.  In essence, it comes down to a couple of specific things:

* It is almost always preferable to delay boot-up while waiting for a missing device to reappear than it is to start using a volume that depends on it while it's missing.  The overall impact on the system from taking a few seconds longer to boot is generally less than the impact of having to resync the device when it reappears while the system is still booting up.

* Systemd allows mounts to not block the system booting while still allowing certain services to depend on those mounts being active.  This is extremely useful for remote management reasons, and is actually supported by most service managers these days.  Systemd extends this all the way down the storage stack though, which is even more useful, because it lets disk failures properly cascade up the stack, so the volumes they were part of show up as degraded (or get unmounted, if you choose to configure it that way).
Ok, I'm still not sure I understand how/why systemd knows what devices are part of a btrfs volume (or MD or LVM, for that matter). I'll try to research this a bit - thanks for the info!
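
For what it's worth, a rough way to see that wiring on a running system (the mount point /srv/data and the unit name derived from it are just examples):

    # systemd generates srv-data.mount from the /etc/fstab entry for /srv/data;
    # 'nofail' there keeps a missing device from blocking boot, while services
    # can still declare RequiresMountsFor=/srv/data so they wait for the mount.
    systemctl show srv-data.mount -p Requires -p After -p BindsTo
    # Which units depend on the mount (and would be stopped if it goes away):
    systemctl list-dependencies --reverse srv-data.mount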


IOW, there's a special case with systemd that can make even a degraded mount of a BTRFS volume with missing devices fail to work.
Well I use systemd on Debian and have not had that issue. In what situation does this fail?
At one point, if you tried to manually mount a volume that systemd did not see all the constituent devices present for, it would get unmounted almost instantly by systemd itself.  This may not be the case anymore, or it may just have been how the distros I've used with systemd happened to behave, but either way it's a pain in the arse when you want to fix a BTRFS volume.
I can see that, but from my "toying around" with btrfs I have not run into any issues while mounting degraded.
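
One quick way to check whether udev/systemd currently consider a multi-device volume complete (device path is an example):

    # Exits 0 only once the kernel has seen every member device of the volume
    # that /dev/sdb1 belongs to; this is essentially the same check the btrfs
    # udev rule performs before systemd treats the volume as ready.
    btrfs device ready /dev/sdb1 && echo "all devices present" || echo "still missing devices"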



* Given that new kernels still don't properly generate half-raid1 chunks when a device is missing in a two-device raid1 setup, there's a very real possibility that users will have trouble recovering filesystems with old recovery media (IOW, any recovery environment running a kernel before 4.14 will not mount the volume correctly).
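
A quick way to spot those wrongly-typed chunks on an affected volume (mount point is an example):

    # On a two-device raid1 volume that was written to while degraded, look for
    # 'single' data/metadata chunks mixed in with the RAID1 ones:
    btrfs filesystem usage /mnt | grep -E 'single|RAID1'
    # Any such chunks can be converted back with the 'soft'-filtered balance
    # mentioned earlier.
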
Sometimes you have to break a few eggs to make an omelette, right? If people want to recover their data they should have backups, and if they are really interested in recovering their data (and don't have backups), then they will probably find this on the web by searching anyway...
Backups aren't the type of recovery I'm talking about.  I'm talking about people booting to things like SystemRescueCD to fix system configuration or do offline maintenance without having to nuke the system and restore from backups.  Such recovery environments often don't get updated for a _long_ time, and such usage is not atypical as a first step in trying to fix a broken system in situations where downtime really is a serious issue.
I would say that if downtime is such a serious issue, you have a failover and a working, tested backup.
Generally yes, but restoring a volume completely from scratch is almost always going to take longer than just fixing what's broken unless it's _really_ broken.  Would you really want to nuke a system and rebuild it from scratch just because you accidentally pulled out the wrong disk when hot-swapping drives to rebuild an array?
Absolutely not, but in this case I would not even want to use a rescue disk in the first place.


* You shouldn't be mounting writable and degraded for any reason other than fixing the volume (or converting it to a single profile until you can fix it), even aside from the other issues.
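
If you do have to run on what is left for a while, the conversion is roughly as follows (paths are examples; -f may be required because this reduces metadata redundancy):

    # Drop to profiles that work on a single device (dup metadata keeps two
    # copies on the same disk), then drop the dead device from the volume:
    btrfs balance start -f -dconvert=single -mconvert=dup /mnt
    btrfs device remove missing /mnt
    # Later, once a replacement has been added, convert back to raid1:
    btrfs device add /dev/sdc /mnt
    btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt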

Well, in my opinion the degraded mount option is counter-intuitive. Unless asked otherwise, the system should mount and work as long as it can guarantee the data can be read and written somehow (regardless of whether any redundancy guarantee is met). If the user is willing to accept more or less risk, they should configure that!
Again, BTRFS mounting degraded is significantly riskier than LVM or MD doing the same thing.  Most users don't properly research things (when was the last time you did a complete cost/benefit analysis before deciding to use a particular piece of software on a system?), and would not know they were taking on significantly higher risk by using BTRFS without configuring it to behave safely, until it actually caused them problems, at which point most people would complain about the resulting data loss instead of trying to figure out why it happened and how to prevent it.  I don't know about you, but I for one would rather BTRFS have a reputation for being over-aggressively safe by default than risk users' data by default.
Well, I don't do cost/benefit analyses since I run free software. I do, however, try my best to ensure that whatever software I install doesn't cause more drawbacks than benefits.
Which is essentially a CBA.  The cost doesn't have to be money; it could be time, or even limitations on what you can do with the system.

I would also like BTRFS to be over-aggressively safe, but I also want it to be over-aggressively available - always running, or even limping along if that is what it needs to do.
And you can have it do that, we just prefer not to by default.
Got it!
