On 2017-12-16 14:50, Dark Penguin wrote:
> Could someone please point me towards some reading about how btrfs handles
> multiple devices? Namely, kicking faulty devices and re-adding them.
> I've been using btrfs on single devices for a while, but now I want to
> start using it in raid1 mode. I booted into an Ubuntu 17.10 LiveCD and
> tried to see how it handles various situations. The experience left
> me very surprised; I've tried a number of things, all of which produced
> unexpected results.
Expounding a bit on Duncan's answer with some more specific info.
> I create a btrfs raid1 filesystem on two hard drives and mount it.
> - When I pull one of the drives out (simulating a simple cable failure,
> which happens pretty often to me), the filesystem sometimes goes
> read-only. ???
> - But only after a while, and not always. ???
The filesystem won't go read-only until it hits an I/O error, and on an
idle filesystem that only sees read access it's non-deterministic how
long that will take (if all the files being read are already in the page
cache, reads never touch the device, so nothing triggers the error).
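If you want to see when that transition actually happens, watching the
error counters and the kernel log is enough; a quick sketch (the mount
point is just an example):

    btrfs device stats /mnt       # per-device error counters
    dmesg | grep -i btrfs         # the I/O errors and the forced read-only switch should show up here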
> - When I fix the cable problem (plug the device back), it's immediately
> "re-added" back. But I see no replication of the data I've written onto
> a degraded filesystem... Nothing shows any problems, so "my filesystem
> must be ok". ???
One of two things happens in this case, and why there is no re-sync
depends on which one, but both ultimately come down to the fact that
BTRFS assumes I/O errors are at worst transient, not evidence that the
device has failed. Either:
1. The device reappears with the same name. This happens if the time it
was disconnected is less than the kernel's command timeout (30 seconds
by default). BTRFS may not even notice that the device was gone (and if
it doesn't, then a re-sync isn't necessary, since it will retry all the
writes it needs to). If it does notice, it assumes the I/O errors were
temporary and keeps using the device after logging the errors, in which
case you need to manually re-sync things by scrubbing the filesystem
(or balancing, but scrubbing is preferred as it runs faster and will
only re-write what is actually needed). See the commands sketched after
this list.
2. The device reappears with a different name. In this case, the device
was gone long enough that the block layer is certain it was
disconnected, and thus when it reappears and BTRFS still holds open
references to the old device node, it gets a new device node. In this
case, if the 'new' device is scanned, BTRFS will recognize it as part of
the FS, but will keep using the old device node. The correct fix here
is to unmount the filesystem, re-scan all devices, and then remount the
filesystem and manually re-sync with a scrub.
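To make that concrete, here is a rough sketch of both recovery paths
(the mount point and device name are just examples):

    # Case 1: device came back under the same name; just re-sync.
    btrfs scrub start -B /mnt          # -B runs the scrub in the foreground

    # Case 2: device came back under a new name; cycle the mount first.
    umount /mnt
    btrfs device scan                  # pick up the new device node
    mount /dev/sdb1 /mnt
    btrfs scrub start -B /mnt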
> - If I unmount the filesystem and then mount it back, I see all my
> recent changes lost (everything I wrote during the "degraded" period).
I'm not quite sure about this, but I think BTRFS is rolling back to the
last common generation number for some reason.
> - If I continue working with a degraded raid1 filesystem (even without
> damaging it further by re-adding the faulty device), after a while it
> won't mount at all, even with "-o degraded".
This is (probably) a known bug relating to chunk handling. In a two
device volume using a raid1 profile with a missing device, older kernels
(I don't remember when the fix went in, but I could have sworn it was in
4.13) will (erroneously) generate single-profile chunks when they need
to allocate new chunks. When you then go to mount the filesystem, the
check for whether the FS can be mounted degraded fails, because
single-profile chunks are present and they cannot tolerate the missing
device.
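If you run into this, it's easy to check whether stray single chunks
are the culprit once you can get the filesystem mounted writable again;
a rough sketch (the mount point is an example):

    btrfs filesystem df /mnt
    # look for 'Data, single' / 'Metadata, single' lines next to the RAID1
    # ones, then convert them back; 'soft' skips chunks that already have
    # the target profile
    btrfs balance start -dconvert=raid1,soft -mconvert=raid1,soft /mnt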
Now, even without that bug, it's never a good idea to run a storage
array degraded for any extended period of time, regardless of what type
of array it is (BTRFS, ZFS, MD, LVM, or even hardware RAID). By keeping
it in 'degraded' mode, you're essentially telling the system that the
array will be fixed in a reasonably short time-frame, which impacts how
it handles the array. If you're not going to fix it almost immediately,
you should almost always reshape the array to account for the missing
device if at all possible, as that will improve relative data safety and
generally get you better performance than running degraded will.
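As an illustration of what 'reshaping' means for a two-device raid1
with one member permanently gone (the device name and mount point are
examples, and this assumes no replacement disk is available; if you do
have one, `btrfs replace start` is usually the better route):

    mount -o degraded /dev/sda1 /mnt
    # convert to profiles a single device can satisfy, then drop the dead member
    btrfs balance start -dconvert=single -mconvert=dup /mnt
    btrfs device remove missing /mnt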
> I can't wrap my head around all this. Either the kicked device should not
> be re-added, or it should be re-added "properly", or it should at least
> show some errors and not pretend nothing happened, right?..
BTRFS is not the best at error reporting at the moment. If you check
the output of `btrfs device stats` for that filesystem though, it should
show non-zero values in the error counters (note that these counters
are cumulative, so they are counts since the last time they were reset,
or since the FS was created if they have never been reset). Similarly,
scrub should report errors, there should be error messages in the kernel
log, and switching the FS to read-only mode _is_ technically reporting
an error, as that's standard error behavior for most sensible
filesystems (ext[234] being the notable exceptions; they just continue
as if nothing happened).
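For reference, reading and resetting those counters looks like this
(the mount point is an example):

    btrfs device stats /mnt       # per-device write_io_errs, read_io_errs, flush_io_errs, corruption_errs, generation_errs
    btrfs device stats -z /mnt    # print the counters and then reset them to zero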
> I must be missing something. Is there an explanation somewhere about
> what's really going on during those situations? Also, do I understand
> correctly that upon detecting a faulty device (a write error), nothing
> is done about it except logging an error into the 'btrfs device stats'
> report? No device kicking, no notification?.. And what about degraded
> filesystems - is it absolutely forbidden to work with them without
> converting them to a "single" filesystem first?..
As mentioned above, going read-only _is_ a notification that something
is wrong. Translating that (and the error counter increase, and the
kernel log messages) into a user visible notification is not really the
job of BTRFS, especially considering that no other filesystem or device
manager does so either (yes, you can get nice notifications from LVM,
but they aren't _from_ LVM itself, they're from other software that
watches for errors, and the same type of software works just fine for
BTRFS too). If you're this worried about it and don't want to keep on
top of it yourself by monitoring things manually, you really need to
look into a tool like monit [1] that can handle this for you.
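For the monit route, a minimal sketch (the script path, program name,
and mount point are all made up for illustration) is a small wrapper
script plus a `check program` stanza:

    #!/bin/sh
    # hypothetical helper, e.g. saved as /usr/local/bin/check-btrfs-errors.sh
    # exit non-zero if any error counter on /mnt is non-zero
    btrfs device stats /mnt | awk '$2 != 0 { bad = 1 } END { exit bad }'

and in monitrc:

    check program btrfs-errors with path "/usr/local/bin/check-btrfs-errors.sh"
        if status != 0 then alert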
[1] https://mmonit.com/monit/