On Wed, Sep 16, 2015 at 5:56 PM, erp...@gmail.com <erp...@gmail.com> wrote:
> What I expected to happen:
> I expected that the system would either start as if nothing were
> wrong, or would warn me that one half of the mirror was missing and
> ask if I really wanted to start the system with the root array in a
> degraded state.

It's not this sophisticated yet. Btrfs does not "assemble" degraded by
default like mdadm and LVM based RAID do. You need to manually mount it
with -o degraded and then continue the boot process, or use the boot
parameter rootflags=degraded. (A rough example of both routes is at the
end of this mail.)

There is also some interaction between btrfs dev scan and udev (?) that
I don't understand precisely, but what happens is that when any device
is missing, the Btrfs volume UUID doesn't appear, so the volume still
can't be mounted degraded if it's referenced by UUID, e.g. with the
boot parameter root=UUID=<btrfsrootvolumeuuid>. That needs to be
changed to /dev/sdXY style notation, and you have to hope you guess the
right device.

> What actually happened:
> During the boot process, a kernel message appeared indicating that the
> "system array" could not be found for the root filesystem (as
> identified by a UUID). It then dumped me to an initramfs prompt.
> Powering down the system, reattaching the second disk, and powering it
> on allowed me to boot successfully. Running "btrfs fi df /" showed
> that all System data was stored as RAID1.

Just an FYI to be really careful about degraded rw mounts. There is no
automatic resync to catch up the previously missing device with the
device that was degraded,rw mounted. You have to scrub or balance (see
the example at the end); there's no optimization yet for Btrfs to
effectively just "diff" the devices' generations and get them back in
sync quickly.

Much worse is if you don't scrub or balance, and then redo the test
with the roles reversed so the other device is the missing one. Now you
have multiple devices that were rw,degraded mounted, and putting them
back together again will corrupt the whole file system irreparably.
Fixing the first problem would (almost always) avoid the second
problem.

> If I want to have a storage server where one of two drives can fail at
> any time without causing much down time, am I on the right track? If
> so, what should I try next to get the behavior I'm looking for?

It's totally not there yet if you want to obviate manual checks and
intervention for failure cases. Both mdadm and LVM integrated RAID have
monitoring and notification, which Btrfs lacks entirely. So that means
you have to check it yourself or create scripts to check it (a minimal
cron check is sketched at the end of this mail).

What often tends to happen is that Btrfs just keeps retrying rather
than ignoring a bad device, so you'll see piles of retries in dmesg.
But Btrfs doesn't kick out the bad device like the md driver would.
This could go on for hours, or days. So if you aren't checking for it,
you could unwittingly have a degraded array already.

--
Chris Murphy
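
To make the degraded boot workaround concrete, here's a rough sketch of
both routes. The device name /dev/sda2 and the /sysroot mount point are
just placeholders for whatever your system actually uses:

    # On the kernel command line (edit the entry in GRUB), point root=
    # at a device node instead of the volume UUID and ask for a
    # degraded mount:
    root=/dev/sda2 rootflags=degraded

    # Or, if you've already been dropped to the initramfs emergency
    # shell, mount the root volume by hand and then let the boot
    # continue:
    mount -o degraded /dev/sda2 /sysroot

Whether /sysroot is the right mount target depends on your initramfs
(dracut uses /sysroot), so adjust to match.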
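
For the catch-up step once the missing device is reattached and the
volume is mounted normally again (the mount point below is a
placeholder), a scrub is the usual way to get the stale copies
rewritten from the good device; a full balance also works but rewrites
far more than necessary:

    # -B stays in the foreground, -d reports per-device statistics
    btrfs scrub start -Bd /mnt

    # the heavier alternative
    btrfs balance start /mnt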
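
And since there's no built-in monitoring or notification, even
something as small as the following in cron is better than nothing.
This assumes btrfs-progs with the "device stats" subcommand and a
working local mail command; the mount point and recipient are
placeholders:

    #!/bin/sh
    # Mail a report if any btrfs error counter on $MNT is non-zero.
    MNT=/mnt
    ERRORS=$(btrfs device stats "$MNT" | grep -v ' 0$')
    if [ -n "$ERRORS" ]; then
        echo "$ERRORS" | mail -s "btrfs errors on $MNT" root
    fi

It won't notice a device that has gone missing entirely, but it will at
least surface accumulating error counters before you find out the hard
way.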