Kernel: 5.11.8 vanilla, btrfs-progs 5.11.1
I booted a box with a root btrfs raid1 across two devices, /dev/nvme0n1p2 (devid 2) and
/dev/sda2 (devid 3). During the initrd stage btrfs device scan was, for whatever reason,
unable to see the NVMe device, so after multiple retries the init script mounted the rootfs
degraded, as I had designed it to.
Once booted, the kernel was apparently able to see nvme0n1p2 again (with no intervention from
me), and btrfs device usage / btrfs filesystem show did not report any missing devices. btrfs
scrub reported that devid 2 was unwriteable, but the scrub completed successfully on devid 3
with no errors. Meanwhile, new data and metadata block groups were being created as single on
devid 3 only.
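In case it helps anyone else watching for this: the only visible hint was the profile column
in btrfs fi df output. A rough sketch of how that could be spotted automatically - the sample
output below is made up for illustration, real figures will differ:

```shell
# Hypothetical example of `btrfs fi df /` output while degraded;
# the numbers are invented for illustration.
sample='Data, RAID1: total=100.00GiB, used=80.00GiB
Data, single: total=8.00GiB, used=2.00GiB
Metadata, RAID1: total=2.00GiB, used=1.50GiB
Metadata, single: total=256.00MiB, used=100.00MiB
System, RAID1: total=32.00MiB, used=16.00KiB
GlobalReserve, single: total=512.00MiB, used=0.00B'

# Flag any block-group type whose profile is not the expected RAID1.
# GlobalReserve is always reported as single, so skip it.
printf '%s\n' "$sample" |
  awk -F'[, ]+' '$1 != "GlobalReserve" && $2 != "RAID1:" {
    sub(/:$/, "", $2); print "unexpected profile:", $1, $2 }'
```

In my case the degraded-created block groups showed up like the Data/Metadata single lines
above while the original block groups stayed RAID1.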
I balanced with -dconvert=single -mconvert=dup, which moved all block groups to devid 3 and
completed successfully; nothing remained on devid 2, so I removed that device from the
filesystem and re-added it as devid 4. Once I'd balanced the filesystem back with
-dconvert=raid1 -mconvert=raid1, everything was back to normal.
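For the archives, the recovery sequence I ran boils down to the following. This is a
reconstruction from memory rather than a copy of my shell history; the mountpoint / and the
devids are from my setup, so don't run it blindly elsewhere:

```shell
# 1. Convert everything off the flaky device: data to single, metadata to
#    dup, all landing on the remaining good device (devid 3).
btrfs balance start -dconvert=single -mconvert=dup /

# 2. Remove the now-empty device by devid, then re-add it; it comes back
#    under a new devid (4 in my case).
btrfs device remove 2 /
btrfs device add /dev/nvme0n1p2 /

# 3. Convert back to raid1 so all block groups are mirrored again.
btrfs balance start -dconvert=raid1 -mconvert=raid1 /
```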
My main observation was that the issue was very hard to notice. Yes, I'd purposefully mounted
degraded, but the btrfs tools gave no indication of why new block groups were only being
created as single on one device: nothing was marked as missing or unwriteable. Is this
behaviour expected? How can a device be unwriteable but not marked as missing?
And was my course of action the correct fix - is there a better way to re-sync a raid1 device
which has temporarily been removed?
(Afterwards I realised what caused the issue - missing libraries in the initrd - and I can
reproduce it if necessary.)
--
Steven Davies