On Sat, Feb 2, 2019 at 5:02 AM Hugo Mills <h...@carfax.org.uk> wrote:
>
> On Fri, Feb 01, 2019 at 11:28:27PM -0500, Alan Hardman wrote:
> > I have a Btrfs filesystem using 6 partitionless disks in RAID1 that's
> > failing to mount. I've tried the common recommended safe check options, but
> > I haven't gotten the disk to mount at all, even with -o ro,recovery. If
> > necessary, I can try to use the recovery to another filesystem, but I have
> > around 18 TB of data on the filesystem that won't mount, so I'd like to
> > avoid that if there's some other way of recovering it.
> >
> > Versions:
> > btrfs-progs v4.19.1
> > Linux localhost 4.20.6-arch1-1-ARCH #1 SMP PREEMPT Thu Jan 31 08:22:01 UTC
> > 2019 x86_64 GNU/Linux
> >
> > Based on my understanding of how RAID1 works with Btrfs, I would expect a
> > single disk failure to not prevent the volume from mounting entirely, but
> > I'm only seeing one disk with errors according to dmesg output, maybe I'm
> > misinterpreting it:
> >
> > [ 534.519437] BTRFS warning (device sdd): 'recovery' is deprecated, use
> > 'usebackuproot' instead
> > [ 534.519441] BTRFS info (device sdd): trying to use backup root at mount
> > time
> > [ 534.519443] BTRFS info (device sdd): disk space caching is enabled
> > [ 534.519446] BTRFS info (device sdd): has skinny extents
> > [ 536.306194] BTRFS info (device sdd): bdev /dev/sdc errs: wr 23038942, rd
> > 22208378, flush 1, corrupt 29486730, gen 2933
> > [ 556.126928] BTRFS critical (device sdd): corrupt leaf: root=2
> > block=25540634836992 slot=45, unexpected item end, have 13882 expect 13898
>
> It's worth noting that 13898-13882 = 16, which is a power of
> two. This means that you most likely have a single-bit error in your
> metadata. That, plus the checksum not being warned about, would
> strongly suggest that you have bad RAM. I would recommend that you
> check your RAM first before trying anything else that would write to
> your filesystem (including btrfs check --repair).

Good catch! I think that can account for the corrupt and generation errors. I don't know that memory errors can account for the large number of read and write errors, however. So there may be more than one problem.

-- 
Chris Murphy
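
For reference, the power-of-two observation quoted above can be checked in a few lines of Python. This is only a sketch: the function name is made up for illustration, and the two numbers are the "have" and "expect" values from the corrupt leaf message. The two values themselves differ in three bit positions (their XOR is 0x70), so the sketch tests the arithmetic difference instead, on the assumption that the item end is a sum of stored fields and that a flipped bit in one of those fields would shift the sum by exactly a power of two.

```python
def single_bit_delta(have: int, expect: int):
    """Return the bit index k if |expect - have| == 2**k, else None.

    Hypothetical helper for illustration; not part of btrfs-progs.
    """
    diff = abs(expect - have)
    # A non-zero integer is a power of two iff it has exactly one bit set.
    if diff != 0 and diff & (diff - 1) == 0:
        return diff.bit_length() - 1
    return None


# Values taken from the "unexpected item end" line in the dmesg output.
have, expect = 13882, 13898

bit = single_bit_delta(have, expect)
differing_bits = bin(have ^ expect).count("1")

print(bit)             # 4, i.e. the difference is 2**4 = 16
print(differing_bits)  # 3, the raw values differ in three bit positions
```

A result of None would mean the difference is not a power of two, in which case a single flipped bit would be a less likely explanation.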