On Sat, Feb 2, 2019 at 5:02 AM Hugo Mills <h...@carfax.org.uk> wrote:
>
> On Fri, Feb 01, 2019 at 11:28:27PM -0500, Alan Hardman wrote:
> > I have a Btrfs filesystem using 6 partitionless disks in RAID1 that's 
> > failing to mount. I've tried the common recommended safe check options, but 
> > I haven't gotten the disk to mount at all, even with -o ro,recovery. If 
> > necessary, I can try to use the recovery to another filesystem, but I have 
> > around 18 TB of data on the filesystem that won't mount, so I'd like to 
> > avoid that if there's some other way of recovering it.
> >
> > Versions:
> > btrfs-progs v4.19.1
> > Linux localhost 4.20.6-arch1-1-ARCH #1 SMP PREEMPT Thu Jan 31 08:22:01 UTC 
> > 2019 x86_64 GNU/Linux
> >
> > Based on my understanding of how RAID1 works with Btrfs, I would expect a 
> > single disk failure to not prevent the volume from mounting entirely, but 
> > I'm only seeing one disk with errors according to dmesg output, maybe I'm 
> > misinterpreting it:
> >
> > [  534.519437] BTRFS warning (device sdd): 'recovery' is deprecated, use 
> > 'usebackuproot' instead
> > [  534.519441] BTRFS info (device sdd): trying to use backup root at mount 
> > time
> > [  534.519443] BTRFS info (device sdd): disk space caching is enabled
> > [  534.519446] BTRFS info (device sdd): has skinny extents
> > [  536.306194] BTRFS info (device sdd): bdev /dev/sdc errs: wr 23038942, rd 
> > 22208378, flush 1, corrupt 29486730, gen 2933
> > [  556.126928] BTRFS critical (device sdd): corrupt leaf: root=2 
> > block=25540634836992 slot=45, unexpected item end, have 13882 expect 13898
>
>    It's worth noting that 13898-13882 = 16, which is a power of
> two. This means that you most likely have a single-bit error in your
> metadata. That, plus the checksum not being warned about, would
> strongly suggest that you have bad RAM. I would recommend that you
> check your RAM first before trying anything else that would write to
> your filesystem (including btrfs check --repair).

Good catch!
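
For anyone following along, here is a minimal Python sketch of that arithmetic. The two numbers come from the "corrupt leaf" line above; the helper name is_power_of_two is only illustrative. It assumes the reported item end is a sum of other fields (for example an offset plus a size), so a flipped bit in one of those fields changes the total by exactly a power of two, even though carries can make the two totals differ in more than one bit.

```python
def is_power_of_two(n: int) -> bool:
    """Return True when n has exactly one bit set."""
    return n > 0 and (n & (n - 1)) == 0


# Values taken from the "unexpected item end, have ... expect ..." message.
have, expect = 13882, 13898
delta = abs(expect - have)

# A power-of-two delta is what a single flipped bit in an addend would produce.
print("delta:", delta, "power of two:", is_power_of_two(delta))

# Bits that differ between the two totals themselves (carries can inflate this).
print("differing bits in totals:", bin(have ^ expect).count("1"))
```

The check is on the difference rather than on the XOR of the two totals, because the flip is assumed to sit in a value that feeds into the sum.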

I think bad RAM can account for the corrupt and generation errors. I'm
not sure memory errors can also explain the large number of read and
write errors, however, so there may be more than one problem.
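
To make that split concrete, here is a rough Python sketch that pulls the counters out of the "bdev /dev/sdc errs:" line quoted above (re-joined onto one line) and separates the I/O counters from the integrity counters. The function name parse_errs and the grouping into two buckets are assumptions for illustration, not anything btrfs itself reports.

```python
import re

# The dmesg line from the original report, unwrapped.
line = ("BTRFS info (device sdd): bdev /dev/sdc errs: wr 23038942, "
        "rd 22208378, flush 1, corrupt 29486730, gen 2933")


def parse_errs(text: str) -> dict:
    """Extract the name/value counters that follow 'errs:'."""
    _, _, tail = text.partition("errs:")
    return {name: int(value) for name, value in re.findall(r"(\w+) (\d+)", tail)}


counters = parse_errs(line)

# Write/read/flush failures are I/O errors reported for the device.
io_errors = counters["wr"] + counters["rd"] + counters["flush"]
# Corruption and generation mismatches are integrity errors in the data read back.
integrity_errors = counters["corrupt"] + counters["gen"]

print("I/O errors:", io_errors)
print("integrity errors:", integrity_errors)
```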


-- 
Chris Murphy
