On Mon, Oct 05, 2015 at 08:30:17AM -0400, Austin S Hemmelgarn wrote:
> I've been having issues recently with a relatively simple setup using
> a two-device BTRFS raid1 on top of two two-device md RAID0s, and every
> time I've rebooted since I started trying to use this particular
> filesystem, I've found it unable to mount and had to recreate it from
> scratch. This is more of an inconvenience than anything else (while I
> don't have backups of it, all the data is trivial to recreate; in
> fact, so trivial that doing backups would be more effort than just
> recreating the data by hand), but it's still something that I would
> like to try to fix.
> 
> First off, general info:
> Kernel version: 4.2.1-local+ (4.2.1 with minor modifications, sources
> can be found here: https://github.com/ferroin/linux)
> Btrfs-progs version: 4.2
> 
> I would post output from btrfs fi show, but that's spouting obviously
> wrong data (it says I'm using only 127MB with 2GB of allocations on
> each 'disk', while I had been storing approximately 4-6GB of actual
> data on the filesystem).
> 
> This particular filesystem is composed of BTRFS raid1 across two
> LVM-managed DM/MD RAID0 devices, each of which spans two physical
> hard drives. I have a couple of other filesystems with exactly the
> same configuration that have never displayed this issue.
> 
> When I run 'btrfs check' on the filesystem when it refuses to mount,
> I get a number of lines like the following:
> bad metadata [<bytenr>, <bytenr>) crossing stripe boundary
> 
> followed eventually by:
> Errors found in extent allocation tree or chunk allocation
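For reference, as I understand it that check in btrfs check flags any
metadata extent whose byte range starts in one 64KiB stripe and ends in
another, since scrub can't handle such blocks. A minimal sketch of the
idea in C (assuming the 64KiB stripe length; crosses_stripe and the
example extent here are hypothetical, not the actual fsck code):

    #include <stdint.h>
    #include <stdio.h>

    #define STRIPE_LEN 65536ULL  /* assumed 64KiB stripe length */

    /* Nonzero if [bytenr, bytenr + len) starts in one 64KiB stripe
     * and ends in another, i.e. crosses a stripe boundary. */
    static int crosses_stripe(uint64_t bytenr, uint64_t len)
    {
        return bytenr / STRIPE_LEN != (bytenr + len - 1) / STRIPE_LEN;
    }

    int main(void)
    {
        /* Hypothetical 16KiB metadata extent starting 56KiB into a
         * stripe: it ends 8KiB into the next stripe, so it's flagged. */
        uint64_t bytenr = 1048576 + 57344, len = 16384;

        if (crosses_stripe(bytenr, len))
            printf("bad metadata [%llu, %llu) crossing stripe boundary\n",
                   (unsigned long long)bytenr,
                   (unsigned long long)(bytenr + len));
        return 0;
    }

So the messages you're seeing mean the extents were laid out that way
when the filesystem was made, not that anything on disk has rotted
since.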
I _think_ this is a bug in mkfs from 4.2.0, fixed in later releases of
the btrfs-progs.

Hugo.

> As is typical of a failed mount, dmesg shows a 'failed to read the
> system array on <device>' followed by 'open_ctree failed'.
> 
> I doubt that this is a hardware issue because:
> 1. The memory is brand new, and I ran a 48-hour burn-in test that
> showed no errors.
> 2. A failing storage controller, PSU, or CPU would be manifesting
> with many more issues than just this.
> 3. A disk failure would mean that two different disks, from different
> manufacturing lots, are encountering errors on exactly the same LBAs
> at exactly the same time, which, while possible, is astronomically
> unlikely for disks bigger than a few hundred gigabytes (the disks in
> question are 1TB each).

-- 
Hugo Mills             | Jazz is the sort of music where no-one plays
hugo@... carfax.org.uk | anything the same way once.
http://carfax.org.uk/  |
PGP: E2AB1DE4          |