Re: Issues with unmountable BTRFS raid1 filesystem

2015-10-05 Thread Hugo Mills
On Mon, Oct 05, 2015 at 08:30:17AM -0400, Austin S Hemmelgarn wrote:
> I've been having issues recently with a relatively simple setup
> using a two-device BTRFS raid1 on top of two two-device md RAID0
> arrays, and every time I've rebooted since I started using this
> particular filesystem, I've found it unable to mount and have had
> to recreate it from scratch.  This is more of an inconvenience than
> anything else (while I don't have backups of it, all the data is
> trivial to recreate; in fact, so trivial that keeping backups would
> be more effort than just recreating the data by hand), but it's
> still something that I would like to try and fix.
> 
> First off, general info:
> Kernel version: 4.2.1-local+ (4.2.1 with minor modifications,
> sources can be found here: https://github.com/ferroin/linux)
> Btrfs-progs version: 4.2
>
> I would post output from btrfs fi show, but that's spouting
> obviously wrong data (it's saying I'm using only 127MB with 2GB of
> allocations on each 'disk', whereas I had been storing approximately
> 4-6GB of actual data on the filesystem).
> 
> This particular filesystem is composed of BTRFS raid1 across two
> LVM-managed DM/MD RAID0 devices, each of which spans 2 physical hard
> drives.  I have a couple of other filesystems with the exact same
> configuration that have never displayed this issue.
> 
> When I run 'btrfs check' on the filesystem after it refuses to
> mount, I get a number of lines like the following:
> bad metadata [, ) crossing stripe boundary
> 
> followed eventually by:
> Errors found in extent allocation tree or chunk allocation

   I _think_ this is a bug in mkfs from btrfs-progs 4.2.0, fixed in
later releases.
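
   If that's what happened here, recreating the filesystem with a newer
mkfs.btrfs should avoid it; roughly something like the following (the
device names below are only placeholders for your two RAID0 legs):

      mkfs.btrfs --version    # confirm the installed progs are newer than 4.2.0
      mkfs.btrfs -f -m raid1 -d raid1 /dev/vg0/btr0 /dev/vg1/btr1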

   Hugo.

> As is typical of a failed mount, dmesg shows a 'failed to read the
> system array on ' message followed by 'open_ctree failed'.
> 
> I doubt that this is a hardware issue because:
> 1. The memory is brand new, and I ran a 48-hour burn-in test that
> showed no errors.
> 2. A failing storage controller, PSU, or CPU would manifest as many
> more issues than just this.
> 3. A disk failure would mean that two different disks, from
> different manufacturing lots, are encountering errors on exactly the
> same LBAs at exactly the same time, which, while possible, is
> astronomically unlikely for disks bigger than a few hundred
> gigabytes (the disks in question are 1TB each).
> 



-- 
Hugo Mills             | Jazz is the sort of music where no-one plays
hugo@... carfax.org.uk | anything the same way once.
http://carfax.org.uk/  |
PGP: E2AB1DE4          |




Issues with unmountable BTRFS raid1 filesystem

2015-10-05 Thread Austin S Hemmelgarn
I've been having issues recently with a relatively simple setup using a 
two-device BTRFS raid1 on top of two two-device md RAID0 arrays, and 
every time I've rebooted since I started using this particular 
filesystem, I've found it unable to mount and have had to recreate it 
from scratch.  This is more of an inconvenience than anything else 
(while I don't have backups of it, all the data is trivial to recreate; 
in fact, so trivial that keeping backups would be more effort than just 
recreating the data by hand), but it's still something that I would 
like to try and fix.


First off, general info:
Kernel version: 4.2.1-local+ (4.2.1 with minor modifications, sources 
can be found here: https://github.com/ferroin/linux)

Btrfs-progs version: 4.2

I would post output from btrfs fi show, but that's spouting obviously 
wrong data (it's saying I'm using only 127MB with 2GB of allocations on 
each 'disk', whereas I had been storing approximately 4-6GB of actual 
data on the filesystem).


This particular filesystem is composed of BTRFS raid1 across two 
LVM-managed DM/MD RAID0 devices, each of which spans 2 physical hard 
drives.  I have a couple of other filesystems with the exact same 
configuration that have never displayed this issue.
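
For concreteness, the kind of stack I mean looks roughly like this (the 
device, VG, and LV names are illustrative placeholders, and the exact 
LVM layout shown is only a sketch):

   # two 2-disk md RAID0 arrays, each used as an LVM PV
   mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/sda /dev/sdb
   mdadm --create /dev/md1 --level=0 --raid-devices=2 /dev/sdc /dev/sdd
   pvcreate /dev/md0 /dev/md1
   vgcreate vg0 /dev/md0
   vgcreate vg1 /dev/md1
   lvcreate -l 100%FREE -n btr0 vg0
   lvcreate -l 100%FREE -n btr1 vg1
   # BTRFS raid1 for both data and metadata across the two striped devices
   mkfs.btrfs -m raid1 -d raid1 /dev/vg0/btr0 /dev/vg1/btr1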


When I run 'btrfs check' on the filesystem after it refuses to mount, I 
get a number of lines like the following:

bad metadata [, ) crossing stripe boundary

followed eventually by:
Errors found in extent allocation tree or chunk allocation

As is typical of a failed mount, dmesg shows a 'failed to read the 
system array on ' message followed by 'open_ctree failed'.
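
In case it helps anyone reproduce the diagnostics, the commands I'm 
referring to are roughly these (the device path is a placeholder):

   btrfs check /dev/vg0/btr0    # read-only by default, no repairs attempted
   btrfs filesystem show        # per-device size and allocation summary
   dmesg | grep -i btrfs        # kernel messages from the failed mount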


I doubt that this is a hardware issue because:
1. The memory is brand new, and I ran a 48-hour burn-in test that showed 
no errors.
2. A failing storage controller, PSU, or CPU would manifest as many more 
issues than just this.
3. A disk failure would mean that two different disks, from different 
manufacturing lots, are encountering errors on exactly the same LBAs at 
exactly the same time, which, while possible, is astronomically unlikely 
for disks bigger than a few hundred gigabytes (the disks in question are 
1TB each).



