On Mon, Oct 05, 2015 at 08:30:17AM -0400, Austin S Hemmelgarn wrote:
> I've been having issues recently with a relatively simple setup using
> a two-device BTRFS raid1 on top of two two-device md RAID0s, and every
> time I've rebooted since I started trying to use this particular
> filesystem, I've found it unable to mount and had to recreate it from
> scratch. This is more of an inconvenience than anything else (while I
> don't have backups of it, all the data is trivial to recreate; in
> fact, so trivial that doing backups would be more effort than just
> recreating the data by hand), but it's still something that I would
> like to try to fix.
> 
> First off, general info:
> Kernel version: 4.2.1-local+ (4.2.1 with minor modifications, sources
> can be found here: https://github.com/ferroin/linux)
> Btrfs-progs version: 4.2
> 
> I would post output from btrfs fi show, but that's spouting obviously
> wrong data (it says I'm using only 127MB with 2GB of allocations on
> each 'disk', while I had been storing approximately 4-6GB of actual
> data on the filesystem).
> 
> This particular filesystem is composed of BTRFS raid1 across two
> LVM-managed DM/MD RAID0 devices, each of which spans two physical
> hard drives. I have a couple of other filesystems with exactly the
> same configuration that have never displayed this issue.
> 
> When I run 'btrfs check' on the filesystem when it refuses to mount,
> I get a number of lines like the following:
> bad metadata [<bytenr>, <bytenr>) crossing stripe boundary
> 
> followed eventually by:
> Errors found in extent allocation tree or chunk allocation
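For reference, as I understand it that check in btrfs check flags any
metadata extent whose byte range starts in one 64KiB stripe and ends in
another, since scrub can't handle such blocks. A minimal sketch of the
idea in C (assuming the 64KiB stripe length; crosses_stripe and the
example extent here are hypothetical, not the actual fsck code):

    #include <stdint.h>
    #include <stdio.h>

    #define STRIPE_LEN 65536ULL  /* assumed 64KiB stripe length */

    /* Nonzero if [bytenr, bytenr + len) starts in one 64KiB stripe
     * and ends in another, i.e. crosses a stripe boundary. */
    static int crosses_stripe(uint64_t bytenr, uint64_t len)
    {
        return bytenr / STRIPE_LEN != (bytenr + len - 1) / STRIPE_LEN;
    }

    int main(void)
    {
        /* Hypothetical 16KiB metadata extent starting 56KiB into a
         * stripe: it ends 8KiB into the next stripe, so it's flagged. */
        uint64_t bytenr = 1048576 + 57344, len = 16384;

        if (crosses_stripe(bytenr, len))
            printf("bad metadata [%llu, %llu) crossing stripe boundary\n",
                   (unsigned long long)bytenr,
                   (unsigned long long)(bytenr + len));
        return 0;
    }

So the messages you're seeing mean the extents were laid out that way
when the filesystem was made, not that anything on disk has rotted
since.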
I _think_ this is a bug in mkfs from 4.2.0, fixed in later releases of
the btrfs-progs.

Hugo.

> As is typical of a failed mount, dmesg shows a 'failed to read the
> system array on <device>' followed by 'open_ctree failed'.
> 
> I doubt that this is a hardware issue because:
> 1. The memory is brand new, and I ran a 48-hour burn-in test that
> showed no errors.
> 2. A failing storage controller, PSU, or CPU would be manifesting
> with many more issues than just this.
> 3. A disk failure would mean that two different disks, from different
> manufacturing lots, are encountering errors on exactly the same LBAs
> at exactly the same time, which, while possible, is astronomically
> unlikely for disks bigger than a few hundred gigabytes (the disks in
> question are 1TB each).

-- 
Hugo Mills             | Jazz is the sort of music where no-one plays
hugo@... carfax.org.uk | anything the same way once.
http://carfax.org.uk/  |
PGP: E2AB1DE4          |