On 2015-10-05 09:14, Hugo Mills wrote:
On Mon, Oct 05, 2015 at 08:30:17AM -0400, Austin S Hemmelgarn wrote:
I've been having issues recently with a relatively simple setup using
a two-device BTRFS raid1 on top of two two-device MD RAID0's. Every
time I've rebooted since I started trying to use this particular
filesystem, I've found it unable to mount and have had to recreate it
from scratch.  This is more of an inconvenience than anything else
(while I don't have backups of it, all the data is trivial to
recreate; so trivial, in fact, that doing backups would be more
effort than just recreating the data by hand), but it's still
something I would like to try to fix.

First off, general info:
Kernel version: 4.2.1-local+ (4.2.1 with minor modifications,
sources can be found here: https://github.com/ferroin/linux)
Btrfs-progs version: 4.2

I would post output from btrfs fi show, but that's spouting obviously
wrong data (it's saying I'm using only 127MB with 2GB of allocations
on each 'disk', whereas I had been storing approximately 4-6GB of
actual data on the filesystem).

This particular filesystem is composed of BTRFS raid1 across two
LVM-managed DM/MD RAID0 devices, each of which spans 2 physical hard
drives.  I have a couple of other filesystems with the exact same
configuration that have never displayed this issue.
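
For reference, the layering is roughly equivalent to the following
(device, VG, and LV names are placeholders, and this is a sketch from
memory rather than the exact commands I originally ran):

  # two 2-disk MD RAID0 arrays
  mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/sda /dev/sdb
  mdadm --create /dev/md1 --level=0 --raid-devices=2 /dev/sdc /dev/sdd

  # one LVM volume group and logical volume per array
  pvcreate /dev/md0 /dev/md1
  vgcreate vg0 /dev/md0
  vgcreate vg1 /dev/md1
  lvcreate -l 100%FREE -n data vg0
  lvcreate -l 100%FREE -n data vg1

  # BTRFS raid1 spanning the two logical volumes
  mkfs.btrfs -d raid1 -m raid1 /dev/vg0/data /dev/vg1/data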

When I run 'btrfs check' on the filesystem when it refuses to mount,
I get a number of lines like the following:
bad metadata [<bytenr>, <bytenr>) crossing stripe boundary

followed eventually by:
Errors found in extent allocation tree or chunk allocation
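
(For the record, the check was run with the filesystem unmounted,
against one of the component devices, something like the following;
the device path is a placeholder:)

  # btrfs check is read-only by default, so it doesn't modify anything
  btrfs check /dev/vg0/data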

    I _think_ this is a bug in mkfs from 4.2.0, fixed in later
releases of the btrfs-progs.
If so, that's good news (that is, that it's just a mkfs bug).  I
guess it's time for me to quit waiting around for Gentoo to package
the newest version and build it myself.
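
If it does turn out to be the mkfs bug, the plan is roughly the
following (device paths are placeholders and the flags are from
memory, so treat it as a sketch rather than the exact commands):

  # confirm which btrfs-progs version the installed tools report
  btrfs --version

  # wipe and recreate the filesystem with the newer mkfs
  mkfs.btrfs -f -d raid1 -m raid1 /dev/vg0/data /dev/vg1/data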

As is typical of a failed mount, dmesg shows 'failed to read the
system array on <device>' followed by 'open_ctree failed'.
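
(Those lines are easy enough to pull back out of the kernel log; the
pattern here is just an example:)

  dmesg | grep -iE 'btrfs|open_ctree'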

I doubt that this is a hardware issue because:
1. The memory is brand new, and I ran a 48-hour burn-in test that
showed no errors.
2. A failing storage controller, PSU, or CPU would manifest with many
more issues than just this.
3. A disk failure would mean that two different disks, from different
manufacturing lots, are encountering errors on exactly the same LBAs
at exactly the same time, which, while possible, is astronomically
unlikely for disks bigger than a few hundred gigabytes (the disks in
question are 1TB each; see the rough numbers after this list).
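
To put a very rough number on that last point (back-of-envelope only,
assuming 512-byte sectors and a single bad sector per disk):

  # addressable 512-byte sectors on a 1TB disk: about 1.95 billion
  echo $((1000000000000 / 512))

so two independent single-sector failures have roughly a one in two
billion chance of even landing on the same LBA, never mind happening
at the same time.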



