Re: btrfs raid5 unmountable

2014-02-04 Thread Tetja Rediske
Hello Duncan,

 Of course if you'd been following the list as btrfs testers really
 should still be doing at this point, you'd have seen all this covered
 before. And of course, if you had done pre-deployment testing before
 you stuck valuable data on that btrfs raid5, you'd have noted the
 problems, even without reading about it on-list or on the wiki.  But
 of course hindsight is 20/20, as they say, and at least you DO have
 backups, even if they'll take awhile to restore.  =:^) That's already
 vastly better than a lot of the reports we unfortunately get here.
 =:^\

Yeah, I saw it. The main reason I am toying with btrfs is my work: we
rent Linux servers to customers. While they install and manage them
themselves, sooner or later questions about btrfs will reach our
support. So getting your hands on it early saves some headache later.

It is just inconvenient for me right now, so no big deal.

I first tried a degraded mount; I also tried --repair. No luck.

What I am doing now is pulling as much data as possible off the broken
fs with btrfs restore and running an rsync afterwards. That is still
faster than downloading all the data from my own mirror in the
datacenter. ;)
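
(A rough sketch of that workflow; the device, rescue directory and
mirror host below are placeholders, not the actual paths involved:

  # copy whatever is still readable off the broken filesystem
  btrfs restore /dev/sdX3 /mnt/rescue

  # then fill in the gaps from the mirror
  rsync -aHv --partial mirror.example.com:/srv/data/ /mnt/rescue/
)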

After that I will even be crazy enough to try it again. Next on my
list is send/receive.
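
(For reference, a typical send/receive exercise looks roughly like the
following; the subvolume and target paths are made up for illustration:

  # read-only snapshot to use as a send source
  btrfs subvolume snapshot -r /mnt/data /mnt/data/snap-2014-02-04

  # stream it into a second btrfs filesystem
  btrfs send /mnt/data/snap-2014-02-04 | btrfs receive /mnt/backup
)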

Tetja





btrfs raid5 unmountable

2014-02-03 Thread Tetja Rediske
Hi,

Since Freenode is down today, I am asking the direct way.

The filesystem in question:

Label: 'data'  uuid: 3a6fd6d7-5943-4cad-b56f-2e6dcabff453
Total devices 6 FS bytes used 7.02TiB
devid 1 size 1.82TiB used 1.82TiB path /dev/sda3
devid 2 size 2.73TiB used 2.48TiB path /dev/sdc3
devid 3 size 931.38GiB used 931.38GiB path /dev/sdd3
devid 5 size 931.51GiB used 931.51GiB path /dev/sde1
devid 6 size 931.51GiB used 931.51GiB path /dev/sdf1
devid 7 size 2.73TiB used 2.48TiB path /dev/sdb3

Btrfs v3.12-dirty
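
(For context, that listing is the output of the filesystem show
subcommand, i.e. something along the lines of:

  btrfs filesystem show
)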

If I try to mount it, dmesg shows:

[30644.681210] parent transid verify failed on 32059176910848 wanted 259627 found 259431
[30644.681307] parent transid verify failed on 32059176910848 wanted 259627 found 259431
[30644.681399] btrfs bad tree block start 0 32059176910848
[30644.681407] Failed to read block groups: -5
[30644.776879] btrfs: open_ctree failed

btrfs check aborts with the following (the first lines repeat many times):

[...]
Ignoring transid failure
parent transid verify failed on 32059196616704 wanted 259627 found 259432
parent transid verify failed on 32059196616704 wanted 259627 found 259432
Check tree block failed, want=32059196616704, have=32059196747776
parent transid verify failed on 32059196616704 wanted 259627 found 259432
Ignoring transid failure
parent transid verify failed on 32059196616704 wanted 259627 found 259432
Ignoring transid failure
parent transid verify failed on 32059177230336 wanted 259627 found 259431
Ignoring transid failure
parent transid verify failed on 32059196620800 wanted 259627 found 259432
parent transid verify failed on 32059196620800 wanted 259627 found 259432
Check tree block failed, want=32059196620800, have=1983699371120445514
Check tree block failed, want=32059196620800, have=1983699371120445514
Check tree block failed, want=32059196620800, have=1983699371120445514
read block failed check_tree_block
btrfs: cmds-check.c:2212: check_owner_ref: Assertion `!(rec->is_root)' failed.
Aborted

What happened before:

One disk was faulty, I added a new one and removed the old one,
followed by a balance.
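
(Concretely, that is presumably the usual add/delete/balance sequence;
the device names below are made up:

  btrfs device add /dev/sdX3 /mnt/data
  btrfs device delete /dev/sdY3 /mnt/data
  btrfs balance start /mnt/data
)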

So far so good.

Some days after this I accidentally pulled the SATA power connector
from another drive without noticing it at first. I worked on the
system for about an hour, building a new kernel on another filesystem.
After rebooting with the new kernel the FS was no longer mountable; I
then noticed the missing disk and reattached the power.

So far I tried:

mount -o recovery
btrfs check
(after google) btrfs-zero-log
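
(Spelled out, with a placeholder device and mount point rather than the
real ones, those attempts look roughly like:

  mount -o recovery /dev/sdX3 /mnt/data
  btrfs check /dev/sdX3
  btrfs-zero-log /dev/sdX3
)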

Sadly, no luck. However, I can get my files out with btrfs restore.
The filesystem contains mainly media files, so it would not be too bad
if they were lost, but restoring them from backups and sources will
take at least about a week. (Most of the files are mirrored on a
private server, but even with 100 Mbit this takes a lot of time. ;)
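
(Rough arithmetic behind that estimate: ~7 TiB over a 100 Mbit/s link
is about 12.5 MB/s, and 7 * 1024 * 1024 MiB / 12.5 MB/s is roughly
587,000 seconds, i.e. close to seven days of continuous transfer.)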

Any idea how to recover this FS?

Kind Regards
Tetja Rediske






Re: btrfs raid5 unmountable

2014-02-03 Thread Duncan
Tetja Rediske posted on Mon, 03 Feb 2014 17:12:24 +0100 as excerpted:

[...]

 What happened before:
 
 One disk was faulty, I added a new one and removed the old one, followed
 by a balance.
 
 So far so good.
 
 Some days after this I accidentally pulled the SATA power connector
 from another drive without noticing it at first. I worked on the
 system for about an hour, building a new kernel on another filesystem.
 After rebooting with the new kernel the FS was no longer mountable; I
 then noticed the missing disk and reattached the power.
 
 So far I tried:
 
 mount -o recovery
 btrfs check
 (after google) btrfs-zero-log
 
 Sadly, no luck. However, I can get my files out with btrfs restore.
 The filesystem contains mainly media files, so it would not be too bad
 if they were lost, but restoring them from backups and sources will
 take at least about a week. (Most of the files are mirrored on a
 private server, but even with 100 Mbit this takes a lot of time. ;)
 
 Any idea how to recover this FS?

[As a btrfs user and list regular, /not/ a dev...]

That filesystem is very likely toast. =:(  Tho there's one thing you
didn't mention trying yet that's worth a try.  See below...

You can read the list archives for the details if you like, but
basically, the raid5/6 recovery code simply isn't complete yet and is
not recommended for actual deployment in any way, shape or form.  In
practice, at present it's a fancy raid0 that calculates and writes a
bunch of extra parity.  It can be run-time tested and can even, in some
cases, recover from online device loss (as you noted), but throw a
shutdown in there along with the bad device and, like a raid0, you
might as well consider the filesystem lost... at least until the
recovery code is complete, at which point, if the filesystem is still
around, you may well be able to recover it: the parity is all there,
the code to actually recover from it just isn't all there yet.

FWIW, single-device btrfs is what I'd call almost-stable now, altho
you're still strongly encouraged to keep current and tested backups, as
there are still occasional corner-cases, and to stay on current kernels
and btrfs-tools, since potentially data-risking bugs are still getting
fixed.  Multi-device btrfs in single/raid0/1/10 modes is also closing
in on stable now, tho not /quite/ as stable as single device, but it's
quite usable as long as you do have tested backups -- unless you're
unlucky you won't actually have to use them (I haven't had to use
mine), but definitely keep 'em just in case.  But raid5/6 is a no-go,
with the exception of pure testing data that you really are prepared to
throw away, because recovery for it really is still incomplete and thus
known-broken.

The one thing I didn't see you mention that's worth a try if you haven't 
already, is the degraded mount option.  See
$KERNELSRC/Documentation/filesystems/btrfs.txt.  Tho really that should 
have been the first thing you tried for mounting once you realized you 
were down a device.
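
For concreteness, that would be something along the lines of the
following (device and mount point here are placeholders):

  mount -o degraded /dev/sdX3 /mnt/data
  # or, more cautiously, read-only and combined with recovery:
  mount -o ro,degraded,recovery /dev/sdX3 /mnt/data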

But with a bit of luck...

Also, if you've run btrfs check with the --repair option (you didn't
say; if you didn't, you should be fine, as without --repair it's only a
read-only diagnostic), you may have made things worse, as that's really
intended to be a last resort.
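
In other words (device name below is a placeholder):

  btrfs check /dev/sdX3            # read-only, safe diagnostic
  btrfs check --repair /dev/sdX3   # writes to the fs; last resort only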

Of course if you'd been following the list as btrfs testers really should 
still be doing at this point, you'd have seen all this covered before.  
And of course, if you had done pre-deployment testing before you stuck 
valuable data on that btrfs raid5, you'd have noted the problems, even 
without reading about it on-list or on the wiki.  But of course hindsight 
is 20/20, as they say, and at least you DO have backups, even if they'll 
take awhile to restore.  =:^) That's already vastly better than a lot of 
the reports we unfortunately get here. =:^\

-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman
