need to recover large file

2015-12-15 Thread Langhorst, Brad
Hi:

I just screwed up… spent the last 3 weeks generating a 400G file (genome 
assembly).
Went to back it up and swapped the arguments to tar (tar Jcf my_precious 
my_precious.tar.xz);
what was once 400G is now 108 bytes of xz header - argh.
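For anyone following along, the failure mode is worth spelling out: the 
argument right after tar's f flag names the archive to (over)write, so 
swapping the operands truncates the data file before a single byte is 
compressed. A minimal runnable sketch of the safe order, using throwaway 
filenames rather than the real files:

```shell
set -e
tmp=$(mktemp -d); cd "$tmp"
printf 'assembly data\n' > my_precious

# The fatal swap was: tar Jcf my_precious my_precious.tar.xz
# -- tar opens "my_precious" for writing as the archive, truncating it.
# Correct order: archive name immediately after f, input files last.
tar -cJf my_precious.tar.xz my_precious

# Verify the archive before deleting or moving anything.
xz -t my_precious.tar.xz
tar -tJf my_precious.tar.xz
```

Testing the archive (xz -t) before removing the original is cheap insurance 
either way.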

This is on a 6-volume btrfs filesystem. 

I immediately unmounted the fs (had to cd /  first).

After a bit of searching I found Chris Mason’s post about using 
btrfs-debug-tree -R

root tree: 25606900367360 level 1
chunk tree: 25606758596608 level 1
extent tree key (EXTENT_TREE ROOT_ITEM 0) 25606900383744 level 2
device tree key (DEV_TREE ROOT_ITEM 0) 25606865108992 level 1
fs tree key (FS_TREE ROOT_ITEM 0) 22983583956992 level 1
checksum tree key (CSUM_TREE ROOT_ITEM 0) 25606922682368 level 2
uuid tree key (UUID_TREE ROOT_ITEM 0) 22984609513472 level 0
data reloc tree key (DATA_RELOC_TREE ROOT_ITEM 0) 22984615477248 level 0
btrfs root backup slot 0
tree root gen 21613 block 25606891274240
extent root gen 21613 block 25606891323392
chunk root gen 21612 block 25606758596608
device root gen 21612 block 25606865108992
csum root gen 21613 block 25606891503616
fs root gen 21611 block 22983583956992
1712858198016 used 5400011280384 total 6 devices
btrfs root backup slot 1
tree root gen 21614 block 25606900367360
extent root gen 21614 block 25606900383744
chunk root gen 21612 block 25606758596608
device root gen 21612 block 25606865108992
csum root gen 21614 block 25606922682368
fs root gen 21611 block 22983583956992
1712858198016 used 5400011280384 total 6 devices
btrfs root backup slot 2
tree root gen 21611 block 25606857605120
extent root gen 21611 block 22983584268288
chunk root gen 21595 block 25606758612992
device root gen 21601 block 22983580794880
csum root gen 21611 block 22983584333824
fs root gen 21611 block 22983583956992
1712971542528 used 5400011280384 total 6 devices
btrfs root backup slot 3
tree root gen 21612 block 25606874546176
extent root gen 21612 block 25606880575488
chunk root gen 21612 block 25606758596608
device root gen 21612 block 25606865108992
csum root gen 21612 block 25606890864640
fs root gen 21611 block 22983583956992
1712971444224 used 5400011280384 total 6 devices
total bytes 5400011280384
bytes used 1712858198016
uuid b13d3dc1-f287-483c-8b7d-b142f31fe6df
Btrfs v3.12

I found the oldest generation and grabbed its tree root block, like this:

sudo btrfs restore -o -v -t 25606857605120 --path-regex 
^/\(\|deer\(\|/masurca\(\|/quorum_mer_db.jf\)\)\)$ /dev/sdd1 /tmp/recover/

unfortunately, I only recovered the post-error file.

I also read that one can use btrfs-find-root to get a list of candidate tree 
roots to recover from, and just ran btrfs-find-root on one of the underlying 
disks, but I get an error:

"Super think's the tree root is at 25606900367360, chunk root 25606758596608
Went past the fs size, exiting"

Someone else was able to get past this by commenting out that error check, so 
I tried recompiling without those lines.

Super think's the tree root is at 25606900367360, chunk root 25606758596608
Well block 24729718243328 seems great, but generation doesn't match, 
have=21582, want=21614 level 1
Well block 24729718292480 seems great, but generation doesn't match, 
have=21582, want=21614 level 0
Well block 24729718308864 seems great, but generation doesn't match, 
have=21582, want=21614 level 0
Well block 24729718407168 seems great, but generation doesn't match, 
have=21582, want=21614 level 0
Well block 24729944670208 seems great, but generation doesn't match, 
have=21583, want=21614 level 1
Well block 24729944719360 seems great, but generation doesn't match, 
have=21583, want=21614 level 0
Well block 24729944735744 seems great, but generation doesn't match, 
have=21583, want=21614 level 0
Well block 24729944817664 seems great, but generation doesn't match, 
have=21583, want=21614 level 0
Well block 24730048708608 seems great, but generation doesn't match, 
have=21584, want=21614 level 1
Well block 24730048724992 seems great, but generation doesn't match, 
have=21584, want=21614 level 0
Well block 24730048741376 seems great, but generation doesn't match, 
have=21584, want=21614 level 0
Well block 24730048757760 seems great, but generation doesn't match, 
have=21584, want=21614 level 0
Well block 24730048774144 seems great, but generation doesn't match, 
have=21584, want=21614 level 0
Well block 24730048823296 seems great, but generation doesn't match, 
have=21584, want=21614 level 0
Well block 24730132348928 seems great, but generation doesn't match, 
have=21585, want=21614 level 1
Well block 24730132414464 seems great, but generation doesn't mat
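Not a step from the thread, but the candidate blocks above can be tried 
mechanically: feed each "Well block" bytenr to btrfs restore -t, newest 
generation first. The sketch below only prints the commands it would run 
(the device and output paths are the ones used in this thread; the three 
bytenrs are an illustrative subset of the output above):

```shell
dev=/dev/sdd1
out=/tmp/recover
# Candidate tree-root bytenrs as reported by btrfs-find-root, newest first.
for root in 24730132348928 24730048708608 24729944670208; do
    # -v verbose, -o overwrite, -t try this alternate tree root.
    echo sudo btrfs restore -v -o -t "$root" "$dev" "$out"
done
```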

Re: need to recover large file

2015-12-15 Thread Michael Darling
... Didn't read all of your original post at first, because I
haven't been into those internals.  Now that I have, I see it seems to
be using 6 devices, so you might have to use one hard drive 6x the size
of each partition (others can say if this will work), or 6 other hard
drives, to make the backups.  Although, I'm not sure what raid
configuration there may be, which could reduce the number of copies you
had to make.

On Tue, Dec 15, 2015 at 10:59 PM, Michael Darling  wrote:
>
> Or, even better yet, just unplug the drive completely and let someone more 
> knowledgeable with btrfs confirm my dd suggestion works, as long as you don't 
> mount either when they're both plugged in.  I know the urge to just work on 
> something, but bad recovery attempts can lose recoverable data forever.
>
> On Tue, Dec 15, 2015 at 10:56 PM, Michael Darling  wrote:
>>
>> First things first, if the file is as important as it sounds.  Since 
>> there's no physical problem with the hard drive, go get (or if you have one 
>> lying around, use) another hard drive that is at least as big as the 
>> partition that file was on.  Then completely unmount the original partition. 
>>  Do a dd copy of the entire partition it was on.
>>
>> Then DO NOT mount either partition.  You do NOT want to mount a btrfs 
>> partition when there's a dd clone hooked up, because the UUIDs are the same 
>> on both.
>>
>> Turn off the machine, pull the new backup copy, then go back to work on 
>> recovery attempts.  If recovery goes wrong, restore the fresh backup copy by 
>> another dd.  Make sure again to NOT mount either while more than one copy is 
>> plugged in.
>>
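Michael's clone step could look like the sketch below. It copies between 
ordinary temp files so it is safe to run anywhere; in the real case if= 
would be each member partition (e.g. /dev/sdd1) and of= a partition on the 
spare drive of at least the same size:

```shell
set -e
src=$(mktemp); dst=$(mktemp)
head -c 1048576 /dev/urandom > "$src"   # stand-in for the damaged partition

# A large block size keeps throughput up; conv=fsync forces the copy to
# disk before dd exits, so the clone is durable.
dd if="$src" of="$dst" bs=64K conv=fsync 2>/dev/null

cmp "$src" "$dst"   # exits non-zero unless the clone is byte-identical
```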
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: need to recover large file

2015-12-16 Thread Duncan
Langhorst, Brad posted on Wed, 16 Dec 2015 03:13:48 + as excerpted:

> Hi:
> 
> I just screwed up… spent the last 3 weeks generating a 400G file
> (genome assembly).
> Went to back it up and swapped the arguments to tar (tar Jcf my_precious
> my_precious.tar.xz)
> what was once 400G is now 108 bytes of xz header - argh.
> 
> This is on a 6-volume btrfs filesystem.
> 
> I immediately unmounted the fs (had to cd /  first).
> 
> After a bit of searching I found Chris Mason’s post about using
> btrfs-debug-tree -R

[Snip most of the result as I'm not familiar with this utility.  But it 
ends with...]

> Btrfs v3.12

[...]
 
> I also read that one can use btrfs-find-root to get a list of candidate
> tree roots to recover from and just ran btrfs-find-root on one of the
> underlying disks but I get an error "Super think's the tree root is at
> 25606900367360, chunk root 25606758596608
> Went past the fs size, exiting"

WTF?  I thought that bug was patched a long time ago.  How old a btrfs-
progs are you using, anyway?

*OH!* *3.12*

Why are you still using 3.12?  That's nigh back in the dark ages as far 
as btrfs is concerned.  AFAIK, btrfs was either still labeled 
experimental back then, or 3.12 was the first version where the 
experimental label was stripped, so it's a very long time ago indeed, 
particularly for a filesystem that, while no longer experimental, is 
still under heavy development, with bugs being fixed in every release.  
The fixes don't always reach old stable (tho since 3.12, for the kernel 
anyway, they do try), so if you're running old releases, you're running 
code that's known to be buggy and known to have fixes for those bugs in 
newer releases.

The general list recommendation for the kernel, unless you have a known 
reason (like an already reported but still unfixed bug in newer), is to 
run the latest or next-to-latest release series of either current, which 
would ATM be 4.3 or 4.2 (4.4 isn't out yet, tho it's getting close), or 
LTS, which would be 4.1 or 3.18.  4.4 will be LTS as well and it's 
getting close, so you should already be preparing to upgrade to at least 
4.1 if you're on the LTS series.

The coverage of the penultimate series gives you some time to upgrade to 
the latest, since the penultimate series is covered too.

For runtime, the kernel code is generally used (userspace mostly just 
makes calls to the kernel and lets it do the work), so kernel code is 
most important.  However, once you're trying to recover things, 
basically, when you're working with the unmounted filesystem, userspace 
code is used, so having something reasonably current there becomes 
important.

As a rule of thumb, then, running a btrfs-progs userspace of at least the 
same release series as the kernel you're running is recommended (tho 
newer is fine), since the kernel and userspace were developed at about 
the same time and with the same problems in mind, and if you're keeping 
up with the kernel series recommendation, that means your userspace 
isn't getting /too/ old.  But even then, once you're trying to do a btrfs 
recovery with those tools, a recommendation to try the latest current 
can be considered normal, since it'll be able to detect and usually fix 
the latest problems.
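A quick way to check the pairing Duncan describes, i.e. whether your 
btrfs-progs series is at least as new as your kernel series (guarded, 
since btrfs-progs may not be installed):

```shell
uname -r                       # kernel release, e.g. 3.18.x
if command -v btrfs >/dev/null 2>&1; then
    btrfs --version            # btrfs-progs release
else
    echo "btrfs-progs not installed"
fi
```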

So a 3.18 series kernel and at least a 3.18 series userspace would be 
recommended, and for a filesystem like btrfs that is still stabilizing 
and not yet fully stable and mature, that's quite reasonable.  While 
some people have reason to use particularly old and demonstrated-stable 
versions, and enterprise distros generally cater to this need with 
support for up to a decade, using a still new and maturing btrfs is 
incompatible with a need for old and demonstrated stable.  In that case, 
from the viewpoint of this list at least, if you're looking for that old 
and stable, you really should be using a different filesystem, as btrfs 
simply isn't that old and stable, yet.

Meanwhile, while that's the view of upstream btrfs and thus this upstream 
list, some distros nevertheless choose to support old btrfs, backporting 
patches, etc, as they consider necessary.  However, that's their support 
then, and their business.  If you're trusting them for that support, 
really, you should be contacting them for it, as this list really isn't 
in the best position to supply that sort of support.  Yes, they may be 
using an old in number kernel and perhaps userspace, with newer patches 
backported to it, but it's the distro that makes those choices and knows 
what patches it's backporting, and thus is in the best position to 
support it.  Not that we on the list won't try, but we're simply not in a 
good position to provide that support that far back as we've long since 
moved on, neither do we track what distros have backported and what they 
haven't, etc.


So, basically, you have four choices:

1) Follow list recommendations and upgrade to something that isn't out of 
the dark ages in terms of btrfs history.

2) Follow the presumed recommendat

Re: need to recover large file

2015-12-17 Thread Langhorst, Brad
I have updated the btrfs tools.
The btrfs-find-root error was fixed.

I think I’m making some progress…

sudo ./btrfs restore -v -o -t 25606820544512  --path-regex 
^/\(\|deer\(\|/masurca\(\|/quorum_mer_db.jf\)\)\)$ /dev/sdd1 ~/recover/
parent transid verify failed on 25606820544512 wanted 21614 found 21591
parent transid verify failed on 25606820544512 wanted 21614 found 21591
parent transid verify failed on 25606820544512 wanted 21614 found 21591
parent transid verify failed on 25606820544512 wanted 21614 found 21591
Ignoring transid failure
parent transid verify failed on 25606820560896 wanted 21591 found 21611
parent transid verify failed on 25606820560896 wanted 21591 found 21611
parent transid verify failed on 25606820560896 wanted 21591 found 21611
parent transid verify failed on 25606820560896 wanted 21591 found 21611
Ignoring transid failure
leaf parent key incorrect 25606820560896
Couldn't setup extent tree
parent transid verify failed on 25606820560896 wanted 21591 found 21611
Ignoring transid failure
leaf parent key incorrect 25606820560896
Couldn't setup device tree
Could not open root, trying backup super
warning, device 6 is missing
warning, device 7 is missing
bytenr mismatch, want=25606758596608, have=0
Couldn't read chunk root
Could not open root, trying backup super
warning, device 6 is missing
warning, device 7 is missing
bytenr mismatch, want=25606758596608, have=0
Couldn't read chunk root
Could not open root, trying backup super

A bunch of data came back, but not the whole file…
The "device 6 is missing" warning sounds like btrfs restore can’t find the 
other disks (this is a 6-disk group, raid 0 - see below for details).

Should I be using a UUID here somehow (didn’t see anything in the docs… but 
maybe)?
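Not an answer from the thread, but one hypothesis worth testing before the 
next attempt: confirm all six members are actually visible to the tools. 
btrfs filesystem show, with the fsid from the btrfs-debug-tree output 
earlier, lists the devices it can find; the command is guarded so the 
sketch degrades gracefully where btrfs-progs is absent (it may also need 
root to see everything):

```shell
fsid=b13d3dc1-f287-483c-8b7d-b142f31fe6df   # from the debug-tree output above
if command -v btrfs >/dev/null 2>&1; then
    btrfs filesystem show "$fsid" || echo "filesystem $fsid not visible (try as root)"
else
    echo "btrfs-progs not installed"
fi
```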

Thanks for any advice!

Brad

—
Brad Langhorst, Ph.D.
Development Scientist
New England Biolabs




> On Dec 16, 2015, at 6:01 AM, Duncan <1i5t5.dun...@cox.net> wrote:
> 
> [Snip full quote of the previous message.]