Re: Help Recovering BTRFS array

2017-09-21 Thread grondinm

Hi Duncan,

I'm not sure if this will attach to my original message...

Thank you for your reply. For some reason I'm not getting list messages even 
though I know I am subscribed.

I know all too well about the golden rule of data; it has bitten me a few 
times. The data on this array is mostly data that I don't really care about, 
and I was able to copy off what I wanted. The main reason I sent this to the 
list was to see if I could somehow return the FS to a working state without 
having to recreate it. I'm just surprised that all 3 copies of the superblock 
got corrupted. It's probably my lack of understanding, but I always assumed 
that if one copy got corrupted it would be replaced by a good copy, leaving 
all copies in a good state. Is that not the case? If it is, what bad luck 
that all 3 got messed up at the same time.
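
For reference, the commands I'd expect to use for that, taken from the 
btrfs-progs man pages rather than anything I've actually run on this array 
yet:

  # dump a single superblock mirror (0, 1 or 2) instead of all of them
  btrfs inspect-internal dump-super -s 1 /dev/md0

  # ask btrfs-progs to rewrite bad superblock copies from a good one
  btrfs rescue super-recover -v /dev/md0

Since dump-super -a already shows all three copies as bad here, I assume 
super-recover would have nothing good left to copy from.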

Some information I forgot to include in my original message:

uname -a
Linux thebeach 4.12.13-gentoo-GMAN #1 SMP Sat Sep 16 15:28:26 ADT 2017 x86_64 
Intel(R) Core(TM) i5-2320 CPU @ 3.00GHz GenuineIntel GNU/Linux

btrfs --version
btrfs-progs v4.10.2

Anyway, thank you again for your reply. I will leave the FS intact for a few 
days in case any more details could help BTRFS development, and perhaps help 
avoid this happening again or lead to a recovery option.
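
If any of the developers want more than the dump-super output, I can also try 
to capture a metadata-only image to share, though I'm not sure it will even 
work with the superblocks in this state (the output path is just an example):

  # compressed metadata image of the filesystem, no file data included
  btrfs-image -c9 -t4 /dev/md0 /tmp/md0-metadata.img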

Marc




Re: Help Recovering BTRFS array

2017-09-18 Thread Duncan
grondinm posted on Mon, 18 Sep 2017 14:14:08 -0300 as excerpted:

> superblock: bytenr=65536, device=/dev/md0
> -
> ERROR: bad magic on superblock on /dev/md0 at 65536
> 
> superblock: bytenr=67108864, device=/dev/md0
> -
> ERROR: bad magic on superblock on /dev/md0 at 67108864
> 
> superblock: bytenr=274877906944, device=/dev/md0
> -
> ERROR: bad magic on superblock on /dev/md0 at 274877906944
> 
> Now I'm really panicked. Is the FS toast? Can any recovery be attempted?

First, I'm a user and list regular, not a dev.  With luck they can help 
beyond the suggestions below...

However, there's no need to panic in any case, due to the sysadmin's 
first rule of backups: The true value of any data is defined by the 
number of backups of that data you consider(ed) it worth having.

As a result, there are precisely two possibilities, neither one of which 
calls for panic.

1) No need to panic because you have a backup, and recovery is as simple 
as restoring from that backup.

2) You don't have a backup, in which case the lack of that backup means you 
defined the value of the data as only trivial, worth less than the 
time/trouble/resources you saved by not making that backup.  Because the 
data was only of trivial value anyway, and you kept the more valuable assets 
(the time/trouble/resources you would have put into a backup had the data 
been worth more), you've still saved the stuff you considered most valuable, 
so again, no need to panic.

It's a binary state.  There's no third possibility, and no possibility that 
you lost what your actions (or the lack of them, in the no-backup case) 
defined as most valuable to you.

(As for the freshness of that backup, the same rule applies, but to the 
data delta between the state as of the backup and the current state.  If 
the value of the changed data is worth it to you to have it backed up, 
you'll have freshened your backup.  If not, you defined it to be as of 
such trivial value as to not be worth the time/trouble/resources to do 
so.)


That said, at the time you're calculating the value of the data against 
the value of the time/trouble/resources required to back it up, the loss 
potential remains theoretical.  Once something actually happens to the 
data, it's no longer theoretical, and the data, while of trivial enough 
value to be worth the risk when it was theoretical, may still be valuable 
enough to you to spend at least some time/trouble on trying to recover it.

In that case, since you can still mount, I'd suggest mounting read-only to 
prevent any further damage, and then copying off what data you can to a 
different, unaffected filesystem.
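
Something along these lines, with the mountpoint from your post and a 
destination that's just a placeholder:

  # make sure nothing else writes to the damaged filesystem
  mount -o remount,ro /media/Storage2

  # copy whatever is still readable to a separate, healthy filesystem
  rsync -a /media/Storage2/ /mnt/rescue/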

Then, if there's still data you want that you couldn't simply copy off, you 
can try btrfs restore.  While I do have backups here, a couple of times when 
things went bad, btrfs restore was able to get back pretty much everything to 
current, whereas had I restored from backups I'd have lost enough changed 
data to hurt, even if I had defined it as of trivial enough value, while the 
risk remained theoretical, that I hadn't yet freshened the backup.  (Since 
then I've upgraded the rest of my storage to SSD, lowering the time and 
hassle cost of backups and encouraging me to do them more frequently.  
Speaking of which, I need to freshen them in the near future.  It's now on 
my list for my next day off...)
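
If you do end up trying restore, the usual pattern is a dry run first.  The 
paths below are placeholders, and -D means nothing gets written on the first 
pass:

  # dry run: list what btrfs restore believes it can pull off the filesystem
  btrfs restore -D -v /dev/md0 /mnt/rescue

  # real run: write the recovered files to a separate, healthy filesystem
  btrfs restore -v /dev/md0 /mnt/rescue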

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



Help Recovering BTRFS array

2017-09-18 Thread grondinm
Hello,

I will try to provide all the information pertinent to the situation I find 
myself in.

Yesterday, while trying to write some data to a BTRFS filesystem sitting on 
top of an mdadm RAID5 array encrypted with dm-crypt, comprising four 1TB 
HDDs, my system became unresponsive and I had no choice but to hard-reset. 
The system came back up with no problem and the array in question mounted 
without complaint. Once I tried to write data to it again, however, the 
system became unresponsive again and required another hard reset. Again the 
system came back up and everything mounted with no complaints.

This time I decided to run some checks. I ran a RAID check by issuing 
'echo check > /sys/block/md0/md/sync_action', which completed without a 
single error (a note on reading the md result follows the scrub output 
below). I then performed a proper restart, just because, and once the system 
came back up I initiated a scrub on the btrfs filesystem. That greeted me 
with my first indication that something was wrong:

btrfs sc stat /media/Storage2 
scrub status for e5bd5cf3-c736-48ff-b1c6-c9f678567788
scrub started at Mon Sep 18 06:05:21 2017, running for 07:40:47
total bytes scrubbed: 1.03TiB with 1 errors
error details: super=1
corrected errors: 0, uncorrectable errors: 0, unverified errors: 0
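
Side note on the md check above: its result can be read back afterwards 
through the generic md interface, nothing btrfs-specific:

  # non-zero here would mean the last md 'check' found inconsistent stripes
  cat /sys/block/md0/md/mismatch_cnt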

I was concerned, but since it was still scrubbing I left it. Now things look 
really bleak...

Every few minutes the scrub process goes into a D state (as shown by htop), 
but it eventually keeps going and, as far as I can see, is still scrubbing 
(slowly). Based on the error above, I decided to check something else and 
ran 'btrfs inspect-internal dump-super -a -f /dev/md0', which gave me this:

superblock: bytenr=65536, device=/dev/md0 
-
ERROR: bad magic on superblock on /dev/md0 at 65536

superblock: bytenr=67108864, device=/dev/md0
-
ERROR: bad magic on superblock on /dev/md0 at 67108864

superblock: bytenr=274877906944, device=/dev/md0
-
ERROR: bad magic on superblock on /dev/md0 at 274877906944

Now I'm really panicked. Is the FS toast? Can any recovery be attempted?

Here is the output of dump-super with the -F option:

superblock: bytenr=65536, device=/dev/md0
-
csum_type   43668 (INVALID)
csum_size   32
csum
0x76c647b04abf1057f04e40d1dc52522397258064b98a1b8f6aa6934c74c0dd55 [DON'T MATCH]
bytenr  6376050623103086821
flags   0x7edcc412b742c79f
( WRITTEN |
  RELOC |
  METADUMP |
  unknown flag: 0x7edcc410b742c79c )
magic   ..l~...q [DON'T MATCH]
fsid2cf827fa-7ab8-e290-b152-1735c2735a37
label   
.a.9.@.=4.#.|.D...]..dh=d,..k..n..~.5.i.8...(.._.tl.a.@..2..qidj.>Hy.U..{X5.kG0.)t..;/.2...@.T.|.u.<.`!J*9./8...&.g\.V...*.,/95.uEs..W.i..z..h...n(...VGn^F...H...5.DT..3.A..mK...~..}.1..n.
generation  1769598730239175261
root14863846352370317867
sys_array_size  1744503544
chunk_root_generation   18100024505086712407
root_level  79
chunk_root  10848092274453435018
chunk_root_level156
log_root7514172289378668244
log_root_transid6227239369566282426
log_root_level  18
total_bytes 5481087866519986730
bytes_used  13216280034370888020
sectorsize  4102056786
nodesize1038279258
leafsize276348297
stripesize  2473897044
root_dir12090183195204234845
num_devices 12836127619712721941
compat_flags0xf98ff436fc954bd4
compat_ro_flags 0x3fe8246616164da7
( FREE_SPACE_TREE |
  FREE_SPACE_TREE_VALID |
  unknown flag: 0x3fe8246616164da4 )
incompat_flags  0x3989a5037330bfd8
( COMPRESS_LZO |
  COMPRESS_LZOv2 |
  EXTENDED_IREF |
  RAID56 |
  SKINNY_METADATA |
  NO_HOLES |
  unknown flag: 0x3989a5037330bc10 )
cache_generation10789185961859482334
uuid_tree_generation14921288820846890813
dev_item.uuid   e6e382b3-de66-4c25-7cc9-3cc43cde9c24
dev_item.fsid   f8430e37-12ca-adaf-b038-f0ee10ce6327 [DON'T MATCH]
dev_item.type   7909001383421391155
dev_item.total_bytes4839925749276763097
dev_item.bytes_used 14330418354255459170
dev_item.io_align   4136652250
dev_item.io_width   1113335506
dev_item.sector_size1197062542
dev_item.devid  16559830033162408461
dev_item.dev_group  3271056113