Re: [linux-lvm] repair pool with bad checksum in superblock

2019-08-23 Thread Stuart D. Gathman

On Fri, 23 Aug 2019, Gionatan Danti wrote:


On 23-08-2019 14:47, Zdenek Kabelac wrote:

Ok - a serious disk error might eventually lead to irreparable metadata
content, since if you lose some root b-tree node sequence it can be really
hard to get anything sensible back (that's the reason the metadata should be
located on some 'mirrored' device - while a lot of effort is put into
protection against software errors, it's hard to do anything about a
hardware error...)


Would it be possible to have a backup superblock, maybe located at the end
of the device?  XFS, EXT4 and ZFS already do something similar...


On my btree file system, I can recover from arbitrary hardware
corruption by storing the root id of the file (table) in each node.
Leaf nodes (with full data records) are marked as such.  Thus, even if
the root node of a file is lost or corrupted, the raw file/device can be
scanned for the corresponding leaf nodes to rebuild the file (table) with
all remaining records.

Drawbacks: deleting an individual leaf node requires changing the root id
stored in that node, which costs an extra write.  (Otherwise its records
could be included in some future recovery.)  Deleting entire files (tables)
just requires marking the root node deleted - there is no need to rewrite
all the leaf nodes.
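
A rough sketch of what that scan looks like, assuming a hypothetical on-disk
format where every 4 KiB node begins with an ASCII magic ("LEAF") followed by
the owning root id - the magic string, node size, and device name here are
illustrative only:

$ # list byte offsets of candidate leaf nodes on the raw device
$ grep -abo 'LEAF' /dev/sdX | cut -d: -f1
$ # inspect one candidate node (OFFSET taken from the list above)
$ dd if=/dev/sdX bs=4096 skip=$((OFFSET / 4096)) count=1 2>/dev/null | hexdump -C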

--
  Stuart D. Gathman 
"Confutatis maledictis, flamis acribus addictis" - background song for
a Microsoft sponsored "Where do you want to go from here?" commercial.



Re: [linux-lvm] repair pool with bad checksum in superblock

2019-08-23 Thread Gionatan Danti

On 23-08-2019 14:47, Zdenek Kabelac wrote:

Ok - a serious disk error might eventually lead to irreparable metadata
content, since if you lose some root b-tree node sequence it can be really
hard to get anything sensible back (that's the reason the metadata should be
located on some 'mirrored' device - while a lot of effort is put into
protection against software errors, it's hard to do anything about a
hardware error...)


Would it be possible to have a backup superblock, maybe located at the end
of the device?

XFS, EXT4 and ZFS already do something similar...
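
For reference, here is how the ext4 variant is used (real e2fsprogs commands;
32768 is the usual first backup location on a 4 KiB-block filesystem, but
verify with the dry run first):

$ # dry run: prints where the backup superblocks live, changes nothing
$ mke2fs -n /dev/sdX1
$ # fsck using a backup superblock instead of the damaged primary one
$ e2fsck -b 32768 /dev/sdX1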

Regards.

--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.da...@assyoma.it - i...@assyoma.it
GPG public key ID: FF5F32A8



Re: [linux-lvm] repair pool with bad checksum in superblock

2019-08-23 Thread Zdenek Kabelac

On 23. 08. 19 at 13:40, Dave Cohen wrote:

$ thin_check --version
0.8.5


Hi

So if repairing fails even with the latest version, it's better to upload
the metadata to a BZ created here:


https://bugzilla.redhat.com/enter_bug.cgi?product=LVM%20and%20device-mapper


If so, feel free to open a Bugzilla and upload your metadata so we can check
what's going on there.

In the BZ, please also provide the lvm2 metadata and describe how the error
was reached.



When you say "upload your metadata" and "lvm2 metadata", can you tell me
exactly how to get it?  Sorry for the basic question, but I'm not sure what
to run and what to upload.



Upload a compressed 'dd' copy of your ORIGINAL _tmeta content (which is
likely already in the volume _meta0 by now, if you have had one successful
run of the --repair command).


If you use an older 'lvm2' you might have a problem accessing the _tmeta
device content - with the latest fc30 you should be able to activate _tmeta
standalone, via component activation.

To get an lvm2 metadata backup, just use 'vgcfgbackup -f output.txt VGNAME'.
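
For example, something like this (a sketch assuming the names from your
report, qubes_dom0/pool00, and a recent lvm2 with component activation;
substitute pool00_meta0 for pool00_tmeta if --repair has already run):

$ # activate the metadata LV on its own (read-only component activation)
$ sudo lvchange -ay qubes_dom0/pool00_tmeta
$ # compressed raw copy of the metadata, for the BZ attachment
$ sudo dd if=/dev/qubes_dom0/pool00_tmeta bs=1M | gzip > pool00_tmeta.img.gz
$ # lvm2 metadata backup
$ sudo vgcfgbackup -f output.txt qubes_dom0
$ # deactivate the component again when done
$ sudo lvchange -an qubes_dom0/pool00_tmeta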

Let us know if you have problems getting either the kernel _tmeta content or
the lvm2 metadata.


In my case, lvm was set up by Qubes OS, on a laptop.  The disk drive had a
physical problem.  I'll put those details into bugzilla.  (But I'm waiting
for an answer to the metadata question above before I submit the ticket.)


Ok - a serious disk error might eventually lead to irreparable metadata
content, since if you lose some root b-tree node sequence it can be really
hard to get anything sensible back (that's the reason the metadata should be
located on some 'mirrored' device - while a lot of effort is put into
protection against software errors, it's hard to do anything about a
hardware error...)



Regards

Zdenek



Re: [linux-lvm] repair pool with bad checksum in superblock

2019-08-23 Thread Dave Cohen



On Fri, Aug 23, 2019, at 4:59 AM, Zdenek Kabelac wrote:
> On 23. 08. 19 at 2:18, Dave Cohen wrote:
> > I've read some old posts on this group, which give me some hope that I 
> > might recover a failed drive.  But I'm not well-versed in LVM, so details 
> > of what I've read are going over my head.
> > 
> > My problems started when my laptop failed to shut down properly, and 
> > afterwards booted only to dracut emergency shell.  I've since attempted to 
> > rescue the bad drive, using `ddrescue`.  That tool reported 99.99% of the 
> > drive rescued, but so far I'm unable to access the LVM data.
> > 
> > Decrypting the copy I made with `ddrescue` gives me 
> > /dev/mapper/encrypted_rescue, but I can't activate the LVM data that is 
> > there.  I get these errors:
> > 
> > $ sudo lvconvert --repair qubes_dom0/pool00
> >WARNING: Not using lvmetad because of repair.
> >WARNING: Disabling lvmetad cache for repair command.
> > bad checksum in superblock, wanted 823063976
> >Repair of thin metadata volume of thin pool qubes_dom0/pool00 failed 
> > (status:1). Manual repair required!
> > 
> > $ sudo thin_check /dev/mapper/encrypted_rescue
> > examining superblock
> >superblock is corrupt
> >  bad checksum in superblock, wanted 636045691
> > 
> > (Note the two commands return different "wanted" values.  Are there two 
> > superblocks?)
> > 
> > I found a post, several years old, written by Ming-Hung Tsai, which 
> > describes restoring a broken superblock.  I'll show that post below, along 
> > with my questions, because I'm missing some of the knowledge necessary.
> > 
> > I would greatly appreciate any help!
> 
> 
> I think it's important to know the version of the thin tools.
> 
> Are you using 0.8.5?

I had been using "0.7.6-4.fc30" (provided by Fedora).  Upon seeing your
email, I built tag "v0.8.5", but the results from the `lvconvert` and
`thin_check` commands are identical to what I wrote above.

$ thin_check --version
0.8.5
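
For the record, roughly how I built it (from the upstream repo; the autotools
steps and the built binary's location below are my best recollection - check
the repo's README for the exact flow):

$ git clone https://github.com/jthornber/thin-provisioning-tools.git
$ cd thin-provisioning-tools
$ git checkout v0.8.5
$ autoreconf -iv && ./configure && make
$ # run the freshly built tool from the build tree (path may vary by tag)
$ ./bin/thin_check --version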

> 
> If so, feel free to open a Bugzilla and upload your metadata so we can check
> what's going on there.
> 
> In the BZ, please also provide the lvm2 metadata and describe how the error
> was reached.
> 

When you say "upload your metadata" and "lvm2 metadata", can you tell me
exactly how to get it?  Sorry for the basic question, but I'm not sure what
to run and what to upload.

> Our typical error seen with thin-pool usage is 'doubled' activation:
> the thin-pool gets activated on 2 hosts in parallel (usually unwantedly) -
> and when this happens and 2 pools are updating the same metadata, it gets
> damaged.

In my case, lvm was set up by Qubes OS, on a laptop.  The disk drive had a
physical problem.  I'll put those details into bugzilla.  (But I'm waiting
for an answer to the metadata question above before I submit the ticket.)

Thanks for your help!

-Dave

> 
> Regards
> 
> Zdenek
>



Re: [linux-lvm] repair pool with bad checksum in superblock

2019-08-23 Thread Zdenek Kabelac

On 23. 08. 19 at 2:18, Dave Cohen wrote:

I've read some old posts on this group, which give me some hope that I might 
recover a failed drive.  But I'm not well-versed in LVM, so details of what 
I've read are going over my head.

My problems started when my laptop failed to shut down properly, and afterwards 
booted only to dracut emergency shell.  I've since attempted to rescue the bad 
drive, using `ddrescue`.  That tool reported 99.99% of the drive rescued, but 
so far I'm unable to access the LVM data.

Decrypting the copy I made with `ddrescue` gives me 
/dev/mapper/encrypted_rescue, but I can't activate the LVM data that is there.  
I get these errors:

$ sudo lvconvert --repair qubes_dom0/pool00
   WARNING: Not using lvmetad because of repair.
   WARNING: Disabling lvmetad cache for repair command.
bad checksum in superblock, wanted 823063976
   Repair of thin metadata volume of thin pool qubes_dom0/pool00 failed 
(status:1). Manual repair required!

$ sudo thin_check /dev/mapper/encrypted_rescue
examining superblock
   superblock is corrupt
 bad checksum in superblock, wanted 636045691

(Note the two commands return different "wanted" values.  Are there two 
superblocks?)

I found a post, several years old, written by Ming-Hung Tsai, which describes 
restoring a broken superblock.  I'll show that post below, along with my 
questions, because I'm missing some of the knowledge necessary.

I would greatly appreciate any help!



I think it's important to know the version of the thin tools.

Are you using 0.8.5?

If so, feel free to open a Bugzilla and upload your metadata so we can check
what's going on there.

In the BZ, please also provide the lvm2 metadata and describe how the error
was reached.

Our typical error seen with thin-pool usage is 'doubled' activation:
the thin-pool gets activated on 2 hosts in parallel (usually unwantedly) - and
when this happens and 2 pools are updating the same metadata, it gets damaged.
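
For completeness: when 'lvconvert --repair' fails, a manual attempt with the
thin tools looks roughly like the sketch below - assuming the damaged
metadata LV is accessible, and a spare LV big enough to hold the repaired
copy; the device names here are illustrative:

$ # salvage whatever thin_dump can still read from the damaged metadata
$ sudo thin_dump --repair /dev/mapper/vg-pool00_tmeta > dump.xml
$ # write the repaired metadata onto the spare volume
$ sudo thin_restore -i dump.xml -o /dev/mapper/vg-spare_meta
$ # or do both steps in one go:
$ sudo thin_repair -i /dev/mapper/vg-pool00_tmeta -o /dev/mapper/vg-spare_meta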


Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/