Re: [linux-lvm] repair pool with bad checksum in superblock
On Fri, Aug 23, 2019, at 8:47 AM, Zdenek Kabelac wrote:
> Dne 23. 08. 19 v 13:40 Dave Cohen napsal(a):
> >
> > $ thin_check --version
> > 0.8.5
>
> Hi
>
> So if repairing fails even with the latest version - it's better to upload
> the metadata into a BZ created here:
>
> https://bugzilla.redhat.com/enter_bug.cgi?product=LVM%20and%20device-mapper

I've created https://bugzilla.redhat.com/show_bug.cgi?id=1745204

> >> If so - feel free to open Bugzilla and upload your metadata so we can
> >> check what's going on there.
> >>
> >> In BZ provide also the lvm2 metadata and the way the error was reached.
> >
> > When you say "upload your metadata" and "lvm2 metadata", can you tell me
> > exactly how to get it? Sorry for the basic question but I'm not sure what
> > to run and what to upload.
>
> Upload a 'dd' compressed copy of your ORIGINAL _tmeta content (which could
> now likely already be in volume _meta0 - if you had one successful run of
> the --repair command).

Hmm. I'm not sure how to use `dd` for this. If I'm missing something obvious, please let me know. Note, I cannot activate any portion of the pool.

> If you use an older 'lvm2' you might have a problem with accessing the
> _tmeta device content - if you have the latest fc30 - you should be able
> to activate _tmeta standalone as a component activation.
>
> To get an lvm2 metadata backup just use 'vgcfgbackup -f output.txt VGNAME'

This succeeded, and I attached it to the ticket.

> Let us know if you have problems with getting the kernel _tmeta or lvm2 meta.

As I wrote above, I could not get the _tmeta. If you're referring to a part of the pool, it does not activate via `lvchange -ay`.

> > In my case, lvm was set up by qubes-os, on a laptop. The disk drive had a
> > physical problem. I'll put those details into bugzilla. (But I'm waiting
> > for an answer to the metadata question above before I submit the ticket.)
> Ok - a serious disk error might eventually lead to irreparable metadata
> content - since if you lose some root b-tree node sequence it might be
> really hard to get something sensible (it's the reason why the metadata
> should be located on some 'mirrored' device - while there is a lot of
> effort put into protection against software errors - it's hard to do
> something about a hardware error...

Exactly how to do this is still beyond me. But I'm up for learning, and contributing it back to the qubes-os project.

-Dave

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/
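The 'dd' compressed copy of _tmeta that Zdenek asks for above can be sketched roughly as follows. This is only a sketch: the qubes_dom0/pool00 names are taken from this thread, and the component-activation commands in the comments assume a recent lvm2 (such as fc30's) where standalone activation of the _tmeta sub-LV is supported.

```shell
# Sketch: copy a pool's metadata sub-LV into a compressed file for a BZ upload.
# Names (qubes_dom0/pool00) come from this thread; adjust to your setup.

backup_tmeta() {
    # $1 = source device node, $2 = output file (becomes $2.gz)
    dd if="$1" of="$2" bs=1M conv=noerror,sync status=progress
    gzip -9 "$2"
}

# On a recent lvm2, something like the following should work (hedged - the
# exact component-activation behaviour varies by lvm2 version):
#   lvchange -ay qubes_dom0/pool00_tmeta       # standalone component activation
#   backup_tmeta /dev/qubes_dom0/pool00_tmeta tmeta.img
#   lvchange -an qubes_dom0/pool00_tmeta
```

conv=noerror,sync keeps the copy going past unreadable sectors (padding them with zeros), which matters on a drive with physical problems like this one.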
Re: [linux-lvm] repair pool with bad checksum in superblock
On Fri, 23 Aug 2019, Gionatan Danti wrote:
> Il 23-08-2019 14:47 Zdenek Kabelac ha scritto:
>> Ok - a serious disk error might eventually lead to irreparable metadata
>> content - since if you lose some root b-tree node sequence it might be
>> really hard to get something sensible (it's the reason why the metadata
>> should be located on some 'mirrored' device - while there is a lot of
>> effort put into protection against software errors - it's hard to do
>> something about a hardware error...
>
> Would it be possible to have a backup superblock, maybe located at the
> device end? XFS, EXT4 and ZFS already do something similar...

In my btree file system, I can recover from arbitrary hardware corruption by storing the root id of the file (table) in each node. Leaf nodes (with full data records) are also flagged as such. Thus, even if the root node of a file is lost or corrupted, the raw file/device can be scanned for the corresponding leaf nodes to rebuild the file (table) with all remaining records.

Drawbacks: deleting an individual leaf node requires changing the root id of that node, which costs an extra write. (Otherwise its records could be included in some future recovery.) Deleting entire files (tables) just requires marking the root node deleted - no need to rewrite all the leaf nodes.

--
Stuart D. Gathman
"Confutatis maledictis, flamis acribus addictis" - background song for
a Microsoft sponsored "Where do you want to go from here?" commercial.
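Stuart's scan-for-leaf-nodes idea can be illustrated with a toy sketch. The 4 KiB block size and the "root id stored in the first 8 bytes of each node" layout are invented here purely for illustration - they are not his actual on-disk format - but they show how a raw scan can recover every surviving node of a table whose root was destroyed.

```shell
# Toy illustration of recovery-by-scan: every node carries the root id of its
# owning table, so a linear scan of the raw image finds all surviving nodes.
# Block size (4096) and header layout (root id = first 8 bytes) are invented.

scan_for_root() {
    img="$1"; root="$2"; bs=4096
    blocks=$(( $(stat -c %s "$img") / bs ))
    for i in $(seq 0 $((blocks - 1))); do
        # Read one block and extract the (hypothetical) 8-byte root id header
        id=$(dd if="$img" bs=$bs skip=$i count=1 2>/dev/null | head -c 8)
        if [ "$id" = "$root" ]; then
            echo "block $i belongs to root $root"
        fi
    done
}
```

With a real format, the scan would additionally verify a per-node checksum before trusting the header, so that random data cannot masquerade as a node.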
Re: [linux-lvm] repair pool with bad checksum in superblock
Il 23-08-2019 14:47 Zdenek Kabelac ha scritto:
> Ok - a serious disk error might eventually lead to irreparable metadata
> content - since if you lose some root b-tree node sequence it might be
> really hard to get something sensible (it's the reason why the metadata
> should be located on some 'mirrored' device - while there is a lot of
> effort put into protection against software errors - it's hard to do
> something about a hardware error...

Would it be possible to have a backup superblock, maybe located at the device end? XFS, EXT4 and ZFS already do something similar...

Regards.

--
Danti Gionatan
Technical Support
Assyoma S.r.l. - www.assyoma.it
email: g.da...@assyoma.it - i...@assyoma.it
GPG public key ID: FF5F32A8
Re: [linux-lvm] repair pool with bad checksum in superblock
Dne 23. 08. 19 v 13:40 Dave Cohen napsal(a):
> $ thin_check --version
> 0.8.5

Hi

So if repairing fails even with the latest version - it's better to upload the metadata into a BZ created here:

https://bugzilla.redhat.com/enter_bug.cgi?product=LVM%20and%20device-mapper

>> If so - feel free to open Bugzilla and upload your metadata so we can check
>> what's going on there.
>>
>> In BZ provide also the lvm2 metadata and the way the error was reached.
>
> When you say "upload your metadata" and "lvm2 metadata", can you tell me
> exactly how to get it? Sorry for the basic question but I'm not sure what
> to run and what to upload.

Upload a 'dd' compressed copy of your ORIGINAL _tmeta content (which could now likely already be in volume _meta0 - if you had one successful run of the --repair command).

If you use an older 'lvm2' you might have a problem with accessing the _tmeta device content - if you have the latest fc30 - you should be able to activate _tmeta standalone as a component activation.

To get an lvm2 metadata backup just use 'vgcfgbackup -f output.txt VGNAME'

Let us know if you have problems with getting the kernel _tmeta or lvm2 meta.

> In my case, lvm was set up by qubes-os, on a laptop. The disk drive had a
> physical problem. I'll put those details into bugzilla. (But I'm waiting
> for an answer to the metadata question above before I submit the ticket.)

Ok - a serious disk error might eventually lead to irreparable metadata content - since if you lose some root b-tree node sequence it might be really hard to get something sensible (it's the reason why the metadata should be located on some 'mirrored' device - while there is a lot of effort put into protection against software errors - it's hard to do something about a hardware error...

Regards

Zdenek
Re: [linux-lvm] repair pool with bad checksum in superblock
On Fri, Aug 23, 2019, at 4:59 AM, Zdenek Kabelac wrote:
> Dne 23. 08. 19 v 2:18 Dave Cohen napsal(a):
> > I've read some old posts on this group, which give me some hope that I
> > might recover a failed drive. But I'm not well-versed in LVM, so details
> > of what I've read are going over my head.
> >
> > My problems started when my laptop failed to shut down properly, and
> > afterwards booted only to a dracut emergency shell. I've since attempted
> > to rescue the bad drive, using `ddrescue`. That tool reported 99.99% of
> > the drive rescued, but so far I'm unable to access the LVM data.
> >
> > Decrypting the copy I made with `ddrescue` gives me
> > /dev/mapper/encrypted_rescue, but I can't activate the LVM data that is
> > there. I get these errors:
> >
> > $ sudo lvconvert --repair qubes_dom0/pool00
> >   WARNING: Not using lvmetad because of repair.
> >   WARNING: Disabling lvmetad cache for repair command.
> > bad checksum in superblock, wanted 823063976
> >   Repair of thin metadata volume of thin pool qubes_dom0/pool00 failed
> >   (status:1). Manual repair required!
> >
> > $ sudo thin_check /dev/mapper/encrypted_rescue
> > examining superblock
> >   superblock is corrupt
> > bad checksum in superblock, wanted 636045691
> >
> > (Note the two commands return different "wanted" values. Are there two
> > superblocks?)
> >
> > I found a post, several years old, written by Ming-Hung Tsai, which
> > describes restoring a broken superblock. I'll show that post below, along
> > with my questions, because I'm missing some of the knowledge necessary.
> >
> > I would greatly appreciate any help!
>
> I think it's important to know the version of the thin tools?
>
> Are you using 0.8.5 ?

I had been using "0.7.6-4.fc30" (provided by fedora). Upon seeing your email, I built tag "v0.8.5", but the results from the `lvconvert` and `thin_check` commands are identical to what I wrote above.
$ thin_check --version
0.8.5

> If so - feel free to open Bugzilla and upload your metadata so we can check
> what's going on there.
>
> In BZ provide also the lvm2 metadata and the way the error was reached.

When you say "upload your metadata" and "lvm2 metadata", can you tell me exactly how to get it? Sorry for the basic question but I'm not sure what to run and what to upload.

> Our typical error seen with thin-pool usage is a 'doubled' activation.
> So a thin-pool gets activated on 2 hosts in parallel (usually unwantedly) -
> and when this happens and the 2 pools are updating the same metadata - it
> gets damaged.

In my case, lvm was set up by qubes-os, on a laptop. The disk drive had a physical problem. I'll put those details into bugzilla. (But I'm waiting for an answer to the metadata question above before I submit the ticket.)

Thanks for your help!

-Dave

> Regards
> Zdenek
Re: [linux-lvm] repair pool with bad checksum in superblock
Dne 23. 08. 19 v 2:18 Dave Cohen napsal(a):
> I've read some old posts on this group, which give me some hope that I
> might recover a failed drive. But I'm not well-versed in LVM, so details
> of what I've read are going over my head.
>
> My problems started when my laptop failed to shut down properly, and
> afterwards booted only to a dracut emergency shell. I've since attempted
> to rescue the bad drive, using `ddrescue`. That tool reported 99.99% of
> the drive rescued, but so far I'm unable to access the LVM data.
>
> Decrypting the copy I made with `ddrescue` gives me
> /dev/mapper/encrypted_rescue, but I can't activate the LVM data that is
> there. I get these errors:
>
> $ sudo lvconvert --repair qubes_dom0/pool00
>   WARNING: Not using lvmetad because of repair.
>   WARNING: Disabling lvmetad cache for repair command.
> bad checksum in superblock, wanted 823063976
>   Repair of thin metadata volume of thin pool qubes_dom0/pool00 failed
>   (status:1). Manual repair required!
>
> $ sudo thin_check /dev/mapper/encrypted_rescue
> examining superblock
>   superblock is corrupt
> bad checksum in superblock, wanted 636045691
>
> (Note the two commands return different "wanted" values. Are there two
> superblocks?)
>
> I found a post, several years old, written by Ming-Hung Tsai, which
> describes restoring a broken superblock. I'll show that post below, along
> with my questions, because I'm missing some of the knowledge necessary.
>
> I would greatly appreciate any help!

I think it's important to know the version of the thin tools?

Are you using 0.8.5 ?

If so - feel free to open Bugzilla and upload your metadata so we can check what's going on there.

In BZ provide also the lvm2 metadata and the way the error was reached.

Our typical error seen with thin-pool usage is a 'doubled' activation. So a thin-pool gets activated on 2 hosts in parallel (usually unwantedly) - and when this happens and the 2 pools are updating the same metadata - it gets damaged.
Regards

Zdenek
[linux-lvm] repair pool with bad checksum in superblock
I've read some old posts on this group, which give me some hope that I might recover a failed drive. But I'm not well-versed in LVM, so details of what I've read are going over my head.

My problems started when my laptop failed to shut down properly, and afterwards booted only to a dracut emergency shell. I've since attempted to rescue the bad drive, using `ddrescue`. That tool reported 99.99% of the drive rescued, but so far I'm unable to access the LVM data.

Decrypting the copy I made with `ddrescue` gives me /dev/mapper/encrypted_rescue, but I can't activate the LVM data that is there. I get these errors:

$ sudo lvconvert --repair qubes_dom0/pool00
  WARNING: Not using lvmetad because of repair.
  WARNING: Disabling lvmetad cache for repair command.
bad checksum in superblock, wanted 823063976
  Repair of thin metadata volume of thin pool qubes_dom0/pool00 failed (status:1). Manual repair required!

$ sudo thin_check /dev/mapper/encrypted_rescue
examining superblock
  superblock is corrupt
bad checksum in superblock, wanted 636045691

(Note the two commands return different "wanted" values. Are there two superblocks?)

I found a post, several years old, written by Ming-Hung Tsai, which describes restoring a broken superblock. I'll show that post below, along with my questions, because I'm missing some of the knowledge necessary.

I would greatly appreciate any help!

-Dave

Original post from several years ago, plus my questions:

> The original post asks what to do if the superblock was broken (his
> superblock was accidentally wiped). Since I don't have time to update the
> program at this moment, here's my workaround:
>
> 1.
> Partially rebuild the superblock
>
> (1) Obtain the pool parameters from LVM
>
> ./sbin/lvm lvs vg1/tp1 -o transaction_id,chunksize,lv_size --units s
>
> sample output:
> Tran Chunk LSize
> 3545 128S 7999381504S
>
> The number of data blocks is $((7999381504/128)) = 62495168

Here's what I get:

$ sudo lvs qubes_dom0/pool00 -o transaction_id,chunksize,lv_size --units S
  TransId Chunk LSize
  14757 512S 901660672S

So, the number of data blocks, if I understand correctly, is $((901660672/512)) = 1761056

> (2) Create input.xml with the pool parameters obtained from LVM:
>
> <superblock uuid="" time="0" transaction="3545" data_block_size="128"
>             nr_data_blocks="62495168">
> </superblock>
>
> (3) Run thin_restore to generate a temporary metadata with a correct
> superblock
>
> dd if=/dev/zero of=/tmp/test.bin bs=1M count=16
> thin_restore -i input.xml -o /tmp/test.bin
>
> The size of /tmp/test.bin depends on your pool size.

I don't understand the last sentence. What should the size of my /tmp/test.bin be? Should I be using "bs=1M count=16"?

> (4) Copy the partially-rebuilt superblock (4KB) to your broken metadata
> (<metadata dev>).
>
> dd if=/tmp/test.bin of=<metadata dev> bs=4k count=1 conv=notrunc

What is <metadata dev> here?

> 2. Run thin_ll_dump and thin_ll_restore
> https://www.redhat.com/archives/linux-lvm/2016-February/msg00038.html
>
> Example: assume that we found data-mapping-root=2303
> and device-details-root=277313
>
> ./pdata_tools thin_ll_dump <metadata dev> --data-mapping-root=2303 \
>     --device-details-root 277313 -o thin_ll_dump.txt
>
> ./pdata_tools thin_ll_restore -E -i thin_ll_dump.txt \
>     -o <output dev>
>
> Note that <output dev> should be sufficiently large, especially when you
> have snapshots, since the mapping trees reconstructed by thintools do not
> share blocks.

Here, I don't have the commands `thin_ll_dump` or `thin_ll_restore`. How should I obtain them? Or is there a way to do this with the tools I do have? (I'm on fedora 30, FYI.)

> 3.
> Fix the superblock's time field
>
> (1) Run thin_dump on the repaired metadata
>
> thin_dump <metadata dev> -o thin_dump.txt
>
> (2) Find the maximum time value in the data mapping trees
> (the device with the maximum snap_time might have been removed, so find
> the maximum time in the data mapping trees, not the device detail tree)
>
> grep "time=\"[0-9]*\"" thin_dump.txt -o | uniq | sort | uniq | tail
>
> (I run uniq twice to avoid sorting too much data)
>
> sample output:
> ...
> time="1785"
> time="1786"
> time="1787"
>
> so the maximum time is 1787.
>
> (3) Edit the "time" value of the <superblock> tag in thin_dump's output
>
> <superblock ... time="1787" ...>
> ...
>
> (4) Run thin_restore to get the final metadata
>
> thin_restore -i thin_dump.txt -o <metadata dev>

Ming-Hung Tsai
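Step 3(2) of the recipe above is plain text processing, so it can be wrapped up in a slightly more defensive form. A numeric sort is used instead of the lexical one in the original pipeline, so that e.g. time="999" cannot outrank time="1787"; the file name is just the one used above.

```shell
# Sketch of step 3(2): extract the maximum time="N" value from thin_dump output.
max_time() {
    grep -o 'time="[0-9]*"' "$1" |   # pull out every time="N" attribute
        grep -o '[0-9]\+' |          # keep only the number
        sort -n | tail -1            # numeric sort, take the largest
}

# Usage (device name as in the recipe above):
#   thin_dump <metadata dev> -o thin_dump.txt
#   max_time thin_dump.txt
```

The resulting number is what goes into the time="" attribute of the <superblock> tag in step 3(3).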