Re: [linux-lvm] repair pool with bad checksum in superblock
On Fri, Aug 23, 2019, at 8:47 AM, Zdenek Kabelac wrote:
> Dne 23. 08. 19 v 13:40 Dave Cohen napsal(a):
> >
> > $ thin_check --version
> > 0.8.5
>
> Hi
>
> So if repairing fails even with the latest version - it's better to upload
> the metadata into a BZ created here:
>
> https://bugzilla.redhat.com/enter_bug.cgi?product=LVM%20and%20device-mapper

I've created https://bugzilla.redhat.com/show_bug.cgi?id=1745204

> >> If so - feel free to open Bugzilla and upload your metadata so we can
> >> check what's going on there.
> >>
> >> In BZ provide also the lvm2 metadata and the way the error was reached.
> >
> > When you say "upload your metadata" and "lvm2 metadata", can you tell me
> > exactly how to get it? Sorry for the basic question but I'm not sure what
> > to run and what to upload.
>
> Upload a 'dd' compressed copy of your ORIGINAL _tmeta content (which could
> now likely already be in volume _meta0 - if you had one successful run of
> the --repair command).

Hmm. I'm not sure how to use `dd` for this. If I'm missing something obvious, please let me know. Note, I cannot activate any portion of the pool.

> If you use an older 'lvm2' you might have a problem with accessing the
> _tmeta device content - if you have the latest fc30 - you should be able
> to activate _tmeta standalone as a component activation.
>
> To get an lvm2 metadata backup just use 'vgcfgbackup -f output.txt VGNAME'

This succeeded, and I attached it to the ticket.

> Let us know if you have problems with getting the kernel _tmeta or lvm2 meta.

As I wrote above, I could not get the _tmeta. If you're referring to a part of the pool, it does not activate via `lvchange -ay`.

> > In my case, lvm was set up by qubes-os, on a laptop. The disk drive had a
> > physical problem. I'll put those details into bugzilla. (But I'm waiting
> > for an answer to the metadata question above before I submit the ticket.)
> Ok - a serious disk error might eventually lead to irreparable metadata
> content - since if you lose some root b-tree node sequence it might be
> really hard to get something sensible (it's the reason why the metadata
> should be located on some 'mirrored' device - while there is a lot of
> effort put into protection against software errors - it's hard to do
> something about a hardware error...

Exactly how to do this is still beyond me. But I'm up for learning, and contributing it back to the qubes-os project.

-Dave

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/
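The 'dd' compressed copy of _tmeta that Zdenek asks for above can be sketched roughly as follows. This is only a sketch: the qubes_dom0/pool00 names are taken from this thread, and the component-activation commands in the comments assume a recent lvm2 (such as fc30's) where standalone activation of the _tmeta sub-LV is supported.

```shell
# Sketch: copy a pool's metadata sub-LV into a compressed file for a BZ upload.
# Names (qubes_dom0/pool00) come from this thread; adjust to your setup.

backup_tmeta() {
    # $1 = source device node, $2 = output file (becomes $2.gz)
    dd if="$1" of="$2" bs=1M conv=noerror,sync status=progress
    gzip -9 "$2"
}

# On a recent lvm2, something like the following should work (hedged - the
# exact component-activation behaviour varies by lvm2 version):
#   lvchange -ay qubes_dom0/pool00_tmeta       # standalone component activation
#   backup_tmeta /dev/qubes_dom0/pool00_tmeta tmeta.img
#   lvchange -an qubes_dom0/pool00_tmeta
```

conv=noerror,sync keeps the copy going past unreadable sectors (padding them with zeros), which matters on a drive with physical problems like this one.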
Re: [linux-lvm] repair pool with bad checksum in superblock
On Fri, 23 Aug 2019, Gionatan Danti wrote:
> Il 23-08-2019 14:47 Zdenek Kabelac ha scritto:
>> Ok - a serious disk error might eventually lead to irreparable metadata
>> content - since if you lose some root b-tree node sequence it might be
>> really hard to get something sensible (it's the reason why the metadata
>> should be located on some 'mirrored' device - while there is a lot of
>> effort put into protection against software errors - it's hard to do
>> something about a hardware error...
>
> Would it be possible to have a backup superblock, maybe located at the
> device end? XFS, EXT4 and ZFS already do something similar...

In my btree file system, I can recover from arbitrary hardware corruption by storing the root id of the file (table) in each node. Leaf nodes (with full data records) are also flagged as such. Thus, even if the root node of a file is lost or corrupted, the raw file/device can be scanned for the corresponding leaf nodes to rebuild the file (table) with all remaining records.

Drawbacks: deleting an individual leaf node requires changing the root id of that node, which costs an extra write. (Otherwise its records could be included in some future recovery.) Deleting entire files (tables) just requires marking the root node deleted - no need to rewrite all the leaf nodes.

--
Stuart D. Gathman
"Confutatis maledictis, flamis acribus addictis" - background song for
a Microsoft sponsored "Where do you want to go from here?" commercial.
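Stuart's scan-for-leaf-nodes idea can be illustrated with a toy sketch. The 4 KiB block size and the "root id stored in the first 8 bytes of each node" layout are invented here purely for illustration - they are not his actual on-disk format - but they show how a raw scan can recover every surviving node of a table whose root was destroyed.

```shell
# Toy illustration of recovery-by-scan: every node carries the root id of its
# owning table, so a linear scan of the raw image finds all surviving nodes.
# Block size (4096) and header layout (root id = first 8 bytes) are invented.

scan_for_root() {
    img="$1"; root="$2"; bs=4096
    blocks=$(( $(stat -c %s "$img") / bs ))
    for i in $(seq 0 $((blocks - 1))); do
        # Read one block and extract the (hypothetical) 8-byte root id header
        id=$(dd if="$img" bs=$bs skip=$i count=1 2>/dev/null | head -c 8)
        if [ "$id" = "$root" ]; then
            echo "block $i belongs to root $root"
        fi
    done
}
```

With a real format, the scan would additionally verify a per-node checksum before trusting the header, so that random data cannot masquerade as a node.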
Re: [linux-lvm] repair pool with bad checksum in superblock
Il 23-08-2019 14:47 Zdenek Kabelac ha scritto:
> Ok - a serious disk error might eventually lead to irreparable metadata
> content - since if you lose some root b-tree node sequence it might be
> really hard to get something sensible (it's the reason why the metadata
> should be located on some 'mirrored' device - while there is a lot of
> effort put into protection against software errors - it's hard to do
> something about a hardware error...

Would it be possible to have a backup superblock, maybe located at the device end? XFS, EXT4 and ZFS already do something similar...

Regards.

--
Danti Gionatan
Technical Support
Assyoma S.r.l. - www.assyoma.it
email: g.da...@assyoma.it - i...@assyoma.it
GPG public key ID: FF5F32A8
Re: [linux-lvm] repair pool with bad checksum in superblock
Dne 23. 08. 19 v 13:40 Dave Cohen napsal(a):
> $ thin_check --version
> 0.8.5

Hi

So if repairing fails even with the latest version - it's better to upload the metadata into a BZ created here:

https://bugzilla.redhat.com/enter_bug.cgi?product=LVM%20and%20device-mapper

>> If so - feel free to open Bugzilla and upload your metadata so we can check
>> what's going on there.
>>
>> In BZ provide also the lvm2 metadata and the way the error was reached.
>
> When you say "upload your metadata" and "lvm2 metadata", can you tell me
> exactly how to get it? Sorry for the basic question but I'm not sure what
> to run and what to upload.

Upload a 'dd' compressed copy of your ORIGINAL _tmeta content (which could now likely already be in volume _meta0 - if you had one successful run of the --repair command).

If you use an older 'lvm2' you might have a problem with accessing the _tmeta device content - if you have the latest fc30 - you should be able to activate _tmeta standalone as a component activation.

To get an lvm2 metadata backup just use 'vgcfgbackup -f output.txt VGNAME'

Let us know if you have problems with getting the kernel _tmeta or lvm2 meta.

> In my case, lvm was set up by qubes-os, on a laptop. The disk drive had a
> physical problem. I'll put those details into bugzilla. (But I'm waiting
> for an answer to the metadata question above before I submit the ticket.)

Ok - a serious disk error might eventually lead to irreparable metadata content - since if you lose some root b-tree node sequence it might be really hard to get something sensible (it's the reason why the metadata should be located on some 'mirrored' device - while there is a lot of effort put into protection against software errors - it's hard to do something about a hardware error...

Regards

Zdenek
Re: [linux-lvm] repair pool with bad checksum in superblock
On Fri, Aug 23, 2019, at 4:59 AM, Zdenek Kabelac wrote:
> Dne 23. 08. 19 v 2:18 Dave Cohen napsal(a):
> > I've read some old posts on this group, which give me some hope that I
> > might recover a failed drive. But I'm not well-versed in LVM, so details
> > of what I've read are going over my head.
> >
> > My problems started when my laptop failed to shut down properly, and
> > afterwards booted only to a dracut emergency shell. I've since attempted
> > to rescue the bad drive, using `ddrescue`. That tool reported 99.99% of
> > the drive rescued, but so far I'm unable to access the LVM data.
> >
> > Decrypting the copy I made with `ddrescue` gives me
> > /dev/mapper/encrypted_rescue, but I can't activate the LVM data that is
> > there. I get these errors:
> >
> > $ sudo lvconvert --repair qubes_dom0/pool00
> >   WARNING: Not using lvmetad because of repair.
> >   WARNING: Disabling lvmetad cache for repair command.
> > bad checksum in superblock, wanted 823063976
> >   Repair of thin metadata volume of thin pool qubes_dom0/pool00 failed
> >   (status:1). Manual repair required!
> >
> > $ sudo thin_check /dev/mapper/encrypted_rescue
> > examining superblock
> >   superblock is corrupt
> > bad checksum in superblock, wanted 636045691
> >
> > (Note the two commands return different "wanted" values. Are there two
> > superblocks?)
> >
> > I found a post, several years old, written by Ming-Hung Tsai, which
> > describes restoring a broken superblock. I'll show that post below, along
> > with my questions, because I'm missing some of the knowledge necessary.
> >
> > I would greatly appreciate any help!
>
> I think it's important to know the version of the thin tools?
>
> Are you using 0.8.5 ?

I had been using "0.7.6-4.fc30" (provided by fedora). Upon seeing your email, I built tag "v0.8.5", but the results from the `lvconvert` and `thin_check` commands are identical to what I wrote above.
$ thin_check --version
0.8.5

> If so - feel free to open Bugzilla and upload your metadata so we can check
> what's going on there.
>
> In BZ provide also the lvm2 metadata and the way the error was reached.

When you say "upload your metadata" and "lvm2 metadata", can you tell me exactly how to get it? Sorry for the basic question but I'm not sure what to run and what to upload.

> Our typical error seen with thin-pool usage is a 'doubled' activation.
> So a thin-pool gets activated on 2 hosts in parallel (usually unwantedly) -
> and when this happens and the 2 pools are updating the same metadata - it
> gets damaged.

In my case, lvm was set up by qubes-os, on a laptop. The disk drive had a physical problem. I'll put those details into bugzilla. (But I'm waiting for an answer to the metadata question above before I submit the ticket.)

Thanks for your help!

-Dave

> Regards
> Zdenek
Re: [linux-lvm] repair pool with bad checksum in superblock
Dne 23. 08. 19 v 2:18 Dave Cohen napsal(a):
> I've read some old posts on this group, which give me some hope that I
> might recover a failed drive. But I'm not well-versed in LVM, so details
> of what I've read are going over my head.
>
> My problems started when my laptop failed to shut down properly, and
> afterwards booted only to a dracut emergency shell. I've since attempted
> to rescue the bad drive, using `ddrescue`. That tool reported 99.99% of
> the drive rescued, but so far I'm unable to access the LVM data.
>
> Decrypting the copy I made with `ddrescue` gives me
> /dev/mapper/encrypted_rescue, but I can't activate the LVM data that is
> there. I get these errors:
>
> $ sudo lvconvert --repair qubes_dom0/pool00
>   WARNING: Not using lvmetad because of repair.
>   WARNING: Disabling lvmetad cache for repair command.
> bad checksum in superblock, wanted 823063976
>   Repair of thin metadata volume of thin pool qubes_dom0/pool00 failed
>   (status:1). Manual repair required!
>
> $ sudo thin_check /dev/mapper/encrypted_rescue
> examining superblock
>   superblock is corrupt
> bad checksum in superblock, wanted 636045691
>
> (Note the two commands return different "wanted" values. Are there two
> superblocks?)
>
> I found a post, several years old, written by Ming-Hung Tsai, which
> describes restoring a broken superblock. I'll show that post below, along
> with my questions, because I'm missing some of the knowledge necessary.
>
> I would greatly appreciate any help!

I think it's important to know the version of the thin tools?

Are you using 0.8.5 ?

If so - feel free to open Bugzilla and upload your metadata so we can check what's going on there.

In BZ provide also the lvm2 metadata and the way the error was reached.

Our typical error seen with thin-pool usage is a 'doubled' activation. So a thin-pool gets activated on 2 hosts in parallel (usually unwantedly) - and when this happens and the 2 pools are updating the same metadata - it gets damaged.
Regards

Zdenek
[linux-lvm] repair pool with bad checksum in superblock
I've read some old posts on this group, which give me some hope that I might recover a failed drive. But I'm not well-versed in LVM, so details of what I've read are going over my head.

My problems started when my laptop failed to shut down properly, and afterwards booted only to a dracut emergency shell. I've since attempted to rescue the bad drive, using `ddrescue`. That tool reported 99.99% of the drive rescued, but so far I'm unable to access the LVM data.

Decrypting the copy I made with `ddrescue` gives me /dev/mapper/encrypted_rescue, but I can't activate the LVM data that is there. I get these errors:

$ sudo lvconvert --repair qubes_dom0/pool00
  WARNING: Not using lvmetad because of repair.
  WARNING: Disabling lvmetad cache for repair command.
bad checksum in superblock, wanted 823063976
  Repair of thin metadata volume of thin pool qubes_dom0/pool00 failed (status:1). Manual repair required!

$ sudo thin_check /dev/mapper/encrypted_rescue
examining superblock
  superblock is corrupt
bad checksum in superblock, wanted 636045691

(Note the two commands return different "wanted" values. Are there two superblocks?)

I found a post, several years old, written by Ming-Hung Tsai, which describes restoring a broken superblock. I'll show that post below, along with my questions, because I'm missing some of the knowledge necessary.

I would greatly appreciate any help!

-Dave

Original post from several years ago, plus my questions:

> The original post asks what to do if the superblock was broken (his
> superblock was accidentally wiped). Since I don't have time to update the
> program at this moment, here's my workaround:
>
> 1.
> Partially rebuild the superblock
>
> (1) Obtain the pool parameters from LVM
>
> ./sbin/lvm lvs vg1/tp1 -o transaction_id,chunksize,lv_size --units s
>
> sample output:
> Tran Chunk LSize
> 3545 128S 7999381504S
>
> The number of data blocks is $((7999381504/128)) = 62495168

Here's what I get:

$ sudo lvs qubes_dom0/pool00 -o transaction_id,chunksize,lv_size --units S
  TransId Chunk LSize
  14757 512S 901660672S

So, the number of data blocks, if I understand correctly, is $((901660672/512)) = 1761056

> (2) Create input.xml with the pool parameters obtained from LVM:
>
> <superblock uuid="" time="0" transaction="3545" data_block_size="128"
>             nr_data_blocks="62495168">
> </superblock>
>
> (3) Run thin_restore to generate a temporary metadata with a correct
> superblock
>
> dd if=/dev/zero of=/tmp/test.bin bs=1M count=16
> thin_restore -i input.xml -o /tmp/test.bin
>
> The size of /tmp/test.bin depends on your pool size.

I don't understand the last sentence. What should the size of my /tmp/test.bin be? Should I be using "bs=1M count=16"?

> (4) Copy the partially-rebuilt superblock (4KB) to your broken metadata
> (<metadata dev>).
>
> dd if=/tmp/test.bin of=<metadata dev> bs=4k count=1 conv=notrunc

What is <metadata dev> here?

> 2. Run thin_ll_dump and thin_ll_restore
> https://www.redhat.com/archives/linux-lvm/2016-February/msg00038.html
>
> Example: assume that we found data-mapping-root=2303
> and device-details-root=277313
>
> ./pdata_tools thin_ll_dump <metadata dev> --data-mapping-root=2303 \
>     --device-details-root 277313 -o thin_ll_dump.txt
>
> ./pdata_tools thin_ll_restore -E -i thin_ll_dump.txt \
>     -o <output dev>
>
> Note that <output dev> should be sufficiently large, especially when you
> have snapshots, since the mapping trees reconstructed by thintools do not
> share blocks.

Here, I don't have the commands `thin_ll_dump` or `thin_ll_restore`. How should I obtain them? Or is there a way to do this with the tools I do have? (I'm on fedora 30, FYI.)

> 3.
> Fix the superblock's time field
>
> (1) Run thin_dump on the repaired metadata
>
> thin_dump <metadata dev> -o thin_dump.txt
>
> (2) Find the maximum time value in the data mapping trees
> (the device with the maximum snap_time might have been removed, so find
> the maximum time in the data mapping trees, not the device detail tree)
>
> grep "time=\"[0-9]*\"" thin_dump.txt -o | uniq | sort | uniq | tail
>
> (I run uniq twice to avoid sorting too much data)
>
> sample output:
> ...
> time="1785"
> time="1786"
> time="1787"
>
> so the maximum time is 1787.
>
> (3) Edit the "time" value of the <superblock> tag in thin_dump's output
>
> <superblock ... time="1787" ...>
> ...
>
> (4) Run thin_restore to get the final metadata
>
> thin_restore -i thin_dump.txt -o <metadata dev>

Ming-Hung Tsai
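Step 3(2) of the recipe above is plain text processing, so it can be wrapped up in a slightly more defensive form. A numeric sort is used instead of the lexical one in the original pipeline, so that e.g. time="999" cannot outrank time="1787"; the file name is just the one used above.

```shell
# Sketch of step 3(2): extract the maximum time="N" value from thin_dump output.
max_time() {
    grep -o 'time="[0-9]*"' "$1" |   # pull out every time="N" attribute
        grep -o '[0-9]\+' |          # keep only the number
        sort -n | tail -1            # numeric sort, take the largest
}

# Usage (device name as in the recipe above):
#   thin_dump <metadata dev> -o thin_dump.txt
#   max_time thin_dump.txt
```

The resulting number is what goes into the time="" attribute of the <superblock> tag in step 3(3).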