Firstly, I'll say I'm not experienced, but I know a fair bit about raid and recovering corrupted arrays ...

On 01/03/2021 22:25, John Blinka wrote:
HI, Gentooers!

So, I typed dd if=/dev/zero of=/dev/sd<wrong letter>, and despite
hitting ctrl-c quite quickly, zeroed out some portion of the initial
part of a disk.  Which did this to my zfs raidz3 array:

     NAME                                         STATE     READ WRITE CKSUM
     zfs                                          DEGRADED     0     0     0
       raidz3-0                                   DEGRADED     0     0     0
         ata-HGST_HUS724030ALE640_PK1234P8JJJVKP  ONLINE       0     0     0
         ata-HGST_HUS724030ALE640_PK1234P8JJP3AP  ONLINE       0     0     0
         ata-ST4000NM0033-9ZM170_Z1Z80P4C         ONLINE       0     0     0
         ata-ST4000NM0033-9ZM170_Z1ZAZ8F1         ONLINE       0     0     0
         14296253848142792483                     UNAVAIL      0     0     0  was /dev/disk/by-id/ata-ST4000NM0033-9ZM170_Z1ZAZDJ0-part1
         ata-ST4000NM0033-9ZM170_Z1Z80KG0         ONLINE       0     0     0

Could have been worse.  I do have backups, and it is raidz3, so all
I've injured is my pride, but I do want to fix things.  I'd
appreciate some guidance before I attempt this - I have no
experience at it myself.

The steps I envision are

1) zpool offline zfs 14296253848142792483 (What's that number?)
2) do something to repair the damaged disk
3) zpool online zfs <repaired disk>
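
For what it's worth, the steps above might look something like this (a sketch only - the pool name "zfs" and the GUID come from the status output above; that long number is the numeric GUID ZFS assigned to the vdev, which it falls back to displaying once the device path can no longer be resolved):

```shell
# Take the damaged member offline; ZFS identifies it here by the
# GUID printed in `zpool status`, since the path no longer resolves.
zpool offline zfs 14296253848142792483

# ... repair the partition table on the disk (e.g. with gdisk) ...

# Bring it back and let ZFS resilver whatever was lost.
zpool online zfs ata-ST4000NM0033-9ZM170_Z1ZAZDJ0-part1

# Watch the resilver progress.
zpool status zfs
```

If the ZFS labels at the start of the partition were also zeroed, `zpool online` may not be enough and `zpool replace` would be needed instead.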

Right now, the device name for the damaged disk is /dev/sda.  Gdisk
says this about it:

Caution: invalid main GPT header, but valid backup; regenerating main header
from backup!

The GPT is stored at least twice; this is telling you the primary copy is trashed, but the backup seems okay ...

Warning: Invalid CRC on main header data; loaded backup partition table.
Warning! Main and backup partition tables differ! Use the 'c' and 'e' options
on the recovery & transformation menu to examine the two tables.

Warning! Main partition table CRC mismatch! Loaded backup partition table
instead of main partition table!

Warning! One or more CRCs don't match. You should repair the disk!
Main header: ERROR
Backup header: OK
Main partition table: ERROR
Backup partition table: OK

Partition table scan:
   MBR: not present
   BSD: not present
   APM: not present
   GPT: damaged

Found invalid MBR and corrupt GPT. What do you want to do? (Using the
GPT MAY permit recovery of GPT data.)
  1 - Use current GPT
  2 - Create blank GPT

Your answer: (I haven't given one yet)

I'm not exactly sure what this is telling me.  But I'm guessing it
means that the main partition table is gone, but there's a good
backup.

Yup. I can't be certain from that prompt, but I THINK it's saying that if you choose option 1, it will recover your partition table for you.

In addition, some, but not all, disk id info is gone:
1) /dev/disk/by-id still shows ata-ST4000NM0033-9ZM170_Z1ZAZDJ0 (the
damaged disk) but none of its former partitions

Because this entry names the whole disk, not a partition - you've damaged the contents, but the disk itself is still there, so this entry is completely unaffected.

2) /dev/disk/by-partlabel shows entries for the undamaged disks in the
pool, but not the damaged one
3) /dev/disk/by-partuuid similar to /dev/disk/by-partlabel

For both of these, "part" is short for partition - those entries come from the partition table, which is exactly what you've just trashed ...

4) /dev/disk/by-uuid does not show the damaged disk

Because the uuid is part of the partition table.

This particular disk is from a batch of 4 I bought with the same make
and specification and very similar ids (/dev/disk/by-id).  Can I
repair this disk by copying something off one of those other disks
onto this one?

GOD NO! You'll start copying uuids, so they'll no longer be unique, and things really will be broken!
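
(If you ever do legitimately clone a partition *layout* between identical disks, sgdisk can do it without leaving duplicate identifiers behind, because it can randomize the GUIDs afterwards. A sketch, with /dev/sdGOOD and /dev/sdBAD as hypothetical names for an intact and a damaged disk:)

```shell
# Replicate the good disk's partition table onto the damaged one.
# Note the argument order: -R takes the DESTINATION, the source
# disk is the final argument.
sgdisk -R=/dev/sdBAD /dev/sdGOOD

# Then randomize the disk GUID and all partition GUIDs on the copy,
# so the two disks no longer share identifiers.
sgdisk -G /dev/sdBAD
```

That only recreates the partition table, of course - it does nothing about the zeroed data inside the partitions.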

Is repair just repartitioning - as in the Gentoo
handbook?  Is it as simple as running gdisk and typing 1 to accept
gdisk's attempt at recovering the gpt?  Is running gdisk's recovery
and transformation facilities the way to go (the b option looks like
it's made for exactly this situation)?

Anybody experienced at this and willing to guide me?

Make sure that option 1 really does recover the GPT, then use it. Of course, the question then becomes what further damage will rear its head.
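
If you'd rather drive it by hand instead of trusting the startup prompt, gdisk's recovery & transformation menu has options for exactly this situation. A sketch of an interactive session (untested here - verify with 'v' before writing anything):

```shell
gdisk /dev/sda
# At the gdisk prompt:
#   r   - enter the recovery & transformation menu
#   b   - use backup GPT header (rebuilding main)
#   c   - load backup partition table from disk (rebuilding main)
#   v   - verify disk; should now report no problems found
#   w   - write the recovered table to disk and exit
```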

You need to make sure that your raidz3 array can recover from a corrupt disk. THIS IS IMPORTANT. If you tried to recover an md-raid-5 array from this situation you'd almost certainly trash it completely.


Actually, if your setup is raid, I'd just blow out the trashed disk completely. Take it out of your system, replace it, and let zfs repair itself onto the new disk.
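
(The replace-the-disk route is a single command once the new drive is in. A sketch - the new disk's by-id name is obviously hypothetical:)

```shell
# Tell ZFS to rebuild the missing member onto the new disk; the GUID
# is how `zpool status` identifies the vanished vdev.
zpool replace zfs 14296253848142792483 /dev/disk/by-id/ata-NEW_DISK_SERIAL

# Resilvering progress shows up in:
zpool status zfs
```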

You can then zero out the old disk and it's now a spare.

Just be careful here, because I don't know what zfs does. Btrfs by default mirrors metadata but not data, so with btrfs you'd think a mirrored filesystem could repair itself, but it can't ... If you want to repair the filesystem without rebuilding from scratch, you need to know rather more about zfs than I do ...

Cheers,
Wol
