Regarding Harry Schmalzbauer's message from 03.10.2017 16:39 (localtime):
> Regarding Andriy Gapon's message from 03.10.2017 16:28 (localtime):
>> On 03/10/2017 17:19, Harry Schmalzbauer wrote:
>>> Have tried several different txg IDs, but the latest 5 or so lead to the
>>> panic, and some other randomly picked ones all claim missing devices...
>>> Doh, if only I had known about -T some days ago, when I had all 4 devices
>>> available.
>> I don't think that the error is really about the missing devices.
>> Most likely the real problem is that you are going too far back in
>> history where the data required to import the pool is not present.
>> It's just that there is no special error code to report that condition
>> distinctly, so it gets interpreted as a missing device condition.
> Sounds reasonable.
> When the RAM corruption happened, a live update was started, where
> several pool availability checks were done. No data writes.
> The last data writes were a few KBytes some minutes before the corruption,
> and the last significant amount written to that pool was a long time before that.
> So I still have hope to find an importable txg ID.
>
> Are they strictly serialized?

Seems so.
Just for the record, I couldn't recover any data yet, but in general, if
a pool isn't damaged too badly, the following steps are the ones that got
me closest:

I have attached dumps of the physical disks as md(4) devices md2 and md3.
'zpool import' offers
    cetusPsys                DEGRADED
      mirror-0               DEGRADED
        8178308212021996317  UNAVAIL  cannot open
        md3                  ONLINE
      mirror-1               DEGRADED
        md2p5                ONLINE
        4036286347185017167  UNAVAIL  cannot open

Which is known to be corrupt.
This time I also attached zdb(8) dumps (sparse files) of the remaining
two disks (resp. the partition), which show up below as md4 and md5.
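
In case someone wants to redo this: roughly, an image of a disk or
partition can be taken and attached as an md(4) device as sketched below.
The paths and the unit number are only placeholders, and dd(1) is just one
way to get a byte-exact image:

    # take a sparse, byte-exact image of the original disk/partition
    dd if=/dev/ada1 of=/dump/ada1.img bs=1m conv=sparse
    # attach the image read-only as a memory disk, here md3
    mdconfig -a -t vnode -o readonly -f /dump/ada1.img -u 3

The md device then shows up in 'zpool import' like a regular vdev.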
Now import offers this:
   pool: cetusPsys
     id: 13207378952432032998
  state: ONLINE
 action: The pool can be imported using its name or numeric identifier.
 config:

    cetusPsys   ONLINE
      mirror-0  ONLINE
        md5     ONLINE
        md3     ONLINE
      mirror-1  ONLINE
        md2p5   ONLINE
        md4     ONLINE

'zdb -ue cetusPsys' showed me the latest txg ID (3757573 in my case).

So I decremented the txg ID by one and repeated until the following
indicator of the fatal panic vanished:
loading space map for vdev 1 of 2, metaslab 108 of 109 ...
WARNING: blkptr at 0x80e0ead00 has invalid CHECKSUM 1
WARNING: blkptr at 0x80e0ead00 has invalid COMPRESS 0
WARNING: blkptr at 0x80e0ead00 DVA 0 has invalid VDEV 2337865727
WARNING: blkptr at 0x80e0ead00 DVA 1 has invalid VDEV 289407040
WARNING: blkptr at 0x80e0ead00 DVA 2 has invalid VDEV 3959586324

The run where it finally vanished was 'zdb -c -t 3757569 -AAA -e cetusPsys':

Traversing all blocks to verify metadata checksums and verify nothing
leaked ...

loading space map for vdev 1 of 2, metaslab 108 of 109 ...
89.0M completed (   6MB/s) estimated time remaining: 3hr 34min 47sec
zdb_blkptr_cb: Got error 122 reading <69, 0, 0, c>  -- skipping
86.8G completed ( 588MB/s) estimated time remaining: 0hr 00min 00sec       
Error counts:

    errno  count
      122  1
leaked space: vdev 0, offset 0xa01084200, size 512
leaked space: vdev 0, offset 0xd0dc23c00, size 512
leaked space: vdev 0, offset 0x2380182200, size 3072
leaked space: vdev 0, offset 0x2380189a00, size 1536
leaked space: vdev 0, offset 0x2380183000, size 1536
leaked space: vdev 0, offset 0x238039a200, size 2560
leaked space: vdev 0, offset 0x238039be00, size 18944
leaked space: vdev 0, offset 0x23801b3200, size 9216
leaked space: vdev 0, offset 0x33122a8800, size 512
leaked space: vdev 1, offset 0x2808f1600, size 512
leaked space: vdev 1, offset 0x2808f1e00, size 512
leaked space: vdev 1, offset 0x2808f2e00, size 4096
leaked space: vdev 1, offset 0x2808f1a00, size 512
leaked space: vdev 1, offset 0x9010e6c00, size 512
leaked space: vdev 1, offset 0x23c5ad9c00, size 512
leaked space: vdev 1, offset 0x2e00ad4800, size 512
leaked space: vdev 1, offset 0x2f0030b200, size 50176
leaked space: vdev 1, offset 0x2f000ca800, size 512
leaked space: vdev 1, offset 0x2f003a9800, size 15360
leaked space: vdev 1, offset 0x2f003af600, size 13312
leaked space: vdev 1, offset 0x2f00715c00, size 1024
leaked space: vdev 1, offset 0x2f003adc00, size 6144
leaked space: vdev 1, offset 0x2f00363600, size 38912
block traversal size 93540302336 != alloc 93540473344 (leaked 171008)

    bp count:         3670624
    ganged count:           0
    bp logical:    96083156992      avg:  26176
    bp physical:   93308853248      avg:  25420     compression:   1.03
    bp allocated:  93540302336      avg:  25483     compression:   1.03
    bp deduped:             0    ref>1:      0   deduplication:   1.00
    SPA allocated: 93540473344     used: 19.98%

    additional, non-pointer bps of type 0:      48879
    Dittoed blocks on same vdev: 23422
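
The decrement-and-retry search can also be scripted; a rough sh(1) sketch,
using the start txg from my 'zdb -ue' output above and a grep for the
"invalid" blkptr warnings as a purely heuristic stop condition (the lower
bound is an arbitrary assumption):

    txg=3757573                       # newest txg, from 'zdb -ue cetusPsys'
    while [ "$txg" -gt 3757500 ]; do  # arbitrary lower bound
        # -AAA ignores assertions, -c verifies metadata checksums at that txg
        if ! zdb -c -t "$txg" -AAA -e cetusPsys 2>&1 | grep -q 'invalid'; then
            echo "candidate txg: $txg"
            break
        fi
        txg=$((txg - 1))
    done

Be warned that every iteration traverses the pool, so this can take hours.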


In my case, import didn't work with the highest non-panicking txg ID:
zpool import -o readonly=on -R /mnt -T 3757569 cetusPsys
cannot import 'cetusPsys': one or more devices is currently unavailable

Maybe somebody else will have more luck... just keep the "-T" parameter
of zpool(8)'s import command in mind.
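
Condensed to the bare commands (pool name is from my case; <txg> stands for
whatever value your own search ends at):

    zdb -ue cetusPsys                     # newest uberblock / txg
    zdb -c -t <txg> -AAA -e cetusPsys     # repeat with decreasing txg until
                                          # the blkptr warnings disappear
    zpool import -o readonly=on -R /mnt -T <txg> cetusPsys

No guarantee, as my own result above shows, but that's the closest I got.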

thanks,

-harry