Hello, I need your help.

I am trying to copy the readable sectors from a 1 TB disk that has read errors. The disk contains important data belonging to a migrant who cannot afford professional data recovery.
The recovery is being performed under Debian Trixie with ddrescue 1.29.
The disk, model ST1000LM048-2E7172, shows unrecoverable errors in the log. The actual problem, however, is how ddrescue behaves once connection errors appear in the log: after several retry attempts the device is shut down without a read error being reported.

First error:

Nov 24 11:26:08 optiplex3020 kernel: ata2.00: exception Emask 0x0 SAct 0x20 SErr 0x50000 action 0x6 frozen
Nov 24 11:26:08 optiplex3020 kernel: ata2: SError: { PHYRdyChg CommWake }
Nov 24 11:26:08 optiplex3020 kernel: ata2.00: failed command: READ FPDMA QUEUED
Nov 24 11:26:08 optiplex3020 kernel: ata2.00: cmd 60/80:28:00:ce:88/00:00:06:00:00/40 tag 5 ncq dma 65536 in
                                              res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Nov 24 11:26:08 optiplex3020 kernel: ata2.00: status: { DRDY }
Nov 24 11:26:08 optiplex3020 kernel: ata2: hard resetting link
...

Nov 24 11:26:08 optiplex3020 kernel: ata2.00: exception Emask 0x0 SAct 0x20 SErr 0x50000 action 0x6 frozen
Nov 24 11:26:08 optiplex3020 kernel: ata2: SError: { PHYRdyChg CommWake }
Nov 24 11:26:08 optiplex3020 kernel: ata2.00: failed command: READ FPDMA QUEUED
Nov 24 11:26:08 optiplex3020 kernel: ata2.00: cmd 60/80:28:00:ce:88/00:00:06:00:00/40 tag 5 ncq dma 65536 in
                                              res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Nov 24 11:26:08 optiplex3020 kernel: ata2.00: status: { DRDY }
Nov 24 11:26:08 optiplex3020 kernel: ata2: hard resetting link
Nov 24 11:29:48 optiplex3020 kernel: sd 1:0:0:0: [sdb] tag#13 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=71s
Nov 24 11:29:48 optiplex3020 kernel: sd 1:0:0:0: [sdb] tag#13 CDB: ATA command pass through(16) 85 06 20 00 00 00 00 00 00 00 00 00 00 00 e5 00
Nov 24 11:29:48 optiplex3020 kernel: sd 1:0:0:0: [sdb] tag#14 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=90s
Nov 24 11:29:48 optiplex3020 kernel: sd 1:0:0:0: [sdb] tag#14 CDB: Read(10) 28 00 06 8c de 80 00 00 80 00
Nov 24 11:29:48 optiplex3020 kernel: I/O error, dev sdb, sector 109895296 op 0x0:(READ) flags 0x800 phys_seg 12 prio class 2
Nov 24 11:29:48 optiplex3020 kernel: sd 1:0:0:0: [sdb] tag#16 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
Nov 24 11:29:48 optiplex3020 kernel: sd 1:0:0:0: [sdb] tag#16 CDB: Read(10) 28 00 06 8c df 00 00 00 80 00
Nov 24 11:29:48 optiplex3020 kernel: I/O error, dev sdb, sector 109895424 op 0x0:(READ) flags 0x800 phys_seg 12 prio class 2
Nov 24 11:29:48 optiplex3020 kernel: sd 1:0:0:0: [sdb] tag#0 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
Nov 24 11:29:48 optiplex3020 kernel: sd 1:0:0:0: [sdb] tag#0 CDB: Read(10) 28 00 06 8c df 80 00 00 80 00
Nov 24 11:29:48 optiplex3020 kernel: I/O error, dev sdb, sector 109895552 op 0x0:(READ) flags 0x800 phys_seg 12 prio class 2
Nov 24 11:29:48 optiplex3020 kernel: sd 1:0:0:0: [sdb] tag#7 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
Nov 24 11:29:48 optiplex3020 kernel: sd 1:0:0:0: [sdb] tag#7 CDB: Read(10) 28 00 06 8c e0 00 00 00 80 00
...

From this point on, the read error counter climbs quickly and the error rate exceeds 2500 MB/s. ddrescue does not recognize that the device is no longer available, even when I stop and restart it.

root@optiplex3020 /r/k/ST1000LM048-2E7172# ddrescue --idirect /dev/sda sdc.img sdc.log
GNU ddrescue 1.29
Press Ctrl-C to interrupt
Initial status (read from mapfile)
rescued: 29998 MB, tried: 131072 B, bad-sector: 0 B, bad areas: 0

Current status
     ipos:  990476 MB, non-trimmed:    6105 kB,   current rate:      0 B/s
     opos:  990476 MB, non-scraped:        0 B,   average rate:   2759 B/s
non-tried:  970200 MB,  bad-sector:        0 B,     error rate:    720 B/s
  rescued:   29998 MB,   bad areas:          0,       run time:     1m 35s
pct rescued:    2.99%, read errors:        106, remaining time:   244d 18h
                               time since last successful read:     1m 31s
Copying non-tried blocks... Pass 1 (forwards)
     ipos:   30008 MB, non-trimmed:    7888 kB,   current rate:      0 B/s
     opos:   30008 MB, non-scraped:        0 B,   average rate:   2759 B/s
non-tried:  970198 MB,  bad-sector:        0 B,     error rate:    720 B/s
  rescued:   29998 MB,   bad areas:          0,       run time:     1m 35s
pct rescued:    2.99%, read errors:        214, remaining time:        n/a
                               time since last successful read:     1m 31s
Copying non-tried blocks... Pass 2 (backwards)
     ipos:   36095 MB, non-trimmed:    6103 MB,   current rate:      0 B/s
     opos:   36095 MB, non-scraped:        0 B,   average rate:   2702 B/s
non-tried:  964102 MB,  bad-sector:        0 B,     error rate:  2584 MB/s
  rescued:   29998 MB,   bad areas:          0,       run time:     1m 36s
pct rescued:    2.99%, read errors:      93228, remaining time:        n/a
                               time since last successful read:     1m 32s
Copying non-tried blocks... Pass 5 (forwards)^C
  Interrupted by user

hdparm, smartctl, and fdisk no longer recognize the device; only lsblk still displays it.

root@optiplex3020 /r/k/ST1000LM048-2E7172 [22]# hdparm -g /dev/sda

/dev/sda:
 geometry      = 0/255/63, sectors = 0, start = 0
root@optiplex3020 /r/k/ST1000LM048-2E7172# hdparm -I /dev/sda

/dev/sda:
root@optiplex3020 /r/k/ST1000LM048-2E7172# hdparm -i /dev/sda

/dev/sda:
 HDIO_GET_IDENTITY failed: No message of desired type
root@optiplex3020 /r/k/ST1000LM048-2E7172# hdparm -C /dev/sda

/dev/sda:
 drive state is:  unknown
root@optiplex3020 /r/k/ST1000LM048-2E7172 [SIGINT]# lsblk
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
sda      8:0    0 931,5G  0 disk
└─sda1   8:1    0 931,5G  0 part
sdb      8:16   0   5,5T  0 disk
└─sdb1   8:17   0   5,5T  0 part /rescue
sdc      8:32   0 238,5G  0 disk
├─sdc1   8:33   0   300M  0 part /boot/efi
├─sdc2   8:34   0 229,4G  0 part /
└─sdc3   8:35   0   8,8G  0 part [SWAP]

I would like to mention a peculiarity of the system: when I reboot after the error, the Dell firmware reports that a disk is causing problems, and I can only continue by pressing F1. The device name of the failing disk often changes, as do the names of the other two disks (system and data); for those two this makes no difference, because they are defined by UUID in fstab. The motherboard has three SATA ports, which the log refers to as ata1, ata2, and ata6. Because the reported error code points to a defective cable or port, I replaced the cable and moved the disk from the original ata2 to ata6. Neither change altered the behavior.
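
Since the sdX name moves between boots, a small helper could look the disk up by model before each ddrescue run, so the wrong disk is never read by mistake. This is only a sketch: it assumes a udev-based system where whole SATA disks appear under /dev/disk/by-id as "ata-<model>_<serial>", and the name find_disk is made up for illustration.

```python
import glob
import os
import re


def find_disk(model, by_id_dir="/dev/disk/by-id"):
    """Return the current device node (e.g. /dev/sdb) for the given model, or None.

    Assumes udev names whole SATA disks 'ata-<model>_<serial>' and
    partitions with an extra '-partN' suffix, which is skipped here.
    """
    pattern = os.path.join(by_id_dir, "ata-" + glob.escape(model) + "*")
    for link in sorted(glob.glob(pattern)):
        if re.search(r"-part\d+$", os.path.basename(link)):
            continue  # partition link, not the whole disk
        return os.path.realpath(link)  # follow the symlink to the sdX node
    return None


if __name__ == "__main__":
    print(find_disk("ST1000LM048-2E7172"))
```

The printed path can then be passed to ddrescue in place of a hard-coded /dev/sdX.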

If I abort the process a little later rather than immediately after the error occurs, ipos shifts. After restarting the system, ddrescue recovers additional MB or even GB of data until the error occurs again. The problem is that after the error occurs, I have to abort ddrescue and then always restart the system, because that is the only way to make the disk available again.

I could still try to automate the process with a script that runs ddrescue in small steps, using the -i and -s options, and restarts the system when the disk is no longer available.
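
A rough sketch of what such a script might look like follows. It is untested against the failing disk and rests on several assumptions: the slice size STEP is arbitrary, block 0 of the disk is assumed to be readable so that it can serve as an "is the disk still there" probe, and the function names are invented for illustration. Only the ddrescue options --idirect, --input-position (-i) and --size (-s) are used. The script does not reboot by itself; it stops and reports the slice in which the disk disappeared.

```python
#!/usr/bin/env python3
import mmap
import os
import subprocess
import sys

STEP = 1_000_000_000  # bytes per ddrescue run (assumed value, tune freely)


def slices(total, step=STEP, start=0):
    """Yield (offset, length) pairs covering [start, total)."""
    pos = start
    while pos < total:
        length = min(step, total - pos)
        yield pos, length
        pos += length


def ddrescue_cmd(dev, image, mapfile, offset, length):
    """Build one ddrescue call restricted to a single slice."""
    return ["ddrescue", "--idirect", f"--input-position={offset}",
            f"--size={length}", dev, image, mapfile]


def can_read(dev):
    """Return True if the first 4096 bytes can be read, bypassing the page cache."""
    flags = os.O_RDONLY | getattr(os, "O_DIRECT", 0)  # O_DIRECT is Linux-specific
    try:
        fd = os.open(dev, flags)
    except OSError:
        return False
    try:
        buf = mmap.mmap(-1, 4096)  # anonymous mapping is page-aligned, as O_DIRECT requires
        return os.readv(fd, [buf]) == 4096
    except OSError:
        return False
    finally:
        os.close(fd)


def rescue(dev, image, mapfile, total, step=STEP, probe=can_read, run=subprocess.run):
    """Run ddrescue slice by slice.

    Returns None if every slice was attempted with the disk still answering,
    otherwise the offset of the slice during which the disk stopped answering.
    """
    if not probe(dev):
        return 0
    for offset, length in slices(total, step):
        run(ddrescue_cmd(dev, image, mapfile, offset, length), check=False)
        if not probe(dev):
            return offset
    return None


if __name__ == "__main__" and len(sys.argv) == 5:
    # Hypothetical invocation: script.py /dev/sdX image mapfile total_bytes
    dev, image, mapfile = sys.argv[1:4]
    stopped_at = rescue(dev, image, mapfile, int(sys.argv[4]))
    if stopped_at is not None:
        print(f"{dev} stopped answering in the slice at offset {stopped_at}; reboot and rerun")
        sys.exit(1)
```

The total size in bytes could be taken from `blockdev --getsize64` while the disk is still responding. Because the same mapfile is passed to every run, a rerun after the reboot can simply start from the first slice again: ddrescue should find nothing left to do in slices that are already fully rescued. Small slices also limit how much of the disk gets marked as failed while the device is gone.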

I hope to receive suggestions on how to recover the data, because the condition of the disk appears to be stable.
Many thanks in advance.
Greetings from the Alps
Franz

