Hello, I need your help.
I am trying to copy the readable sectors from a 1 TB disk that has read
errors. It contains important data belonging to a migrant who cannot
afford professional data recovery.
The recovery is being done under Debian Trixie with GNU ddrescue 1.29.
The disk, model ST1000LM048-2E7172, shows unrecoverable errors in the
log. The real problem, however, is how ddrescue behaves once connection
errors appear in the log: after several retry attempts the device is
taken offline, without a read error being reported.
First error:
Nov 24 11:26:08 optiplex3020 kernel: ata2.00: exception Emask 0x0 SAct 0x20 SErr 0x50000 action 0x6 frozen
Nov 24 11:26:08 optiplex3020 kernel: ata2: SError: { PHYRdyChg CommWake }
Nov 24 11:26:08 optiplex3020 kernel: ata2.00: failed command: READ FPDMA QUEUED
Nov 24 11:26:08 optiplex3020 kernel: ata2.00: cmd 60/80:28:00:ce:88/00:00:06:00:00/40 tag 5 ncq dma 65536 in
res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Nov 24 11:26:08 optiplex3020 kernel: ata2.00: status: { DRDY }
Nov 24 11:26:08 optiplex3020 kernel: ata2: hard resetting link
...
Nov 24 11:26:08 optiplex3020 kernel: ata2.00: exception Emask 0x0 SAct 0x20 SErr 0x50000 action 0x6 frozen
Nov 24 11:26:08 optiplex3020 kernel: ata2: SError: { PHYRdyChg CommWake }
Nov 24 11:26:08 optiplex3020 kernel: ata2.00: failed command: READ FPDMA QUEUED
Nov 24 11:26:08 optiplex3020 kernel: ata2.00: cmd 60/80:28:00:ce:88/00:00:06:00:00/40 tag 5 ncq dma 65536 in
res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Nov 24 11:26:08 optiplex3020 kernel: ata2.00: status: { DRDY }
Nov 24 11:26:08 optiplex3020 kernel: ata2: hard resetting link
Nov 24 11:29:48 optiplex3020 kernel: sd 1:0:0:0: [sdb] tag#13 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=71s
Nov 24 11:29:48 optiplex3020 kernel: sd 1:0:0:0: [sdb] tag#13 CDB: ATA command pass through(16) 85 06 20 00 00 00 00 00 00 00 00 00 00 00 e5 00
Nov 24 11:29:48 optiplex3020 kernel: sd 1:0:0:0: [sdb] tag#14 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=90s
Nov 24 11:29:48 optiplex3020 kernel: sd 1:0:0:0: [sdb] tag#14 CDB: Read(10) 28 00 06 8c de 80 00 00 80 00
Nov 24 11:29:48 optiplex3020 kernel: I/O error, dev sdb, sector 109895296 op 0x0:(READ) flags 0x800 phys_seg 12 prio class 2
Nov 24 11:29:48 optiplex3020 kernel: sd 1:0:0:0: [sdb] tag#16 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
Nov 24 11:29:48 optiplex3020 kernel: sd 1:0:0:0: [sdb] tag#16 CDB: Read(10) 28 00 06 8c df 00 00 00 80 00
Nov 24 11:29:48 optiplex3020 kernel: I/O error, dev sdb, sector 109895424 op 0x0:(READ) flags 0x800 phys_seg 12 prio class 2
Nov 24 11:29:48 optiplex3020 kernel: sd 1:0:0:0: [sdb] tag#0 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
Nov 24 11:29:48 optiplex3020 kernel: sd 1:0:0:0: [sdb] tag#0 CDB: Read(10) 28 00 06 8c df 80 00 00 80 00
Nov 24 11:29:48 optiplex3020 kernel: I/O error, dev sdb, sector 109895552 op 0x0:(READ) flags 0x800 phys_seg 12 prio class 2
Nov 24 11:29:48 optiplex3020 kernel: sd 1:0:0:0: [sdb] tag#7 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
Nov 24 11:29:48 optiplex3020 kernel: sd 1:0:0:0: [sdb] tag#7 CDB: Read(10) 28 00 06 8c e0 00 00 00 80 00
...
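The failing commands in the log can be turned into disk positions, which makes it easier to compare them with ddrescue's ipos and mapfile. The Python sketch below extracts the logical block address and length from the SCSI Read(10) CDBs and from the libata "cmd" line. It assumes 512-byte sectors (the kernel's "sector" numbers are in 512-byte units) and assumes libata prints the taskfile in the order command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device. The decoded values agree with the "I/O error, dev sdb, sector ..." lines and with "tag 5 ncq dma 65536", but treat this as a reading aid, not a diagnosis.

```python
from __future__ import annotations

SECTOR_SIZE = 512  # assumed: kernel sector numbers are in 512-byte units


def decode_read10(cdb_hex: str) -> tuple[int, int]:
    """Return (lba, blocks) from a SCSI READ(10) CDB written as hex bytes."""
    b = bytes.fromhex(cdb_hex)
    if len(b) != 10 or b[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(b[2:6], "big")     # bytes 2-5: logical block address
    blocks = int.from_bytes(b[7:9], "big")  # bytes 7-8: transfer length
    return lba, blocks


def decode_fpdma_cmd(taskfile: str) -> tuple[int, int, int]:
    """Return (lba, sectors, ncq_tag) from the libata 'cmd' field of a
    READ FPDMA QUEUED (0x60) command.

    Assumed field order:
    command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device
    """
    command, low, high, _device = taskfile.split("/")
    if int(command, 16) != 0x60:
        raise ValueError("not READ FPDMA QUEUED")
    feature, nsect, lbal, lbam, lbah = (int(x, 16) for x in low.split(":"))
    hob_feature, _hob_nsect, hob_lbal, hob_lbam, hob_lbah = (
        int(x, 16) for x in high.split(":"))
    # For NCQ reads the sector count sits in the feature registers and the
    # NCQ tag in bits 7-3 of the count register.
    sectors = (hob_feature << 8) | feature
    tag = nsect >> 3
    lba = int.from_bytes(bytes([hob_lbah, hob_lbam, hob_lbal, lbah, lbam, lbal]), "big")
    return lba, sectors, tag


if __name__ == "__main__":
    lba, blocks = decode_read10("28 00 06 8c de 80 00 00 80 00")
    print(lba, blocks, lba * SECTOR_SIZE)  # 109895296 128 56266391552
    lba, sectors, tag = decode_fpdma_cmd("60/80:28:00:ce:88/00:00:06:00:00/40")
    print(lba, sectors, tag, lba * SECTOR_SIZE)  # 109628928 128 5 56130011136
```

So the first failing Read(10) starts at byte offset 56266391552, i.e. about 56266 MB in ddrescue's decimal units.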
From this point on, ddrescue's read-error counter climbs quickly and
the reported error rate exceeds 2500 MB/s.
ddrescue does not recognize that the device is no longer available,
even when I stop and restart ddrescue.
root@optiplex3020 /r/k/ST1000LM048-2E7172# ddrescue --idirect /dev/sda sdc.img sdc.log
GNU ddrescue 1.29
Press Ctrl-C to interrupt
Initial status (read from mapfile)
rescued: 29998 MB, tried: 131072 B, bad-sector: 0 B, bad areas: 0
Current status
ipos: 990476 MB, non-trimmed: 6105 kB, current rate: 0 B/s
opos: 990476 MB, non-scraped: 0 B, average rate: 2759 B/s
non-tried: 970200 MB, bad-sector: 0 B, error rate: 720 B/s
rescued: 29998 MB, bad areas: 0, run time: 1m 35s
pct rescued: 2.99%, read errors: 106, remaining time: 244d 18h
time since last successful read: 1m 31s
Copying non-tried blocks... Pass 1 (forwards)
ipos: 30008 MB, non-trimmed: 7888 kB, current rate: 0 B/s
opos: 30008 MB, non-scraped: 0 B, average rate: 2759 B/s
non-tried: 970198 MB, bad-sector: 0 B, error rate: 720 B/s
rescued: 29998 MB, bad areas: 0, run time: 1m 35s
pct rescued: 2.99%, read errors: 214, remaining time: n/a
time since last successful read: 1m 31s
Copying non-tried blocks... Pass 2 (backwards)
ipos: 36095 MB, non-trimmed: 6103 MB, current rate: 0 B/s
opos: 36095 MB, non-scraped: 0 B, average rate: 2702 B/s
non-tried: 964102 MB, bad-sector: 0 B, error rate: 2584 MB/s
rescued: 29998 MB, bad areas: 0, run time: 1m 36s
pct rescued: 2.99%, read errors: 93228, remaining time: n/a
time since last successful read: 1m 32s
Copying non-tried blocks... Pass 5 (forwards)^C
Interrupted by user
hdparm, smartctl and fdisk no longer recognize the device; only lsblk
still lists it.
root@optiplex3020 /r/k/ST1000LM048-2E7172 [22]# hdparm -g /dev/sda
/dev/sda:
geometry = 0/255/63, sectors = 0, start = 0
root@optiplex3020 /r/k/ST1000LM048-2E7172# hdparm -I /dev/sda
/dev/sda:
root@optiplex3020 /r/k/ST1000LM048-2E7172# hdparm -i /dev/sda
/dev/sda:
HDIO_GET_IDENTITY failed: No message of desired type
root@optiplex3020 /r/k/ST1000LM048-2E7172# hdparm -C /dev/sda
/dev/sda:
drive state is: unknown
root@optiplex3020 /r/k/ST1000LM048-2E7172 [SIGINT]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
sda 8:0 0 931,5G 0 disk
└─sda1 8:1 0 931,5G 0 part
sdb 8:16 0 5,5T 0 disk
└─sdb1 8:17 0 5,5T 0 part /rescue
sdc 8:32 0 238,5G 0 disk
├─sdc1 8:33 0 300M 0 part /boot/efi
├─sdc2 8:34 0 229,4G 0 part /
└─sdc3 8:35 0 8,8G 0 part [SWAP]
One peculiarity of the system: when I reboot after the error, the Dell
firmware reports that a disk is causing problems, and I can only
continue by pressing F1. The device names also change frequently, for
this disk as well as for the other two disks (system and data); this
does no harm, because fstab refers to the disks by UUID.
The motherboard has three SATA ports, which appear in the log as ata1,
ata2 and ata6. Because the reported error code points to a defective
cable or SATA connection, I replaced the cable and moved the disk from
its original port, ata2, to ata6. Neither change made any difference.
If I abort ddrescue a little later rather than immediately after the
error occurs, ipos moves on. After a reboot, ddrescue then recovers
additional megabytes or even gigabytes until the error occurs again.
The problem is that every time the error occurs, I have to abort
ddrescue and reboot the system, because that is the only way to make
the disk available again.
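Because ddrescue records its progress in the mapfile (sdc.log in the command above), the gain from each reboot cycle can be checked directly. The following Python sketch sums the mapfile blocks by status and reports the first non-tried position. It assumes the usual ddrescue mapfile layout: "#" comment lines, one status line beginning with the current position, then "pos size status" lines where "?" is non-tried, "*" non-trimmed, "/" non-scraped, "-" bad-sector and "+" finished. GNU ddrescue's companion tool ddrescuelog can produce similar summaries.

```python
from __future__ import annotations

import sys
from collections import Counter

# Status characters used in ddrescue mapfiles.
STATUS_NAMES = {
    "?": "non-tried",
    "*": "non-trimmed",
    "/": "non-scraped",
    "-": "bad-sector",
    "+": "rescued",
}


def summarize_mapfile(text: str) -> tuple[int, Counter, int | None]:
    """Return (current_pos, bytes per status, start of first non-tried block)."""
    rows = [line.split() for line in text.splitlines()
            if line.strip() and not line.lstrip().startswith("#")]
    if not rows:
        raise ValueError("no status line found")
    current_pos = int(rows[0][0], 0)  # status line: pos status [pass]
    totals: Counter = Counter()
    first_untried = None
    for pos_s, size_s, status in (r[:3] for r in rows[1:]):
        pos, size = int(pos_s, 0), int(size_s, 0)  # mapfiles use 0x... hex
        totals[STATUS_NAMES[status]] += size
        if status == "?" and first_untried is None:
            first_untried = pos
    return current_pos, totals, first_untried


if __name__ == "__main__":
    if len(sys.argv) == 2:  # e.g. python3 mapsum.py sdc.log
        with open(sys.argv[1]) as f:
            pos, totals, untried = summarize_mapfile(f.read())
        print(f"current position: {pos / 1e6:.1f} MB")
        for name, size in sorted(totals.items()):
            print(f"{name:12} {size / 1e6:14.1f} MB")
        print("first non-tried byte:", untried)
```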
I could try to automate this with a script that uses ddrescue's -i
(input position) and -s (size) options to copy in small steps, and
reboots the system whenever the disk is no longer available.
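A chunked run of this kind could look roughly like the following Python sketch. It calls ddrescue once per chunk with -i and -s, otherwise using the same command line as above, and after each run checks whether the kernel still reports a non-zero size for the disk; if not, it stops and prints the position to resume from after a reboot. Several parts are assumptions, marked in the code: the /dev/disk/by-id path (its serial-number part is a hypothetical placeholder; a by-id link avoids the changing sda/sdb/sdc names), the availability check via /sys/block/<name>/size (suggested by hdparm -g reporting "sectors = 0" once the disk is gone), and the disk size. It does not reboot on its own, because the Dell firmware stops at a key prompt after such a restart.

```python
#!/usr/bin/env python3
"""Sketch: run ddrescue in small chunks and stop when the disk disappears.

All constants below are assumptions or placeholders; adjust them before use.
"""
from __future__ import annotations

import os
import subprocess
import sys

# Hypothetical by-id path; replace SERIAL with the real serial number
# (see ls -l /dev/disk/by-id/). Unlike sda/sdb/sdc it survives reboots.
DEVICE = "/dev/disk/by-id/ata-ST1000LM048-2E7172_SERIAL"
IMAGE = "sdc.img"      # image and mapfile names from the command above
MAPFILE = "sdc.log"
# Assumed capacity: 1953525168 sectors of 512 bytes (lsblk: 931,5G).
# Check with "blockdev --getsize64" while the disk is available.
DISK_SIZE = 1_953_525_168 * 512
CHUNK = 1_000_000_000  # 1 GB per run; a multiple of 512, so chunks stay sector-aligned


def plan_chunks(start: int, end: int, chunk: int) -> list[tuple[int, int]]:
    """Split [start, end) into (position, size) pairs of at most `chunk` bytes."""
    if chunk <= 0:
        raise ValueError("chunk must be positive")
    return [(pos, min(chunk, end - pos)) for pos in range(start, end, chunk)]


def device_present(dev: str) -> bool:
    """True if `dev` exists and the kernel reports a non-zero size for it.

    Assumption: once the disk has dropped out, /sys/block/<name>/size reads 0.
    """
    if not os.path.exists(dev):
        return False
    name = os.path.basename(os.path.realpath(dev))  # e.g. "sda"
    try:
        with open(f"/sys/block/{name}/size") as f:
            return int(f.read().strip()) > 0
    except (OSError, ValueError):
        return False


def main(start: int) -> int:
    if not device_present(DEVICE):
        print("disk not available; reboot first", file=sys.stderr)
        return 1
    for pos, size in plan_chunks(start, DISK_SIZE, CHUNK):
        # -i/-s restrict this run to one chunk; the mapfile keeps all progress.
        subprocess.run(["ddrescue", "--idirect", "-i", str(pos), "-s", str(size),
                        DEVICE, IMAGE, MAPFILE], check=False)
        if not device_present(DEVICE):
            print(f"disk dropped out; after a reboot run: {sys.argv[0]} {pos}",
                  file=sys.stderr)
            return 1
    return 0


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print(f"usage: {sys.argv[0]} START_BYTE   (e.g. 0)", file=sys.stderr)
    else:
        sys.exit(main(int(sys.argv[1])))
```

Since the mapfile records what has already been rescued, restarting from the start of the interrupted chunk costs little: ddrescue skips the finished areas. If the disk drops out in the middle of a chunk, ddrescue may still record the rest of that chunk as failed, but those areas stay in the mapfile as not rescued and can be retried in a later run.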
I would be grateful for suggestions on how to recover the data; the
condition of the disk itself appears to be stable.
Many thanks in advance
Greetings from the Alps
Franz