On 8/31/22 06:25, ppr wrote:
I would appreciate advice from the community about a failing hard drive.
When booting up, the computer complained about /dev/sdb, which is an ext4
HDD with data (not the computer's main disk). dmesg shows `AE_NOT_FOUND`
and `failed command: READ FPDMA QUEUED` messages (full dmesg log at
https://hastebin.com/raw/jebelileru).
It has finally booted after trying unsuccessfully to start /dev/sdb.
I launched smartctl which shows hard drive failure.
---
# smartctl -H -i /dev/sdb
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.19.0-21-amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Toshiba 3.5" DT01ACA... Desktop HDD
Device Model: TOSHIBA DT01ACA100
Serial Number: 663X1XGNS
LU WWN Device Id: 5 000039 fe9dad918
Firmware Version: MS2OA750
User Capacity: 1 000 204 886 016 bytes [1,00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 7200 rpm
Form Factor: 3.5 inches
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS T13/1699-D revision 4
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Wed Aug 31 13:56:34 2022 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
Failed Attributes:
ID# ATTRIBUTE_NAME          FLAG    VALUE WORST THRESH TYPE     UPDATED WHEN_FAILED RAW_VALUE
  2 Throughput_Performance  0x0005  037   037   054    Pre-fail Offline FAILING_NOW 3774
  5 Reallocated_Sector_Ct   0x0033  001   001   005    Pre-fail Always  FAILING_NOW 2004
---
I did not try to mount the HDD. I plugged in an external HDD (ext4) and
launched ddrescue. After two days it has recovered 33 GB of 1 TB, but the
speed is now so slow that it will take 7104 days to complete.
# ddrescue -n /dev/sdb \
/media/sara/2274a2da-1f02-4afd-a5c5-e8dcb1c02195/recup_HDD_sara/image_HDD1.img \
/media/sara/2274a2da-1f02-4afd-a5c5-e8dcb1c02195/recup_HDD_sara/recup.log
GNU ddrescue 1.23
Press Ctrl-C to interrupt
ipos: 33992 MB, non-trimmed: 0 B, current rate: 636 B/s
opos: 33992 MB, non-scraped: 0 B, average rate: 188 kB/s
non-tried: 966212 MB, bad-sector: 0 B, error rate: 0 B/s
rescued: 33992 MB, bad areas: 0, run time: 2d 2h 6m
pct rescued: 3.39%, read errors: 0, remaining time: 7104d 20h
time since last successful read: 0s
Copying non-tried blocks... Pass 1 (forwards)^C
Should I wait, hoping it speeds up? Should I pass different options to
ddrescue or use another tool?
Unless you have enterprise-grade equipment designed for 100% duty cycle
for 48 hours, I would kill the ddrescue job before your hardware is
destroyed.
Both the failed drive and the destination drive will be in heavy use
while you attempt to recover sectors. At 100 MB/s, transferring 1 TB
will take nearly 3 hours (!). Make sure everything has good power
supplies and good cooling. Use the best drive you have for the
destination; an SSD will expedite this process and steps that follow.
Ensure that the destination contains zeros for sectors not recovered.
Comment out the /etc/crypttab and/or /etc/fstab entries for the failed
drive. When you mount the drive, mount it read only.
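A minimal sketch of that step, assuming the failed drive is listed in /etc/fstab by UUID. The helper name `disable_fstab_entry` is made up for illustration, and the mount point in the comment is a placeholder:

```shell
# Hypothetical helper: comment out the fstab line for one filesystem UUID.
# sed keeps a backup of the original file next to it with a .bak suffix.
disable_fstab_entry() {   # usage: disable_fstab_entry FSTAB_FILE UUID
    sed -i.bak "/^UUID=$2[[:space:]]/ s/^/#/" "$1"
}

# Example (as root; substitute the UUID that blkid reports for the failed drive):
#   disable_fstab_entry /etc/fstab <uuid-of-failed-drive>
#
# If the drive must be mounted at all, mount it read-only. On ext4, "noload"
# also skips journal replay, so nothing is written to the disk:
#   mount -o ro,noload /dev/sdb /mnt     # /dev/sdb or the relevant partition
```

If the entry uses a device path or a label instead of a UUID, the pattern would need adjusting; editing the file by hand works just as well.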
The challenge is figuring out the right options and strategies for using
ddrescue(1) to get as many good sectors as you can off the failing drive
before it dies completely. Fortunately or unfortunately, I have not
needed ddrescue(1) in many years; so, I would RTFM carefully and then
STFW for articles about using ddrescue(1) effectively. Consider doing
the work in chunks. You should already have sectors 0-33 GB. Skip 33
GB and/or 34 GB. Do 35-100 GB. Then, 100-200 GB, 200-300 GB, 300-400
GB, etc. Get the good sectors first. Do the problem sectors last.
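The chunk plan above could be sketched like this. It is untested against a real failing drive. The script only prints the commands so they can be reviewed before anything runs. `-i` and `-s` are ddrescue's --input-position and --size options, given here in plain bytes. The image and mapfile names are placeholders for the paths from the original post. Reusing the same mapfile matters, because that is how ddrescue knows which areas are already rescued:

```shell
# Print (not run) one ddrescue command per region, easy regions first.
src=/dev/sdb           # failing drive, as in the original post
img=image_HDD1.img     # placeholder for the existing image file path
map=recup.log          # placeholder for the existing mapfile path
GB=1000000000          # decimal gigabyte, in bytes

plan_chunk() {         # usage: plan_chunk START_GB END_GB
    start=$(( $1 * GB ))
    size=$(( ($2 - $1) * GB ))
    echo "ddrescue -n -i $start -s $size $src $img $map"
}

plan_chunk 35 100      # skip the slow area around 33-35 GB for now
lo=100
while [ "$lo" -lt 1000 ]; do
    plan_chunk "$lo" $(( lo + 100 ))
    lo=$(( lo + 100 ))
done

# Final run with no position limits: the mapfile makes ddrescue skip what is
# already rescued, so this picks up the skipped area and the tail of the disk.
echo "ddrescue -n $src $img $map"
```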
Once you have an image file containing whatever sectors you could
recover, make the file read-only and back it up. Better yet, make two
backups and put one off-site.
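One way to do that, as a sketch. The function name and the backup directories are placeholders, and `cmp` is used as a simple byte-for-byte check of each copy (slow on a 1 TB image, but thorough):

```shell
# Hypothetical helper: write-protect the master image, then copy it to one or
# more backup directories and verify each copy against the master.
protect_and_backup() {   # usage: protect_and_backup IMAGE DEST_DIR...
    image=$1
    shift
    chmod a-w "$image" || return 1      # guard the master against accidental writes
    for dest in "$@"; do
        cp "$image" "$dest/" || return 1
        cmp "$image" "$dest/$(basename "$image")" || return 1
    done
}

# Example (directory names are placeholders):
#   protect_and_backup image_HDD1.img /mnt/backup1 /mnt/backup2
```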
To do the filesystem repair/recovery work, make a copy of the image and
work on the copy. If you make a mistake, you can throw away the copy
and start over.
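A small sketch of that throw-away-and-retry cycle. The helper name and file names are placeholders. The e2fsck lines in the comments assume the ext4 filesystem starts at the beginning of the image; if the disk had a partition table, the partition would first have to be exposed through a loop device:

```shell
# Hypothetical helper: replace any old working copy with a fresh, writable one.
fresh_working_copy() {   # usage: fresh_working_copy MASTER_IMAGE WORK_IMAGE
    rm -f "$2"
    cp "$1" "$2" || return 1
    chmod u+w "$2"       # the master stays read-only; the working copy must be writable
}

# Example (file names are placeholders):
#   fresh_working_copy image_HDD1.img work.img
#   e2fsck -f -n work.img    # report problems only, change nothing
#   e2fsck -f -y work.img    # then attempt the repair on the copy
```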
I find it very useful to install Debian onto a good quality USB 3.0
flash drive, to use for system administration, maintenance,
troubleshooting, etc. I prefer this approach over "live"
distributions because I have a full Debian system and can install
anything I want or need.
I find it very useful to have a spare computer for maintenance and
troubleshooting tasks.
I find it very useful to use a version control system for system
configuration files, system administration notes, etc.
I back up, archive, and image compulsively. I keep a supply of spare
parts on hand. Do not be afraid to spend money on new and improved parts --
the last time I lost data was when I tried to "get by" with old and
inadequate parts.
David
https://toshiba.semicon-storage.com/us/storage/product/internal-specialty/pc/articles/dt01aca-series.html
https://linux.die.net/man/1/ddrescue