On 21/10/2021 00:45, Thomas Anderson wrote:
Here are the results of my smartctl test:

Five metrics from smartctl require attention:

    Device Model                ST8000DM004-2CX188
9   Power_On_Hours              14558

183 Runtime_Bad_Block           5
187 Reported_Uncorrect          1334
195 Hardware_ECC_Recovered      7226480
197 Current_Pending_Sector      8
198 Offline_Uncorrectable       8

I have four similar Seagate drives, and their values look like this:

    Device Model                ST2000DM006-2DM164
9   Power_On_Hours              20124,20125,23527,23511

183 Runtime_Bad_Block           2,3,1,1 (yours: 5)
187 Reported_Uncorrect          0,0,0,0 (yours: 1334)
195 Hardware_ECC_Recovered      not reported by my drives (yours: 7226480)
197 Current_Pending_Sector      0,0,0,0 (yours: 8)
198 Offline_Uncorrectable       0,0,0,0 (yours: 8)

Clearly you have a problem with surface errors: 1334 reported
uncorrectable errors, 8 current pending sectors and so on. However, the
last full surface read test (the SMART "long" test), which you ran at
14551 hrs (7 hours before fetching this SMART data), has passed, meaning
the drive is still able to read all sectors. That means it still has
spares and/or it doesn't have any more unknown bad sectors.

More good news: the SMART error log recorded the last error at 10525
power-on hours. You are now at 14558 hrs. That's 4033 hrs, or 168 days
ago if you run it 24/7. That means the drive is not discovering new bad
sectors, either because you don't read or write to the surface spots
where undiscovered bad sectors are, or simply because the remainder of
the drive is in good condition.
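
The arithmetic behind those figures, as a small Python sketch. The
numbers come from the SMART data quoted in this thread; the function
names are only illustrative, and the day count assumes the drive is
powered 24/7:

```python
# Figures taken from the SMART data quoted in this thread.
POWER_ON_HOURS = 14558        # attribute 9 when the report was fetched
LAST_ERROR_HOURS = 10525      # power-on hours of the last logged error
LAST_LONG_TEST_HOURS = 14551  # power-on hours of the last "long" test


def hours_since(event_hours, now_hours):
    """Power-on hours elapsed between an event and now."""
    return now_hours - event_hours


def as_days(hours):
    """Whole days, assuming the drive runs 24 hours a day."""
    return hours // 24


quiet_hours = hours_since(LAST_ERROR_HOURS, POWER_ON_HOURS)
print(quiet_hours, as_days(quiet_hours))                  # 4033 168
print(hours_since(LAST_LONG_TEST_HOURS, POWER_ON_HOURS))  # 7
```

If the drive is powered only part of the day, the same 4033 hours cover
a longer calendar period.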

If your drive is still under warranty, you should return it; most likely
it will be accepted based solely on the smartctl results. If not, you can
do the following:

Change the data cable first, as Gene also suggested.

Watch these five metrics very closely:
183 Runtime_Bad_Block           5
187 Reported_Uncorrect          1334
195 Hardware_ECC_Recovered      7226480
197 Current_Pending_Sector      8
198 Offline_Uncorrectable       8

If any of these rise rapidly over time (except maybe ECC errors, which
are usually kinda normal nowadays), you may consider retiring it.
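
One way to watch them is to compare each new reading against today's raw
values. Below is a rough Python sketch, not a finished tool. It assumes
the usual 10-column attribute table printed by "smartctl -A", with
RAW_VALUE in the last column. The baseline is the raw values from this
thread; the SAMPLE text is a made-up later reading (its FLAG, VALUE,
WORST and THRESH columns are placeholders), used only to show the
comparison:

```python
WATCHED = {183, 187, 195, 197, 198}

# Raw values from the report in this thread, used as the baseline.
BASELINE = {183: 5, 187: 1334, 195: 7226480, 197: 8, 198: 8}

# Hypothetical later reading, NOT real data from the drive.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
183 Runtime_Bad_Block       0x0032   095   095   000    Old_age   Always       -       5
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       10
"""


def parse_raw_values(smartctl_text):
    """Extract {attribute_id: raw_value} for the watched attributes.

    Lines that do not start with a numeric ID, or whose tenth column is
    not a plain integer, are skipped.
    """
    values = {}
    for line in smartctl_text.splitlines():
        fields = line.split()
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attr_id = int(fields[0])
        if attr_id in WATCHED and fields[9].isdigit():
            values[attr_id] = int(fields[9])
    return values


def growth(baseline, current):
    """Return {attribute_id: increase} for attributes that went up."""
    return {
        attr_id: current[attr_id] - old
        for attr_id, old in baseline.items()
        if attr_id in current and current[attr_id] > old
    }


print(growth(BASELINE, parse_raw_values(SAMPLE)))  # {197: 2}
```

In real use you would feed parse_raw_values() the saved output of
"smartctl -A /dev/sdX" (run as root) instead of SAMPLE, and keep the
results with a date so you can see how fast the counters move.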

Also run a SMART "long" test every month or so. Check the SMART results
after each test.
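
For reference, the two smartctl invocations involved are "-t long" to
start the test and "-l selftest" to read the self-test log afterwards.
Here is a minimal Python sketch that only builds and prints those
commands. The device path /dev/sdX is a placeholder, the helper names
are mine, and actually running the commands needs root:

```python
import subprocess


def start_long_test_cmd(device):
    """Command that starts a SMART extended ("long") self-test."""
    return ["smartctl", "-t", "long", device]


def selftest_log_cmd(device):
    """Command that prints the drive's self-test log."""
    return ["smartctl", "-l", "selftest", device]


def run(cmd):
    """Run a command and return its standard output as text."""
    return subprocess.run(cmd, capture_output=True, text=True).stdout


device = "/dev/sdX"  # placeholder, replace with the real device
print(" ".join(start_long_test_cmd(device)))  # smartctl -t long /dev/sdX
print(" ".join(selftest_log_cmd(device)))     # smartctl -l selftest /dev/sdX
```

The long test runs in the background inside the drive and can take many
hours on an 8 TB disk, so read the log only after it has finished. A
monthly cron job or systemd timer calling run() on the first command
would cover the schedule.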

Lastly, you should use it in RAID1 or a similar mode, to ensure there is
always a backup of the data this drive holds. You can try Btrfs raid1 or
mdadm raid1. Whatever you choose, try not to use this drive without any
backup, as you can't yet be sure that the damage is not progressing.

--
With kindest regards, Piotr.

⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢠⠒⠀⣿⡁ Debian - The universal operating system
⢿⡄⠘⠷⠚⠋⠀ https://www.debian.org
⠈⠳⣄⠀⠀⠀⠀
