Re: [CentOS] HDD badblocks

Lamar Owen Wed, 20 Jan 2016 06:17:35 -0800

On 01/19/2016 06:46 PM, Chris Murphy wrote:

> Hence, bad sectors accumulate. And the consequence of this often
> doesn't get figured out until a user looks at kernel messages and sees
> a bunch of hard link resets....

The standard Unix way of refreshing the disk contents is with badblocks' non-destructive read-write test (badblocks -n, or the -cc option to e2fsck for ext2/3/4 filesystems). The remap will happen on the writeback of the contents. It's been this way with enterprise SCSI drives for as long as I can remember there being enterprise-class SCSI drives. ATA drives caught up with the SCSI ones back in the early 90s with this feature. But it's always been true, to the best of my recollection, that the remap happens on a write. The rationale is pretty simple: only on a write error does the drive know that it has the valid data in its buffer, and so that's the only safe time to put the data elsewhere.
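The read, write, verify, restore cycle can be illustrated with a toy simulation. This is a minimal Python sketch loosely modeled on what badblocks -n does, not its actual implementation: ToyDrive and its weak/unreadable/remapped sets are hypothetical, the single 0xAA test pattern is an assumption, and the remap-on-write rule is the behavior described above. A sector whose read fails outright is only reported, because there is no valid data left to write back, so in this model it stays unremapped until something else writes to it.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDrive:
    """Hypothetical drive model, not real firmware behaviour.

    weak:       sectors that still read back (after retries) but are marginal
    unreadable: sectors whose reads fail outright
    remapped:   sectors the drive has moved to its spare area
    """
    sectors: list
    weak: set = field(default_factory=set)
    unreadable: set = field(default_factory=set)
    remapped: set = field(default_factory=set)

    def read(self, lba: int) -> bytes:
        if lba in self.unreadable:
            raise IOError(f"read error at LBA {lba}")
        return self.sectors[lba]

    def write(self, lba: int, data: bytes) -> None:
        # Only during a write does the drive hold known-good data in its
        # buffer, so this is where the toy model relocates a bad sector.
        if lba in self.weak or lba in self.unreadable:
            self.weak.discard(lba)
            self.unreadable.discard(lba)
            self.remapped.add(lba)
        self.sectors[lba] = data


def nondestructive_pass(drive: ToyDrive, pattern: int = 0xAA) -> list:
    """Read, write a test pattern, verify, then restore each sector."""
    bad = []
    for lba in range(len(drive.sectors)):
        try:
            original = drive.read(lba)       # keep the current contents
        except IOError:
            bad.append(lba)                  # unreadable: report it, nothing to restore
            continue
        test = bytes([pattern]) * len(original)
        drive.write(lba, test)               # the write is what triggers the remap
        if drive.read(lba) != test:
            bad.append(lba)
        drive.write(lba, original)           # put the user's data back
    return bad


if __name__ == "__main__":
    drive = ToyDrive([bytes([i]) * 512 for i in range(8)], weak={2, 5}, unreadable={6})
    print(nondestructive_pass(drive))  # [6]
    print(sorted(drive.remapped))      # [2, 5]
```

The weak sectors get remapped as a side effect of the writeback, while the user's data ends up unchanged; the unreadable sector is merely listed.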

> This problem affects all software raid, including btrfs raid1. The
> ideal scenario is you'll use 'smartctl -l scterc,70,70 /dev/sdX' in a
> startup script, so the drive fails reads on marginally bad sectors
> with an error in 7 seconds maximum.

This is partly why enterprise arrays manage their own per-sector ECC and use 528-byte sector sizes. The drives for these arrays make very poor standalone workstation drives, since the drive is no longer doing all the error recovery itself but relies on the storage processor to do the work. The drive still does some basic ECC on the sector, but the storage processor gets a much better idea of the health of each sector than it would if the drive's firmware were managing the remap. Sophisticated enterprise arrays, like NetApp's, EMC's, and Nimble's, can make some very accurate predictions and do proactive hotsparing when needed. That's part of what you pay for when you buy that sort of array.
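Real arrays' on-disk formats are vendor-specific and not described here. As a purely hypothetical illustration of why the extra 16 bytes of a 528-byte sector help the storage processor, this Python sketch stores a CRC32 of the 512-byte payload plus the sector's LBA in the trailer, checks both on read, and counts failures per sector. CRC32 is a swapped-in stand-in that only detects errors; unlike the ECC mentioned above, it cannot correct them.

```python
import struct
import zlib
from collections import Counter

DATA_BYTES = 512
SECTOR_BYTES = 528
# Hypothetical 16-byte trailer: CRC32 (4) + LBA (8) + padding (4).
TRAILER = struct.Struct(">IQ4x")
assert TRAILER.size == SECTOR_BYTES - DATA_BYTES


def format_sector(lba: int, payload: bytes) -> bytes:
    """Append the array-managed check data to a 512-byte payload."""
    if len(payload) != DATA_BYTES:
        raise ValueError("payload must be 512 bytes")
    return payload + TRAILER.pack(zlib.crc32(payload), lba)


def verify_sector(lba: int, raw: bytes) -> bool:
    """True if the payload matches its CRC and was read from the expected LBA."""
    payload, trailer = raw[:DATA_BYTES], raw[DATA_BYTES:]
    crc, stored_lba = TRAILER.unpack(trailer)
    return crc == zlib.crc32(payload) and stored_lba == lba


def scrub(sectors: dict, errors: Counter) -> None:
    """Check every sector and count failures per LBA, so the controller
    sees which sectors are degrading instead of relying on drive firmware."""
    for lba, raw in sectors.items():
        if not verify_sector(lba, raw):
            errors[lba] += 1
```

Storing the LBA alongside the checksum also lets the controller catch data that was written to, or read from, the wrong place, which a checksum of the payload alone would miss.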

But the other fact of life with modern consumer-level hard drives is that *errored sectors are expected*, not exceptions. Why else would a drive have an error-recovery time (TLER) in the two-minute range, as many of the WD Green drives do? And with a consumer-level drive, I would be shocked if badblocks reported the same number each time it ran through.
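The quoted smartctl advice and the two-minute recovery time come down to which timer expires first. A minimal Python sketch follows, assuming the Linux default SCSI command timer of 30 seconds (the value in /sys/block/<dev>/device/timeout) and smartctl's convention that scterc times are given in tenths of a second, so 70 means 7.0 seconds. If the drive keeps retrying past the kernel's timer, the kernel typically gives up first and starts resetting the link, which fits the link resets mentioned above; with a 7-second limit the drive reports the error itself, so software RAID can rewrite the sector from a good copy. The device names are hypothetical, and because many drives forget the SCT ERC setting at power-on, the command would need to run at every boot.

```python
import subprocess

# Assumption: Linux's default SCSI command timer (/sys/block/<dev>/device/timeout).
KERNEL_CMD_TIMEOUT_S = 30


def who_gives_up_first(drive_recovery_s: float,
                       kernel_timeout_s: float = KERNEL_CMD_TIMEOUT_S) -> str:
    """'drive' if the drive reports the read error itself, 'kernel' if the
    kernel's timer expires first and it starts resetting the link."""
    return "drive" if drive_recovery_s < kernel_timeout_s else "kernel"


def scterc_args(device: str, read_ds: int = 70, write_ds: int = 70) -> list:
    """Build a smartctl command; scterc times are in tenths of a second."""
    if who_gives_up_first(max(read_ds, write_ds) / 10) != "drive":
        raise ValueError("ERC limit must expire before the kernel command timer")
    return ["smartctl", "-l", f"scterc,{read_ds},{write_ds}", device]


def apply_scterc(devices, dry_run: bool = True) -> None:
    """Print (or, with dry_run=False, run) the command for each device."""
    for dev in devices:
        args = scterc_args(dev)
        if dry_run:
            print(" ".join(args))
        else:
            subprocess.run(args, check=True)


if __name__ == "__main__":
    print(who_gives_up_first(7.0))    # drive
    print(who_gives_up_first(120.0))  # kernel (a two-minute consumer drive)
    apply_scterc(["/dev/sda", "/dev/sdb"])  # hypothetical device names
```

The dry-run default only prints the commands, which makes it easy to review them before wiring the script into whatever runs at boot.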


_______________________________________________
CentOS mailing list
CentOS@centos.org
https://lists.centos.org/mailman/listinfo/centos
