Re: [CentOS] HDD badblocks

Warren Young Tue, 19 Jan 2016 14:25:32 -0800

On Jan 17, 2016, at 9:59 AM, Alessandro Baggi <alessandro.ba...@gmail.com> 
wrote:
> 
> On sdb there are not problem but with sda:
> 
> 1) First run badblocks reports 28 badblocks on disk
> 2) Second run badblocks reports 32 badblocks
> 3) Third reports 102 badblocks
> 4) Last run reports 92 badblocks.


It’s dying.  Replace it now.

On a modern hard disk, you should *never* see bad sectors, because the drive is 
busy hiding all the bad sectors it does find, then telling you everything is 
fine.

Once the drive has swept so many problems under the rug that it is forced to 
admit to normal user space programs (e.g. badblocks) that there are bad 
sectors, it’s because the spare sector pool is exhausted.  At that point, the 
only safe remediation is to replace the disk.

> Running smartctl after the last badblocks check I've noticed that 
> Current_Pending_Sector was 32 (not 92 as badblocks found).

SMART is allowed to lie to you.  That’s why there’s the RAW_VALUE column, yet 
there is no explanation in the manual as to what that value means.  The reason 
is that the low-level meanings of these values are defined by the drive 
manufacturers, not by smartctl.  “32” is not necessarily a sector count.  For 
all you know, it is reporting that there are currently 32 lemmings in midair 
off the fjords of Finland.
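
If you want to keep an eye on that column anyway, here is a minimal Python 
sketch that pulls one row out of the attribute table that `smartctl -A` 
prints.  It assumes the usual ten-column layout (ID#, ATTRIBUTE_NAME, FLAG, 
VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE); the names 
`SmartAttr` and `parse_attr_line` are made up for illustration.  Note that it 
keeps RAW_VALUE as text, precisely because its meaning is up to the vendor.

```python
from typing import NamedTuple, Optional


class SmartAttr(NamedTuple):
    attr_id: int
    name: str
    value: int   # normalized VALUE column, scaled by the vendor
    worst: int
    thresh: int
    raw: str     # RAW_VALUE, kept as text: its meaning is vendor-defined


def parse_attr_line(line: str) -> Optional[SmartAttr]:
    """Parse one row of the `smartctl -A` table; return None for other lines."""
    # Split into at most 10 fields so a RAW_VALUE containing spaces stays whole.
    fields = line.split(None, 9)
    if len(fields) < 10 or not fields[0].isdigit():
        return None  # header, blank line, or anything that is not a table row
    return SmartAttr(
        attr_id=int(fields[0]),
        name=fields[1],
        value=int(fields[3]),
        worst=int(fields[4]),
        thresh=int(fields[5]),
        raw=fields[9].strip(),
    )
```

Feed it each line of the command's output and keep the rows whose `name` you 
care about, such as Current_Pending_Sector.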

The only important results here are:

a) the numbers are nonzero
b) the numbers are changing

That is all.  A zero value just means it hasn’t failed *yet*, and a static 
nonzero value means the drive has temporarily arrested its failures-in-progress.
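
That rule of thumb fits in a few lines of Python.  This is only a sketch of 
the reasoning above, not a diagnostic tool; the function name 
`classify_counter` and its three labels are made up for illustration.

```python
def classify_counter(readings):
    """Classify successive readings of one counter, oldest first.

    Returns "zero" (has not failed yet), "static-nonzero" (failures
    temporarily arrested), or "changing" (failures in progress).
    """
    if not readings:
        raise ValueError("need at least one reading")
    if len(set(readings)) > 1:
        return "changing"
    if readings[0] != 0:
        return "static-nonzero"
    return "zero"


# The four badblocks runs reported at the top of this thread.
print(classify_counter([28, 32, 102, 92]))  # changing
```

Only the category matters here, not the size of the numbers.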

There is no such thing as a hard drive with zero actual bad sectors, just one 
that still has spares left in its spare sector pool.  A “working” drive is one 
that is swapping in sectors from the spare pool rarely enough that it is 
expected not to empty the pool before the warranty expires.

> Why each consecutive run of badblocks reports different results?

Because physics.  The highly competitive nature of the HDD business, plus the 
relentless drive of Moore’s Business Law (as it should be called, since it is 
not a physical law, just an arbitrary fiction that the tech industry has bought 
into as the ground rules for the game), pushes the manufacturers to design 
drives right up against the ragged edge of functionality.  A marginal sector 
can read cleanly on one pass and fail on the next, so the count moves from run 
to run.

HDD manufacturers could solve all of this by making drives with 1/4 the 
capacity at twice the cost and getting 10x the reliability.  And they do: 
they’re called SAS drives. :)

> Why smartctl does not update Reallocated_Event_Count?

Because SMART lies.

> What other test I can perform to verify disks problems?

Quit poking the tiger to see if it will bite you.  Replace the bad disk and 
resilver that mirror before you lose the other disk, too.
_______________________________________________
CentOS mailing list
CentOS@centos.org
https://lists.centos.org/mailman/listinfo/centos
