Re: How do hard drives handle bad blocks nowadays?

2011-04-03 Thread Chuck Anderson
On Sun, Apr 03, 2011 at 05:00:27PM -0400, MBR wrote:
 It's now two decades later, and I'm trying to understand what's changed 
 since then.  In particular I recently cloned a laptop drive (IDE) to a 
 new drive.  When I did so, I encountered 2 bad blocks on the new drive.  
 Based on my recollection from the late 1980s, I didn't think 2 bad 
 blocks was a big deal because I assumed I could manually enter their 
 addresses into the bad block list and they'd be replaced by spare 
 blocks.  But I haven't managed to find a tool to allow me to examine 
 and/or edit the bad block list.

Modern ATA (IDE) drives do this remapping automatically, and 
transparently to the host system--the LBA block number stays the same, 
but the underlying physical sector is moved by the drive firmware to a 
spare sector that was reserved for this purpose.  Apparently, this 
feature can be turned on and off with hdparm -D.
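As a concrete sketch (the device name /dev/sdX is a placeholder, and -D support varies by drive; some firmware ignores the setting):

```shell
# Hypothetical device name -- substitute your real drive.
DEV=/dev/sdX

if [ -b "$DEV" ]; then
    hdparm -D1 "$DEV"    # enable the drive's automatic defect management
    # hdparm -D0 "$DEV"  # would disable it
    status=enabled
else
    status=skipped       # no such block device on this machine
fi
echo "$status"
```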

SCSI drives can also do this, and may be configured with this turned 
off by default since they are expected to be used in RAID arrays and 
servers that would handle this disk management on a higher level.

 After doing some web searches and a bit of reading on this, I get the 
 impression that nowadays all modern drives implement S.M.A.R.T. 
 (Self-Monitoring, Analysis, and Reporting Technology) and that using 
 S.M.A.R.T. they all handle this behind the scenes.  If that's true, then 
 presumably the only time I should ever see a disk report a bad block is 
 when there are no more spare blocks left.  Am I right about that?

The remapping only happens on write, not read.  This is so that you 
can keep trying to read a bad block in the hopes that you might 
eventually recover the data with a good read or partial good read.  
Once you write to the sector, it then attempts the reallocation.  
After it is reallocated, there is no easy way to get at the old 
sector's data--it is effectively orphaned on the disk. (If that old 
sector happened to have sensitive data on it, there is now no way for 
you to erase it, hence the development of Anti-Forensic Splitting for 
use with encryption schemes such as LUKS to mitigate this 
issue.)

I've had drives that were stubborn about reallocating automatically 
with normal overwrites.  I had to poke the sectors manually with 
hdparm:

hdparm --read-sector sector-number /dev/sdX  # check if it's really bad
hdparm --write-sector sector-number /dev/sdX # repair (reallocate) bad sector
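One wrinkle: the kernel usually logs a failure as a byte offset, while hdparm wants an absolute LBA. A sketch of the conversion, assuming 512-byte logical sectors (the offset below is invented for illustration, as is the device name):

```shell
# Byte offset from a hypothetical dmesg I/O error (invented value).
offset=1465149168128
sector=$((offset / 512))    # absolute LBA, assuming 512-byte logical sectors
echo "LBA: $sector"

# On a real drive you would then run (the write destroys that sector's data):
#   hdparm --read-sector  "$sector" /dev/sdX   # confirm the read really fails
#   hdparm --write-sector "$sector" /dev/sdX   # zero it, forcing reallocation
```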

 If so, then the fact that I encountered write errors on two blocks on 
 the drive suggests that the brand new drive was in pretty bad shape to 
 begin with.

Check smartctl -a /dev/foo and look for pending and reallocated 
sectors.  I usually replace a disk once it starts getting any of 
those.  A new disk shouldn't have any IMO, and I'd RMA it if that were 
the case.  I do have some older drives that were given to me that have 
1 or 2 reallocated sectors that I might use for scratch storage as 
long as the pending or reallocated counts don't keep increasing.
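A small sketch for pulling those two counters out of the attribute table; the sample input mimics the usual smartctl -A column layout (the raw values here are invented):

```shell
# Two lines in the usual "smartctl -A" attribute-table layout (values invented).
sample='  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       1
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       2'

# The raw count is the last column; anything nonzero deserves attention.
counts=$(echo "$sample" | awk '$2 ~ /Reallocated_Sector_Ct|Current_Pending_Sector/ {print $2"="$NF}')
echo "$counts"
```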

 Is there some tool that will allow me to examine the disk's bad block list?

For ATA, I'm not aware of how to examine the defect list.  For SCSI, 
you can use sdparm or sg3_utils.  smartctl -a will at least tell you 
how many have been reallocated.

I usually do the following to test suspect drives:

smartctl -l selftest /dev/foo # look for existing test results
smartctl -t short /dev/foo    # do a quick test
smartctl -l selftest /dev/foo # look at the results
smartctl -t long /dev/foo     # do a long test (could take an hour or more)
smartctl -l selftest /dev/foo # look at the results

 Also, should I use 'dd' to test all blocks before I put a drive into 
 service, or is there a better tool out there?

Besides the above tests, I've often used dd for reading and writing 
the entire drive as an extra sanity test, and to force overwrites and 
possibly reallocate any bad sectors:

dd if=/dev/zero of=/dev/foo bs=32M
dd if=/dev/foo of=/dev/null bs=32M

In another window:

while true; do killall -USR1 dd; sleep 10; done

Watch the first window for once-per-10-second status updates from dd 
:-)
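The same write-then-read pattern can be sketched harmlessly against a scratch file instead of a whole device (on a real drive, remember that the zero-fill destroys all data):

```shell
img=$(mktemp)                                          # scratch file standing in for the disk
dd if=/dev/zero of="$img" bs=512 count=16 2>/dev/null  # "write pass" (on a disk: every sector)
dd if="$img" of=/dev/null bs=512 2>/dev/null           # "read pass" back through the same data
bytes=$(wc -c < "$img")
echo "wrote $bytes bytes"                              # 16 sectors * 512 bytes = 8192
rm -f "$img"
```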
___
Discuss mailing list
Discuss@blu.org
http://lists.blu.org/mailman/listinfo/discuss


Re: How do hard drives handle bad blocks nowadays?

2011-04-03 Thread Tom Metro
MBR wrote:
 In particular I recently cloned a laptop drive (IDE) to a 
 new drive.  When I did so, I encountered 2 bad blocks on the new drive.  

What did you use to perform the clone and how were the bad blocks reported?


 After doing some web searches and a bit of reading on this, I get the 
 impression that nowadays all modern drives...handle this behind the scenes.

Correct.


 If that's true, then presumably the only time I should ever see a
 disk report a bad block is when there are no more spare blocks left.
 Am I right about that?

That's my understanding.


 If so, then the fact that I encountered write errors on two blocks on 
 the drive suggests that the brand new drive was in pretty bad shape to 
 begin with.

Unless the error was actually a read error during a verification step.
It is still possible to encounter unrecoverable read errors, since bad
blocks are only remapped on a write operation.
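Because remapping is write-triggered, one common recovery step is to overwrite just the failing LBA rather than the whole drive, e.g. with dd's seek=. The arithmetic, demonstrated on a scratch file so nothing real is harmed (the sector number is invented; the real-drive form is shown in a comment):

```shell
sector=7        # invented LBA for illustration
img=$(mktemp)   # stand-in for the disk; on real hardware this would be:
                #   dd if=/dev/zero of=/dev/sdX bs=512 count=1 seek=$sector
dd if=/dev/zero of="$img" bs=512 count=1 seek="$sector" conv=notrunc 2>/dev/null
size=$(wc -c < "$img")
echo "file now $size bytes"   # (sector + 1) * 512 = 4096
rm -f "$img"
```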


 Is there some tool that will allow me to examine the disk's bad block list?

The specific location of the bad blocks may be something that only a
drive manufacturer's proprietary tools can extract.


 Also, should I use 'dd' to test all blocks before I put a drive into 
 service, or is there a better tool out there?

See the hard drive burn-in thread:
http://thread.gmane.org/gmane.org.user-groups.linux.boston.discuss/30555/focus=30559

(BTW, we now have list archives at Gmane:
http://dir.gmane.org/gmane.org.user-groups.linux.boston.discuss
(thanks JABR) which work a bit better than Nabble, which has gone
downhill in recent years.)

 -Tom

-- 
Tom Metro
Venture Logic, Newton, MA, USA
Enterprise solutions through open source.
Professional Profile: http://tmetro.venturelogic.com/


Re: How do hard drives handle bad blocks nowadays?

2011-04-03 Thread Rajiv Aaron Manglani
 Also, should I use 'dd' to test all blocks before I put a drive into 
 service, or is there a better tool out there?
 Besides the above tests, I've often used dd for reading and writing 
 the entire drive as an extra sanity test, and to force overwrites and 
 possibly reallocate any bad sectors:
 dd if=/dev/zero of=/dev/foo bs=32M
 dd if=/dev/foo of=/dev/null bs=32M

I use DBAN for new drives in hosts I can take offline, or I put the drive 
in another box.




Re: How do hard drives handle bad blocks nowadays?

2011-04-03 Thread MBR
Thanks a lot for your very informative response.  I'll have to read 
through the man-pages for hdparm and smartctl.

Mark
