Re: Tool for HD analyzing

2007-10-01 Thread Tony Abernethy
Calomel wrote:
>
> If you really want to check the drive and verify it has errors then check
> out the binary called badblocks. I do not believe OpenBSD has badblocks but
> you can use the cd distro "system rescue cd" and run badblocks from there
> without removing the drive from the current machine.
>
> NON-destructive BadBlock test (1gig ram in machine)
>   badblocks -b 4096 -c 98304 -p 0 -s /dev/hda
>
> For a more detailed explanation http://calomel.org/badblocks_wipe.html
>
After spending (most of) the weekend recovering a failed drive (XP),
there are a few things about disks that are worth knowing.
(ALSO READ ANY AND ALL BY NICK HOLLAND! -- He knows whereof he speaks.)
With a rescue CD
(Don't use Windows without one)
(OpenBSD's purpose in life is NOT rescuing Windows computers)
ONE (only one) sector was "unreadable" (irrecoverable).
A good disk cloned from the bad disk was very unusable.
Running destructive badblocks showed that the disk had no errors.

Until the disk degrades further, it will test out good.
That disk is "reserved" with an "EMERGENCY USE ONLY" tag.

If things work the way I think they work,
the "non-destructive" read test will actually destroy ALL
ability to tell that the disk had (past tense) errors
(by remapping the bad sectors).
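
For what it's worth, a plain read pass of the kind discussed here can be
sketched with dd alone, which may help on a system where badblocks is not
available. This is only an illustration: the function name read_scan and the
device name in the usage comment are assumptions, and the 64 KB block size is
arbitrary. It reads the target once, throws the data away, and fails if any
read fails.

```sh
#!/bin/sh
# Read every block of TARGET once and discard the data.
# Returns dd's exit status: zero only if the whole target could be read.
# usage: read_scan TARGET
read_scan() {
    dd if="$1" of=/dev/null bs=65536 2>/dev/null
}

# Example (device name is an assumption; reading a raw disk needs root):
#   read_scan /dev/rwd0c && echo "no read errors reported"
```

Note that this only tells you whether the reads succeeded at the time of the
scan; it says nothing about sectors the drive may already have remapped.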

(This oughta be off-list, but where else can you get good info ;-)
There are people on this list who actually know what I'm trying to talk about.



Re: Tool for HD analyzing

2007-10-01 Thread Calomel
Hard drives these days are cheap. I agree that you should get the data off
if you can and buy another drive.

If you really want to check the drive and verify it has errors then check
out the binary called badblocks. I do not believe OpenBSD has badblocks but
you can use the cd distro "system rescue cd" and run badblocks from there
without removing the drive from the current machine.  

NON-destructive BadBlock test (1gig ram in machine)
  badblocks -b 4096 -c 98304 -p 0 -s /dev/hda

For a more detailed explanation http://calomel.org/badblocks_wipe.html
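
On the "1gig ram" remark: -b sets the block size in bytes and -c the number
of blocks tested at a time, so their product is the amount of data badblocks
handles per batch. A quick check of that arithmetic follows; the function
name buffer_mib is made up for this sketch, and any extra buffers the
read-write modes may need are not counted here.

```sh
#!/bin/sh
# MiB of data per batch for: badblocks -b BLOCK_SIZE -c BLOCKS_AT_ONCE
# usage: buffer_mib BLOCK_SIZE BLOCKS_AT_ONCE
buffer_mib() {
    echo $(( $1 * $2 / 1048576 ))
}

buffer_mib 4096 98304    # prints 384
```

So the command above works on about 384 MiB at a time, which fits in a
machine with 1 GB of RAM. The -p 0 option is the documented default (stop
after the first pass), and -s only shows progress.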

--
 Calomel @ http://calomel.org

On Fri, Sep 28, 2007 at 08:17:22AM -0300, Leonardo Marques wrote:
>Hey guys,
>
>I have an HD which is returning a lot of errors. Does anyone know a good
>tool to analyze this disk and tell me whether I have to replace it, or
>whether there is some way to repair it?
>
>HD: WDC WD1200JD-00HBB0
>
>Errors:
>
>dmesg |grep -i wd0
>wd0 at pciide0 channel 0 drive 0: 
>wd0: 16-sector PIO, LBA48, 114473MB, 234441648 sectors
>wd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 5
>dkcsum: wd0 matches BIOS drive 0x82
>wd0(pciide0:0:0): timeout
>wd0a: device timeout reading fsbn 96 of 96-127 (wd0 bn 159; cn 0 tn 2 sn 33), retrying
>wd0a: device timeout reading fsbn 96 of 96-127 (wd0 bn 159; cn 0 tn 2 sn 33), retrying
>wd0a: device timeout reading fsbn 96 of 96-127 (wd0 bn 159; cn 0 tn 2 sn 33), retrying
>wd0a: device timeout reading fsbn 96 of 96-127 (wd0 bn 159; cn 0 tn 2 sn 33), retrying
>wd0a: device timeout reading fsbn 96 of 96-127 (wd0 bn 159; cn 0 tn 2 sn 33), retrying
>wd0a: device timeout reading fsbn 96 of 96-127 (wd0 bn 159; cn 0 tn 2 sn 33)
>wd0c: device timeout reading fsbn 0 (wd0 bn 0; cn 0 tn 0 sn 0), retrying
>wd0c: device timeout reading fsbn 0 (wd0 bn 0; cn 0 tn 0 sn 0), retrying
>wd0c: device timeout reading fsbn 0 (wd0 bn 0; cn 0 tn 0 sn 0), retrying
>wd0c: device timeout reading fsbn 0 (wd0 bn 0; cn 0 tn 0 sn 0), retrying
>wd0c: device timeout reading fsbn 0 (wd0 bn 0; cn 0 tn 0 sn 0), retrying
>wd0c: device timeout reading fsbn 0 (wd0 bn 0; cn 0 tn 0 sn 0)
>
>Thanks for your attention.
>
>-- 
>---
>Leonardo Marques
>---
>Blog: BeNerd.analyx.org
>Website: www.analyx.org



Re: Tool for HD analyzing

2007-09-28 Thread Marco Peereboom
Uhm, sorry Claudio, but it is the other way around.  The only time you can
detect failed sectors is on reads.  When a read fails the disk goes into
a proprietary algorithm to try to recover as much of the sector as
possible.  Depending on the manufacturer they'll do something in excess
of 15000 reads of the same sector to try to recover the data.  They then
use heuristics to try to determine what the original data most likely
was.  If this process is successful enough the block gets reassigned.
Most disks don't do verifies on writes (too slow) and rely on the
recovery algorithm to reassign these blocks, which is triggered by
subsequent reads.  RAID manufacturers implement algorithms that
continuously read all sectors of idle disks to ensure data integrity.
When they run into a failure they first let the disk try to recover and
if that fails they use parity to recover the block.  They will also
"puncture" the sector so that the disk will skip it going forward.

Some of the dmesg lines pasted in this message are likely LBA
relocations that take too long.  There is no set timeout in the spec and
therefore vendors try really hard and long to recover the data resulting
in OS timeouts.  A good example is calculating how long it takes to read
an LBA 15000 times on a 1 RPM disk.  Add some fudge in there for the
head to find the exact spot and you'll see that it gets in excess of
seconds.  Repeat that a few times due to various retries in various
layers and you'll see where those lengthy timeouts come from.
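
As a rough worked version of that calculation, assume one read attempt per
platter revolution and take 7200 and 10000 RPM as example spindle speeds
(both are assumptions, not figures from this message). The function name
reread_seconds is made up for this sketch, and shell integer arithmetic
rounds down.

```sh
#!/bin/sh
# Seconds needed for READS re-reads of one sector on a disk spinning at RPM
# revolutions per minute, assuming one read attempt per revolution.
# usage: reread_seconds READS RPM
reread_seconds() {
    echo $(( $1 * 60 / $2 ))
}

reread_seconds 15000 7200     # prints 125
reread_seconds 15000 10000    # prints 90
```

That is already more than a minute per unreadable sector, before any seek
time or retries in other layers are added.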

On Fri, Sep 28, 2007 at 02:47:48PM +0200, Claudio Jeker wrote:
> On Fri, Sep 28, 2007 at 01:37:52PM +0200, Peter N. M. Hansteen wrote:
> > "Leonardo Marques" <[EMAIL PROTECTED]> writes:
> > 
> > > I have an HD which is returning a lot of errors. Does anyone know a good
> > > tool to analyze this disk and tell me whether I have to replace it, or
> > > whether there is some way to repair it?
> > 
> > To my mind that kind of error says "run, don't walk to the store for
> > replacement".  Modern disks remap bad parts away from active use; when
> > they've run out of remappable space, they start complaining like that.
> > 
> 
> Remapping is only possible when writing to blocks. The disk cannot remap
> on reads. By forcing writes to such blocks you can remap them, but the data
> on them is still lost. I have some partitions with almost read-only access
> that had bad blocks, and I could fix the problem with dd if=/dev/zero
> of=/dev/rsd0x (the data was not important, so I did not replace the disk).
> 
> -- 
> :wq Claudio



Re: Tool for HD analyzing

2007-09-28 Thread Dorian Büttner

Leonardo Marques wrote:
> Hey guys,
>
> I have an HD which is returning a lot of errors. Does anyone know a good
> tool to analyze this disk and tell me whether I have to replace it, or
> whether there is some way to repair it?
>
> HD: WDC WD1200JD-00HBB0
>
> Errors:
>
> dmesg |grep -i wd0
> wd0 at pciide0 channel 0 drive 0:
> wd0: 16-sector PIO, LBA48, 114473MB, 234441648 sectors
> wd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 5
> dkcsum: wd0 matches BIOS drive 0x82
> wd0(pciide0:0:0): timeout
> wd0a: device timeout reading fsbn 96 of 96-127 (wd0 bn 159; cn 0 tn 2 sn 33), retrying
> wd0a: device timeout reading fsbn 96 of 96-127 (wd0 bn 159; cn 0 tn 2 sn 33), retrying
> wd0a: device timeout reading fsbn 96 of 96-127 (wd0 bn 159; cn 0 tn 2 sn 33), retrying
> wd0a: device timeout reading fsbn 96 of 96-127 (wd0 bn 159; cn 0 tn 2 sn 33), retrying
> wd0a: device timeout reading fsbn 96 of 96-127 (wd0 bn 159; cn 0 tn 2 sn 33), retrying
> wd0a: device timeout reading fsbn 96 of 96-127 (wd0 bn 159; cn 0 tn 2 sn 33)
> wd0c: device timeout reading fsbn 0 (wd0 bn 0; cn 0 tn 0 sn 0), retrying
> wd0c: device timeout reading fsbn 0 (wd0 bn 0; cn 0 tn 0 sn 0), retrying
> wd0c: device timeout reading fsbn 0 (wd0 bn 0; cn 0 tn 0 sn 0), retrying
> wd0c: device timeout reading fsbn 0 (wd0 bn 0; cn 0 tn 0 sn 0), retrying
> wd0c: device timeout reading fsbn 0 (wd0 bn 0; cn 0 tn 0 sn 0), retrying
> wd0c: device timeout reading fsbn 0 (wd0 bn 0; cn 0 tn 0 sn 0)
>
> Thanks for your attention.

  


Isn't it up to the manufacturer to provide a diagnostic tool, which
also delivers an error code used for the RMA process?


Thanks,
Dorian



Re: Tool for HD analyzing

2007-09-28 Thread Claudio Jeker
On Fri, Sep 28, 2007 at 01:37:52PM +0200, Peter N. M. Hansteen wrote:
> "Leonardo Marques" <[EMAIL PROTECTED]> writes:
> 
> > I have an HD which is returning a lot of errors. Does anyone know a good
> > tool to analyze this disk and tell me whether I have to replace it, or
> > whether there is some way to repair it?
> 
> To my mind that kind of error says "run, don't walk to the store for
> replacement".  Modern disks remap bad parts away from active use; when
> they've run out of remappable space, they start complaining like that.
> 

Remapping is only possible when writing to blocks. The disk cannot remap
on reads. By forcing writes to such blocks you can remap them, but the data
on them is still lost. I have some partitions with almost read-only access
that had bad blocks, and I could fix the problem with dd if=/dev/zero
of=/dev/rsd0x (the data was not important, so I did not replace the disk).
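
A more targeted variant of that overwrite can be sketched as follows. This is
an illustration only, run against a scratch image file rather than a real
device: the function name zero_sector, the 512-byte sector size and the
sector number 159 (borrowed from the "wd0 bn 159" in the dmesg output of the
original post) are assumptions. Whatever was stored in the overwritten
sector is lost.

```sh
#!/bin/sh
# Overwrite one 512-byte sector of TARGET with zeros, leaving the rest intact.
# usage: zero_sector TARGET SECTOR_NUMBER
zero_sector() {
    # seek= skips SECTOR_NUMBER output blocks before writing;
    # conv=notrunc keeps the data after the written block when TARGET
    # is a regular file.
    dd if=/dev/zero of="$1" bs=512 seek="$2" count=1 conv=notrunc 2>/dev/null
}

# Demonstration on a scratch image, NOT on a real disk.
img=$(mktemp)
trap 'rm -f "$img"' EXIT

# Fill 200 sectors with 0xff bytes so the zeroed sector stands out.
dd if=/dev/zero bs=512 count=200 2>/dev/null | LC_ALL=C tr '\000' '\377' > "$img"

zero_sector "$img" 159

# Count the non-zero bytes left in sector 159; none should remain.
dd if="$img" bs=512 skip=159 count=1 2>/dev/null | LC_ALL=C tr -d '\000' | wc -c
```

Pointing the same function at a raw device instead of an image would write
to the disk itself, so double-check the target and sector number first.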

-- 
:wq Claudio



Re: Tool for HD analyzing

2007-09-28 Thread Leonardo Marques
Thanks a lot for everything, guys!

I'm running to the store now to buy another HD for the BACKUP server :P

On 9/28/07, Peter N. M. Hansteen <[EMAIL PROTECTED]> wrote:
> "Leonardo Marques" <[EMAIL PROTECTED]> writes:
>
> > I have an HD which is returning a lot of errors. Does anyone know a good
> > tool to analyze this disk and tell me whether I have to replace it, or
> > whether there is some way to repair it?
>
> To my mind that kind of error says "run, don't walk to the store for
> replacement".  Modern disks remap bad parts away from active use; when
> they've run out of remappable space, they start complaining like that.
>
> For measuring the health of your next hard drive, it's possible
> sysutils/smartmontools will do the job.
>
> --
> Peter N. M. Hansteen, member of the first RFC 1149 implementation team
> http://bsdly.blogspot.com/ http://www.datadok.no/ http://www.nuug.no/
> "Remember to set the evil bit on all malicious network traffic"
> delilah spamd[29949]: 85.152.224.147: disconnected after 42673 seconds.
>
>


-- 
---
Leonardo Marques
---
Blog: BeNerd.analyx.org
Website: www.analyx.org



Re: Tool for HD analyzing

2007-09-28 Thread Michał Koc
Peter N. M. Hansteen wrote:
> "Leonardo Marques" <[EMAIL PROTECTED]> writes:
>
>   
>> I have an HD which is returning a lot of errors. Does anyone know a good
>> tool to analyze this disk and tell me whether I have to replace it, or
>> whether there is some way to repair it?
>>
>
> To my mind that kind of error says "run, don't walk to the store for
> replacement".  Modern disks remap bad parts away from active use; when
> they've run out of remappable space, they start complaining like that.
>
> For measuring the health of your next hard drive, it's possible
> sysutils/smartmontools will do the job.
>
>   

What if you have an ata/serialata HD attached to scsi/ahci/jmb?

atactl is useless; will smartmontools work?

regards
M.K.



Re: Tool for HD analyzing

2007-09-28 Thread Stuart Henderson
On 2007/09/28 08:17, Leonardo Marques wrote:
> I have an HD which is returning a lot of errors. Does anyone know a good
> tool to analyze this disk and tell me whether I have to replace it, or
> whether there is some way to repair it?

It's worth trying a different cable; if that doesn't help,
replace the disk...



Re: Tool for HD analyzing

2007-09-28 Thread Peter N. M. Hansteen
"Leonardo Marques" <[EMAIL PROTECTED]> writes:

> I have an HD which is returning a lot of errors. Does anyone know a good
> tool to analyze this disk and tell me whether I have to replace it, or
> whether there is some way to repair it?

To my mind that kind of error says "run, don't walk to the store for
replacement".  Modern disks remap bad parts away from active use; when
they've run out of remappable space, they start complaining like that.

For measuring the health of your next hard drive, it's possible
sysutils/smartmontools will do the job.
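
A minimal sketch of what that could look like with smartctl, the command-line
tool from smartmontools. The device name /dev/wd0c is an assumption for an
OpenBSD wd(4) disk, and smart_raw is a hypothetical helper, not part of
smartmontools; it only picks the RAW_VALUE column out of the attribute table
that smartctl -A prints for ATA disks.

```sh
#!/bin/sh
# Print the RAW_VALUE column (10th field) of one attribute from
# "smartctl -A" output read on standard input.
# usage: smartctl -A DEVICE | smart_raw ATTRIBUTE_NAME
smart_raw() {
    awk -v name="$1" '$2 == name { print $10 }'
}

# Typical use (needs root; the device name is an assumption):
#   smartctl -H /dev/wd0c                                   # overall health verdict
#   smartctl -A /dev/wd0c | smart_raw Reallocated_Sector_Ct
#   smartctl -A /dev/wd0c | smart_raw Current_Pending_Sector
```

A reallocated or pending sector count that keeps growing between checks is
the usual reason to replace a disk before it starts timing out.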

-- 
Peter N. M. Hansteen, member of the first RFC 1149 implementation team
http://bsdly.blogspot.com/ http://www.datadok.no/ http://www.nuug.no/
"Remember to set the evil bit on all malicious network traffic"
delilah spamd[29949]: 85.152.224.147: disconnected after 42673 seconds.



Re: Tool for HD analyzing

2007-09-28 Thread knitti
Hi,

On 9/28/07, Leonardo Marques <[EMAIL PROTECTED]> wrote:
> Hey guys,
>
> I have an HD which is returning a lot of errors. Does anyone know a good
> tool to analyze this disk and tell me whether I have to replace it, or
> whether there is some way to repair it?

I don't know which tools exist for OpenBSD, but if you're on x86/AMD64
and are OK with a DOS bootdisk, search for MHDD. This is a really nice
tool.

Or just burn yourself an "ultimate boot cd" (ultimatebootcd.com), which also
includes MHDD and a ton of other diagnostic and repair tools.

greetings,
knitti



Tool for HD analyzing

2007-09-28 Thread Leonardo Marques
Hey guys,

I have an HD which is returning a lot of errors. Does anyone know a good
tool to analyze this disk and tell me whether I have to replace it, or
whether there is some way to repair it?

HD: WDC WD1200JD-00HBB0

Errors:

dmesg |grep -i wd0
wd0 at pciide0 channel 0 drive 0: 
wd0: 16-sector PIO, LBA48, 114473MB, 234441648 sectors
wd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 5
dkcsum: wd0 matches BIOS drive 0x82
wd0(pciide0:0:0): timeout
wd0a: device timeout reading fsbn 96 of 96-127 (wd0 bn 159; cn 0 tn 2 sn 33), retrying
wd0a: device timeout reading fsbn 96 of 96-127 (wd0 bn 159; cn 0 tn 2 sn 33), retrying
wd0a: device timeout reading fsbn 96 of 96-127 (wd0 bn 159; cn 0 tn 2 sn 33), retrying
wd0a: device timeout reading fsbn 96 of 96-127 (wd0 bn 159; cn 0 tn 2 sn 33), retrying
wd0a: device timeout reading fsbn 96 of 96-127 (wd0 bn 159; cn 0 tn 2 sn 33), retrying
wd0a: device timeout reading fsbn 96 of 96-127 (wd0 bn 159; cn 0 tn 2 sn 33)
wd0c: device timeout reading fsbn 0 (wd0 bn 0; cn 0 tn 0 sn 0), retrying
wd0c: device timeout reading fsbn 0 (wd0 bn 0; cn 0 tn 0 sn 0), retrying
wd0c: device timeout reading fsbn 0 (wd0 bn 0; cn 0 tn 0 sn 0), retrying
wd0c: device timeout reading fsbn 0 (wd0 bn 0; cn 0 tn 0 sn 0), retrying
wd0c: device timeout reading fsbn 0 (wd0 bn 0; cn 0 tn 0 sn 0), retrying
wd0c: device timeout reading fsbn 0 (wd0 bn 0; cn 0 tn 0 sn 0)
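
The repeated timeouts above can be tallied per partition and block with a few
lines of awk. This is just a sketch for reading such logs; the function name
summarize_timeouts is made up, and it assumes unquoted dmesg lines in the
format shown here.

```sh
#!/bin/sh
# Count "device timeout reading fsbn N" messages per partition and block.
# Reads dmesg-style text on standard input and prints lines of the form:
#   COUNT PARTITION fsbn BLOCK
summarize_timeouts() {
    awk '
    /device timeout reading fsbn/ {
        for (i = 5; i <= NF; i++) {
            if ($i == "fsbn") {
                dev = $(i - 4)          # e.g. "wd0a:"
                sub(/:$/, "", dev)      # drop the trailing colon
                count[dev " fsbn " $(i + 1)]++
                break
            }
        }
    }
    END { for (k in count) print count[k], k }
    ' | sort -k2
}

# Typical use:
#   dmesg | summarize_timeouts
```

For the output above this gives 6 timeouts for wd0a fsbn 96 and 6 for
wd0c fsbn 0, that is, the same two blocks failing over and over.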

Thanks for your attention.

-- 
---
Leonardo Marques
---
Blog: BeNerd.analyx.org
Website: www.analyx.org