Re: Tool for HD analyzing
Calomel wrote:
> If you really want to check the drive and verify it has errors then check
> out the binary called badblocks. I do not believe OpenBSD has badblocks but
> you can use the cd distro "system rescue cd" and run badblocks from there
> without removing the drive from the current machine.
>
> NON-destructive BadBlock test (1gig ram in machine)
> badblocks -b 4096 -c 98304 -p 0 -s /dev/hda
>
> For a more detailed explanation http://calomel.org/badblocks_wipe.html

After spending (most of) the weekend recovering a failed drive (XP), there are a few things about disks that are worth knowing. (ALSO READ ANYTHING AND EVERYTHING BY NICK HOLLAND! He knows whereof he speaks.)

With a rescue CD (don't use Windows without one; OpenBSD's purpose in life is NOT rescuing Windows computers), ONE (only one) sector was "unreadable" (irrecoverable). The clone made from the bad disk onto a good disk was unusable.

Running destructive badblocks showed that the disk had no errors. Until the disk degrades further, it will test out good. That disk is "reserved" with an "EMERGENCY USE ONLY" tag.

If things work the way I think they work, the "non-destructive" read test will actually destroy ALL ability to tell that the disk had (past tense) errors, because the bad sectors get remapped.

(This oughta be off-list, but where else can you get good info? ;-) There are people on this list who actually know what I'm trying to talk about.)
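A note on the "(1gig ram in machine)" remark next to the badblocks command: a rough Python sketch of the arithmetic, assuming badblocks holds `-c` blocks of `-b` bytes each in a buffer at a time. The function name is made up for illustration, and the real memory use depends on the test mode, so treat the figure as an estimate only.

```python
def badblocks_buffer_bytes(block_size: int, blocks_at_once: int) -> int:
    """Approximate size of one I/O buffer for `badblocks -b block_size -c blocks_at_once`."""
    return block_size * blocks_at_once


if __name__ == "__main__":
    # Values taken from the command quoted in the thread: -b 4096 -c 98304
    size = badblocks_buffer_bytes(4096, 98304)
    print(f"{size} bytes = {size / 2**20:.0f} MiB per buffer")
```

On a machine with less memory, a smaller `-c` value would be the obvious knob to turn.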
Re: Tool for HD analyzing
Hard drives these days are cheap. I agree that you should get the data off if you can and buy another drive.

If you really want to check the drive and verify it has errors, then check out the binary called badblocks. I do not believe OpenBSD has badblocks, but you can use the CD distro "system rescue cd" and run badblocks from there without removing the drive from the current machine.

NON-destructive BadBlock test (1gig ram in machine)
badblocks -b 4096 -c 98304 -p 0 -s /dev/hda

For a more detailed explanation http://calomel.org/badblocks_wipe.html

--
Calomel @ http://calomel.org

On Fri, Sep 28, 2007 at 08:17:22AM -0300, Leonardo Marques wrote:
>Hey guys,
>
>I've a HD which are returning a lot of errors. Someone know some good
>tool to analyze this disk and tell me if i've to replace it or if
>exist some way to repair it?
>
>HD: WDC WD1200JD-00HBB0
>
>Errors:
>
>dmesg |grep -i wd0
>wd0 at pciide0 channel 0 drive 0:
>wd0: 16-sector PIO, LBA48, 114473MB, 234441648 sectors
>wd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 5
>dkcsum: wd0 matches BIOS drive 0x82
>wd0(pciide0:0:0): timeout
>wd0a: device timeout reading fsbn 96 of 96-127 (wd0 bn 159; cn 0 tn 2 sn 33), retrying
>wd0a: device timeout reading fsbn 96 of 96-127 (wd0 bn 159; cn 0 tn 2 sn 33), retrying
>wd0a: device timeout reading fsbn 96 of 96-127 (wd0 bn 159; cn 0 tn 2 sn 33), retrying
>wd0a: device timeout reading fsbn 96 of 96-127 (wd0 bn 159; cn 0 tn 2 sn 33), retrying
>wd0a: device timeout reading fsbn 96 of 96-127 (wd0 bn 159; cn 0 tn 2 sn 33), retrying
>wd0a: device timeout reading fsbn 96 of 96-127 (wd0 bn 159; cn 0 tn 2 sn 33)
>wd0c: device timeout reading fsbn 0 (wd0 bn 0; cn 0 tn 0 sn 0), retrying
>wd0c: device timeout reading fsbn 0 (wd0 bn 0; cn 0 tn 0 sn 0), retrying
>wd0c: device timeout reading fsbn 0 (wd0 bn 0; cn 0 tn 0 sn 0), retrying
>wd0c: device timeout reading fsbn 0 (wd0 bn 0; cn 0 tn 0 sn 0), retrying
>wd0c: device timeout reading fsbn 0 (wd0 bn 0; cn 0 tn 0 sn 0), retrying
>wd0c: device timeout reading fsbn 0 (wd0 bn 0; cn 0 tn 0 sn 0)
>
>Thanks for all attention.
>
>--
>---
>Leonardo Marques
>---
>Blog: BeNerd.analyx.org
>Website: www.analyx.org
Re: Tool for HD analyzing
Uhm, sorry Claudio, but it is the other way around. The only time you can detect failed sectors is on reads. When a read fails, the disk goes into a proprietary algorithm to try to recover as much of the sector as possible. Depending on the manufacturer, it will do something in excess of 15000 reads of the same sector to try to recover the data. It then uses heuristics to try to determine what the original data most likely was. If this process is successful enough, the block gets reassigned. Most disks don't do verifies on writes (too slow) and rely on the recovery algorithm, which is triggered by subsequent reads, to reassign these blocks.

RAID manufacturers implement algorithms that continuously read all sectors of idle disks to ensure data integrity. When they run into a failure, they first let the disk try to recover, and if that fails they use parity to recover the block. They will also "puncture" the sector so that the disk will skip it going forward.

Some of the dmesg lines pasted in this thread are likely LBA relocations that take too long. There is no set timeout in the spec, and therefore vendors try really hard and long to recover the data, resulting in OS timeouts. A good example is calculating how long it takes to read an LBA 15000 times on a 1 RPM disk. Add some fudge in there for the head to find the exact spot and you'll see that it gets in excess of seconds. Repeat that a few times due to various retries in various layers and you'll see where those lengthy timeouts come from.

On Fri, Sep 28, 2007 at 02:47:48PM +0200, Claudio Jeker wrote:
> On Fri, Sep 28, 2007 at 01:37:52PM +0200, Peter N. M. Hansteen wrote:
> > "Leonardo Marques" <[EMAIL PROTECTED]> writes:
> >
> > > I've a HD which are returning a lot of errors. Someone know some good
> > > tool to analyze this disk and tell me if i've to replace it or if
> > > exist some way to repair it?
> >
> > To my mind that kind of errors say "run, don't walk to the store for
> > replacement". Modern disks remap bad parts away from active use, when
> > they've run out of remappable space, they start complaining like that.
> >
>
> Remapping is only possible when writing to blocks. The disk can not remap
> on reads. By forcing writes to such blocks you can remap them but the data
> on them is still lost. I use some partitions with almost only read access
> that had bad blocks and I could fix the problem with dd if=/dev/zero
> of=/dev/rsd0x (not important data so I did not replace the disk)
>
> --
> :wq Claudio
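The "read an LBA 15000 times" estimate above can be worked through with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes each re-read of the same sector costs at least one full platter revolution, it ignores seek and firmware overhead, and the RPM values in the loop are common spindle speeds picked for illustration, not figures from the thread.

```python
def reread_time_seconds(rpm: float, attempts: int) -> float:
    """Lower bound on the time needed to re-read one sector `attempts` times.

    Re-reading the same sector means waiting for the platter to come
    around again, so every attempt costs at least one revolution.
    """
    seconds_per_revolution = 60.0 / rpm
    return attempts * seconds_per_revolution


if __name__ == "__main__":
    # 15000 attempts is the figure mentioned in the message above.
    for rpm in (5400, 7200, 10000, 15000):
        print(f"{rpm:>5} RPM: {reread_time_seconds(rpm, 15000):6.1f} s")
```

Even under these optimistic assumptions the recovery attempt takes on the order of a minute or more, far longer than an OS is likely to wait before logging a timeout.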
Re: Tool for HD analyzing
Leonardo Marques wrote:
> Hey guys,
>
> I've a HD which are returning a lot of errors. Someone know some good
> tool to analyze this disk and tell me if i've to replace it or if
> exist some way to repair it?
>
> HD: WDC WD1200JD-00HBB0
>
> Errors:
>
> dmesg |grep -i wd0
> wd0 at pciide0 channel 0 drive 0:
> wd0: 16-sector PIO, LBA48, 114473MB, 234441648 sectors
> wd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 5
> dkcsum: wd0 matches BIOS drive 0x82
> wd0(pciide0:0:0): timeout
> wd0a: device timeout reading fsbn 96 of 96-127 (wd0 bn 159; cn 0 tn 2 sn 33), retrying
> wd0a: device timeout reading fsbn 96 of 96-127 (wd0 bn 159; cn 0 tn 2 sn 33), retrying
> wd0a: device timeout reading fsbn 96 of 96-127 (wd0 bn 159; cn 0 tn 2 sn 33), retrying
> wd0a: device timeout reading fsbn 96 of 96-127 (wd0 bn 159; cn 0 tn 2 sn 33), retrying
> wd0a: device timeout reading fsbn 96 of 96-127 (wd0 bn 159; cn 0 tn 2 sn 33), retrying
> wd0a: device timeout reading fsbn 96 of 96-127 (wd0 bn 159; cn 0 tn 2 sn 33)
> wd0c: device timeout reading fsbn 0 (wd0 bn 0; cn 0 tn 0 sn 0), retrying
> wd0c: device timeout reading fsbn 0 (wd0 bn 0; cn 0 tn 0 sn 0), retrying
> wd0c: device timeout reading fsbn 0 (wd0 bn 0; cn 0 tn 0 sn 0), retrying
> wd0c: device timeout reading fsbn 0 (wd0 bn 0; cn 0 tn 0 sn 0), retrying
> wd0c: device timeout reading fsbn 0 (wd0 bn 0; cn 0 tn 0 sn 0), retrying
> wd0c: device timeout reading fsbn 0 (wd0 bn 0; cn 0 tn 0 sn 0)
>
> Thanks for all attention.

Isn't it up to the manufacturer to provide a diagnostic tool that also delivers an error code used for the RMA process?

Thanks,
Dorian
Re: Tool for HD analyzing
On Fri, Sep 28, 2007 at 01:37:52PM +0200, Peter N. M. Hansteen wrote:
> "Leonardo Marques" <[EMAIL PROTECTED]> writes:
>
> > I've a HD which are returning a lot of errors. Someone know some good
> > tool to analyze this disk and tell me if i've to replace it or if
> > exist some way to repair it?
>
> To my mind that kind of errors say "run, don't walk to the store for
> replacement". Modern disks remap bad parts away from active use, when
> they've run out of remappable space, they start complaining like that.
>

Remapping is only possible when writing to blocks. The disk cannot remap on reads. By forcing writes to such blocks you can remap them, but the data on them is still lost. I use some partitions with almost only read access that had bad blocks, and I could fix the problem with dd if=/dev/zero of=/dev/rsd0x (not important data, so I did not replace the disk).

--
:wq Claudio
Re: Tool for HD analyzing
Thanks a lot, guys! I'm running to the store now to buy another HD for the BACKUP server :P

On 9/28/07, Peter N. M. Hansteen <[EMAIL PROTECTED]> wrote:
> "Leonardo Marques" <[EMAIL PROTECTED]> writes:
>
> > I've a HD which are returning a lot of errors. Someone know some good
> > tool to analyze this disk and tell me if i've to replace it or if
> > exist some way to repair it?
>
> To my mind that kind of errors say "run, don't walk to the store for
> replacement". Modern disks remap bad parts away from active use, when
> they've run out of remappable space, they start complaining like that.
>
> For measuring the health of your next hard drive, it's possible
> sysutils/smartmontools will do the job.
>
> --
> Peter N. M. Hansteen, member of the first RFC 1149 implementation team
> http://bsdly.blogspot.com/ http://www.datadok.no/ http://www.nuug.no/
> "Remember to set the evil bit on all malicious network traffic"
> delilah spamd[29949]: 85.152.224.147: disconnected after 42673 seconds.
>

--
---
Leonardo Marques
---
Blog: BeNerd.analyx.org
Website: www.analyx.org
Re: Tool for HD analyzing
Peter N. M. Hansteen wrote:
> "Leonardo Marques" <[EMAIL PROTECTED]> writes:
>
>> I've a HD which are returning a lot of errors. Someone know some good
>> tool to analyze this disk and tell me if i've to replace it or if
>> exist some way to repair it?
>
> To my mind that kind of errors say "run, don't walk to the store for
> replacement". Modern disks remap bad parts away from active use, when
> they've run out of remappable space, they start complaining like that.
>
> For measuring the health of your next hard drive, it's possible
> sysutils/smartmontools will do the job.

What if you have an ATA/Serial ATA HD attached to scsi/ahci/jmb? atactl is useless there; will smartmontools work?

Regards,
M.K.
Re: Tool for HD analyzing
On 2007/09/28 08:17, Leonardo Marques wrote:
> I've a HD which are returning a lot of errors. Someone know some good
> tool to analyze this disk and tell me if i've to replace it or if
> exist some way to repair it?

It's worth trying a different cable; if that doesn't help, replace the disk...
Re: Tool for HD analyzing
"Leonardo Marques" <[EMAIL PROTECTED]> writes: > I've a HD which are returning a lot of errors. Someone know some good > tool to analyze this disk and tell me if i've to replace it or if > exist some way to repair it? To my mind that kind of errors say "run, don't walk to the store for replacement". Modern disks remap bad parts away from active use, when they've run out of remappable space, they start complaining like that. For measuring the health of your next hard drive, it's possible sysutils/smartmontools will do the job. -- Peter N. M. Hansteen, member of the first RFC 1149 implementation team http://bsdly.blogspot.com/ http://www.datadok.no/ http://www.nuug.no/ "Remember to set the evil bit on all malicious network traffic" delilah spamd[29949]: 85.152.224.147: disconnected after 42673 seconds.
Re: Tool for HD analyzing
Hi,

On 9/28/07, Leonardo Marques <[EMAIL PROTECTED]> wrote:
> Hey guys,
>
> I've a HD which are returning a lot of errors. Someone know some good
> tool to analyze this disk and tell me if i've to replace it or if
> exist some way to repair it?

I don't know which tools exist for OpenBSD, but if you're on x86/AMD64 and are OK with a DOS boot disk, search for MHDD. It is a really nice tool. Or just burn yourself an "ultimate boot cd" (ultimatebootcd.com), which also includes MHDD and a ton of other diagnosis and repair tools.

greetings, knitti
Tool for HD analyzing
Hey guys,

I've a HD which are returning a lot of errors. Someone know some good tool to analyze this disk and tell me if i've to replace it or if exist some way to repair it?

HD: WDC WD1200JD-00HBB0

Errors:

dmesg |grep -i wd0
wd0 at pciide0 channel 0 drive 0:
wd0: 16-sector PIO, LBA48, 114473MB, 234441648 sectors
wd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 5
dkcsum: wd0 matches BIOS drive 0x82
wd0(pciide0:0:0): timeout
wd0a: device timeout reading fsbn 96 of 96-127 (wd0 bn 159; cn 0 tn 2 sn 33), retrying
wd0a: device timeout reading fsbn 96 of 96-127 (wd0 bn 159; cn 0 tn 2 sn 33), retrying
wd0a: device timeout reading fsbn 96 of 96-127 (wd0 bn 159; cn 0 tn 2 sn 33), retrying
wd0a: device timeout reading fsbn 96 of 96-127 (wd0 bn 159; cn 0 tn 2 sn 33), retrying
wd0a: device timeout reading fsbn 96 of 96-127 (wd0 bn 159; cn 0 tn 2 sn 33), retrying
wd0a: device timeout reading fsbn 96 of 96-127 (wd0 bn 159; cn 0 tn 2 sn 33)
wd0c: device timeout reading fsbn 0 (wd0 bn 0; cn 0 tn 0 sn 0), retrying
wd0c: device timeout reading fsbn 0 (wd0 bn 0; cn 0 tn 0 sn 0), retrying
wd0c: device timeout reading fsbn 0 (wd0 bn 0; cn 0 tn 0 sn 0), retrying
wd0c: device timeout reading fsbn 0 (wd0 bn 0; cn 0 tn 0 sn 0), retrying
wd0c: device timeout reading fsbn 0 (wd0 bn 0; cn 0 tn 0 sn 0), retrying
wd0c: device timeout reading fsbn 0 (wd0 bn 0; cn 0 tn 0 sn 0)

Thanks for all attention.

--
---
Leonardo Marques
---
Blog: BeNerd.analyx.org
Website: www.analyx.org
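The dmesg output above repeats the same failures many times over. A minimal Python sketch for condensing such output into a count per failing block, assuming the lines look exactly like the ones pasted in this message; the regular expression and the name `summarize_timeouts` are made up for illustration and are not part of any OpenBSD tool.

```python
import re
from collections import Counter

# Matches lines such as:
#   wd0a: device timeout reading fsbn 96 of 96-127 (wd0 bn 159; cn 0 tn 2 sn 33), retrying
#   wd0c: device timeout reading fsbn 0 (wd0 bn 0; cn 0 tn 0 sn 0)
TIMEOUT_RE = re.compile(
    r"^(?P<part>wd\d+[a-p]): device timeout reading fsbn (?P<fsbn>\d+)"
    r"(?: of \d+-\d+)? \((?P<disk>wd\d+) bn (?P<bn>\d+); "
    r"cn (?P<cn>\d+) tn (?P<tn>\d+) sn (?P<sn>\d+)\)"
)


def summarize_timeouts(dmesg_text: str) -> dict:
    """Count timeout lines per (disk, absolute block number)."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        match = TIMEOUT_RE.match(line.strip())
        if match:
            counts[(match.group("disk"), int(match.group("bn")))] += 1
    return dict(counts)


if __name__ == "__main__":
    sample = """\
wd0(pciide0:0:0): timeout
wd0a: device timeout reading fsbn 96 of 96-127 (wd0 bn 159; cn 0 tn 2 sn 33), retrying
wd0a: device timeout reading fsbn 96 of 96-127 (wd0 bn 159; cn 0 tn 2 sn 33)
wd0c: device timeout reading fsbn 0 (wd0 bn 0; cn 0 tn 0 sn 0), retrying
"""
    for (disk, bn), n in sorted(summarize_timeouts(sample).items()):
        print(f"{disk} bn {bn}: {n} timeout line(s)")
```

Run against the full excerpt, this would report only two distinct blocks on wd0 (bn 159 and bn 0), with six timeout lines each.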