RE: ZFS and DMA read error
Mark Stapper wrote:
> Yeah, I did the long SMART selftest three times now, each of which
> failed on the same LBA address.

I assume 'smartctl -a /dev/adX' reports that the read test failed at LBA XXX something?

> Why would I want to clear my drive before I run these tests?

In this case it's not really clearing the drive you are aiming for; the point is to write to every sector. If you have a failed sector (which you do), writing to it will force the drive firmware to remap the sector. As far as I know, most drives will not remap an unreadable sector until it is written to.

/Daniel Eriksson
___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to freebsd-questions-unsubscr...@freebsd.org
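One way to see whether a write actually led to a remap is to compare the drive's sector counters before and after. The Python sketch below is hypothetical: it assumes smartmontools' usual `smartctl -A` table layout (RAW_VALUE in the 10th column) and the commonly used attribute IDs 5 (Reallocated_Sector_Ct) and 197 (Current_Pending_Sector), which can differ between vendors. A pending count that drops while the reallocated count rises would be consistent with Daniel's explanation. The sample values are made up.

```python
from __future__ import annotations

# Assumed attribute IDs/names (common smartmontools defaults; vendors may differ).
WATCHED = {5: "Reallocated_Sector_Ct", 197: "Current_Pending_Sector"}


def remap_counters(smartctl_output: str) -> dict[str, int]:
    """Return the RAW_VALUE of each watched attribute in `smartctl -A` text."""
    counters: dict[str, int] = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have 10 columns;
        # RAW_VALUE is the 10th one (index 9).
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attr_id = int(fields[0])
        if attr_id in WATCHED and fields[9].isdigit():
            counters[WATCHED[attr_id]] = int(fields[9])
    return counters


if __name__ == "__main__":
    # Made-up sample in the usual table layout (values are illustrative only).
    sample = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       1
"""
    print(remap_counters(sample))
```

Run it once on the output captured before the `dd` pass and once after, and compare the two dictionaries.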
Re: ZFS and DMA read error
Daniel Eriksson wrote:
> Mark Stapper wrote:
>> Yeah, I did the long SMART selftest three times now, each of which
>> failed on the same LBA address.
>
> I assume 'smartctl -a /dev/adX' reports that the read test failed at LBA XXX something?

Indeed it does. Always with the same LBA code/sector/address or whichever.

>> Why would I want to clear my drive before I run these tests?
>
> In this case it's not really clearing the drive you are aiming for; the point is to write to every sector. If you have a failed sector (which you do), writing to it will force the drive firmware to remap the sector. As far as I know, most drives will not remap an unreadable sector until it is written to.

So I see. Could this be why I haven't had any read errors anymore? (After the zpool scrub, that is.)
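The "same LBA every time" observation can be checked mechanically from the self-test log that `smartctl -a` (or `smartctl -l selftest`) prints. This is a hypothetical Python sketch that assumes the usual smartctl layout: each entry starts with `#` and its last column is either the LBA of the first error or `-`. The LBA in the sample is made up, since the thread doesn't quote the self-test output.

```python
from __future__ import annotations


def first_error_lbas(selftest_log: str) -> list[int]:
    """Collect LBA_of_first_error from each entry of a SMART self-test log.

    Assumes the smartctl layout where every entry starts with "#" and the
    last column is either an LBA or "-" (no failing LBA recorded).
    """
    lbas: list[int] = []
    for line in selftest_log.splitlines():
        if not line.lstrip().startswith("#"):
            continue  # header or unrelated line
        last = line.split()[-1]
        if last.isdigit():
            lbas.append(int(last))
    return lbas


if __name__ == "__main__":
    # Hypothetical log: the LBA 1234567 and the hours are made up.
    log = """\
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%      1437         1234567
# 2  Extended offline    Completed: read failure       90%      1432         1234567
# 3  Short offline       Completed without error       00%      1400         -
"""
    lbas = first_error_lbas(log)
    print(lbas, "same LBA every time:", len(set(lbas)) == 1)
```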
RE: ZFS and DMA read error
Mark Stapper wrote:
> So I see. Could this be why I haven't had any read errors anymore?
> (After the zpool scrub, that is.)

Possibly, but in that case the SMART selftest should also pass. Have you tried a selftest after you did the scrub?

/Daniel Eriksson
Re: ZFS and DMA read error
Daniel Eriksson wrote:
> Mark Stapper wrote:
>> So I see. Could this be why I haven't had any read errors anymore?
>> (After the zpool scrub, that is.)
>
> Possibly, but in that case the SMART selftest should also pass. Have you
> tried a selftest after you did the scrub?

Multiple times.
RE: ZFS and DMA read error
Mark Stapper wrote:
> People are REALLY pushing SpinRite lately... I did get it though, just to try it.

SpinRite is OK, but it hasn't been updated in ages. It does not work on large drives: 250GB works, 1TB does not. I haven't tried it on 500GB drives.

If I were you I would 'zpool offline ...' the offending drive, rewrite the entire drive with 'dd if=/dev/zero ...' and then run a SMART selftest on it using smartmontools ('smartctl -t long /dev/adX'). When you 'zpool online ...' the drive, ZFS will resilver it for you. After doing all of this I would then run a 'zpool scrub ...'.

If the scrub finishes without checksum errors and without any ATA-related errors, the drive is probably in good enough condition to keep using, but watch out for more ATA errors. If the drive is dying, it won't be long before it starts to generate more ATA errors.

/Daniel Eriksson
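Daniel's sequence, written out as a dry run. This Python sketch only builds the command strings and runs nothing; the pool and disk names come from this thread, while the `dd` block size `bs=1m` is an assumption, not something Daniel specified. Note that `smartctl -t long` only starts the test inside the drive and returns; the result is read later with `smartctl -l selftest`.

```python
from __future__ import annotations

import shlex


def repair_plan(pool: str, disk: str) -> list[str]:
    """Return Daniel's suggested steps as shell commands, without running them.

    Dry run only. The dd step destroys everything on the disk, so
    double-check the device name before typing anything by hand.
    """
    dev = f"/dev/{disk}"
    steps = [
        ["zpool", "offline", pool, disk],              # take the disk out of the pool
        ["dd", "if=/dev/zero", f"of={dev}", "bs=1m"],  # bs=1m is an assumption
        ["smartctl", "-t", "long", dev],               # starts the test; check results later
        ["zpool", "online", pool, disk],               # ZFS resilvers the disk
        ["zpool", "scrub", pool],                      # then verify all data
    ]
    return [shlex.join(step) for step in steps]


if __name__ == "__main__":
    for cmd in repair_plan("data", "ad6"):
        print(cmd)
```

Keeping the steps as data rather than executing them makes it easy to review the device name before anything destructive happens.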
Re: ZFS and DMA read error
> SpinRite is OK but it hasn't been updated in ages. It does not work on
> large drives. 250GB works, 1TB does not. Haven't tried it on 500GB drives.

So it will be useless in... well, in this case it IS useless...

> If I were you I would 'zpool offline ...' the offending drive, rewrite the entire drive with 'dd if=/dev/zero ...' and then run a SMART selftest on it using smartmontools ('smartctl -t long /dev/adX'). When you 'zpool online ...' the drive ZFS will resilver it for you. After doing all of this I would then run a 'zpool scrub ...'.
>
> If the scrub finishes without checksum errors and without any ATA-related errors the drive is probably in good enough condition to keep using, but watch out for more ATA errors. If the drive is dying it won't be long before it starts to generate more ATA errors.

Yeah, I did the long SMART selftest three times now, each of which failed on the same LBA address. Did the scrub as well; it took two hours, and no DMA errors were reported. Why would I want to clear my drive before I run these tests?

I ordered a spare drive, so I'll wait until it arrives and replace the faulty drive with it by dd-ing the data from one to the other (I have only 4 SATA ports, so I can't do zpool replace). Or maybe I'll just swap them out and do a zpool scrub. I'm uncomfortable doing this, though, because if any of the other drives fails/crashes/flips me off, I'll have to restore from my backup, which took two days to make... (which is the drawback of a gzipped ZFS backup partition).

Once I've replaced the drive I'll run Hitachi's Drive Fitness Test on the (presumably) failing drive. Even if it doesn't generate any ATA errors during everyday use, the error it gave before, combined with the failing SMART self test, disturbs me. I bought the drive 2 months ago, so my faith has gone. However, if it passes Hitachi's DFT, my faith will be restored :-).

Greetz,
Mark
Re: ZFS and DMA read error
Mark Stapper wrote:
[snip]
> I ordered a spare drive so I'll wait until it arrives, replace the faulty
> drive with this one by dd-ing data from one to the other (I have only 4
> SATA ports so I can't do zpool replace).

zpool replace has two forms:

    zpool replace pool old-device new-device

and

    zpool replace pool device

The latter is for when you pull the old drive and put the new one on the same {S,P}ATA port because you've no free ports. I did that a couple of weeks ago when one of my raidz drives fried (in its warranty period!) and it worked like a dream. I did a zpool replace and then a zpool scrub to make sure everything was OK, because of this section of the zpool man page:

    Scrubbing and resilvering are very similar operations. The difference
    is that resilvering only examines data that ZFS knows to be out of date
    (for example, when attaching a new device to a mirror or replacing an
    existing device), whereas scrubbing examines all data to discover silent
    errors due to hardware faults or disk failure.
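Arthur's two forms of `zpool replace`, followed by the scrub he ran afterwards, sketched as a Python helper that only builds the command strings. The new-disk name `ad12` in the usage lines is hypothetical. The scrub is best started once `zpool status` shows that the resilver has finished.

```python
from __future__ import annotations

import shlex


def replace_commands(pool: str, old: str, new: str | None = None) -> list[str]:
    """Build a `zpool replace` in either of its two forms, followed by a scrub."""
    if new is None:
        # Same-port form: the new disk now sits where the old one was.
        replace = ["zpool", "replace", pool, old]
    else:
        # Two-device form: old and new disks are attached at the same time.
        replace = ["zpool", "replace", pool, old, new]
    # Arthur scrubs afterwards; start it once the resilver is done.
    return [shlex.join(replace), shlex.join(["zpool", "scrub", pool])]


if __name__ == "__main__":
    print(replace_commands("data", "ad6"))          # Mark's case: no free SATA port
    print(replace_commands("data", "ad6", "ad12"))  # "ad12" is a hypothetical new disk
```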
Re: ZFS and DMA read error
Arthur Chance wrote:
> zpool replace has two forms:
>
>     zpool replace pool old-device new-device
>
> and
>
>     zpool replace pool device
>
> The latter is for when you pull the old drive and put the new one on the
> same {S,P}ATA port because you've no free ports.
[snip]

Thanks for the tip. I'll be sure to try that.
Re: ZFS and DMA read error
> [snip 9 identical messages, based on the uncorrectable LBA error]
>
> Since it's all throwing errors at the same LBA, I'd run SMART diagnostics on the drive (I think it's port sysutils/smartmontools) and see if it's showing errors too. Looks like a failing/failed drive and I would recommend replacing it. I doubt (but you can try) SpinRite will help you when you get to this point.

Thought about that; will do that after running zpool scrub. Weird thing is that ZFS hasn't shown any data/checksum errors. Does this mean successive reads were successful?

> SpinRite's website is at grc.com

People are REALLY pushing SpinRite lately... I did get it though, just to try it.

> Hope you have backups or redundancy. No fun replacing data.

I have both :-).
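A simplified Python sketch of why ZFS can count READ errors on ad6 and still report "No known data errors": single-parity RAID-Z keeps an XOR parity column, so a column that can't be read from one disk is rebuilt from the other disks and then checked against the block's checksum. This is a toy model; SHA-256 stands in for ZFS's own checksum (by default a Fletcher variant), and the real RAID-Z on-disk layout is ignored.

```python
from __future__ import annotations

import hashlib
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks (single-parity style parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct(columns: list[bytes | None], parity: bytes) -> bytes:
    """Rebuild the one unreadable column (None) from parity and the survivors."""
    survivors = [c for c in columns if c is not None]
    return xor_blocks(survivors + [parity])


def demo() -> bytes:
    data = [b"ZFS!", b"raid", b"z1ok"]                  # toy data columns on three disks
    parity = xor_blocks(data)                           # stored on the fourth disk
    checksum = hashlib.sha256(b"".join(data)).digest()  # stand-in for ZFS's checksum

    # The disk holding column 1 returns a read error, like ad6 did.
    readable = [data[0], None, data[2]]
    rebuilt = reconstruct(readable, parity)

    # The rebuilt block only counts as good if it matches the checksum.
    repaired = [rebuilt if c is None else c for c in readable]
    assert hashlib.sha256(b"".join(repaired)).digest() == checksum
    return rebuilt


if __name__ == "__main__":
    print(demo())
```

So the failed reads are counted against the disk, while the data handed to applications is the reconstructed, checksum-verified copy.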
ZFS and DMA read error
Good day to you,

I'm having a bit of trouble with one of the disks in my ZFS raidz1 pool. It's giving me DMA read errors, and zpool is reporting READ failures. However, data integrity is OK :-)
Unfortunately I was in the middle of rearranging my backup media, so I'm backing up everything as we speak. I will be testing the failing drive in another computer soon; however, before I return it I'd like to know if this could be caused by something other than failing hardware.
Below is the output of zpool status and a snippet of /var/log/messages showing the DMA errors.
Thanks for the input.

Greetz,
Mark

  pool: data
 state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
        attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        data        ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ad4     ONLINE       0     0     0
            ad6     ONLINE      21     0     0
            ad8     ONLINE       0     0     0
            ad10    ONLINE       0     0     0

errors: No known data errors

Aug 31 03:04:35 yoshi kernel: ad6: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=932040832
Aug 31 03:04:35 yoshi root: ZFS: vdev I/O failure, zpool=data path=/dev/ad6 offset=477204905984 size=65536 error=5
Aug 31 03:04:35 yoshi root: ZFS: vdev I/O failure, zpool=data path=/dev/ad6 offset=477204925440 size=2560 error=5
Aug 31 03:04:53 yoshi kernel: ad6: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=932040832
Aug 31 03:04:53 yoshi root: ZFS: vdev I/O failure, zpool=data path=/dev/ad6 offset=477204905984 size=65536 error=5
Aug 31 03:05:17 yoshi kernel: ad6: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=932040832
Aug 31 03:05:17 yoshi root: ZFS: vdev I/O failure, zpool=data path=/dev/ad6 offset=477204905984 size=65536 error=5
Aug 31 03:05:17 yoshi root: ZFS: vdev I/O failure, zpool=data path=/dev/ad6 offset=477204918272 size=512 error=5
Aug 31 06:12:01 yoshi login: ROOT LOGIN (root) ON ttyv2
Aug 31 06:35:34 yoshi kernel: ad6: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=932040832
Aug 31 06:35:34 yoshi root: ZFS: vdev I/O failure, zpool=data path=/dev/ad6 offset=477204905984 size=65536 error=5
Aug 31 06:35:34 yoshi root: ZFS: vdev I/O failure, zpool=data path=/dev/ad6 offset=477204925440 size=2560 error=5
Aug 31 06:36:33 yoshi kernel: ad6: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=932040832
Aug 31 06:36:34 yoshi root: ZFS: vdev I/O failure, zpool=data path=/dev/ad6 offset=477204905984 size=65536 error=5
Aug 31 06:36:34 yoshi root: ZFS: vdev I/O failure, zpool=data path=/dev/ad6 offset=477204923392 size=2048 error=5
Aug 31 06:36:38 yoshi kernel: ad6: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=932040832
Aug 31 06:36:38 yoshi root: ZFS: vdev I/O failure, zpool=data path=/dev/ad6 offset=477204905984 size=65536 error=5
Aug 31 06:36:38 yoshi root: ZFS: vdev I/O failure, zpool=data path=/dev/ad6 offset=477204918272 size=512 error=5
Aug 31 06:36:42 yoshi kernel: ad6: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=932040832
Aug 31 06:36:42 yoshi root: ZFS: vdev I/O failure, zpool=data path=/dev/ad6 offset=477204905984 size=65536 error=5
Aug 31 06:36:42 yoshi root: ZFS: vdev I/O failure, zpool=data path=/dev/ad6 offset=477204918272 size=512 error=5
Aug 31 06:37:52 yoshi kernel: ad6: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=932040832
Aug 31 06:37:52 yoshi root: ZFS: vdev I/O failure, zpool=data path=/dev/ad6 offset=477204905984 size=65536 error=5
Aug 31 06:37:52 yoshi root: ZFS: vdev I/O failure, zpool=data path=/dev/ad6 offset=477204918272 size=512 error=5
Aug 31 06:38:31 yoshi kernel: ad6: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=932040832
Aug 31 06:38:31 yoshi root: ZFS: vdev I/O failure, zpool=data path=/dev/ad6 offset=477204905984 size=65536 error=5
Aug 31 06:38:31 yoshi root: ZFS: vdev I/O failure, zpool=data path=/dev/ad6 offset=477204918272 size=512 error=5
Aug 31 06:38:45 yoshi kernel: ad6: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=932040832
Aug 31 06:38:45 yoshi root: ZFS: vdev I/O failure, zpool=data path=/dev/ad6 offset=477204905984 size=65536 error=5
Aug 31 06:38:45 yoshi root: ZFS: vdev I/O failure, zpool=data path=/dev/ad6 offset=477204947968 size=512 error=5
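The kernel and ZFS messages in the log above can be tied together with simple arithmetic. Assuming 512-byte sectors, the 64 KiB ZFS read at offset 477204905984 starts exactly at sector 477204905984 / 512 = 932040832, the LBA the kernel reports, and the smaller retried reads all fall inside that same 64 KiB range. The Python sketch below only does this arithmetic on numbers copied from the log; the sector size is the one assumption.

```python
from __future__ import annotations

SECTOR_SIZE = 512  # assumed 512-byte logical sectors

# Numbers copied from the log in this message.
KERNEL_LBA = 932040832
BIG_READ = (477204905984, 65536)  # (offset, size) of the 64 KiB ZFS read
SMALL_READS = [477204925440, 477204918272, 477204923392, 477204947968]


def offset_to_lba(offset: int) -> int:
    """Sector number containing byte `offset` of the device."""
    return offset // SECTOR_SIZE


def lba_range(offset: int, size: int) -> tuple[int, int]:
    """First and last sector touched by a read of `size` bytes at `offset`."""
    return offset // SECTOR_SIZE, (offset + size - 1) // SECTOR_SIZE


if __name__ == "__main__":
    first, last = lba_range(*BIG_READ)
    print("64 KiB read covers LBAs", first, "to", last,
          "| kernel LBA matches:", first == KERNEL_LBA)
    for off in SMALL_READS:
        lba = offset_to_lba(off)
        print(off, "-> LBA", lba, "inside the 64 KiB read:", first <= lba <= last)
```

The 64 KiB read covers LBAs 932040832 to 932040959, and the four smaller reads map to LBAs 932040870, 932040856, 932040866 and 932040914, so every failure in the snippet involves the same small region of ad6.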
Re: ZFS and DMA read error
On 8/31/09, Mark Stapper st...@mapper.nl wrote:
> Good day to you,
> I'm having a bit of trouble with one of the disks in my ZFS raidz1 pool.
> It's giving me DMA read errors, and zpool is reporting READ failures.
> However, data integrity is OK :-)
> [...]
> Aug 31 03:04:35 yoshi kernel: ad6: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=932040832
> Aug 31 03:04:35 yoshi root: ZFS: vdev I/O failure, zpool=data path=/dev/ad6 offset=477204905984 size=65536 error=5
> Aug 31 03:04:35 yoshi root: ZFS: vdev I/O failure, zpool=data path=/dev/ad6 offset=477204925440 size=2560 error=5

[snip 9 identical messages, based on the uncorrectable LBA error]

Since it's all throwing errors at the same LBA, I'd run SMART diagnostics on the drive (I think it's port sysutils/smartmontools) and see if it's showing errors too. Looks like a failing/failed drive and I would recommend replacing it. I doubt (but you can try) SpinRite will help you when you get to this point. SpinRite's website is at grc.com.

Hope you have backups or redundancy. No fun replacing data.

--TJ
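TJ's "same LBA" observation is easy to confirm over a whole /var/log/messages file. This Python sketch counts READ_DMA48 failures per (device, LBA); the pattern relies only on the device name, the `FAILURE - READ_DMA48` text and the `LBA=` field seen in this thread's log, so other driver messages would need their own pattern.

```python
import re
from collections import Counter

# Matches kernel lines like the ones in this thread, e.g.
# "ad6: FAILURE - READ_DMA48 status=51<...> error=40<...> LBA=932040832".
FAILURE_RE = re.compile(r"\b(ad\d+): FAILURE - READ_DMA48\b.*?\bLBA=(\d+)")


def failures_by_lba(log_text: str) -> Counter:
    """Count READ_DMA48 failures per (device, LBA) in /var/log/messages text."""
    return Counter((m.group(1), int(m.group(2))) for m in FAILURE_RE.finditer(log_text))


if __name__ == "__main__":
    # A few lines copied from the log in this thread.
    sample = "\n".join([
        "Aug 31 03:04:35 yoshi kernel: ad6: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=932040832",
        "Aug 31 03:04:35 yoshi root: ZFS: vdev I/O failure, zpool=data path=/dev/ad6 offset=477204905984 size=65536 error=5",
        "Aug 31 06:38:45 yoshi kernel: ad6: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=932040832",
    ])
    for (dev, lba), n in failures_by_lba(sample).most_common():
        print(f"{dev} LBA={lba}: {n} failure(s)")
```

A single (device, LBA) key with a growing count points at one bad spot on one disk rather than at a cable or controller problem spread across drives.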