ATA troubles

Andrea Venturoli Sun, 24 Jul 2011 16:58:42 -0700

(Sorry for the previous post: I accidentally hit send while the message was still unfinished.)


Hello everyone.

For those interested, this post is a follow-up to:
http://www.mailinglistarchive.com/html/freebsd-questions%40freebsd.org/2011-06/msg00018.html
However, I'll summarize.

At the beginning of June, I installed two WD 1TB Caviar Green SATA drives into an Intel-S5000-based production box of mine and it was hell! This server runs 7.3/i386 off a SAS RAID, and the two new drives were supposed to work with gstripe as secondary storage.

I started getting:

ad4: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
ad4: WARNING - SMART taskqueue timeout - completing request directly
ad8: WARNING - SMART taskqueue timeout - completing request directly
ad4: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
ad4: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
ad8: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
ad4: WARNING - SETFEATURES ENABLE RCACHE taskqueue timeout - completing request directly

and the box would reboot within minutes.
This also prevented me from running tests with smartctl.
Note that the box previously had a single SATA drive that worked perfectly.

It was suggested that I run wdidle.exe from DOS to prevent the drives from spinning down, and it helped: I was now at least able to fsck the stripe and copy something onto it. Still, I kept getting the above messages; the drives would also occasionally hang and then restart. Uptime rose to a few hours, but the box would still reboot.

In the meantime the drives went bad (confirmed by smartd, the BIOS and the WD tools) and I had them replaced.

When they came back, I decided to set up a test box: the hardware is completely different from the production box, but FreeBSD still runs from a SCSI drive and the two WD drives form an additional stripe. First I ran the WD tools to check the drives and they passed every test (including the long one).


So I installed FreeBSD 7.3/i386 and smartctl, and verified the disks again.

I created the stripe, fscked it, and copied about 420GB of data via rsync over NFS. It seemed to work fine but, after about 15 hours, the box rebooted after:

ad6: FAILURE - device detached
g_vfs_done():stripe/backup[WRITE(offset=1709926940672, length=131072)]error = 6
/mnt/local: got error 6 while accessing filesystem
panic: softdep_deallocate_dependencies: unrecovered I/O error
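
For readers decoding the numeric `error = 6` above (and the `error = 5` in the later logs), a minimal sketch follows. It assumes these are ordinary errno values and that the machine running the script numbers them the same way as the FreeBSD box; the helper name `describe_errno` is made up for illustration, and the message text varies by operating system.

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Return 'NAME (message)' for a numeric errno value."""
    # errno.errorcode maps numbers to symbolic names on the local system.
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name} ({os.strerror(code)})"


if __name__ == "__main__":
    # 6 appears in the WRITE failure, 5 in the READ failures.
    for code in (5, 6):
        print(code, describe_errno(code))
```

On common Unix-like systems 5 is EIO (a generic I/O error) and 6 is ENXIO, which would fit the "device detached" message preceding it.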

Subsequent retries always gave the same result, until I disabled softupdates on the stripe. I was then able to complete the rsync.


Not quite satisfied, I made a local-to-local copy and started getting a lot of:

Jul 24 18:54:28 mydavid kernel: ad4: WARNING - READ_DMA48 UDMA ICRC error (retrying request) LBA=1620416000
Jul 24 18:54:28 mydavid kernel: ad4: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR> error=10<NID_NOT_FOUND> LBA=1620416000
Jul 24 18:54:28 mydavid kernel: g_vfs_done():stripe/backup[READ(offset=1659305967616, length=131072)]error = 5
Jul 24 18:54:42 mydavid kernel: ad6: WARNING - READ_DMA48 UDMA ICRC error (retrying request) LBA=1621920384
Jul 24 18:54:42 mydavid kernel: ad6: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR> error=10<NID_NOT_FOUND> LBA=1621920384
Jul 24 18:54:42 mydavid kernel: g_vfs_done():stripe/backup[READ(offset=1660846522368, length=131072)]error = 5
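
To relate the stripe offsets in the g_vfs_done() lines to the per-disk LBAs, here is a rough sketch of how a request on a two-disk stripe could be split into per-disk pieces. Everything configurable in it is an assumption, not something stated in this post: 512-byte sectors, a 64 KiB stripe size, and ad4 being the first component with ad6 the second. The function name `split_request` is hypothetical.

```python
SECTOR = 512              # bytes per sector (assumed)
STRIPE_SIZE = 64 * 1024   # bytes per stripe unit (assumed, not stated in the post)
NDISKS = 2                # ad4 = disk 0, ad6 = disk 1 (assumed order)


def split_request(offset, length, stripe_size=STRIPE_SIZE,
                  ndisks=NDISKS, sector=SECTOR):
    """Split a request on the striped volume into per-disk pieces.

    Returns a list of (disk_index, lba, sector_count) tuples.
    """
    pieces = []
    end = offset + length
    while offset < end:
        # Which stripe unit are we in, and how far into it?
        stripe_no, within = divmod(offset, stripe_size)
        disk = stripe_no % ndisks
        # Stripe units are dealt round-robin, so each disk holds every
        # ndisks-th unit, stored back to back.
        disk_byte = (stripe_no // ndisks) * stripe_size + within
        chunk = min(stripe_size - within, end - offset)
        pieces.append((disk, disk_byte // sector, chunk // sector))
        offset += chunk
    return pieces


if __name__ == "__main__":
    for off in (1659305967616, 1660846522368):
        print(off, split_request(off, 131072))
```

Under these assumed parameters, the pieces for the two READ requests above include LBA 1620416000 on disk 0 and LBA 1621920384 on disk 1, which lines up with the LBAs logged for ad4 and ad6. A different stripe size or component order would give different numbers, so treat this as a consistency check rather than a statement of how the stripe was actually created.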

I ran smartctl's short test on both drives and they were OK; I tried the offline tests, but they got interrupted (???).

In spite of the messages above, it looked like it was working...

However, I was logged in via ssh and had to turn off the client; so I stopped it, went to the console and started it again.

Now it looks like one drive is not working fine anymore...

Jul 24 23:48:36 mydavid kernel: ad6: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=1671887488
Jul 24 23:48:36 mydavid kernel: g_vfs_done():stripe/backup[READ(offset=1712012836864, length=131072)]error = 5
Jul 24 23:48:39 mydavid kernel: ad6: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=1671897856
Jul 24 23:48:39 mydavid kernel: g_vfs_done():stripe/backup[READ(offset=1712023420928, length=131072)]error = 5
Jul 24 23:48:41 mydavid kernel: ad6: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=1671897888
Jul 24 23:48:41 mydavid kernel: g_vfs_done():stripe/backup[READ(offset=1712023486464, length=131072)]error = 5
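
The `status=51` and `error=10` / `error=40` fields are hexadecimal bit masks. A small sketch of decoding them is below; the READY, DSC, ERROR, NID_NOT_FOUND and UNCORRECTABLE names and positions come straight from the log lines in this post, while the ICRC and ABORTED positions are taken from the ATA register layout and are assumed to match the driver's naming. The tables are deliberately incomplete and the `decode` helper is hypothetical.

```python
# Bit -> name tables. Only partly confirmed by the logs in this post;
# ICRC (0x80) and ABORTED (0x04) are assumptions based on the ATA layout.
STATUS_BITS = {0x40: "READY", 0x10: "DSC", 0x01: "ERROR"}
ERROR_BITS = {
    0x80: "ICRC",
    0x40: "UNCORRECTABLE",
    0x10: "NID_NOT_FOUND",
    0x04: "ABORTED",
}


def decode(value_hex, names):
    """Turn a hex register value into a list of flag names.

    Bits not present in the table are reported as one hex remainder.
    """
    value = int(value_hex, 16)
    found = [name for bit, name in sorted(names.items(), reverse=True)
             if value & bit]
    known = sum(names.keys())
    unknown = value & ~known
    if unknown:
        found.append(f"0x{unknown:02x}")
    return found


if __name__ == "__main__":
    print(decode("51", STATUS_BITS))
    print(decode("10", ERROR_BITS))
    print(decode("40", ERROR_BITS))
```

Read this way, the earlier failures report a sector ID that could not be found after a CRC error on the link, while the later ones on ad6 report uncorrectable data, which may point to different underlying causes (cabling or controller versus media).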

Also, smartd is complaining:

Jul 24 23:41:59 mydavid smartd[2630]: Device: /dev/ad6, 38 Currently unreadable (pending) sectors
Jul 24 23:50:56 mydavid smartd[538]: Device: /dev/ad6, 39 Currently unreadable (pending) sectors


After a reboot, I'm back to the NID_NOT_FOUND errors...




While I'm still conducting other tests, does anyone have any hints on this?


 bye & Thanks
        av.
_______________________________________________
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to "freebsd-questions-unsubscr...@freebsd.org"
