Hi,

Recently, a ZFS pool on my FreeBSD box started showing lots of errors on one drive in a mirrored pair.

The pool consists of around 14 drives (7 mirrored pairs), hung off a couple of SuperMicro 8-port SATA controllers (one drive of each pair on each controller).

One of the drives started picking up a lot of errors (by the end it was returning errors for pretty much any read or write issued) and taking ages to complete the I/Os.

However, ZFS kept trying to use the drive - e.g. when I attached another drive to the remaining 'good' drive in the mirrored pair, ZFS was still trying to read data off the failed drive (as well as the remaining good one) in order to complete its resilver to the newly attached drive.
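
For what it's worth, the attach itself was just the standard zpool command - the pool and device names below are placeholders rather than my actual ones:

    # attach a new drive to the surviving half of the mirror;
    # ZFS then resilvers the existing data onto the new device
    zpool attach tank ad6 ad14

    # watch the resilver progress
    zpool status tank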

Having posted on the OpenSolaris ZFS list, it appears that under Solaris there's an 'FMA engine' which communicates drive failures and the like to ZFS, advising ZFS when a drive should be marked as 'failed'.

Is there anything similar to this on FreeBSD yet? I.e. does (or can) anything on the system tell ZFS "this drive is experiencing failures", rather than ZFS just seeing lots of timed-out I/O errors (as appears to be the case)?

In the end, the failing drive was timing out literally every I/O. I did recover the situation by detaching it from the pool (which hung the machine - probably because ZFS had to update the metadata on all drives, including the failed one). A reboot brought the pool back, minus the 'failed' drive, so enough of the 'detach' must have completed.
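
(Again, the detach was just the standard command - same placeholder pool/device names as above:)

    # detach the failing half of the mirror from the pool
    zpool detach tank ad7

    # after the reboot, confirm what's left of the pool
    zpool status -v tank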

The newly attached drive completed its resilver in half an hour (as opposed to an estimated 755 hours and climbing with the failing drive still in the pool, limping along).

-Kp
