On 02/02/2011 09:00 AM, Lamar Owen wrote:
> On Wednesday, February 02, 2011 02:06:15 am Chuck Munro wrote:
>> The real key is to carefully label each SATA cable and its associated
>> drive. Then the little mapping script can be used to identify the
>> faulty drive which mdadm reports by its device name. It just occurred
>> to me that whenever mdadm sends an email report, it can also run a
>> script which groks out the path info and puts it in the email message.
>> Problem solved :-)
>
> Ok, perhaps I'm dense, but, if this is not a hot-swap bay you're talking
> about, wouldn't it be easier to have the drive's serial number (or other
> identifier found on the label) pulled into the e-mail, and compare with the
> label physically found on the drive, since you're going to have to open the
> case anyway? Using something like:
>
> hdparm -I $DEVICE | grep Serial.Number
>
> works here (the regexp Serial.Number matches the string "Serial Number"
> without requiring the double quotes....). Use whatever $DEVICE you need to
> use, as long as it's on a controller compatible with hdparm usage.
>
> I have seen cases with a different Linux distribution where the actual module
> load order was nondeterministic (modules loaded in parallel); while upstream
> and the CentOS rebuild try to make things more deterministic, wouldn't it be
> safer to get a really unique, externally visible identifier from the drive?
> If the drive has failed to the degree that it won't respond to the query,
> then query all the good drives in the array for their serial numbers, and use
> a process of elimination. This, IMO, is more robust than relying on the
> drive detect order to remain deterministic.
>
> If in a hotswap or coldswap bay, do some data access to the array, and see
> which LED's don't blink; that should correspond to the failed drive. If the
> bay has secondary LED's, you might be able to blink those, too.
>

Well no, you're not being dense. It's a case of making the best of what
the physical hardware can do for me. In my case, the drives are
segregated into several 3-drive bays which are bolted into the case
individually, so removing each one to compare serial numbers would be a
major pain, since I'd have to unbolt a bay and remove each drive one at a
time to read the label.
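
For reference, the serial-number lookup and process of elimination suggested in the quoted message could be sketched roughly as follows. This is only an illustration, not something from either poster's setup: the function names `serial_from_hdparm` and `list_serials` and the `NO-RESPONSE` marker are made up here, and it assumes `hdparm -I` prints a "Serial Number:" line for the drives in question.

```shell
#!/bin/sh
# Print the serial number found in `hdparm -I` output read from stdin.
serial_from_hdparm() {
    awk -F: '/Serial Number/ {
        gsub(/^[ \t]+|[ \t]+$/, "", $2)   # trim padding around the value
        print $2
        exit
    }'
}

# Print "device serial" for each device given. A drive that does not
# answer the query is marked NO-RESPONSE (a marker invented for this sketch),
# so the failed drive can be found by elimination.
list_serials() {
    for dev in "$@"; do
        serial=$(hdparm -I "$dev" 2>/dev/null | serial_from_hdparm)
        printf '%s %s\n' "$dev" "${serial:-NO-RESPONSE}"
    done
}

# Example (needs root): list_serials /dev/sda /dev/sdb /dev/sdc
```

The output of `list_serials` could be appended to the mdadm e-mail and compared with the labels on the drives.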

The new RHEL-6/CentOS-6 'udevadm' command nicely maps out the hardware
path no matter what order the drives are detected or named in, and since
hardware paths are fixed, I just have to attach a little tag to each SATA
cable with that path number on it. One thing I did was reboot the machine
*many* times to make sure the controller cards were always enumerated by
Linux in the same slot order.

I also notice that the RHEL-6 DriveInfo GUI application shows which drive
is giving trouble, but it only maps the controllers in a vague way with
respect to the hardware path. (At least that's what I remember seeing a
couple of days ago; I could be mistaken.)

On this particular machine I don't have the luxury of per-drive LED
activity indicators, so whacking each drive with a big read won't point
the way (but I have used that technique on other machines). I didn't have
the funds to buy the hot-swap bays I would have preferred. I may retrofit
later.

Your suggestions are well taken, but the hardware I have doesn't readily
allow me to use them. Thanks for the ideas.

Chuck
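
P.S. For illustration only, a mapping script along the lines described above might look something like this. It is a rough sketch, not the actual script: the names `port_from_devpath` and `map_drives` are hypothetical, and the choice of the controller's PCI address plus SCSI host number as the cable tag is an assumption. It relies on `udevadm info --query=path --name=DEVICE` printing the sysfs device path.

```shell
#!/bin/sh
# Reduce a sysfs device path to "<PCI address> <SCSI host>", e.g. for a cable tag.
# Which path components make a useful tag is an assumption of this sketch.
port_from_devpath() {
    printf '%s\n' "$1" | awk -F/ '{
        for (i = 1; i <= NF; i++) {
            # PCI function of the controller, e.g. 0000:00:1f.2
            if ($i ~ /^[0-9a-f]+:[0-9a-f]+:[0-9a-f]+\.[0-9a-f]+$/) pci = $i
            # SCSI host the drive hangs off, e.g. host0
            if ($i ~ /^host[0-9]+$/) host = $i
        }
        print pci, host
    }'
}

# Print "device: port" for each device name given (hypothetical helper).
map_drives() {
    for dev in "$@"; do
        path=$(udevadm info --query=path --name="$dev") || continue
        printf '%s: %s\n' "$dev" "$(port_from_devpath "$path")"
    done
}

# Example: map_drives /dev/sda /dev/sdb
```

Host numbers are assigned as controllers are detected, so a tag like this is only as stable as the enumeration order, which is why the repeated reboot test matters.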