On 02/02/2011 09:00 AM, Lamar Owen wrote:
>
> On Wednesday, February 02, 2011 02:06:15 am Chuck Munro wrote:
>> The real key is to carefully label each SATA cable and its associated
>> drive.  Then the little mapping script can be used to identify the
>> faulty drive which mdadm reports by its device name.  It just occurred
>> to me that whenever mdadm sends an email report, it can also run a
>> script which groks out the path info and puts it in the email message.
>> Problem solved :-)

> Ok, perhaps I'm dense, but, if this is not a hot-swap bay you're talking 
> about, wouldn't it be easier to have the drive's serial number (or other 
> identifier found on the label) pulled into the e-mail, and compare with the 
> label physically found on the drive, since you're going to have to open the 
> case anyway?  Using something like:
>
> hdparm -I $DEVICE | grep Serial.Number
>
> works here (the regexp Serial.Number matches the string "Serial Number" 
> without requiring the double quotes....).  Use whatever $DEVICE you need to 
> use, as long as it's on a controller compatible with hdparm usage.
>
> I have seen cases with a different Linux distribution where the actual module 
> load order was nondeterministic (modules loaded in parallel); while upstream 
> and the CentOS rebuild try to make things more deterministic, wouldn't it be 
> safer to get a really unique, externally visible identifier from the drive?  
> If the drive has failed to the degree that it won't respond to the query, 
> then query all the good drives in the array for their serial numbers, and use 
> a process of elimination.  This, IMO, is more robust than relying on the 
> drive detect order to remain deterministic.
>
> If in a hotswap or coldswap bay, do some data access to the array, and see 
> which LED's don't blink; that should correspond to the failed drive.  If the 
> bay has secondary LED's, you might be able to blink those, too.
>
>

Well no, you're not being dense.  It's a case of making the best of what 
the physical hardware can do for me.  In my case, the drives are 
segregated into several 3-drive bays which are bolted into the case 
individually.  Comparing serial numbers would be a major pain, since I'd 
have to unbolt a bay and pull its drives one at a time to read the 
labels.

The new RHEL-6/CentOS-6 'udevadm' command nicely maps each drive to its 
hardware path, no matter the order in which the drives are detected and 
named.  Since hardware paths are fixed, I just have to attach a little 
tag to each SATA cable with its hardware path on it.  One thing I did 
was reboot the machine *many* times to make sure the controller cards 
were always enumerated by Linux in the same slot order.
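
A minimal sketch of such a mapping script follows.  It is an 
illustration, not the exact script in use here: the function names 
hw_tag and list_drive_tags and the tag format (controller PCI address 
plus SCSI host number) are assumptions, and it assumes the drives show 
up as /dev/sdX block devices.

```shell
#!/bin/sh
# Sketch: map each /dev/sdX name to a short tag derived from its
# udev hardware path. hw_tag and list_drive_tags are hypothetical names.

# Reduce a sysfs device path to "<PCI address> <hostN>".
# The last PCI address in the path is the controller itself
# (earlier ones, if any, are bridges).
hw_tag() {
    printf '%s\n' "$1" | awk -F/ '{
        pci = ""; host = ""
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^[0-9a-f]+:[0-9a-f]+:[0-9a-f]+\.[0-9a-f]+$/) pci = $i
            else if ($i ~ /^host[0-9]+$/) host = $i
        }
        print pci, host
    }'
}

# Print one line per whole-disk device: device name, then its tag.
list_drive_tags() {
    for dev in /dev/sd*[a-z]; do
        [ -b "$dev" ] || continue          # skip an unmatched glob
        devpath=$(udevadm info --query=path --name="$dev" 2>/dev/null) || continue
        printf '%s  %s\n' "$dev" "$(hw_tag "$devpath")"
    done
}

list_drive_tags
```

The tag printed for a device is what would go on the matching cable 
label, so a device name reported by mdadm can be looked up in this 
output and traced to a physical cable.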

I also notice that the RHEL-6 DriveInfo GUI application shows which 
drive is giving trouble, but it only maps the controllers in a vague way 
with respect to the hardware path.  (At least that's what I remember 
seeing a couple of days ago; I could be mistaken.)

On this particular machine I don't have the luxury of per-drive LED 
activity indicators, so whacking each drive with a big read won't point 
the way (but I have used that technique on other machines).  I didn't 
have the funds to buy the hot-swap bays I would have preferred.  I may 
retrofit later.

Your suggestions are well taken, but the hardware I have doesn't readily 
allow me to use them.  Thanks for the ideas.

Chuck