Re: [CentOS] [HW] Do the HDD cages in rack mount chassis indicate visual/audio HDD failure?

2011-10-19 Thread Lamar Owen
On Tuesday, October 18, 2011 01:07:02 PM Les Mikesell wrote:
 I don't think anything is immune to failure.  Another fun case is a
 randomly-bad memory bit causing different things to be written to
 software raid mirrors.  I had one that took 3+ days of running
 memtest86 to catch.

ECC RAM?
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] [HW] Do the HDD cages in rack mount chassis indicate visual/audio HDD failure?

2011-10-19 Thread Les Mikesell
On Wed, Oct 19, 2011 at 2:33 PM, Lamar Owen lo...@pari.edu wrote:
 On Tuesday, October 18, 2011 01:07:02 PM Les Mikesell wrote:
 I don't think anything is immune to failure.  Another fun case is a
 randomly-bad memory bit causing different things to be written to
 software raid mirrors.  I had one that took 3+ days of running
 memtest86 to catch.

 ECC RAM?

The server said it was one-bit-correcting or something like that.  I
thought it was supposed to stop if it had errors it couldn't correct.
I swapped the whole set out at once without digging much more into the
details.

-- 
  Les Mikesell
lesmikes...@gmail.com


Re: [CentOS] [HW] Do the HDD cages in rack mount chassis indicate visual/audio HDD failure?

2011-10-18 Thread Arun Khan
Hi Michael,

On Fri, Oct 14, 2011 at 5:35 PM, Michael Schumacher  wrote:

 On Tuesday, October 11, 2011 you wrote:


 I would appreciate clarification on the following:

 (a) Indicate disk failure. LED lights up and/or audio alarm?
 (b) The failed HDD can be swapped.

 Don't rely on the LED going on. I mark all my hot swap disks with
 labels with their serial number. This label is visible from the
 outside without removing the HD.
 That way, I can double check that I remove the faulty disk.
 Pulling the wrong disk is the last thing you want to risk in a RAID
 setup. Relying on a fault LED is close to that.
 Also make a list of the HD serial numbers and their position within
 the RAID in time. Store that in a safe place.

Thanks for these very helpful suggestions - good admin practice.

 I pulled ONCE the wrong disk out of a Raid5 array. :-(
 You know what that means?

You mean, it is not OK to pull out a functioning disk?  Pulling one
disk out of RAID 5 should be OK.  Am I missing something?

Thanks,
-- Arun Khan


Re: [CentOS] [HW] Do the HDD cages in rack mount chassis indicate visual/audio HDD failure?

2011-10-18 Thread Karanbir Singh
On 10/11/2011 03:29 PM, Arun Khan wrote:
 Are the hot swap bays compatible with Linux mdadm RAID?  i.e. Upon
 detection of disk failure, the respective HDD LED on the bay can be
 turned ON?

No, not all are. Only a few work with mdadm (or rather, in a way that
mdadm can work with them). Even the basic mdadm hotswap capability is
new'ish. Test it a few times to make sure it works for your setup.

 I am trying to reduce the cost if I can get by
 with mdadm RAID10 with additional tools to detect failed drive and

Also, mdraid10 isn't the same as a normal raid-10, unless you meant to
imply that you are doing a raid10 with md-raid tools.

- KB


Re: [CentOS] [HW] Do the HDD cages in rack mount chassis indicate visual/audio HDD failure?

2011-10-18 Thread Karanbir Singh
On 10/18/2011 04:11 PM, Arun Khan wrote:
 You mean, it is not OK to pull out a functioning disk?  Pulling one
 disk out of RAID 5 should be OK.  Am I missing something?

grab yourself a bunch of usb keys +  a usb hub - fire up mdadm on your
laptop and use those keys as target disks and see how things work with
mdadm and hotswap. Much fun to be had there. I would also recommend
using CentOS6.

Pulling a disk that isn't set bad and deactivated in mdadm can cause some
very funky results. At best, the machine will freeze and you can
reinsert the disk, boot up and carry on. At worst, you will lose all
the data on the array.
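
A minimal sketch of the safer order of operations: mark the member
faulty, then remove it from the array, and only then pull the drive.
The helper below only builds the command lines, using mdadm's manage-mode
--fail and --remove options, and does not run anything. The device names
/dev/md0 and /dev/sdb1 are placeholders, not values from this thread.

```python
def removal_commands(array, member):
    """Return the mdadm invocations to run before physically pulling a member.

    `array` and `member` are device paths supplied by the caller; the names
    used in the example below are placeholders for illustration.
    """
    return [
        ["mdadm", "--manage", array, "--fail", member],    # mark the member faulty
        ["mdadm", "--manage", array, "--remove", member],  # detach it from the array
    ]


if __name__ == "__main__":
    # Print the commands instead of executing them, so this is safe to run.
    for argv in removal_commands("/dev/md0", "/dev/sdb1"):
        print(" ".join(argv))
```

Printing rather than executing keeps the sketch safe to try on a laptop;
on a test array the same lists could be handed to subprocess.run.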

btw, don't think that these issues don't affect hardware raid - they do.
It's just that the management for these things is slightly more
abstracted away and the controllers are better integrated with the disk
cages.

- KB


Re: [CentOS] [HW] Do the HDD cages in rack mount chassis indicate visual/audio HDD failure?

2011-10-18 Thread Les Mikesell
On Tue, Oct 18, 2011 at 10:11 AM, Arun Khan knu...@gmail.com wrote:

 I would appreciate clarification on the following:

 (a) Indicate disk failure. LED lights up and/or audio alarm?
 (b) The failed HDD can be swapped.

 Don't rely on the LED going on. I mark all my hot swap disks with
 labels with their serial number. This label is visible from the
 outside without removing the HD.
 That way, I can double check that I remove the faulty disk.
 Pulling the wrong disk is the last thing you want to risk in a RAID
 setup. Relying on a fault LED is close to that.
 Also make a list of the HD serial numbers and their position within
 the RAID in time. Store that in a safe place.

 Thanks for these very helpful suggestions - good admin practice.

 I pulled ONCE the wrong disk out of a Raid5 array. :-(
 You know what that means?

 You mean, it is not OK to pull out a functioning disk?  Pulling one
 disk out of RAID 5 should be OK.  Am I missing something?

Usually you would be swapping drives to repair an already-broken raid.
 Unless you have a hot spare and the raid has already rebuilt on it,
pulling a working disk will take a 2nd drive out of the failed raid5
and kill it.
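
The arithmetic behind this can be sketched in a few lines. RAID 5 carries
one member's worth of parity, so the data stays available only while at
most one member is missing. The function name and the 4-disk example are
illustrative assumptions.

```python
def raid5_survives(total_members, missing_members):
    """RAID 5 keeps data available only while at most one member is missing."""
    if not 0 <= missing_members <= total_members:
        raise ValueError("missing_members must be between 0 and total_members")
    return missing_members <= 1


if __name__ == "__main__":
    # Healthy 4-disk array, one working disk pulled: degraded but still alive.
    print(raid5_survives(4, 1))
    # One disk already failed, then a working disk is pulled by mistake.
    print(raid5_survives(4, 2))
```

The second case is the one described above: the array was already
running without redundancy, so the wrong pull is the second missing
member.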

-- 
   Les Mikesell
lesmikes...@gmail.com


Re: [CentOS] [HW] Do the HDD cages in rack mount chassis indicate visual/audio HDD failure?

2011-10-18 Thread Arun Khan
On Tue, Oct 18, 2011 at 8:57 PM, Les Mikesell lesmikes...@gmail.com wrote:
 On Tue, Oct 18, 2011 at 10:11 AM, Arun Khan knu...@gmail.com wrote:


 I pulled ONCE the wrong disk out of a Raid5 array. :-(
 You know what that means?

 You mean, it is not OK to pull out a functioning disk?  Pulling one
 disk out of RAID 5 should be OK.  Am I missing something?

 Usually you would be swapping drives to repair an already-broken raid.
  Unless you have a hot spare and the raid has already rebuilt on it,
 pulling a working disk will take a 2nd drive out of the failed raid5
 and kill it.


Thanks I get it now :)

-- Arun Khan


Re: [CentOS] [HW] Do the HDD cages in rack mount chassis indicate visual/audio HDD failure?

2011-10-18 Thread Arun Khan
On Tue, Oct 18, 2011 at 8:52 PM, Karanbir Singh mail-li...@karan.org wrote:
 Also, mdraid10 isn't the same as a normal raid-10, unless you meant to
 imply that you are doing a raid10 with md-raid tools.

Yes, the plan is to create raid10 with the md tools.

-- Arun Khan


Re: [CentOS] [HW] Do the HDD cages in rack mount chassis indicate visual/audio HDD failure?

2011-10-18 Thread Arun Khan
On Tue, Oct 18, 2011 at 8:55 PM, Karanbir Singh mail-li...@karan.org wrote:
 On 10/18/2011 04:11 PM, Arun Khan wrote:
 You mean, it is not OK to pull out a functioning disk?  Pulling one
 disk out of RAID 5 should be OK.  Am I missing something?

 grab yourself a bunch of usb keys +  a usb hub - fire up mdadm on your
 laptop and use those keys as target disks and see how things work with
 mdadm and hotswap. Much fun to be had there. I would also recommend
 using CentOS6.

Thanks for the suggestion - a great way to experiment.

From the feedback on this thread, I am leaning towards h/w raid controller.

 Pulling a disk that isn't set bad and deactivated in mdadm can cause some
 very funky results. At best, the machine will freeze and you can
 reinsert the disk, boot up and carry on. At worst, you will lose all
 the data on the array.

I agree.

 btw, don't think that these issues don't affect hardware raid - they do.
 It's just that the management for these things is slightly more
 abstracted away and the controllers are better integrated with the disk
 cages.

About 10 years ago, I had a h/w raid controller go bad (HDDs connected
via SCSI cable - no HDD bays involved).  The replacement card
recreated the RAID array - lost all data.  I did have a back up to
restore most of the data.

-- Arun Khan


Re: [CentOS] [HW] Do the HDD cages in rack mount chassis indicate visual/audio HDD failure?

2011-10-18 Thread Les Mikesell
On Tue, Oct 18, 2011 at 11:05 AM, Arun Khan knu...@gmail.com wrote:

 btw, don't think that these issues don't affect hardware raid - they do.
 It's just that the management for these things is slightly more
 abstracted away and the controllers are better integrated with the disk
 cages.

 About 10 years ago, I had a h/w raid controller go bad (HDDs connected
 via SCSI cable - no HDD bays involved).  The replacement card
 recreated the RAID array - lost all data.  I did have a back up to
 restore most of the data.

I don't think anything is immune to failure.  Another fun case is a
randomly-bad memory bit causing different things to be written to
software raid mirrors.  I had one that took 3+ days of running
memtest86 to catch.
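
One way to notice diverging mirrors earlier is to look at the mismatch
counter that md keeps after a check pass. A minimal sketch, assuming the
md sysfs attribute mismatch_cnt under /sys/block/<array>/md/ and an
array named md0 (a placeholder). A non-zero value is a hint to
investigate, not proof of bad RAM. The sysfs_root parameter exists only
so the function can be tried against a fake directory tree.

```python
from pathlib import Path


def read_mismatch_count(md_name, sysfs_root="/sys/block"):
    """Return the mismatch count md recorded during its last check/repair pass.

    md_name is e.g. "md0" (a placeholder name). sysfs_root is overridable
    so the function can be exercised without a real array.
    """
    path = Path(sysfs_root) / md_name / "md" / "mismatch_cnt"
    return int(path.read_text().strip())


def mirrors_look_consistent(md_name, sysfs_root="/sys/block"):
    """True when the last check found no differing sectors between copies."""
    return read_mismatch_count(md_name, sysfs_root) == 0
```

The counter is only meaningful after a check has run, which is started
by writing "check" to the array's sync_action attribute.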

-- 
  Les Mikesell
lesmikes...@gmail.com


Re: [CentOS] [HW] Do the HDD cages in rack mount chassis indicate visual/audio HDD failure?

2011-10-14 Thread Michael Schumacher
Dear Arun,

On Tuesday, October 11, 2011 you wrote:


 I would appreciate clarification on the following:

 (a) Indicate disk failure. LED lights up and/or audio alarm?
 (b) The failed HDD can be swapped.

Don't rely on the LED going on. I mark all my hot swap disks with
labels with their serial number. This label is visible from the
outside without removing the HD.
That way, I can double check that I remove the faulty disk.
Pulling the wrong disk is the last thing you want to risk in a RAID
setup. Relying on a fault LED is close to that.
Also make a list of the HD serial numbers and their position within
the RAID in time. Store that in a safe place.
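
The serial-number list can be kept in machine-readable form as well as
on paper. A minimal sketch, assuming a simple mapping from bay number to
the serial printed on the label; the bay numbers and serials below are
made up for illustration.

```python
def bay_for_serial(inventory, serial):
    """Look up which bay holds the drive with the given serial number."""
    for bay, known_serial in inventory.items():
        if known_serial == serial:
            return bay
    raise KeyError(f"serial {serial!r} is not in the inventory")


def safe_to_pull(inventory, bay, failed_serial):
    """True only if the label recorded for `bay` matches the failed serial."""
    return inventory.get(bay) == failed_serial


if __name__ == "__main__":
    # Entirely hypothetical bay numbers and serials.
    inventory = {0: "SN-AAA111", 1: "SN-BBB222", 2: "SN-CCC333", 3: "SN-DDD444"}
    print(bay_for_serial(inventory, "SN-CCC333"))
    print(safe_to_pull(inventory, 1, "SN-CCC333"))
```

The point of safe_to_pull is the double check: the serial reported as
failed by the system has to match the label on the bay before anything
is removed.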

I pulled ONCE the wrong disk out of a Raid5 array. :-(
You know what that means?

best regards
---
Michael Schumacher
PAMAS Partikelmess- und Analysesysteme GmbH
Dieselstr.10, D-71277 Rutesheim
Tel +49-7152-99630
Fax +49-7152-996333
Geschäftsführer: Gerhard Schreck
Handelsregister B Stuttgart HRB 252024



[CentOS] [HW] Do the HDD cages in rack mount chassis indicate visual/audio HDD failure?

2011-10-11 Thread Arun Khan
I have no personal experience with rack mount chassis.

From past postings, I reckon there are members on this list who
have experience with rack mount setups, and I would like to get their advice.

To reduce the H/W cost, I am considering Linux mdadm RAID10 on a 2U chassis.

I would appreciate clarification on the following:

In rack mount chassis, do the cages that house the
hard disks have the following feature?

(a) Indicate disk failure. LED lights up and/or audio alarm?
(b) The failed HDD can be swapped.

TIA.
-- 
Arun Khan


Re: [CentOS] [HW] Do the HDD cages in rack mount chassis indicate visual/audio HDD failure?

2011-10-11 Thread John R Pierce
On 10/11/11 12:11 AM, Arun Khan wrote:
 In rack mount chassis, do the cages that house the
 hard disks have the following feature?

 (a) Indicate disk failure. LED lights up and/or audio alarm?

that requires specific configuration to suit whatever drive interconnect 
you have.


 (b) The failed HDD can be swapped.

If they are in hotswap bays, yes.  if they aren't, no.

they have 2U servers with as many as 25 2.5" SAS bays now.


-- 
john r pierce    N 37, W 122
santa cruz ca mid-left coast



Re: [CentOS] [HW] Do the HDD cages in rack mount chassis indicate visual/audio HDD failure?

2011-10-11 Thread m . roth
John R Pierce wrote:
 On 10/11/11 12:11 AM, Arun Khan wrote:
 In rack mount chassis, do the cages that house the
 hard disks have the following feature?

 (a) Indicate disk failure. LED lights up and/or audio alarm?

 that requires specific configuration to suit whatever drive interconnect
 you have.

Sometimes, it's just looking at what drive is showing SMART (or other)
errors. Other times, look at the light.

 (b) The failed HDD can be swapped.

 If they are in hotswap bays, yes.  if they aren't, no.

Don't waste time and money getting anything *other* than hot swap bays.
You really, really don't want to have to pull the server out, and take it
apart just to swap a bad drive.

 they have 2U servers with as many as 25 2.5" SAS bays now.

  mark



Re: [CentOS] [HW] Do the HDD cages in rack mount chassis indicate visual/audio HDD failure?

2011-10-11 Thread Arun Khan
On Tue, Oct 11, 2011 at 12:46 PM, John R Pierce pie...@hogranch.com wrote:
 On 10/11/11 12:11 AM, Arun Khan wrote:
 In rack mount chassis, do the cages that house the
 hard disks have the following feature?

 (a) Indicate disk failure. LED lights up and/or audio alarm?

 that requires specific configuration to suit whatever drive interconnect
 you have.

Do these bays have a connector (+ cable) that is connected to the
motherboard or RAID card to control the HDD LEDs in the bay?
(sorry if this appears basic but I have no experience with such hardware)

 (b) The failed HDD can be swapped.

 If they are in hotswap bays, yes.  if they aren't, no.

Are the hot swap bays compatible with Linux mdadm RAID?  i.e. Upon
detection of disk failure, the respective HDD LED on the bay can be
turned ON?


 they have 2U servers with as many as 25 2.5" SAS bays now.


The 2U system is for an appliance that I am building and it will be a
commercial product.  I plan to order the integrated system from
value add SIs of Supermicro/Tyan (whoever is able to satisfy the h/w
spec.).  My storage requirement is 2TB (4X1TB disks), a 6 disk bay
should be sufficient (or whatever is the lowest denominator).

I am trying to reduce the cost if I can get by
with mdadm RAID10 with additional tools to detect failed drive and
then rebuild the s/w RAID10 when a new disk is inserted.  In some
cases, it will not be possible to service the unit by me and for that
reason I am looking for a visual clue (LED ON) so that I can guide the
local sysadmin.  I may have to go with H/W RAID if it is not possible
to do the same with mdadm RAID.
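
The "detect failed drive" part can be sketched by reading /proc/mdstat,
where md flags a failed member with (F) after its name. This is a
minimal parser, not a monitoring tool; the sample text in the usage
example is illustrative and not output from any system in this thread.

```python
import re

# An array line looks like: "md0 : active raid10 sdd1[3] sdc1[2](F) ..."
ARRAY_LINE = re.compile(r"^(md\d+)\s*:\s*(.*)$")
# A failed member is written as name[slot](F).
FAILED_MEMBER = re.compile(r"(\S+)\[\d+\]\(F\)")


def failed_members(mdstat_text):
    """Map each md array name to the members flagged (F) in mdstat text."""
    result = {}
    for line in mdstat_text.splitlines():
        match = ARRAY_LINE.match(line)
        if match:
            name, rest = match.groups()
            failed = FAILED_MEMBER.findall(rest)
            if failed:
                result[name] = failed
    return result


if __name__ == "__main__":
    # Illustrative sample only; on a real host read /proc/mdstat instead.
    sample = (
        "Personalities : [raid10]\n"
        "md0 : active raid10 sdd1[3] sdc1[2](F) sdb1[1] sda1[0]\n"
        "      1953520640 blocks 2 near-copies [4/3] [UU_U]\n"
    )
    print(failed_members(sample))
```

mdadm's own monitor mode covers the same ground with mail or program
hooks; the sketch only shows where the failure information is visible.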

Thanks for your help.
-- Arun Khan


Re: [CentOS] [HW] Do the HDD cages in rack mount chassis indicate visual/audio HDD failure?

2011-10-11 Thread Arun Khan
On Tue, Oct 11, 2011 at 6:36 PM,  m.r...@5-cent.us wrote:
 John R Pierce wrote:
 On 10/11/11 12:11 AM, Arun Khan wrote:
 In rack mount chassis, do the cages that house the
 hard disks have the following feature?

 (a) Indicate disk failure. LED lights up and/or audio alarm?

 that requires specific configuration to suit whatever drive interconnect
 you have.

 Sometimes, it's just looking at what drive is showing SMART (or other)
 errors. Other times, look at the light.

Agree but it requires me to be in physical proximity of the system.
My objective is to make the HDD failure recovery process as
deterministic as possible.  As stated in another response, this will
be an appliance running 24x7 at sites where Linux admin knowledge is
likely to be sparse and some sites may not give me connectivity over
the 'Net.


 (b) The failed HDD can be swapped.

 If they are in hotswap bays, yes.  if they aren't, no.

 Don't waste time and money getting anything *other* than hot swap bays.
 You really, really don't want to have to pull the server out, and take it
 apart just to swap a bad drive.

OK, I have understood the value of hot swap bays for disk failure scenario.

Thanks,
-- Arun Khan


Re: [CentOS] [HW] Do the HDD cages in rack mount chassis indicate visual/audio HDD failure?

2011-10-11 Thread m . roth
Arun Khan wrote:
 On Tue, Oct 11, 2011 at 12:46 PM, John R Pierce pie...@hogranch.com
 wrote:
 On 10/11/11 12:11 AM, Arun Khan wrote:
 In rack mount chassis, do the cages that house the
 hard disks have the following feature?
<snip>
 Do these bays have a connector (+ cable) that is connected to the
 motherboard or RAID card to control the HDD LEDs in the bay?
 (sorry if this appears basic but I have no experience with such hardware)

They have sleds. You screw a std. drive into one, and shove it in -
literally, that's all there is to it.

 (b) The failed HDD can be swapped.

 If they are in hotswap bays, yes.  if they aren't, no.

 Are the hot swap bays compatible with Linux mdadm RAID?  i.e. Upon
 detection of disk failure, the respective HDD LED on the bay can be
 turned ON?

Everything understands hot swap bays these days, and certainly Linux, like
every other version of Unix, does. Let's see, I have well over a hundred
rackmounts in our server rooms and the data center, all have hot swap, and
90% are running CentOS (and a very few RHEL, and a couple of odd things,
and there are the few WinDoze servers (they have hot swap, also).
snip
 The 2U system is for an appliance that I am building and it will be a
 commercial product.  I plan to order the integrated system from
 value add SIs of Supermicro/Tyan (whoever is able to satisfy the h/w
 spec.).  My storage requirement is 2TB (4X1TB disks), a 6 disk bay
 should be sufficient (or whatever is the lowest denominator).

You absolutely do *NOT* want anything but hot swap. Take a look at the
Dell R[468]10's.
<snip>
mark



Re: [CentOS] [HW] Do the HDD cages in rack mount chassis indicate visual/audio HDD failure?

2011-10-11 Thread John R Pierce
On 10/11/11 7:29 AM, Arun Khan wrote:
 that requires specific configuration to suit whatever drive interconnect
   you have.
 Do these bays have a connector (+ cable) that is connected to the
 motherboard or RAID card to control the HDD LEDs in the bay?
 (sorry if this appears basic but I have no experience with such hardware)


typically, a server will have a SAS backplane which sas/sata drives hot
plug into, and 1 or more 4 channel SAS ports that plug into the host bus
adapter or raid controller.  this SAS backplane usually has a 'SES'
controller[*] embedded on it, which appears to the host as another SAS
device, and manages the LEDs.  If it's a brand name server (hp, dell,
ibm, etc) using the vendor's raid cards, the LEDs all just work.  if
it's whitebox stuff, with JBOD, getting the right failure LEDs to come on
may require some custom configuration.
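
A rough sketch of what that custom configuration can look like on Linux,
assuming the kernel's ses driver has bound to the SES device and exposes
it under /sys/class/enclosure, with per-slot "fault" and "locate"
attributes. The enclosure and slot names vary by hardware, so the ones
in the usage note are placeholders, and whether the backplane actually
lights the LED depends on the hardware. The sysfs_root parameter exists
only so the function can be tried against a fake directory tree.

```python
from pathlib import Path


def set_slot_led(enclosure, slot, led, on, sysfs_root="/sys/class/enclosure"):
    """Write 1 or 0 to a slot's `fault` or `locate` attribute.

    `enclosure` and `slot` are directory names as they appear under
    sysfs_root; they differ between backplanes and must be looked up
    on the actual system.
    """
    if led not in ("fault", "locate"):
        raise ValueError("led must be 'fault' or 'locate'")
    attr = Path(sysfs_root) / enclosure / slot / led
    attr.write_text("1" if on else "0")
    return attr
```

On a real system this needs root, and a call might look like
set_slot_led("0:0:8:0", "Slot 03", "fault", True), where both names are
hypothetical. Tying this to md failure events is the glue that brand
name servers with vendor raid cards provide out of the box.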


[*] SES supersedes the earlier SAF-TE design for the same functionality.


-- 
john r pierce    N 37, W 122
santa cruz ca mid-left coast
