Re: [ami] Unable to set "Hot Spare" from bioctl on a Dell PERC 4/Di

2008-02-27 Thread Matthew Mulrooney

On Thu, 21 Feb 2008, Matthew Mulrooney wrote:
> On Wed, 20 Feb 2008, Marco Peereboom wrote:
>> My natural answer is that this is a firmware issue.  But since you
>
> I will upgrade the firmware and rerun my test case.


I've upgraded my firmware to the latest version:

  Firmware version:   252D
  Firmware release date:  July 17, 2007

And re-run the test case with the same results.

Matthew




Re: [ami] Unable to set "Hot Spare" from bioctl on a Dell PERC 4/Di

2008-02-21 Thread Matthew Mulrooney

On Wed, 20 Feb 2008, Marco Peereboom wrote:
> My natural answer is that this is a firmware issue.  But since you

I will upgrade the firmware and rerun my test case.

> provided such good steps I will try to recreate this.  Thank you for
> this outstanding report.

No problem :).

Matthew



On Wed, Feb 20, 2008 at 01:42:59AM -0700, Matthew Mulrooney wrote:

Hi there, I'm back with another LSI controller, and I'm experiencing
problems with creating hot spares from bioctl.  This seems to be the same
problem that I posted to misc@ on Oct 16, 2006 with the subject line of:

  [ami] Unable to set "Hot Spare" on MegaRAID SATA 300-8x

I've got the same symptoms, but now with a PERC 4/Di controller.  [And this
time I've found a better workaround than just avoiding bioctl -H with this
LSI controller :).]

Problem summary
===============
When I use bioctl to mark an Unused drive as a Hot Spare, that drive will
fail to be integrated when another disk fails.

The only way I've found to make that drive properly act as a Hot Spare is
to set it as such from the LSI boot menu.  If you have already marked it as
a Hot Spare from bioctl, pull the Hot Spare-marked drive and reinsert it (it
can be the same physical disk).  At that point the disk should show up as
'Unused', and you can then mark it as a Hot Spare from the LSI boot menu.

This is an improvement over my 2006 analysis of the situation, where I
couldn't find a way to reset the drive back to Unused (after Hot Sparing it
from bioctl).  The LSI boot menu requires a drive to be in an Unused state
before it will allow me to correctly mark it as a Hot Spare.
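
A minimal sketch of how the 'Unused' check described above could be
scripted before rebooting into the LSI boot menu.  It assumes bioctl prints
one line per disk with the status just before the size and the
channel:target.lun location (as in the log further down); disk_status is a
hypothetical helper name, not part of bioctl.

```shell
#!/bin/sh
# Sketch only: report the status column for one disk on a controller.
# disk_status is a hypothetical helper; the column layout is assumed
# from the bioctl output quoted in this thread.

# usage: disk_status DEVICE CHAN:TARG   (e.g. disk_status ami0 0:0)
disk_status() {
    dev=$1 loc=$2
    bioctl "$dev" | awk -v dev="$dev" -v loc="$loc" '
    {
        for (i = 1; i <= NF; i++) {
            # The location may be given with or without the ".0" LUN suffix.
            if ($i == loc || $i == (loc ".0")) {
                status = ""
                # Fields before the size are: [device] index status-words.
                for (j = 1; j <= i - 2; j++) {
                    if ($j == dev || $j ~ /^[0-9]+$/)
                        continue
                    status = (status == "" ? $j : status " " $j)
                }
                print status
                exit
            }
        }
    }'
}
```

For example, after re-seating the drive, `disk_status ami0 0:0` would be
expected to print `Unused` before the drive is marked from the boot menu.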


If you're interested, please let me know what I can do to be of assistance
in troubleshooting this.  I have a limited window before this box will
have to be pushed into production, and I can live with the current
situation (an after-hours reboot in the case of a drive failure is
perfectly fine).

Matthew


Test case
=========
s => step succeeded
F => step failed

Normal case (RAID 1 + one hot spare)
------------------------------------
s Configure array from the LSI boot menu
s   Clear configuration
s   New configuration
s     Disks 0, 1:  RAID 1 array
s     Disk  2:     Hot spare

s Install OpenBSD-4.2

s Single disk failure
s   Disk 0:  Fails (I pulled it from the hot swap cage)
s   Disk 2:  Automatically replaces it
s   Observe the RAID 1 array get fully rebuilt

s Replace failed disk
s   Replace Disk 0 with a new disk
s   Observe that Disk 0 is marked as "Unused" through bioctl
s   Set Disk 0 to be a hot spare (through bioctl)

s Single disk failure
s   Disk 1:  Fails (I pulled it)
F   Disk 0:  FAILS TO GET INTEGRATED, DESPITE STILL BEING MARKED AS A
 HOT SPARE - Array is still degraded.

s Reboot, enter the LSI boot menu
s   Configure > View/Add Configuration
s     Highlight disk 0 > F4 (hot spare)
s   "This Physical Drive is already a HOTSPARE\nPress any key to
    continue"
s   F10 (Configure), Esc, Esc
s   "Exit?" = YES
s   "Please REBOOT YOUR SYSTEM", CTRL-ALT-DEL

s Recheck array
F   Disk 0:  Still failing to integrate.  Array still degraded.

s Attempt to shake loose the 'Hot Spare' bit from disk 0
s   Remove disk 0
s   Replace disk 0 (with the same physical disk)
s   Disk 0 is *no longer* marked as a 'Hot Spare' (either through
    bioctl or through the LSI boot menu).  Yeah! :)
    [I don't think I tested this method with my SATA 300-8x.]
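
The failing step in the test case (array degraded while a disk is still
listed as a hot spare) could be spotted from bioctl output with something
like the sketch below.  The "Degraded" volume status string is an
assumption taken from bioctl(8) rather than from the log in this report,
and check_stuck_spare is a hypothetical name.

```shell
#!/bin/sh
# Sketch only: read bioctl output on stdin and succeed (exit 0) when a
# volume is reported as Degraded while some disk is still a Hot spare.
# "Degraded" is assumed to be the status string bioctl prints.
check_stuck_spare() {
    awk '
        /Degraded/  { degraded = 1 }
        /Hot spare/ { spare = 1 }
        END {
            if (degraded && spare) {
                print "degraded volume with idle hot spare"
                exit 0
            }
            exit 1
        }'
}
```

Usage would be along the lines of `bioctl ami0 | check_stuck_spare`.  A
spare may briefly sit next to a degraded volume before a rebuild starts,
so a hit right after a failure is a hint rather than proof.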


Log file
========
# The output is generated by:
#   date; bioctl ami0
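
A loop along these lines would produce a log in the same shape.  This is
only a sketch: the 600-second interval is inferred from the timestamps of
the later entries, and log_status and its arguments are hypothetical.

```shell
#!/bin/sh
# Sketch only: print a timestamp followed by the controller status,
# COUNT times, sleeping INTERVAL seconds between samples.
# usage: log_status DEVICE INTERVAL COUNT   (e.g. log_status ami0 600 100)
log_status() {
    dev=$1 interval=$2 count=$3
    n=0
    while [ "$n" -lt "$count" ]; do
        date
        bioctl "$dev"
        echo                      # blank line between samples
        n=$((n + 1))
        if [ "$n" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}
```

Redirecting the output, e.g. `log_status ami0 600 100 >> raid.log`, keeps
a record across the rebuild; raid.log is just an example file name.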

##
# Created a new RAID 1 array from the LSI boot menu and installed OpenBSD 4.2
Tue Feb 19 04:01:42 MST 2008
Volume  Status               Size Device
 ami0 0 Scrubbing    146695782400 sd0     RAID1 3% done
      0 Online       146811125760 0:0.0   safte0
      1 Online       146811125760 0:1.0   safte0
 ami0 1 Hot spare    146811125760 0:2.0   safte0

Tue Feb 19 10:02:15 MST 2008
Volume  Status               Size Device
 ami0 0 Scrubbing    146695782400 sd0     RAID1 94% done
      0 Online       146811125760 0:0.0   safte0
      1 Online       146811125760 0:1.0   safte0
 ami0 1 Hot spare    146811125760 0:2.0   safte0

Tue Feb 19 10:12:15 MST 2008
Volume  Status               Size Device
 ami0 0 Scrubbing    146695782400 sd0     RAID1 97% done
      0 Online       146811125760 0:0.0   safte0
      1 Online       146811125760 0:1.0   safte0
 ami0 1 Hot spare    146811125760 0:2.0   safte0

##
# Mirroring complete
Tue Feb 19 10:22:16 MST 2008
Volume  Status               Size Device
 ami0 0 Online       146695782400 sd0     RAID1
      0 Online       146811125760 0:0.0   safte0
      1 Online       146811125760 0:1.0   safte0
 ami0 1 Hot spare    146811125760 0:2.0   safte0
##
# Pulling Drive 0:0.0
Tue Feb 19 16:15:15 MST 2

Re: [ami] Unable to set "Hot Spare" from bioctl on a Dell PERC 4/Di

2008-02-20 Thread Unix Fan
Whoa, has anyone "ever" provided such a detailed and thorough error report
before?



That was just amazing..  lol :)



-Nix Fan.




Re: [ami] Unable to set "Hot Spare" from bioctl on a Dell PERC 4/Di

2008-02-20 Thread Marco Peereboom
My natural answer is that this is a firmware issue.  But since you
provided such good steps I will try to recreate this.  Thank you for
this outstanding report.

On Wed, Feb 20, 2008 at 01:42:59AM -0700, Matthew Mulrooney wrote:
> Hi there, I'm back with another LSI controller, and I'm experiencing 
> problems with creating hot spares from bioctl.  This seems to be the same 
> problem that I posted to misc@ on Oct 16, 2006 with the subject line of:
>
>   [ami] Unable to set "Hot Spare" on MegaRAID SATA 300-8x
>
> I've got the same symptoms, but now with a PERC 4/Di controller.  [And this
> time I've found a better workaround than just avoiding bioctl -H with this
> LSI controller :).]
>
> Problem summary
> ===============
> When I use bioctl to mark an Unused drive as a Hot Spare, that drive will 
> fail to be integrated when another disk fails.
>
> The only way I've found to make that drive properly act as a Hot Spare is
> to set it as such from the LSI boot menu.  If you have already marked it as
> a Hot Spare from bioctl, pull the Hot Spare-marked drive and reinsert it (it
> can be the same physical disk).  At that point the disk should show up as
> 'Unused', and you can then mark it as a Hot Spare from the LSI boot menu.
>
> This is an improvement over my 2006 analysis of the situation, where I 
> couldn't find a way to reset the drive back to Unused (after Hot Sparing it 
> from bioctl).  The LSI boot menu requires a drive to be in an Unused state 
> before it will allow me to correctly mark it as a Hot Spare.
>
>
> If you're interested, please let me know what I can do to be of assistance
> in troubleshooting this.  I have a limited window before this box will
> have to be pushed into production, and I can live with the current
> situation (an after-hours reboot in the case of a drive failure is
> perfectly fine).
>
> Matthew
>
>
> Test case
> =========
> s => step succeeded
> F => step failed
>
> Normal case (RAID 1 + one hot spare)
> ------------------------------------
> s Configure array from the LSI boot menu
> s   Clear configuration
> s   New configuration
> s     Disks 0, 1:  RAID 1 array
> s     Disk  2:     Hot spare
>
> s Install OpenBSD-4.2
>
> s Single disk failure
> s   Disk 0:  Fails (I pulled it from the hot swap cage)
> s   Disk 2:  Automatically replaces it
> s   Observe the RAID 1 array get fully rebuilt
>
> s Replace failed disk
> s   Replace Disk 0 with a new disk
> s   Observe that Disk 0 is marked as "Unused" through bioctl
> s   Set Disk 0 to be a hot spare (through bioctl)
>
> s Single disk failure
> s   Disk 1:  Fails (I pulled it)
> F   Disk 0:  FAILS TO GET INTEGRATED, DESPITE STILL BEING MARKED AS A
>  HOT SPARE - Array is still degraded.
>
> s Reboot, enter the LSI boot menu
> s   Configure > View/Add Configuration
> s     Highlight disk 0 > F4 (hot spare)
> s   "This Physical Drive is already a HOTSPARE\nPress any key to
>     continue"
> s   F10 (Configure), Esc, Esc
> s   "Exit?" = YES
> s   "Please REBOOT YOUR SYSTEM", CTRL-ALT-DEL
>
> s Recheck array
> F   Disk 0:  Still failing to integrate.  Array still degraded.
>
> s Attempt to shake loose the 'Hot Spare' bit from disk 0
> s   Remove disk 0
> s   Replace disk 0 (with the same physical disk)
> s   Disk 0 is *no longer* marked as a 'Hot Spare' (either through
>     bioctl or through the LSI boot menu).  Yeah! :)
>     [I don't think I tested this method with my SATA 300-8x.]
>
>
> Log file
> ========
> # The output is generated by:
> #   date; bioctl ami0
>
> ##
> # Created a new RAID 1 array from the LSI boot menu and installed OpenBSD 4.2
> Tue Feb 19 04:01:42 MST 2008
> Volume  Status               Size Device
>  ami0 0 Scrubbing    146695782400 sd0     RAID1 3% done
>       0 Online       146811125760 0:0.0   safte0  ATLAS10K5_146SCAJNZM>
>       1 Online       146811125760 0:1.0   safte0  DS09>
>  ami0 1 Hot spare    146811125760 0:2.0   safte0  IC35L146UCDY10-0S27F>
>
> Tue Feb 19 10:02:15 MST 2008
> Volume  Status               Size Device
>  ami0 0 Scrubbing    146695782400 sd0     RAID1 94% done
>       0 Online       146811125760 0:0.0   safte0  ATLAS10K5_146SCAJNZM>
>       1 Online       146811125760 0:1.0   safte0  DS09>
>  ami0 1 Hot spare    146811125760 0:2.0   safte0  IC35L146UCDY10-0S27F>
>
> Tue Feb 19 10:12:15 MST 2008
> Volume  Status               Size Device
>  ami0 0 Scrubbing    146695782400 sd0     RAID1 97% done
>       0 Online       146811125760 0:0.0   safte0  ATLAS10K5_146SCAJNZM>
>       1 Online       146811125760 0:1.0   safte0  DS09>
>  ami0 1 Hot spare    146811125760 0:2.0   safte0  IC35L146UCDY10-0S27F>
>
> ##
> # Mirroring complete
> Tue Feb 19 10:22:16 MST 2008
> Volume  Status               Size Device
>  ami0 0 Online       146695782400 sd0     RAID1
>       0 Online       14