Hi there, I'm back with another LSI controller, and I'm experiencing
problems with creating hot spares from bioctl. This seems to be the same
problem that I posted to misc@ on Oct 16, 2006 with the subject line of:
[ami] Unable to set "Hot Spare" on MegaRAID SATA 300-8x
I've got the same symptoms, but now with a PERC 4/Di controller. [And this
time I've found a better workaround than just avoiding bioctl -H with this
LSI controller :).]
Problem summary
===============
When I use bioctl to mark an Unused drive as a Hot Spare, that drive
fails to be integrated into the array when another disk fails.
The only way I've found to make that drive properly act as a Hot Spare is
to set it as such from the LSI boot menu, and only from there. If you have
already marked it as a Hot Spare from bioctl, pull the Hot Spare-marked
drive and reinsert it (it can be the same physical disk). At that point
the disk should show up as 'Unused', and you can then mark it as a Hot
Spare from the LSI boot menu.
This is an improvement over my 2006 analysis of the situation, where I
couldn't find a way to reset the drive back to Unused (after Hot Sparing it
from bioctl). The LSI boot menu requires a drive to be in an Unused state
before it will allow me to correctly mark it as a Hot Spare.
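Before rebooting into the LSI boot menu, it can help to confirm from the
running system that the drive really is back to Unused. The following is
only a minimal sketch: disk_status is a hypothetical helper name, and it
assumes the whitespace-separated column layout that bioctl prints on my
box (status words sitting between the index column and the size column).

```shell
# Print the status bioctl reports for one physical drive.
# Reads `bioctl ami0` output on stdin.
# $1: channel:target.lun exactly as bioctl prints it, e.g. 0:0.0
disk_status() {
    awk -v want="$1" '
    !seen {
        for (i = 1; i <= NF; i++) {
            if ($i == want) {
                # Disk lines start with a bare index; spare and unused
                # lines start with the controller name plus an index.
                start = ($1 ~ /^[0-9]+$/) ? 2 : 3
                status = $start
                # Multi-word statuses such as "Hot spare" end just
                # before the size column, which precedes the target.
                for (j = start + 1; j <= i - 2; j++)
                    status = status " " $j
                print status
                seen = 1
                break
            }
        }
    }'
}
```

Usage would be something like `bioctl ami0 | disk_status 0:0.0`, which
should print Unused once the drive has been pulled and reinserted.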
If you're interested, please let me know what I can do to be of assistance
in troubleshooting this. I have a limited window before this box will
have to be pushed into production, and I can live with the current
situation (an after hours reboot in the case of a drive failure is
perfectly fine).
Matthew
Test case
=========
s => step succeeded
F => step failed
Normal case (RAID 1 + one hot spare)
-----------
s  Configure array from the LSI boot menu
s    Clear configuration
s    New configuration
s      Disks 0, 1: RAID 1 array
s      Disk 2: Hot spare
s  Install OpenBSD-4.2
s  Single disk failure
s    Disk 0: Fails (I pulled it from the hot swap cage)
s    Disk 2: Automatically replaces it
s    Observe the RAID 1 array get fully rebuilt
s  Replace failed disk
s    Replace Disk 0 with a new disk
s    Observe that Disk 0 is marked as "Unused" through bioctl
s    Set Disk 0 to be a hot spare (through bioctl)
s  Single disk failure
s    Disk 1: Fails (I pulled it)
F    Disk 0: FAILS TO GET INTEGRATED, DESPITE STILL BEING MARKED AS A
     HOT SPARE - Array is still degraded.
s  Reboot, enter into the LSI boot menu
s    Configure > View/Add Configuration
s    Highlight disk 0 > F4 (hot spare)
s    "This Physical Drive is already a HOTSPARE\nPress any key to
     continue"
s    F10 (Configure), Esc, Esc
s    "Exit?" = YES
s    "Please REBOOT YOUR SYSTEM", CTRL-ALT-DEL
s  Recheck array
F    Disk 0: Still failing to integrate. Array still degraded.
s  Attempt to shake loose the 'Hot Spare' bit from disk 0
s    Remove disk 0
s    Replace disk 0 (with the same physical disk)
s    Disk 0 is *no longer* marked as a 'Hot Spare' (either through
     bioctl or through the LSI boot menu). Yeah! :)
     [I don't think I tested this method with my SATA 300-8x.]
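The failing step above has a recognizable signature in bioctl output: the
volume is Degraded while a Hot spare is still listed. When the spare did
get integrated, the volume went to Rebuild and the Hot spare line
disappeared. Here is a rough sketch of a check for that signature.
spare_stuck is a hypothetical name, and the field positions are an
assumption based on the output from my box. A single snapshot taken right
after a failure may still show this state briefly, so it is probably best
to look at a few samples in a row.

```shell
# Succeeds (exit status 0) when a bioctl snapshot on stdin shows a
# Degraded volume together with a drive still marked as Hot spare.
spare_stuck() {
    awk '
        # Volume lines look like: ami0 0 Degraded <size> sd0 RAID1
        $1 !~ /^[0-9]+$/ && $3 == "Degraded" { degraded = 1 }
        # Spare lines look like:  ami0 1 Hot spare <size> 0:0.0 ...
        $1 !~ /^[0-9]+$/ && $3 == "Hot" && $4 == "spare" { spare = 1 }
        END { exit !(degraded && spare) }
    '
}
```

For example: `bioctl ami0 | spare_stuck && echo "spare not integrated"`.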
Log file
========
# The output is generated by:
# date; bioctl ami0
##############################################################################
# Created a new RAID 1 array from the LSI boot menu and installed OpenBSD 4.2
Tue Feb 19 04:01:42 MST 2008
Volume Status Size Device
ami0 0 Scrubbing 146695782400 sd0 RAID1 3% done
0 Online 146811125760 0:0.0 safte0 <MAXTOR ATLAS10K5_146SCAJNZM>
1 Online 146811125760 0:1.0 safte0 <SEAGATE ST3146807LC DS09>
ami0 1 Hot spare 146811125760 0:2.0 safte0 <IBM IC35L146UCDY10-0S27F>
Tue Feb 19 10:02:15 MST 2008
Volume Status Size Device
ami0 0 Scrubbing 146695782400 sd0 RAID1 94% done
0 Online 146811125760 0:0.0 safte0 <MAXTOR ATLAS10K5_146SCAJNZM>
1 Online 146811125760 0:1.0 safte0 <SEAGATE ST3146807LC DS09>
ami0 1 Hot spare 146811125760 0:2.0 safte0 <IBM IC35L146UCDY10-0S27F>
Tue Feb 19 10:12:15 MST 2008
Volume Status Size Device
ami0 0 Scrubbing 146695782400 sd0 RAID1 97% done
0 Online 146811125760 0:0.0 safte0 <MAXTOR ATLAS10K5_146SCAJNZM>
1 Online 146811125760 0:1.0 safte0 <SEAGATE ST3146807LC DS09>
ami0 1 Hot spare 146811125760 0:2.0 safte0 <IBM IC35L146UCDY10-0S27F>
##############################################################################
# Mirroring complete
Tue Feb 19 10:22:16 MST 2008
Volume Status Size Device
ami0 0 Online 146695782400 sd0 RAID1
0 Online 146811125760 0:0.0 safte0 <MAXTOR ATLAS10K5_146SCAJNZM>
1 Online 146811125760 0:1.0 safte0 <SEAGATE ST3146807LC DS09>
ami0 1 Hot spare 146811125760 0:2.0 safte0 <IBM IC35L146UCDY10-0S27F>
##############################################################################
# Pulling Drive 0:0.0
Tue Feb 19 16:15:15 MST 2008
Volume Status Size Device
ami0 0 Online 146695782400 sd0 RAID1
0 Online 146811125760 0:0.0 safte0 <MAXTOR ATLAS10K5_146SCAJNZM>
1 Online 146811125760 0:1.0 safte0 <SEAGATE ST3146807LC DS09>
ami0 1 Hot spare 146811125760 0:2.0 safte0 <IBM IC35L146UCDY10-0S27F>
##############################################################################
# LSI boot menu-defined 'Hot Spare' has been integrated
Tue Feb 19 16:15:26 MST 2008
Volume Status Size Device
ami0 0 Rebuild 146695782400 sd0 RAID1 0% done
0 Rebuild 146811125760 0:2.0 safte0 <IBM IC35L146UCDY10-0S27F>
1 Online 146811125760 0:1.0 safte0 <SEAGATE ST3146807LC DS09>
Tue Feb 19 17:06:14 MST 2008
Volume Status Size Device
ami0 0 Rebuild 146695782400 sd0 RAID1 18% done
0 Rebuild 146811125760 0:2.0 safte0 <IBM IC35L146UCDY10-0S27F>
1 Online 146811125760 0:1.0 safte0 <SEAGATE ST3146807LC DS09>
Tue Feb 19 20:46:38 MST 2008
Volume Status Size Device
ami0 0 Rebuild 146695782400 sd0 RAID1 98% done
0 Rebuild 146811125760 0:2.0 safte0 <IBM IC35L146UCDY10-0S27F>
1 Online 146811125760 0:1.0 safte0 <SEAGATE ST3146807LC DS09>
Tue Feb 19 20:56:39 MST 2008
Volume Status Size Device
ami0 0 Online 146695782400 sd0 RAID1
0 Online 146811125760 0:2.0 safte0 <IBM IC35L146UCDY10-0S27F>
1 Online 146811125760 0:1.0 safte0 <SEAGATE ST3146807LC DS09>
##############################################################################
# Mirroring complete
Tue Feb 19 21:06:40 MST 2008
Volume Status Size Device
ami0 0 Online 146695782400 sd0 RAID1
0 Online 146811125760 0:2.0 safte0 <IBM IC35L146UCDY10-0S27F>
1 Online 146811125760 0:1.0 safte0 <SEAGATE ST3146807LC DS09>
Tue Feb 19 21:46:45 MST 2008
Volume Status Size Device
ami0 0 Online 146695782400 sd0 RAID1
0 Online 146811125760 0:2.0 safte0 <IBM IC35L146UCDY10-0S27F>
1 Online 146811125760 0:1.0 safte0 <SEAGATE ST3146807LC DS09>
##############################################################################
# Replaced 0:0.0
Tue Feb 19 21:49:59 MST 2008
Volume Status Size Device
ami0 0 Online 146695782400 sd0 RAID1
0 Online 146811125760 0:2.0 safte0 <IBM IC35L146UCDY10-0S27F>
1 Online 146811125760 0:1.0 safte0 <SEAGATE ST3146807LC DS09>
ami0 1 Unused 146811125760 0:0.0 safte0 <IBM IC35L146UCDY10-0S27F>
##############################################################################
# Marking 0:0.0 as Hot Spare from bioctl (bioctl -H 0:0.0 ami0)
Tue Feb 19 21:51:56 MST 2008
Volume Status Size Device
ami0 0 Online 146695782400 sd0 RAID1
0 Online 146811125760 0:2.0 safte0 <IBM IC35L146UCDY10-0S27F>
1 Online 146811125760 0:1.0 safte0 <SEAGATE ST3146807LC DS09>
ami0 1 Unused 146811125760 0:0.0 safte0 <IBM IC35L146UCDY10-0S27F>
Tue Feb 19 21:52:07 MST 2008
Volume Status Size Device
ami0 0 Online 146695782400 sd0 RAID1
0 Online 146811125760 0:2.0 safte0 <IBM IC35L146UCDY10-0S27F>
1 Online 146811125760 0:1.0 safte0 <SEAGATE ST3146807LC DS09>
ami0 1 Hot spare 146811125760 0:0.0 safte0 <IBM IC35L146UCDY10-0S27F>
##############################################################################
# Pulling 0:1.0
Tue Feb 19 21:53:02 MST 2008
Volume Status Size Device
ami0 0 Degraded 146695782400 sd0 RAID1
0 Online 146811125760 0:2.0 safte0 <IBM IC35L146UCDY10-0S27F>
1 Failed 146811125760 0:1.0 safte0 <>
ami0 1 Hot spare 146811125760 0:0.0 safte0 <IBM IC35L146UCDY10-0S27F>
##############################################################################
# bioctl-defined Hot Spare 0:0.0 has failed to integrate
Tue Feb 19 21:53:15 MST 2008
Volume Status Size Device
ami0 0 Degraded 146695782400 sd0 RAID1
0 Online 146811125760 0:2.0 safte0 <IBM IC35L146UCDY10-0S27F>
1 Failed 146811125760 0:1.0 safte0 <>
ami0 1 Hot spare 146811125760 0:0.0 safte0 <IBM IC35L146UCDY10-0S27F>
Tue Feb 19 21:53:37 MST 2008
Volume Status Size Device
ami0 0 Degraded 146695782400 sd0 RAID1
0 Online 146811125760 0:2.0 safte0 <IBM IC35L146UCDY10-0S27F>
1 Failed 146811125760 0:1.0 safte0 <>
ami0 1 Hot spare 146811125760 0:0.0 safte0 <IBM IC35L146UCDY10-0S27F>
Tue Feb 19 22:06:04 MST 2008
Volume Status Size Device
ami0 0 Degraded 146695782400 sd0 RAID1
0 Online 146811125760 0:2.0 safte0 <IBM IC35L146UCDY10-0S27F>
1 Failed 146811125760 0:1.0 safte0 <>
ami0 1 Hot spare 146811125760 0:0.0 safte0 <IBM IC35L146UCDY10-0S27F>
##############################################################################
# System rebooted - no change
Tue Feb 19 22:25:56 MST 2008
Volume Status Size Device
ami0 0 Degraded 146695782400 sd0 RAID1
0 Online 146811125760 0:2.0 safte0 <IBM IC35L146UCDY10-0S27F>
1 Failed 146811125760 0:1.0 safte0 <>
ami0 1 Hot spare 146811125760 0:0.0 safte0 <IBM IC35L146UCDY10-0S27F>
Wed Feb 20 00:50:21 MST 2008
Volume Status Size Device
ami0 0 Degraded 146695782400 sd0 RAID1
0 Online 146811125760 0:2.0 safte0 <IBM IC35L146UCDY10-0S27F>
1 Failed 146811125760 0:1.0 safte0 <>
ami0 1 Hot spare 146811125760 0:0.0 safte0 <IBM IC35L146UCDY10-0S27F>
##############################################################################
# Pulling drive 0 (in an attempt to undo the bioctl hot sparing)
Wed Feb 20 00:50:44 MST 2008
Volume Status Size Device
ami0 0 Degraded 146695782400 sd0 RAID1
0 Online 146811125760 0:2.0 safte0 <IBM IC35L146UCDY10-0S27F>
1 Failed 146811125760 0:1.0 safte0 <>
##############################################################################
# Replaced drive 0
Wed Feb 20 00:51:07 MST 2008
Volume Status Size Device
ami0 0 Degraded 146695782400 sd0 RAID1
0 Online 146811125760 0:2.0 safte0 <IBM IC35L146UCDY10-0S27F>
1 Failed 146811125760 0:1.0 safte0 <>
ami0 1 Unused 146811125760 0:0.0 safte0 <IBM IC35L146UCDY10-0S27F>
##############################################################################
# Success! That drive is back to a status of Unused!!
Wed Feb 20 00:51:18 MST 2008
Volume Status Size Device
ami0 0 Degraded 146695782400 sd0 RAID1
0 Online 146811125760 0:2.0 safte0 <IBM IC35L146UCDY10-0S27F>
1 Failed 146811125760 0:1.0 safte0 <>
ami0 1 Unused 146811125760 0:0.0 safte0 <IBM IC35L146UCDY10-0S27F>
##############################################################################
# Rebooted and set drive 0 to Hot Spare from the LSI boot menu
Wed Feb 20 01:08:06 MST 2008
Volume Status Size Device
ami0 0 Rebuild 146695782400 sd0 RAID1 1% done
0 Online 146811125760 0:2.0 safte0 <IBM IC35L146UCDY10-0S27F>
1 Rebuild 146811125760 0:0.0 safte0 <IBM IC35L146UCDY10-0S27F>
##############################################################################
# Success! The array is rebuilding!
Wed Feb 20 01:18:07 MST 2008
Volume Status Size Device
ami0 0 Rebuild 146695782400 sd0 RAID1 5% done
0 Online 146811125760 0:2.0 safte0 <IBM IC35L146UCDY10-0S27F>
1 Rebuild 146811125760 0:0.0 safte0 <IBM IC35L146UCDY10-0S27F>
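For anyone who wants the timeline without reading every snapshot, the log
above can be boiled down to one line per sample. This is only a sketch:
summarize_log is a hypothetical name, and it assumes the log has exactly
the shape shown here (a date line followed by bioctl output, with the
volume line carrying the sd device in its fifth field).

```shell
# Condense a "date; bioctl ami0" log on stdin to one line per snapshot:
# timestamp, sd device, volume status, and progress when present.
summarize_log() {
    awk '
        # date(1) lines start with a weekday; keep month, day and time.
        /^(Mon|Tue|Wed|Thu|Fri|Sat|Sun) / {
            stamp = $2 " " $3 " " $4
            next
        }
        # Volume lines: ami0 0 <status> <size> sd0 RAID1 [N% done]
        $5 ~ /^sd[0-9]+$/ {
            line = stamp " " $5 " " $3
            if (NF >= 7) line = line " " $7
            print line
        }
    '
}
```

With the log saved to a file (the name raid.log is just an example), this
would be run as `summarize_log < raid.log`.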
System configuration (dmesg)
============================
OpenBSD 4.2 (GENERIC.MP) #252: Tue Aug 28 10:53:04 MDT 2007
[EMAIL PROTECTED]:/usr/src/sys/arch/i386/compile/GENERIC.MP
cpu0: Intel(R) Xeon(TM) CPU 2.40GHz ("GenuineIntel" 686-class) 2.40 GHz
cpu0: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,CNXT-ID,xTPR
real mem = 2146861056 (2047MB)
avail mem = 2068238336 (1972MB)
mainbus0 at root
bios0 at mainbus0: AT/286+ BIOS, date 02/21/05, BIOS32 rev. 0 @ 0xffe90, SMBIOS rev. 2.3 @ 0xfaf40 (71 entries)
bios0: vendor Dell Computer Corporation version "A14" date 02/21/2005
bios0: Dell Computer Corporation PowerEdge 2600
pcibios0 at bios0: rev 2.1 @ 0xf0000/0x10000
pcibios0: PCI IRQ Routing Table rev 1.0 @ 0xfc160/224 (12 entries)
pcibios0: PCI Interrupt Router at 000:31:0 ("Intel 82801CA LPC" rev 0x00)
pcibios0: PCI bus #11 is the last bus
bios0: ROM list: 0xc0000/0x8000 0xc8000/0x2200 0xec000/0x4000!
acpi at mainbus0 not configured
ipmi0 at mainbus0: version 1.0 interface BT iobase 0xe4/3 spacing 1 irq 10
mainbus0: Intel MP Specification (Version 1.4)
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: apic clock running at 132 MHz
cpu1 at mainbus0: apid 6 (application processor)
cpu1: Intel(R) Xeon(TM) CPU 2.40GHz ("GenuineIntel" 686-class) 2.40 GHz
cpu1: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,CNXT-ID,xTPR
mainbus0: bus 0 is type PCI
mainbus0: bus 1 is type PCI
mainbus0: bus 2 is type PCI
mainbus0: bus 3 is type PCI
mainbus0: bus 4 is type PCI
mainbus0: bus 5 is type PCI
mainbus0: bus 6 is type PCI
mainbus0: bus 7 is type PCI
mainbus0: bus 8 is type PCI
mainbus0: bus 9 is type PCI
mainbus0: bus 10 is type PCI
mainbus0: bus 11 is type PCI
mainbus0: bus 12 is type ISA
ioapic0 at mainbus0: apid 8 pa 0xfec00000, version 20, 24 pins
ioapic0: misconfigured as apic 0, remapped to apid 8
ioapic1 at mainbus0: apid 9 pa 0xfec80000, version 20, 24 pins
ioapic1: misconfigured as apic 0, remapped to apid 9
ioapic2 at mainbus0: apid 10 pa 0xfec81000, version 20, 24 pins
ioapic2: misconfigured as apic 0, remapped to apid 10
ioapic3 at mainbus0: apid 11 pa 0xfec82000, version 20, 24 pins
ioapic3: misconfigured as apic 0, remapped to apid 11
ioapic4 at mainbus0: apid 12 pa 0xfec82800, version 20, 24 pins
ioapic4: misconfigured as apic 0, remapped to apid 12
pci0 at mainbus0 bus 0: configuration mode 1 (no bios)
pchb0 at pci0 dev 0 function 0 "Intel E7501 MCH Host" rev 0x01
ppb0 at pci0 dev 2 function 0 "Intel E7500 MCH" rev 0x01
pci1 at ppb0 bus 1
"Intel 82870P2 IOxAPIC" rev 0x04 at pci1 dev 28 function 0 not configured
ppb1 at pci1 dev 29 function 0 "Intel 82870P2 PCIX-PCIX" rev 0x04
pci2 at ppb1 bus 2
fxp0 at pci2 dev 2 function 0 "Intel 8255x" rev 0x0d, i82550: apic 9 int 0 (irq 5), address 00:02:b3:e8:25:b2
inphy0 at fxp0 phy 1: i82555 10/100 PHY, rev. 4
"Intel 82870P2 IOxAPIC" rev 0x04 at pci1 dev 30 function 0 not configured
ppb2 at pci1 dev 31 function 0 "Intel 82870P2 PCIX-PCIX" rev 0x04
pci3 at ppb2 bus 3
em0 at pci3 dev 1 function 0 "Intel PRO/1000XT (82544GC)" rev 0x02: apic 9 int 4 (irq 5), address 00:0f:1f:67:39:ea
ppb3 at pci0 dev 3 function 0 "Intel E7500 MCH" rev 0x01
pci4 at ppb3 bus 4
"Intel 82870P2 IOxAPIC" rev 0x04 at pci4 dev 28 function 0 not configured
ppb4 at pci4 dev 29 function 0 "Intel 82870P2 PCIX-PCIX" rev 0x04
pci5 at ppb4 bus 5
"Intel 82870P2 IOxAPIC" rev 0x04 at pci4 dev 30 function 0 not configured
ppb5 at pci4 dev 31 function 0 "Intel 82870P2 PCIX-PCIX" rev 0x04
pci6 at ppb5 bus 6
ppb6 at pci0 dev 4 function 0 "Intel E7500 MCH" rev 0x01
pci7 at ppb6 bus 7
"Intel 82870P2 IOxAPIC" rev 0x04 at pci7 dev 28 function 0 not configured
ppb7 at pci7 dev 29 function 0 "Intel 82870P2 PCIX-PCIX" rev 0x04
pci8 at ppb7 bus 8
ami0 at pci8 dev 8 function 0 "Dell PERC 4/Di i960" rev 0x01: apic 11 int 0 (irq 11)
ami0: Dell 123, 64b/lhc, FW 251X, BIOS v1.07, 128MB RAM
ami0: 2 channels, 0 FC loops, 1 logical drives
scsibus0 at ami0: 40 targets
sd0 at scsibus0 targ 0 lun 0: <AMI, Host drive #00, > SCSI2 0/direct fixed
sd0: 139900MB, 17834 cyl, 255 head, 63 sec, 512 bytes/sec, 286515200 sec total
scsibus1 at ami0: 16 targets
safte0 at scsibus1 targ 6 lun 0: <PE/PV, 1x6 SCSI BP, 1.1> SCSI2 3/processor fixed
scsibus2 at ami0: 16 targets
"Intel 82870P2 IOxAPIC" rev 0x04 at pci7 dev 30 function 0 not configured
ppb8 at pci7 dev 31 function 0 "Intel 82870P2 PCIX-PCIX" rev 0x04
pci9 at ppb8 bus 10
uhci0 at pci0 dev 29 function 0 "Intel 82801CA/CAM USB" rev 0x02: apic 8 int 16 (irq 11)
ppb9 at pci0 dev 30 function 0 "Intel 82801BA AGP" rev 0x42
pci10 at ppb9 bus 11
vga1 at pci10 dev 4 function 0 "ATI Rage XL" rev 0x27
wsdisplay0 at vga1 mux 1: console (80x25, vt100 emulation)
wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
ichpcib0 at pci0 dev 31 function 0 "Intel 82801CA LPC" rev 0x02: 24-bit timer at 3579545Hz
pciide0 at pci0 dev 31 function 1 "Intel 82801CA IDE" rev 0x02: DMA, channel 0 configured to compatibility, channel 1 configured to compatibility
atapiscsi0 at pciide0 channel 0 drive 0
scsibus3 at atapiscsi0: 2 targets
cd0 at scsibus3 targ 0 lun 0: <TEAC, CD-224E, K.9A> SCSI0 5/cdrom removable
cd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 2
pciide0: channel 1 disabled (no drives)
usb0 at uhci0: USB revision 1.0
uhub0 at usb0: Intel UHCI root hub, rev 1.00/1.00, addr 1
isa0 at ichpcib0
isadma0 at isa0
pckbc0 at isa0 port 0x60/5
pckbd0 at pckbc0 (kbd slot)
pckbc0: using irq 1 for kbd slot
wskbd0 at pckbd0: console keyboard, using wsdisplay0
pcppi0 at isa0 port 0x61
midi0 at pcppi0: <PC speaker>
spkr0 at pcppi0
lpt0 at isa0 port 0x378/4 irq 7
npx0 at isa0 port 0xf0/16: reported by CPUID; using exception 16
pccom0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
pccom1 at isa0 port 0x2f8/8 irq 3: ns16550a, 16 byte fifo
fdc0 at isa0 port 0x3f0/6 irq 6 drq 2
fd0 at fdc0 drive 0: 1.44MB 80 cyl, 2 head, 18 sec
fd1 at fdc0 drive 1: density unknown
pctr: user-level cycle counter enabled
mtrr: Pentium Pro MTRR support
dkcsum: sd0 matches BIOS drive 0x80
root on sd0a swap on sd0b dump on sd0b