* [EMAIL PROTECTED] [2006-06-29 20:21]:
> I'm trying to make a spare drive the "hot spare", without rebooting my
> OpenBSD 3.9 server.  Bioctl is letting me at least query my raid array,
> but it's not letting me set an "Unused" drive to "Hot Spare":

there's bugs with -H in 3.9, current has it betterer.

Hi again. It took me a while, but I've installed the snapshot from 2006.09.25, put my LSI SATA 300-8x through its paces, and found that there are still issues with setting hot spares on the 300-8x.

300-8x firmware info
====================
Here is the relevant card information (as displayed on boot-up):

  LSI MegaRAID BIOS     Version H425 (Build Nov 17, 2004)
  Copyright(c) 2004 LSI Logic Corp.
  HA-1 (Bus 3 Dev 14) LSI MegaRAID SATA PCI-X
    Standard FW 813G DRAM =128MB (SDRAM)

That is, I am running firmware version 813G. [According to the LSILogic website, it was released on 2005.03.11 and is now five versions old.]

Problem summary (problems with bioctl -H on a SATA 300-8x)
===============
To summarize (the full test cases are included below): I can now use bioctl -H to set an "Unused" drive to "Hot spare". However, although the drive shows as a hot spare in *both* bioctl and the LSI boot menu, when I fail a drive in my RAID array the "hot spare" does not behave as one: it is not integrated into the degraded RAID array.

It gets worse: once a drive has been set as a hot spare through bioctl, it can never be changed back to "Unused", nor can it be properly set as a hot spare through the LSI boot menu. Essentially, that slot is now unusable. The only solution I have found is to "Clear configuration" from the LSI boot menu, which then requires reinstalling the contents of the drives.

Problem workaround (avoid bioctl -H on my SATA 300-8x)
------------------
If I set hot spares only through the LSI boot menu, the RAID array behaves as expected. Unfortunately, this requires a reboot.

Request for help
================
So, I'm wondering if someone can guide me through peeking at the memory on this card, so that we can compare the card's state after setting a hot spare through bioctl with its state after setting one through the LSI boot menu.

Alternatively, I can upgrade the firmware and re-run the test cases, if you would prefer.

Thanks,

Matthew Mulrooney



The test cases
==============
[Pardon the lack of terminal captures - I forgot to transfer my typescript logs off the partition before recreating the array :(.
These notes were taken from a second machine.]

s => step succeeded
F => step failed

Normal case (RAID 5 + one hot spare)
-----------
s Configure array from the LSI boot menu
s   Clear configuration
s   New configuration
s     Disks 0, 1, 2:  RAID 5 array
s     Disk 3:         Hot spare

s Install OpenBSD-snapshot-2006.09.25

s Single disk failure
s   Disk 0:  Fails (I pulled it from the CSE-M35T1 enclosure)
s   Disk 3:  Automatically replaces it

s Replace failed disk
s   Replace Disk 0 with a new disk
s   Observe that Disk 0 is marked as "Unused" through bioctl
s   Set Disk 0 to be a hot spare (through bioctl)

s Single disk failure
s   Disk 1:  Fails (I pulled it)
F   Disk 0:  FAILS TO GET INTEGRATED, DESPITE STILL BEING MARKED AS A
             HOT SPARE

[Now I'm in what's-the-best-of-a-bad-set-of-options mode]

s Replace the failed drive
s   Disk 1:  Replaced
s   Observe that the RAID 5 array automatically gets rebuilt

s Wait until the rebuild is complete
s Reboot and hope that the LSI card's initialization properly marks the
  hot spare as a hot spare

s Single disk failure
s   Disk 1:  Fails
F   Disk 0:  FAILS TO GET INTEGRATED, DESPITE STILL BEING MARKED AS A
             HOT SPARE

[So the reboot didn't magically make the hot spare work properly.]

s Replace the failed drive
s   Disk 1:  Replaced, and the RAID 5 array automatically gets rebuilt

s Wait until the rebuild is complete

s Reboot, enter into the LSI boot menu
s   Configure > View/Add Configuration
s     Highlight disk 0 > F4 (hot spare)
s       "The Physical Drive is already a HOTSPARE\nPress any key to
         continue"
s       F10 (Configure), Esc, Esc
s       "Exit?" = YES
s       "Please Press Ctrl-Alt-Del to REBOOT the system", CTRL-ALT-DEL

s Single disk failure
s   Disk 1:  Fails
F   Disk 0:  FAILS TO GET INTEGRATED, DESPITE STILL BEING MARKED AS A
             HOT SPARE (as indicated by bioctl anyway)

[OK, this is really bad.  I have no way of telling the RAID card to
 recognize the hot spare, even from the LSI boot menu.  The boot
 menu is already showing it as a hot spare.]

s Reboot
s LSI boot menu
s   Configure > Easy Configuration > Esc > Esc
s   "Exit?" = YES
s   "Please Press Ctrl-Alt-Del to REBOOT the system", CTRL-ALT-DEL

[From this point, it looks as if I have no options.  Even doing an easy
 configuration didn't fix things.  That drive is still reported as a
 hot spare by both bioctl and the LSI boot menu, but it is *not* being
 integrated into a degraded array.  My only option is to clear the
 configuration from the LSI boot menu and rebuild my system from scratch.]


Finding a path that does work (avoid bioctl -H)
-----------------------------
s Configure array from the LSI boot menu
s   Clear configuration
s   New configuration
s     Disks 0, 1, 2:  RAID 5 array
s     Disk 3:         Hot spare
s     Initialize > Logical Drive 1

s Install OpenBSD-snapshot-2006.09.25

s Single disk failure
s   Disk 0:  Fails
s   Disk 3:  Automatically replaces it

s Wait for the RAID 5 set to rebuild
s Replace disk 0 (now showing as "Unused" by bioctl)

s Reboot into LSI boot menu
s   Set disk 0 as Hot spare

s Boot into OpenBSD
s   bioctl reports disk 0:0.0 as "Hot spare"
s   Fail disk 1
s   Watch the RAID 5 array automatically incorporate disk 0 and rebuild

s Once the array has finished rebuilding
s   Replace disk 1
s   Reboot into LSI boot menu
s     Set disk 1 as a hot spare (this cannot be done until the array has
        finished rebuilding)

s Reboot into OpenBSD and continue watching it rebuild

[No problems as long as I only set my hot spares through the LSI boot menu.]


System info (dmesg)
===========
OpenBSD 4.0-current (GENERIC) #1112: Mon Sep 25 03:49:49 MDT 2006
    [EMAIL PROTECTED]:/usr/src/sys/arch/i386/compile/GENERIC
cpu0: Intel(R) Pentium(R) 4 CPU 2.40GHz ("GenuineIntel" 686-class) 2.40 GHz
cpu0: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,SSE3,MWAIT,DS-CPL,CNXT-ID
real mem  = 1072197632 (1047068K)
avail mem = 970051584 (947316K)
using 4256 buffers containing 53710848 bytes (52452K) of memory
mainbus0 (root)
bios0 at mainbus0: AT/286+(9a) BIOS, date 10/20/05, BIOS32 rev. 0 @ 0xfb6d0, SMBIOS rev. 2.3 @ 0xf0800 (41 entries)
bios0: Supermicro P4SC8
apm0 at bios0: Power Management spec V1.2
apm0: AC on, battery charge unknown
apm0: flags 70102 dobusy 1 doidle 1
pcibios0 at bios0: rev 2.1 @ 0xf0000/0xdf64
pcibios0: PCI IRQ Routing Table rev 1.0 @ 0xfde80/224 (12 entries)
pcibios0: PCI Exclusive IRQs: 5 9 10 11 12
pcibios0: PCI Interrupt Router at 000:31:0 ("Intel 6300ESB LPC" rev 0x00)
pcibios0: PCI bus #4 is the last bus
bios0: ROM list: 0xc0000/0x8000 0xc8000/0x2200
cpu0 at mainbus0
pci0 at mainbus0 bus 0: configuration mode 1 (no bios)
pchb0 at pci0 dev 0 function 0 "Intel 82875P Host" rev 0x02
ppb0 at pci0 dev 3 function 0 "Intel 82875P PCI-CSA" rev 0x02
pci1 at ppb0 bus 1
em0 at pci1 dev 1 function 0 "Intel PRO/1000CT (82547GI)" rev 0x00: irq 11, address 00:30:48:87:ad:e4
ppb1 at pci0 dev 28 function 0 "Intel 6300ESB PCIX" rev 0x02
pci2 at ppb1 bus 2
ppb2 at pci2 dev 2 function 0 "Intel IOP331 PCIX-PCIX" rev 0x07
pci3 at ppb2 bus 3
ami0 at pci3 dev 14 function 0 "Symbios Logic MegaRAID SATA 4x/8x" rev 0x07: irq 9
ami0: LSI 3008, 32b, FW 813G, BIOS vH425, 128MB RAM
ami0: 1 channels, 0 FC loops, 1 logical drives
scsibus0 at ami0: 40 targets
sd0 at scsibus0 targ 0 lun 0: <AMI, Host drive #00, > SCSI2 0/direct fixed
sd0: 284194MB, 284194 cyl, 64 head, 32 sec, 512 bytes/sec, 582029312 sec total
scsibus1 at ami0: 16 targets
uhci0 at pci0 dev 29 function 0 "Intel 6300ESB USB" rev 0x02: irq 10
usb0 at uhci0: USB revision 1.0
uhub0 at usb0
uhub0: Intel UHCI root hub, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
uhci1 at pci0 dev 29 function 1 "Intel 6300ESB USB" rev 0x02: irq 12
usb1 at uhci1: USB revision 1.0
uhub1 at usb1
uhub1: Intel UHCI root hub, rev 1.00/1.00, addr 1
uhub1: 2 ports with 2 removable, self powered
"Intel 6300ESB WDT" rev 0x02 at pci0 dev 29 function 4 not configured
"Intel 6300ESB APIC" rev 0x02 at pci0 dev 29 function 5 not configured
ehci0 at pci0 dev 29 function 7 "Intel 6300ESB USB" rev 0x02: irq 5
usb2 at ehci0: USB revision 2.0
uhub2 at usb2
uhub2: Intel EHCI root hub, rev 2.00/1.00, addr 1
uhub2: 4 ports with 4 removable, self powered
ppb3 at pci0 dev 30 function 0 "Intel 82801BA AGP" rev 0x0a
pci4 at ppb3 bus 4
vga1 at pci4 dev 9 function 0 "ATI Rage XL" rev 0x27
wsdisplay0 at vga1 mux 1: console (80x25, vt100 emulation)
wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
em1 at pci4 dev 10 function 0 "Intel PRO/1000MT (82541GI)" rev 0x00: irq 12, address 00:30:48:87:ad:e5
ichpcib0 at pci0 dev 31 function 0 "Intel 6300ESB LPC" rev 0x02
pciide0 at pci0 dev 31 function 1 "Intel 6300ESB IDE" rev 0x02: DMA, channel 0 configured to compatibility, channel 1 configured to compatibility
pciide0: channel 0 disabled (no drives)
pciide0: channel 1 disabled (no drives)
pciide1 at pci0 dev 31 function 2 "Intel 6300ESB SATA" rev 0x02: DMA, channel 0 configured to native-PCI, channel 1 configured to native-PCI
pciide1: using irq 11 for native-PCI interrupt
ichiic0 at pci0 dev 31 function 3 "Intel 6300ESB SMBus" rev 0x02: irq 9
iic0 at ichiic0
isa0 at ichpcib0
isadma0 at isa0
pckbc0 at isa0 port 0x60/5
pckbd0 at pckbc0 (kbd slot)
pckbc0: using irq 1 for kbd slot
wskbd0 at pckbd0: console keyboard, using wsdisplay0
pcppi0 at isa0 port 0x61
midi0 at pcppi0: <PC speaker>
spkr0 at pcppi0
lpt0 at isa0 port 0x378/4 irq 7
lm0 at isa0 port 0x290/8: W83627HF
npx0 at isa0 port 0xf0/16: using exception 16
pccom0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
pccom1 at isa0 port 0x2f8/8 irq 3: ns16550a, 16 byte fifo
fdc0 at isa0 port 0x3f0/6 irq 6 drq 2
fd0 at fdc0 drive 0: 1.44MB 80 cyl, 2 head, 18 sec
biomask ff65 netmask ff65 ttymask ffe7
pctr: user-level cycle counter enabled
uhub2: device problem, disabling port 1
dkcsum: sd0 matches BIOS drive 0x80
root on sd0a
rootdev=0x400 rrootdev=0xd00 rawdev=0xd02
