This problem may have been more far-reaching than I thought: the machine is now entirely dead. Testing shows the motherboard is blown, not the processor or memory, so I'll get a replacement in and then report on whether the expander works with that.
Will

On Tue, Sep 15, 2009 at 22:42, Vikram Hegde <[email protected]> wrote:
> Ok, I will investigate. I may contact you privately to run some tests
>
> Will Murnane <[email protected]> wrote:
>
>>On Tue, Sep 15, 2009 at 21:35, Vikram Hegde <[email protected]> wrote:
>>> Hi,
>>>
>>> Did you follow this sequence:
>>>
>>> 1. power down machine
>>> 2. install hardware
>>> 3. reboot machine
>>> 4. install drivers/software for expander
>>No driver is necessary, as far as I can tell. It occupies a PCI
>>express slot, but (I believe) uses it only for power. The LSI card
>>that drives it (I'm using the mpt driver) has been in the system for a
>>long time now, stable.
>>
>>> 5. *no* reboot
>>> 6. crash
>>The machine actually crashed twice. The first time I wrote it off as
>>a power glitch despite the UPS (the lights flickered at the same time,
>>and it's a cheap UPS) and the second time I took notes, removed
>>memory, dug up logs and generally started debugging.
>>
>>>
>>> Vikram
>>>
>>> Will Murnane wrote:
>>>>
>>>> Oh, sorry. I forgot to give details.
>>>> Here's the machine in question:
>>>>
>>>> w...@will-fs:~$ uname -a
>>>> SunOS will-fs 5.11 snv_122 i86pc i386 i86pc
>>>> w...@will-fs:~$ isainfo
>>>> amd64 i386
>>>> w...@will-fs:~$ prtdiag
>>>> System Configuration: Intel Corporation S3210SH
>>>> BIOS Configuration: Intel Corporation S3200X38.86B.00.00.0046.011420090950 01/14/2009
>>>> BMC Configuration: IPMI 2.0 (KCS: Keyboard Controller Style)
>>>>
>>>> ==== Processor Sockets ====================================
>>>>
>>>> Version                                      Location Tag
>>>> -------------------------------------------  --------------------------
>>>> Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz  Intel(R) Genuine processor
>>>>
>>>> ==== Memory Device Sockets ================================
>>>>
>>>> Type        Status Set Device Locator      Bank Locator
>>>> ----------- ------ --- ------------------- ----------------
>>>> DDR2        in use 0   DIMM_A1             CHAN A DIMM 1
>>>> DDR2        empty  0   DIMM_A2             CHAN A DIMM 2
>>>> DDR2        empty  0   DIMM_B1             CHAN B DIMM 1
>>>> DDR2        empty  0   DIMM_B2             CHAN B DIMM 2
>>>>
>>>> ==== On-Board Devices =====================================
>>>> PCIe x1 VGA embedded in ServerEngine(tm) Pilot-II
>>>> Intel 82541PI Ethernet Device
>>>> Intel 82566DM Ethernet Device
>>>>
>>>> ==== Upgradeable Slots ====================================
>>>>
>>>> ID  Status    Type             Description
>>>> --- --------- ---------------- ----------------------------
>>>> 1   available PCI              PCI SLOT 1 PCI 32/33
>>>> 2   available PCI              PCI SLOT 2 PCI 32/33
>>>> 3   available PCI Express      PCIe x4 SLOT 3
>>>> 5   in use    PCI Express      PCIe x8 SLOT 5
>>>> 6   in use    PCI Express      PCIe x16 SLOT 6 / FL RISER
>>>>
>>>> Slot 3, which says "available" is in fact where the SAS card is.
>>>>
>>>> When the box first crashed, I took out one stick of memory (there's
>>>> normally two, for a total of 4GB). The box hasn't crashed in about
>>>> three hours now. Once zpool scrub finishes I can pull out the disks
>>>> and put the other stick back in and see what happens.
>>>>
>>>> Will
>>>>
>>>> On Tue, Sep 15, 2009 at 21:12, Vikram Hegde <[email protected]> wrote:
>>>>
>>>>>
>>>>> Hi,
>>>>>
>>>>> What build are you running ? Several bugs in this area have been fixed in
>>>>> recent builds.
>>>>>
>>>>> Vikram
>>>>>
>>>>> Will Murnane wrote:
>>>>>
>>>>>>
>>>>>> I recently purchased an HP-branded SAS expander, model number
>>>>>> 468406-B21. Today I installed it, and got a panic ("genunix: [ID
>>>>>> 655072 kern.notice]" at beginning of lines removed):
>>>>>> panic[cpu1]/thread=ffffff0004883c60:
>>>>>> Freeing a free IOMMU page: paddr=0x14eb7000
>>>>>> ffffff0004883100 rootnex:iommu_page_free+cb ()
>>>>>> ffffff0004883120 rootnex:iommu_free_page+15 ()
>>>>>> ffffff0004883190 rootnex:iommu_setup_level_table+a4 ()
>>>>>> ffffff00048831d0 rootnex:iommu_setup_page_table+e1 ()
>>>>>> ffffff0004883250 rootnex:iommu_map_page_range+6a ()
>>>>>> ffffff00048832a0 rootnex:iommu_map_dvma+50 ()
>>>>>> ffffff0004883360 rootnex:intel_iommu_map_sgl+22f ()
>>>>>> ffffff0004883400 rootnex:rootnex_coredma_bindhdl+11e ()
>>>>>> ffffff0004883440 rootnex:rootnex_dma_bindhdl+36 ()
>>>>>> ffffff00048834e0 genunix:ddi_dma_buf_bind_handle+117 ()
>>>>>> ffffff0004883540 scsi:scsi_dma_buf_bind_attr+48 ()
>>>>>> ffffff00048835d0 scsi:scsi_init_cache_pkt+2e1 ()
>>>>>> ffffff0004883650 scsi:scsi_init_pkt+5c ()
>>>>>> ffffff0004883730 sd:sd_setup_rw_pkt+12a ()
>>>>>> ffffff00048837a0 sd:sd_initpkt_for_buf+ad ()
>>>>>> ffffff0004883810 sd:sd_start_cmds+197 ()
>>>>>> ffffff0004883860 sd:sd_core_iostart+184 ()
>>>>>> ffffff00048838d0 sd:sd_mapblockaddr_iostart+302 ()
>>>>>> ffffff0004883910 sd:sd_xbuf_strategy+50 ()
>>>>>> ffffff0004883960 sd:xbuf_iostart+1e5 ()
>>>>>> ffffff00048839a0 sd:ddi_xbuf_qstrategy+d3 ()
>>>>>> ffffff00048839d0 sd:sdstrategy+10b ()
>>>>>> ffffff0004883a00 genunix:bdev_strategy+75 ()
>>>>>> ffffff0004883a30 genunix:ldi_strategy+59 ()
>>>>>> ffffff0004883a70 zfs:vdev_disk_io_start+d0 ()
>>>>>> ffffff0004883ab0 zfs:zio_vdev_io_start+17d ()
>>>>>> ffffff0004883ae0 zfs:zio_execute+a0 ()
>>>>>> ffffff0004883b00 zfs:zio_nowait+42 ()
>>>>>> ffffff0004883b40 zfs:vdev_queue_io_done+9c ()
>>>>>> ffffff0004883b70 zfs:zio_vdev_io_done+62 ()
>>>>>> ffffff0004883ba0 zfs:zio_execute+a0 ()
>>>>>> ffffff0004883c40 genunix:taskq_thread+1b7 ()
>>>>>> ffffff0004883c50 unix:thread_start+8 ()
>>>>>>
>>>>>> On the next boot, I saw this set of messages three times:
>>>>>> WARNING: bios issue: rmrr is not in reserved memory range
>>>>>> WARNING: rmrr overlap with physmem [0x24400000 - 0x7fcf0000] for pci8086,34d0
>>>>>> WARNING: rmrr overlap with physmem [0x7fd96000 - 0x7fdfd000] for pci8086,34d0
>>>>>>
>>>>>> and this, several times:
>>>>>> NOTICE: IRQ21 is being shared by drivers with different interrupt levels.
>>>>>> This may result in reduced system performance.
>>>>>>
>>>>>> I found a suggestion for showing the interrupt map:
>>>>>> # echo ::interrupts -d | mdb -k
>>>>>> IRQ  Vect IPL Bus Trg Type  CPU Share APIC/INT# Driver Name(s)
>>>>>> 1    0x41 5   ISA Edg Fixed 3   1     0x0/0x1   i8042#1
>>>>>> 3    0xb1 12  ISA Edg Fixed 3   1     0x0/0x3   asy#3
>>>>>> 4    0xb0 12  ISA Edg Fixed 2   1     0x0/0x4   asy#2
>>>>>> 6    0x44 5   ISA Edg Fixed 1   1     0x0/0x6   fdc#1
>>>>>> 9    0x82 9   PCI Lvl Fixed 1   1     0x0/0x9   acpi_wrapper_isr
>>>>>> 12   0x42 5   ISA Edg Fixed 0   1     0x0/0xc   i8042#1
>>>>>> 14   0x40 5   ISA Edg Fixed 2   1     0x0/0xe   ata#2
>>>>>> 16   0x84 9   PCI Lvl Fixed 3   1     0x0/0x10  nvidia#1
>>>>>> 17   0x85 9   PCI Lvl Fixed 0   1     0x0/0x11  ehci#2
>>>>>> 18   0x87 9   PCI Lvl Fixed 2   3     0x0/0x12  e1000g#3, uhci#10, uhci#6
>>>>>> 19   0x89 9   PCI Lvl Fixed 0   1     0x0/0x13  uhci#9
>>>>>> 21   0x88 9   PCI Lvl Fixed 3   1     0x0/0x15  pci-ide#2
>>>>>> 23   0x86 9   PCI Lvl Fixed 1   2     0x0/0x17  uhci#8, ehci#3
>>>>>> 24   0x83 7   PCI Edg MSI   1   1     -         pcieb#2
>>>>>> 25   0x30 4   PCI Edg MSI   2   1     -         pcieb#3
>>>>>> 26   0x60 6   PCI Edg MSI   1   1     -         e1000g#0
>>>>>> 27   0x43 5   PCI Edg MSI   3   1     -         mpt#0
>>>>>> 128  0x81 8       Edg IPI   all 1     -         iommu_intr_handler
>>>>>> 160  0xa0 0       Edg IPI   all 0     -         poke_cpu
>>>>>> 208  0xd0 14      Edg IPI   all 1     -         kcpc_hw_overflow_intr
>>>>>> 209  0xd1 14      Edg IPI   all 1     -         cbe_fire
>>>>>> 210  0xd3 14      Edg IPI   all 1     -         cbe_fire
>>>>>> 240  0xe0 15      Edg IPI   all 1     -         xc_serv
>>>>>> 241  0xe1 15      Edg IPI   all 1     -         apic_error_intr
>>>>>>
>>>>>> ... which seems to indicate that IRQ21 is in fact not shared.
>>>>>>
>>>>>> Any suggestions where I should start debugging this? From what I can
>>>>>> find, the rmrr issue is a should-never-happen case; should I raise a
>>>>>> bug with Intel? This didn't happen before adding the SAS expander, so
>>>>>> I'm reluctant to point fingers at the onboard ethernet controller
>>>>>> (whose PCI ID that is). Maybe swapping cards around will help; I'll
>>>>>> try that in a bit.
>>>>>>
>>>>>> Will
>>>>>> _______________________________________________
>>>>>> indiana-discuss mailing list
>>>>>> [email protected]
>>>>>> http://mail.opensolaris.org/mailman/listinfo/indiana-discuss
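
P.S. For anyone who wants to check the Share column of the `::interrupts -d` output quoted above without reading it by eye, here is a rough Python sketch. The names `parse_rows` and `shared_irqs` are made up for illustration and are not part of mdb, and the sketch assumes each row's driver list has been rejoined onto a single line. It locates columns relative to the Type token, because the Bus column is blank on the IPI rows.

```python
"""List IRQs that an `::interrupts -d` table reports as shared (Share > 1)."""

# Interrupt types that appear in the Type column of the quoted table.
TYPES = {"Fixed", "MSI", "IPI"}

# A few rows copied from the table in this thread, wrapped rows rejoined.
SAMPLE = """\
IRQ Vect IPL Bus Trg Type CPU Share APIC/INT# Driver Name(s)
18 0x87 9 PCI Lvl Fixed 2 3 0x0/0x12 e1000g#3, uhci#10, uhci#6
21 0x88 9 PCI Lvl Fixed 3 1 0x0/0x15 pci-ide#2
23 0x86 9 PCI Lvl Fixed 1 2 0x0/0x17 uhci#8, ehci#3
27 0x43 5 PCI Edg MSI 3 1 - mpt#0
128 0x81 8 Edg IPI all 1 - iommu_intr_handler
"""


def parse_rows(text):
    """Yield (irq, share, drivers) for each data row in the table text."""
    for line in text.splitlines():
        # Drop any leading email quote markers before splitting.
        tokens = line.lstrip("> ").split()
        # Data rows start with a decimal IRQ number; skip headers and blanks.
        if not tokens or not tokens[0].isdigit():
            continue
        # Bus is blank on IPI rows, so find columns relative to Type.
        type_idx = next((i for i, t in enumerate(tokens) if t in TYPES), None)
        if type_idx is None or len(tokens) < type_idx + 4:
            continue
        share_token = tokens[type_idx + 2]  # Type, CPU, then Share
        if not share_token.isdigit():
            continue
        # Everything after the APIC/INT# column is the driver list.
        driver_text = " ".join(tokens[type_idx + 4:])
        drivers = [d.strip() for d in driver_text.split(",") if d.strip()]
        yield int(tokens[0]), int(share_token), drivers


def shared_irqs(text):
    """Return {irq: [drivers]} for rows whose Share count is above 1."""
    return {irq: drv for irq, share, drv in parse_rows(text) if share > 1}


if __name__ == "__main__":
    for irq, drivers in sorted(shared_irqs(SAMPLE).items()):
        print(f"IRQ {irq} shared by: {', '.join(drivers)}")
```

On the sample rows this reports IRQ 18 and IRQ 23 as shared and leaves IRQ 21 out, which matches my reading of the table. Running it over the full table only requires pasting the complete mdb output in place of SAMPLE.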
