On Tue, Sep 15, 2009 at 21:35, Vikram Hegde <[email protected]> wrote:
> Hi,
>
> Did you follow this sequence:
>
> 1. power down machine
> 2. install hardware
> 3. reboot machine
> 4. install drivers/software for expander
No driver is necessary, as far as I can tell.  It occupies a PCI
express slot, but (I believe) uses it only for power.  The LSI card
that drives it (I'm using the mpt driver) has been in the system for a
long time now, stable.
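(For what it's worth, one way to sanity-check that is to look at what the
device tree and loaded modules report.  This is only a rough sketch, and
the grep patterns are guesses at what actually shows up on this box:

$ prtconf -D | grep -i mpt    # -D prints the driver bound to each device node
$ modinfo | grep -w mpt       # is the mpt module loaded at all?

prtconf -D should list the LSI HBA with mpt bound to it, and the expander
itself, if it enumerates at all, should show no driver bound.)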
> 5. *no* reboot
> 6. crash
The machine actually crashed twice.  The first time I wrote it off as
a power glitch despite the UPS (the lights flickered at the same time,
and it's a cheap UPS), and the second time I took notes, removed
memory, dug up logs, and generally started debugging.

> Vikram
>
> Will Murnane wrote:
>>
>> Oh, sorry.  I forgot to give details.  Here's the machine in question:
>>
>> w...@will-fs:~$ uname -a
>> SunOS will-fs 5.11 snv_122 i86pc i386 i86pc
>> w...@will-fs:~$ isainfo
>> amd64 i386
>> w...@will-fs:~$ prtdiag
>> System Configuration: Intel Corporation S3210SH
>> BIOS Configuration: Intel Corporation
>> S3200X38.86B.00.00.0046.011420090950 01/14/2009
>> BMC Configuration: IPMI 2.0 (KCS: Keyboard Controller Style)
>>
>> ==== Processor Sockets ====================================
>>
>> Version                                      Location Tag
>> -------------------------------------------- --------------------------
>> Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz  Intel(R) Genuine processor
>>
>> ==== Memory Device Sockets ================================
>>
>> Type        Status Set Device Locator      Bank Locator
>> ----------- ------ --- ------------------- ----------------
>> DDR2        in use 0   DIMM_A1             CHAN A DIMM 1
>> DDR2        empty  0   DIMM_A2             CHAN A DIMM 2
>> DDR2        empty  0   DIMM_B1             CHAN B DIMM 1
>> DDR2        empty  0   DIMM_B2             CHAN B DIMM 2
>>
>> ==== On-Board Devices =====================================
>> PCIe x1 VGA embedded in ServerEngine(tm) Pilot-II
>> Intel 82541PI Ethernet Device
>> Intel 82566DM Ethernet Device
>>
>> ==== Upgradeable Slots ====================================
>>
>> ID  Status    Type             Description
>> --- --------- ---------------- ----------------------------
>> 1   available PCI              PCI SLOT 1 PCI 32/33
>> 2   available PCI              PCI SLOT 2 PCI 32/33
>> 3   available PCI Express      PCIe x4 SLOT 3
>> 5   in use    PCI Express      PCIe x8 SLOT 5
>> 6   in use    PCI Express      PCIe x16 SLOT 6 / FL RISER
>>
>> Slot 3, which says "available", is in fact where the SAS card is.
>>
>> When the box first crashed, I took out one stick of memory (there are
>> normally two, for a total of 4GB).  The box hasn't crashed in about
>> three hours now.  Once zpool scrub finishes I can pull the disks, put
>> the other stick back in, and see what happens.
>>
>> Will
>>
>> On Tue, Sep 15, 2009 at 21:12, Vikram Hegde <[email protected]> wrote:
>>>
>>> Hi,
>>>
>>> What build are you running?  Several bugs in this area have been fixed
>>> in recent builds.
>>>
>>> Vikram
>>>
>>> Will Murnane wrote:
>>>>
>>>> I recently purchased an HP-branded SAS expander, model number
>>>> 468406-B21.
>>>> Today I installed it, and got a panic ("genunix: [ID 655072
>>>> kern.notice]" at beginning of lines removed):
>>>> panic[cpu1]/thread=ffffff0004883c60:
>>>> Freeing a free IOMMU page: paddr=0x14eb7000
>>>> ffffff0004883100 rootnex:iommu_page_free+cb ()
>>>> ffffff0004883120 rootnex:iommu_free_page+15 ()
>>>> ffffff0004883190 rootnex:iommu_setup_level_table+a4 ()
>>>> ffffff00048831d0 rootnex:iommu_setup_page_table+e1 ()
>>>> ffffff0004883250 rootnex:iommu_map_page_range+6a ()
>>>> ffffff00048832a0 rootnex:iommu_map_dvma+50 ()
>>>> ffffff0004883360 rootnex:intel_iommu_map_sgl+22f ()
>>>> ffffff0004883400 rootnex:rootnex_coredma_bindhdl+11e ()
>>>> ffffff0004883440 rootnex:rootnex_dma_bindhdl+36 ()
>>>> ffffff00048834e0 genunix:ddi_dma_buf_bind_handle+117 ()
>>>> ffffff0004883540 scsi:scsi_dma_buf_bind_attr+48 ()
>>>> ffffff00048835d0 scsi:scsi_init_cache_pkt+2e1 ()
>>>> ffffff0004883650 scsi:scsi_init_pkt+5c ()
>>>> ffffff0004883730 sd:sd_setup_rw_pkt+12a ()
>>>> ffffff00048837a0 sd:sd_initpkt_for_buf+ad ()
>>>> ffffff0004883810 sd:sd_start_cmds+197 ()
>>>> ffffff0004883860 sd:sd_core_iostart+184 ()
>>>> ffffff00048838d0 sd:sd_mapblockaddr_iostart+302 ()
>>>> ffffff0004883910 sd:sd_xbuf_strategy+50 ()
>>>> ffffff0004883960 sd:xbuf_iostart+1e5 ()
>>>> ffffff00048839a0 sd:ddi_xbuf_qstrategy+d3 ()
>>>> ffffff00048839d0 sd:sdstrategy+10b ()
>>>> ffffff0004883a00 genunix:bdev_strategy+75 ()
>>>> ffffff0004883a30 genunix:ldi_strategy+59 ()
>>>> ffffff0004883a70 zfs:vdev_disk_io_start+d0 ()
>>>> ffffff0004883ab0 zfs:zio_vdev_io_start+17d ()
>>>> ffffff0004883ae0 zfs:zio_execute+a0 ()
>>>> ffffff0004883b00 zfs:zio_nowait+42 ()
>>>> ffffff0004883b40 zfs:vdev_queue_io_done+9c ()
>>>> ffffff0004883b70 zfs:zio_vdev_io_done+62 ()
>>>> ffffff0004883ba0 zfs:zio_execute+a0 ()
>>>> ffffff0004883c40 genunix:taskq_thread+1b7 ()
>>>> ffffff0004883c50 unix:thread_start+8 ()
>>>>
>>>> On the next boot, I saw this set of messages three times:
>>>> WARNING: bios issue: rmrr is not in reserved memory range
>>>> WARNING: rmrr overlap with physmem [0x24400000 - 0x7fcf0000] for pci8086,34d0
>>>> WARNING: rmrr overlap with physmem [0x7fd96000 - 0x7fdfd000] for pci8086,34d0
>>>>
>>>> and this, several times:
>>>> NOTICE: IRQ21 is being shared by drivers with different interrupt
>>>> levels.  This may result in reduced system performance.
>>>>
>>>> I found a suggestion for showing the interrupt map:
>>>> # echo ::interrupts -d | mdb -k
>>>> IRQ  Vect IPL Bus  Trg Type  CPU Share APIC/INT# Driver Name(s)
>>>> 1    0x41 5   ISA  Edg Fixed 3   1     0x0/0x1   i8042#1
>>>> 3    0xb1 12  ISA  Edg Fixed 3   1     0x0/0x3   asy#3
>>>> 4    0xb0 12  ISA  Edg Fixed 2   1     0x0/0x4   asy#2
>>>> 6    0x44 5   ISA  Edg Fixed 1   1     0x0/0x6   fdc#1
>>>> 9    0x82 9   PCI  Lvl Fixed 1   1     0x0/0x9   acpi_wrapper_isr
>>>> 12   0x42 5   ISA  Edg Fixed 0   1     0x0/0xc   i8042#1
>>>> 14   0x40 5   ISA  Edg Fixed 2   1     0x0/0xe   ata#2
>>>> 16   0x84 9   PCI  Lvl Fixed 3   1     0x0/0x10  nvidia#1
>>>> 17   0x85 9   PCI  Lvl Fixed 0   1     0x0/0x11  ehci#2
>>>> 18   0x87 9   PCI  Lvl Fixed 2   3     0x0/0x12  e1000g#3, uhci#10, uhci#6
>>>> 19   0x89 9   PCI  Lvl Fixed 0   1     0x0/0x13  uhci#9
>>>> 21   0x88 9   PCI  Lvl Fixed 3   1     0x0/0x15  pci-ide#2
>>>> 23   0x86 9   PCI  Lvl Fixed 1   2     0x0/0x17  uhci#8, ehci#3
>>>> 24   0x83 7   PCI  Edg MSI   1   1     -         pcieb#2
>>>> 25   0x30 4   PCI  Edg MSI   2   1     -         pcieb#3
>>>> 26   0x60 6   PCI  Edg MSI   1   1     -         e1000g#0
>>>> 27   0x43 5   PCI  Edg MSI   3   1     -         mpt#0
>>>> 128  0x81 8        Edg IPI   all 1     -         iommu_intr_handler
>>>> 160  0xa0 0        Edg IPI   all 0     -         poke_cpu
>>>> 208  0xd0 14       Edg IPI   all 1     -         kcpc_hw_overflow_intr
>>>> 209  0xd1 14       Edg IPI   all 1     -         cbe_fire
>>>> 210  0xd3 14       Edg IPI   all 1     -         cbe_fire
>>>> 240  0xe0 15       Edg IPI   all 1     -         xc_serv
>>>> 241  0xe1 15       Edg IPI   all 1     -         apic_error_intr
>>>>
>>>> ... which seems to indicate that IRQ21 is in fact not shared.
>>>>
>>>> Any suggestions where I should start debugging this?  From what I can
>>>> find, the rmrr issue is a should-never-happen case; should I raise a
>>>> bug with Intel?  This didn't happen before adding the SAS expander, so
>>>> I'm reluctant to point fingers at the onboard ethernet controller
>>>> (whose PCI ID that is).  Maybe swapping cards around will help; I'll
>>>> try that in a bit.
>>>>
>>>> Will
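One more thing that might help whoever looks at this: if savecore caught a
crash dump from the panic above, the stack and console messages can be
pulled back out of it with mdb.  This is just a rough sketch, assuming
crash dumps are enabled and the dump has already been expanded into
unix.0/vmcore.0 under the default /var/crash/<hostname> directory (on
recent builds it may land as a compressed vmdump.0 first, which
"savecore -f vmdump.0" can expand; the dump number will differ):

# cd /var/crash/will-fs       # assumed default savecore directory for this host
# mdb unix.0 vmcore.0         # open the expanded crash dump
> ::status
> ::msgbuf
> $C
> $q

::status repeats the panic string ("Freeing a free IOMMU page: ..."),
::msgbuf shows the console messages leading up to it, and $C prints the
stack of the panicking thread, which should match the trace above.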
_______________________________________________
indiana-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/indiana-discuss