Hi,

Did you follow this sequence?

1. power down machine
2. install hardware
3. reboot machine
4. install drivers/software for expander
5. *no* reboot
6. crash

Vikram

Will Murnane wrote:
Oh, sorry.  I forgot to give details.  Here's the machine in question:

w...@will-fs:~$ uname -a
SunOS will-fs 5.11 snv_122 i86pc i386 i86pc
w...@will-fs:~$ isainfo
amd64 i386
w...@will-fs:~$ prtdiag
System Configuration: Intel Corporation S3210SH
BIOS Configuration: Intel Corporation S3200X38.86B.00.00.0046.011420090950 01/14/2009
BMC Configuration: IPMI 2.0 (KCS: Keyboard Controller Style)

==== Processor Sockets ====================================

Version                          Location Tag
-------------------------------- --------------------------
Intel(R) Core(TM)2 Quad CPU    Q6600  @ 2.40GHz Intel(R) Genuine processor

==== Memory Device Sockets ================================

Type        Status Set Device Locator      Bank Locator
----------- ------ --- ------------------- ----------------
DDR2        in use 0   DIMM_A1             CHAN A DIMM 1
DDR2        empty  0   DIMM_A2             CHAN A DIMM 2
DDR2        empty  0   DIMM_B1             CHAN B DIMM 1
DDR2        empty  0   DIMM_B2             CHAN B DIMM 2

==== On-Board Devices =====================================
PCIe x1 VGA embedded in ServerEngine(tm) Pilot-II
Intel 82541PI Ethernet Device
Intel 82566DM Ethernet Device

==== Upgradeable Slots ====================================

ID  Status    Type             Description
--- --------- ---------------- ----------------------------
1   available PCI              PCI SLOT 1 PCI 32/33
2   available PCI              PCI SLOT 2 PCI 32/33
3   available PCI Express      PCIe x4 SLOT 3
5   in use    PCI Express      PCIe x8 SLOT 5
6   in use    PCI Express      PCIe x16 SLOT 6 / FL RISER

Slot 3, which says "available", is in fact where the SAS card is.

When the box first crashed, I took out one stick of memory (there are
normally two, for a total of 4GB).  The box hasn't crashed in about
three hours now.  Once zpool scrub finishes I can pull out the disks,
put the other stick back in, and see what happens.

Will

On Tue, Sep 15, 2009 at 21:12, Vikram Hegde <[email protected]> wrote:
Hi,

What build are you running? Several bugs in this area have been fixed in
recent builds.

Vikram

Will Murnane wrote:
I recently purchased an HP-branded SAS expander, model number
468406-B21.  Today I installed it, and got a panic ("genunix: [ID
655072 kern.notice]" at beginning of lines removed):
 panic[cpu1]/thread=ffffff0004883c60:
 Freeing a free IOMMU page: paddr=0x14eb7000
 ffffff0004883100 rootnex:iommu_page_free+cb ()
 ffffff0004883120 rootnex:iommu_free_page+15 ()
 ffffff0004883190 rootnex:iommu_setup_level_table+a4 ()
 ffffff00048831d0 rootnex:iommu_setup_page_table+e1 ()
 ffffff0004883250 rootnex:iommu_map_page_range+6a ()
 ffffff00048832a0 rootnex:iommu_map_dvma+50 ()
 ffffff0004883360 rootnex:intel_iommu_map_sgl+22f ()
 ffffff0004883400 rootnex:rootnex_coredma_bindhdl+11e ()
 ffffff0004883440 rootnex:rootnex_dma_bindhdl+36 ()
 ffffff00048834e0 genunix:ddi_dma_buf_bind_handle+117 ()
 ffffff0004883540 scsi:scsi_dma_buf_bind_attr+48 ()
 ffffff00048835d0 scsi:scsi_init_cache_pkt+2e1 ()
 ffffff0004883650 scsi:scsi_init_pkt+5c ()
 ffffff0004883730 sd:sd_setup_rw_pkt+12a ()
 ffffff00048837a0 sd:sd_initpkt_for_buf+ad ()
 ffffff0004883810 sd:sd_start_cmds+197 ()
 ffffff0004883860 sd:sd_core_iostart+184 ()
 ffffff00048838d0 sd:sd_mapblockaddr_iostart+302 ()
 ffffff0004883910 sd:sd_xbuf_strategy+50 ()
 ffffff0004883960 sd:xbuf_iostart+1e5 ()
 ffffff00048839a0 sd:ddi_xbuf_qstrategy+d3 ()
 ffffff00048839d0 sd:sdstrategy+10b ()
 ffffff0004883a00 genunix:bdev_strategy+75 ()
 ffffff0004883a30 genunix:ldi_strategy+59 ()
 ffffff0004883a70 zfs:vdev_disk_io_start+d0 ()
 ffffff0004883ab0 zfs:zio_vdev_io_start+17d ()
 ffffff0004883ae0 zfs:zio_execute+a0 ()
 ffffff0004883b00 zfs:zio_nowait+42 ()
 ffffff0004883b40 zfs:vdev_queue_io_done+9c ()
 ffffff0004883b70 zfs:zio_vdev_io_done+62 ()
 ffffff0004883ba0 zfs:zio_execute+a0 ()
 ffffff0004883c40 genunix:taskq_thread+1b7 ()
 ffffff0004883c50 unix:thread_start+8 ()

On the next boot, I saw this set of messages three times:
WARNING: bios issue: rmrr is not in reserved memory range
WARNING: rmrr overlap with physmem [0x24400000 - 0x7fcf0000] for pci8086,34d0
WARNING: rmrr overlap with physmem [0x7fd96000 - 0x7fdfd000] for pci8086,34d0

and this, several times:
NOTICE: IRQ21 is being shared by drivers with different interrupt levels.
This may result in reduced system performance.

I found a suggestion for showing the interrupt map:
# echo ::interrupts -d | mdb -k
IRQ  Vect IPL Bus    Trg Type   CPU Share APIC/INT# Driver Name(s)
1    0x41 5   ISA    Edg Fixed  3   1     0x0/0x1   i8042#1
3    0xb1 12  ISA    Edg Fixed  3   1     0x0/0x3   asy#3
4    0xb0 12  ISA    Edg Fixed  2   1     0x0/0x4   asy#2
6    0x44 5   ISA    Edg Fixed  1   1     0x0/0x6   fdc#1
9    0x82 9   PCI    Lvl Fixed  1   1     0x0/0x9   acpi_wrapper_isr
12   0x42 5   ISA    Edg Fixed  0   1     0x0/0xc   i8042#1
14   0x40 5   ISA    Edg Fixed  2   1     0x0/0xe   ata#2
16   0x84 9   PCI    Lvl Fixed  3   1     0x0/0x10  nvidia#1
17   0x85 9   PCI    Lvl Fixed  0   1     0x0/0x11  ehci#2
18   0x87 9   PCI    Lvl Fixed  2   3     0x0/0x12  e1000g#3, uhci#10, uhci#6
19   0x89 9   PCI    Lvl Fixed  0   1     0x0/0x13  uhci#9
21   0x88 9   PCI    Lvl Fixed  3   1     0x0/0x15  pci-ide#2
23   0x86 9   PCI    Lvl Fixed  1   2     0x0/0x17  uhci#8, ehci#3
24   0x83 7   PCI    Edg MSI    1   1     -         pcieb#2
25   0x30 4   PCI    Edg MSI    2   1     -         pcieb#3
26   0x60 6   PCI    Edg MSI    1   1     -         e1000g#0
27   0x43 5   PCI    Edg MSI    3   1     -         mpt#0
128  0x81 8          Edg IPI    all 1     -         iommu_intr_handler
160  0xa0 0          Edg IPI    all 0     -         poke_cpu
208  0xd0 14         Edg IPI    all 1     -         kcpc_hw_overflow_intr
209  0xd1 14         Edg IPI    all 1     -         cbe_fire
210  0xd3 14         Edg IPI    all 1     -         cbe_fire
240  0xe0 15         Edg IPI    all 1     -         xc_serv
241  0xe1 15         Edg IPI    all 1     -         apic_error_intr

... which seems to indicate that IRQ21 is in fact not shared.
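
For what it's worth, the same check can be done mechanically rather than by eye. Below is a minimal Python sketch that reads the Share column of the table above and lists only the IRQs with more than one driver attached. The names shared_irqs, ROW and SAMPLE are made up for illustration, the regular expression is only a guess at the column layout based on the rows shown here, and it assumes mail-wrapped rows have been joined back onto one line.

```python
import re

# A few rows copied from the ::interrupts -d output above (wrapped row joined).
SAMPLE = """\
IRQ  Vect IPL Bus    Trg Type   CPU Share APIC/INT# Driver Name(s)
18   0x87 9   PCI    Lvl Fixed  2   3     0x0/0x12  e1000g#3, uhci#10, uhci#6
21   0x88 9   PCI    Lvl Fixed  3   1     0x0/0x15  pci-ide#2
23   0x86 9   PCI    Lvl Fixed  1   2     0x0/0x17  uhci#8, ehci#3
128  0x81 8          Edg IPI    all 1     -         iommu_intr_handler
"""

# Assumed layout: the Bus column is blank for IPI rows, so it is optional.
ROW = re.compile(
    r"^(?P<irq>\d+)\s+(?P<vect>0x[0-9a-f]+)\s+(?P<ipl>\d+)\s+"
    r"(?:(?P<bus>[A-Za-z]+)\s+)?(?P<trg>Edg|Lvl)\s+(?P<type>\S+)\s+"
    r"(?P<cpu>\S+)\s+(?P<share>\d+)\s+(?P<apic>\S+)\s+(?P<drivers>.+)$"
)


def shared_irqs(text):
    """Return {irq: [driver, ...]} for rows whose Share count is above 1."""
    shared = {}
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m is None:
            continue  # header line or anything that does not look like a row
        if int(m.group("share")) > 1:
            names = [d.strip() for d in m.group("drivers").split(",")]
            shared[int(m.group("irq"))] = [d for d in names if d]
    return shared


if __name__ == "__main__":
    for irq, drivers in sorted(shared_irqs(SAMPLE).items()):
        print(irq, ", ".join(drivers))
```

Run against the full table, this would report IRQ 18 and IRQ 23 as shared and leave IRQ 21 out, which matches the reading above.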

Any suggestions where I should start debugging this?  From what I can
find, the rmrr issue is a should-never-happen case; should I raise a
bug with Intel?  This didn't happen before adding the SAS expander, so
I'm reluctant to point fingers at the onboard ethernet controller
(whose PCI ID that is).  Maybe swapping cards around will help; I'll
try that in a bit.

Will
_______________________________________________
indiana-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/indiana-discuss

