Hi all,

Recently one of the servers my company uses, a Dell R710 attached to two Sun J4400 JBODs, started to crash quite often. I finally got a message in /var/adm/messages that might point to something useful, but I don't have the expertise to start troubleshooting this problem, so any help would be highly valuable.
Thanks in advance,
Bruno Sousa

The significant messages are:

Apr 13 11:12:04 san01 savecore: [ID 570001 auth.error] reboot after panic: Freeing a free IOMMU page: paddr=0xccca2000
Apr 13 11:12:04 san01 savecore: [ID 385089 auth.error] Saving compressed system crash dump in /var/crash/san01/vmdump.0

I also noticed other "interesting" messages like:

Apr 13 11:11:10 san01 unix: [ID 378719 kern.info] NOTICE: cpu_acpi: _PSS package evaluation failed for with status 5 for CPU 0.
Apr 13 11:11:10 san01 unix: [ID 388705 kern.info] NOTICE: cpu_acpi: error parsing _PSS for CPU 0
Apr 13 11:11:10 san01 unix: [ID 928200 kern.info] NOTICE: SpeedStep support is being disabled due to errors parsing ACPI P-state objects exported by BIOS
Apr 13 11:10:50 san01 scsi: [ID 243001 kern.info] /p...@0,0/pci8086,3...@4/pci1028,1...@0 (mpt0):
Apr 13 11:10:50 san01   DMA restricted below 4GB boundary due to errata
Apr 13 11:11:32 san01 scsi: [ID 243001 kern.info] /p...@0,0/pci8086,3...@9/pci1000,3...@0 (mpt2):
Apr 13 11:11:32 san01   DMA restricted below 4GB boundary due to errata

Relevant specs of the machine:

SunOS san01 5.11 snv_134 i86pc i386 i86pc Solaris
rpool boot drives attached to a Dell SAS6/iR Integrated RAID Controller (mpt0 Firmware version v0.25.47.0 (IR))
2 HBA LSI 1068E, each connected to a J4400 JBOD (mpt1 Firmware version v1.26.0.0 (IT))
multipath enabled and working
2 Quad-Cores, 16 GB RAM

Detailed info:

mdb -k unix.0 vmcore.0
mdb: warning: dump is from SunOS 5.11 snv_132; dcmds and macros may not match kernel implementation
Loading modules: [ unix genunix specfs dtrace mac cpu.generic uppc pcplusmp scsi_vhci zfs mpt sd sockfs ip hook neti sctp arp usba uhci fctl stmf md lofs idm nfs random sppp fcip cpc crypto logindmux ptm nsctl ufs ipc ]

::status
debugging crash dump vmcore.0 (64-bit) from san01
operating system: 5.11 snv_132 (i86pc)
panic message: Freeing a free IOMMU page: paddr=0xccca2000
dump content: kernel pages only

::stack
vpanic()
iommu_page_free+0xcb(ffffff04e3da5000, ccca2000)
iommu_free_page+0x15(ffffff04e3da5000, ccca2000)
iommu_setup_level_table+0xa0(ffffff054406d000, ffffff0543b99000, 8)
iommu_setup_page_table+0xa0(ffffff054406d000, 100c000)
iommu_map_page_range+0x6a(ffffff054406d000, 100c000, 3c2329000, 3c2329000, 2)
iommu_map_dvma+0x50(ffffff054406d000, 100c000, 3c2329000, 1000, ffffff001f7f31d0)
intel_iommu_map_sgl+0x22f(ffffff0553b43e00, ffffff001f7f31d0, 41)
rootnex_coredma_bindhdl+0x11e(ffffff04e3ef5cb0, ffffff04e607f540, ffffff0553b43e00, ffffff001f7f31d0, ffffff0553efdc50, ffffff0553efdbf8)
rootnex_dma_bindhdl+0x36(ffffff04e3ef5cb0, ffffff04e607f540, ffffff0553b43e00, ffffff001f7f31d0, ffffff0553efdc50, ffffff0553efdbf8)
ddi_dma_buf_bind_handle+0x117(ffffff0553b43e00, ffffff055860cd00, a, 0, 0, ffffff0553efdc50)
scsi_dma_buf_bind_attr+0x48(ffffff0553efdb90, ffffff055860cd00, a, 0, 0)
scsi_init_cache_pkt+0x2d0(ffffff05456302e0, 0, ffffff055860cd00, a, 20, 0)
scsi_init_pkt+0x5c(ffffff05456302e0, 0, ffffff055860cd00, a, 20, 0)
vhci_bind_transport+0x54d(ffffff0543191c58, ffffff055d2f8968, 40000, 0)
vhci_scsi_init_pkt+0x160(ffffff0543191c58, 0, ffffff055860cd00, a, 20, 0)
scsi_init_pkt+0x5c(ffffff0543191c58, 0, ffffff055860cd00, a, 20, 0)
sd_setup_rw_pkt+0x12a(ffffff0543b9d080, ffffff001f7f3688, ffffff055860cd00, 40000, fffffffff7a91b80, ffffff0543b9d080)
sd_initpkt_for_buf+0xad(ffffff055860cd00, ffffff001f7f36f8)
sd_start_cmds+0x197(ffffff0543b9d080, 0)
sd_core_iostart+0x186(4, ffffff0543b9d080, ffffff055860cd00)
sd_mapblockaddr_iostart+0x306(3, ffffff0543b9d080, ffffff055860cd00)
sd_xbuf_strategy+0x50(ffffff055860cd00, ffffff0544cf0a00, ffffff0543b9d080)
xbuf_iostart+0x1e5(ffffff04f21cce80)
ddi_xbuf_qstrategy+0xd3(ffffff055860cd00, ffffff04f21cce80)
sdstrategy+0x101(ffffff055860cd00)
bdev_strategy+0x75(ffffff055860cd00)
ldi_strategy+0x59(ffffff04f29a4df8, ffffff055860cd00)
vdev_disk_io_start+0xd0(ffffff055c2379a0)
zio_vdev_io_start+0x17d(ffffff055c2379a0)
zio_execute+0x8d(ffffff055c2379a0)
vdev_queue_io_done+0x92(ffffff055c2fe680)
zio_vdev_io_done+0x62(ffffff055c2fe680)
zio_execute+0x8d(ffffff055c2fe680)
taskq_thread+0x248(ffffff0543a086a0)
thread_start+8()

::msgbuf
panic[cpu4]/thread=ffffff001f7f3c60: Freeing a free IOMMU page: paddr=0xccca2000

ffffff001f7f2e90 rootnex:iommu_page_free+cb ()
ffffff001f7f2eb0 rootnex:iommu_free_page+15 ()
ffffff001f7f2f10 rootnex:iommu_setup_level_table+a0 ()
ffffff001f7f2f50 rootnex:iommu_setup_page_table+a0 ()
ffffff001f7f2fd0 rootnex:iommu_map_page_range+6a ()
ffffff001f7f3020 rootnex:iommu_map_dvma+50 ()
ffffff001f7f30e0 rootnex:intel_iommu_map_sgl+22f ()
ffffff001f7f3180 rootnex:rootnex_coredma_bindhdl+11e ()
ffffff001f7f31c0 rootnex:rootnex_dma_bindhdl+36 ()
ffffff001f7f3260 genunix:ddi_dma_buf_bind_handle+117 ()
ffffff001f7f32c0 scsi:scsi_dma_buf_bind_attr+48 ()
ffffff001f7f3350 scsi:scsi_init_cache_pkt+2d0 ()
ffffff001f7f33d0 scsi:scsi_init_pkt+5c ()
ffffff001f7f3480 scsi_vhci:vhci_bind_transport+54d ()
ffffff001f7f3500 scsi_vhci:vhci_scsi_init_pkt+160 ()
ffffff001f7f3580 scsi:scsi_init_pkt+5c ()
ffffff001f7f3660 sd:sd_setup_rw_pkt+12a ()
ffffff001f7f36d0 sd:sd_initpkt_for_buf+ad ()
ffffff001f7f3740 sd:sd_start_cmds+197 ()

::panicinfo
             cpu                4
          thread ffffff001f7f3c60
         message Freeing a free IOMMU page: paddr=0xccca2000
             rdi fffffffff78ede80
             rsi ffffff001f7f2e10
             rdx         ccca2000
             rcx                1
              r8 ffffff001f7f2d60
              r9 ffffff001f7f2e60
             rax                0
             rbx                3
             rbp ffffff001f7f2e50
             r10 ffffff0561edd000
             r10 ffffff0561edd000
             r11 ffffff0000003000
             r12 fffffffff78ede80
             r13 ffffff04e3da5000
             r14                0
             r15         ccca2000
          fsbase                0
          gsbase ffffff04f32e0000
              ds               4b
              es               4b
              fs                0
              gs              1c3
          trapno                0
             err                0
             rip fffffffffb862550
              cs               30
          rflags              246
             rsp ffffff001f7f2d58
              ss               38
          gdt_hi                0
          gdt_lo         b00001ef
          idt_hi                0
          idt_lo         20000fff
             ldt                0
            task               70
             cr0         8005003b
             cr2         fe6e971b
             cr3          4000000
             cr4              6f8

::cpuinfo -v
  0 fffffffffbc2f9e0  1f    1    0  -1   no    no t-0    ffffff001e805c60 (idle)
                       |    |
            RUNNING <--+    +-->  PRI THREAD           PROC
              READY                60 ffffff00202a2c60 sched
           QUIESCED
             EXISTS
             ENABLE

 ID ADDR             FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD           PROC
  1 ffffff04f32e8040  1f    0    0  99   no    no t-0    ffffff001fbadc60 zpool-TEST
                       |
            RUNNING <--+
              READY
           QUIESCED
             EXISTS
             ENABLE

 ID ADDR             FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD           PROC
  2 ffffff04f32e6b00  1f    0    0  99   no    no t-0    ffffff001fbc5c60 zpool-TEST
                       |
            RUNNING <--+
              READY
           QUIESCED
             EXISTS
             ENABLE

 ID ADDR             FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD           PROC
  3 ffffff04f32e1500  1f    1    0  -1   no    no t-0    ffffff001f0e3c60 (idle)
                       |    |
            RUNNING <--+    +-->  PRI THREAD           PROC
              READY                60 ffffff001e985c60 sched
           QUIESCED
             EXISTS
             ENABLE

 ID ADDR             FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD           PROC
  4 fffffffffbc3a000  1b    0    0  99   no    no t-0    ffffff001f7f3c60 zpool-TEST
                       |
            RUNNING <--+
              READY
             EXISTS
             ENABLE

 ID ADDR             FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD           PROC
  5 ffffff04f32dcac0  1f    0    0  99   no    no t-0    ffffff001f7d5c60 zpool-TEST
                       |
            RUNNING <--+
              READY
           QUIESCED
             EXISTS
             ENABLE

 ID ADDR             FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD           PROC
  6 ffffff04f3897b00  1f    0    0 104   no    no t-0    ffffff001f413c60 sched
                       |    |
            RUNNING <--+    +--> PIL THREAD
              READY                5 ffffff001f413c60
           QUIESCED                - ffffff001ff99c60 sched
             EXISTS

 ID ADDR             FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD           PROC
  7 ffffff04f3894500  1f    0    0  99   no    no t-0    ffffff001f7e1c60 zpool-TEST
                       |
            RUNNING <--+
              READY
           QUIESCED
             EXISTS
             ENABLE
_______________________________________________ opensolaris-discuss mailing list opensolaris-discuss@opensolaris.org