Hello,

Recently I have been seeing crashes on my SPARC Sun Fire V215 server. The OS is 
nv76. I suspect they are related to writing debug messages to the log 
file /var/adm/messages, because the crashes seem to happen more often when my 
driver allows more debug messages to be written to the log file. 
But I am not sure yet. 

All of these fatal errors occurred in "PCIe root complex", with the panic 
raised from "px_err_panic". Below is my analysis using mdb. 

There may be one or more other threads running when the panic happens, but 
one thread writing messages, as shown below, appears in every crash dump; see 
the output of "30001418080::findstack -v" below.

Is this really a bug in the OS's log message writing?

Tom

::msgbuf
panic[cpu0]/thread=3000201ec20:       
Fatal error has occured in: PCIe root complex.
                                      
                                      
000002a10045fd50 px:px_err_panic+164 (11, 1, 13a0400, 2a10045fe00, 2a10045fe01, 0)
  %l0-3: 0000000000000001 0000000000000034 00000000018f6400 0000060010817000
  %l4-7: 00000000000ffc00 0000000000000000 000000000183d800 ffffffffffffffff
000002a10045fe60 px:px_err_dmc_pec_intr+cc (300000b5cb0, 0, 300000b5d78, 1, 300003c6688, 300000b5c40)
  %l0-3: 0000009882001a02 0000000000004000 0000000000000000 0000000000000003
  %l4-7: 0000000000000000 00000000004b0918 0000000000000000 0000000000000011
000002a10045ff50 unix:current_thread+1b8 (fa000000, 0, 4000000, 0, 39f1600, fa60ea00)
  %l0-3: 00000000010074cc 000002a100c952e1 000000000000000e 0000000070008440
  %l4-7: 000000000000001a 0000000000000068 000003000201ec20 000002a100c95b90
                                      
syncing file systems...               
 3                                    
 3                                    
 done                                 
dumping to /dev/dsk/c1t0d0s1, offset 318898176, content: kernel
> 

> ::cpuinfo -v
 ID ADDR        FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD      PROC
  0 00001838610  1b    0    0  59   no    no t-0    3000201ec20 Xsun
                  |    
       RUNNING <--+    
         READY         
        EXISTS         
        ENABLE         

 ID ADDR        FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD      PROC
  1 0000180c000  1d    5    0   0  yes    no t-90   30001418080 syslogd
                  |    |
       RUNNING <--+    +-->  PRI THREAD      PROC
      QUIESCED                99 2a100247ca0 sched
        EXISTS                59 3000201e5a0 java
        ENABLE                59 300020e6080 java
                              59 3000201e260 java
                              59 3000145b4e0 java


A thread writing messages can be seen in all of these crash dumps.

> 30001418080::findstack -v
stack pointer for thread 30001418080: 2a10046dca1
[ 000002a10046dca1 panic_idle+0x1c() ]
  000002a10046dd51 ktl0+0x48(ffffffffffffffff, 7f904a00070, 0, 0, 7f9, 
  30001418080)
  000002a10046dea1 bcopy_more+0x454(3, 16ec, 1400, 0, ffffffffffffffff, 0)
  000002a10046dff1 pfb_setup_cmap32+0x74(600108d4000, 0, 14ec, 0, 600108d4aec, 
  600108d48ec)
  000002a10046e0c1 pfb_vis_consdisplay+0x138(600108d4000, 580, 1400, 600108d5740
  , 1800, 1400)
  000002a10046e1b1 tem_display_layered+0x20(600109a2f70, 2a10046eb30, 
  60010803e38, 0, 7ffffc00, 5400)
  000002a10046e271 tem_pix_cls_range+0xf0(600109a2f70, 0, 1, 360, a0, 50)
  000002a10046e361 tem_pix_cls+0x3c(0, 50, 21, 0, 60010803e38, 0)
  000002a10046e431 tem_scroll+0xe0(600109a2f70, 300003d4b50, 21, 0, 0, 
  60010803e38)
  000002a10046e501 tem_lf+0x50(600109a2f70, 60010803e38, 0, 0, 21, 300003d4ac0)
  000002a10046e5c1 tem_terminal_emulate+0x40(600109a2f70, 6001160a193, 
  60010803e38, 300003d4ac0, 0, 7bb7d15c)
  000002a10046e671 tem_write+0x30(600109a2f70, 6001160a170, 24, 60010803e38, 1, 
  600109a2f88)
  000002a10046e721 wcstart+0xfc(600109c4620, 600122abcc0, 38, 0, 18bc400, 
  600122abcc0)
  000002a10046e7d1 wcuwput+0x370(600109c4620, 600122abcc0, 1388, 1388, 
  2a10046f130, 2a10046a000)           
  000002a10046e881 putnext+0x208(600109c1c28, 600109c4620, 600122abcc0, 0, 
  1815800, 0)                         
  000002a10046e931 ldtermwmsg+0x130(60010ab1068, 600122abcc0, 1388, 0, 0, 
  2a10046a000)                        
  000002a10046e9f1 putnext+0x208(60010ab1160, 60010ab1068, 6001110f040, 0, 
  1815800, 0)                         
  000002a10046eaa1 qdrain_syncq+0x6c(60010ab0e40, 60010ab0dd8, 7005fe90, fffe, 
  60010ab0ed0, 61)                    
  000002a10046eb51 drain_syncq+0x2fc(60010ab0ed0, 11, 11, fffe, 0, fc00)
  000002a10046ec01 strput+0x1a0(600109c7878, 0, 2a10046fa98, 2a10046f6c0, 0, 0)
  000002a10046ee01 strwrite_common+0x1f0(600109c2840, 2a10046fa98, 10000, 1, 
  600109c78f8, 600109c7878)           
  000002a10046eed1 iwscnwrite+0x18(e00000000, 2a10046fa98, 60010802458, e, e, 
  60010a91938)                        
  000002a10046ef81 fop_write+0x48(60010b4c300, 2a10046fa98, 0, 60010802458, 0, 
  4b)                                 
  000002a10046f031 sysmwrite+0xe4(480, 2a10046fa98, 60010802458, 2a10046f8f8, 
  2a10046f8e8, 701b3800)              
  000002a10046f131 fop_write+0x48(6001247ccc0, 2a10046fa98, 8, 60010802458, 0, 
  4b)                                 
  000002a10046f1e1 write+0x178(4, 280a, 4b, 280a, 6001247ccc0, 0)
  000002a10046f2e1 syscall_trap32+0xcc(4, 3ac5a0, 4b, 3ac601, 3ac600, 3ac600)
> 
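To check whether the same write path really shows up in every dump, the "::findstack -v" output saved from each dump could be compared with a small script. The following is only a rough sketch in Python; the helper names frame_functions and common_functions are made up for illustration, and it assumes frame lines look like the ones above (a 16-digit frame address followed by function+0xoffset).

```python
import re

# Matches a ::findstack -v frame line such as
#   000002a10046e671 tem_write+0x30(600109a2f70, ...
# and the bracketed first frame "[ 000002a10046dca1 panic_idle+0x1c() ]".
# Wrapped argument lines do not match and are skipped.
FRAME_RE = re.compile(
    r"^\s*(?:\[\s*)?[0-9a-f]{16}\s+([A-Za-z_][\w.]*)\+0x[0-9a-f]+\("
)


def frame_functions(findstack_text):
    """Return the function name of each frame, innermost first."""
    names = []
    for line in findstack_text.splitlines():
        match = FRAME_RE.match(line)
        if match:
            names.append(match.group(1))
    return names


def common_functions(stacks):
    """Return functions present in every stack, in the order of the first stack."""
    if not stacks:
        return []
    shared = set(stacks[0]).intersection(*(set(s) for s in stacks[1:]))
    result = []
    for name in stacks[0]:
        if name in shared and name not in result:
            result.append(name)
    return result
```

Feeding it the saved stack of the message-writing thread from each dump would list the functions common to all of them.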


> 3000201ec20::findstack -v
stack pointer for thread 3000201ec20: 2a100c94e11
  000002a100c94ec1 mutex_vector_enter+0x458(0, 30001418080, 600108d5740, 
  5fcf527e6e, 0, 9a)
  000002a100c94f81 pfb_ioctl+0xcc0(600108d4000, 16ec, 4607, 100003, 0, 
  2a100c95adc)
  000002a100c950e1 fop_ioctl+0x48(6001270d640, 4607, ffbffb8c, 100003, 
  6001084d9c0, 2a100c95adc)
  000002a100c95191 ioctl+0x164(8, 4607, ffbffb8c, e6cd4, 600113b80b8, 31)
  000002a100c952e1 syscall_trap32+0xcc(8, 4607, ffbffb8c, e6cd4, 414fe0, 31)
 
 