[mdb-discuss] help, crashdump analysis, x86

2007-03-30 Thread Raymond LI - Sun Microsystems - Beijing China
rivanwang wrote:
> I have some questions as follows.
> Would you be so kind as to give me some suggestions?
>
>
>   

d2c84de0::findstack -v

will give some clue.


> 
>  ::cpuinfo -v
>  ID ADDR     FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD   PROC
>   0 fec20ae4  1b    8    0 104   no    no t-740847 d2c84de0 sched
>                |    |    |
>     RUNNING <--+    |    +--> PIL THREAD
>       READY         |           5 d2c84de0
>      EXISTS         |           3 d2ca0de0
>      ENABLE         |           - d2c28de0 (idle)
>                     |
>                     +-->  PRI THREAD   PROC
>                            99 d2c9ade0 sched
>                            99 d2c97de0 sched
>                            60 d3264a00 fsflush
>                            60 d2e1ade0 sched
>                            60 d2e37de0 sched
>                            60 d4644de0 sched
>                            60 d96dcde0 sched
>                            59 d38e7400 Xsun
>   d2c84de0::thread
> ADDRSTATE  FLG PFLG SFLG   PRI  EPRI PIL INTR DISPTIME BOUND PR
> d2c84de0 onproc80903   104 0   5 d2ca0de00-1  2
>   d2ca0de0::thread
> ADDRSTATE  FLG PFLG SFLG   PRI  EPRI PIL INTR DISPTIME BOUND PR
> d2ca0de0 onproc  903   102 0   3 d2c28de046a51-1  1
>   d2ca0de0::findstack -v
> stack pointer for thread d2ca0de0: d2ca0c2c
>   d2ca0de0 0xd94c62bc()
>
> 
>
>   After I pressed "F1+A", the kernel created the thread "d2c84de0" to 
> respond to the keyboard interrupt (PIL = 5, PRI = 104),
> but another thread, "d2ca0de0", was still running on the CPU at the same 
> time (PIL = 3, PRI = 102).
>   I guess some event caused the kernel to create thread d2ca0de0, and 
> then the kernel hung; when I pressed "F1+A", the kernel created 
> another thread, d2c84de0, and finally crashed.
>
> I have no idea what caused the kernel to create thread d2ca0de0 
> (PRI=102, PIL=3).
>
>
>
>
>
>
> [[ Q3 ]]
>   ::cpuinfo
>  ID ADDR     FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD   PROC
>   0 fec20ae4  1b    8    0 104   no    no t-740847 d2c84de0 sched
>   ::cycinfo -v
> CPU  CYC_CPU   STATE NELEMS     ROOT         FIRE HANDLER
>   0 d9aabe00  online      4 d9aabd80  96b6b848e80 clock
>
>                          2
>                          |
>               +----------+----------+
>               0                     1
>               |                     |
>          +----+----+           +----+----+
>          3
>          |
>     +----+----+
>
>     ADDR NDX HEAP LEVL   PEND         FIRE USECINT HANDLER
> d9aabd80   0    1 high      0  96b6b848e80       1 cbe_hres_tick
> d9aabda0   1    2  low 741253  96b6b848e80       1 apic_redistribute_compute
> d9aabdc0   2    0 lock    406  96b6b848e80       1 clock
> d9aabde0   3    3 high      0  96b6d4e5200     100 deadman
>
> ---
> The SWITCH value for thread d2c84de0 is 740847;
> the PEND value for apic_redistribute_compute is 741253;
> the PEND value for clock is 406.
>   (741253 - 406) == 740847
> What does this mean? Could you please explain it?
>
>
>
>
>
> [[ Q4 ]]
>   
>   ::ipcs
> Message queues:
> failed to read 'msq_svc'; module not present
>
> Shared memory:
> ADDR     REF ID KEY  MODE PRJID ZONEID OWNER GROUP CREAT  CGRP
> d4915f50   1  3 103  0666     3      0  1002   102  1002   102
> d3f0b090   1  2 101  0666     3      0     0     0     0     0
> d3f0b2c0   1  1 102  0666     3      0  1002   102  1002   102
> d3f0bbf0   1  0 100  0666     3      0  1002   102  1002   102
>
> Semaphores:
> ADDR     REF ID KEY  MODE PRJID ZONEID OWNER GROUP CREAT  CGRP
> d4915ee0   3  3 103  0666     3      0  1002   102  1002   102
> d3f0b1e0   3  2 101  0666     3      0     0     0     0     0
> d3f0b250   4  1 102  0666     3      0  1002   102  1002   102
> d3f0bb80   7  0 100  0666     3      0  1002   102  1002   102
>   
> ---
> I don't know which threads are accessing the semaphore "d3f0b1e0".
> How can I find these unknown threads?
>
>
>
>
>
>
>   ::showrev
> Hostname: cetc.a28.com
> Release: 5.10
> Kernel architecture: i86pc
> Application architecture: i386
> Kernel version: SunOS 5.10 i86pc Generic
> Platform: i86pc
>  
>
>
>
>   ::msgbuf
> MESSAGE   
> /pci@0,0/pci103c,3013@1d,2 (uhci2): failed to attach
> pcplusmp: pciclass,0c0300 (uhci) instance 3 vector 0x16 ioapic 0x1 intin 0x16 
> is
> 

[mdb-discuss] help, crashdump analysis, x86

2007-03-27 Thread rivanwang
Hi,

I have one workstation (HP xw4300) with Solaris 10 (x86) and one Digi Sync570i 
card.
The system may hang at any time, anywhere from a few minutes to a couple of 
hours in, while the card is receiving data frames.

I suspect the hang is caused by the driver module for the Sync570; however, 
the same driver works properly on a Solaris 8 system.
We used to install Solaris 8 on the HP xw4100, but now we have to install 
Solaris 10 on the HP xw4300 (we can't get the HP xw4100 on the market any more).

I boot the Solaris system with kmdb loaded. After the system hangs I can't ping 
the host, and the keyboard and mouse do not respond.
I can get the crashdump file by pressing "F1+A" and then input "$" due to a NULL pointer dereference

sched:
#pf Page fault
Bad kernel fault at addr=0x0
pid=0, pc=0x0, sp=0x202, eflags=0x10002
cr0: 8005003b cr4: 6f8
cr2: 0 cr3: 4226000
 gs:  1b0  fs:0  es:  160  ds:  160
edi: d2f50a60 esi: fef4b2a8 ebp: d2c84d34 esp: d2c84d1c
ebx: d2f54180 edx: d2f541f8 ecx:   1f eax: fed6c870
trp:e err:   10 eip:0  cs:  158
efl:10002 usp:  202  ss: d2c84d3c
d2c84c4c unix:die+a7 (e, d2c84cec, 0, 0)
d2c84cd8 unix:trap+f56 (d2c84cec, 0, 0)
d2c84cec unix:cmntrap+83 ()
d2c84d34 0 (d2c84d44, fe81189a,)
d2c84d3c genunix:kdi_dvec_enter+a (d2c84d50, fe81183c,)
d2c84d44 unix:debug_enter+32 (0)
d2c84d50 unix:abort_sequence_enter+27 (0)
d2c84d64 kbtrans:kbtrans_streams_key+3e (d2f54180, 1f, 0)
d2c84d88 kb8042:kb8042_received_byte+b2 (fef4b1a8, 1e)
d2c84da0 kb8042:kb8042_intr+65 (fef4b1a8)
d2c84db8 i8042:i8042_intr+a4 (d2f50980)


 ::cpuinfo -v
 ID ADDR     FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD   PROC
  0 fec20ae4  1b    8    0 104   no    no t-740847 d2c84de0 sched
               |    |    |
    RUNNING <--+    |    +--> PIL THREAD
      READY         |           5 d2c84de0
     EXISTS         |           3 d2ca0de0
     ENABLE         |           - d2c28de0 (idle)
                    |
                    +-->  PRI THREAD   PROC
                           99 d2c9ade0 sched
                           99 d2c97de0 sched
                           60 d3264a00 fsflush
                           60 d2e1ade0 sched
                           60 d2e37de0 sched
                           60 d4644de0 sched
                           60 d96dcde0 sched
                           59 d38e7400 Xsun
  d2c84de0::thread
ADDRSTATE  FLG PFLG SFLG   PRI  EPRI PIL INTR DISPTIME BOUND PR
d2c84de0 onproc80903   104 0   5 d2ca0de00-1  2
  d2ca0de0::thread
ADDRSTATE  FLG PFLG SFLG   PRI  EPRI PIL INTR DISPTIME BOUND PR
d2ca0de0 onproc  903   102 0   3 d2c28de046a51-1  1
  d2ca0de0::findstack -v
stack pointer for thread d2ca0de0: d2ca0c2c
  d2ca0de0 0xd94c62bc()
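
In the ::cpuinfo -v output above, the FLG value (hex 1b) is expanded by mdb 
into the names RUNNING, READY, EXISTS and ENABLE. The sketch below shows 
how that expansion works as a plain bitmask decode. The bit values are an 
assumption taken from my recollection of the CPU_* definitions in the 
Solaris sys/cpuvar.h header, and decode_cpu_flags is a hypothetical helper, 
not part of mdb; check the values against the headers for your release.

```python
# Assumed CPU flag bits (believed to match CPU_* in Solaris <sys/cpuvar.h>).
CPU_FLAGS = {
    0x001: "RUNNING",
    0x002: "READY",
    0x004: "QUIESCED",
    0x008: "EXISTS",
    0x010: "ENABLE",
    0x020: "OFFLINE",
    0x040: "POWEROFF",
    0x080: "FROZEN",
    0x100: "SPARE",
    0x200: "FAULTED",
}


def decode_cpu_flags(flg):
    """Return the names of all flag bits set in flg, lowest bit first."""
    return [name for bit, name in sorted(CPU_FLAGS.items()) if flg & bit]


# FLG column from the ::cpuinfo row for CPU 0.
print(decode_cpu_flags(0x1b))
```

With these assumed bit values, 0x1b decodes to the same four names that 
mdb lists under the FLG column.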



  After I pressed "F1+A", the kernel created the thread "d2c84de0" to 
respond to the keyboard interrupt (PIL = 5, PRI = 104),
but another thread, "d2ca0de0", was still running on the CPU at the same 
time (PIL = 3, PRI = 102).
  I guess some event caused the kernel to create thread d2ca0de0, and 
then the kernel hung; when I pressed "F1+A", the kernel created 
another thread, d2c84de0, and finally crashed.

I have no idea what caused the kernel to create thread d2ca0de0 
(PRI=102, PIL=3).






[[ Q3 ]]
  ::cpuinfo
 ID ADDR     FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD   PROC
  0 fec20ae4  1b    8    0 104   no    no t-740847 d2c84de0 sched
  ::cycinfo -v
CPU  CYC_CPU   STATE NELEMS     ROOT         FIRE HANDLER
  0 d9aabe00  online      4 d9aabd80  96b6b848e80 clock

                         2
                         |
              +----------+----------+
              0                     1
              |                     |
         +----+----+           +----+----+
         3
         |
    +----+----+

    ADDR NDX HEAP LEVL   PEND         FIRE USECINT HANDLER
d9aabd80   0    1 high      0  96b6b848e80       1 cbe_hres_tick
d9aabda0   1    2  low 741253  96b6b848e80       1 apic_redistribute_compute
d9aabdc0   2    0 lock    406  96b6b848e80       1 clock
d9aabde0   3    3 high      0  96b6d4e5200     100 deadman

---
The SWITCH value for thread d2c84de0 is 740847;
the PEND value for apic_redistribute_compute is 741253;
the PEND value for clock is 406.
  (741253 - 406) == 740847
What does this mean? Could you please explain it?
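
The arithmetic in the question can be checked, and the counts put on a 
rough time scale, with a few lines of Python. This is only a sketch: it 
assumes that each count corresponds to one clock tick and that the system 
runs at 100 ticks per second (HZ = 100), neither of which is stated in the 
post, and ticks_to_seconds is a hypothetical helper. It does not by itself 
explain why the numbers line up.

```python
HZ = 100  # assumed clock tick rate; not given in the post

switch_ticks = 740847  # SWITCH column for thread d2c84de0 (t-740847)
pend_low = 741253      # PEND for apic_redistribute_compute (low level)
pend_clock = 406       # PEND for clock (lock level)


def ticks_to_seconds(ticks, hz=HZ):
    """Convert a tick count to seconds under the assumed tick rate."""
    return ticks / hz


# The relation noted in the question.
print(pend_low - pend_clock == switch_ticks)

for label, ticks in (
    ("SWITCH", switch_ticks),
    ("PEND apic_redistribute_compute", pend_low),
    ("PEND clock", pend_clock),
):
    print(f"{label}: {ticks} ticks, about {ticks_to_seconds(ticks):.2f} s")
```

Under that assumption the two large counts correspond to roughly two hours 
and the clock count to a few seconds.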





[[ Q4 ]]
  
  ::ipcs
Message queues:
failed to read 'msq_svc'; module not present

Shared memory:
ADDR   REFID  KEY  MODE PRJID ZONEID OWNER GROUP CREAT  CGRP
d4