Guys, thanks for your help! I finally found the cause: a thread holding the 
mutex is preempted by a higher-priority timer (soft) interrupt that tries to 
take the same mutex, so a deadlock occurs.

I did some research and noticed that it crashes only when my send( ) function 
is called at the same time as the 1-second timer function. In my send( ) code, 
after a packet is prepared and saved in the DMA-accessible area, it writes a 
device register to tell the hardware to read the packet and send it out. The 
1-second timer also needs to read hardware registers to get the current link 
status. I use a hardware mutex to control access to the hardware. So if 
send( ) gets the mutex but is then preempted by the 1-second timer, the timer 
must still acquire this mutex before it can read or write the hardware, and it 
waits for the mutex to become available. Since send( ) cannot run while it is 
preempted, the timer interrupt waits forever and the system crashes.

If timer( ) does not call mutex_enter( ) and just reads/writes the hardware 
registers directly, the deadlock no longer happens. But this solution is not 
good. Any other ideas? Do you guys use a mutex lock to control access to a PCI 
network card? Have you had a similar issue?
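
One alternative worth considering is for the timer to try the lock and simply 
skip the link poll for that tick if send( ) holds it. In the Solaris kernel the 
call for this would be mutex_tryenter(9F). The sketch below is only a 
user-space stand-in that uses pthread_mutex_trylock( ) to show the pattern. 
Every name in it (hw_state, hw_send, hw_timer_poll, the fake registers) is 
hypothetical and not taken from the qla driver.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical adapter state; plain variables stand in for device registers. */
struct hw_state {
    pthread_mutex_t hw_lock;   /* plays the role of the hardware mutex */
    uint32_t tx_doorbell;      /* fake "start transmit" register */
    uint32_t link_reg;         /* fake link-status register, bit 0 = link up */
    bool link_up;              /* last link state seen by the timer */
    unsigned skipped_polls;    /* ticks skipped because the lock was busy;
                                  only the timer path touches this */
};

static void hw_init(struct hw_state *hw)
{
    pthread_mutex_init(&hw->hw_lock, NULL);
    hw->tx_doorbell = 0;
    hw->link_reg = 0;
    hw->link_up = false;
    hw->skipped_polls = 0;
}

/* Send path: waits for the lock, then rings the (fake) doorbell. */
static void hw_send(struct hw_state *hw)
{
    int rc = pthread_mutex_lock(&hw->hw_lock);
    assert(rc == 0);
    hw->tx_doorbell++;
    pthread_mutex_unlock(&hw->hw_lock);
}

/*
 * Timer path: never waits. Returns true if the link state was refreshed,
 * false if the lock was busy and this tick was skipped.
 */
static bool hw_timer_poll(struct hw_state *hw)
{
    if (pthread_mutex_trylock(&hw->hw_lock) != 0) {
        hw->skipped_polls++;
        return false;
    }
    hw->link_up = (hw->link_reg & 1u) != 0;
    pthread_mutex_unlock(&hw->hw_lock);
    return true;
}
```

The trade-off is that a link-state change may be noticed one tick late when 
send( ) happens to hold the lock, but the timer never blocks on it.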


Tom

Oliver, below is my analysis using mdb, following your article. What can you 
see from it? Sorry, I do not have much experience with this. How do you get 
the ACT tool?

panic[cpu0]/thread=[b]ffffff0003eddc80[/b]:  
Deadlock: cycle in blocking chain     
                                      
ffffff0003eddaa0 genunix:turnstile_block+9f3 ()
ffffff0003eddb20 unix:mutex_vector_enter+38d ()
ffffff0003eddb50 qla:qla_link_state_machine+22 ()
ffffff0003eddb70 qla:qla_timer+78 ()  
ffffff0003eddbd0 genunix:callout_execute+b1 ()
ffffff0003eddc60 genunix:taskq_thread+1dc ()
ffffff0003eddc70 unix:thread_start+8 ()

> [b]ffffff0003eddc80[/b]::findstack -v
stack pointer for thread ffffff0003eddc80: ffffff0003edd540
  ffffff0003edd580 0x292()
  ffffff0003edd5a0 7()
  ffffff0003edd670 xc_common+0x3fb(fffffffffb81c240, 0, fffffffffb861981, ffffff0003edd680, 0, ff, ffffff0003edd700)
  ffffff0003edd700 xc_mbox_lock+0x10()
  ffffff0003edd750 xc_wait_sync+0x2b(fffffffec586f180, 0, fffffffffb81bf5a, ffffff0003edd750, ff, fffffffffb81c240)
  ffffff0003edd7f0 x86pte_inval+0x1e3(fffffffec586f180, c1, 8000000002045561, 0)
  ffffff0003edd880 hat_pte_unmap+0x21b(1b70e7008975471b, 1, fffffffffbbe8a53, ffffff0003edd890, 1388)
  ffffff0003edd890 1()
  ffffff0003edd910 dosoftint_prolog+0xa2(246, 91409, ffffff0003edd920, 0)
  ffffff0003edda00 panic+0x9c()
  ffffff0003eddaa0 turnstile_block+0x9f3(0, 0, [b]fffffffee1c178f8,[/b] fffffffffbc04550, 0, 0)
  ffffff0003eddb20 mutex_vector_enter+0x38d(fffffffee1c178f8)
  ffffff0003eddb50 qla_link_state_machine+0x22()
  ffffff0003eddb70 qla_timer+0x78()
  ffffff0003eddbd0 callout_execute+0xb1(fffffffec65e8000)
  ffffff0003eddc60 taskq_thread+0x1dc(fffffffec57f7698)
  ffffff0003eddc70 thread_start+8()

> [b]fffffffee1c178f8[/b]::rwlock
            ADDR      OWNER/COUNT FLAGS          WAITERS
fffffffee1c178f8 READERS=23058428717  B001 ffffff0003eddc80 (W)
                                      |
                  HAS_WAITERS --------+

>  [b]fffffffee1c178f8[/b]::mutex
            ADDR  TYPE             HELD MINSPL OLDSPL WAITERS
fffffffee1c178f8 adapt ffffff000425cc80      -      -     yes
>
 
 
This message posted from opensolaris.org
_______________________________________________
networking-discuss mailing list