James Carlson wrote:
Tom Chen writes:
Hi Paul,
Could you give more explanation of your previous reply? Do you mean that I should release all locks before re-scheduling the timer in the timeout handler?
[b]"Don't take a lock within your timeout function that's held across
scheduling it. That can lead to precisely the sort of cycle you're
experiencing...."[/b]

In my 1-second timer handler, after the job is done, it calls timeout() to re-schedule itself so that qla_timer() can be called again at the next second, but I did not release the mutex before calling timeout(). Is this the true origin of the deadlock?
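
For illustration, here is roughly what that pattern looks like (just a sketch; qla_state_t, qs_mutex and qs_timeout_id are hypothetical names, not taken from the actual driver):

#include <sys/types.h>
#include <sys/conf.h>
#include <sys/ksynch.h>
#include <sys/ddi.h>
#include <sys/sunddi.h>

typedef struct qla_state {
	kmutex_t	qs_mutex;
	timeout_id_t	qs_timeout_id;
	/* ... other per-instance state ... */
} qla_state_t;

static void
qla_timer(void *arg)
{
	qla_state_t *qs = arg;

	mutex_enter(&qs->qs_mutex);
	/* ... per-second housekeeping ... */

	/* Re-arm for one second from now, still holding qs_mutex. */
	qs->qs_timeout_id = timeout(qla_timer, qs, drv_usectohz(1000000));
	mutex_exit(&qs->qs_mutex);
}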

I don't believe that's the case.  If you look at the timeout(9F) man
page, only untimeout(9F) has such warnings on it.  It does because
it's possible that _during_ the call to untimeout(9F), you may block
because your callback is already running on another thread, and you'd
thus deadlock with yourself.  No such warning accompanies timeout(9F).
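
For reference, the hazard the untimeout(9F) man page warns about looks roughly like this (a sketch; my_state_t, s_lock, s_tid and my_watchdog are hypothetical names, with s_lock a kmutex_t and s_tid a timeout_id_t):

static void
my_watchdog(void *arg)
{
	my_state_t *sp = arg;

	mutex_enter(&sp->s_lock);	/* the callback needs s_lock ... */
	/* ... */
	mutex_exit(&sp->s_lock);
}

static void
my_detach_bad(my_state_t *sp)
{
	mutex_enter(&sp->s_lock);
	/*
	 * ... so blocking in untimeout() while holding s_lock can
	 * deadlock if my_watchdog is already running and waiting for
	 * s_lock.
	 */
	(void) untimeout(sp->s_tid);
	sp->s_tid = 0;
	mutex_exit(&sp->s_lock);
}

static void
my_detach_ok(my_state_t *sp)
{
	timeout_id_t tid;

	/* Safer: capture the id, drop the lock, then cancel. */
	mutex_enter(&sp->s_lock);
	tid = sp->s_tid;
	sp->s_tid = 0;
	mutex_exit(&sp->s_lock);
	if (tid != 0)
		(void) untimeout(tid);
}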

The usual reason for a deadlock is either a lock-ordering problem (A-B
in one thread, B-A in another) or the wrong priority for the mutex
(see mutex_init(9F)).
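
The A-B / B-A case, sketched with two hypothetical mutexes:

static kmutex_t lock_a;
static kmutex_t lock_b;

static void
thread_one(void)
{
	mutex_enter(&lock_a);
	mutex_enter(&lock_b);	/* blocks if thread_two already owns lock_b */
	/* ... */
	mutex_exit(&lock_b);
	mutex_exit(&lock_a);
}

static void
thread_two(void)
{
	mutex_enter(&lock_b);
	mutex_enter(&lock_a);	/* blocks waiting for thread_one: deadlock */
	/* ... */
	mutex_exit(&lock_a);
	mutex_exit(&lock_b);
}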
Hi James,

Yes, you are right. But it is quite easy for a developer to overlook the implicit lock dependencies introduced by timeout(9F) and the callout subsystem, for example:

Thread A: timeout -> callout_* -> a routine tries to grab an adaptive/driver mutex or rwlock, fails, and is blocked.
Thread B: holds the lock Thread A is waiting for, and calls a kernel API that sleeps and depends on callout processing.

I have a crash dump that was caused by two e1000g threads deadlocking each other:

Thread A: Similar to Tom Chen's issue, e1000g_LocalTimer was registered in the callout table by timeout(9F) and was called by callout_execute; it then blocked on an rwlock and slept. That rwlock was held by Thread B.

stack pointer for thread fffffe80000ddc80: fffffe80000dd9d0
[ fffffe80000dd9d0 _resume_from_idle+0xf8() ]
 fffffe80000dda10 swtch+0x167()
 fffffe80000ddab0 turnstile_block+0x76b(ffffffff93338a00, 1, ffffffff82f76288, fffffffffbc05908, 0, 0)
 fffffe80000ddb10 rw_enter_sleep+0x1de(ffffffff82f76288, 1)
 fffffe80000ddb40 e1000g_tx_cleanup+0x56(ffffffff82f76000)
 fffffe80000ddb80 e1000g_LocalTimer+0x19(ffffffff82f76000)
 fffffe80000ddbd0 callout_execute+0xb1(ffffffff80a3e000)
 fffffe80000ddc60 taskq_thread+0x1a7(ffffffff826be7a0)
 fffffe80000ddc70 thread_start+8()

Thread B: It was holding the rwlock. It called e1000g_m_tx to send packets, then called cv_timedwait, and slept forever. Because cv_timedwait is itself implemented with realtime_timeout, it has to wait for the callout table to be processed again, but Thread A was holding the callout table lock in callout_execute.

stack pointer for thread fffffe80044e5c80: fffffe80044e5880
[ fffffe80044e5880 _resume_from_idle+0xf8() ]
 fffffe80044e58c0 swtch+0x167()
 fffffe80044e5930 cv_timedwait+0xcf(ffffffff82f76390, ffffffff82f76388, 1036d)
 fffffe80044e59c0 cv_timedwait_sig+0x2cc(ffffffff82f76390, ffffffff82f76388, 1036d)
 fffffe80044e5a70 e1000g_send+0x136(ffffffff82f76370, ffffffffac2fce40)
 fffffe80044e5ab0 e1000g_m_tx+0x6f(ffffffff82f76000, ffffffffa21f8180)
 fffffe80044e5ad0 dls_tx+0x1d(ffffffff82f2ec80, ffffffffa21f8180)
 fffffe80044e5b20 dld_wsrv+0xcc(ffffffff894acb70)
 fffffe80044e5b50 runservice+0x42(ffffffff894acb70)
 fffffe80044e5b80 queue_service+0x42(ffffffff894acb70)
 fffffe80044e5bc0 stream_service+0x73(ffffffff83905740)
 fffffe80044e5c60 taskq_d_thread+0xbb(ffffffff833af820)
 fffffe80044e5c70 thread_start+8()


In the above situation, the deadlock happened between Thread A and Thread B.

Because of this, I think taking a lock in a timer routine that depends on callout processing is not safe, because driver developers might not know the internals of the kernel APIs. The safer way is to have the timer routine trigger a soft interrupt and do the work in the soft-interrupt handler, as sketched below.
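
A minimal sketch of that approach using the DDI soft-interrupt routines (my_state_t, my_softint, my_timer and the one-second interval are hypothetical, and error handling is trimmed): the timeout(9F) callback only triggers the soft interrupt and re-arms itself, while all lock acquisition happens in the soft-interrupt handler.

#include <sys/types.h>
#include <sys/conf.h>
#include <sys/ddi.h>
#include <sys/sunddi.h>

typedef struct my_state {
	dev_info_t		*s_dip;
	kmutex_t		s_lock;
	ddi_softint_handle_t	s_softint;
	timeout_id_t		s_tid;
} my_state_t;

/* Soft-interrupt handler: a safe place to take driver locks. */
static uint_t
my_softint(caddr_t arg1, caddr_t arg2)
{
	my_state_t *sp = (my_state_t *)arg1;

	mutex_enter(&sp->s_lock);
	/* ... per-second work that needs the driver lock ... */
	mutex_exit(&sp->s_lock);
	return (DDI_INTR_CLAIMED);
}

/* timeout(9F) callback: only trigger the softint and re-arm. */
static void
my_timer(void *arg)
{
	my_state_t *sp = arg;

	(void) ddi_intr_trigger_softint(sp->s_softint, NULL);
	/* (serialization of s_tid with the detach path omitted) */
	sp->s_tid = timeout(my_timer, sp, drv_usectohz(1000000));
}

/* Setup, e.g. from attach(9E). */
static int
my_timer_start(my_state_t *sp)
{
	if (ddi_intr_add_softint(sp->s_dip, &sp->s_softint,
	    DDI_INTR_SOFTPRI_DEFAULT, my_softint, (caddr_t)sp) != DDI_SUCCESS)
		return (DDI_FAILURE);
	sp->s_tid = timeout(my_timer, sp, drv_usectohz(1000000));
	return (DDI_SUCCESS);
}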

Max mentioned the cyclic APIs, but they do not seem to be public APIs for driver developers. And I think that, for CY_LOCK_LEVEL or CY_LOW_LEVEL, cyclic_softint will also trigger a soft interrupt to handle the registered routine.
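
For completeness, cyclic registration looks roughly like this (private, unstable kernel interfaces, sketched from memory only to show where cyh_level comes in; not something a third-party driver should depend on):

#include <sys/cyclic.h>
#include <sys/cpuvar.h>
#include <sys/time.h>

static void
my_cyclic_handler(void *arg)
{
	/*
	 * At CY_LOW_LEVEL this runs from a soft interrupt rather than
	 * directly at high-level interrupt context.
	 */
}

static cyclic_id_t
my_cyclic_start(void *arg)
{
	cyc_handler_t hdlr;
	cyc_time_t when;
	cyclic_id_t id;

	hdlr.cyh_func = my_cyclic_handler;
	hdlr.cyh_arg = arg;
	hdlr.cyh_level = CY_LOW_LEVEL;

	when.cyt_when = 0;		/* start as soon as possible */
	when.cyt_interval = NANOSEC;	/* fire once per second */

	mutex_enter(&cpu_lock);		/* cyclic_add() requires cpu_lock */
	id = cyclic_add(&hdlr, &when);
	mutex_exit(&cpu_lock);

	return (id);
}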

This is just my understanding of the issue; any comments are welcome. Thanks!