On Mon, 2008-12-08 at 17:03 -0500, Peter Memishian wrote:
> During IPMP test execution, we hit an interesting deadlock.  It's a little
> hard to explain without getting hip-deep in the details of the new
> implementation, but basically:
> 
>       T1: running a timeout (mld_timeout_handler()), waiting to enter
>           the IPSQ for bge3 via ipsq_enter().
> 
>       T2: finishing an IPMP "group leave" operation on bge3.
...
> I explored a few fixes, and the simplest seems to be to refactor ipsq_exit()
> into two functions, one of which (ipsq_drain()) doesn't start the timers.
> That version is used from ipsq_dq(), which eliminates the possibility of
> the deadlock.  Note that the timers will still be started when bge3's
> xop is exited.  Please have a look.  This is hairy stuff, so please ask
> questions if it doesn't make sense.
> 
>   http://zhadum.east.sun.com/ws/clearview/clearview-ipmpdev/webrev

It took me a while to go back and wrap my head around this code, and
based on our conversation yesterday, I think the fix makes sense.  I
don't see any obvious issues.  One tricky bit was figuring out how we're
sure that we won't accidentally neglect to start the mld timers in some
cases, but metaphorically speaking, I believe that the last one to leave
the room will indeed shut off the lights.
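
For anyone following along, here is a rough sketch of the split as I read
it from the description above.  It is only an illustration: the type and
the *_sketch names are made up, the "exit = drain + start timers" shape is
my assumption, and the real ipsq code in the webrev is far more involved.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for the real ipsq_t. */
typedef struct ipsq_sketch {
	int pending;         /* queued operations still waiting to run */
	int timers_started;  /* number of times the timers were (re)started */
} ipsq_sketch_t;

/* Run the queued operations, but leave the timers alone. */
void
ipsq_drain_sketch(ipsq_sketch_t *q)
{
	assert(q != NULL);
	while (q->pending > 0)
		q->pending--;   /* stands in for running one queued operation */
}

/* Full exit: drain, then (by assumption) restart the timers. */
void
ipsq_exit_sketch(ipsq_sketch_t *q)
{
	ipsq_drain_sketch(q);
	q->timers_started++;    /* stands in for restarting the timers */
}

/* Dequeue path: uses the drain-only variant, so no timers start here. */
void
ipsq_dq_sketch(ipsq_sketch_t *q)
{
	ipsq_drain_sketch(q);
}
```

In this toy model, calling ipsq_dq_sketch() empties the queue and leaves
timers_started untouched; only ipsq_exit_sketch() bumps it, which is the
"last one out shuts off the lights" behavior.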

-Seb


