On Mon, 2008-12-08 at 17:03 -0500, Peter Memishian wrote:
> During IPMP test execution, we hit an interesting deadlock. It's a little
> hard to explain without getting hip-deep in the details of the new
> implementation, but basically:
>
>   T1: running a timeout (mld_timeout_handler()), waiting to enter
>       the IPSQ for bge3 via ipsq_enter().
>
>   T2: finishing an IPMP "group leave" operation on bge3.
...
> I explored a few fixes, and the simplest seems to be refactor ipsq_exit()
> into two functions, one of which (ipsq_drain()) doesn't start the timers.
> That version is used from ipsq_dq(), which eliminates the possibility of
> the deadlock. Note that the timers will still be started when bge3's
> xop is exited. Please have a look. This is hairy stuff, so please ask
> questions if it doesn't make sense.
>
> http://zhadum.east.sun.com/ws/clearview/clearview-ipmpdev/webrev

It took me a while to go back and wrap my head around this code, but based
on our conversation yesterday, I think the fix makes sense. I don't see any
obvious issues. One tricky bit was figuring out how we can be sure that we
won't accidentally neglect to start the mld timers in some cases, but,
metaphorically speaking, I believe that the last one to leave the room will
indeed shut off the lights.

-Seb
