Hi Randall, On 6/25/16 4:41 PM, Randall Stewart via freebsd-net wrote: > Ok > > Lets try this again with my source changed to my @freebsd.net :-) > > Now I am also attaching a patch for you Gleb, this will take some poking to > get in to your NF-head since it incorporates some changes we made earlier. > > I think this will fix the problem.. i.e. dealing with two locks in the > callout system (which it was > never meant to have done).. > > Note we probably can move the code to use the callout lock init now.. but > lets see if this works > on your setup on c096 and if so we can think about doing that.
Thanks for proposing a patch. I believe your patch will work with callout lock init, but not without: You still have a use-after-free issue on the tcpcb without callout lock init. The case being subtle as usual, let me try to describe that could happen: With your patch we have: void tcp_timer_keep(void *xtp) { struct tcpcb *tp = xtp; struct tcptemp *t_template; struct inpcb *inp; CURVNET_SET(tp->t_vnet); #ifdef TCPDEBUG int ostate; ostate = tp->t_state; #endif inp = tp->t_inpcb; KASSERT(inp != NULL, ("%s: tp %p tp->t_inpcb == NULL", __func__, tp)); INP_WLOCK(inp); if (callout_pending(&tp->t_timers->tt_keep) ### Use after free of tp here !callout_active(&tp->t_timers->tt_keep)) { INP_WUNLOCK(inp); CURVNET_RESTORE(); return; } ... The use-after-free scenario: [CPU 1] the callout fires, tcp_timer_keep entered [CPU 1] blocks on INP_WLOCK(inp); [CPU 2] schedules tcp_timer_keep with callout_reset() [CPU 2] tcp_discardcb called [CPU 2] tcp_timer_keep callout successfully canceled [CPU 2] tcpcb freed [CPU 1] unblocks, the tcpcb is used Then the tcpcb will used just after being freed... Might also crash or not depending in the case. Extra notes: o The invariant I see here is: The "callout successfully canceled" step should never happen when "the callout is currently being executed". o Solutions I see to enforce this invariant: - First solution: Use callout lock init with inp lock, your patch seems to permit that now. - Second solution: Change callout_async_drain() behavior: It can return 0 (fail) when the callout is currently being executed (no matter what). - Third solution: Don't trust callout_async_drain(callout) return value of 1 (success) if the previous call of callout_reset(callout) returned 0 (fail). That was the exact purpose of r284261 change, but this solution is also step backward in modernization of TCP timers/callout... https://svnweb.freebsd.org/base/stable/10/sys/netinet/tcp_timer.c?r1=284261&r2=284260&pathrev=284261 Hopefully my description is clear enough... -- Julien
signature.asc
Description: OpenPGP digital signature