From: Jon Maloy <jon.ma...@ericsson.com>
Date: Wed, 24 Feb 2016 11:10:48 -0500

> When the TIPC module is unloaded, we have identified a race condition
> that allows a node reference counter to go to zero and the node instance
> to be freed before the node timer has finished accessing it. This
> leads to occasional crashes, especially in multi-namespace environments.
>
> The scenario goes as follows:
>
> CPU0:(node_stop)                 CPU1:(node_timeout)  // ref == 2
>
> 1:                               if(!mod_timer())
> 2: if (del_timer())
> 3:   tipc_node_put()                                  // ref -> 1
> 4: tipc_node_put()                                    // ref -> 0
> 5:   kfree_rcu(node);
> 6:                               tipc_node_get(node)
> 7:                               // BOOM!
>
> We now clean up this functionality as follows:
>
> 1) We remove the node pointer from the node lookup table before we
>    attempt deactivating the timer. This way, we reduce the risk that
>    tipc_node_find() may obtain a valid pointer to an instance marked
>    for deletion; a harmless but undesirable situation.
>
> 2) We use del_timer_sync() instead of del_timer() to safely deactivate
>    the node timer without any risk that it might be reactivated by the
>    timeout handler. There is no risk of deadlock here, since the two
>    functions never touch the same spinlocks.
>
> 3) We remove a pointless tipc_node_get() + tipc_node_put() from the
>    timeout handler.
>
> Reported-by: Zhijiang Hu <huzhiji...@gmail.com>
> Acked-by: Ying Xue <ying....@windriver.com>
> Signed-off-by: Jon Maloy <jon.ma...@ericsson.com>

Applied.
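
For readers following the reference counting in the quoted scenario, here is a minimal single-threaded sketch that replays the two orderings in plain user-space C. It is not the kernel code or the patch itself. Every name in it (struct model_node, model_get(), model_put(), model_del_timer_sync(), replay_old_race(), replay_fixed_order()) is hypothetical. It assumes the two initial references are held on behalf of the lookup table and the timer. The "sync" step is modeled by simply running the in-flight handler to completion before returning.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a node; NOT the kernel's struct tipc_node. */
struct model_node {
    int  ref;            /* reference counter */
    bool freed;          /* set when ref reaches zero */
    bool timer_pending;  /* timer is currently queued */
    bool hashed;         /* still reachable through the lookup table */
    int  use_after_free; /* accesses observed after the node was freed */
};

/* Record any access to a node that has already been freed. */
static void model_touch(struct model_node *n)
{
    if (n->freed)
        n->use_after_free++;
}

static void model_get(struct model_node *n)
{
    model_touch(n);
    n->ref++;
}

static void model_put(struct model_node *n)
{
    assert(n->ref > 0);
    if (--n->ref == 0)
        n->freed = true;          /* stands in for kfree_rcu(node) */
}

/* Replays steps 1-7 of the quoted scenario in program order. */
int replay_old_race(void)
{
    /* The handler is running on CPU1, so the timer is not pending. */
    struct model_node n = { .ref = 2, .hashed = true };

    /* 1 (CPU1): mod_timer() rearms and reports "was not pending". */
    bool was_pending = n.timer_pending;
    n.timer_pending = true;

    /* 2 (CPU0): del_timer() finds the freshly rearmed timer. */
    if (n.timer_pending) {
        n.timer_pending = false;
        model_put(&n);            /* 3: ref -> 1 */
    }
    model_put(&n);                /* 4: ref -> 0, 5: node freed */

    /* 6 (CPU1): handler acts on the stale result from step 1. */
    if (!was_pending)
        model_get(&n);            /* 7: touches freed memory */

    return n.use_after_free;
}

/* Assumed shape of the fixed handler: no extra get/put, just rearm. */
static void model_timeout_fixed(struct model_node *n)
{
    model_touch(n);
    n->timer_pending = true;
}

/* Models del_timer_sync(): wait for the running handler, then cancel. */
static void model_del_timer_sync(struct model_node *n)
{
    model_timeout_fixed(n);       /* in-flight handler completes first */
    n->timer_pending = false;     /* a rearm by the handler is undone */
}

/* Replays the ordering described in points 1) to 3). */
int replay_fixed_order(void)
{
    struct model_node n = { .ref = 2, .hashed = true };

    n.hashed = false;             /* 1) unhash before touching the timer */
    model_del_timer_sync(&n);     /* 2) handler can no longer run */
    model_put(&n);                /* drop the timer's reference */
    model_put(&n);                /* drop the table's reference, node freed */

    return n.use_after_free;
}
```

In this model replay_old_race() reports one access after the free, while replay_fixed_order() reports none, because the last reference is dropped only once the handler can no longer run. A real reproduction would need two CPUs and the actual timer API; the sketch only checks the reference accounting.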