[tipc-discussion] [PATCH net-next v3 1/1] tipc: fix crash during node removal

2016-02-19, Jon Maloy
We have identified a race condition during TIPC module unloading that allows a node's reference counter to drop to zero, and the node instance to be freed, before the node timer has finished accessing it. This leads to occasional crashes, especially in multi-namespace environments. The scenario

Re: [tipc-discussion] [PATCH net-next v3 1/1] tipc: fix crash during node removal

2016-02-19, Jon Maloy
Ying, This is my current view on what is the best and simplest solution to this. If you still disagree or think you see problems, I am looking forward to a patch from you. Regards ///jon On 02/19/2016 08:16 PM, Jon Maloy wrote: > When the TIPC module is unloaded, we have identified a race condit

Re: [tipc-discussion] [PATCH net-next v3 1/1] tipc: fix crash during node removal

2016-02-22, Xue, Ying
Hi Jon, I think the scenario described below is not true. This is because del_timer() doesn't return 1 while node_timeout() is being called. Please take a look at __run_timers() defined in kernel/time/timer.c. When a timer is expired, __run_timers() will call the timeout function attached to th

Re: [tipc-discussion] [PATCH net-next v3 1/1] tipc: fix crash during node removal

2016-02-23, jason
On Feb 22, 2016 11:22 PM, "Xue, Ying" wrote: > > Hi Jon, > > I think the scenario described below is not true. This is because del_timer() doesn't return 1 while node_timeout() is being called. Please take a look at __run_timers() defined in kernel/time/timer.c. When a timer is expired, __run_time

Re: [tipc-discussion] [PATCH net-next v3 1/1] tipc: fix crash during node removal

2016-02-23, Xue, Ying
Even if the node timeout function restarts the node timer with mod_timer(), I don’t see any problem here. If you are concerned about how the node timer and its refcount are managed, please look at another similar example which demonstrates how to maintain the relationship between sk_timer and sk_ref

Re: [tipc-discussion] [PATCH net-next v3 1/1] tipc: fix crash during node removal

2016-02-23, Jon Maloy
Ying, del_timer() definitely returns 1 after the timer function has called mod_timer(), which is the case here. After mod_timer() has been called, the timer is active, regardless of its previous state and of mod_timer()'s return value. So, in the brief interval between mod_timer() and tipc_node_get() this scenario c

Re: [tipc-discussion] [PATCH net-next v3 1/1] tipc: fix crash during node removal

2016-02-23, jason
On Feb 23, 2016 6:20 PM, "Xue, Ying" wrote: > > Even if node timeout function restarts the node timer with mod_timer(), I don’t see any problem here. If you concern the usage about dealing with node timer and its refcount, please look at another similar example which can demonstrate how to maintai

Re: [tipc-discussion] [PATCH net-next v3 1/1] tipc: fix crash during node removal

2016-02-24, Xue, Ying
Hi Jon, Thanks for your clear explanation. Yes, I misunderstood the scenario. You are right, and its solution is much simpler and safer than before. Please go ahead. Thanks, Ying From: Jon Maloy [mailto:jon.ma...@ericsson.com] Sent: February 23, 2016 21:17 To: Xue, Ying; jason Cc: Jon Maloy; Richard A

Re: [tipc-discussion] [PATCH net-next v3 1/1] tipc: fix crash during node removal

2016-02-24, Xue, Ying
Ack-by: Ying Xue -----Original Message----- From: Jon Maloy [mailto:jon.ma...@ericsson.com] Sent: February 20, 2016 9:17 To: tipc-discussion@lists.sourceforge.net; parthasarathy.bhuvara...@ericsson.com; Xue, Ying; richard.a...@ericsson.com; jon.ma...@ericsson.com; huzhiji...@gmail.com Cc: ma...@donjo