Mircea Damian wrote:
>
> Hello,
>
> I was expecting to receive some replies to my last desperate messages:
> http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg35446.html
> http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg36591.html
>
> My machine is dying in add_timer(). It seems to happen only on SMP
> machines and is something related to the network driver. For some reason
> one of the timer lists gets broken so we (we are two people trying to
> solve this issue) wrote a "safe" timer.c which tries to rebuild the chain
> in case it hits a NULL pointer.
>
> The machine is (of course) slower with this patch but at least it works.
>
> Maybe someone can see which is the real bug and fix it.
>
> Please help!

I found (at least part of) the problem. In detach_timer() we test whether
the timer is pending. If it is not, the function does not remove the timer
from the list and returns 0. The functions that call detach_timer() do not
check the return value and unconditionally set the list pointers to NULL,
even though the timer is still on the list.

Patch against 2.4.3 attached, but there may be a better solution.

--
Brian Gerst
diff -urN linux-2.4.3/kernel/timer.c linux/kernel/timer.c
--- linux-2.4.3/kernel/timer.c	Thu Dec 14 20:52:22 2000
+++ linux/kernel/timer.c	Fri Apr 13 13:26:08 2001
@@ -194,6 +194,7 @@
 	if (!timer_pending(timer))
 		return 0;
 	list_del(&timer->list);
+	timer->list.next = timer->list.prev = NULL;
 	return 1;
 }
@@ -217,7 +218,6 @@
 	spin_lock_irqsave(&timerlist_lock, flags);
 	ret = detach_timer(timer);
-	timer->list.next = timer->list.prev = NULL;
 	spin_unlock_irqrestore(&timerlist_lock, flags);
 	return ret;
 }
@@ -246,7 +246,6 @@
 		spin_lock_irqsave(&timerlist_lock, flags);
 		ret += detach_timer(timer);
-		timer->list.next = timer->list.prev = 0;
 		running = timer_is_running(timer);
 		spin_unlock_irqrestore(&timerlist_lock, flags);
@@ -309,7 +308,6 @@
 			data= timer->data;
 			detach_timer(timer);
-			timer->list.next = timer->list.prev = NULL;
 			timer_enter(timer);
 			spin_unlock_irq(&timerlist_lock);
 			fn(data);
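
For anyone who wants to poke at the idea outside the kernel, here is a rough user-space sketch of what the patched detach_timer() does: the list pointers are cleared only when the timer was actually unlinked. It is not kernel code. The list helpers are simplified stand-ins written for this example, timer_pending() is assumed to test list.next against NULL, and detach_twice_demo() is a hypothetical helper that exists only for illustration. Locking is left out entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel list primitives (assumption:
 * modelled for illustration, not copied from the 2.4.3 sources). */
struct list_head {
	struct list_head *next, *prev;
};

struct timer_list {
	struct list_head list;
};

/* Insert entry right after head in a circular doubly linked list. */
static void list_add(struct list_head *entry, struct list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/* Unlink entry from its neighbours; entry's own pointers are left stale. */
static void list_del(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
}

/* Assumption: a timer counts as pending when list.next is non-NULL. */
static int timer_pending(const struct timer_list *timer)
{
	return timer->list.next != NULL;
}

/* Patched behaviour: clear the pointers only after a real list_del(). */
static int detach_timer(struct timer_list *timer)
{
	if (!timer_pending(timer))
		return 0;
	list_del(&timer->list);
	timer->list.next = timer->list.prev = NULL;
	return 1;
}

/* Hypothetical helper: queue one timer, detach it twice, and return 1
 * if the first call removed it and the second was a harmless no-op. */
static int detach_twice_demo(void)
{
	struct list_head head = { &head, &head };
	struct timer_list t = { { NULL, NULL } };
	int first, second;

	list_add(&t.list, &head);
	first = detach_timer(&t);   /* timer was queued: unlinked */
	second = detach_timer(&t);  /* timer not pending: nothing touched */

	return first == 1 && second == 0 &&
	       head.next == &head && head.prev == &head;
}
```

Calling detach_twice_demo() from a small main() should be enough to check that the head of the list ends up pointing back at itself after both calls. With the clearing done inside detach_timer(), callers no longer need to touch the list pointers themselves, which is the point of the three removed lines in the patch.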