Mircea Damian wrote:
>
> Hello,
>
> I was expecting to receive some replies to my last desperate messages:
> http://www.mail-archive.com/[email protected]/msg35446.html
> http://www.mail-archive.com/[email protected]/msg36591.html
>
> My machine is dying in add_timer(). It seems to happen only on SMP
> machines and is something related to the network driver. For some reason
> one of the timer lists gets broken so we (we are two people trying to
> solve this issue) wrote a "safe" timer.c which tries to rebuild the chain
> in case it hits a NULL pointer.
>
> The machine is (of course) slower with this patch but at least it works.
>
> Maybe someone can see which is the real bug and fix it.
>
> Please help!

I found (at least part of) the problem. In detach_timer() we test whether
the timer is pending. If it is not, the function does not remove the
timer from the list and returns 0. The functions that call
detach_timer() do not check the return value and unconditionally set the
list pointers to NULL, even though the timer is still on the list.

Patch against 2.4.3 attached, but there may be a better solution.
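
For anyone who wants to try the idea outside the kernel, here is a minimal user-space sketch of the patched detach_timer(). It is only an illustration: the list_head and timer_list structures are cut-down stand-ins, not the real kernel definitions, there is no locking, and detach_twice_is_safe() is a hypothetical helper that exists only for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures (assumption: only the
 * list linkage matters for this illustration). */
struct list_head {
	struct list_head *next, *prev;
};

struct timer_list {
	struct list_head list;
};

/* Insert new_entry right after head in a circular doubly linked list. */
static void list_add(struct list_head *new_entry, struct list_head *head)
{
	new_entry->next = head->next;
	new_entry->prev = head;
	head->next->prev = new_entry;
	head->next = new_entry;
}

/* Unlink entry from its list; the entry's own pointers are left stale. */
static void list_del(struct list_head *entry)
{
	assert(entry->next != NULL && entry->prev != NULL);
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
}

/* A timer counts as pending while its list pointers are non-NULL. */
static int timer_pending(const struct timer_list *timer)
{
	return timer->list.next != NULL;
}

/* Patched form: the pointers are cleared here, and only when the timer
 * was actually unlinked, so callers no longer touch them. */
static int detach_timer(struct timer_list *timer)
{
	if (!timer_pending(timer))
		return 0;
	list_del(&timer->list);
	timer->list.next = timer->list.prev = NULL;
	return 1;
}

/* Hypothetical self-check: detach the same timer twice and verify that
 * the remaining list is still consistent. Returns 1 on success. */
int detach_twice_is_safe(void)
{
	struct list_head head = { &head, &head };
	struct timer_list a = { { NULL, NULL } };
	struct timer_list b = { { NULL, NULL } };

	list_add(&a.list, &head);
	list_add(&b.list, &head);	/* head -> b -> a -> head */

	int first = detach_timer(&a);	/* really unlinks a */
	int second = detach_timer(&a);	/* a is not pending: no-op */

	return first == 1 && second == 0 && !timer_pending(&a) &&
	       head.next == &b.list && head.prev == &b.list &&
	       b.list.next == &head && b.list.prev == &head;
}
```

Keeping the pointer reset inside detach_timer() means the "was it really removed?" decision and the reset happen in one place, which is the point of the patch.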
--
Brian Gerst
diff -urN linux-2.4.3/kernel/timer.c linux/kernel/timer.c
--- linux-2.4.3/kernel/timer.c Thu Dec 14 20:52:22 2000
+++ linux/kernel/timer.c Fri Apr 13 13:26:08 2001
@@ -194,6 +194,7 @@
 	if (!timer_pending(timer))
 		return 0;
 	list_del(&timer->list);
+	timer->list.next = timer->list.prev = NULL;
 	return 1;
 }
@@ -217,7 +218,6 @@
 	spin_lock_irqsave(&timerlist_lock, flags);
 	ret = detach_timer(timer);
-	timer->list.next = timer->list.prev = NULL;
 	spin_unlock_irqrestore(&timerlist_lock, flags);
 	return ret;
 }
@@ -246,7 +246,6 @@
 		spin_lock_irqsave(&timerlist_lock, flags);
 		ret += detach_timer(timer);
-		timer->list.next = timer->list.prev = 0;
 		running = timer_is_running(timer);
 		spin_unlock_irqrestore(&timerlist_lock, flags);
@@ -309,7 +308,6 @@
 			data= timer->data;
 			detach_timer(timer);
-			timer->list.next = timer->list.prev = NULL;
 			timer_enter(timer);
 			spin_unlock_irq(&timerlist_lock);
 			fn(data);