Kevin:

Thanks for the sleuth work. It made it easy to see what was happening.

Some time ago another user had the problem with the timeout structure being zapped while the timeout routine was executing. Spinning on del_timer() was seen as the cure, but I now see that the routine does not really return a useful value for that purpose.

I have "fixed" this in such a way that there is no longer any spinning on the kernel del_timer() routine. What this means is that it is now possible that while your driver is calling untimeout() the timer could spring and your timeout routine get invoked.

Nothing inside LiS or the kernel will break if that happens. STREAMS driver code needs to protect itself against that eventuality with appropriate state variables.
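
As a rough illustration of what "appropriate state variables" could look like, here is a minimal sketch. Every name in it (drv_state, drv_arm, drv_cancel, drv_tmout) is made up for the example and is not part of LiS. It also assumes the flag is only touched under whatever lock the driver already uses, which the sketch leaves out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-driver private state; not an LiS structure. */
struct drv_state {
    bool tmout_wanted;  /* true while the driver still wants the timeout */
    int  expirations;   /* timeouts that were actually acted upon */
};

/* Arm: a real driver would also call timeout() here. */
static void drv_arm(struct drv_state *s)
{
    s->tmout_wanted = true;
}

/* Cancel: clear the flag first; a real driver would then call untimeout(). */
static void drv_cancel(struct drv_state *s)
{
    s->tmout_wanted = false;
}

/* Timeout routine: may still be invoked after drv_cancel(), so check first. */
static void drv_tmout(void *arg)
{
    struct drv_state *s = arg;

    assert(s != NULL);
    if (!s->tmout_wanted)
        return;             /* stale timeout that raced with the cancel */
    s->tmout_wanted = false;
    s->expirations++;       /* stand-in for the real timeout work */
}
```

With this arrangement a timeout that fires after the cancel is simply ignored, so the driver does not depend on untimeout() winning the race.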

The fix will be in the next release of LiS. I don't have a release date estimate just yet.

-- Dave

At 12:55 PM 7/3/2003 Thursday, Kevin M. Odell wrote:
Sorry about the formatting of the previous message - I'm using an
unfamiliar mail program (I was recently forced to switch to Outlook).
I hope this is better.

I'm getting kernel hangups (non-SMP 2.4.18 kernel) which appear to be in the 2.16 LiS streams.o module. If I turn on streams debugging, the last message displayed is:

lis_strioct(...,I_RECVDFD,...) >> wait terminated

(this is on a system that makes extensive use of streams from many different processes - there are 274 users of the streams modules - and is not easily reproducible, so I don't have a simple program to reproduce this).

With some further tracing, this seems to be hanging in lis_untmout(). When I looked into this, I found out that the timeout function (lis_do_rd_tmout() in this case) had already run and completed. Because of this del_timer() was always returning 0. This doesn't appear to function the way it's described in the comment below:

*******************************************

(this is from LiS-2.16 head/linux-mdep.c ... )

void
lis_untmout( struct timer_list *tl)
{
    /*
     * del_timer returns 0 if the timer's callback function is
     * executing.  We just spin until it finishes.
     */

    do {} while (!del_timer(tl));
}
*******************************************
I checked LiS-2.12, which we used in RH6.2, and this was just a direct call to del_timer().


Here are the relevant kernel functions (either from kernel/timer.c or include/linux/timer.h). These don't appear to have changed from the RH6.2 2.2 kernel to the RH7.3 2.4 kernel. Note that del_timer() sets timer->list.next to NULL no matter what detach_timer() returns, so unless someone else is resetting this (from the timer interrupt?), it's hard to see how this would ever return anything other than 0 after the first time it's called. (It appears to me that detach_timer() is called prior to the timeout function being invoked from run_timer_list(), so that the behavior of del_timer() should be to return 0 if the timeout function has already run or is currently being run.)

int del_timer(struct timer_list * timer)
{
        int ret;
        unsigned long flags;

        spin_lock_irqsave(&timerlist_lock, flags);
        ret = detach_timer(timer);
        timer->list.next = timer->list.prev = NULL;
        spin_unlock_irqrestore(&timerlist_lock, flags);
        return ret;
}

detach_timer is also in kernel/timer.c:

static inline int detach_timer (struct timer_list *timer)
{
        if (!timer_pending(timer))
                return 0;
        list_del(&timer->list);
        return 1;
}

timer_pending is in include/linux/timer.h:
static inline int timer_pending (const struct timer_list * timer)
{
        return timer->list.next != NULL;
}
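
The behavior described above can be checked outside the kernel with a small userspace model. This is only a sketch: the three functions are copies of the quoted code with the locking removed, and model_add_timer() and model_untmout() are hypothetical helpers added for the experiment (the latter is the LiS-2.16 spin loop with an iteration limit so that it can report a hang instead of hanging).

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel types; no locking. */
struct list_head { struct list_head *next, *prev; };
struct timer_list { struct list_head list; };

static int timer_pending(const struct timer_list *timer)
{
    return timer->list.next != NULL;
}

static int detach_timer(struct timer_list *timer)
{
    if (!timer_pending(timer))
        return 0;
    /* open-coded list_del() */
    timer->list.next->prev = timer->list.prev;
    timer->list.prev->next = timer->list.next;
    return 1;
}

static int del_timer(struct timer_list *timer)
{
    int ret = detach_timer(timer);

    timer->list.next = timer->list.prev = NULL;
    return ret;
}

/* Hypothetical helper: queue a timer at the front of a circular list. */
static void model_add_timer(struct list_head *head, struct timer_list *timer)
{
    assert(!timer_pending(timer));
    timer->list.next = head->next;
    timer->list.prev = head;
    head->next->prev = &timer->list;
    head->next = &timer->list;
}

/*
 * Hypothetical bounded version of the lis_untmout() spin.  Returns the
 * number of del_timer() calls needed, or -1 if the limit was reached,
 * meaning the unbounded loop would never exit.
 */
static int model_untmout(struct timer_list *timer, int limit)
{
    for (int i = 1; i <= limit; i++)
        if (del_timer(timer))
            return i;
    return -1;
}
```

In this model a still-pending timer is removed on the first call, but once the timer has been detached (as happens before the timeout function is invoked), every later del_timer() call returns 0 and the loop never exits on its own.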

Kevin O'Dell
[EMAIL PROTECTED]
303-538-1644


_______________________________________________ Linux-streams mailing list [EMAIL PROTECTED] http://gsyc.escet.urjc.es/mailman/listinfo/linux-streams

