On Thu, 2005-02-03 at 21:18 -0500, Steven Rostedt wrote:

> Here's the problem: If I raise the priorities of the processes before
> zeroing the semaphore, the machine hangs.  If I zero the semaphore
> before raising the priorities, the program runs fine.

Hi Ingo,

I've looked into this and found where the deadlock occurs. Actually it
is a starving of processes. I finally got around to trying it on
2.6.11-rc3, and it doesn't have a problem.

Anyway, here's the scoop. 

The parent process that zeros the semaphore to allow the child process
to run after the parent upped the priority of the child really high,
gets stuck in the following:

In update_queue in ipc/sem.c (I don't have the correct line numbers,
cause I haven't stripped out the debug yet). at the point of:
                        q->status = IN_WAKEUP;
                        /*
                         * Continue scanning. The next operation
                         * that must be checked depends on the type of the
                         * completed operation:
                         * - if the operation modified the array, then
                         *   restart from the head of the queue and
                         *   check for threads that might be waiting
                         *   for semaphore values to become 0.
                         * - if the operation didn't modify the array,
                         *   then just continue.
                         */
                        if (q->alter)
                                n = sma->sem_pending;
                        else
                                n = q->next;
                        wake_up_process(q->sleeper);

Here is where it locks up.

                        /* hands-off: q will disappear immediately after
                         * writing q->status.
                         */
                        q->status = error;


I also found that the high priority child is running in an infinite
while loop in sys_semtimedop in the same file, at the following:

        while(unlikely(error == IN_WAKEUP)) {
                cpu_relax();
                error = queue.status;
        }

So, what looks to be happening is that as soon as the parent wakes up
the child, the child preempts the parent, and hits this while loop. But
since the child is a realtime task, with the highest priority of the
system, it starves the system. Of course this is a UP and I don't think
this will show a problem on an SMP machine. 

I can't think of a solution right now, so I'll just pass it on to
you ;-).  Once again it's late and I'm going to bed.


Thanks,

-- Steve


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Reply via email to