On Tue, 22 Feb 2005, Anthony DiSante wrote:

Helge Hafting wrote:
The infrastructure for that does not exist, so instead, the "killed" process remains. Not all of it, but at least the memory pinned down by the io request. This overhead is typically small, and the overhead of adding forced io abort to every driver might be larger than a handful of stuck processes. It looks ugly, but perhaps a ps flag that hides the ugly processes is enough.

I don't care about any overhead associated with stuck processes, nor do I care that they look ugly in the ps output. What I care about is the fact that at least once a week on multiple systems with different hardware, some HW-related driver/process gets stuck, then immediately cascades its stuckness up to udevd or hald, and then I can't use any of my hardware anymore until I reboot.


-Anthony DiSante
http://nodivisions.com/

You don't seem to understand. A process that's stuck in 'D' state shows a SEVERE error, usually with a hardware driver. For instance, somebody may have coded something in a critical section that will wait forever for some bit to be set when, in fact, that bit may never be set because of a hardware glitch. Such problems must be found. One can't just suck some process out of the 'D' state.

So, you need to tell what driver was doing what. If you can't,
then you need to provide enough information so that developers
may guess. For instance, if you get a process stuck in the 'D'
state when you use a CD/ROM, but not when you use IDE or SCSI
or whatever, then you have a good guess that there is some
"wait-forever" code in the CD/ROM driver.
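
One way to collect that information is to note which tasks
are in 'D' when the hang happens. What follows is only a
minimal sketch, assuming the usual /proc/<pid>/stat layout
of "pid (comm) state ..."; the helper name stat_line_state
is made up for illustration:

```c
#include <string.h>

/*
 * Return the one-letter task state from a single /proc/<pid>/stat
 * line, or '\0' if the line does not look like "pid (comm) state ...".
 * (Hypothetical helper, not part of any existing tool.)
 */
char stat_line_state(const char *line)
{
    /*
     * The command name is wrapped in parentheses and may itself
     * contain spaces or ')', so look for the LAST ')' in the line.
     */
    const char *close = strrchr(line, ')');

    if (close == NULL || close[1] != ' ' || close[2] == '\0')
        return '\0';

    return close[2];    /* e.g. 'R', 'S', 'D', 'Z' */
}
```

Feed it each /proc/<pid>/stat line and report the pids that
come back 'D'. Where the kernel provides /proc/<pid>/wchan,
its contents name the function the task is sleeping in, which
is worth including in a report.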

So, let's suppose that you had a problem with your CD/ROM.
You could eject it by hand and see if the process that
was stuck is no longer stuck, or you might be able to
power it OFF then ON. If this got a process "unstuck"
it might give the CD/ROM driver developer a hint as
to where to look in his code. No code is ever supposed
to wait forever for some hardware, but there are some
possibilities (races and whatever), that can effectively
wait forever. These possibilities need to be discovered
and fixed.
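
To make "never wait forever" concrete, here is a user-space
sketch of a bounded poll. It is not taken from any real
driver: STATUS_READY, the read_status_fn hook and the two
fake devices are all assumptions made so that the example
runs without hardware.

```c
#include <stdbool.h>
#include <stdint.h>

#define STATUS_READY 0x01u  /* hypothetical "ready" bit */

/* Hypothetical hook standing in for a hardware register read. */
typedef uint32_t (*read_status_fn)(void *ctx);

/*
 * Poll for STATUS_READY at most max_polls times.
 * Returns true if the bit was seen, false if we gave up.
 */
bool wait_ready_bounded(read_status_fn read_status, void *ctx,
                        unsigned max_polls)
{
    for (unsigned i = 0; i < max_polls; i++) {
        if (read_status(ctx) & STATUS_READY)
            return true;
    }
    return false;   /* caller must handle the timeout as an error */
}

/* Fake device whose ready bit never gets set (the "glitch" case). */
uint32_t dead_device(void *ctx)
{
    (void)ctx;
    return 0;
}

/* Fake device that becomes ready on the third read. */
uint32_t slow_device(void *ctx)
{
    unsigned *reads = ctx;
    return ++*reads >= 3 ? STATUS_READY : 0;
}
```

With dead_device the loop gives up and returns false instead
of spinning forever. A real driver would bound the wait by
time rather than by iteration count (for example with jiffies
and time_after()), and would also have to release whatever it
holds on the error path.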

The 'D' state is uninterruptible sleep (ps describes it as
"usually IO"), typically a task that was 'down()' on a
semaphore. To get out of that state, that task (and none
other) needs to execute 'up()'.
This means that whatever that task was waiting for
needs to happen or it won't call 'up()'. The nature
of these mutexes requires that the thread that
acquired the semaphore be the same thread that
released it, otherwise we don't have a MUTEX.
So, there is no way that "somebody else" can
"fix" the task thread waiting with the MUTEX held.

There has been some discussion that these hung
states could be "fixed", but that's absolutely
positively incorrect. If you have a MUTEX that
"times out" or is otherwise breakable, you can't
use it to provide a single execution path to
a shared resource which is what these things
are used for in the first place.
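
A toy model of why a breakable lock stops being a lock. This
is a single-threaded simulation, not kernel code, and every
name in it (breakable_lock, try_acquire, break_and_acquire)
is hypothetical:

```c
#include <stdbool.h>

/* Hypothetical lock that a waiter is allowed to break. */
struct breakable_lock {
    int owner;  /* 0 = free, otherwise the id of the holder */
};

/* Normal acquire: succeeds only if nobody holds the lock. */
bool try_acquire(struct breakable_lock *l, int task)
{
    if (l->owner != 0)
        return false;
    l->owner = task;
    return true;
}

/* The "fix": a waiter that timed out takes the lock anyway. */
void break_and_acquire(struct breakable_lock *l, int task)
{
    l->owner = task;
}

/*
 * Task 1 enters the critical section and stalls on hardware.
 * Task 2 times out and breaks the lock. Returns how many tasks
 * are inside the critical section at that point.
 */
int tasks_in_critical_section_after_break(void)
{
    struct breakable_lock lock = { 0 };
    int inside = 0;

    if (try_acquire(&lock, 1))
        inside++;               /* task 1 is in and never leaves */

    if (!try_acquire(&lock, 2)) {
        break_and_acquire(&lock, 2);
        inside++;               /* task 2 is now in as well */
    }

    return inside;
}
```

The function returns 2: both tasks are in the section that
the lock was supposed to serialize, and if task 1 ever wakes
up it will carry on as though it still had exclusive access.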

Cheers,
Dick Johnson
Penguin : Linux version 2.6.10 on an i686 machine (5537.79 BogoMips).
 Notice : All mail here is now cached for review by Dictator Bush.
                 98.36% of all statistics are fiction.