On Mon, 21 Feb 2005 15:24:21 EST, Anthony DiSante said:
> Or maybe it SHOULD have killed your process, in some "proper" way that 
> prevents any outstanding I/O requests from coming in days later and breaking 
> things.  Again, I'm no kernel hacker, but if an I/O request takes *3 days*, 
> isn't that an indication of a bug or of faulty hardware perhaps?

Right.  And if you're an automated program trying to clean up after a *bug*,
what do you do?  It's quite likely that somebody's borked a lock - in which
case it may even be that the "hung" process is the victim rather than the
culprit, and breaking the lock will just make things worse.  Similar issues
apply to *all* of the resources the wedged process has attached to it.
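
To make the victim-versus-culprit point concrete, here is a toy userspace sketch in Python.  It is only an analogy, not kernel code: the names (state, culprit, victim_sees) and the invariant a + b == 100 are made up for illustration, and the never-set event stands in for an I/O request that never completes.  The thread that *looks* hung is the one waiting on the lock; forcibly releasing the lock "unhangs" it, but hands it half-updated data.

```python
import threading

# Hypothetical shared state; the invariant is a + b == 100.
state = {"a": 50, "b": 50}
lock = threading.Lock()
wedged = threading.Event()   # set once the culprit is stuck mid-update
never = threading.Event()    # never set: stands in for I/O that never completes

def culprit():
    """Takes the lock, does half an update, then wedges while holding it."""
    lock.acquire()
    state["a"] -= 10         # first half of the update
    wedged.set()
    never.wait()             # stuck here, still holding the lock
    state["b"] += 10         # second half, never reached
    lock.release()

def victim_sees(timeout):
    """Return the state the victim observes, or None if it stays blocked."""
    if not lock.acquire(timeout=timeout):
        return None
    try:
        return dict(state)
    finally:
        lock.release()

t = threading.Thread(target=culprit, daemon=True)
t.start()
wedged.wait()

blocked = victim_sees(0.2)      # the victim appears "hung" on the lock
lock.release()                  # the "cleanup": break the lock from outside
after_break = victim_sees(0.2)  # the victim now runs, on inconsistent data
print(blocked, after_break)
```

The forced release works here only because a plain threading.Lock may be released by any thread.  The victim gets unstuck, but the invariant is gone, which is the "just make things worse" case.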

When these things get posted on lkml, it almost always involves quite a bit
of code introspection and scratching of heads before we figure out how the
system *got* its figurative head wedged into that crevice.  Until you
figure out how it *got* there, a safe cleanup is in general impossible.  And
we have yet to see an automated program that can introspect the code in that
much detail - even though the Stanford automated checker, sparse, and the like
are quite impressive pieces of work.

> > It's been covered before, look in the lkml archives for details.
> 
> Thanks, I'll do that.  But could you give me a more specific pointer? 

See the thread rooted here:
 
Date: Wed, 03 Nov 2004 07:51:39 -0500
From: Gene Heskett <[EMAIL PROTECTED]>
Subject: is killing zombies possible w/o a reboot?
Sender: [EMAIL PROTECTED]
To: linux-kernel@vger.kernel.org
Reply-to: [EMAIL PROTECTED]
Message-id: <[EMAIL PROTECTED]>
