On Mon, 21 Feb 2005 15:24:21 EST, Anthony DiSante said:

> Or maybe it SHOULD have killed your process, in some "proper" way that
> prevents any outstanding I/O requests from coming in days later and breaking
> things.  Again, I'm no kernel hacker, but if an I/O request takes *3 days*,
> isn't that an indication of a bug or of faulty hardware perhaps?
Right.  And if you're an automated program trying to clean up after a *bug*,
what do you do?  It's quite likely that somebody's borked a lock - in which
case it may even be that the "hung" process is the victim rather than the
culprit, and breaking the lock will just make things worse.  Similar issues
apply to *all* of the resources the wedged process has attached to it.

When these things get posted on lkml, it almost always takes quite a bit of
code introspection and head-scratching before we figure out how the system
*got* its figurative head wedged into that crevice.  Until you figure out how
it *got* there, a safe cleanup is in general impossible.  And we haven't yet
seen an automatic program that can introspect the code to that level of
detail - even though the Stanford automated checker, sparse, and the like are
quite impressive pieces of work.

> > It's been covered before, look in the lkml archives for details.
>
> Thanks, I'll do that.  But could you give me a more specific pointer?

See the thread rooted here:

Date: Wed, 03 Nov 2004 07:51:39 -0500
From: Gene Heskett <[EMAIL PROTECTED]>
Subject: is killing zombies possible w/o a reboot?
Sender: [EMAIL PROTECTED]
To: linux-kernel@vger.kernel.org
Reply-to: [EMAIL PROTECTED]
Message-id: <[EMAIL PROTECTED]>