Re: Unkillable processes?

2006-02-27 Thread Paul Lussier
Jerry Feldman [EMAIL PROTECTED] writes:

 On Sat, 25 Feb 2006 02:36:01 -0500
 [EMAIL PROTECTED] wrote:

 
 When I encounter processes which are unresponsive to kill -9, I find that
 this generally works:
 
   runlevel # say the current runlevel is 3
   telinit 1
   telinit 3
 This will almost always work, especially with zombie processes. 
 What you are doing is transitioning into single-user mode. 

Ahm yeah, that's not usually an option on a production server :)

 Then of course going to run level 6 tends to cure all ills :-)

Yes, yes it does.  I usually try to find what the problem is using ps
and lsof and a variety of other tricks to fix things before trying
runlevel 6.  On a production system, it's sometimes better to just
leave the system alone and wait if it's not causing any major
problems, as a reboot is often more disruptive than anything else.
-- 

Seeya,
Paul
___
gnhlug-discuss mailing list
gnhlug-discuss@mail.gnhlug.org
http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss


Re: Unkillable processes?

2006-02-26 Thread Ben Scott
On 2/25/06, Michael ODonnell [EMAIL PROTECTED] wrote:
 The kernel has no notions such as run level and single-user
 mode ...  So, to the extent that frobbing your run level has
 any effect at all on any wedged processes, it's a safe bet that
 it's a distant side effect of all the other flailing that's going on.

  See also: Shotgun debugging.

http://www.catb.org/jargon/html/S/shotgun-debugging.html

... the making of relatively undirected changes to software in the
hope that a bug will be perturbed out of existence ...

:-)

-- Ben


Re: Unkillable processes?

2006-02-25 Thread Jerry Feldman
On Sat, 25 Feb 2006 02:36:01 -0500
[EMAIL PROTECTED] wrote:

 
 When I encounter processes which are unresponsive to kill -9, I find that
 this generally works:
 
   runlevel # say the current runlevel is 3
   telinit 1
   telinit 3
This will almost always work, especially with zombie processes. 
What you are doing is transitioning into single-user mode. 

Many times, if you are in run level 5 (GUI in most systems),
transitioning to run level 3 (multi-user mode no GUI) will also do some
very good cleanup. Then of course going to run level 6 tends to cure
all ills :-)
-- 
Jerry Feldman [EMAIL PROTECTED]
Boston Linux and Unix user group
http://www.blu.org PGP key id:C5061EA9
PGP Key fingerprint:053C 73EC 3AC1 5C44 3E14 9245 FB00 3ED5 C506 1EA9




Re: Unkillable processes?

2006-02-25 Thread Bill McGonigle

On Feb 25, 2006, at 08:15, Jerry Feldman wrote:

 This will almost always work, especially with zombie processes.
 What you are doing is transitioning into single-user mode.

Any idea what the underlying mechanism is that makes this work?

-Bill

-
Bill McGonigle, Owner           Work: 603.448.4440
BFC Computing, LLC              Home: 603.448.1668
[EMAIL PROTECTED]               Cell: 603.252.2606
http://www.bfccomputing.com/    Page: 603.442.1833
Blog: http://blog.bfccomputing.com/
VCard: http://bfccomputing.com/vcard/bill.vcf



Re: Unkillable processes?

2006-02-25 Thread Jerry Feldman
On Sat, 25 Feb 2006 10:40:19 -0500
Bill McGonigle [EMAIL PROTECTED] wrote:

 On Feb 25, 2006, at 08:15, Jerry Feldman wrote:
 
  This will almost always work, especially with zombie processes.
  What you are doing is transitioning into single-user mode.
 
 Any idea what the underlying mechanism is that makes this work?
First, the kill(1) command calls the kill(2) system call with very
little else. 

In Unix and Linux there is effectively a process tree.
The /sbin/init(8) command is always process number 1 and is The Mother
of All Processes. When you boot into run level 1 (Single User Mode)
there should be no daemons running, and no file systems mounted other
than root, unless you manually mounted them. The only other user
processes running are generally those associated with the superuser. 
(There will be some kernel processes running, but those show in square
brackets [kthread].)

When your system transitions into multi-user mode (either run level 3
or 5), a number of daemon processes are executed as well as some
gettys (virtual terminals). When you start a process from your shell,
that process inherits not only that shell's environment, but its
ownership as whoever is logged in. In the GUI mode, you have the X
server running, then a display manager, then the window manager, and
when you cause a process to start it has an ancestry. When you
transition down to single user mode, the system kills everything. Most
unkillable processes tend to be locked into an effective deadlock that
is resolved when you get rid of some of the peers and ancestors. 
Transitioning to single user mode also shuts down the network. 
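
The ancestry idea can be made concrete with a small Python sketch that
follows parent links upward until it reaches PID 1. It assumes a Linux
/proc layout; the helper names and the sample PIDs are made up for
illustration only.

```python
import os

def read_ppid(pid):
    """Parent PID from /proc/<pid>/stat (Linux); 0 if the entry is unreadable."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            # The command name is parenthesised and may contain spaces,
            # so only look at what follows the last ')'.
            fields = f.read().rsplit(")", 1)[1].split()
    except OSError:
        return 0
    return int(fields[1])  # fields after the name: state, ppid, ...

def ancestry(pid, parent_of=read_ppid):
    """Follow parent links from pid up towards init (PID 1)."""
    chain = [pid]
    while chain[-1] > 1:
        parent = parent_of(chain[-1])
        if parent <= 0 or parent in chain:
            break  # no visible parent (or a loop in bad data): stop here
        chain.append(parent)
    return chain

# Hypothetical tree: init(1) -> login(700) -> shell(812) -> job(900)
SAMPLE_PARENTS = {900: 812, 812: 700, 700: 1}
print(ancestry(900, lambda p: SAMPLE_PARENTS.get(p, 0)))  # [900, 812, 700, 1]
print(ancestry(os.getpid()))  # live chain for this interpreter
```

Inside a container the live chain may stop before PID 1, because the
parent can live in a PID namespace the process cannot see.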

However, there can be cases where a process is locked in an
indefinite I/O wait. These are very rare, and even single
user mode won't always clear them. The most common indefinite I/O
wait that I have seen is NFS where the network or NFS server gets
hosed. In this case, going to single user will clear it. 

-- 
Jerry Feldman [EMAIL PROTECTED]


Re: Unkillable processes?

2006-02-25 Thread Michael ODonnell


  This will almost always work, especially with zombie processes.
  What you are doing is transitioning into single-user mode.

 Any idea what the underlying mechanism is that makes this work?


The kernel has no notions such as run level and single-user
mode - that's all state that's managed strictly in user space -
and there's nothing special or sacred about any of those modes.
You can do everything in single-user mode that you can do in any
other mode, provided you're prepared to do a bunch of stuff by
hand that's normally automated.  The different modes are simply
different rules governing which things are automated.

Meanwhile, a process that's wedged or unresponsive is
one that's unable to make progress until certain conditions
(managed by/in the kernel) are met, such conditions most often being
associated (as others have mentioned) with device state where,
too often, we're at the mercy of badly coded drivers :-(

So, to the extent that frobbing your run level has any effect
at all on any wedged processes, it's a safe bet that it's a
distant side effect of all the other flailing that's going on.

I'm not saying that such frobbing never leads to that Fresh
Feeling(tm), just that the cause-effect relationship is anything
but straightforward or reliable.  It always seems like a defeat
when I'm reduced to such superstitious measures because it feels
so similar to the Therapeutic Reboots certain other OS's require.

BTW, the term zombie is usually reserved for the specific
case of a process that has already exited but has not yet been
reaped - its parent (or init, for orphans) has not yet done
a wait() on it, which is necessary for its process slot to
be freed.  I don't think that's what's being discussed here.
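
A short Python sketch shows that definition in action: the child exits,
sits as a zombie until the parent waits on it, and then disappears. It
assumes Linux (it reads /proc and uses os.waitid with WNOWAIT); the
function names are invented for this example.

```python
import os

def proc_state(pid):
    """One-letter state from /proc/<pid>/stat, or None if there is no entry."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            data = f.read()
    except OSError:
        return None
    # Skip past the parenthesised command name, which may contain spaces.
    return data.rsplit(")", 1)[1].split()[0]

def zombie_demo(exit_code=7):
    pid = os.fork()
    if pid == 0:
        os._exit(exit_code)  # child: exit straight away
    # Block until the child has exited, but leave it unreaped (WNOWAIT).
    os.waitid(os.P_PID, pid, os.WEXITED | os.WNOWAIT)
    state_before = proc_state(pid)  # 'Z' is expected here on Linux
    _, status = os.waitpid(pid, 0)  # the wait() that frees the process slot
    state_after = proc_state(pid)   # the /proc entry should now be gone
    return state_before, os.WEXITSTATUS(status), state_after

if __name__ == "__main__":
    print(zombie_demo())
```

Sending kill -9 to the child while it is in the Z state changes nothing,
since it has already exited; only the parent's wait() removes it.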
 


Re: Unkillable processes?

2006-02-24 Thread aluminumsulfate

When I encounter processes which are unresponsive to kill -9, I find that
this generally works:

  runlevel # say the current runlevel is 3
  telinit 1
  telinit 3

For some reason, init seems to clean them up when nothing else does.



Re: Unkillable processes?

2006-02-19 Thread Ben Scott
On 2/18/06, Paul Lussier [EMAIL PROTECTED] wrote:
 That means they're in uninterruptible sleep ... Bad hardware or
 buggy device drivers are the most common cause of a process
 stuck in this state.

 Or, an NFS server from which this system mounted a file system has
 gone off the net.

  Oh, yah, that.  I don't have to use NFS very often.  Thankfully.  :)

 ... any process which stats all mounted file systems (think df, ls,
 etc.) hangs and can't be killed.

  Won't mounting the NFS filesystems with the soft,intr options
prevent that from happening in the first place?  (The can't be
killed part -- obviously, they'll still choke if they try to contact
a dead server.)

-- Ben


Re: Unkillable processes?

2006-02-19 Thread Paul Lussier
Bill McGonigle [EMAIL PROTECTED] writes:

 On Feb 18, 2006, at 21:26, Paul Lussier wrote:

  Once that's done,
 configure the NIC to use the exact same IP as the NFS server which
 suddenly disappeared.

 Did you try to bring up a virtual interface on the problem machine
 with the original server's IP?  Just curious if this would work.  If
 so, it might be near-trivial to script such a thing by divining the
 destination IP, export, etc from mount, /proc, netstat, lsof and
 friends and wrap it all up in an nfskill(8) program to deal with the
 interface, route, etc.

I didn't, but that's because the wedged client in this case was my
primary NFS server which was NFS mounting a file system from our
'dogfood' system.  We (I should specify I actually had nothing to do
with this...) took down our test system to move it without making sure
all systems were properly unmounted, which resulted in wedging my
primary NFS server. 

I don't see why this wouldn't have worked though, and I intend to find
out this week sometime using some of our test equipment. I'll let you
know!
-- 

Seeya,
Paul


Re: Unkillable processes?

2006-02-19 Thread Ben Scott
On 2/19/06, Paul Lussier [EMAIL PROTECTED] wrote:
   Won't mounting the NFS filesystems with the soft,intr options
 prevent that from happening in the first place?

 If you have the O'Reilly book for NFS and NIS ...

  Let's assume I don't.  Please explain.

  In the past, when working with NFS, soft,intr got me the behavior I
wanted -- problems not on my machine didn't hang my machine, and
failures were propagated up to higher levels as I/O errors (which they
are).  I do recall increasing some timeout or other based on
somebody's recommendation at the time.  Granted, I've never used NFS
all that much.  Most of my experience with it was at UNH, where hung
NFS mounts were a source of frustration.  Again granted, that
environment (the Space Science Center) wasn't a pillar of best
practices, so maybe the whole setup was just hosed.
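
For reference, the behaviour being discussed can be summarised in a few
lines of Python. This is only a sketch: the function name is made up, it
looks at nothing but the soft and intr options, and the wording of each
outcome follows what is described in this thread rather than any
particular kernel version.

```python
def nfs_mount_behaviour(options):
    """Summarise how an NFS mount with these options reacts to a dead server."""
    opts = set(options.split(","))
    soft = "soft" in opts  # "hard" is the default when neither is given
    intr = "intr" in opts
    if soft:
        return "soft: I/O fails with an error once the retries run out"
    if intr:
        return "hard,intr: I/O retries forever, but a signal can interrupt it"
    return "hard: I/O retries forever and the process cannot be killed"

for opts in ("rw,soft,intr", "rw,hard,intr", "rw"):
    print(opts, "->", nfs_mount_behaviour(opts))
```

The trade-off with soft is that applications see I/O errors they may not
handle well, which is why it is a choice rather than a default.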

  I do recall that on a diskless workstation, soft,intr doesn't get
you anything, since if NFS is out you've lost all your filesystems and
you're hosed anyway (so you might as well wait forever), but diskless
workstations are *so* passe.  ;-)

  (I do have a copy of the book *somewhere*, but I can't find it right
now.  (I suspect it's in my box o' books at the office, which I will
likely need an excavator to exhume.))

-- Ben


The dread Stale NFS file handle (was: Unkillable processes?)

2006-02-19 Thread Michael ODonnell


I have a devious, wave-the-dead-chicken voodoo hack that
will sometimes save the day when you're suffering the dread
Stale NFS file handle error.  It only unwedges NFS clients,
and only sometimes, but when you're foaming at the mouth with
frustration it's a good trick to know.  (Hey, Paul - you may
be amused to know that I developed this while working at MCLX
around the time you were managing the NFS servers... ;-)


 The following explanation is presented from my perspective where
 I'm trying to regain access to my NFS mounted home directory
 but should be applicable in other contexts if you tweak the
 appropriate things:

 ASSUME:
  - Home directory resides on serverBox:/aaa/bbb/ccc/modonnell
  - Home directory normally mounted on client system(s) locally
    as: /h/modonnell
  - serverBox has suffered some fault leading to the dread
    Stale NFS file handle error, preventing access to my
    home directory but also preventing unmount/remount ops.

 HACK:
  - Create new directory on client machine /tmp/staleHack
  - mount serverBox:/aaa/bbb/ccc/modonnell  /tmp/staleHack
  - ls -la /tmp/staleHack
  - ls -la /h/modonnell
  - umount /tmp/staleHack
  - rmdir  /tmp/staleHack

 Home directory /h/modonnell should once again be accessible.
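
 The HACK steps can be written down as a dry run in Python, which only
 prints the commands and executes nothing. The function name is invented
 for this sketch, and using mkdir for the first step is an assumption
 about how the directory gets created.

```python
import shlex

def stale_handle_hack(export, wedged_mountpoint, scratch="/tmp/staleHack"):
    """Return the HACK steps as argument vectors, in order."""
    return [
        ["mkdir", scratch],                # assumed way to create the directory
        ["mount", export, scratch],        # second mount of the same export
        ["ls", "-la", scratch],
        ["ls", "-la", wedged_mountpoint],  # the stale mount point
        ["umount", scratch],
        ["rmdir", scratch],
    ]

steps = stale_handle_hack("serverBox:/aaa/bbb/ccc/modonnell", "/h/modonnell")
for argv in steps:
    print(shlex.join(argv))  # dry run: print only
```

 Running the real commands needs root, and as noted above the trick only
 works some of the time.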
 


Re: Unkillable processes?

2006-02-18 Thread Paul Lussier
Ben Scott [EMAIL PROTECTED] writes:

 On 2/17/06, Dan Coutu [EMAIL PROTECTED] wrote:
 On a Red Hat 9 system I've encountered a situation where there are two
 processes that I cannot kill when using kill -9 (or any other value, for
 that matter.)

   Do a ps aux and note their status.  It's D, right?  That means
 they're in uninterruptible sleep -- waiting for a system call to
 finish something that cannot be interrupted.  The D stood for
 driver or disk originally.  Bad hardware or buggy device drivers
 are the most common cause of a process stuck in this state.

Or, an NFS server from which this system mounted a file system has
gone off the net.  I've seen this quite often.  You've got some system
which exports a file system via NFS, it goes down, and on all clients
which NFS mount from this server, any process which stats all mounted
file systems (think df, ls, etc.) hangs and can't be killed.


 The only thing you can do is wait or reboot the system.

This isn't entirely true, *especially* if the cause is as I just
described.  If it is in fact a down NFS server from which the client
didn't properly unmount the file system before it went away, there
is a cure.  That cure is to bring the NFS server back online.
However, it may be that you can't bring *that* NFS server back online
for some reason, at least not any time soon.  In that case, find some
other system, configure it as an NFS server, and export *something*
under the same name as the file system which is wedged.  Once that's
done, configure the NIC to use the exact same IP as the NFS server which
suddenly disappeared.  This won't fool NFS entirely, but just enough
such that the client will get a 'stale NFS file handle' error, and
allow you to umount the file system.  

I recently used this trick in exactly this way.  If this isn't enough
detail, let me know, and I'll send out the postmortem I mailed to my
dev group (which isn't all that much more detailed than this e-mail :)


 If the syscalls ever complete, the kernel will immediately process
 the kill signals you sent, so those processes are dead, they just
 don't know it yet.  :)

And that's what this NFS spoofing trick does.  It essentially allows
the processes enough room to complete and die :)
-- 

Seeya,
Paul


Re: Unkillable processes?

2006-02-18 Thread Steven W. Orr
On Friday, Feb 17th 2006 at 13:58 -0500, quoth Dan Coutu:

=Just to add more confusion to the mix, or maybe a useful clue, the system load
=average is about 4 but top shows 97% system idle time. Strange.

You've gotten good advice, I just wanted to add one thing. On Linux, the 
load average is a moving average of the number of processes that are 
either runnable or in uninterruptible sleep (state D). It just means 
that you have about four processes which are either sitting in the run 
queue or blocked on I/O, which does not consume much CPU.
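
A rough Python sketch of that arithmetic, reading a line in the format
of /proc/loadavg (three load averages, runnable/total tasks, last PID).
The sample line is made up to resemble the numbers in this thread, and
the "blocked" figure is only an estimate, since the load is an average
over time while the runnable count is an instant snapshot.

```python
def parse_loadavg(text):
    """Parse one line in /proc/loadavg format into a dict."""
    one, five, fifteen, running_total, last_pid = text.split()
    running, total = running_total.split("/")
    return {
        "load1": float(one),
        "load5": float(five),
        "load15": float(fifteen),
        "runnable": int(running),
        "total": int(total),
        "last_pid": int(last_pid),
    }

SAMPLE = "4.02 3.97 3.80 1/212 2410"  # hypothetical, not from Dan's system
info = parse_loadavg(SAMPLE)
# Load counts runnable plus uninterruptible tasks, so the difference is a
# rough guess at how many tasks are stuck waiting on I/O.
blocked_estimate = info["load1"] - info["runnable"]
print(f"load {info['load1']}, runnable {info['runnable']}, "
      f"roughly {blocked_estimate:.0f} blocked")
```

On a live system the same function can be fed the contents of
/proc/loadavg instead of the sample string.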

-- 
Time flies like the wind. Fruit flies like a banana. Stranger things have  .0.
happened but none stranger than this. Does your driver's license say Organ ..0
Donor?Black holes are where God divided by zero. Listen to me! We are all- 000
individuals! What if this weren't a hypothetical question?
steveo at syslang.net


Re: Unkillable processes?

2006-02-18 Thread Bill McGonigle


On Feb 18, 2006, at 21:26, Paul Lussier wrote:

 Once that's done,
 configure the NIC to use the exact same IP as the NFS server which
 suddenly disappeared.

Did you try to bring up a virtual interface on the problem machine with 
the original server's IP?  Just curious if this would work.  If so, it 
might be near-trivial to script such a thing by divining the 
destination IP, export, etc from mount, /proc, netstat, lsof and 
friends and wrap it all up in an nfskill(8) program to deal with the 
interface, route, etc.
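
The "divining" step might start out like the Python sketch below, which
pulls the server, export and mount point out of text in /proc/mounts
format. It is not the proposed nfskill(8), just the first piece of one;
the function name, the sample lines and the 192.0.2.10 address are
illustrative assumptions.

```python
def nfs_mounts(mounts_text):
    """Return (server, export, mountpoint, options) for each NFS mount line."""
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        # Assumes a hostname or IPv4 server; IPv6 literals contain colons
        # and would need more careful splitting.
        server, _, export = fields[0].partition(":")
        found.append((server, export, fields[1], fields[3].split(",")))
    return found

# Hypothetical sample in /proc/mounts format.
SAMPLE = """\
/dev/hda1 / ext3 rw 0 0
serverBox:/aaa/bbb/ccc/modonnell /h/modonnell nfs rw,hard,addr=192.0.2.10 0 0
"""

for server, export, mountpoint, options in nfs_mounts(SAMPLE):
    print(server, export, mountpoint, ",".join(options))
```

Reading /proc/mounts itself is safe on a wedged client, since it does
not touch the hung file system the way df or ls would.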


-Bill



Unkillable processes?

2006-02-17 Thread Dan Coutu

Okay, here's a strange one.

On a Red Hat 9 system I've encountered a situation where there are two 
processes that I cannot kill when using kill -9 (or any other value, for 
that matter.)


What's the deal with that? The only kind of process that I've ever run 
across that I could not kill was a zombie and neither one of these is a 
zombie.


Just to add more confusion to the mix, or maybe a useful clue, the 
system load average is about 4 but top shows 97% system idle time. Strange.


Any ideas?

Dan


Re: Unkillable processes?

2006-02-17 Thread Neil Schelly
On Friday 17 February 2006 01:58 pm, Dan Coutu wrote:
 Okay, here's a strange one.

 On a Red Hat 9 system I've encountered a situation where there are two
 processes that I cannot kill when using kill -9 (or any other value, for
 that matter.)
The processes could be stuck waiting on I/O, maybe trying to access an NFS 
share that is hard-mounted without the intr option set?

 Just to add more confusion to the mix, or maybe a useful clue, the
 system load average is about 4 but top shows 97% system idle time. Strange.
That's suspicious, but I suppose not entirely impossible.  Have you done 
anything like chkrootkit on it, just for kicks?
-Neil


Re: Unkillable processes?

2006-02-17 Thread Mark Komarinski
On Fri, Feb 17, 2006 at 02:35:15PM -0500, Neil Schelly wrote:
 On Friday 17 February 2006 01:58 pm, Dan Coutu wrote:
  Okay, here's a strange one.
 
  On a Red Hat 9 system I've encountered a situation where there are two
  processes that I cannot kill when using kill -9 (or any other value, for
  that matter.)
 The processes could be stuck waiting on I/O, maybe trying to access an NFS 
 share that is hard-mounted without the intr option set?
 
  Just to add more confusion to the mix, or maybe a useful clue, the
  system load average is about 4 but top shows 97% system idle time. Strange.
 That's suspicious, but I suppose not entirely impossible.  Have you done 
 anything like chkrootkit on it, just for kicks?

I/O waits will cause the load to be high even while the CPU is mostly idle.

-Mark




Re: Unkillable processes?

2006-02-17 Thread Ben Scott
On 2/17/06, Dan Coutu [EMAIL PROTECTED] wrote:
 On a Red Hat 9 system I've encountered a situation where there are two
 processes that I cannot kill when using kill -9 (or any other value, for
 that matter.)

  Do a ps aux and note their status.  It's D, right?  That means
they're in uninterruptible sleep -- waiting for a system call to
finish something that cannot be interrupted.  The D stood for
driver or disk originally.  Bad hardware or buggy device drivers
are the most common cause of a process stuck in this state.  The only
thing you can do is wait or reboot the system.

  If the syscalls ever complete, the kernel will immediately process
the kill signals you sent, so those processes are dead, they just
don't know it yet.  :)
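
  The same check can be done with a few lines of Python instead of by
eye. This is a minimal sketch that filters ps-style text for a state
field starting with D. The column layout assumed is the one produced by
"ps -eo pid,stat,comm", and the sample rows are invented, not taken from
Dan's machine.

```python
def d_state_rows(ps_output):
    """Return (pid, stat, command) for rows whose STAT field starts with 'D'."""
    rows = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        parts = line.split(None, 2)
        if len(parts) == 3 and parts[1].startswith("D"):
            rows.append((int(parts[0]), parts[1], parts[2]))
    return rows

# Hypothetical sample in the layout of "ps -eo pid,stat,comm".
SAMPLE = """\
  PID STAT COMMAND
    1 Ss   init
 2301 D    tar
 2302 D+   df
 2410 R+   ps
"""

for pid, stat, command in d_state_rows(SAMPLE):
    print(pid, stat, command)
```

  Processes pass through D briefly during normal I/O, so a row only
matters if the same PID is still there on repeated runs.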

-- Ben


Re: Unkillable processes?

2006-02-17 Thread Jerry Feldman
On Friday 17 February 2006 1:58 pm, Dan Coutu wrote:
 Okay, here's a strange one.

 On a Red Hat 9 system I've encountered a situation where there are two
 processes that I cannot kill when using kill -9 (or any other value, for
 that matter.)

 What's the deal with that? The only kind of process that I've ever run
 across that I could not kill was a zombie and neither one of these is a
 zombie.

 Just to add more confusion to the mix, or maybe a useful clue, the
 system load average is about 4 but top shows 97% system idle time.
 Strange.
We ran into the same type of thing on a RHEL 3.0 Update 4 system a few weeks 
ago. I think we decided that it was waiting on I/O that did not complete. 

-- 
Jerry Feldman [EMAIL PROTECTED]


Re: Unkillable processes?

2006-02-17 Thread Dan Coutu

Ben Scott wrote:

 On 2/17/06, Dan Coutu [EMAIL PROTECTED] wrote:
  On a Red Hat 9 system I've encountered a situation where there are two
  processes that I cannot kill when using kill -9 (or any other value, for
  that matter.)

   Do a ps aux and note their status.  It's D, right?  That means
 they're in uninterruptible sleep -- waiting for a system call to
 finish something that cannot be interrupted.  The D stood for
 driver or disk originally.  Bad hardware or buggy device drivers
 are the most common cause of a process stuck in this state.  The only
 thing you can do is wait or reboot the system.

   If the syscalls ever complete, the kernel will immediately process
 the kill signals you sent, so those processes are dead, they just
 don't know it yet.  :)

 -- Ben

Hmm, the I/O wait seems likely. We've been having trouble with an IOMega 
REV 10 disk autoloader ever since we bought the thing. Even swapped it 
out for a new one but still get flaky behavior. Maybe it's time to send 
the thing back...


Dan


Re: Unkillable processes?

2006-02-17 Thread Ben Scott
On 2/17/06, Dan Coutu [EMAIL PROTECTED] wrote:
 Hmm, the I/O wait seems likely. We've been having trouble with an IOMega
 REV 10 disk autoloader ever since we bought the thing.

  Yikes!  I've never, ever encountered an IOMega product that didn't
suck in some major way.  No wonder it doesn't work.  Maybe the kernel
is just refusing to have anything to do with such a crummy product on
principle.

-- Ben "Click of death" Scott


Re: Unkillable processes?

2006-02-17 Thread Bill McGonigle

On Feb 17, 2006, at 15:55, Ben Scott wrote:

 I've never, ever encountered an IOMega product that didn't
 suck in some major way.  No wonder it doesn't work.

Hey, my first linux box ran off a 150MB Bernoulli drive hooked up to my 
soundblaster.

That was before Iomega gave up on Bernoulli effect media for the mass 
market, of course.


-Bill
