Re: kill: cannot kill some processes

2001-03-04 Thread Cory Snavely
 On Sat, Mar 03, 2001 at 08:52:36AM -0500, Cory Snavely wrote:
  Right now on a big Solaris machine of mine I have about a dozen zombied
  Perls--parent process (Apache) long gone, and when I -9ed them, their PPIDs
  became 1 (init). Classic zombie.

 Hrrrm?  Not quite.  Init eventually inherits zombie children (when the
 parent dies), but init reaps the dead children.  Perhaps your children
 aren't dead?

Brian, you're right. Now that I look more closely, they're in sleep state.
If I just knew why...

  Problem is, these Perls are running scripts off a software RAID, and thus
  have it locked. This happened before--when I reboot the server to get rid of
  the zombies, or some other reason, the filesystem won't unmount, won't get a
  clean flag, and therefore will force fsck on reboot. As it's over 100GB, a
  full fsck takes several hours.
 
  Now maybe there's something I don't know to recover from this cleanly, or
  maybe Linux handles it a different way, but it seems like this is an example
  of zombies causing a real problem. If anyone knows a way around it, I'd be
  real grateful!

 Doesn't sound like a zombie to me.  A zombie has -no- open files and goes
 away as soon as init inherits it.  A zombie is in state 'Z' on ps.

 What you describe sounds more like something in state 'D', which is
 waiting for IO to complete.  (This can happen on NFS when things break in
 just the wrong way for some reason.)  They're not zombies because they're
 not dead yet (they need to release their files before they are really
 dead).

 For processes stuck in a 'D' state, there is very little you can do about
 them.  You may be able to sneak out of re-fscking by remounting the drive
 read-only before rebooting, though.

Yeah, that's what I was thinking. Thanks.





Re: kill: cannot kill some processes

2001-03-03 Thread William T Wilson
On Fri, 2 Mar 2001, Ron Peterson wrote:

  away.  They don't consume any CPU time, or any other resources other than
  the slot in the process table and the less than 1K of memory required to
...
 Not entirely true.  Init can inherit enough zombie processes that it
 hits its process limit (1024, if I remember correctly).  Can you

Well, like I said, they do still take up the slot in the process table.

Zombies that *have* been inherited by init go away - it's those that are
still waiting for their parent process to check their status that pile up.  
Init itself doesn't have any limit on the number of zombies it can clean
up, otherwise it would be a problem with any long uptime system.

 'shutdown'?  Nope.  Not unless you can free up a slot.  And if
 something's going haywire and spawning zombies quickly, this can be a
 problem.

Linux reserves some process slots for root, so unless your haywire program
is running as root you are at least partly shielded from this.
Control-Alt-Delete should still be able to reboot the system in such a
case (or you can log in as root on the console), if it comes to that.

 Not a common occurrence, though...

This is true :}



Re: kill: cannot kill some processes

2001-03-03 Thread Cory Snavely
  One thing about zombie processes: Don't worry about trying to make them go
  away.  They don't consume any CPU time, or any other resources other than
  the slot in the process table and the less than 1K of memory required to
  hold their state information.  They are not worth worrying about.

 Not entirely true.  Init can inherit enough zombie processes that it
 hits its process limit (1024, if I remember correctly).  Can you
 'shutdown'?  Nope.  Not unless you can free up a slot.  And if
 something's going haywire and spawning zombies quickly, this can be a
 problem.

 Not a common occurrence, though...

Seconded, although for a different reason and based on an experience on
Solaris.

Right now on a big Solaris machine of mine I have about a dozen zombied
Perls--parent process (Apache) long gone, and when I -9ed them, their PPIDs
became 1 (init). Classic zombie.

Problem is, these Perls are running scripts off a software RAID, and thus
have it locked. This happened before--when I reboot the server to get rid of
the zombies, or some other reason, the filesystem won't unmount, won't get a
clean flag, and therefore will force fsck on reboot. As it's over 100GB, a
full fsck takes several hours.

Now maybe there's something I don't know to recover from this cleanly, or
maybe Linux handles it a different way, but it seems like this is an example
of zombies causing a real problem. If anyone knows a way around it, I'd be
real grateful!

c




Re: kill: cannot kill some processes

2001-03-03 Thread brian moore
On Sat, Mar 03, 2001 at 08:52:36AM -0500, Cory Snavely wrote:
 Right now on a big Solaris machine of mine I have about a dozen zombied
 Perls--parent process (Apache) long gone, and when I -9ed them, their PPIDs
 became 1 (init). Classic zombie.

Hrrrm?  Not quite.  Init eventually inherits zombie children (when the
parent dies), but init reaps the dead children.  Perhaps your children
aren't dead?

 Problem is, these Perls are running scripts off a software RAID, and thus
 have it locked. This happened before--when I reboot the server to get rid of
 the zombies, or some other reason, the filesystem won't unmount, won't get a
 clean flag, and therefore will force fsck on reboot. As it's over 100GB, a
 full fsck takes several hours.
 
 Now maybe there's something I don't know to recover from this cleanly, or
 maybe Linux handles it a different way, but it seems like this is an example
 of zombies causing a real problem. If anyone knows a way around it, I'd be
 real grateful!

Doesn't sound like a zombie to me.  A zombie has -no- open files and goes
away as soon as init inherits it.  A zombie is in state 'Z' on ps.

What you describe sounds more like something in state 'D', which is
waiting for IO to complete.  (This can happen on NFS when things break in
just the wrong way for some reason.)  They're not zombies because they're
not dead yet (they need to release their files before they are really
dead).

For processes stuck in a 'D' state, there is very little you can do about
them.  You may be able to sneak out of re-fscking by remounting the drive
read-only before rebooting, though.

-- 
CueCat decoder .signature by Larry Wall:
#!/usr/bin/perl -n
printf "Serial: %s Type: %s Code: %s\n", map { tr/a-zA-Z0-9+-/ -_/; $_ = unpack
'u', chr(32 + length()*3/4) . $_; s/\0+$//; $_ ^= "C" x length; } /\.([^.]+)/g;



Re: kill: cannot kill some processes

2001-03-02 Thread Ron Peterson
William T Wilson wrote:
 
 On Thu, 22 Feb 2001, brian moore wrote:
 
   does the process list Z under STAT ? if it is the process has gone
   zombied and i don't think there is much you can do. sometimes zombie'd
   processes die on their own eventually many times they will not die until
   you reboot ..
 
  Not quite true... zombies don't ever die: they're already dead.
 
 While the description of zombie processes is accurate, I think another
 likely situation is that the process is in uninterruptible sleep, i.e.
 the 'D' state.  This happens when a process is blocked in a system call -
 it will be 'D' until the kernel function returns.  Kernel bugs, hardware
 problems, and dead NFS mounts can cause these kernel functions to take
 a long time or forever.
 
 In such a case, you really are stuck; unless the resource the process is
 waiting for frees up, it's going to hang around until a reboot.
 
 One thing about zombie processes: Don't worry about trying to make them go
 away.  They don't consume any CPU time, or any other resources other than
 the slot in the process table and the less than 1K of memory required to
 hold their state information.  They are not worth worrying about.

Not entirely true.  Init can inherit enough zombie processes that it
hits its process limit (1024, if I remember correctly).  Can you
'shutdown'?  Nope.  Not unless you can free up a slot.  And if
something's going haywire and spawning zombies quickly, this can be a
problem.

Not a common occurrence, though...

-Ron-
GPG and other info at: http://www.yellowbank.com/



Re: kill: cannot kill some processes

2001-02-23 Thread brian moore
On Thu, Feb 22, 2001 at 09:59:47PM -0800, Nate Amsden wrote:
 Brian Stults wrote:
  
  As the subject indicates, there are some processes that hang and cannot
  be killed.  Specifically, occasionally dselect will hang while trying to
  install a package.  After waiting for a long time, I try both Ctrl-c and
  Ctrl-z and neither will work.  Then I try to kill the process from
  another xterm.  It looks as if the kill worked, but when I do a ps, the
  job is still there.  It also just happened with df.  I use this to kill:
  
  kill -9 [pid]
  
  Any suggestions?
 
 does the process list Z under STAT ? if it is the process has gone
 zombied and i don't think there is much you can do. sometimes zombie'd
 processes die on their own eventually many times they will not die until
 you reboot ..

Not quite true... zombies don't ever die: they're already dead.

So what are they doing there?

Simple: every process returns some stuff to its parent: time used, and
a bunch of other stuff, most importantly the exit status it passed to
'exit()'.

A zombie is a process which has died, but its parent has not collected
that information.  It has to be held until the parent collects it.  With
a proper parent process, it will notice the child died and call wait()
and collect the return code and anything else it needs.

If the parent dies without collecting the return status, the zombie is
inherited by init.  init is smart enough to collect the status and let
the zombie go to the grave.

So the bug in a 'zombie' is in the process that spawned it.  Kill that
parent process, and its zombie children will be inherited by init and go
away.  Most importantly, though, it is a bug and should be fixed.  It's
not difficult to write code that correctly reaps dead children.
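
As a rough illustration of what reaping dead children can look like, here
is a minimal Python sketch (not code from any program discussed in this
thread). It installs a SIGCHLD handler that calls waitpid() in a
non-blocking loop, so every exited child is collected promptly. The names
reaped, reap_children, install_reaper and run_children are made up for
the example.

```python
import os
import signal
import time

# pid -> exit status of every child collected so far (hypothetical bookkeeping)
reaped = {}

def reap_children(signum=None, frame=None):
    """Collect every child that has already exited, without blocking."""
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            return  # this process has no children at all
        if pid == 0:
            return  # children exist, but none of them has exited yet
        # WEXITSTATUS is only meaningful for children that called exit()
        reaped[pid] = os.WEXITSTATUS(status)

def install_reaper():
    """Ask the kernel to notify us whenever a child changes state."""
    signal.signal(signal.SIGCHLD, reap_children)

def run_children(exit_codes, timeout=5.0):
    """Fork one short-lived child per exit code and wait for the handler."""
    install_reaper()
    pids = []
    for code in exit_codes:
        pid = os.fork()
        if pid == 0:
            os._exit(code)  # child: exit immediately
        pids.append(pid)
    deadline = time.monotonic() + timeout
    while not all(p in reaped for p in pids) and time.monotonic() < deadline:
        time.sleep(0.01)  # the SIGCHLD handler runs while we sleep
    return {pid: reaped.get(pid) for pid in pids}
```

Calling run_children([1, 2, 3]) should hand back one exit status per child,
and none of the three should be left behind as a zombie. The loop inside the
handler matters because several children exiting close together may produce
only one SIGCHLD.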




Re: kill: cannot kill some processes

2001-02-23 Thread William T Wilson
On Thu, 22 Feb 2001, brian moore wrote:

  does the process list Z under STAT ? if it is the process has gone
  zombied and i don't think there is much you can do. sometimes zombie'd
  processes die on their own eventually many times they will not die until
  you reboot ..
 
 Not quite true... zombies don't ever die: they're already dead.

While the description of zombie processes is accurate, I think another
likely situation is that the process is in uninterruptible sleep, i.e.
the 'D' state.  This happens when a process is blocked in a system call -
it will be 'D' until the kernel function returns.  Kernel bugs, hardware
problems, and dead NFS mounts can cause these kernel functions to take
a long time or forever.

In such a case, you really are stuck; unless the resource the process is
waiting for frees up, it's going to hang around until a reboot.
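
To tell the two cases apart in practice, one option is to filter ps output
on the first letter of the STAT column. The following Python sketch assumes
the procps-style column list "pid,ppid,stat,comm"; other systems (Solaris,
for instance) spell the columns differently. The function names and the
sample listing are invented for the example.

```python
import subprocess

def stuck_processes(ps_output):
    """Return (pid, ppid, stat, command) for each process in state D or Z.

    D = uninterruptible sleep (usually waiting for I/O)
    Z = zombie (exited, but not yet reaped by its parent)
    """
    rows = []
    for line in ps_output.splitlines()[1:]:  # skip the header line
        fields = line.split(None, 3)
        if len(fields) < 4:
            continue  # blank or malformed line
        pid, ppid, stat, command = fields
        if stat[0] in "DZ":
            rows.append((int(pid), int(ppid), stat, command))
    return rows

def snapshot():
    """Run ps with an explicit column list (procps syntax, an assumption)."""
    return subprocess.run(
        ["ps", "-eo", "pid,ppid,stat,comm"],
        capture_output=True, text=True, check=True,
    ).stdout

# Invented sample listing, for illustration only.
SAMPLE = """  PID  PPID STAT COMMAND
    1     0 Ss   init
  412     1 D    df
  950   900 Z    perl <defunct>
"""
```

stuck_processes(SAMPLE) picks out the df in state D and the defunct perl in
state Z; stuck_processes(snapshot()) does the same for the live system. For
a Z entry the PPID column shows which parent has failed to call wait().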

One thing about zombie processes: Don't worry about trying to make them go
away.  They don't consume any CPU time, or any other resources other than
the slot in the process table and the less than 1K of memory required to
hold their state information.  They are not worth worrying about.



Re: kill: cannot kill some processes

2001-02-23 Thread Brian May
 William == William T Wilson writes:

William One thing about zombie processes: Don't worry about trying
William to make them go away.  They don't consume any CPU time,
William or any other resources other than the slot in the process
William table and the less than 1K of memory required to hold
William their state information.  They are not worth worrying
William about.

My understanding is that zombie processes occur when the task dies,
but the parent hasn't called waitpid(...) on that process yet.

The Linux kernel needs to keep track of the process so it can return
the process's status, in case the parent does call waitpid(...).

If you see a zombie persist, then the parent process is probably slow
or buggy, as it has not yet done the right thing.

Of course, you could argue that the penalty is small, but IMHO, it is
still a bug if the parent doesn't clean up after its children.
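
The life cycle described above can be watched from the parent's side. This
is a small Python sketch under the assumption of a Linux system (os.waitid
is not available everywhere); zombie_demo is a made-up name. The child exits
with status 7, lingers as a zombie that kill -9 cannot affect, and
disappears only once the parent calls waitpid().

```python
import os
import signal

def zombie_demo():
    """Fork a child, let it die, and show it lingers until it is reaped."""
    pid = os.fork()
    if pid == 0:
        os._exit(7)  # child: exit at once with an arbitrary status

    # Block until the child has exited, but leave it unreaped (WNOWAIT).
    # From here on it is a zombie: dead, with only its status kept around.
    os.waitid(os.P_PID, pid, os.WEXITED | os.WNOWAIT)

    # kill -9 on a zombie "succeeds" but changes nothing: it is already dead.
    os.kill(pid, signal.SIGKILL)
    os.kill(pid, 0)  # signal 0 only probes; no error means the PID exists

    # Reaping: collecting the status frees the process-table slot.
    _, status = os.waitpid(pid, 0)
    exit_code = os.WEXITSTATUS(status)

    try:
        os.kill(pid, 0)
        gone = False
    except ProcessLookupError:
        gone = True  # the PID no longer refers to any process
    return exit_code, gone
```

zombie_demo() returns the child's exit code and whether its PID vanished
after the waitpid() call. The exit code is still 7 even though SIGKILL was
sent, because the signal arrived after the child was already dead.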
-- 
Brian May



Re: kill: cannot kill some processes

2001-02-23 Thread Colin Watson
William T Wilson wrote:
On Thu, 22 Feb 2001, brian moore wrote:

  does the process list Z under STAT ? if it is the process has gone
  zombied and i don't think there is much you can do. sometimes zombie'd
  processes die on their own eventually many times they will not die until
  you reboot ..
 
 Not quite true... zombies don't ever die: they're already dead.

While the description of zombie processes is accurate, I think another
likely situation is that the process is in uninterruptible sleep, i.e.
the 'D' state.  This happens when a process is blocked in a system call -
it will be 'D' until the kernel function returns.  Kernel bugs, hardware
problems, and dead NFS mounts can cause these kernel functions to take
a long time or forever.

A hint for NFS: mount your NFS filesystems with the '-o intr' mount
option. That way you'll be able to interrupt system calls related to
them.

-- 
Colin Watson



Re: kill: cannot kill some processes

2001-02-23 Thread Brian Stults
William T Wilson wrote:
 
 On Thu, 22 Feb 2001, brian moore wrote:
 
   does the process list Z under STAT ? if it is the process has gone
 
  Not quite true... zombies don't ever die: they're already dead.
 
 While the description of zombie processes is accurate, I think another
 likely situation is that the process is in uninterruptible sleep, i.e.
 the 'D' state.  This happens when a process is blocked in a system call -
 it will be 'D' until the kernel function returns.  Kernel bugs, hardware
 problems, and dead NFS mounts can cause these kernel functions to take
 a long time or forever.
 

Thanks to everyone for the help.  It turns out the processes are indeed
in uninterruptible sleep (or 'D' under STAT).  And, as suggested, I
believe it is due to a failed NFS mount.  I will heed Colin Watson's
instruction to mount these with the '-o intr' option and see if that
helps.

Thanks again!
-- 

Brian J. Stults
Doctoral Candidate
Department of Sociology
University at Albany - SUNY
Phone: (518) 442-4652  Fax: (518) 442-4936
Web: http://www.albany.edu/~bs7452



kill: cannot kill some processes

2001-02-22 Thread Brian Stults
As the subject indicates, there are some processes that hang and cannot
be killed.  Specifically, occasionally dselect will hang while trying to
install a package.  After waiting for a long time, I try both Ctrl-c and
Ctrl-z and neither will work.  Then I try to kill the process from
another xterm.  It looks as if the kill worked, but when I do a ps, the
job is still there.  It also just happened with df.  I use this to kill:

kill -9 [pid]

Any suggestions?
-- 

Brian J. Stults
Doctoral Candidate
Department of Sociology
University at Albany - SUNY
Phone: (518) 442-4652  Fax: (518) 442-4936
Web: http://www.albany.edu/~bs7452



Re: kill: cannot kill some processes

2001-02-22 Thread Nate Amsden
Brian Stults wrote:
 
 As the subject indicates, there are some processes that hang and cannot
 be killed.  Specifically, occasionally dselect will hang while trying to
 install a package.  After waiting for a long time, I try both Ctrl-c and
 Ctrl-z and neither will work.  Then I try to kill the process from
 another xterm.  It looks as if the kill worked, but when I do a ps, the
 job is still there.  It also just happened with df.  I use this to kill:
 
 kill -9 [pid]
 
 Any suggestions?

Does the process list Z under STAT? If it does, the process has gone
zombie and I don't think there is much you can do. Sometimes zombie'd
processes die on their own eventually; many times they will not die until
you reboot.

It's rare, but it can happen.

nate
-- 
:::
ICQ: 75132336
http://www.aphroland.org/
http://www.linuxpowered.net/