On Thu, Nov 26, 2020 at 11:25:40 +0100, Stefan G. Weichinger wrote:
> Am 25.11.20 um 20:57 schrieb Nathan Stratton Treadway:
> >On Wed, Nov 25, 2020 at 14:34:17 -0500, Nathan Stratton Treadway wrote:
> >>Also, do you see the same defunct pigz process that Jason reported in
> >>his original post?
> >
> >Am 30.05.18 um 20:21 schrieb Jason L Tibbitts III:
> >>root      2690  9.1  0.0 317692 11020 pts/0    S+   12:38   1:43  |           \_ amrecover math -s backup2 -t backup2
> >>root      2996 32.5  0.0      0     0 pts/0    Z+   12:48   2:52  |               \_ [pigz] <defunct>
> >>root      2998  3.3  0.0      0     0 pts/0    Z+   12:48   0:17  |               \_ [xfsrestore] <defunct>
> >
> >Assuming you are seeing this same behavior: one theory that comes to
> >mind is that pigz could be spawning subprocesses which then somehow
> >confuse amrecover such that it doesn't properly detect when pigz
> >terminates (and just keeps waiting for that to happen, even though it
> >already has happened).
> >
> >I don't know enough about how amrecover spawns the pipes to know how
> >likely that is, but one thing you could try is to kill the amrecover
> >process with a SIGCHLD signal (once it reaches the above "everything is
> >hung" situation) and see if one or both of those defunct processes go
> >away, and if the amrecover process starts doing work again
> >afterwards....
> 
> Not sure how to display the process tree like the one above ...

(I think Jason's output was generated using the "--forest" option to ps,
but really all that matters is the "Z" process state for the two
subprocesses).
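
For anyone who would rather not remember the ps options, the same state can be read straight from procfs. This is a minimal Python sketch, assuming Linux and the usual "pid (comm) state ppid ..." layout of /proc/<pid>/stat; find_zombies is a made-up helper name, not part of Amanda or procps. The PPID column is the useful part: for a defunct pigz it points at the parent that has not reaped it yet.

```python
import os

def find_zombies():
    """Return (pid, ppid, comm) for every process in state Z, read from Linux /proc."""
    zombies = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue                     # skip non-process entries such as /proc/meminfo
        try:
            with open(f"/proc/{entry}/stat", "rb") as f:
                data = f.read()
            # Layout: "pid (comm) state ppid ..."; comm itself may contain spaces or ')'.
            rparen = data.rindex(b")")
            comm = data[data.index(b"(") + 1:rparen].decode(errors="replace")
            state, ppid = data[rparen + 1:].split()[:2]
        except (OSError, ValueError):
            continue                     # process vanished while we were scanning
        if state == b"Z":
            zombies.append((int(entry), int(ppid), comm))
    return zombies

if __name__ == "__main__":
    for pid, ppid, comm in find_zombies():
        print(f"pid={pid} ppid={ppid} [{comm}] <defunct>")
```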

> 
> I ran "kill -s SIGCHLD" against the PIDs of amrecover and pigz:
> no effect.
>
> pigz isn't even killed by a "-9"
>

The fact that the pigz process is in defunct/"Z"ombie status means it's
already dead and only still exists in the process listing because its
parent process (amrecover, in Jason's listing) hasn't yet collected its
exit status via one of the wait() calls.  So even a -9 won't help, since
there is no live process left to kill.
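
The zombie behaviour can be reproduced in a few lines. This is a Linux-only Python sketch (proc_state and zombie_demo are illustrative names): the child exits with status 3, the parent sends it SIGKILL while it is <defunct>, and the entry only disappears once the parent calls waitpid(). The collected status is still a plain exit code of 3, because the SIGKILL arrived after the process had already died.

```python
import os
import signal
import time

def proc_state(pid):
    """One-letter state from /proc/<pid>/stat (Linux), or None once the entry is gone."""
    try:
        with open(f"/proc/{pid}/stat", "rb") as f:
            data = f.read()
        return data[data.rindex(b")") + 1:].split()[0].decode()
    except (OSError, ValueError):
        return None

def zombie_demo():
    signal.signal(signal.SIGCHLD, signal.SIG_DFL)  # default: children are not auto-reaped
    pid = os.fork()
    if pid == 0:
        os._exit(3)                      # child dies at once; parent does not wait yet
    deadline = time.monotonic() + 5
    while proc_state(pid) != "Z" and time.monotonic() < deadline:
        time.sleep(0.01)                 # wait for the kernel to mark it <defunct>
    os.kill(pid, signal.SIGKILL)         # accepted, but there is nothing left to kill
    time.sleep(0.1)
    still_zombie = proc_state(pid) == "Z"
    _, status = os.waitpid(pid, 0)       # the parent collects the exit status ...
    gone = proc_state(pid) is None       # ... and only then does the entry vanish
    return still_zombie, os.WEXITSTATUS(status), gone

if __name__ == "__main__":
    print(zombie_demo())
```

zombie_demo() returns a tuple of (still a zombie after the kill, exit status collected, gone after the wait).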

I was hoping SIGCHLD on the amrecover process would trick it into
exiting whatever wait-loop it is in and checking for subprocesses that
have already terminated (both pigz and xfsrestore in the above
listing)... but it sounds like that didn't work.
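
I don't know how amrecover actually waits for its children, so the following is only a sketch of the general mechanism (Python on Linux; reap_children and late_sigchld_demo are hypothetical names). A SIGCHLD sent by hand can only cause reaping if the receiving program has a handler that calls waitpid() for any child that has already exited. With the default disposition the signal is simply discarded, which would be consistent with the "no effect" result Stefan reported.

```python
import os
import signal
import time

reaped = []  # PIDs collected by the handler below

def reap_children(signum, frame):
    """SIGCHLD handler: collect every child that has already exited, without blocking."""
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            return                       # no children at all
        if pid == 0:
            return                       # children exist, but none has exited yet
        reaped.append(pid)

def late_sigchld_demo():
    signal.signal(signal.SIGCHLD, signal.SIG_DFL)   # start out never reaping
    pid = os.fork()
    if pid == 0:
        os._exit(0)
    time.sleep(0.2)                      # child has normally exited by now; its SIGCHLD was discarded
    signal.signal(signal.SIGCHLD, reap_children)
    os.kill(os.getpid(), signal.SIGCHLD) # same effect as "kill -s SIGCHLD <parent pid>"
    deadline = time.monotonic() + 5
    while pid not in reaped and time.monotonic() < deadline:
        time.sleep(0.01)                 # give the Python-level handler a chance to run
    signal.signal(signal.SIGCHLD, signal.SIG_DFL)
    return pid in reaped

if __name__ == "__main__":
    print(late_sigchld_demo())
```

The WNOHANG loop matters: standard signals are not queued, so one SIGCHLD can stand for several children that have exited.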

                                                        Nathan

----------------------------------------------------------------------------
Nathan Stratton Treadway  -  natha...@ontko.com  -  Mid-Atlantic region
Ray Ontko & Co.  -  Software consulting services  -   http://www.ontko.com/
 GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
 Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239
