On Wed, 8 May 2024, Jeremy Drake wrote:

> (this is the same issue discussed in
> https://cygwin.com/pipermail/cygwin-patches/2024q1/012621.html)
>
> On MSYS2, running on Windows on ARM64 only, we've been plagued by issues
> with processes hanging up.  Usually pacman, when it is trying to validate
> signatures with gpgme.  When a process is hung in this way, no debugger
> seems to be able to attach properly.
>
> > anecdotally, the hang occurs when _exit() calls
> > proc_terminate() which is then blocked by a call to TerminateThread()
> > with an invalid thread handle (for more details, see
> > https://github.com/msys2/msys2-autobuild/issues/62#issuecomment-1951796327).


As a follow-up to this: the quote above came from a proposed workaround
that simply commented out the double-fork behavior in gpgme.  After
reading a comment in the code and doing some research online, I found
that the double fork is an accepted idiom on POSIX for avoiding having to
wait for the (grand)child without creating zombie processes.  I was unable
to see zombie processes in ps or /proc/<pid>, but I did see extra cygpid.*
entries in /proc/sys/BaseNamedObjects/cygwin*, which seem to be much the
same thing.
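
A minimal sketch of the double-fork idiom, assuming a POSIX environment
with fork() and waitpid().  spawn_detached is a hypothetical helper, not
gpgme's actual code.  The parent waits only for the short-lived
intermediate child.  The grandchild that does the real work is orphaned
and re-parented, so the parent never has to wait for it and no zombie is
left behind.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cerrno>

// Hypothetical helper: run `work` in a grandchild process that the caller
// never has to wait for.  Returns 0 if the intermediate child was reaped
// and exited cleanly, -1 otherwise.
int
spawn_detached (void (*work) ())
{
  pid_t child = fork ();
  if (child < 0)
    return -1;
  if (child == 0)
    {
      // Intermediate child: fork the real worker, then exit immediately.
      pid_t grandchild = fork ();
      if (grandchild < 0)
        _exit (1);
      if (grandchild == 0)
        {
          // Grandchild: once its parent exits it is re-parented (to init
          // on most systems), which reaps it, so no zombie remains.
          work ();
          _exit (0);
        }
      _exit (0);
    }
  // Parent: reap only the intermediate child, retrying if interrupted.
  int status;
  pid_t r;
  do
    r = waitpid (child, &status, 0);
  while (r < 0 && errno == EINTR);
  if (r != child || !WIFEXITED (status) || WEXITSTATUS (status) != 0)
    return -1;
  return 0;
}
```

For example, `spawn_detached ([] { /* long-running work */ })` returns as
soon as the intermediate child has been reaped; afterwards the caller has
no child of its own left to wait for.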

Today, I looked into the TerminateThread situation.  The call in question
comes from the attempt to terminate the wait_thread of a chld_procs entry.
I noticed elsewhere in Cygwin code (flock.cc) that CancelSynchronousIo was
being called, and that stood out to me because the wait thread, if
running, is most likely blocked in ReadFile.  I am testing with the
following hack, and so far have not seen a hang:
diff --git a/winsup/cygwin/sigproc.cc b/winsup/cygwin/sigproc.cc
index 86e4e607ab..020906d797 100644
--- a/winsup/cygwin/sigproc.cc
+++ b/winsup/cygwin/sigproc.cc
@@ -410,7 +410,7 @@ proc_terminate ()
          if (!have_execed || !have_execed_cygwin)
            chld_procs[i]->ppid = 1;
          if (chld_procs[i].wait_thread)
-           chld_procs[i].wait_thread->terminate_thread ();
+           CancelSynchronousIo (chld_procs[i].wait_thread->thread_handle ());
          /* Release memory associated with this process unless it is 'myself'.
             'myself' is only in the chld_procs table when we've execed.  We
             reach here when the next process has finished initializing but we


As a disclaimer, I am having a hard time wrapping my head around this
code, so I don't know what side effects this may have, but it does seem
to avoid the hang without leaving "zombie" cygpid entries behind.

(Note that I first tried
+             if (CancelSynchronousIo (chld_procs[i].wait_thread->thread_handle ()))
+               chld_procs[i].wait_thread->detach ();
+             else
+               chld_procs[i].wait_thread->terminate_thread ();
but that resulted in a (debuggable) hang in detach, because
cygthread::stub was waiting for thread_sync while cygthread::detach was
waiting for *this.  That appears to be because this is an auto-releasing
cygthread.  It bothers me somewhat that there is no synchronization to
ensure the wait_thread has finished shutting down before proc_terminate
moves on, but I don't see an obvious way to add it in the current
structure.)
