>>Here's a description of a second hang condition we were encountering, along 
>>with a patch for it.
>>
>>
>>The application (pdksh in this case) does a read on a pipe, which eventually 
>>calls pipe.cc fhandler_pipe::read in Thread 1.  This creates a new cygthread 
>>with "read_pipe()" as the function.  Then it calls th->detach(read_state).
>>
>>When the hang occurs, the new thread gets terminated early, before
>>cygthread::stub() can call "callfunc()".  You see the error message
>>"erroneous thread activation".  I'm not sure what's causing the thread
>>to fail activation, but the result is, the read_state semaphore never
>>gets signalled.
>
>Sorry but this is another band-aid around a problem.  The real problem
>is that the code shouldn't get into the state that you are describing.
>That's why cygwin prints an error message - it is a serious problem.
>Making the code deal gracefully with a problem like this isn't going
>to solve the underlying issue.
>
>If you can figure out what's causing the erroneous thread activation
>then that will be the real culprit.
>
>cgf
>

OK, I believe I've tracked this down.

The problem occurs when we get into a read_pipe cygthread constructor 
(cygthread::cygthread()) with a NULL h and an ev that is signalled.  When this 
condition exists, a hang can occur as follows:

1) Creator thread calls detach().  This waits for read_state to be released 
twice
2) read_pipe thread calls read_pipe, reads data, and releases the semaphore 
twice
3) Creator thread goes to WFSO(*this, INFINITE) which returns immediately 
because ev was set when the thread was created.
4) Creator thread initiates another read_pipe cygthread to read more pipe data.

At this point, there's a race: if the Creator thread gets past the 
initialization part of the constructor, which sets __name(name), BEFORE the 
original read_pipe thread gets to the part of cygthread::stub() that sets 
info->__name = NULL, then you'll see the hang.  The new read_pipe thread will give the 
"erroneous thread activation" message, and the parent will be stuck waiting for 
data that will never arrive.

The only path that leaves an unused thread structure in a state where h==NULL 
and ev is signalled is cygthread::release().  So the fix is simple:

$ cat cygthread.cc.udiff
--- cygthread.cc.ORIG   2006-02-22 10:57:42.123931300 -0500
+++ cygthread.cc        2006-03-01 12:59:23.255023000 -0500
@@ -268,7 +268,12 @@
 cygthread::release (bool nuke_h)
 {
   if (nuke_h)
+    {
     h = NULL;
+
+    if (ev)
+      ResetEvent (ev);
+    }
 #ifdef DEBUGGING
   __oldname = __name;
   debug_printf ("released thread '%s'", __oldname);
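
To illustrate why the reset matters, here is a minimal portable sketch. It is 
not Cygwin code: Event and Slot are hypothetical stand-ins, and it assumes ev 
stays signalled until something explicitly resets it (manual-reset behaviour). 
It only models the stale-signal part of the problem, not the __name race.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical stand-in for an event that stays signalled until reset.
// This is an illustration only, not the Win32 or Cygwin API.
class Event {
public:
    void set() {
        std::lock_guard<std::mutex> lock(m_);
        signalled_ = true;
        cv_.notify_all();
    }
    void reset() {
        std::lock_guard<std::mutex> lock(m_);
        signalled_ = false;
    }
    // Returns true if the event was signalled before the timeout expired.
    bool wait_for(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(m_);
        return cv_.wait_for(lock, timeout, [this] { return signalled_; });
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    bool signalled_ = false;
};

// Hypothetical model of a pooled thread slot with the two fields at issue.
struct Slot {
    bool has_handle = false;  // models h != NULL
    Event ev;                 // models ev

    // Models release(true). reset_event selects patched vs. unpatched
    // behaviour so both can be compared.
    void release(bool reset_event) {
        has_handle = false;
        if (reset_event)
            ev.reset();
    }
};

// Returns true if a waiter on a reused slot wakes up even though no new
// worker has signalled anything.
bool reused_slot_wakes_early(bool reset_event) {
    Slot slot;
    slot.has_handle = true;
    slot.ev.set();              // previous user of the slot signalled ev
    slot.release(reset_event);  // slot goes back to the pool
    // The next user waits for completion; nothing signals ev here.
    return slot.ev.wait_for(std::chrono::milliseconds(50));
}

// Small self-check of the two cases.
void demo() {
    assert(reused_slot_wakes_early(false));  // stale signal: wait returns at once
    assert(!reused_slot_wakes_early(true));  // reset on release: wait times out
}
```

In this model the unpatched release leaves the slot with no handle and a 
signalled event, so the next wait returns immediately, which corresponds to 
step 3 above.  With the reset, the wait blocks until a worker really signals.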

