Hi Corinna,

On 8/5/2014 2:40 PM, Corinna Vinschen wrote:
Hi Ken,

On Aug  5 13:55, Ken Brown wrote:
On 8/5/2014 9:58 AM, Corinna Vinschen wrote:
On Aug  5 08:21, Ken Brown wrote:
=== modified file 'src/gmalloc.c'
--- src/gmalloc.c       2014-03-04 19:02:49 +0000
+++ src/gmalloc.c       2014-08-05 01:35:38 +0000
@@ -490,8 +490,8 @@
  }

  #ifdef USE_PTHREAD
-pthread_mutex_t _malloc_mutex = PTHREAD_MUTEX_INITIALIZER;
-pthread_mutex_t _aligned_blocks_mutex = PTHREAD_MUTEX_INITIALIZER;
+pthread_mutex_t _malloc_mutex;
+pthread_mutex_t _aligned_blocks_mutex;
  int _malloc_thread_enabled_p;

  static void
@@ -526,8 +526,11 @@
       initialized mutexes when they are used first.  To avoid such a
       situation, we initialize mutexes here while their use is
       disabled in malloc etc.  */
-  pthread_mutex_init (&_malloc_mutex, NULL);
-  pthread_mutex_init (&_aligned_blocks_mutex, NULL);
+  pthread_mutexattr_t attr1, attr2;
+  pthread_mutexattr_settype (&attr1, PTHREAD_MUTEX_NORMAL);
+  pthread_mutexattr_settype (&attr2, PTHREAD_MUTEX_NORMAL);
+  pthread_mutex_init (&_malloc_mutex, &attr1);
+  pthread_mutex_init (&_aligned_blocks_mutex, &attr2);
    pthread_atfork (malloc_atfork_handler_prepare,
                   malloc_atfork_handler_parent,
                   malloc_atfork_handler_child);


The first hunk avoids the double initialization, but I don't understand why
the second hunk does anything.  Since PTHREAD_MUTEX_NORMAL is now the
default, shouldn't calling pthread_mutex_init with NULL second argument be
equivalent to my calls to pthread_mutexattr_settype?  Does this indicate a
Cygwin bug, or am I misunderstanding something?

AFAICS you're missing something.  Your pthread_mutexattr_t attr1, attr2
are not initialized.  They contain some random values, thus they are not
good objects.  The calls to pthread_mutexattr_settype as well as the
calls to pthread_mutex_init will fail with EINVAL, but you won't see it
due to missing error handling, and you end up without mutexes at all.
If you call pthread_mutexattr_init before calling
pthread_mutexattr_settype the situation shoul;d be the same as before.

Thanks for catching my mistake.  Your earlier suggestion about explicitly
setting the pthread_mutexes to be ERRORCHECK mutexes seems to fix the
problem (as long as I remember to call pthread_mutexattr_init).  The revised
patch is attached.  I went back to using both the static and dynamic
initializations as in the original code, since you said that's harmless.

I'm glad to read that, but I'm still a little bit concerned.  If your
code works with ERRORCHECK mutexes but hangs with NORMAL mutexes, you
*might* miss an error case.

I'd suggest to tweak the pthread_mutex_lock/unlock calls and log the
threads calling it.  It looks like the same thread calls malloc from
malloc for some reason and it might be interesting to learn how that
happens and if it's really ok in this scenario, because it seems to
be unexpected by the code.

I think I found the problem with NORMAL mutexes. emacs calls pthread_atfork after initializing the mutexes, and the resulting 'prepare' handler locks the mutexes. (The parent and child handlers unlock them.) So when emacs calls fork, the mutexes are locked, and shortly thereafter the Cygwin DLL calls calloc, leading to a deadlock. Here's a gdb backtrace showing the sequence of calls:

#0  malloc (size=size@entry=40) at gmalloc.c:919
#1  0x0053fc28 in calloc (nmemb=1, size=40) at gmalloc.c:1510
#2  0x61082074 in calloc (nmemb=1, size=40)
    at /usr/src/debug/cygwin-1.7.31-3/winsup/cygwin/malloc_wrapper.cc:100
#3  0x61003177 in operator new (s=s@entry=40)
    at /usr/src/debug/cygwin-1.7.31-3/winsup/cygwin/cxx.cc:23
#4  0x610fc9d3 in pthread_mutex::init (mutex=0x61187d34 <reent_data+852>,
    attr=0x0, initializer=0x12)
    at /usr/src/debug/cygwin-1.7.31-3/winsup/cygwin/thread.cc:3118
#5  0x610fcc13 in pthread_mutex_lock (mutex=0x61187d34 <reent_data+852>)
    at /usr/src/debug/cygwin-1.7.31-3/winsup/cygwin/thread.cc:3170
#6  0x611319d8 in __fp_lock (ptr=0x61187cd0 <reent_data+752>)
    at /usr/src/debug/cygwin-1.7.31-3/newlib/libc/stdio/findfp.c:287
#7  0x61154f75 in _fwalk (ptr=0x28d544,
    function=function@entry=0x611319c0 <__fp_lock>)
    at /usr/src/debug/cygwin-1.7.31-3/newlib/libc/stdio/fwalk.c:50
#8  0x61131dea in __fp_lock_all ()
    at /usr/src/debug/cygwin-1.7.31-3/newlib/libc/stdio/findfp.c:307
#9  0x610fa45e in pthread::atforkprepare ()
    at /usr/src/debug/cygwin-1.7.31-3/winsup/cygwin/thread.cc:2031
#10 0x61076292 in lock_pthread (this=<synthetic pointer>)
    at /usr/src/debug/cygwin-1.7.31-3/winsup/cygwin/sigproc.h:137
#11 hold_everything (x=<synthetic pointer>, this=<synthetic pointer>)
    at /usr/src/debug/cygwin-1.7.31-3/winsup/cygwin/sigproc.h:169
#12 fork () at /usr/src/debug/cygwin-1.7.31-3/winsup/cygwin/fork.cc:582

Is there a better way to deal with this issue than using ERRORCHECK mutexes?

Ken

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple

Reply via email to