Re: Randomly hang when compiling Cygwin on Cygwin on Wine
On Mon, Nov 2, 2015 at 10:30 PM, Corinna Vinschen wrote: > I don't answer this question conclusively before consulting my lawyer. Me too, but my lawyer was much more slower, his name is Mr. Flash [1]. ;) > It *might* be possible. See fhandler_base_overlapped::close(), > fhandler_base_overlapped::check_later() and flush_async_io(), all three > in fhandler.cc. Thanks a lot, your information helps a lot. The bug is hard, it takes a lot of time to reproduce, but finally we got it! It is a race condition in the Wine side: Firstly, the reader process calls NtReadFile(), in Wine's implementation, NtReadFile() tries to read some data using read_unix_fd() first, if there is enough data needed it will returns STATUS_SUCCESS immediately, if no enough data available it will register an async read service using register_async() and returns STATUS_PENDING. Ideally this async service will either get some data and wake up the reader process if more data coming form writer, or get nothing but wake up the reader process telling the pipe is broken if the writer close the pipe. However, unfortunately, in some very rare case, the writer process close the writer handle at a very special time point, a little bit later then read_unix_fd() but a little bit sooner than register_async(), as a result, the async read service in the reader side has no chance to be noticed about closing of writer any more because it is already closed. That's why the reader process waits forever on the async overlapped event. The solution we worked out is: If the writer is closed, then the reader register_async() should fail with STATUS_PIPE_BROKEN immediately, so the reader won't try to call WaitForSingleObject on the async overlapped event at all. Patch from Sebastian Lackner: Subject: server: Do not allow to queue async operation for broken pipes. https://github.com/wine-compholio/wine-staging/commit/7b40b6922a0c7a93ff62b69dd2b3c2a6076be851 We've tested this patches for a couple of weeks, and it has been confirmed working, gcc compiles more than 64 times without any hanging! I'm happy to close this thread, next goal will be other Wine+Cygwin bugs... :) [1] http://disney.wikia.com/wiki/Flash -- Regards, Qian Hong - http://www.winehq.org -- Problem reports: http://cygwin.com/problems.html FAQ: http://cygwin.com/faq/ Documentation: http://cygwin.com/docs.html Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
Re: Randomly hang when compiling Cygwin on Cygwin on Wine
On Nov 2 03:08, Qian Hong wrote: > Update: > > I have another case of hang: > > ?(1)-+-bash(1419290) > |-bash(1419318)---make(1423770)---sh(1423783)---sh(1423838) > |-mintty(1249269)---bash(1249277) > > |-mintty(47)---bash(55)---bash(1132827)---make(1139169)---sh(1139182)---make(1139234)---sh(1206695)---sh(1206721)---make(1206764)---sh(1206770)---make(1206783)---perl(1249042) > |-mintty(596773)---bash(596790)---pstree(1423865) > `-mintty(1417449)---bash(1417457) > > > $ ps aux | grep 1249042 > 1249042 1206783 11328271249050 pty0 197608 Oct 31 /usr/bin/perl > 1249077 1249042 11328271249092 pty0 197608 Oct 31 > /usr/bin/nm > > > > Here the process perl(1249042) is waiting for a pipe, while its child > process nm(1249077) is in the status "defunct". nm(1249077) appears in > output of `ps aux`, but doesn't appear in output of `pstree -p -l`. > > Is there any case, where a child process can exit unexpected, but the > parent process still incorrectly wait for the child process forever? I don't answer this question conclusively before consulting my lawyer. It *might* be possible. See fhandler_base_overlapped::close(), fhandler_base_overlapped::check_later() and flush_async_io(), all three in fhandler.cc. No, I'm not happy with using overlapped IO for pipes. I'd prefer to use nonblocking pipes instead, but the most required resource to fix this, which is least available, is time :( Corinna -- Corinna Vinschen Please, send mails regarding Cygwin to Cygwin Maintainer cygwin AT cygwin DOT com Red Hat pgp6kfIM6T8sS.pgp Description: PGP signature
Re: Randomly hang when compiling Cygwin on Cygwin on Wine
Update: I have another case of hang: ?(1)-+-bash(1419290) |-bash(1419318)---make(1423770)---sh(1423783)---sh(1423838) |-mintty(1249269)---bash(1249277) |-mintty(47)---bash(55)---bash(1132827)---make(1139169)---sh(1139182)---make(1139234)---sh(1206695)---sh(1206721)---make(1206764)---sh(1206770)---make(1206783)---perl(1249042) |-mintty(596773)---bash(596790)---pstree(1423865) `-mintty(1417449)---bash(1417457) $ ps aux | grep 1249042 1249042 1206783 11328271249050 pty0 197608 Oct 31 /usr/bin/perl 1249077 1249042 11328271249092 pty0 197608 Oct 31 /usr/bin/nm Here the process perl(1249042) is waiting for a pipe, while its child process nm(1249077) is in the status "defunct". nm(1249077) appears in output of `ps aux`, but doesn't appear in output of `pstree -p -l`. Is there any case, where a child process can exit unexpected, but the parent process still incorrectly wait for the child process forever? Thanks! -- Problem reports: http://cygwin.com/problems.html FAQ: http://cygwin.com/faq/ Documentation: http://cygwin.com/docs.html Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
Randomly hang when compiling Cygwin on Cygwin on Wine
Dear folks, I try to compile Cygwin on Cygwin on Wine in a loop, and found a random hang. Usually, it happens more than 1/10. The hang could happen in different processes, including `sh.exe`, `gcc.exe`, `as.exe`, etc, but the backtraces are all similar. Here is one example I get now: (1) `pstree -p -l` suggest the hanging process might has PID 113: ?(1)─┬─bash(58)───make(93)───sh(89)───make(69)───sh(74)───sh(61)───make(9)───make(44)───sh(129)───sh(145)───make(94)───sh(130)───sh(104)───make(151)───sh(113) └─mintty(91)───bash(102)───pstree(141) (2) `cat /proc/113/cmdline` show below information: /bin/sh -c gcc -L/newlib-cygwin-all/newlib-cygwin-build/i686-pc-cygwin/winsup/cygwin -isystem /newlib-cygwin-all/newlib-cygwin/winsup/cygwin/include -B/newlib-cygwin-all/newlib-cygwin-build/i686-pc-cygwin/newlib/ -isystem /newlib-cygwin-all/newlib-cygwin-build/i686-pc-cygwin/newlib/targ-include -isystem /newlib-cygwin-all/newlib-cygwin/newlib/libc/include -I/newlib-cygwin-all/newlib-cygwin/winsup/cygwin/include -DPACKAGE_NAME=\"newlib\" -DPACKAGE_TARNAME=\"newlib\" -DPACKAGE_VERSION=\"2.2.0\" -DPACKAGE_STRING=\"newlib\ 2.2.0\" -DPACKAGE_BUGREPORT=\"\" -DPACKAGE_URL=\"\" -I. -I/newlib-cygwin-all/newlib-cygwin/newlib/libc/xdr -DHAVE_OPENDIR -DHAVE_RENAME -DSIGNAL_PROVIDED -D_COMPILING_NEWLIB -DHAVE_BLKSIZE -DHAVE_FCNTL -DMALLOC_PROVIDED -fno-builtin -g -O2 -c -o lib_a-xdr_float.o `test -f 'xdr_float.c' || echo '/newlib-cygwin-all/newlib-cygwin/newlib/libc/xdr/'`xdr_float.c (3) I cd to the build target directory and list all files: /newlib-cygwin-all/newlib-cygwin-build/i686-pc-cygwin/newlib/libc/xdr$ ls lib_a-dummy.o lib_a-xdr.o lib_a-xdr_private.o Makefile The above list result suggest that the build target "lib_a-xdr_float.o" hasn't been generated, which suggested that the `gcc` process in the above command hasn't run. (4) Then I check the file descriptors opened by process 113: $ file /proc/113/fd/* /proc/113/fd/0: broken symbolic link to /dev/cons0 /proc/113/fd/1: broken symbolic link to /dev/cons0 /proc/113/fd/2: broken symbolic link to /dev/cons0 /proc/113/fd/3: symbolic link to pipe:[296] (5) I use gdb to attach to the process 113: [New Thread 64.0x3c] [New Thread 64.0x4c] [New Thread 64.0x51] Reading symbols from /usr/bin/sh.exe...Reading symbols from /usr/lib/debug//usr/bin/sh.exe.dbg...done. done. 0xb77a0be6 in ?? () (gdb) bt #0 0xb77a0be6 in __kernel_vsyscall+0xe in [vdso].so #1 0x7bc82ea5 in server_select+0x315 [/home/fracting/src/wine-patched-auto/dlls/ntdll/server.c:642] in ntdll #2 0x7bc8ade9 in wait_objects+0x89 [/home/fracting/src/wine-patched-auto/dlls/ntdll/sync.c:1034] in ntdll #3 0x7bc8d5e9 in __syscall_NtWaitForMultipleObjects+0x29 [/home/fracting/src/wine-patched-auto/dlls/ntdll/sync.c:1047] in ntdll #4 0x7b87bbe5 in WaitForMultipleObjectsEx+0x1b5 [/home/fracting/src/wine-patched-auto/dlls/kernel32/sync.c:186] in kernel32 #5 0x7b87ba21 in WaitForMultipleObjects+0x31 [/home/fracting/src/wine-patched-auto/dlls/kernel32/sync.c:146] in kernel32 #6 0x61006354 in cygwait(void*, _LARGE_INTEGER*, unsigned int)@12 (object=, timeout=0x7bc4861c, mask=2076474397) at /usr/src/debug/cygwin-2.2.1-1/winsup/cygwin/cygwait.cc:76 #7 0x6130a188 in ?? () from /usr/bin/cygwin1.dll #8 0x in ?? () (gdb) thread 2 [Switching to thread 2 (Thread 64.0x4c)] #0 0xb77a0be6 in ?? () (gdb) bt #0 0xb77a0be6 in __kernel_vsyscall+0xe in [vdso].so #1 0x7bc4aadc in __syscall_NtReadFile+0x41c [/media/workspace/src/wine-patched-auto/dlls/ntdll/file.c:1104] in ntdll #2 0x7b83f63d in ReadFile+0x1c6 [/home/fracting/src/wine-patched-auto/dlls/kernel32/file.c:448] in kernel32 #3 0x610efbae in wait_sig () at /usr/src/debug/cygwin-2.2.1-1/winsup/cygwin/sigproc.cc:1252 #4 0x611b6c40 in cygthread::main_thread_id () from /usr/bin/cygwin1.dll (gdb) thread 3 [Switching to thread 3 (Thread 64.0x51)] #0 0xb77a0be6 in ?? () (gdb) bt #0 0xb77a0be6 in __kernel_vsyscall+0xe in [vdso].so #1 0x7bc82ea5 in server_select+0x315 [/home/fracting/src/wine-patched-auto/dlls/ntdll/server.c:642] in ntdll #2 0x7bc8ade9 in wait_objects+0x89 [/home/fracting/src/wine-patched-auto/dlls/ntdll/sync.c:1034] in ntdll #3 0x7bc8d5e9 in __syscall_NtWaitForMultipleObjects+0x29 [/home/fracting/src/wine-patched-auto/dlls/ntdll/sync.c:1047] in ntdll #4 0x7b87bbe5 in WaitForMultipleObjectsEx+0x1b5 [/home/fracting/src/wine-patched-auto/dlls/kernel32/sync.c:186] in kernel32 #5 0x7b87b9a3 in WaitForSingleObject+0x2e [/home/fracting/src/wine-patched-auto/dlls/kernel32/sync.c:126] in kernel32 #6 0x6100548a in cygthread::stub(void*)@4 (arg=0xacce64) at /usr/src/debug/cygwin-2.2.1-1/winsup/cygwin/cygthread.cc:114 #7 0x61005400 in cygthread::release(bool) () at /usr/src/debug/cygwin-2.2.1-1/winsup/cygwin/cygthread.cc:292 #8 0x00acce64 in ?? () #9 0x0051 in ?? () Here are the version information I have: $ uname -a CYGWIN_NT-5.1 2.2.1(0.289/5/3) 2015-08-20 11:40 i686 Cygwin $ sh --version GNU bash, version 4.3.4