Re: Multi Threaded programs deadlock doing simple I/O operations
On Sunday, June 12, 2005 at 5:37 PM, Mark Pizzolato wrote:
On Friday, June 10, 2005 at 3:44 PM, Mark Pizzolato wrote:
> On Thursday, June 09, 2005 at 6:12 PM, Mark Pizzolato wrote:
>> On Thursday, June 09, 2005 at 3:35 PM, Christopher Faylor wrote:
>> > On Wed, Jun 08, 2005 at 05:43:59PM -0700, Mark Pizzolato wrote:
>> > >There is a serious problem for multi threaded programs doing simple
>> > >I/O operations in cygwin (open, dup, fdopen, fclose, and close).
>> > >
>> > >The attached 81 line test program clearly demonstrates the issue
>> > >(by hanging and no longer consuming CPU or performing any I/O
>> > >operations).
>> >
>> > Thanks for the relatively small test case.  That was enough to track
>> > the problem down.  I'm generating a new snapshot with a fix for this
>> > problem.
>>
>> The snapshot looks good!
>>
>> This fixes the stability problems with clamav's clamd that I've been
>> chasing for a long time.
>
> Some more follow up here... I'm running with the 20050609 snapshot dll.
>
> clamav's clamd now runs better than it ever has for me on cygwin,
> until "it doesn't".
>
> Once it starts to run poorly it won't run cleanly again until I reboot
> the system (I haven't actually tried after merely exiting all
> processes).

Well, I spoke too soon here.  There may be some interaction with many
recently closed tcp sessions sitting in TIME_WAIT.  I'm not sure, but
after some time, I can restart and experience apparently good behavior
and then things get "poor" as described.

If I run with the 20050607 snapshot, the new "poor" behavior doesn't
happen, while the test program I provided earlier in this thread hangs
as described.  So, the fix to the original problem and the new "poor"
behavior are clearly related to changes between the 20050607 and the
20050609 snapshots.

> To be more specific about the "poor" behavior:
>
> - pthread_mutex_unlock fails leaving errno with a value of 90.  This is
>   in a place where there is only one path through about a dozen lines
>   of code and the mutex is definitely locked.  There may have been a
>   call to pthread_create, and a definite call to pthread_cond_signal.
> - Once the above error happens, calls (by the same thread) to accept()
>   fail using a file descriptor which we've been successfully using all
>   along and only close when the program exits.
>
> So some change introduced recently (since 1.5.17-1), and possibly in
> 20050609, fixes the dup() issue but now mutex operations are failing in
> strange ways.
>
> Sorry not to have a simple isolated test case for this.  The good news
> is that once it breaks it won't run correctly again until a reboot.

I'm working on a test program to recreate this behavior.

Well... The problem wasn't in cygwin.

As it happens, in clamav's clamd there were several pthread_mutex_t
objects which weren't initialized to reasonable values (i.e. left as
zero instead of PTHREAD_MUTEX_INITIALIZER).  Calls to pthread_mutex_lock
and pthread_mutex_unlock on the uninitialized objects, depending on
timing and sequence, apparently confused some aspect of mutex
processing, causing other calls to pthread_mutex_lock and
pthread_mutex_unlock to fail in strange ways.

Appropriate patches have been submitted to the clamav team.

- Mark Pizzolato

--
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
Problem reports:       http://cygwin.com/problems.html
Documentation:         http://cygwin.com/docs.html
FAQ:                   http://cygwin.com/faq/
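For anyone hitting the same symptom, here is a minimal sketch of the two standard ways to initialize a pthread_mutex_t before first use.  It is not the patch sent to the clamav team; struct job_queue, job_queue_init, job_queue_add, job_queue_destroy and bump_counter are hypothetical names made up for illustration.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical structure for illustration; not taken from clamd. */
struct job_queue {
    pthread_mutex_t lock;
    int pending;
};

/* A statically allocated mutex can be initialized with the macro. */
static pthread_mutex_t counter_mutex = PTHREAD_MUTEX_INITIALIZER;
static int counter;

/* A mutex embedded in a struct is initialized at run time.
   Returns 0 on success or an error number. */
int job_queue_init(struct job_queue *q)
{
    q->pending = 0;
    return pthread_mutex_init(&q->lock, NULL);
}

/* Increment the pending count under the lock.
   Returns 0 on success or an error number. */
int job_queue_add(struct job_queue *q)
{
    int rc = pthread_mutex_lock(&q->lock);
    if (rc != 0)
        return rc;
    q->pending++;
    return pthread_mutex_unlock(&q->lock);
}

/* Release the mutex once no thread can use the queue any more. */
int job_queue_destroy(struct job_queue *q)
{
    return pthread_mutex_destroy(&q->lock);
}

/* Increment the global counter under the statically initialized mutex
   and return the new value. */
int bump_counter(void)
{
    int value;
    int rc = pthread_mutex_lock(&counter_mutex);
    assert(rc == 0);
    value = ++counter;
    rc = pthread_mutex_unlock(&counter_mutex);
    assert(rc == 0);
    return value;
}
```

The point of the sketch is only that job_queue_init (or the macro) has to run before any thread calls lock or unlock on the object; a mutex that was merely zero-filled is not guaranteed to be valid.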
Re: Multi Threaded programs deadlock doing simple I/O operations
On Friday, June 10, 2005 at 3:44 PM, Mark Pizzolato wrote:
> On Thursday, June 09, 2005 at 6:12 PM, Mark Pizzolato wrote:
>> On Thursday, June 09, 2005 at 3:35 PM, Christopher Faylor wrote:
>> > On Wed, Jun 08, 2005 at 05:43:59PM -0700, Mark Pizzolato wrote:
>> > >There is a serious problem for multi threaded programs doing simple
>> > >I/O operations in cygwin (open, dup, fdopen, fclose, and close).
>> > >
>> > >The attached 81 line test program clearly demonstrates the issue (by
>> > >hanging and no longer consuming CPU or performing any I/O
>> > >operations).
>> >
>> > Thanks for the relatively small test case.  That was enough to track
>> > the problem down.  I'm generating a new snapshot with a fix for this
>> > problem.
>>
>> The snapshot looks good!
>>
>> This fixes the stability problems with clamav's clamd that I've been
>> chasing for a long time.
>
> Some more follow up here... I'm running with the 20050609 snapshot dll.
>
> clamav's clamd now runs better than it ever has for me on cygwin,
> until "it doesn't".
>
> Once it starts to run poorly it won't run cleanly again until I reboot
> the system (I haven't actually tried after merely exiting all
> processes).

Well, I spoke too soon here.  There may be some interaction with many
recently closed tcp sessions sitting in TIME_WAIT.  I'm not sure, but
after some time, I can restart and experience apparently good behavior
and then things get "poor" as described.

If I run with the 20050607 snapshot, the new "poor" behavior doesn't
happen, while the test program I provided earlier in this thread hangs
as described.  So, the fix to the original problem and the new "poor"
behavior are clearly related to changes between the 20050607 and the
20050609 snapshots.

> To be more specific about the "poor" behavior:
>
> - pthread_mutex_unlock fails leaving errno with a value of 90.  This is
>   in a place where there is only one path through about a dozen lines
>   of code and the mutex is definitely locked.  There may have been a
>   call to pthread_create, and a definite call to pthread_cond_signal.
> - Once the above error happens, calls (by the same thread) to accept()
>   fail using a file descriptor which we've been successfully using all
>   along and only close when the program exits.
>
> So some change introduced recently (since 1.5.17-1), and possibly in
> 20050609, fixes the dup() issue but now mutex operations are failing in
> strange ways.
>
> Sorry not to have a simple isolated test case for this.  The good news
> is that once it breaks it won't run correctly again until a reboot.

I'm working on a test program to recreate this behavior.

- Mark Pizzolato
Re: Multi Threaded programs deadlock doing simple I/O operations
On Thursday, June 09, 2005 at 6:12 PM, Mark Pizzolato wrote:
On Thursday, June 09, 2005 at 3:35 PM, Christopher Faylor wrote:
> On Wed, Jun 08, 2005 at 05:43:59PM -0700, Mark Pizzolato wrote:
> >There is a serious problem for multi threaded programs doing simple I/O
> >operations in cygwin (open, dup, fdopen, fclose, and close).
> >
> >The attached 81 line test program clearly demonstrates the issue (by
> >hanging and no longer consuming CPU or performing any I/O operations).
>
> Thanks for the relatively small test case.  That was enough to track the
> problem down.  I'm generating a new snapshot with a fix for this
> problem.

The snapshot looks good!

This fixes the stability problems with clamav's clamd that I've been
chasing for a long time.

Some more follow up here... I'm running with the 20050609 snapshot dll.

clamav's clamd now runs better than it ever has for me on cygwin,
until "it doesn't".

Once it starts to run poorly it won't run cleanly again until I reboot
the system (I haven't actually tried after merely exiting all
processes).

To be more specific about the "poor" behavior:

- pthread_mutex_unlock fails leaving errno with a value of 90.  This is
  in a place where there is only one path through about a dozen lines of
  code and the mutex is definitely locked.  There may have been a call
  to pthread_create, and a definite call to pthread_cond_signal.
- Once the above error happens, calls (by the same thread) to accept()
  fail using a file descriptor which we've been successfully using all
  along and only close when the program exits.

So some change introduced recently (since 1.5.17-1), and possibly in
20050609, fixes the dup() issue but now mutex operations are failing in
strange ways.

Sorry not to have a simple isolated test case for this.  The good news
is that once it breaks it won't run correctly again until a reboot.

Ideas?

Thanks.

- Mark Pizzolato
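A note on reading the failure described above: POSIX specifies that pthread_mutex_lock and pthread_mutex_unlock report failure through their return value (an error number) and does not require them to set errno, so the return value is the more reliable thing to log.  The sketch below is only an illustration of that pattern, not clamd or Cygwin code; check_pthread, init_errorcheck_mutex and unlock_without_lock are hypothetical helper names.  It uses a PTHREAD_MUTEX_ERRORCHECK mutex so that a misuse (unlocking a mutex the thread does not hold) yields a defined error instead of undefined behavior.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: log a failed pthread call using its return value,
   which carries the error number.  Returns rc unchanged. */
static int check_pthread(int rc, const char *what)
{
    if (rc != 0)
        fprintf(stderr, "%s failed: %s (%d)\n", what, strerror(rc), rc);
    return rc;
}

/* Initialize *m as an error-checking mutex.
   Returns 0 on success or an error number. */
int init_errorcheck_mutex(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (rc == 0)
        rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* Deliberately unlock a mutex that is not held and return the error
   number reported by pthread_mutex_unlock (-1 if setup failed). */
int unlock_without_lock(void)
{
    pthread_mutex_t m;
    int rc;

    if (init_errorcheck_mutex(&m) != 0)
        return -1;
    rc = check_pthread(pthread_mutex_unlock(&m), "pthread_mutex_unlock");
    pthread_mutex_destroy(&m);
    return rc;
}
```

Wrapping every lock and unlock in a helper like check_pthread during debugging makes it easier to see which call first goes wrong, rather than noticing a later, unrelated failure.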
Re: Multi Threaded programs deadlock doing simple I/O operations
On Thursday, June 09, 2005 at 3:35 PM, Christopher Faylor wrote:
> On Wed, Jun 08, 2005 at 05:43:59PM -0700, Mark Pizzolato wrote:
> >There is a serious problem for multi threaded programs doing simple I/O
> >operations in cygwin (open, dup, fdopen, fclose, and close).
> >
> >The attached 81 line test program clearly demonstrates the issue (by
> >hanging and no longer consuming CPU or performing any I/O operations).
>
> Thanks for the relatively small test case.  That was enough to track the
> problem down.  I'm generating a new snapshot with a fix for this
> problem.

The snapshot looks good!

This fixes the stability problems with clamav's clamd that I've been
chasing for a long time.

Thanks.

- Mark Pizzolato
Re: Multi Threaded programs deadlock doing simple I/O operations
On Wed, Jun 08, 2005 at 05:43:59PM -0700, Mark Pizzolato wrote:
>There is a serious problem for multi threaded programs doing simple I/O
>operations in cygwin (open, dup, fdopen, fclose, and close).
>
>The attached 81 line test program clearly demonstrates the issue (by
>hanging and no longer consuming CPU or performing any I/O operations).

Thanks for the relatively small test case.  That was enough to track the
problem down.  I'm generating a new snapshot with a fix for this
problem.

cgf
Multi Threaded programs deadlock doing simple I/O operations
There is a serious problem for multi threaded programs doing simple I/O
operations in cygwin (open, dup, fdopen, fclose, and close).

The attached 81 line test program clearly demonstrates the issue (by
hanging and no longer consuming CPU or performing any I/O operations).

I'm sure that anyone who ever encountered a strange hang in any program
running under cygwin would appreciate a fix for this issue.

- Mark Pizzolato

/* The header names were lost in the archived copy; these are the
   headers the code below requires. */
#include <stdio.h>
#include <stdarg.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <pthread.h>
#include <sys/stat.h>

/* O_BINARY is Cygwin-specific; define it as a no-op elsewhere. */
#ifndef O_BINARY
#define O_BINARY 0
#endif

pthread_mutex_t log_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Print a message prefixed with the calling thread's id.  The mutex
   keeps output from different threads from interleaving. */
void logit(const char *fmt, ...)
{
    va_list args;
    char buf[1024];

    buf[sizeof(buf)-1] = '\0';
    va_start(args, fmt);
    vsnprintf(buf, sizeof(buf)-1, fmt, args);
    va_end(args);
    pthread_mutex_lock(&log_mutex);
    printf("%lu:", (unsigned long)pthread_self());
    printf("%s", buf);
    pthread_mutex_unlock(&log_mutex);
}

struct TestIoInfo {
    int Iterations;
    int Progress;
};

/* Repeatedly open, dup, fdopen, fclose and close a per-thread file. */
void *
TestIoThread (void *arg)
{
    struct TestIoInfo *t = (struct TestIoInfo *)arg;
    int j;
    int fd, fdd;
    char FileName[255];
    FILE *f;

    logit("IO Thread %lu starting...\n", (unsigned long)pthread_self());
    snprintf(FileName, sizeof(FileName), "/tmp/TestIoThread-%d-%lx",
             (int)getpid(), (unsigned long)pthread_self());
    sleep(1);
    for (j=0; j<t->Iterations; ++j) {
        if ((fd = open(FileName, O_RDWR|O_CREAT|O_TRUNC|O_BINARY, S_IRWXU)) < 0) {
            logit("Error Opening File: %s - %d\n", FileName, errno);
            return NULL;
        }
        fdd = dup(fd);
        if ((f = fdopen(fdd, "rb")) == NULL) {
            logit("Can't open descriptor %d - %d\n", fdd, errno);
            return NULL;
        }
        fclose(f);
        close(fd);
        if (0 == (j%t->Progress)) {
            logit("IO Thread %lu - %d\n", (unsigned long)pthread_self(), j);
        }
    }
    unlink(FileName);
    logit("IO Thread %lu done.\n", (unsigned long)pthread_self());
    return NULL;
}

int
main (int argc, char ** argv)
{
    int threadcount = 4;
    int progress = 1;
    pthread_t tid[10];
    int i;
    struct TestIoInfo IoInfo;

    logit("Testing with %d concurrent threads\n", threadcount);
    logit("Progress indicated every %d operations...\n", progress);
    IoInfo.Iterations = 200;
    IoInfo.Progress = progress;
    /* The archived listing is cut off inside this loop header.  The rest
       of main is an assumed reconstruction: start the threads, then
       wait for them to finish. */
    for (i=0; i<threadcount; ++i)
        pthread_create(&tid[i], NULL, TestIoThread, &IoInfo);
    for (i=0; i<threadcount; ++i)
        pthread_join(tid[i], NULL);
    return 0;
}