On Jan 21, 6:34pm, Corinna Vinschen wrote:
-- Subject: [Fwd: RE: ssh problem on Windows XP]
>
> is there any chance that we get a fix in the next couple of weeks?

I remain absolutely committed to fixing the problems that have been
reported, but I can't say that I'll have a fix in that timeframe,
because I have some urgent deadlines for other projects. Maybe early
to mid-February?

> If we don't get a patch, I'm inclined to revert the pipe patch before
> we release 1.5.13.

Instead of reverting the entire patch, if you want to restore the old
behavior (select always returning true for writes on pipes), you could
add a small piece of code to "short-circuit" the NtQueryInformationFile
logic that I added. That would make it much easier for me to apply my
fix when it's available, because I could just remove the
"short-circuit" when I add a test to detect the problem, which I think
I understand completely, and have described in an earlier posting:
NtQueryInformationFile acts strangely when there is a pending, blocking
read on the other end of the pipe. I need time to finish prototyping
the new test, however.

> Btw., didn't you announce more pipe patches yet to come? Is it possible
> that you already have a patch which will get that working again? I'm
> still hoping for something more satisfying than reverting...
>
-- End of excerpt from Corinna Vinschen

Yes, I have more patches, but they don't fix the outstanding problem
with the first patch (I would have certainly sent the fix if I had
one). All of my fixes are related to detecting and avoiding deadlocks,
and I have some that are not pipe related.

In case anyone is curious, let me relate the story of what started all
of this ...

At Curl (where I work), we have a pool of about a dozen Windows servers
that we use for automated builds of our products. Each build starts by
rsync-ing the sources from a Linux server to a Windows build server
(over ssh), then we launch make via ssh and collect std{out,err} over
an ssh channel, and finally the build finishes by rsync-ing the build
tree back from the Windows build server to the Linux build server,
again over ssh.
Our Windows build servers are in almost continuous use. As you can
imagine, this setup acts as a severe stress test for Cygwin.

Unfortunately, last year Cygwin deadlocks were killing our
productivity: at least 25% of our builds were wedging, which was
completely unacceptable. So, I rolled up my sleeves, installed a Cygwin
DLL with all of the debugging symbols, and went to work investigating
and fixing each deadlock that I encountered.

It soon became apparent that pipes were a major problem. We observed
deadlocks because select for writes on pipes always returned true, so I
implemented a fix (that's the first patch). Similar deadlocks occurred
because nonblocking writes on pipes could block, so I added an
implementation for nonblocking writes too (a second patch, which I
submitted, but which was never applied, because we wanted to
investigate the reported problems with the first patch).

Later, I found that Cygwin is burned by unfortunate winsock behavior
that we had already encountered in other contexts: it sometimes assigns
a local port that can't actually be used immediately, because it is
still in TIME_WAIT, so connect fails with EADDRINUSE. I fixed one nasty
case where this phenomenon can cause a missed notification in the code
for select on sockets (a third patch), and another that caused
socketpair to fail sporadically (a fourth patch).

The improvement in Cygwin's behavior with these four patches has been
dramatic at Curl. Our builds almost never experience deadlocks now. I
am eager to contribute them to the rest of the world, but I recognize
that I need to fix the first patch before we apply the rest, and I will
do it.

Our patches have been extensively tested, but we missed the problem
that occurs for pending, blocking reads, because our automated builds
don't use commands like sftp, unison, etc. Most of the other commands
seem to use nonblocking I/O on pipes (often with select), and that
works with my patches.

-- Bob
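
For anyone who wants to see the behavior the first two patches are aiming
for, here is a minimal POSIX sketch. It is not the Cygwin implementation
and does not touch NtQueryInformationFile; the function names
(writable_now, fill_pipe, pipe_select_demo) are made up for illustration.
It assumes a POSIX system where select reports a full pipe as not
writable and a nonblocking write on a full pipe fails with EAGAIN, which
is what a caller relying on select for writes expects.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/select.h>
#include <unistd.h>

/* Poll fd for writability with a zero timeout.
 * Returns 1 if writable, 0 if not, -1 on error. */
static int writable_now(int fd)
{
    fd_set wfds;
    struct timeval tv = {0, 0};

    assert(fd >= 0 && fd < FD_SETSIZE);  /* precondition for FD_SET */
    FD_ZERO(&wfds);
    FD_SET(fd, &wfds);
    int n = select(fd + 1, NULL, &wfds, NULL, &tv);
    if (n < 0)
        return -1;
    return (n > 0 && FD_ISSET(fd, &wfds)) ? 1 : 0;
}

/* Put wfd in nonblocking mode and write until the pipe reports that it
 * is full (EAGAIN). Returns the number of bytes written, or -1 on error. */
static long fill_pipe(int wfd)
{
    char buf[4096] = {0};
    long total = 0;

    int flags = fcntl(wfd, F_GETFL);
    if (flags < 0 || fcntl(wfd, F_SETFL, flags | O_NONBLOCK) < 0)
        return -1;

    for (;;) {
        ssize_t n = write(wfd, buf, sizeof buf);
        if (n > 0) {
            total += n;
            continue;
        }
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            return total;   /* pipe is full; the write did not block */
        if (n < 0 && errno == EINTR)
            continue;       /* interrupted by a signal; retry */
        return -1;
    }
}

/* Returns 0 if select reports the pipe writable while empty and not
 * writable once it has been filled; -1 otherwise. */
int pipe_select_demo(void)
{
    int fds[2];
    if (pipe(fds) < 0)
        return -1;

    int before = writable_now(fds[1]);   /* empty pipe */
    long written = fill_pipe(fds[1]);    /* fill it without blocking */
    int after = writable_now(fds[1]);    /* full pipe */

    close(fds[0]);
    close(fds[1]);
    return (before == 1 && written > 0 && after == 0) ? 0 : -1;
}
```

If select always returned true for writes, as described above, the
"after" check would report 1 and a writer trusting it could issue a
write that blocks while its peer is also blocked, which is the shape of
deadlock the patches are meant to avoid.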