Re: [Fwd: RE: ssh problem on Windows XP]
On Jan 21, 6:34pm, Corinna Vinschen wrote: -- Subject: [Fwd: RE: ssh problem on Windows XP] is there any chance that we get a fix in the next couple of weeks? I remain absolutely committed to fixing the problems that have been reported, but I can't say that I'll have a fix in that timeframe, because I have some urgent deadlines for other projects. Maybe early to mid-February? If we don't get a patch, I'm inclined to revert the pipe patch before we release 1.5.13. Instead of reverting the entire patch, if you want to restore the old behavior (select always returning true for writes on pipes), you could add a small piece of code to short-circuit the NtQueryInformationFile logic that I added. That would make it much easier for me to apply my fix when it's available, because I could just remove the short-circuit when I add a test to detect the problem, which I think I understand completely, and have described in an earlier posting: NtQueryInformationFile acts strangely when there is a pending, blocking read on the other end of the pipe. I need time to finish prototyping the new test, however. Btw., didn't you announce more pipe patches yet to come? Is it possible that you already have a patch which will get that working again? I'm still hoping for something more satisfying than reverting... -- End of excerpt from Corinna Vinschen Yes, I have more patches, but they don't fix the outstanding problem with the first patch (I would have certainly sent the fix if I had one). All of my fixes are related to detecting and avoiding deadlocks, and I have some that are not pipe related. In case anyone is curious, let me relate the story of what started all of this ... At Curl (where I work), we have a pool of about a dozen Windows servers that we use for automated builds of our products. Each build starts by rsync-ing the sources from a Linux server to a Windows build server (over ssh), then we launch make via ssh and collect std{out,err} over an ssh channel, and finally the build finishes by rsync-ing the build tree back from the Windows build server to the Linux build server, again over ssh. Our Windows build servers are in almost continuous use. As you can imagine, this setup acts as a severe stress test for Cygwin. Unfortunately, last year Cygwin deadlocks were killing our productivity: at least 25% of our builds were wedging, which was completely unacceptable. So, I rolled up my sleeves, installed a Cygwin DLL with all of the debugging symbols, and went to work investigating and fixing each deadlock that I encountered. It soon became apparent that pipes were a major problem. We observed deadlocks because select for writes on pipes always returned true, so I implemented a fix (that's the first patch). Similar deadlocks occurred because nonblocking writes on pipes could block, so I added an implementation for nonblocking writes too (a second patch, which I submitted, but was never applied, because we wanted to investigate the reported problems with the first patch). Later, I found that Cygwin is burned by unfortunate winsock behavior that we had already encountered in other contexts: it sometimes assigns a local port that can't actually be used immediately, because it is still in TIME_WAIT, so connect fails with EADDRINUSE. I fixed one nasty case where this phenomenon can cause a missed notification in the code for select on sockets (a third patch), and another that caused socketpair to fail sporadically (a fourth patch). The improvement in Cygwin's behavior with these four patches has been dramatic at Curl. Our builds almost never experience deadlocks now. I am eager to contribute them to the rest of the world, but I recognize that I need to fix the first patch before we apply the rest, and I will do it. Our patches have been extensively tested, but we missed the problem that occurs for pending, nonblocking reads, because our automated builds don't use commands like sftp, unison, etc. Most of the other commands seem to use nonblocking I/O on pipes (often with select), and that works with my patches. -- Bob
Re: More pipe problems (was Re: [Fwd: 1.5.11-1: sftp performance problem])
I've been working on this, and I think I now understand the underlying problem, but I don't yet have a fix. The problem is that NtQueryInformationFile is stranger than I ever imagined. First, some background, to be sure we're all speaking the same language: Windows pipes actually seem more similar to UNIX domain sockets than POSIX pipes. In particular, Windows pipes have a server side and a client side. Most of the Windows pipe terminology is from the standpoint of the server. An inbound pipe transfers data only in the client-server direction, an outbound pipe transfers server-client, and a duplex pipe transfers both ways. We use CreateNamedPipe to make an inbound pipe for reading: this is the server side. Only the input buffer size is relevant for such a pipe ... we should probably set the output buffer size to zero to emphasize that it is not used. We currently set both input and output buffer sizes to the same value, which is harmless but misleading. We then use CreateFile to obtain a handle to the other (client) side of the pipe for writing, with attributes to allow NtQueryInformationFile to work. When we call NtQueryInformationFile with the FilePipeLocalInformation parameter (FILE_INFORMATION_CLASS), it fills in a FILE_PIPE_LOCAL_INFORMATION struct, which contains the following fields (among others that are not directly related to pipe buffering): ULONG InboundQuota; ULONG ReadDataAvailable; ULONG OutboundQuota; ULONG WriteQuotaAvailable; The InboundQuota and OutboundQuota are the input and output buffer sizes, respectively, that were specified for CreateNamedPipe. For our pipe configuration, therefore, the OutboundQuota is irrelevant: we should be checking the InboundQuota when we are interested in the pipe size. This does not currently affect us because we set InboundQuota == OutboundQuota, but we should fix it. The ReadDataAvailable field is always zero on the client side of an inbound pipe. We're ignoring it, as we should be. Our code assumes that the WriteQuotaAvailable field tells how much space is available for writing. Well, not quite. For an empty pipe, WriteQuotaAvailable == InboundQuota. When data has been written to the pipe (but not yet read by the other side), then WriteQuotaAvailable is decremented by the appropriate amount, until the pipe is full, when WriteQuotaAvailable is zero. WriteFile blocks only when it tries to write more than WriteQuotaAvailable. All of this is normal and expected. But there is a strange twist: When a read is pending on an empty pipe, then WriteQuotaAvailable is also decremented! I can't imagine why this would be the case, but it is easy to demonstrate using a pair of small test programs that I wrote to experiment with pipe buffering. This behavior is unfortunate, because it means that we aren't currently distinguishing pending reads and writes. However, now that I understand what's really going on, I can focus on working around it. We have been baffled by the apparent contradiction that was nicely summarized by Corinna: Do I understand that right? sftp is in the blocking read on the pipe, there is data in the pipe and nevertheless read doesn't return? That's odd. ReadFile only blocks when the pipe is completely empty. If there is any data in the pipe, then it returns immediately, possibly supplying less data than was requested. When ReadFile blocks, there really isn't any data in the pipe. Our select code is just confused by the decremented WriteQuotaAvailable field, because it looks like a *write* is pending! So we start to throttle the writes, which causes the data transfer to slow down, or even to block (if the requested read was large enough). I think my speculations about POSIX atomicity requirements related to PIPE_BUF were a red herring. Ditto for hypothetical bizarre pipe buffering behavior. The real problem here is NtQueryInformationFile. I want to first try reversing the direction of the pipe. We don't really care which end is the server or client, so we could easily modify our pipe creation code to make an outbound pipe, and if WriteQuotaAvailable from NtQueryInformationFile behaves sanely, then we're done. If that doesn't work, then I have some other ideas about how to distinguish pending reads and writes. I hope to have more info to report soon. -- Bob
Re: [Fwd: 1.5.11-1: sftp performance problem]
On Sep 12, 4:42pm, Corinna Vinschen wrote: -- Subject: Re: [Fwd: 1.5.11-1: sftp performance problem] Do I understand that right? sftp is in the blocking read on the pipe, there is data in the pipe and nevertheless read doesn't return? That's odd. Yes, very. I'm still experimenting, so the following description should be considered tentative: ReadFile seems to want to return the number of bytes requested, not a partial amount, at least the way we're using it with pipes now. Bytes in the pipe seem to be buffered on the write side until there are enough to satisfy the read, or the pipe fills (or maybe there is some other high-water mark, but it is very close to full). The system also seems to empty the buffer after a certain amount of time ... I'm not sure about the interval, but it's fairly long. This all works most of the time. But it interacts badly with the POSIX atomic write requirements related to PIPE_BUF. In particular, select should say that a pipe is not writable when there is PIPE_BUF space available (and our implementation does this). So, when sftp does a large blocking read (at least 16k, which is the size of the pipe buffer), and ssh almost fills the pipe, then select stops further writes from happening, and we only deliver data when the timer expires, which is why transfers are so slow. This also explains why the problem disappears when we use sftp -B to reduce the size of the reads, because the data is delivered before the pipe fills to within PIPE_BUF bytes. And similarly when I increase the size of the pipe buffer. Most other programs that use select or nonblocking I/O aren't affected, because they only try to read whatever is already in the pipe, and they get the data immediately. | I guess this means that local | pipes always do buffering as described in the previous paragraph, and | this can't be disabled using FILE_FLAG_WRITE_THROUGH. Did you try that? I haven't yet, but I will. Disabling buffering would fix the problem. Or if we could somehow control the buffering parameters (the high-water mark or the timer), that would also probably be sufficient. In particular, setting the high-water mark to reserve PIPE_BUF bytes would be perfect. I want to try FlushFileBuffers too, but we can't let it block (hmmm ... maybe in a separate thread?). The real problem is that NtQueryInformationFile returns information about the data buffered on the write side of the pipe, and that doesn't account for the possibility that a large read might already be blocked on the other side, which effectively means that more space is actually available. If we could determine the amount of requested data for the read, that would help. The system must know, since it seems to affect the buffering algorithm, but I don't know offhand how to get at the info. Dunno if that's a *better* idea, but would it be reasonable to try changing pipes to use overlapped I/O? Maybe, but that seems complicated. I'm hoping for something simpler. Or what if a read from a pipe always asks for the number of available bytes using NtQueryInformationFile and then only actually reads this number of bytes and returns that to the caller? -- End of excerpt from Corinna Vinschen That's similar to nonblocking reads, and it might work, and I guess for blocking reads we would never attempt to read more than the size of the pipe minus PIPE_BUF. If we don't want to return partial data, then we could redo reads until all of the data is delivered to the calling program, but I think partial reads should be OK. On the read side, we'd use PeekNamedPipe instead of NtQueryInformationFile. -- Bob
Re: [Fwd: 1.5.11-1: sftp performance problem]
On Sep 11, 3:02pm, Corinna Vinschen wrote: -- Subject: Re: [Fwd: 1.5.11-1: sftp performance problem] http://cygwin.com/acronyms/#PCYMTNQREAIYR Sorry ... done. | One really annoying consequence of socketpairs for sshd is that | Windows-native (i.e. non-Cygwin) programs don't know how to write | directly to sockets on stdout (or stderr), so if you try to use | them via ssh, their output silently disappears. Oh lord! I tried it with `net', `ping' and `nslookup', which are roughly the only Windows applications I use on the command line, and all three have no problems. I'm wondering how the above can happen since the interactive application side is running in a PTY (which *is* a pipe), not on socketpairs. Try it without a pty: ssh cygwin-system-with-sshd-using-sockpairs win32-native-command vs. ssh cygwin-system-with-sshd-using-sockpairs win32-native-command | cat | I think OpenSSH uses pipes on most platforms by default, so I'm also Only on those platforms which have known problems with (or no implementation of) socketpairs. OK, maybe I am misremembering, or the default changed ages ago. I see that sshd on Linux uses socketpairs. | somewhat concerned that the socketpair-specific code in OpenSSH might be | a bit crufty. Why? Socketpairs are in the same shape as any other socket connection since they are using the same code. And I surely do care for socket operations. See the ChangeLogs. What especially do you think is crufty? -- End of excerpt from Corinna Vinschen My concern was about the state of the OpenSSH code base, not Cygwin, and it was just a vague fear, based on the previous (I guess now incorrect) [mis]understanding that OpenSSH prefers to use pipes. | It's not obvious to me why socketpairs would be inherently faster | than pipes. It occurred to me after I wrote this that if no pipes were used at all, then the select_pipe thread would not be spawned ... the select_socket thread would handle everything. This might be win, but we'd need to measure to see if the effect is significant. And we'd still have the problem with win32-native apps, which I think is a show-stopper for socketpairs. -- Bob
Re: [Fwd: 1.5.11-1: sftp performance problem]
On Sep 10, 11:55am, [EMAIL PROTECTED] (Bob Byrnes) wrote: -- Subject: Re: [Fwd: 1.5.11-1: sftp performance problem] ... I don't use sftp much, but we pump *enormous* amounts of data through sshd otherwise, so it's odd that I haven't noticed any performance impact. Interesting. The problem seems to affect *only* sftp, and (as Corinna noted) only inbound transfers. Outbound sftp works fine, as do scp transfers in both directions, and data sent through ssh in general. This explains why I haven't noticed the problem; I generally use scp instead of sftp, because the latter requires SSH protocol 2, and I need to deal with a variety of machines, some of which still use the older SSH protocol 1. I guess it's possible that my patch interacts badly with some other recent submissions; I did most of my testing with an earlier snapshot. -- End of excerpt from Bob Byrnes I can reproduce the problem with the earlier snapshot too, so it isn't caused by anything recent. I'll do some stracing to see what sftp is doing (that scp and ssh are not). -- Bob
Re: [Patch] implementation of nonblocking writes on pipes
On Sep 9, 10:06am, [EMAIL PROTECTED] (Christopher Faylor) wrote: -- Subject: Re: [Patch] implementation of nonblocking writes on pipes Before we start adding more patches which are based on your previous work, could you reply to some of the problems raised in the cygwin mailing list? Sure, I'll do that now (I've fallen a few days behind reading that list). There was one problem with Windows 95 which Corinna fixed but now there is another problem with using rsync, which I thought was one of the impetuses for your patch. -- End of excerpt from Christopher Faylor The win95 problem was unfortunate ... I don't have a win95 system here to test. I'll look more closely at the patch, but I think it is correct. The rsync problem (I assume you are talking about rsync + xp sp2 failing) appears to be unrelated to my patch, because I think the failure is reported for file desriptors from a socketpair, not a pipe. I'll try to confirm that. In each case, I'll send a follow up message to the cygwin mailing list. It's important to realize that my first patch does not fix all hangs related to rsync or ssh. I've identified and fixed at least three distinct deadlock scenarios, and I'm submitting separate patches for each. -- Bob
[Patch] implementation of select for write on pipes
The following patch implements select for write on pipes. Currently, pipes always select writable, which sometimes causes programs like rsync to hang when the pipe is full. This can be observed directly from strace output: http://www.mail-archive.com/[EMAIL PROTECTED]/msg07559.html We have been using this patch successfully for several months for automated builds of our Windows products using Cygwin, so it has been extensively tested. It is the first of several patches that I plan to contribute, all with the goal of preventing Cygwin hangs, and it does seem to really improve the behavior of programs like rsync (and sshd) that use select extensively with pipes. The existing code uses PeekNamedPipe to implement select for read on pipes. For the corresponding write test, I used NtQueryInformationFile to fill in a FILE_PIPE_LOCAL_INFORMATION struct, which contains OutboundQuota and WriteQuotaAvailable fields that allow us to detect when the pipe is full. I fixed the signature and return value for NtQueryInformationFile as well. Unfortunately, NtQueryInformationFile isn't supported by Win9x, so we optimistically assume the pipe is always writable on those systems (i.e., no change from the status quo). NtQueryInformationFile also requires FILE_READ_ATTRIBUTES access on the write side of the pipe, but CreatePipe doesn't set that (at least before WinXP SP2), so I added a new create_selectable_pipe function to do the right thing: it uses CreateNamedPipe for the read side, and CreateFile for the write side (with FILE_READ_ATTRIBUTES). It's important to only allow a single pipe instance, to ensure that the named pipe was not created earlier by some other process. We retry if necessary until we're sure that we really have a new pipe (unlikely to ever happen, but it's nice to be bulletproof). If we somehow inherit a (non-Cygwin) pipe without FILE_READ_ATTRIBUTES, then NtQueryInformationFile will fail, and we again optimistically assume that the pipe is writable in this case. We also have to deal with Win9x, which doesn't support CreateNamedPipe, so we just call CreatePipe on those systems to use an anonymous pipe as the best approximation. (As an aside, I couldn't help noticing that CreatePipe stupidly creates a full duplex pipe, which is a waste, since only a single direction is actually used. We avoid that silliness with our named pipes.) For a pipe to select writable, it must have at least PIPE_BUF bytes available, to satisfy POSIX atomic write requirements. Subsequent writes with size PIPE_BUF can still block, but most (all?) UNIX variants seem to work this way. For experimenting, we use the CYGWIN_PIPEBUFSIZE environment variable to set the buffer size for pipes, with 4*PIPE_BUF == 16k as the (unchanged) default. We refuse to create a tiny pipe with size PIPE_BUF, however, to ensure that there will always be enough space for POSIX atomic writes. If we somehow inherit a tiny (non-Cygwin) pipe, then we consider the pipe writable only if it is completely empty, to minimize the probability that a subsequent write will block. -- Bob Byrnes Curl Corporation 1 Cambridge Center, 10th Floor Cambridge, MA 02142-1612 2004-09-02 Bob Byrnes [EMAIL PROTECTED] * autoload.cc (NtQueryInformationFile): Return nonzero on error. * ntdll.h (FILE_PIPE_LOCAL_INFORMATION): Add. (NtQueryInformationFile): Fix types for last two arguments. * pipe.cc: Include stdlib.h, stdio.h, limits.h, and ntdll.h. (create_selectable_pipe): New function to create a pipe that can be used with NtQueryInformationFile for select. (fhandler_pipe::create): Call create_selectable_pipe instead of CreatePipe. (pipe): Use CYGWIN_PIPEBUFSIZE environment variable if defined, or DEFAULT_PIPEBUFSIZE otherwise. * select.cc: Include limits.h and ntdll.h. (peek_pipe): Add select_printf output. Call NtQueryInformationFile to implement select for write on pipes. (fhandler_pipe::select_read): Reorder field assignments to be consistent with fhandler_pipe::select_write. (fhandler_pipe::select_write): Initialize startup, verify, cleanup, and write_ready fields for select_record. (fhandler_pipe::select_except): Tweak indentation to be consistent with fhandler_pipe::select_write. Index: autoload.cc === RCS file: /cvs/src/src/winsup/cygwin/autoload.cc,v retrieving revision 1.86 diff -u -p -r1.86 autoload.cc --- autoload.cc 24 Jul 2004 09:41:34 - 1.86 +++ autoload.cc 2 Sep 2004 22:06:11 - @@ -379,7 +379,7 @@ LoadDLLfuncEx (NtCreateToken, 52, ntdll, LoadDLLfuncEx (NtMapViewOfSection, 40, ntdll, 1) LoadDLLfuncEx (NtOpenFile, 24, ntdll, 1) LoadDLLfuncEx (NtOpenSection, 12, ntdll, 1) -LoadDLLfuncEx (NtQueryInformationFile, 20, ntdll, 1