May I ask what version of Python are you using? If it's < 2.7.4, the EINTR issue is expected.
Excerpts from Mike Hommey's message of 2017-03-26 09:08:25 +0900: > On Sat, Mar 25, 2017 at 10:34:02AM -0700, Gregory Szorc wrote: > > On Sat, Mar 25, 2017 at 4:19 AM, Mike Hommey <m...@glandium.org> wrote: > > > > > Hi, > > > > > > I don't know about you, but occasionally, I've hit "stream ended > > > unexpectedly (got m bytes, expected n)" errors that didn't make sense. > > > Retrying would always work. > > > > > > Recently, I was trying to use signal.setitimer and a signal handler for > > > some memory profiling on git-cinnabar, which uses mercurial as a > > > library, and got "stream ended 4 unexpectedly (got m bytes, expected n)" > > > *very* reproducibly. Like, with an interval timer firing every second, > > > it took only a few seconds to hit the error during a clone. > > > > > > I'm pretty sure this can be reproduced with a similar setup in mercurial > > > itself. > > > > > > Now, the reason this happens in this case is that, the code that fails > > > does: > > > > > > def readexactly(stream, n): > > > '''read n bytes from stream.read and abort if less was available''' > > > s = stream.read(n) > > > if len(s) < n: > > > raise error.Abort(_("stream ended unexpectedly" > > > " (got %d bytes, expected %d)") > > > % (len(s), n)) > > > return s > > > > > > ... and thanks to POSIX, interrupted reads can lead to short reads. So, > > > you request n bytes, and get less, just because something interrupted > > > the process. The problem then is that python doesn't let you know why > > > you just got a short read, and you have to figure that out on your own. > > > > > > The same kind of problem is also true to some extent on writes. > > > > > > Now, the problem is that this piece of code is likely the most visible > > > place where the issue exists, but there are many other places in the > > > mercurial code base that are likely affected. > > > > > > And while the signal.setitimer case is a corner case (and I ended up > > > using a separate thread to work around the problem ; my code wasn't > > > interruption safe either anyways), I wonder if those random "stream > > > ended unexpectedly (got m bytes, expected n)" errors I was getting under > > > normal circumstances are not just a manifestation of the same underlying > > > issue, which is that the code doesn't like interrupted reads. > > > > > > Disclaimer: I'm not going to work on fixing this issue, but I figured > > > I'd let you know, in case someone wants to look into it more deeply. > > > > > > > Thank you for writing this up. This "stream ended unexpectedly" has been a > > thorn in my side for a while, as it comes up frequently in Mozilla's CI > > with a frequency somewhere between 1 in 100-1000. Even retrying failed > > operations multiple times isn't enough to overcome it > > > > I have long suspected interrupted system calls as a likely culprit. > > However, when I initially investigated this a few months ago, I found that > > Python's I/O APIs retry automatically for EINTR. See > > https://hg.python.org/cpython/file/54c93e0fe79b/Lib/socket.py#l365 for > > example. This /should/ make e.g. socket._fileobject.read() resilient > > against signal interruption. (If Python's I/O APIs didn't do this, tons of > > programs would break. Also, the semantics of .read() are such that it is > > always supposed to retrieve all available bytes until EOF - at least for > > some io ABCs. read1() exists to perform at most 1 system call.) > > Note that EINTR is not the only way read() can end from interruption: > > If a read() is interrupted by a signal before it reads any data, it > shall return -1 with errno set to [EINTR]. > > If a read() is interrupted by a signal after it has successfully read > some data, it shall return the number of bytes read. > > From http://pubs.opengroup.org/onlinepubs/9699919799/functions/read.html > > But that's POSIX, Windows is another story. Recv is different too. > > On Sat, Mar 25, 2017 at 12:00:42PM -0700, Gregory Szorc wrote: > > Can you please provide more detailed steps to reproduce? > > > > I added the following code at the top of exchange.pull: > > > > def sighandler(sig, stack): > > pass > > > > import signal > > signal.signal(signal.SIGALRM, sighandler) > > signal.setitimer(signal.ITIMER_REAL, 1.0, 1.0) > > > > However, I was unable to reproduce the "stream ended unexpectedly" failure > > when cloning a Firefox repo from hg.mozilla.org. And I even tried with the > > interval set to 1ms. > > So, I tried to reproduce again with my original testcase, and failed to. > In fact, instead I was getting urllib2.URLError: <urlopen error [Errno 4] > Interrupted system call> errors. Delaying the initial SIGALRM to go past > the start of the HTTP request didn't fail either. Even small intervals. > > Then I tried on another machine and it happened reliably... and I > figured it was because I had a config difference between the two > machines, where in the latter case git-cinnabar would not use mercurial > to talk to the server (but it would still use it to read the bundle2, > and fail in mercurial.changegroup.readexactly). > > The main difference is that in the latter case, the underlying > "connection" is a pipe instead of a socket. This clearly influences the > behavior on interruption from a signal. It's actually rather easy to get > short reads with pipes. Short reads from recv are probably more > difficult to reproduce. > > Mike _______________________________________________ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel