Team, On Mon, 6 Aug 2018, SZEDER Gábor wrote:
> [Resending with Clemens' last used email address. > Clemens, please consider sending a patch to update our .mailmap file.] > > > On Mon, Aug 6, 2018 at 5:11 PM SZEDER Gábor <szeder....@gmail.com> wrote: > > > > Travis CI changed its default OSX image to use XCode 9.4 on 2018-07-31 > > [1]. Since then OSX build jobs fail rather frequently because of a > > SIGPIPE in the tests 'fetch notices corrupt pack' or 'fetch notices > > corrupt idx' in 't5570-git-daemon.sh' [2]. I think this is a symptom > > a real bug in Git affecting other platforms as well, but these tests > > are too lax to catch it. I am seeing this very frequently now, as it feels like failing in the Azure Pipeline about half of the time. The symptom is this: -- snip -- ++ cp -R '/Users/vsts/agent/2.146.0/work/1/s/t/trash directory.t5570-git-daemon/repo/repo_pack.git' '/Users/vsts/agent/2.146.0/work/1/s/t/trash directory.t5570-git-daemon/repo/repo_bad2.git' ++ cd '/Users/vsts/agent/2.146.0/work/1/s/t/trash directory.t5570-git-daemon/repo/repo_bad2.git' +++ ls objects/pack/pack-b2175ab283c0abe358895777ca01c36e88a92e35.idx ++ p=objects/pack/pack-b2175ab283c0abe358895777ca01c36e88a92e35.idx ++ chmod u+w objects/pack/pack-b2175ab283c0abe358895777ca01c36e88a92e35.idx ++ printf %0256d 0 ++ dd of=objects/pack/pack-b2175ab283c0abe358895777ca01c36e88a92e35.idx bs=256 count=1 seek=1 conv=notrunc 1+0 records in 1+0 records out 256 bytes transferred in 0.000018 secs (14316558 bytes/sec) ++ mkdir repo_bad2.git ++ cd repo_bad2.git ++ git --bare init Initialized empty Git repository in /Users/vsts/agent/2.146.0/work/1/s/t/trash directory.t5570-git-daemon/repo_bad2.git/ ++ test_must_fail git --bare fetch git://127.0.0.1:5570/repo_bad2.git ++ case "$1" in ++ _test_ok= ++ git --bare fetch git://127.0.0.1:5570/repo_bad2.git [13879] Connection from 127.0.0.1:60504 [13879] Extended attribute "host": 127.0.0.1:5570 [13879] Extended attribute "protocol": version=0 [13879] Request upload-pack for '/repo_bad2.git' [13879] error: non-monotonic index ./objects/pack/pack-b2175ab283c0abe358895777ca01c36e88a92e35.idx [13879] error: non-monotonic index ./objects/pack/pack-b2175ab283c0abe358895777ca01c36e88a92e35.idx [13879] error: non-monotonic index ./objects/pack/pack-b2175ab283c0abe358895777ca01c36e88a92e35.idx [13879] error: non-monotonic index ./objects/pack/pack-b2175ab283c0abe358895777ca01c36e88a92e35.idx [13879] error: refs/heads/master does not point to a valid object! [13879] error: non-monotonic index ./objects/pack/pack-b2175ab283c0abe358895777ca01c36e88a92e35.idx [13879] error: non-monotonic index ./objects/pack/pack-b2175ab283c0abe358895777ca01c36e88a92e35.idx [13879] error: refs/heads/other does not point to a valid object! [13879] error: non-monotonic index ./objects/pack/pack-b2175ab283c0abe358895777ca01c36e88a92e35.idx [13879] error: non-monotonic index ./objects/pack/pack-b2175ab283c0abe358895777ca01c36e88a92e35.idx [13879] fatal: git upload-pack: not our ref b5c2e4226db03597a64815fd226648510b22ba40 [11833] [13879] Disconnected (with error) ++ exit_code=141 ++ test 141 -eq 0 ++ test_match_signal 13 141 ++ test 141 = 141 ++ return 0 ++ list_contains '' sigpipe ++ case ",$1," in ++ return 1 ++ test 141 -gt 129 ++ test 141 -le 192 ++ echo 'test_must_fail: died by signal 13: git --bare fetch git://127.0.0.1:5570/repo_bad2.git' test_must_fail: died by signal 13: git --bare fetch git://127.0.0.1:5570/repo_bad2.git ++ return 1 error: last command exited with $?=1 not ok 10 - fetch notices corrupt idx # # cp -R "$GIT_DAEMON_DOCUMENT_ROOT_PATH"/repo_pack.git "$GIT_DAEMON_DOCUMENT_ROOT_PATH"/repo_bad2.git && # (cd "$GIT_DAEMON_DOCUMENT_ROOT_PATH"/repo_bad2.git && # p=$(ls objects/pack/pack-*.idx) && # chmod u+w $p && # printf %0256d 0 | dd of=$p bs=256 count=1 seek=1 conv=notrunc # ) && # mkdir repo_bad2.git && # (cd repo_bad2.git && # git --bare init && # test_must_fail git --bare fetch "$GIT_DAEMON_URL/repo_bad2.git" && # test 0 = $(ls objects/pack | wc -l) # ) # -- snap -- Any ideas how to fix this test, anyone? Ciao, Dscho > > > > What it boils down to is this sequence: > > > > - The test first prepares a repository containing a corrupt pack, > > ready to be server via 'git daemon'. > > > > - Then the test runs 'test_must_fail git fetch ....', which connects > > to 'git daemon', which forks 'git upload-pack', which then > > advertises refs (only HEAD) and capabilities. So far so good. > > > > - 'git fetch' eventually calls fetch-pack.c:find_common(). The > > first half of this function assembles a request consisting of a > > want and a flush pkt-line, and sends it via a send_request() call. > > > > At this point the scheduling becomes important: let's suppose that > > fetch is slow and upload-pack is fast. > > > > - 'git upload-pack' receives the request, parses the want line, > > notices the corrupt pack, responds with an 'ERR upload-pack: not > > our ref' pkt-line, and die()s right away. > > > > - 'git fetch' finally approaches the end of the function, where it > > attempts to send a done pkt-line via another send_request() call > > through the now closing TCP socket. > > > > - What happens now seems to depend on the platform: > > > > - On Linux, both on my machine and on Travis CI, it shows textbook > > example behaviour: write() returns with error and sets errno to > > ECONNRESET. Since it happens in write_or_die(), 'git fetch' > > die()s with 'fatal: write error: Connection reset by peer', and > > doesn't show the error send by 'git upload-pack'; how could it, > > it doesn't even get as far to receive upload-pack's ERR > > pkt-line. > > > > The test only checks that 'git fetch' fails, but it doesn't > > check whether it failed with the right error message, so the > > test still succeeds. Had it checked the error message as well, > > we most likely had noticed this issue already, it doesn't happen > > all that rarely. > > > > - On the new OSX images with XCode 9.4 on Travis CI the write() > > triggers SIGPIPE right away, and 'test_must_fail' notices it and > > fails the test. I couldn't see any sign of an ECONNRESET or any > > other error that we could act upon to avoid the SIGPIPE. > > > > - On OSX with XCode 9.2 on Travis CI there is neither SIGPIPE, nor > > ECONNRESET, but sending the request actually succeeds even > > though there is no process on the other end of the socket > > anymore. 'git fetch' then simply continues execution, reads and > > parses the ERR pkt-line, and then dies()s with 'fatal: remote > > error: upload-pack: not our ref'. So, on the face of it, it > > shows the desired behaviour, but I have no idea how that write() > > could succeed instead of returning error. > > > > I don't know what happens on a real Mac as I don't have access to one; > > I figured out all the above by enabling packet tracing, adding a > > couple of well placed tracing printf() and sleep() calls, running a > > bunch of builds on Travis CI, and looking through their logs. But > > without access to a debugger and netstat and what not I can't really > > go any further. So I would now happily pass the baton to those who > > have a Mac and know a thing or two about its porting issues to first > > check whether OSX on a real Mac shows the same behaviour as it does in > > Travis CI's virtualized(?) environment. And then they can pass the > > baton to those who know all the intricacies of the pack protocol and > > its implementation to decide what to do with this issue. > > > > For a mostly reliable reproduction recipe you might want to fetch this > > branch: > > > > https://github.com/szeder/git t5570-git-daemon-sigpipe > > > > and then run 'make && cd t && ./t5570-git-daemon.sh -v -x' > > > > > > Have fun! ;) > > > > > > 1 - https://blog.travis-ci.com/2018-07-19-xcode9-4-default-announce > > > > 2 - On git.git's master: > > https://travis-ci.org/git/git/jobs/411517552#L2717 > >