On Tue, Jun 21, 2016 at 11:23:03AM +0000, Eric Wong wrote:

> I noticed "fatal: The remote end hung up unexpectedly" in server
> logs from shallow clones.  Totally reproducible in the test
> cases, too.  The following change shows it:
> [...]
> [Tue Jun 21 11:07:41.391269 2016] [cgi:error] [pid 21589] [client 
> 127.0.0.1:37518] AH01215: fatal: The remote end hung up unexpectedly
> 
> It doesn't show above, but I think http-backend exits
> with a non-zero status, too, which might cause some CGI
> implementations to complain or break.
> 
> Not sure if it's just a corner case that wasn't tested
> or something else, but the clone itself seems successful...

The dying process is actually upload-pack. If we instrument it like
this:

diff --git a/upload-pack.c b/upload-pack.c
index 9e03c27..a1da676 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -820,6 +820,14 @@ static int upload_pack_config(const char *var, const char 
*value, void *unused)
        return parse_hide_refs_config(var, value, "uploadpack");
 }
 
+NORETURN
+static void custom_die(const char *err, va_list params)
+{
+       vreportf("fatal: ", err, params);
+       warning("aborting");
+       abort();
+}
+
 int main(int argc, const char **argv)
 {
        const char *dir;
@@ -836,6 +844,9 @@ int main(int argc, const char **argv)
                OPT_END()
        };
 
+       warning("running upload-pack");
+       set_die_routine(custom_die);
+
        git_setup_gettext();
 
        packet_trace_identity("upload-pack");

we can see two things. One is we can get a backtrace from the core file:

#0  0x00007f04aef51458 in __GI_raise (sig=sig@entry=6) at 
../sysdeps/unix/sysv/linux/raise.c:55
#1  0x00007f04aef528da in __GI_abort () at abort.c:89
#2  0x0000000000406009 in custom_die (err=0x4ede60 "The remote end hung up 
unexpectedly", 
    params=0x7ffe15858758) at upload-pack.c:828
#3  0x000000000045ec63 in die (err=0x4ede60 "The remote end hung up 
unexpectedly") at usage.c:108
#4  0x000000000041e016 in get_packet_data (fd=0, src_buf=0x0, src_size=0x0, 
dst=0x7ffe158588b0, 
    size=4, options=2) at pkt-line.c:167
#5  0x000000000041e0ea in packet_read (fd=0, src_buf=0x0, src_len=0x0, 
    buffer=0x73cc40 <packet_buffer> "deepen 1", size=65520, options=2) at 
pkt-line.c:204
#6  0x000000000041e22b in packet_read_line_generic (fd=0, src=0x0, src_len=0x0, 
dst_len=0x0)
    at pkt-line.c:234
#7  0x000000000041e27c in packet_read_line (fd=0, len_p=0x0) at pkt-line.c:244
#8  0x0000000000404dc9 in get_common_commits () at upload-pack.c:384
#9  0x0000000000405eb4 in upload_pack () at upload-pack.c:798
#10 0x0000000000406229 in main (argc=1, argv=0x7ffe15858c28) at 
upload-pack.c:872

So we expected to read a packet but didn't get one. Not surprising. The
interesting thing is that we are in get_common_commits(), and if we were
to get a flush packet in stateless-rpc mode, we would simply exit(0).
But we get EOF instead, which provokes packet_read_line() to die.

The other interesting thing is that we can see in httpd's error.log that
it is the penultimate upload-pack that dies (I trimmed the log lines for
readability):

AH01215: warning: running upload-pack
AH01215: fatal: The remote end hung up unexpectedly
AH01215: warning: aborting
AH01215: error: git-upload-pack died of signal 6
AH01215: warning: running upload-pack

So this request actually takes _two_ upload-pack instances to serve
(which is not uncommon when we need multiple rounds of
get_common_commits(), though I am a little surprised that would be the
case for a clone). And the first one seems to be missing a closing
"0000" flush packet from the client to end it.

If that analysis is correct, this isn't affecting operation in any way;
it's just giving a useless message from upload-pack (and as you note,
that could trigger http-backend to exit(1), which may make the webserver
unhappy).

If we further instrument upload-pack to set GIT_TRACE_PACKET on the
server side, we can see the two requests:

packet:  upload-pack< want 379518c0c94e3b1a0710129d03d5560814a0ba6f 
multi_ack_detailed no-done side-band-64k thin-pack include-tag ofs-delta 
agent=git/2.9.0.37.gb3ad8ab.dirty
packet:  upload-pack< deepen 1
packet:  upload-pack< 0000
packet:  upload-pack> shallow 379518c0c94e3b1a0710129d03d5560814a0ba6f
packet:  upload-pack> 0000

packet:  upload-pack< want 379518c0c94e3b1a0710129d03d5560814a0ba6f 
multi_ack_detailed no-done side-band-64k thin-pack include-tag ofs-delta 
agent=git/2.9.0.37.gb3ad8ab.dirty
packet:  upload-pack< deepen 1
packet:  upload-pack< 0000
packet:  upload-pack> shallow 379518c0c94e3b1a0710129d03d5560814a0ba6f
packet:  upload-pack> 0000
packet:  upload-pack< done
packet:  upload-pack> NAK
packet:  upload-pack> 0000

I think in the first one we would get "deepen 1" from the client in
receive_needs(), and similarly write out our "shallow" line. But then we
go into get_common_commits() and the client has hung up, which causes
the message. Whereas in the second line it gives us a "done", which
completes the negotiation.

So my not-very-educated thoughts are:

  1. The client should probably be sending an extra flush in the first
     request. Alternatively, in the stateless-rpc case we should just
     accept the lack of flush as an acceptable end to the request.

  2. Presumably the shallowness is what causes the double-request, as
     fetch-pack wants to see the shallow list before proceeding. But
     since it has no actual commits to negotiate, the negotiation is a
     noop. So probably this case could actually happen in a single
     request.

     I suspect that other fetches could not, though, so I'm not sure how
     much effort is worth putting into optimizing.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to