On Wed, Dec 12, 2018 at 3:02 AM Jeff King <p...@peff.net> wrote:
>
> On Tue, Dec 11, 2018 at 04:25:15PM -0800, Josh Steadmon wrote:
>
> > From: Masaya Suzuki <masayasuz...@google.com>
> >
> > In the Git pack protocol definition, an error packet may appear only in
> > a certain context. However, servers can face a runtime error (e.g. I/O
> > error) at an arbitrary timing. This patch changes the protocol to allow
> > an error packet to be sent instead of any packet.
> >
> > Following this protocol spec change, the error packet handling code is
> > moved to pkt-line.c.
>
> This is a change in the spec with an accompanying change in the code,
> which raises the question: what do other implementations do with this
> change (both older Git, and implementations like JGit, libgit2, etc)?

JGit is similar to Git. It parses "ERR " in limited places. When it sees an ERR
packet in an unexpected place, it'll fail somewhere in the parsing code.

https://github.com/eclipse/jgit/blob/30c6c7542190c149e2aee792f992a312a5fc5793/org.eclipse.jgit/src/org/eclipse/jgit/transport/PacketLineIn.java#L145-L147
https://github.com/eclipse/jgit/blob/f40b39345cd9b54473ee871bff401fe3d394ffe3/org.eclipse.jgit/src/org/eclipse/jgit/transport/BasePackConnection.java#L208

I'm not familiar with libgit2 code, but it seems it handles this at a
lower level. An error type packet is parsed out at a low level, and
the error handling is done by the callers of the packet parser.

https://github.com/libgit2/libgit2/blob/bea65980c7a42e34edfafbdc40b199ba7b2a564e/src/transports/smart_pkt.c#L482-L483

I cannot find any ERR packet handling in go-git. It seems to me that if
an ERR packet appears, go-git treats it as a parsing error.

https://github.com/src-d/go-git/blob/master/plumbing/protocol/packp/common.go#L60-L62

>
> I think the answer for older Git is "hang up unceremoniously", which is
> probably OK given the semantics of the change. And I'd suspect most
> other implementations would do the same. I just wonder if anybody tested
> it with other implementations.

I'm thinking aloud here. There are two aspects of protocol
compatibility: (1) new clients speaking to old servers, and (2) old
clients speaking to a new server that speaks the updated protocol.

For (1), I assume that in the Git pack protocol, a packet starting
with "ERR " does not appear naturally, except in the very special case
where the server does not support sideband but the updated protocol is
in use. As you mentioned, at first it looks like a new parser could
mistake part of git-receive-pack's pack file for an ERR packet,
assuming that this pack file is packetized. Actually,
git-receive-pack's pack file is not packetized in the Git pack
protocol (https://github.com/git/git/blob/master/builtin/receive-pack.c#L1695).
I recently wrote a Git protocol parser
(https://github.com/google/gitprotocolio), and I confirmed that this
is the case at least for the HTTP transport. git-upload-pack's pack
file is indeed packetized, but packetized with sideband. So unless
sideband is not in use, packfile data would not be mistaken for an
ERR packet.
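
To illustrate the sideband point, here is a minimal sketch in C. Both
helper names are hypothetical and are not taken from pkt-line.c. It
assumes only that a sideband payload is prefixed with a one-byte band
number (1 for pack data), which is why such a payload can never begin
with "ERR ".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: does a pkt-line payload begin with "ERR "? */
static int looks_like_err(const unsigned char *payload, size_t len)
{
    return len >= 4 && memcmp(payload, "ERR ", 4) == 0;
}

/*
 * Hypothetical helper: frame data for a sideband channel by prefixing
 * the band byte. Returns the framed length.
 */
static size_t sideband_frame(unsigned char band, const unsigned char *data,
                             size_t len, unsigned char *out, size_t out_size)
{
    assert(out_size >= len + 1);
    out[0] = band;
    memcpy(out + 1, data, len);
    return len + 1;
}
```

With pack bytes that happen to start with "ERR " at a packet boundary,
looks_like_err() matches the raw bytes, but it does not match the same
bytes after sideband_frame(1, ...), because the first byte is then 0x01.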

For (2), if old clients see an unexpected ERR packet, they cannot
parse it. They would handle this unparsable data as if the server were
not speaking the Git protocol correctly. Even if old clients just
ignore the packet, due to the nature of the ERR packet, the server
won't send further data, so the client won't be able to proceed. Either
way, the clients face an error, and the only difference is whether
they can show the error nicely or not. New clients will show the
errors nicely to users. Old clients will not.
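
To make the packet in question concrete, here is a rough sketch of how
a sender might frame an error-line. The function name is hypothetical.
It assumes the usual pkt-line framing (four hex digits giving the total
length, header included) and a maximum total length of 65520 bytes.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write PKT-LINE("ERR" SP msg) into out as a
 * NUL-terminated string. Returns the packet length, or -1 if the
 * packet would be too long or would not fit in out.
 */
static int format_err_pkt(char *out, size_t out_size, const char *msg)
{
    /* 4-byte length header + "ERR " + message text */
    size_t total = 4 + 4 + strlen(msg);

    if (total > 65520 || total + 1 > out_size)
        return -1;
    snprintf(out, out_size, "%04zxERR %s", total, msg);
    return (int)total;
}
```

For example, the message "access denied" gives the 21-byte packet
"0015ERR access denied".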

>
> > +An error packet is a special pkt-line that contains an error string.
> > +
> > +----
> > +  error-line     =  PKT-LINE("ERR" SP explanation-text)
> > +----
> > +
> > +Throughout the protocol, where `PKT-LINE(...)` is expected, an error packet MAY
> > +be sent. Once this packet is sent by a client or a server, the data transfer
> > +process defined in this protocol is terminated.
>
> The packfile data is typically packetized, too, and contains arbitrary
> data (that could have "ERR" in it). It looks like we don't specifically
> say PKT-LINE() in that part of the protocol spec, though, so I think
> this is OK.

As I described above, as far as I can see, the packfile sent to
git-receive-pack is not packetized. The packfile from git-upload-pack
is packetized, but typically with sideband. At least at the Git pack
protocol level, this should be OK.

>
> Likewise, in the implementation:
>
> > diff --git a/pkt-line.c b/pkt-line.c
> > index 04d10bbd03..ce9e42d10e 100644
> > --- a/pkt-line.c
> > +++ b/pkt-line.c
> > @@ -346,6 +346,10 @@ enum packet_read_status packet_read_with_status(int fd, char **src_buffer,
> >               return PACKET_READ_EOF;
> >       }
> >
> > +     if (starts_with(buffer, "ERR ")) {
> > +             die(_("remote error: %s"), buffer + 4);
> > +     }
> > +
> >       if ((options & PACKET_READ_CHOMP_NEWLINE) &&
> >           len && buffer[len-1] == '\n')
> >               len--;
>
> This ERR handling has been moved to a very low level. What happens if
> we're passing arbitrary data via the packet_read() code? Could we
> erroneously trigger an error if a packfile happens to have the bytes
> "ERR " at a packet boundary?
>
> For packfiles via upload-pack, I _think_ we're OK, because we only
> packetize it when a sideband is in use. In which case this would never
> match, because we'd have "\1" in the first byte slot.
>
> But are there other cases we need to worry about? Just
> brainstorming, I can think of:
>
>   1. We also pass packetized packfiles between git-remote-https and
>      the stateless-rpc mode of fetch-pack/send-pack. And I don't think
>      we use sidebands there.
>
>   2. The packet code is used for long-lived clean/smudge filters these
>      days, which also pass arbitrary data.
>
> So I think it's probably not a good idea to unconditionally have callers
> of packet_read_with_status() handle this. We'd need a flag like
> PACKET_READ_RESPECT_ERR, and to trigger it from the appropriate callers.

This is outside of the Git pack protocol, so having a separate parsing
mode makes sense to me.
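
A separate parsing mode could look roughly like the sketch below. This
is not the pkt-line.c code: pkt_parse(), the status enum, and the
PKT_RESPECT_ERR flag (named after the PACKET_READ_RESPECT_ERR
suggestion above) are all hypothetical, and the parser works on an
in-memory buffer for simplicity. It also treats lengths 1 to 3 as
malformed, so it does not handle special packets other than flush.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum pkt_status { PKT_NORMAL, PKT_FLUSH, PKT_REMOTE_ERR, PKT_MALFORMED };

/* Hypothetical opt-in flag: only callers that set it get ERR detection. */
#define PKT_RESPECT_ERR (1u << 0)

static int hexval(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Parse one pkt-line from buf; on success point *payload at its data. */
static enum pkt_status pkt_parse(const char *buf, size_t buflen,
                                 unsigned options,
                                 const char **payload, size_t *payload_len)
{
    size_t len = 0;
    int i;

    assert(payload && payload_len);
    *payload = NULL;
    *payload_len = 0;

    if (buflen < 4)
        return PKT_MALFORMED;
    for (i = 0; i < 4; i++) {
        int v = hexval(buf[i]);
        if (v < 0)
            return PKT_MALFORMED;
        len = (len << 4) | (size_t)v;
    }
    if (len == 0)
        return PKT_FLUSH;
    if (len < 4 || len > buflen)
        return PKT_MALFORMED;

    *payload = buf + 4;
    *payload_len = len - 4;

    /* Arbitrary data (filters, raw packfiles) passes through untouched. */
    if ((options & PKT_RESPECT_ERR) &&
        *payload_len >= 4 && memcmp(*payload, "ERR ", 4) == 0) {
        *payload += 4;       /* skip "ERR " so the caller sees the text */
        *payload_len -= 4;
        return PKT_REMOTE_ERR;
    }
    return PKT_NORMAL;
}
```

Callers that speak the pack protocol would pass PKT_RESPECT_ERR, while
callers that carry arbitrary data would pass 0 and receive a payload
beginning with "ERR " as ordinary data.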

>
> -Peff
