Re: [RFC] protocol version 2

2017-11-10 Thread Jonathan Tan
On Fri, 20 Oct 2017 10:18:39 -0700
Brandon Williams  wrote:

> Some of the pain points with the current protocol spec are:

After some in-office discussion, I think that the most important pain
point is that we have to implement each protocol twice: once for
HTTP(S), and once for SSH (and friends) that support bidirectional byte
streams.

If it weren't for this, I think that what is discussed in this document
(e.g. ls-refs, fetch-object) can be less invasively accomplished with
v1, specifying "extra parameters" (explained in this e-mail [1]) to
merely tweak the output of upload-pack instead of replacing it nearly
completely, thus acting more as optimizations than changing the mode of
operation entirely.

[1] 
https://public-inbox.org/git/20171010193956.168385-1-jonathanta...@google.com/

>   * The server's initial response is the ref advertisement.  This
> advertisement cannot be omitted and can become an issue due to the
> sheer number of refs that can be sent with large repositories.  For
> example, when contacting the internal equivalent of
> `https://android.googlesource.com/`, the server will send
> approximately 1 million refs totaling 71MB.  This is data that is
> sent during each and every fetch and is not scalable.

For me, this is not a compelling one, because we can provide a ref
whitelist as an "extra parameter" in v1.

>   * Capabilities were implemented as a hack and are hidden behind a NUL
> byte after the first ref sent from the server during the ref
> advertisement:
> 
>\0  
> 
> Since they are sent in the context of a pkt-line they are also subject
> to the same length limitations (1k bytes with old clients).  While we
> may not be close to hitting this limitation with capabilities alone, it
> has become a problem when trying to abuse capabilities for other
> purposes (e.g. 
> [symrefs](https://public-inbox.org/git/20160816161838.klvjhhoxsftvkfmd@x/)).
> 
>   * Various other technical debt (e.g. abusing capabilities to
> communicate agent and symref data, service name set using a query
> parameter).

I think these 2 are the same - I would emphasize the fact that we cannot
add more stuff here, rather than the fact that we're putting this behind
NUL.

>  Special Packets
> -
> 
> In protocol v2 these special packets will have the following semantics:
> 
>   * '' Flush Packet (flush-pkt) - indicates the end of a message
>   * '0001' End-of-List delimiter (delim-pkt) - indicates the end of a list

To address the pain point of HTTP(S) being different from the others
(mentioned above), I think the packet semantics should be further
qualified:

 - Communications must be divided up into packets terminated by a
   flush-pkt. Also, each side must be implemented without knowing
   whether packets-in-progress can or cannot be seen by the other side.
 - Each request packet must have a corresponding, possibly empty,
   response packet.
 - A request packet may be sent even if a response packet corresponding
   to a previously sent request packet is awaited. (This allows us to
   retain the existing optimization in fetch-pack wherein, during
   negotiation, the "have" request-response packet pairs are
   interleaved.)

This will allow us to more easily share code between HTTP(S) and the
others.

In summary, I think that we need a big motivation to make the jump from
v1 to v2, instead of merely making small changes to v1 (and I do think
that the proposed new commands, such as "ls-refs" and "fetch-object",
can be implemented merely by small changes). And I think that the
ability to better share code between HTTP(S) and others provides that
motivation.


Re: [RFC] protocol version 2

2017-10-31 Thread Brandon Williams
On 10/28, Philip Oakley wrote:
> From: "Brandon Williams" 
> Sent: Friday, October 20, 2017 6:18 PM
> >Objective
> >===
> >
> >Replace Git's current wire protocol with a simpler, less wasteful
> >protocol that can evolve over time.
> >
> 
> 
> 
> >Capability Advertisement
> >--
> >
> >A server which decides to communicate (based on a request from a client)
> >using protocol version 2, notifies the client by sending a version
> >string in its initial response followed by an advertisement of its
> >capabilities.  Each capability is a key with an optional value.  Clients
> >must ignore all unknown keys.
> 
> >   Semantics of unknown values are left to
> >the definition of each key.
> 
> This sounds odd. If the keys are known then their semantics are
> known. Or the keys are unknown and they and their values are
> ignored.
> 
> Maybe: Capability keys shall define their response to unknown key values.

I'll work to make the language a little clearer.

> 
> > Some capabilities will describe commands
> >which can be requested to be executed by the client.
> >
> 
> 
> >Ls-refs
> >-
> >
> >Ls-refs can be looked at as the equivalent of the current ls-remote as
> >it is a way to query a remote for the references that it has.  Unlike
> >the current ls-remote, the filtering of the output is done on the server
> >side by passing a number of parameters to the server-side command
> >instead of the filtering occurring on the client.
> >
> >Ls-ref takes in the following parameters:
> >
> > --head, --tags: Limit to only refs/heads or refs/tags
> > --refs: Do not show peeled tags or pseudorefs like HEAD
> > --symref: In addition to the object pointed by it, show the underlying
> >   ref pointed by it when showing a symbolic ref
> > : When specified, only references matching the given patterns
> >are displayed.
> 
> Does the --symref also the pseudorefs?
> 
> Isn't there a need somethimes to determine the ref that the remote's
> HEAD points to. This is an issue with the current clone and bundle
> code when there is a choice of refs/branches that could be the
> current HEAD ref and the wrong one is chosen, though this V2 change
> doesn't affect bundles.

Yeah, currently the resolution of HEAD is stuffed into the
capability line in v1.  The intention of this would be to allow for the
resolution of all symrefs (including HEAD).

> 
> >
> >The output of ls-refs is as follows:
> >
> >   output = (no-refs / list-of-refs)
> > *symref
> >*shallow
> >flush-pkt
> >
> >   no-refs = PKT-LINE(zero-id SP no-refs LF)
> >   list-of-refs = *ref
> >   ref = PKT-LINE((tip / peeled) LF)
> >   tip = obj-id SP refname
> >   peeled = obj-id SP refname "^{}"
> >
> >   symref = PKT-LINE("symref" SP symbolic-ref SP resolved-ref LF)
> >   shallow = PKT-LINE("shallow" SP obj-id LF)
> >
> >Fetch
> >---
> >
> >Fetch will need to be a modified version of the v1 fetch protocol.  Some
> >potential areas for improvement are: Ref-in-want, CDN offloading,
> >Fetch-options.
> >
> >Since we'll have an 'ls-ref' service we can eliminate the need of fetch
> >to perform a ref-advertisement, instead a client can run the 'ls-refs'
> >service first, in order to find out what refs the server has, and then
> >request those refs directly using the fetch service.
> >
> >//TODO Flush out the design
> >
> >Fetch-object
> >--
> >
> >This service could be used by partial clones in order to request missing
> >objects.
> >
> >//TODO Flush out the design
> >
> >Push
> >--
> >
> >Push will need to be a modified version of the v1 push protocol.  Some
> >potential areas for improvement are: Fix push-options, Negotiation for
> >force push.
> >
> >One change that will need to happen is to improve how `push-options` are
> >sent to the server (so that they aren't sent twice!!).  Also the
> >report-status needs to be better than it currently is in v1 so that
> >tools like gerrit can explain what it did with the ref-update the client
> >sent to it. Maybe have a push-rebase capability or command?
> >
> >//TODO Flush out the design
> >
> >Other Considerations
> >==
> >
> > * Move away from pkt-line framing?
> > * Have responses structured in well known formats (e.g. JSON)
> > * Eliminate initial round-trip using 'GIT_PROTOCOL' side-channel
> > * Additional commands in a partial clone world (e.g. log, grep)
> --
> Philip
> 

-- 
Brandon Williams


Re: [RFC] protocol version 2

2017-10-28 Thread Philip Oakley

From: "Brandon Williams" 
Sent: Friday, October 20, 2017 6:18 PM

Objective
===

Replace Git's current wire protocol with a simpler, less wasteful
protocol that can evolve over time.






Capability Advertisement
--

A server which decides to communicate (based on a request from a client)
using protocol version 2, notifies the client by sending a version
string in its initial response followed by an advertisement of its
capabilities.  Each capability is a key with an optional value.  Clients
must ignore all unknown keys.



   Semantics of unknown values are left to
the definition of each key.


This sounds odd. If the keys are known then their semantics are known. Or 
the keys are unknown and they and their values are ignored.


Maybe: Capability keys shall define their response to unknown key values.


 Some capabilities will describe commands
which can be requested to be executed by the client.





Ls-refs
-

Ls-refs can be looked at as the equivalent of the current ls-remote as
it is a way to query a remote for the references that it has.  Unlike
the current ls-remote, the filtering of the output is done on the server
side by passing a number of parameters to the server-side command
instead of the filtering occurring on the client.

Ls-ref takes in the following parameters:

 --head, --tags: Limit to only refs/heads or refs/tags
 --refs: Do not show peeled tags or pseudorefs like HEAD
 --symref: In addition to the object pointed by it, show the underlying
   ref pointed by it when showing a symbolic ref
 : When specified, only references matching the given patterns
are displayed.


Does the --symref also the pseudorefs?

Isn't there a need somethimes to determine the ref that the remote's HEAD 
points to. This is an issue with the current clone and bundle code when 
there is a choice of refs/branches that could be the current HEAD ref and 
the wrong one is chosen, though this V2 change doesn't affect bundles.




The output of ls-refs is as follows:

   output = (no-refs / list-of-refs)
 *symref
*shallow
flush-pkt

   no-refs = PKT-LINE(zero-id SP no-refs LF)
   list-of-refs = *ref
   ref = PKT-LINE((tip / peeled) LF)
   tip = obj-id SP refname
   peeled = obj-id SP refname "^{}"

   symref = PKT-LINE("symref" SP symbolic-ref SP resolved-ref LF)
   shallow = PKT-LINE("shallow" SP obj-id LF)

Fetch
---

Fetch will need to be a modified version of the v1 fetch protocol.  Some
potential areas for improvement are: Ref-in-want, CDN offloading,
Fetch-options.

Since we'll have an 'ls-ref' service we can eliminate the need of fetch
to perform a ref-advertisement, instead a client can run the 'ls-refs'
service first, in order to find out what refs the server has, and then
request those refs directly using the fetch service.

//TODO Flush out the design

Fetch-object
--

This service could be used by partial clones in order to request missing
objects.

//TODO Flush out the design

Push
--

Push will need to be a modified version of the v1 push protocol.  Some
potential areas for improvement are: Fix push-options, Negotiation for
force push.

One change that will need to happen is to improve how `push-options` are
sent to the server (so that they aren't sent twice!!).  Also the
report-status needs to be better than it currently is in v1 so that
tools like gerrit can explain what it did with the ref-update the client
sent to it. Maybe have a push-rebase capability or command?

//TODO Flush out the design

Other Considerations
==

 * Move away from pkt-line framing?
 * Have responses structured in well known formats (e.g. JSON)
 * Eliminate initial round-trip using 'GIT_PROTOCOL' side-channel
 * Additional commands in a partial clone world (e.g. log, grep)

--
Philip 



Re: [RFC] protocol version 2

2017-10-25 Thread Junio C Hamano
Brandon Williams  writes:

> On 10/24, Junio C Hamano wrote:
>> Brandon Williams  writes:
>> 
>> >   : When specified, only references matching the given patterns
>> >  are displayed.
>> 
>> I do not think you meant  here.
>> 
>> The side that is listing what it has has no reason to know what the
>> recipient plans to do with the result, so you must be only sending
>> the LHS of a refspec.  If your explanation says "given patterns",
>> then replace  with .  Do not abuse a term that has
>> specific and established meaning for something else.
>
> Yes, you're right i intended that to mean  instead so that the
> client could send "refs/heads/*" or some other such pattern and have the
> server limit its output.

Speaking of limiting the bandwidth consumed by the ref
advertisement, I think another trick that came up in past
discussions may be worth considering, which is to allow the
requestor to say, "oh by the way, I made exactly the same request as
this one earlier to you, and your response hashed down to this
value".  The responder may choose to give an incremental response
relative to the known result the requestor claims to have.

So for example, a requestor may have made an earlier ls-refs request
and can still recall that it got:

refs/heads/maintObjectID A
refs/heads/master   ObjectID B
refs/tags/v1.0  ObjectId C

Also assume that these three lines together (textually) hashes to Z.

When the requestor asks about the two hierarchies, it may say "I know
you gave a result that hashes to Z" with an additional parameter:

command=ls-ref
refspec=refs/heads/*
refspec=refs/tags/*
known=Z

If the current response for refs/heads/* and refs/tags/* when fully
spelt out were like this (i.e. we updated a ref, gained another, and
lost one):

refs/heads/master   ObjectID D
refs/tags/v1.0  ObjectId C
refs/tags/v1.1  ObjectID E

then the responder can send the fully spelt version, or it can
choose to say, "It's good that you know the state Z; relative to
that, refs/heads/maint no loner exists, refs/heads/master is now at
D and refs/tags/v1.1 at E has been added", if the latter results in
a shorter response (and if it recognises Z and map it back to the
set of refs and their values it responded with).

The "known" request parameter could further be refined (I do not
think this possibility was discussed in the past) to say "among the
values I received earlier from you, the ones that match this pattern
hashes to this", e.g. the earlier example request might become

command=ls-ref
refspec=refs/heads/*
refspec=refs/tags/*
known=X for refs/heads/*
known=Y for refs/tags/*



Re: [RFC] protocol version 2

2017-10-25 Thread Brandon Williams
On 10/25, Derrick Stolee wrote:
> On 10/20/2017 1:18 PM, Brandon Williams wrote:
> >  Overview
> >==
> >
> >This document presents a specification for a version 2 of Git's wire
> >protocol.  Protocol v2 will improve upon v1 in the following ways:
> >
> >   * Instead of multiple service names, multiple commands will be
> > supported by a single service
> >   * Easily extendable as capabilities are moved into their own section
> > of the protocol, no longer being hidden behind a NUL byte and
> > limited by the size of a pkt-line (as there will be a single
> > capability per pkt-line).
> >   * Separate out other information hidden behind NUL bytes (e.g. agent
> > string as a capability and symrefs can be requested using 'ls-ref')
> >   * Ref advertisement will be omitted unless explicitly requested
> >   * ls-ref command to explicitly request some refs
> 
> Hi Brandon,
> 
> I'm very interested in your protocol as a former server-side dev for
> the VSTS Git server, and understand some of these headaches. We

Always happy to hear that someone is excited about the area I'm working
on :)

> built limited refs specifically to target the problem you are
> solving with ls-ref, but it requires knowledge about the
> authenticated user in order to work. I believe your suggestion is a
> better solution for the Git protocol.

Yes one of the big issues we've run into is the ref advertisement, 1
because it makes it difficult to add any additional features to the
protocol and 2 because it doesn't scale well because all refs are
blasted at the client (unless you use a separate system like you
implemented to reduce that number).  So I'm hoping we can solve (1) by
doing some sort of capability advertisement instead of a ref
advertisement upfront and (2) by allowing clients to express a way of
limiting the ref advertisement server side.

> 
> The "easily extendable" part has specifically caught my interest, as
> we (Microsoft) would like to move most of the GVFS protocol into
> core Git, and this is a great way to do it. Even if not all features
> are accepted by upstream, we could use our GVFS-specific fork of Git
> to communicate to our servers without breaking normal users'
> interactions.

This is one thing I'm excited about too and hope there's enough desire
for such a capability to extend the protocol.  I first was interested in
building such a system when i looked at some of the previous work done
by stephan trying to introduce a protocol v2
(https://public-inbox.org/git/1429658342-5295-1-git-send-email-sbel...@google.com/)
and this idea was further reinforced when I sat down and talked with
some mercurial developers about their protocol and what they did to
migrate to a v2.  I also discovered that their v2 protocol is
service/command oriented and it makes it very simple and easy for
extensions to be added server and client side which allow for additional
commands to be executed during a server/client exchange without breaking
the normal fetch/push interaction.

So I'm hoping that what we decided on for v2 will enable exactly what
you want.  It may even allow for doing things like server side log and
grep when a client has a partial clone or a repo which is so big it
would be difficult to do some of those operations locally.  Of course
you identified some of the issues which such operations below :)

> 
> Please CC me in future versions of this proposal. Let me know if you
> want to chat directly about the "TODO" items below.

Of course, I'll make sure to keep you updated.

> 
> Speaking of TODOs, how much of this concept do you have working in a
> prototype? Do you have code that performs this version 2 handshake
> and communicates the ls-refs result?

Most of the TODOs are areas where more thought and design needs to
happen.  My main goal with this RFC is to see if a modular design like
this would have support from the community and if so, to nail down the
design of the exchanges outside of individual commands since that
infrastructure would be harder to change after the fact.

As for the commands themselves, since this design is meant to be module
we could implement them separately or one at a time (though a base set
of commands would need to be implemented and designed before v2 could
roll out).  That being said I've begun working up a rough prototype of
the basic initial capability/command exchange and the ls-refs command.
Since its pretty rough and not integrated into the actual transport code
yet its no where near ready to be sent out though I wouldn't mind
pushing the WIP code out so people can see/play with what's there.

And if you have any comments or suggestions about any parts of the
design or the TODOs I would love to chat about it :)

> 
> >  Ls-refs
> >-
> >
> >Ls-refs can be looked at as the equivalent of the current ls-remote as
> >it is a way to query a remote for the references that it has.  Unlike
> >the current ls-remote, the filtering of the output is done on 

Re: [RFC] protocol version 2

2017-10-25 Thread Derrick Stolee

On 10/20/2017 1:18 PM, Brandon Williams wrote:

  Overview
==

This document presents a specification for a version 2 of Git's wire
protocol.  Protocol v2 will improve upon v1 in the following ways:

   * Instead of multiple service names, multiple commands will be
 supported by a single service
   * Easily extendable as capabilities are moved into their own section
 of the protocol, no longer being hidden behind a NUL byte and
 limited by the size of a pkt-line (as there will be a single
 capability per pkt-line).
   * Separate out other information hidden behind NUL bytes (e.g. agent
 string as a capability and symrefs can be requested using 'ls-ref')
   * Ref advertisement will be omitted unless explicitly requested
   * ls-ref command to explicitly request some refs


Hi Brandon,

I'm very interested in your protocol as a former server-side dev for the 
VSTS Git server, and understand some of these headaches. We built 
limited refs specifically to target the problem you are solving with 
ls-ref, but it requires knowledge about the authenticated user in order 
to work. I believe your suggestion is a better solution for the Git 
protocol.


The "easily extendable" part has specifically caught my interest, as we 
(Microsoft) would like to move most of the GVFS protocol into core Git, 
and this is a great way to do it. Even if not all features are accepted 
by upstream, we could use our GVFS-specific fork of Git to communicate 
to our servers without breaking normal users' interactions.


Please CC me in future versions of this proposal. Let me know if you 
want to chat directly about the "TODO" items below.


Speaking of TODOs, how much of this concept do you have working in a 
prototype? Do you have code that performs this version 2 handshake and 
communicates the ls-refs result?



  Ls-refs
-

Ls-refs can be looked at as the equivalent of the current ls-remote as
it is a way to query a remote for the references that it has.  Unlike
the current ls-remote, the filtering of the output is done on the server
side by passing a number of parameters to the server-side command
instead of the filtering occurring on the client.

Ls-ref takes in the following parameters:

   --head, --tags: Limit to only refs/heads or refs/tags


Nit: It would be better to use "--heads" to match refs/heads and your 
use of "--tags" for refs/tags.



   --refs: Do not show peeled tags or pseudorefs like HEAD


Assuming we are in the case where the server has a HEAD ref, why would 
that ever be advertised? Also, does this imply that without the --refs 
option we would peel annotated tags until we find non-tag OIDs? Neither 
of these functions seem useful as default behavior.



   --symref: In addition to the object pointed by it, show the underlying
 ref pointed by it when showing a symbolic ref
   : When specified, only references matching the given patterns
  are displayed.


Can you be specific about the patterns? For instance, it is not a good 
idea to allow the client to submit a regex for the server to compute. 
Instead, can we limit this pattern-matching to a prefix-set, such as the 
following list of prefixes:


    refs/heads/master
    refs/releases/*
    refs/heads/user/me/*

  Fetch
---

Fetch will need to be a modified version of the v1 fetch protocol.  Some
potential areas for improvement are: Ref-in-want, CDN offloading,
Fetch-options.

Since we'll have an 'ls-ref' service we can eliminate the need of fetch
to perform a ref-advertisement, instead a client can run the 'ls-refs'
service first, in order to find out what refs the server has, and then
request those refs directly using the fetch service.

//TODO Flush out the design

  Fetch-object
--

This service could be used by partial clones in order to request missing
objects.

//TODO Flush out the design


As you flesh our these "fetch" and "fetch-object" commands, keep in mind 
that partial clones could mean any of the following:


 * fetch all reachable objects except for blobs.

 * fetch all reachable objects except for blobs above a
   certain size.

 * fetch all commits, trees, (and blobs?) within a certain
   "cone" of the file system.


  Push
--

Push will need to be a modified version of the v1 push protocol.  Some
potential areas for improvement are: Fix push-options, Negotiation for
force push.


Negotiation is something to keep in mind for all pushes, especially in 
an ecosystem full of fork-based workflows. If you are working across 
forks and someone else syncs data between your remotes, you may re-push 
a large chunk of objects that are already present in a fork. Adding an 
ls-refs step before push would be a step in the right direction.

  Other Considerations
==

   * Move away from pkt-line framing?
   * Have responses structured in well known formats (e.g. JSON)
   * Eliminate initial round-trip using 'GIT_PROTOCOL' side-channel
   * Additional 

Re: [RFC] protocol version 2

2017-10-24 Thread Junio C Hamano
Brandon Williams  writes:

>> I actually have a reasonable guess why you want to have a separate
>> delimiter (which has nothing to do with "optional delim can be
>> omitted"), but I want to see it explained in this document clearly
>> by its designer(s).
>
> Jonathan Tan suggested that we tighten flush semantics in a newer
> protocol so that proxies are easier to work with.  Currently proxies
> need to understand the protocol instead of simply waiting for a flush.
>
> Also I've been told the smart http code is more complex because of the
> current semantics of flush packets.

I think the above two are the same thing ;-) but yes, "flush" in the
original protocol were used for both "I am truly finished talking;
now it is your turn" and "I am done with one section of what I need
to say, and a different section now begins; it is still my turn to
speak".  The need to handle the latter makes smart-http quite ugly.

Thanks.


Re: [RFC] protocol version 2

2017-10-24 Thread Brandon Williams
On 10/24, Junio C Hamano wrote:
> Brandon Williams  writes:
> 
> >   * Capabilities were implemented as a hack and are hidden behind a NUL
> > byte after the first ref sent from the server during the ref
> > advertisement:
> > ...
> >
> >   * Various other technical debt (e.g. abusing capabilities to
> > communicate agent and symref data, service name set using a query
> > parameter).
> 
> This sounds like a duplicate of the above.

You're right, it mostly is a duplication of that.

> 
> >  Special Packets
> > -
> >
> > In protocol v2 these special packets will have the following semantics:
> >
> >   * '' Flush Packet (flush-pkt) - indicates the end of a message
> >   * '0001' End-of-List delimiter (delim-pkt) - indicates the end of a list
> 
> After reading the remainder of the document twice, I do not think
> the reason why we want to introduce a new type delim-pkt is
> explained and justified well enough.  If a command request takes a
> command packet, zero or more capability packets, a mandatory
> delimiter packet, zero or more parameter packets and a mandatory
> flush packet, then you can use the same "flush" as delimiter in the
> middle.  The delimiter will of course become useful if you can omit
> it when not necessary (e.g. after seeing capabilities, you may see a
> flush and you will know there is no parameters and save the need to
> send one "delim").
> 
> I actually have a reasonable guess why you want to have a separate
> delimiter (which has nothing to do with "optional delim can be
> omitted"), but I want to see it explained in this document clearly
> by its designer(s).

Jonathan Tan suggested that we tighten flush semantics in a newer
protocol so that proxies are easier to work with.  Currently proxies
need to understand the protocol instead of simply waiting for a flush.

Also I've been told the smart http code is more complex because of the
current semantics of flush packets.

> 
> > command-request = command
> >   capability-list
> >   delim-pkt
> >   (command specific parameters)
> >   flush-pkt
> > command = PKT-LINE("command=" key LF)
> >
> > The server will then acknowledge the command and requested capabilities
> > by echoing them back to the client and then launch into the command.
> >
> > acknowledge-request = command
> >   capability-list
> >   delim-pkt
> >   execute-command
> > execute-command = 
> 
> It is not quite clear what the value of "echoing them back" is,
> especially if that is done by always echoing verbatim.  A reader may
> naturally expect, when capabilities are exchanged between two
> parties, these are negotiated so that the only ones that are
> commonly supported by both ends would be used, or something like
> that.

The echoing back of the command and requested capabilities may or may
not be needed.  A client should only ever issue a command-request using
advertised capabilities and commands so there really isn't much
negotiation happening, just the server saying "here's what's on the
menu" and the client picking only from that menu.

Where the echoing back may be useful is if we wanted to (at some point)
eliminate this initial round trip of doing the capability advertisement
and then subsequent selection of capabilities and instead stuff that
information into the GIT_PROTOCOL side channel in the initial request.
That way the client could optimistically send capabilities that it
doesn't yet know if the server supports.  I thought this might be an
interesting idea if we really really didn't want to live with the extra
round trip.

> 
> > A particular command can last for as many rounds as are required to
> > complete the service (multiple for negotiation during fetch and push or
> > no additional trips in the case of ls-refs).
> 
> OK.
> 
> >  Commands in v2
> > 
> >
> > Services are the core actions that a client wants to perform (fetch,
> > push, etc).  Each service has its own set of capabilities and its own
> > language of commands (think 'want' lines in fetch).  Optionally a
> > service can take in initial parameters or data when a client sends it
> > service request.
> 
> So a service (like "fetch") employ a set of "command"s (like "want",
> "have, etc)?  In the earlier part of the document, we did not see
> any mention of "service" and instead saw only "command" mentioned.
> Is the state machine on both ends and the transition between states
> implicit?  E.g. when one side throws "want" command and the other
> side acknowledges, they understand implicitly that they are now in a
> "fetch" service session, even though there is nothing that said over
> the wire that they are doing so?  One reason I am wondering about
> this is what we should do if a command verb may be applicable in
> multiple services.

Looks like I 

Re: [RFC] protocol version 2

2017-10-24 Thread Junio C Hamano
Brandon Williams  writes:

>   * Capabilities were implemented as a hack and are hidden behind a NUL
> byte after the first ref sent from the server during the ref
> advertisement:
> ...
>
>   * Various other technical debt (e.g. abusing capabilities to
> communicate agent and symref data, service name set using a query
> parameter).

This sounds like a duplicate of the above.

>  Special Packets
> -
>
> In protocol v2 these special packets will have the following semantics:
>
>   * '' Flush Packet (flush-pkt) - indicates the end of a message
>   * '0001' End-of-List delimiter (delim-pkt) - indicates the end of a list

After reading the remainder of the document twice, I do not think
the reason why we want to introduce a new type delim-pkt is
explained and justified well enough.  If a command request takes a
command packet, zero or more capability packets, a mandatory
delimiter packet, zero or more parameter packets and a mandatory
flush packet, then you can use the same "flush" as delimiter in the
middle.  The delimiter will of course become useful if you can omit
it when not necessary (e.g. after seeing capabilities, you may see a
flush and you will know there is no parameters and save the need to
send one "delim").

I actually have a reasonable guess why you want to have a separate
delimiter (which has nothing to do with "optional delim can be
omitted"), but I want to see it explained in this document clearly
by its designer(s).

> command-request = command
>   capability-list
>   delim-pkt
>   (command specific parameters)
>   flush-pkt
> command = PKT-LINE("command=" key LF)
>
> The server will then acknowledge the command and requested capabilities
> by echoing them back to the client and then launch into the command.
>
> acknowledge-request = command
>   capability-list
>   delim-pkt
>   execute-command
> execute-command = 

It is not quite clear what the value of "echoing them back" is,
especially if that is done by always echoing verbatim.  A reader may
naturally expect, when capabilities are exchanged between two
parties, these are negotiated so that the only ones that are
commonly supported by both ends would be used, or something like
that.

> A particular command can last for as many rounds as are required to
> complete the service (multiple for negotiation during fetch and push or
> no additional trips in the case of ls-refs).

OK.

>  Commands in v2
> 
>
> Services are the core actions that a client wants to perform (fetch,
> push, etc).  Each service has its own set of capabilities and its own
> language of commands (think 'want' lines in fetch).  Optionally a
> service can take in initial parameters or data when a client sends it
> service request.

So a service (like "fetch") employ a set of "command"s (like "want",
"have, etc)?  In the earlier part of the document, we did not see
any mention of "service" and instead saw only "command" mentioned.
Is the state machine on both ends and the transition between states
implicit?  E.g. when one side throws "want" command and the other
side acknowledges, they understand implicitly that they are now in a
"fetch" service session, even though there is nothing that said over
the wire that they are doing so?  One reason I am wondering about
this is what we should do if a command verb may be applicable in
multiple services.

After reading the earlier protocol exchange explanation, I was sort
of expecting that "fetch" would be the command and "want", "have",
and "ack" lines would be exchanged as "command specific parameters",
so a sudden introduction of "services" here was a bit of impedance
mismatch to me.

>  Ls-refs
> -
>
> Ls-refs can be looked at as the equivalent of the current ls-remote as
> it is a way to query a remote for the references that it has.  Unlike
> the current ls-remote, the filtering of the output is done on the server
> side by passing a number of parameters to the server-side command
> instead of the filtering occurring on the client.
>
> Ls-ref takes in the following parameters:
>
>   --head, --tags: Limit to only refs/heads or refs/tags

I see no need for the above two as long as "refs/heads/*", etc. are
understood.

>   --refs: Do not show peeled tags or pseudorefs like HEAD

So showing peeled tags is the default?  Then I can sort-of see why
"I am not interested in peeled result".  

I do not see why "do not show HEAD, MERGE_HEAD, etc." is needed,
though.  It should be sufficient to just ask for refs/* if you are
interested only in other things, no?

>   --symref: In addition to the object pointed by it, show the underlying
> ref pointed by it when showing a symbolic ref

Sort of OK--it probably is easier to always send this, as symbolic
refs are minority anyway, though.