Re: [PATCH v2 12/27] serve: introduce git-serve

2018-02-26 Thread Jonathan Nieder
Hi Duy,

Duy Nguyen wrote:
> On Fri, Jan 26, 2018 at 6:58 AM, Brandon Williams  wrote:

>> + stateless-rpc
>> +---------------
>> +
>> +If advertised, the `stateless-rpc` capability indicates that the server
>> +supports running commands in a stateless-rpc mode, which means that a
>> +command lasts for only a single request-response round.
>> +
>> +Normally a command can last for as many rounds as are required to
>> +complete it (multiple for negotiation during fetch or no additional
>> +trips in the case of ls-refs).  If the client sends the `stateless-rpc`
>> +capability with a value of `true` (in the form `stateless-rpc=true`)
>> +then the invoked command must only last a single round.
>
> Speaking of stateless-rpc, I remember last time this topic was brought
> up, there was some discussion to kind of optimize it for http as well,
> to fit the "client sends request, server responds data" model and
> avoid too many round trips (ideally everything happens in one round
> trip). Does it evolve to anything real? All the cool stuff happened
> while I was away, sorry if this was discussed and settled.

We have a few different ideas for improving negotiation.  They were
speculative enough that we didn't want to make them part of the
baseline protocol v2.  Feel free to poke me in a new thread. :)

Some teasers:

- allow both client and server to suggest commits in negotiation,
  instead of just the client?

- send a bloom filter for the peer to filter their suggestions
  against?

- send other basic information like maximum generation number or
  maximum commit date?

- exponential backoff in negotiation instead of linear walking?
  prioritizing ref tips?  Imitating the bitmap selection algorithm?

- at the "end" of negotiation, sending a graph data structure instead
  of a pack, to allow an extra round trip to produce a truly minimal
  pack?

Those are some initial ideas, but it's also likely that someone can
come up with some other experiments to try, too.  (E.g. we've looked
at various papers on set reconciliation, but they don't make enough
use of the graph structure to help much.)
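
To illustrate the Bloom filter teaser only: below is a minimal sketch,
assuming commit IDs are exchanged as hex strings. The class name, the
filter size and the hash construction are all made up for the example;
nothing here is part of protocol v2.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter over commit IDs (hypothetical sizes)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # False means "definitely absent"; True means "possibly present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def filter_suggestions(candidates, peer_filter):
    """Keep only the candidates the peer might already have."""
    return [c for c in candidates if peer_filter.might_contain(c)]
```

A peer would add every commit it has to the filter and send the bit
array; the other side then drops suggestions the filter rules out. False
positives are possible, so a real design would still need a fallback.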

Thanks,
Jonathan


Re: [PATCH v2 12/27] serve: introduce git-serve

2018-01-31 Thread Derrick Stolee

On 1/25/2018 6:58 PM, Brandon Williams wrote:

Introduce git-serve, the base server for protocol version 2.

Protocol version 2 is intended to be a replacement for Git's current
wire protocol.  The intention is that it will be a simpler, less
wasteful protocol which can evolve over time.

Protocol version 2 improves upon version 1 by eliminating the initial
ref advertisement.  In its place a server will export a list of
capabilities and commands which it supports in a capability
advertisement.  A client can then request that a particular command be
executed by providing a number of capabilities and command specific
parameters.  At the completion of a command, a client can request that
another command be executed or can terminate the connection by sending a
flush packet.

Signed-off-by: Brandon Williams 
---
  .gitignore                              |   1 +
  Documentation/technical/protocol-v2.txt | 117 +++
  Makefile                                |   2 +
  builtin.h                               |   1 +
  builtin/serve.c                         |  30
  git.c                                   |   1 +
  serve.c                                 | 249
  serve.h                                 |  15 ++
  t/t5701-git-serve.sh                    |  56 +++
  9 files changed, 472 insertions(+)
  create mode 100644 Documentation/technical/protocol-v2.txt
  create mode 100644 builtin/serve.c
  create mode 100644 serve.c
  create mode 100644 serve.h
  create mode 100755 t/t5701-git-serve.sh

diff --git a/.gitignore b/.gitignore
index 833ef3b0b..2d0450c26 100644
--- a/.gitignore
+++ b/.gitignore
@@ -140,6 +140,7 @@
  /git-rm
  /git-send-email
  /git-send-pack
+/git-serve
  /git-sh-i18n
  /git-sh-i18n--envsubst
  /git-sh-setup
diff --git a/Documentation/technical/protocol-v2.txt b/Documentation/technical/protocol-v2.txt
new file mode 100644
index 000000000..7f619a76c
--- /dev/null
+++ b/Documentation/technical/protocol-v2.txt
@@ -0,0 +1,117 @@
+ Git Wire Protocol, Version 2
+==============================
+
+This document presents a specification for a version 2 of Git's wire
+protocol.  Protocol v2 will improve upon v1 in the following ways:
+
+  * Instead of multiple service names, multiple commands will be
+supported by a single service.


As someone unfamiliar with the old protocol code, I think this statement 
undersells the architectural significance of your change. The new 
model allows a single service to handle all the different transports 
(git://, ssh://, https://) while being agnostic to the command-specific 
logic. It also hides the protocol negotiation away from these consumers.


The ease with which you are adding new commands in later commits really 
demonstrates the value of this patch. To make that point here, you would 
almost need to document the old model to show how it was difficult to 
use and extend. Perhaps this document will not need expanding since the 
code speaks for itself.


I just wanted to state for the record that the new architecture is a big 
improvement and will make more commands much easier to implement.
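
To make the extensibility point concrete, here is a minimal sketch of a
command table. It is written in Python rather than the patch's C and is
not taken from serve.c; the names COMMANDS, advertise and serve_request
are hypothetical, and the two handlers are placeholders.

```python
def ls_refs(args):
    # Placeholder body; a real command would enumerate references.
    return ["ls-refs called with %d args" % len(args)]


def fetch(args):
    # Placeholder body; a real command would negotiate and send a pack.
    return ["fetch called with %d args" % len(args)]


# Adding a command means adding one entry here; the transport-facing
# code below never changes.
COMMANDS = {"ls-refs": ls_refs, "fetch": fetch}


def advertise():
    """Version line followed by the names of the supported commands."""
    return ["version 2"] + sorted(COMMANDS)


def serve_request(command, args):
    """Look up the requested command and run it, or reject the request."""
    handler = COMMANDS.get(command)
    if handler is None:
        raise ValueError("unknown command: " + command)
    return handler(args)
```

The dispatcher knows nothing about what a command does, which is the
property that makes later commands cheap to add.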



+  * Easily extendable as capabilities are moved into their own section
+of the protocol, no longer being hidden behind a NUL byte and
+limited by the size of a pkt-line (as there will be a single
+capability per pkt-line).
+  * Separate out other information hidden behind NUL bytes (e.g. agent
+string as a capability and symrefs can be requested using 'ls-refs')
+  * Reference advertisement will be omitted unless explicitly requested
+  * ls-refs command to explicitly request some refs
+


nit: some bullets have full stops (.) and others do not.


+ Detailed Design
+=================
+
+A client can request to speak protocol v2 by sending `version=2` in the
+side-channel `GIT_PROTOCOL` in the initial request to the server.
+
+In protocol v2 communication is command oriented.  When first contacting a
+server a list of capabilities will advertised.  Some of these capabilities
+will be commands which a client can request be executed.  Once a command
+has completed, a client can reuse the connection and request that other
+commands be executed.
+
+ Special Packets
+-----------------
+
+In protocol v2 these special packets will have the following semantics:
+
+  * '0000' Flush Packet (flush-pkt) - indicates the end of a message
+  * '0001' Delimiter Packet (delim-pkt) - separates sections of a message
+
+ Capability Advertisement
+--------------------------
+
+A server which decides to communicate (based on a request from a client)
+using protocol version 2, notifies the client by sending a version string
+in its initial response followed by an advertisement of its capabilities.
+Each capability is a key with an optional value.  Clients must ignore all
+unknown keys.  Semantics of unknown values are left to the definition of
+each key.  Some capabilities will describe commands which can be 

Re: [PATCH v2 12/27] serve: introduce git-serve

2018-01-26 Thread Duy Nguyen
On Fri, Jan 26, 2018 at 6:58 AM, Brandon Williams  wrote:
> + Detailed Design
> +=================
> +
> +A client can request to speak protocol v2 by sending `version=2` in the
> +side-channel `GIT_PROTOCOL` in the initial request to the server.
> +
> +In protocol v2 communication is command oriented.  When first contacting a
> +server a list of capabilities will advertised.  Some of these capabilities

s/will advertised/will be advertised/

> + Capability Advertisement
> +--------------------------
> +
> +A server which decides to communicate (based on a request from a client)
> +using protocol version 2, notifies the client by sending a version string
> +in its initial response followed by an advertisement of its capabilities.
> +Each capability is a key with an optional value.  Clients must ignore all
> +unknown keys.

We have a convention in the $GIT_DIR/index file format that's probably a
good thing to follow here: the case of a key says whether it may be
ignored. (The index format itself marks optional extensions with an
uppercase first byte.) Here, lowercase keys would be optional, so unknown
lowercase keys can (and must) be ignored. Uppercase keys would be
mandatory: if a client can't understand one of those keys, it aborts.
This gives the server a way to "select" clients and introduce
incompatible changes if we ever have to.
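
A minimal sketch of how a client could apply the rule proposed above.
The function name and the example keys are hypothetical, and treating
"any uppercase letter in the key" as the mandatory marker is an
assumption; the proposal does not pin that detail down.

```python
def select_capabilities(advertised, understood):
    """Return the advertised keys this client can use.

    Unknown lowercase keys are skipped. An unknown key containing an
    uppercase letter is treated as mandatory and aborts the exchange.
    """
    usable = []
    for key in advertised:
        if key in understood:
            usable.append(key)
        elif key != key.lower():
            # Unknown mandatory key: refuse to continue.
            raise RuntimeError(
                "server requires unsupported capability: " + key
            )
        # Otherwise: unknown optional key, silently ignored.
    return usable
```

For example, a client that understands only `agent` and `ls-refs`
would skip an advertised `shiny-new` key but stop on `SHA256-ONLY`.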

> Semantics of unknown values are left to the definition of
> +each key.  Some capabilities will describe commands which can be requested
> +to be executed by the client.
> +
> +capability-advertisement = protocol-version
> +  capability-list
> +  flush-pkt
> +
> +protocol-version = PKT-LINE("version 2" LF)
> +capability-list = *capability
> +capability = PKT-LINE(key[=value] LF)
> +
> +key = 1*CHAR
> +value = 1*CHAR
> +CHAR = 1*(ALPHA / DIGIT / "-" / "_")

Is this a bit too restrictive for "value"? It rules out characters like
"." (e.g. in a version number) or "@" (I wonder if anybody will add a
capability that contains an email address). Unless there's a good reason
to limit it, should we just allow all of ASCII (without control codes)?
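
To see the effect of the restriction, here is a small sketch that parses
a capability line against the quoted grammar. The names STRICT, RELAXED
and parse_capability are hypothetical; RELAXED is one reading of "ASCII
without control codes" (bytes 0x20 through 0x7e).

```python
import re

# CHAR as written in the quoted grammar: ALPHA / DIGIT / "-" / "_"
STRICT = re.compile(r"[A-Za-z0-9_-]+")
# Assumed relaxed alternative: printable ASCII, space included.
RELAXED = re.compile(r"[\x20-\x7e]+")


def parse_capability(line, value_pattern=STRICT):
    """Parse 'key[=value]' (LF already stripped).

    Returns (key, value), with value set to None when there is no '='.
    """
    key, sep, value = line.partition("=")
    if not STRICT.fullmatch(key):
        raise ValueError("bad key: %r" % key)
    if not sep:
        return key, None
    if not value_pattern.fullmatch(value):
        raise ValueError("bad value: %r" % value)
    return key, value
```

With the strict pattern, `stateless-rpc=true` parses but an agent value
such as `git/1.8.3.1` is rejected because of the "/" and "."; passing
`value_pattern=RELAXED` accepts it.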

> +A client then responds to select the command it wants with any particular
> +capabilities or arguments.  There is then an optional section where the
> +client can provide any command specific parameters or queries.
> +
> +command-request = command
> + capability-list
> + (command-args)
> + flush-pkt
> +command = PKT-LINE("command=" key LF)
> +command-args = delim-pkt
> +  *arg
> +arg = 1*CHAR
> +
> +The server will then check to ensure that the client's request is
> +comprised of a valid command as well as valid capabilities which were
> +advertised.  If the request is valid the server will then execute the
> +command.

What happens when the request is not valid? Or..

> +When a command has finished

How does the client know a command has finished? Is it up to each
command design?
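
For reference while reading these questions, the command-request
grammar quoted above can be sketched as follows. This is an illustration
only: the helper names are hypothetical, and putting each arg in its own
pkt-line with a trailing LF is an assumption, since the grammar as
quoted only says `arg = 1*CHAR`. It does not answer either question.

```python
FLUSH_PKT = b"0000"  # end of a message
DELIM_PKT = b"0001"  # separates sections of a message


def pkt_line(payload: bytes) -> bytes:
    """Prefix payload with its 4-hex-digit length (length includes the prefix)."""
    length = len(payload) + 4
    if length > 65520:
        raise ValueError("payload too long for one pkt-line")
    return ("%04x" % length).encode("ascii") + payload


def command_request(command, capabilities=(), args=()):
    """Build: command, capability-list, optional delim-pkt + args, flush-pkt."""
    out = [pkt_line(b"command=" + command.encode("ascii") + b"\n")]
    for cap in capabilities:
        out.append(pkt_line(cap.encode("ascii") + b"\n"))
    if args:
        out.append(DELIM_PKT)
        for arg in args:
            out.append(pkt_line(arg.encode("ascii") + b"\n"))
    out.append(FLUSH_PKT)
    return b"".join(out)
```

A request with no capabilities and no arguments is just the command
pkt-line followed by a flush-pkt.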

More or less related: it bugs me that I have a translated git client,
but I still receive remote error messages in English. It's a hard
problem, but I'm hoping that we won't need to change the core protocol
to support that someday. Although we could make a rule now that side
channel messages can be sent in "printf"-like form, where the client
translates the format string and substitutes the placeholders with real
values afterward...
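
A rough sketch of that idea, assuming the server sends the format string
and its values separately. The function name is hypothetical and a plain
dict stands in for a real translation catalog; the message strings are
invented for the example.

```python
def render_remote_message(fmt, args, catalog):
    """Translate the format string locally, then fill in the server's values.

    Falls back to the untranslated string when the catalog has no entry.
    A real client would also have to validate fmt, since it comes from
    the remote side.
    """
    translated = catalog.get(fmt, fmt)
    return translated % tuple(args)


# Hypothetical German catalog with a single entry.
CATALOG_DE = {
    "repository %s not found": "Repository %s nicht gefunden",
}
```

With this, `render_remote_message("repository %s not found",
["foo.git"], CATALOG_DE)` yields the German text, while a message
missing from the catalog is shown in English as it is today.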

> a client can either request that another
> +command be executed or can terminate the connection by sending an empty
> +request consisting of just a flush-pkt.
> +
> + Capabilities
> +~~~~~~~~~~~~~~
> +
> +There are two different types of capabilities: normal capabilities,
> +which can be used to to convey information or alter the behavior of a
> +request, and command capabilities, which are the core actions that a
> +client wants to perform (fetch, push, etc).
> +
> + agent
> +-------
> +
> +The server can advertise the `agent` capability with a value `X` (in the
> +form `agent=X`) to notify the client that the server is running version
> +`X`.  The client may optionally send its own agent string by including
> +the `agent` capability with a value `Y` (in the form `agent=Y`) in its
> +request to the server (but it MUST NOT do so if the server did not
> +advertise the agent capability). The `X` and `Y` strings may contain any
> +printable ASCII characters except space (i.e., the byte range 32 < x <
> +127), and are typically of the form "package/version" (e.g.,
> +"git/1.8.3.1"). The agent strings are purely informative for statistics
> +and debugging purposes, and MUST NOT be used to programmatically assume
> +the presence or absence of particular features.
> +
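
The agent rules quoted above are simple enough to sketch. The function
names are hypothetical, and rejecting an empty string is an assumption
the quoted text does not state.

```python
def is_valid_agent(agent: str) -> bool:
    """True if every character is printable ASCII other than space (32 < x < 127)."""
    return len(agent) > 0 and all(32 < ord(ch) < 127 for ch in agent)


def client_agent_capability(advertised_keys, agent):
    """Return the 'agent=Y' line to send, or None if it must not be sent.

    The client MUST NOT send agent unless the server advertised it.
    """
    if "agent" not in advertised_keys or not is_valid_agent(agent):
        return None
    return "agent=" + agent
```

Note that the function only formats the string; per the quoted text,
nothing should branch on the peer's agent value to infer features.
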
> + stateless-rpc
> +---------------
> +
> +If advertised, the `stateless-rpc` capability indicates that the server
> +supports running commands in a stateless-rpc mode, which means that a
> +command lasts for only a single request-response round.
> +
> +Normally a command can last for as many rounds as are required to
> +complete it (multiple for negotiation during 

[PATCH v2 12/27] serve: introduce git-serve

2018-01-25 Thread Brandon Williams
Introduce git-serve, the base server for protocol version 2.

Protocol version 2 is intended to be a replacement for Git's current
wire protocol.  The intention is that it will be a simpler, less
wasteful protocol which can evolve over time.

Protocol version 2 improves upon version 1 by eliminating the initial
ref advertisement.  In its place a server will export a list of
capabilities and commands which it supports in a capability
advertisement.  A client can then request that a particular command be
executed by providing a number of capabilities and command specific
parameters.  At the completion of a command, a client can request that
another command be executed or can terminate the connection by sending a
flush packet.

Signed-off-by: Brandon Williams 
---
 .gitignore                              |   1 +
 Documentation/technical/protocol-v2.txt | 117 +++
 Makefile                                |   2 +
 builtin.h                               |   1 +
 builtin/serve.c                         |  30
 git.c                                   |   1 +
 serve.c                                 | 249
 serve.h                                 |  15 ++
 t/t5701-git-serve.sh                    |  56 +++
 9 files changed, 472 insertions(+)
 create mode 100644 Documentation/technical/protocol-v2.txt
 create mode 100644 builtin/serve.c
 create mode 100644 serve.c
 create mode 100644 serve.h
 create mode 100755 t/t5701-git-serve.sh

diff --git a/.gitignore b/.gitignore
index 833ef3b0b..2d0450c26 100644
--- a/.gitignore
+++ b/.gitignore
@@ -140,6 +140,7 @@
 /git-rm
 /git-send-email
 /git-send-pack
+/git-serve
 /git-sh-i18n
 /git-sh-i18n--envsubst
 /git-sh-setup
diff --git a/Documentation/technical/protocol-v2.txt b/Documentation/technical/protocol-v2.txt
new file mode 100644
index 000000000..7f619a76c
--- /dev/null
+++ b/Documentation/technical/protocol-v2.txt
@@ -0,0 +1,117 @@
+ Git Wire Protocol, Version 2
+==============================
+
+This document presents a specification for a version 2 of Git's wire
+protocol.  Protocol v2 will improve upon v1 in the following ways:
+
+  * Instead of multiple service names, multiple commands will be
+supported by a single service.
+  * Easily extendable as capabilities are moved into their own section
+of the protocol, no longer being hidden behind a NUL byte and
+limited by the size of a pkt-line (as there will be a single
+capability per pkt-line).
+  * Separate out other information hidden behind NUL bytes (e.g. agent
+string as a capability and symrefs can be requested using 'ls-refs')
+  * Reference advertisement will be omitted unless explicitly requested
+  * ls-refs command to explicitly request some refs
+
+ Detailed Design
+=================
+
+A client can request to speak protocol v2 by sending `version=2` in the
+side-channel `GIT_PROTOCOL` in the initial request to the server.
+
+In protocol v2 communication is command oriented.  When first contacting a
+server a list of capabilities will advertised.  Some of these capabilities
+will be commands which a client can request be executed.  Once a command
+has completed, a client can reuse the connection and request that other
+commands be executed.
+
+ Special Packets
+-----------------
+
+In protocol v2 these special packets will have the following semantics:
+
+  * '0000' Flush Packet (flush-pkt) - indicates the end of a message
+  * '0001' Delimiter Packet (delim-pkt) - separates sections of a message
+
+ Capability Advertisement
+--------------------------
+
+A server which decides to communicate (based on a request from a client)
+using protocol version 2, notifies the client by sending a version string
+in its initial response followed by an advertisement of its capabilities.
+Each capability is a key with an optional value.  Clients must ignore all
+unknown keys.  Semantics of unknown values are left to the definition of
+each key.  Some capabilities will describe commands which can be requested
+to be executed by the client.
+
+capability-advertisement = protocol-version
+  capability-list
+  flush-pkt
+
+protocol-version = PKT-LINE("version 2" LF)
+capability-list = *capability
+capability = PKT-LINE(key[=value] LF)
+
+key = 1*CHAR
+value = 1*CHAR
+CHAR = 1*(ALPHA / DIGIT / "-" / "_")
+
+A client then responds to select the command it wants with any particular
+capabilities or arguments.  There is then an optional section where the
+client can provide any command specific parameters or queries.
+
+command-request = command
+ capability-list
+ (command-args)
+ flush-pkt
+command = PKT-LINE("command=" key LF)
+command-args = delim-pkt
+  *arg
+arg = 1*CHAR
+
+The server will then check to ensure that the client's request is
+comprised of a