Re: [RFC/PATCH 0/3] protocol v2

2015-03-24 Thread Junio C Hamano
Stefan Beller sbel...@google.com writes:

 So I started looking into extending the buffer size as another 'first step'
 towards the protocol version 2 again. But now I think the packet length
 limit of 64k is actually a good and useful thing to have and should be
 extended/fixed if and only if we run into serious trouble with too-small
 packets later.

I tend to agree.  Too large a packet size would mean your latency
would also suck, as the pkt-line interface will not give you anything
until you read the entire packet.  The new protocol should be
designed around reasonably sized packets, using multiple packets
to carry larger payloads as necessary.
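A minimal sketch of that design in Python (illustrative only, not Git's actual C code; the 65520-byte frame cap matches the side-band-64k limit):

```python
MAX_PAYLOAD = 65516  # 65520-byte max frame minus the 4-byte hex length header

def to_pkt_lines(payload: bytes) -> bytes:
    """Frame an arbitrarily large payload as a series of pkt-lines,
    each at most 64 KiB, terminated by a flush-pkt ("0000")."""
    out = bytearray()
    for i in range(0, len(payload), MAX_PAYLOAD):
        chunk = payload[i:i + MAX_PAYLOAD]
        out += b"%04x" % (len(chunk) + 4)  # length field counts itself too
        out += chunk
    out += b"0000"  # flush-pkt marks the end of the group
    return bytes(out)
```

A 70000-byte payload thus becomes one full 65520-byte frame plus one small frame, instead of a single oversized packet.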
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC/PATCH 0/3] protocol v2

2015-03-24 Thread Stefan Beller
On Tue, Mar 3, 2015 at 9:13 AM, Junio C Hamano gits...@pobox.com wrote:
 Duy Nguyen pclo...@gmail.com writes:

 Junio pointed out in private that I didn't address the packet length
 limit (64k). I thought I could get away with a new capability
 (i.e. not worry about it now) but I finally admit that was a bad
 hack. So perhaps this on top.

 No, I didn't ;-) but I tend to agree that perhaps "4GB huge packet?"
 is a bad idea.

 The problem I had with the version in your write-up was that it
 still assumed that all capabilities must come on one packet-line.


So I started looking into extending the buffer size as another 'first step'
towards the protocol version 2 again. But now I think the packet length
limit of 64k is actually a good and useful thing to have and should be
extended/fixed if and only if we run into serious trouble with too-small
packets later.

I mean we can add the possibility now by introducing the special
lengths 0xFFFF or 0xFFFE to mean we'd want to extend it in the
future. But when doing this we need to be extra careful with buffer
allocation, as it is easy to produce a denial of service attack if the
receiving side blindly trusts the length and allocates that much
memory. So having a 64k limit actually helps prevent this attack a
bit, as it is a very small number.
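The guarded read Stefan describes can be sketched like this (illustrative Python, not Git's actual packet_read()): the fixed cap means the reader never allocates based on an attacker-controlled length.

```python
import io

MAX_PKT = 65520  # a small, fixed cap makes the allocation trivially safe

def read_pkt_line(stream, limit=MAX_PKT):
    """Read one pkt-line, refusing to trust an oversized length field."""
    header = stream.read(4)
    if len(header) < 4:
        raise EOFError("truncated pkt-line header")
    length = int(header, 16)
    if length == 0:
        return None  # flush-pkt
    if length < 4 or length > limit:
        raise ValueError("suspicious pkt-line length: %d" % length)
    return stream.read(length - 4)
```

Usage: `read_pkt_line(io.BytesIO(b"0006hi"))` yields `b"hi"`, while a claimed length above the cap is rejected before any buffer is sized.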


Re: [RFC/PATCH 0/3] protocol v2

2015-03-05 Thread Stefan Beller
On Wed, Mar 4, 2015 at 5:03 PM, Stefan Beller sbel...@google.com wrote:

 If anyone wants to experiment with the data I gathered, I can make them
 available.


All data of `ls-remote` including the gathering script is found at

(112 kB .tar.xz)
https://drive.google.com/file/d/0B7E93UKgFAfjcHRvM1N2YjBfTzA/view?usp=sharing
(6.6MB in .zip)
https://drive.google.com/file/d/0B7E93UKgFAfjRko3WHhtUWZtTEU/view?usp=sharing

I also do have all the object files which are referenced in the
outputs of ls-remote, though sharing
them is a bit tough as I cannot just git push them (forced pushes make
some of the objects
unreachable, so my local gathering repo is explicitly configured to
not garbage collect), and these
are huge compared to just the output of ls-remote.


Re: [RFC/PATCH 0/3] protocol v2

2015-03-04 Thread Shawn Pearce
On Wed, Mar 4, 2015 at 4:05 AM, Duy Nguyen pclo...@gmail.com wrote:
 On Wed, Mar 4, 2015 at 11:27 AM, Shawn Pearce spea...@spearce.org wrote:
 Let me go on a different tangent a bit from the current protocol.

 http://www.grpc.io/ was recently released and is built on the HTTP/2
 standard. It uses protobuf as a proven extensibility mechanism.
 Including a full C based grpc stack just to speak the Git wire
 protocol is quite likely overkill, but I think the embedding of a
 proven extensible format inside of a bi-directional framed streaming
 system like HTTP/2 offers some good guidance.

 I'll take this as "learn from grpc", not "just reuse grpc".

Correct, that was what I was trying to say and I just wrote it poorly.

HTTP 1.x, HTTP/2 and protobuf have proven themselves to be fairly open
to extension and to work well in the wild as transports. There is
useful guidance there that we should draw from to try to leave doors
open for the future.

HTTP/2, protobuf and grpc are fairly complex. I consider any one of
them too complicated for Git-specific use. However HTTP/2 is probably
the future of HTTP stacks, so we may see it show up in libcurl or
something as popular as libcurl in another 10 years. Hg had some
reasonably sane ideas about building its wire protocol to work well on
HTTP 1.x upfront, rather than tacking it on much later as Git did.

 Network protocol parsing is hard. Especially in languages like C where
 buffer overflows are possible. Or where a client could trivially DoS a
 server by sending a packet of size uint_max and the server naively
 trying to malloc() that buffer. Defining the network protocol in an
 IDL like protobuf 3 and being machine generated from stable well
 maintained code has its advantages.

 I'm still studying the spec, so I can't comment if using IDL/protobuf3
 is a good idea yet.

 But I think at least we can avoid DoS by changing the pkt-line (again)
 a bit: the length 0xffff means that the actual length is 0xfffe and the
 next pkt-line is part of this pkt-line. Higher level (upload-pack or
 fetch-pack, for example) must set an upper limit for packet_read() so
 it won't try to concatenate pkt-lines forever.

pkt-line is a reasonably simple and efficient framing system. A 64 KiB
pkt-line frame only costs ~0.0061% overhead; ~0.0076% if you are
sending a pack stream in a side-band-64k channel. That is probably more
efficient than HTTP/2 or SSL framing.
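For concreteness, those percentages appear to come from dividing the framing bytes by the remaining payload of a full 65520-byte frame (my reading of the numbers, shown as a quick check):

```python
# Overhead of pkt-line framing on a full 65520-byte frame:
#  - 4 bytes of hex length header for plain pkt-line
#  - 5 bytes once a 1-byte sideband selector is added (side-band-64k)
plain = 4 / (65520 - 4)
sideband = 5 / (65520 - 5)
print("%.4f%%" % (plain * 100))     # matches the ~0.0061% figure
print("%.4f%%" % (sideband * 100))  # matches the ~0.0076% figure
```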

I see no reason to try to reduce that overhead further. A 64
KiB frame size is enough for anyone to move data efficiently with
these headers. In practice you are going to wrap that up in SSH or
SSL/TLS, and those overheads are so much higher that it doesn't matter
that we have a tiny loss here.

I think a mistake in the wire protocol was making the pkt-line length
human readable hex, but the sideband channel binary. _If_ we redo the
framing the only change I would make is making the side band readable.
Thus far we have only used 0, 1, 2 for sideband channels. These could
easily be moved into human readable channel ids:

  'd':  currently sideband 0; this is the application data, aka the pack data
  'p':  currently sideband 1; this is the progress stream for stderr
  'e':  currently sideband 2; there was an error, data in this packet
is the message text, and the connection will shut down after the
packet.

And then leave all other sideband values undefined and reserved for
future use, just like they are all open today.
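A toy demultiplexer for the proposed readable ids (the 'd'/'p'/'e' mapping is only Shawn's suggestion in this message, not anything Git implements):

```python
# Hypothetical channel ids from the proposal above; any other id stays
# reserved, mirroring "leave all other sideband values undefined".
CHANNELS = {b"d": "data", b"p": "progress", b"e": "error"}

def demux(payload: bytes):
    """Split one sideband pkt-line payload into (channel, data)."""
    band, data = payload[:1], payload[1:]
    if band not in CHANNELS:
        raise ValueError("reserved sideband id: %r" % band)
    return CHANNELS[band], data
```

E.g. `demux(b"pCounting objects: 50%")` routes the text to the progress stream.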

I am not convinced framing changes are necessary. I would be fine with
leaving the sideband streams as 0,1,2... but if we want a text based
protocol for ease of debugging we should be text based across the
board and try very hard to avoid these binary values in the framing,
or ever needing to use a magical NUL byte in the middle of a packet to
find a gap in older parsers for newer data.


If you want to build a larger stream like ref advertisement inside a
pkt-line framing without using a pkt-line per ref record, you should
follow the approach used by pack data streams where it uses the 5 byte
side-band pkt-line framing and a side-band channel is allocated for
that data. Application code can then run a side-band demux to yank out
the inner stream and parse it.

It may be simpler to restrict ref names to be smaller than 64k in
length so you have room for framing and hash value to be transferred
inside of a single pkt-line, then use the pkt-line framing to do the
transfer.

Today's upload-pack ref advertisement has ~25% overhead. Most of that
is in the duplicated tag name for the peeled refs/tags/v1.0^{} lines.
If you drop those names (but keep the pkt-line and SHA-1), it's only
about 8% overhead above the packed-refs file.

I think optimization efforts for ref advertisement need to focus on
reducing the number of refs sent back and forth, not shrinking the
individual records down smaller.


Earlier in this thread Junio raised a point that the flush-pkt is
confusing because it has way too many purposes. I agree. IIRC we have
0001-0003 


Re: [RFC/PATCH 0/3] protocol v2

2015-03-04 Thread Duy Nguyen
On Wed, Mar 4, 2015 at 11:27 AM, Shawn Pearce spea...@spearce.org wrote:
 Let me go on a different tangent a bit from the current protocol.

 http://www.grpc.io/ was recently released and is built on the HTTP/2
 standard. It uses protobuf as a proven extensibility mechanism.
 Including a full C based grpc stack just to speak the Git wire
 protocol is quite likely overkill, but I think the embedding of a
 proven extensible format inside of a bi-directional framed streaming
 system like HTTP/2 offers some good guidance.

I'll take this as "learn from grpc", not "just reuse grpc".

 Network protocol parsing is hard. Especially in languages like C where
 buffer overflows are possible. Or where a client could trivially DoS a
 server by sending a packet of size uint_max and the server naively
 trying to malloc() that buffer. Defining the network protocol in an
 IDL like protobuf 3 and being machine generated from stable well
 maintained code has its advantages.

I'm still studying the spec, so I can't comment if using IDL/protobuf3
is a good idea yet.

But I think at least we can avoid DoS by changing the pkt-line (again)
a bit: the length 0xffff means that the actual length is 0xfffe and the
next pkt-line is part of this pkt-line. Higher level (upload-pack or
fetch-pack, for example) must set an upper limit for packet_read() so
it won't try to concatenate pkt-lines forever.
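That chaining rule can be sketched as follows (illustrative Python; Git's real packet_read() is a C function, this is not it). The caller-imposed limit is what stops a peer from chaining frames forever:

```python
import io

def read_chained(stream, max_total=1 << 20):
    """Read one logical packet under the proposed rule: a header of
    0xffff means this frame carries 0xfffe bytes and continues in the
    next pkt-line. max_total is the caller's upper limit."""
    out = bytearray()
    while True:
        length = int(stream.read(4), 16)
        if length == 0:
            raise ValueError("unexpected flush-pkt inside a logical packet")
        cont = (length == 0xFFFF)
        out += stream.read((0xFFFE if cont else length) - 4)
        if len(out) > max_total:
            raise ValueError("logical packet exceeds the agreed limit")
        if not cont:
            return bytes(out)
```

A plain frame (`b"0006hi"`) reads as before; a 0xffff frame splices its successor onto the same logical packet.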
-- 
Duy


Re: [RFC/PATCH 0/3] protocol v2

2015-03-03 Thread Junio C Hamano
Duy Nguyen pclo...@gmail.com writes:

 Junio pointed out in private that I didn't address the packet length
 limit (64k). I thought I could get away with a new capability
 (i.e. not worry about it now) but I finally admit that was a bad
 hack. So perhaps this on top.

No, I didn't ;-) but I tend to agree that perhaps "4GB huge packet?"
is a bad idea.

The problem I had with the version in your write-up was that it
still assumed that all capabilities must come on one packet-line.

The immediate issue we had with that limitation in the current
protocol was that the usual "we can help newer programs to operate
better while getting ignored by existing programs by sending optional
information as part of the capability advertisement" approach would
not work for upload-pack to enumerate symrefs and their targets to
help "clone".

The lesson to draw from that experience is not "we should have an
option to use large packets".  64kB is plenty, but the senders and
the receivers have a lot lower limit in practice to avoid harming
latency (I think it is like 1000 bytes before both ends agree to
switch to talking over the sideband multiplexer).  It is not "we
should anticipate and design the protocol better", either.  We are
humans and it is hard to predict things, especially things in the
future.

The lesson we should learn is that it is important to leave us
enough wiggle room to allow us to cope with such unanticipated
limitations ;-).

My recollection is that the consensus from the last time we
discussed protocol revamping was to list one capability per packet
so that packet length limit does not matter, but you may want to
check with the list archive yourself.


Re: [RFC/PATCH 0/3] protocol v2

2015-03-03 Thread Junio C Hamano
Junio C Hamano gits...@pobox.com writes:

 Duy Nguyen pclo...@gmail.com writes:

 Junio pointed out in private that I didn't address the packet length
 limit (64k). I thought I could get away with a new capability
 (i.e. not worry about it now) but I finally admit that was a bad
 hack. So perhaps this on top.

 No, I didn't ;-) but I tend to agree that perhaps "4GB huge packet?"
 is a bad idea.
 ...

I realize that I responded with "No, I did not complain about X; I
had trouble with Y, and here is why" and talked mostly about Y
without talking much about X.  So let's touch on X a bit.

As to the packet length, I think it is a good idea to give us an
escape hatch to bust the 64k limit.  Refs may not be the reason to do
so but, as I said, we cannot foresee the future needs.

Having X behind us, now back to Y, and then I'll remind us of Z ;-)
[*1*]

 My recollection is that the consensus from the last time we
 discussed protocol revamping was to list one capability per packet
 ...

And the above is the right thing from the protocol point of view.
The only reason the current protocol says capabilities go on a
single line separated by SP is because the hole we found to add to
the protocol was to piggyback after the ref advertisement lines, and
there was no guarantee that we have more than one ref advertised, so
we needed to be able to stuff everything on a single line.

Stepping back and thinking about what a "packet" in the pkt-line
protocol is, we realize that it is the smallest logical unit of
transferring information: the state of a single ref in a series of
ref advertisements; the fact that the receiving end has all the
history leading up to a single commit; the request to obtain all
history leading up to a single commit.

That is why I say that one-cap-per-packet is the right thing.

These individual logical units are grouped into a larger logical
unit by (1) being at a specific point in the protocol exchange, (2)
being adjacent to each other and (3) terminated by a flush
packet.  Examples:

 - A bunch of individual ref states at the beginning of the
   upload-pack to fetch-pack communication that ends with a flush
   constitutes a ref advertisement.

 - A series of "want" packets at the beginning of the fetch-pack to
   upload-pack communication that ends with a flush constitutes a
   fetch request.

Another thing I didn't find in the updated documentation was a
proposal to define what a "flush" exactly means.

In my above writing, it should be clear that a "flush" is merely
the end of a group.  It does not mean (and it never meant, until
smart HTTP) "I am finished talking, now it is your turn."  If a
requestor needs to give two groups of items before the responder can
process the request, we would want to be able to say "A1, A2, ...,
now I am done with As; B1, B2, B3, ..., now I am done with Bs; this
concludes my request, and it is your turn to process and respond to
me."  But you cannot easily do so without affecting smart HTTP, as
it is written in such a way that it assumes flush is "I am done,
it is your turn."

I am perfectly OK if v2 redefined flush to mean "I am done, it is
your turn."  But then the protocol should have another way to group
packets into larger units.  A sequence of packets "begin A", A1,
A2, ..., "end", "begin B", B1, B2, B3, "end", "flush" may be
a way to do so, and if we continue to rely on the order of packets
to help determine the semantics (aka "being at a specific point in
the protocol exchange" above), we may even be able to omit the
"begin A" and "begin B" packets (i.e. the "end" is the new end of a
logical group, which is what flush originally was).
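That grouping sketch can be made concrete with a toy parser (the packet payloads and keywords here are stand-ins for illustration only, with the "begin" packets omitted as suggested):

```python
def parse_groups(packets):
    """Toy parser for the sketch above: "end" closes one logical
    group, "flush" closes the whole request."""
    groups, current = [], []
    for pkt in packets:
        if pkt == "flush":
            if current:  # a trailing group without "end" still counts
                groups.append(current)
            return groups
        if pkt == "end":
            groups.append(current)
            current = []
        else:
            current.append(pkt)
    raise ValueError("request was never terminated by a flush")
```

So `["A1", "A2", "end", "B1", "end", "flush"]` parses into two groups, and flush alone says "your turn".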


[Footnote]

 *1* For those who haven't been following the discussion:

 X: maximum packet length being 64kB might be problematic.

 Y: requiring capability advertisement and request in a single
packet is wrong.

 Z: the meaning of flush needs to be clarified.



Re: [RFC/PATCH 0/3] protocol v2

2015-03-03 Thread Shawn Pearce
On Tue, Mar 3, 2015 at 5:54 PM, Duy Nguyen pclo...@gmail.com wrote:
 On Wed, Mar 4, 2015 at 12:13 AM, Junio C Hamano gits...@pobox.com wrote:
 My recollection is that the consensus from the last time we
 discussed protocol revamping was to list one capability per packet
 so that packet length limit does not matter, but you may want to
 check with the list archive yourself.

 I couldn't find that consensus mail, but this one [1] is good enough
 evidence that we can hit packet length limit in capability line
 easily.
 With an escape hatch to allow maximum packet length up to uint_max, I

The symbolic ref thing was done badly. There isn't an escape hatch in
the current v1 protocol sufficient to allow this; instead each ref
should be its own pkt-line, or a small batch of refs per pkt-line, or
the ref advertisement should be a data stream in a side-band-64k sort
of format inside the pkt-line framing.

At 64k per frame of side-band, the data-to-header ratio is high
enough that we don't need an escape to uint_max.

 Looks like one cap per pkt-line is winning..

Yes.

 [1] http://thread.gmane.org/gmane.comp.version-control.git/237929


Let me go on a different tangent a bit from the current protocol.

http://www.grpc.io/ was recently released and is built on the HTTP/2
standard. It uses protobuf as a proven extensibility mechanism.
Including a full C based grpc stack just to speak the Git wire
protocol is quite likely overkill, but I think the embedding of a
proven extensible format inside of a bi-directional framed streaming
system like HTTP/2 offers some good guidance.

Network protocol parsing is hard. Especially in languages like C where
buffer overflows are possible. Or where a client could trivially DoS a
server by sending a packet of size uint_max and the server naively
trying to malloc() that buffer. Defining the network protocol in an
IDL like protobuf 3 and being machine generated from stable well
maintained code has its advantages.


Re: [RFC/PATCH 0/3] protocol v2

2015-03-03 Thread Duy Nguyen
On Wed, Mar 4, 2015 at 12:13 AM, Junio C Hamano gits...@pobox.com wrote:
 My recollection is that the consensus from the last time we
 discussed protocol revamping was to list one capability per packet
 so that packet length limit does not matter, but you may want to
 check with the list archive yourself.

I couldn't find that consensus mail, but this one [1] is good enough
evidence that we can hit the packet length limit in a capability line
easily.

With an escape hatch to allow maximum packet length up to uint_max, I
think we'll be fine for a long time even if we don't send one cap per
pkt-line. So I'm trying to see if we really want to go with one cap
per pkt-line..

Pros:

 - better memory management, current pkt-line static buffer is probably fine
 - a capability can contain spaces after '='

Cons:

 - some refactoring needed to hide away differences between v1 and v2

Looks like one cap per pkt-line is winning..
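For illustration, one-cap-per-pkt-line is trivial to emit, and it also delivers the "spaces after '='" pro from the list above (a sketch, not Git's implementation):

```python
def advertise_caps(caps):
    """Send one capability per pkt-line, flush-pkt terminated, so no
    single line can ever approach the packet length limit."""
    out = b""
    for cap in caps:
        line = cap.encode("utf-8") + b"\n"
        out += b"%04x" % (len(line) + 4) + line  # 4-byte hex length header
    return out + b"0000"
```

Since each capability owns its pkt-line, a value like `lang=en US` needs no escaping.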

[1] http://thread.gmane.org/gmane.comp.version-control.git/237929
-- 
Duy


Re: [RFC/PATCH 0/3] protocol v2

2015-03-03 Thread Duy Nguyen
On Mon, Mar 02, 2015 at 04:21:36PM +0700, Duy Nguyen wrote:
 On Sun, Mar 01, 2015 at 07:47:40PM -0800, Junio C Hamano wrote:
  It seems, however, that our current thinking is that it is OK to do
  the "allow new v1 clients to notice the availability of v2 servers,
  so that they can talk v2 the next time" thing, so my preference is
  to throw this "client first and let server notice" into the "maybe
  doable but not our first choice" bin, at least for now.
 
 OK let's see if a "first choice" like this could work. Very much a
 draft, but it should give some idea how to make a prototype to test
 it out. Note that the server still speaks first in this proposal.

Junio pointed out in private that I didn't address the packet length
limit (64k). I thought I could get away with a new capability
(i.e. not worry about it now) but I finally admit that was a bad
hack. So perhaps this on top.

The WARN line was originally supposed to be used when the packet
length is still 64k and the server has a ref longer than that. It
could then skip that long ref and inform the user, so the user could
re-request, this time asking for the "long packet length" capability.

That's irrelevant now. But I think an option to say something without
aborting may still be a good idea, especially if we allow hooks to
intercept the protocol.

-- 8< --
diff --git a/Documentation/technical/pack-protocol.txt b/Documentation/technical/pack-protocol.txt
index 32a1186..e2003c0 100644
--- a/Documentation/technical/pack-protocol.txt
+++ b/Documentation/technical/pack-protocol.txt
@@ -37,6 +37,20 @@ communicates with that invoked process over the SSH connection.
 The file:// transport runs the 'upload-pack' or 'receive-pack'
 process locally and communicates with it over a pipe.
 
+Pkt-line format
+---------------
+
+In version 1, a packet line consists of four bytes containing the
+length of the entire line plus four, in hexadecimal format. A flush
+consists of four zero bytes.
+
+In version 2, the four-byte header format remains supported but the
+maximum length is 0xfffe. If the length field is 0xffff, the actual
+length follows in variable encoding in hexadecimal.
+
+XXX: perhaps go with 2-byte length by default instead because we don't
+usually need pkt-line longer than 256?? Maybe not worth saving a couple bytes
+
 Git Transport
 -------------
 
@@ -68,10 +82,12 @@ process on the server side over the Git protocol is this:
  nc -v example.com 9418
 
 If the server refuses the request for some reasons, it could abort
-gracefully with an error message.
+gracefully with an error message, or show a warning and keep
+moving.
 
 
   error-line =  PKT-LINE("ERR" SP explanation-text)
+  warning-line   =  PKT-LINE("WARN" SP explanation-text)
 
 
 SSH Transport
-- 8< --


Re: [RFC/PATCH 0/3] protocol v2

2015-03-02 Thread Duy Nguyen
On Mon, Mar 02, 2015 at 04:21:36PM +0700, Duy Nguyen wrote:
 On Sun, Mar 01, 2015 at 07:47:40PM -0800, Junio C Hamano wrote:
  It seems, however, that our current thinking is that it is OK to do
  the "allow new v1 clients to notice the availability of v2 servers,
  so that they can talk v2 the next time" thing, so my preference is
  to throw this "client first and let server notice" into the "maybe
  doable but not our first choice" bin, at least for now.
 
 OK let's see if a "first choice" like this could work. Very much a
 draft, but it should give some idea how to make a prototype to test
 it out. Note that the server still speaks first in this proposal.

And ref discovery phase could be modified by new capabilities. For
example,

-- 8< --
diff --git a/Documentation/technical/protocol-capabilities.txt b/Documentation/technical/protocol-capabilities.txt
index 56c11b4..56a8c2e 100644
--- a/Documentation/technical/protocol-capabilities.txt
+++ b/Documentation/technical/protocol-capabilities.txt
@@ -304,3 +304,36 @@ language code.
 
 The default language code is unspecified, even though it's usually
 English in ASCII encoding.
+
+compressed-refs
+---------------
+
+This is applicable to upload-pack-2 and receive-pack-2 only. The
+client expects ref list in reference discovery phase to be sent in
+compressed format:
+
+ - Each PKT-LINE may contain more than one ref
+ - SHA-1 is in binary encoding (i.e. 20 bytes instead of
+   40 bytes as hex string)
+ - ref name is prefix compressed, see index-format.txt version 4.
+ - Ref list ends with flush-pkt
+
+glob-refs
+---------
+
+This is applicable to upload-pack-2 and receive-pack-2 only. In
+reference discovery phase, a new mode glob is supported. Where the
+arguments are wildmatch patterns. Negative patterns begin with '!'.
+Only refs matching requested patterns are sent to the client.
+
+stateful-refs
+-------------
+
+This is applicable to upload-pack-2 and receive-pack-2 only. In
+reference discovery phase, a new mode stateful is supported. Where
+the first argument is a string representing the ref list that was sent
+by the same server last time. The remaining arguments are glob.
+
+The first ref line that the server sends should carry a new state
+string after the ref name. The server may send only updated refs if it
+understands the state string sent by the client. Still under discussion.
-- 8< --
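The compressed-refs idea above can be sketched as follows; note the exact byte layout (naive 1-byte counts, record order) is invented here for illustration, since the capability text only points at index-format.txt v4 conventions:

```python
def compress_refs(refs):
    """Sketch of the proposed compressed-refs encoding: binary SHA-1
    plus prefix-compressed names. refs is a list of (hex_sha1, name)
    pairs sorted by name, as a ref advertisement would be."""
    out, prev = bytearray(), ""
    for sha1_hex, name in refs:
        common = 0
        while common < min(len(prev), len(name)) and prev[common] == name[common]:
            common += 1
        suffix = name[common:].encode("utf-8")
        out += bytes.fromhex(sha1_hex)       # 20 bytes instead of 40 hex chars
        out += bytes([common, len(suffix)])  # naive 1-byte counts (<256 only)
        out += suffix
        prev = name
    return bytes(out)
```

For `refs/heads/main` followed by `refs/heads/master`, the second record stores only the 4-byte suffix `ster` after a shared 13-byte prefix.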


Re: [RFC/PATCH 0/3] protocol v2

2015-03-02 Thread Duy Nguyen
On Sun, Mar 01, 2015 at 07:47:40PM -0800, Junio C Hamano wrote:
 It seems, however, that our current thinking is that it is OK to do
 the "allow new v1 clients to notice the availability of v2 servers,
 so that they can talk v2 the next time" thing, so my preference is
 to throw this "client first and let server notice" into the "maybe
 doable but not our first choice" bin, at least for now.

OK let's see if a "first choice" like this could work. Very much a
draft, but it should give some idea how to make a prototype to test
it out. Note that the server still speaks first in this proposal.

-- 8< --
diff --git a/Documentation/technical/pack-protocol.txt b/Documentation/technical/pack-protocol.txt
index 462e206..32a1186 100644
--- a/Documentation/technical/pack-protocol.txt
+++ b/Documentation/technical/pack-protocol.txt
@@ -1,11 +1,11 @@
 Packfile transfer protocols
 ===
 
-Git supports transferring data in packfiles over the ssh://, git:// and
+Git supports transferring data in packfiles over the ssh://, git://, http:// and
 file:// transports.  There exist two sets of protocols, one for pushing
 data from a client to a server and another for fetching data from a
 server to a client.  All three transports (ssh, git, file) use the same
-protocol to transfer data.
+protocol to transfer data. http is documented in http-protocol.txt.
 
 The processes invoked in the canonical Git implementation are 'upload-pack'
 on the server side and 'fetch-pack' on the client side for fetching data;
@@ -14,6 +14,12 @@ data.  The protocol functions to have a server tell a client what is
 currently on the server, then for the two to negotiate the smallest amount
 of data to send in order to fully update one or the other.
 
+upload-pack-2 and receive-pack-2 are the next generation of
+upload-pack and receive-pack respectively. The first two are
+referred to as "version 2" in this document and pack-capabilities.txt
+while the last two are "version 1". Unless stated otherwise, version 1
+is implied.
+
 Transports
 --
 There are three transports over which the packfile protocol is
@@ -42,7 +48,8 @@ hostname parameter, terminated by a NUL byte.
 
 --
   git-proto-request = request-command SP pathname NUL [ host-parameter NUL ]
-   request-command   = "git-upload-pack" / "git-receive-pack" /
+   request-command   = "git-upload-pack" / "git-upload-pack-2" /
+                       "git-receive-pack" / "git-receive-pack-2" /
                        "git-upload-archive"   ; case sensitive
   pathname          = *( %x01-ff ) ; exclude NUL
   host-parameter    = "host=" hostname [ ":" port ]
@@ -67,7 +74,6 @@ gracefully with an error message.
  error-line     =  PKT-LINE("ERR" SP explanation-text)
 
 
-
 SSH Transport
 -
 
@@ -124,9 +130,58 @@ has, the first can 'fetch' from the second.  This operation determines
 what data the server has that the client does not then streams that
 data down to the client in packfile format.
 
+Capability discovery (v2)
+-------------------------
 
-Reference Discovery

+In version 1, capability discovery is part of reference discovery and
+covered in reference discovery section.
+
+In version 2, when the client initially connects, the server
+immediately sends its capabilities to the client. Then the client must
+send the list of server capabilities it wants to use to the server.
+
+   S: 00XXcapabilities multi_ack thin-pack ofs-delta lang\n
+   C: 00XXcapabilities thin-pack ofs-delta lang=en\n
+
+
+  cap              =  PKT-LINE("capabilities" SP capability-list LF)
+  capability-list  =  capability *(SP capability)
+  capability       =  1*(LC_ALPHA / DIGIT / "-" / "_" / "=")
+  LC_ALPHA         =  %x61-7A
+
+
+The client MUST NOT ask for capabilities the server did not say it
+supports.
+
+Server MUST diagnose and abort if capabilities it does not understand
+were sent.  Server MUST NOT ignore capabilities that client requested
+and server advertised.  As a consequence of these rules, server MUST
+NOT advertise capabilities it does not understand.
+
+See protocol-capabilities.txt for a list of allowed server and client
+capabilities and descriptions.
+
+XXX: this approach wastes one round trip in smart-http because the
+client would speak first. Perhaps we could allow client speculation.
+It can assume what server caps will send and send commands based on that
+assumption. If it turns out true, we save one round trip. E.g. fast
+path:
+
+   C: You are supposed to send caps A, B. I would respond with cap B.
+  Then I would send want-refs refs/heads/foo.
+   S: (yes we are sending caps A and B), validate client caps,
+  execute want-refs and return ref list
+
+and slow path:
+
+   C: You are supposed to send caps A, B. I would respond with cap B.
+  Then I would send want-refs refs/heads/foo.
+   S: Send caps A, B and C. ignore the rest from client
+   C: Want caps A and C. Send want-refs foo
+   S: return ref foo
+
+Reference Discovery (v1)
+
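The pkt-line framing that carries the `00XX` capability lines in the patch above can be sketched in a few lines of Python. The helper name and the sample payloads are illustrative; the 4-hex-digit prefix that counts its own four bytes, and the roughly 64k ceiling, match pkt-line as the thread describes it.

```python
# Sketch of pkt-line framing for the v2 capability exchange shown above.
# The 4-hex-digit prefix is the total packet length, including the four
# prefix bytes themselves.

def pkt_line(payload: str) -> bytes:
    data = payload.encode("utf-8")
    length = len(data) + 4              # the prefix counts toward the length
    assert length <= 65520              # LARGE_PACKET_MAX in git's C code
    return ("%04x" % length).encode("ascii") + data

# Server advertises, client replies with the subset it wants to use:
server = pkt_line("capabilities multi_ack thin-pack ofs-delta lang\n")
client = pkt_line("capabilities thin-pack ofs-delta lang=en\n")

print(server)  # b'0034capabilities multi_ack thin-pack ofs-delta lang\n'
print(client)  # b'002dcapabilities thin-pack ofs-delta lang=en\n'
```

Note how the length prefix is what makes the 64k discussion in this thread matter: nothing after the prefix can be interpreted until the whole packet has been read.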

Re: [RFC/PATCH 0/3] protocol v2

2015-03-02 Thread Duy Nguyen
On Sun, Mar 01, 2015 at 11:06:21PM -, Philip Oakley wrote:
 OK, maybe not exactly about protocol, but a possible option would be the 
 ability to send the data as a bundle or multi-bundles; or perhaps as an 
 archive, zip, or tar.
 
 Data can then be exchanged across an airgap or pigeon mail. The airgap 
 scenario is likely a real case that's not directly prominent at the 
 moment, just because it's not that direct.
 
 There has been discussion about servers having bundles available for 
 clones, but with a multi-bundle, one could package up a large bundle 
 (months) and an increment (weeks, and then days), before a final 
 easy-to-pack last few hours. That would be a server work trade-off, and 
 support a CDN view if needed.
 
 If such an approach was reasonable would the protocol support it? etc.

It came up several times. Many people are in favor of it. Some
references..

http://thread.gmane.org/gmane.comp.version-control.git/264305/focus=264565
http://thread.gmane.org/gmane.comp.version-control.git/263898/focus=263928
http://thread.gmane.org/gmane.comp.version-control.git/263898/focus=264000
http://thread.gmane.org/gmane.comp.version-control.git/238472/focus=238844

This is what I got so far. I think the hard part is how to let
projects control this in a clean and flexible way. Not written in the
patch, but I'm thinking maybe we can allow hooking a remote helper in
standard git://, ssh://, http://... That would give total control to
projects.

-- 8< --
diff --git a/Documentation/technical/protocol-capabilities.txt b/Documentation/technical/protocol-capabilities.txt
index ecb0efd..2b99464 100644
--- a/Documentation/technical/protocol-capabilities.txt
+++ b/Documentation/technical/protocol-capabilities.txt
@@ -260,3 +260,34 @@ v2
 'git-upload-pack' and 'git-receive-pack' may advertise this capability
 if the server supports 'git-upload-pack-2' and 'git-receive-pack-2'
 respectively.
+
+redirect
+
+
+This capability is applicable for upload-pack and upload-pack-v2
+only. When the client requests this capability it must specify the
+transport protocols it supports, separated by colons,
+e.g. "redirect=http:ftp:ssh:torrent".
+
+Instead of sending packfile data to the client, the server may send a
+4-byte signature { 'L', 'I', 'N', 'K' } followed by NUL-terminated
+URLs, each one pointing to a bundle. This fake pack ends with an empty
+string.
+
+The bundle does not have to contain all refs requested by the
+client. Different bundles from different URLs could have different
+content. The client must follow one of the links to get a bundle.
+The server must not send a URL in a protocol that the client does not
+support.
+
+FIXME: do we keep the current connection alive until the bundle is
+downloaded and then get a normal pack, or let the client initiate a new
+connection? Or perhaps if the client fails to get the bundle for
+whatever reason, it could send a NAK to the server and the server
+sends normal packfile data.
+
+FIXME: how do we implement this exactly? The decision to redirect
+should probably be delegated to some hook. Maybe sending all "want"
+lines to the script is enough. Sending "have" lines is more difficult
+because the server decides when to stop receiving them. That decision
+must be moved to the hook...
-- 8< --
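For illustration, the "LINK" fake pack proposed in the patch above could be consumed roughly as follows. This is a sketch of the proposal only, not of any implemented git feature, and the function name and sample URLs are invented.

```python
# Hypothetical parser for the proposed redirect response: a 4-byte
# 'LINK' signature, then NUL-terminated URLs, terminated by an empty
# string (a lone NUL).
import io

def parse_link_response(stream) -> list:
    if stream.read(4) != b"LINK":
        raise ValueError("not a LINK redirect response")
    urls = []
    while True:
        url = bytearray()
        while True:
            c = stream.read(1)
            if c in (b"", b"\x00"):   # NUL ends a URL; EOF is a safety stop
                break
            url.extend(c)
        if not url:                   # empty string terminates the list
            break
        urls.append(url.decode("utf-8"))
    return urls

data = io.BytesIO(
    b"LINK"
    b"https://cdn.example.com/a.bundle\x00"
    b"ftp://mirror.example.com/a.bundle\x00"
    b"\x00")
print(parse_link_response(data))
# ['https://cdn.example.com/a.bundle', 'ftp://mirror.example.com/a.bundle']
```

A real client would then pick the first URL whose scheme it had advertised in its `redirect=` capability.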


Re: [RFC/PATCH 0/3] protocol v2

2015-03-01 Thread Junio C Hamano
Duy Nguyen pclo...@gmail.com writes:

 On Sun, Mar 1, 2015 at 3:41 PM, Junio C Hamano gits...@pobox.com wrote:
  - Because the protocol exchange starts by the server side
advertising all its refs, even when the fetcher is interested in
a single ref, the initial overhead is nontrivial, especially when
you are doing a small incremental update.  The worst case is an
auto-builder that polls every five minutes, even when there is no
new commits to be fetched [*3*].

 Maybe you can elaborate about how to handle states X, Y... in your
 footnote 3. I just don't see how it's actually implemented.  Or is
 it optional feature that will be provided (via hooks, maybe) by
 admin?

These footnotes are not the important part (I wanted us to agree on
the problem, and the ideas outlined in the footnotes are only an
example that illustrates how a potential solution and the problem to
be solved are described in relation to each other), but I'll give it a
shot anyway ;-)

I am actually torn on how the names X, Y, etc. should be defined.

One side of me wants to leave its computation entirely up to the
server side.  The client says "Last time I talked with you asking
for refs/heads/* and successfully updated from you, you told me to
call that state X" without knowing how X is computed, and then the
server will update you and then tell you "your state is now Y".  That
way, large hosting sites and server implementations can choose to
implement it any way they like.

On the other hand, we could rigidly define it, perhaps like this:

 - Imagine that you saved the output from ls-remote that is run
   against that server, limited to the refs hierarchy you are
   requesting, the last time you talked with it.

 - Concatenate the above to the list of patterns the client used to
   ask the refs.  This step is optional.

 - E.g. if you are asking it for refs/heads/*, then we are talking
   something like this (illustrated with optional pattern in front):

refs/heads/*
8004647...  refs/heads/maint
7f4ba4b...  refs/heads/master

 - Run SHA-1 hash over that.  And that is the state name.

I.e. if you as a client are doing

[remote origin]
fetch = refs/heads/*:refs/remotes/origin/*

and if the only time your refs/remotes/origin/* hierarchy changes is
when you fetch from there (which should be the norm), you can look
into the remote.origin.fetch refspec (to learn that refs/heads/* is
what you are asking) and your refs/remotes/origin/* refs (and
reverse the mapping you make when you fetch to make them talk about
the refs/heads/* hierarchy on the server side), you can compute it
locally.
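The rigid definition sketched above is simple enough to express directly. In this sketch, the serialization (pattern line, then one tab-separated line per ref) is an assumption; the thread deliberately leaves the exact format open.

```python
# Hash the optional refspec pattern plus the last-seen ls-remote-style
# listing, as in the illustration above. The line format is an
# assumption made for this sketch.
import hashlib

def state_name(pattern, refs):
    lines = [pattern] + ["%s\t%s" % (sha, name) for sha, name in refs]
    blob = ("\n".join(lines) + "\n").encode("utf-8")
    return hashlib.sha1(blob).hexdigest()

refs = [
    ("8004647", "refs/heads/maint"),    # SHAs shortened as in the example
    ("7f4ba4b", "refs/heads/master"),
]
name = state_name("refs/heads/*", refs)
print(name)   # 40 hex digits; client and server can both compute this locally
```

Because the computation is deterministic, a client that tracks its remote refs faithfully can derive the same name the server would, which is exactly the property the paragraph above relies on.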

The latter will have one benefit over an opaque thing the client does
not know how to compute.  Because I want us to avoid sending unchanged
refs over the connection, but I do want to see the protocol have some
validation mechanism built in, even if we go the latter "client can
compute what the state name ought to be" route, I want the server
to tell the client what to call that state.  That way, the client
side can tell when it goes out of sync for any reason and attempt to
recover.

 Do we need to worry about load balancers?

Unless you are allowing multiple backend servers to serve a same
repository behind a set of load balancers in an inconsistent way
(e.g. you push to one while I push to two and you fetch from one and
you temporarily see my push but then my push will be rejected as
conflicting and you fetch from one and now you see your push), I do
not think there is anything you need to worry about them more than
what you should be worrying about already.

There would be a point where all backend servers would agree "This
is the set of values of these refs at some point" (e.g. a majority
of surviving servers vote to decide, laggers that later join the
party will update to the consensus value before serving the end-user
traffic), and they would not be showing half-updated values that
haven't been ratified by other servers to end users (otherwise they may
end up showing reversion).

- Is band #2 meant for human consumption, or do we expect the
  other end to interpret and act on it?  If the former, would it
  make sense to send locale information from the client side and
  ask the server side to produce its output with _("message")?

 No, producing _(...) is a bad idea. First the client has to verify
 placeholders and stuff, we can't just feed data from server straight
 to printf(). Producing _() could complicate server code a lot. And I
 don't like the idea of using client .po files to translate server
 strings. There could be custom strings added by admin, which are not
 available in client .po. String translation should happen at server
 side.

What I meant to say was (1) the client says "I want the human
readable message in Vietnamese" and (2) the server uses .po on
_("message") in its code and sends the result to sideband #2.  There
is no parsing, interpolation, or anything of that sort necessary on
the server side.

Re: [RFC/PATCH 0/3] protocol v2

2015-03-01 Thread Philip Oakley

From: Junio C Hamano gits...@pobox.com

I earlier said:


So if we are going to discuss a new protocol, I'd prefer to see the
discussion without worrying too much about how to inter-operate
with the current vintage of Git. It is no longer an interesting
problem, as we know how to solve it with minimum risk. Instead, I'd
like to see us design the new protocol in such a way that it is
in-line upgradable without repeating our past mistakes.


And I am happy to see that people are interested in discussing the
design of new protocols.

But after seeing the patches Stefan sent out, I think we risk
losing sight of what we are trying to accomplish.  We do not want
something that is merely new.

That is why I wanted people to think about, discuss and agree on
what limitations the current protocol has that are problematic
(limitations that are not problematic are not something we need to
address [*1*]), so that we can design the new thing without
reintroducing the same limitation.

To remind people, here is a reprint of the draft I sent out earlier
in $gmane/264000.


The current protocol has the following problems that limit us:

 - It is not easy to make it resumable, because we recompute every
   time.  This is especially problematic for the initial fetch aka
   clone as we will be talking about a large transfer [*1*].

 - The protocol extension has a fairly low length limit [*2*].

 - Because the protocol exchange starts by the server side
   advertising all its refs, even when the fetcher is interested in
   a single ref, the initial overhead is nontrivial, especially when
   you are doing a small incremental update.  The worst case is an
   auto-builder that polls every five minutes, even when there is no
   new commits to be fetched [*3*].

 - Because we recompute every time, taking into account of what the
   fetcher has, in addition to what the fetcher obtained earlier
   from us in order to reduce the transferred bytes, the payload for
   incremental updates become tailor-made for each fetch and cannot
   be easily reused [*4*].

I'd like to see a new protocol that lets us overcome the above
limitations (did I miss others? I am sure people can help here)
sometime this year.


Unfortunately, nobody seems to want to help us by responding to the
"did I miss others?" RFH, so here are a few more from me.


OK, maybe not exactly about protocol, but a possible option would be the 
ability to send the data as a bundle or multi-bundles; or perhaps as an 
archive, zip, or tar.


Data can then be exchanged across an airgap or pigeon mail. The airgap 
scenario is likely a real case that's not directly prominent at the 
moment, just because it's not that direct.


There has been discussion about servers having bundles available for 
clones, but with a multi-bundle, one could package up a large bundle 
(months) and an increment (weeks, and then days), before a final 
easy-to-pack last few hours. That would be a server work trade-off, and 
support a CDN view if needed.


If such an approach was reasonable would the protocol support it? etc.

Just a thought while reading...


- The semantics of the side-bands are unclear.

  - Is band #2 meant only for progress output (I think the current
protocol handlers assume that and unconditionally squelch it
under --quiet)?  Do we rather want a dedicated progress and
error message sidebands instead?

  - Is band #2 meant for human consumption, or do we expect the
other end to interpret and act on it?  If the former, would it
make sense to send locale information from the client side and
ask the server side to produce its output with _("message")?

- The semantics of packet_flush() is suboptimal, and this
  shortcoming seeps through to the protocol mapped to the
  smart-HTTP transport.

  Originally, packet_flush() was meant as "Here is an end of one
  logical section of what I am going to speak.", hinting that it
  might be a good idea for the underlying implementation to hold
  the packets up to that point in-core and then write(2) them all
  out (i.e. flush) to the file descriptor only when we handle
  packet_flush().  It never meant "Now I am finished speaking for
  now and it is your turn to speak."

  But because HTTP is inherently a ping-pong protocol where the
  requestor at one point stops talking and lets the responder
  speak, the code to map our protocol to the smart HTTP transport
  made the packet_flush() boundary mean "Now I am done talking, it is
  my turn to listen."

  We probably need two kinds of packet_flush().  When a requestor
  needs to say two or more logical groups of things before telling
  the other side "Now I am done talking; it is your turn.", we need
  some marker (i.e. the original meaning of packet_flush()) at the
  end of these logical groups.  And in order to be able to say "Now
  I am done saying everything I need to say at this point for you
  to respond to me.  It is your turn.", we need another kind of
  marker.



Re: [RFC/PATCH 0/3] protocol v2

2015-03-01 Thread David Lang

On Sun, 1 Mar 2015, Junio C Hamano wrote:


and if the only time your refs/remotes/origin/* hierarchy changes is
when you fetch from there (which should be the norm), you can look
into the remote.origin.fetch refspec (to learn that refs/heads/* is
what you are asking) and your refs/remotes/origin/* refs (and
reverse the mapping you make when you fetch to make them talk about
the refs/heads/* hierarchy on the server side), you can compute it
locally.

The latter will have one benefit over an opaque thing the client does
not know how to compute.  Because I want us to avoid sending unchanged
refs over the connection, but I do want to see the protocol have some
validation mechanism built in, even if we go the latter "client can
compute what the state name ought to be" route, I want the server
to tell the client what to call that state.  That way, the client
side can tell when it goes out of sync for any reason and attempt to
recover.


how would these approaches be affected by a client that is pulling from 
different remotes into one local repository? For example, pulling from the main 
kernel repo and from the -stable repo.


David Lang


Re: [RFC/PATCH 0/3] protocol v2

2015-03-01 Thread Junio C Hamano
I earlier said:

 So if we are going to discuss a new protocol, I'd prefer to see the
 discussion without worrying too much about how to inter-operate
 with the current vintage of Git. It is no longer an interesting problem,
 as we know how to solve it with minimum risk. Instead, I'd like to
 see us design the new protocol in such a way that it is in-line
 upgradable without repeating our past mistakes.

And I am happy to see that people are interested in discussing the
design of new protocols.

But after seeing the patches Stefan sent out, I think we risk
losing sight of what we are trying to accomplish.  We do not want
something that is merely new.

That is why I wanted people to think about, discuss and agree on
what limitations the current protocol has that are problematic
(limitations that are not problematic are not something we need to
address [*1*]), so that we can design the new thing without
reintroducing the same limitation.

To remind people, here is a reprint of the draft I sent out earlier
in $gmane/264000.

 The current protocol has the following problems that limit us:
 
  - It is not easy to make it resumable, because we recompute every
time.  This is especially problematic for the initial fetch aka
clone as we will be talking about a large transfer [*1*].
 
  - The protocol extension has a fairly low length limit [*2*].
 
  - Because the protocol exchange starts by the server side
advertising all its refs, even when the fetcher is interested in
a single ref, the initial overhead is nontrivial, especially when
you are doing a small incremental update.  The worst case is an
auto-builder that polls every five minutes, even when there is no
new commits to be fetched [*3*].
 
  - Because we recompute every time, taking into account of what the
fetcher has, in addition to what the fetcher obtained earlier
from us in order to reduce the transferred bytes, the payload for
incremental updates become tailor-made for each fetch and cannot
be easily reused [*4*].
 
 I'd like to see a new protocol that lets us overcome the above
 limitations (did I miss others? I am sure people can help here)
 sometime this year.

Unfortunately, nobody seems to want to help us by responding to the
"did I miss others?" RFH, so here are a few more from me.

 - The semantics of the side-bands are unclear.

   - Is band #2 meant only for progress output (I think the current
 protocol handlers assume that and unconditionally squelch it
 under --quiet)?  Do we rather want a dedicated progress and
 error message sidebands instead?

   - Is band #2 meant for human consumption, or do we expect the
 other end to interpret and act on it?  If the former, would it
 make sense to send locale information from the client side and
 ask the server side to produce its output with _("message")?

 - The semantics of packet_flush() is suboptimal, and this
   shortcoming seeps through to the protocol mapped to the
   smart-HTTP transport.

   Originally, packet_flush() was meant as "Here is an end of one
   logical section of what I am going to speak.", hinting that it
   might be a good idea for the underlying implementation to hold
   the packets up to that point in-core and then write(2) them all
   out (i.e. flush) to the file descriptor only when we handle
   packet_flush().  It never meant "Now I am finished speaking for
   now and it is your turn to speak."

   But because HTTP is inherently a ping-pong protocol where the
   requestor at one point stops talking and lets the responder
   speak, the code to map our protocol to the smart HTTP transport
   made the packet_flush() boundary mean "Now I am done talking, it is
   my turn to listen."

   We probably need two kinds of packet_flush().  When a requestor
   needs to say two or more logical groups of things before telling
   the other side "Now I am done talking; it is your turn.", we need
   some marker (i.e. the original meaning of packet_flush()) at the
   end of these logical groups.  And in order to be able to say "Now
   I am done saying everything I need to say at this point for you
   to respond to me.  It is your turn.", we need another kind of
   marker.
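The two kinds of markers described above can be encoded as distinct zero-length packets. The sketch below assumes the `0000`/`0001` encodings (these are, as it happens, what git's protocol v2 eventually standardized as flush-pkt and delim-pkt); the payloads are illustrative.

```python
# Sketch of the two proposed markers: a delimiter that ends one logical
# group while the same side keeps talking, and a flush that ends the
# speaker's turn entirely. Encodings assumed for illustration.

DELIM_PKT = b"0001"   # end of one logical group; same side keeps talking
FLUSH_PKT = b"0000"   # "I am done talking; it is your turn."

def pkt_line(payload: bytes) -> bytes:
    # 4-hex-digit length prefix that includes its own four bytes.
    return b"%04x" % (len(payload) + 4) + payload

request = b"".join([
    pkt_line(b"command=fetch\n"),
    DELIM_PKT,                               # group boundary, not end of turn
    pkt_line(b"want-ref refs/heads/master\n"),
    FLUSH_PKT,                               # now the other side may speak
])
print(request)
# b'0012command=fetch\n0001001fwant-ref refs/heads/master\n0000'
```

With two distinct markers, the smart-HTTP mapping no longer has to overload a single flush to mean both "end of section" and "end of request".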


[Footnote]

*1* For example, if we were working off of a "what mistakes do we want
to correct?" list, I do not think we would have seen "capabilities
have to be only on the first packet" or "let's allow new daemon to
read extra cruft at the end of the first request".  I do not think I
heard why it is a problem that the daemon cannot pass extra info to
the invoked program in the first place.  There might be a valid reason,
but then that needs to be explained, understood and agreed upon and
should be part of an updated "what are we fixing?" list.






Re: [RFC/PATCH 0/3] protocol v2

2015-03-01 Thread Duy Nguyen
On Sun, Mar 1, 2015 at 3:41 PM, Junio C Hamano gits...@pobox.com wrote:
  - Because the protocol exchange starts by the server side
advertising all its refs, even when the fetcher is interested in
a single ref, the initial overhead is nontrivial, especially when
you are doing a small incremental update.  The worst case is an
auto-builder that polls every five minutes, even when there is no
new commits to be fetched [*3*].

Maybe you can elaborate about how to handle states X, Y... in your
footnote 3. I just don't see how it's actually implemented. Or is it
an optional feature that will be provided (via hooks, maybe) by admins? Do
we need to worry about load balancers? Is it meant to address the
excessive state transfer due to stateless nature of smart-http?

 I'd like to see a new protocol that lets us overcome the above
 limitations (did I miss others? I am sure people can help here)
 sometime this year.

 Unfortunately, nobody seems to want to help us by responding to the
 "did I miss others?" RFH, so here are a few more from me.

Heh.. I did think about it, but I didn't see anything worth mentioning..

  - The semantics of the side-bands are unclear.

- Is band #2 meant only for progress output (I think the current
  protocol handlers assume that and unconditionally squelch it
  under --quiet)?  Do we rather want a dedicated progress and
  error message sidebands instead?

- Is band #2 meant for human consumption, or do we expect the
  other end to interpret and act on it?  If the former, would it
  make sense to send locale information from the client side and
  ask the server side to produce its output with _("message")?

No, producing _(...) is a bad idea. First the client has to verify
placeholders and stuff, we can't just feed data from server straight
to printf(). Producing _() could complicate server code a lot. And I
don't like the idea of using client .po files to translate server
strings. There could be custom strings added by admin, which are not
available in client .po. String translation should happen at server
side.

If we want error messages to be handled by machine as well, just add a
result code at the beginning, like ftp, http, ... do. Hmm.. this could
be the reason to separate progress and error messages.
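The result-code idea suggested above could look like FTP/SMTP-style replies on a dedicated error band. Everything below (the codes, the message text, the function name) is invented for illustration; nothing like this exists in git's protocol.

```python
# Hypothetical machine-readable status lines on an error sideband,
# prefixed with a 3-digit result code the way FTP/SMTP replies are, so
# a client can act on them without parsing localized prose.
import re

def parse_status(line):
    m = re.match(r"^(\d{3}) (.*)$", line)
    if m is None:
        raise ValueError("malformed status line: %r" % line)
    return int(m.group(1)), m.group(2)

code, text = parse_status("550 ref refs/heads/topic not found")
if 500 <= code < 600:                        # 5xx: permanent failure (assumed)
    print("permanent failure:", text)
```

Separating a numeric code from the human-readable text also makes the progress-vs-error sideband split mentioned above cleaner: only the error band needs codes.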
-- 
Duy


Re: [RFC/PATCH 0/3] protocol v2

2015-03-01 Thread David Lang

On Sun, 1 Mar 2015, Stefan Beller wrote:


The way I understand Junio here is to have predefined points which
make it easier to communicate. There are lots of clients and they usually
want to catch up on a different number of commits, so we need to recompute it
all the time. The idea is then to compute a small pack from the original point
to one of these predefined points.
So a conversation might look like:
Client: My newest commit is dated 2014-11-17.
Server: OK, here is a pack from 2014-11-17 until 2014-12-01, and then
I have prepared packs I sent out all the time for 2014-12 and 2015-01
and 2015-02, and then there will be another custom pack for you describing
changes from 2015-02-01 until now.

Mind that I chose dates instead of arbitrary sha1 values as I feel
that explains the point better; the packs in between are precomputed
because many clients need them.

Personally I don't buy that idea, because it raises a lot of questions,
like: how large should these packs be? Depending on time or commit counts?


I think this is going to depend on the project in question. I think that doing 
this based on public tags makes lots of sense. The precomputed packs should also 
change over time.


For example, with the linux kernel, as each -rc is released, there will be a lot 
of people wanting to upgrade from a prior -rc, so having a pack for each of 
these is probably worthwhile. You probably also want a precomputed pack to move 
from some of the -rc releases to the final release. And then a single pack to 
move from the prior final release to the newest one. There may also be a reason 
to make a pack that jumps several releases to go from one LTS kernel to the 
next.


Exactly what precomputed packs make sense, and how large the packs should be, is 
going to be _very_ dependent on the update patterns of users. The only people 
who can decide exactly what packs they should use are the admins of the systems, 
based on their logs of what requests are being made. I can see the 
git project creating scripts to analyze the logs of client connections to make 
recommendations on what packs would be useful to have pre-generated, ideally 
ordered by how much computation they would save (and the amount of disk space 
required to hold the packs); the admin of the site can then indicate where 
they want the cutoff to be. Some extremely busy sites may have a lot of disk 
space compared to CPU and be willing to have lots of packs around; others are 
less busy and will only want to keep a few around.


David Lang


Re: [RFC/PATCH 0/3] protocol v2

2015-03-01 Thread Junio C Hamano
David Lang da...@lang.hm writes:

 how would these approaches be affected by a client that is pulling
 from different remotes into one local repository? For example, pulling
 from the main kernel repo and from the -stable repo.

 David Lang

As I said in $gmane/264000, which the above came from:

Note that the above would work if and only if we accept that it
is OK to send objects between the remote tracking branches the
fetcher has (i.e. the objects it last fetched from the server)
and the current tips of branches the server has, without
optimizing by taking into account that some commits in that set
may have already been obtained by the fetcher from a
third-party.

The scheme tries to gain by reducing the ref advertisement cost,
sacrificing the optimization opportunity when you fetch from an updated
Linus's tree after having fetched from a recent 'next' tree, the
latter of which may have contained a lot of objects that went to
Linus's tree since you fetched from Linus's the last time.  The
current protocol, by negotiating what you have (including the
objects you obtained from sideways via 'next') with Linus's
tree, allows the server to compute a minimum packfile customized
just for you.  By trading that off with the "everybody that follows this
repository will get the same set of packfiles in sequence trickled
into his repository" model, it would instead allow the server to
prepare the packfiles that thousands of clients following Linus's
tree will want only once.

The client-server pair may want to have a negotiation mechanism
(e.g. "I may have many objects I fetched from sideways, give me a
minimum pack that is customized for me by spending cycles---I am
willing to wait until you finish computing it" vs "I am just
following along and not doing anything fancy, just give me the same
thing as everybody else") to select what optimization they want to
use.




Re: [RFC/PATCH 0/3] protocol v2

2015-03-01 Thread Junio C Hamano
Stefan Beller sbel...@google.com writes:

 A race condition may be a serious objection then? Once people believe the
 refs can scale fairly well they will use it, which means blasting the ref
 advertisement will get much worse over time.

I think we are already in agreement about that case:

A misdetected case between (new client, new server) pair might go
like this:

- new client connects and sends that no-op.

- new server accepts the connection, but that no-op probe has
  not arrived yet..  It misdetects the other side as a v1
  client and it starts blasting the ref advertisement.

- new client notices that the ref advertisement has the
  capability bit and the server is capable of the v2 protocol.  It
  waits until the server sends the "sorry, I misdetected" message.

- new server eventually notices the no-op probe while blasting
  the ref advertisement and it can stop in the middle.
  Hopefully this can happen after only sending a few kilobytes
  among megabytes of ref advertisement data ;-).  The server
  sends the "sorry, I misdetected" message to synchronise.

- both sides happily speak v2 from here on.

However, I do not think it needs to become worse over time, because
we can change and adjust as the user population and their use
patterns evolve.  For example, you can introduce a small delay
before the new versions of server starts the v1 advertisement, and
make that delay longer and longer over time, as the population of
v1-only clients go down, for example.

Difficulty (see J6t's comment) in other implementations may be a
more important roadblock.  It seems, however, that our current
thinking is that it is OK to do the "allow new v1 clients to notice
the availability of v2 servers, so that they can talk v2 the next
time" thing, so my preference is to throw this "client first and let
server notice" into the "maybe doable but not our first choice" bin, at
least for now.

Thanks.


Re: [RFC/PATCH 0/3] protocol v2

2015-03-01 Thread Stefan Beller
On Sun, Mar 1, 2015 at 3:32 AM, Duy Nguyen pclo...@gmail.com wrote:
 On Sun, Mar 1, 2015 at 3:41 PM, Junio C Hamano gits...@pobox.com wrote:
  - Because the protocol exchange starts by the server side
advertising all its refs, even when the fetcher is interested in
a single ref, the initial overhead is nontrivial, especially when
you are doing a small incremental update.  The worst case is an
auto-builder that polls every five minutes, even when there is no
new commits to be fetched [*3*].

 Maybe you can elaborate about how to handle states X, Y... in your
 footnote 3. I just don't see how it's actually implemented. Or is it
 optional feature that will be provided (via hooks, maybe) by admin? Do
 we need to worry about load balancers? Is it meant to address the
 excessive state transfer due to stateless nature of smart-http?

The way I understand Junio here is to have predefined points which
make it easier to communicate. There are lots of clients and they usually
want to catch up a different amount of commits, so we need to recompute it
all the time. The idea is then to compute a small pack from the original point
to one of these predefined points.
So a conversation might look like:
Client: My newest commit is dated 2014-11-17.
Server: ok here is a pack from 2014-11-17 until 2014-12-01 and then
I have prepared packs I sent out all the time of 2014-12 and 2015-01
and 2015-02 and then there will be another custom pack for you describing
changes of 2015-02-01 until now.

Mind that I chose dates instead of arbitrary sha1 values as I feel that
explains the point better; the packs in between are precomputed because
many clients need them.
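The date-range scheme sketched in that dialogue could be modeled as below. This is purely illustrative: the function name, the `(start, end)` range representation, and the use of ISO date strings are all made up here; nothing like this exists in git.

```python
def plan_packs(client_date, prepacked, now):
    """Cover [client_date, now) with precomputed date-range packs plus
    custom packs for the gaps, as in the dialogue above.  Dates are ISO
    strings, so lexicographic comparison matches chronological order.
    'prepacked' is a sorted list of (start, end) ranges the server keeps."""
    plan, cursor = [], client_date
    for start, end in prepacked:
        if cursor < start:              # gap before this precomputed range
            plan.append(("custom", cursor, start))
            cursor = start
        if start <= cursor < end:       # reuse the precomputed pack
            plan.append(("prepacked", start, end))
            cursor = end
    if cursor < now:                    # tail gap up to "now"
        plan.append(("custom", cursor, now))
    return plan

# The example from the dialogue: client last fetched on 2014-11-17.
prepacked = [("2014-12-01", "2015-01-01"), ("2015-01-01", "2015-02-01")]
plan = plan_packs("2014-11-17", prepacked, "2015-02-10")
```

The plan is then a custom pack up to the first precomputed boundary, the precomputed monthly packs, and a final custom pack up to "now".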

Personally I don't buy that idea, because it raises a lot of questions, like
how large should these packs be? Depending on time or commit counts?

The idea I'd rather favor (I am repeating myself from another post,
but maybe a bit clearer now):

Client: The last time I asked for refs/heads/* and I got a refs
advertisement hashing to $SHA1
Server: Ok, here is the diff from that old ref advertisement to the
current refs advertisement.
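The "advertisement hashing to $SHA1" step above could be as simple as hashing the advertisement lines in the order the server sent them. A minimal sketch, assuming a made-up helper name and line format; the thread only posits the idea:

```python
import hashlib

def advertisement_hash(ref_lines):
    """Reduce a full ref advertisement to one SHA-1 so the client can
    name it compactly on the next connection.  'ref_lines' are the
    "<sha1> <refname>" lines exactly as received from the server."""
    h = hashlib.sha1()
    for line in ref_lines:              # order as sent by the server
        h.update(line.encode("utf-8") + b"\n")
    return h.hexdigest()

adv = ["b3a551adf53c224b04c40f05b72a8790807b3138 refs/heads/master"]
digest = advertisement_hash(adv)
```

Both sides would have to agree on the exact byte stream being hashed (ordering, capabilities suffix, line terminators) for the digests to match.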

I realize that these two ideas are not contradicting each other, but could
rather complement each other, as they are orthogonal: one is about refs
advertisement while the other is about object transmission.


 I'd like to see a new protocol that lets us overcome the above
 limitations (did I miss others? I am sure people can help here)
 sometime this year.

 Unfortunately, nobody seems to want to help us by responding to the "did
 I miss others?" RFH, here are a few more from me.

 Heh.. I did think about it, but I didn't see anything worth mentioning..

  - The semantics of the side-bands are unclear.

- Is band #2 meant only for progress output (I think the current
  protocol handlers assume that and unconditionally squelch it
  under --quiet)?  Do we rather want a dedicated progress and
  error message sidebands instead?

- Is band #2 meant for human consumption, or do we expect the
  other end to interpret and act on it?  If the former, would it
  make sense to send locale information from the client side and
  ask the server side to produce its output with _(message)?

 No, producing _(...) is a bad idea. First the client has to verify
 placeholders and stuff, we can't just feed data from server straight
 to printf(). Producing _() could complicate server code a lot. And I
 don't like the idea of using client .po files to translate server
 strings. There could be custom strings added by admin, which are not
 available in client .po. String translation should happen at server
 side.

 If we want error messages to be handled by machine as well, just add a
 result code at the beginning, like ftp, http, ... do. Hmm.. this could
 be the reason to separate progress and error messages.
 --
 Duy
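The side-band mechanics the questions above revolve around can be sketched as a small demultiplexer. In git's side-band protocol the first byte of each packet payload selects the channel: 1 carries pack data, 2 carries progress messages meant for humans, 3 carries a fatal error. This is a toy model of that idea, not git's actual (C, streaming) implementation:

```python
def demux_sideband(packets):
    """Split de-framed side-band packets into per-band byte streams.
    Band 1 = pack data, band 2 = progress (human-readable),
    band 3 = fatal error."""
    streams = {1: b"", 2: b"", 3: b""}
    for pkt in packets:
        band, payload = pkt[0], pkt[1:]   # first byte selects the channel
        if band not in streams:
            raise ValueError("unknown side-band channel %d" % band)
        streams[band] += payload
    return streams

streams = demux_sideband([b"\x01PACK", b"\x02Counting objects: 10\r"])
```

The overloading is visible here: band 2 mixes progress and diagnostics, which is exactly why the thread asks whether dedicated progress and error bands would be cleaner.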


Re: [RFC/PATCH 0/3] protocol v2

2015-02-27 Thread Duy Nguyen
On Sat, Feb 28, 2015 at 6:05 AM, Junio C Hamano gits...@pobox.com wrote:
 Just for fun, I was trying to see if there is a hole in the current
 protocol that allows a new client to talk a valid v1 protocol
 exchange with existing, deployed servers without breaking, while
 letting it to know a new server that it is a new client and it does
 not want to get blasted by megabytes of ref advertisement.
 ...
 The idea is to find a request that can be sent as the first
 utterance by the client to an old server that is interpreted as a
 no-op and can be recognised by a new server as such a no-op probe.
 ...
 And there _is_ a hole ;-).  The parsing of "shallow " + object name is
 done in such a way that an object name that passes get_sha1_hex()
 but results in a NULL return from parse_object() is _ignored_.  So
 a new client can use "shallow 0{40}" as a no-op probe.
 ...
 I am _not_ proposing that we should go this route, at least not yet.
 I am merely pointing out that an in-place sidegrade from v1 to a
 protocol that avoids the megabyte-advertisement-at-the-beginning
 seems to be possible, as a food for thought.

There may be another hole, if we send "want empty-tree", it looks
like it will go through without causing errors. It's not exactly no-op
because an empty tree object will be bundled in the resulting pack. But that
makes no difference in practice. I didn't verify this though.

In the spirit of fun, I looked at how jgit handles this "shallow" line
(because this is more like an implementation hole than a protocol hole).
I don't think jgit would ignore 0{40} the way C Git does. This SHA-1
will end up in the shallowCommits set in upload-pack, then will be parsed
as a commit. But even if the parsing goes through, a non-empty
shallowCommits set would disable the pack bitmap. Fun is usually short..

PS. heh my "want empty-tree" hole is probably impl-specific too. Not
sure if jgit also keeps the empty tree available even if it does not
exist.
-- 
Duy


Re: [RFC/PATCH 0/3] protocol v2

2015-02-27 Thread Stefan Beller
+git@vger.kernel.org

On Thu, Feb 26, 2015 at 5:42 PM, Duy Nguyen pclo...@gmail.com wrote:
 https://github.com/pclouds/git/commits/uploadpack2

I rebased your branch, changed the order of commits slightly and
started to add some.
they are found at https://github.com/stefanbeller/git/commits/uploadpack2

I think the very first patch series which I try to polish now will
just try to move the
capabilities negotiation into the beginning of the exchange.

Any 'real' changes such as adding capabilities to the protocol to not
have all the refs
advertised will come in a later series.

Thanks for your help!
Stefan


Re: [RFC/PATCH 0/3] protocol v2

2015-02-27 Thread Junio C Hamano
Junio C Hamano gits...@pobox.com writes:

 I do not think v1 can be fixed by "send one ref with capability,
 newer client may respond immediately so we can stop enumerating
 remaining refs and older one will get stuck so we can have a timeout
 to see if the connection is from the newer one, and send the rest
 for the older client", because anything that involves such a timeout
 would not reliably work over WAN.

Just for fun, I was trying to see if there is a hole in the current
protocol that allows a new client to talk a valid v1 protocol
exchange with existing, deployed servers without breaking, while
letting it make known to a new server that it is a new client and it does
not want to get blasted by megabytes of ref advertisement.

The idea is to find a request that can be sent as the first
utterance by the client to an old server that is interpreted as a
no-op and can be recognised by a new server as such a no-op probe.
If there is such a request, then the exchange can go like this with
(new client, old server) pair:

- new client connects and sends that no-op.

- old server starts blasting the ref advertisement

- new client monitors and notices that the other side
  started speaking, and the ref advertisement lacks the
  capability bit for new protocol.

- new client accepts the ref advertisement and does the v1
  protocol thing as a follow-up to what it already sent.

As long as the first one turns out to be no-op for old server, we
would be OK.  On the other hand, (new client, new server) pair
would go like this:

- new client connects and sends that no-op.

- new server notices that there is already a data from the
  client, and recognises the no-op probe.

- new server gives the first v2 protocol message with
  capability.

- new client notices that the other side started speaking, and
  it is the first v2 protocol message.

- both sides happily speak v2.

and (old client, new server) pair would go like this:

- old client connects and waits.

- new server notices that there is *no* data sent from the
  client and decides that the other side is a v1 client.  It
  starts blasting the ref advertisement.

- both sides happily speak v1 from here on.

A misdetected case between (new client, new server) pair might go
like this:

- new client connects and sends that no-op.

- new server accepts the connection, but that no-op probe has
  not arrived yet..  It misdetects the other side as a v1
  client and it starts blasting the ref advertisement.

- new client notices that the ref advertisement has the
  capability bit and the server is capable of v2 protocol.  it
  waits until the server sends sorry, I misdetected message.

- new server eventually notices the no-op probe while blasting
  the ref advertisement and it can stop in the middle.
  hopefully this can happen after only sending a few kilobytes
  among megabytes of ref advertisement data ;-).  The server
  sends sorry, I misdetected message to synchronise.

- both sides happily speak v2 from here on.

So the topic of this exercise (just for fun) is to see if there is
such a no-op request the client side can send as the first thing for
probing.

On the fetch side, the first response upload-pack expects is one
of:

  - "want " followed by an object name.
  - "shallow " followed by an object name.
  - "deepen " followed by a positive integer.

And there _is_ a hole ;-).  The parsing of "shallow " + object name is
done in such a way that an object name that passes get_sha1_hex()
but results in a NULL return from parse_object() is _ignored_.  So
a new client can use "shallow 0{40}" as a no-op probe.
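The exact bytes of such a probe are easy to show, since pkt-line framing is just four lowercase hex digits giving the total length (payload plus the 4-byte prefix itself) followed by the payload. The helper name below is made up; the framing rule is git's real one:

```python
def pkt_line(payload):
    """Frame a payload as a git pkt-line: "%04x" of the total length
    (payload bytes + the 4-byte length prefix), then the payload."""
    data = payload.encode("utf-8")
    assert len(data) + 4 <= 0xffff, "pkt-line length field is 4 hex digits"
    return ("%04x" % (len(data) + 4)).encode() + data

# The no-op probe discussed above: "shallow " plus forty zeros.
probe = pkt_line("shallow " + "0" * 40 + "\n")
```

The probe is 53 bytes on the wire, so a new client pays almost nothing to send it before finding out which protocol the server speaks.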

It appears that on the push side, there is a similar hole that can
be used. receive-pack expects either "shallow ", "push-cert" or the
refname updates (i.e. two [0-9a-f]{40} followed by a refname); the
parsing of "shallow " is not as loose as on the fetch side in that
using a "shallow 0{40}" as a no-op probe will end up causing
prepare_shallow_info() to sift the 0{40} object name into "theirs",
but I think it will be ignored at the end as unreachable cruft
without causing harm.

I am _not_ proposing that we should go this route, at least not yet.
I am merely pointing out that an in-place sidegrade from v1 to a
protocol that avoids the megabyte-advertisement-at-the-beginning
seems to be possible, as a food for thought.




Re: [RFC/PATCH 0/3] protocol v2

2015-02-27 Thread Junio C Hamano
On Fri, Feb 27, 2015 at 4:07 PM, Duy Nguyen pclo...@gmail.com wrote:

 There may be another hole, if we send want empty-tree, it looks
 like it will go through without causing errors. It's not exactly no-op
 because an empty tree object will be bundled in result pack. But that
 makes no difference in pratice. I didn't verify this though.

In addition to the "that's not a no-op" problem, unless the old server has a
ref that has an empty tree at its tip, such a fetch request will be rejected,
unless the server is configured to serve any object, no?

If your new server does have a ref that points at an empty tree, a client
may request you to send that, but this is not a problem, because the
new server can tell if the client is sending it as a no-op probe or a serious
request by looking at its capability request. A serious old client will not
tell you that he is new, a probing new client does, and a serious new
client does. So your new server can tell and will not be confused.

 as a commit. But even if the parsing is through, a non-empty
 shallowCommits set would disable pack bitmap.

Performance penalty is fine. Over time we would upgrade and the
point of the exercise is not to cause the old-new or new-old pair to
die but keep talking the old protocol and getting correct results.


Re: [RFC/PATCH 0/3] protocol v2

2015-02-27 Thread Junio C Hamano
On Fri, Feb 27, 2015 at 3:44 PM, Stefan Beller sbel...@google.com wrote:
 On Fri, Feb 27, 2015 at 3:05 PM, Junio C Hamano gits...@pobox.com wrote:

 I am _not_ proposing that we should go this route, at least not yet.
 I am merely pointing out that an in-place sidegrade from v1 to a
 protocol that avoids the megabyte-advertisement-at-the-beginning
 seems to be possible, as a food for thought.

 This is a fun thing indeed, though I'd personally feel uneasy with
 such a probe as
 a serious proposal. (Remember somebody 10 years from now wants to enjoy
 reading the source code).

That cannot be a serious objection, once you realize that NUL + capability
was exactly the same kind of "yes, we have a hole" to allow us to customize
the protocol. The code to do so may not be pretty, but the implementation
ended up being reasonably clean with parse_feature_request() and friends.
After all we live in a real world ;-)


Re: [RFC/PATCH 0/3] protocol v2

2015-02-27 Thread Stefan Beller
On Fri, Feb 27, 2015 at 4:33 PM, Junio C Hamano gits...@pobox.com wrote:
 On Fri, Feb 27, 2015 at 3:44 PM, Stefan Beller sbel...@google.com wrote:
 On Fri, Feb 27, 2015 at 3:05 PM, Junio C Hamano gits...@pobox.com wrote:

 I am _not_ proposing that we should go this route, at least not yet.
 I am merely pointing out that an in-place sidegrade from v1 to a
 protocol that avoids the megabyte-advertisement-at-the-beginning
 seems to be possible, as a food for thought.

 This is a fun thing indeed, though I'd personally feel uneasy with
 such a probe as
 a serious proposal. (Remember somebody 10 years from now wants to enjoy
 reading the source code).

 That cannot be a serious objection, once you realize that NUL + capability
 was exactly the same kind of "yes, we have a hole" to allow us to customize
 the protocol. The code to do so may not be pretty, but the implementation
 ended up being reasonably clean with parse_feature_request() and friends.
 After all we live in a real world ;-)

- new server accepts the connection, but that no-op probe has
  not arrived yet..  It misdetects the other side as a v1
  client and it starts blasting the ref advertisement.

A race condition may be a serious objection then? Once people believe the
refs can scale fairly well they will use it, which means blasting the ref
advertisement will become much worse over time.
I'll try to present a 'client asks for options first out of band' instead of the
way you describe.

Also we should not rely on having holes here and there (we might run out of
holes over time), so I'd rather have the capabilities presented first,
which rather opens new holes instead of closing old ones.

(assuming we'll never run into megabytes of capabilities
over time to have the same trouble again ;)


Re: [RFC/PATCH 0/3] protocol v2

2015-02-27 Thread Stefan Beller
On Fri, Feb 27, 2015 at 3:05 PM, Junio C Hamano gits...@pobox.com wrote:
 Junio C Hamano gits...@pobox.com writes:

 I do not think v1 can be fixed by "send one ref with capability,
 newer client may respond immediately so we can stop enumerating
 remaining refs and older one will get stuck so we can have a timeout
 to see if the connection is from the newer one, and send the rest
 for the older client", because anything that involves such a timeout
 would not reliably work over WAN.

 Just for fun, I was trying to see if there is a hole in the current
 protocol that allows a new client to talk a valid v1 protocol
 exchange with existing, deployed servers without breaking, while
 letting it to know a new server that it is a new client and it does
 not want to get blasted by megabytes of ref advertisement.

 The idea is to find a request that can be sent as the first
 utterance by the client to an old server that is interpreted as a
 no-op and can be recognised by a new server as such a no-op probe.
 If there is such a request, then the exchange can go like this with
 (new client, old server) pair:

 - new client connects and sends that no-op.

 - old server starts blasting the ref advertisement

 - new client monitors and notices that the other side
   started speaking, and the ref advertisement lacks the
   capability bit for new protocol.

 - new client accepts the ref advertisement and does the v1
   protocol thing as a follow-up to what it already sent.

 As long as the first one turns out to be no-op for old server, we
 would be OK.  On the other hand, (new client, new server) pair
 would go like this:

 - new client connects and sends that no-op.

 - new server notices that there is already a data from the
   client, and recognises the no-op probe.

 - new server gives the first v2 protocol message with
   capability.

 - new client notices that the other side started speaking, and
   it is the first v2 protocol message.

 - both sides happily speak v2.

 and (old client, new server) pair would go like this:

 - old client connects and waits.

 - new server notices that there is *no* data sent from the
   client and decides that the other side is a v1 client.  It
   starts blasting the ref advertisement.

 - both sides happily speak v1 from here on.

 A misdetected case between (new client, new server) pair might go
 like this:

 - new client connects and sends that no-op.

 - new server accepts the connection, but that no-op probe has
   not arrived yet..  It misdetects the other side as a v1
   client and it starts blasting the ref advertisement.

 - new client notices that the ref advertisement has the
   capability bit and the server is capable of v2 protocol.  it
   waits until the server sends sorry, I misdetected message.

 - new server eventually notices the no-op probe while blasting
   the ref advertisement and it can stop in the middle.
   hopefully this can happen after only sending a few kilobytes
   among megabytes of ref advertisement data ;-).  The server
   sends sorry, I misdetected message to synchronise.

 - both sides happily speak v2 from here on.

 So the topic of this exercise (just for fun) is to see if there is
 such a no-op request the client side can send as the first thing for
 probing.

 On the fetch side, the first response upload-pack expects is one
 of:

   - "want " followed by an object name.
   - "shallow " followed by an object name.
   - "deepen " followed by a positive integer.

 And there _is_ a hole ;-).  The parsing of "shallow " + object name is
 done in such a way that an object name that passes get_sha1_hex()
 but results in a NULL return from parse_object() is _ignored_.  So
 a new client can use "shallow 0{40}" as a no-op probe.

 It appears that on the push side, there is a similar hole that can
 be used. receive-pack expects either "shallow ", "push-cert" or the
 refname updates (i.e. two [0-9a-f]{40} followed by a refname); the
 parsing of "shallow " is not as loose as on the fetch side in that
 using a "shallow 0{40}" as a no-op probe will end up causing
 prepare_shallow_info() to sift the 0{40} object name into "theirs",
 but I think it will be ignored at the end as unreachable cruft
 without causing harm.

 I am _not_ proposing that we should go this route, at least not yet.
 I am merely pointing out that an in-place sidegrade from v1 to a
 protocol that avoids the megabyte-advertisement-at-the-beginning
 seems to be possible, as a food for thought.



This is a fun thing indeed, though I'd personally feel uneasy with
such a probe as
a serious proposal. (Remember somebody 10 years from now wants to enjoy
reading the source code). So let's keep the idea around if we don't find another
solution.

As far as I can tell we have
* native git protocol (git daemon)
* ssh
* http(s)
* ftp (deprecated)
* rsync (deprecated)

For both native git as well as 

Re: [RFC/PATCH 0/3] protocol v2

2015-02-26 Thread Stefan Beller
On Thu, Feb 26, 2015 at 2:15 AM, Duy Nguyen pclo...@gmail.com wrote:
 On Thu, Feb 26, 2015 at 2:31 PM, Stefan Beller sbel...@google.com wrote:
 On Wed, Feb 25, 2015 at 10:04 AM, Junio C Hamano gits...@pobox.com wrote:
 Duy Nguyen pclo...@gmail.com writes:

 On Wed, Feb 25, 2015 at 6:37 AM, Stefan Beller sbel...@google.com wrote:
 I can understand, that we maybe want to just provide one generic
 version 2 of the protocol which is an allrounder not doing bad in
 all of these aspects, but I can see usecases of having the desire to
 replace the wire protocol by your own implementation. To do so
 we could try to offer an API which makes implementing a new
 protocol somewhat easy. The current state of affairs is not providing
 this flexibility.

 I think we are quite flexible after initial ref advertisement.

 Yes, that is exactly where my I am not convinced comes from.


 We are not. (not really at least). We can tune some parameters or
 change the behavior slightly,
 but we cannot fix core assumptions made when creating v2 protocol.
 This you can see when when talking about v1 as well: we cannot fix any
 wrongdoings of v1 now by adding another capability.

 Step 1 then should be identifying these wrongdoings and assumptions.

So I think one of the key assumptions was to not have many refs to advertise,
and advertising the refs is fine under that assumption.
So from my point of view it is hard to change the general


 We can really go wild with these capabilities. The only thing that
 can't be changed is perhaps sending the first ref. I don't know
 whether we can accept a dummy first ref... After that point, you can
 turn the protocol upside down because both client and server know what
 it would be.

So the way I currently envision (the transition to and) version 2 of
the protocol:

First connection (using the protocol as of now):

Server: Here are all the refs and capabilities I can offer. The capabilities
include not-send-refs-first (aka version2)
Client: Ok, I'll store not-send-refs-first for next time. Now we will
continue with
these options:  For now we continue using the current
protocol and I want
to update the master branch.
Server: Ok here is a pack file, and then master advances $SHA1..$SHA1
Client: ok, thanks, bye

For the next connection I have different ideas:

Client thinks v2 is supported, so it talks first: Last time we talked
your capabilities
hashed to $SHA1, is that still correct?
Server: yes it is
# In the above roundtrip we would have a new key assumption that the
# capabilities don't change often. With push-certs enabled, this is
# invalid as of today. However this could be implemented with very low
# bandwidth usage.
# The alternative path would be:
# Server: No my new capabilities are: 
Client: Ok I want to update all of refs/heads/{master,next,pu}. My
last fetch was
yesterday at noon.
Server: Let me check the ref logs for these refs. Here is a packfile of length
1000 bytes: binary gibberish
{master, next} did not update since yesterday noon, pu updates from A..B
Client: ok, thanks, bye


Another approach would be this:
Client thinks v2 is supported, so it talks first: Last time we talked
you sent me
refs advertisement including capabilities which hash to $SHA1.
Server: I see, I have stored that. Now that time has advanced there are a few
differences, here is a diff of the refs advertisement:
* b3a551adf53c224b04c40f05b72a8790807b3138 HEAD\0 capabilities
* b3a551adf53c224b04c40f05b72a8790807b3138 refs/heads/master
- 24ca137a384aa1ac5a776eddaf35bb820fc6f6e6 refs/heads/tmp-fix
+ 1da8335ad5d0e46062a929ba6481bbbe35c8eef0 refs/pull/123/head

Note that I do not include changed lines as "+ one line" and "- one line"
as you know what the line was by your given $SHA1, so changed lines are
marked with '*', while lines starting with '-' indicate deleted refs and
'+' indicates new refs.
Client: I see, I can reconstruct the refs advertisement. Now we can
continue talking
as we always talked using v1 protocol.
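Reconstructing the advertisement from such a diff is straightforward. A sketch of the idea, using the marker format proposed in this message (this format is hypothetical, not anything git implements):

```python
def apply_ref_diff(old_advertisement, diff):
    """Rebuild the current ref advertisement from the previous one plus
    a diff: '*' carries the new value for a ref the client already
    knows, '-' deletes a ref, '+' adds a new one."""
    refs = {}
    for line in old_advertisement:
        sha, name = line.split(" ", 1)
        refs[name] = sha
    for line in diff:
        mark = line[0]
        sha, name = line[2:].split(" ", 1)
        if mark == "-":
            del refs[name]
        else:                     # '*' and '+' both set the new value
            refs[name] = sha
    return ["%s %s" % (refs[n], n) for n in sorted(refs)]

# Values borrowed from the example diff above.
old = [
    "1111111111111111111111111111111111111111 refs/heads/master",
    "24ca137a384aa1ac5a776eddaf35bb820fc6f6e6 refs/heads/tmp-fix",
]
diff = [
    "* b3a551adf53c224b04c40f05b72a8790807b3138 refs/heads/master",
    "- 24ca137a384aa1ac5a776eddaf35bb820fc6f6e6 refs/heads/tmp-fix",
    "+ 1da8335ad5d0e46062a929ba6481bbbe35c8eef0 refs/pull/123/head",
]
current = apply_ref_diff(old, diff)
```

Note that the client and server would also need to agree on a canonical ordering of the reconstructed advertisement, or the hashes named in the handshake would diverge.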


 So from my point
 of view we don't waste resources when having an advertisement of
 possible protocols instead of a boolean flag indicating v2 is
 supported. There is really not much overhead in coding nor bytes
 exchanged on the wire, so why not accept stuff that comes for free
 (nearly) ?

 You realize you're advertising v2 as a new capability, right? Instead
 of defining v2 feature set then advertise v2, people could simply add
 new features directly. I don't see v2 (at least with these patches)
 adds any value.

Yes, we can go wild after the refs advertisement, but that is not the
critical problem as it works ok-ish? The problem I see for now is the huge
refs advertisement even before the capabilities are exchanged. So maybe
I do not want to talk about v2 but about changing the current protocol to first
talk about the capabilities in the first round trip, not sure if we
ever want to attach
data into the first RT as it may explode as soon as that 

Re: [RFC/PATCH 0/3] protocol v2

2015-02-26 Thread Duy Nguyen
On Thu, Feb 26, 2015 at 2:31 PM, Stefan Beller sbel...@google.com wrote:
 On Wed, Feb 25, 2015 at 10:04 AM, Junio C Hamano gits...@pobox.com wrote:
 Duy Nguyen pclo...@gmail.com writes:

 On Wed, Feb 25, 2015 at 6:37 AM, Stefan Beller sbel...@google.com wrote:
 I can understand, that we maybe want to just provide one generic
 version 2 of the protocol which is an allrounder not doing bad in
 all of these aspects, but I can see usecases of having the desire to
 replace the wire protocol by your own implementation. To do so
 we could try to offer an API which makes implementing a new
 protocol somewhat easy. The current state of affairs is not providing
 this flexibility.

 I think we are quite flexible after initial ref advertisement.

 Yes, that is exactly where my I am not convinced comes from.


 We are not. (not really at least). We can tune some parameters or
 change the behavior slightly,
 but we cannot fix core assumptions made when creating v2 protocol.
 This you can see when when talking about v1 as well: we cannot fix any
 wrongdoings of v1 now by adding another capability.

Step 1 then should be identifying these wrongdoings and assumptions.

We can really go wild with these capabilities. The only thing that
can't be changed is perhaps sending the first ref. I don't know
whether we can accept a dummy first ref... After that point, you can
turn the protocol upside down because both client and server know what
it would be.

 So from my point
 of view we don't waste resources when having an advertisement of
 possible protocols instead of a boolean flag indicating v2 is
 supported. There is really not much overhead in coding nor bytes
 exchanged on the wire, so why not accept stuff that comes for free
 (nearly) ?

You realize you're advertising v2 as a new capability, right? Instead
of defining a v2 feature set and then advertising v2, people could simply add
new features directly. I don't see v2 (at least with these patches)
adding any value.

 I mean how do we know all the core assumptions made for v2 hold in the
 future? We don't. That's why I'd propose a plain and easy exchange at
 first stating the version to talk.

And we already do that, except that we don't state what version (as
a number) exactly, but what features that version supports. The focus
should be the new protocol at daemon.c and maybe remote-curl.c where
we do know the current protocol is not flexible enough.
-- 
Duy


Re: [RFC/PATCH 0/3] protocol v2

2015-02-26 Thread Stefan Beller
 On Thu, Feb 26, 2015 at 12:13 PM, Junio C Hamano gits...@pobox.com wrote:

 I agree with the value assessment of these patches 98%, but these
 bits can be taken as the we have v2 server availble for you on the
 side, by the way hint you mentioned in the older thread, I think.

The patches are not well polished (in fact they don't even compile :/),
but I think they may demonstrate the ideas and thought process. And
as it turns out we'd not be following that spirit of ideas but rather want
to have a dedicated v2.

That said I did not want to spend lots of time to polish the patch for
inclusion but rather to demonstrate ideas, which can be done with
substantially less quality IMHO. Correct me if I am wrong here!

Thanks,
Stefan


Re: [RFC/PATCH 0/3] protocol v2

2015-02-26 Thread Stefan Beller
On Thu, Feb 26, 2015 at 12:13 PM, Junio C Hamano gits...@pobox.com wrote:
 Duy Nguyen pclo...@gmail.com writes:

 Step 1 then should be identifying these wrongdoings and assumptions.

 We can really go wild with these capabilities. The only thing that
 can't be changed is perhaps sending the first ref. I don't know
 whether we can accept a dummy first ref... After that point, you can
 turn the protocol upside down because both client and server know what
 it would be.

 Yes, exactly.  To up/down/side-grade from v1 is technically
 possible, but being technically possible is different from being
 sensible.  The capability-based sidegrade does not solve the problem
 when the problem to be solved is that the server side needs to spend
 a lot of cycles and the network needs to carry megabytes of data
 before capability exchange happens.  Yes, the newer server and the
 newer client can notice that the counterparty is new and start
 talking in new protocol (which may or may not benefit from already
 knowing the result of ref advertisement), but by the time that
 happens, the resource has already been spent and wasted.

 I do not think v1 can be fixed by "send one ref with capability,
 newer client may respond immediately so we can stop enumerating
 remaining refs and older one will get stuck so we can have a timeout
 to see if the connection is from the newer one, and send the rest
 for the older client", because anything that involves such a timeout
 would not reliably work over WAN.

 You realize you're advertising v2 as a new capability, right? Instead
 of defining a v2 feature set and then advertising v2, people could
 simply add new features directly. I don't see how v2 (at least with
 these patches) adds any value.

 I agree with the value assessment of these patches 98%, but these
 bits can be taken as the "we have a v2 server available for you on the
 side, by the way" hint you mentioned in the older thread, I think.

 And we already do that, except that we don't state what version (as
 a number) exactly, but what feature that version supports. The focus
 should be the new protocol at daemon.c and maybe remote-curl.c where
 we do know the current protocol is not flexible enough.

 The first thing the client tells the server is what service it
 requests.  A request over git:// protocol is read by git daemon to
 choose which service to run, and it is read directly by the login
 shell if it comes over ssh:// protocol.

 There is nothing that prevents us from defining that service to be a
 generic "git" service, not "upload-pack", "archive", or "receive-pack".
 And the early protocol exchange, once the "git" service is spawned,
 with the client can be a "what real services does the server end
 support?" capability list, responded to with a "wow, you are new
 enough to support the 'trickle-pack' service---please connect me to
 it" request.
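For illustration only, the generic-service dispatch sketched in the quoted paragraph could look roughly like this; the "git" meta-service and the "trickle-pack" name are hypothetical ideas from the email, not real git programs, and the function just prints what it would do:

```shell
# Sketch of a hypothetical generic "git" service dispatcher.
# $1 is the service name from the client's initial request.
handle_request () {
	case "$1" in
	git-upload-pack|git-receive-pack|git-upload-archive)
		# v1 behavior: the client asked for a concrete service
		printf 'spawning %s\n' "$1"
		;;
	git)
		# new behavior: answer "what real services do you support?"
		printf 'services: upload-pack receive-pack trickle-pack\n'
		;;
	*)
		printf 'error: unknown service %s\n' "$1"
		;;
	esac
}
```

An old client asking for `git-upload-pack` is served as today, while a new client asking for the generic `git` service gets the capability-style service list first.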


So I am not quite sure how to understand this input.

I wonder if a high level test could look like the following,
which just tests the workflow with git fetch, but not the
internals.

(Note: patch formatting may be broken as it's sent via the gmail web UI)
-- >8 --
From: Stefan Beller sbel...@google.com
Date: Thu, 26 Feb 2015 17:19:30 -0800
Subject: [PATCH] Propose new tests for transitioning to the new option
transport.capabilitiesfirst

Signed-off-by: Stefan Beller sbel...@google.com
---
 t/t5544-capability-handshake.sh | 81 +
 1 file changed, 81 insertions(+)
 create mode 100755 t/t5544-capability-handshake.sh

diff --git a/t/t5544-capability-handshake.sh b/t/t5544-capability-handshake.sh
new file mode 100755
index 000..aa2b52d
--- /dev/null
+++ b/t/t5544-capability-handshake.sh
@@ -0,0 +1,81 @@
+#!/bin/sh
+
+test_description='fetching from a repository using the capabilities
first push option'
+
+. ./test-lib.sh
+
+mk_repo_pair () {
+ rm -rf workbench upstream 
+ test_create_repo upstream 
+ test_create_repo workbench 
+ (
+ cd upstream 
+ git config receive.denyCurrentBranch warn
+ ) 
+ (
+ cd workbench 
+ git remote add origin ../upstream
+ )
+}
+
+generate_commits_upstream () {
+ (
+ cd upstream 
+ echo more content file 
+ git add file 
+ git commit -a -m create a commit
+ )
+}
+
+# Compare the ref ($1) in upstream with a ref value from workbench ($2)
+# i.e. test_refs second HEAD@{2}
+test_refs () {
+ test $# = 2 
+ git -C upstream rev-parse --verify $1 expect 
+ git -C workbench rev-parse --verify $2 actual 
+ test_cmp expect actual
+}
+
+test_expect_success 'transport.capabilitiesfirst is not overridden
when set already' '
+ mk_repo_pair 
+ (
+ cd workbench 
+ git config transport.capabilitiesfirst 0
+ git config --get transport.capabilitiesfirst 0 expected
+ )
+ generate_commits_upstream 
+ (
+ cd workbench 
+ git fetch --all
+ git config --get transport.capabilitiesfirst actual
+ test_cmp expected actual
+ )
+'
+
+test_expect_success 'enable transport by fetching from new server' '
+ mk_repo_pair 
+ (
+ cd workbench 
+ git fetch origin
+ ) 
+ 

Re: [RFC/PATCH 0/3] protocol v2

2015-02-25 Thread Duy Nguyen
On Wed, Feb 25, 2015 at 6:37 AM, Stefan Beller sbel...@google.com wrote:
 I can understand that we maybe want to just provide one generic
 version 2 of the protocol which is an all-rounder not doing badly in
 any of these aspects, but I can see use cases of having the desire to
 replace the wire protocol with your own implementation. To do so
 we could try to offer an API which makes implementing a new
 protocol somewhat easy. The current state of affairs does not provide
 this flexibility.

I think we are quite flexible after the initial ref advertisement.
After that point the client tells the server its capabilities and the
server does the same for the client. Only shared features can be used.
So if you want to add a new micro protocol for mobile, just add a
"mobile" capability to both client and server. A new implementation
can support no capabilities and it should work fine with C Git (less
efficient though, of course). And we have freedom to mix capabilities
any way we want (it's harder to do when you have to follow v2, v2.1,
v2.2...).
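A minimal sketch of the "only shared features can be used" rule described above, computing the intersection of two advertised capability lists; the capability names used below are illustrative:

```shell
# Print the capabilities both sides advertised; only these may be used.
# $1 and $2 are space-separated capability lists (ours and theirs).
common_caps () {
	for cap in $1; do
		case " $2 " in
		*" $cap "*) printf '%s\n' "$cap" ;;
		esac
	done
}
```

For example, `common_caps "multi_ack thin-pack mobile" "thin-pack ofs-delta mobile"` prints only `thin-pack` and `mobile`, the features both peers support.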
-- 
Duy


Re: [RFC/PATCH 0/3] protocol v2

2015-02-25 Thread Junio C Hamano
Duy Nguyen pclo...@gmail.com writes:

 On Wed, Feb 25, 2015 at 6:37 AM, Stefan Beller sbel...@google.com wrote:
 I can understand that we maybe want to just provide one generic
 version 2 of the protocol which is an all-rounder not doing badly in
 any of these aspects, but I can see use cases of having the desire to
 replace the wire protocol with your own implementation. To do so
 we could try to offer an API which makes implementing a new
 protocol somewhat easy. The current state of affairs does not provide
 this flexibility.

 I think we are quite flexible after initial ref advertisement.

Yes, that is exactly where my "I am not convinced" comes from.

 After
 that point the client tells the server its capabilities and the server
 does the same for the client. Only shared features can be used. So if
 you want to add a new micro protocol for mobile, just add a "mobile"
 capability to both client and server. A new implementation can support
 no capabilities and it should work fine with C Git (less efficient
 though, of course). And we have freedom to mix capabilities any way we
 want (it's harder to do when you have to follow v2, v2.1, v2.2...).


Re: [RFC/PATCH 0/3] protocol v2

2015-02-25 Thread Stefan Beller
On Wed, Feb 25, 2015 at 10:04 AM, Junio C Hamano gits...@pobox.com wrote:
 Duy Nguyen pclo...@gmail.com writes:

 On Wed, Feb 25, 2015 at 6:37 AM, Stefan Beller sbel...@google.com wrote:
 I can understand that we maybe want to just provide one generic
 version 2 of the protocol which is an all-rounder not doing badly in
 any of these aspects, but I can see use cases of having the desire to
 replace the wire protocol with your own implementation. To do so
 we could try to offer an API which makes implementing a new
 protocol somewhat easy. The current state of affairs does not provide
 this flexibility.

 I think we are quite flexible after initial ref advertisement.

 Yes, that is exactly where my "I am not convinced" comes from.


We are not (not really, at least). We can tune some parameters or
change the behavior slightly, but we cannot fix core assumptions made
when creating the v2 protocol. You can see this when talking about v1
as well: we cannot fix any wrongdoings of v1 now by adding another
capability. So from my point of view we don't waste resources by
having an advertisement of possible protocols instead of a boolean
flag indicating v2 is supported. There is really not much overhead in
coding nor in bytes exchanged on the wire, so why not accept stuff
that comes (nearly) for free?

I mean, how do we know all the core assumptions made for v2 will hold
in the future? We don't. That's why I'd propose a plain and easy
exchange first, stating the version to talk.

Anyway, what is the cost of a round trip compared to the bytes on the
wire? Usually the cost of bytes on the wire correlates with the
latency anyway (think mobile metered compared to a corporate setting
with low latency). That's why I'd rather optimize for used bandwidth
than round trip times, but that may be just my personal perception of
the internet. That's why I'd propose different protocols.

Thanks,
Stefan


Re: [RFC/PATCH 0/3] protocol v2

2015-02-24 Thread Stefan Beller
On Mon, Feb 23, 2015 at 10:15 PM, Junio C Hamano gits...@pobox.com wrote:
 On Mon, Feb 23, 2015 at 8:02 PM, Duy Nguyen pclo...@gmail.com wrote:

 It's very hard to keep backward compatibility if you want to stop the
 initial ref advertisement, costly when there are lots of refs. But we
 can let both protocols run in parallel, with the old one advertise the
 presence of the new one. Then the client could switch to new protocol
 gradually. This way new protocol could forget about backward
 compatibility. See

 http://thread.gmane.org/gmane.comp.version-control.git/215054/focus=244325

 Yes, the whole thread is worth a read, but the approach suggested by
 that article $gmane/244325 is very good for its simplicity. The server
 end programs, upload-pack and receive-pack, need to only learn to
 advertise the availability of upload-pack-v2 and receive-pack-v2
 services and the client side programs, fetch-pack and send-pack,
 need to only notice the advertisement and record the availability of
 v2 counterparts for the current remote *and* continue the exchange
 in v1 protocol. That way, there is very little risk for breaking anything.

Right, I want to add this "learn about v2 on the fly, continue as
always" behavior to the protocol.


 So if we are going to discuss a new protocol, I'd prefer to see the
 discussion without worrying too much about how to inter-operate
 with the current vintage of Git. It is no longer an interesting problem,
 as we know how to solve it with minimum risk. Instead, I'd like to
 see us design the new protocol in such a way that it is in-line
 upgradable without repeating our past mistakes.

 I am *not* convinced that we want multiple suites of protocols that
 must be chosen from to suit the use pattern, as mentioned somewhere
 upthread, by the way.

I do think it makes sense to have different protocols, or different
tunings of one protocol, because there are many different situations
in which a different metric is the key one.

If you are on mobile, you'd possibly be billed by the bytes on the
wire, so you want a protocol that is as close to pure payload
transport as possible, and would maybe trade fewer transported bytes
for lots of computational overhead.

If you are in Australia (sorry downunder ;) or on satellite internet,
you may care a lot about latency and roundtrip times.

If you are in a corporate environment and just cloning from next door,
you may simply want the overall process (compute + network +
local reconstruction) to be fast.


I can understand that we maybe want to just provide one generic
version 2 of the protocol which is an all-rounder not doing badly in
any of these aspects, but I can see use cases of having the desire to
replace the wire protocol with your own implementation. To do so
we could try to offer an API which makes implementing a new
protocol somewhat easy. The current state of affairs does not provide
this flexibility.

I think it would not be much overhead to have such flexibility when
writing the actual code for the very-low-risk v2 update. So instead
of advertising a boolean flag meaning "this server/client speaks
version 2", we would rather send a list: "this server speaks v2, v1,
and v-custom-optimized-for-high-latency".
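The list-based negotiation could work roughly as sketched below on the client side; the version names are the illustrative ones from the paragraph above, not real protocol identifiers:

```shell
# Pick the first of our preferred versions ($1, ordered best-first)
# that the server also advertises ($2); fall back to v1, which every
# server is assumed to speak.
choose_version () {
	for v in $1; do
		case " $2 " in
		*" $v "*) printf '%s\n' "$v"; return ;;
		esac
	done
	printf 'v1\n'
}
```

So a client preferring `v-custom-optimized-for-high-latency v2 v1` against a server advertising `v1 v2` would settle on `v2`, and any unknown server falls back to plain `v1`.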

I started looking into the academic literature for generic solutions
to finding graph differences, but no real luck adapting anything to
our problem yet.

Thanks for your input,
Stefan


Re: [RFC/PATCH 0/3] protocol v2

2015-02-23 Thread Stefan Beller
On Mon, Feb 23, 2015 at 8:02 PM, Duy Nguyen pclo...@gmail.com wrote:
 On Tue, Feb 24, 2015 at 10:12 AM, Stefan Beller sbel...@google.com wrote:
 One of the biggest problems of a new protocol would be deployment
 as the users probably would not care too deeply. It should just
 work in the sense that the user should not even sense that the
 protocol changed.

 Agreed.

 To do so we need to make sure the protocol
 is backwards compatible and works if an old client talks to
 a new server as well as the other way round.

 It's very hard to keep backward compatibility if you want to stop the
 initial ref advertisement, costly when there are lots of refs. But we
 can let both protocols run in parallel, with the old one advertise the
 presence of the new one.

That's what I actually meant: have different versions out there,
but with the version as of now as the least common denominator,
such that it always works (albeit inefficiently for many refs).

 Then the client could switch to new protocol
 gradually. This way new protocol could forget about backward
 compatibility. See

 http://thread.gmane.org/gmane.comp.version-control.git/215054/focus=244325
 --
 Duy

 I would add that upload-pack also advertises the availability of
 upload-pack2, and the client may set remote.*.useUploadPack2 to
 either "yes" or "auto" so that next time upload-pack2 will be used.

I had a similar thought, though I would not restrict it to just v2
this time; I'd aim to make it possible to plug in whatever protocol
you want. (Comparable to SSL or ssh: it will always work, but as a
proficient user you can spend lots of time tweaking what you actually
want, looking at tradeoffs of efficiency, security, and convenience.)


Re: [RFC/PATCH 0/3] protocol v2

2015-02-23 Thread Junio C Hamano
On Mon, Feb 23, 2015 at 8:02 PM, Duy Nguyen pclo...@gmail.com wrote:

 It's very hard to keep backward compatibility if you want to stop the
 initial ref advertisement, costly when there are lots of refs. But we
 can let both protocols run in parallel, with the old one advertise the
 presence of the new one. Then the client could switch to new protocol
 gradually. This way new protocol could forget about backward
 compatibility. See

 http://thread.gmane.org/gmane.comp.version-control.git/215054/focus=244325

Yes, the whole thread is worth a read, but the approach suggested by
that article $gmane/244325 is very good for its simplicity. The server
end programs, upload-pack and receive-pack, need to only learn to
advertise the availability of upload-pack-v2 and receive-pack-v2
services and the client side programs, fetch-pack and send-pack,
need to only notice the advertisement and record the availability of
v2 counterparts for the current remote *and* continue the exchange
in v1 protocol. That way, there is very little risk for breaking anything.
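The client-side "notice the advertisement and record the availability" step could be sketched as follows; the capability name upload-pack-v2 and the config key remote.*.useUploadPack2 are the ones proposed in the thread, and the function just prints the config command it would run rather than touching any real repository:

```shell
# Hypothetical client-side reaction to seeing "upload-pack-v2" in the
# v1 capability advertisement: remember it for next time, keep talking
# v1 for this exchange.
# $1: remote name, $2: space-separated capability list from the server
record_v2_availability () {
	case " $2 " in
	*" upload-pack-v2 "*)
		printf 'git config remote.%s.useUploadPack2 auto\n' "$1" ;;
	esac
}
```

The point of this shape is that the current fetch continues unmodified in v1; only a tiny side effect (a config write) changes the next connection.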

And the programs for new protocol exchange do not have to worry
about having to talk with older counterparts and downgrading the
protocol inline at all. As long as we learn from our past mistakes
and make sure that the very initial exchange will be kept short (one
of the items in the list of limitations, $gmane/264000), future servers
and clients can upgrade the protocol they talk inline by probing
capabilities, just like the current protocol allows them to choose
extensions. The biggest issue in the current protocol is not who
speaks first (that is merely one aspect) but what is spoken first;
iow, one side blindly gives a large message as the first thing, which
cannot be squelched by capability exchange.
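For context, the framing both sides use today is pkt-line: a 4-hex-digit length that includes the 4 header bytes themselves, followed by the payload, with the special length "0000" serving as a flush marker; the 4-digit field is also roughly where the 64k packet limit discussed in this thread comes from. A sketch:

```shell
# Encode a payload as a pkt-line: 4 lowercase hex digits giving the
# total length (payload length + 4 for the length field), then the
# payload itself.
pkt_line () {
	printf '%04x%s' $((${#1} + 4)) "$1"
}

# A flush-pkt is the special length "0000" with no payload.
pkt_flush () {
	printf '0000'
}
```

For example, a 5-byte payload "hello" is framed as `0009hello`, and a short initial exchange is just a handful of such packets terminated by a flush-pkt.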

So if we are going to discuss a new protocol, I'd prefer to see the
discussion without worrying too much about how to inter-operate
with the current vintage of Git. It is no longer an interesting problem,
as we know how to solve it with minimum risk. Instead, I'd like to
see us design the new protocol in such a way that it is in-line
upgradable without repeating our past mistakes.

I am *not* convinced that we want multiple suites of protocols that
must be chosen from to suit the use pattern, as mentioned somewhere
upthread, by the way.

Thanks.


[RFC/PATCH 0/3] protocol v2

2015-02-23 Thread Stefan Beller
Inspired by a discussion on the scaling of git in the last few days,
I thought about starting the adventure of teaching git a new transport
protocol.

One of the biggest problems of a new protocol would be deployment
as the users probably would not care too deeply. It should just 
work in the sense that the user should not even sense that the
protocol changed. To do so we need to make sure the protocol
is backwards compatible and works if an old client talks to
a new server as well as the other way round.

A later incarnation of the patch series will eventually add the
possibility to add new versions of the transport protocol easily
without harming the user. For now, in the first revision, the series
just documents an approach for how I'd start on this problem of
compatibility issues.

I realize this will be a bigger change to git, so I'd rather
just make a small step now. The actual discussion on how to
do the next protocol(s) may be started at the gitmerge
conference? (bloom filter! client speaks first! rsyncing
the ref changes!)

Any thoughts on how to make it easy to teach git new protocols
are very welcome.

Thanks,
Stefan

Stefan Beller (3):
  Document protocol capabilities extension
  receive-pack: add advertisement of different protocol options
  receive-pack: enable protocol v2

 Documentation/technical/protocol-capabilities.txt | 11 +++
 builtin/receive-pack.c|  7 +++
 2 files changed, 18 insertions(+)

-- 
2.3.0.81.gc37f363



Re: [RFC/PATCH 0/3] protocol v2

2015-02-23 Thread Duy Nguyen
On Tue, Feb 24, 2015 at 10:12 AM, Stefan Beller sbel...@google.com wrote:
 One of the biggest problems of a new protocol would be deployment
 as the users probably would not care too deeply. It should just
 work in the sense that the user should not even sense that the
 protocol changed.

Agreed.

 To do so we need to make sure the protocol
 is backwards compatible and works if an old client talks to
 a new server as well as the other way round.

It's very hard to keep backward compatibility if you want to stop the
initial ref advertisement, costly when there are lots of refs. But we
can let both protocols run in parallel, with the old one advertise the
presence of the new one. Then the client could switch to new protocol
gradually. This way new protocol could forget about backward
compatibility. See

http://thread.gmane.org/gmane.comp.version-control.git/215054/focus=244325
-- 
Duy