Re: [RFC/PATCH 0/3] protocol v2
Stefan Beller sbel...@google.com writes:

So I started looking into extending the buffer size as another 'first step' towards the protocol version 2 again. But now I think the packet length limit of 64k is actually a good and useful thing to have and should be extended/fixed if and only if we run into serious trouble with too small packets later.

I tend to agree. Too large a packet size would mean your latency would also suck, as the pkt-line interface will not give you anything until you read the entire packet. The new protocol should be designed around reasonably sized packets, using multiple packets to carry larger payloads as necessary.

--
To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
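For readers following along, the pkt-line framing under discussion prefixes every packet with a four-byte hexadecimal length that counts the header itself. A minimal sketch in Python (the 65516-byte payload cap mirrors the 64k limit being debated; this is an illustration, not Git's C implementation):

```python
def pkt_line(payload: bytes) -> bytes:
    # Four hex digits encode len(payload) + 4 (the header counts itself),
    # followed by the payload. "0000" is reserved for the flush-pkt.
    if len(payload) > 65516:  # stay under the 64k framing limit
        raise ValueError("payload too large for a single pkt-line")
    return b"%04x" % (len(payload) + 4) + payload

FLUSH_PKT = b"0000"  # zero length, no payload: ends a group of packets

print(pkt_line(b"hello\n"))  # b'000ahello\n'
```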
Re: [RFC/PATCH 0/3] protocol v2
On Tue, Mar 3, 2015 at 9:13 AM, Junio C Hamano gits...@pobox.com wrote:

Duy Nguyen pclo...@gmail.com writes: Junio pointed out in private that I didn't address the packet length limit (64k). I thought I could get away with a new capability (i.e. not worry about it now) but I finally admit that was a bad hack. So perhaps this on top.

No, I didn't ;-) but I tend to agree that perhaps a "4GB huge packet" is a bad idea. The problem I had with the version in your write-up was that it still assumed that all capabilities must come on one packet-line.

So I started looking into extending the buffer size as another 'first step' towards the protocol version 2 again. But now I think the packet length limit of 64k is actually a good and useful thing to have and should be extended/fixed if and only if we run into serious trouble with too small packets later.

I mean we can add the possibility now by introducing the special lengths 0xFFFF or 0xFFFE to mean we'd want to extend it in the future. But when doing this we need to be extra careful with buffer allocation, as it is easy to produce a denial of service attack if the receiving side blindly trusts the length and allocates as much memory. So having a 64k limit actually helps prevent this attack a bit, as it is a very small number.
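The DoS concern above — a hostile length field triggering a huge allocation — is avoided by validating the length against the fixed cap before reading anything. A sketch of such a bounded reader (illustrative Python, not Git's actual C code):

```python
import io

MAX_PKT_LEN = 65520  # the 64k cap discussed above: 4-byte header + payload

def read_pkt(stream):
    """Read one pkt-line from a binary stream; return None for a flush-pkt."""
    hdr = stream.read(4)
    if len(hdr) < 4:
        raise EOFError("truncated pkt-line header")
    length = int(hdr, 16)
    if length == 0:
        return None  # flush-pkt
    # Reject bogus lengths *before* allocating anything, so a malicious
    # peer cannot make us reserve gigabytes for a single packet.
    if length < 4 or length > MAX_PKT_LEN:
        raise ValueError("invalid pkt-line length %#06x" % length)
    payload = stream.read(length - 4)
    if len(payload) != length - 4:
        raise EOFError("truncated pkt-line payload")
    return payload

buf = io.BytesIO(b"000ahello\n0000")
print(read_pkt(buf))  # b'hello\n'
print(read_pkt(buf))  # None (flush)
```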
Re: [RFC/PATCH 0/3] protocol v2
On Wed, Mar 4, 2015 at 5:03 PM, Stefan Beller sbel...@google.com wrote: If anyone wants to experiment with the data I gathered, I can make them available. All data of `ls-remote` including the gathering script is found at (112 kB .tar.xz) https://drive.google.com/file/d/0B7E93UKgFAfjcHRvM1N2YjBfTzA/view?usp=sharing (6.6MB in .zip) https://drive.google.com/file/d/0B7E93UKgFAfjRko3WHhtUWZtTEU/view?usp=sharing I also do have all the object files which are referenced in the outputs of ls-remote, though sharing them is a bit tough as I cannot just git push them (forced pushes make some of the objects unreachable, so my local gathering repo is explicitly configured to not garbage collect), and these are huge compared to just the output of ls-remote.
Re: [RFC/PATCH 0/3] protocol v2
On Wed, Mar 4, 2015 at 4:05 AM, Duy Nguyen pclo...@gmail.com wrote:

On Wed, Mar 4, 2015 at 11:27 AM, Shawn Pearce spea...@spearce.org wrote: Let me go on a different tangent a bit from the current protocol. http://www.grpc.io/ was recently released and is built on the HTTP/2 standard. It uses protobuf as a proven extensibility mechanism. Including a full C based grpc stack just to speak the Git wire protocol is quite likely overkill, but I think the embedding of a proven extensible format inside of a bi-directional framed streaming system like HTTP/2 offers some good guidance.

I'll take this as "learn from grpc", not just "reuse grpc".

Correct, that was what I was trying to say and I just wrote it poorly. HTTP 1.x, HTTP/2 and protobuf have proven themselves to be fairly open to extension and to work well in the wild for transports. There is useful guidance there that we should draw from to try and leave doors open for the future.

HTTP/2, protobuf and grpc are fairly complex. I consider any one of them too complicated for Git-specific use. However HTTP/2 is probably the future of HTTP stacks, so we may see it show up in libcurl, or something as popular as libcurl, in another 10 years. Hg had some reasonably sane ideas about building the wire protocol to work well on HTTP 1.x upfront, rather than Git tacking it on much later.

Network protocol parsing is hard. Especially in languages like C where buffer overflows are possible. Or where a client could trivially DoS a server by sending a packet of size uint_max and the server naively trying to malloc() that buffer. Defining the network protocol in an IDL like protobuf 3 and being machine generated from stable well maintained code has its advantages.

I'm still studying the spec, so I can't comment if using IDL/protobuf3 is a good idea yet. But I think at least we can avoid DoS by changing the pkt-line (again) a bit: the length 0xffff means that the actual length is 0xfffe and the next pkt-line is part of this pkt-line.
Higher level code (upload-pack or fetch-pack, for example) must set an upper limit for packet_read() so it won't try to concatenate pkt-lines forever.

pkt-line is a reasonably simple and efficient framing system. A 64 KiB pkt-line frame only costs ~0.0061% overhead; ~0.0076% overhead if you are a pack stream in a side-band-64k channel. That is probably more efficient than HTTP/2 or SSL framing. I see no reason to attempt to reduce that overhead further. A 64 KiB frame size is enough for anyone to move data efficiently with these headers. In practice you are going to wrap that up in SSH or SSL/TLS, and those overheads are so much higher that it doesn't matter we have a tiny loss here.

I think a mistake in the wire protocol was making the pkt-line length human-readable hex, but the sideband channel binary. _If_ we redo the framing, the only change I would make is making the side band readable. Thus far we have only used 0, 1, 2 for sideband channels. These could easily be moved into human readable channel ids:

 'd': currently sideband 0; this is the application data, aka the pack data
 'p': currently sideband 1; this is the progress stream for stderr
 'e': currently sideband 2; there was an error, data in this packet is the message text, and the connection will shut down after the packet.

And then leave all other sideband values undefined and reserved for future use, just like they are all open today.

I am not convinced framing changes are necessary. I would be fine with leaving the sideband streams as 0,1,2... but if we want a text based protocol for ease of debugging we should be text based across the board and try very hard to avoid these binary values in the framing, or ever needing to use a magical NUL byte in the middle of a packet to find a gap in older parsers for newer data.
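The overhead figures quoted above can be checked directly: a full 64 KiB pkt-line spends 4 header bytes out of a 65,520-byte frame, and a side-band-64k frame adds one band-id byte on top of the header:

```python
FRAME = 65520  # maximal pkt-line: 4-byte hex length header + payload

pkt_overhead = 4 / FRAME       # bare pkt-line framing
sideband_overhead = 5 / FRAME  # pkt-line header plus 1 band-id byte

print(f"{pkt_overhead:.4%}")       # 0.0061%
print(f"{sideband_overhead:.4%}")  # 0.0076%
```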
If you want to build a larger stream like a ref advertisement inside pkt-line framing without using a pkt-line per ref record, you should follow the approach used by pack data streams, which use the 5-byte side-band pkt-line framing with a side-band channel allocated for that data. Application code can then run a side-band demux to yank out the inner stream and parse it. It may be simpler to restrict ref names to be smaller than 64k in length so you have room for framing and hash value to be transferred inside of a single pkt-line, then use the pkt-line framing to do the transfer.

Today's upload-pack ref advertisement has ~25% overhead. Most of that is in the duplicated tag name for the peeled refs/tags/v1.0^{} lines. If you drop those names (but keep the pkt-line and SHA-1), it's only about 8% overhead above the packed-refs file. I think optimization efforts for ref advertisement need to focus on reducing the number of refs sent back and forth, not shrinking the individual records down smaller.

Earlier in this thread Junio raised a point that the flush-pkt is confusing because it has way too many purposes. I agree. IIRC we have 0001-0003
Re: [RFC/PATCH 0/3] protocol v2
On Wed, Mar 4, 2015 at 11:27 AM, Shawn Pearce spea...@spearce.org wrote: Let me go on a different tangent a bit from the current protocol. http://www.grpc.io/ was recently released and is built on the HTTP/2 standard. It uses protobuf as a proven extensibility mechanism. Including a full C based grpc stack just to speak the Git wire protocol is quite likely overkill, but I think the embedding of a proven extensible format inside of a bi-directional framed streaming system like HTTP/2 offers some good guidance.

I'll take this as "learn from grpc", not just "reuse grpc".

Network protocol parsing is hard. Especially in languages like C where buffer overflows are possible. Or where a client could trivially DoS a server by sending a packet of size uint_max and the server naively trying to malloc() that buffer. Defining the network protocol in an IDL like protobuf 3 and being machine generated from stable well maintained code has its advantages.

I'm still studying the spec, so I can't comment if using IDL/protobuf3 is a good idea yet. But I think at least we can avoid DoS by changing the pkt-line (again) a bit: the length 0xffff means that the actual length is 0xfffe and the next pkt-line is part of this pkt-line. Higher level code (upload-pack or fetch-pack, for example) must set an upper limit for packet_read() so it won't try to concatenate pkt-lines forever.
-- Duy
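Duy's proposal — a 0xffff length header meaning "this frame carries a full 0xfffe-byte packet's payload and continues in the next pkt-line", with the caller imposing a total cap — could look roughly like this (an illustrative sketch, not the actual patch):

```python
import io

def read_logical_pkt(stream, max_total=1 << 20):
    """Concatenate continued pkt-lines: a 0xffff length header means the
    frame carries 0xfffe - 4 payload bytes and the next pkt-line belongs
    to the same logical packet. max_total is the caller-imposed limit."""
    out = bytearray()
    while True:
        length = int(stream.read(4), 16)
        continued = (length == 0xFFFF)
        if continued:
            length = 0xFFFE
        out += stream.read(length - 4)
        # The caller's limit stops a peer from chaining frames forever.
        if len(out) > max_total:
            raise ValueError("logical packet exceeds caller-imposed limit")
        if not continued:
            return bytes(out)

stream = io.BytesIO(b"ffff" + b"x" * (0xFFFE - 4) + b"0008end!")
print(len(read_logical_pkt(stream)))  # 65534
```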
Re: [RFC/PATCH 0/3] protocol v2
Duy Nguyen pclo...@gmail.com writes: Junio pointed out in private that I didn't address the packet length limit (64k). I thought I could get away with a new capability (i.e. not worry about it now) but I finally admit that was a bad hack. So perhaps this on top.

No, I didn't ;-) but I tend to agree that perhaps a "4GB huge packet" is a bad idea. The problem I had with the version in your write-up was that it still assumed that all capabilities must come on one packet-line.

The immediate issue we had with that limitation in the current protocol was that the usual "we can help newer programs to operate better while getting ignored by existing programs by sending optional information as part of the capability advertisement" trick would not work for upload-pack to enumerate symrefs and their targets to help clone.

The lesson to draw from that experience is not "we should have an option to use large packets". 64kB is plenty, but the senders and the receivers have a lot lower limit in practice to avoid harming latency (I think it is like 1000 bytes before both ends agree to switch to talking over the sideband multiplexer). It is not "we should anticipate and design the protocol better", either. We are humans and it is hard to predict things, especially things in the future. The lesson we should learn is that it is important to leave us enough wiggle room to allow us to cope with such unanticipated limitations ;-).

My recollection is that the consensus from the last time we discussed protocol revamping was to list one capability per packet so that the packet length limit does not matter, but you may want to check with the list archive yourself.
Re: [RFC/PATCH 0/3] protocol v2
Junio C Hamano gits...@pobox.com writes: Duy Nguyen pclo...@gmail.com writes: Junio pointed out in private that I didn't address the packet length limit (64k). I thought I could get away with a new capability (i.e. not worry about it now) but I finally admit that was a bad hack. So perhaps this on top.

No, I didn't ;-) but I tend to agree that perhaps a "4GB huge packet" is a bad idea. ...

I realize that I responded with "No, I did not complain about X, I had trouble with Y and here is why" and talked mostly about Y without talking much about X. So let's touch X a bit.

As to the packet length, I think it is a good idea to give us an escape hatch to bust the 64k limit. Refs may not be the reason to do so, but as I said, we cannot foresee the future needs.

Having X behind us, now back to Y, and then I'll remind us of Z ;-) [*1*]

My recollection is that the consensus from the last time we discussed protocol revamping was to list one capability per packet ...

And the above is the right thing from the protocol point of view. The only reason the current protocol says capabilities go on a single line separated by SP is because the hole we found to add to the protocol was to piggyback after the ref advertisement lines, and there was no guarantee that we have more than one ref advertised, so we needed to be able to stuff everything on a single line.

Stepping back and thinking about what a packet in the pkt-line protocol is, we realize that it is the smallest logical unit of transferring information. The state of a single ref in a series of ref advertisements. The fact that the receiving end has all the history leading up to a single commit. The request to obtain all history leading up to a single commit. That is why I say that one-cap-per-packet is the right thing. These individual logical units are grouped into a larger logical unit by (1) being at a specific point in the protocol exchange, (2) being adjacent to each other and (3) being terminated by a flush packet.
Examples:

 - A bunch of individual ref states at the beginning of the upload-pack to fetch-pack communication that ends with a flush constitutes a ref advertisement.

 - A series of want packets at the beginning of the fetch-pack to upload-pack communication that ends with a flush constitutes a fetch request.

Another thing I didn't find in the updated documentation was a proposal to define what a flush exactly means. In my above writing, it should be clear that a flush is merely the end of a group. It does not mean (and it never meant, until smart HTTP) "I am finished talking, now it is your turn."

If a requestor needs to give two groups of items before the responder can process the request, we would want to be able to say "A1, A2, ..., now I am done with As; B1, B2, B3, ..., now I am done with Bs; this concludes my request, and it is your turn to process and respond to me." But you cannot easily do so without affecting smart HTTP, as it is written in such a way that it assumes flush is "I am done, it is your turn."

I am perfectly OK if v2 redefined flush to mean "I am done, it is your turn." But then the protocol should have another way to collect packets into larger groups. A sequence of packets "begin A", A1, A2, ..., "end", "begin B", B1, B2, B3, "end", flush may be a way to do so, and if we continue to rely on the order of packets to help determine the semantics (aka "being at a specific point in the protocol exchange" above), we may even be able to omit the "begin A" and "begin B" packets (i.e. the "end" is the new end of a logical group, which is what flush originally was).

[Footnote]

*1* For those who haven't been following the discussion:

 X: maximum packet length being 64kB might be problematic.
 Y: requiring capability advertisement and request in a single packet is wrong.
 Z: the meaning of flush needs to be clarified.
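The grouping scheme sketched above can be modeled with a toy parser. Here None stands in for the flush-pkt and the packet names are purely illustrative (no such "end" packet exists in any protocol version):

```python
def parse_groups(packets):
    """Toy parser for the sketched grouping: packets accumulate into a
    group until an 'end' marker closes it; None (the flush-pkt) concludes
    the whole request and hands the turn over to the responder."""
    groups, current = [], []
    for pkt in packets:
        if pkt is None:            # flush-pkt: the request is complete
            return groups
        if pkt == b"end":          # close the current logical group
            groups.append(current)
            current = []
        else:
            current.append(pkt)
    raise ValueError("stream ended without a flush-pkt")

req = [b"A1", b"A2", b"end", b"B1", b"B2", b"B3", b"end", None]
print(parse_groups(req))  # [[b'A1', b'A2'], [b'B1', b'B2', b'B3']]
```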
Re: [RFC/PATCH 0/3] protocol v2
On Tue, Mar 3, 2015 at 5:54 PM, Duy Nguyen pclo...@gmail.com wrote: On Wed, Mar 4, 2015 at 12:13 AM, Junio C Hamano gits...@pobox.com wrote: My recollection is that the consensus from the last time we discussed protocol revamping was to list one capability per packet so that packet length limit does not matter, but you may want to check with the list archive yourself.

I couldn't find that consensus mail, but this one [1] is good enough evidence that we can hit the packet length limit in the capability line easily. With an escape hatch to allow maximum packet length up to uint_max, I think we'll be fine for a long time even if we don't send one cap per pkt-line.

The symbolic ref thing was done badly. There isn't an escape hatch in the current v1 protocol sufficient to allow this, but each ref should be its own pkt-line, or there should be a small batch of refs per pkt-line, or the ref advertisement should be a data stream in a side-band-64k sort of format inside the pkt-line framing. At 64k per frame of side-band there is plenty of data-to-header ratio that we don't need to escape to uint_max.

Looks like one cap per pkt-line is winning..

Yes.

[1] http://thread.gmane.org/gmane.comp.version-control.git/237929

Let me go on a different tangent a bit from the current protocol. http://www.grpc.io/ was recently released and is built on the HTTP/2 standard. It uses protobuf as a proven extensibility mechanism. Including a full C based grpc stack just to speak the Git wire protocol is quite likely overkill, but I think the embedding of a proven extensible format inside of a bi-directional framed streaming system like HTTP/2 offers some good guidance.

Network protocol parsing is hard. Especially in languages like C where buffer overflows are possible. Or where a client could trivially DoS a server by sending a packet of size uint_max and the server naively trying to malloc() that buffer. Defining the network protocol in an IDL like protobuf 3 and being machine generated from stable well maintained code has its advantages.
Re: [RFC/PATCH 0/3] protocol v2
On Wed, Mar 4, 2015 at 12:13 AM, Junio C Hamano gits...@pobox.com wrote: My recollection is that the consensus from the last time we discussed protocol revamping was to list one capability per packet so that packet length limit does not matter, but you may want to check with the list archive yourself.

I couldn't find that consensus mail, but this one [1] is good enough evidence that we can hit the packet length limit in the capability line easily. With an escape hatch to allow maximum packet length up to uint_max, I think we'll be fine for a long time even if we don't send one cap per pkt-line. So I'm trying to see if we really want to go with one cap per pkt-line..

Pros:
 - better memory management; the current pkt-line static buffer is probably fine
 - a capability can contain spaces after '='

Cons:
 - some refactoring needed to hide away differences between v1 and v2

Looks like one cap per pkt-line is winning..

[1] http://thread.gmane.org/gmane.comp.version-control.git/237929
-- Duy
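The "one cap per pkt-line" layout being weighed here could be serialized like this (a sketch only; the terminating flush-pkt follows the grouping convention discussed elsewhere in the thread):

```python
def advertise_caps(caps):
    # One capability per pkt-line: no single packet can approach the 64k
    # limit, and a value after '=' may freely contain spaces.
    out = b""
    for cap in caps:
        line = cap.encode() + b"\n"
        out += b"%04x" % (len(line) + 4) + line
    return out + b"0000"  # flush-pkt terminates the advertisement

print(advertise_caps(["multi_ack", "ofs-delta"]))
# b'000emulti_ack\n000eofs-delta\n0000'
```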
Re: [RFC/PATCH 0/3] protocol v2
On Mon, Mar 02, 2015 at 04:21:36PM +0700, Duy Nguyen wrote: On Sun, Mar 01, 2015 at 07:47:40PM -0800, Junio C Hamano wrote: It seems, however, that our current thinking is that it is OK to do the "allow new v1 clients to notice the availability of v2 servers, so that they can talk v2 the next time" thing, so my preference is to throw this "client first and let server notice" into the "maybe doable but not our first choice" bin, at least for now.

OK let's see if "first choice" like this could work. Very draft but it should give some idea how to make a prototype to test it out. Note that the server still speaks first in this proposal.

Junio pointed out in private that I didn't address the packet length limit (64k). I thought I could get away with a new capability (i.e. not worry about it now) but I finally admit that was a bad hack. So perhaps this on top.

The WARN line was originally supposed to be used when packet length is still 64k and the server has a ref longer than that. It could then skip that long ref and inform the user, so the user could re-request again, this time asking for the long-packet-length capability. That's irrelevant now. But I think an option to say something without aborting may still be a good idea, especially if we allow hooks to intercept the protocol.

-- 8< --
diff --git a/Documentation/technical/pack-protocol.txt b/Documentation/technical/pack-protocol.txt
index 32a1186..e2003c0 100644
--- a/Documentation/technical/pack-protocol.txt
+++ b/Documentation/technical/pack-protocol.txt
@@ -37,6 +37,20 @@ communicates with that invoked process over the SSH connection. The
 file:// transport runs the 'upload-pack' or 'receive-pack' process
 locally and communicates with it over a pipe.
 
+Pkt-line format
+---------------
+
+In version 1, a packet line consists of four bytes containing the
+length of the entire line plus four, in hexadecimal format. A flush
+consists of four zero bytes.
+
+In version 2, the four-byte header format remains supported but the
+maximum length is 0xfffe. If the length is 0xffff, the actual length
+follows in variable encoding in hexadecimal.
+
+XXX: perhaps go with 2-byte length by default instead because we don't
+usually need pkt-line longer than 256?? Maybe not worth saving a couple bytes
+
 Git Transport
 -------------
@@ -68,10 +82,12 @@ process on the server side over the Git protocol is this:
    nc -v example.com 9418
 
 If the server refuses the request for some reasons, it could abort
-gracefully with an error message.
+gracefully with an error message, or show a warning and keep
+moving.
 
   error-line     =  PKT-LINE("ERR" SP explanation-text)
+  warning-line   =  PKT-LINE("WARN" SP explanation-text)
 
 SSH Transport
-- 8< --
Re: [RFC/PATCH 0/3] protocol v2
On Mon, Mar 02, 2015 at 04:21:36PM +0700, Duy Nguyen wrote: On Sun, Mar 01, 2015 at 07:47:40PM -0800, Junio C Hamano wrote: It seems, however, that our current thinking is that it is OK to do the "allow new v1 clients to notice the availability of v2 servers, so that they can talk v2 the next time" thing, so my preference is to throw this "client first and let server notice" into the "maybe doable but not our first choice" bin, at least for now.

OK let's see if "first choice" like this could work. Very draft but it should give some idea how to make a prototype to test it out. Note that the server still speaks first in this proposal.

And the ref discovery phase could be modified by new capabilities. For example,

-- 8< --
diff --git a/Documentation/technical/protocol-capabilities.txt b/Documentation/technical/protocol-capabilities.txt
index 56c11b4..56a8c2e 100644
--- a/Documentation/technical/protocol-capabilities.txt
+++ b/Documentation/technical/protocol-capabilities.txt
@@ -304,3 +304,36 @@ language code. The default language code is unspecified,
 even though it's usually English in ASCII encoding.
+
+compressed-refs
+---------------
+
+This is applicable to upload-pack-2 and receive-pack-2 only. The
+client expects the ref list in the reference discovery phase to be
+sent in compressed format:
+
+ - Each PKT-LINE may contain more than one ref
+ - SHA-1 is in binary encoding (i.e. 20 bytes instead of
+   40 bytes as hex string)
+ - ref name is prefix compressed, see index-format.txt version 4
+ - Ref list ends with flush-pkt
+
+glob-refs
+---------
+
+This is applicable to upload-pack-2 and receive-pack-2 only. In the
+reference discovery phase, a new mode "glob" is supported, where the
+arguments are wildmatch patterns. Negative patterns begin with '!'.
+Only refs matching requested patterns are sent to the client.
+
+stateful-refs
+-------------
+
+This is applicable to upload-pack-2 and receive-pack-2 only. In the
+reference discovery phase, a new mode "stateful" is supported, where
+the first argument is a string representing the ref list that was sent
+by the same server last time. The remaining arguments are globs.
+
+The first ref line that the server sends should carry a new state
+string after the ref name. The server may send only updated refs if it
+understands the state string sent by the client. Still under discussion.
-- 8< --
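The prefix compression referenced from index-format.txt version 4 stores, per entry, how many trailing bytes to strip from the previous name plus the new suffix. A simplified sketch (plain ints instead of the on-disk varint encoding):

```python
import os.path

def compress_ref_names(sorted_names):
    """Return (strip, suffix) pairs: drop `strip` bytes from the end of
    the previous name, then append `suffix` to reconstruct this name.
    Works best on sorted input, where adjacent names share long prefixes."""
    prev, out = "", []
    for name in sorted_names:
        common = len(os.path.commonprefix([prev, name]))
        out.append((len(prev) - common, name[common:]))
        prev = name
    return out

refs = ["refs/tags/v1.0", "refs/tags/v1.0^{}", "refs/tags/v1.1"]
print(compress_ref_names(refs))
# [(0, 'refs/tags/v1.0'), (0, '^{}'), (4, '1')]
```

Note how the peeled `refs/tags/v1.0^{}` entry shrinks to three bytes of suffix, which is exactly the duplicated-tag-name overhead discussed earlier in the thread.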
Re: [RFC/PATCH 0/3] protocol v2
On Sun, Mar 01, 2015 at 07:47:40PM -0800, Junio C Hamano wrote: It seems, however, that our current thinking is that it is OK to do the allow new v1 clients to notice the availabilty of v2 servers, so that they can talk v2 the next time thing, so my preference is to throw this client first and let server notice into maybe doable but not our first choice bin, at least for now. OK let's see if first choice like this could work. Very draft but it should give some idea how to make a prototype to test it out. Note that the server still speaks first in this proposal. -- 8 -- diff --git a/Documentation/technical/pack-protocol.txt b/Documentation/technical/pack-protocol.txt index 462e206..32a1186 100644 --- a/Documentation/technical/pack-protocol.txt +++ b/Documentation/technical/pack-protocol.txt @@ -1,11 +1,11 @@ Packfile transfer protocols === -Git supports transferring data in packfiles over the ssh://, git:// and +Git supports transferring data in packfiles over the ssh://, git://, http:// and file:// transports. There exist two sets of protocols, one for pushing data from a client to a server and another for fetching data from a server to a client. All three transports (ssh, git, file) use the same -protocol to transfer data. +protocol to transfer data. http is documented in http-protocol.txt. The processes invoked in the canonical Git implementation are 'upload-pack' on the server side and 'fetch-pack' on the client side for fetching data; @@ -14,6 +14,12 @@ data. The protocol functions to have a server tell a client what is currently on the server, then for the two to negotiate the smallest amount of data to send in order to fully update one or the other. +upload-pack-2 and receive-pack-2 are the next generation of +upload-pack and receive-pack respectively. The first two are +referred as version 2 in this document and pack-capabilities.txt +while the last two are version 1. Unless stated otherwise, version 1 +is implied. 
+ Transports -- There are three transports over which the packfile protocol is @@ -42,7 +48,8 @@ hostname parameter, terminated by a NUL byte. -- git-proto-request = request-command SP pathname NUL [ host-parameter NUL ] - request-command = git-upload-pack / git-receive-pack / + request-command = git-upload-pack / git-upload-pack-2 / + git-receive-pack / git-receive-pack-2 / git-upload-archive ; case sensitive pathname = *( %x01-ff ) ; exclude NUL host-parameter= host= hostname [ : port ] @@ -67,7 +74,6 @@ gracefully with an error message. error-line = PKT-LINE(ERR SP explanation-text) - SSH Transport - @@ -124,9 +130,58 @@ has, the first can 'fetch' from the second. This operation determines what data the server has that the client does not then streams that data down to the client in packfile format. +Capability discovery (v2) +- -Reference Discovery +In version 1, capability discovery is part of reference discovery and +covered in reference discovery section. + +In versino 2, when the client initially connects, the server +immediately sends its capabilities to the client. Then the client must +send the list of server capabilities it wants to use to the server. + + S: 00XXcapabilities multi_ack thin-pack ofs-delta lang\n + C: 00XXcapabilities thin-pack ofs-delta lang=en\n + + + cap = PKT-LINE(capabilities SP capability-list LF) + capability-list = capability *(SP capability) + capability = 1*(LC_ALPHA / DIGIT / - / _ / =) + LC_ALPHA = %x61-7A + + +The client MUST NOT ask for capabilities the server did not say it +supports. + +Server MUST diagnose and abort if capabilities it does not understand +was sent. Server MUST NOT ignore capabilities that client requested +and server advertised. As a consequence of these rules, server MUST +NOT advertise capabilities it does not understand. + +See protocol-capabilities.txt for a list of allowed server and client +capabilities and descriptions. 
+ +XXX: this approach wastes one round trip in smart-http because the +client would speak first. Perhaps we could allow client speculation. +It can assume what caps the server will send and send commands based on that +assumption. If it turns out true, we save one round trip. E.g. fast +path: + + C: You are supposed to send caps A, B. I would respond with cap B. + Then I would send want-refs refs/heads/foo. + S: (yes we are sending caps A and B), validate client caps, + execute want-refs and return ref list + +and slow path: + + C: You are supposed to send caps A, B. I would respond with cap B. + Then I would send want-refs refs/heads/foo. + S: Send caps A, B and C. Ignore the rest from client + C: Want caps A and C. Send want-refs foo + S: return ref foo + +Reference Discovery (v1) +
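The 00XX placeholders in the capability exchange above are pkt-line length prefixes. A minimal sketch of that framing, with hypothetical helper names (the length is 4 hex digits and counts the 4-byte prefix itself; a length of 0000 is a flush-pkt):

```python
def pkt_line(payload: str) -> bytes:
    """Frame a payload as a pkt-line: 4 hex digits giving the total
    length (including the 4-byte prefix itself), then the payload."""
    data = payload.encode()
    return b"%04x%s" % (len(data) + 4, data)

def parse_pkt_line(stream: bytes):
    """Split one pkt-line off the front of a byte stream.
    Returns (payload, rest); a 0000 length is a flush-pkt."""
    length = int(stream[:4], 16)
    if length == 0:                 # flush-pkt carries no payload
        return b"", stream[4:]
    return stream[4:length], stream[length:]

# The v2 capability advertisement from the example above:
adv = pkt_line("capabilities multi_ack thin-pack ofs-delta lang\n")
payload, rest = parse_pkt_line(adv)
assert payload == b"capabilities multi_ack thin-pack ofs-delta lang\n"
```

This is only a sketch of the wire framing, not of any particular implementation in git itself.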
Re: [RFC/PATCH 0/3] protocol v2
On Sun, Mar 01, 2015 at 11:06:21PM -, Philip Oakley wrote: OK, maybe not exactly about protocol, but a possible option would be the ability to send the data as a bundle or multi-bundles; Or perhaps as an archive, zip, or tar. Data can then be exchanged across an airgap or pigeon mail. The airgap scenario is likely a real case that's not directly prominent at the moment, just because it's not that direct. There has been discussion about servers having bundles available for clones, but with a multi-bundle, one could package up a large bundle (months) and an increment (weeks, and then days), before a final, easy-to-pack last few hours. That would be a server work trade-off, and support a CDN view if needed. If such an approach were reasonable, would the protocol support it? etc. It came up several times. Many people are in favor of it. Some references.. http://thread.gmane.org/gmane.comp.version-control.git/264305/focus=264565 http://thread.gmane.org/gmane.comp.version-control.git/263898/focus=263928 http://thread.gmane.org/gmane.comp.version-control.git/263898/focus=264000 http://thread.gmane.org/gmane.comp.version-control.git/238472/focus=238844 This is what I got so far. I think the hard part is how to let projects control this in a clean and flexible way. Not written in the patch, but I'm thinking maybe we can allow hooking a remote helper in standard git://, ssh://, http://... That would give total control to projects. -- 8 -- diff --git a/Documentation/technical/protocol-capabilities.txt b/Documentation/technical/protocol-capabilities.txt index ecb0efd..2b99464 100644 --- a/Documentation/technical/protocol-capabilities.txt +++ b/Documentation/technical/protocol-capabilities.txt @@ -260,3 +260,34 @@ v2 'git-upload-pack' and 'git-receive-pack' may advertise this capability if the server supports 'git-upload-pack-2' and 'git-receive-pack-2' respectively. + +redirect + + +This capability is applicable for upload-pack and upload-pack-v2 +only.
When the client requests this capability it must specify +the supported transport protocols separated by colons, +e.g. redirect=http:ftp:ssh:torrent. + +Instead of sending packfile data to the client, the server may send +a 4-byte signature { 'L', 'I', 'N', 'K' } followed by NUL-terminated +URLs, each one pointing to a bundle. This fake pack ends with an empty +string. + +The bundle does not have to contain all refs requested by the +client. Different bundles from different URLs could have different +content. The client must follow one of the links to get a bundle. +The server must not send a URL in a protocol that the client does not +support. + +FIXME: do we keep the current connection alive until the bundle is +downloaded and get a normal pack, or let the client initiate a new +connection? Or perhaps if the client fails to get the bundle for +whatever reason, it could send NAK to the server and the server +sends the normal packfile data. + +FIXME: how do we implement this exactly? The decision to redirect +should probably be delegated to some hook. Maybe sending all want +lines to the script is enough.. Sending have lines is more difficult +because the server decides when to stop receiving them. That decision +must be moved to the hook... -- 8 -- -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
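The proposed LINK fake pack above (a 4-byte signature, then NUL-terminated URLs, terminated by an empty string) could be parsed along these lines. This is a hypothetical helper for a wire format that is itself only a proposal; the URLs are made up:

```python
def parse_link_redirect(data: bytes):
    """Parse the proposed LINK redirect payload. Returns a list of
    bundle URLs, or None if the data is an ordinary packfile."""
    if data[:4] != b"LINK":
        return None                    # not a redirect; normal pack data
    urls = []
    rest = data[4:]
    while True:
        url, _, rest = rest.partition(b"\0")
        if url == b"":                 # empty string terminates the list
            break
        urls.append(url.decode())
    return urls

payload = (b"LINK"
           + b"http://cdn.example.com/base.bundle\0"
           + b"ftp://mirror.example.com/base.bundle\0"
           + b"\0")
urls = parse_link_redirect(payload)
```

The client would then pick one of the returned URLs whose scheme it listed in its redirect= request.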
Re: [RFC/PATCH 0/3] protocol v2
Duy Nguyen pclo...@gmail.com writes: On Sun, Mar 1, 2015 at 3:41 PM, Junio C Hamano gits...@pobox.com wrote: - Because the protocol exchange starts by the server side advertising all its refs, even when the fetcher is interested in a single ref, the initial overhead is nontrivial, especially when you are doing a small incremental update. The worst case is an auto-builder that polls every five minutes, even when there are no new commits to be fetched [*3*]. Maybe you can elaborate about how to handle states X, Y... in your footnote 3. I just don't see how it's actually implemented. Or is it an optional feature that will be provided (via hooks, maybe) by the admin? These footnotes are not the important part (I wanted us to agree on the problem, and the ideas outlined with the footnotes are only an example that illustrates how a potential solution and the problem to be solved are described in relation to each other), but I'll give it a shot anyway ;-) I am actually torn on how the names X, Y, etc. should be defined. One side of me wants to leave its computation entirely up to the server side. The client says Last time I talked with you asking for refs/heads/* and successfully updated from you, you told me to call that state X without knowing how X is computed, and then the server will update you and then tell you your state is now Y. That way, large hosting sites and server implementations can choose to implement it any way they like. On the other hand, we could rigidly define it, perhaps like this: - Imagine that you saved the output from ls-remote that is run against that server, limited to the refs hierarchy you are requesting, the last time you talked with it. - Concatenate the above to the list of patterns the client used to ask for the refs. This step is optional. - E.g. if you are asking it for refs/heads/*, then we are talking something like this (illustrated with optional pattern in front): refs/heads/* 8004647... refs/heads/maint 7f4ba4b...
refs/heads/master - Run SHA-1 hash over that. And that is the state name. I.e. if you as a client are doing [remote origin] fetch = refs/heads/*:refs/remotes/origin/* and if the only time your refs/remotes/origin/* hierarchy changes is when you fetch from there (which should be the norm), you can look into the remote.origin.fetch refspec (to learn that refs/heads/* is what you are asking) and your refs/remotes/origin/* refs (and reverse the mapping you make when you fetch to make them talk about the refs/heads/* hierarchy on the server side), you can compute it locally. The latter will have one benefit over the opaque thing the client does not know how to compute. Because I want us to avoid sending unchanged refs over the connection, but I do want to see the protocol have some validation mechanism built in, even if we go the latter client can compute what the state name ought to be route, I want the server to tell the client what to call that state. That way, the client side can tell when it goes out of sync for any reason and attempt to recover. Do we need to worry about load balancers? Unless you are allowing multiple backend servers to serve the same repository behind a set of load balancers in an inconsistent way (e.g. you push to one while I push to two and you fetch from one and you temporarily see my push but then my push will be rejected as conflicting and you fetch from one and now you see your push), I do not think there is anything you need to worry about them more than what you should be worrying about already. There would be a point where all backend servers would agree This is the set of values of these refs at some point (e.g. a majority of surviving servers vote to decide, laggers that later join the party will update to the consensus value before serving the end-user traffic), and they would not be showing half-updated values that haven't been ratified by other servers to end users (otherwise they may end up showing reversion).
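The rigidly-defined variant of the state name above (hash the requested patterns plus the saved ls-remote output) can be sketched as follows. The function name and the exact concatenation are illustrative assumptions, not a specification:

```python
import hashlib

def state_name(patterns, ls_remote_lines):
    """Sketch of the rigidly-defined state name: SHA-1 over the
    optional ref patterns followed by the ls-remote output that was
    saved the last time we talked to this server."""
    h = hashlib.sha1()
    for p in patterns:                 # optional pattern lines first
        h.update(p.encode() + b"\n")
    for line in ls_remote_lines:       # "<sha1> <refname>" lines
        h.update(line.encode() + b"\n")
    return h.hexdigest()

old_state = state_name(["refs/heads/*"],
                       ["8004647... refs/heads/maint",
                        "7f4ba4b... refs/heads/master"])
# Any change to any ref value yields a different state name, which is
# what lets either side detect that the cached view is out of sync.
```

Because both sides can compute this deterministically, the server telling the client "your state is now Y" doubles as the validation mechanism the text asks for.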
- Is band #2 meant for human consumption, or do we expect the other end to interpret and act on it? If the former, would it make sense to send locale information from the client side and ask the server side to produce its output with _(message)? No, producing _(...) is a bad idea. First, the client has to verify placeholders and stuff; we can't just feed data from the server straight to printf(). Producing _() could complicate server code a lot. And I don't like the idea of using client .po files to translate server strings. There could be custom strings added by the admin, which are not available in the client .po. String translation should happen on the server side. What I meant to say was (1) the client says I want the human readable message in Vietnamese and (2) the server uses .po on _(message) in its code and sends the result to sideband #2. There is no parsing, interpolation, or anything of that sort necessary on the server
Re: [RFC/PATCH 0/3] protocol v2
From: Junio C Hamano gits...@pobox.com I earlier said: So if we are going to discuss a new protocol, I'd prefer to see the discussion without worrying too much about how to inter-operate with the current vintage of Git. It is no longer an interesting problem, as we know how to solve it with minimum risk. Instead, I'd like to see us design the new protocol in such a way that it is in-line upgradable without repeating our past mistakes. And I am happy to see that people are interested in discussing the design of new protocols. But after seeing the patches Stefan sent out, I think we are at risk of losing sight of what we are trying to accomplish. We do not want something that is merely new. That is why I wanted people to think about, discuss and agree on what limitations the current protocol has that are problematic (limitations that are not problematic are not something we need to address [*1*]), so that we can design the new thing without reintroducing the same limitations. To remind people, here is a reprint of the draft I sent out earlier in $gmane/264000. The current protocol has the following problems that limit us: - It is not easy to make it resumable, because we recompute every time. This is especially problematic for the initial fetch aka clone as we will be talking about a large transfer [*1*]. - The protocol extension has a fairly low length limit [*2*]. - Because the protocol exchange starts by the server side advertising all its refs, even when the fetcher is interested in a single ref, the initial overhead is nontrivial, especially when you are doing a small incremental update. The worst case is an auto-builder that polls every five minutes, even when there are no new commits to be fetched [*3*].
- Because we recompute every time, taking into account what the fetcher has, in addition to what the fetcher obtained earlier from us in order to reduce the transferred bytes, the payload for incremental updates becomes tailor-made for each fetch and cannot be easily reused [*4*]. I'd like to see a new protocol that lets us overcome the above limitations (did I miss others? I am sure people can help here) sometime this year. Unfortunately, nobody seems to want to help us by responding to did I miss others? RFH, here are a few more from me. OK, maybe not exactly about protocol, but a possible option would be the ability to send the data as a bundle or multi-bundles; Or perhaps as an archive, zip, or tar. Data can then be exchanged across an airgap or pigeon mail. The airgap scenario is likely a real case that's not directly prominent at the moment, just because it's not that direct. There has been discussion about servers having bundles available for clones, but with a multi-bundle, one could package up a large bundle (months) and an increment (weeks, and then days), before a final, easy-to-pack last few hours. That would be a server work trade-off, and support a CDN view if needed. If such an approach were reasonable, would the protocol support it? etc. Just a thought while reading... - The semantics of the side-bands are unclear. - Is band #2 meant only for progress output (I think the current protocol handlers assume that and unconditionally squelch it under --quiet)? Do we rather want dedicated progress and error-message sidebands instead? - Is band #2 meant for human consumption, or do we expect the other end to interpret and act on it? If the former, would it make sense to send locale information from the client side and ask the server side to produce its output with _(message)? - The semantics of packet_flush() is suboptimal, and this shortcoming seeps through to the protocol mapped to the smart-HTTP transport.
Originally, packet_flush() was meant as Here is an end of one logical section of what I am going to speak., hinting that it might be a good idea for the underlying implementation to hold the packets up to that point in-core and then write(2) them all out (i.e. flush) to the file descriptor only when we handle packet_flush(). It never meant Now I am finished speaking for now and it is your turn to speak. But because HTTP is inherently a ping-pong protocol where the requestor at one point stops talking and lets the responder speak, the code to map our protocol to the smart HTTP transport made the packet_flush() boundary mean Now I am done talking, it is my turn to listen. We probably need two kinds of packet_flush(). When a requestor needs to say two or more logical groups of things before telling the other side Now I am done talking; it is your turn., we need some marker (i.e. the original meaning of packet_flush()) at the end of these logical groups. And in order to be able to say Now I am done saying everything I need to say at this point for you to respond to me. It is your turn., we need another kind of marker.
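One way to get the two kinds of marker described above is to reserve two special pkt-line length values, say 0000 for the original section boundary and 0001 for "I am done, your turn". The value 0001 is a hypothetical choice here, and the helper names are made up; this only sketches the two-marker idea:

```python
FLUSH = b"0000"   # end of one logical section (original packet_flush meaning)
DELIM = b"0001"   # hypothetical "I am done talking; your turn" marker

def send_request(sections):
    """Frame several logical groups of pkt-lines, separating the
    groups with FLUSH and ending the whole request with DELIM."""
    out = bytearray()
    for section in sections:
        for payload in section:
            data = payload.encode()
            out += b"%04x" % (len(data) + 4) + data
        out += FLUSH            # end of this logical group
    out += DELIM                # end of the request; the peer may speak
    return bytes(out)

# A request with two logical groups, then the turn-taking marker:
msg = send_request([["want refs/heads/maint\n"],
                    ["have 7f4ba4b\n"]])
```

Since a real pkt-line with any payload has length 0005 or more, the reserved values 0000 and 0001 can never collide with ordinary packets, which is what makes this encoding trick workable.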
Re: [RFC/PATCH 0/3] protocol v2
On Sun, 1 Mar 2015, Junio C Hamano wrote: and if the only time your refs/remotes/origin/* hierarchy changes is when you fetch from there (which should be the norm), you can look into the remote.origin.fetch refspec (to learn that refs/heads/* is what you are asking) and your refs/remotes/origin/* refs (and reverse the mapping you make when you fetch to make them talk about the refs/heads/* hierarchy on the server side), you can compute it locally. The latter will have one benefit over the opaque thing the client does not know how to compute. Because I want us to avoid sending unchanged refs over the connection, but I do want to see the protocol have some validation mechanism built in, even if we go the latter client can compute what the state name ought to be route, I want the server to tell the client what to call that state. That way, the client side can tell when it goes out of sync for any reason and attempt to recover. How would these approaches be affected by a client that is pulling from different remotes into one local repository? For example, pulling from the main kernel repo and from the -stable repo. David Lang
Re: [RFC/PATCH 0/3] protocol v2
I earlier said: So if we are going to discuss a new protocol, I'd prefer to see the discussion without worrying too much about how to inter-operate with the current vintage of Git. It is no longer an interesting problem, as we know how to solve it with minimum risk. Instead, I'd like to see us design the new protocol in such a way that it is in-line upgradable without repeating our past mistakes. And I am happy to see that people are interested in discussing the design of new protocols. But after seeing the patches Stefan sent out, I think we are at risk of losing sight of what we are trying to accomplish. We do not want something that is merely new. That is why I wanted people to think about, discuss and agree on what limitations the current protocol has that are problematic (limitations that are not problematic are not something we need to address [*1*]), so that we can design the new thing without reintroducing the same limitations. To remind people, here is a reprint of the draft I sent out earlier in $gmane/264000. The current protocol has the following problems that limit us: - It is not easy to make it resumable, because we recompute every time. This is especially problematic for the initial fetch aka clone as we will be talking about a large transfer [*1*]. - The protocol extension has a fairly low length limit [*2*]. - Because the protocol exchange starts by the server side advertising all its refs, even when the fetcher is interested in a single ref, the initial overhead is nontrivial, especially when you are doing a small incremental update. The worst case is an auto-builder that polls every five minutes, even when there are no new commits to be fetched [*3*]. - Because we recompute every time, taking into account what the fetcher has, in addition to what the fetcher obtained earlier from us in order to reduce the transferred bytes, the payload for incremental updates becomes tailor-made for each fetch and cannot be easily reused [*4*].
I'd like to see a new protocol that lets us overcome the above limitations (did I miss others? I am sure people can help here) sometime this year. Unfortunately, nobody seems to want to help us by responding to did I miss others? RFH, here are a few more from me. - The semantics of the side-bands are unclear. - Is band #2 meant only for progress output (I think the current protocol handlers assume that and unconditionally squelch it under --quiet)? Do we rather want dedicated progress and error-message sidebands instead? - Is band #2 meant for human consumption, or do we expect the other end to interpret and act on it? If the former, would it make sense to send locale information from the client side and ask the server side to produce its output with _(message)? - The semantics of packet_flush() is suboptimal, and this shortcoming seeps through to the protocol mapped to the smart-HTTP transport. Originally, packet_flush() was meant as Here is an end of one logical section of what I am going to speak., hinting that it might be a good idea for the underlying implementation to hold the packets up to that point in-core and then write(2) them all out (i.e. flush) to the file descriptor only when we handle packet_flush(). It never meant Now I am finished speaking for now and it is your turn to speak. But because HTTP is inherently a ping-pong protocol where the requestor at one point stops talking and lets the responder speak, the code to map our protocol to the smart HTTP transport made the packet_flush() boundary mean Now I am done talking, it is my turn to listen. We probably need two kinds of packet_flush(). When a requestor needs to say two or more logical groups of things before telling the other side Now I am done talking; it is your turn., we need some marker (i.e. the original meaning of packet_flush()) at the end of these logical groups. And in order to be able to say Now I am done saying everything I need to say at this point for you to respond to me.
It is your turn., we need another kind of marker. [Footnote] *1* For example, if we were working off of a what mistakes do we want to correct? list, I do not think we would have seen capabilities have to be only on the first packet or let's allow the new daemon to read extra cruft at the end of the first request. I do not think I heard why it is a problem that the daemon cannot pass extra info to the invoked program in the first place. There might be a valid reason, but then that needs to be explained, understood and agreed upon and should be part of an updated what are we fixing? list.
Re: [RFC/PATCH 0/3] protocol v2
On Sun, Mar 1, 2015 at 3:41 PM, Junio C Hamano gits...@pobox.com wrote: - Because the protocol exchange starts by the server side advertising all its refs, even when the fetcher is interested in a single ref, the initial overhead is nontrivial, especially when you are doing a small incremental update. The worst case is an auto-builder that polls every five minutes, even when there are no new commits to be fetched [*3*]. Maybe you can elaborate about how to handle states X, Y... in your footnote 3. I just don't see how it's actually implemented. Or is it an optional feature that will be provided (via hooks, maybe) by the admin? Do we need to worry about load balancers? Is it meant to address the excessive state transfer due to the stateless nature of smart-http? I'd like to see a new protocol that lets us overcome the above limitations (did I miss others? I am sure people can help here) sometime this year. Unfortunately, nobody seems to want to help us by responding to did I miss others? RFH, here are a few more from me. Heh.. I did think about it, but I didn't see anything worth mentioning.. - The semantics of the side-bands are unclear. - Is band #2 meant only for progress output (I think the current protocol handlers assume that and unconditionally squelch it under --quiet)? Do we rather want dedicated progress and error-message sidebands instead? - Is band #2 meant for human consumption, or do we expect the other end to interpret and act on it? If the former, would it make sense to send locale information from the client side and ask the server side to produce its output with _(message)? No, producing _(...) is a bad idea. First, the client has to verify placeholders and stuff; we can't just feed data from the server straight to printf(). Producing _() could complicate server code a lot. And I don't like the idea of using client .po files to translate server strings. There could be custom strings added by the admin, which are not available in the client .po.
String translation should happen on the server side. If we want error messages to be handled by machines as well, just add a result code at the beginning, like ftp, http, ... do. Hmm.. this could be the reason to separate progress and error messages. -- Duy
Re: [RFC/PATCH 0/3] protocol v2
On Sun, 1 Mar 2015, Stefan Beller wrote: The way I understand Junio here is to have predefined points which make it easier to communicate. There are lots of clients and they usually want to catch up on a different number of commits, so we need to recompute it all the time. The idea is then to compute a small pack from the original point to one of these predefined points. So a conversation might look like: Client: My newest commit is dated 2014-11-17. Server: ok here is a pack from 2014-11-17 until 2014-12-01 and then I have prepared packs I sent out all the time of 2014-12 and 2015-01 and 2015-02 and then there will be another custom pack for you describing changes of 2015-02-01 until now. Mind that I chose dates instead of arbitrary sha1 values as I feel that explains the point better; the packs in between are precomputed because many clients need them. Personally I don't buy that idea, because it produces a lot of questions, like how large should these packs be? Depending on time or commit counts? I think this is going to depend on the project in question. I think that doing this based on public tags makes lots of sense. The precomputed packs should also change over time. For example, with the linux kernel, as each -rc is released, there will be a lot of people wanting to upgrade from a prior -rc, so having a pack for each of these is probably worthwhile. You probably also want a precomputed pack to move from some of the -rc releases to the final release. And then a single pack to move from the prior final release to the newest one. There may also be a reason to make a pack that jumps several releases to go from one LTS kernel to the next. Exactly what precomputed packs make sense, and how large the packs should be, is going to be _very_ dependent on the update patterns of users. The only people who can decide exactly what packs they should use are the admins of the systems, and their decisions should be based on their logs of what requests are being made.
I can see the git project creating scripts to analyze the logs of client connections to make recommendations on what packs would be useful to have pre-generated, ideally ordered by how much computation they would save (and the amount of disk space required to hold the packs), and then the admin of the site can indicate where they want the cutoff to be. Some extremely busy sites may have a lot of disk space compared to CPU and be willing to have lots of packs around, others are less busy and will only want to keep a few around. David Lang
Re: [RFC/PATCH 0/3] protocol v2
David Lang da...@lang.hm writes: how would these approaches be affected by a client that is pulling from different remotes into one local repository? For example, pulling from the main kernel repo and from the -stable repo. David Lang As I said in $gmane/264000, which the above came from: Note that the above would work if and only if we accept that it is OK to send objects between the remote tracking branches the fetcher has (i.e. the objects it last fetched from the server) and the current tips of branches the server has, without optimizing by taking into account that some commits in that set may have already been obtained by the fetcher from a third party. The scheme tries to gain by reducing the ref advertisement cost while sacrificing the optimization opportunity when you fetch from an updated Linus's tree after having fetched from a recent next tree, the latter of which may have contained a lot of objects that went to Linus's tree since you fetched from Linus's the last time. The current protocol, by negotiating what you have (including the objects you obtained from sideways via 'next') with Linus's tree, allows the server to compute a minimum packfile customized just for you. By trading that off with the everybody that follows this repository will get the same set of packfiles in sequence trickled into his repository model, it would instead allow the server to prepare, only once, the packfiles that thousands of clients following Linus's tree will want. The client-server pair may want to have a negotiation mechanism (e.g. I may have many objects I fetched from sideways, give me a minimum pack that is customized for me by spending cycles---I am willing to wait until you finish computing it vs I am just following along and not doing anything fancy, just give me the same thing as everybody else) to select what optimization they want to use.
Re: [RFC/PATCH 0/3] protocol v2
Stefan Beller sbel...@google.com writes: A race condition may be a serious objection then? Once people believe the refs can scale fairly well they will use it, which means blasting the ref advertisement will become much worse over time. I think we are already in agreement about that case: A misdetected case between a (new client, new server) pair might go like this: - new client connects and sends that no-op. - new server accepts the connection, but that no-op probe has not arrived yet. It misdetects the other side as a v1 client and it starts blasting the ref advertisement. - new client notices that the ref advertisement has the capability bit and the server is capable of the v2 protocol. It waits until the server sends the sorry, I misdetected message. - new server eventually notices the no-op probe while blasting the ref advertisement and it can stop in the middle. Hopefully this can happen after only sending a few kilobytes among megabytes of ref advertisement data ;-). The server sends the sorry, I misdetected message to synchronise. - both sides happily speak v2 from here on. However, I do not think it needs to become worse over time, because we can change and adjust as the user population and their use patterns evolve. For example, you can introduce a small delay before the new version of the server starts the v1 advertisement, and make that delay longer and longer over time, as the population of v1-only clients goes down. Difficulty (see J6t's comment) in other implementations may be a more important roadblock. It seems, however, that our current thinking is that it is OK to do the allow new v1 clients to notice the availability of v2 servers, so that they can talk v2 the next time thing, so my preference is to throw this client first and let server notice into maybe doable but not our first choice bin, at least for now. Thanks.
Re: [RFC/PATCH 0/3] protocol v2
On Sun, Mar 1, 2015 at 3:32 AM, Duy Nguyen pclo...@gmail.com wrote: On Sun, Mar 1, 2015 at 3:41 PM, Junio C Hamano gits...@pobox.com wrote: - Because the protocol exchange starts by the server side advertising all its refs, even when the fetcher is interested in a single ref, the initial overhead is nontrivial, especially when you are doing a small incremental update. The worst case is an auto-builder that polls every five minutes, even when there are no new commits to be fetched [*3*]. Maybe you can elaborate about how to handle states X, Y... in your footnote 3. I just don't see how it's actually implemented. Or is it an optional feature that will be provided (via hooks, maybe) by the admin? Do we need to worry about load balancers? Is it meant to address the excessive state transfer due to the stateless nature of smart-http? The way I understand Junio here is to have predefined points which make it easier to communicate. There are lots of clients and they usually want to catch up on a different number of commits, so we need to recompute it all the time. The idea is then to compute a small pack from the original point to one of these predefined points. So a conversation might look like: Client: My newest commit is dated 2014-11-17. Server: ok here is a pack from 2014-11-17 until 2014-12-01 and then I have prepared packs I sent out all the time of 2014-12 and 2015-01 and 2015-02 and then there will be another custom pack for you describing changes of 2015-02-01 until now. Mind that I chose dates instead of arbitrary sha1 values as I feel that explains the point better; the packs in between are precomputed because many clients need them. Personally I don't buy that idea, because it produces a lot of questions, like how large should these packs be? Depending on time or commit counts?
The idea I'd rather favor (I am repeating myself from another post, but maybe a bit clearer now): Client: The last time I asked for refs/heads/*, I got a refs advertisement hashing to $SHA1. Server: Ok, here is the diff from that old ref advertisement to the current refs advertisement. I realize that these two ideas are not contradictory; rather, they could help each other, as they are orthogonal. One is about refs advertising while the other is about object transmission. I'd like to see a new protocol that lets us overcome the above limitations (did I miss others? I am sure people can help here) sometime this year. Unfortunately, nobody seems to want to help us by responding to did I miss others? RFH, here are a few more from me. Heh.. I did think about it, but I didn't see anything worth mentioning.. - The semantics of the side-bands are unclear. - Is band #2 meant only for progress output (I think the current protocol handlers assume that and unconditionally squelch it under --quiet)? Do we rather want dedicated progress and error-message sidebands instead? - Is band #2 meant for human consumption, or do we expect the other end to interpret and act on it? If the former, would it make sense to send locale information from the client side and ask the server side to produce its output with _(message)? No, producing _(...) is a bad idea. First, the client has to verify placeholders and stuff; we can't just feed data from the server straight to printf(). Producing _() could complicate server code a lot. And I don't like the idea of using client .po files to translate server strings. There could be custom strings added by the admin, which are not available in the client .po. String translation should happen on the server side. If we want error messages to be handled by machines as well, just add a result code at the beginning, like ftp, http, ... do. Hmm.. this could be the reason to separate progress and error messages.
-- Duy -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC/PATCH 0/3] protocol v2
On Sat, Feb 28, 2015 at 6:05 AM, Junio C Hamano gits...@pobox.com wrote:

Just for fun, I was trying to see if there is a hole in the current protocol that allows a new client to talk a valid v1 protocol exchange with existing, deployed servers without breaking, while letting it tell a new server that it is a new client and it does not want to get blasted by megabytes of ref advertisement. ... The idea is to find a request that can be sent as the first utterance by the client to an old server that is interpreted as a no-op and can be recognised by a new server as such a no-op probe. ... And there _is_ a hole ;-). The parsing of the "shallow" object name is done in such a way that an object name that passes get_sha1_hex() but results in a NULL return from parse_object() is _ignored_. So a new client can use "shallow 0{40}" as a no-op probe. ... I am _not_ proposing that we should go this route, at least not yet. I am merely pointing out that an in-place sidegrade from v1 to a protocol that avoids the megabyte-advertisement-at-the-beginning seems to be possible, as food for thought.

There may be another hole: if we send "want empty-tree", it looks like it will go through without causing errors. It's not exactly a no-op because an empty tree object will be bundled in the resulting pack. But that makes no difference in practice. I didn't verify this though.

In the spirit of fun, I looked at how jgit handles this "shallow" line (because this is more an implementation hole than a protocol hole). I don't think jgit would ignore 0{40} the way C Git does. This SHA-1 will end up in the shallowCommits set in upload-pack, then will be parsed as a commit. But even if the parsing goes through, a non-empty shallowCommits set would disable the pack bitmap. Fun is usually short..

PS. heh, my "want empty-tree" hole is probably impl-specific too. Not sure if jgit also keeps the empty tree available even if it does not exist.
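(For reference, both probes discussed here are ordinary pkt-lines. A small sketch of how a client might encode them; pkt_line is illustrative rather than git's actual code, but the framing and the well-known empty-tree object name are real:)

```python
def pkt_line(payload: bytes) -> bytes:
    # pkt-line framing: four lowercase hex digits giving the total
    # length (payload plus the 4-byte length header), then the payload.
    return b"%04x" % (len(payload) + 4) + payload

# Junio's no-op probe: "shallow" with the all-zero object name, which
# old servers parse with get_sha1_hex() and then ignore.
shallow_probe = pkt_line(b"shallow " + b"0" * 40 + b"\n")

# The "want empty-tree" variant from this mail; this is the well-known
# sha1 of the empty tree object.
EMPTY_TREE = b"4b825dc642cb6eb9a060e54bf8d69288fbee4904"
want_probe = pkt_line(b"want " + EMPTY_TREE + b"\n")
```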
-- Duy
Re: [RFC/PATCH 0/3] protocol v2
+git@vger.kernel.org

On Thu, Feb 26, 2015 at 5:42 PM, Duy Nguyen pclo...@gmail.com wrote: https://github.com/pclouds/git/commits/uploadpack2

I rebased your branch, changed the order of commits slightly and started to add some. They are found at https://github.com/stefanbeller/git/commits/uploadpack2

I think the very first patch series, which I am trying to polish now, will just move the capabilities negotiation to the beginning of the exchange. Any 'real' changes, such as adding capabilities to the protocol to not have all the refs advertised, will come in a later series.

Thanks for your help!
Stefan
Re: [RFC/PATCH 0/3] protocol v2
Junio C Hamano gits...@pobox.com writes:

I do not think v1 can be fixed by "send one ref with capability, newer client may respond immediately so we can stop enumerating remaining refs, and the older one will get stuck so we can have a timeout to see if the connection is from the newer one, and send the rest for the older client", because anything that involves such a timeout would not reliably work over WAN.

Just for fun, I was trying to see if there is a hole in the current protocol that allows a new client to talk a valid v1 protocol exchange with existing, deployed servers without breaking, while letting it tell a new server that it is a new client and it does not want to get blasted by megabytes of ref advertisement.

The idea is to find a request that can be sent as the first utterance by the client to an old server that is interpreted as a no-op and can be recognised by a new server as such a no-op probe. If there is such a request, then the exchange can go like this with a (new client, old server) pair:

- new client connects and sends that no-op.
- old server starts blasting the ref advertisement.
- new client monitors and notices that the other side started speaking, and the ref advertisement lacks the capability bit for the new protocol.
- new client accepts the ref advertisement and does the v1 protocol thing as a follow-up to what it already sent.

As long as the first one turns out to be a no-op for the old server, we would be OK. On the other hand, a (new client, new server) pair would go like this:

- new client connects and sends that no-op.
- new server notices that there is already data from the client, and recognises the no-op probe.
- new server gives the first v2 protocol message with capability.
- new client notices that the other side started speaking, and it is the first v2 protocol message.
- both sides happily speak v2.

and an (old client, new server) pair would go like this:

- old client connects and waits.
- new server notices that there is *no* data sent from the client and decides that the other side is a v1 client. It starts blasting the ref advertisement.
- both sides happily speak v1 from here on.

A misdetected case between a (new client, new server) pair might go like this:

- new client connects and sends that no-op.
- new server accepts the connection, but that no-op probe has not arrived yet. It misdetects the other side as a v1 client and starts blasting the ref advertisement.
- new client notices that the ref advertisement has the capability bit and the server is capable of the v2 protocol. It waits until the server sends a "sorry, I misdetected" message.
- new server eventually notices the no-op probe while blasting the ref advertisement and can stop in the middle. Hopefully this can happen after sending only a few kilobytes among megabytes of ref advertisement data ;-). The server sends the "sorry, I misdetected" message to synchronise.
- both sides happily speak v2 from here on.

So the topic of this exercise (just for fun) is to see if there is such a no-op request the client side can send as the first thing for probing. On the fetch side, the first response upload-pack expects is one of:

- "want" followed by an object name.
- "shallow" followed by an object name.
- "deepen" followed by a positive integer.

And there _is_ a hole ;-). The parsing of the "shallow" object name is done in such a way that an object name that passes get_sha1_hex() but results in a NULL return from parse_object() is _ignored_. So a new client can use "shallow 0{40}" as a no-op probe.

It appears that on the push side, there is a similar hole that can be used. receive-pack expects either "shallow", "push-cert" or the refname updates (i.e. two [0-9a-f]{40} followed by a refname); the parsing of "shallow" is not as loose as on the fetch side, in that using "shallow 0{40}" as a no-op probe will end up causing prepare_shallow_info() to sift the 0{40} object name into "theirs", but I think it will be ignored at the end as unreachable cruft without causing harm.

I am _not_ proposing that we should go this route, at least not yet. I am merely pointing out that an in-place sidegrade from v1 to a protocol that avoids the megabyte-advertisement-at-the-beginning seems to be possible, as food for thought.
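(The misdetection race above comes from the server deciding based on whether any client data has arrived yet. A toy sketch of that decision; detect_client_version and the grace period are invented for illustration and are nothing like git's actual code:)

```python
import select
import socket

def detect_client_version(conn: socket.socket, grace: float = 0.05) -> str:
    # If the client has already sent something (the no-op probe), treat
    # it as a new (v2) client; otherwise assume v1 and start the ref
    # advertisement. The race discussed above is exactly this window
    # being too short for the probe to arrive.
    ready, _, _ = select.select([conn], [], [], grace)
    return "v2" if ready else "v1"

server, client = socket.socketpair()
v1 = detect_client_version(server, grace=0.01)   # silent client -> "v1"
client.sendall(b"0035shallow " + b"0" * 40 + b"\n")
v2 = detect_client_version(server, grace=0.5)    # probe arrived -> "v2"
```

This also shows why the "sorry, I misdetected" resynchronisation message is needed: no choice of grace period can make the detection reliable over a WAN.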
Re: [RFC/PATCH 0/3] protocol v2
On Fri, Feb 27, 2015 at 4:07 PM, Duy Nguyen pclo...@gmail.com wrote:

There may be another hole: if we send "want empty-tree", it looks like it will go through without causing errors. It's not exactly a no-op because an empty tree object will be bundled in the resulting pack. But that makes no difference in practice. I didn't verify this though.

In addition to the "that's not a no-op" problem, unless the old server has a ref that has an empty tree at its tip, such a fetch request will be rejected, unless the server is configured to serve any object, no?

If your new server does have a ref that points at an empty tree, a client may request you to send that, but this is not a problem, because the new server can tell if the client is sending it as a no-op probe or a serious request by looking at its capability request. A serious old client will not tell you that he is new; a probing new client does, and a serious new client does. So your new server can tell and will not be confused.

as a commit. But even if the parsing goes through, a non-empty shallowCommits set would disable the pack bitmap.

Performance penalty is fine. Over time we would upgrade, and the point of the exercise is not to cause the old-new or new-old pair to die, but to keep talking the old protocol and getting correct results.
Re: [RFC/PATCH 0/3] protocol v2
On Fri, Feb 27, 2015 at 3:44 PM, Stefan Beller sbel...@google.com wrote: On Fri, Feb 27, 2015 at 3:05 PM, Junio C Hamano gits...@pobox.com wrote:

I am _not_ proposing that we should go this route, at least not yet. I am merely pointing out that an in-place sidegrade from v1 to a protocol that avoids the megabyte-advertisement-at-the-beginning seems to be possible, as food for thought.

This is a fun thing indeed, though I'd personally feel uneasy with such a probe as a serious proposal. (Remember, somebody 10 years from now wants to enjoy reading the source code.)

That cannot be a serious objection, once you realize that NUL + capability was exactly the same kind of "yes, we have a hole to allow us to customize the protocol". The code to do so may not be pretty, but the implementation ended up being reasonably clean with parse_feature_request() and friends. After all we live in a real world ;-)
Re: [RFC/PATCH 0/3] protocol v2
On Fri, Feb 27, 2015 at 4:33 PM, Junio C Hamano gits...@pobox.com wrote: On Fri, Feb 27, 2015 at 3:44 PM, Stefan Beller sbel...@google.com wrote: On Fri, Feb 27, 2015 at 3:05 PM, Junio C Hamano gits...@pobox.com wrote:

I am _not_ proposing that we should go this route, at least not yet. I am merely pointing out that an in-place sidegrade from v1 to a protocol that avoids the megabyte-advertisement-at-the-beginning seems to be possible, as food for thought.

This is a fun thing indeed, though I'd personally feel uneasy with such a probe as a serious proposal. (Remember, somebody 10 years from now wants to enjoy reading the source code.)

That cannot be a serious objection, once you realize that NUL + capability was exactly the same kind of "yes, we have a hole to allow us to customize the protocol". The code to do so may not be pretty, but the implementation ended up being reasonably clean with parse_feature_request() and friends. After all we live in a real world ;-)

- new server accepts the connection, but that no-op probe has not arrived yet. It misdetects the other side as a v1 client and starts blasting the ref advertisement.

A race condition may be a serious objection then? Once people believe the refs can scale fairly well they will use it, which means blasting the ref advertisement will get much worse over time. I'll try to present a 'client asks for options first out of band' approach instead of the way you describe.

Also we should not rely on having holes here and there (we might run out of holes over time), so I'd rather have the capabilities presented first, which rather opens new holes instead of closing old ones. (Assuming we'll never run into megabytes of capabilities over time and hit the same trouble again ;)
Re: [RFC/PATCH 0/3] protocol v2
On Fri, Feb 27, 2015 at 3:05 PM, Junio C Hamano gits...@pobox.com wrote: Junio C Hamano gits...@pobox.com writes:

I do not think v1 can be fixed by "send one ref with capability, newer client may respond immediately so we can stop enumerating remaining refs, and the older one will get stuck so we can have a timeout to see if the connection is from the newer one, and send the rest for the older client", because anything that involves such a timeout would not reliably work over WAN.

Just for fun, I was trying to see if there is a hole in the current protocol that allows a new client to talk a valid v1 protocol exchange with existing, deployed servers without breaking, while letting it tell a new server that it is a new client and it does not want to get blasted by megabytes of ref advertisement.

The idea is to find a request that can be sent as the first utterance by the client to an old server that is interpreted as a no-op and can be recognised by a new server as such a no-op probe. If there is such a request, then the exchange can go like this with a (new client, old server) pair:

- new client connects and sends that no-op.
- old server starts blasting the ref advertisement.
- new client monitors and notices that the other side started speaking, and the ref advertisement lacks the capability bit for the new protocol.
- new client accepts the ref advertisement and does the v1 protocol thing as a follow-up to what it already sent.

As long as the first one turns out to be a no-op for the old server, we would be OK. On the other hand, a (new client, new server) pair would go like this:

- new client connects and sends that no-op.
- new server notices that there is already data from the client, and recognises the no-op probe.
- new server gives the first v2 protocol message with capability.
- new client notices that the other side started speaking, and it is the first v2 protocol message.
- both sides happily speak v2.

and an (old client, new server) pair would go like this:

- old client connects and waits.
- new server notices that there is *no* data sent from the client and decides that the other side is a v1 client. It starts blasting the ref advertisement.
- both sides happily speak v1 from here on.

A misdetected case between a (new client, new server) pair might go like this:

- new client connects and sends that no-op.
- new server accepts the connection, but that no-op probe has not arrived yet. It misdetects the other side as a v1 client and starts blasting the ref advertisement.
- new client notices that the ref advertisement has the capability bit and the server is capable of the v2 protocol. It waits until the server sends a "sorry, I misdetected" message.
- new server eventually notices the no-op probe while blasting the ref advertisement and can stop in the middle. Hopefully this can happen after sending only a few kilobytes among megabytes of ref advertisement data ;-). The server sends the "sorry, I misdetected" message to synchronise.
- both sides happily speak v2 from here on.

So the topic of this exercise (just for fun) is to see if there is such a no-op request the client side can send as the first thing for probing. On the fetch side, the first response upload-pack expects is one of:

- "want" followed by an object name.
- "shallow" followed by an object name.
- "deepen" followed by a positive integer.

And there _is_ a hole ;-). The parsing of the "shallow" object name is done in such a way that an object name that passes get_sha1_hex() but results in a NULL return from parse_object() is _ignored_. So a new client can use "shallow 0{40}" as a no-op probe.

It appears that on the push side, there is a similar hole that can be used. receive-pack expects either "shallow", "push-cert" or the refname updates (i.e. two [0-9a-f]{40} followed by a refname); the parsing of "shallow" is not as loose as on the fetch side, in that using "shallow 0{40}" as a no-op probe will end up causing prepare_shallow_info() to sift the 0{40} object name into "theirs", but I think it will be ignored at the end as unreachable cruft without causing harm.

I am _not_ proposing that we should go this route, at least not yet. I am merely pointing out that an in-place sidegrade from v1 to a protocol that avoids the megabyte-advertisement-at-the-beginning seems to be possible, as food for thought.

This is a fun thing indeed, though I'd personally feel uneasy with such a probe as a serious proposal. (Remember, somebody 10 years from now wants to enjoy reading the source code.) So let's keep the idea around if we don't find another solution. As far as I can tell we have

* native git protocol (git daemon)
* ssh
* http(s)
* ftp (deprecated!)
* rsync (deprecated)

For both native git as well as
Re: [RFC/PATCH 0/3] protocol v2
On Thu, Feb 26, 2015 at 2:15 AM, Duy Nguyen pclo...@gmail.com wrote: On Thu, Feb 26, 2015 at 2:31 PM, Stefan Beller sbel...@google.com wrote: On Wed, Feb 25, 2015 at 10:04 AM, Junio C Hamano gits...@pobox.com wrote: Duy Nguyen pclo...@gmail.com writes: On Wed, Feb 25, 2015 at 6:37 AM, Stefan Beller sbel...@google.com wrote:

I can understand that we maybe want to just provide one generic version 2 of the protocol which is an allrounder not doing badly in any of these aspects, but I can see use cases where one desires to replace the wire protocol with one's own implementation. To do so we could try to offer an API which makes implementing a new protocol somewhat easy. The current state of affairs does not provide this flexibility.

I think we are quite flexible after the initial ref advertisement.

Yes, that is exactly where my "I am not convinced" comes from.

We are not (not really, at least). We can tune some parameters or change the behavior slightly, but we cannot fix core assumptions made when creating the v2 protocol. This you can see when talking about v1 as well: we cannot fix any wrongdoings of v1 now by adding another capability. Step 1 then should be identifying these wrongdoings and assumptions. So I think one of the key assumptions was to not have many refs to advertise, and advertising the refs is fine under that assumption. So from my point of view it is hard to change the general

We can really go wild with these capabilities. The only thing that can't be changed is perhaps sending the first ref. I don't know whether we can accept a dummy first ref... After that point, you can turn the protocol upside down because both client and server know what it would be.

So the way I currently envision (the transition to and) version 2 of the protocol:

First connection (using the protocol as of now):

Server: Here are all the refs and capabilities I can offer. The capabilities include not-send-refs-first (aka version2).
Client: Ok, I'll store not-send-refs-first for next time. Now we will continue with these options: for now we continue using the current protocol, and I want to update the master branch.
Server: Ok, here is a pack file, and then master advances $SHA1..$SHA1.
Client: ok, thanks, bye

For the next connection I have different ideas. Client thinks v2 is supported, so it talks first:

Client: Last time we talked your capabilities hashed to $SHA1; is that still correct?
Server: yes it is

# In the above roundtrip we would have a new key assumption that the capabilities
# don't change often. With push-certs enabled, this is invalid as of today. However this
# could be implemented with very low bandwidth usage.
# The alternative path would be:
# Server: No, my new capabilities are:

Client: Ok, I want to update all of refs/heads/{master,next,pu}. My last fetch was yesterday at noon.
Server: Let me check the ref logs for these refs. Here is a packfile of length 1000 bytes: binary gibberish. {master, next} did not update since yesterday noon; pu updates from A..B.
Client: ok, thanks, bye

Another approach would be this. Client thinks v2 is supported, so it talks first:

Client: Last time we talked you sent me a refs advertisement, including capabilities, which hashes to $SHA1.
Server: I see, I have stored that. Now that time has advanced there are a few differences; here is a diff of the refs advertisement:

* b3a551adf53c224b04c40f05b72a8790807b3138 HEAD\0 capabilities
* b3a551adf53c224b04c40f05b72a8790807b3138 refs/heads/master
- 24ca137a384aa1ac5a776eddaf35bb820fc6f6e6 refs/heads/tmp-fix
+ 1da8335ad5d0e46062a929ba6481bbbe35c8eef0 refs/pull/123/head

Note that I do not include changed lines as +one line and -one line, as you know what the line was via your given $SHA1; so changed lines are marked with *, while lines starting with '-' indicate deleted refs and '+' indicate new refs.

Client: I see, I can reconstruct the refs advertisement.
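(The diff format in this example message could be consumed by the client roughly like this. A hypothetical sketch of the client side; apply_ref_diff and the exact line layout are only what the example above implies, not a real protocol:)

```python
def apply_ref_diff(old_refs, diff_lines):
    # old_refs: {refname: sha1} reconstructed from the advertisement
    # the client stored last time. diff_lines use the '*', '+', '-'
    # markers from the example above, each followed by "<sha1> <refname>".
    refs = dict(old_refs)
    for line in diff_lines:
        op, rest = line[0], line[2:]
        sha1, name = rest.split(" ", 1)
        if op in "*+":        # changed or newly created ref
            refs[name] = sha1
        elif op == "-":       # deleted ref
            refs.pop(name, None)
    return refs
```

The client would then hash the reconstructed advertisement and compare it against what the server reports, to detect a stale or corrupted stored copy.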
Now we can continue talking as we always talked, using the v1 protocol.

So from my point of view we don't waste resources when having an advertisement of possible protocols instead of a boolean flag indicating v2 is supported. There is really not much overhead in coding nor bytes exchanged on the wire, so why not accept stuff that comes (nearly) for free?

You realize you're advertising v2 as a new capability, right? Instead of defining a v2 feature set and then advertising v2, people could simply add new features directly. I don't see v2 (at least with these patches) adding any value.

Yes, we can go wild after the refs advertisement, but that is not the critical problem, as it works ok-ish? The problem I see for now is the huge refs advertisement even before the capabilities are exchanged. So maybe I do not want to talk about v2 but about changing the current protocol to first talk about the capabilities in the first round trip; not sure if we ever want to attach data to the first RT as it may explode as soon as that
Re: [RFC/PATCH 0/3] protocol v2
On Thu, Feb 26, 2015 at 2:31 PM, Stefan Beller sbel...@google.com wrote: On Wed, Feb 25, 2015 at 10:04 AM, Junio C Hamano gits...@pobox.com wrote: Duy Nguyen pclo...@gmail.com writes: On Wed, Feb 25, 2015 at 6:37 AM, Stefan Beller sbel...@google.com wrote:

I can understand that we maybe want to just provide one generic version 2 of the protocol which is an allrounder not doing badly in any of these aspects, but I can see use cases where one desires to replace the wire protocol with one's own implementation. To do so we could try to offer an API which makes implementing a new protocol somewhat easy. The current state of affairs does not provide this flexibility.

I think we are quite flexible after the initial ref advertisement.

Yes, that is exactly where my "I am not convinced" comes from.

We are not (not really, at least). We can tune some parameters or change the behavior slightly, but we cannot fix core assumptions made when creating the v2 protocol. This you can see when talking about v1 as well: we cannot fix any wrongdoings of v1 now by adding another capability. Step 1 then should be identifying these wrongdoings and assumptions.

We can really go wild with these capabilities. The only thing that can't be changed is perhaps sending the first ref. I don't know whether we can accept a dummy first ref... After that point, you can turn the protocol upside down because both client and server know what it would be.

So from my point of view we don't waste resources when having an advertisement of possible protocols instead of a boolean flag indicating v2 is supported. There is really not much overhead in coding nor bytes exchanged on the wire, so why not accept stuff that comes (nearly) for free?

You realize you're advertising v2 as a new capability, right? Instead of defining a v2 feature set and then advertising v2, people could simply add new features directly. I don't see v2 (at least with these patches) adding any value.
I mean, how do we know all the core assumptions made for v2 will hold in the future? We don't. That's why I'd propose a plain and easy exchange at first, stating the version to talk.

And we already do that, except that we don't state what version (as a number) exactly, but what features that version supports. The focus should be the new protocol at daemon.c and maybe remote-curl.c, where we do know the current protocol is not flexible enough.

-- Duy
Re: [RFC/PATCH 0/3] protocol v2
On Thu, Feb 26, 2015 at 12:13 PM, Junio C Hamano gits...@pobox.com wrote:

I agree with the value assessment of these patches 98%, but these bits can be taken as the "we have a v2 server available for you on the side, by the way" hint you mentioned in the older thread, I think.

The patches are not well polished (in fact they don't even compile :/), but I think they may demonstrate the ideas and thought process. And as it turns out we'd not be following that spirit of ideas but rather want to have a dedicated v2. That said, I did not want to spend lots of time polishing the patch for inclusion but rather to demonstrate ideas, which can be done with substantially less quality IMHO. Correct me if I am wrong here!

Thanks,
Stefan
Re: [RFC/PATCH 0/3] protocol v2
On Thu, Feb 26, 2015 at 12:13 PM, Junio C Hamano gits...@pobox.com wrote: Duy Nguyen pclo...@gmail.com writes:

Step 1 then should be identifying these wrongdoings and assumptions.

We can really go wild with these capabilities. The only thing that can't be changed is perhaps sending the first ref. I don't know whether we can accept a dummy first ref... After that point, you can turn the protocol upside down because both client and server know what it would be.

Yes, exactly. To up/down/side-grade from v1 is technically possible, but being technically possible is different from being sensible. The capability-based sidegrade does not solve the problem when the problem to be solved is that the server side needs to spend a lot of cycles and the network needs to carry megabytes of data before the capability exchange happens. Yes, the newer server and the newer client can notice that the counterparty is new and start talking in the new protocol (which may or may not benefit from already knowing the result of the ref advertisement), but by the time that happens, the resources have already been spent and wasted.

I do not think v1 can be fixed by "send one ref with capability, newer client may respond immediately so we can stop enumerating remaining refs, and the older one will get stuck so we can have a timeout to see if the connection is from the newer one, and send the rest for the older client", because anything that involves such a timeout would not reliably work over WAN.

You realize you're advertising v2 as a new capability, right? Instead of defining a v2 feature set and then advertising v2, people could simply add new features directly. I don't see v2 (at least with these patches) adding any value.

I agree with the value assessment of these patches 98%, but these bits can be taken as the "we have a v2 server available for you on the side, by the way" hint you mentioned in the older thread, I think.
And we already do that, except that we don't state what version (as a number) exactly, but what features that version supports. The focus should be the new protocol at daemon.c and maybe remote-curl.c, where we do know the current protocol is not flexible enough.

The first thing the client tells the server is what service it requests. A request over the git:// protocol is read by git daemon to choose which service to run, and it is read directly by the login shell if it comes over the ssh:// protocol. There is nothing that prevents us from defining that service to be a generic "git" service, not "upload-pack", "archive", or "receive-pack". And the early protocol exchange, once the "git" service is spawned, can be the client asking "what real services does the server end support?", a capability list in response, and then a "wow, you are new enough to support the 'trickle-pack' service---please connect me to it" request.

So I am not quite sure how to understand this input. I wonder if a high level test could look like the following, which just tests the workflow with git fetch, but not the internals. (Note: patch formatting may be broken as it's sent via the gmail web UI)

---8---
From: Stefan Beller sbel...@google.com
Date: Thu, 26 Feb 2015 17:19:30 -0800
Subject: [PATCH] Propose new tests for transitioning to the new option
 transport.capabilitiesfirst

Signed-off-by: Stefan Beller sbel...@google.com
---
 t/t5544-capability-handshake.sh | 81 +
 1 file changed, 81 insertions(+)
 create mode 100755 t/t5544-capability-handshake.sh

diff --git a/t/t5544-capability-handshake.sh b/t/t5544-capability-handshake.sh
new file mode 100755
index 000..aa2b52d
--- /dev/null
+++ b/t/t5544-capability-handshake.sh
@@ -0,0 +1,81 @@
+#!/bin/sh
+
+test_description='fetching from a repository using the capabilities first push option'
+
+. ./test-lib.sh
+
+mk_repo_pair () {
+	rm -rf workbench upstream
+	test_create_repo upstream
+	test_create_repo workbench
+	(
+		cd upstream
+		git config receive.denyCurrentBranch warn
+	)
+	(
+		cd workbench
+		git remote add origin ../upstream
+	)
+}
+
+generate_commits_upstream () {
+	(
+		cd upstream
+		echo more content >>file
+		git add file
+		git commit -a -m "create a commit"
+	)
+}
+
+# Compare the ref ($1) in upstream with a ref value from workbench ($2)
+# i.e. test_refs second HEAD@{2}
+test_refs () {
+	test $# = 2
+	git -C upstream rev-parse --verify $1 >expect
+	git -C workbench rev-parse --verify $2 >actual
+	test_cmp expect actual
+}
+
+test_expect_success 'transport.capabilitiesfirst is not overridden when set already' '
+	mk_repo_pair
+	(
+		cd workbench
+		git config transport.capabilitiesfirst 0
+		git config --get transport.capabilitiesfirst 0 >expected
+	)
+	generate_commits_upstream
+	(
+		cd workbench
+		git fetch --all
+		git config --get transport.capabilitiesfirst >actual
+		test_cmp expected actual
+	)
+'
+
+test_expect_success 'enable transport by fetching from new server' '
+	mk_repo_pair
+	(
+		cd workbench
+		git fetch origin
+	)
+
Re: [RFC/PATCH 0/3] protocol v2
On Wed, Feb 25, 2015 at 6:37 AM, Stefan Beller sbel...@google.com wrote:

I can understand that we maybe want to just provide one generic version 2 of the protocol which is an allrounder not doing badly in any of these aspects, but I can see use cases where one desires to replace the wire protocol with one's own implementation. To do so we could try to offer an API which makes implementing a new protocol somewhat easy. The current state of affairs does not provide this flexibility.

I think we are quite flexible after the initial ref advertisement. After that point the client tells the server its capabilities and the server does the same for the client. Only shared features can be used. So if you want to add a new micro protocol for mobile, just add a "mobile" capability to both client and server. A new implementation can support no capabilities and it should work fine with C Git (less efficient though, of course). And we have the freedom to mix capabilities any way we want (it's harder to do when you have to follow v2, v2.1, v2.2...)

-- Duy
Re: [RFC/PATCH 0/3] protocol v2
Duy Nguyen pclo...@gmail.com writes: On Wed, Feb 25, 2015 at 6:37 AM, Stefan Beller sbel...@google.com wrote:

I can understand that we maybe want to just provide one generic version 2 of the protocol which is an allrounder not doing badly in any of these aspects, but I can see use cases where one desires to replace the wire protocol with one's own implementation. To do so we could try to offer an API which makes implementing a new protocol somewhat easy. The current state of affairs does not provide this flexibility.

I think we are quite flexible after the initial ref advertisement.

Yes, that is exactly where my "I am not convinced" comes from.

After that point the client tells the server its capabilities and the server does the same for the client. Only shared features can be used. So if you want to add a new micro protocol for mobile, just add a "mobile" capability to both client and server. A new implementation can support no capabilities and it should work fine with C Git (less efficient though, of course). And we have the freedom to mix capabilities any way we want (it's harder to do when you have to follow v2, v2.1, v2.2...)
Re: [RFC/PATCH 0/3] protocol v2
On Wed, Feb 25, 2015 at 10:04 AM, Junio C Hamano gits...@pobox.com wrote: Duy Nguyen pclo...@gmail.com writes: On Wed, Feb 25, 2015 at 6:37 AM, Stefan Beller sbel...@google.com wrote: I can understand that we maybe want to provide just one generic version 2 of the protocol which is an all-rounder, not doing badly in any of these aspects, but I can see use cases where one would want to replace the wire protocol with one's own implementation. To do so we could try to offer an API which makes implementing a new protocol somewhat easy. The current state of affairs does not provide this flexibility. I think we are quite flexible after the initial ref advertisement. Yes, that is exactly where my "I am not convinced" comes from. We are not (not really, at least). We can tune some parameters or change the behavior slightly, but we cannot fix core assumptions made when creating the v2 protocol. You can see this when talking about v1 as well: we cannot fix any wrongdoings of v1 now by adding another capability. So from my point of view we don't waste resources by having an advertisement of possible protocols instead of a boolean flag indicating that v2 is supported. There is really not much overhead in code nor in bytes exchanged on the wire, so why not accept stuff that comes (nearly) for free? I mean, how do we know that all the core assumptions made for v2 will hold in the future? We don't. That's why I'd propose a plain and easy exchange at first, stating the version to talk. Anyway, what is the cost of a round trip compared to the bytes on the wire? Usually the cost of bytes on the wire correlates with the latency anyway (think mobile metered compared to a corporate setting with low latency). That's why I'd rather optimize for used bandwidth than for round trip times, but that may be just my personal perception of the internet. That's why I'd propose different protocols.
Thanks, Stefan
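[Stefan's suggestion of advertising a protocol list instead of a boolean flag could be negotiated roughly as below. This is a hypothetical sketch of the idea, not an actual Git mechanism; the protocol names, including "v-custom-optimized-for-high-latency", are illustrative:]

```python
# Sketch: the server advertises every protocol it speaks; the client
# walks its own preference list and picks the first mutually supported
# one, falling back to v1 as the least common denominator.

def choose_protocol(server_offers, client_prefs, fallback="v1"):
    offered = set(server_offers)
    for proto in client_prefs:
        if proto in offered:
            return proto
    return fallback

server = ["v2", "v1", "v-custom-optimized-for-high-latency"]
client = ["v-custom-optimized-for-high-latency", "v2", "v1"]
print(choose_protocol(server, client))  # v-custom-optimized-for-high-latency
```

[The fallback is what makes the list cheap: a peer that recognizes none of the advertised names still converses in v1, so the advertisement costs a few bytes but breaks nothing.]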
Re: [RFC/PATCH 0/3] protocol v2
On Mon, Feb 23, 2015 at 10:15 PM, Junio C Hamano gits...@pobox.com wrote: On Mon, Feb 23, 2015 at 8:02 PM, Duy Nguyen pclo...@gmail.com wrote: It's very hard to keep backward compatibility if you want to stop the initial ref advertisement, which is costly when there are lots of refs. But we can let both protocols run in parallel, with the old one advertising the presence of the new one. Then the client could switch to the new protocol gradually. This way the new protocol could forget about backward compatibility. See http://thread.gmane.org/gmane.comp.version-control.git/215054/focus=244325 Yes, the whole thread is worth a read, but the approach suggested by that article ($gmane/244325) is very good for its simplicity. The server end programs, upload-pack and receive-pack, need only learn to advertise the availability of upload-pack-v2 and receive-pack-v2 services, and the client side programs, fetch-pack and send-pack, need only notice the advertisement and record the availability of v2 counterparts for the current remote *and* continue the exchange in the v1 protocol. That way, there is very little risk of breaking anything. Right, I want to add this "learn about v2 on the fly, continue as always" behavior to the protocol. So if we are going to discuss a new protocol, I'd prefer to see the discussion without worrying too much about how to inter-operate with the current vintage of Git. It is no longer an interesting problem, as we know how to solve it with minimum risk. Instead, I'd like to see us design the new protocol in such a way that it is in-line upgradable without repeating our past mistakes. I am *not* convinced that we want multiple suites of protocols that must be chosen from to suit the use pattern, as mentioned somewhere upthread, by the way. I do think it makes sense to have different protocols, or different tunings of one protocol, because there are many different situations in which a different metric is the key one.
If you are on mobile, you'd possibly be billed by the bytes on the wire, so you want a protocol that puts as few bytes on the wire as possible, and would maybe trade transported bytes for lots of computational overhead. If you are in Australia (sorry, downunder ;) or on satellite internet, you may care a lot about latency and round trip times. If you are in a corporate environment and just cloning from next door, you may want the overall process (compute + network + local reconstruction) to just be fast overall. I can understand that we maybe want to provide just one generic version 2 of the protocol which is an all-rounder, not doing badly in any of these aspects, but I can see use cases where one would want to replace the wire protocol with one's own implementation. To do so we could try to offer an API which makes implementing a new protocol somewhat easy. The current state of affairs does not provide this flexibility. I think it would not be much overhead to have such flexibility when writing the actual code for the very-little-risk v2 update. So instead of advertising a boolean flag meaning "this server/client speaks version 2", we would rather send a list: "this server speaks v2, v1 and v-custom-optimized-for-high-latency". I started looking for academic literature on generic solutions to finding graph differences, but no real luck adapting anything to our problem yet. Thanks for your input, Stefan
Re: [RFC/PATCH 0/3] protocol v2
On Mon, Feb 23, 2015 at 8:02 PM, Duy Nguyen pclo...@gmail.com wrote: On Tue, Feb 24, 2015 at 10:12 AM, Stefan Beller sbel...@google.com wrote: One of the biggest problems of a new protocol would be deployment, as the users probably would not care too deeply. It should just work, in the sense that the user should not even sense that the protocol changed. Agreed. To do so we need to make sure the protocol is backwards compatible and works if an old client talks to a new server as well as the other way round. It's very hard to keep backward compatibility if you want to stop the initial ref advertisement, which is costly when there are lots of refs. But we can let both protocols run in parallel, with the old one advertising the presence of the new one. That's what I actually meant: to have different versions out there, but maybe keep the current version as the least common denominator so that it always works (albeit inefficiently for many refs). Then the client could switch to the new protocol gradually. This way the new protocol could forget about backward compatibility. See http://thread.gmane.org/gmane.comp.version-control.git/215054/focus=244325 -- Duy I would add that upload-pack also advertises the availability of upload-pack2, and the client may set remote.*.useUploadPack2 to either yes or auto so that next time upload-pack2 will be used. I had a similar thought, though I would not restrict it to just v2 this time; I'd aim to make it possible to plug in whatever protocol you want. (Comparable to SSL or ssh: it will always work, but as a proficient user you can spend lots of time tweaking what you actually want, looking at the tradeoffs of efficiency, security, and convenience.)
Re: [RFC/PATCH 0/3] protocol v2
On Mon, Feb 23, 2015 at 8:02 PM, Duy Nguyen pclo...@gmail.com wrote: It's very hard to keep backward compatibility if you want to stop the initial ref advertisement, which is costly when there are lots of refs. But we can let both protocols run in parallel, with the old one advertising the presence of the new one. Then the client could switch to the new protocol gradually. This way the new protocol could forget about backward compatibility. See http://thread.gmane.org/gmane.comp.version-control.git/215054/focus=244325 Yes, the whole thread is worth a read, but the approach suggested by that article ($gmane/244325) is very good for its simplicity. The server end programs, upload-pack and receive-pack, need only learn to advertise the availability of upload-pack-v2 and receive-pack-v2 services, and the client side programs, fetch-pack and send-pack, need only notice the advertisement and record the availability of v2 counterparts for the current remote *and* continue the exchange in the v1 protocol. That way, there is very little risk of breaking anything. And the programs for the new protocol exchange do not have to worry about having to talk with older counterparts and downgrading the protocol inline at all. As long as we learn from our past mistakes and make sure that the very initial exchange is kept short (one of the items in the list of limitations, $gmane/264000), future servers and clients can upgrade the protocol they talk inline by probing capabilities, just like the current protocol allows them to choose extensions. The biggest issue in the current protocol is not who speaks first (that is merely one aspect) but what is spoken first; iow, one side blindly gives a large message as the first thing, which cannot be squelched by capability exchange. So if we are going to discuss a new protocol, I'd prefer to see the discussion without worrying too much about how to inter-operate with the current vintage of Git.
It is no longer an interesting problem, as we know how to solve it with minimum risk. Instead, I'd like to see us design the new protocol in such a way that it is in-line upgradable without repeating our past mistakes. I am *not* convinced that we want multiple suites of protocols that must be chosen from to suit the use pattern, as mentioned somewhere upthread, by the way. Thanks.
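[The "large message spoken first" Junio objects to is the v1 ref advertisement, which travels in pkt-line framing: each packet carries a four-hex-digit length that counts the length field itself, and "0000" is the flush packet. A minimal sketch of that framing follows; the capability string at the end is illustrative, and "upload-pack-v2" is a hypothetical advertisement token, not an existing capability:]

```python
# Sketch of pkt-line framing as used by the v1 protocol. Git caps a
# packet at 65520 bytes total (LARGE_PACKET_MAX), leaving at most
# 65516 bytes of payload under the 4-hex-digit length prefix.

MAX_PAYLOAD = 65516
FLUSH = b"0000"  # zero-length "flush" packet ends a section

def pkt_line(payload: bytes) -> bytes:
    if len(payload) > MAX_PAYLOAD:
        raise ValueError("payload too large for a single pkt-line")
    return b"%04x" % (len(payload) + 4) + payload

print(pkt_line(b"hello"))  # b'0009hello'

# In v1, capabilities hide after a NUL byte on the first advertised
# ref line; a hypothetical "upload-pack-v2" token could ride there:
first_ref = b"0" * 40 + b" HEAD\x00multi_ack side-band-64k upload-pack-v2\n"
framed = pkt_line(first_ref)
```

[This also shows why the 64k limit is a natural hedge against hostile peers: the receiver never has to allocate more than one small packet's worth of memory on the strength of an untrusted length field.]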
Re: [RFC/PATCH 0/3] protocol v2
On Tue, Feb 24, 2015 at 10:12 AM, Stefan Beller sbel...@google.com wrote: One of the biggest problems of a new protocol would be deployment, as the users probably would not care too deeply. It should just work, in the sense that the user should not even sense that the protocol changed. Agreed. To do so we need to make sure the protocol is backwards compatible and works if an old client talks to a new server as well as the other way round. It's very hard to keep backward compatibility if you want to stop the initial ref advertisement, which is costly when there are lots of refs. But we can let both protocols run in parallel, with the old one advertising the presence of the new one. Then the client could switch to the new protocol gradually. This way the new protocol could forget about backward compatibility. See http://thread.gmane.org/gmane.comp.version-control.git/215054/focus=244325 -- Duy