Re: [PATCH] protocol upload-pack-v2
After looking at $gmane/264000 again, maybe the client should talk
first, stating all the relevant information it wants to get. I realize
this is not part of capabilities, so maybe it could even come before
them, such as:

Client: All I want to do is an ls-remote, so only Phase 2, no
        transmission of blobs in Phase 3
Server: ok
Client [as in the previous mail]: Last time we talked you advertised
        hashed to $SHA
Server: that's correct!

As the server knows the client doesn't want to know about Phase 3, it
can omit things relevant to that phase, such as the signed push nonce.
So from a high-level perspective we'd maybe need 4 phases, like:

Phase 0) declare the intent (fetch/push all or partial parts)
Phase 1) negotiation of capabilities
Phase 2) ref advertisement (i.e. changes in the DAG end points)
Phase 3) transmitting the missing blobs.

The problem may be that phases 0 and 1 may require mixing, as you may
want to declare new things to do in 0) which you would have needed to
advertise as a capability in 1). So maybe we need to swap that around:

Phase 1a) negotiation of capabilities
Phase 1b) negotiation of intent (fetch/push of all/few branches in
          full/shallow/narrow fashion)
Phase 2) ref advertisement (i.e. changes in the DAG end points)
Phase 3) transmitting the missing blobs.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
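A sketch of what such a Phase 0 intent declaration could look like on
the wire, reusing the existing pkt-line framing; the "intent=" key is
purely hypothetical, not anything the current protocol defines:

```python
def pkt_line(payload: str) -> bytes:
    """Frame one pkt-line: 4 hex digits of total length, then the payload."""
    data = payload.encode()
    return b"%04x" % (len(data) + 4) + data

FLUSH_PKT = b"0000"  # zero-length packet marking the end of a section

# Hypothetical Phase 0: the client states upfront that it only wants the
# ref advertisement (ls-remote), so the server can skip Phase 3 material
# such as the signed-push nonce.
request = pkt_line("intent=ls-remote\n") + FLUSH_PKT
```

This sketch is only the client side of the exchange; the server would
read up to the flush-pkt before replying "ok".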
Re: [PATCH] protocol upload-pack-v2
On Thu, Apr 2, 2015 at 3:18 PM, Martin Fick wrote:
>> The current protocol has the following problems that limit us:
>>
>> - It is not easy to make it resumable, because we recompute every
>>   time. This is especially problematic for the initial fetch aka
>>   "clone" as we will be talking about a large transfer. Redirection
>>   to a bundle hosted on a CDN might be something we could do
>>   transparently.
>>
>> - The protocol extension has a fairly low length limit.
>>
>> - Because the protocol exchange starts by the server side
>>   advertising all its refs, even when the fetcher is interested in
>>   a single ref, the initial overhead is nontrivial, especially when
>>   you are doing a small incremental update. The worst case is an
>>   auto-builder that polls every five minutes, even when there are
>>   no new commits to be fetched.
>
> A lot of focus about the problems with ref advertisement is about
> the obvious problem mentioned above (a bad problem indeed). I would
> like to add that there is another related problem that all potential
> solutions to the above problem do not necessarily improve. When
> polling regularly there is also no current efficient way to check on
> the current state of all refs. It would be nice to also be able to
> get an incremental update on large ref spaces.

I think once the new protocol is in place, the server could advertise
the capability to send a differential of refs. To make sure that
works, the capability phase should be strictly separated from the
rest, so you can think of any new fancy scheme to transmit refs or
objects, and once both client and server agree on that fancy scheme,
both know when to expect the "new changed" protocol. So from a
high-level perspective it should look like:

Phase 1) negotiation of capabilities
Phase 2) ref advertisement (i.e. changes in the DAG end points)
Phase 3) transmitting the missing blobs.
The crucial point now is to make sure Phase 1) does not grow too large
in transmission size / required compute power (/ complexity). And as
everybody out there wants to invent new schemes for doing 2) and 3)
efficiently, I wonder if we need to do Phase 1) as a differential as
well, so I'd presume the optimum could look like:

Client: Last time we talked the capabilities you advertised hashed to $SHA
Server: That's right, but additionally I have "push_cert_nonce=$value"

In the non-optimal case:

Client: Last time we talked the capabilities you advertised hashed to $SHA
Server: I don't know that value, here comes the list of all
        capabilities I can do: ... ...

I like that approach as it would really break down to transmitting the
minimal amount of information most of the time. The downside is
knowing which capabilities are cacheable and hence hashable, such that
the remote side needs to maintain only a very small set of advertised
capability lists and their hashes. For example the nonce for signed
pushes will hopefully never be the same, so it makes no sense to have
it inside the capabilities cache.

Having such a capabilities cache would give us a long time until the
phase to negotiate the capabilities grows too large again (most of the
capabilities, I'd assume, are rather static per server). And the way I
understand the current situation, it's all about getting this early
negotiation phase right, which then allows us to model the refs
advertisement and the blob transmission later on as a response to
upcoming problems in the future.

> Thanks,
>
> -Martin
>
> --
> The Qualcomm Innovation Center, Inc. is a member of Code
> Aurora Forum, hosted by The Linux Foundation
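The capability-cache idea above can be sketched as follows; excluding
the push-certificate nonce mirrors the point that volatile values must
stay out of the hashed list (the function name and the
"push_cert_nonce=" filtering are illustrative, not an actual Git
interface):

```python
import hashlib

def capability_hash(caps):
    """Hash the cacheable part of an advertised capability list.

    Sorting makes the digest independent of advertisement order, and
    volatile entries like the signed-push nonce are excluded, since a
    fresh nonce would invalidate the cache on every connection.
    """
    stable = sorted(c for c in caps if not c.startswith("push_cert_nonce="))
    return hashlib.sha1("\n".join(stable).encode()).hexdigest()

server_caps = ["thin-pack", "ofs-delta", "lang", "push_cert_nonce=abc123"]
advertised = capability_hash(server_caps)

# Client: "Last time we talked the capabilities you advertised hashed to $SHA"
client_cached = advertised
if client_cached == advertised:
    # Server: "That's right, but additionally I have push_cert_nonce=..."
    full_list_needed = False
else:
    # Server: "I don't know that value, here comes the full list ..."
    full_list_needed = True
```

Because the nonce is filtered out before hashing, the server can keep
one cached digest per (rarely changing) static capability set.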
Re: [PATCH] protocol upload-pack-v2
Martin Fick writes:
>> The current protocol has the following problems that limit us:
>>
>> - It is not easy to make it resumable, because we recompute every
>>   time. This is especially problematic for the initial fetch aka
>>   "clone" as we will be talking about a large transfer. Redirection
>>   to a bundle hosted on a CDN might be something we could do
>>   transparently.
>>
>> - The protocol extension has a fairly low length limit.
>>
>> - Because the protocol exchange starts by the server side
>>   advertising all its refs, even when the fetcher is interested in
>>   a single ref, the initial overhead is nontrivial, especially when
>>   you are doing a small incremental update. The worst case is an
>>   auto-builder that polls every five minutes, even when there are
>>   no new commits to be fetched.
>
> A lot of focus about the problems with ref advertisement is about
> the obvious problem mentioned above (a bad problem indeed). I would
> like to add that there is another related problem that all potential
> solutions to the above problem do not necessarily improve. When
> polling regularly there is also no current efficient way to check on
> the current state of all refs. It would be nice to also be able to
> get an incremental update on large ref spaces.

Yes, incremental ref update is an important problem to solve. I think
one potential solution was indeed mentioned to improve that exact
issue, e.g. footnote #3 in $gmane/264000.
Re: [PATCH] protocol upload-pack-v2
> The current protocol has the following problems that limit us:
>
> - It is not easy to make it resumable, because we recompute every
>   time. This is especially problematic for the initial fetch aka
>   "clone" as we will be talking about a large transfer. Redirection
>   to a bundle hosted on a CDN might be something we could do
>   transparently.
>
> - The protocol extension has a fairly low length limit.
>
> - Because the protocol exchange starts by the server side
>   advertising all its refs, even when the fetcher is interested in
>   a single ref, the initial overhead is nontrivial, especially when
>   you are doing a small incremental update. The worst case is an
>   auto-builder that polls every five minutes, even when there are
>   no new commits to be fetched.

A lot of focus about the problems with ref advertisement is about the
obvious problem mentioned above (a bad problem indeed). I would like
to add that there is another related problem that all potential
solutions to the above problem do not necessarily improve. When
polling regularly there is also no current efficient way to check on
the current state of all refs. It would be nice to also be able to
get an incremental update on large ref spaces.

Thanks,

-Martin

--
The Qualcomm Innovation Center, Inc. is a member of Code Aurora
Forum, hosted by The Linux Foundation
Re: [PATCH] protocol upload-pack-v2
Duy Nguyen writes:
> On Wed, Apr 1, 2015 at 2:58 AM, Junio C Hamano wrote:
>> This is a follow-up on $gmane/264553, which is a continuation of
>> $gmane/264000, but instead of giving two required readings to
>> readers, I'll start with reproduction of the two, and add a few more
>> things the current protocol lacks that I would want to see in the
>> updated protocol.
>
> I think the important thing to get v2 started is making sure we do
> not need v3 to get rid of any of these limitations. In other words
> v2 should be extensible enough to implement them later.

Yes.

>> ...
>
> I don't have an answer to this one. So the reaction is, ...
> "broken" (in practice, not in theory), don't touch it. I know I'm
> burying my head in the sand...
>
> ...
>
> Introducing the ref exchange in push-pack could be done in an
> extension too, I think.

Thanks, but you do not have to "solve" these sample issues here. As
you said above, your primary goal at this stage is to make sure v2 is
extensible to cover future needs, and in order to do so, you need to
make sure the "what's lacking" list is fairly complete. You are not
helping from that point of view.

I'd like to see a new protocol that lets us overcome the limitations
(did I miss others? I am sure people can help here) sometime this
year.
Re: [PATCH] protocol upload-pack-v2
On Wed, Apr 1, 2015 at 2:58 AM, Junio C Hamano wrote:
> This is a follow-up on $gmane/264553, which is a continuation of
> $gmane/264000, but instead of giving two required readings to
> readers, I'll start with reproduction of the two, and add a few more
> things the current protocol lacks that I would want to see in the
> updated protocol.

I think the important thing to get v2 started is making sure we do not
need v3 to get rid of any of these limitations. In other words v2
should be extensible enough to implement them later. I'm looking from
this perspective.

> The current protocol has the following problems that limit us:
>
> - It is not easy to make it resumable, because we recompute every
>   time. This is especially problematic for the initial fetch aka
>   "clone" as we will be talking about a large transfer. Redirection
>   to a bundle hosted on a CDN might be something we could do
>   transparently.

Sending multiple packs or some redirection instructions could be done
even with v1. The only recompute part that is unavoidable in v1 is ref
advertisement, which I think is solved.

> - The protocol extension has a fairly low length limit.

One pkt-line per protocol extension should do it.

> - Because the protocol exchange starts by the server side
>   advertising all its refs, even when the fetcher is interested in
>   a single ref, the initial overhead is nontrivial, especially when
>   you are doing a small incremental update. The worst case is an
>   auto-builder that polls every five minutes, even when there are
>   no new commits to be fetched.

One of the reasons v2 was started; this should be OK with the current
v2 design.

> - Because we recompute every time, taking into account what the
>   fetcher has, in addition to what the fetcher obtained earlier
>   from us in order to reduce the transferred bytes, the payload for
>   incremental updates becomes tailor-made for each fetch and cannot
>   be easily reused.
Well, we reuse at a lower level: pack-objects would try to copy
existing deltas instead of making new ones. We can cache new deltas in
the hope that they may be useful for the next fetch. But that has
nothing to do with the protocol...

> - The semantics of the side-bands are unclear.
>
>   - Is band #2 meant only for progress output (I think the current
>     protocol handlers assume that and unconditionally squelch it
>     under --quiet)? Do we rather want dedicated "progress" and
>     "error message" sidebands instead?
>
>   - Is band #2 meant for human consumption, or do we expect the
>     other end to interpret and act on it? If the former, would it
>     make sense to send locale information from the client side and
>     ask the server side to produce its output with _("message")?

The interpretation of side-band could be changed by introducing a new
extension, couldn't it?

> - The semantics of packet_flush() is suboptimal, and this
>   shortcoming seeps through to the protocol mapped to the
>   smart-HTTP transport.
>
>   ...

I don't have an answer to this one. So the reaction is, if it is not
"broken" (in practice, not in theory), don't touch it. I know I'm
burying my head in the sand...

> - The fetch-pack direction does the common-parent discovery but the
>   push-pack direction does not. This is OK for the normal
>   fast-forward push, in which case we will see a known commit on
>   the tip of the branch we are pushing into, but makes forced push
>   inefficient.

Introducing the ref exchange in push-pack could be done in an
extension too, I think.

> - The existing common-parent discovery done on the fetch-pack side
>   enumerates commits contiguously, traversing the history to the
>   past. We might want to go exponential or Fibonacci to quickly
>   find an ancient common commit and bisect the history from there
>   (or it might turn out not to be worth it).

Hm... I'm wondering if we can already do this with v1 if we have
enough manpower.
> - We may want to revamp the builtin/receive-pack.c::report() that
>   reports the final result of a push back to the pusher to tell the
>   exact object name that sits at the updated tips of refs, not just
>   refnames. It will allow the server side to accept a push of
>   commit X to a branch, do some "magic" on X (e.g. rebase it on top
>   of the current tip, merge it with the current tip, or let a hook
>   rewrite the commit in any way it deems appropriate) and put the
>   resulting commit Y at the tip of the branch. Without such a
>   revamp, it is currently not possible to sensibly allow the server
>   side to rewrite what got pushed.

Sounds like more coding than changing the protocol; this should be
possible with another extension.

> - If we were to start allowing the receiver side to rewrite pushed
>   commits, the updated send-pack protocol must be able to send the
>   new objects created by that "magic" back to the pusher. The
>   current protocol does not allow the re
Re: [PATCH] protocol upload-pack-v2
Junio C Hamano writes:
> I have a feeling that it is a bit too premature to specify the
> details at such a low level as "capabilities are announced by
> prefixing four-byte 'c', 'a', 'p', ':' in front" and "a multi-record
> group has its element count at the beginning (or an end marker at
> the end, for that matter)", and it may be a better idea to outline
> all the necessary elements at a bit higher level first---that would
> avoid needs for useless exchanges like what we are having right now.
>
> If you keep the discussion at the level like "fetch first asks
> capabilities it wants upload-pack-2 to enable, optionally gives the
> current shallow boundaries when the capability says the other side
> supports it, and then starts showing what it has" while we are
> trying to achieve consensus on what kind of protocol elements we
> would need, and what information each element would carry, the
> discussion will help us reach a shared understanding on what to
> write down in EBNF form exactly faster, I would imagine.

And I see we went silent after this, so let's try to stir the pot
again to see if it simmers.

This is a follow-up on $gmane/264553, which is a continuation of
$gmane/264000, but instead of giving two required readings to readers,
I'll start with a reproduction of the two, and add a few more things
the current protocol lacks that I would want to see in the updated
protocol.

The current protocol has the following problems that limit us:

- It is not easy to make it resumable, because we recompute every
  time. This is especially problematic for the initial fetch aka
  "clone" as we will be talking about a large transfer. Redirection
  to a bundle hosted on a CDN might be something we could do
  transparently.

- The protocol extension has a fairly low length limit.
- Because the protocol exchange starts by the server side advertising
  all its refs, even when the fetcher is interested in a single ref,
  the initial overhead is nontrivial, especially when you are doing a
  small incremental update. The worst case is an auto-builder that
  polls every five minutes, even when there are no new commits to be
  fetched.

- Because we recompute every time, taking into account what the
  fetcher has, in addition to what the fetcher obtained earlier from
  us in order to reduce the transferred bytes, the payload for
  incremental updates becomes tailor-made for each fetch and cannot
  be easily reused.

- The semantics of the side-bands are unclear.

  - Is band #2 meant only for progress output (I think the current
    protocol handlers assume that and unconditionally squelch it
    under --quiet)? Do we rather want dedicated "progress" and
    "error message" sidebands instead?

  - Is band #2 meant for human consumption, or do we expect the
    other end to interpret and act on it? If the former, would it
    make sense to send locale information from the client side and
    ask the server side to produce its output with _("message")?

- The semantics of packet_flush() is suboptimal, and this shortcoming
  seeps through to the protocol mapped to the smart-HTTP transport.

  Originally, packet_flush() was meant as "Here is an end of one
  logical section of what I am going to speak.", hinting that it
  might be a good idea for the underlying implementation to hold the
  packets up to that point in-core and then write(2) them all out
  (i.e. "flush") to the file descriptor only when we handle
  packet_flush(). It never meant "Now I am finished speaking for now
  and it is your turn to speak." But because HTTP is inherently a
  ping-pong protocol where the requestor at one point stops talking
  and lets the responder speak, the code to map our protocol to the
  smart HTTP transport made the packet_flush() boundary into "Now I
  am done talking, it is my turn to listen."
  We probably need two kinds of packet_flush(). When a requestor
  needs to say two or more logical groups of things before telling
  the other side "Now I am done talking; it is your turn.", we need
  some marker (i.e. the original meaning of packet_flush()) at the
  end of these logical groups. And in order to be able to say "Now I
  am done saying everything I need to say at this point for you to
  respond to me. It is your turn.", we need another kind of marker.

- The fetch-pack direction does the common-parent discovery but the
  push-pack direction does not. This is OK for the normal
  fast-forward push, in which case we will see a known commit on the
  tip of the branch we are pushing into, but makes forced push
  inefficient.

- The existing common-parent discovery done on the fetch-pack side
  enumerates commits contiguously, traversing the history to the
  past. We might want to go exponential or Fibonacci to quickly find
  an ancient common commit and bisect the history from there (or it
  might turn out not to be worth it).

- We may want to
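The two markers proposed above could be framed as two distinct control
packets. In this sketch the "0001" delimiter value is a guess for
illustration only, not something the current protocol defines:

```python
import io

FLUSH_PKT = b"0000"  # original meaning: "one logical section ends here"
DELIM_PKT = b"0001"  # hypothetical: "I am done talking; your turn to speak"

def read_pkt(stream):
    """Read one pkt-line and classify it as data or one of the two markers."""
    head = stream.read(4)
    if head == FLUSH_PKT:
        return ("flush", b"")
    if head == DELIM_PKT:
        return ("delim", b"")
    length = int(head, 16)           # the 4 hex digits count themselves too
    return ("data", stream.read(length - 4))

stream = io.BytesIO(b"0009have\n" + FLUSH_PKT + DELIM_PKT)
first = read_pkt(stream)    # a data packet carrying "have\n"
second = read_pkt(stream)   # section boundary within one request
third = read_pkt(stream)    # end of the whole request; responder may speak
```

With two marker types, the smart-HTTP mapping could key its
"turn-taking" off the second marker and leave the first with its
original section-boundary meaning.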
Re: [PATCH] protocol upload-pack-v2
On Mar 9, 2015, at 18:38, Duy Nguyen wrote:
> A minor point on capability negotiation. I remember why I passed
> capabilities via command line instead of this proposal. With this
> proposal, daemon.c does not recognize the "i18n" capability and
> cannot switch to the correct language before it reports an error.
> But perhaps I approached it the wrong way. Instead of letting the
> daemon produce a non-English language directly, it could sort of
> standardize the messages and send an error code back in the "ERR"
> line, together with an English message (the current
> pack-protocol.txt says "ERR" is followed by explanation-text). The
> client can use this error code to print non-English messages if it
> wants to. Perhaps we can reuse HTTP (or FTP) return codes or some
> other protocol and then customize to fit Git's needs.

May I suggest that you take a look at RFC 2034 [1]. It describes
almost exactly this same situation and how SMTP was enhanced to
support error code numbers that could then be translated. No reason
this enhancement needs to be restricted to protocol v2 if it's just an
"enhancedstatuscodes" capability the server sends back to indicate
that its ERR text is in that format.

-Kyle

[1] http://tools.ietf.org/html/rfc2034
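A sketch of how an RFC 2034-style code could ride on the existing ERR
line; the dotted x.y.z code format is borrowed from SMTP enhanced
status codes and is purely illustrative here, as is the example code
value:

```python
import re

def parse_err(line):
    """Split an ERR line into (machine code, English text).

    Under a hypothetical "enhancedstatuscodes" capability the server
    prefixes its explanation-text with an RFC 2034-style x.y.z code;
    legacy servers send plain text, so the code may be None.
    """
    body = line[len("ERR "):]
    m = re.match(r"(\d+\.\d+\.\d+)\s+(.*)", body)
    if m:
        return m.group(1), m.group(2)
    return None, body

code, text = parse_err("ERR 5.7.1 access denied to repository")
# the client could map "5.7.1" to a translated message; 'text' stays English
```

Because legacy ERR lines still parse (with code None), a client
supporting this scheme would remain compatible with old servers.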
Re: [PATCH] protocol upload-pack-v2
A minor point on capability negotiation. I remember why I passed
capabilities via command line instead of this proposal. With this
proposal, daemon.c does not recognize the "i18n" capability and cannot
switch to the correct language before it reports an error.

But perhaps I approached it the wrong way. Instead of letting the
daemon produce a non-English language directly, it could sort of
standardize the messages and send an error code back in the "ERR"
line, together with an English message (the current pack-protocol.txt
says "ERR" is followed by explanation-text). The client can use this
error code to print non-English messages if it wants to. Perhaps we
can reuse HTTP (or FTP) return codes or some other protocol and then
customize to fit Git's needs.

-- Duy
Re: [PATCH] protocol upload-pack-v2
Stefan Beller writes:
>> I do not see a good reason why we want "I am sending N caps"
>> upfront, instead of "this, that, and here is the end of the group".
>
> I thought about having an end marker, so something like
>
>   capabilities start
>   thin-pack
>   lang
>   ofs-delta
>   capabilities done
>
> (Each line a pkt-line.)
>
> Though in all decisions I thought through I tried to put more weight
> on future expandability. Now that I think about it again, it makes
> no difference whether to use a counter or an end marker.

One reason why I would suggest avoiding "count upfront" is to make
sure we do not repeat the mistake of "Content-Length", which had to
later be corrected by introducing chunked transfer by the HTTP folks.
Closer to home, our "type and then size upfront and then contents and
hash the whole thing" loose object format makes it quite hard to
produce without having the whole thing in-core, or otherwise having a
separate way to know the size upfront.

For the capability list, the number of capabilities you support may be
limited, bounded and may even be known at compile time, so
count-upfront may not be a burden, but in other parts of the protocol
where you need to feed the result of computation to the other end, you
would need a "the group ends here" marker. It would be easier for
everybody if we can make all types of messages use the same syntax,
regardless of type.

>>> + cap = PKT-LINE("capabilities" SP size LF list)
>>
>> Isn't a cap packet terminated by LF without any "list" following it?
>> The notation PKT-LINE() is "wrap in a single packet", and I do not
>> think you meant to send the capability line and a series of cap:foo
>> entries in a single packet.
>
> Yeah I meant to use one packet per line.
> So after considering your input, you'd want to have
>
>   PKT-LINE("capabilities start")
>   PKT-LINE("no-prefix-for-capabilities")
>   PKT-LINE("ofs-delta")
>   PKT-LINE("agent-as-capability-2.6")
>   PKT-LINE("capabilities end")

OK, so that "list" at the end is just a typo; there shouldn't be a
"list" at the end inside PKT-LINE().

>>> + size = *DIGIT
>>> + capability-list = *(capability) [agent LF]
>>> + capability = "cap:" keyvaluepair LF
>>> + agent = keyvaluepair LF
>>
>> I do not see a reason to make 'agent' part of capability. That was
>> an implementation detail of v1 but v2 does not have an obligation
>> to consider agent announcement as capability.
>
> So I think we don't want to drop the agent announcement as it may
> reveal useful information ("How many outdated clients connect to our
> $HOSTING_SITE?", "I need to debug failures which happen only rarely.
> Oh! It can be narrowed down to clients with agent xyz.")

Don't be overly defensive and try not to misunderstand and see a
criticism where there is none. All I am saying is that agent
announcement is not announcing capability. You may announce many
things, and server or client version may be something you would want
to announce.

I have a feeling that it is a bit too premature to specify the details
at such a low level as "capabilities are announced by prefixing
four-byte 'c', 'a', 'p', ':' in front" and "a multi-record group has
its element count at the beginning (or an end marker at the end, for
that matter)", and it may be a better idea to outline all the
necessary elements at a bit higher level first---that would avoid
needs for useless exchanges like what we are having right now.
It's that when you write things in EBNF, you are writing something
that you would eventually want to cast in stone, and the non-terminal
names in EBNF matter (they convey the semantics, what these named
things are), and I was reacting to that because I wanted to make sure
we avoid mislabeling things as something that they are not.

The "shallow" vs "reference advertisement" issue is the same. I think
the former is _not_ part of reference announcement but is an optional
phase of the protocol, but the level of detail that would make the
difference matter would appear only when you start writing it in EBNF
and call both "reference advertisement".

If you keep the discussion at the level like "fetch first asks
capabilities it wants upload-pack-2 to enable, optionally gives the
current shallow boundaries when the capability says the other side
supports it, and then starts showing what it has" while we are trying
to achieve consensus on what kind of protocol elements we would need,
and what information each element would carry, the discussion will
help us reach a shared understanding on what to write down in EBNF
form exactly faster, I would imagine.
Re: [PATCH] protocol upload-pack-v2
On Sat, Mar 7, 2015 at 11:28 AM, Stefan Beller wrote:
>>> +
>>> + advertised-refs = (no-refs / list-of-refs)
>>> +   *shallow
>>> +   flush-pkt
>>
>> I am not sure if defining "shallow" as part of "refs advertisement"
>> is a good idea. The latter lives in the same place in the v1
>> protocol merely because that was how it was later bolted onto the
>> protocol. But this concern can easily be allayed by retitling
>> "advertised-refs" to something else.
>
> I don't quite understand this. What are your concerns about shallow
> here?

This made me look for an explanation about these shallow lines (commit
ad49136). They are sent when the source repo is a shallow one. They
tell the receiver about the bottom of the source repo. My experiment
with "git clone --since" shows that such a shallow clone could end up
with a few thousand shallow lines. Not as many as refs, but sending
shallow lines this way still costs more than necessary. If we want to
do that, we need more flexibility than just sending all shallow lines
this way.

One approach: because these shallow lines (on the source repo) rarely
change, we could simply exchange the SHA-1 of the source repo's
"shallow" file first. The client only requests shallow info when the
SHA-1 does not match.

-- Duy
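The digest exchange suggested above might look like this; hashing the
verbatim file contents keeps the server side cheap (the function is a
sketch under that assumption, not an actual Git interface):

```python
import hashlib

def shallow_file_digest(shallow_tips):
    """Digest of a repo's 'shallow' file: one commit SHA-1 per line."""
    contents = "".join(tip + "\n" for tip in shallow_tips)
    return hashlib.sha1(contents.encode()).hexdigest()

# Server advertises only the digest of its shallow file...
server_digest = shallow_file_digest(["1111aaaa", "2222bbbb"])

# ...and the client asks for the (possibly few thousand) shallow lines
# only when the digest differs from what it cached at the previous fetch.
client_cached_digest = shallow_file_digest(["1111aaaa", "2222bbbb"])
must_request_shallow_info = (client_cached_digest != server_digest)
```

Since the source repo's shallow boundary rarely changes, most fetches
would transmit just the one digest line instead of the full list.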
Re: [PATCH] protocol upload-pack-v2
On Fri, Mar 6, 2015 at 4:28 PM, Junio C Hamano wrote:
> Stefan Beller writes:
>
>> @@ -67,7 +74,6 @@ gracefully with an error message.
>>    error-line = PKT-LINE("ERR" SP explanation-text)
>>
>>
>> -
>>  SSH Transport
>
> Noise?
>
>> @@ -124,9 +130,56 @@ has, the first can 'fetch' from the second. This operation determines
>>  what data the server has that the client does not then streams that
>>  data down to the client in packfile format.
>>
>> +Capability discovery (v2)
>> +-
>>
>> +In version 1, capability discovery is part of reference discovery and
>> +covered in reference discovery section.
>> +
>> +In version 2, when the client initially connects, the server
>> +immediately sends its capabilities to the client. Then the client must
>> +send the list of server capabilities it wants to use to the server.
>> +
>> + S: 00XXcapabilities 4\n
>> + S: 00XXcap:lang\n
>> + S: 00XXcap:thin-pack\n
>> + S: 00XXcap:ofs-delta\n
>> + S: 00XXagent:agent=git/2:3.4.5+custom-739-gb850f98\n
>> +
>> + C: 00XXcapabilities 3
>> + C: 00XXcap:thin-pack\n
>> + C: 00XXcap:ofs-delta\n
>> + C: 00XXcap:lang=en\n
>> + C: 00XXagent:agent=git/custom_string\n
>
> I do not see a good reason why we want "I am sending N caps"
> upfront, instead of "this, that, and here is the end of the group".

I thought about having an end marker, so something like

  capabilities start
  thin-pack
  lang
  ofs-delta
  capabilities done

(Each line a pkt-line.)

Though in all decisions I thought through I tried to put more weight
on future expandability. Now that I think about it again, it makes no
difference whether to use a counter or an end marker.

> If you expect the recipient to benefit by being able to pre-allocate
> N slots, then that might make sense, but otherwise, I'd rather see
> us stick to a (weaker) flush that says "group ends here".

I think it's not about pre-allocating but counting down. Then you know
at the beginning how much to expect, which might become relevant if
that section grows large again.
("The server really wants to send 1500 capability lines? Nope, I'll
just disconnect because I am on mobile!") Implementation-wise an end
marker is easier though (you don't need to count down, so it feels
more stateless to me).

> Besides, I do not know how you counted 4 on the S: side and 3 on
> the C: side in the above example and missing LF after 3 ;-).

Sorry about that, I added one answer late and forgot to increment the
3.

>> +
>> + cap = PKT-LINE("capabilities" SP size LF list)
>
> Isn't a cap packet terminated by LF without any "list" following it?
> The notation PKT-LINE() is "wrap in a single packet", and I do not
> think you meant to send the capability line and a series of cap:foo
> entries in a single packet.

Yeah, I meant to use one packet per line. So after considering your
input, you'd want to have

  PKT-LINE("capabilities start")
  PKT-LINE("no-prefix-for-capabilities")
  PKT-LINE("ofs-delta")
  PKT-LINE("agent-as-capability-2.6")
  PKT-LINE("capabilities end")

And additionally to that, a PKT-LINE should have the ability to grow
larger than 0x, which would be encoded with 0x being an escape
character indicating the length is encoded somehow differently. (Maybe
take the next 8 bytes instead of just 4.)

>> + size = *DIGIT
>> + capability-list = *(capability) [agent LF]
>> + capability = "cap:" keyvaluepair LF
>> + agent = keyvaluepair LF
>
> I do not see a reason to make 'agent' part of capability. That was
> an implementation detail of v1 but v2 does not have an obligation
> to consider agent announcement as capability.

So I think we don't want to drop the agent announcement as it may
reveal useful information ("How many outdated clients connect to our
$HOSTING_SITE?", "I need to debug failures which happen only rarely.
Oh! It can be narrowed down to clients with agent xyz.") So then we
need to decide where to put the agent.
And as it is only small but useful (meta-)information, I'd rather put
it at the beginning of the data exchange, so in case the other side
seems to be misbehaving, it is easier to debug, in the hope the agent
transmission was still successful.

> server-announcement = PKT-LINE("capabilities" ...) capability-list
>                       [agent-announcement]
> capability-list = as you have it w/o "[agent LF]"
> agent-announcement = PKT-LINE("agent=" agent-token LF)
>
> or something, perhaps?

This looks to me as if all capabilities are in one PKT-LINE, which you
seemed to oppose?

>> +The client MUST ignore any data before the pkt-line starting with
>> "capabilities"
>> +for future easy of extension.
>
> s/easy/ease/; but I am not sure if this makes sense. Without
> knowing the extended preamble, you cannot even tell if a packet line
> that happens to start with "capabilities" is a proper beginning of
> 0-th iteration of v2 protocol, or an embedded data in the preamble,
> no?

I rather thought about the
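The counter-versus-end-marker trade-off discussed earlier in this message can be sketched with a toy reader (illustrative Python, not git's actual parser in pkt-line.c; packet framing is simplified to a list of decoded strings). The counted form lets a recipient refuse an oversized list before consuming it; the marked form needs no countdown state:

```python
# Toy contrast of the two capability framings under discussion.

def read_caps_counted(lines):
    """First line announces how many capability lines follow,
    e.g. "capabilities 3". A client can bail out early on a huge N."""
    n = int(lines[0].split()[1])
    if n > 1000:  # e.g. a client on a mobile link refuses upfront
        raise RuntimeError("too many capability lines")
    return lines[1:1 + n]

def read_caps_marked(lines):
    """Read until the "capabilities end" marker; no countdown state."""
    caps = []
    for line in lines[1:]:            # skip "capabilities start"
        if line == "capabilities end":
            return caps
        caps.append(line)
    raise RuntimeError("missing end marker")

counted = ["capabilities 3", "thin-pack", "lang", "ofs-delta"]
marked = ["capabilities start", "thin-pack", "lang", "ofs-delta",
          "capabilities end"]
assert read_caps_counted(counted) == read_caps_marked(marked)
```

Both framings decode to the same list; the difference is only when a misbehaving peer can be detected.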
Re: [PATCH] protocol upload-pack-v2
Stefan Beller writes:

> @@ -67,7 +74,6 @@ gracefully with an error message.
>   error-line = PKT-LINE("ERR" SP explanation-text)
>
>
> -
>  SSH Transport

Noise?

> @@ -124,9 +130,56 @@ has, the first can 'fetch' from the second. This operation determines
> what data the server has that the client does not then streams that
> data down to the client in packfile format.
>
> +Capability discovery (v2)
> +-
>
> +In version 1, capability discovery is part of reference discovery and
> +covered in reference discovery section.
> +
> +In version 2, when the client initially connects, the server
> +immediately sends its capabilities to the client. Then the client must
> +send the list of server capabilities it wants to use to the server.
> +
> +  S: 00XXcapabilities 4\n
> +  S: 00XXcap:lang\n
> +  S: 00XXcap:thin-pack\n
> +  S: 00XXcap:ofs-delta\n
> +  S: 00XXagent:agent=git/2:3.4.5+custom-739-gb850f98\n
> +
> +  C: 00XXcapabilities 3
> +  C: 00XXcap:thin-pack\n
> +  C: 00XXcap:ofs-delta\n
> +  C: 00XXcap:lang=en\n
> +  C: 00XXagent:agent=git/custom_string\n

I do not see a good reason why we want "I am sending N caps" upfront,
instead of "this, that, and here is the end of the group".

If you expect the recipient to benefit by being able to pre-allocate
N slots, then that might make sense, but otherwise, I'd rather see
us stick to a (weaker) flush that says "group ends here".

Besides, I do not know how you counted 4 on the S: side and 3 on
the C: side in the above example and missing LF after 3 ;-).

> +
> +  cap = PKT-LINE("capabilities" SP size LF list)

Isn't a cap packet terminated by LF without any "list" following it?
The notation PKT-LINE() is "wrap in a single packet", and I do not
think you meant to send the capability line and a series of cap:foo
entries in a single packet.

> +  size = *DIGIT
> +  capability-list = *(capability) [agent LF]
> +  capability = "cap:" keyvaluepair LF
> +  agent = keyvaluepair LF

I do not see a reason to make 'agent' as part of capability.
That was an implementation detail of v1 but v2 does not have an
obligation to consider agent announcement as capability.

    server-announcement = PKT-LINE("capabilities" ...) capability-list
                          [agent-announcement]
    capability-list = as you have it w/o "[agent LF]"
    agent-announcement = PKT-LINE("agent=" agent-token LF)

or something, perhaps?

> +The client MUST ignore any data before the pkt-line starting with
> "capabilities"
> +for future easy of extension.

s/easy/ease/; but I am not sure if this makes sense. Without knowing
the extended preamble, you cannot even tell if a packet line that
happens to start with "capabilities" is a proper beginning of 0-th
iteration of v2 protocol, or an embedded data in the preamble, no?

> +Reference Discovery (v2)
> +
>
> +In version 2, reference discovery is initiated by the client with
> +"want-refs" line. The client may skip reference discovery phase
> +entirely by not sending "want-refs" (e.g. the client has another way
> +to retrieve ref list).
> +
>
> +  want-refs = PKT-LINE("want-refs" SP mode *argument)
> +  mode = "all"
> +  argument = SP arg
> +  arg = 1*(LC_ALPHA / DIGIT / "-" / "_" / "=")
> +
>
> +Mode "all" sends all visible refs to the client like in version 1. No
> +arguments are accepted in this mode. More modes may be available based
> +on capabilities.

I tend to agree with Duy that the protocol must anticipate that args
can become longer.

> +
> +  advertised-refs = (no-refs / list-of-refs)
> +                    *shallow
> +                    flush-pkt

I am not sure if defining "shallow" as part of "refs advertisement" is
a good idea. The latter lives in the same place in the v1 protocol
merely because that was how it was later bolted onto the protocol. But
this concern can easily be allayed by retitling "advertised-refs" to
something else.
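The point that PKT-LINE() wraps exactly one packet can be made concrete with the existing pkt-line framing the thread builds on: each packet carries its total length as four hex digits (including the four length bytes themselves), and a flush-pkt is the special length "0000". A minimal sketch, with one packet per capability line rather than one packet wrapping the whole list:

```python
def pkt_line(payload: str) -> str:
    """Frame one payload as one packet: four lowercase hex digits of
    total length (payload plus the 4 length bytes), then the payload.
    (Real pkt-lines usually end the payload with LF; omitted here.)"""
    return "%04x%s" % (len(payload) + 4, payload)

FLUSH_PKT = "0000"  # zero length, no payload: "group ends here"

# The proposed one-packet-per-line capability announcement:
stream = "".join(pkt_line(cap) for cap in
                 ["capabilities start", "thin-pack", "ofs-delta",
                  "capabilities end"]) + FLUSH_PKT
assert stream.startswith("0016capabilities start")
```

The "capabilities start"/"capabilities end" payloads here follow the framing Stefan proposed upthread; which exact markers v2 settles on is still under discussion.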
Re: [PATCH] protocol upload-pack-v2
On Sat, Mar 7, 2015 at 6:38 AM, Stefan Beller wrote:
> +Reference Discovery (v2)
> +
>
> +In version 2, reference discovery is initiated by the client with
> +"want-refs" line. The client may skip reference discovery phase
> +entirely by not sending "want-refs" (e.g. the client has another way
> +to retrieve ref list).
> +
>
> +  want-refs = PKT-LINE("want-refs" SP mode *argument)
> +  mode = "all"
> +  argument = SP arg
> +  arg = 1*(LC_ALPHA / DIGIT / "-" / "_" / "=")
> +

Along the same lines as with capabilities, we would not want to run
into a situation where we have too many arguments to send and exceed
the pkt-line limit. So perhaps do one argument per pkt-line here too,
ending with a pkt-flush.
--
Duy
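Duy's one-argument-per-pkt-line suggestion could look like this on the wire (a hypothetical sketch; `pkt_line` below is the standard 4-hex-digit length framing, and `want_refs_request` is an illustrative name, not a git API):

```python
def pkt_line(payload: str) -> str:
    # Standard pkt-line framing: total length (payload + 4 length
    # bytes) as four hex digits, then the payload.
    return "%04x%s" % (len(payload) + 4, payload)

def want_refs_request(mode: str, args) -> str:
    """One pkt-line for 'want-refs <mode>', one per argument, then a
    pkt-flush, so the argument count never hits a single pkt-line's
    length limit."""
    pkts = [pkt_line("want-refs %s" % mode)]
    pkts.extend(pkt_line(arg) for arg in args)
    pkts.append("0000")  # pkt-flush ends the argument group
    return "".join(pkts)

# Mode "all" takes no arguments in the draft grammar:
assert want_refs_request("all", []) == "0011want-refs all0000"
```

With the flush as terminator, future modes can carry arbitrarily many arguments without any change to the framing.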
Re: [PATCH] protocol upload-pack-v2
I'm still wondering if we should reserve more from the packet length.
We have used length 0000 for pkt-flush. Shawn pointed out that we
still have 0001, 0002 and 0003, but we may use some of them to avoid
abuse of pkt-flush in some cases. Perhaps we could limit packet length
to 0xfff0, so we have 0xfff1-0xffff to assign special meanings in
future, if we have to.

On Sat, Mar 7, 2015 at 6:38 AM, Stefan Beller wrote:
> +In version 2, when the client initially connects, the server
> +immediately sends its capabilities to the client. Then the client must
> +send the list of server capabilities it wants to use to the server.
> +
> +  S: 00XXcapabilities 4\n
> +  S: 00XXcap:lang\n
> +  S: 00XXcap:thin-pack\n
> +  S: 00XXcap:ofs-delta\n
> +  S: 00XXagent:agent=git/2:3.4.5+custom-739-gb850f98\n
> +
> +  C: 00XXcapabilities 3
> +  C: 00XXcap:thin-pack\n
> +  C: 00XXcap:ofs-delta\n
> +  C: 00XXcap:lang=en\n
> +  C: 00XXagent:agent=git/custom_string\n
> +
> +
> +  cap = PKT-LINE("capabilities" SP size LF list)
> +  size = *DIGIT
> +  capability-list = *(capability) [agent LF]
> +  capability = "cap:" keyvaluepair LF
> +  agent = keyvaluepair LF
> +  keyvaluepair = 1*(LC_ALPHA / DIGIT / "-" / "_" / "=")

If we send one cap per pkt-line, a cap can contain spaces. The question
is, should we allow them? Ending cap lines with LF seems redundant
because we already know the line length.

> +  LC_ALPHA = %x61-7A
> +
>
> +The client MUST ignore any data before the pkt-line starting with
> "capabilities"
> +for future easy of extension.
> +
> +The server MUST advertise "size" as the decimal number of lines following
> +the "capabilities" line. This includes lines starting "cap:" and "agent:"
> for now.
> +The client MUST ignore lines which start with an unknown pattern.

I think the common pattern in our protocol is to end these with a
pkt-flush, instead of sending the number of items upfront. If we do
that, we don't have to specify "cap:" or "agent:" either.
All pkt-lines until the pkt-flush at the beginning of v2 are cap lines,
and agent is just another "capability".
--
Duy
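Duy's reservation idea can be sketched as a length decoder that treats 0000-0003 as control packets and refuses data lengths above 0xfff0, leaving 0xfff1-0xffff free for future special meanings. This is a sketch of the proposal, not current git behavior (today only 0000 is special):

```python
MAX_DATA_LEN = 0xfff0  # proposed cap on ordinary packet lengths

def classify_pkt_len(four_hex: str):
    """Classify a 4-hex-digit pkt-line length field under the
    proposed reservation scheme."""
    n = int(four_hex, 16)
    if n == 0:
        return ("flush", 0)            # pkt-flush, today's only special
    if n in (1, 2, 3):
        return ("control", n)          # reserved control packets
    if n > MAX_DATA_LEN:
        return ("reserved", n)         # 0xfff1-0xffff: future meanings
    return ("data", n - 4)             # payload length excludes header

assert classify_pkt_len("0000") == ("flush", 0)
assert classify_pkt_len("fff1") == ("reserved", 0xfff1)
assert classify_pkt_len("0010") == ("data", 12)
```

A decoder written this way today would be forward-compatible: old clients could at least recognize a reserved length and fail loudly instead of misparsing the stream.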
Re: [PATCH] protocol upload-pack-v2
On Fri, Mar 6, 2015 at 3:38 PM, Stefan Beller wrote:
> From: Duy Nguyen

Oops. I edited the proposal from Duy heavily(?), such that it is
different from what he proposed 4 days ago. My impression is that this
is what most of the participants would agree on.