Re: [RFC/WIP PATCH 11/11] Document protocol version 2
Jeff King p...@peff.net writes: We should definitely _not_ add anything that scales with the repository size. For instance, the symref field only shows the HEAD now. That's OK, as it's constant size. I agree that this is an easy-to-explain rule to keep the design sensible. We do not show _all_ symrefs right now because of pkt-line length limitations. With v2, we could in theory start doing so (one per pkt-line). But that does not make it a good idea. Very true. If we want symref information conveyed, then we should either make a new symref advertisement available (and it is OK to use a single-bit capability to enable or disable that new section), or make the ref advertisement natively capable of always doing so (i.e. if there is no need to make this optional). -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC/WIP PATCH 11/11] Document protocol version 2
On Mon, Jun 01, 2015 at 04:40:54PM -0700, Stefan Beller wrote: Thinking about this further, maybe it is a good idea to restrict the capabilities advertising to alphabetical order? This seems like an unnecessary restriction. The main impetus seems to be: So how does parse_capability scale w.r.t the number of capabilities? If parse_capability is just a linear search then it is O(n) and with n capabilities the client faces an O(n^2) computation which is bad. So if we were to require alphabetic capabilities, you could internally keep track and the whole operation is O(n). I just wonder if this is premature optimization or some thought we need to think of. but that is an implementation problem, not a protocol problem. The parsing side can easily use something better than O(n) lookups (e.g., a binary search). I think we can live with O(n lg n) to parse the whole list. A true in-order merge would be O(n), but the difference is probably negligible here, because... Now if we assume the number of capabilities grows over time a lot (someone may abuse it for a cool feature, similar to the refs currently. Nobody thought about having so many refs in advance) I think if it grows without bound, we are doing it wrong. We should generally only be adding capabilities that require a single short tag in the list (server supports foo, client asks for foo). I'd find it acceptable to add ones that repeat, as long as we are very sure that the repetition is going to be small, or scales with the size of the config (e.g., servers says here is a mirror you can seed from; here is another mirror you can seed from). We should definitely _not_ add anything that scales with the repository size. For instance, the symref field only shows the HEAD now. That's OK, as it's constant size. We do not show _all_ symrefs right now because of pkt-line length limitations. With v2, we could in theory start doing so (one per pkt-line). But that does not make it a good idea. The right way to implement that is: 1. the server says I can tell you about symrefs 2. client says OK, I understand how to parse your symref data 3. for each ref in the advertisement, tack on \0symref=... (or whatever). The capability portion of the conversation remains constant-sized, and the O(# of refs) portion is part of the ref advertisement, which is inherently of that magnitude. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC/WIP PATCH 11/11] Document protocol version 2
Stefan Beller sbel...@google.com writes: I think the recent issue with the push certificates shows that having arbitrary data after the = is a bad idea. I do not think push certificate episode tells any such thing. It was about not carefully using cryptography with arbitrary data. How that arbitrary data came to the machinery is irrelevant. We could have used base-64 to encode the server nonce when transferring it to the machinery via capability, but then decode it in order to place it in the cerficiate. Do not restrict transport for such a reason and make legitimate uses of the transport unnecessarily harder for later users. What needs to be done is to think how the data that was transport was used. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC/WIP PATCH 11/11] Document protocol version 2
On Fri, May 29, 2015 at 3:21 PM, Jeff King p...@peff.net wrote: On Fri, May 29, 2015 at 02:52:14PM -0700, Junio C Hamano wrote: Currently we can do a = as part of the line after the first ref, such as symref=HEAD:refs/heads/master agent=git/2:2.4.0 so I thought we want to keep this. I do not understand that statement. Capability exchange in v2 is one packet per cap, so the above example would be expressed as: symref=HEAD:refs/heads/master agent=git/2:2.4.0 right? Your keyvaluepair is limited to [a-z0-9-_=]*, and neither of the above two can be expressed with that, which was why I said you need two different set of characters before and after =. Left hand side of = is tightly limited and that is OK. Right hand side may contain characters like ':', '.' and '/', so your alphabet need to be more lenient, even in v1 (which I would imagine would be any octet other than SP, LF and NUL). I think the recent issue with the push certificates shows that having arbitrary data after the = is a bad idea. So we need to be very cautious when to allow which data after the =. I'll try split up the patch. Yes. See git_user_agent_sanitized(), for example, which allows basically any printable ASCII except for SP. I think the v2 capabilities do not even need to have that restriction. It can allow arbitrary binary data, because it has an 8bit-clean framing mechanism (pkt-lines). Of course, that means such capabilities cannot be represented in a v1 conversation (whose framing mechanism involves SP and NUL). But it's probably acceptable to introduce new capabilities which are only available in a v2 conversation. Old clients that do not understand v2 would not understand the capability either. It does require new clients implementing the capability to _also_ implement v2 if they have not done so, but I do not mind pushing people in that direction. The initial v2 client implementation should probably do a few cautionary things, then: 1. Do _not_ fold the per-pkt capabilities into a v1 string; that loses the robust framing. I suggested string_list earlier, but probably we want a list of ptr/len pair, so that it can remain NUL-clean. 2. Avoid holding on to unknown packets longer than necessary. Some capability pkt-lines may be arbitrarily large (up to 64K). If we do not understand them during the v2 read of the capabilities, there is no point hanging on to them. It's not _wrong_ to do so, but just inefficient; if we know that clients will just throw away unknown packets, then we can later introduce new packets with large data, without worrying about wasting the client's resources. I suspect it's not that big a deal either way, though. I have no plans for sending a bunch of large packets, and anyway network bandwidth is probably more precious than client memory. That's very sensible thoughts after rereading this email. The version I'll be sending out today will not follow those suggestions though. :( -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC/WIP PATCH 11/11] Document protocol version 2
On Mon, Jun 1, 2015 at 4:14 PM, Stefan Beller sbel...@google.com wrote: On Fri, May 29, 2015 at 3:21 PM, Jeff King p...@peff.net wrote: On Fri, May 29, 2015 at 02:52:14PM -0700, Junio C Hamano wrote: Currently we can do a = as part of the line after the first ref, such as symref=HEAD:refs/heads/master agent=git/2:2.4.0 so I thought we want to keep this. I do not understand that statement. Capability exchange in v2 is one packet per cap, so the above example would be expressed as: symref=HEAD:refs/heads/master agent=git/2:2.4.0 right? Your keyvaluepair is limited to [a-z0-9-_=]*, and neither of the above two can be expressed with that, which was why I said you need two different set of characters before and after =. Left hand side of = is tightly limited and that is OK. Right hand side may contain characters like ':', '.' and '/', so your alphabet need to be more lenient, even in v1 (which I would imagine would be any octet other than SP, LF and NUL). I think the recent issue with the push certificates shows that having arbitrary data after the = is a bad idea. So we need to be very cautious when to allow which data after the =. I'll try split up the patch. Yes. See git_user_agent_sanitized(), for example, which allows basically any printable ASCII except for SP. I think the v2 capabilities do not even need to have that restriction. It can allow arbitrary binary data, because it has an 8bit-clean framing mechanism (pkt-lines). Of course, that means such capabilities cannot be represented in a v1 conversation (whose framing mechanism involves SP and NUL). But it's probably acceptable to introduce new capabilities which are only available in a v2 conversation. Old clients that do not understand v2 would not understand the capability either. It does require new clients implementing the capability to _also_ implement v2 if they have not done so, but I do not mind pushing people in that direction. The initial v2 client implementation should probably do a few cautionary things, then: 1. Do _not_ fold the per-pkt capabilities into a v1 string; that loses the robust framing. I suggested string_list earlier, but probably we want a list of ptr/len pair, so that it can remain NUL-clean. 2. Avoid holding on to unknown packets longer than necessary. Some capability pkt-lines may be arbitrarily large (up to 64K). If we do not understand them during the v2 read of the capabilities, there is no point hanging on to them. It's not _wrong_ to do so, but just inefficient; if we know that clients will just throw away unknown packets, then we can later introduce new packets with large data, without worrying about wasting the client's resources. I suspect it's not that big a deal either way, though. I have no plans for sending a bunch of large packets, and anyway network bandwidth is probably more precious than client memory. That's very sensible thoughts after rereading this email. The version I'll be sending out today will not follow those suggestions though. :( Thinking about this further, maybe it is a good idea to restrict the capabilities advertising to alphabetical order? The exchange would look like this: server: for capability in list: pkt_write(capability) pkt_flush client: do line = recv_pkt() parse_capability(line) while line != flush with parse_capability checking if we know the capability and maybe setting some internal field if we know this capability. Now if we assume the number of capabilities grows over time a lot (someone may abuse it for a cool feature, similar to the refs currently. Nobody thought about having so many refs in advance) So how does parse_capability scale w.r.t the number of capabilities? If parse_capability is just a linear search then it is O(n) and with n capabilities the client faces an O(n^2) computation which is bad. So if we were to require alphabetic capabilities, you could internally keep track and the whole operation is O(n). I just wonder if this is premature optimization or some thought we need to think of. To prevent this problem from popping up, it must be easier to introduce a new phase after the capabilities exchange than to just abuse the capabilities phase for whatever you plan on doing. Thanks, Stefan -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC/WIP PATCH 11/11] Document protocol version 2
On Fri, May 29, 2015 at 02:52:14PM -0700, Junio C Hamano wrote: Currently we can do a = as part of the line after the first ref, such as symref=HEAD:refs/heads/master agent=git/2:2.4.0 so I thought we want to keep this. I do not understand that statement. Capability exchange in v2 is one packet per cap, so the above example would be expressed as: symref=HEAD:refs/heads/master agent=git/2:2.4.0 right? Your keyvaluepair is limited to [a-z0-9-_=]*, and neither of the above two can be expressed with that, which was why I said you need two different set of characters before and after =. Left hand side of = is tightly limited and that is OK. Right hand side may contain characters like ':', '.' and '/', so your alphabet need to be more lenient, even in v1 (which I would imagine would be any octet other than SP, LF and NUL). Yes. See git_user_agent_sanitized(), for example, which allows basically any printable ASCII except for SP. I think the v2 capabilities do not even need to have that restriction. It can allow arbitrary binary data, because it has an 8bit-clean framing mechanism (pkt-lines). Of course, that means such capabilities cannot be represented in a v1 conversation (whose framing mechanism involves SP and NUL). But it's probably acceptable to introduce new capabilities which are only available in a v2 conversation. Old clients that do not understand v2 would not understand the capability either. It does require new clients implementing the capability to _also_ implement v2 if they have not done so, but I do not mind pushing people in that direction. The initial v2 client implementation should probably do a few cautionary things, then: 1. Do _not_ fold the per-pkt capabilities into a v1 string; that loses the robust framing. I suggested string_list earlier, but probably we want a list of ptr/len pair, so that it can remain NUL-clean. 2. Avoid holding on to unknown packets longer than necessary. Some capability pkt-lines may be arbitrarily large (up to 64K). If we do not understand them during the v2 read of the capabilities, there is no point hanging on to them. It's not _wrong_ to do so, but just inefficient; if we know that clients will just throw away unknown packets, then we can later introduce new packets with large data, without worrying about wasting the client's resources. I suspect it's not that big a deal either way, though. I have no plans for sending a bunch of large packets, and anyway network bandwidth is probably more precious than client memory. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC/WIP PATCH 11/11] Document protocol version 2
Stefan Beller sbel...@google.com writes: +Capability discovery (v2) +- ... + capability-list = *(capability) [agent LF] flush-pkt + capability = PKT-LINE(capability: keyvaluepair LF) + agent= keyvaluepair LF + keyvaluepair = 1*(LC_ALPHA / DIGIT / - / _ / =) What is the = doing there? If you meant to cover things like lang=en with this, I do not think it is a good idea. Rather, it should be more like this: capability = 1*(LC_ALPHA / DIGIT / - / _) [ = value ] value = 0*( any octet other than LF, NUL ) in order to leave us wiggle room to have more than very limited subset of US-ASCII in 'value'. I suspect that we may want to allow anything other than LF (unlike v1 that allowed anything other than SP and LF). Currently we can do a = as part of the line after the first ref, such as symref=HEAD:refs/heads/master agent=git/2:2.4.0 so I thought we want to keep this. I do not understand that statement. Capability exchange in v2 is one packet per cap, so the above example would be expressed as: symref=HEAD:refs/heads/master agent=git/2:2.4.0 right? Your keyvaluepair is limited to [a-z0-9-_=]*, and neither of the above two can be expressed with that, which was why I said you need two different set of characters before and after =. Left hand side of = is tightly limited and that is OK. Right hand side may contain characters like ':', '.' and '/', so your alphabet need to be more lenient, even in v1 (which I would imagine would be any octet other than SP, LF and NUL). -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC/WIP PATCH 11/11] Document protocol version 2
Stefan Beller sbel...@google.com writes: @@ -1,11 +1,11 @@ Packfile transfer protocols === -Git supports transferring data in packfiles over the ssh://, git:// and +Git supports transferring data in packfiles over the ssh://, git://, http:// and When you have chance, can you do things like this, which is a clear improvement of the current document even if we never had v2, as separate patches? +Capability discovery (v2) +- ... + capability-list = *(capability) [agent LF] flush-pkt + capability = PKT-LINE(capability: keyvaluepair LF) + agent= keyvaluepair LF + keyvaluepair = 1*(LC_ALPHA / DIGIT / - / _ / =) What is the = doing there? If you meant to cover things like lang=en with this, I do not think it is a good idea. Rather, it should be more like this: capability = 1*(LC_ALPHA / DIGIT / - / _) [ = value ] value = 0*( any octet other than LF, NUL ) in order to leave us wiggle room to have more than very limited subset of US-ASCII in 'value'. I suspect that we may want to allow anything other than LF (unlike v1 that allowed anything other than SP and LF). + LC_ALPHA = %x61-7A + + +The client MUST ignore any data on pkt-lines starting with anything +different than capability for future ease of extension. + +The client MUST NOT ask for capabilities the server did not say it +supports. The server MUST diagnose and abort if capabilities it does +not understand was requested. The server MUST NOT ignore capabilities +that client requested and server advertised. As a consequence of these +rules, server MUST NOT advertise capabilities it does not understand. I think it was already discussed that we shouldn't do the capability: and cap: prefixes in reviews of earlier parts, so the details of this part would be updated? @@ -154,10 +203,14 @@ If HEAD is a valid ref, HEAD MUST appear as the first advertised ref. If HEAD is not a valid ref, HEAD MUST NOT appear in the advertisement list at all, but other refs may still appear. -The stream MUST include capability declarations behind a NUL on the -first ref. The peeled value of a ref (that is ref^{}) MUST be -immediately after the ref itself, if presented. A conforming server -MUST peel the ref if it's an annotated tag. +In version 1 the stream MUST include capability declarations behind +a NUL on the first ref. The peeled value of a ref (that is ref^{}) +MUST be immediately after the ref itself, if presented. A conforming +server MUST peel the ref if it's an annotated tag. + +In version 2 the capabilities are already negotiated, so the first ref +MUST NOT be followed by any capability advertisement, but it should be +treated as any other refs advertising line. Sensible. @@ -178,13 +231,28 @@ MUST peel the ref if it's an annotated tag. shallow = PKT-LINE(shallow SP obj-id) capability-list = capability *(SP capability) - capability = 1*(LC_ALPHA / DIGIT / - / _) + capability = 1*(LC_ALPHA / DIGIT / - / _ / =) Ditto. Thanks. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC/WIP PATCH 11/11] Document protocol version 2
On Fri, May 29, 2015 at 1:35 PM, Junio C Hamano gits...@pobox.com wrote: Stefan Beller sbel...@google.com writes: @@ -1,11 +1,11 @@ Packfile transfer protocols === -Git supports transferring data in packfiles over the ssh://, git:// and +Git supports transferring data in packfiles over the ssh://, git://, http:// and When you have chance, can you do things like this, which is a clear improvement of the current document even if we never had v2, as separate patches? will do. +Capability discovery (v2) +- ... + capability-list = *(capability) [agent LF] flush-pkt + capability = PKT-LINE(capability: keyvaluepair LF) + agent= keyvaluepair LF + keyvaluepair = 1*(LC_ALPHA / DIGIT / - / _ / =) What is the = doing there? If you meant to cover things like lang=en with this, I do not think it is a good idea. Rather, it should be more like this: capability = 1*(LC_ALPHA / DIGIT / - / _) [ = value ] value = 0*( any octet other than LF, NUL ) in order to leave us wiggle room to have more than very limited subset of US-ASCII in 'value'. I suspect that we may want to allow anything other than LF (unlike v1 that allowed anything other than SP and LF). Currently we can do a = as part of the line after the first ref, such as symref=HEAD:refs/heads/master agent=git/2:2.4.0 so I thought we want to keep this. And below I just corrected what I thought was a difference between documentation and implementation. + LC_ALPHA = %x61-7A + + +The client MUST ignore any data on pkt-lines starting with anything +different than capability for future ease of extension. + +The client MUST NOT ask for capabilities the server did not say it +supports. The server MUST diagnose and abort if capabilities it does +not understand was requested. The server MUST NOT ignore capabilities +that client requested and server advertised. As a consequence of these +rules, server MUST NOT advertise capabilities it does not understand. I think it was already discussed that we shouldn't do the capability: and cap: prefixes in reviews of earlier parts, so the details of this part would be updated? sure @@ -154,10 +203,14 @@ If HEAD is a valid ref, HEAD MUST appear as the first advertised ref. If HEAD is not a valid ref, HEAD MUST NOT appear in the advertisement list at all, but other refs may still appear. -The stream MUST include capability declarations behind a NUL on the -first ref. The peeled value of a ref (that is ref^{}) MUST be -immediately after the ref itself, if presented. A conforming server -MUST peel the ref if it's an annotated tag. +In version 1 the stream MUST include capability declarations behind +a NUL on the first ref. The peeled value of a ref (that is ref^{}) +MUST be immediately after the ref itself, if presented. A conforming +server MUST peel the ref if it's an annotated tag. + +In version 2 the capabilities are already negotiated, so the first ref +MUST NOT be followed by any capability advertisement, but it should be +treated as any other refs advertising line. Sensible. @@ -178,13 +231,28 @@ MUST peel the ref if it's an annotated tag. shallow = PKT-LINE(shallow SP obj-id) capability-list = capability *(SP capability) - capability = 1*(LC_ALPHA / DIGIT / - / _) + capability = 1*(LC_ALPHA / DIGIT / - / _ / =) Ditto. Thanks. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html