Re: [PATCH] protocol upload-pack-v2

2015-04-02 Thread Stefan Beller
After looking at $gmane/264000 again, maybe the client should talk first,
stating all the relevant information it wants to get. I realize this is not
part of capabilities, so maybe it could even come before them, such as:

Client: All I want to do is an ls-remote, so only Phase 2, no
transmission of blobs in Phase 3
Server: ok
Client [as in the previous mail]: Last time we talked, the capabilities
you advertised hashed to $SHA
Server: that's correct!

As the server knows the client doesn't want to know about Phase 3, it can
omit things relevant to that phase, such as the signed push nonce.

So from a high-level perspective we'd maybe need 4 phases, like:
Phase 0) declare the intent (fetch/push all or partial parts)
Phase 1) negotiation of capabilities
Phase 2) ref advertisement (i.e. changes in the DAG end points)
Phase 3) transmitting the missing blobs.

The problem is that phases 0 and 1 may require mixing, as you may want to
declare new things to do in 0) which you would have needed to advertise as
a capability in 1). So maybe we need to swap that around:

Phase 1a) negotiation of capabilities
Phase 1b) negotiation of intent (fetch/push of all/few branches in
full/shallow/narrow fashion)
Phase 2) ref advertisement (i.e. changes in the DAG end points)
Phase 3) transmitting the missing blobs.
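
A rough sketch of what the phase 1b intent declaration could look like on
the wire, reusing the existing pkt-line framing; the "intent" line and its
argument are purely made-up names for illustration:

    def pkt_line(payload):
        # pkt-line framing: 4 hex digits giving the total length (payload
        # plus the 4-byte length prefix itself), then the payload.
        data = payload.encode("utf-8")
        return ("%04x" % (len(data) + 4)).encode("ascii") + data

    # The client says up front that it only wants an ls-remote, so the
    # server can leave out everything that only matters for phase 3
    # (e.g. the signed push nonce).
    request = b"".join([
        pkt_line("intent ls-remote\n"),   # hypothetical phase 1b line
        b"0000",                          # flush-pkt: end of this group
    ])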


Re: [PATCH] protocol upload-pack-v2

2015-04-02 Thread Stefan Beller
On Thu, Apr 2, 2015 at 3:18 PM, Martin Fick  wrote:
>> The current protocol has the following problems that limit
>> us:
>>
>>  - It is not easy to make it resumable, because we
>> recompute every time.  This is especially problematic for
>> the initial fetch aka "clone" as we will be talking about
>> a large transfer. Redirection to a bundle hosted on CDN
>> might be something we could do transparently.
>>
>>  - The protocol extension has a fairly low length limit.
>>
>>  - Because the protocol exchange starts by the server side
>> advertising all its refs, even when the fetcher is
>> interested in a single ref, the initial overhead is
>> nontrivial, especially when you are doing a small
>> incremental update.  The worst case is an auto-builder
>> that polls every five minutes, even when there are no new
>> commits to be fetched.
>
> A lot of the focus on the problems with ref advertisement is
> about the obvious problem mentioned above (a bad problem
> indeed).  I would like to add that there is another related
> problem that all potential solutions to the above problem do
> not necessarily improve.  When polling regularly there is
> also currently no efficient way to check on the current state of
> all refs.  It would be nice to also be able to get an
> incremental update on large ref spaces.

I think once the new protocol is in place, the server could advertise
the capability to send a differential of refs.

To make sure that works, the capability phase should be strictly separated
from the rest. Then you can think up any new fancy scheme to transmit
refs or objects, and once both client and server agree on that fancy scheme,
both know when to expect the "new changed" protocol.

So from a high level perspective it should look like:
Phase 1) negotiation of capabilities
Phase 2) ref advertisement (i.e. changes in the DAG end points)
Phase 3) transmitting the missing blobs.

The crucial point now is to make sure Phase 1) does not grow too large in
transmission size / required compute power (/ complexity).

And as everybody out there wants to invent new schemes for doing 2) and 3)
efficiently, I wonder if we need to do Phase 1) as a differential as well, so
I'd presume the optimum could look like:

Client: Last time we talked the capabilities you advertised hashed to $SHA
Server: That's right, but additionally I have "push_cert_nonce=$value"

In the non-optimal case:
Client: Last time we talked the capabilities you advertised hashed to $SHA
Server: I don't know that value, here comes the list of all
capabilities I can do:
 ...
 ...

I like that approach as it would really come down to transmitting the minimal
amount of information most of the time. The downside is having to know which
capabilities are cache-able and therefore hash-able, such that the remote side
only needs to maintain a very small set of advertised capability lists
and their hashes. For example, the nonce for signed pushes will hopefully
never be the same, so it makes no sense to have it inside the capabilities
cache.

Having such a capabilities cache would give us a long time before the
capability negotiation phase grows too large again (most of the
capabilities, I'd assume, are rather static per server).
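
A minimal sketch of such a cache check, assuming the fingerprint is computed
only over the stable (cache-able) capability lines; the names and the choice
of SHA-1 here are just for illustration:

    import hashlib

    VOLATILE = {"push_cert_nonce"}   # per-connection values, never cached

    def cap_fingerprint(cap_lines):
        # Hash only the stable capabilities, in a canonical order, so both
        # sides compute the same value for the same advertisement.
        stable = sorted(line for line in cap_lines
                        if line.split("=", 1)[0] not in VOLATILE)
        return hashlib.sha1("\n".join(stable).encode("utf-8")).hexdigest()

    # Client: "Last time we talked the capabilities you advertised hashed
    # to $SHA"
    last_seen = cap_fingerprint(["thin-pack", "ofs-delta", "lang"])
    # Server: compares this against the fingerprint of its current stable
    # list and either confirms it (plus volatile additions such as the
    # nonce) or falls back to sending the full capability list.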

And the way I understand the current situation, it's all about getting this
early negotiation phase right, which then allows us to model the refs
advertisement and the blob transmission later on as a response to upcoming
problems in the future.

>
> Thanks,
>
> -Martin
>
> --
> The Qualcomm Innovation Center, Inc. is a member of Code
> Aurora Forum, hosted by The Linux Foundation
>


Re: [PATCH] protocol upload-pack-v2

2015-04-02 Thread Junio C Hamano
Martin Fick  writes:

>> The current protocol has the following problems that limit
>> us:
>> 
>>  - It is not easy to make it resumable, because we
>> recompute every time.  This is especially problematic for
>> the initial fetch aka "clone" as we will be talking about
>> a large transfer. Redirection to a bundle hosted on CDN
>> might be something we could do transparently.
>> 
>>  - The protocol extension has a fairly low length limit.
>> 
>>  - Because the protocol exchange starts by the server side
>> advertising all its refs, even when the fetcher is
>> interested in a single ref, the initial overhead is
>> nontrivial, especially when you are doing a small
>> incremental update.  The worst case is an auto-builder
>> that polls every five minutes, even when there are no new
>> commits to be fetched.
>
> A lot of the focus on the problems with ref advertisement is
> about the obvious problem mentioned above (a bad problem
> indeed).  I would like to add that there is another related
> problem that all potential solutions to the above problem do
> not necessarily improve.  When polling regularly there is
> also currently no efficient way to check on the current state of
> all refs.  It would be nice to also be able to get an
> incremental update on large ref spaces.

Yes, incremental ref update is an important problem to solve.  I
think one potential solution was indeed mentioned to improve that
exact issue, e.g. footnote #3 in $gmane/264000.



Re: [PATCH] protocol upload-pack-v2

2015-04-02 Thread Martin Fick
> The current protocol has the following problems that limit
> us:
> 
>  - It is not easy to make it resumable, because we
> recompute every time.  This is especially problematic for
> the initial fetch aka "clone" as we will be talking about
> a large transfer. Redirection to a bundle hosted on CDN
> might be something we could do transparently.
> 
>  - The protocol extension has a fairly low length limit.
> 
>  - Because the protocol exchange starts by the server side
> advertising all its refs, even when the fetcher is
> interested in a single ref, the initial overhead is
> nontrivial, especially when you are doing a small
> incremental update.  The worst case is an auto-builder
> that polls every five minutes, even when there are no new
> commits to be fetched.

A lot of the focus on the problems with ref advertisement is
about the obvious problem mentioned above (a bad problem
indeed).  I would like to add that there is another related
problem that all potential solutions to the above problem do
not necessarily improve.  When polling regularly there is
also currently no efficient way to check on the current state of
all refs.  It would be nice to also be able to get an
incremental update on large ref spaces.

Thanks,

-Martin

-- 
The Qualcomm Innovation Center, Inc. is a member of Code 
Aurora Forum, hosted by The Linux Foundation



Re: [PATCH] protocol upload-pack-v2

2015-04-02 Thread Junio C Hamano
Duy Nguyen  writes:

> On Wed, Apr 1, 2015 at 2:58 AM, Junio C Hamano  wrote:
>> This is a follow-up on $gmane/264553, which is a continuation of
>> $gmane/264000, but instead of giving two required readings to
>> readers, I'll start with reproduction of the two, and add a few more
>> things the current protocol lacks that I would want to see in the
>> updated protocol.
>
> I think the important thing to get v2 started is making sure we do not
> need v3 to get rid of any of these limitations. In other words v2
> should be extensible enough to implement them later.

Yes.

>>...
>
> I don't have an answer to this one. So the reaction is,...
> "broken" (in pratice, not in theory), don't touch it. I know I'm
> burying my head in the sand..
>...
> Introducing the ref exchange in push-pack could be done in an
> extension too, I think.

Thanks, but you do not have to "solve" these sample issues here.  As
you said above, your primary goal at this stage is to make sure v2
is extensible to cover future needs, and in order to do so, you need
to make sure "what's lacking" list is fairly complete.  You are not
helping from that point of view.

I'd like to see a new protocol that lets us overcome the limitations
(did I miss others? I am sure people can help here) sometime this
year.


Re: [PATCH] protocol upload-pack-v2

2015-04-02 Thread Duy Nguyen
On Wed, Apr 1, 2015 at 2:58 AM, Junio C Hamano  wrote:
> This is a follow-up on $gmane/264553, which is a continuation of
> $gmane/264000, but instead of giving two required readings to
> readers, I'll start with reproduction of the two, and add a few more
> things the current protocol lacks that I would want to see in the
> updated protocol.

I think the important thing to get v2 started is making sure we do not
need v3 to get rid of any of these limitations. In other words v2
should be extensible enough to implement them later. I'm looking from
this perspective.

> The current protocol has the following problems that limit us:
>
>  - It is not easy to make it resumable, because we recompute every
>time.  This is especially problematic for the initial fetch aka
>"clone" as we will be talking about a large transfer. Redirection
>to a bundle hosted on CDN might be something we could do
>transparently.

Sending multiple packs or some redirection instructions could be done
even with v1. The only recomputed part that is unavoidable in v1 is the ref
advertisement, which I think is solved.

>  - The protocol extension has a fairly low length limit.

One pkt-line per protocol extension should do it.

>  - Because the protocol exchange starts by the server side
>advertising all its refs, even when the fetcher is interested in
>a single ref, the initial overhead is nontrivial, especially when
>you are doing a small incremental update.  The worst case is an
>auto-builder that polls every five minutes, even when there are no
>new commits to be fetched.

This is one of the reasons v2 was started; it should be OK with the current v2 design.

>  - Because we recompute every time, taking into account what the
>fetcher has, in addition to what the fetcher obtained earlier
>from us in order to reduce the transferred bytes, the payload for
>incremental updates becomes tailor-made for each fetch and cannot
>be easily reused.

Well, we reuse at a lower level: pack-objects would try to copy
existing deltas instead of making new ones. We can cache new deltas in
the hope that they may be useful for the next fetch. But that has nothing
to do with the protocol.

>  - The semantics of the side-bands are unclear.
>
>- Is band #2 meant only for progress output (I think the current
>  protocol handlers assume that and unconditionally squelch it
>  under --quiet)?  Do we rather want a dedicated "progress" and
>  "error message" sidebands instead?
>
>- Is band #2 meant for human consumption, or do we expect the
>  other end to interpret and act on it?  If the former, would it
>  make sense to send locale information from the client side and
>  ask the server side to produce its output with _("message")?

The interpretation of side-band could be changed by introducing a new
extension, couldn't it?

>  - The semantics of packet_flush() is suboptimal, and this
>shortcoming seeps through to the protocol mapped to the
>smart-HTTP transport.
>
>...

I don't have an answer to this one. So the reaction is: if it is not
"broken" (in practice, not in theory), don't touch it. I know I'm
burying my head in the sand.
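
For illustration, a sketch of how the two markers Junio describes could be
kept apart on the wire, assuming a hypothetical 0001 delimiter packet next
to the existing 0000 flush-pkt (all names here are made up):

    FLUSH_PKT = b"0000"   # "I am done talking, it is your turn to speak"
    DELIM_PKT = b"0001"   # hypothetical: "this group ends, more follows"

    def pkt_line(payload):
        data = payload.encode("utf-8")
        return ("%04x" % (len(data) + 4)).encode("ascii") + data

    # One request made of two logical groups, then handing over to the server:
    request = b"".join([
        pkt_line("want-refs all\n"),      # first group (from the v2 proposal)
        DELIM_PKT,
        pkt_line("have <object-id>\n"),   # second group, placeholder content
        FLUSH_PKT,
    ])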

>  - The fetch-pack direction does the common-parent discovery but the
>push-pack direction does not.  This is OK for the normal
>fast-forward push, in which case we will see a known commit on
>the tip of the branch we are pushing into, but makes forced push
>inefficient.

Introducing the ref exchange in push-pack could be done in an
extension too, I think.

>  - The existing common-parent discovery done on the fetch-pack side
>enumerates commits contiguously traversing the history to the
>past.  We might want to go exponential or Fibonacci to quickly
>find an ancient common commit and bisect the history from there
>(or it might turn out not to be worth it).

Hm.. I'm wondering if we can already do this with v1 if we have enough
manpower.
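
A sketch of what such a skip sequence could look like; whether the walk is
exponential or Fibonacci, and how the bisection afterwards works, is left
open, so the numbers below are purely illustrative:

    def skip_steps(kind="exponential"):
        # How many commits further back each successive "have" probe reaches.
        a, b = 1, 2
        while True:
            yield a
            if kind == "fibonacci":
                a, b = b, a + b
            else:
                a *= 2

    # e.g. probe 1, 2, 4, 8, 16, ... generations back; once a probe is ACKed
    # as common, bisect between it and the previous, non-common probe.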

>  - We may want to revamp the builtin/receive-pack.c::report() that
>reports the final result of a push back to the pusher to tell the
>exact object name that sits at the updated tips of refs, not just
>refnames.  It will allow the server side to accept a push of
>commit X to a branch, do some "magic" on X (e.g. rebase it on top
>of the current tip, merge it with the current tip, or let a hook
>to rewrite the commit in any way it deems appropriate) and put
>the resulting commit Y at the tip of the branch.  Without such a
>revamp, it is currently not possible to sensibly allow the server
>side to rewrite what got pushed.

Sounds like more of a coding change than a protocol change, which should be
possible with another extension.

>  - If we were to start allowing the receiver side to rewrite pushed
>commits, the updated send-pack protocol must be able to send the
>new objects created by that "magic" back to the pusher.  The
>current protocol does not allow the re

Re: [PATCH] protocol upload-pack-v2

2015-03-31 Thread Junio C Hamano
Junio C Hamano  writes:

> I have a feeling that it is a bit too premature to specify the
> details at such a low level as "capabilities are announced by
> prefixing four-byte 'c', 'a', 'p', ':' in front" and "a multi-record
> group has its element count at the beginning (or an end marker at
> the end, for that matter)", and it may be a better idea to outline
> all the necessary elements at a bit higher level first---that would
> avoid needs for useless exchanges like what we are having right now.
>
>   If you keep the
> discussion at the level like "fetch first asks capabilities it wants
> upload-pack-2 to enable, optionally gives the current shallow
> boundaries when the capability says the other side supports it, and
> then starts showing what it has" while we are trying to achieve
> concensus on what kind of protocol elements we would need, and what
> information each element would carry, the discussion will help us
> reach a shared understanding on what to write down in EBNF form
> exactly faster, I would imagine.

And I see we went silent after this, so let's try to stir the pot
again to see if it simmers.

This is a follow-up on $gmane/264553, which is a continuation of
$gmane/264000, but instead of giving two required readings to
readers, I'll start with reproduction of the two, and add a few more
things the current protocol lacks that I would want to see in the
updated protocol.



The current protocol has the following problems that limit us:

 - It is not easy to make it resumable, because we recompute every
   time.  This is especially problematic for the initial fetch aka
   "clone" as we will be talking about a large transfer. Redirection
   to a bundle hosted on CDN might be something we could do
   transparently.

 - The protocol extension has a fairly low length limit.

 - Because the protocol exchange starts by the server side
   advertising all its refs, even when the fetcher is interested in
   a single ref, the initial overhead is nontrivial, especially when
   you are doing a small incremental update.  The worst case is an
   auto-builder that polls every five minutes, even when there are no
   new commits to be fetched.

 - Because we recompute every time, taking into account what the
   fetcher has, in addition to what the fetcher obtained earlier
   from us, in order to reduce the transferred bytes, the payload for
   incremental updates becomes tailor-made for each fetch and cannot
   be easily reused.

 - The semantics of the side-bands are unclear.

   - Is band #2 meant only for progress output (I think the current
     protocol handlers assume that and unconditionally squelch it
     under --quiet)?  Do we rather want a dedicated "progress" and
     "error message" sidebands instead?

   - Is band #2 meant for human consumption, or do we expect the
     other end to interpret and act on it?  If the former, would it
     make sense to send locale information from the client side and
     ask the server side to produce its output with _("message")?

 - The semantics of packet_flush() is suboptimal, and this
   shortcoming seeps through to the protocol mapped to the
   smart-HTTP transport.

   Originally, packet_flush() was meant as "Here is an end of one
   logical section of what I am going to speak.", hinting that it
   might be a good idea for the underlying implementation to hold
   the packets up to that point in-core and then write(2) them all
   out (i.e. "flush") to the file descriptor only when we handle
   packet_flush().  It never meant "Now I am finished speaking for
   now and it is your turn to speak."

   But because HTTP is inherently a ping-pong protocol where the
   requestor at one point stops talking and lets the responder
   speak, the code to map our protocol to the smart HTTP transport
   made the packet_flush() boundary as "Now I am done talking, it is
   my turn to listen."

   We probably need two kinds of packet_flush().  When a requestor
   needs to say two or more logical groups of things before telling
   the other side "Now I am done talking; it is your turn.", we need
   some marker (i.e. the original meaning of packet_flush()) at the
   end of these logical groups.  And in order to be able to say "Now
   I am done saying everything I need to say at this point for you
   to respond to me.  It is your turn.", we need another kind of
   marker.

 - The fetch-pack direction does the common-parent discovery but the
   push-pack direction does not.  This is OK for the normal
   fast-forward push, in which case we will see a known commit on
   the tip of the branch we are pushing into, but makes forced push
   inefficient.

 - The existing common-parent discovery done on the fetch-pack side
   enumerates commits contiguously traversing the history to the
   past.  We might want to go exponential or Fibonacci to quickly
   find an ancient common commit and bisect the history from there
   (or it might turn out not to be worth it).

 - We may want to

Re: [PATCH] protocol upload-pack-v2

2015-03-10 Thread Kyle J. McKay

On Mar 9, 2015, at 18:38, Duy Nguyen wrote:

> A minor point on capability negotiation. I remember why I passed
> capabilities via command line instead of this proposal. With this
> proposal, daemon.c does not recognize "i18n" capability and cannot
> switch to the correct language before it reports an error.
>
> But perhaps I approached it the wrong way. Instead of letting the
> daemon produce non-English language directly, it could sort of
> standardize the messages and send an error code back in an "ERR" line,
> together with an English message (current pack-protocol.txt says "ERR"
> followed by explanation-text). The client can use this error code to
> print non-English messages if it wants to. Perhaps we can reuse http
> (or ftp) return codes or some other protocol then customize to fit
> Git's needs.


May I suggest that you take a look at RFC 2034 [1].  It describes  
almost exactly this same situation and how SMTP was enhanced to  
support error code numbers that could then be translated.


No reason this enhancement needs to be restricted to protocol v2 if  
it's just an "enhancedstatuscodes" capability the server sends back to  
indicate that its ERR text is in that format.
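
A sketch of what a client could then do with such an ERR line, assuming
(purely for illustration) an RFC 2034-style status code in front of the
English explanation-text:

    # Hypothetical table of translated messages, keyed by status code; in a
    # real client this would come from the user's locale.
    MESSAGES = {
        "5.7.1": "access denied",
    }

    def handle_err_line(line):
        # line is the payload of an ERR pkt-line, e.g. "ERR 5.7.1 access denied"
        _, code, text = line.split(" ", 2)
        # Prefer a locally translated message for the code; fall back to the
        # server's English explanation-text.
        return MESSAGES.get(code, text)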


-Kyle

[1] http://tools.ietf.org/html/rfc2034


Re: [PATCH] protocol upload-pack-v2

2015-03-09 Thread Duy Nguyen
A minor point on capability negotiation. I remember why I passed
capabilities via command line instead of this proposal. With this
proposal, daemon.c does not recognize "i18n" capability and cannot
switch to the correct language before it reports an error.

But perhaps I approached it the wrong way. Instead of letting the
daemon produce non-English language directly, it could sort of
standardize the messages and send an error code back in an "ERR" line,
together with an English message (current pack-protocol.txt says "ERR"
followed by explanation-text). The client can use this error code to
print non-English messages if it wants to. Perhaps we can reuse http
(or ftp) return codes or some other protocol then customize to fit
Git's needs.
-- 
Duy


Re: [PATCH] protocol upload-pack-v2

2015-03-08 Thread Junio C Hamano
Stefan Beller  writes:

>> I do not see a good reason why we want "I am sending N caps"
>> upfront, instead of "this, that, and here is the end of the group".
>
> I thought about having an end marker, so something like
> capabilities start
> thin-pack
> lang
> ofs-delta
> capabilities done
>
> (Each line a pkt-line)
>
> Throughout all the decisions I thought through, I tried to put more weight
> on future expandability. Now that I think about it again, it makes no
> difference whether to use a counter or an end marker.

One reason why I would suggest avoiding "count upfront" is to make
sure we do not repeat the mistake of "Content-Length" which had to
later be corrected by introducing chunked transfer by HTTP folks.
Closer to home, our "type and then size upfront and then contents
and hash the whole thing" loose object format makes it quite hard
to produce without having the whole thing in-core, or otherwise
having a separate way to know the size upfront.

For the capability list, the number of the capabilities you support
may be limited, bounded and may even be known at the compile time,
so count-upfront may not be a burden, but in other parts of the
protocol where you need to feed the result of computation to the
other end, you would need "the group ends here" marker.  It would
be easier for everybody if we can make all types of messages use
the same syntax, regardless of type.
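
The practical difference shows up on the reading side: with a terminator the
sender can stream entries as it computes them, much as chunked transfer fixed
Content-Length. A sketch of such a reader, reusing the "capabilities start"
and "capabilities done" markers from the quoted example (read_pkt_line() is
an assumed helper returning one decoded pkt-line payload):

    def read_group(read_pkt_line):
        assert read_pkt_line() == "capabilities start"
        items = []
        while True:
            line = read_pkt_line()
            if line == "capabilities done":   # "the group ends here" marker
                return items
            items.append(line)                # no count needed up front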

>>> +  cap  =  PKT-LINE("capabilities" SP size LF list)
>>
>> Isn't a cap packet terminated by LF without any "list" following it?
>> The notation PKT-LINE() is "wrap <data> in a single packet",
>> and I do not think you meant to send the capability line and a series
>> of cap:foo entries in a single packet.
>
> Yeah I meant to use one packet per line
> So after considering your input, you'd want to have
> PKT-LINE("capabilities start")
> PKT-LINE("no-prefix-for-capabilities")
> PKT-LINE("ofs-delta")
> PKT-LINE("agent-as-capability-2.6")
> PKT-LINE("capabilities end")

OK, so that "list" at the end is just a typo; there shouldn't be
"list at the end inside PKT-LINE().

>>> +  size =  *DIGIT
>>> +  capability-list  =  *(capability) [agent LF]
>>> +  capability   =  "cap:" keyvaluepair LF
>>> +  agent=  keyvaluepair LF
>>
>> I do not see a reason to make 'agent' as part of capability.  That
>> was an implementation detail of v1 but v2 does not have an
>> obligation to consider agent announcement as capability.
>
> So I think we don't want to drop the agent announcement as it may
> reveal useful information ("How many outdated clients connect to our
> $HOSTING_SITE?", "I need to debug failures which happen only rarely,
> Oh! it can be narrowed down to clients with agent xyz.")

Don't be overly defensive and try not to misunderstand and see a
criticism where there is none.  All I am saying is that agent
announcement is not announcing a capability.  You may announce many
things, and server or client version may be something you would want
to announce.

I have a feeling that it is a bit too premature to specify the
details at such a low level as "capabilities are announced by
prefixing four-byte 'c', 'a', 'p', ':' in front" and "a multi-record
group has its element count at the beginning (or an end marker at
the end, for that matter)", and it may be a better idea to outline
all the necessary elements at a bit higher level first---that would
avoid needs for useless exchanges like what we are having right now.

It's that when you write things in EBNF, you are writing something
that you would eventually want to cast in stone, and the
non-terminal names in EBNF matter (they convey the semantics, what
these named things are), and I was reacting to that because I wanted
to make sure we avoid mislabeling things as something they are not.

The "shallow" vs "reference advertisement" is the same.  I think the
former is _not_ part of reference announcement but is an optional
phase of the protocol, but the level of the detail that would make
the difference matter would appear only when you start writing it in
EBNF and call both "reference advertisement".  If you keep the
discussion at the level like "fetch first asks capabilities it wants
upload-pack-2 to enable, optionally gives the current shallow
boundaries when the capability says the other side supports it, and
then starts showing what it has" while we are trying to achieve
consensus on what kind of protocol elements we would need, and what
information each element would carry, the discussion will help us
reach a shared understanding on what to write down in EBNF form
exactly faster, I would imagine.


Re: [PATCH] protocol upload-pack-v2

2015-03-06 Thread Duy Nguyen
On Sat, Mar 7, 2015 at 11:28 AM, Stefan Beller  wrote:
>>> +
>>> +  advertised-refs  =  (no-refs / list-of-refs)
>>> +   *shallow
>>> +   flush-pkt
>>
>> I am not sure if defining "shallow" as part of "refs advertisement"
>> is a good idea.  The latter lives in the same place in the v1
>> protocol merely because that was how it was later bolted onto the
>> protocol.  But this concern can easily be allayed by retitling
>> "advertised-refs" to something else.
>
> I don't quite understand this. What are your concerns about shallow here?

This made me look for an explanation of these shallow lines (commit
ad49136). They are sent when the source repo is a shallow one. They tell
the receiver about the bottom of the source repo. My experiment with "git
clone --since" shows that such a shallow clone could end up with a few
thousand shallow lines. Not as many as refs, but sending shallow lines
this way still costs more than necessary. If we want to do that, we
need more flexibility than just sending all shallow lines this way.

One approach is that, because these shallow lines (on the source repo)
rarely change, we could simply exchange the SHA-1 of the source repo's
"shallow" file first. The client only requests shallow info when the SHA-1
does not match.
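
A sketch of the client side of that exchange, assuming the server simply
advertises a SHA-1 over the raw contents of its $GIT_DIR/shallow file (the
surrounding message names would still need to be designed):

    import hashlib, os

    def shallow_file_hash(git_dir):
        path = os.path.join(git_dir, "shallow")
        try:
            with open(path, "rb") as f:
                return hashlib.sha1(f.read()).hexdigest()
        except FileNotFoundError:
            return None   # not a shallow repository

    def need_shallow_info(advertised_hash, cached_hash):
        # Only request the full shallow list when the advertised hash differs
        # from what we recorded after the previous fetch.
        return advertised_hash != cached_hash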
-- 
Duy


Re: [PATCH] protocol upload-pack-v2

2015-03-06 Thread Stefan Beller
On Fri, Mar 6, 2015 at 4:28 PM, Junio C Hamano  wrote:
> Stefan Beller  writes:
>
>> @@ -67,7 +74,6 @@ gracefully with an error message.
>>error-line =  PKT-LINE("ERR" SP explanation-text)
>>  
>>
>> -
>>  SSH Transport
>
> Noise?
>
>> @@ -124,9 +130,56 @@ has, the first can 'fetch' from the second.  This 
>> operation determines
>>  what data the server has that the client does not then streams that
>>  data down to the client in packfile format.
>>
>> +Capability discovery (v2)
>> +-
>>
>> +In version 1, capability discovery is part of reference discovery and
>> +covered in reference discovery section.
>> +
>> +In version 2, when the client initially connects, the server
>> +immediately sends its capabilities to the client. Then the client must
>> +send the list of server capabilities it wants to use to the server.
>> +
>> +   S: 00XXcapabilities 4\n
>> +   S: 00XXcap:lang\n
>> +   S: 00XXcap:thin-pack\n
>> +   S: 00XXcap:ofs-delta\n
>> +   S: 00XXagent:agent=git/2:3.4.5+custom-739-gb850f98\n
>> +
>> +   C: 00XXcapabilities 3
>> +   C: 00XXcap:thin-pack\n
>> +   C: 00XXcap:ofs-delta\n
>> +   C: 00XXcap:lang=en\n
>> +   C: 00XXagent:agent=git/custom_string\n
>
> I do not see a good reason why we want "I am sending N caps"
> upfront, instead of "this, that, and here is the end of the group".

I thought about having an end marker, so something like
capabilities start
thin-pack
lang
ofs-delta
capabilities done

(Each line a pkt-line)

Throughout all the decisions I thought through, I tried to put more weight
on future expandability. Now that I think about it again, it makes no
difference whether to use a counter or an end marker.

> If you expect the recipient to benefit by being able to pre-allocate
> N slots, then that might make sense, but otherwise, I'd rather see
> us stick to a (weaker) flush that says "group ends here".

I think it's not about pre-allocating but counting down. Then you know
at the beginning how much to expect, which might become relevant if
that section grows large again. ("The server really wants to send 1500
capability lines? Nope, I'll just disconnect because I am on mobile!")

Implementation-wise an end marker is easier though (you don't need
to count down, so it feels more stateless to me).
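
A sketch of the early bail-out a counter would allow, with an arbitrary
limit picked purely for illustration (read_pkt_line() is an assumed helper
returning one decoded pkt-line payload):

    MAX_CAP_LINES = 200   # arbitrary client-side limit, for illustration only

    def read_counted_group(read_pkt_line):
        header = read_pkt_line()            # e.g. "capabilities 1500"
        count = int(header.split()[1])
        if count > MAX_CAP_LINES:
            # With the count up front the client can refuse before
            # downloading anything, e.g. on a metered connection.
            raise ConnectionAbortedError("server announced %d lines" % count)
        return [read_pkt_line() for _ in range(count)]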

>
> Besides, I do not know how you counted 4 on the S: side and 3 on
> the C: side in the above example and missing LF after 3 ;-).
>

Sorry about that, I added one answer late and forgot to increment the 3.

>> +
>> +  cap  =  PKT-LINE("capabilities" SP size LF list)
>
> Isn't a cap packet terminated by LF without any "list" following it?
> The notation PKT-LINE() is "wrap <data> in a single packet",
> and I do not think you meant to send the capability line and a series
> of cap:foo entries in a single packet.

Yeah I meant to use one packet per line
So after considering your input, you'd want to have
PKT-LINE("capabilities start")
PKT-LINE("no-prefix-for-capabilities")
PKT-LINE("ofs-delta")
PKT-LINE("agent-as-capability-2.6")
PKT-LINE("capabilities end")

And additionally to that, a PKT-LINE should have the ability to grow larger
than 0xffff, which would be encoded with 0xffff being an escape value
indicating the length is encoded somehow differently (maybe take the next
8 bytes instead of just 4).
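
A sketch of that escape, assuming 0xffff in the normal 4-hex-digit field
means "the real length follows in the next 8 hex digits"; this is only the
suggestion above, not an agreed format:

    def pkt_line_ext(payload):
        # payload is bytes.  Normal pkt-line for small payloads; a
        # hypothetical escaped form ("ffff" followed by 8 hex digits giving
        # the payload length) for anything bigger.
        if len(payload) + 4 < 0xffff:
            return b"%04x" % (len(payload) + 4) + payload
        return b"ffff" + b"%08x" % len(payload) + payload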


>
>> +  size =  *DIGIT
>> +  capability-list  =  *(capability) [agent LF]
>> +  capability   =  "cap:" keyvaluepair LF
>> +  agent=  keyvaluepair LF
>
> I do not see a reason to make 'agent' as part of capability.  That
> was an implementation detail of v1 but v2 does not have an
> obligation to consider agent announcement as capability.

So I think we don't want to drop the agent announcement as it may
reveal useful information ("How many outdated clients connect to our
$HOSTING_SITE?", "I need to debug failures which happen only rarely,
Oh! it can be narrowed down to clients with agent xyz.")

So then we need to decide where to put the agent. And as it is only small
but useful (meta?) information, I'd rather put it at the beginning of the
data exchange, so that in case the other side seems to be misbehaving,
it is easier to debug, in the hope that the agent transmission was still
successful.

>
> server-announcement = PKT-LINE("capabilities" ...) capability-list 
> [agent-announcement]
> capability-list = as you have it w/o "[agent LF]"
> agent-announcement = PKT-LINE("agent=" agent-token LF)
>
> or something, perhaps?

This looks to me as if all capabilities are in one PKT-LINE, which
you seemed to oppose?

>
>> +The client MUST ignore any data before the pkt-line starting with 
>> "capabilities"
>> +for future easy of extension.
>
> s/easy/ease/; but I am not sure if this makes sense.  Without
> knowing the extended preamble, you cannot even tell if a packet line
> that happens to start with "capabilities" is a proper beginning of
> 0-th iteration of v2 protocol, or an embedded data in the preamble,
> no?

I rather thought about the 

Re: [PATCH] protocol upload-pack-v2

2015-03-06 Thread Junio C Hamano
Stefan Beller  writes:

> @@ -67,7 +74,6 @@ gracefully with an error message.
>error-line =  PKT-LINE("ERR" SP explanation-text)
>  
>  
> -
>  SSH Transport

Noise?

> @@ -124,9 +130,56 @@ has, the first can 'fetch' from the second.  This 
> operation determines
>  what data the server has that the client does not then streams that
>  data down to the client in packfile format.
>  
> +Capability discovery (v2)
> +-
>  
> +In version 1, capability discovery is part of reference discovery and
> +covered in reference discovery section.
> +
> +In version 2, when the client initially connects, the server
> +immediately sends its capabilities to the client. Then the client must
> +send the list of server capabilities it wants to use to the server.
> +
> +   S: 00XXcapabilities 4\n
> +   S: 00XXcap:lang\n
> +   S: 00XXcap:thin-pack\n
> +   S: 00XXcap:ofs-delta\n
> +   S: 00XXagent:agent=git/2:3.4.5+custom-739-gb850f98\n
> +
> +   C: 00XXcapabilities 3
> +   C: 00XXcap:thin-pack\n
> +   C: 00XXcap:ofs-delta\n
> +   C: 00XXcap:lang=en\n
> +   C: 00XXagent:agent=git/custom_string\n

I do not see a good reason why we want "I am sending N caps"
upfront, instead of "this, that, and here is the end of the group".
If you expect the recipient to benefit by being able to pre-allocate
N slots, then that might make sense, but otherwise, I'd rather see
us stick to a (weaker) flush that says "group ends here".

Besides, I do not know how you counted 4 on the S: side and 3 on
the C: side in the above example and missing LF after 3 ;-).

> +
> +  cap  =  PKT-LINE("capabilities" SP size LF list)

Isn't a cap packet terminated by LF without any "list" following it?
The notation PKT-LINE() is "wrap <data> in a single packet",
and I do not think you meant to send the capability line and a series
of cap:foo entries in a single packet.

> +  size =  *DIGIT
> +  capability-list  =  *(capability) [agent LF]
> +  capability   =  "cap:" keyvaluepair LF
> +  agent=  keyvaluepair LF

I do not see a reason to make 'agent' as part of capability.  That
was an implementation detail of v1 but v2 does not have an
obligation to consider agent announcement as capability.

server-announcement = PKT-LINE("capabilities" ...) capability-list 
[agent-announcement]
capability-list = as you have it w/o "[agent LF]"
agent-announcement = PKT-LINE("agent=" agent-token LF)

or something, perhaps?

> +The client MUST ignore any data before the pkt-line starting with 
> "capabilities"
> +for future easy of extension.

s/easy/ease/; but I am not sure if this makes sense.  Without
knowing the extended preamble, you cannot even tell if a packet line
that happens to start with "capabilities" is a proper beginning of
0-th iteration of v2 protocol, or an embedded data in the preamble,
no?

> +Reference Discovery (v2)
> +
> +
> +In version 2, reference discovery is initiated by the client with
> +"want-refs" line. The client may skip reference discovery phase
> +entirely by not sending "want-refs" (e.g. the client has another way
> +to retrieve ref list).
> +
> +
> +  want-refs  =  PKT-LINE("want-refs" SP mode *argument)
> +  mode   =  "all"
> +  argument   =  SP arg
> +  arg=  1*(LC_ALPHA / DIGIT / "-" / "_" / "=")
> +
> +
> +Mode "all" sends all visible refs to the client like in version 1. No
> +arguments are accepted in this mode. More modes may be available based
> +on capabilities.

I tend to agree with Duy that the protocol must anticipate that args
can become longer.

> +
> +  advertised-refs  =  (no-refs / list-of-refs)
> +   *shallow
> +   flush-pkt

I am not sure if defining "shallow" as part of "refs advertisement"
is a good idea.  The latter lives in the same place in the v1
protocol merely because that was how it was later bolted onto the
protocol.  But this concern can easily be allayed by retitling
"advertised-refs" to something else.


Re: [PATCH] protocol upload-pack-v2

2015-03-06 Thread Duy Nguyen
On Sat, Mar 7, 2015 at 6:38 AM, Stefan Beller  wrote:
> +Reference Discovery (v2)
> +
> +
> +In version 2, reference discovery is initiated by the client with
> +"want-refs" line. The client may skip reference discovery phase
> +entirely by not sending "want-refs" (e.g. the client has another way
> +to retrieve ref list).
> +
> +
> +  want-refs  =  PKT-LINE("want-refs" SP mode *argument)
> +  mode   =  "all"
> +  argument   =  SP arg
> +  arg=  1*(LC_ALPHA / DIGIT / "-" / "_" / "=")
> +

Along the same lines as capabilities, we would not want to run into a
situation where we have too many arguments to send and exceed the pkt-line
limit. So perhaps do one argument per pkt-line too, ending with a
pkt-flush.
-- 
Duy


Re: [PATCH] protocol upload-pack-v2

2015-03-06 Thread Duy Nguyen
I'm still wondering if we should reserve more values from the packet length.
We have used length 0000 for pkt-flush. Shawn pointed out that we
still have 0001, 0002 and 0003, but we may use some of them to avoid
abuse of pkt-flush in some cases. Perhaps we could limit the packet length
to 0xfff0, so we have 0xfff1-0xffff to assign special meanings in the
future, if we have to.

On Sat, Mar 7, 2015 at 6:38 AM, Stefan Beller  wrote:
> +In version 2, when the client initially connects, the server
> +immediately sends its capabilities to the client. Then the client must
> +send the list of server capabilities it wants to use to the server.
> +
> +   S: 00XXcapabilities 4\n
> +   S: 00XXcap:lang\n
> +   S: 00XXcap:thin-pack\n
> +   S: 00XXcap:ofs-delta\n
> +   S: 00XXagent:agent=git/2:3.4.5+custom-739-gb850f98\n
> +
> +   C: 00XXcapabilities 3
> +   C: 00XXcap:thin-pack\n
> +   C: 00XXcap:ofs-delta\n
> +   C: 00XXcap:lang=en\n
> +   C: 00XXagent:agent=git/custom_string\n
> +
> +
> +  cap  =  PKT-LINE("capabilities" SP size LF list)
> +  size =  *DIGIT
> +  capability-list  =  *(capability) [agent LF]
> +  capability   =  "cap:" keyvaluepair LF
> +  agent=  keyvaluepair LF
> +  keyvaluepair =  1*(LC_ALPHA / DIGIT / "-" / "_" / "=")

If we send one cap per pkt-line, cap can contain spaces. The question
is, should we allow them?

Ending cap lines with LF seems redundant because we already know the line length.

> +  LC_ALPHA =  %x61-7A
> +
> +
> +The client MUST ignore any data before the pkt-line starting with 
> "capabilities"
> +for future easy of extension.
> +
> +The server MUST advertise "size" as the decimal number of lines following
> +the "capabilities" line. This includes lines starting "cap:" and "agent:" 
> for now.
> +The client MUST ignore lines which start with an unknown pattern.

I think the common pattern in our protocol is to end these with a
pkt-flush, instead of send the number of items upfront. If we do that
we don't have to specify "cap:" or "agent:" either. All pkt-lines
until pkt-flush at the beginning of v2 are cap lines. And agent is
just another "capability".
-- 
Duy


Re: [PATCH] protocol upload-pack-v2

2015-03-06 Thread Stefan Beller
On Fri, Mar 6, 2015 at 3:38 PM, Stefan Beller  wrote:
> From: Duy Nguyen 

Oops. I edited the proposal from Duy heavily(?), such that it is
different from what he proposed 4 days ago.

In my impression this is what most of the participants would agree on.