On Thu, Apr 2, 2015 at 3:18 PM, Martin Fick <mf...@codeaurora.org> wrote:
>> The current protocol has the following problems that limit
>> us:
>>
>>  - It is not easy to make it resumable, because we
>> recompute every time.  This is especially problematic for
>> the initial fetch aka "clone" as we will be talking about
>> a large transfer. Redirection to a bundle hosted on CDN
>> might be something we could do transparently.
>>
>>  - The protocol extension has a fairly low length limit.
>>
>>  - Because the protocol exchange starts by the server side
>> advertising all its refs, even when the fetcher is
>> interested in a single ref, the initial overhead is
>> nontrivial, especially when you are doing a small
>> incremental update.  The worst case is an auto-builder
>> that polls every five minutes, even when there are no new
>> commits to be fetched.
>
> A lot of focus about the problems with ref advertisement is
> about the obvious problem mentioned above (a bad problem
> indeed).  I would like to add that there is another related
> problem that all potential solutions to the above problem do
> not necessarily improve.  When polling regularly, there is
> currently no efficient way to check on the current state of
> all refs.  It would be nice to also be able to get an
> incremental update on large ref spaces.

I think once the new protocol is in place, the server could advertise
the capability to send a differential of refs.

To make sure that works, the capability phase should be strictly separated
from the rest, so you can think up any fancy new scheme to transmit
refs or objects; once both client and server agree on such a scheme,
both know when to expect the new protocol.

So from a high-level perspective it should look like:
Phase 1) negotiation of capabilities
Phase 2) ref advertisement (i.e. changes in the DAG end points)
Phase 3) transmission of the missing objects

The crucial point now is to make sure Phase 1) does not grow too large in
transmission size / required compute power (/ complexity).
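
To make the separation concrete, here is a toy sketch in Python (all
names, including the "refs-delta" capability, are made up purely for
illustration; this is not any real git wire format):

  # Phase 1: agree on capabilities; everything afterwards is
  # interpreted under the agreed scheme.
  def phase1_capabilities(server_caps, client_wants):
      # e.g. phase1_capabilities({"refs-delta", "thin-pack"},
      #                          {"refs-delta"}) -> {"refs-delta"}
      return server_caps & client_wants

  # Phase 2: advertise refs; with the hypothetical "refs-delta"
  # capability, only send refs that changed since the client's
  # last known state.
  def phase2_refs(server_refs, agreed, client_known):
      if "refs-delta" in agreed:
          return {name: sha for name, sha in server_refs.items()
                  if client_known.get(name) != sha}
      return dict(server_refs)  # classic full advertisement

  # Phase 3: send the missing objects for what the client asked for.
  def phase3_objects(object_store, wanted):
      return [object_store[sha] for sha in wanted]

The point being: phase 1) alone decides how 2) and 3) are encoded, so
a new scheme never has to touch the negotiation itself.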

And as everybody out there wants to invent new schemes for doing 2) and 3)
efficiently, I wonder if we need to do Phase 1) as a differential as well.
I'd presume the optimum could look like:

Client: Last time we talked the capabilities you advertised hashed to $SHA
Server: That's right, but additionally I have "push_cert_nonce=$value"

In the non-optimal case:
Client: Last time we talked the capabilities you advertised hashed to $SHA
Server: I don't know that value, here comes the list of all
capabilities I can do:
 ...
 ...
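
Sketched as hypothetical server-side Python (again, all names are
invented; nothing like this exists in git today), that exchange could
look like:

  import hashlib

  ADVERTISED = ["multi_ack", "thin-pack", "side-band-64k", "ofs-delta"]

  def caps_hash(caps):
      # Hash a canonical (sorted, newline-joined) form of the list so
      # both sides compute the same value.
      return hashlib.sha1("\n".join(sorted(caps)).encode()).hexdigest()

  KNOWN_HASHES = {caps_hash(ADVERTISED)}

  def advertise(client_hash, volatile):
      # "volatile" carries per-session values such as
      # push_cert_nonce=$value that are never part of the cached list.
      if client_hash in KNOWN_HASHES:
          return ["ok"] + volatile                 # the optimal case
      return ["unknown"] + ADVERTISED + volatile   # full re-advertisement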

I like that approach as it really breaks down to transmitting the minimal
amount of information most of the time. The downside is having to know which
capabilities are cacheable and hence hashable, such that the remote side
needs to maintain only a very small set of advertised capability lists
and their hashes. For example, the nonce for signed pushes will hopefully
never be the same, so it makes no sense to have it inside the capabilities
cache.
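
Continuing the toy sketch above, both sides would split the advertisement
into a stable part (which gets cached and hashed) and a volatile part;
which prefixes count as volatile is purely my assumption here and would
have to be spelled out by the protocol:

  # Prefixes assumed volatile for illustration only.
  VOLATILE_PREFIXES = ("push_cert_nonce=",)

  def split_caps(advertised):
      stable = [c for c in advertised
                if not c.startswith(VOLATILE_PREFIXES)]
      volatile = [c for c in advertised
                  if c.startswith(VOLATILE_PREFIXES)]
      return stable, volatile

  # The client caches caps_hash(stable) from the last exchange and
  # sends it next time; the volatile part is used once and discarded.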

Having such a capabilities cache would give us a long time before the
capability negotiation phase grows too large again (most capabilities,
I'd assume, are rather static per server).

And the way I understand the current situation, it's all about getting this
early negotiation phase right, which then allows us to remodel the ref
advertisement and the object transmission later on in response to problems
as they come up.

>
> Thanks,
>
> -Martin
>