On Thu, Apr 2, 2015 at 3:18 PM, Martin Fick <mf...@codeaurora.org> wrote:
>> The current protocol has the following problems that limit
>> us:
>>
>> - It is not easy to make it resumable, because we
>>   recompute every time. This is especially problematic for
>>   the initial fetch, aka "clone", as we will be talking
>>   about a large transfer. Redirection to a bundle hosted
>>   on a CDN might be something we could do transparently.
>>
>> - The protocol extension has a fairly low length limit.
>>
>> - Because the protocol exchange starts by the server side
>>   advertising all its refs, even when the fetcher is
>>   interested in a single ref, the initial overhead is
>>   nontrivial, especially when you are doing a small
>>   incremental update. The worst case is an auto-builder
>>   that polls every five minutes, even when there are no
>>   new commits to be fetched.
>
> A lot of the focus on the problems with ref advertisement
> is about the obvious problem mentioned above (a bad problem
> indeed). I would like to add that there is another, related
> problem that potential solutions to the above do not
> necessarily improve: when polling regularly, there is also
> currently no efficient way to check on the state of all
> refs. It would be nice to also be able to get an
> incremental update on large ref spaces.
I think once the new protocol is in place, the server could advertise the capability to send a differential of refs. To make sure that works, the capability phase should be strictly separated from the rest, so you can think of any new fancy scheme to transmit refs or objects, and once both client and server agree on that fancy scheme, both know when to expect the "new changed" protocol.

So from a high-level perspective it should look like:

Phase 1) negotiation of capabilities
Phase 2) ref advertisement (i.e. changes in the DAG end points)
Phase 3) transmitting the missing objects

The crucial point now is to make sure Phase 1) does not grow too large in transmission size / required compute power (/ complexity). And as everybody out there wants to invent new schemes for doing 2) and 3) efficiently, I wonder if we need to do Phase 1) as a differential as well. So I'd presume the optimum could look like:

  Client: Last time we talked, the capabilities you
          advertised hashed to $SHA.
  Server: That's right, but additionally I have
          "push_cert_nonce=$value".

In the non-optimal case:

  Client: Last time we talked, the capabilities you
          advertised hashed to $SHA.
  Server: I don't know that value; here comes the list of
          all capabilities I can do: ...

I like that approach, as it would really break down to transmitting the minimal amount of information most of the time. The downside is having to know which capabilities are cacheable and therefore hashable, such that the remote side only needs to maintain a very small set of advertised capability lists and their hashes. For example, the nonce for signed pushes will hopefully never be the same twice, so it makes no sense to have it inside the capabilities cache. (A small sketch of such a cache follows at the end of this mail.)

Having such a capabilities cache would give us a long time until the capability negotiation phase grows too large again (most capabilities, I'd assume, are rather static per server).

And the way I understand the current situation, it's all about tackling this early negotiation phase, which then allows us to remodel the ref advertisement and the object transmission later on, as a response to problems as they come up in the future.
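To make the cache idea concrete, here is a minimal sketch in Python of how such a hashed-capability exchange could work. Everything in it is assumed for illustration, not actual git code: the names (STATIC_CAPS, caps_hash, server_advertise) and the example capability strings are made up. The point is only that the static capabilities get canonicalized and hashed, while volatile values such as the push-cert nonce stay outside the cache.

  import hashlib

  # Hypothetical capabilities that are stable per server and
  # therefore safe to cache and hash. Illustrative values only.
  STATIC_CAPS = ["multi_ack", "thin-pack", "side-band-64k",
                 "ofs-delta", "shallow", "agent=git/2.x"]

  def caps_hash(caps):
      # Canonicalize (sort + join) before hashing so client and
      # server compute identical digests for the same set.
      canonical = "\n".join(sorted(caps)).encode("utf-8")
      return hashlib.sha1(canonical).hexdigest()

  def server_advertise(client_known_hash, volatile_caps):
      # volatile_caps carries per-connection values (e.g. the
      # push-cert nonce) that must never enter the cacheable list.
      if client_known_hash == caps_hash(STATIC_CAPS):
          # Optimal case: the client's cache is current, so only
          # the volatile additions cross the wire.
          return {"status": "match", "extra": volatile_caps}
      # Non-optimal case: fall back to the full advertisement.
      return {"status": "full", "caps": STATIC_CAPS,
              "extra": volatile_caps}

  # Client side: remembers the hash from the last conversation.
  last_seen = caps_hash(STATIC_CAPS)
  reply = server_advertise(last_seen,
                           ["push_cert_nonce=1428012345-abcdef"])
  assert reply["status"] == "match"  # only the nonce was sent

Note that the server only has to store one digest per advertisement variant it can actually produce, which is what keeps the set it maintains small.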