Re: Is anyone working on a next-gen Git protocol (Re: [PATCH v3 0/8] Hiding refs)
On Tue, Feb 5, 2013 at 5:03 PM, Junio C Hamano wrote:
> Ævar Arnfjörð Bjarmason writes:
>
>> Do you have any plans for something that *does* have the reduction of
>> network bandwidth as a primary goal?
>
> Uncluttering gives reduction of bandwidth anyway, so I do not see
> much point in the distinction you seem to be making.

Doing this work wouldn't only give us a way to specify which refs we
want, but if done correctly would future-proof the protocol in case we
want to add any other extensions down the line in a
backwards-compatible fashion, without having the server first spew all
its refs at us.

Anyway, an implementation that allows a client to say "I want X" is
simpler than an implementation where a server has to anticipate in
advance which X the clients will ask for.

>> Is this what you've been working on? Because if so I misunderstood you
>> thinking you were going to work on something that gave clients the
>> ability to specify what they wanted before the initial ref advertisement.
>> ...
>> 4. http://thread.gmane.org/gmane.comp.version-control.git/207190
>
> "Who speaks first" mentioned in 4. above, was primarily about
> "delaying ref advertisement", which would be a larger protocol
> change. Nobody seems to have attacked it since it was discussed,
> and I was tired of hearing nothing but complaints and whines. This
> "hiding refs" series was done as a cheaper way to solve a related
> issue, without having to wait for the solution of "delaying
> advertisement", which is an orthogonal issue.

Oh sure. I just wanted to know if you were working on delaying ref
advertisement to avoid duplicating efforts. I had the impression you
were, given your earlier e-mail, but obviously we had a
misunderstanding.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Is anyone working on a next-gen Git protocol (Re: [PATCH v3 0/8] Hiding refs)
Ævar Arnfjörð Bjarmason writes:

> Do you have any plans for something that *does* have the reduction of
> network bandwidth as a primary goal?

Uncluttering gives reduction of bandwidth anyway, so I do not see
much point in the distinction you seem to be making.

> Is this what you've been working on? Because if so I misunderstood you
> thinking you were going to work on something that gave clients the
> ability to specify what they wanted before the initial ref advertisement.
> ...
> 4. http://thread.gmane.org/gmane.comp.version-control.git/207190

"Who speaks first" mentioned in 4. above, was primarily about
"delaying ref advertisement", which would be a larger protocol
change. Nobody seems to have attacked it since it was discussed,
and I was tired of hearing nothing but complaints and whines. This
"hiding refs" series was done as a cheaper way to solve a related
issue, without having to wait for the solution of "delaying
advertisement", which is an orthogonal issue.
Re: Is anyone working on a next-gen Git protocol (Re: [PATCH v3 0/8] Hiding refs)
On Wed, Jan 30, 2013 at 7:45 PM, Junio C Hamano wrote:
> The third round.
>
>  - Multi-valued variable transfer.hiderefs lists prefixes of ref
>    hierarchies to be hidden from the requests coming over the
>    network.
>
>  - A configuration optionally allows uploadpack to accept fetch
>    requests for an object at the tip of a hidden ref.
>
> Elsewhere, we discussed "delaying ref advertisement" (aka "expand
> refs"), but it is an orthogonal feature and this "hiding refs
> completely from advertisement" series does not attempt to address it.

I'm a bit late to this so sorry if this has been covered before.

In the initial draft of this series the rationale for it was "reducing
the network cost while talking with a repository with tons of
refs"[1].

But later you seem to have changed your mind, and "network bandwidth
reduction of advertisement is a side effect of clutter reduction, and
not necessarily the primary goal".

Do you have any plans for something that *does* have the reduction of
network bandwidth as a primary goal?

In October I asked if anyone was working on a next-gen Git protocol[3]
that would provide clients with the ability to specify what refs they
wanted. You replied to me off-list saying "Yes".

Is this what you've been working on? Because if so I misunderstood you
thinking you were going to work on something that gave clients the
ability to specify what they wanted before the initial ref advertisement.

I'm still very keen to have that ability, so if you're not working on
it I just might give it a go.

1. http://article.gmane.org/gmane.comp.version-control.git/213951
2. http://article.gmane.org/gmane.comp.version-control.git/213984
3. http://article.gmane.org/gmane.comp.version-control.git/214025
4. http://thread.gmane.org/gmane.comp.version-control.git/207190
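The prefix-hiding idea from the quoted cover letter can be sketched as
follows. This is only an illustration, assuming a ref counts as hidden
when its name equals a configured prefix or sits below it in the
hierarchy; the helper names (is_hidden, visible_refs) and the sample
refs are hypothetical and this is not Git's implementation.

```python
def is_hidden(refname, hidden_prefixes):
    """Return True if refname equals a hidden prefix or lives under it."""
    for prefix in hidden_prefixes:
        prefix = prefix.rstrip("/")
        if refname == prefix or refname.startswith(prefix + "/"):
            return True
    return False


def visible_refs(refs, hidden_prefixes):
    """Keep only the refs a server would still advertise (hypothetical helper)."""
    return {name: oid for name, oid in refs.items()
            if not is_hidden(name, hidden_prefixes)}


# Made-up refs and object names, for illustration only.
refs = {
    "refs/heads/master": "a" * 40,
    "refs/tags/v1.0": "b" * 40,
    "refs/changes/01/1/1": "c" * 40,
    "refs/changesets": "d" * 40,  # shares a string prefix, not a hierarchy
}
advertised = visible_refs(refs, ["refs/changes"])
print(sorted(advertised))
```

Matching on whole path components keeps a ref such as refs/changesets
visible even though its name starts with the same characters as the
hidden hierarchy.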
Re: Is anyone working on a next-gen Git protocol?
On Wed, Oct 10, 2012 at 6:44 PM, Nguyen Thai Ngoc Duy wrote:
> On Thu, Oct 11, 2012 at 3:46 AM, Junio C Hamano wrote:
>> Steffen Prohaska writes:
>>
>>> I've recently discovered that the current protocol can be amazingly
>>> inefficient when it comes to transferring binary objects. Assume two
>>> repositories that are in sync. After a 'git checkout --orphan && git
>>> commit', a subsequent transfer sends all the blobs attached to the new
>>> commit, although the other side already has all the blobs.
>>
>> I do not think it has anything to do with binary, but what you
>> deserve from using orphan, where you declared that the history does
>> not have anything to do with the original.
>>
>> If both of your repositories had the two parallel lines of these
>> histories as branches, the transfer would have gone well with or
>> without binary objects.
>
> On the same inefficient subject, git does not try to share common
> objects for non-commit refs, for example tags pointing to trees. I
> have such a peculiar repo and if a new tag shares 90% of the tree with
> existing tags, git-fetch sends the whole tree of the new tag over
> the wire. It does not seem easy to fix though and is probably rare
> enough that it does not justify proper support. As a workaround, I
> generate commits that link all these tags/trees together in a
> predetermined order. Not nice but works ok.

Aside from saving a huge amount of CPU during the "Counting objects"
phase, the compressed bitmap work we presented in JGit solves this by
working off the complete reachability graph, and not just some subset
related to a cut made across the commit graph.

Unfortunately we took a shortcut and didn't create bitmaps for
non-commits, but this is a trivial modification to the algorithm and
the storage.
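The reachability-bitmap idea mentioned above can be illustrated with a
toy sketch. It assumes every object gets one bit position and that a
bitmap can be built from any starting object, including non-commit
tips; plain Python integers stand in for the compressed bitmaps, and
the object names and graph are invented. It is not JGit's on-disk
format or algorithm.

```python
def reachable_bitmap(start, edges, index):
    """Return an int whose set bits mark every object reachable from start."""
    bits = 0
    stack = [start]
    while stack:
        obj = stack.pop()
        bit = 1 << index[obj]
        if bits & bit:
            continue  # already visited
        bits |= bit
        stack.extend(edges.get(obj, ()))
    return bits


# Toy graph: two tags point straight at trees that share most blobs.
edges = {
    "tag-old": ["tree-old"],
    "tag-new": ["tree-new"],
    "tree-old": ["blob-a", "blob-b", "blob-c"],
    "tree-new": ["blob-a", "blob-b", "blob-d"],
}
objects = sorted({o for k, v in edges.items() for o in [k, *v]})
index = {name: i for i, name in enumerate(objects)}

have = reachable_bitmap("tag-old", edges, index)  # what the receiver has
want = reachable_bitmap("tag-new", edges, index)  # what the receiver asks for
to_send = want & ~have                            # set difference on bits
missing = [name for name in objects if (to_send >> index[name]) & 1]
print(missing)
```

Because the difference is taken over the full set of reachable objects
rather than over commits only, the shared blobs drop out even though
no commit connects the two tags.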
Re: Is anyone working on a next-gen Git protocol?
On Thu, Oct 11, 2012 at 3:46 AM, Junio C Hamano wrote:
> Steffen Prohaska writes:
>
>> I've recently discovered that the current protocol can be amazingly
>> inefficient when it comes to transferring binary objects. Assume two
>> repositories that are in sync. After a 'git checkout --orphan && git
>> commit', a subsequent transfer sends all the blobs attached to the new
>> commit, although the other side already has all the blobs.
>
> I do not think it has anything to do with binary, but what you
> deserve from using orphan, where you declared that the history does
> not have anything to do with the original.
>
> If both of your repositories had the two parallel lines of these
> histories as branches, the transfer would have gone well with or
> without binary objects.

On the same inefficient subject, git does not try to share common
objects for non-commit refs, for example tags pointing to trees. I
have such a peculiar repo and if a new tag shares 90% of the tree with
existing tags, git-fetch sends the whole tree of the new tag over
the wire. It does not seem easy to fix though and is probably rare
enough that it does not justify proper support. As a workaround, I
generate commits that link all these tags/trees together in a
predetermined order. Not nice but works ok.
--
Duy
Re: Is anyone working on a next-gen Git protocol?
From: "Junio C Hamano"
> Steffen Prohaska writes:
>
>> I've recently discovered that the current protocol can be amazingly
>> inefficient when it comes to transferring binary objects. Assume two
>> repositories that are in sync. After a 'git checkout --orphan && git
>> commit', a subsequent transfer sends all the blobs attached to the new
>> commit, although the other side already has all the blobs.
>
> I do not think it has anything to do with binary, but what you
> deserve from using orphan, where you declared that the history does
> not have anything to do with the original.
>
> If both of your repositories had the two parallel lines of these
> histories as branches, the transfer would have gone well with or
> without binary objects.
> --

Steffen,

An alternative could be a shallow clone for just those branches with
the binary objects, so that the git objects are still identical. Or
use a replace/graft to trim the line of development. It's still a
fudge, but something you could look at.
Re: Is anyone working on a next-gen Git protocol?
Steffen Prohaska writes:

> I've recently discovered that the current protocol can be amazingly
> inefficient when it comes to transferring binary objects. Assume two
> repositories that are in sync. After a 'git checkout --orphan && git
> commit', a subsequent transfer sends all the blobs attached to the new
> commit, although the other side already has all the blobs.

I do not think it has anything to do with binary, but what you
deserve from using orphan, where you declared that the history does
not have anything to do with the original.

If both of your repositories had the two parallel lines of these
histories as branches, the transfer would have gone well with or
without binary objects.
Re: Is anyone working on a next-gen Git protocol?
On Oct 8, 2012, at 6:27 PM, Junio C Hamano wrote:

> Once we go into "want/have" phase, I do not think there is a need
> for fundamental change in the protocol (by this, I am not counting a
> change to send "have"s sparsely and possibly backtracking to bisect
> history, etc. as "fundamental").

I've recently discovered that the current protocol can be amazingly
inefficient when it comes to transferring binary objects. Assume two
repositories that are in sync. After a 'git checkout --orphan && git
commit', a subsequent transfer sends all the blobs attached to the new
commit, although the other side already has all the blobs.

This behavior is especially annoying when (mis)using git to store
binary files. I was thinking for a while that it might be a reasonable
idea to store binary files in a submodule and frequently cut the
history in order to save space. The history would have little value
anyway, since diff and merge don't make much sense with binary files.

Eventually, I abandoned the idea due to the current behavior of the
protocol. I had expected that git would be smarter and behave more
like rsync, for example, by skipping big blobs as soon as it
recognizes that they are already available at both sides.

Maybe the new protocol could include an optimization for the described
case. I don't know whether this would be a fundamental change.

	Steffen
Re: Is anyone working on a next-gen Git protocol?
Andreas Ericsson writes:

> You'll want that to be a single "wants" message to avoid incurring
> insane amounts of roundtrip latency with lots of refs. github and
> other hosted services are quite popular, but with my 120ms ping
> rtt I'd be spending half a minute just telling the other side what
> I want when I fetch from a repo with 250 refs.

Peff's recent patch when applied on the server side would help
alleviate the load to produce these refs, but it obviously would not
cut the network cost.

In order to change this, we need to swap "who speaks first".

Once we go into "want/have" phase, I do not think there is a need
for fundamental change in the protocol (by this, I am not counting a
change to send "have"s sparsely and possibly backtracking to bisect
history, etc. as "fundamental").
Re: Is anyone working on a next-gen Git protocol?
On 10/07/2012 09:57 PM, Ævar Arnfjörð Bjarmason wrote:
> On Wed, Oct 3, 2012 at 9:13 PM, Junio C Hamano wrote:
>> Ævar Arnfjörð Bjarmason writes:
>>
>>> I'm creating a system where a lot of remotes constantly fetch from a
>>> central repository for deployment purposes, but I've noticed that even
>>> with a remote.$name.fetch configuration to only get certain refs a
>>> "git fetch" will still call git-upload-pack which will provide a list
>>> of all references.
>>
>> It has been observed that the sender has to advertise megabytes of
>> refs because it has to speak first before knowing what the receiver
>> wants, even when the receiver is interested in getting updates from
>> only one of them, or worse yet, when the receiver is only trying to
>> peek whether the ref it is interested in has been updated.
>
> Has anyone started working on a next-gen Git protocol as a result of
> this discussion? If not I thought I'd give it a shot if/when I have
> time.
>
> The current protocol is basically (S = Server, C = Client)
>
>     S: Spew out first ref
>     S: Advertisement of capabilities
>     S: Dump of all our refs
>     C/S: Declare wanted refs, negotiate with server
>     S: Send pack to client, if needed
>
> And I thought I'd basically turn it into:
>
>     C: Connect to server, declare what protocol we understand
>     C: Advertisement of capabilities
>     S: Advertisement of capabilities
>     C/S: Negotiate what we want
>     C/S: Same as v1, without the advertisement of capabilities, and maybe
>          don't dump refs at all
>
> Basically future-proofing it by having the client say what it supports
> to begin with along with what it can handle (like in HTTP).
>
> Then in the negotiation phase the client & server would go back &
> forth about what they want & how they want it. I'd planned to
> implement something like:
>
>     C: want_refs refs/heads/*
>     S: OK to that
>     C: want_refs refs/tags/*
>     S: OK to that
>
> Or:
>
>     C: want_refs refs/heads/master
>     S: OK to that
>     C: want_refs refs/tags/v*
>     S: OK to that
>

You'll want that to be a single "wants" message to avoid incurring
insane amounts of roundtrip latency with lots of refs. github and
other hosted services are quite popular, but with my 120ms ping
rtt I'd be spending half a minute just telling the other side what
I want when I fetch from a repo with 250 refs.

It's a flagday and a half to change the protocol though, so I expect
it'll have to wait for 2.0, unless the current client-side part of it
is dumb and ignores existing refs when requesting its "wants", in which
case the server can just stop advertising existing refs and most of the
speedup is already done.

--
Andreas Ericsson andreas.erics...@op5.se
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.
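The half-minute figure above follows from simple arithmetic, sketched
here under the assumption that each want_refs line waits for its "OK"
before the next one is sent, so every message costs one full round
trip. The function name negotiation_seconds is made up for the
example.

```python
def negotiation_seconds(num_refs, rtt_ms, refs_per_message):
    """Time spent waiting on round trips while declaring wanted refs."""
    messages = -(-num_refs // refs_per_message)  # ceiling division
    return messages * rtt_ms / 1000.0


# 250 refs at 120 ms per round trip, one ref per message vs. one message.
one_by_one = negotiation_seconds(250, 120, 1)
batched = negotiation_seconds(250, 120, 250)
print(one_by_one, batched)
```

With one ref per message the wait is 250 x 0.12 s = 30 s; a single
batched "wants" message costs one round trip, 0.12 s.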
Re: Is anyone working on a next-gen Git protocol?
Ævar Arnfjörð Bjarmason writes:

> On Wed, Oct 3, 2012 at 9:13 PM, Junio C Hamano wrote:
>> Ævar Arnfjörð Bjarmason writes:
>>
>>> I'm creating a system where a lot of remotes constantly fetch from a
>>> central repository for deployment purposes, but I've noticed that even
>>> with a remote.$name.fetch configuration to only get certain refs a
>>> "git fetch" will still call git-upload-pack which will provide a list
>>> of all references.
>>
>> It has been observed that the sender has to advertise megabytes of
>> refs because it has to speak first before knowing what the receiver
>> wants, even when the receiver is interested in getting updates from
>> only one of them, or worse yet, when the receiver is only trying to
>> peek whether the ref it is interested in has been updated.
>
> Has anyone started working on a next-gen Git protocol as a result of
> this discussion?

Some time ago, Shawn and I privately helped somebody from the Gerrit
circle, where the initial ref advertisement is a huge problem
(primarily because they add tons of refs to one commit that eventually
goes to their integration branch), come up with a problem description
and proposal document to kick-start a discussion, but not much has
happened since. Unless I hear from them soonish, I'll send a
cleaned-up version of the draft before I leave for my vacation.

The gist of it is that the current protocol cannot be upgraded in
place because "who speaks first" is not something you can update
with capability, so we would need upload-pack-v2 that lets the
fetching side speak first. "What is spoken in the first message" is
a separate issue, and one of the things it can address is to allow
the ends to reduce the amount of ref advertisement that ends up not
getting used in the end, but once we allow the fetcher to speak
first, we have much wider possibilities.
Re: Is anyone working on a next-gen Git protocol?
On Sun, Oct 07, 2012 at 09:57:56PM +0200, Ævar Arnfjörð Bjarmason wrote:

> Has anyone started working on a next-gen Git protocol as a result of
> this discussion? If not I thought I'd give it a shot if/when I have
> time.

I haven't, and don't really plan on it soon (I have a few smaller
things I'm working on, then I'd like to look into the EWAH bitmap stuff
from Shawn next).

> The current protocol is basically (S = Server, C = Client)
>
>     S: Spew out first ref
>     S: Advertisement of capabilities
>     S: Dump of all our refs
>     C/S: Declare wanted refs, negotiate with server
>     S: Send pack to client, if needed

In the "C" portion there, there is also "client acknowledges a subset of
capabilities shown by server" while it is declaring wanted refs.

> And I thought I'd basically turn it into:
>
>     C: Connect to server, declare what protocol we understand
>     C: Advertisement of capabilities
>     S: Advertisement of capabilities

The capability negotiation right now is that the server offers and the
client accepts. Are you swapping that so that the client offers and the
server accepts? Or are you thinking that they would be sent
simultaneously here? That could drop one round-trip (it's probably not
that important for git-over-tcp, but smart-http cares a lot about round
trips). But it also introduces a complexity with future additions (one
side may not know how to present its capabilities until understanding
what the other side can do).

>     C/S: Negotiate what we want

Refs we want, or capabilities we want?

>     C/S: Same as v1, without the advertisement of capabilities, and maybe
>          don't dump refs at all
>
> Basically future-proofing it by having the client say what it supports
> to begin with along with what it can handle (like in HTTP).

I feel like this "maybe..." bit needs to be more fleshed out before
designing the first part. I like the idea of future-proofing first and
then adding new features second, but what does the "don't advertise all
refs" protocol look like? Presumably the client is going to say "I'm
interested in refs/heads/* and refs/tags/*" or something. Does that come
with the capabilities? Or is it a new protocol phase?

I think we need to know what the second half of the two-step process
will look like to be sure the first half will accommodate it (and the
answer may be as simple as saying "they're not sending capabilities,
they're sending arbitrary key/value items, with the knowledge that the
other side may not understand particular keys, and we have to be
prepared to handle both cases").

> Then in the negotiation phase the client & server would go back &
> forth about what they want & how they want it. I'd planned to
> implement something like:
>
>     C: want_refs refs/heads/*
>     S: OK to that
>     C: want_refs refs/tags/*
>     S: OK to that
>
> Or:
>
>     C: want_refs refs/heads/master
>     S: OK to that
>     C: want_refs refs/tags/v*
>     S: OK to that

That seems simple. But how will it work over smart-http? Are we adding a
round-trip to do want_refs negotiation?

-Peff
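The "arbitrary key/value items" idea can be sketched as a tolerant
parser: the receiver acts on the keys it knows and sets aside the rest
instead of failing. The wire syntax (space-separated key=value tokens),
the key names and the function parse_hello are all assumptions made
for the example; the thread does not define a format.

```python
KNOWN_KEYS = {"version", "want_refs"}  # hypothetical keys this side understands


def parse_hello(line, known=KNOWN_KEYS):
    """Split a greeting into understood items and ignored (unknown) keys."""
    understood, ignored = {}, []
    for item in line.split():
        key, _, value = item.partition("=")
        if key in known:
            # A key may repeat, so collect its values in order.
            understood.setdefault(key, []).append(value)
        else:
            ignored.append(key)  # unknown key: skip it, do not abort
    return understood, ignored


understood, ignored = parse_hello(
    "version=2 want_refs=refs/heads/* want_refs=refs/tags/* shiny=1")
print(understood, ignored)
```

Reporting the ignored keys back to the sender would let it learn which
of its requests were not honoured, which is one way to "handle both
cases".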
Re: Is anyone working on a next-gen Git protocol?
On Sun, Oct 07, 2012 at 09:57:56PM +0200, Ævar Arnfjörð Bjarmason wrote:
>
> Has anyone started working on a next-gen Git protocol as a result of
> this discussion? If not I thought I'd give it a shot if/when I have
> time.

Unfortunately, having the client signal the version is nasty to do in
ways that wouldn't cause current servers to hang up or do other
undesirable things.

git://: Git-daemon will hang up[1] if it receives a command it doesn't
understand (and one can't add arguments either).

ssh://: Commands are NAKed in non-standard ways (e.g. Gitolite vs. shell)
and one can't add arguments.

file://: That's easy.

CONNECT: The helper needs to be told that v2 is supported (helper doing
the rest).

Maybe with git://, one could hack the stuff in a similar way as virtual
hosting was added. But that won't work with SSH (nor can one use the
environment with SSH).

:-/

[1] And there is no guarantee that the server end of git:// is
git-daemon. There's at least one git:// server implementation that
responds to unknown commands with an ERR packet followed by a hangup.

-Ilari
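For reference, the git:// request that carries the virtual-hosting
parameter is a single pkt-line: a four-hex-digit length (which counts
the four length bytes themselves) followed by the command, the path, a
NUL, host=<name> and another NUL. The sketch below builds that request
and, as an assumption made purely for illustration, appends a further
NUL-separated parameter of the kind a version signal might use; the
thread does not specify such a parameter, and the helper names are
hypothetical.

```python
def pkt_line(payload: bytes) -> bytes:
    """Prefix payload with its total length as four lowercase hex digits."""
    return b"%04x" % (len(payload) + 4) + payload


def git_request(command, path, host, extra=()):
    """Build the initial git:// request, optionally with extra parameters."""
    payload = f"{command} {path}".encode() + b"\0"
    payload += f"host={host}".encode() + b"\0"
    if extra:
        # Hypothetical extension: an empty field, then NUL-terminated items.
        payload += b"\0" + b"".join(p.encode() + b"\0" for p in extra)
    return pkt_line(payload)


plain = git_request("git-upload-pack", "/project.git", "myserver.com")
extended = git_request("git-upload-pack", "/project.git", "myserver.com",
                       extra=["version=2"])
print(plain)
print(extended)
```

Whether an existing server tolerates the extra bytes is exactly the
compatibility question raised above; the sketch only shows where such
data could ride along.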
Is anyone working on a next-gen Git protocol?
On Wed, Oct 3, 2012 at 9:13 PM, Junio C Hamano wrote:
> Ævar Arnfjörð Bjarmason writes:
>
>> I'm creating a system where a lot of remotes constantly fetch from a
>> central repository for deployment purposes, but I've noticed that even
>> with a remote.$name.fetch configuration to only get certain refs a
>> "git fetch" will still call git-upload-pack which will provide a list
>> of all references.
>
> It has been observed that the sender has to advertise megabytes of
> refs because it has to speak first before knowing what the receiver
> wants, even when the receiver is interested in getting updates from
> only one of them, or worse yet, when the receiver is only trying to
> peek whether the ref it is interested in has been updated.

Has anyone started working on a next-gen Git protocol as a result of
this discussion? If not I thought I'd give it a shot if/when I have
time.

The current protocol is basically (S = Server, C = Client)

    S: Spew out first ref
    S: Advertisement of capabilities
    S: Dump of all our refs
    C/S: Declare wanted refs, negotiate with server
    S: Send pack to client, if needed

And I thought I'd basically turn it into:

    C: Connect to server, declare what protocol we understand
    C: Advertisement of capabilities
    S: Advertisement of capabilities
    C/S: Negotiate what we want
    C/S: Same as v1, without the advertisement of capabilities, and maybe
         don't dump refs at all

Basically future-proofing it by having the client say what it supports
to begin with along with what it can handle (like in HTTP).

Then in the negotiation phase the client & server would go back &
forth about what they want & how they want it. I'd planned to
implement something like:

    C: want_refs refs/heads/*
    S: OK to that
    C: want_refs refs/tags/*
    S: OK to that

Or:

    C: want_refs refs/heads/master
    S: OK to that
    C: want_refs refs/tags/v*
    S: OK to that

That would serve as a proof of concept (and also solve the issue I
had), but by adding an initial negotiation phase the protocol should be
open to any future extensions without making assumptions about the
client wanting to know about all of the server's refs, unlike the
current protocol.
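A server-side view of the proposed want_refs exchange might look like
the sketch below: the server keeps its full ref list to itself and
answers only with refs that match the client's patterns. The pattern
semantics are an assumption (shell-style globs via Python's fnmatch,
where "*" also matches "/"), and select_refs and the sample refs are
hypothetical.

```python
from fnmatch import fnmatchcase


def select_refs(all_refs, patterns):
    """Return only the refs whose names match at least one want_refs pattern."""
    return {name: oid for name, oid in all_refs.items()
            if any(fnmatchcase(name, pattern) for pattern in patterns)}


# Made-up refs and object names, for illustration only.
all_refs = {
    "refs/heads/master": "1" * 40,
    "refs/heads/topic/x": "2" * 40,
    "refs/tags/v1.0": "3" * 40,
    "refs/tags/beta": "4" * 40,
    "refs/changes/01/1/1": "5" * 40,
}
wanted = select_refs(all_refs, ["refs/heads/master", "refs/tags/v*"])
print(sorted(wanted))
```

Passing all patterns in one call mirrors sending them in a single
message, so the reply costs one round trip however many patterns the
client lists.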