Re: [PATCH 0/2] Hiding some refs in ls-remote
On Mon, Jan 21, 2013 at 03:33:46PM -0800, Junio C Hamano wrote: > Jeff King writes: > > >> It may even make sense to have "git push" honor it, excluding them > >> from "git push --mirror" (or "git push --all" if some of the > >> branches are hidden); I haven't thought things through, though. > > > > That is harder, as that is something that happens on the client. How > > does the client learn about the transfer.hiderefs setting on the remote? > > No, I was talking about running "git push" from a repository that > has the transfer.hiderefs set, emulating a fetch from a client by > pushing out in the reverse direction. Ah. That seems a bit more questionable to me, as you are assuming that the face that the repository shows to network clients is the same as it would show when it is the client itself. That wouldn't be true, for example, when pushing to a backup repository which would expect to get everything. Of course the same problem comes from a backup repository which wants to fetch from you. Which again comes down to the fact that I think this ref-hiding is really in the eye of the receiver, not the sender. -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] Hiding some refs in ls-remote
Jeff King writes: >> It may even make sense to have "git push" honor it, excluding them >> from "git push --mirror" (or "git push --all" if some of the >> branches are hidden); I haven't thought things through, though. > > That is harder, as that is something that happens on the client. How > does the client learn about the transfer.hiderefs setting on the remote? No, I was talking about running "git push" from a repository that has the transfer.hiderefs set, emulating a fetch from a client by pushing out in the reverse direction. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] Hiding some refs in ls-remote
On Sun, Jan 20, 2013 at 10:06:33AM -0800, Junio C Hamano wrote: > Jeff King writes: > > >>[uploadPack] > >>hiderefs = refs/changes > > > > Would you want to do the same thing on receive-pack? It could benefit > > from the same reduction in network cost (although it tends to be invoked > > less frequently than upload-pack). > > Do *I* want to do? Not when there already is a patch exists; I'd > rather take existing and tested patch ;-) The patch we have is below, but I do not think you want it, as it is missing several things: - docs and tests - handling multiple values - anything but raw prefix matching (you even have to put in the "/" yourself). It was not my patch originally, and I never bothered to clean and upstream it because I didn't think it was something anybody else would be interested in. I'm happy to do so, but if you are working in the area anyway, it would make sense to be part of your series. -Peff --- diff --git b/builtin/receive-pack.c a/builtin/receive-pack.c index 0afb8b2..b22670c 100644 --- b/builtin/receive-pack.c +++ a/builtin/receive-pack.c @@ -39,6 +39,7 @@ static void *head_name_to_free; static int auto_gc = 1; static const char *head_name; static void *head_name_to_free; +static const char *hide_refs; static int sent_capabilities; static enum deny_action parse_deny_action(const char *var, const char *value) @@ -113,11 +114,19 @@ static void show_ref(const char *path, const unsigned char *sha1) return 0; } + if (strcmp(var, "receive.hiderefs") == 0) { + git_config_string(&hide_refs, var, value); + return 0; + } + return git_default_config(var, value, cb); } static void show_ref(const char *path, const unsigned char *sha1) { + if (hide_refs && strncmp(path, hide_refs, strlen(hide_refs)) == 0) + return 0; + if (sent_capabilities) packet_write(1, "%s %s\n", sha1_to_hex(sha1), path); else -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] Hiding some refs in ls-remote
On Sun, Jan 20, 2013 at 02:08:32PM -0800, Junio C Hamano wrote: > Junio C Hamano writes: > > > Jeff King writes: > > > >>> [uploadPack] > >>> hiderefs = refs/changes > >> > >> Would you want to do the same thing on receive-pack? It could benefit > >> from the same reduction in network cost (although it tends to be invoked > >> less frequently than upload-pack). > > ... > > As I said, I view this primarily as solving the cluttering issue. > > The use of hiderefs to hide these refs that point at objects I do > > not consider belong to my repository from the pushers equally makes > > sense as such, I think. > > Now that raises one question. Should this be transfer.hiderefs that > governs both upload-pack and receive-pack? I tend to think the > answer is yes. Yes, that is what I was getting at (and it should have individual hiderefs for each side that override the transfer.*, in case somebody wants to make the distinction). > It may even make sense to have "git push" honor it, excluding them > from "git push --mirror" (or "git push --all" if some of the > branches are hidden); I haven't thought things through, though. That is harder, as that is something that happens on the client. How does the client learn about the transfer.hiderefs setting on the remote? It has to be out-of-band, at which point calling it by the same name has little benefit. -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] Hiding some refs in ls-remote
On Sat, Jan 19, 2013 at 11:16:00AM -0800, Junio C Hamano wrote: > For example, if you mirror-clone from either of my repositories from > GitHub: > > git clone --mirror git://github.com/git/git/ > git clone --mirror git://github.com/gitster/git/ > > you will see stuff that does not belong to the project; objects that > are only reachable from refs/pull/* are not something I approved to > be placed in my repository; I as the project owner do not want to > give these objects to a cloner as if they are part of my project. > > The server side can hide refs/pull/ and refs/pull-request-tags/ so > that clones (and ls-remote) will not see clutter, and nobody gets > hurt. I think it is unfortunately not so simple as that. You do not want them to be part of your "clone --mirror", but some people do (because they will inspect them when evaluating the pull request). And that decision is, I think, in the eye of the cloner, not the eye of the repo owner. I think in your case it is a little harder to see, in that you do not care about the PR mechanism for git.git at all, but let us think for a minute about a project that actually uses PRs. For an ordinary developer, they would be content to fetch the branch tips and tags (and that is what we do by default with "git clone). They do not need to fetch all of refs/pull. If they learn out-of-band that PR 123 exists, they can in theory ask for refs/pull/123/head. That works even with your proposal, because it is just about dropping the advertisement, not the availability of refs. But what about entities which really do want all of refs/pull? I can think of two good reasons to want them: 1. You really do want a mirror, because you are creating a backup or alternate distribution channel. IOW, you are using "git clone --mirror", but it is missing those refs. 2. You are a developer who is sometimes disconnected. You want to fetch all of the PRs, and then examine them at your leisure. Without advertising, how do they ask for the wildcard of `refs/pull/*`? They're stuck massaging some out-of-band data into a set of distinct fetch refspecs. I don't know much about what's in Gerrit's refs/changes, but I imagine one could make a similar argument that their value is in the eye of the client. And later you give this example: > Another example. If you run ls-remote against the above two > repositories, you will notice that the latter has tons more > branches. The former are to publish only the primary integration > branches, while the latter are to show individual topics. > > I wish I didn't have to do this if I could. > > We have periodical "What's cooking" postings that let interested > parties learn topics out-of-band. If I can hide refs/heads/??/* > from normal users' clones while actually keeping individual topics > there at the same place, I do not have to push into two places. Most people do not want to see those heads. But some people (like me) do, and it is a great convenience to be able to fetch them all with a wildcard refspec, which cannot work if they are not advertised. I would have to parse what's cooking and fetch them each individually. That's not a technologically insurmountable problem, but it means git is being a lot less helpful than it could be. So I think in these cases of "extra refs", some clients would want them, and some would not. And only they can know which camp they are in, not the server side. Which is why the current system works pretty well: we advertise everything and let the client decide what it wants. Clone very sanely fetches only refs/heads/* (and associated tags), and you can put whatever extra junk you want elsewhere in the hierarchy. A mirrored clone will fetch it, but to me that is the point of --mirror. And if you want some arbitrary subset, then you can implement that, too, by the use of fetch refspecs[1]. The downside, of course, is that the ref advertisement is bigger than many clients will want. But dealing with that is a separate issue from "these refs should never be shown", as solutions for one may not work from the other (e.g., if we delayed ref advertisement until the client had spoken, the client could tell us what refspecs they are interested in). For your topic branch example, why don't you push to refs/topics of the main git repository? Then normal cloners wouldn't see it, but anybody interested could add the appropriate fetch refspec. -Peff [1] It may be that refspecs are not as expressive as we would like, but that is a client side problem we can deal with. For instance, you cannot currently say "give me everything except refs/foo/*". -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] Hiding some refs in ls-remote
On Sun, Jan 20, 2013 at 2:16 AM, Junio C Hamano wrote: > Duy Nguyen writes: > >> Should the client side learn how to list hidden refs too? I'm thinking >> of an extreme case where upload-pack advertises nothing (or maybe just >> refs/heads/master) and it's up to the client to ask for the ref >> selection it's interested in. upload-pack may need more updates to do >> that, I think. > > What you are talking about is a different goal. > > Look at this as a mechanism for the repository owner to control the > clutter in what is shown to the intended audience of what s/he > publishes in the repository. Network bandwidth reduction of > advertisement is a side effect of clutter reduction, and not > necessarily the primary goal. Probably stupid question: does gitnamespaces.txt attempt to achieve the same? The document says smart http supports passing namespace, nothing about git protocol so I guess we need some extension in upload-pack (or git-daemon) for specifying namespace over git protocol. -- Duy -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] Hiding some refs in ls-remote
Junio C Hamano writes: > Jeff King writes: > >>> [uploadPack] >>> hiderefs = refs/changes >> >> Would you want to do the same thing on receive-pack? It could benefit >> from the same reduction in network cost (although it tends to be invoked >> less frequently than upload-pack). > ... > As I said, I view this primarily as solving the cluttering issue. > The use of hiderefs to hide these refs that point at objects I do > not consider belong to my repository from the pushers equally makes > sense as such, I think. Now that raises one question. Should this be transfer.hiderefs that governs both upload-pack and receive-pack? I tend to think the answer is yes. It may even make sense to have "git push" honor it, excluding them from "git push --mirror" (or "git push --all" if some of the branches are hidden); I haven't thought things through, though. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] Hiding some refs in ls-remote
Junio C Hamano writes: > Duy Nguyen writes: > >> Should the client side learn how to list hidden refs too? I'm thinking >> of an extreme case where upload-pack advertises nothing (or maybe just >> refs/heads/master) and it's up to the client to ask for the ref >> selection it's interested in. upload-pack may need more updates to do >> that, I think. > > What you are talking about is a different goal. > > Look at this as a mechanism for the repository owner to control the > clutter in what is shown to the intended audience of what s/he > publishes in the repository. Network bandwidth reduction of > advertisement is a side effect of clutter reduction, and not > necessarily the primary goal. As a separate and follow-up topic, I can see a future where we also have .delayref (in addition to .hideref) configuration, that causes the upload-pack to: * omit refs that are marked as "delayed"; * advertise "allow-expand-ref=" where the value is the pattern used to specify which refs are omitted via this mechanism (e.g. "refs/*" in your extreme example, or perhaps "refs/tags/" if you only advertise branches in order to limit the bandwidth); and then a fetch-pack that is interested in fetching "refs/foo/bar", after seeing that it matches one of the "allow-expand-ref" patterns, to ask "expand-ref refs/foo/bar", expect the upload-pack to respond with "refs/foo/bar " (or "refs/foo/bar does not exist"), before going into the usual "want" exchange ("ls-remote" would of course do the same, and pattern-less "ls-remote" may even ask to expand everything by saying "expand-ref refs/*" (where the capability said "allow-expand-ref=refs/*" in your extreme example) to grab everything discoverable). That is _primarily_ for trading network bandwidth with initial latency; as far as the end-result is concerned, delayed ones to be expanded on demand are just as discoverable as the ones advertised initially. I think we'd want to have both in the end. It just happens I just posted the beginning of clutter-reduction one, as I think developing both in parallel would be messy and hideref is probably less involved than the delayref. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] Hiding some refs in ls-remote
Jeff King writes: >> [uploadPack] >> hiderefs = refs/changes > > Would you want to do the same thing on receive-pack? It could benefit > from the same reduction in network cost (although it tends to be invoked > less frequently than upload-pack). Do *I* want to do? Not when there already is a patch exists; I'd rather take existing and tested patch ;-) As I said, I view this primarily as solving the cluttering issue. The use of hiderefs to hide these refs that point at objects I do not consider belong to my repository from the pushers equally makes sense as such, I think. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] Hiding some refs in ls-remote
Duy Nguyen writes: > Should the client side learn how to list hidden refs too? I'm thinking > of an extreme case where upload-pack advertises nothing (or maybe just > refs/heads/master) and it's up to the client to ask for the ref > selection it's interested in. upload-pack may need more updates to do > that, I think. What you are talking about is a different goal. Look at this as a mechanism for the repository owner to control the clutter in what is shown to the intended audience of what s/he publishes in the repository. Network bandwidth reduction of advertisement is a side effect of clutter reduction, and not necessarily the primary goal. It lets me as the repository owner to say "do not list them to make these discoverable"; letting the client side learn defeats the whole point of this mechanism. For example, if you mirror-clone from either of my repositories from GitHub: git clone --mirror git://github.com/git/git/ git clone --mirror git://github.com/gitster/git/ you will see stuff that does not belong to the project; objects that are only reachable from refs/pull/* are not something I approved to be placed in my repository; I as the project owner do not want to give these objects to a cloner as if they are part of my project. The server side can hide refs/pull/ and refs/pull-request-tags/ so that clones (and ls-remote) will not see clutter, and nobody gets hurt. These refs are there only to support GitHub "pull requests"; the advertisement of them to ls-remote and appearance of them in mirror clones are undesirable side effects of how GitHub happened to implement the feature. GitHub has its way to notify of these pull requests, and it makes these pull requests discoverable out of band without involving Git. Ability to say "git fetch origin $SHA1" (or "git fetch origin pull/1") is the only thing lacking, in order to get rid of these clutter. The patches do the "get rid of clutter" part by letting you hide these refs from ls-remote and clone; allowing "fetch by tip SHA1" needs to come latter. Another example. If you run ls-remote against the above two repositories, you will notice that the latter has tons more branches. The former are to publish only the primary integration branches, while the latter are to show individual topics. I wish I didn't have to do this if I could. We have periodical "What's cooking" postings that let interested parties learn topics out-of-band. If I can hide refs/heads/??/* from normal users' clones while actually keeping individual topics there at the same place, I do not have to push into two places. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] Hiding some refs in ls-remote
On Fri, Jan 18, 2013 at 04:37:04PM -0800, Junio C Hamano wrote: > This is an early preview of reducing the network cost while talking > with a repository with tons of refs, most of which are of use by > very narrow audiences (e.g. refs under Gerrit's refs/changes/ are > useful only for people who are interested in the changes under > review). As long as these narrow audiences have a way to learn the > names of refs or objects pointed at by the refs out-of-band, it is > not necessary to advertise these refs. > > On the server end, you tell upload-pack that some refs do not have > to be advertised with the uploadPack.hiderefs multi-valued > configuration variable: > > [uploadPack] > hiderefs = refs/changes Would you want to do the same thing on receive-pack? It could benefit from the same reduction in network cost (although it tends to be invoked less frequently than upload-pack). At GitHub, we have a similar patch (we even call it hiderefs), but we do it only for receive-pack. In our case, it is not about network traffic, but rather that we provide a set of read-only refs in the refs/pull hierarchy. These are generated upstream by the creation of pull requests, and we reject any updates to them via the git protocol using a pre-receive hook. However, if a client without these refs uses "git push --mirror", it will attempt to delete them (which will fail). Meaning that a mirror push will always report failure, because it will always fail to push the refs/pull deletions. I don't know much about Gerrit's inner workings. Are refs/changes also read-only? -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] Hiding some refs in ls-remote
On 01/19/2013 01:37 AM, Junio C Hamano wrote: > This is an early preview of reducing the network cost while talking > with a repository with tons of refs, most of which are of use by > very narrow audiences (e.g. refs under Gerrit's refs/changes/ are > useful only for people who are interested in the changes under > review). As long as these narrow audiences have a way to learn the > names of refs or objects pointed at by the refs out-of-band, it is > not necessary to advertise these refs. > > On the server end, you tell upload-pack that some refs do not have > to be advertised with the uploadPack.hiderefs multi-valued > configuration variable: > > [uploadPack] > hiderefs = refs/changes > > The changes necessary on the client side to allow fetching objects > at the tip of a ref in hidden hierarchies are much more involved and > not part of this early preview, but the end user UI is expected to > be like these: > > $ git fetch $there refs/changes/72/41672/1 > $ git fetch $there 9598d59cdc098c5d9094d68024475e2430343182 > > That is, you ask for a refname as usual even though it is not part > of ls-remote response, or you ask for the commit object that is at > the tip of whatever hidden ref you are interested in. Although I can understand the pain of slow network performance, somehow this proposal gives me the feeling of being expeditious rather than elegant. Could the problem be solved in some other way? Maybe such references could be stored in a second repository or in a separate namespace (in the sense of gitnamespaces(7)) to prevent their creating overhead when they are unneeded? And *if* reference hiding makes sense, it seems to me that the client, not the server, should be the one who decides which server references it is interested in (though I understand that would require a protocol change). Otherwise the git repository *relies* on out-of-band channels for its functionality. If I understand correctly, a user would have *no way* to discover, via git, what hidden references are contained in a remote repository, or indeed even that the repo contains a hidden namespace. For example this would make it impossible to clean up obsolete "hidden" references on a remote repository without the supplementary information stored elsewhere. And if anybody accidentally creates a reference in a hidden namespace by hand, it will just sit there undetectably, forever. I assume (though I've never checked) that a server does not let a client ask for a SHA1 that is not currently reachable from a server-side reference, and I assume that that you are not proposing to change this policy. But allowing objects to be fetched from a hidden reference opens up some "interesting" possibilities: * A pusher could upload arbitrary content to a public git server under a cryptic hidden reference name. Most people would be completely unable to see this content, unless given the SHA1 or the reference name by the pusher. Thus this mechanism could be used as a dark channel to exchange arbitrary data relatively secretly. * Somebody could push a trojan version of code to a hidden reference in a project, then pass the SHA1 to a victim. The victim might trust the code because it comes from a known project website, even though the code would be invisible to other project developers and thus impossible for them to audit. And even if they learned about the trojan's SHA1 they would be unable to remove it from their repository because they have no way to find out the name of the hidden reference! Obviously these hacks would only be possible for a bad guy with push privileges to a repository that has turned on hidden references, but I think they are sobering nevertheless. These worries would go away if reference hiding were configured on the client rather than on the server. A second point: currently, the output of "git show-ref -d" and "git ls-remote ." are almost identical. Under your proposal, I believe that the hiderefs would only be omitted from the latter. Would it be useful to add an option to "git show-ref" to make it omit the "hiderefs" refs? And maybe another option to make it display *only* the hideref refs? And in the bikeshedding department, I wonder if "hiderefs" is the best name for the config setting. "hiderefs", implies to me that the refs are actively hidden and not available to the client in any way. But in fact they are just not advertised; they can be fetched normally. Maybe another name would be more suggestive of its true effect, for example "quietrefs" or "noadvertiserefs". Michael -- Michael Haggerty mhag...@alum.mit.edu http://softwareswirl.blogspot.com/ -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] Hiding some refs in ls-remote
On Sat, Jan 19, 2013 at 7:37 AM, Junio C Hamano wrote: > This is an early preview of reducing the network cost while talking > with a repository with tons of refs, most of which are of use by > very narrow audiences (e.g. refs under Gerrit's refs/changes/ are > useful only for people who are interested in the changes under > review). As long as these narrow audiences have a way to learn the > names of refs or objects pointed at by the refs out-of-band, it is > not necessary to advertise these refs. > > On the server end, you tell upload-pack that some refs do not have > to be advertised with the uploadPack.hiderefs multi-valued > configuration variable: > > [uploadPack] > hiderefs = refs/changes > > The changes necessary on the client side to allow fetching objects > at the tip of a ref in hidden hierarchies are much more involved and > not part of this early preview, but the end user UI is expected to > be like these: > > $ git fetch $there refs/changes/72/41672/1 > $ git fetch $there 9598d59cdc098c5d9094d68024475e2430343182 > > That is, you ask for a refname as usual even though it is not part > of ls-remote response, or you ask for the commit object that is at > the tip of whatever hidden ref you are interested in. Should the client side learn how to list hidden refs too? I'm thinking of an extreme case where upload-pack advertises nothing (or maybe just refs/heads/master) and it's up to the client to ask for the ref selection it's interested in. upload-pack may need more updates to do that, I think. -- Duy -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/2] Hiding some refs in ls-remote
This is an early preview of reducing the network cost while talking with a repository with tons of refs, most of which are of use by very narrow audiences (e.g. refs under Gerrit's refs/changes/ are useful only for people who are interested in the changes under review). As long as these narrow audiences have a way to learn the names of refs or objects pointed at by the refs out-of-band, it is not necessary to advertise these refs. On the server end, you tell upload-pack that some refs do not have to be advertised with the uploadPack.hiderefs multi-valued configuration variable: [uploadPack] hiderefs = refs/changes The changes necessary on the client side to allow fetching objects at the tip of a ref in hidden hierarchies are much more involved and not part of this early preview, but the end user UI is expected to be like these: $ git fetch $there refs/changes/72/41672/1 $ git fetch $there 9598d59cdc098c5d9094d68024475e2430343182 That is, you ask for a refname as usual even though it is not part of ls-remote response, or you ask for the commit object that is at the tip of whatever hidden ref you are interested in. Junio C Hamano (2): upload-pack: share more code upload-pack: allow hiding ref hiearchies t/t5512-ls-remote.sh | 9 ++ upload-pack.c| 86 ++-- 2 files changed, 79 insertions(+), 16 deletions(-) -- 1.8.1.1.454.g48d39c0 -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html