Re: git-bundle rev handling and de-duping
From: Jeff King p...@peff.net [subject tweaked as we have veered quite far off the original, and this might get more attention from interested people] On Thu, Oct 16, 2014 at 10:39:54AM -0700, Junio C Hamano wrote: 2. We use object_array_remove_duplicates() to de-dup git bundle create x master master, which came from b2a6d1c6 (bundle: allow the same ref to be given more than once, 2009-01-17), which is still the sole caller of the function, and I think this is bogus. Comparing .name would not de-dup git bundle create x master refs/heads/master. Hmm. We also doesn't respect the UNINTERESTING flag, so it will dedup foo and ^foo into a single entry. I _think_ this doesn't have any bad side effects, because they are equivalent (i.e., the flag is carried on the object, not the pending entry). I would expect this: git bundle create ... foo ^foo to barf (and it does) because the bundle is empty. But with this: git bundle create ... another-ref foo ^foo I would expect the resulting bundle to contain the objects from another-ref, but still mention foo as an update (we didn't need to send any objects, since presumably the other side already had them). It doesn't, but that is not because of the dedup. It is because we generate the list of refs to send based on the pending list, and we skip all UNINTERESTING objects (we must, because otherwise ^foo would make an entry). I.e., it is the same the flag is on the object, not the pending entry that saved us above now causing its own bug. Obviously the example above is sort of silly, but if you have: # two branches point at the same object git branch foo master # the other side already has master. Let's send them foo. # this will fail because the bundle is empty. That's mildly # annoying because we really want to tell them hey, update # your foo branch. But at least we get an error. git bundle create tmp.bundle foo ^master Isn't this kindof what happens as an issue when we want the right HEAD to be included explicitly in a bundle. Though ttp://thread.gmane.org/gmane.comp.version-control.git/234053 suggests its more complicated than that. # now the same thing, but we're sending them some other objects. # This one succeeds, but silently omits foo from the bundle! git bundle create tmp.bundle foo another-ref ^master I have a feeling that the list needs to be generated from revs.cmdline, not revs.pending, and the de-duplication needs to happen there (with the ref resolution that you mention). I also have the feeling that fast-export had to deal with this exact same issue. It looks like we use revs.cmdline there. I seem to recall there was some ongoing work in that area, but I stopped paying close attention due to some personality conflicts, and I don't know if all of the issues were ever resolved. I think the right way to fix these two and a half problems is to do the following: - object_array_remove_duplicates() (and contains_name() helper it uses) should be removed from object.c; - create_bundle() in bundle.c should implement a helper that is similar to contains_name() but knows about ref dwimming and use it to call object_array_filter() to replace its call to object_array_remove_duplicates(). Agreed. The loop in create_bundle right after the de-dup does the dwimming. Probably it would be simple to just skip duplicates there using a hashmap or sorted list. I am not doing this myself, and I do not expect either you or Michael to do so, either. I am just writing this down to point out a low hanging fruit to aspiring new contributors (hint, hint). I am also not planning on working on it soon, but now we have hopefully fed plenty of possibilities to anybody who wants to. :) -- Philip -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: git-bundle rev handling and de-duping
Philip Oakley philipoak...@iee.org writes: # two branches point at the same object git branch foo master # the other side already has master. Let's send them foo. # this will fail because the bundle is empty. That's mildly # annoying because we really want to tell them hey, update # your foo branch. But at least we get an error. git bundle create tmp.bundle foo ^master Isn't this kindof what happens as an issue when we want the right HEAD to be included explicitly in a bundle. Though What we are discussing here is we tell from the command line where the histories end, but do we correctly record all these end points as fetchable refs in the resulting bundle? It does not have anything to do with bundle that does not record its HEAD cannot be cloned, which happens when you do not mention HEAD when creating the bundle in the first place, which is a totally different thing. http://thread.gmane.org/gmane.comp.version-control.git/234053 suggests its more complicated than that. The main topic of discussion does not have much to what bundle records and what a reader of a bundle guesses. It is about what goes on the wire and mention of bundle was just a tangent brought up by those who do not know what was being discussed, I think. I think the right fix to the git bundle issue is to make it easier on the git bundle create side to have the resulting bundle record its HEAD, even when the user did not mention HEAD on the command line. For example, when there is only one end point, e.g. git bundle create x next, record refs/heads/next _and_ HEAD pointing at the same commit, because there is no other seneible choice. git bundle create y master next may record master, next and HEAD while HEAD is likely pointing at the same commit as master (because 'master' is special). Or we could give a warning and even go interactive to ask which ref to record as HEAD. But the above three paragraphs are tangent so I'd stop here. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
git-bundle rev handling and de-duping
[subject tweaked as we have veered quite far off the original, and this might get more attention from interested people] On Thu, Oct 16, 2014 at 10:39:54AM -0700, Junio C Hamano wrote: 2. We use object_array_remove_duplicates() to de-dup git bundle create x master master, which came from b2a6d1c6 (bundle: allow the same ref to be given more than once, 2009-01-17), which is still the sole caller of the function, and I think this is bogus. Comparing .name would not de-dup git bundle create x master refs/heads/master. Hmm. We also doesn't respect the UNINTERESTING flag, so it will dedup foo and ^foo into a single entry. I _think_ this doesn't have any bad side effects, because they are equivalent (i.e., the flag is carried on the object, not the pending entry). I would expect this: git bundle create ... foo ^foo to barf (and it does) because the bundle is empty. But with this: git bundle create ... another-ref foo ^foo I would expect the resulting bundle to contain the objects from another-ref, but still mention foo as an update (we didn't need to send any objects, since presumably the other side already had them). It doesn't, but that is not because of the dedup. It is because we generate the list of refs to send based on the pending list, and we skip all UNINTERESTING objects (we must, because otherwise ^foo would make an entry). I.e., it is the same the flag is on the object, not the pending entry that saved us above now causing its own bug. Obviously the example above is sort of silly, but if you have: # two branches point at the same object git branch foo master # the other side already has master. Let's send them foo. # this will fail because the bundle is empty. That's mildly # annoying because we really want to tell them hey, update # your foo branch. But at least we get an error. git bundle create tmp.bundle foo ^master # now the same thing, but we're sending them some other objects. # This one succeeds, but silently omits foo from the bundle! git bundle create tmp.bundle foo another-ref ^master I have a feeling that the list needs to be generated from revs.cmdline, not revs.pending, and the de-duplication needs to happen there (with the ref resolution that you mention). I also have the feeling that fast-export had to deal with this exact same issue. It looks like we use revs.cmdline there. I seem to recall there was some ongoing work in that area, but I stopped paying close attention due to some personality conflicts, and I don't know if all of the issues were ever resolved. I think the right way to fix these two and a half problems is to do the following: - object_array_remove_duplicates() (and contains_name() helper it uses) should be removed from object.c; - create_bundle() in bundle.c should implement a helper that is similar to contains_name() but knows about ref dwimming and use it to call object_array_filter() to replace its call to object_array_remove_duplicates(). Agreed. The loop in create_bundle right after the de-dup does the dwimming. Probably it would be simple to just skip duplicates there using a hashmap or sorted list. I am not doing this myself, and I do not expect either you or Michael to do so, either. I am just writing this down to point out a low hanging fruit to aspiring new contributors (hint, hint). I am also not planning on working on it soon, but now we have hopefully fed plenty of possibilities to anybody who wants to. :) -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html