Re: git-bundle rev handling and de-duping

2014-10-17 Thread Philip Oakley

From: Jeff King p...@peff.net

[subject tweaked as we have veered quite far off the original, and
this might get more attention from interested people]

On Thu, Oct 16, 2014 at 10:39:54AM -0700, Junio C Hamano wrote:


 2.  We use object_array_remove_duplicates() to de-dup git bundle
 create x master master, which came from b2a6d1c6 (bundle:
 allow the same ref to be given more than once, 2009-01-17),
 which is still the sole caller of the function, and I think
 this is bogus.  Comparing .name would not de-dup git bundle
 create x master refs/heads/master.


Hmm. We also doesn't respect the UNINTERESTING flag, so it will dedup
foo and ^foo into a single entry. I _think_ this doesn't have any
bad side effects, because they are equivalent (i.e., the flag is
carried
on the object, not the pending entry). I would expect this:

 git bundle create ... foo ^foo

to barf (and it does) because the bundle is empty. But with this:

 git bundle create ... another-ref foo ^foo

I would expect the resulting bundle to contain the objects from
another-ref, but still mention foo as an update (we didn't need to
send any objects, since presumably the other side already had them).
It
doesn't, but that is not because of the dedup. It is because we
generate
the list of refs to send based on the pending list, and we skip all
UNINTERESTING objects (we must, because otherwise ^foo would make an
entry). I.e., it is the same the flag is on the object, not the
pending
entry that saved us above now causing its own bug. Obviously the
example above is sort of silly, but if you have:

 # two branches point at the same object
 git branch foo master

 # the other side already has master. Let's send them foo.
 # this will fail because the bundle is empty. That's mildly
 # annoying because we really want to tell them hey, update
 # your foo branch. But at least we get an error.
 git bundle create tmp.bundle foo ^master


Isn't this kindof what happens as an issue when we want the right HEAD 
to be included explicitly in a bundle. Though 
ttp://thread.gmane.org/gmane.comp.version-control.git/234053 suggests 
its more complicated than that.




 # now the same thing, but we're sending them some other objects.
 # This one succeeds, but silently omits foo from the bundle!
 git bundle create tmp.bundle foo another-ref ^master

I have a feeling that the list needs to be generated from
revs.cmdline,
not revs.pending, and the de-duplication needs to happen there (with
the
ref resolution that you mention).

I also have the feeling that fast-export had to deal with this exact
same issue. It looks like we use revs.cmdline there. I seem to recall
there was some ongoing work in that area, but I stopped paying close
attention due to some personality conflicts, and I don't know if all
of
the issues were ever resolved.


I think the right way to fix these two and a half problems is to do
the following:

 - object_array_remove_duplicates() (and contains_name() helper it
   uses) should be removed from object.c;

 - create_bundle() in bundle.c should implement a helper that is
   similar to contains_name() but knows about ref dwimming and use
   it to call object_array_filter() to replace its call to
   object_array_remove_duplicates().


Agreed. The loop in create_bundle right after the de-dup does the
dwimming. Probably it would be simple to just skip duplicates there
using a hashmap or sorted list.


I am not doing this myself, and I do not expect either you or
Michael to do so, either.  I am just writing this down to point out
a low hanging fruit to aspiring new contributors (hint, hint).


I am also not planning on working on it soon, but now we have
hopefully
fed plenty of possibilities to anybody who wants to. :)


--
Philip 


--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: git-bundle rev handling and de-duping

2014-10-17 Thread Junio C Hamano
Philip Oakley philipoak...@iee.org writes:

  # two branches point at the same object
  git branch foo master

  # the other side already has master. Let's send them foo.
  # this will fail because the bundle is empty. That's mildly
  # annoying because we really want to tell them hey, update
  # your foo branch. But at least we get an error.
  git bundle create tmp.bundle foo ^master

 Isn't this kindof what happens as an issue when we want the right HEAD
 to be included explicitly in a bundle. Though


What we are discussing here is we tell from the command line where
the histories end, but do we correctly record all these end points
as fetchable refs in the resulting bundle?

It does not have anything to do with bundle that does not record
its HEAD cannot be cloned, which happens when you do not mention
HEAD when creating the bundle in the first place, which is a totally
different thing.

 http://thread.gmane.org/gmane.comp.version-control.git/234053 suggests
 its more complicated than that.

The main topic of discussion does not have much to what bundle
records and what a reader of a bundle guesses.  It is about what
goes on the wire and mention of bundle was just a tangent brought up
by those who do not know what was being discussed, I think.

I think the right fix to the git bundle issue is to make it easier
on the git bundle create side to have the resulting bundle record
its HEAD, even when the user did not mention HEAD on the command
line.  For example, when there is only one end point, e.g. git
bundle create x next, record refs/heads/next _and_ HEAD pointing at
the same commit, because there is no other seneible choice.  

git bundle create y master next may record master, next and HEAD
while HEAD is likely pointing at the same commit as master (because
'master' is special).  Or we could give a warning and even go
interactive to ask which ref to record as HEAD.

But the above three paragraphs are tangent so I'd stop here.



--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


git-bundle rev handling and de-duping

2014-10-16 Thread Jeff King
[subject tweaked as we have veered quite far off the original, and
 this might get more attention from interested people]

On Thu, Oct 16, 2014 at 10:39:54AM -0700, Junio C Hamano wrote:

  2.  We use object_array_remove_duplicates() to de-dup git bundle
  create x master master, which came from b2a6d1c6 (bundle:
  allow the same ref to be given more than once, 2009-01-17),
  which is still the sole caller of the function, and I think
  this is bogus.  Comparing .name would not de-dup git bundle
  create x master refs/heads/master.

Hmm. We also doesn't respect the UNINTERESTING flag, so it will dedup
foo and ^foo into a single entry. I _think_ this doesn't have any
bad side effects, because they are equivalent (i.e., the flag is carried
on the object, not the pending entry). I would expect this:

  git bundle create ... foo ^foo

to barf (and it does) because the bundle is empty. But with this:

  git bundle create ... another-ref foo ^foo

I would expect the resulting bundle to contain the objects from
another-ref, but still mention foo as an update (we didn't need to
send any objects, since presumably the other side already had them). It
doesn't, but that is not because of the dedup. It is because we generate
the list of refs to send based on the pending list, and we skip all
UNINTERESTING objects (we must, because otherwise ^foo would make an
entry). I.e., it is the same the flag is on the object, not the pending
entry that saved us above now causing its own bug. Obviously the
example above is sort of silly, but if you have:

  # two branches point at the same object
  git branch foo master

  # the other side already has master. Let's send them foo.
  # this will fail because the bundle is empty. That's mildly
  # annoying because we really want to tell them hey, update
  # your foo branch. But at least we get an error.
  git bundle create tmp.bundle foo ^master

  # now the same thing, but we're sending them some other objects.
  # This one succeeds, but silently omits foo from the bundle!
  git bundle create tmp.bundle foo another-ref ^master

I have a feeling that the list needs to be generated from revs.cmdline,
not revs.pending, and the de-duplication needs to happen there (with the
ref resolution that you mention).

I also have the feeling that fast-export had to deal with this exact
same issue. It looks like we use revs.cmdline there. I seem to recall
there was some ongoing work in that area, but I stopped paying close
attention due to some personality conflicts, and I don't know if all of
the issues were ever resolved.

 I think the right way to fix these two and a half problems is to do
 the following:
 
  - object_array_remove_duplicates() (and contains_name() helper it
uses) should be removed from object.c;
 
  - create_bundle() in bundle.c should implement a helper that is
similar to contains_name() but knows about ref dwimming and use
it to call object_array_filter() to replace its call to
object_array_remove_duplicates().

Agreed. The loop in create_bundle right after the de-dup does the
dwimming. Probably it would be simple to just skip duplicates there
using a hashmap or sorted list.

 I am not doing this myself, and I do not expect either you or
 Michael to do so, either.  I am just writing this down to point out
 a low hanging fruit to aspiring new contributors (hint, hint).

I am also not planning on working on it soon, but now we have hopefully
fed plenty of possibilities to anybody who wants to. :)

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html