Re: Re: Re: write-tree is pasky-0.4

2005-04-15 Thread Daniel Barkalow
On Fri, 15 Apr 2005, Linus Torvalds wrote:

 I think I've explained my name tracking worries.  When it comes to how to 
 merge, there's three issues:
 
  - we do commonly have merge clashes where both trees have applied the 
exact same patch. That should merge perfectly well using the 3-way
merge from a common parent that Junio has, but not your current "bring
patches forward" kind of strategy.

I think 3-way merge is probably the best starting point, but I think that
there might be value in being able to identify the commits of each side
involved in a conflict. I think this would help with cases where both
sides pick up an identical patch, and then each side makes a further
change to a different part of the changed region (you find out that the
other guy's change was supposed to follow the patch, and doesn't conflict
with it).
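
A minimal sketch of the file-level three-way rule under discussion, assuming trees are plain dicts from path to blob id (the names `merge_trees`, `base`, `ours` and `theirs` are illustrative and not taken from any tool in this thread). It shows why an identical patch applied on both sides merges cleanly, and why a further change on each side to the same file still needs a content-level merge:

```python
def merge_trees(base, ours, theirs):
    """Three-way merge of {path: blob_id} maps; a missing path reads as None."""
    merged, conflicts = {}, []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:
            result = o            # both sides agree, e.g. the same patch applied twice
        elif o == b:
            result = t            # only their side changed this path
        elif t == b:
            result = o            # only our side changed this path
        else:
            conflicts.append(path)  # both sides changed it, in different ways
            continue
        if result is not None:    # None means the path was deleted
            merged[path] = result
    return merged, conflicts


# Hypothetical blob ids: both sides picked up the same Makefile patch (m2),
# then each side changed fs/inode.c differently.
base = {"Makefile": "m1", "fs/inode.c": "i1", "mm/slab.c": "s1"}
ours = {"Makefile": "m2", "fs/inode.c": "i2", "mm/slab.c": "s1"}
theirs = {"Makefile": "m2", "fs/inode.c": "i3", "mm/slab.c": "s2"}

merged, conflicts = merge_trees(base, ours, theirs)
```

Here `Makefile` and `mm/slab.c` resolve without help, while `fs/inode.c` is handed on to a line-level merge, which is where knowing the commits behind each side would matter.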

  - I _do_ actually sometimes merge with dirty state in my working 
directory, which is why I want the merge to take place in a separate 
(and temporary) directory, which allows for a failed merge without 
having any major cleanup. If the merge fails, it's not a big deal, and 
I can just blow the merge directory away without losing the work I had 
in my real working directory.

Is there some reason you don't commit before merging? All of the current
merge theory seems to want to merge two commits, using the information git
keeps about them. It should be cheap to get a new clean working directory
to merge in, too, particularly if we add a cache of hardlinkable expanded
blobs.

  - reliability. I care much less for clever than I care for guaranteed 
to never do the wrong thing. If I have to fix up some stuff by hand, 
I'll happily do so. But if I can't trust the merge and have to _check_ 
things by hand afterwards, that will make me leery of the merges, and
_that_ is bad.
 
 The third point is why I'm going to the ultra-conservative three-way 
 merge from the common parent. It's not fancy, but it's something I feel 
 comfortable with as a merge strategy. For example, arch (and in particular 
 darcs) seems to want to try to be clever about the merges, and I'd 
 always live in fear. 

How much do you care about the situation where there is no best common
ancestor (which can happen if you're merging two main lines, each of which
has merged with both of a pair of minor trees)? I think that arch is even
more conservative, in that it doesn't look for a common ancestor, and
reports conflicts whenever changes overlap at all. Of course, reliability
by virtue of never working without help is not a big win over living in
fear; you always have to check over it, not because you're afraid, but
because it needs you to.
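
To make the "no best common ancestor" case concrete, here is a small sketch over a hypothetical history in which two main lines each merged both of two minor trees. The commit names and the `parents` map are invented for illustration; "best" is assumed to mean a common ancestor that is not itself an ancestor of another common ancestor:

```python
def ancestors(parents, commit):
    """Return commit plus everything reachable from it through parent links."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents.get(c, []))
    return seen


def best_common_ancestors(parents, a, b):
    """Common ancestors of a and b that no other common ancestor descends from."""
    common = ancestors(parents, a) & ancestors(parents, b)
    return {c for c in common
            if not any(c != d and c in ancestors(parents, d) for d in common)}


# Hypothetical history: main lines A and B both merged minor trees X and Y.
parents = {
    "root": [],
    "X": ["root"], "Y": ["root"],
    "A1": ["root"], "A2": ["A1", "X"], "A3": ["A2", "Y"],
    "B1": ["root"], "B2": ["B1", "X"], "B3": ["B2", "Y"],
}

best = best_common_ancestors(parents, "A3", "B3")
```

In this history `X` and `Y` are both candidates and neither is an ancestor of the other, so there is no single best choice.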

 And, finally, there's obviously performance. I _think_ a normal merge with
 nary a conflict and just a few tens of files changed should be possible in
 a second. I realize that sounds crazy to some people, but I think it's
 entirely doable. Half of that is writing the new tree out (that is a
 relatively costly op due to the compression). The other half is the work.

I think that the time spent on I/O will be overwhelmed by the time spent
issuing the command at that rate. It might matter if you start getting
into merging lots of things at once, but that's more like a minute for a
merge group with 600 changes rather than a second per merge; we could
potentially save a lot of time based on having a bunch of information left
over from the previous merge when starting merge number 2. So 15 seconds
plus half a second per merge might be better than a second per merge in
the case that matters.
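
The trade-off in the last sentence can be worked through with a few lines of arithmetic. The 15 second setup, half second per merge, and one second per merge are the illustrative figures from the paragraph above, not measurements; the function names are made up for this sketch:

```python
def per_merge_cost(n, seconds_each=1.0):
    """Total time when every merge starts from scratch."""
    return n * seconds_each


def batched_cost(n, setup=15.0, seconds_each=0.5):
    """Total time when state left over from the previous merge is reused."""
    return setup + n * seconds_each


# Number of merges at which the two approaches cost the same.
break_even = 15.0 / (1.0 - 0.5)

for n in (10, 30, 600):
    print(n, per_merge_cost(n), batched_cost(n))
```

Under these assumed figures the batched approach loses below 30 merges and wins above it; at 600 merges it takes 315 seconds against 600.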

-Daniel
*This .sig left intentionally blank*

-
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Re: Re: write-tree is pasky-0.4

2005-04-15 Thread Linus Torvalds


On Fri, 15 Apr 2005, Daniel Barkalow wrote:
 
 Is there some reason you don't commit before merging? All of the current
 merge theory seems to want to merge two commits, using the information git
 keeps about them.

Note that the 3-way merge would _only_ merge the committed state. The 
thing is, 99% of all merges end up touching files that I never touch 
myself (ie other architectures), so me being able to merge them even when 
_I_ am in the middle of something is a good thing.

So even when I have dirty state, the merge would only merge the clean
state. And then before the merge information is put back into my working
directory, I'd do a check-files on the result, making sure that every file
that got changed by the merge is up-to-date there.
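
A sketch of the kind of safety check described here, assuming trees are dicts from path to blob id and that the set of locally modified paths is already known. This is not the behaviour of check-files or of any other actual tool; `paths_changed` and `safe_to_check_out` are hypothetical helpers:

```python
def paths_changed(head_tree, merged_tree):
    """Paths whose blob differs, or that were added or removed, between two trees."""
    return {p for p in set(head_tree) | set(merged_tree)
            if head_tree.get(p) != merged_tree.get(p)}


def safe_to_check_out(head_tree, merged_tree, dirty_paths):
    """Return (ok, blocking); blocking lists dirty files the merge would overwrite."""
    blocking = sorted(paths_changed(head_tree, merged_tree) & set(dirty_paths))
    return (not blocking, blocking)


# Hypothetical state: the merge only touched an architecture file.
head = {"arch/ppc/setup.c": "p1", "kernel/sched.c": "k1"}
merged = {"arch/ppc/setup.c": "p2", "kernel/sched.c": "k1"}

# Local edits elsewhere do not block the merge result from being checked out.
ok, blocking = safe_to_check_out(head, merged, {"kernel/sched.c"})

# Local edits to a file the merge changed do.
ok2, blocking2 = safe_to_check_out(head, merged, {"arch/ppc/setup.c"})
```

The merge itself never looks at the dirty files; only the final step of putting the result into the working directory does.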

 How much do you care about the situation where there is no best common
 ancestor

I care. Even if the best common parent is 3 months ago, I care. I'd much 
rather get a big explicit conflict than a clean merge that ends up being 
debatable because people played games with per-file merging or something 
questionable like that.

 I think that the time spent on I/O will be overwhelmed by the time spent
 issuing the command at that rate.

There is no time at all spent on IO.

All my email is local, and if this all ends up working out well, I can 
track the other people's object trees in local subdirectories with some 
daily rsyncs. And I have enough memory in my machines that there is 
basically no disk IO - the only trees I normally touch are the kernel trees,
and they all stay in cache.

Linus


Re: Re: Re: write-tree is pasky-0.4

2005-04-15 Thread Daniel Barkalow
On Fri, 15 Apr 2005, Linus Torvalds wrote:

 On Fri, 15 Apr 2005, Daniel Barkalow wrote:
  
  So you want to merge someone else's tree into your committed state, and
  then merge the result with your working directory to get the working
  directory you continue with, provided that the second merge is trivial?
 
 No, you don't even merge the working directory.
 
 The low-level tools should entirely ignore the working directory. To a
 low-level merge, the working directory doesn't even exist. It just gets
 three commits (or trees) and merges two of them with the third as a
 parent, and does all of it in its own temporary merge working
 directory.

It seems like users won't expect there to be a new working directory for
the merge in which they are supposed to resolve the conflicts, but where
they don't see their uncommitted changes. In any case, the low-level tools
have to care about *some* working directory, even if it isn't the parent
of .git, and the parent of .git seems like where other similar things
happen. If we're being conservative about merging, we're likely to report
a lot of conflicts, at least until we work out better techniques than a
simple 3-way merge.

  For the latter, there are sometimes multiple ancestors which fit this
  criterion
 
 Yes. Let's just pick one at random (or more likely, the latest one by 
 date - let's not actually be _random_ random) at first. 

Okay; I currently pick the ancestor whose distance in generations from the
further head is the smallest, and among equal ones, an arbitrary choice.
If people are generally similar in the amount they diverge before
committing, this should be the most similar ancestor.
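
One way to read that selection rule, as a sketch only: measure each common ancestor's distance from both heads and keep the one whose larger distance is smallest. "Generations" is assumed here to mean the shortest chain of parent links, the tie-break by name is an arbitrary stand-in, and the history below is invented:

```python
from collections import deque


def generations(parents, head):
    """Breadth-first search: fewest parent hops from head to each reachable commit."""
    dist, queue = {head: 0}, deque([head])
    while queue:
        c = queue.popleft()
        for p in parents.get(c, []):
            if p not in dist:
                dist[p] = dist[c] + 1
                queue.append(p)
    return dist


def pick_ancestor(parents, a, b):
    """Common ancestor with the smallest distance from the further head."""
    da, db = generations(parents, a), generations(parents, b)
    common = set(da) & set(db)
    # Ties are broken by commit name, which is an arbitrary choice.
    return min(common, key=lambda c: (max(da[c], db[c]), c))


# Hypothetical history with several common ancestors of A2 and B3.
parents = {
    "R": [],
    "X": ["R"],
    "Y0": ["R"], "Y": ["Y0"],
    "A1": ["X"], "A2": ["A1", "Y"],
    "B1": ["Y"], "B2": ["B1"], "B3": ["B2", "X"],
}

chosen = pick_ancestor(parents, "A2", "B3")
```

Here `X` is two generations from `A2` and one from `B3`, so its score is 2, which beats `Y` and `R` (both 3) and `Y0` (4).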

 There are other heuristics we can try, ie if it turns out that it's common
 to have a couple of alternatives (but no more than some small number, say
 five or so), we can literally just -try- to do a tree-only merge, and see
 how many lines of common output you get from diff-tree.
 
 Because that "how many files do we need to merge" is the number you want
 to minimize, and doing a couple of extra diff-tree + join operations
 should be so fast that nobody will notice that we actually tried five
 different merges to see which one looked the best.
 
 But hey, especially if the merge fails with real clashes (ie there are
 changes in common and running merge leaves conflicts), and there were
 other alternate parents to choose, there's nothing wrong with just
 printing them out and saying "you might try to specify one of these
 manually".
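
The "try each candidate and count" heuristic from the quoted text could look roughly like this. The sketch works on plain dicts from path to blob id rather than on real diff-tree output, and the names `files_needing_merge` and `pick_base` are invented for illustration:

```python
def files_needing_merge(base, ours, theirs):
    """Count paths that both sides changed, to different blobs, relative to base."""
    paths = set(base) | set(ours) | set(theirs)
    # Three distinct values means neither side matches base or the other side.
    return sum(1 for p in paths
               if len({base.get(p), ours.get(p), theirs.get(p)}) == 3)


def pick_base(candidates, ours, theirs):
    """Score every candidate ancestor tree and return the cheapest one."""
    scores = {name: files_needing_merge(tree, ours, theirs)
              for name, tree in candidates.items()}
    best = min(scores, key=lambda name: (scores[name], name))
    return best, scores


# Hypothetical trees: P2 already contains our version of a.c.
ours = {"a.c": "a2", "b.c": "b2", "c.c": "c1"}
theirs = {"a.c": "a3", "b.c": "b3", "c.c": "c1"}
candidates = {
    "P1": {"a.c": "a1", "b.c": "b1", "c.c": "c1"},
    "P2": {"a.c": "a2", "b.c": "b1", "c.c": "c1"},
}

best, scores = pick_base(candidates, ours, theirs)
```

With `P1` as the parent two files need a content merge; with `P2` only one does, so `P2` is preferred.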

I think we should be able to get good results out of doing the 5 merges
and reporting a conflict only if there's a conflict in all of them; it
shouldn't be possible for two to succeed but give different results (if it
did, clearly our current algorithm is unsafe, since it would give some
undesired output if it happened to use the wrong ancestor).
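
A sketch of that proposal at file granularity, assuming blob ids are strings and using the simple three-way rule in place of a real content merge. The helper names are hypothetical. The case of two ancestors succeeding with different results is reported separately rather than assumed away:

```python
def merge_blob(base, ours, theirs):
    """File-level three-way rule; returns the chosen blob id, or None on conflict."""
    if ours == theirs:
        return ours
    if ours == base:
        return theirs
    if theirs == base:
        return ours
    return None


def merge_with_any_base(bases, ours, theirs):
    """Try every candidate ancestor; conflict only if none of them merges cleanly."""
    attempts = (merge_blob(b, ours, theirs) for b in bases)
    results = {r for r in attempts if r is not None}
    if not results:
        return "conflict", None
    if len(results) > 1:
        return "ambiguous", sorted(results)   # ancestors disagree on the outcome
    return "clean", results.pop()


# One ancestor conflicts, another resolves: treated as a clean merge.
clean = merge_with_any_base(["x0", "x1"], "x1", "x2")

# Every ancestor conflicts: a real conflict for the user.
conflict = merge_with_any_base(["x0"], "x1", "x2")

# Two ancestors succeed with different answers: flagged, not silently resolved.
ambiguous = merge_with_any_base(["v1", "v2"], "v1", "v2")
```

The last call is the situation the paragraph above argues should not arise; checking for it costs one set comparison.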

I'm thinking of not actually calling merge(1) for this at all; it just
calls diff3, and diff3 is only 1745 lines including option parsing. We can
probably arrange to look around for better ancestors in case of conflicts
we'd otherwise have to report, and get this all tidy and more efficient
than having diff3 re-read files. And if we only go to other ancestors in
case of conflicts, we're going to be a lot faster total than getting a
reaction from the user, almost no matter what we do.

 I really don't think we should worry too much about this until we've 
 actually used the system for a while and seen what it does. So just start 
 with nearest common parent with most recent date. Which I think you 
 already implemented, no?

I've got something like that (see above); did you want it in some form
other than the patch I sent you?

-Daniel
*This .sig left intentionally blank*
