Re: Re: Re: write-tree is pasky-0.4
On Fri, 15 Apr 2005, Linus Torvalds wrote:

> I think I've explained my name tracking worries. When it comes to "how to
> merge", there's three issues:
>
>  - we do commonly have merge clashes where both trees have applied the
>    exact same patch. That should merge perfectly well using the 3-way
>    merge from a common parent that Junio has, but not your current "bring
>    patches forward" kind of strategy.

I think 3-way merge is probably the best starting point, but I think that
there might be value in being able to identify the commits of each side
involved in a conflict. I think this would help with cases where both
sides pick up an identical patch, and then each side makes a further
change to a different part of the changed region (you find out that the
other guy's change was supposed to follow the patch, and doesn't conflict
with it).

>  - I _do_ actually sometimes merge with dirty state in my working
>    directory, which is why I want the merge to take place in a separate
>    (and temporary) directory, which allows for a failed merge without
>    having any major cleanup. If the merge fails, it's not a big deal, and
>    I can just blow the merge directory away without losing the work I had
>    in my "real" working directory.

Is there some reason you don't commit before merging? All of the current
merge theory seems to want to merge two commits, using the information git
keeps about them. It should be cheap to get a new clean working directory
to merge in, too, particularly if we add a cache of hardlinkable expanded
blobs.

>  - reliability. I care much less for "clever" than I care for "guaranteed
>    to never do the wrong thing". If I have to fix up some stuff by hand,
>    I'll happily do so. But if I can't trust the merge and have to _check_
>    things by hand afterwards, that will make me leery of the merges, and
>    _that_ is bad.
>
> The third point is why I'm going to the ultra-conservative "three-way
> merge from the common parent". It's not fancy, but it's something I feel
> comfortable with as a merge strategy. For example, arch (and in
> particular darcs) seems to want to try to be "clever" about the merges,
> and I'd always live in fear.

How much do you care about the situation where there is no best common
ancestor (which can happen if you're merging two main lines, each of which
has merged with both of a pair of minor trees)?

I think that arch is even more conservative, in that it doesn't look for a
common ancestor, and reports conflicts whenever changes overlap at all. Of
course, reliability by virtue of never working without help is not a big
win over living in fear; you always have to check over it, not because
you're afraid, but because it needs you to.

> And, finally, there's obviously performance. I _think_ a normal merge
> with nary a conflict and just a few tens of files changed should be
> possible in a second. I realize that sounds crazy to some people, but I
> think it's entirely doable. Half of that is writing the new tree out
> (that is a relatively costly op due to the compression). The other half
> is the "work".

I think that the time spent on I/O will be overwhelmed by the time spent
issuing the command at that rate. It might matter if you start getting
into merging lots of things at once, but that's more like a minute for a
merge group with 600 changes rather than a second per merge; we could
potentially save a lot of time based on having a bunch of information left
over from the previous merge when starting merge number 2. So 15 seconds
plus half a second per merge might be better than a second per merge in
the case that matters.

	-Daniel
*This .sig left intentionally blank*

-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
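
The conservative three-way rule discussed in this message can be sketched at the tree level as follows. This is only an illustration, not git's actual code: the function name `merge_trees` and the representation of a tree as a dict mapping a path to a blob id are assumptions made for the example. It shows why an identical patch applied on both sides merges cleanly, while diverging changes to the same path are reported rather than guessed at.

```python
# Hypothetical sketch: a "tree" is a dict mapping path -> blob id.
# A missing path means the file does not exist in that tree.

def merge_trees(base, ours, theirs):
    """Trivial three-way merge of whole files against a common parent.

    Returns (merged_tree, conflicting_paths). Nothing clever is attempted:
    any path changed differently on both sides is reported as a conflict.
    """
    merged, conflicts = {}, []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:
            result = o        # both sides agree (e.g. same patch applied twice)
        elif o == b:
            result = t        # only their side changed this path
        elif t == b:
            result = o        # only our side changed this path
        else:
            conflicts.append(path)   # both changed it, differently
            continue
        if result is not None:       # None means the path was deleted
            merged[path] = result
    return merged, conflicts
```

A path that ends up in `conflicts` would then be handed to a file-level merge (or to the user); the sketch deliberately stops at that point.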
Re: Re: Re: write-tree is pasky-0.4
On Fri, 15 Apr 2005, Daniel Barkalow wrote:

> Is there some reason you don't commit before merging? All of the current
> merge theory seems to want to merge two commits, using the information git
> keeps about them.

Note that the 3-way merge would _only_ merge the committed state. The
thing is, 99% of all merges end up touching files that I never touch
myself (ie other architectures), so me being able to merge them even when
_I_ am in the middle of something is a good thing.

So even when I have dirty state, the "merge" would only merge the clean
state. And then before the merge information is put back into my working
directory, I'd do a "check-files" on the result, making sure that nothing
that got changed by the merge isn't up-to-date.

> How much do you care about the situation where there is no best common
> ancestor

I care. Even if the best common parent is 3 months ago, I care. I'd much
rather get a big explicit conflict than a "clean" merge that ends up being
debatable because people played games with per-file merging or something
questionable like that.

> I think that the time spent on I/O will be overwhelmed by the time spent
> issuing the command at that rate.

There is no time at all spent on IO. All my email is local, and if this
all ends up working out well, I can track the other people's object trees
in local subdirectories with some daily rsyncs. And I have enough memory
in my machines that there is basically no disk IO - the only trees I
normally touch are the kernel trees, and they all stay in cache.

		Linus
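
The "check-files" step described above amounts to comparing the set of paths the merge changed with the set of paths that are dirty in the working directory. A rough sketch follows; the helper names `paths_changed` and `blocked_paths`, and the dict-of-blob-ids tree representation, are made up for illustration and are not the real tools' interface.

```python
# Hypothetical sketch: trees are dicts mapping path -> blob id, and
# dirty_paths is the set of working-directory files with uncommitted edits.

def paths_changed(old_tree, new_tree):
    """Paths whose content differs (or that exist in only one tree)."""
    return {p for p in set(old_tree) | set(new_tree)
            if old_tree.get(p) != new_tree.get(p)}

def blocked_paths(head_tree, merged_tree, dirty_paths):
    """Dirty files that the merge result would overwrite.

    An empty list means the merge result can be put back into the
    working directory without clobbering uncommitted work.
    """
    return sorted(paths_changed(head_tree, merged_tree) & set(dirty_paths))
```

If the merge only touched, say, another architecture's files while the local edits are elsewhere, `blocked_paths` returns an empty list and the dirty state never gets in the way.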
Re: Re: Re: write-tree is pasky-0.4
On Fri, 15 Apr 2005, Linus Torvalds wrote:

> On Fri, 15 Apr 2005, Daniel Barkalow wrote:
> >
> > So you want to merge someone else's tree into your committed state, and
> > then merge the result with your working directory to get the working
> > directory you continue with, provided that the second merge is trivial?
>
> No, you don't even "merge" the working directory.
>
> The low-level tools should entirely ignore the working directory. To a
> low-level merge, the working directory doesn't even exist. It just gets
> three commits (or trees) and merges two of them with the third as a
> parent, and does all of it in its own temporary "merge working
> directory".

It seems like users won't expect there to be a new working directory for
the merge in which they are supposed to resolve the conflicts, but where
they don't see their uncommitted changes. In any case, the low-level tools
have to care about *some* working directory, even if it isn't the parent
of .git, and the parent of .git seems like where other similar things
happen. If we're being conservative about merging, we're likely to report
a lot of conflicts, at least until we work out better techniques than a
simple 3-way merge.

> > For the latter, there are sometimes multiple ancestors which fit this
> > criterion
>
> Yes. Let's just pick one at random (or more likely, the latest one by
> date - let's not actually be _random_ random) at first.

Okay; I've currently got the one where the number of generations it is
away from the further head is the smallest, and of equal ones, an
arbitrary choice. If people are generally similar in the amount they
diverge before committing, this should be the most similar ancestor.

> There are other heuristics we can try, ie if it turns out that it's
> common to have a couple of alternatives (but no more than some small
> number, say five or so), we can literally just -try- to do a tree-only
> merge, and see how many lines of common output you get from "diff-tree".
>
> Because that "how many files do we need to merge" is the number you want
> to minimize, and doing a couple of extra "diff-tree + join" operations
> should be so fast that nobody will notice that we actually tried five
> different merges to see which one looked the best.
>
> But hey, especially if the merge fails with real clashes (ie there are
> changes in common and running "merge" leaves conflicts), and there were
> other alternate parents to choose, there's nothing wrong with just
> printing them out and saying "you might try to specify one of these
> manually".

I think we should be able to get good results out of doing the 5 merges
and reporting a conflict only if there's a conflict in all of them; it
shouldn't be possible for two to succeed but give different results (if it
did, clearly our current algorithm is unsafe, since it would give some
undesired output if it happened to use the wrong ancestor).

I'm thinking of not actually calling merge(1) for this at all; it just
calls diff3, and diff3 is only 1745 lines including option parsing. We can
probably arrange to look around for better ancestors in case of conflicts
we'd otherwise have to report, and get this all tidy and more efficient
than having diff3 re-read files. And if we only go to other ancestors in
case of conflicts, we're going to be a lot faster total than getting a
reaction from the user, almost no matter what we do.

> I really don't think we should worry too much about this until we've
> actually used the system for a while and seen what it does. So just
> start with "nearest common parent with most recent date". Which I think
> you already implemented, no?

I've got something like that (see above); did you want it in some form
other than the patch I sent you?

	-Daniel
*This .sig left intentionally blank*
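
The ancestor heuristic Daniel describes in this message (take the common ancestor whose distance in generations from the further head is smallest) can be sketched roughly as below. This is an illustration, not the patch referred to in the thread: the names `generations` and `pick_ancestor`, the `parents` dict mapping a commit id to its list of parent ids, and the use of the shortest path as the generation count are all assumptions. Ties are broken by commit id here only to make the "arbitrary choice" deterministic.

```python
from collections import deque

# Hypothetical sketch: `parents` maps commit id -> list of parent commit ids.

def generations(parents, head):
    """Shortest distance, in generations, from head to each of its ancestors."""
    dist = {head: 0}
    queue = deque([head])
    while queue:
        commit = queue.popleft()
        for parent in parents.get(commit, ()):
            if parent not in dist:
                dist[parent] = dist[commit] + 1
                queue.append(parent)
    return dist

def pick_ancestor(parents, head_a, head_b):
    """Common ancestor minimizing its distance from the further head.

    Returns None when the two heads share no history at all.
    """
    dist_a = generations(parents, head_a)
    dist_b = generations(parents, head_b)
    common = set(dist_a) & set(dist_b)
    if not common:
        return None
    # Ties are broken by commit id, standing in for "an arbitrary choice".
    return min(common, key=lambda c: (max(dist_a[c], dist_b[c]), c))
```

In the criss-cross case raised earlier in the thread (two heads that have each merged the same pair of trees), several candidates tie, which is exactly where trying more than one ancestor, or printing the alternatives, comes in.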