Re: refspecs with '*' as part of pattern
On Mon, 6 Jul 2015, Junio C Hamano wrote: > Jacob Keller writes: > > > I've been looking at the refspecs for git fetch, and noticed that > > globs are partially supported. I wanted to use something like: > > > > refs/tags/some-prefix-*:refs/tags/some-prefix-* > > > > as a refspec, so that I can fetch only tags which have a specific > > prefix. I know that I could use namespaces to separate tags, but > > unfortunately, I am unable to fix the tag format. The specific > > repository in question is also generating several tags which are not > > relevant to me, in formats that are not really useful for human > > consumption. I am also not able to fix this less than useful practice. > > > > However, I noticed that refspecs only support * as a single component. > > The match algorithm works perfectly fine, as documented in > > abd2bde78bd9 ("Support '*' in the middle of a refspec") > > > > What is the reason for not allowing slightly more arbitrary > > expressions? Obviously no more than one *... > > I cannot seem to be able to find related discussions around that > patch, so this is only my guess, but I suspect that this is to > discourage people from doing something like: > > refs/tags/*:refs/tags/foo-* > > which would open can of worms (e.g. imagine you fetch with that > pathspec and then push with refs/tags/*:refs/tags/* back there; > would you now get foo-v1.0.0 and foo-foo-v1.0.0 for their v1.0.0 > tag?) we'd prefer not having to worry about. That wouldn't be it, since refs/tags/*:refs/tags/foo/* would have the same problem, assuming you didn't set up the push refspec carefully. I think it was mostly that it would be too easy to accidentally do something you don't want by having some other character instead of a slash, like refs/heads/*:refs/heads-*. 
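To make the matching concrete, here is a minimal sketch of a refspec with one '*' that may sit inside a path component, in the spirit of abd2bde78bd9. It is not git's refs.c code, and the names `match_glob` and `map_ref` are hypothetical. It maps Jacob's prefix refspec as intended, shows how the refs/heads/*:refs/heads-* typo quietly sends branches somewhere odd, and reproduces Junio's foo-foo-v1.0.0 scenario.

```python
from typing import Optional

def match_glob(pattern: str, ref: str) -> Optional[str]:
    """Return the text that the single '*' in pattern matched in ref, or None."""
    star = pattern.index("*")
    prefix, suffix = pattern[:star], pattern[star + 1:]
    # The '*' may be part of a component, but prefix and suffix must not overlap.
    if len(ref) < len(prefix) + len(suffix):
        return None
    if not (ref.startswith(prefix) and ref.endswith(suffix)):
        return None
    return ref[len(prefix):len(ref) - len(suffix)]

def map_ref(refspec: str, ref: str) -> Optional[str]:
    """Apply a 'src:dst' refspec with exactly one '*' on each side to ref."""
    src, dst = refspec.split(":", 1)
    if src.count("*") != 1 or dst.count("*") != 1:
        raise ValueError("this sketch handles exactly one '*' per side")
    matched = match_glob(src, ref)
    if matched is None:
        return None
    return dst.replace("*", matched, 1)
```

If the remote ends up holding foo-v1.0.0 after a push back, the same fetch refspec maps it to foo-foo-v1.0.0, which is the kind of surprise under discussion.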
Aside from the increased risk of hard-to-spot typos leading to very weird behavior, nothing actually goes wrong; in fact, I've been using git with that check removed for ages because I wanted a refspec like refs/heads/something-*:refs/heads/*. And it works fine as a local patch, since you don't need your refspec handling to interoperate with other repositories. -Daniel *This .sig left intentionally blank* -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 00/13] New remote-hg helper
On Wed, 31 Oct 2012, Felipe Contreras wrote: > Hi, > > On Wed, Oct 31, 2012 at 7:59 PM, Jonathan Nieder wrote: > > Felipe Contreras wrote: > >> On Wed, Oct 31, 2012 at 7:20 PM, Johannes Schindelin > >> wrote: > > > >>> I just tested this with junio/next and it seems this issue is still > >>> unfixed: instead of > >>> > >>> reset refs/heads/blub > >>> from e7510461b7db54b181d07acced0ed3b1ada072c8 > >>> > >>> I get > >>> > >>> reset refs/heads/blub > >>> from :0 > >>> > >>> when running "git fast-export ^master blub". > >> > >> That is not a problem. It has been discussed extensively, and the > >> consensus seems to be that such command should throw nothing: > >> > >> http://article.gmane.org/gmane.comp.version-control.git/208729 > > > > Um. Are you claiming I have said that "git fast-export ^master blub" > > should silently emit nothing? Or has this been discussed extensively > > with someone else? > > Maybe I misunderstood when you said: > > A patch meeting the above description would make perfect sense to me. > > Anyway, when you have: > > % git fast-export ^next next^{commit} > # nothing > % git fast-export ^next next~0 > # nothing > % git fast-export ^next next~1 > # nothing > % git fast-export ^next next~2 > # nothing > > It only makes sense that: > > % git fast-export ^next next > # nothing > > It doesn't get any more obvious than that. But to each his own. I think that may be true where you have "next" in both places, but I think: $ git checkout -b new-branch master $ git fast-export ^master new-branch ought to emit no "commit" lines, but needs to emit a "reset" line. After all, you haven't told fast-export that the ref "new-branch" is up to date, and you have told it that you want it to be exported. 
If you create a new branch off of an existing commit, don't change it, and push it to hg, it shouldn't be up to remote-hg to figure out what should happen with no input; it should get a: reset refs/heads/new-branch from [something] I don't know why Johannes seems to want [something] not to be a mark reference (unless he's complaining about getting an invalid mark reference when there aren't any marks defined), but surely something of the above form is necessary to tell remote-hg to create the new branch. I think it would be worth testing that: $ git checkout -b new-branch master $ git push hg new-branch creates the new branch successfully (which I think it does, but wouldn't if "git fast-export ^master new-branch" actually returned nothing; parsed_refs gets it from the reset line). AFAICT, your code relies on getting the behavior that fast-export actually gives, not the behavior you seem to want or the behavior Johannes seems to want. And the reason that you don't need any changes to fast-export is that your process maps marks instead of sha1s. -Daniel *This .sig left intentionally blank*
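The point that a helper learns about the new branch from the reset line can be sketched as a tiny reader for the reset/from pairs in a fast-export stream. This is a hypothetical illustration, not remote-hg's parsed_refs code. It also tells a mark reference (":0") apart from a full sha1. It deliberately ignores `data` blocks, so a commit message containing a line that starts with "reset " would confuse it.

```python
from typing import Dict, Optional

def refs_from_resets(stream: str) -> Dict[str, Optional[str]]:
    """Map each ref named by a 'reset' command to its 'from' value (None if absent)."""
    refs: Dict[str, Optional[str]] = {}
    lines = stream.splitlines()
    i = 0
    while i < len(lines):
        if lines[i].startswith("reset "):
            ref = lines[i][len("reset "):]
            source = None
            # An optional 'from' line directly after 'reset' names the new tip.
            if i + 1 < len(lines) and lines[i + 1].startswith("from "):
                source = lines[i + 1][len("from "):]
                i += 1
            refs[ref] = source
        i += 1
    return refs

def is_mark(source: Optional[str]) -> bool:
    """True for mark references such as ':1', False for sha1s or None."""
    return source is not None and source.startswith(":")
```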
Re: git-clone ignores umask for working tree
On Fri, 6 Jul 2012, Alex Riesen wrote: > Hi list, > > when git-clone was built in, its treatment of umask has changed: the shell > version respected umask for newly created directories by using plain mkdir(1), > and the builtin version just uses mkdir(work_tree, 0755). > > Is it intentional? I have the vague feeling that it was intentional, but it's entirely plausible that I just overlooked that mkdir(2) applies umask and went for the mode that you normally want. I don't think there's any particular need for this operation to be more restrictive than umask. -Daniel *This .sig left intentionally blank*
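The difference Alex describes can be shown directly. mkdir(2) stores `mode & ~umask`, so mkdir(work_tree, 0755) does honor a restrictive umask but can never grant more than 0755. mkdir(1) effectively requests 0777, so a permissive umask such as 002 yields group-writable 0775. A minimal POSIX-only sketch (`mode_after_mkdir` is a hypothetical helper; it assumes no default ACL on the temporary directory):

```python
import os
import tempfile

def mode_after_mkdir(requested: int, umask: int) -> int:
    """Create a directory with `requested` mode under `umask`; return its permission bits."""
    old_umask = os.umask(umask)          # os.umask returns the previous mask
    try:
        with tempfile.TemporaryDirectory() as parent:
            path = os.path.join(parent, "work")
            os.mkdir(path, requested)    # the kernel applies requested & ~umask
            return os.stat(path).st_mode & 0o777
    finally:
        os.umask(old_umask)
```

With umask 002, requesting 0o777 (what mkdir(1) does) gives 0o775, while requesting 0o755 stays at 0o755. With umask 077 both end up at 0o700.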
Re: Multi-ancestor read-tree notes
On Fri, 9 Sep 2005, Junio C Hamano wrote: > Daniel Barkalow <[EMAIL PROTECTED]> writes: > > > In case #16, I'm not sure what I should produce. I think the best thing > > might be to not leave anything in stage 1. The desired end effect is that > > the user is given a file with a section like: > > > > { > > *t = NULL; > > *m = 0; > > <<<<<<<< > > return Z_DATA_ERROR; > > > > return Z_OK; > >>>>>>>>> > > } > > I was thinking a bit more about this. Let's rephrase case #16. > I'll call merge bases O1, O2,... and merge heads A and B, and we > are interested in one path. > > If O1 and O2, the path has quite different contents. A has the > same contents as O1 and B has the same contents as O2. There's a bit more subtlety here: since these are common ancestors, A must have somehow changed O2's version to O1's version, and B must have changed O1's version to O2's version. It isn't just that each side left the file the same, but from different ancestral versions; both of the other versions must have gotten rejected somehow. I think the real key is to identify what was going on in between. > We should not just pick one or the other and do two-file merge > between the version in A and B (we could prototype by massaging > 'diff A B' output to produce what is common between A and B and > run (RCS) merge of A and B pretending that the common contents > is the original to produce something like the above). > > If A has slight changes since O1 but B did not change since O2, > ideally I think we would want the same thing to happen. Let's > call it case #16+. > > What does the current implementation do? It is not case #16 > because A and O1 does not exactly match. I suspect the result > will be skewed because B has an exact match with O2. Yes, in this case we miss whatever caused A to reject O2, and we use the modified O2, because we don't realize that A's rejection of O2 should also apply to the version in B.
Unfortunately, this looks just like the situation where both sides took O1, and B did a further modification to that. > The situation becomes more interesting if both A and B has slight > changes since O1 and O2 respectively. They do not exactly match > with their bases, but I think ideally we would like something > very similar to case #16 resolution to happen. I think the right thing, ideally, is to have the content merge also take multiple ancestors and have a #16 case itself when it's deciding which version of a block to use. The #16+ case is actually trickier, because we have fewer cues. > One way to solve this would be to try doing things entirely in > read-tree by doing not just exact matches but also checking the > amount of changes -- if each heads has similar but different > base call it case #16 and try two-file merge between the heads > disregarding the bases. > > But I am a bit reluctant to suggest this. My gut feeling tells > me that these 'interesting' cases are easier if scripted outside > read-tree machinery to later enhance and improve the heuristics. > > Of course, the current case #16 detected by the exact match rule > should be something we can automatically handle, but to make > things safer to use I think we should have a way to detect case > #16+ situlation and avoid mistakenly favoring A over B (or vice > versa) only because one has slight modification while the other > does not. I think #16+ is extra uncommon, because it involves someone making an irrelevant modification to a patched version of a file while someone else reverts the patch. I'm actually interested in doing a big spiffy program to do merges with information drawn as needed from the history, stuff happening on a per-hunk level, and support for block moves. It'll take a while before it gets anywhere, but I still think it's likely that people won't hit #16+ and get unexpected behavior before it's ready. 
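Why exact matching catches #16 but not #16+ can be seen in a simplified per-path rule. This is a hypothetical sketch, not read-tree's actual case table. Contents stand in for blob ids and `None` means the path is absent. Case #16 is "each head equals a different merge base". In #16+, A no longer equals any base, so the rule sees "B unchanged from some base" and silently takes A, which is exactly the favoring Junio wants to avoid.

```python
from typing import Optional, Sequence

CONFLICT = object()  # sentinel: leave the path unmerged for the user

def is_case_16(ancestors: Sequence[Optional[str]], a: Optional[str], b: Optional[str]) -> bool:
    """Each head exactly matches a different merge base's version of the path."""
    # Since a != b, if both appear among the ancestors they match different ones.
    return a != b and a in ancestors and b in ancestors

def trivial_merge(ancestors: Sequence[Optional[str]], a: Optional[str], b: Optional[str]):
    """Simplified exact-match resolution of one path (illustrative only)."""
    if a == b:
        return a              # both sides agree
    if is_case_16(ancestors, a, b):
        return CONFLICT       # case #16: each side went back to a different base
    if b in ancestors:
        return a              # B looks unchanged from some base, so take A
    if a in ancestors:
        return b              # A looks unchanged from some base, so take B
    return CONFLICT           # both changed: needs a content merge
```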
The main thing I'm unsure of is whether Fredrik's algorithm might actually be a better solution: it is possible to understand what happened leading up to a merge either by looking at the time after the common ancestors or by looking at the time before them. I think that the more recent history is a better guide, but the older history is easier to use; the case his version isn't good for, I think, is when the common ancestors of the sides are even more complicated to merge. -Daniel *This .sig left intentionally blank*
Re: [RFH] Merge driver
On Fri, 9 Sep 2005, Junio C Hamano wrote: > Daniel Barkalow <[EMAIL PROTECTED]> writes: > > > It tries to make sure that there is room to put stuff for resolving a > > conflict without messing with modified files in the directory. > > I agree it can be used that way, but nobody seems to use it for > that purpose as far as I can tell hence my earlier comment. But > let's leave the door open by having them as independent > options. Ah, okay. I hadn't realized that resolve used -u for that call to read-tree. You're entirely right. -Daniel *This .sig left intentionally blank*
Re: [RFH] Merge driver
On Fri, 9 Sep 2005, Junio C Hamano wrote: > I have several requests to people who are interested in merges > and read-tree changes. > > I am pretty much set to use the recent read-tree updates Daniel > has been working on. The only reason it has not hit the > "master" branch yet, except that it still has known leaks that > have not been plugged, is because read-tree is so fundamental to > everything we do, and I am trying to be extremely conservative > here. I've beaten it myself reasonably well and have not found > any regressions (except removal of --emu23 which I believe > nobody uses anyway), but I'd appreciate people to try it out and > see if it performs well for your dataset. > > If you are planning further surgery on read-tree code, please > base your changes on Daniel's rewrite to avoid your effort being > wasted. This request goes both to Chuck (active_cache > abstraction) and Fredrik (addition of 'ignore index and working > tree matching rules' [*1*]). > > A proposed merge driver 'git-merge' is in the proposed updates > branch. This is intended to be the top-level user interface to > the merge machinery which drives multiple merge strategy > scripts, and I am hoping that I can eventually (1) retire > 'git-resolve' and 'git-octopus' (they simply become merge > strategy scripts driven by 'git-merge') and (2) call 'git-merge' > from 'git-pull'. What I have in the proposed updates branch has > been fixed since my earlier message to the list and has a new > merge strategy script, in addition to 'resolve' and 'octopus', > called 'git-merge-multibase'. This uses Daniel's read-tree that > can use more than one merge bases. I request Daniel to give OK > to its name or suggest a better name for this script -- I would > even accept 'git-merge-barkalow' if you want ;-). I'd actually been thinking it would just go into the "resolve" driver, with resolve no longer choosing among the merge-base outputs and instead just sending the whole list to read-tree.
> If you are planning to implement a new merge strategy, please > use the ones in the proposed updates branch as examples, and > complain and suggest improvements if you find the interface > between the strategy scripts and the driver lacking. This > request goes primarily to Fredrik. I'm interested in doing the > renaming merge that would have helped HPA's klibc-kbuild vs > klibc case myself but if somebody else is so inclined please go > wild. > > And finally, a request to everybody; please try out 'git-merge' > and see how you like it. > > `git-merge` [-n] [-s ]...... > > -n:: > Do not show diffstat at the end of the merge. > > -s :: > use that merge strategy; can be given more than once to > specify them in the order they should be tried. If > there is no `-s` option, built-in list of strategies is > used instead. > > :: > our branch head commit. > > :: > other branch head merged into our branch. You need at > least one . Specifying more than one > obviously means you are trying an Octopus. > > Here is a sample transcript from a test resolving one of the > 'more-than-one-merge-base' commits Fredrik found in the kernel > repository (": siamese;" is my $PS1; " " is my $PS2). > > : siamese; git reset --hard b8112df71cae7d6a86158caeb19d215f56c4f9ab > : siamese; git merge -n \ > 'reproduce 0e396ee43e445cb7c215a98da4e76d0ce354d9d7' \ > HEAD 2089a0d38bc9c2cdd084207ebf7082b18cf4bf58 > Trying merge strategy resolve... > Trying to find the optimum merge base. > Trying simple merge. > Simple merge failed, trying Automatic merge. > Removing drivers/net/fmv18x.c > Auto-merging drivers/net/r8169.c. > merge: warning: conflicts during merge > ERROR: Merge conflict in drivers/net/r8169.c. > Removing drivers/net/sk_g16.c > Removing drivers/net/sk_g16.h > fatal: merge program failed > Rewinding the tree to pristine... > Trying merge strategy multibase... > Trying simple merge. > Simple merge failed, trying Automatic merge. 
> Removing drivers/net/fmv18x.c > Auto-merging drivers/net/r8169.c. > merge: warning: conflicts during merge > ERROR: Merge conflict in drivers/net/r8169.c. > Removing drivers/net/sk_g16.c > Removing drivers/net/sk_g16.h > fatal: merge program failed > Rewinding the tree to pristine... > Trying merge strategy octopus... > Rewinding the tree to pristine... > Using the multibase to prepare resolving by hand. > Trying simple merge. > Simple merge failed, trying Automatic merge. > Removing drivers/net/fmv18x.c > Auto-merging drivers/net/r8169.c. > merge: warning: conflicts during merge > ERROR: Merge conflict in drivers/net/r8169.c. > Removing drivers/net/sk_g16.c > Removing drivers/net/sk_g16.h > fatal: merge program failed > Automatic merge failed; fix up by hand > : siamese; git-update-cache --refresh >
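The driver loop in the transcript above (try each strategy in order, rewind to pristine after each failure, then pick one to prepare hand resolution) can be sketched as follows. The strategy interface here is hypothetical: each strategy returns the number of conflicting paths it left, or `None` if it refused the merge. The tie-breaking rule (a later strategy wins a tie) is an assumption that happens to reproduce the transcript's choice of multibase; the real git-merge script's selection rule may differ.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical strategy interface: run the merge and return the number of
# conflicting paths left behind (0 means clean), or None if it refused.
Strategy = Callable[[], Optional[int]]

def drive_merge(strategies: List[Tuple[str, Strategy]],
                rewind: Callable[[], None]) -> Tuple[Optional[str], Optional[str]]:
    """Return (clean_strategy, None) on success, else (None, strategy_for_hand_resolution)."""
    best_name: Optional[str] = None
    best_count: Optional[int] = None
    for name, run in strategies:
        print(f"Trying merge strategy {name}...")
        conflicts = run()
        if conflicts == 0:
            return name, None
        # Assumption: on a tie, the later strategy wins.
        if conflicts is not None and (best_count is None or conflicts <= best_count):
            best_name, best_count = name, conflicts
        print("Rewinding the tree to pristine...")
        rewind()
    return None, best_name
```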
Re: Multi-ancestor read-tree notes
On Thu, 8 Sep 2005, Junio C Hamano wrote: > Daniel Barkalow <[EMAIL PROTECTED]> writes: > > > I assume that what you want is something to include everything from two > > commits, which would give conflicts if a name is reused? > > My understanding is that Darrin wants to do what Linus did when > he merged gitk into git.git. > > Personally I think that is a specialized application and > something like the git-merge-projects script I posted as a > follow-up would be more appropriate than adding it to the > current merge discussion. Well, it's an easy addition to read-tree; it just needs a merge function which takes two entries and adds the non-NULL one in stage 0, or adds both if they both exist. git-merge-script probably shouldn't be the entry point to it, of course, but that part isn't my area anyway. -Daniel *This .sig left intentionally blank*
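The two-entry merge function described above can be sketched over whole trees. Trees are modeled as path-to-blob dicts, and `union_merge` is a hypothetical name. Putting a reused name into stages 2 and 3 is an assumption, chosen to mirror an ordinary "ours/theirs" conflict; the message only says both entries are added.

```python
from typing import Dict, List, Tuple

IndexEntry = Tuple[str, int, str]  # (path, stage, blob id)

def union_merge(ours: Dict[str, str], theirs: Dict[str, str]) -> List[IndexEntry]:
    """Baseless merge: take whichever side has a path; keep both if a name is reused."""
    entries: List[IndexEntry] = []
    for path in sorted(set(ours) | set(theirs)):
        a, b = ours.get(path), theirs.get(path)
        if a is not None and b is not None:
            entries.append((path, 2, a))   # reused name: left as a conflict
            entries.append((path, 3, b))
        else:
            entries.append((path, 0, a if a is not None else b))  # merged cleanly
    return entries
```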
Re: Multi-ancestor read-tree notes
On Thu, 8 Sep 2005, Darrin Thompson wrote: > On Mon, 2005-09-05 at 01:41 -0400, Daniel Barkalow wrote: > > I've got a version of read-tree which accepts multiple ancestors and does > > a merge using information from all of them. > > Do the multiple ancestors have to share a common parent? More to the > point, is this read-tree any more friendly to baseless merges? read-tree doesn't care about the relationships between its inputs; it's only interested in the trees. But using ancestors which aren't common is unlikely to give you the desired results. I think, if you do read-tree a^ b^ a b, you will get everything into the index, but it's all going to be conflicts. I assume that what you want is something to include everything from two commits, which would give conflicts if a name is reused? -Daniel *This .sig left intentionally blank*
Re: [PATCH 0/2] A new merge algorithm, take 3
On Thu, 8 Sep 2005, Fredrik Kuivinen wrote: > The first one agrees with what was actually committed. For the second > one the difference between the tree produced by the algorithm and what > was committed is: > > diff --git a/include/net/ieee80211.h b/include/net/ieee80211.h > --- a/include/net/ieee80211.h > +++ b/include/net/ieee80211.h > @@ -425,9 +425,7 @@ struct ieee80211_stats { > > struct ieee80211_device; > > -#if 0 /* for later */ > #include "ieee80211_crypt.h" > -#endif > > #define SEC_KEY_1 (1<<0) > #define SEC_KEY_2 (1<<1) > > > I have looked at the files and common ancestors involved and I think > that this change have been introduced manually. I may have missed > something when I analysed it though... Certainly possible that it was done manually. > > > The merge cases reported by Tony Luck and Len Brown are both cleanly > > > merged by my code. > > > > Do they come out correctly? Both of those have cases which cannot be > > decided correctly with only the ancestor trees, due to one branch > > reverting a patch that was only in one ancestor. The correct result is to > > revert that patch, but figuring out that requires looking at more trees. I > > think your algorithm should work for this case, but it would be good to > > have verification. (IIRC, Len got the correct result while Tony got the > > wrong result and then corrected it later.) > > Len's merge case come out identically to the tree he committed. I have > described what I got for Tony's case in > <[EMAIL PROTECTED]> (my merge algorithm > produces the result Tony expected to get, but he didn't get that from > git-resolve-script). Good. It looks to me like this is a good algorithm in practice, then. -Daniel *This .sig left intentionally blank*
Re: [PATCH 0/2] A new merge algorithm, take 3
On Thu, 8 Sep 2005, Fredrik Kuivinen wrote: > On Wed, Sep 07, 2005 at 02:33:42PM -0400, Daniel Barkalow wrote: > > On Wed, 7 Sep 2005, Fredrik Kuivinen wrote: > > > > > Of the 500 merge commits that currently exists in the kernel > > > repository 19 produces non-clean merges with git-merge-script. The > > > four merge cases listed in > > > <[EMAIL PROTECTED]> are cleanly merged by > > > git-merge-script. Every merge commit which is cleanly merged by > > > git-resolve-script is also cleanly merged by git-merge-script, > > > furthermore the results are identical. There are currently two merges > > > in the kernel repository which are not cleanly merged by > > > git-resolve-script but are cleanly merged by git-merge-script. > > > > If you use my read-tree and change git-resolve-script to pass all of the > > ancestors to it, how does it do? I expect you'll still be slightly ahead, > > because we don't (yet) have content merge with multiple ancestors. You > > should also check the merge that Tony Luck reported, which undid a revert, > > as well as the one that Len Brown reported around the same time which had > > similar problems. I think maintainer trees are a much better test for a > > merge algorithm, because the kernel repository is relatively linear, while > > maintainers tend more to merge things back and forth. > > Junio tested some of the multiple common ancestor cases with your > version of read-tree and reported his results in > <[EMAIL PROTECTED]>. Oh, right. I'm clearly not paying enough attention here. > The two cases my algorithm merges cleanly and git-resolve-script do > not merge cleanly are 0e396ee43e445cb7c215a98da4e76d0ce354d9d7 and > 0c168775709faa74c1b87f1e61046e0c51ade7f3. Both of them have two common > ancestors. The second one have, as far as I know, not been tested with > your read-tree. Okay, I'll have to check whether the result I get seems right. I take it your result agrees with what the users actually produced by hand? 
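"Two common ancestors" here means two merge bases: common ancestors that are not themselves ancestors of another common ancestor. A brute-force sketch over a toy parent graph (not git-merge-base's actual traversal) shows how a criss-cross history produces two of them:

```python
from typing import Dict, List, Set

def ancestors(graph: Dict[str, List[str]], commit: str) -> Set[str]:
    """All commits reachable from commit via parent links, including itself."""
    seen: Set[str] = set()
    stack = [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(graph.get(c, []))
    return seen

def merge_bases(graph: Dict[str, List[str]], a: str, b: str) -> Set[str]:
    """Common ancestors of a and b that are not ancestors of another common ancestor."""
    common = ancestors(graph, a) & ancestors(graph, b)
    return {c for c in common
            if not any(c != d and c in ancestors(graph, d) for d in common)}

# Criss-cross: x and y are each merged into both sides, so both are merge bases.
criss_cross = {"root": [], "x": ["root"], "y": ["root"],
               "a1": ["x", "y"], "b1": ["y", "x"], "A": ["a1"], "B": ["b1"]}
```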
> The merge cases reported by Tony Luck and Len Brown are both cleanly > merged by my code. Do they come out correctly? Both of those have cases which cannot be decided correctly with only the ancestor trees, due to one branch reverting a patch that was only in one ancestor. The correct result is to revert that patch, but figuring out that requires looking at more trees. I think your algorithm should work for this case, but it would be good to have verification. (IIRC, Len got the correct result while Tony got the wrong result and then corrected it later.) > You are probably right about the maintainer trees. I should have a > look at some of them. Do you know any specific repositories with > interesting merge cases? Not especially, except that I would guess that people who have reported hitting bad cases would be more likely to have other interesting merges in their trees. You might also try merging maintainer trees with each other, since it's relatively likely that there would be complicating overlap that causes no confusion only because things get rearranged in -mm. For that matter, I bet you'd get plenty of test cases out of trying to replicate -mm as a git tree. -Daniel *This .sig left intentionally blank*
Re: [PATCH 0/2] A new merge algorithm, take 3
On Wed, 7 Sep 2005, Fredrik Kuivinen wrote: > Of the 500 merge commits that currently exists in the kernel > repository 19 produces non-clean merges with git-merge-script. The > four merge cases listed in > <[EMAIL PROTECTED]> are cleanly merged by > git-merge-script. Every merge commit which is cleanly merged by > git-resolve-script is also cleanly merged by git-merge-script, > furthermore the results are identical. There are currently two merges > in the kernel repository which are not cleanly merged by > git-resolve-script but are cleanly merged by git-merge-script. If you use my read-tree and change git-resolve-script to pass all of the ancestors to it, how does it do? I expect you'll still be slightly ahead, because we don't (yet) have content merge with multiple ancestors. You should also check the merge that Tony Luck reported, which undid a revert, as well as the one that Len Brown reported around the same time which had similar problems. I think maintainer trees are a much better test for a merge algorithm, because the kernel repository is relatively linear, while maintainers tend more to merge things back and forth. -Daniel *This .sig left intentionally blank*
Re: Multi-ancestor read-tree notes
On Tue, 6 Sep 2005, Junio C Hamano wrote: > Daniel Barkalow <[EMAIL PROTECTED]> writes: > > > Good. (Although that patch doesn't seem to be directly on top of my > > version; I can tell what it's doing anyway) > > That one was against the proposed updates head. I've updated it > again to include the patch. > > > I'm happy with the content in "pu"; the issue is just whether you want the > > history cleaned up more. In the series I sent, I kept forgetting parts > > that belonged in earlier patches. > > Again, that is up to you. I am not _that_ perfectionist but I > do not mind reapplying updated ones if you are ;-). What's there is fine with me. (I'll work on improving the documentation as a further patch) > > Could you look over the documentation in > > Documentation/technical/trivial-merge.txt, and see if it's a > > suitable replacement for the table in > > t1000-read-tree-m-3way.sh? > > I do not understand what you meant by '*' and 'index+' in > one-way merge table. I take the first row ('*') to mean "If the > tree is missing a path, that path is removed from the index." '*' means that that case applies regardless of what's there. 'index+' means that it's the index, with the stat information. I forgot to actually explain the table before going on to the interesting section. > I like the second sentence in three-way merge description. That > is a very easy-to-understand description of what the index > requirements are. > > You have 2 2ALTs. Also 14 and 14ALT look like they are the same > rule now. Ah, right. I had originally listed "index" in the table, with separate cases for having it match the head and having it match the result, but then ditched that when I figured out how that actually works. > What's "(empty)^" in "ancest"? All of them must be empty for > this rule to apply? The '^' means that all must be like that. I have to check, but I think that 8ALT and 10ALT should be '+'. 
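The notation explained above ('*' for "whatever is there", '^' for "all ancestors must be like that", and '+' for "at least one ancestor", as in "'ancestor+' means one of them matches") can be read as a quantifier over the list of ancestor entries. A minimal sketch of checking one ancestor cell of a row; the function name and encoding are hypothetical, not the format of trivial-merge.txt:

```python
from typing import Optional, Sequence

def ancestors_match(quantifier: str, ancestors: Sequence[Optional[str]],
                    value: Optional[str]) -> bool:
    """Check one ancestor cell of a trivial-merge row.

    '*' : the row applies whatever the ancestors hold
    '^' : every ancestor must hold `value`
    '+' : at least one ancestor must hold `value`
    None stands for "(empty)", i.e. the path is missing.
    """
    if quantifier == "*":
        return True
    if quantifier == "^":
        return all(anc == value for anc in ancestors)
    if quantifier == "+":
        return any(anc == value for anc in ancestors)
    raise ValueError(f"unknown quantifier {quantifier!r}")
```

Note that '^' is vacuously true when there are no ancestors at all, which is one of the edge cases a table like this has to pin down.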
> I am not quite sure it is 'a suitable replacement' yet; the > existing table you can see it covers all the cases, but with > things like "'ancestor+' means one of them matches", I cannot > really tell the table covers all the cases or some cases fall of > the end of the chain. All of the "any ancestor" spots are good for covering things. Case #11 (which actually needs to be at the bottom) is basically "everything else". > Also when we have more than one ancestors or one remotes and we > say "no merge", it is still unspecified (and I have to admit I > cannot readily say what the result should be for all of them, > except that I agree #16 will be fine with an empty stage1) what > are left in which stages. Presently, except for case #16, only the first ancestor is used in "no merge" output. The right thing should be worked out and documented, of course. I'm not at all convinced at this point that we can do much with multiple remotes in a single application of the rules; you won't necessarily have the same merge base for all pairs, and all sorts of things go wrong if you start including ancestors that aren't related to something, or not including common ancestors of some pair. What might work is to have the error for an unmerged index only happen when you get to a "no merge" result, so that you can get as many conflicts as possible (in different files) resolved by the user at the same time. > I personally think the exotic cases (i.e. no rule applies, or > "no merge" result with more than one ancestors/remotes) needs to > be handled outside read-tree anyway, by the script that drives > read-tree to attempt trivial merges. I think case #16 would benefit from doing more stuff, but there aren't any holes in the rules, and I think that, for the multiple ancestors in "no merge", we just want to use the one with the least conflict. (Or, if we write our own merge, do a #16/#13,#14/#11 decision per-hunk in our merge, which is the really right thing). 
I think the common case for multiple ancestors will really be that you've got a side branch that split before the split you're resolving, and was merged into both sides before now; in this case, there's no big problem, and it's not the exotic cross-merge case. Of course, we won't see this in projects like the kernel and git, which aren't that amorphous. -Daniel *This .sig left intentionally blank*
Re: Multi-ancestor read-tree notes
On Tue, 6 Sep 2005, Junio C Hamano wrote: > Daniel Barkalow <[EMAIL PROTECTED]> writes: > > > Do you know if there's anything like case #16 in there? I'd be interested > > to know if there's anything that gets handled automatically in different > > ways depending on which single base is used, and doesn't require manual > > intervention with multiple bases, because that's probably wrong. > > Re-running the tests with the attached patch shows there weren't any. Good. (Although that patch doesn't seem to be directly on top of my version; I can tell what it's doing anyway) > > Great. Want me to send the patches with better organization, or are you > > set with what I've sent? > > That's up to you. If you are content with what I have in the pu > branch, there is no need to bother resending. OTOH if you have > further clean-ups in mind, i.e. "better organization" above, I > do not mind dropping the current ones from "pu" and replace them > with another set from you. I'm happy with the content in "pu"; the issue is just whether you want the history cleaned up more. In the series I sent, I kept forgetting parts that belonged in earlier patches. Could you look over the documentation in Documentation/technical/trivial-merge.txt, and see if it's a suitable replacement for the table in t1000-read-tree-m-3way.sh? It should be the same, except for ALT or non-ALT versions that we're not using, combining a few matching cases, describing the rules behind index requirements rather than listing outcomes, and the addition of info on how multiple ancestors are handled. -Daniel *This .sig left intentionally blank*
Re: [PATCH] Make sure the diff machinery outputs "\ No newline ..." in english
On Mon, 5 Sep 2005, Linus Torvalds wrote: > On Mon, 5 Sep 2005, Fredrik Kuivinen wrote: > > > > After a quick look through the diff source I didn't find anything > > else. It's quite possible that I haved missed something though. Most > > of the translated messages are related to error reporting, which I > > guess might be nice to have in the user specified language. > > Is it possible that we could integrate the "diff" algorithm into git, and > get rid of the dependency on an external GNU diff? It would also make the > portability problems go away (ie old diff's being broken). > > It would also potentially speed up the normal built-in diff a lot, since > we wouldn't have to execute a whole other program to generate a diff, just > call a helper function the way we do for xdiff.. > > Unreasonable? The algorithm actually used by GNU diff is pretty complicated, and I don't really understand the actual implementation, which evidently has a few important refinements over the original paper. I've written my own diff, mainly to try a different algorithm, and it seems to work, but the code isn't yet appropriate to submit. This algorithm also has the advantage that it can identify moved sections and is less interested in interleaving a removed function with a new function to provide the shortest possible diff. I expect that I could get it to work if I put in a day on it; what's left is mostly writing a hash table implementation keyed on strings that aren't NUL-terminated. -Daniel *This .sig left intentionally blank*
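The message doesn't describe how that diff finds moved sections, so the following is a swapped-in technique, not Daniel's algorithm. It is the unique-line matching used by patience diff. Lines that occur exactly once in each file are paired through a hash table (Python's dict stands in for the string-keyed table mentioned above). A longest increasing subsequence of those pairs gives the lines that kept their relative order. Paired lines outside that backbone are candidates for "moved".

```python
from bisect import bisect_left
from collections import Counter
from typing import List, Tuple

def unique_matches(old: List[str], new: List[str]) -> List[Tuple[int, int]]:
    """(old_index, new_index) pairs for lines that occur exactly once in each file."""
    old_counts, new_counts = Counter(old), Counter(new)
    new_pos = {line: j for j, line in enumerate(new) if new_counts[line] == 1}
    return [(i, new_pos[line]) for i, line in enumerate(old)
            if old_counts[line] == 1 and line in new_pos]

def keep_in_order(pairs: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Longest subsequence of pairs whose new indices increase (patience-sorting LIS)."""
    tails: List[int] = []      # tails[k]: pair index ending the best run of length k+1
    tail_new: List[int] = []   # new index of each tail, kept sorted for bisect
    prev = [-1] * len(pairs)   # back-pointers for rebuilding the run
    for idx, (_, j) in enumerate(pairs):
        k = bisect_left(tail_new, j)
        if k > 0:
            prev[idx] = tails[k - 1]
        if k == len(tails):
            tails.append(idx)
            tail_new.append(j)
        else:
            tails[k] = idx
            tail_new[k] = j
    run, idx = [], (tails[-1] if tails else -1)
    while idx != -1:
        run.append(pairs[idx])
        idx = prev[idx]
    return run[::-1]

def moved_lines(old: List[str], new: List[str]) -> List[str]:
    """Unique shared lines that fall outside the in-order backbone."""
    pairs = unique_matches(old, new)
    kept = set(keep_in_order(pairs))
    return [old[i] for i, j in pairs if (i, j) not in kept]
```

Swapping two functions f and g ahead of an unchanged h reports one of them as moved rather than interleaving deletions and insertions.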
Re: bogus merges
On Mon, 5 Sep 2005, Linus Torvalds wrote: > On Mon, 5 Sep 2005, Wayne Scott wrote: > > > > A recent commit in linux-2.6 looks like this: > > It hopefully shouldn't happen any more with the improved and fixed > git-merge-base. Couldn't it also happen if there's stale data in MERGE_HEAD when you commit a normal patch? The description doesn't look like a merge at all, but rather like a normal patch that inappropriately picked up an extra head. I'd guess he tried to merge something, got a conflict, decided that he didn't really want to do that anyway, switched to a different branch, applied a patch, and committed without noticing the note that he seemed to be committing a merge. Probably the right thing is actually to clean up more when switching tasks, but it would probably also be worth checking that merges make sense as well. -Daniel *This .sig left intentionally blank*
Re: Multi-ancestor read-tree notes
On Mon, 5 Sep 2005, Junio C Hamano wrote: > Daniel Barkalow <[EMAIL PROTECTED]> writes: > > > I've got a version of read-tree which accepts multiple ancestors and does > > a merge using information from all of them. > > After disabling the debugging printf(), I used this read-tree to > try resolving the parents of four commits Fredrik Kuivinen gave > us in <[EMAIL PROTECTED]> using > their two merge bases, and compared the resulting tree with the > tree recorded in the commit. The results are really promising. > > For the following two commits, multi-base merge resolved their > parents trivially and produced the same result as the tree in > the commit. The current "best-base merge" in the master branch > performed far worse and left many conflicts. > > - 467ca22d3371f132ee225a5591a1ed0cd518cb3d > - da28c12089dfcfb8695b6b555cdb8e03dda2b690 > > Another one, 0e396ee43e445cb7c215a98da4e76d0ce354d9d7, > multi-base merge left only one conflicting path to be hand > resolved. The best-base merge again performed far worse. > > The other one, 3190186362466658f01b2e354e639378ce07e1a9, is > resolved trivially with both algorithms. Do you know if there's anything like case #16 in there? I'd be interested to know if there's anything that gets handled automatically in different ways depending on which single base is used, and doesn't require manual intervention with multiple bases, because that's probably wrong. > > In case #16, I'm not sure what I should produce. I think the best thing > > might be to not leave anything in stage 1. > > Because? I know it would affect the readers of index files if > you did so, but it would seem the most natural in git > architecture to have merge-cache look at the resulting cache > with such multiple stage 1 entries (and other stages) and let > the script make a decision. I didn't want to break the assumption of only one entry per stage in the initial version. I'm also not sure that listing the ancestors is particularly useful in this case. 
They have to be exactly the contents of stages 2 and 3, plus possibly more stuff that's not been kept by either side. What you actually want is a two-way merge (i.e., a diff between the two sides, presented in "merge" format), so you don't really need any ancestors, unless it would fit some more general case that way. > > The desired end effect is that the user is given a file with a > > section like: > > > > { > > *t = NULL; > > *m = 0; > > <<<<<<<< > > return Z_DATA_ERROR; > > > > return Z_OK; > >>>>>>>>> > > } > > Sounds fine. > > Anyway, I really am happy to see this multi-base merge perform > well on real-world data, and you are certainly the git hero of > the week ;-). Great. Want me to send the patches with better organization, or are you set with what I've sent? -Daniel *This .sig left intentionally blank*
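The two-way merge described above (a diff between the two sides, presented in "merge" format, with no ancestor involved) can be sketched in C like this. It is a minimal illustration, not read-tree or merge code: lines shared at the start and end of both sides are emitted once and the differing middle becomes a conflict section. The function names are hypothetical, and the `========` separator line is an assumption (the example in the thread separates the two sides with a blank line).

```c
#include <assert.h>
#include <string.h>

/* Appends s and a newline to the NUL-terminated buffer out (capacity
 * cap); returns 0, or -1 without writing if it would not fit. */
static int put_line(char *out, size_t cap, const char *s)
{
	size_t used = strlen(out), n = strlen(s);
	if (used + n + 2 > cap)
		return -1;
	memcpy(out + used, s, n);
	out[used + n] = '\n';
	out[used + n + 1] = '\0';
	return 0;
}

/* Emits the common prefix, then the two differing middles wrapped in
 * conflict markers, then the common suffix.  out must start as a
 * NUL-terminated (possibly empty) string.  Returns -1 on overflow. */
static int two_way_conflict(const char **a, size_t na,
			    const char **b, size_t nb,
			    char *out, size_t cap)
{
	size_t pre = 0, suf = 0, i;
	int err = 0;

	assert(out && cap > 0);
	while (pre < na && pre < nb && !strcmp(a[pre], b[pre]))
		pre++;
	while (suf < na - pre && suf < nb - pre &&
	       !strcmp(a[na - 1 - suf], b[nb - 1 - suf]))
		suf++;

	for (i = 0; i < pre; i++)
		err |= put_line(out, cap, a[i]);
	if (pre + suf < na || pre + suf < nb) {
		err |= put_line(out, cap, "<<<<<<<<");
		for (i = pre; i < na - suf; i++)
			err |= put_line(out, cap, a[i]);
		err |= put_line(out, cap, "========");
		for (i = pre; i < nb - suf; i++)
			err |= put_line(out, cap, b[i]);
		err |= put_line(out, cap, ">>>>>>>>");
	}
	for (i = nb - suf; i < nb; i++)
		err |= put_line(out, cap, b[i]);
	return err ? -1 : 0;
}
```

Fed the two versions of the zlib snippet from the thread, this yields the shared `{`, `*t = NULL;`, `*m = 0;` lines, then the two `return` lines inside the markers, then the closing `}`.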
[PATCH 4/4] Document the trivial merge rules for 3(+more ancestors)-way merges.
Signed-off-by: Daniel Barkalow --- Documentation/technical/trivial-merge.txt | 92 + 1 files changed, 92 insertions(+), 0 deletions(-) create mode 100644 Documentation/technical/trivial-merge.txt 7544be0a8eda7b796150729a7795c2639278da62 diff --git a/Documentation/technical/trivial-merge.txt b/Documentation/technical/trivial-merge.txt new file mode 100644 --- /dev/null +++ b/Documentation/technical/trivial-merge.txt @@ -0,0 +1,92 @@ +Trivial merge rules +=== + +This document describes the outcomes of the trivial merge logic in read-tree. + +One-way merge +- + +This replaces the index with a different tree, keeping the stat info +for entries that don't change, and allowing -u to make the minimum +required changes to the working tree to have it match.
+
+ index   tree    result
+ ---
+ *       (empty) (empty)
+ (empty) tree    tree
+ index+  tree    tree
+ index+  index   index+
+
+Two-way merge +- + + + +Three-way merge +--- + +It is permitted for the index to lack an entry; this does not prevent +any case from applying. + +If the index exists, it is an error for it not to match either the +head or (if the merge is trivial) the result. + +If multiple cases apply, the one used is listed first. + +A result of "no merge" means that index is left in stage 0, ancest in +stage 1, head in stage 2, and remote in stage 3 (if any of these are +empty, no entry is left for that stage). Otherwise, the given entry is +left in stage 0, and there are no other entries. + +A result of "no merge" is an error if the index is not empty and not +up-to-date. + +*empty* means that the tree must not have a directory-file conflict + with the entry. + +For multiple ancestors or remotes, a '+' means that this case applies +even if only one ancestor or remote fits; normally, all of the +ancestors or remotes must be the same.
+
+case  ancest     head     remote   result
+
+1     (empty)+   (empty)  (empty)  (empty)
+2ALT  (empty)+   *empty*  remote   remote
+2ALT  (empty)+   *empty*  remote   remote
+2     (empty)^   (empty)  remote   no merge
+3ALT  (empty)+   head     *empty*  head
+3     (empty)^   head     (empty)  no merge
+4     (empty)^   head     remote   no merge
+5ALT  *          head     head     head
+6     ancest^    (empty)  (empty)  no merge
+8ALT  ancest     (empty)  ancest   (empty)
+7     ancest+    (empty)  remote   no merge
+9     ancest+    head     (empty)  no merge
+10ALT ancest     ancest   (empty)  (empty)
+11    ancest+    head     remote   no merge
+16    anc1/anc2  anc1     anc2     no merge
+13    ancest+    head     ancest   head
+14    ancest+    ancest   remote   remote
+14ALT ancest+    ancest   remote   remote
+
+Only #2ALT and #3ALT use *empty*, because these are the only cases +where there can be conflicts that didn't exist before. Note that we +allow directory-file conflicts between things in different stages +after the trivial merge. + +A possible alternative for #6 is (empty), which would make it like +#1. This is not used, due to the likelihood that it arises due to +moving the file to multiple different locations or moving and deleting +it in different branches. + +Case #1 is included for completeness, and also in case we decide to +put on '+' markings; any path that is never mentioned at all isn't +handled. + +Note that #16 is when both #13 and #14 apply; in this case, we refuse +the trivial merge, because we can't tell from this data which is +right. This is a case of a reverted patch (in some direction, maybe +multiple times), and the right answer depends on looking at crossings +of history or common ancestors of the ancestors. + +The status as of Sep 5 is that multiple remotes are not supported \ No newline at end of file
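The core of the three-way rules above, for a single path with several ancestors, can be sketched in C. This is a simplified illustration, not the read-tree implementation. It assumes each version of a path is reduced to an integer id (standing in for mode plus SHA-1) with 0 meaning "no entry", and it deliberately omits the index checks, the *empty* directory/file-conflict checks and the '^' rows. What it keeps is the "+" behaviour (one matching ancestor is enough for #13 or #14) and the refusal in #16 when different ancestors support different answers. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define NO_ENTRY 0 /* hypothetical: id 0 means the path is absent */

enum trivial_result { TAKE_HEAD, TAKE_REMOTE, NO_MERGE };

static enum trivial_result trivial_merge(const int *anc, size_t nanc,
					 int head, int remote)
{
	int head_ok = 0;   /* some ancestor equals remote: only head changed (#13) */
	int remote_ok = 0; /* some ancestor equals head: only remote changed (#14) */
	int any_anc = 0;
	size_t i;

	assert(anc != NULL || nanc == 0);
	for (i = 0; i < nanc; i++) {
		if (anc[i] != NO_ENTRY)
			any_anc = 1;
		if (anc[i] == remote)
			head_ok = 1;
		if (anc[i] == head)
			remote_ok = 1;
	}

	if (head == remote) {
		/* #5ALT: both sides agree; #1: the path exists nowhere. */
		if (head != NO_ENTRY || !any_anc)
			return TAKE_HEAD;
		return NO_MERGE;    /* #6: removed on both sides */
	}
	if (head_ok && remote_ok)
		return NO_MERGE;    /* #16: ancestors disagree on which side changed */
	if (head_ok)
		return TAKE_HEAD;   /* #13, also #3ALT and #8ALT */
	if (remote_ok)
		return TAKE_REMOTE; /* #14, also #2ALT and #10ALT */
	return NO_MERGE;            /* e.g. #11: both sides changed */
}
```

The #16 check comes before #13 and #14, matching the rule that the first listed case wins. With a single ancestor #16 cannot arise, because one ancestor can only equal both head and remote when head and remote are identical, and that case is handled earlier.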
[PATCH 3/4] Rewrite read-tree
Adds support for multiple ancestors, removes --emu23, much simplification. Signed-off-by: Daniel Barkalow <[EMAIL PROTECTED]> --- read-tree.c | 811 +++-- t/t1005-read-tree-m-2way-emu23.sh | 422 --- 2 files changed, 425 insertions(+), 808 deletions(-) delete mode 100755 t/t1005-read-tree-m-2way-emu23.sh f196469bec156947038f1d3d00c899c9044334ca diff --git a/read-tree.c b/read-tree.c --- a/read-tree.c +++ b/read-tree.c @@ -5,73 +5,291 @@ */ #include "cache.h" -static int stage = 0; +#include "object.h" +#include "tree.h" + +static int merge = 0; static int update = 0; -static int unpack_tree(unsigned char *sha1) -{ - void *buffer; - unsigned long size; - int ret; +static int head_idx = -1; +static int merge_size = 0; - buffer = read_object_with_reference(sha1, "tree", &size, NULL); - if (!buffer) - return -1; - ret = read_tree(buffer, size, stage, NULL); - free(buffer); +static struct object_list *trees = NULL; + +static struct cache_entry df_conflict_entry = { +}; + +static struct tree_entry_list df_conflict_list = { + .name = NULL, + .next = &df_conflict_list +}; + +typedef int (*merge_fn_t)(struct cache_entry **src); + +static int entcmp(char *name1, int dir1, char *name2, int dir2) +{ + int len1 = strlen(name1); + int len2 = strlen(name2); + int len = len1 < len2 ? len1 : len2; + int ret = memcmp(name1, name2, len); + unsigned char c1, c2; + if (ret) + return ret; + c1 = name1[len]; + c2 = name2[len]; + if (!c1 && dir1) + c1 = '/'; + if (!c2 && dir2) + c2 = '/'; + ret = (c1 < c2) ? -1 : (c1 > c2) ? 
1 : 0; + if (c1 && c2 && !ret) + ret = len1 - len2; return ret; } -static int path_matches(struct cache_entry *a, struct cache_entry *b) +static int unpack_trees_rec(struct tree_entry_list **posns, int len, + const char *base, merge_fn_t fn, int *indpos) { - int len = ce_namelen(a); - return ce_namelen(b) == len && - !memcmp(a->name, b->name, len); + int baselen = strlen(base); + int src_size = len + 1; + do { + int i; + char *first; + int firstdir = 0; + int pathlen; + unsigned ce_size; + struct tree_entry_list **subposns; + struct cache_entry **src; + int any_files = 0; + int any_dirs = 0; + char *cache_name; + int ce_stage; + + /* Find the first name in the input. */ + + first = NULL; + cache_name = NULL; + + /* Check the cache */ + if (merge && *indpos < active_nr) { + /* This is a bit tricky: */ + /* If the index has a subdirectory (with +* contents) as the first name, it'll get a +* filename like "foo/bar". But that's after +* "foo", so the entry in trees will get +* handled first, at which point we'll go into +* "foo", and deal with "bar" from the index, +* because the base will be "foo/". The only +* way we can actually have "foo/bar" first of +* all the things is if the trees don't +* contain "foo" at all, in which case we'll +* handle "foo/bar" without going into the +* directory, but that's fine (and will return +* an error anyway, with the added unknown +* file case. +*/ + + cache_name = active_cache[*indpos]->name; + if (strlen(cache_name) > baselen && + !memcmp(cache_name, base, baselen)) { + cache_name += baselen; + first = cache_name; + } else { + cache_name = NULL; + } + } + + if (first) + printf("index %s\n", first); + + for (i = 0; i < len; i++) { + if (!posns[i] || posns[i] == &df_conflict_list) + continue; + printf("%d %s\n", i + 1, posns[i]->name); +
[PATCH 2/4] Add function to append to an object_list.
Signed-off-by: Daniel Barkalow <[EMAIL PROTECTED]> --- object.c | 11 +++ object.h |3 +++ 2 files changed, 14 insertions(+), 0 deletions(-) 88cf2db55848e7a2cf655171c7e9fd74c70a0281 diff --git a/object.c b/object.c --- a/object.c +++ b/object.c @@ -184,6 +184,17 @@ struct object_list *object_list_insert(s return new_list; } +void object_list_append(struct object *item, + struct object_list **list_p) +{ + while (*list_p) { + list_p = &((*list_p)->next); + } + *list_p = xmalloc(sizeof(struct object_list)); + (*list_p)->next = NULL; + (*list_p)->item = item; +} + unsigned object_list_length(struct object_list *list) { unsigned ret = 0; diff --git a/object.h b/object.h --- a/object.h +++ b/object.h @@ -41,6 +41,9 @@ void mark_reachable(struct object *obj, struct object_list *object_list_insert(struct object *item, struct object_list **list_p); +void object_list_append(struct object *item, + struct object_list **list_p); + unsigned object_list_length(struct object_list *list); int object_list_contains(struct object_list *list, struct object *obj);
[PATCH 1/4] Add a function for getting a struct tree for an ent.
Signed-off-by: Daniel Barkalow <[EMAIL PROTECTED]> --- tree.c | 21 + tree.h |3 +++ 2 files changed, 24 insertions(+), 0 deletions(-) 3bfcc20b6aeff3e1fbcce97a426383c9770a2105 diff --git a/tree.c b/tree.c --- a/tree.c +++ b/tree.c @@ -1,5 +1,7 @@ #include "tree.h" #include "blob.h" +#include "commit.h" +#include "tag.h" #include "cache.h" #include @@ -212,3 +214,22 @@ int parse_tree(struct tree *item) free(buffer); return ret; } + +struct tree *parse_tree_indirect(const unsigned char *sha1) +{ + struct object *obj = parse_object(sha1); + do { + if (!obj) + return NULL; + if (obj->type == tree_type) + return (struct tree *) obj; + else if (obj->type == commit_type) + obj = &(((struct commit *) obj)->tree->object); + else if (obj->type == tag_type) + obj = ((struct tag *) obj)->tagged; + else + return NULL; + if (!obj->parsed) + parse_object(obj->sha1); + } while (1); +} diff --git a/tree.h b/tree.h --- a/tree.h +++ b/tree.h @@ -32,4 +32,7 @@ int parse_tree_buffer(struct tree *item, int parse_tree(struct tree *tree); +/* Parses and returns the tree in the given ent, chasing tags and commits. */ +struct tree *parse_tree_indirect(const unsigned char *sha1); + #endif /* TREE_H */
[PATCH 0/4] Support multiple ancestors in read-tree
Various messages have already described this series. There's still a memory leak that should get resolved, but otherwise it should work. I'm not entirely sure that all directory-file conflict cases are handled properly, and some undefined cases behave differently. Also, I was a bit careless with preparing the patches. -Daniel *This .sig left intentionally blank*
Multi-ancestor read-tree notes
I've got a version of read-tree which accepts multiple ancestors and does a merge using information from all of them. The basic features are that it looks for an ancestor which would permit a trivial merge, and uses that. However, if it finds ancestors which permit different trivial merges, it does not merge (which I call case #16). In case #16, I'm not sure what I should produce. I think the best thing might be to not leave anything in stage 1. The desired end effect is that the user is given a file with a section like: { *t = NULL; *m = 0; <<<<<<<< return Z_DATA_ERROR; return Z_OK; >>>>>>>> } In other news, the merge that was giving Len Brown problems a while ago turns out to have the above conflict, and he happened to end up doing the right thing and not reverting Linus's revert of an unnecessary (but harmless) change. I only noticed this just now, when I was testing that merge, and got it to generate only two conflicts regardless of order of ancestors (didn't try to resolve the other one, drivers/acpi/osl.c, with "merge" either way). So this test is encouraging: I get fewer non-trivial cases than either of the ancestors alone gives, and I catch a case that both single ancestors get wrong. Note that there are still some memory leaks for me to fix, but that's the only flaw I know of with this. Patches against mainline to follow shortly. -Daniel *This .sig left intentionally blank*
Re: [PATCH 0/2] Reorganize read-tree
On Sun, 4 Sep 2005, Junio C Hamano wrote: > Daniel Barkalow <[EMAIL PROTECTED]> writes: > > > I got mostly done with this before Linus mentioned the possibility of > > having multiple index entries in the same stage for a single path. I > > finished it anyway, but I'm not sure that we won't want to know which of > > the common ancestors contributed which, and, if some of them don't have a > > path, we wouldn't be able to tell. The other advantages I see to this > > approach are: > > I've finished reading your patch, after beating it reasonably > heavily by feeding combinations of nonsense trees to make sure > it produces the same result as the original implementation. I > have not found any regression from the read-tree in "master" > branch, after you fixed the path ordering issues. Good. > > There are various potential refinements, plus removing a bunch of memory > > leaks, still to do, but I think this is sufficiently close to review. > > I am not so worried about the leaks right now; they are > something that could be fixed before it hits the "master" > branch. Right. > I like your approach of reading the input trees, along with the > existing index contents, and re-populating the index one path at > a time. It probably is more readable. > > I further think that you can get the best of both worlds, by > inventing a convention that mode=0 entry means 'this path does > not exist in this tree'. This would allow you to have multiple > entries at the same stage and still tell which one came from > which tree. Instead of calling fn in unpack_trees(), you could > make it only unpack the tree into the index, and then after > unpacking is done, call fn() repeatedly to resolve the resulting > index. I think that almost all of the benefit actually comes from calling fn() in unpack_trees() and not putting anything in the index before merging. Without that, you need the complex index management and the complicated search for DF conflicts. 
The main point of not reading everything into the index before calling fn() on stuff is that the index is actually really difficult to deal with in this situation (because you are simultaneously moving through it, removing and modifying entries, and searching it for conflicts). The improvement in readability comes from not doing this. -Daniel *This .sig left intentionally blank*
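The approach argued for above walks all input trees in step and calls fn() once per path, instead of loading everything into the index first. It can be sketched in C for the simplest possible setting. This is a hypothetical illustration, not the patch: trees are flat sorted arrays with no subdirectories, the existing index is ignored, and names are compared with plain `strcmp` rather than the directory-aware ordering read-tree needs. `unpack_flat`, `record_path` and the structs are illustrative names only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_TREES 4

/* Hypothetical flat tree: entries sorted by name, no subdirectories. */
struct entry {
	const char *name;
	int id;
};

struct tree_list {
	const struct entry *entries;
	size_t nr;
};

/* Called once per path, one slot per input tree; NULL means "absent". */
typedef int (*merge_fn)(const struct entry **src, size_t n, void *data);

static int unpack_flat(const struct tree_list *trees, size_t n,
		       merge_fn fn, void *data)
{
	size_t pos[MAX_TREES] = { 0 };
	const struct entry *src[MAX_TREES];
	size_t i;

	assert(n <= MAX_TREES);
	for (;;) {
		const char *first = NULL;
		int ret;

		/* Smallest name not yet consumed in any input. */
		for (i = 0; i < n; i++)
			if (pos[i] < trees[i].nr &&
			    (!first || strcmp(trees[i].entries[pos[i]].name, first) < 0))
				first = trees[i].entries[pos[i]].name;
		if (!first)
			return 0; /* every input is exhausted */

		/* Gather that path from each input and step past it. */
		for (i = 0; i < n; i++) {
			src[i] = NULL;
			if (pos[i] < trees[i].nr &&
			    !strcmp(trees[i].entries[pos[i]].name, first))
				src[i] = &trees[i].entries[pos[i]++];
		}
		ret = fn(src, n, data);
		if (ret < 0)
			return ret;
	}
}

/* Example callback: counts paths and records, per path, a bitmask of
 * which inputs contained it. */
struct path_stats {
	int paths;
	unsigned masks[16];
};

static int record_path(const struct entry **src, size_t n, void *data)
{
	struct path_stats *st = data;
	unsigned mask = 0;
	size_t i;
	for (i = 0; i < n; i++)
		if (src[i])
			mask |= 1u << i;
	if (st->paths < 16)
		st->masks[st->paths] = mask;
	st->paths++;
	return 0;
}
```

Each callback invocation sees every version of one path at once, so a merge function can decide the result and write a single index entry without ever searching or rewriting the index in place.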
Re: Moved files and merges
On Sun, 4 Sep 2005, Junio C Hamano wrote: > Sam Ravnborg <[EMAIL PROTECTED]> writes: > > > If the problem is not fully understood it can be difficult to come up > > with the proper solution. And with the example above the problem should > > be really easy to understand. > > Then we have the tree as used by hpa with a few more mergers in it. But > > the above is what was initial tried to do with the added complexity of a > > few more renames etc. > > All true. Let's redraw that simplified scenario, and see if > what I said still holds. It may be interesting to store my > previous message and this one and run diff between them. I > suspect that the main difference to come out would be the the > problem description part and the merge machinery part would not > be all that different. I'm not quite so convinced, because I think that the actual situation is a bit more natural, and therefore our expectations at the end should be closer to right with less attention to detail. But I think the actual situation is more interesting, anyway, because it's more likely to happen and we're more likely to be able to help. > > This is a simplified scenario of klibc vs klibc-kbuild HPA had > trouble with, to help us think of a way to solve this > interesting merge problem. > >#1 - #3 - #5 - #7 >// / > #0 - #2 - #4 - #6 > > There are two lines of developments. #0->#1 renames F to G and > introduces K. #0->#2 keeps F as F and does not introduce K. > > At commit #3, #2 is merged into #1. The changes made to the > file contents of F between #0 and #2 are appreciated, but we > would also want to keep our decision to rename F to G and our > new file K. So commit #3 has the resulting merge contents in G > and has K, inherited from #1. This _might_ be different from > what we traditionally consider a 'merge', but from the use case > point of view it is a valid thing one would want to do. I think this is actually quite a regular merge, and I think we should be able to offer some assistance. 
The situation with K is normal: case #3ALT. If someone introduces a file and there's no file or directory with that name in other trees, we assume that the merge should include it. F/G is trickier, and I don't think we can actually do much about it with the current structure of read-tree/merge-cache/etc, but, theoretically, we should recognize that #0->#1 is a rename plus content changes, and #0->#2 is content changes, so the total should be the rename plus contents changes; I think we want to additionally signal a conflict, because there's a reasonable chance that the rename will interfere with the #0->#2 changes, and need intervention. Most likely, this just means that we should not commit automatically, but have the user test the result first. For now, of course, we don't get renames at any point in the merging procedure, so our code can't tell, and sees it as a big conflict that the user has to deal with. But we can agree on what the result is if the user "includes all the changes from the other branch" (and see the situation you reported first as "cherry-picking" the content and leaving the structural changes). > Commit #4 is a continued development from #2; changes are made > to F, and there is no K. Commit #5 similarly is a continued > development from #3; its changes are made to G and K also has > further changes. > > We are about to merge #6 into #5 to create #7. We should be > able to take advantage of what the user did when the merge #3 > was made; namely, we should be able to infer that the line of > development that flows #0 .. #3 .. #7 prefers to rename F to G, > and also wants the newly introduced K. We should be able to > tell it by looking at what the merge #3 did. Again, K should be unexceptional, because we're keeping a file that was added to one side but not the other. 
(In the other situation, it still works; relative to the common ancestor, we're in #8ALT, since #5 doesn't have K, which was in #2 and #6; we see the rejection in a merge as a removal, which is effectively the same.) > Now, how can we use git to figure that out? First off, it should handle K automatically, because we're still including a file added by one side without interference from the other side. > First, given our current head (#5) and the other head we are > about to merge (#6), we need a way to tell if we merged from > them before (i.e. the existence of #3) and if so the latest of > such merge (i.e. #3). > > The merge base between #5 and #6 is #2. We can look at commits > between us (#5) and the merge base (#2), find a merge (#3), > which has two parents. One of the parents is #2 which is > reachable from #6, and the other is #1 which is not reachable > from #6 but is reachable from #5. Can we say that this reliably > tells us that #2 is on their side and #1 is on our side? Does > the fact that #3 is the commit topologically closest to #5 tell > us that #3
Re: Tool renames? was Re: First stab at glossary
On Sat, 3 Sep 2005, Junio C Hamano wrote: > Daniel Barkalow <[EMAIL PROTECTED]> writes: > > > I think "fetch" is more applicable to what they do. > > OK. then they are git-http-fetch and friends. How about > git-ssh-push? The counterpart of fetch-pack/clone-pack is > called upload-pack, so would git-ssh-upload make things more > consistent? I dunno. I like that idea. > > I don't think it matters very much whether something is a script or not; > > on the other hand, it would be good to have "git" list a reasonable set of > > commands to use through the interface, which would exclude, for example, > > git-merge-one-file-script, and include the above commands. > > Are you suggesting to drop -script from git-merge-one-file? > Then git-cherry should perhaps keep its current name. I'd suggest it get a different ending, like .sh or -helper. That way, it's distinct both from binaries and from scripts that people run directly. -Daniel *This .sig left intentionally blank*
Re: Tool renames? was Re: First stab at glossary
On Fri, 2 Sep 2005, Junio C Hamano wrote: > I said: > > > I'll draw up a strawman tonight unless somebody else > > does it first. > > 1. Say 'index' when you are tempted to say 'cache'. > > git-checkout-cache git-checkout-index > git-convert-cache git-convert-index > git-diff-cache git-diff-index > git-fsck-cache git-fsck-index > git-merge-cache git-merge-index > git-update-cache git-update-index Agreed, except that git-convert-cache and git-fsck-cache actually have nothing to do with the index by any name, and should probably be git-convert-objects and git-fsck-objects. > 2. The act of combining two or more heads is called 'merging'; >fetching immediately followed by merging is called 'pulling'. > > git-resolve-script git-merge-script > >The commit walkers are called *-pull, but this is probably >confusing. They are not pulling. > > git-http-pull git-http-walk > git-local-pull git-local-walk > git-ssh-pull git-ssh-walk I think "fetch" is more applicable to what they do. > 3. Non-binaries are called '*-scripts'. > >In earlier discussions some people seem to like the >distinction between *-script and others; I did not >particularly like it, but I am throwing this in for >discussion. > > git-applymbox git-applymbox-script > git-applypatch git-applypatch-script > git-cherry git-cherry-script > git-shortlog git-shortlog-script > git-whatchanged git-whatchanged-script I don't think it matters very much whether something is a script or not; on the other hand, it would be good to have "git" list a reasonable set of commands to use through the interface, which would exclude, for example, git-merge-one-file-script, and include the above commands. > 4. To be removed shortly. > > git-clone-dumb-http should be folded into git-clone-script Agreed. -Daniel *This .sig left intentionally blank*
Re: Tool renames? was Re: First stab at glossary
On Thu, 1 Sep 2005, Junio C Hamano wrote: > Tim Ottinger <[EMAIL PROTECTED]> writes: > > > git-update-cache for instance? > > I am not sure which 'cache' commands need to be 'index' now. > > Logically you are right, but I suspect that may not fly well in > practice. Too many of us have already got our fingers wired to > type cache, and the glossary is there to describe both cache and > index. My vote's for changing the official names, but keeping symlinks for the old names. As far as I know, there aren't any actual conflicts, and we might as well have new users pick up the logical names. I particularly think "git merge" would be really good to have. -Daniel *This .sig left intentionally blank*
[PATCH 3/2] Remove emu23, fix entry order
A few things to improve testing. I'll clean up the series as a whole once it's tested. This removes the emu23 tests; I think that the only DF conflict tests were in that set, however, so these should be fished out and added to something else. Signed-off-by: Daniel Barkalow <[EMAIL PROTECTED]> --- read-tree.c | 89 +++- t/t1005-read-tree-m-2way-emu23.sh | 422 - 2 files changed, 37 insertions(+), 474 deletions(-) delete mode 100755 t/t1005-read-tree-m-2way-emu23.sh 63092a4dfb2042e8fc21260b2f315b01e9163940 diff --git a/read-tree.c b/read-tree.c --- a/read-tree.c +++ b/read-tree.c @@ -9,7 +9,6 @@ #include "tree.h" static int merge = 0; -static int emu23 = 0; static int update = 0; static struct object_list *trees = NULL; @@ -19,19 +18,39 @@ typedef int (*merge_fn_t)(struct cache_e int df_conflicts_2, int df_conflicts_3); +static int entcmp(char *name1, int dir1, char *name2, int dir2) +{ + int len1 = strlen(name1); + int len2 = strlen(name2); + int len = len1 < len2 ? len1 : len2; + int ret = memcmp(name1, name2, len); + unsigned char c1, c2; + if (ret) + return ret; + c1 = name1[len]; + c2 = name2[len]; + if (!c1 && dir1) + c1 = '/'; + if (!c2 && dir2) + c2 = '/'; + ret = (c1 < c2) ? -1 : (c1 > c2) ? 
1 : 0; + if (c1 && c2 && !ret) + ret = len1 - len2; + return ret; +} + static int unpack_trees_rec(struct tree_entry_list **posns, int len, const char *base, merge_fn_t fn, int file2, int file3, int *indpos) { int baselen = strlen(base); int src_size = len + 1; - if (emu23) - src_size++; if (src_size > 4) src_size = 4; do { int i; char *first = NULL; + int firstdir = 0; int pathlen; unsigned ce_size; int dir2 = 0; @@ -73,11 +92,23 @@ static int unpack_trees_rec(struct tree_ } } + /* + if (first) + printf("%s\n", first); + */ + for (i = 0; i < len; i++) { if (!posns[i]) continue; - if (!first || strcmp(first, posns[i]->name) > 0) + /* + printf("%d %s\n", i + 1, posns[i]->name); + */ + if (!first || entcmp(first, firstdir, +posns[i]->name, +posns[i]->directory) > 0) { first = posns[i]->name; + firstdir = posns[i]->directory; + } } /* No name means we're done */ if (!first) @@ -94,19 +125,6 @@ static int unpack_trees_rec(struct tree_ src_size); src[0] = active_cache[*indpos]; remove_cache_entry_at(*indpos); - if (emu23) { - // we need this in stage 2 as well as stage 0 - struct cache_entry *copy = - xmalloc(ce_size); - memcpy(copy, src[0], ce_size); - copy->ce_flags = - create_ce_flags(baselen + pathlen, 2); - if (dir2 || file2) { - die("cannot merge index and our head tree"); - } - src[2] = copy; - subfile2 = 1; - } } for (i = 0; i < len; i++) { @@ -125,8 +143,6 @@ static int unpack_trees_rec(struct tree_ } else { ce_stage = i + merge; } - if (emu23 && ce_stage == 2) - ce_stage = 3; if (posns[i]->directory) { if (!subposns) { @@ -137,8 +153,6 @@ static int unpack_trees_rec(struct tree_ parse_tree(posns[i]->item.tree); subposns[i] = posns[i]->item.tree->entries; posns[i] = posns[i]->next; - if (emu23 && ce_stage == 1) -
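The "fix entry order" part of this patch is the entcmp() comparison: a directory sorts as if its name ended in '/', which is why a plain strcmp() gives the wrong merge order. Below is a standalone copy for experimentation. It is lightly adapted from the patch (const-qualified arguments, `size_t` lengths, and a sign comparison in place of `len1 - len2`), so treat it as a sketch rather than the patched code itself.

```c
#include <assert.h>
#include <string.h>

/* Ordering rule from the patch's entcmp(): a directory compares as if
 * its name had a trailing '/'. */
static int entcmp(const char *name1, int dir1, const char *name2, int dir2)
{
	size_t len1, len2, len;
	unsigned char c1, c2;
	int ret;

	assert(name1 != NULL && name2 != NULL);
	len1 = strlen(name1);
	len2 = strlen(name2);
	len = len1 < len2 ? len1 : len2;
	ret = memcmp(name1, name2, len);
	if (ret)
		return ret;
	c1 = name1[len];
	c2 = name2[len];
	if (!c1 && dir1)
		c1 = '/';
	if (!c2 && dir2)
		c2 = '/';
	ret = (c1 < c2) ? -1 : (c1 > c2) ? 1 : 0;
	if (c1 && c2 && !ret)
		ret = (len1 > len2) - (len1 < len2);
	return ret;
}
```

Because '.' sorts before '/', a file `foo.c` comes before a directory `foo`, although `strcmp("foo", "foo.c")` puts `foo` first. A merge walk that used plain strcmp() would therefore visit the directory too early relative to the order in which the trees store their entries.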
Re: Reworked read-tree.
On Thu, 1 Sep 2005, Junio C Hamano wrote: > Daniel, I do not know what your current status is, but I think > you need something like this. Yup, I forgot to actually test that functionality. > --- > diff --git a/tree.c b/tree.c > --- a/tree.c > +++ b/tree.c > @@ -224,10 +224,12 @@ struct tree *parse_tree_indirect(const u > if (obj->type == tree_type) > return (struct tree *) obj; > else if (obj->type == commit_type) > - return ((struct commit *) obj)->tree; > + obj = (struct object *)(((struct commit *) obj)->tree); obj = &((struct commit *) obj)->tree->object; Multiple sequential casts always bother me, and we do actually have a field for this. > else if (obj->type == tag_type) > - obj = ((struct tag *) obj)->tagged; > + obj = deref_tag(obj); Shouldn't be necessary (once you've got the parse_object below); we're already in a loop dereferencing things. > else > return NULL; > + if (!obj->parsed) > + parse_object(obj->sha1); > } while (1); > } > >
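The loop under discussion peels tags and commits until it reaches a tree. The commit branch assigns the commit's tree to obj through its embedded `object` field and keeps looping, instead of returning early, so the tree goes through the same handling (in the real code, the `parse_object()` step) as everything else. The following is a self-contained mock of that control flow. The object model and all names (`peel_to_tree`, `OBJ_*`) are hypothetical simplifications: git's real structs carry more fields, and object parsing is left out here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified object model; git's real structs differ. */
enum obj_type { OBJ_BLOB, OBJ_TREE, OBJ_COMMIT, OBJ_TAG };

struct object {
	enum obj_type type;
};

/* Each specific type embeds struct object as its first member. */
struct tree { struct object object; };
struct commit { struct object object; struct tree *tree; };
struct tag { struct object object; struct object *tagged; };

/* Follows tags and commits until a tree is found; returns NULL for
 * blobs and for missing targets. */
static struct tree *peel_to_tree(struct object *obj)
{
	while (obj) {
		switch (obj->type) {
		case OBJ_TREE:
			return (struct tree *)obj;
		case OBJ_COMMIT: {
			struct commit *c = (struct commit *)obj;
			/* Use the embedded field rather than a second cast. */
			obj = c->tree ? &c->tree->object : NULL;
			break;
		}
		case OBJ_TAG:
			obj = ((struct tag *)obj)->tagged;
			break;
		default:
			return NULL;
		}
	}
	return NULL;
}
```

Since a tag that points at another tag is handled by the same loop, no separate dereferencing helper is needed for that case, which is the point made above about deref_tag().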
Re: Couple of read-tree questions
On Wed, 31 Aug 2005, Junio C Hamano wrote:
> Daniel Barkalow <[EMAIL PROTECTED]> writes:
>
> > Is there any current use for read-tree with multiple trees without -m or equivalent?
>
> I did not know it even allowed multiple trees without -m, but you are right. It does not seem to complain.
>
> I have never thought about using multiple trees without -m, and I do not remember hearing any plan nor purpose of using it to do something interesting from Linus. I think its allowing multiple trees without -m is simply a bug.

I guess it was probably that its behavior was obvious and didn't require any extra code. It still follows entirely from one tree without -m, but it might be worth prohibiting unless someone has a reason to do it intentionally.

> > Why does --emu23 use I+H for stage 2, rather than just I? Wouldn't this just reintroduce removed files?
>
> They are not "removed files", at least in the original context.
>
> The original intention was that git was supposed to work without having _any_ files in the working tree. The reason why multi-tree read-tree has so many special cases that says "must match *if* work file exists", is that not having a corresponding working file was supposed to be equivalent to having the file checked out *and* unmodified.

But they'd be missing not only from the working tree but also from the (pre-read-tree) index, which should only happen, assuming the index came from "read-tree H", if they were subsequently removed from the index. I'd understand treating index entries for files missing from the working tree as up to date. (The thread you mention seems to say that we accept entries being missing from the index as if they were unchanged, but I don't see a good reason for this; you'd be dealing with the full set in the index for the merge, even if you don't have a populated working tree.)

> I do not think anybody currently uses --emu23. I did it because it has a potential of making the two-tree fast forward (which is used in "git checkout" to switch between branches) easier to manage when the working tree is dirty than doing straight two-tree merge, but that is just a theoretical potential never tested in the field. Frankly, I do not mind, and I do not think anybody else minds, too much if you need to break or remove emu23 if that would make your code clean-up and redoing read-tree easier.

I should have asked sooner, then. :) There's a bunch of clutter to get it to work that I can remove if it's not actually necessary.

-Daniel
*This .sig left intentionally blank*
Re: [RFC] Stgit - patch history / add extra parents
On Tue, 30 Aug 2005, Catalin Marinas wrote:
> Back from holiday. Thanks to all who replied to this thread.
>
> On Tue, 2005-08-23 at 14:05 -0400, Daniel Barkalow wrote:
> > Having a useful diff isn't really a requirement for a parent; the diff in the case of a merge is going to be the total of everything that happened elsewhere. The point is to be able to reach some commits between which there are interesting diffs.
> >
> > This also depends on how exactly freeze is used; if you use it before committing a modification to the patch without rebasing, you get:
> >
> >     old-top -> new-top
> >        ^         ^
> >         \       /
> >          bottom
> >
> > bottom to old-top is the old patch
> > bottom to new-top is the new patch
> > old-top to new-top is the change to the patch
> >
> > Then you want to keep new-top as a parent for rebasings until one of these is frozen. These links are not interesting to look at, but preserve the path to the old-top:new-top change, which is interesting.
>
> This was my initial StGIT implementation (up to version 0.3), only that there was no freeze command. Since I want an StGIT tree to be clean to the outside world, I wouldn't keep multiple parents for the visible top of a patch.
>
> As I understand from Junio's and Linus' e-mails (on the 23rd of August), there might be problems with merging the HEAD of an StGIT-managed tree if the above method is accessible via HEAD.

Right, you'd want a separate head which is what you ask people to merge; the rest is only visible to people who are working on preparing the patch. But you could keep both sets of stuff (sharing tree objects but not commits).

> > Ignoring the links to the corresponding bottoms, the development therefore looks like:
> >
> > local1 -> local2 -> merge -> local3 -> merge
> >   ^                   ^                  ^
> > mainline>-->->-->-->->
> >
> > And this is how development is normally supposed to look. The trick is to only include a minimal number of merges.
>
> A merge occurs every time a patch is rebased. Anyway, having the bottoms in the graph (which is the main idea of StGIT) together with the old-top (or frozen state) parents make the graph pretty complicated.

It should be possible to drop merges such that there's only one between any pair of local changes. That is, if you rebase at the end of the line above, it would get as parents local3 and the new bottom, not the last merge and the new bottom. The mainline changes only come in through the bottoms, so higher levels should look the same, but with the lower levels in the place of mainline.

-Daniel
*This .sig left intentionally blank*
Re: [PATCH 0/2] Reorganize read-tree
On Wed, 31 Aug 2005, Catalin Marinas wrote:
> Daniel Barkalow <[EMAIL PROTECTED]> wrote:
> > I got mostly done with this before Linus mentioned the possibility of having multiple index entries in the same stage for a single path. I finished it anyway, but I'm not sure that we won't want to know which of the common ancestors contributed which, and, if some of them don't have a path, we wouldn't be able to tell.
>
> I don't have time to look at the patch and I don't have a good knowledge of the GIT internals, so I will just ask. Does this patch changes the call convention for git-merge-one-file-script? I have my own script for StGIT and I would need to know whether it is affected or not.

Nope, it only changes the trivial merge calling convention within read-tree.c; I think it's plausible that we might like to add information at some point, but the short-term goal is just to prevent a few bad cases in trivial merges.
Re: [PATCH 0/2] Reorganize read-tree
On Tue, 30 Aug 2005, Junio C Hamano wrote:
> Dan, I really really *REALLY* wanted to try this out in "pu" branch and even was about to rig some torture chamber for testing before applying the patch, but you got the shiny blue bat X-<.

I'll send a replacement with the settings correct.

> A patch to SubmittingPatches, MUA specific help section for users of Pine 4.63 would be very much appreciated.

Ah, it looks like a recent version changed the default behavior to do the right thing, and inverted the sense of the configuration option. (Either that or Gentoo did it.) So you need to set the "no-strip-whitespace-before-send" option, unless the option you have is "strip-whitespace-before-send", in which case you should avoid checking it.

I don't actually have things set up for preparing patches from work, although I can resend the patches I prepared earlier.

-Daniel
*This .sig left intentionally blank*
[PATCH 1/2 (resend)] Object model additions for read-tree
Adds object_list_append() and a function to get the struct tree from an ent.

Signed-off-by: Daniel Barkalow <[EMAIL PROTECTED]>
---

 object.c |   11 +++++++++++
 object.h |    3 +++
 tree.c   |   19 +++++++++++++++++++
 tree.h   |    3 +++
 4 files changed, 36 insertions(+), 0 deletions(-)

49d33c385aa69d17c991300f73e77c6718a2b4a6
diff --git a/object.c b/object.c
--- a/object.c
+++ b/object.c
@@ -184,6 +184,17 @@ struct object_list *object_list_insert(s
 	return new_list;
 }
 
+void object_list_append(struct object *item,
+			struct object_list **list_p)
+{
+	while (*list_p) {
+		list_p = &((*list_p)->next);
+	}
+	*list_p = xmalloc(sizeof(struct object_list));
+	(*list_p)->next = NULL;
+	(*list_p)->item = item;
+}
+
 unsigned object_list_length(struct object_list *list)
 {
 	unsigned ret = 0;
diff --git a/object.h b/object.h
--- a/object.h
+++ b/object.h
@@ -41,6 +41,9 @@ void mark_reachable(struct object *obj, 
 struct object_list *object_list_insert(struct object *item,
 				       struct object_list **list_p);
 
+void object_list_append(struct object *item,
+			struct object_list **list_p);
+
 unsigned object_list_length(struct object_list *list);
 
 int object_list_contains(struct object_list *list, struct object *obj);
diff --git a/tree.c b/tree.c
--- a/tree.c
+++ b/tree.c
@@ -1,5 +1,7 @@
 #include "tree.h"
 #include "blob.h"
+#include "commit.h"
+#include "tag.h"
 #include "cache.h"
 #include
 
@@ -212,3 +214,20 @@ int parse_tree(struct tree *item)
 	free(buffer);
 	return ret;
 }
+
+struct tree *parse_tree_indirect(const unsigned char *sha1)
+{
+	struct object *obj = parse_object(sha1);
+	do {
+		if (!obj)
+			return NULL;
+		if (obj->type == tree_type)
+			return (struct tree *) obj;
+		else if (obj->type == commit_type)
+			return ((struct commit *) obj)->tree;
+		else if (obj->type == tag_type)
+			obj = ((struct tag *) obj)->tagged;
+		else
+			return NULL;
+	} while (1);
+}
diff --git a/tree.h b/tree.h
--- a/tree.h
+++ b/tree.h
@@ -32,4 +32,7 @@ int parse_tree_buffer(struct tree *item,
 
 int parse_tree(struct tree *tree);
 
+/* Parses and returns the tree in the given ent, chasing tags and commits. */
+struct tree *parse_tree_indirect(const unsigned char *sha1);
+
 #endif /* TREE_H */
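object_list_append() walks a pointer to the `next` link rather than a pointer to the node, so appending to an empty list needs no special case: `list_p` always points at the link that should receive the new node. A minimal self-contained sketch of the same idiom, assuming a hypothetical int_list type and plain malloc() in place of git's xmalloc():

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct object_list; the item is an int so
 * the example compiles on its own. */
struct int_list {
	int item;
	struct int_list *next;
};

/* Same pointer-to-pointer walk as object_list_append(). */
static void int_list_append(int item, struct int_list **list_p)
{
	assert(list_p != NULL);
	while (*list_p)
		list_p = &(*list_p)->next;	/* advance to the last link */
	*list_p = malloc(sizeof(**list_p));
	if (!*list_p) {
		perror("malloc");
		exit(1);
	}
	(*list_p)->next = NULL;
	(*list_p)->item = item;
}

static unsigned int_list_length(const struct int_list *list)
{
	unsigned n = 0;
	for (; list; list = list->next)
		n++;
	return n;
}
```

Each append walks the whole list, unlike object_list_insert(), which prepends in constant time; presumably that cost is acceptable here because the list of input trees is short and has to stay in the order the trees were given.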
[PATCH 2/2 (resend)] Change read-tree to merge before using the index.
Signed-off-by: Daniel Barkalow <[EMAIL PROTECTED]>
---

 read-tree.c |  522 ++-
 1 files changed, 297 insertions(+), 225 deletions(-)

d0f45ad81db2e133c49c23bd09c5615da344bb5c
diff --git a/read-tree.c b/read-tree.c
--- a/read-tree.c
+++ b/read-tree.c
@@ -5,28 +5,280 @@
  */
 #include "cache.h"
-static int stage = 0;
+#include "object.h"
+#include "tree.h"
+
+static int merge = 0;
+static int emu23 = 0;
 static int update = 0;
-static int unpack_tree(unsigned char *sha1)
+static struct object_list *trees = NULL;
+
+typedef int (*merge_fn_t)(struct cache_entry **src,
+			  struct cache_entry **dest,
+			  int df_conflicts_2,
+			  int df_conflicts_3);
+
+static int unpack_trees_rec(struct tree_entry_list **posns, int len,
+			    const char *base, merge_fn_t fn,
+			    int file2, int file3, int *indpos)
+{
+	int baselen = strlen(base);
+	int src_size = len + 1;
+	if (emu23)
+		src_size++;
+	if (src_size > 4)
+		src_size = 4;
+	do {
+		int i;
+		char *first = NULL;
+		int pathlen;
+		unsigned ce_size;
+		int dir2 = 0;
+		int dir3 = 0;
+		int subfile2 = file2;
+		int subfile3 = file3;
+		struct tree_entry_list **subposns = NULL;
+		struct cache_entry **src = NULL;
+		char *cache_name = NULL;
+
+		/* Find the first name in the input. */
+
+		/* Check the cache */
+		if (merge && *indpos < active_nr) {
+			/* This is a bit tricky: */
+			/* If the index has a subdirectory (with
+			 * contents) as the first name, it'll get a
+			 * filename like "foo/bar". But that's after
+			 * "foo", so the entry in trees will get
+			 * handled first, at which point we'll go into
+			 * "foo", and deal with "bar" from the index,
+			 * because the base will be "foo/". The only
+			 * way we can actually have "foo/bar" first of
+			 * all the things is if the trees don't
+			 * contain "foo" at all, in which case we'll
+			 * handle "foo/bar" without going into the
+			 * directory, but that's fine (and will return
+			 * an error anyway, with the added unknown
+			 * file case.
+			 */
+
+			cache_name = active_cache[*indpos]->name;
+			if (strlen(cache_name) > baselen &&
+			    !memcmp(cache_name, base, baselen)) {
+				cache_name += baselen;
+				first = cache_name;
+			} else {
+				cache_name = NULL;
+			}
+		}
+
+		for (i = 0; i < len; i++) {
+			if (!posns[i])
+				continue;
+			if (!first || strcmp(first, posns[i]->name) > 0)
+				first = posns[i]->name;
+		}
+		/* No name means we're done */
+		if (!first)
+			return 0;
+
+		pathlen = strlen(first);
+		ce_size = cache_entry_size(baselen + pathlen);
+
+		if (cache_name && !strcmp(cache_name, first)) {
+			src = xmalloc(sizeof(struct cache_entry *) *
+				      src_size);
+			memset(src, 0,
+			       sizeof(struct cache_entry *) *
+			       src_size);
+			src[0] = active_cache[*indpos];
+			remove_cache_entry_at(*indpos);
+			if (emu23) {
+				// we need this in stage 2 as well as stage 0
+				struct cache_entry *copy =
+					xmalloc(ce_size);
+				memcpy(copy, src[0], ce_size);
+				copy->ce_flags =
+					create_ce_flags(baselen + pathlen, 2);
+				if (dir2 || file2) {
+					die("cannot merge index and our head tree");
+				}
+				src[2] = copy;
+				subfile2 = 1;
[PATCH 0/2] Reorganize read-tree
I got mostly done with this before Linus mentioned the possibility of having multiple index entries in the same stage for a single path. I finished it anyway, but I'm not sure that we won't want to know which of the common ancestors contributed which, and, if some of them don't have a path, we wouldn't be able to tell.

The other advantages I see to this approach are:

 - it uses the more common parser of tree objects, moving toward having only one (diff-cache still uses read_tree(), however).
 - it doesn't need to do very complicated things with the index; the original read-tree does a bunch of stuff with an index with a gap in the middle containing obsolete entries.
 - it uses a much simpler method of finding directory/file conflicts, which is possible because the struct trees represent directories as well as files.
 - it deals with each path completely before going on to the next one, instead of first dealing with each input tree and then dealing with each path.
 - it removes a lot of intimate knowledge of the index structure from the program.

The general idea is that it figures out what trees you want, then iterates through the entry lists together, recursing into directories, and calls the merge function with an array of the index entries (not yet added) for the path in each tree; the merge function adds the appropriate things to the index.

Note that this set doesn't include calling merge functions with multiple ancestors or remotes; that can be done once we've decided whether my version of read-tree is worth using.

There are various potential refinements, plus removing a bunch of memory leaks, still to do, but I think this is close enough for review. (Refinements: it ought to have two indices in memory, the old and the new, and never modify the old and only append to the new, to simplify things further; it ought to use a sentinel value for the index entry to indicate that there is something in the tree to conflict with there being a file at the given path; the --emu23 logic could be clearer.)

The first patch adds a few functions to the object library. The second patch changes read-tree around; it is essentially a rewrite, except for the merge functions and main().

-Daniel
*This .sig left intentionally blank*
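The general idea above (walk several sorted entry lists together and hand the merge function one slot per input for each path) can be sketched as follows. This is an illustration under simplifying assumptions, not the patch's unpack_trees_rec(): the inputs are plain sorted arrays of names, there is no index input, there is no recursion into directories, and walk_lockstep(), path_fn, struct trace and record_path() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Called once per path; entries[i] is NULL when input i lacks the path. */
typedef int (*path_fn)(const char *path, const char **entries, int n, void *data);

/* Walk n sorted, NULL-terminated name arrays in lockstep (n <= 8). */
static int walk_lockstep(const char **lists[], int n, path_fn fn, void *data)
{
	size_t pos[8] = {0};
	const char *entries[8];
	assert(n > 0 && n <= 8);
	for (;;) {
		const char *first = NULL;
		int i;
		/* Pick the smallest current name across all inputs. */
		for (i = 0; i < n; i++) {
			const char *name = lists[i][pos[i]];
			if (name && (!first || strcmp(name, first) < 0))
				first = name;
		}
		if (!first)
			return 0;	/* every input is exhausted */
		/* Collect that path from each input and advance those inputs. */
		for (i = 0; i < n; i++) {
			const char *name = lists[i][pos[i]];
			if (name && !strcmp(name, first)) {
				entries[i] = name;
				pos[i]++;
			} else {
				entries[i] = NULL;
			}
		}
		int ret = fn(first, entries, n, data);
		if (ret)
			return ret;
	}
}

/* Example callback: appends "path:mask " where bit i means input i has it. */
struct trace { char buf[256]; };

static int record_path(const char *path, const char **entries, int n, void *data)
{
	struct trace *t = data;
	int mask = 0, i;
	for (i = 0; i < n; i++)
		if (entries[i])
			mask |= 1 << i;
	size_t len = strlen(t->buf);
	snprintf(t->buf + len, sizeof(t->buf) - len, "%s:%d ", path, mask);
	return 0;
}
```

With inputs {a, c}, {a, b} and {c, d}, the callback sees a, b, c and d once each, with NULL in the slots of inputs that lack the path. Choosing the smallest name uses the same strcmp() comparison that unpack_trees_rec() uses to pick `first`.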
Re: Comments in read-tree about #nALT
On Sat, 27 Aug 2005, Linus Torvalds wrote:
> On Sat, 27 Aug 2005, Daniel Barkalow wrote:
> >
> > What I missed was that the effect of causes_df_conflict is to give "no merge" for the entry, rather than giving an error overall. So I do need an equivalent.
>
> Daniel,
>  I'm not 100% sure what you're trying to do, but one thing that might work out is to just having multiple "stage 3" entries with the same pathname.
>
> We current use 4 stages:
>  - stage 0 is "resolved"
>  - stage 1 is "original"
>  - stage 2 is "one branch"
>  - stage 3 is "another branch"
>
> But if we allowed duplicate entries per stage, I think we could easily just fold stage 2/3 into one stage, and just have entries in stage 2. That would immediately mean that a three-way merge could be way.
>
> The only rule would be that when you add a entry to stage 2, you must always add it after any previous entry that is already in stage 2. That should be easy.

It looks like stage 2 is currently special as the stage that's similar to the index/HEAD/working tree. However, I don't see any problem with entries in stage 3, except that, if you have a non-maximal number of them for some reason, it'll be impossible to determine which came from which tree.

> In fact, this extension might even allow us to solve the "multiple merge base" problem: we could allow multiple entries in "stage 1" too, ie one entry per merge base (and just collapse identical entries - there's no ordering involved in stage 1 entries).

That's actually the problem I was working on.

> So you could merge "n" trees with "m" bases, and all without really changing the current logic much at all.
>
> Maybe I'm missing something (like what you're trying to do in the first place), but this _seems_ doable.

I'd be afraid of confusing everything by removing the uniqueness invariant, although I guess not much does anything with entries in stages other than 0. I probably just find the index less intuitive than you do, and less intuitive than the struct tree representation.

I'm working on arranging the code to look at each path in sequence, with the input trees as the inner loop, rather than with the loops in the other order; using parse_tree to parse the objects instead of read_tree; and doing trivial merges before putting things in the cache, rather than after. I'd been thinking that this would avoid a limit on the number of stages, since I hadn't considered whether multiple entries for the same path and stage could be allowed. I still think that my order is likely to be easier to understand and to make read-tree rely less on tricky properties of the data structures, but I'll have to get it done before I can say that for sure.

-Daniel
*This .sig left intentionally blank*
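Linus's rule (a new stage-2 entry always goes after any existing stage-2 entry for the same path) amounts to inserting at the upper bound of a list ordered by name and then by stage. The sketch below uses a hypothetical struct entry and insert_entry(), not git's cache_entry or add_cache_entry(). Its `tree` field records the input tree explicitly, which is the information Daniel points out would otherwise be lost when some trees lack the path.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical index entry; `tree` is illustrative only. */
struct entry {
	const char *name;
	int stage;
	int tree;	/* which input tree this entry came from */
};

/* Order by path name, then by stage number. */
static int entry_cmp(const char *name, int stage, const struct entry *e)
{
	int c = strcmp(name, e->name);
	if (c)
		return c;
	return stage - e->stage;
}

/* Insert e into the sorted array arr[0..nr), returning the new count.
 * Entries equal in (name, stage) keep insertion order because e goes
 * after all of them (binary search for the upper bound). */
static int insert_entry(struct entry *arr, int nr, int alloc, struct entry e)
{
	int lo = 0, hi = nr;
	assert(nr < alloc);
	while (lo < hi) {
		int mid = lo + (hi - lo) / 2;
		if (entry_cmp(e.name, e.stage, &arr[mid]) < 0)
			hi = mid;
		else
			lo = mid + 1;
	}
	memmove(arr + lo + 1, arr + lo, (nr - lo) * sizeof(*arr));
	arr[lo] = e;
	return nr + 1;
}
```

Without something like the `tree` field, two stage-2 entries for one path only say "some branch had this", so a merge of three trees where one lacks the path cannot tell which branch each entry belongs to.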
Re: Comments in read-tree about #nALT
On Sat, 27 Aug 2005, Daniel Barkalow wrote:
> Okay, so it looks to me like the only cases that care about the contents of the index, other than in stage 0 (which is effectively another tree, but already in index-form rather than tree-form), are 2 and 3, and these only care because read-tree modifies the stage of entries, rather than removing them and adding a stage-0 replacement entry; if it went through the add logic without SKIP_DFCHECK, that would reject the same things that causes_df_conflict rejects (at the point where whichever is second is done).
>
> So if I do the merge on tree entries (plus a stage-0 ce for the input from the index), and then add the result(s) to the cache, I can skip causes_df_conflict() in favor of just not using SKIP_DFCHECK. Is this right?

What I missed was that the effect of causes_df_conflict is to give "no merge" for the entry, rather than giving an error overall. So I do need an equivalent.

> Also, there doesn't actually seem to be a DF test in t1000; I think the t1005 DF test covers these cases (by the emu23 path into this code). Is this right?

Looks like stuff all over the place fails if causes_df_conflict is messed up, so I should be covered.

-Daniel
*This .sig left intentionally blank*
Re: Comments in read-tree about #nALT
On Sat, 27 Aug 2005, Junio C Hamano wrote:
> Daniel Barkalow <[EMAIL PROTECTED]> writes:
>
> > Part of threeway_merge, however, wants to search the rest of the cache for interfering entries in some cases, which would have to happen differently, because I won't have the cache completely filled out beforehand. I'm trying to figure out what the comments are talking about, and they seem to refer to a list of the possible cases. Is that list somewhere convenient?
>
> Please look for END_OF_CASE_TABLE in t/t1000-read-tree-m-3way.sh; the table talks about some of the (ALT) not implemented, but some of them are ("git whatchanged t/t1000-read-tree-m-3way" would tell you which).

It looks like all of them are implemented: #2ALT, #3ALT, #5ALT, and #14ALT, according to the commit comments, and the others seem from the email you quote to have been done in the process of getting #5ALT.

> Two way cases are described in Documentation/git-read-tree.txt, if you care. If you were not touching the three-way case right now, I'd move/copy the three way cases there as well, but that can wait until after your changes.

I'd actually like to introduce Documentation/technical/trivial-merge for this stuff; I think it would be good to have documentation for people who need to know how the stuff works, rather than just how to use it, so we get a balance between reams of information that users don't want to wade through and being too vague for future developers.

Okay, so it looks to me like the only cases that care about the contents of the index, other than in stage 0 (which is effectively another tree, but already in index-form rather than tree-form), are 2 and 3, and these only care because read-tree modifies the stage of entries, rather than removing them and adding a stage-0 replacement entry; if it went through the add logic without SKIP_DFCHECK, that would reject the same things that causes_df_conflict rejects (at the point where whichever is second is done).

So if I do the merge on tree entries (plus a stage-0 ce for the input from the index), and then add the result(s) to the cache, I can skip causes_df_conflict() in favor of just not using SKIP_DFCHECK. Is this right?

Also, there doesn't actually seem to be a DF test in t1000; I think the t1005 DF test covers these cases (by the emu23 path into this code). Is this right?

-Daniel
*This .sig left intentionally blank*
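For reference, the directory/file rule discussed here, which causes_df_conflict() and the add-time check (when SKIP_DFCHECK is not passed) are described as enforcing, can be stated simply: a path conflicts if an existing entry names one of its leading directories, or if it names a leading directory of an existing entry. The following naive sketch of that rule is not git's implementation; df_conflict() and is_leading_dir() are hypothetical helpers, and the linear scan ignores the fact that the real index is sorted and could be searched more cleverly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Is `dir` a leading directory of `path` (e.g. "a/b" for "a/b/c")? */
static int is_leading_dir(const char *dir, const char *path)
{
	size_t len = strlen(dir);
	return strncmp(dir, path, len) == 0 && path[len] == '/';
}

/* Returns 1 if adding `path` as a file would clash with an existing
 * entry: a file sits where one of its leading directories should be,
 * or `path` itself would have to be a directory of an existing entry. */
static int df_conflict(const char *const *existing, size_t nr, const char *path)
{
	size_t i;
	assert(path != NULL);
	for (i = 0; i < nr; i++) {
		if (is_leading_dir(existing[i], path) ||
		    is_leading_dir(path, existing[i]))
			return 1;
	}
	return 0;
}
```

Note that "foobar" does not clash with "foo/bar": the character after the shared prefix must be a slash, which is why a plain prefix comparison is not enough.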
Re: Merges without bases
On Sat, 27 Aug 2005, Martin Langhoff wrote:
> On 8/27/05, Daniel Barkalow <[EMAIL PROTECTED]> wrote:
> > The problem with both of these (and doing it in the build system) is that, when a project includes another project, you generally don't want whatever revision of the included project happens to be the latest; you want the revision of the included project that the revision of the including project you're looking at matches. That is, if App includes Lib, and
>
> Exactly - so you do it on a tag, or a commit date with cvs. With Arch, GIT and others that have a stable id for each commit, you can use that or the more user-friendly tags.

I'm thinking of cases like openssl, openssh, and libcrypto. Openssl and openssh both use libcrypto but not each other (looking at the ldd output, rather than packaging). However, it would be too much of a pain to work directly on libcrypto without working through some other package, because the library doesn't have its own applications. Furthermore, if you're doing much to libcrypto, you're likely doing it in the context of a particular application (say, for example, ssh needs a new cipher that isn't supported for SSL at the time). You'd want to make simultaneous changes to libcrypto to implement the new feature and to openssh to use it; neither can be validated until the other is written, which means that you'll have both projects checked out and dirty (in the cache sense) at the same time, and be building the using project.

It would also be good to be able to check in this whole thing through the version control system, rather than partially through a change to the build system. That is, if I change the included libcrypto, commit it, and commit the including openssh, the system as a whole should understand that I want to change which commit of libcrypto gets used. Similarly, it would be good to merge changes into the libcrypto used by openssh with the same procedure used to merge changes to openssh itself, including supporting non-fast-forward when there's a local version in use.

(Of course, currently, libcrypto is strictly part of openssl, because it would be too much of a pain with the present version control to make it independent, and openssh depends on openssl, despite not even linking against -lssl, because openssl got libcrypto first.)

> The good thing here is that a makefile will know how to handle the situation if the external lib is hosted in Arch, in SVN, or Visual SourceSafe. If your external lib is only available as a tarball in a url, you can fetch that and uncompress it too. Arch configurations are _cute_ but useless in any but the most narrow cases.

Certainly, if it's sufficiently external to be in a different SCM it should be handled by the build system. Actually, if it's even nearly that external, it's probably going to be handled best by requiring people to go get it themselves.

I find it odd that you say that the standard approach is to have the build system fetch a version of the included package; my experience is that projects either just report (or fail to report) a dependency on having the other package, or they copy the project into their project. The former means they can't change it (which is generally good, unless it becomes necessary), while the latter causes update problems (c.f. zlib).

I think that Arch configurations and the CVS equivalent are, in fact, useless, but that this is only due to the implementation being insufficiently clever, not due to the concept being inherently bad; I feel the same way about distributed development under Arch, which is really nice under git, so I have hope that something better could be done.

-Daniel
*This .sig left intentionally blank*
Comments in read-tree about #nALT
I've gotten to the point of having all of the entries for a given path ready to put into the cache at the same time, and now I want to convert the merge functions to take their data directly, rather than in the cache, so that they can take extra entries for extra ancestors.

Part of threeway_merge, however, wants to search the rest of the cache for interfering entries in some cases, which would have to happen differently, because I won't have the cache completely filled out beforehand. I'm trying to figure out what the comments are talking about, and they seem to refer to a list of the possible cases. Is that list somewhere convenient?

-Daniel
*This .sig left intentionally blank*
Re: [RFC, PATCH] A new merge algorithm (EXPERIMENTAL)
On Fri, 26 Aug 2005, Fredrik Kuivinen wrote: > On Fri, Aug 26, 2005 at 04:48:32PM -0400, Daniel Barkalow wrote: > > On Fri, 26 Aug 2005, Fredrik Kuivinen wrote: > > > > > I will try to describe how the algorithm works. The problem with the > > > usual 3-way merge algorithm is that we sometimes do not have a unique > > > common ancestor. In [1] B and C seems to be equally good. What this > > > algorithm does is to _merge_ the common ancestors, in this case B and > > > C, into a temporary tree lets call it T. It does then use this > > > temporary tree T as the common ancestor for D and E to produce the > > > final merge result. In the case described in [1] this will work out > > > fine and we get a clean merge with the expected result. > > > > The only problem I can see with this is that it's likely to generate > > conflicts between the shared heads, and the user is going to be confused > > trying to resolve them, because the files with the conflicts will be > > missing all of the more recent changes. > > I don't actually think that conflicts between shared heads is a > problem. Given the criss-cross case (we want to merge A and B into M): > > M > |\ > | \ > A B > |\/| > |/\| > C D > | / > |/ > E > > Lets assume there is a merge conflict if we try to merge C and D > (which are the two shared heads). Then both A and B must resolve this > conflict. If they have done it in the same way we wont get a merge > conflict at M, if they have resolved it differently we will get a > merge conflict. In the first case there is no merge conflict at M, in > the second case the user has to pick which one of the two different > resolutions she wants. > > Note that the algorithm will happily write non-clean merge results to > the object database during the "merge shared heads" stage. Hence, when > we are merging C and D "internally" we will _not_ ask the user to > resolve any eventual merge conflicts. Oh, okay, didn't see that part. 
So the merge for M sees that the old conflict is replaced entirely with the common resolution or with a conflict between the different resolutions, but it doesn't report the old conflict anyway, because that section's been replaced on both sides. > > Other than that, I think it should > > give the right answer, although it will presumably involve a lot of > > ancient history doing the internal merge. (Which would probably be really > > painful if you've got two branches that cross-merge regularly and never > > actually completely sync) > > The expensive part is the repeated merging. But as I wrote in my mail > multiple shared heads seems to be pretty uncommon. As far as I can > tell there is no reason for the number of shared heads to increase as > a repository grows larger. However, this do probably depend on usage > patterns. I'd guess that the number of shared heads will increase as people's usage gets more flexible. If people expected good results, I could see the stable series being mostly done as patches to 2.6.X, which would then be merged into various trees, and these would then be frequent common ancestors in merges. I'd also not be surprised if Linus's tree were abnormally straightforward, due to stuff getting serialized in -mm. > > I'm getting pretty close to having a version of read-tree that does the > > trivial merge portion based on comparing the sides against all of the shared > > heads. I think yours will be better for the cases we've identified, giving > > the correct answer for Tony's case rather than reporting a conflict, but > > it's clearly more complicated. I think my changes are worthwhile anyway, > > since they make the merging logic more central, but obviously > > insufficient. > > > > I've been thinking that it could be useful to have read-tree figure out the > > history itself, instead of being passed ancestors, in which case it could > > use your method, except more efficiently (and only look further at the > > history when needed).
> > It will be interesting to have a look at the code when you are done. > I find the Git architecture with respect to merging to be quite > nice. A core which handles the simple cases _fast_ and let the more > complicated cases be handled by someone else. Right; I'm mostly just trying to get the fast path to not miss cases that are more complicated than they look. -Daniel *This .sig left intentionally blank* - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC, PATCH] A new merge algorithm (EXPERIMENTAL)
On Fri, 26 Aug 2005, Fredrik Kuivinen wrote: > I will try to describe how the algorithm works. The problem with the > usual 3-way merge algorithm is that we sometimes do not have a unique > common ancestor. In [1] B and C seems to be equally good. What this > algorithm does is to _merge_ the common ancestors, in this case B and > C, into a temporary tree lets call it T. It does then use this > temporary tree T as the common ancestor for D and E to produce the > final merge result. In the case described in [1] this will work out > fine and we get a clean merge with the expected result. The only problem I can see with this is that it's likely to generate conflicts between the shared heads, and the user is going to be confused trying to resolve them, because the files with the conflicts will be missing all of the more recent changes. Other than that, I think it should give the right answer, although it will presumably involve a lot of ancient history doing the internal merge. (Which would probably be really painful if you've got two branches that cross-merge regularly and never actually completely sync.) I'm getting pretty close to having a version of read-tree that does the trivial merge portion based on comparing the sides against all of the shared heads. I think yours will be better for the cases we've identified, giving the correct answer for Tony's case rather than reporting a conflict, but it's clearly more complicated. I think my changes are worthwhile anyway, since they make the merging logic more central, but obviously insufficient. I've been thinking that it could be useful to have read-tree figure out the history itself, instead of being passed ancestors, in which case it could use your method, except more efficiently (and only look further at the history when needed).
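A toy Python model of the approach Fredrik describes: when two heads have several merge bases, merge the bases into a temporary tree T (recording conflicts instead of asking the user), then use T as the base for the real merge. The `Commit` class, the `CONFLICT` marker and the function names are hypothetical, and the per-file rule is only the trivial 3-way rule, not git's real content merge.

```python
CONFLICT = "<<<conflict>>>"  # placeholder recorded for an unresolved path


class Commit:
    """Toy commit: a name, a tree (path -> contents) and parent commits."""

    def __init__(self, name, tree, parents=()):
        self.name = name
        self.tree = dict(tree)
        self.parents = list(parents)


def ancestors(commit):
    """Every commit reachable from `commit`, itself included."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(c.parents)
    return seen


def merge_bases(a, b):
    """Common ancestors of a and b that are not ancestors of another one."""
    common = ancestors(a) & ancestors(b)
    best = [c for c in common
            if not any(d is not c and c in ancestors(d) for d in common)]
    return sorted(best, key=lambda c: c.name)


def merge_file(base, ours, theirs):
    """Trivial 3-way rule for one path; None means the path is absent."""
    if ours == theirs:
        return ours
    if base == ours:
        return theirs
    if base == theirs:
        return ours
    return CONFLICT


def merge_trees(base, ours, theirs):
    result = {}
    for path in set(base) | set(ours) | set(theirs):
        merged = merge_file(base.get(path), ours.get(path), theirs.get(path))
        if merged is not None:
            result[path] = merged
    return result


def recursive_merge(a, b):
    """Merge a and b, first folding multiple bases into a virtual ancestor T."""
    bases = merge_bases(a, b)
    if not bases:
        return merge_trees({}, a.tree, b.tree)
    virtual = bases[0]
    for other in bases[1:]:
        # Conflicts between the bases are written into T, not shown to the user.
        virtual = Commit("T", recursive_merge(virtual, other), [virtual, other])
    return merge_trees(virtual.tree, a.tree, b.tree)
```

With the criss-cross history above, if A and B resolved the C/D conflict the same way, merging them is clean because both sides differ from T identically; if they resolved it differently, the conflict reappears at M, as Fredrik describes.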
-Daniel *This .sig left intentionally blank*
Re: Merges without bases
On Fri, 26 Aug 2005, Martin Langhoff wrote: > On 8/26/05, Junio C Hamano <[EMAIL PROTECTED]> wrote: > > their core GIT tools come from. But how would _I_ pull from > > that "My Project", if I did not want to pull unrelated stuff in? > > and then... > > > What I think _might_ deserve a bit more support would be a merge > > of a foreign project as a subdirectory of a project. Linus > > tla has an interesting implementation (and horrible name) for > something like this. In Arch-speak, they are called 'configurations', > a versioned control file that describes that in subdirectory foo we > import from this other repo#branch. > > In cvs, you just do nested checkouts, and trust a `cvs update` done at > the top will do the right thing; and in fact recent cvs versions do. The problem with both of these (and doing it in the build system) is that, when a project includes another project, you generally don't want whatever revision of the included project happens to be the latest; you want the revision of the included project that matches the revision of the including project you're looking at. That is, if App includes Lib, and you're looking at an App commit, you want to have the version of Lib that the commit was made with, not the latest version of Lib, which may not be backwards compatible across non-release commits, or, in any case, won't help in reconstructing an earlier state. I think a primary function of an SCM is to be able to say, "It worked last Friday, and it's broken now. What's different?" If the answer is, "On Saturday, we updated the included Lib to their version from Thursday, which is broken", it'll be really hard to track down without special tracking. I think it's the lack of the special tracking, therefore, that makes this not a good feature in most SCMs, and makes them no better than having the build system do it (and potentially worse, if you've got your build system checking out a version specified in a version-controlled file).
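A minimal sketch of the pinning idea, assuming a hypothetical App commit that records the id of the exact Lib commit it was made with (the `lib_pin` field and the `what_changed` helper are invented for illustration). Once the pin is part of the App commit, "what's different since Friday?" reports a Lib update like any other change.

```python
from dataclasses import dataclass


@dataclass
class AppCommit:
    """Hypothetical App commit that pins the Lib revision it was made with."""
    name: str
    files: dict   # path -> contents of App's own files
    lib_pin: str  # id of the Lib commit this App commit expects


def what_changed(old, new):
    """List differences between two App commits, including the Lib pin."""
    changes = {}
    for path in sorted(set(old.files) | set(new.files)):
        before, after = old.files.get(path), new.files.get(path)
        if before != after:
            changes[path] = (before, after)
    if old.lib_pin != new.lib_pin:
        changes["<Lib>"] = (old.lib_pin, new.lib_pin)
    return changes
```

In the Friday/Saturday story, App's own files are unchanged and the only difference reported is the Lib pin moving to Thursday's version, which is exactly the clue that is hard to find without such tracking.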
But I think that git can do better, including support for the required version sometimes being a locally modified one and sometimes being the official one when the local modifications have been accepted upstream. -Daniel *This .sig left intentionally blank*
Re: [RFC] Looking at multiple ancestors in merge
On Fri, 26 Aug 2005, Junio C Hamano wrote: > Daniel Barkalow <[EMAIL PROTECTED]> writes: > > > I've started this, and have gotten as far as having read-tree accept > 3 > > trees and ignore everything but the last 3. Am I correct in assuming that > > if I break read-tree in any way, some test will fail? > > If some test fails you would know you broke it, but the inverse > is probably not always true. > > I think the current read-tree test suite has reasonably wide > coverage of all the interesting cases. But the definition of > "interesting" was derived from the current world order (IOW, the > test suite was designed around the way we do things right now as > a whitebox test, not a blackbox test). I would not be surprised > if some of them did not catch breakage you may introduce during > the development. Okay; I think the only thing that I'm going to change with respect to how it makes decisions will be with 4+ trees, and those will obviously need new tests. > I wonder however if extending the current way of doing things in > the cache is the right thing. Right now we use two bits out of > the top four bits for recording stage, one bit for the update > bit, so you have only one extra bit to extend the number of > stages, which means you could hold at most 7 trees at once. > > You "ignore things but the last 3", so this may not be too much > of a problem, but I am a bit puzzled what you meant by it > though. Are you talking about reading more than 3 trees and > keeping only the 3 to be merged, discarding the rest, doing the > selection per path? For each path, I intend to look at all the entries and make trivial merge judgements on them, but then only leave the usual stage 2 and stage 3, and a chosen stage 1. The way I'm writing the changes is: In the argument parsing loop, just form a list of the tree objects, and actually read them after the whole list is ready. If there are more than 3, ignore all but the last 3.
This lets you give an arbitrary number of common ancestors to read-tree, and it won't mess up, but it will only use one of them. I've done this. Next, scan through the tree entry lists for all the trees together, and generate cache entries for the same path in the different trees at the same time. I've written this, but I've got a few bugs, and the 3-way merge tests are dutifully failing. Then, I'll do the trivial merge on tree entries rather than cache entries. Finally, I'll extend the trivial merge to use the extra ancestors. Since merge(1) doesn't handle multiple common ancestors, having more than 3 stages in the cache after the trivial merge isn't going to be useful for now. -Daniel *This .sig left intentionally blank*
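The "scan through the tree entry lists for all the trees together" step can be sketched as a lockstep walk over several path-sorted entry lists. This is a Python illustration, not read-tree's C code; it assumes each tree is already flattened into `(path, blob_id)` pairs sorted by plain string order, which glosses over git's actual ordering rules for directory entries.

```python
def walk_trees(*trees):
    """Yield (path, [blob or None for each tree]) for every path in any tree.

    Each tree is a list of (path, blob_id) pairs sorted by path. All lists
    advance together, so each path is visited once with every tree's entry.
    """
    positions = [0] * len(trees)
    while True:
        # Smallest path among the entries not yet consumed.
        heads = [t[i][0] for t, i in zip(trees, positions) if i < len(t)]
        if not heads:
            return
        path = min(heads)
        row = []
        for k, tree in enumerate(trees):
            i = positions[k]
            if i < len(tree) and tree[i][0] == path:
                row.append(tree[i][1])
                positions[k] += 1
            else:
                row.append(None)  # path absent from this tree
        yield path, row
```

Because each row carries every tree's entry for one path, the trivial merge can look at any number of ancestors for that path before deciding what to leave in stages 1 to 3.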
Re: [RFC] Looking at multiple ancestors in merge
On Wed, 24 Aug 2005, Daniel Barkalow wrote: > Of course, this is going to take a bit of work, because read-tree > currently puts all of its arguments into the cache and then works on > merging, and taking multiple ancestors requires putting them somewhere > else, because they won't fit in the cache. I've started this, and have gotten as far as having read-tree accept more than 3 trees and ignore everything but the last 3. Am I correct in assuming that if I break read-tree in any way, some test will fail? -Daniel *This .sig left intentionally blank*
Re: Storing state in $GIT_DIR
On Thu, 25 Aug 2005, Junio C Hamano wrote: > Now, among the existing object types, there are only two kinds > of objects you can use for this. If the only thing you need to > record is some textual information with one pointer to git > branch head, then you can use tag that points at the git head, > and store everything else as the tag comment. This is doable > but unwieldy. I don't think this buys you anything, because then the tag needs to be accessible from something, which is the same problem you were trying to solve for the commit. > You could abuse a commit object as well; you store commit > objects (such as the corresponding git branch head) as parent > commits, and put everything else in a tree that is associated > with that commit. If you want to go that way, you could add a new field to commits with minimal effort: you just need to parse it in commit.c, generate it in git-commit-tree (with an option), and pull it in pull.c, and everything should work as far as making the git portion follow the metadata around. -Daniel *This .sig left intentionally blank*
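A sketch of why an extra commit field is cheap, assuming the commit layout of header lines ("tree", "parent", "author", "committer"), a blank line, then the message. The `metadata` header is hypothetical; the point is that a parser that collects headers generically, plus a puller that follows the new field, is enough to carry the extra pointer along.

```python
def parse_commit(raw):
    """Split a raw commit into ({header: [values]}, message).

    Assumes one "key value" pair per header line (no multi-line headers).
    Repeated headers such as "parent" keep their order in the list.
    """
    header_text, _, message = raw.partition("\n\n")
    headers = {}
    for line in header_text.split("\n"):
        key, _, value = line.partition(" ")
        headers.setdefault(key, []).append(value)
    return headers, message


def objects_to_fetch(headers, extra_fields=("metadata",)):
    """Ids a puller must follow: tree, parents, plus any extra pointer fields.

    "metadata" is a hypothetical new field, not a real git header.
    """
    ids = headers.get("tree", []) + headers.get("parent", [])
    for field in extra_fields:
        ids += headers.get(field, [])
    return ids
```

Code that only reads the headers it knows about is unaffected by the new field; only the puller has to be taught to follow it.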
Re: Merges without bases
On Thu, 25 Aug 2005, Junio C Hamano wrote: > One thing that makes me reluctant to recommend this "merging > unrelated projects" business is that I suspect that it makes > things _much_ harder for the upstream project that is being > merged, and should not be done without prior arrangement; Linus > merged gitk after talking with paulus, so that was OK. I'd still like to revive my idea of having projects overlaid on each other, where the commits in the project that absorbed the other project say, essentially, "also include this other commit, but any changes to those files belong to that branch, not this one". That way, Linus could have included gitk in git, but changes to it, even when done in a git working tree, would show up in commits that only include gitk. (git actually can handle this with the alternative index file mechanism that Linus mentioned in a different thread.) Definitely post-1.0, of course. > Suppose the above "My Project" is published, people send patches > for core GIT part to it, and you as the maintainer of that "My > Project" accept those patches. The users of "My Project" would > be happy with the new features and wouldn't care less where > their core GIT tools come from. But how would _I_ pull from > that "My Project", if I did not want to pull unrelated stuff in? With the right info, the tools could be made to automatically generate suitable commits, because those files would be tracked by a separate index file and committed into a separate branch, which would then be reincluded (by reference) in the containing project. -Daniel *This .sig left intentionally blank*
Re: [RFC] Stgit - patch history / add extra parents
On Thu, 25 Aug 2005, Jan Veldeman wrote: > Daniel Barkalow wrote: > > > I'm not sure how applicable to this situation stgit really is; I see stgit > > as optimized for the case of a patch set which is basically done, where > > you want to keep it applicable to the mainline as the mainline advances. > > Maybe I forgot to mention this: I would also like to have my development > tree split up in a patch stack. The separate patches makes tracking the > mainline a lot easier (conflicts are a lot easier to solve) I just try to keep things in this state sufficiently briefly that it doesn't become a problem. I also split things up into a bunch of branches, rather than into a stack of patches, and only work on parallel development before I've actually got a candidate for a series. > But this would assume that once the patch goes into stgit, it won't > change except when the parent gets updated. I think we will still change > the patches quite a bit and simultanious by a couple of people. The extension I had proposed to stgit should work for this; it would let you version control each patch just like other git projects. I just think it wouldn't work so well before the group has agreed on what patches there are. -Daniel *This .sig left intentionally blank*
Re: [RFC] Looking at multiple ancestors in merge
On Wed, 24 Aug 2005, A Large Angry SCM wrote: > Daniel Barkalow wrote: > > I'm starting to work on letting the merging process see multiple > > ancestors, and I think it's messy enough that I should actually discuss > > it. > > > > Review of the issue: > > > > It is possible to lose reverts in cases when merging two commits with > > multiple ancestors, in the following pattern: (letters representing blobs > > at some filename, children to the right)
> >
> > a-b-b-a-?
> >  \ X   /
> >   a-b-b
> >
> [Lots of stuff deleted] > > There seems to be a lot of effort being put into auto-magically choosing > the "right" merge in the presence of multiple possible merge bases. > Unfortunately, most (all?) of the proposals are attempting to divine > intent, and so, are guaranteed to be 100% wrong at least some of the time. > > Wouldn't it be better, instead, to detect that current merge being > attempted is ambiguous and require the user to specify the correct merge > base? The alternative is a tool that appears to work all of the time but > does the wrong thing some of the time. My proposal is actually to detect when a merge is ambiguous. In order to determine that, however, you have to evaluate multiple potential outcomes and see if they are actually different. I'm working on an efficient way to do that. Then further work could look into eliminating possibilities when information about the history excludes them. There were two issues in the case that Tony hit: it ignored a potential correct outcome for the merge, and it didn't ignore an outcome which could be demonstrated to be incorrect. The priority is to resolve the first, but things which improve the second or help with solutions to the second are worth understanding. -Daniel *This .sig left intentionally blank*
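One way to read "evaluate multiple potential outcomes and see if they are actually different": run the trivial 3-way rule once per candidate base and compare the results. A minimal Python sketch for a single path; blobs are plain strings, deletions are not modeled, `None` stands for "no trivial merge", and the function names are hypothetical.

```python
def trivial_merge(base, ours, theirs):
    """Classic trivial 3-way rule for one path; None means "no trivial merge"."""
    if ours == theirs:
        return ours
    if base == ours:
        return theirs
    if base == theirs:
        return ours
    return None


def classify(bases, ours, theirs):
    """Return ("clean", blob), ("ambiguous", outcomes) or ("conflict", None).

    Bases that give no trivial merge are ignored; the merge is only
    ambiguous when bases that do succeed disagree with each other.
    """
    outcomes = {trivial_merge(b, ours, theirs) for b in bases} - {None}
    if len(outcomes) == 1:
        return "clean", outcomes.pop()
    if outcomes:
        return "ambiguous", sorted(outcomes)
    return "conflict", None
```

On the diagram quoted above, the two bases ('b' on top, 'a' below) give different answers for the final merge, so this check reports the merge as ambiguous instead of silently dropping the revert.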
[RFC] Looking at multiple ancestors in merge
I'm starting to work on letting the merging process see multiple ancestors, and I think it's messy enough that I should actually discuss it. Review of the issue: It is possible to lose reverts when merging two commits with multiple ancestors, in the following pattern (letters represent blobs at some filename, children to the right):

a-b-b-a-?
 \ X   /
  a-b-b

You form a branch with unrelated changes, apply a patch in the top line, separately merge both ways, do unrelated development in the bottom line, and revert the patch in the top line. Then you're trying to merge the two lines. There are two candidates for the common ancestor, the two in the second column. If you pick the top one, you get the revert; if you pick the bottom one, you don't. This is a bug, because it ignores the 'a' version due to it being "unchanged", but it actually did change and changed back. Note that the revert will also be ignored if there is no "X" in the middle of that diagram and the a->b change on the bottom comes from independently applying the same patch. Users are more likely to expect this, however, than the situation above, where the side that is causing the patch to be included never applied it explicitly at all; it just merged at an unfortunate moment. My theory is that we should handle merges by passing all of the ancestors to read-tree, and having it use the following additions to the rules for trivial merges:

- If any of the ancestors matches a side, don't use that side.
- If you eliminate both sides, don't do the trivial merge.

(The first of these also means that it'll pick the best combination of ancestors for maximizing trivial merges, as a nice side effect; the second means that it'll avoid messing up reverts when it has a chance of understanding them.) If it doesn't do the trivial merge, it just puts the blob from the first listed ancestor in stage 1, rather than trying anything fancy.
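The two proposed rules, plus the stage-1 fallback, can be sketched per path as follows. This is an illustration in Python under simplifying assumptions (blobs as strings, no deletions, at least one ancestor given, hypothetical function name), not the read-tree implementation.

```python
def merge_with_ancestors(ancestors, ours, theirs):
    """Apply the proposed multi-ancestor trivial-merge rules to one path.

    Returns ("merged", blob) or ("stages", {1: ..., 2: ours, 3: theirs}).
    `ancestors` must be non-empty; its first entry is used for stage 1.
    """
    if ours == theirs:
        return "merged", ours
    # Rule 1: a side that matches any ancestor counts as unchanged,
    # so it is not used and the other side wins.
    ours_unchanged = ours in ancestors
    theirs_unchanged = theirs in ancestors
    if theirs_unchanged and not ours_unchanged:
        return "merged", ours
    if ours_unchanged and not theirs_unchanged:
        return "merged", theirs
    # Rule 2 (both sides eliminated) or a real two-sided change:
    # no trivial merge; stage 1 gets the first listed ancestor.
    return "stages", {1: ancestors[0], 2: ours, 3: theirs}
```

For the a-b-b-a-? example, both sides match one of the ancestors, so no trivial merge happens and the path is left in stages for the real merge, rather than the revert being silently lost.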
(As a further improvement, we could actually look through the history for reasons to disregard a similarity, which would determine that there isn't a continuous line of similarity from the recent 'a' to the common ancestor 'a', and therefore that it should be retained; but I'll be satisfied for now with having it just not do the incorrect trivial merge.) Of course, this is going to take a bit of work, because read-tree currently puts all of its arguments into the cache and then works on merging, and taking multiple ancestors requires putting them somewhere else, because they won't fit in the cache. -Daniel *This .sig left intentionally blank*
Re: [RFC] undo and redo
On Wed, 24 Aug 2005, Carl Baldwin wrote: > This is interesting. Can a ref be to a tree rather than a commit? And > it still works? I guess it would. I hadn't thought about that. Generally, each subdirectory of refs/ has refs to objects of the same type, and heads/ is commits, but other directories are other things. tags/ is all tag objects, and you could have undo/ be trees. > Will prune preserve any tree mentioned in any file in refs? How does > this work exactly? It keeps any object reachable from an object that some ref under refs/ points to. -Daniel *This .sig left intentionally blank*
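The rule for prune is a reachability walk: start from every ref, whatever object type it names (so a tree under a hypothetical `refs/undo/` counts), and keep everything reachable from there. A minimal Python sketch over an in-memory object graph; real prune reads objects from the repository, and the function names here are hypothetical.

```python
from collections import deque


def reachable(objects, refs):
    """Ids of all objects reachable from the ids that refs point to.

    `objects` maps id -> ids it points to (commit -> tree and parents,
    tree -> entries, tag -> target, blob -> nothing).
    `refs` maps ref name -> object id.
    """
    keep, queue = set(), deque(refs.values())
    while queue:
        oid = queue.popleft()
        if oid in keep:
            continue
        keep.add(oid)
        queue.extend(objects.get(oid, []))
    return keep


def prunable(objects, refs):
    """Objects that no ref can reach, i.e. what prune would delete."""
    return set(objects) - reachable(objects, refs)
```

A tree saved under refs/ survives in this model because it is a starting point of the walk, even though no commit points to it.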
Re: [RFC] undo and redo
On Wed, 24 Aug 2005, Carl Baldwin wrote: > This brings up a good point (indirectly). "git prune" would destroy the > undo objects. I had thought of this but decided to ignore it for the > time being. If you made undo store the tree under refs somewhere, git prune would preserve it. -Daniel *This .sig left intentionally blank*
Re: baffled again
On Wed, 24 Aug 2005, Linus Torvalds wrote: > Now, if the shared patch hadn't been a patch, but a shared _commit_, then > the thing would have been unambiguous - the shared commit would have been > the merge point, and the revert would have clearly undone that shared > commit. Actually, it was a shared commit (4aec0fb12267718c750475f3404337ad13caa8f5), which was (an ancestor of) a candidate merge point, but wasn't the one selected. Since a different one was chosen, it looked to the 3-way merge like a shared patch (since it ignores the untaken parent in the merges in the history). This should be fixable, but it'll require more cleverness in read-tree. -Daniel *This .sig left intentionally blank*
Re: Query about status of http-pull
On Wed, 24 Aug 2005, Martin Schlemmer wrote: > Hi, > > Recently cogito again say that the rsync method will be deprecated in > future (due to http-pull now supporting pack objects I suppose), but it > seems to me that it still have other issues: > > - > lycan linux-2.6 # git pull origin > Fetching HEAD using http > Getting pack list > error: Couldn't get 0572e3da3ff5c3744b2f606ecf296d5f89a4bbdf: not separate or > in any pack > error: Tried > http://www.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git/objects/05/72e3da3ff5c3744b2f606ecf296d5f89a4bbdf > Cannot obtain needed object 0572e3da3ff5c3744b2f606ecf296d5f89a4bbdf > while processing commit . It looks like pack-c24bb5025e835a3d8733931ce7cc440f7bfbaaed isn't in the pack list. I suspect that updating the pack list should really be done by anything that creates pack files, because otherwise people forget to run the program that does it, and then http has problems. -Daniel *This .sig left intentionally blank*
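The pack list that the http fetcher reads is `objects/info/packs`, which `git-update-server-info` regenerates. A hedged Python sketch that flags packs present in `objects/pack/` but missing from that list; it assumes each listed pack appears on a line of the form `P pack-<sha1>.pack`, and the function name is hypothetical.

```python
import os


def unlisted_packs(objects_dir):
    """Return pack files in objects/pack/ that objects/info/packs omits.

    Assumes the list uses lines of the form "P pack-<sha1>.pack".
    """
    pack_dir = os.path.join(objects_dir, "pack")
    present = {name for name in os.listdir(pack_dir) if name.endswith(".pack")}
    listed = set()
    info_packs = os.path.join(objects_dir, "info", "packs")
    if os.path.exists(info_packs):
        with open(info_packs) as f:
            for line in f:
                if line.startswith("P "):
                    listed.add(line[2:].strip())
    return sorted(present - listed)
```

Running a check like this on the server after creating packs should flag a pack such as the one missing in Martin's case before http clients run into it.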
Re: baffled again
On Wed, 24 Aug 2005, Junio C Hamano wrote: > [EMAIL PROTECTED] writes: > > > So I have another anomaly in my GIT tree. A patch to > > back out a bogus change to arch/ia64/hp/sim/boot/bootloader.c > > in my release branch at commit > > > > 62d75f3753647656323b0365faa43fc1a8f7be97 > > > > appears to have been lost when I merged the release branch to > > the test branch at commit > > > > 0c3e091838f02c537ccab3b6e8180091080f7df2 > > : siamese; git cat-file commit 0c3e091838f02c537ccab3b6e8180091080f7df2 > tree 61a407356d1e897e0badea552ce69e657cab6108 > parent 7ffacc1a2527c219b834fe226a7a55dc67ca3637 > parent a4cce10492358b33d33bb43f98284c80482037e8 > author Tony Luck <[EMAIL PROTECTED]> 1124808655 -0700 > committer Tony Luck <[EMAIL PROTECTED]> 1124808655 -0700 > > Pull release into test branch > > So I pulled 7ffacc and a4cce1 from your repository and started > digging from there. 7ffacc was the head of "test" branch back > then, and a4cce1 was the head of "release" branch. I checked > out 7ffacc in the repository and pulled a4cce1 into it, using > the GIT with the "optimum merge-base" patch. > > : siamese; git pull . aegl-release > Packing 0 objects > Unpacking 0 objects > > * committish: a4cce10492358b33d33bb43f98284c80482037e8 > refs/heads/aegl-release from . > Trying to find the optimum merge base. > Trying to merge a4cce10492358b33d33bb43f98284c80482037e8 into > 7ffacc1a2527c219b834fe226a7a55dc67ca3637 using > c1ffb910f7a4e1e79d462bb359067d97ad1a8a25. > Simple merge failed, trying Automatic merge > Auto-merging arch/ia64/sn/kernel/io_init.c. > Committed merge db376974c0aebb9e99e5cd0bce21088c6a9d927c > arch/ia64/hp/sim/boot/boot_head.S |2 +- > 1 files changed, 1 insertions(+), 1 deletions(-) > > It is using c1ffb9 as the merge base. 
The problematic path > in the three trees involved are: > > : siamese; git ls-tree -r aegl-test-7ffacc1a | grep > arch/ia64/hp/sim/boot/bootloader.c > 100644 blob a7bed60b69f9e8de9a49944e22d03fb388ae93c7 > arch/ia64/hp/sim/boot/bootloader.c > : siamese; git ls-tree -r aegl-release-a4cce1 | grep > arch/ia64/hp/sim/boot/bootloader.c > 100644 blob 51a7b7b4dd0e7c5720683a40637cdb79a31ec4c4 > arch/ia64/hp/sim/boot/bootloader.c > : siamese; git ls-tree -r aegl-c1ffb9 | grep > arch/ia64/hp/sim/boot/bootloader.c > 100644 blob 51a7b7b4dd0e7c5720683a40637cdb79a31ec4c4 > arch/ia64/hp/sim/boot/bootloader.c > > So the file did not change between the merge base and release, > and test had the change. merge-cache picked the one in the test > release. Your guess in the other message hits the mark. > > I wonder what _other_ candidates these two commits have in > common and what would have happened if they were used as the > base instead? > > : siamese; git merge-base -a aegl-test-7ffacc1a aegl-release-a4cce1 > f6fdd7d9c273bb2a20ab467cb57067494f932fa3 > 3a931d4cca1b6dabe1085cc04e909575df9219ae > c1ffb910f7a4e1e79d462bb359067d97ad1a8a25 > > You can check what variant of the file each of these commits > contain. > > What is happening is: > > * the problematic patch 4aec0f is one before 3a931d. Among the > three merge-base candidates, only 3a931d contains teh wrongly > patched version. > > * the problematic change 4aec0f patch introduces is part of test > branch, because it was pulled via release. > > * the tip of release being merged into test has this patch > reverted, and the file is exactly the same as before 4aec0f > patch. > > So three-way trivial merge algorithm says, "hey, the file did > not change between common ancestor and release but it is > different in test, so the one in the test branch must be the > merge result." > > This does not have much to do with which common ancestor > merge-base chooses. Sorry, I am not sure what is the right way > to resolve this offhand. 
If it picks 3a931d4cca1b6dabe1085cc04e909575df9219ae, it will determine that the file didn't change between that and test, and is different in release, so the one in release must be right. I believe that the hint that something is going on is that different common ancestors give different trivial merges (as opposed to some giving failure and some giving the same result), and resolving it probably involves identifying that the paths from f6f... and c1f... to release don't keep the same blob through the middle, despite having the same ends. -Daniel *This .sig left intentionally blank*
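The check suggested here, whether the file kept the same blob along every path from a merge base to a side rather than only at the two ends, can be sketched over a toy history. The commit names `base`, `patched`, `reverted` and `quiet` are hypothetical; they only mimic the shape of Tony's case, where the release side applied the bad patch and later reverted it.

```python
def ancestors(commit, parents):
    """All commits reachable from `commit` via parent links, itself included."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents.get(c, []))
    return seen


def stayed_same(base, tip, parents, blob_at):
    """True if every commit on a path from base to tip has base's blob.

    A commit is on such a path when it is an ancestor of tip and a
    descendant of base (both ends included).
    """
    on_path = [c for c in ancestors(tip, parents) if base in ancestors(c, parents)]
    return all(blob_at[c] == blob_at[base] for c in on_path)
```

A trivial merge that treats "same blob at both ends" as "unchanged" would be fooled by the `reverted` side; this check is not.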
Re: Automatic merge failed, fix up by hand
On Tue, 23 Aug 2005, Junio C Hamano wrote: > Only lightly tested, in the sense that I did only this one case > and nothing else. For a large repository and with complex > merges, "merge-base -a" _might_ end up reporting many > candidates, in which case the pre-merge step to figure out the > best merge base may turn out to be disastrously slow. I dunno. I think it's the right thing to do for now (and what I was going to suggest), and if people find it too slow, we can consider teaching read-tree to take multiple common ancestors and use any of them that gives a clear result on a per-file basis. On the other hand, Tony might have hit a bad case with an ill-chosen common ancestor for a patch/revert sequence, and we probably want to look into that if we've got some history that demonstrates the problem. I think that, if there are two common ancestors, one of which has applied a patch and one of which hasn't, and on one side of the merge it gets reverted, we should get the revert, but we'll only get it if we choose the ancestor where it was applied. (Letters are versions of the file, with 'b' being the bad patch; the second column shows the two choices for the common ancestor.)

  a-b-a-?
 / X   /
a-b-b-b

Of course, you could have the two lines exactly flipped for a different file in the same commits, or for a different hunk in the same file, and there would be no single choice that doesn't lose the revert. The really right thing to do is identify that there is a b->a transition that is not a trivial merge and that is not beyond a common ancestor, but that's hard to determine easily and with sufficient granularity to catch everything. I still want to someday do a version of diff/merge for git that could select common ancestors on a per-hunk basis and identify block moves and avoid giving confusing (but marginally shorter) diffs, but that's a major undertaking that I don't have time for right now.
-Daniel *This .sig left intentionally blank*
Re: [RFC] Removing deleted files after checkout
On Tue, 23 Aug 2005, Carl Baldwin wrote: > The thing that this doesn't do is remove empty directories when the last > file is deleted. I once expressed the opinion in a previous thread that > directories should be added and removed explicitly in git. (Thus > allowing an empty directory to be added). If this were to happen then > this case would get handled correctly. However, if git stays with the > status quo then I think that git-read-tree -u should be changed to > remove the empty directory. This would make it consistent. I think that git-read-tree -u ought to remove a directory if it removes the last file (or directory) in it. -Daniel *This .sig left intentionally blank*
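A minimal sketch of the behavior proposed for `git-read-tree -u`, written in Python rather than git's C: after removing a file, remove each parent directory that is now empty, stopping at the working-tree root or at the first directory that still has something in it. The function name is hypothetical, and `relpath` is assumed to be a normalized path inside the tree.

```python
import os


def remove_file_and_empty_parents(worktree, relpath):
    """Delete worktree/relpath, then any directories it leaves empty.

    Stops at the worktree root and at the first non-empty directory,
    which also covers "the last directory in it" cascading upward.
    """
    root = os.path.abspath(worktree)
    path = os.path.join(root, relpath)
    os.remove(path)
    parent = os.path.dirname(path)
    while parent != root:
        try:
            os.rmdir(parent)  # fails with OSError unless the directory is empty
        except OSError:
            break
        parent = os.path.dirname(parent)
```

Relying on `rmdir` failing for non-empty directories keeps the sketch from ever deleting anything but empty directories.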
Re: [RFC] Stgit - patch history / add extra parents
On Tue, 23 Aug 2005, Jan Veldeman wrote: > Daniel Barkalow wrote: > > > On Tue, 23 Aug 2005, Catalin Marinas wrote: > > > > Something is legitimate as a parent if someone took that commit and did > > something to it to get the new commit. The operation which caused the > > change is not specified. But you only want to include it if anyone cares > > about the parent. > > This is indeed what I thought a parent should be used for. As an adition, > I'll try to explain why I would sometimes want to care about some parents: > > I want to track a mailine tree, but have quite a few changes, which shoudn't > be commited to the mainline immediately (let's call it my development tree). > This is why I would use stgit. But I would also want to colaborate with > other developers on this development tree, so I sometimes want to make > updates available of this development tree to the others. This is where > current stgit falls short. To easily share this development tree, I want > some history (not all, only the ones I choose) of this development tree > included, so that the other developers can easily follow my development. > > The parents which should be visible to the outside, will always be versions > of my development tree, which I have previously pushed out. My way of > working would become: > * make changes, all over the place, using stgit > * still make changes (none of these gets tracked, intermittent versions are > lost) > * having a good day: changes looks good, I want to push this out: > * push my tree out > * stgit-free (which makes the pushed out commits, the new parents of my > stgit patches) > * restart from top I'm not sure how applicable to this situation stgit really is; I see stgit as optimized for the case of a patch set which is basically done, where you want to keep it applicable to the mainline as the mainline advances. 
For your application, I'd just have a git branch full of various stuff, and then generate clean commits by branching mainline, diffing development against it, cutting the diff down to just what I want to push, and applying that. Then the clean patch goes into stgit.

> [...]
> > This also depends on how exactly freeze is used; if you use it before
> > committing a modification to the patch without rebasing, you get:
> >
> >     old-top -> new-top
> >          ^       ^
> >           \     /
> >            bottom
> >
> > bottom to old-top is the old patch
> > bottom to new-top is the new patch
> > old-top to new-top is the change to the patch
> >
> > Then you want to keep new-top as a parent for rebasings until one of these
> > is frozen. These links are not interesting to look at, but preserve the
> > path to the old-top:new-top change, which is interesting.
>
> my proposal does something like this, but a little more: not only does it
> keep track of the link between old-top and new-top, it also keeps track of
> the links between old-patch-in-between and new-patch-in-between.
> (This makes sense when the top is being removed or reordered)

I was thinking of this as being the top and bottom commits for a single tracked patch, not as a whole series. I think patches lower wouldn't be affected, and patches higher would see this as a rebase.

-Daniel
*This .sig left intentionally blank*
Re: [RFC] Removing deleted files after checkout
On Tue, 23 Aug 2005, Carl Baldwin wrote:

> The point is to push and use a post-update hook to do the checkout. So,
> this won't be possible.

You could have the remote repository be something like "~/git/website.git", and have a hook which does: "cd ~/www; git pull ~/git/website.git/". That is, have three things: the directory where you work on stuff, the central storage location, and the area that the web server serves, and have the storage location automatically update the web server area. That's what I do with my website section that's still in CVS, and the general concept is good (and means that the "real" repository isn't somewhere the web server is poking around).

> > which will correctly identify before and after, and remove any files that
> > were removed.
> >
> > Alternatively, you could do, at point 1:
> >
> > cp .git/refs/heads/master .git/refs/heads/deployed
> > git checkout deployed
>
> How to get a post-update hook to do this? I suppose an update script
> could set this up for the post-update to later use.

If you have "deployed" checked out, and you push to "master" in the same repository, having the hook do "git resolve deployed master auto-update" should work.

-Daniel
*This .sig left intentionally blank*
Re: [RFC] Removing deleted files after checkout
On Tue, 23 Aug 2005, Carl Baldwin wrote:

> On Tue, Aug 23, 2005 at 03:43:56PM -0400, Daniel Barkalow wrote:
> > On Tue, 23 Aug 2005, Carl Baldwin wrote:
> >
> > > Hello,
> > >
> > > I recently started using git to revision control the source for my
> > > web-page. I wrote a post-update hook to checkout the files when I push
> > > to the 'live' repository.
> > >
> > > In this particular context I decided that it was important to me to remove
> > > deleted files after checking out the new HEAD. I accomplished this by
> > > running git-ls-files before and after the checkout.
> > >
> > > Is there a better way? Could there be some way built into git to easily
> > > find out what files disappear when replacing the current index with one
> > > from a new tree? Is there already? The behavior of git should NOT
> > > change to delete these files but I would argue that some way should
> > > exist to query what files disappeared if removing them is desired.
> >
> > If you don't use -f, git-checkout-script removes deleted files. Using -f
> > tells it to ignore the old index, which means that it can't tell the
> > difference between removed files and files that weren't tracked at all.
>
> Maybe I'm doing something wrong. This does not happen for me.
>
> I tried a simple test with git v0.99.4...
>
> cd
> mkdir test-git && cd test-git/
> echo testing | cg-init
> echo contents > file
> git-add-script file
> git-commit-script -m 'testing'
[point 1]
> cd ..
> cg-clone test-git/.git/ test-git2
> cd test-git2
> cg-rm file
> git-commit-script -m 'testing'
> ls
> cg-push
> cd ../test-git
> git-checkout-script

Ah, okay. I think "push" and "checkout" don't play that well together; "push" changes the ref, which "checkout" uses to determine what it expects for the old contents, and then it's confused. What you probably actually want is:

cd ../test-git
git pull ../test-git2

which will correctly identify before and after, and remove any files that were removed.
Alternatively, you could do, at point 1:

cp .git/refs/heads/master .git/refs/heads/deployed
git checkout deployed

Then, after the push and cd:

git checkout master
cp .git/refs/heads/master .git/refs/heads/deployed
git checkout deployed

This works because checkout does remove files if you switch from a branch with them (e.g., deployed) to one without them (master, after the push).

-Daniel
*This .sig left intentionally blank*
Re: [RFC] Removing deleted files after checkout
On Tue, 23 Aug 2005, Carl Baldwin wrote:

> Hello,
>
> I recently started using git to revision control the source for my
> web-page. I wrote a post-update hook to checkout the files when I push
> to the 'live' repository.
>
> In this particular context I decided that it was important to me to remove
> deleted files after checking out the new HEAD. I accomplished this by running
> git-ls-files before and after the checkout.
>
> Is there a better way? Could there be some way built into git to easily
> find out what files disappear when replacing the current index with one
> from a new tree? Is there already? The behavior of git should NOT
> change to delete these files but I would argue that some way should
> exist to query what files disappeared if removing them is desired.

If you don't use -f, git-checkout-script removes deleted files. Using -f tells it to ignore the old index, which means that it can't tell the difference between removed files and files that weren't tracked at all.

-Daniel
*This .sig left intentionally blank*
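The old index makes this possible because the question becomes a comparison of two sorted path lists. A path that is in the old index but not in the new tree was deleted and may be removed. A file that is in neither list was never tracked and is left alone. With -f the first list is discarded, so these two cases can no longer be told apart.

Below is a minimal C sketch of that comparison. It assumes both lists are sorted byte-wise, the way git-ls-files prints them. `report_removed` is a hypothetical helper, not part of git.

```c
#include <string.h>

/*
 * Hypothetical helper: walk two sorted path lists in step and collect
 * every path in old_paths that is absent from new_paths.  removed[] must
 * have room for n_old entries.  Returns the number of removed paths.
 */
size_t report_removed(const char *const *old_paths, size_t n_old,
                      const char *const *new_paths, size_t n_new,
                      const char **removed)
{
	size_t i = 0, j = 0, count = 0;

	while (i < n_old) {
		/* Once new_paths is exhausted, every remaining old path was removed. */
		int cmp = (j < n_new) ? strcmp(old_paths[i], new_paths[j]) : -1;

		if (cmp == 0) {          /* tracked before and after: keep */
			i++;
			j++;
		} else if (cmp < 0) {    /* only in the old index: removed */
			removed[count++] = old_paths[i++];
		} else {                 /* only in the new tree: newly added */
			j++;
		}
	}
	return count;
}
```

For example, take an old index of Makefile, about.html, index.html, old.html and a new tree of Makefile, index.html, news.html. The sketch reports about.html and old.html. An untracked file in the directory never enters the comparison.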
Re: [RFC] Stgit - patch history / add extra parents
On Tue, 23 Aug 2005, Catalin Marinas wrote:

> > So the point is that there are things which are, in fact, parents, but we
> > don't want to list them, because it's not desired information.
>
> What's the definition of a parent in GIT terms? What are the
> restrictions for a commit object to be a parent? Can a parent be an
> arbitrarily chosen commit?

Something is legitimate as a parent if someone took that commit and did something to it to get the new commit. The operation which caused the change is not specified. But you only want to include it if anyone cares about the parent.

(For example, I often start with a chunk of work that does multiple things and is committed; I take mainline and generate a series of commits from there. It would be legitimate to list my development commit as a parent of each of these, since I did actually take it and strip out the unrelated changes. This would be a bit confusing in the log, but would make merges between something based on the "messy" version and something based on the "refined" version work well. On the other hand, I don't want to report the existence of the messy version, so I don't include it.)

> An StGIT patch is represented by a top and a bottom commit
> object. The bottom one is the same as the parent of the top
> commit. The patch is the diff between the top's tree id and the
> bottom's tree id.
>
> Jan's proposal is to allow a freeze command to save the current top
> hash and later be used as a second parent for the newly generated
> top. The problem I see with this approach is that (even for the
> internal view you described) the newly generated top will have two
> parents, new-bottom and old-top, but only the diff between new-top and
> new-bottom is meaningful. The diff between new-top and old-top (as a
> parent-child relation) wouldn't contain anything relevant to the patch
> but all the new changes to the base of the stack.
Having a useful diff isn't really a requirement for a parent; the diff in the case of a merge is going to be the total of everything that happened elsewhere. The point is to be able to reach some commits between which there are interesting diffs.

This also depends on how exactly freeze is used; if you use it before committing a modification to the patch without rebasing, you get:

    old-top -> new-top
         ^       ^
          \     /
           bottom

bottom to old-top is the old patch
bottom to new-top is the new patch
old-top to new-top is the change to the patch

Then you want to keep new-top as a parent for rebasings until one of these is frozen. These links are not interesting to look at, but preserve the path to the old-top:new-top change, which is interesting.

Ignoring the links to the corresponding bottoms, the development therefore looks like:

    local1 -> local2 -> merge -> local3 -> merge
      ^                   ^                  ^
    mainline >----------->------------------>--->

And this is how development is normally supposed to look. The trick is to only include a minimal number of merges.

-Daniel
*This .sig left intentionally blank*
Re: [RFC] Stgit - patch history / add extra parents
On Sun, 21 Aug 2005, Jan Veldeman wrote:

> Catalin Marinas wrote:
>
> > > So for example, you only tag (freeze) the history when exporting the
> > > patches. When an error is being reported on that version, it's easy to view
> > > it and also view the progress that was already been made on those patches.
> >
> > I agree that it is a useful feature to be able to individually tag the
> > patches. The problem is how to do this best. Your approach looks to me
> > like it's not following the GIT DAG structure recommendation. Maybe the
> > GIT designers could further comment on this but a commit object with
> > multiple parents should be a result of a merge operation. A commit with
> > a single parent should represent a transition of the tree from one state
> > to another. With the freeze command you proposed, a commit with multiple
> > parents is no longer a result of a merge operation, but just a
> > convenience for tracking the patch history with gitk.
>
> My interpretation of parents is broader than only merges, and reading the
> README file, I believe it is also the intention to do so (snippet from README
> file):
>
>     A "commit" object ties such directory hierarchies together into
>     a DAG of revisions - each "commit" is associated with exactly one tree
>     (the directory hierarchy at the time of the commit). In addition, a
>     "commit" refers to one or more "parent" commit objects that describe the
>     history of how we arrived at that directory hierarchy.

One factor not mentioned there is that, as things move upstream, we often want to discard a lot of history; if someone commits constantly to deal with editor malfunction or something, we don't really want to take all of this junk into the project history when it is cleaned up and accepted.

So the point is that there are things which are, in fact, parents, but we don't want to list them, because it's not desired information.
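The "one or more parent commit objects" in that README passage are just `parent` lines in the commit object's header. They come after the `tree` line and before `author` and `committer`, and the header ends at the first empty line. Nothing in the format records why a commit has several parents, so a merge and any other multi-parent commit look the same.

A small C sketch that counts those lines follows. `count_parents` is a hypothetical name.

```c
#include <string.h>

/*
 * Hypothetical helper: count the "parent <sha1>" lines in the header of
 * a raw commit object.  The header ends at the first empty line, so a
 * line starting with "parent" inside the commit message is not counted.
 */
int count_parents(const char *commit)
{
	const char *line = commit;
	int parents = 0;

	while (*line && *line != '\n') {    /* an empty line ends the header */
		const char *eol = strchr(line, '\n');

		if (strncmp(line, "parent ", 7) == 0)
			parents++;
		if (!eol)
			break;
		line = eol + 1;
	}
	return parents;
}
```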
Probably the right thing is to have two views of the stack: the internal view, showing what actually happened, and the external view, showing what would have happened if the developers had done everything right the first time. When you make changes to the series, this adds to the internal view and entirely replaces the external view.

I think that users will also want to discard the commits from the stack before rebasing in favor of the commits after, because (a) rebasing isn't all that interesting, especially if there's minimal merging, and (b) otherwise you'd get a ton of boring commits that obscure the interesting ones.

I think that the best rule would be that, when you modify a patch, the previous version is the new version's parent, and when you rebase a series, you include as a parent any parent of the input that isn't also in the input (but never include the input itself as a parent of the output; the point of rebasing is to pretend that it was the newer mainline that you modified). This should mean that the internal history of a patch consists of the present version, based on each version that was replaced due to changing the patch rather than rebasing it.

Of course, there's an interesting situation with the commits earlier in a series from a patch that was changed not being ancestors of the newer versions of those patches (because they weren't interesting in the development of those patches) but accessible as the commits that an interesting patch was based on. A possible solution is just to consider the revision of any patch a significant event in the history of the whole stack, causing all of the patches to get a new retained version.

-Daniel
*This .sig left intentionally blank*
Re: [RFC] Importing from a patch-oriented SCM
On Fri, 19 Aug 2005, Martin Langhoff wrote:

> On 8/19/05, Junio C Hamano <[EMAIL PROTECTED]> wrote:
> > Martin Langhoff <[EMAIL PROTECTED]> writes:
> >
> > > If I remember correctly, Junio added some stuff in the merge & rebase
> > > code that will identify if a particular patch has been seen and
> > > applied, and skip it even if it's a bit out of order. But I don't know
> >
> > I think you are talking about git-patch-id.
>
> Is this used at commit time, and stored somewhere (doesn't seem to be)
> or do you select older patches from the destination branch at merge
> time?

If a patch is applied verbatim, or a merge results in no conflicts (i.e., only offsets), then you can run git-patch-id on the diff caused by it and compare the result with the git-patch-id of the diff caused by your local change to see if you've found it. Of course, if there was any modification to the patch or a conflict was resolved, you won't see a match, but that's plausibly correct anyway: you don't know whether the content change that resulted from your patch really matched the change you wanted to make.

> If you only compare patches since the last merge, patches that were
> merged but somehow unreported will fall into a black hole and cause a
> conflict going forward anyway. Hmm. That seems to be a problem I
> won't be able to avoid if merges happen out-of-order.

They might cause conflicts, but they're relatively unlikely to require manual intervention, because the merging mechanism in git is stronger than the one in arch (by virtue of identifying a common ancestor), and will recognize when a section of changes made by both sides is the same and produce a warning rather than a conflict. That's how the rebase stuff can identify that your rebased patch is empty (when upstream applies your patch): the content change that it would make has been made.
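The "only offsets" case comes down to fingerprinting a diff in a way that ignores where it applies. The real git-patch-id computes a SHA-1 over the diff with whitespace and line numbers ignored. The C sketch below is a simplified stand-in, not git's implementation, and `patch_fingerprint` is a hypothetical name. It uses a 64-bit FNV-1a hash instead of SHA-1. It skips `@@` hunk headers and `index` lines and ignores whitespace within lines.

```c
#include <ctype.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical, simplified fingerprint in the spirit of git-patch-id.
 * Every line of a unified diff is hashed except hunk headers ("@@ ...",
 * which carry line numbers) and "index" lines (which name the whole
 * pre- and post-image blobs).  Whitespace inside a line is ignored.
 * 64-bit FNV-1a stands in for the SHA-1 that git-patch-id uses.
 */
uint64_t patch_fingerprint(const char *diff)
{
	uint64_t h = 14695981039346656037ULL;   /* FNV-1a offset basis */
	const char *line = diff;

	while (*line) {
		const char *eol = strchr(line, '\n');
		const char *end = eol ? eol : line + strlen(line);

		if (strncmp(line, "@@", 2) != 0 && strncmp(line, "index ", 6) != 0) {
			for (const char *p = line; p < end; p++) {
				if (isspace((unsigned char)*p))
					continue;
				h ^= (unsigned char)*p;
				h *= 1099511628211ULL;   /* FNV-1a prime */
			}
			h ^= '\n';                       /* keep line boundaries */
			h *= 1099511628211ULL;
		}
		line = eol ? eol + 1 : end;
	}
	return h;
}
```

Two diffs that make the same change at different offsets get equal fingerprints. As the message says, any edit to the patch or a hand-resolved conflict changes the fingerprint, so no match is reported.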
-Daniel
*This .sig left intentionally blank*
Re: Merge conflicts as .rej .orig files
On Fri, 19 Aug 2005, Martin Langhoff wrote:

> After using arch for a while, I've gotten used to getting .rej and
> .orig files instead of big ugly conflict markers inside the file.
> Emacs has a nice 'diff' mode that is a boon when dealing with
> conflicts this way.
>
> Is there a way to convince cogito/git to leave reject files around?
> What utility is git using to do the merges? Or at least: where should
> I look?

I believe you should be able to get that effect by having a version of "git-merge-one-file-script" that does "diff -c $2 $3 | patch $1" or "diff -c $2 $1 | patch $3", depending on which you want as the orig. (Or something like that. I'm not sure exactly how to get the conflict files out of the script and into the right place, or the arguments it gets.)

Of course, you'll probably have more conflicts to deal with, because the merging code gets less information that way. (In particular, you'll lose the "already contains changes" behavior, so you'll be unhappy if you have patches merged upstream.)

-Daniel
*This .sig left intentionally blank*
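The "already contains changes" behaviour depends on having the common ancestor. The sketch below assumes that the base, ours, and theirs versions of one region have already been lined up. Real merge code does that alignment with a diff algorithm, which is omitted here. Given that, the per-region decision can be sketched in C. `merge_chunk` and `enum merge_result` are hypothetical names.

```c
#include <string.h>

enum merge_result { TAKE_OURS, TAKE_THEIRS, CONFLICT };

/*
 * Hypothetical per-region three-way merge decision.  base is the common
 * ancestor's text for the region; ours and theirs are the two
 * descendants' versions of the same region.
 */
enum merge_result merge_chunk(const char *base, const char *ours,
                              const char *theirs)
{
	if (strcmp(ours, theirs) == 0)
		return TAKE_OURS;     /* both sides made the same change (or none) */
	if (strcmp(ours, base) == 0)
		return TAKE_THEIRS;   /* only their side changed this region */
	if (strcmp(theirs, base) == 0)
		return TAKE_OURS;     /* only our side changed this region */
	return CONFLICT;              /* both changed it, differently */
}
```

The first rule lets a change that both sides already made merge cleanly. A `diff -c | patch` pipeline only sees one side's hunks and never sees the ancestor.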
Re: Subject: [PATCH] Updates to glossary
On Thu, 18 Aug 2005, Johannes Schindelin wrote:

>  tree object::
> -	An object containing a list of blob and/or tree objects.
> -	(A tree usually corresponds to a directory without
> -	subdirectories).
> +	An object containing a list of file names and modes along with refs
> +	to the associated blob and/or tree objects. A tree object is
> +	equivalent to a directory.

Actually, it contains object names, not refs, to be completely precise. (Refs would imply an additional indirection.)

-Daniel
*This .sig left intentionally blank*
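The distinction shows up in the raw format. Each entry in a tree object's content (after the object header is stripped) consists of:

- the octal mode in ASCII,
- a space,
- the file name,
- a NUL byte,
- the 20-byte binary object name.

No ref is involved; the 40-character hex string is only how that name is displayed. A C sketch that decodes one entry is below. `parse_tree_entry` and `struct tree_entry_view` are hypothetical names, not git's own API.

```c
#include <stddef.h>
#include <string.h>

struct tree_entry_view {
	const char *mode;           /* ASCII octal, e.g. "100644"; not NUL-terminated */
	size_t mode_len;
	const char *name;           /* NUL-terminated inside the buffer */
	const unsigned char *sha1;  /* the 20 raw bytes of the object name */
	char hex[41];               /* the same object name as 40 hex digits */
};

/*
 * Decode one "<mode> <name>\0<20-byte sha1>" entry starting at buf.
 * Returns the number of bytes consumed, or 0 if the entry is truncated.
 */
size_t parse_tree_entry(const unsigned char *buf, size_t len,
                        struct tree_entry_view *out)
{
	static const char digits[] = "0123456789abcdef";
	const unsigned char *space = memchr(buf, ' ', len);
	const unsigned char *nul;
	size_t used;

	if (!space)
		return 0;
	nul = memchr(space + 1, '\0', len - (size_t)(space + 1 - buf));
	if (!nul)
		return 0;
	used = (size_t)(nul + 1 - buf);
	if (len - used < 20)
		return 0;

	out->mode = (const char *)buf;
	out->mode_len = (size_t)(space - buf);
	out->name = (const char *)(space + 1);
	out->sha1 = nul + 1;
	for (int i = 0; i < 20; i++) {
		out->hex[2 * i] = digits[out->sha1[i] >> 4];
		out->hex[2 * i + 1] = digits[out->sha1[i] & 0xf];
	}
	out->hex[40] = '\0';
	return used + 20;
}
```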
Re: First stab at glossary
On Wed, 17 Aug 2005, Johannes Schindelin wrote:

> Hi,
>
> On Wed, 17 Aug 2005, Daniel Barkalow wrote:
>
> > On Wed, 17 Aug 2005, Johannes Schindelin wrote:
> >
> > > object name::
> > >     Synonym for SHA1.
> >
> > Have we killed the use of the third term "hash" for this? I'd say that
> > "object name" is the standard term, and "SHA1" is a nickname, if only
> > because "object name" is more descriptive of the particular use of the
> > term.
>
> Okay for "hash".

I think we only need at most two names for this, so this is more a matter of fixing old usage than documenting it.

> > I think we might want to entirely kill the "cache" term, and talk only
> > about the "index" and "index entries". Of course, a bunch of the code will
> > have to be renamed to make this completely successful, but we could change
> > the glossary and documentation, and mention "cache" and "cache entry" as
> > old names for "index" and "index entry" respectively.
>
> For me, "index" is just the file named "index" (holding stat data and a
> ref for each cache entry). That is why I say an "index" contains "cache
> entries", not "index entries" (wee, that sounds wrong :-).

Well, it often contains information not present anywhere else (the status of a merge; the set of files being committed, added, or removed), so it isn't really a cache at all.

> > > working tree::
> > >     The set of files and directories currently being worked on.
> > >     Think "ls -laR"
> >
> > This is where the data is actually in the filesystem, and you can edit and
> > compile it (as opposed to a tree object or the index, which semantically
> > have the same contents, but aren't presented in the filesystem that way).
>
> Maybe I was too cautious. Linus's very new idea was to think of the lowest
> level of an SCM as a file system. But I did not want to mention that.
> Thinking of it again, maybe I should.
You probably don't need to mention that tree objects and index files can be thought of as filesystems, but you should specify that the working tree really is in the Unix filesystem, in case people have heard of the idea. It would be clearer to say 'You can "cd" there and "ls" to list your files.' rather than 'Think "ls -laR"', which makes me think of the output, and that is like the output from git-ls-files.

> > > checkout::
> >
> > Move after "revision"?
>
> Ultimately, the glossary terms will be sorted alphabetically. If you look
> at the file attached to my original mail, this is already sorted and
> marked up using asciidoc. However, I wanted you and the list to understand
> how I grouped terms. The asciidoc'ed file is generated by a perl script.

Ah, okay.

> > > resolve::
> > >     The action of fixing up manually what a failed automatic merge
> > >     left behind.
> >
> > "Resolve" is also used for the automatic case (e.g., in
> > "git-resolve-script", which goes from having two commits and a message to
> > having a new commit). I'm not sure what the distinction is supposed to be.
>
> I did not like that naming anyway. In reality, git-resolve-script does not
> resolve anything, but it merges two revisions, possibly leaving something
> to resolve.

Right; I think we should change the name of the script.

-Daniel
*This .sig left intentionally blank*
Re: First stab at glossary
On Wed, 17 Aug 2005, Johannes Schindelin wrote:

> Hi,
>
> long, long time. Here's my first stab at the glossary, attached the
> alphabetically sorted, asciidoc marked up txt file (Comments?
> Suggestions? Pizzas?):
>
> object::
>     The unit of storage in GIT. It is uniquely identified by
>     the SHA1 of its contents. Consequently, an object can not
>     be changed.
>
> SHA1::
>     A 20-byte sequence (or 41-byte file containing the hex
>     representation and a newline). It is calculated from the
>     contents of an object by the Secure Hash Algorithm 1.

It's also often a 40-character string (with whatever termination) in places like commit objects, tag objects, command-line arguments, listings, and so forth.

> object database::
>     Stores a set of "objects", and an individual object is identified
>     by its SHA1 (its ref). The objects are either stored as single
>     files, or live inside of packs.
>
> object name::
>     Synonym for SHA1.

Have we killed the use of the third term "hash" for this? I'd say that "object name" is the standard term, and "SHA1" is a nickname, if only because "object name" is more descriptive of the particular use of the term.

> blob object::
>     Untyped object, i.e. the contents of a file.

This "i.e." should be "e.g.", since symlink targets are also stored as blobs, and any other bulk data stored by itself would be. (IIRC, Junio has a tagged blob to hold his public key, for example.)

> tree object::
>     An object containing a list of blob and/or tree objects.
>     (A tree usually corresponds to a directory without
>     subdirectories).
>
> tree::
>     Either a working tree, or a tree object together with the
>     dependent blob and tree objects (i.e. a stored representation
>     of a working tree).
>
> cache::
>     A collection of files whose contents are stored as objects.
>     The cache is a stored version of your working tree. Well, can
>     also contain a second, and even a third version of a working
>     tree, which are used when merging.
>
> cache entry::
>     The information regarding a particular file, stored in the index.
>     A cache entry can be unmerged, if a merge was started, but not
>     yet finished (i.e. if the cache contains multiple versions of
>     that file).
>
> index::
>     Contains information about the cache contents, in particular
>     timestamps and mode flags ("stat information") for the files
>     stored in the cache. An unmerged index is an index which contains
>     unmerged cache entries.

I think we might want to entirely kill the "cache" term, and talk only about the "index" and "index entries". Of course, a bunch of the code will have to be renamed to make this completely successful, but we could change the glossary and documentation, and mention "cache" and "cache entry" as old names for "index" and "index entry" respectively.

> working tree::
>     The set of files and directories currently being worked on.
>     Think "ls -laR"

This is where the data is actually in the filesystem, and you can edit and compile it (as opposed to a tree object or the index, which semantically have the same contents, but aren't presented in the filesystem that way).

> directory::
>     The list you get with "ls" :-)
>
> checkout::
>     The action of updating the working tree to a revision which was
>     stored in the object database.

Move after "revision"?

> revision::
>     A particular state of files and directories which was stored in
>     the object database. It is referenced by a commit object.
>
> commit::
>     The action of storing the current state of the cache in the
>     object database. The result is a revision.
>
> commit object::
>     An object which contains the information about a particular
>     revision, such as parents, committer, author, date and the
>     tree object which corresponds to the top directory of the
>     stored revision.

Move "parent" around here.

> changeset::
>     BitKeeper/cvsps speak for "commit". Since git does not store
>     changes, but states, it really does not make sense to use
>     the term "changesets" with git.
>
> ent::
>     Favorite synonym to "tree-ish" by some total geeks.

Move after "tree-ish".

> head::
>     The top of a branch. It contains a ref to the corresponding
>     commit object.
>
> branch::
>     A non-cyclical graph of revisions, i.e. the complete history of
>     a particular revision, which does not (yet) have children, which
>     is called the branch head. The branch heads are stored in
>     $GIT_DIR/refs/heads/.

A branch head might have children, if they're in another branch. (E.g., I pull mainline, make a new branch based on it, and commit a change; the head of mainline is still a branch head, even though it's the parent of my new commit, because my new commit isn't in mainline.)

> ref::
>     A 40-byte hex representation of a SHA
Re: [RFC PATCH] Add support for figuring out where in the git archive we are
On Tue, 16 Aug 2005, Linus Torvalds wrote:

> If you use the GIT_DIR environment variable approach, it assumes that all
> filenames you give it are absolute and acts the way it always did before.
>
> Comments? Like? Dislike?

I'm all in favor, at least in the general case. I suspect there'll be some things where we have to discuss the behavior, but we can argue that when it comes up.

I think, slightly before 1.0, we should sort the library functions into a new set of object files with matching header files, because "setup" is not really distinctive, and there's at least one duplicate implementation (the ssh subprocess code in your connect.c is the same as my rsh.c in what it does, although yours uses two pipes and mine uses a socket).

-Daniel
*This .sig left intentionally blank*
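The thread is about working out where in the archive the current directory is, for the case without GIT_DIR. One way to do this is sketched below in C. It assumes the only marker is a `.git` directory, and it illustrates the idea rather than reproducing Linus's patch.

The sketch walks upward from the current directory until it finds a directory that contains `.git`. The path components it strips along the way become the prefix for relative file names. The check is passed in as a callback so the walk can be exercised without touching the filesystem. `find_git_prefix` and the `has_git_dir` callback are hypothetical names.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical sketch: given an absolute cwd (no trailing slash), find
 * the nearest directory at or above it for which has_git_dir() is true.
 * On success, *top receives that directory and *prefix the path from it
 * down to cwd with a trailing '/' ("" when cwd is the top); both are
 * malloc'd.  Returns 0 on success, -1 if no such directory exists.
 */
int find_git_prefix(const char *cwd, int (*has_git_dir)(const char *dir),
                    char **top, char **prefix)
{
	size_t len = strlen(cwd);
	char *dir = malloc(len + 1);

	if (!dir)
		return -1;
	memcpy(dir, cwd, len + 1);

	for (;;) {
		if (has_git_dir(dir)) {
			const char *rest = cwd + strlen(dir);
			size_t rlen;

			while (*rest == '/')
				rest++;
			rlen = strlen(rest);
			*prefix = malloc(rlen + 2);
			if (!*prefix) {
				free(dir);
				return -1;
			}
			memcpy(*prefix, rest, rlen);
			if (rlen)
				(*prefix)[rlen++] = '/';
			(*prefix)[rlen] = '\0';
			*top = dir;
			return 0;
		}

		char *slash = strrchr(dir, '/');
		if (!slash || strcmp(dir, "/") == 0)
			break;                  /* searched all the way to the root */
		if (slash == dir)
			slash[1] = '\0';        /* the parent is "/" itself */
		else
			*slash = '\0';
	}
	free(dir);
	return -1;
}
```

For example, from /home/u/proj/src/lib inside a repository rooted at /home/u/proj, the sketch yields the prefix "src/lib/". A command would prepend that prefix to the relative names it is given.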
Re: [RFC] Patches exchange is bad?
On Tue, 16 Aug 2005, Marco Costalba wrote:

> Martin Langhoff wrote:
>
> > From what I understand, you'll want the StGIT infrastructure. If you
> > use git/cogito, there is an underlying assumption that you'll want
> > all the patches merged across, and a simple cg-update will bring in
> > all the pending stuff.
>
> My concerns are both methodological and practical:
>
> 1) Method: To use the 'free patching workflow' on git is something foreseen in
> git design, something coherent with the fork + develop + merge cycle that it
> seems, at least to me, THE way git is meant to be used. Or it is stretching
> the possibility of the tool to something technically allowed but not
> suggested.

Patches are definitely meant to be part of how git is used; they are the primary way of getting clean history out of messy history (that is, saving a content change while discarding some history that isn't applicable). There's relatively little support in git itself, but that's because the point is to go outside the system's tracking. There have been various discussions of more explicit support, and nobody's been able to come up with a need.

> 2) Practical: The round trip git-format-patch + git-applymbox is the logical
> and natural way to reach this goal or, also in this case, I intend to stretch
> some tools, designed for one thing, for something else?

I'd guess that git-diff-tree + git-apply (without the rest of the scripting) would be more effective when you're not doing anything with the intermediate files, since it saves doing a bunch of formatting and parsing.

-Daniel
*This .sig left intentionally blank*
Re: [PATCH] Alternate object pool mechanism updates.
On Tue, 16 Aug 2005, Linus Torvalds wrote:

> Finally, I have to say that that "info" directory is confusing. Namely,
> there's two of them - the "git info" and the "object info" directories are
> totally different directories - maybe logical, but to me it smells like
> "info" is here a code-name for "misc files that don't make sense anywhere
> else".
>
> What this all is leading up to is that I think we'd be better off with a
> totally new "git config" file, in ".git/config", and we'd have all the
> startup configuration there. Including things like alternate object
> directories, perhaps standard preferences for that particular repo, and
> things like the "grafts" thing.
>
> Wouldn't that be nice?

I'd originally proposed the .git/info directory because I keep multiple working trees for the same repository, by having symlinks for .git/objects and .git/refs, and I could also get other per-repository things to be shared properly without knowing exactly what they are if they're in a subdirectory of .git that could be a symlink. This would mean that a ".git/config" would be per-working-tree, like .git/index or .git/HEAD, not per-repository like ".git/info/config". Of course, the core didn't have anything to go in .git/info at the time, so it didn't really get tacked down.

(I find it convenient to have mainline and my latest work both checked out for reference while I'm generating a series of commits for a patch set, and I don't want three different repositories which could be out of sync; this also keeps the repository safely out of pwd, since I have the actual repositories as ~/git/{project}.git/)

-Daniel
*This .sig left intentionally blank*
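The layout described above can be set up with a couple of symlinks. The C sketch below uses POSIX `mkdir()` and `symlink()`. It assumes a central repository such as ~/git/project.git that already has `objects/` and `refs/`, passed in as an absolute path.

Each working tree gets its own `.git/`, where per-tree files like `index` and `HEAD` would live. Inside it, `objects` and `refs` are symlinks into the shared store. `add_shared_worktree` is a hypothetical name. The sketch does not create HEAD or the index, and it does not clean up after a partial failure.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: create <worktree>/.git and point its objects/ and
 * refs/ at a shared repository with symlinks.  Per-tree files (index,
 * HEAD) are not touched; they stay local to each working tree.  repo
 * should be absolute, since a symlink's target is resolved relative to
 * the directory the link lives in.
 */
int add_shared_worktree(const char *repo, const char *worktree)
{
	static const char *const shared[] = { "objects", "refs" };
	char target[4096], link_path[4096];
	int n;

	n = snprintf(link_path, sizeof(link_path), "%s/.git", worktree);
	if (n < 0 || (size_t)n >= sizeof(link_path) || mkdir(link_path, 0777) < 0)
		return -1;

	for (size_t i = 0; i < sizeof(shared) / sizeof(shared[0]); i++) {
		n = snprintf(target, sizeof(target), "%s/%s", repo, shared[i]);
		if (n < 0 || (size_t)n >= sizeof(target))
			return -1;
		n = snprintf(link_path, sizeof(link_path), "%s/.git/%s",
			     worktree, shared[i]);
		if (n < 0 || (size_t)n >= sizeof(link_path))
			return -1;
		if (symlink(target, link_path) < 0)
			return -1;
	}
	return 0;
}
```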
Re: Git 1.0 Synopis (Draft v4)
On Tue, 16 Aug 2005, Johannes Schindelin wrote:

> Hi,
>
> On Tue, 16 Aug 2005, Junio C Hamano wrote:
>
> >  - Are all the files in Documentation/ reachable from git(7)
> >    or otherwise made into a standalone document using asciidoc
> >    by the Makefile? I haven't looked into documentation
> >    generation myself (I use only the text files as they are);
> >    help to update the Makefile by somebody handy with asciidoc
> >    suite is greatly appreciated here.
> >
> >    Volunteers?
>
> The attached script reveals:
>
> git-unpack-objects.txt is not reachable from git.txt
> git-cvsimport-script.txt is not reachable from git.txt
> git-send-email-script.txt is not reachable from git.txt
> git-rename-script.txt is not reachable from git.txt
> tutorial.txt is not reachable from git.txt
> git-show-index.txt is not reachable from git.txt
> cvs-migration.txt is not reachable from git.txt
> diffcore.txt is not reachable from git.txt
> git-ls-remote-script.txt is not reachable from git.txt
> git-apply.txt is not reachable from git.txt
> git-diff-stages.txt is not reachable from git.txt
> pack-protocol.txt is not reachable from git.txt

The ones that don't start with git probably don't belong in the same set; perhaps there should be a "technical" (or something similar but shorter) subdirectory for developer documentation instead of user documentation? (And tutorial and cvs-migration can move to howto.)

-Daniel
*This .sig left intentionally blank*
Re: Git 1.0 Synopis (Draft v4)
On Tue, 16 Aug 2005, Junio C Hamano wrote: > Daniel Barkalow <[EMAIL PROTECTED]> writes: > > > It might be worth putting the list of things left to do before 1.0 in the > > tree (since they clearly covary), and it would be useful to know what > > you're thinking of as preventing the release at any particular stage. > > Yeah, yeah. Call me lazy. > > Excerpts from my "last mile to 1.0", my Itchlist, and pieces from > random other messages since then. > > - Documentation. [I really need help here --- among ~7000 lines > there, I've written around 2500 lines, David Greaves another > 2500, and Linus 1400. And it is not very easy to proofread > what you wrote yourself.] I'm not sure how done this can actually get before some sort of feature freeze; the best way to do things keeps changing as more convenient ways are added. Once the new stuff is diverted to post-1.0, I'd be interested in going through it. > - git prune and git fsck-cache; think about their interactions > with an object database that borrows from another. This > includes the case where .git/objects itself is symlinked to > somewhere else (i.e. running "git prune" that somewhere else > without consulting this repository would lose objects), and > alternates pointing at somewhere else (i.e. ditto). It should be fine, but only if .git/refs is symlinked to the matching place; this gives you the same repository with multiple working trees. Having refs/ and objects/ directories that aren't always together would be much less safe. -Daniel *This .sig left intentionally blank*
[PATCH] Support packs in local-pull
If it doesn't find an object, it looks for an index that contains it and
uses the same methods on that instead.

Signed-off-by: Daniel Barkalow <[EMAIL PROTECTED]>
---

 local-pull.c |  112 +++---
 1 files changed, 91 insertions(+), 21 deletions(-)

aafbc7fb9ae059b9c9afa42e8d2c0548ea960f9f
diff --git a/local-pull.c b/local-pull.c
--- a/local-pull.c
+++ b/local-pull.c
@@ -15,34 +15,54 @@ void prefetch(unsigned char *sha1)
 {
 }
 
-int fetch(unsigned char *sha1)
+static struct packed_git *packs = NULL;
+
+void setup_index(unsigned char *sha1)
 {
-	static int object_name_start = -1;
-	static char filename[PATH_MAX];
-	char *hex = sha1_to_hex(sha1);
-	const char *dest_filename = sha1_file_name(sha1);
+	struct packed_git *new_pack;
+	char filename[PATH_MAX];
+	strcpy(filename, path);
+	strcat(filename, "/objects/pack/pack-");
+	strcat(filename, sha1_to_hex(sha1));
+	strcat(filename, ".idx");
+	new_pack = parse_pack_index_file(sha1, filename);
+	new_pack->next = packs;
+	packs = new_pack;
+}
 
-	if (object_name_start < 0) {
-		strcpy(filename, path); /* e.g. git.git */
-		strcat(filename, "/objects/");
-		object_name_start = strlen(filename);
+int setup_indices()
+{
+	DIR *dir;
+	struct dirent *de;
+	char filename[PATH_MAX];
+	unsigned char sha1[20];
+	sprintf(filename, "%s/objects/pack/", path);
+	dir = opendir(filename);
+	while ((de = readdir(dir)) != NULL) {
+		int namelen = strlen(de->d_name);
+		if (namelen != 50 ||
+		    strcmp(de->d_name + namelen - 5, ".pack"))
+			continue;
+		get_sha1_hex(sha1, de->d_name + 5);
+		setup_index(sha1);
 	}
-	filename[object_name_start+0] = hex[0];
-	filename[object_name_start+1] = hex[1];
-	filename[object_name_start+2] = '/';
-	strcpy(filename + object_name_start + 3, hex + 2);
+	return 0;
+}
+
+int copy_file(const char *source, const char *dest, const char *hex)
+{
 	if (use_link) {
-		if (!link(filename, dest_filename)) {
+		if (!link(source, dest)) {
 			pull_say("link %s\n", hex);
 			return 0;
 		}
 		/* If we got ENOENT there is no point continuing. */
 		if (errno == ENOENT) {
-			fprintf(stderr, "does not exist %s\n", filename);
+			fprintf(stderr, "does not exist %s\n", source);
 			return -1;
 		}
 	}
-	if (use_symlink && !symlink(filename, dest_filename)) {
+	if (use_symlink && !symlink(source, dest)) {
 		pull_say("symlink %s\n", hex);
 		return 0;
 	}
@@ -50,25 +70,25 @@ int fetch(unsigned char *sha1)
 		int ifd, ofd, status;
 		struct stat st;
 		void *map;
-		ifd = open(filename, O_RDONLY);
+		ifd = open(source, O_RDONLY);
 		if (ifd < 0 || fstat(ifd, &st) < 0) {
 			close(ifd);
-			fprintf(stderr, "cannot open %s\n", filename);
+			fprintf(stderr, "cannot open %s\n", source);
 			return -1;
 		}
 		map = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, ifd, 0);
 		close(ifd);
 		if (map == MAP_FAILED) {
-			fprintf(stderr, "cannot mmap %s\n", filename);
+			fprintf(stderr, "cannot mmap %s\n", source);
 			return -1;
 		}
-		ofd = open(dest_filename, O_WRONLY | O_CREAT | O_EXCL, 0666);
+		ofd = open(dest, O_WRONLY | O_CREAT | O_EXCL, 0666);
 		status = ((ofd < 0) || (write(ofd, map, st.st_size) != st.st_size));
 		munmap(map, st.st_size);
 		close(ofd);
 		if (status)
-			fprintf(stderr, "cannot write %s\n", dest_filename);
+			fprintf(stderr, "cannot write %s\n", dest);
 		else
 			pull_say("copy %s\n", hex);
 		return status;
@@ -77,6 +97,56 @@ int fetch(unsigned char *sha1)
 	return -1;
 }
 
+int fetch_pack(unsigned char *sha1)
+{
+	struct packed_git *target;
+	char filename[PATH_MAX];
+	if (setup_indices())
+		return -1;
+	target = find_sha1_pack(sha1, packs);
+	if (!target)
+		return error("Couldn't find %s: not separate or in
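The directory scan in setup_indices() relies on pack names of the form pack-<40 hex digits>.pack, which is where the length test against 50 comes from (5 + 40 + 5). The code as posted checks only the length and the .pack suffix, and does not check whether opendir() or get_sha1_hex() failed. A stricter standalone version of the name check might look like this; parse_pack_filename() is a hypothetical name, not a git function.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: if name looks like "pack-<40 hex>.pack", copy the
 * 40 hex digits (plus NUL) into hex_out and return 1; otherwise return 0.
 * 50 = strlen("pack-") + 40 + strlen(".pack").
 */
int parse_pack_filename(const char *name, char hex_out[41])
{
	size_t i;

	assert(name != NULL && hex_out != NULL);
	if (strlen(name) != 50 ||
	    strncmp(name, "pack-", 5) != 0 ||
	    strcmp(name + 45, ".pack") != 0)
		return 0;
	for (i = 0; i < 40; i++) {
		char c = name[5 + i];
		/* git writes object names in lowercase hex. */
		if (!((c >= '0' && c <= '9') || (c >= 'a' && c <= 'f')))
			return 0;
		hex_out[i] = c;
	}
	hex_out[40] = '\0';
	return 1;
}
```

Rejecting malformed names up front means a stray file in objects/pack/ is skipped instead of being turned into a bogus SHA-1 and handed to setup_index().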
[PATCH] Add function to read an index file from an arbitrary filename.
Note that the pack file has to be in the usual location if it gets
installed later.

Signed-off-by: Daniel Barkalow <[EMAIL PROTECTED]>
---

 cache.h     |    2 ++
 sha1_file.c |   10 --
 2 files changed, 10 insertions(+), 2 deletions(-)

59e5c6d163edae5da6136560d48a4750cceacdc6
diff --git a/cache.h b/cache.h
--- a/cache.h
+++ b/cache.h
@@ -319,6 +319,8 @@ extern int get_ack(int fd, unsigned char
 extern struct ref **get_remote_heads(int in, struct ref **list, int nr_match, char **match);
 
 extern struct packed_git *parse_pack_index(unsigned char *sha1);
+extern struct packed_git *parse_pack_index_file(unsigned char *sha1,
+						char *idx_path);
 
 extern void prepare_packed_git(void);
 extern void install_packed_git(struct packed_git *pack);
diff --git a/sha1_file.c b/sha1_file.c
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -476,12 +476,18 @@ struct packed_git *add_packed_git(char *
 
 struct packed_git *parse_pack_index(unsigned char *sha1)
 {
+	char *path = sha1_pack_index_name(sha1);
+	return parse_pack_index_file(sha1, path);
+}
+
+struct packed_git *parse_pack_index_file(unsigned char *sha1, char *idx_path)
+{
 	struct packed_git *p;
 	unsigned long idx_size;
 	void *idx_map;
-	char *path = sha1_pack_index_name(sha1);
+	char *path;
 
-	if (check_packed_git_idx(path, &idx_size, &idx_map))
+	if (check_packed_git_idx(idx_path, &idx_size, &idx_map))
 		return NULL;
 
 	path = sha1_pack_name(sha1);
[PATCH 0/2] Fix local-pull on packed repository
This adds essentially the same logic to local-pull that http-pull has, with the exception that it reads the index out of the source directory, rather than copying it. This, in turn, requires the ability to use an index file in some other directory.

 1: Use index file in another directory
 2: Copy/link/symlink pack files as appropriate

-Daniel *This .sig left intentionally blank*
Re: Cloning speed comparison
On Mon, 15 Aug 2005, Junio C Hamano wrote: > Daniel Barkalow <[EMAIL PROTECTED]> writes: > > > I should be able to get http-pull down to the neighborhood of > > (current) ssh-pull; http-pull is that slow (when the source repository > > isn't packed) because it's entirely sequential, rather than overlapping > > requests like ssh-pull now does. > > I like those prefetch() and process() code in pull.c very much. > > I have been wondering if increasing parallelism more by > prefetching beyond the immediate parents of the current commit, > in "if (get_history)" part of process_commit(). Maybe it is not > worth it because doing a commit, its associated tree(s) and its > parents would already give us enough parallelism already. It is actually already maxing out the parallelism; it has a FIFO of objects which it needs, and calls prefetch() when it enqueues an object and fetch() when it dequeues it. It only cares about the dependencies for this purpose, not the types. -Daniel *This .sig left intentionally blank*
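The scheduling described here can be sketched as a FIFO in which enqueueing an object issues a non-blocking prefetch() and dequeueing it calls the blocking fetch(), so that while the oldest object is being waited on, requests for everything behind it are already in flight. The queue type, the stub transport functions, the fixed capacity, and the toy dependency graph are hypothetical simplifications, not pull.c's actual data structures (pull.c also skips objects already queued, which this sketch omits).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define QUEUE_MAX 64

/* Hypothetical FIFO of object names (simplified: no de-duplication). */
struct fetch_queue {
	char names[QUEUE_MAX][41];
	int head, tail;
};

/* Records the order of calls so the scheduling can be observed. */
static char call_log[1024];

static void log_call(char kind, const char *name)
{
	size_t len = strlen(call_log);
	snprintf(call_log + len, sizeof(call_log) - len, "%c:%s ", kind, name);
}

/* Stub: a real transport would start a request here and return at once. */
static void prefetch(const char *name) { log_call('P', name); }

/* Stub: a real transport would block here until the object has arrived. */
static int fetch(const char *name) { log_call('F', name); return 0; }

int enqueue(struct fetch_queue *q, const char *name)
{
	assert(q != NULL);
	if (q->tail == QUEUE_MAX)
		return -1;
	snprintf(q->names[q->tail++], sizeof(q->names[0]), "%s", name);
	prefetch(name);  /* the request goes out as soon as the need is known */
	return 0;
}

typedef void (*deps_fn)(struct fetch_queue *q, const char *name);

int run_queue(struct fetch_queue *q, deps_fn deps)
{
	while (q->head < q->tail) {
		const char *name = q->names[q->head++];
		if (fetch(name))  /* wait only for the oldest outstanding object */
			return -1;
		deps(q, name);    /* enqueue (and thereby prefetch) what it refers to */
	}
	return 0;
}

/* Toy dependency graph: commit "c" needs tree "t" and parent "p". */
void example_deps(struct fetch_queue *q, const char *name)
{
	if (strcmp(name, "c") == 0) {
		enqueue(q, "t");
		enqueue(q, "p");
	}
}
```

Starting from a queue holding only "c", the call trace is P:c F:c P:t P:p F:t F:p: both of the commit's dependencies are requested before the first of them is waited for, and only the dependency edges, not the object types, drive the order.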
Re: Git 1.0 Synopis (Draft v4)
On Mon, 15 Aug 2005, Junio C Hamano wrote: > Ryan Anderson <[EMAIL PROTECTED]> writes: > > > I was waiting until you said, "Ok, 1.00 tomorrow morning" > > Makes sense. There would be some weeks until that happens I am > afraid. It might be worth putting the list of things left to do before 1.0 in the tree (since they clearly covary), and it would be useful to know what you're thinking of as preventing the release at any particular stage. -Daniel *This .sig left intentionally blank*
Re: Cloning speed comparison
On Sat, 13 Aug 2005, Petr Baudis wrote: > Hello, > > I've wondered how slow the protocols other than rsync are, and the > (well, a bit dubious; especially wrt. caching on the remote side) > results are: > > git clone-pack:ssh 25s > git rsync 27s > git http-pull 47s > git dumb-http 54s > git ssh-pull 660s > > cogito clone-pack:ssh 35s (!) > cogito rsync 140s > cogito ssh-pull 480s > cogito http-pull extrapolated to about an hour! I should be able to get http-pull down to the neighborhood of (current) ssh-pull; http-pull is that slow (when the source repository isn't packed) because it's entirely sequential, rather than overlapping requests like ssh-pull now does. I should also be able to get ssh-pull down to the area of clone-pack, but that's lower-priority, since there's clone-pack. (I've written an untested patch for local-pull, which I'll be testing, cleaning, and submitting tonight, assuming my newly-arrived monitor actually works.) > PS: > With the latest git version as of time of writing this: > $ time cg-clone git+ssh://[EMAIL PROTECTED]/home/pasky/WWW/dev/git/.g > cogito > ... > progress: 5759 objects, 10292457 bytes > $ time cg-clone http://localhost/~pasky/dev/git/.g cogito > ... > progress: 8681 objects, 14881571 bytes I've noticed that, with recent versions of ssh, connections sometimes don't actually disconnect at the end. In my experience, this occasionally happens with git, but always happens with scp, suggesting that it's an ssh bug of some sort; I've also only noticed this with openssh 3.9_p1 with some of Gentoo's -r2 patches. -Daniel *This .sig left intentionally blank*
Re: [OT?] git tools at SourceForge ?
On Sat, 13 Aug 2005, Martin Langhoff wrote: > >Yes, developers can just merge with each other directly > > I take it that you mean an exchange of patches that does not depend on > having public repos. What are the mechanisms available on that front, > other than patchbombs? If each developer has a trivial web server, they can put their output there, and everyone else can pull from it, because it only needs to serve static files out of a directory structure that the programs create regularly. Strictly speaking, this differs from a public repo only in that you don't advertise it beyond the other developers. But it's a within-system equivalent to posting a link to a web-hosted patch set, which people sometimes do to pass things around. > > And so I'd be thrilled to have some site like SF support it. > > Eduforge's charter is to host education-related projects, so that's > not a free-for-all-comers, but I'm considering git support, as our > usage of git is growing. If you contribute the git support to gforge, presumably similar hosting sites will pick it up before too long. -Daniel *This .sig left intentionally blank*
Re: [OT?] git tools at SourceForge ?
On Fri, 12 Aug 2005, Linus Torvalds wrote: > And it's possible that git usage won't expand all that much either. But > quite frankly, I think git is a lot better than CVS (or even SVN) by now, > and I wouldn't be surprised if it started getting some use outside of the > git-only and kernel projects once people start getting more used to it. > And so I'd be thrilled to have some site like SF support it. I certainly think it's going to happen; it's just not surprising that it hasn't happened yet. Once there's a stable release and some publicity, I'd expect SF to see it as worthwhile. But a hosting site with git-only shell access needs to know what the necessary programs are going to be, which we haven't committed to quite yet. -Daniel *This .sig left intentionally blank*
Re: [OT?] git tools at SourceForge ?
On Fri, 12 Aug 2005, Wolfgang Denk wrote: > This is somewhat off topic here, so I apologize, but I didn't know > any better place to ask: > > Has anybody any information if SourceForge is going to provide git / > cogito / ... for the projects they host? I asked SF, and they openend > a new Feature Request (item #1252867); the message I received sounded > as if I was the first person on the planet to ask... > > Am I really alone with this? The git architecture makes the central server less important, and it's easy to run your own. Also, kernel.org is providing space to a set of people with a large overlap with git users, since git hasn't been particularly publicized and kernel.org is hosting git. -Daniel *This .sig left intentionally blank*
Re: [PATCH] Add "--sign" option to git-format-patch-script
On Fri, 12 Aug 2005, Junio C Hamano wrote: > Good intentions, but I'd rather see these S-O-B lines in the > actual commit objects. Giving format-patch this option would > discourage people to do so. Maybe a patch to git commit would > be more appropriate, methinks. Maybe also something in format-patch to check that the commit has one? I, at least, tend to have unsigned commits for tracking stuff I've done but not cleaned up and signed ones that I want to send off as patches. I've confused the branches on occasion, although never when sending stuff, and it would be nice to have format-patch tell you if the commit didn't look right. -Daniel *This .sig left intentionally blank*
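A check of the kind suggested here only needs to find an S-O-B line in the commit message. A minimal sketch, assuming the message is already in memory as a NUL-terminated string; has_signoff() is a hypothetical name, not an existing git-format-patch-script option or git function.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical check: return 1 if some line of msg starts with
 * "Signed-off-by: ", 0 otherwise. Only line starts are considered, so
 * the phrase appearing mid-sentence does not count as a sign-off.
 */
int has_signoff(const char *msg)
{
	static const char tag[] = "Signed-off-by: ";
	const char *line = msg;

	assert(msg != NULL);
	while (*line) {
		if (strncmp(line, tag, sizeof(tag) - 1) == 0)
			return 1;
		line = strchr(line, '\n');
		if (!line)
			break;
		line++;  /* step past the newline to the next line */
	}
	return 0;
}
```

A tool could print a warning (rather than refuse) when this returns 0, which matches the "tell you if the commit didn't look right" behavior described above.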
Re: [PATCH] Re: git-http-pull broken in latest git
On Thu, 11 Aug 2005, Junio C Hamano wrote: > Daniel Barkalow <[EMAIL PROTECTED]> writes: > > > Petr Baudis <[EMAIL PROTECTED]> writes: > >> Yes, but cg-clone doesn't - it naively depended on the core git tools > >> actually, er.. working. ;-) > > Sorry about that. I used to have a wrapper to deal with packs > around http-pull before Daniel's pack enhancement, and yanking > it before really checking that enhanced http-pull actually > worked was my fault as well. It was actually the patches after the http-pull fixes (the ones for parallelizing pull.c) that broke things; one advantage to fixing local-pull would be that you can set up tests for it reasonably effectively, which would have caught the regression. > > At some point, I have to revisit getting git-ssh-* to generate exactly the > > required pack and transfer that, but that's an efficiency issue, not a > > correctness one, and shouldn't be relevant to the problem you're having. > > Wouldn't enhancing ssh-push to generate packs on the fly involve > reinventing send-pack and/or upload-pack? The idea is that you wouldn't have to identify what situation applied yourself; you could just invoke git-ssh-pull/git-ssh-push, and it would happen faster due to the compression benefits. The point is that scripts can just pick which git-*-pull to use based on the format of the remote branch address, without variation in behavior. > The same thing can be said about local-pull to a lesser degree. > Lesser because people, including Pasky who said so on the list > recently, would like its hard-linking behaviour, and its not > exploding the existing packs, which send-pack and upload-pack > would not give. So I would rate local-pull higher than > ssh-push/pull on the priority scale if I were doing them. This is a higher priority, but writing more than bugfixes is unpleasant at the moment due to my home workstation's monitor dying, so it'll probably be next week that I'll get to it.
The git-ssh-* stuff is longer-term, since it works now, and isn't even all that slow with the overlapping requests. You could, actually, probably do the local-pull fix if you wanted. I seem to recall that being your code originally; you just need to have fetch() identify that an object is in a pack, copy/link/symlink the index and pack instead of the object file, and add the pack to the list of registered packs. I've mostly been failing to deal with reading an index file that is in some directory that hasn't been registered as somewhere to read from (i.e. the source repository). -Daniel *This .sig left intentionally blank*
[PATCH] Re: git-http-pull broken in latest git
On Fri, 12 Aug 2005, Petr Baudis wrote: > Dear diary, on Fri, Aug 12, 2005 at 01:21:46AM CEST, I got a letter > where Junio C Hamano <[EMAIL PROTECTED]> told me that... > > Petr Baudis <[EMAIL PROTECTED]> writes: > > > > > $ git-cat-file commit bf570303153902ec3d85570ed24515bcf8948848 | grep tree > > > tree 41f10531f1799bbb31a1e0f7652363154ce96f45 > > > $ git-read-tree 41f10531f1799bbb31a1e0f7652363154ce96f45 > > > fatal: failed to unpack tree object > > > 41f10531f1799bbb31a1e0f7652363154ce96f45 > > > > > Kaboom. I think the issue might be that the reference dependency tree > > > building is broken and it should've pulled the other pack as well. > > > > Last time I checked, git-http-pull did not utilize the pack > > dependency information, which indeed is wrong. When it decides > > to fetch a pack instead of an asked-for object, it should check > > which commits the pack expects to have in your local repository > > and add them to its list of things to slurp. > > > > A good news is that "git clone" as a whole works fine. > > Yes, but cg-clone doesn't - it naively depended on the core git tools > actually, er.. working. ;-) > > This became a nightmare to me by now - on two machines I tried to pull > to over HTTP, that failed miserably, and I got stuck until I applied > Daniel's patch there (and cleaned up after previous git-http-pulls). > > So I have this packless git-pb repository and suspecting no evil, I pull > from you (thankfully I have .git/objects/pack there from some historical > pulls). I do a merge commit: > > packed >... J > packed \ >> M > / >... P > > Now I want to pull on another machine. That pulls M and then fails since > I have no .git/objects/pack there, bummer. So I mkdir it, but get no > further w/o Daniel's patch - for git-*-pull, J is missing and that's it. 
> So I apply the patch, and get friendly > > error: Unable to determine requirements of type (null) for M > > and only after I delete M from the database, I finally succeed with > git-http-pull. (That was with --repair.) That's not good since this > might occur even naturally when the pull is interrupted. Insufficient testing on my part; patch at the end. > With git-ssh-pull, the situation is even more vexing - it refuses to > fetch the packs for some reason yet unknown to me (I will debug it > tomorrow). git-ssh-pull doesn't deal in packs; it gets individual objects out of packs, which git-ssh-push (on the remote side) should be extracting. Perhaps the git-ssh-push on the remote side is from before I made packs work (it used to need to have the files for objects it was sending). At some point, I have to revisit getting git-ssh-* to generate exactly the required pack and transfer that, but that's an efficiency issue, not a correctness one, and shouldn't be relevant to the problem you're having.

---
[PATCH] Also parse objects we already have

In the case where we don't know from context what type an object is,
but we don't have to fetch it, we need to parse it to determine the
type before processing it.

Signed-off-by: Daniel Barkalow <[EMAIL PROTECTED]>
---

 pull.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

b8c382e76da25f45ff86176a6a6affdd9a28d60b
diff --git a/pull.c b/pull.c
--- a/pull.c
+++ b/pull.c
@@ -127,6 +127,7 @@ static int process(unsigned char *sha1,
 {
 	struct object *obj = lookup_object_type(sha1, type);
 	if (has_sha1_file(sha1)) {
+		parse_object(sha1);
 		/* We already have it, so we should scan it now. */
 		return process_object(obj);
 	}
Re: [PATCH] Re: git-http-pull broken in latest git
On Thu, 11 Aug 2005, Junio C Hamano wrote: > Daniel Barkalow <[EMAIL PROTECTED]> writes: > > > It should work anyway,... > > That is true. Please forget about the "recommendation" to slurp > packs and not falling back on commit walker. > > Thanks for the patch. No problem; I had been wondering what the rest of those lines were about anyway. -Daniel *This .sig left intentionally blank*
[PATCH] Re: git-http-pull broken in latest git
On Thu, 11 Aug 2005, Junio C Hamano wrote: > Petr Baudis <[EMAIL PROTECTED]> writes: > > > $ git-cat-file commit bf570303153902ec3d85570ed24515bcf8948848 | grep tree > > tree 41f10531f1799bbb31a1e0f7652363154ce96f45 > > $ git-read-tree 41f10531f1799bbb31a1e0f7652363154ce96f45 > > fatal: failed to unpack tree object 41f10531f1799bbb31a1e0f7652363154ce96f45 > > > Kaboom. I think the issue might be that the reference dependency tree > > building is broken and it should've pulled the other pack as well. > > Last time I checked, git-http-pull did not utilize the pack > dependency information, which indeed is wrong. Is there documentation on the format? > When it decides to fetch a pack instead of an asked-for object, it > should check which commits the pack expects to have in your local > repository and add them to its list of things to slurp. It should work anyway, except that I messed up some logic in the parallel pull stuff; when it finds it has something already, it ignores it entirely, rather than processing it. The following patch fixes this. --- [PATCH] Fix parallel pull dependency tracking. It didn't refetch an object it already had (good), but didn't process it, either (bad). Synchronously process anything you already have.
Signed-off-by: Daniel Barkalow <[EMAIL PROTECTED]>
---

 pull.c |   57 -
 1 files changed, 32 insertions(+), 25 deletions(-)

9b6b4b259c6b00d5b2502c158bc800d7623352bc
diff --git a/pull.c b/pull.c
--- a/pull.c
+++ b/pull.c
@@ -98,12 +98,38 @@ static int process_tag(struct tag *tag)
 static struct object_list *process_queue = NULL;
 static struct object_list **process_queue_end = &process_queue;
 
-static int process(unsigned char *sha1, const char *type)
+static int process_object(struct object *obj)
 {
-	struct object *obj;
-	if (has_sha1_file(sha1))
+	if (obj->type == commit_type) {
+		if (process_commit((struct commit *)obj))
+			return -1;
+		return 0;
+	}
+	if (obj->type == tree_type) {
+		if (process_tree((struct tree *)obj))
+			return -1;
 		return 0;
-	obj = lookup_object_type(sha1, type);
+	}
+	if (obj->type == blob_type) {
+		return 0;
+	}
+	if (obj->type == tag_type) {
+		if (process_tag((struct tag *)obj))
+			return -1;
+		return 0;
+	}
+	return error("Unable to determine requirements "
+		     "of type %s for %s",
+		     obj->type, sha1_to_hex(obj->sha1));
+}
+
+static int process(unsigned char *sha1, const char *type)
+{
+	struct object *obj = lookup_object_type(sha1, type);
+	if (has_sha1_file(sha1)) {
+		/* We already have it, so we should scan it now. */
+		return process_object(obj);
+	}
 	if (object_list_contains(process_queue, obj))
 		return 0;
 	object_list_insert(obj, process_queue_end);
@@ -134,27 +160,8 @@ static int loop(void)
 			return -1;
 		if (!obj->type)
 			parse_object(obj->sha1);
-		if (obj->type == commit_type) {
-			if (process_commit((struct commit *)obj))
-				return -1;
-			continue;
-		}
-		if (obj->type == tree_type) {
-			if (process_tree((struct tree *)obj))
-				return -1;
-			continue;
-		}
-		if (obj->type == blob_type) {
-			continue;
-		}
-		if (obj->type == tag_type) {
-			if (process_tag((struct tag *)obj))
-				return -1;
-			continue;
-		}
-		return error("Unable to determine requirements "
-			     "of type %s for %s",
-			     obj->type, sha1_to_hex(obj->sha1));
+		if (process_object(obj))
+			return -1;
 	}
 	return 0;
 }
Re: Bootstrapping into git, commit gripes at me
On Mon, 11 Jul 2005, Junio C Hamano wrote: > Linus Torvalds <[EMAIL PROTECTED]> writes: > > > But what about the branch name? Should we just ask the user? Together with > > a flag, like > > > > git checkout -b new-branch v2.6.12 > > > > for somebody who wants to specify the branch name? Or should we pick a > > random name and add a helper function to rename a branch later? > > > > Opinions? > > How about treating "master" a temporary thing --- "whatever I > happen to be working on right now"? That conflicts with my usage, where I have a single repository for all of my working directories, with .git/refs and .git/objects being symlinks to it, but .git/HEAD being different for each branch. The stuff in objects/ and refs/ really shouldn't depend on what you're currently doing for this reason. My way of thinking of "master" is that it's a real branch, which is for all of the situations where you aren't using a specially-designated branch. Many people only do stuff that's not designated specially; Jeff only does stuff that is designated specially. But if you do both, you'll want master to be left alone while you work on the side branch. -Daniel *This .sig left intentionally blank*
Re: [PATCH 2/2] Demo support for packs via HTTP
On Mon, 11 Jul 2005, Darrin Thompson wrote: > On Sun, 2005-07-10 at 15:56 -0400, Daniel Barkalow wrote: > > + curl_easy_setopt(curl, CURLOPT_FILE, indexfile); > > + curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, fwrite); > > + curl_easy_setopt(curl, CURLOPT_URL, url); > > I was hoping to send in a patch which would turn on user auth and turn > off ssl peer verification. > > Your (preliminary obviously) patch puts curl handling in two places. Is > there a place were I can safely start working on adding the needed > setopts? If I understand the curl documentation, you should be able to set options on the curl object when it has just been created, if those options aren't going to change between requests. Note that I make requests from multiple places, but I use the same curl object for all of them. -Daniel *This .sig left intentionally blank*
Re: [PATCH 0/2] Support for packs in HTTP
On Mon, 11 Jul 2005, Linus Torvalds wrote: > > > On Mon, 11 Jul 2005, Daniel Barkalow wrote: > > On Sun, 10 Jul 2005, Linus Torvalds wrote: > > > > > > > > You really _mustn't_ try to create the pack directly to the > > > $GIT_DIR/objects/pack subdirectory - that would make git itself start > > > possibly using that pack before the index is all done, and that would be > > > just wrong and nasty. > > > > > > So you really should _always_ generate the pack somewhere else, and then > > > move it (pack file first, index file second). > > > > It's currently fine ignoring index files without corresponding > > pack files (sha1_file.c, line 470). > > That doesn't help. Well, it means that the order you move them doesn't matter, because it will ignore the pair if either hasn't been moved. > Redgardless of which order you write them (and you _will_ write the > pack-file first), you'll find that at some point you have both files, but > one or the other isn't fully written, ie they are unusable. (Off topic: note that git-http-pull writes the _index_ first, because it fetches it to determine if it should fetch the pack) > And yes, you can handle that by always checking the SHA1 of the files when > you open them, but the fact is, you shouldn't need to, just to use it. > Checking the SHA1 of the pack-file in particular is very expensive (since > it's potentially a huge file, and you don't even want to read all of it). IIRC, we check the size of the pack file and there are hashes around the ends of the two files which have to match; but this is a die() check, not an ignore check, so we just crash with a clear error message rather than doing crazy stuff (like reading from beyond the end of the mmap). > So that's what I decided the rule is: never ever have a partial file, and > thus you can by definition use them immediately when you see both files. > > But that requires that you write them under another name than the final > one. 
And since you want that _anyway_ for other uses, you don't hide that > inside "git-pack-objects", but you make it an exported interface. We should never write anything under the final name, anyway, for just this reason; we already use open/write/close/rename for objects, refs, and cache (maybe not working directory files, though). I think we're actually agreeing on this. My position is that the temporary location should be something like {final-name}.part, such that it doesn't match *.idx or *.pack beforehand (so it doesn't look like a complete file that you might want to send to someone) and it doesn't have to worry about EXDEV on the rename. Also, I would ideally like to be able to resume an interrupted download, which means that it would have to find the partial file in a predictable location, given what it's supposed to contain. -Daniel *This .sig left intentionally blank*
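The {final-name}.part convention proposed here (a temporary name that cannot match *.pack or *.idx, in the same directory so that rename() cannot fail with EXDEV) comes down to open/write/close/rename. A minimal sketch; write_then_publish() is a hypothetical helper, and unlike the resumable download the message hopes for, it simply truncates any existing .part file.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: write len bytes to "<final_path>.part", then
 * rename() it to final_path. Scanners looking for *.pack or *.idx never
 * see the partial file, and because it sits in the same directory as
 * the final name, rename() does not cross filesystems (no EXDEV).
 */
int write_then_publish(const char *final_path, const void *buf, size_t len)
{
	char part[4096];
	ssize_t n;
	int fd;

	assert(final_path != NULL);
	if (snprintf(part, sizeof(part), "%s.part", final_path) >= (int)sizeof(part))
		return -1;
	fd = open(part, O_WRONLY | O_CREAT | O_TRUNC, 0666);
	if (fd < 0)
		return -1;
	n = write(fd, buf, len);
	/* close() is evaluated first, so the descriptor is always released. */
	if (close(fd) < 0 || n < 0 || (size_t)n != len) {
		unlink(part);
		return -1;
	}
	/* Within one filesystem, the final name appears all at once. */
	if (rename(part, final_path) < 0) {
		unlink(part);
		return -1;
	}
	return 0;
}
```

Because the temporary name is derived from the final one, a downloader that wanted to resume could look for the same .part path and stat() it to see how much has already arrived; that logic is left out of this sketch.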
Re: [PATCH 0/2] Support for packs in HTTP
On Sun, 10 Jul 2005, Linus Torvalds wrote: > On Sun, 10 Jul 2005, Daniel Barkalow wrote: > > > > Perhaps git-pack-objects should have the base as a optional argument, > > with a default of the filename in $GIT_DIR/objects/pack and an option > > for sending just the pack file to stdout? > > You really _mustn't_ try to create the pack directly to the > $GIT_DIR/objects/pack subdirectory - that would make git itself start > possibly using that pack before the index is all done, and that would be > just wrong and nasty. > > So you really should _always_ generate the pack somewhere else, and then > move it (pack file first, index file second). That's currently fine: git ignores index files without corresponding pack files (sha1_file.c, line 470). Do you want to make the constraint that the pack/ directory doesn't have index files for packs that aren't also there? (I've been putting the index files for packs that it might be possible to get there, and relying on the above check to make sure that they don't affect anything if it hasn't fetched the pack.) Of course, we should never write to files in locations that anything looks at; we want everything to appear atomically, completely written and verified. But there's nothing wrong with having the C code place the objects, which is certainly going to be necessary in the case of downloading them by HTTP, since the program will want to place them and enable them while running. -Daniel *This .sig left intentionally blank*
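The rule relied on here, that an index is ignored unless its pack is also present, amounts to deriving the .pack name from the .idx name and checking that the file exists. A sketch under that assumption; idx_has_pack() is a hypothetical helper, not the sha1_file.c code the message cites. Note that it only tests existence, which is why the objection quoted above still matters: a pack that is present but half-written would pass this check.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Hypothetical check: given ".../pack-<sha1>.idx", return 1 only if the
 * name ends in ".idx" and the sibling ".../pack-<sha1>.pack" exists as a
 * regular file; otherwise the index should be ignored (return 0).
 */
int idx_has_pack(const char *idx_path)
{
	char pack_path[4096];
	size_t len;
	struct stat st;

	assert(idx_path != NULL);
	len = strlen(idx_path);
	if (len < 4 || strcmp(idx_path + len - 4, ".idx") != 0)
		return 0;
	if (len - 4 + strlen(".pack") + 1 > sizeof(pack_path))
		return 0;
	/* Replace the ".idx" suffix with ".pack". */
	memcpy(pack_path, idx_path, len - 4);
	strcpy(pack_path + len - 4, ".pack");
	return stat(pack_path, &st) == 0 && S_ISREG(st.st_mode);
}
```

With this rule, an index that arrives first (as with git-http-pull, which fetches the index to decide whether to fetch the pack) is harmless until the pack is moved into place.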
Re: [PATCH 0/2] Support for packs in HTTP
On Sun, 10 Jul 2005, Linus Torvalds wrote: > > > On Sun, 10 Jul 2005, Daniel Barkalow wrote: > > > On Sun, 10 Jul 2005, Linus Torvalds wrote: > > > > > > Well, regardless, we want to be able to specify which directory to write > > > them to. We don't necessarily want to write them to the current working > > > directory, nor do we want to write them to their eventual destination in > > > .git/objects/pack. > > > > > > In fact, the main current user ("git repack") really wants to write them > > > to a temporary file, and one that isn't even called "pack-xxx", since it > > > ends up doing cleanup with > > > > > > rm -f .tmp-pack-* > > > > > > in case a previous re-pack was interrupted (in which case it simply cannot > > > know what the exact name was supposed to be). > > > > > > So the "basename" ends up being necessary and meaningful regardless. We > > > do > > > _not_ want to remove that capability. > > > > Shouldn't we do the same thing we do with object files? I don't see any > > difference in desired behavior. > > Well, the main difference is that pack-files can be used for many things. > > For example, a web interface for getting a pack-file between two releases: > say you knew you had version xyzzy, and you want to get version xyzzy+1, > you could do that through webgit some way even with a "stupid" interface. > Kay already had some patch to generate pack-files for something. > > The point being that pack-files are _not_ like objects. Pack-files are > meant for communication. Having them in .git/objects/pack is just one > special case. Okay, I can see the use for them getting written to arbitrary paths; but I think that it's worth having a canonical location for a pack that's being used by the system (either not having been sent anywhere, or after having been received). Perhaps git-pack-objects should have the base as an optional argument, with a default of the filename in $GIT_DIR/objects/pack and an option for sending just the pack file to stdout?
I think that covers everything in order of usefulness, and means that the program deals with any filename that the user doesn't know in advance. > > Why not checksum it in a predictable order, either that of the pack file > > or the index? We do care that it's something verifiable, so that people > > can't cause intentional collisions (for a DoS) just by naming their packs > > after existing packs that users might not have downloaded yet. > > We could sha1-sum the "sorted by SHA1" list, I guess. That'd be good; then git-http-pull can validate the hash on the index and be sure that a matching pack file from a different location still has the same contents. -Daniel *This .sig left intentionally blank*
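Linus's "sorted by SHA1" suggestion can be made concrete. This is a minimal, self-contained C sketch that assumes 20-byte binary object names. It sorts them with qsort() and then takes the SHA-1 of the concatenation, so the resulting name depends only on which objects the pack holds, not on the order in which they were written. The compact SHA-1 routine (following the FIPS 180-4 description) is included only so the example builds without external libraries. `pack_name_from_objects()` is a hypothetical name, and nothing here is claimed to be what git-pack-objects actually does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define ROL(x, n) (((x) << (n)) | ((x) >> (32 - (n))))

/* SHA-1 compression of one 64-byte block (FIPS 180-4). */
static void sha1_block(uint32_t h[5], const unsigned char *p)
{
	uint32_t w[80], a, b, c, d, e, f, k, t;
	int i;

	for (i = 0; i < 16; i++)
		w[i] = (uint32_t)p[4 * i] << 24 | (uint32_t)p[4 * i + 1] << 16 |
		       (uint32_t)p[4 * i + 2] << 8 | (uint32_t)p[4 * i + 3];
	for (; i < 80; i++)
		w[i] = ROL(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1);
	a = h[0]; b = h[1]; c = h[2]; d = h[3]; e = h[4];
	for (i = 0; i < 80; i++) {
		if (i < 20)      { f = (b & c) | (~b & d);          k = 0x5A827999; }
		else if (i < 40) { f = b ^ c ^ d;                   k = 0x6ED9EBA1; }
		else if (i < 60) { f = (b & c) | (b & d) | (c & d); k = 0x8F1BBCDC; }
		else             { f = b ^ c ^ d;                   k = 0xCA62C1D6; }
		t = ROL(a, 5) + f + e + k + w[i];
		e = d; d = c; c = ROL(b, 30); b = a; a = t;
	}
	h[0] += a; h[1] += b; h[2] += c; h[3] += d; h[4] += e;
}

/* One-shot SHA-1 of `len` bytes; padding and bit length are appended here. */
static void sha1(const unsigned char *data, size_t len, unsigned char out[20])
{
	uint32_t h[5] = { 0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0 };
	uint64_t bits = (uint64_t)len * 8;
	unsigned char block[64];
	size_t i, rem;

	for (i = 0; i + 64 <= len; i += 64)
		sha1_block(h, data + i);
	rem = len - i;
	memset(block, 0, sizeof(block));
	if (rem)
		memcpy(block, data + i, rem);
	block[rem] = 0x80;
	if (rem >= 56) {	/* no room left for the length: use one more block */
		sha1_block(h, block);
		memset(block, 0, sizeof(block));
	}
	for (i = 0; i < 8; i++)
		block[63 - i] = (unsigned char)(bits >> (8 * i));
	sha1_block(h, block);
	for (i = 0; i < 20; i++)
		out[i] = (unsigned char)(h[i / 4] >> (24 - 8 * (i % 4)));
}

static int cmp_name(const void *x, const void *y)
{
	return memcmp(x, y, 20);	/* object names compare as raw bytes */
}

/* Hypothetical: sort the n object names in place, then hash their
 * concatenation, so any permutation of the same objects gives one name. */
static void pack_name_from_objects(unsigned char (*names)[20], size_t n,
				   unsigned char out[20])
{
	qsort(names, n, sizeof(names[0]), cmp_name);
	sha1(&names[0][0], n * sizeof(names[0]), out);
}
```

A reader holding only the index (whose entries are already sorted) could recompute the same value and compare it with the pack's name. That is the kind of check Daniel wants git-http-pull to be able to make.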
Re: [PATCH 0/2] Support for packs in HTTP
On Sun, 10 Jul 2005, Linus Torvalds wrote: > On Sun, 10 Jul 2005, Junio C Hamano wrote: > > > > So I would suggest either: > > > > - dropping the packname parameter from git-pack-objects. Make > > the packs always named pack-X{40}.pack (or just X{40}.pack); > > Well, regardless, we want to be able to specify which directory to write > them to. We don't necessarily want to write them to the current working > directory, nor do we want to write them to their eventual destination in > .git/objects/pack. > > In fact, the main current user ("git repack") really wants to write them > to a temporary file, and one that isn't even called "pack-xxx", since it > ends up doing cleanup with > > rm -f .tmp-pack-* > > in case a previous re-pack was interrupted (in which case it simply cannot > know what the exact name was supposed to be). > > So the "basename" ends up being necessary and meaningful regardless. We do > _not_ want to remove that capability. Shouldn't we do the same thing we do with object files? I don't see any difference in desired behavior. > > also have verify-pack to verify the name of the packfile, > > and make sure X{40} part of the name matches what it claims > > to contain; > > Now, that would be fine, but it can't be done. Not the way things are laid > out. A SHA1 checksum depends on the order the data was checksummed in, and > we don't even save that. Why not checksum it in a predictable order, either that of the pack file or the index? We do care that it's something verifiable, so that people can't cause intentional collisions (for a DoS) just by naming their packs after existing packs that users might not have downloaded yet. -Daniel *This .sig left intentionally blank*
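Junio's verify-pack proposal has two halves: computing a digest in a reproducible order, and checking that the file name spells it out. This C sketch covers only the second half. It assumes the 20-byte digest has already been computed elsewhere (for example, over the sorted object list) and uses the `pack-X{40}.pack` form from the quoted mail. `pack_name_matches()` is a hypothetical helper, not an existing git function.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical verify-pack-style check: does the file name (any leading
 * directory is ignored) have the form "pack-<40 hex digits>.pack" with the
 * hex spelling out `expected`, a 20-byte digest computed elsewhere? */
static int pack_name_matches(const char *path, const unsigned char expected[20])
{
	const char *base = strrchr(path, '/');
	char hex[41];
	int i;

	base = base ? base + 1 : path;
	if (strlen(base) != 5 + 40 + 5 ||
	    strncmp(base, "pack-", 5) != 0 || strcmp(base + 45, ".pack") != 0)
		return 0;
	for (i = 0; i < 20; i++)
		snprintf(hex + 2 * i, 3, "%02x", (unsigned)expected[i]);
	return strncmp(base + 5, hex, 40) == 0;	/* lower-case hex only */
}
```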
[PATCH 1/2] write_sha1_to_fd()
Add write_sha1_to_fd(), which writes an object to a file descriptor. This includes support for unpacking it and recompressing it.

Signed-off-by: Daniel Barkalow <[EMAIL PROTECTED]>
---
commit 264ff9f3dcde5553728b34fa08e04643b2b55946
tree 353fe33ae9c7265d7b685bca864d657e3efe2849
parent c3eb461762b1d65e424fc4ede6a1d4f3e0a679f7
author Daniel Barkalow <[EMAIL PROTECTED]> 1121033477 -0400
committer Daniel Barkalow <[EMAIL PROTECTED](none)> 1121033477 -0400

Index: cache.h
===
--- 545ef8191b517b7f9e4ea558edaf526038ed1895/cache.h (mode:100644 sha1:719a77dfabb24e58abd21b7f3a4b846a114e000a)
+++ 353fe33ae9c7265d7b685bca864d657e3efe2849/cache.h (mode:100644 sha1:38dac6d6a413f1c788e5331ef4741fc15d72d9bd)
@@ -187,6 +187,7 @@
 extern int read_tree(void *buffer, unsigned long size, int stage);
 extern int write_sha1_from_fd(const unsigned char *sha1, int fd);
+extern int write_sha1_to_fd(int fd, const unsigned char *sha1);
 extern int has_sha1_pack(const unsigned char *sha1);
 extern int has_sha1_file(const unsigned char *sha1);
Index: sha1_file.c
===
--- 545ef8191b517b7f9e4ea558edaf526038ed1895/sha1_file.c (mode:100644 sha1:27136fdba0fbf2dd943f2634cb49660cdbf95ec4)
+++ 353fe33ae9c7265d7b685bca864d657e3efe2849/sha1_file.c (mode:100644 sha1:08560b2c7a6dff400a46160501c247081f9bb4c7)
@@ -1326,6 +1326,65 @@
 	return 0;
 }
+int write_sha1_to_fd(int fd, const unsigned char *sha1)
+{
+	ssize_t size;
+	unsigned long objsize;
+	int posn = 0;
+	char *buf = map_sha1_file_internal(sha1, &objsize, 0);
+	z_stream stream;
+	if (!buf) {
+		unsigned char *unpacked;
+		unsigned long len;
+		char type[20];
+		char hdr[50];
+		int hdrlen;
+		// need to unpack and recompress it by itself
+		unpacked = read_packed_sha1(sha1, type, &len);
+
+		hdrlen = sprintf(hdr, "%s %lu", type, len) + 1;
+
+		/* Set it up */
+		memset(&stream, 0, sizeof(stream));
+		deflateInit(&stream, Z_BEST_COMPRESSION);
+		size = deflateBound(&stream, len + hdrlen);
+		buf = xmalloc(size);
+
+		/* Compress it */
+		stream.next_out = buf;
+		stream.avail_out = size;
+
+		/* First header.. */
+		stream.next_in = hdr;
+		stream.avail_in = hdrlen;
+		while (deflate(&stream, 0) == Z_OK)
+			/* nothing */;
+
+		/* Then the data itself.. */
+		stream.next_in = unpacked;
+		stream.avail_in = len;
+		while (deflate(&stream, Z_FINISH) == Z_OK)
+			/* nothing */;
+		deflateEnd(&stream);
+
+		objsize = stream.total_out;
+	}
+
+	do {
+		size = write(fd, buf + posn, objsize - posn);
+		if (size <= 0) {
+			if (!size) {
+				fprintf(stderr, "write closed");
+			} else {
+				perror("write ");
+			}
+			return -1;
+		}
+		posn += size;
+	} while (posn < objsize);
+	return 0;
+}
+
 int write_sha1_from_fd(const unsigned char *sha1, int fd)
 {
 	char *filename = sha1_file_name(sha1);
Index: ssh-push.c
===
--- 545ef8191b517b7f9e4ea558edaf526038ed1895/ssh-push.c (mode:100644 sha1:090d6f9f8fbde2d736ac5bf563415b0fa402b5aa)
+++ 353fe33ae9c7265d7b685bca864d657e3efe2849/ssh-push.c (mode:100644 sha1:aac70af514e0dc5507fa4997ebad54352c973215)
@@ -7,13 +7,13 @@
 static unsigned char local_version = 1;
 static unsigned char remote_version = 0;
+static int verbose = 0;
+
 static int serve_object(int fd_in, int fd_out)
 {
 	ssize_t size;
-	int posn = 0;
 	unsigned char sha1[20];
-	unsigned long objsize;
-	void *buf;
 	signed char remote;
+	int posn = 0;
 	do {
 		size = read(fd_in, sha1 + posn, 20 - posn);
 		if (size < 0) {
@@ -25,12 +25,12 @@
 		posn += size;
 	} while (posn < 20);
-	/* fprintf(stderr, "Serving %s\n", sha1_to_hex(sha1)); */
+	if (verbose)
+		fprintf(stderr, "Serving %s\n", sha1_to_hex(sha1));
+
 	remote = 0;
-	buf = map_sha1_file(sha1, &objsize);
-
-	if (!buf) {
+	if (!has_sha1_file(sha1)) {
 		fprintf(stderr, "git-ssh-push: could not find %s\n", sha1_to_hex(sha1));
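The output loop at the end of write_sha1_to_fd() follows the usual "keep writing until everything is out" pattern. Here it is as a standalone sketch. The name `write_all()` is hypothetical, and the retry on EINTR is an addition that the patch does not have. The position is kept as a size_t so that it has the same type as the length it is compared against.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Write all `len` bytes of `buf` to `fd`, looping over short writes.
 * Returns 0 on success, -1 on error (with a message on stderr). */
static int write_all(int fd, const void *buf, size_t len)
{
	const char *p = buf;
	size_t posn = 0;

	while (posn < len) {
		ssize_t size = write(fd, p + posn, len - posn);
		if (size < 0 && errno == EINTR)
			continue;	/* interrupted by a signal: just retry */
		if (size <= 0) {
			if (size == 0)
				fprintf(stderr, "write closed\n");
			else
				perror("write");
			return -1;
		}
		posn += (size_t)size;
	}
	return 0;
}
```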
[PATCH 2/2] Remove map_sha1_file
Remove map_sha1_file(), now unused.

Signed-off-by: Daniel Barkalow <[EMAIL PROTECTED]>
---
commit c21a02262f770a25b005378e06354e582aa1bfd8
tree 7ac9fabe666f00f37572e7b349fdb859bf8a6491
parent 264ff9f3dcde5553728b34fa08e04643b2b55946
author Daniel Barkalow <[EMAIL PROTECTED]> 1121033599 -0400
committer Daniel Barkalow <[EMAIL PROTECTED](none)> 1121033599 -0400

Index: cache.h
===
--- 353fe33ae9c7265d7b685bca864d657e3efe2849/cache.h (mode:100644 sha1:38dac6d6a413f1c788e5331ef4741fc15d72d9bd)
+++ 7ac9fabe666f00f37572e7b349fdb859bf8a6491/cache.h (mode:100644 sha1:11ba95c8aa9202fa3b1a3cbc07bc976641cd1908)
@@ -167,7 +167,6 @@
 int safe_create_leading_directories(char *path);
 /* Read and unpack a sha1 file into memory, write memory to a sha1 file */
-extern void * map_sha1_file(const unsigned char *sha1, unsigned long *size);
 extern int unpack_sha1_header(z_stream *stream, void *map, unsigned long mapsize, void *buffer, unsigned long size);
 extern int parse_sha1_header(char *hdr, char *type, unsigned long *sizep);
 extern int sha1_object_info(const unsigned char *, char *, unsigned long *);
Index: sha1_file.c
===
--- 353fe33ae9c7265d7b685bca864d657e3efe2849/sha1_file.c (mode:100644 sha1:08560b2c7a6dff400a46160501c247081f9bb4c7)
+++ 7ac9fabe666f00f37572e7b349fdb859bf8a6491/sha1_file.c (mode:100644 sha1:e082f2e6cb985caca11979311c291aa51d6c37fd)
@@ -578,8 +578,7 @@
 }
 static void *map_sha1_file_internal(const unsigned char *sha1,
-				    unsigned long *size,
-				    int say_error)
+				    unsigned long *size)
 {
 	struct stat st;
 	void *map;
@@ -587,8 +586,6 @@
 	char *filename = find_sha1_file(sha1, &st);
 	if (!filename) {
-		if (say_error)
-			error("cannot map sha1 file %s", sha1_to_hex(sha1));
 		return NULL;
 	}
@@ -602,8 +599,6 @@
 			break;
 		/* Fallthrough */
 	case 0:
-		if (say_error)
-			perror(filename);
 		return NULL;
 	}
@@ -620,11 +615,6 @@
 	return map;
 }
-void *map_sha1_file(const unsigned char *sha1, unsigned long *size)
-{
-	return map_sha1_file_internal(sha1, size, 1);
-}
-
 int unpack_sha1_header(z_stream *stream, void *map, unsigned long mapsize, void *buffer, unsigned long size)
 {
 	/* Get the data stream */
@@ -1112,7 +1102,7 @@
 	z_stream stream;
 	char hdr[128];
-	map = map_sha1_file_internal(sha1, &mapsize, 0);
+	map = map_sha1_file_internal(sha1, &mapsize);
 	if (!map) {
 		struct pack_entry e;
@@ -1151,7 +1141,7 @@
 	unsigned long mapsize;
 	void *map, *buf;
-	map = map_sha1_file_internal(sha1, &mapsize, 0);
+	map = map_sha1_file_internal(sha1, &mapsize);
 	if (map) {
 		buf = unpack_sha1_file(map, mapsize, type, size);
 		munmap(map, mapsize);
@@ -1331,7 +1321,7 @@
 	ssize_t size;
 	unsigned long objsize;
 	int posn = 0;
-	char *buf = map_sha1_file_internal(sha1, &objsize, 0);
+	char *buf = map_sha1_file_internal(sha1, &objsize);
 	z_stream stream;
 	if (!buf) {
 		unsigned char *unpacked;
[PATCH 0/2] Handing sending objects from packs
This series adds support for sending individual objects from packs in git-ssh-push and removes map_sha1_file. -Daniel *This .sig left intentionally blank*
[PATCH] Make --recover cause pull to trace everything
Make the --recover flag check the parents of commits which are already available. This is needed currently to deal with cases where a parent is pulled along with a commit (in a pack, e.g.) and references above that parent aren't also pulled together.

Signed-off-by: Daniel Barkalow <[EMAIL PROTECTED]>
---
commit 75e8c1be7a778e0a0fa119fe1bc408341932e7e5
tree ffbe708117543c356eb2981f1e0540b89b7a95e2
parent a7336ae514738f159dad314d6674961427f043a6
author Daniel Barkalow <[EMAIL PROTECTED]> 1121024019 -0400
committer Daniel Barkalow <[EMAIL PROTECTED](none)> 1121024019 -0400

Index: http-pull.c
===
--- 248f72f3e4dcb40693488b0c06f93d0b38122b8e/http-pull.c (mode:100644 sha1:1f9d60b9b1d5eed85b24d96c240666bbfc5a22ed)
+++ ffbe708117543c356eb2981f1e0540b89b7a95e2/http-pull.c (mode:100644 sha1:3fa56f08b0b8e7316afcaab3a7bfa3f2d26b550f)
@@ -146,7 +146,10 @@
 	int arg = 1;
 	while (arg < argc && argv[arg][0] == '-') {
-		if (argv[arg][1] == 't') {
+		if (argv[arg][1] == '-') {
+			if (!strcmp(argv[arg] + 2, "recover"))
+				careful = 1;
+		} else if (argv[arg][1] == 't') {
 			get_tree = 1;
 		} else if (argv[arg][1] == 'c') {
 			get_history = 1;
Index: local-pull.c
===
--- 248f72f3e4dcb40693488b0c06f93d0b38122b8e/local-pull.c (mode:100644 sha1:2f06fbee8b840a7ae642f5a22e2cb993687f3470)
+++ ffbe708117543c356eb2981f1e0540b89b7a95e2/local-pull.c (mode:100644 sha1:0d10c07844030bc7cb615cf916dce89592151be7)
@@ -116,7 +116,10 @@
 	int arg = 1;
 	while (arg < argc && argv[arg][0] == '-') {
-		if (argv[arg][1] == 't')
+		if (argv[arg][1] == '-') {
+			if (!strcmp(argv[arg] + 2, "recover"))
+				careful = 1;
+		} else if (argv[arg][1] == 't')
 			get_tree = 1;
 		else if (argv[arg][1] == 'c')
 			get_history = 1;
Index: pull.c
===
--- 248f72f3e4dcb40693488b0c06f93d0b38122b8e/pull.c (mode:100644 sha1:ed3078e3b27c62c07558fd94f339801cbd685593)
+++ ffbe708117543c356eb2981f1e0540b89b7a95e2/pull.c (mode:100644 sha1:d9763840c7ebcb1e5838c3b960695cafcca3ac73)
@@ -11,6 +11,7 @@
 const unsigned char *current_ref = NULL;
+int careful = 0;
 int get_tree = 0;
 int get_history = 0;
 int get_all = 0;
@@ -91,7 +92,8 @@
 	if (get_history) {
 		struct commit_list *parents = obj->parents;
 		for (; parents; parents = parents->next) {
-			if (has_sha1_file(parents->item->object.sha1))
+			if (!careful &&
+			    has_sha1_file(parents->item->object.sha1))
 				continue;
 			if (make_sure_we_have_it(NULL, parents->item->object.sha1)) {
Index: pull.h
===
--- 248f72f3e4dcb40693488b0c06f93d0b38122b8e/pull.h (mode:100644 sha1:e173ae3337c4465da87d849f4e5c9da203fdf01d)
+++ ffbe708117543c356eb2981f1e0540b89b7a95e2/pull.h (mode:100644 sha1:d1076468b71b31dd5e59ec55d98de830cf9df60e)
@@ -21,6 +21,12 @@
 /* If set, the hash that the current value of write_ref must be. */
 extern const unsigned char *current_ref;
+/*
+ * Set to check on everything, instead of stopping at points where we think
+ * we must have everything.
+ */
+extern int careful;
+
 /* Set to fetch the target tree. */
 extern int get_tree;
Index: ssh-pull.c
===
--- 248f72f3e4dcb40693488b0c06f93d0b38122b8e/ssh-pull.c (mode:100644 sha1:26356dd7d84ea1bc9f7320b18562ed4117d4fac0)
+++ ffbe708117543c356eb2981f1e0540b89b7a95e2/ssh-pull.c (mode:100644 sha1:7ca4243f3bd84590e7bb94467fd5acccd7d4d6f9)
@@ -61,7 +61,10 @@
 	const char *prog = getenv("GIT_SSH_PUSH") ? : "git-ssh-push";
 	while (arg < argc && argv[arg][0] == '-') {
-		if (argv[arg][1] == 't') {
+		if (argv[arg][1] == '-') {
+			if (!strcmp(argv[arg] + 2, "recover"))
+				careful = 1;
+		} else if (argv[arg][1] == 't') {
 			get_tree = 1;
 		} else if (argv[arg][1] == 'c') {
 			get_history = 1;
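The option handling added to http-pull.c, local-pull.c and ssh-pull.c follows a single pattern. Below is a self-contained C sketch of that loop. It gathers the flags into a hypothetical `struct pull_opts` instead of the globals declared in pull.h, and it shows only the -t and -c short options; the real programs accept others that the hunks do not show. The sketch differs from the hunks in one deliberate way: it returns -1 for an unrecognized `--` option, whereas in the hunks such an option appears to be silently ignored.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical grouping of the flags the patch touches. */
struct pull_opts {
	int careful;		/* --recover: walk parents even if present */
	int get_tree;		/* -t */
	int get_history;	/* -c */
};

/* Returns the index of the first non-option argument, or -1 if an
 * option is not recognized. */
static int parse_pull_args(int argc, const char **argv, struct pull_opts *o)
{
	int arg = 1;

	while (arg < argc && argv[arg][0] == '-') {
		if (argv[arg][1] == '-') {
			if (!strcmp(argv[arg] + 2, "recover"))
				o->careful = 1;
			else
				return -1;
		} else if (argv[arg][1] == 't') {
			o->get_tree = 1;
		} else if (argv[arg][1] == 'c') {
			o->get_history = 1;
		} else {
			return -1;
		}
		arg++;
	}
	return arg;
}
```

With `careful` set, the parent loop in pull.c no longer skips parents that has_sha1_file() reports as present. That change is what lets --recover walk past them.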