from:"Daniel Barkalow"

Re: refspecs with '*' as part of pattern

2015-07-06 Thread Daniel Barkalow

On Mon, 6 Jul 2015, Junio C Hamano wrote:

> Jacob Keller writes:
> 
> > I've been looking at the refspecs for git fetch, and noticed that
> > globs are partially supported. I wanted to use something like:
> >
> > refs/tags/some-prefix-*:refs/tags/some-prefix-*
> >
> > as a refspec, so that I can fetch only tags which have a specific
> > prefix. I know that I could use namespaces to separate tags, but
> > unfortunately, I am unable to fix the tag format. The specific
> > repository in question is also generating several tags which are not
> > relevant to me, in formats that are not really useful for human
> > consumption. I am also not able to fix this less than useful practice.
> >
> > However, I noticed that refspecs only support * as a single component.
> > The match algorithm works perfectly fine, as documented in
> > abd2bde78bd9 ("Support '*' in the middle of a refspec")
> >
> > What is the reason for not allowing slightly more arbitrary
> > expressions? Obviously no more than one *...
> 
> I cannot seem to be able to find related discussions around that
> patch, so this is only my guess, but I suspect that this is to
> discourage people from doing something like:
> 
>   refs/tags/*:refs/tags/foo-*
> 
> which would open a can of worms (e.g. imagine you fetch with that
> refspec and then push with refs/tags/*:refs/tags/* back there;
> would you now get foo-v1.0.0 and foo-foo-v1.0.0 for their v1.0.0
> tag?) we'd prefer not having to worry about.

That wouldn't be it, since refs/tags/*:refs/tags/foo/* would have the same 
problem, assuming you didn't set up the push refspec carefully.

I think it was mostly that it would be too easy to accidentally do 
something you don't want by having some other character instead of a 
slash, like refs/heads/*:refs/heads-*.

Aside from the increased risk of hard-to-spot typos leading to very weird 
behavior, nothing actually goes wrong; in fact, I've been using git with 
that check removed for ages because I wanted a refspec like 
refs/heads/something-*:refs/heads/*. And it works fine as a local patch, 
since you don't need your refspec handling to interoperate with other 
repositories.
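
The single-'*' mapping discussed here can be sketched in a few lines of Python. This is an illustration only, not git's implementation: the function name map_refspec and the allow_partial flag (which stands in for running "with that check removed") are assumptions made for this example.

```python
from typing import Optional


def map_refspec(refspec: str, ref: str, allow_partial: bool = False) -> Optional[str]:
    """Map `ref` through a `src:dst` refspec with exactly one '*' per side.

    Returns the destination ref name, or None if `ref` does not match.
    Hypothetical helper: with allow_partial=False the '*' must stand for a
    whole path component, which models the check described in the thread.
    """
    src, dst = refspec.split(":", 1)
    if src.count("*") != 1 or dst.count("*") != 1:
        raise ValueError("expected exactly one '*' on each side")
    if not allow_partial:
        for side in (src, dst):
            # A bare '*' must appear as its own slash-separated component.
            if "*" not in side.split("/"):
                raise ValueError("'*' must be a whole path component: " + side)
    prefix, suffix = src.split("*")
    if len(ref) < len(prefix) + len(suffix):
        return None
    if not (ref.startswith(prefix) and ref.endswith(suffix)):
        return None
    # The text that '*' matched on the source side is substituted on the
    # destination side.
    matched = ref[len(prefix):len(ref) - len(suffix)]
    return dst.replace("*", matched)
```

With the check in place, refs/heads/*:refs/heads-* is rejected up front; with allow_partial=True, refs/heads/something-*:refs/heads/* maps refs/heads/something-topic to refs/heads/topic.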

-Daniel
*This .sig left intentionally blank*
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v4 00/13] New remote-hg helper

2012-10-31 Thread Daniel Barkalow

On Wed, 31 Oct 2012, Felipe Contreras wrote:

> Hi,
> 
> On Wed, Oct 31, 2012 at 7:59 PM, Jonathan Nieder wrote:
> > Felipe Contreras wrote:
> >> On Wed, Oct 31, 2012 at 7:20 PM, Johannes Schindelin wrote:
> >
> >>> I just tested this with junio/next and it seems this issue is still
> >>> unfixed: instead of
> >>>
> >>> reset refs/heads/blub
> >>> from e7510461b7db54b181d07acced0ed3b1ada072c8
> >>>
> >>> I get
> >>>
> >>> reset refs/heads/blub
> >>> from :0
> >>>
> >>> when running "git fast-export ^master blub".
> >>
> >> That is not a problem. It has been discussed extensively, and the
> >> consensus seems to be that such command should throw nothing:
> >>
> >> http://article.gmane.org/gmane.comp.version-control.git/208729
> >
> > Um.  Are you claiming I have said that "git fast-export ^master blub"
> > should silently emit nothing?  Or has this been discussed extensively
> > with someone else?
> 
> Maybe I misunderstood when you said:
> > A patch meeting the above description would make perfect sense to me.
> 
> Anyway, when you have:
> 
> % git fast-export ^next next^{commit}
> # nothing
> % git fast-export ^next next~0
> # nothing
> % git fast-export ^next next~1
> # nothing
> % git fast-export ^next next~2
> # nothing
> 
> It only makes sense that:
> 
> % git fast-export ^next next
> # nothing
> 
> It doesn't get any more obvious than that. But to each his own.

I think that may be true where you have "next" in both places, but I  
think:

$ git checkout -b new-branch master
$ git fast-export ^master new-branch

ought to emit no "commit" lines, but needs to emit a "reset" line. After 
all, you haven't told fast-export that the ref "new-branch" is up to date, 
and you have told it that you want it to be exported. If you create a new 
branch off of an existing commit, don't change it, and push it to hg, it 
shouldn't be up to remote-hg to figure out what should happen with no 
input; it should get a:

reset refs/heads/new-branch
from [something]

I don't know why Johannes seems to want [something] not to be a mark 
reference (unless he's complaining about getting an invalid mark 
reference when there aren't any marks defined), but surely something of 
the above form is necessary to tell remote-hg to create the new branch.

I think it would be worth testing that:

$ git checkout -b new-branch master
$ git push hg new-branch

creates the new branch successfully (which I think it does, but wouldn't 
if "git fast-export ^master new-branch" actually returned nothing; 
parsed_refs gets it from the reset line).

AFAICT, your code relies on getting the behavior that fast-export actually 
gives, not the behavior you seem to want or the behavior Johannes seems to 
want. And the reason that you don't need any changes to fast-export is 
that your process maps marks instead of sha1s.
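
To make the role of the "reset" line concrete, here is a small Python sketch of how a consumer of the stream might learn which refs were requested. It is not remote-hg's code: refs_in_stream is a hypothetical helper, and it deliberately ignores the possibility of "reset"-looking lines inside data blocks.

```python
def refs_in_stream(stream: str) -> dict:
    """Return {ref: from-value or None} for each `reset` command in the stream.

    Simplified, hypothetical parser: it looks only at `reset <ref>` lines and
    an immediately following `from <value>` line.
    """
    refs = {}
    current = None
    for line in stream.splitlines():
        if line.startswith("reset "):
            current = line[len("reset "):]
            refs[current] = None
        elif line.startswith("from ") and current is not None:
            refs[current] = line[len("from "):]
            current = None
        else:
            current = None
    return refs


# A stream with no "commit" lines still names the branch to create...
print(refs_in_stream("reset refs/heads/new-branch\nfrom :0\n\n"))
# ...whereas a completely empty stream gives the helper nothing to act on.
print(refs_in_stream(""))
```

Under these assumptions, an empty stream leaves the helper with no way to know that new-branch was requested at all, which is the point made above.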

-Daniel
*This .sig left intentionally blank*

Re: git-clone ignores umask for working tree

2012-07-06 Thread Daniel Barkalow

On Fri, 6 Jul 2012, Alex Riesen wrote:

> Hi list,
> 
> when git-clone was built in, its treatment of umask has changed: the shell
> version respected umask for newly created directories by using plain mkdir(1),
> and the builtin version just uses mkdir(work_tree, 0755).
>
> Is it intentional?

I have the vague feeling that it was intentional, but it's entirely 
plausible that I just overlooked that mkdir(2) applies umask and went for 
the mode that you normally want. I don't think there's any particular need 
for this operation to be more restrictive than umask.
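
The interaction between the mode argument and umask can be checked with a short Python sketch (POSIX only). The helper name mkdir_mode is made up for this example; os.mkdir is a thin wrapper over mkdir(2), so the kernel applies the umask to the requested mode in the same way.

```python
import os
import tempfile


def mkdir_mode(requested: int, umask: int) -> int:
    """Create a directory with `requested` mode under `umask`.

    Returns the rwx permission bits the directory actually received.
    Hypothetical helper for illustration; it cleans up after itself.
    """
    parent = tempfile.mkdtemp()
    target = os.path.join(parent, "work_tree")
    old = os.umask(umask)
    try:
        os.mkdir(target, requested)
    finally:
        os.umask(old)  # always restore the caller's umask
    mode = os.stat(target).st_mode & 0o777
    os.rmdir(target)
    os.rmdir(parent)
    return mode
```

For example, a fixed 0755 under umask 077 still yields 0700, because the umask is applied regardless. Under umask 002, a fixed 0755 yields 0755, while 0777 would have yielded 0775. That second case is the sense in which a hard-coded 0755 is more restrictive than the umask alone.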

-Daniel
*This .sig left intentionally blank*

Re: Multi-ancestor read-tree notes

2005-09-09 Thread Daniel Barkalow

On Fri, 9 Sep 2005, Junio C Hamano wrote:

> Daniel Barkalow <[EMAIL PROTECTED]> writes:
> 
> > In case #16, I'm not sure what I should produce. I think the best thing 
> > might be to not leave anything in stage 1. The desired end effect is that 
> > the user is given a file with a section like:
> >
> >   {
> >     *t = NULL;
> >     *m = 0;
> > <<<<<<<<
> >     return Z_DATA_ERROR;
> > ========
> >     return Z_OK;
> > >>>>>>>>
> >   }
> 
> I was thinking a bit more about this.  Let's rephrase case #16.
> I'll call merge bases O1, O2,... and merge heads A and B, and we
> are interested in one path.
> 
> In O1 and O2, the path has quite different contents.  A has the
> same contents as O1 and B has the same contents as O2. 

There's a bit more subtlety here: since these are common ancestors, A must 
have somehow changed O2's version to O1's version, and B must have changed 
O1's version to O2's version. It isn't just that each side left the file 
the same, but from different ancestral versions; both of the other 
versions must have gotten rejected somehow. I think the real key is to 
identify what was going on in between.

> We should not just pick one or the other and do two-file merge
> between the version in A and B (we could prototype by massaging
> 'diff A B' output to produce what is common between A and B and
> run (RCS) merge of A and B pretending that the common contents
> is the original to produce something like the above).
> 
> If A has slight changes since O1 but B did not change since O2,
> ideally I think we would want the same thing to happen.  Let's
> call it case #16+.
> 
> What does the current implementation do?  It is not case #16
> because A and O1 do not exactly match.  I suspect the result
> will be skewed because B has an exact match with O2. 

Yes, in this case we miss whatever caused A to reject O2, and we use the 
modified O2, because we don't realize that A's rejection of O2 should also 
apply to the version in B. Unfortunately, this looks just like the 
situation where both sides took O1, and B did a further modification to 
that.
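
The exact-match rule for case #16, and the way #16+ slips past it, can be sketched as follows. This is a simplified model rather than read-tree's code: each version of the path is represented by an opaque blob id string, and is_case_16 is a hypothetical name.

```python
def is_case_16(bases: list, head_a: str, head_b: str) -> bool:
    """True when A and B differ and each exactly matches some merge base.

    `bases`, `head_a` and `head_b` are blob ids for one path. Exact
    equality is the only test, mirroring the exact-match rule discussed
    in the thread.
    """
    return head_a != head_b and head_a in bases and head_b in bases


bases = ["blob-O1", "blob-O2"]

# Case #16: A kept O1's version, B kept O2's version.
print(is_case_16(bases, "blob-O1", "blob-O2"))

# Case #16+: A made a slight change on top of O1, so the exact match fails
# and the situation looks like an ordinary one-sided modification.
print(is_case_16(bases, "blob-O1-modified", "blob-O2"))
```

Because only exact equality is checked, the second call cannot distinguish "A rejected O2 and then tweaked O1" from "both sides took one base and one side modified it further", which is the ambiguity described above.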

> The situation becomes more interesting if both A and B have slight
> changes since O1 and O2 respectively.  They do not exactly match
> with their bases, but I think ideally we would like something
> very similar to case #16 resolution to happen.

I think the right thing, ideally, is to have the content merge also take 
multiple ancestors and have a #16 case itself when it's deciding which 
version of a block to use. The #16+ case is actually trickier, because we 
have fewer cues.

> One way to solve this would be to try doing things entirely in
> read-tree by doing not just exact matches but also checking the
> amount of changes -- if each head has a similar but different
> base, call it case #16 and try two-file merge between the heads
> disregarding the bases.
> 
> But I am a bit reluctant to suggest this.  My gut feeling tells
> me that these 'interesting' cases are easier if scripted outside
> read-tree machinery to later enhance and improve the heuristics.
> 
> Of course, the current case #16 detected by the exact match rule
> should be something we can automatically handle, but to make
> things safer to use I think we should have a way to detect case
> #16+ situation and avoid mistakenly favoring A over B (or vice
> versa) only because one has slight modification while the other
> does not.

I think #16+ is extra uncommon, because it involves someone making an 
irrelevant modification to a patched version of a file while someone else 
reverts the patch. I'm actually interested in doing a big spiffy program 
to do merges with information drawn as needed from the history, stuff 
happening on a per-hunk level, and support for block moves. It'll take a 
while before it gets anywhere, but I still think it's likely that people 
won't hit #16+ and get unexpected behavior before it's ready.

The main thing I'm unsure of is whether Fredrik's algorithm is actually 
not a better solution: it is possible to understand what happened leading 
up to a merge either by looking at the time after the common ancestors or 
by looking at the time before them. I think that the more recent history 
is a better guide, but the older history is easier to use; the case his 
version isn't good for, I think, is when the common ancestors of the sides 
are even more complicated to merge.

-Daniel
*This .sig left intentionally blank*

Re: [RFH] Merge driver

2005-09-09 Thread Daniel Barkalow

On Fri, 9 Sep 2005, Junio C Hamano wrote:

> Daniel Barkalow <[EMAIL PROTECTED]> writes:
> 
> > It tries to make sure that there is room to put stuff for resolving a 
> > conflict without messing with modified files in the directory.
> 
> I agree it can be used that way, but nobody seems to use it for
> that purpose as far as I can tell hence my earlier comment.  But
> let's leave the door open by having them as independent
> options.

Ah, okay. I hadn't realized that resolve used -u for that call to 
read-tree. You're entirely right.

-Daniel
*This .sig left intentionally blank*

Re: [RFH] Merge driver

2005-09-09 Thread Daniel Barkalow

On Fri, 9 Sep 2005, Junio C Hamano wrote:

> I have several requests to people who are interested in merges
> and read-tree changes.
> 
> I am pretty much set to use the recent read-tree updates Daniel
> has been working on.  The only reason it has not hit the
> "master" branch yet, except that it still has known leaks that
> have not been plugged, is because read-tree is so fundamental to
> everything we do, and I am trying to be extremely conservative
> here.  I've beaten it myself reasonably well and have not found
> any regressions (except removal of --emu23 which I believe
> nobody uses anyway), but I'd appreciate people to try it out and
> see if it performs well for your dataset.
> 
> If you are planning further surgery on read-tree code, please
> base your changes on Daniel's rewrite to avoid your effort being
> wasted.  This request goes both to Chuck (active_cache
> abstraction) and Fredrik (addition of 'ignore index and working
> tree matching rules' [*1*]).
> 
> A proposed merge driver 'git-merge' is in the proposed updates
> branch.  This is intended to be the top-level user interface to
> the merge machinery which drives multiple merge strategy
> scripts, and I am hoping that I can eventually (1) retire
> 'git-resolve' and 'git-octopus' (they simply become merge
> strategy scripts driven by 'git-merge') and (2) call 'git-merge'
> from 'git-pull'.  What I have in the proposed updates branch has
> been fixed since my earlier message to the list and has a new
> merge strategy script, in addition to 'resolve' and 'octopus',
> called 'git-merge-multibase'.  This uses Daniel's read-tree that
> can use more than one merge base.  I request Daniel to give OK
> to its name or suggest a better name for this script -- I would
> even accept 'git-merge-barkalow' if you want ;-).

I'd actually been thinking it would just go into the "resolve" driver, 
with that going back to before it chose among merge-base outputs and just 
sending the whole list to read-tree.

> If you are planning to implement a new merge strategy, please
> use the ones in the proposed updates branch as examples, and
> complain and suggest improvements if you find the interface
> between the strategy scripts and the driver lacking.  This
> request goes primarily to Fredrik.  I'm interested in doing the
> renaming merge that would have helped HPA's klibc-kbuild vs
> klibc case myself but if somebody else is so inclined please go
> wild.
> 
> And finally, a request to everybody; please try out 'git-merge'
> and see how you like it.
> 
> `git-merge` [-n] [-s <strategy>]... <msg> <head> <remote> <remote>...
> 
> -n::
>   Do not show diffstat at the end of the merge.
> 
> -s <strategy>::
>   use that merge strategy; can be given more than once to
>   specify them in the order they should be tried.  If
>   there is no `-s` option, built-in list of strategies is
>   used instead.
> 
> <head>::
>   our branch head commit.
> 
> <remote>::
>   other branch head merged into our branch.  You need at
>   least one <remote>.  Specifying more than one <remote>
>   obviously means you are trying an Octopus.
> 
> Here is a sample transcript from a test resolving one of the
> 'more-than-one-merge-base' commits Fredrik found in the kernel
> repository (": siamese;" is my $PS1; "  " is my $PS2).
> 
> : siamese; git reset --hard b8112df71cae7d6a86158caeb19d215f56c4f9ab
> : siamese; git merge -n \
>   'reproduce 0e396ee43e445cb7c215a98da4e76d0ce354d9d7' \
>   HEAD 2089a0d38bc9c2cdd084207ebf7082b18cf4bf58
> Trying merge strategy resolve...
> Trying to find the optimum merge base.
> Trying simple merge.
> Simple merge failed, trying Automatic merge.
> Removing drivers/net/fmv18x.c
> Auto-merging drivers/net/r8169.c.
> merge: warning: conflicts during merge
> ERROR: Merge conflict in drivers/net/r8169.c.
> Removing drivers/net/sk_g16.c
> Removing drivers/net/sk_g16.h
> fatal: merge program failed
> Rewinding the tree to pristine...
> Trying merge strategy multibase...
> Trying simple merge.
> Simple merge failed, trying Automatic merge.
> Removing drivers/net/fmv18x.c
> Auto-merging drivers/net/r8169.c.
> merge: warning: conflicts during merge
> ERROR: Merge conflict in drivers/net/r8169.c.
> Removing drivers/net/sk_g16.c
> Removing drivers/net/sk_g16.h
> fatal: merge program failed
> Rewinding the tree to pristine...
> Trying merge strategy octopus...
> Rewinding the tree to pristine...
> Using the multibase to prepare resolving by hand.
> Trying simple merge.
> Simple merge failed, trying Automatic merge.
> Removing drivers/net/fmv18x.c
> Auto-merging drivers/net/r8169.c.
> merge: warning: conflicts during merge
> ERROR: Merge conflict in drivers/net/r8169.c.
> Removing drivers/net/sk_g16.c
> Removing drivers/net/sk_g16.h
> fatal: merge program failed
> Automatic merge failed; fix up by hand
> : siamese; git-update-cache --refresh
>

Re: Multi-ancestor read-tree notes

2005-09-08 Thread Daniel Barkalow

On Thu, 8 Sep 2005, Junio C Hamano wrote:

> Daniel Barkalow <[EMAIL PROTECTED]> writes:
> 
> > I assume that what you want is something to include everything from two 
> > commits, which would give conflicts if a name is reused?
> 
> My understanding is that Darrin wants to do what Linus did when
> he merged gitk into git.git.
> 
> Personally I think that is a specialized application and
> something like the git-merge-projects script I posted as a
> follow-up would be more appropriate than adding it to the
> current merge discussion.

Well, it's an easy addition to read-tree; just need a merge function which 
takes two entries and adds the non-NULL one in stage 0, or adds both if 
they both exist. git-merge-script probably shouldn't be the entry point to 
it, of course, but that part isn't my area anyway.
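
A minimal Python sketch of such a two-entry merge function follows. It is not read-tree code: union_merge is a hypothetical name, entries are modelled as blob id strings, and placing the two competing entries in stages 2 and 3 is an assumption, since the message only says that both are added.

```python
from typing import Optional


def union_merge(ours: Optional[str], theirs: Optional[str]) -> dict:
    """Return {stage: entry} for one path given the entries from two trees.

    If only one side has the path, that entry goes to stage 0 (resolved).
    If both sides have it, both are kept; stages 2 and 3 are an assumption
    of this sketch, chosen so the path shows up as a conflict.
    """
    if ours is None and theirs is None:
        return {}
    if theirs is None:
        return {0: ours}
    if ours is None:
        return {0: theirs}
    return {2: ours, 3: theirs}


print(union_merge("blob-gitk", None))          # path exists on one side only
print(union_merge("blob-ours", "blob-theirs")) # name reused on both sides
```

Under this model a reused name is the only way to end up with an unresolved path, which matches the "conflicts if a name is reused" behaviour asked about earlier in the thread.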

-Daniel
*This .sig left intentionally blank*

Re: Multi-ancestor read-tree notes

2005-09-08 Thread Daniel Barkalow

On Thu, 8 Sep 2005, Darrin Thompson wrote:

> On Mon, 2005-09-05 at 01:41 -0400, Daniel Barkalow wrote:
> > I've got a version of read-tree which accepts multiple ancestors and does 
> > a merge using information from all of them.
> 
> Do the multiple ancestors have to share a common parent? More to the
> point, is this read-tree any more friendly to baseless merges?

read-tree doesn't care about the relationships between its inputs; it's 
only interested in the trees. But using ancestors which aren't common is 
unlikely to give you desired results. I think, if you do read-tree a^ b^ a 
b, you will get everything into the index, but it's all going to be 
conflicts.

I assume that what you want is something to include everything from two 
commits, which would give conflicts if a name is reused?

-Daniel
*This .sig left intentionally blank*

Re: [PATCH 0/2] A new merge algorithm, take 3

2005-09-08 Thread Daniel Barkalow

On Thu, 8 Sep 2005, Fredrik Kuivinen wrote:

> The first one agrees with what was actually committed. For the second
> one the difference between the tree produced by the algorithm and what
> was committed is:
> 
> diff --git a/include/net/ieee80211.h b/include/net/ieee80211.h
> --- a/include/net/ieee80211.h
> +++ b/include/net/ieee80211.h
> @@ -425,9 +425,7 @@ struct ieee80211_stats {
>  
>  struct ieee80211_device;
>  
> -#if 0 /* for later */
>  #include "ieee80211_crypt.h"
> -#endif
>  
>  #define SEC_KEY_1 (1<<0)
>  #define SEC_KEY_2 (1<<1)
> 
> 
> I have looked at the files and common ancestors involved and I think
> that this change has been introduced manually. I may have missed
> something when I analysed it though...

Certainly possible that it was done manually.

> > > The merge cases reported by Tony Luck and Len Brown are both cleanly
> > > merged by my code.
> > 
> > Do they come out correctly? Both of those have cases which cannot be 
> > decided correctly with only the ancestor trees, due to one branch 
> > reverting a patch that was only in one ancestor. The correct result is to 
> > > revert that patch, but figuring that out requires looking at more trees. I 
> > think your algorithm should work for this case, but it would be good to 
> > have verification. (IIRC, Len got the correct result while Tony got the 
> > wrong result and then corrected it later.)
> 
> Len's merge case comes out identically to the tree he committed. I have
> described what I got for Tony's case in
> <[EMAIL PROTECTED]> (my merge algorithm
> produces the result Tony expected to get, but he didn't get that from
> git-resolve-script).

Good. It looks to me like this is a good algorithm in practice, then.

-Daniel
*This .sig left intentionally blank*

Re: [PATCH 0/2] A new merge algorithm, take 3

2005-09-08 Thread Daniel Barkalow

On Thu, 8 Sep 2005, Fredrik Kuivinen wrote:

> On Wed, Sep 07, 2005 at 02:33:42PM -0400, Daniel Barkalow wrote:
> > On Wed, 7 Sep 2005, Fredrik Kuivinen wrote:
> > 
> > > Of the 500 merge commits that currently exist in the kernel
> > > repository, 19 produce non-clean merges with git-merge-script. The
> > > four merge cases listed in
> > > <[EMAIL PROTECTED]> are cleanly merged by
> > > git-merge-script. Every merge commit which is cleanly merged by
> > > git-resolve-script is also cleanly merged by git-merge-script,
> > > furthermore the results are identical. There are currently two merges
> > > in the kernel repository which are not cleanly merged by
> > > git-resolve-script but are cleanly merged by git-merge-script.
> > 
> > If you use my read-tree and change git-resolve-script to pass all of the 
> > ancestors to it, how does it do? I expect you'll still be slightly ahead, 
> > because we don't (yet) have content merge with multiple ancestors. You 
> > should also check the merge that Tony Luck reported, which undid a revert, 
> > as well as the one that Len Brown reported around the same time which had 
> > similar problems. I think maintainer trees are a much better test for a 
> > merge algorithm, because the kernel repository is relatively linear, while 
> > maintainers tend more to merge things back and forth.
> 
> Junio tested some of the multiple common ancestor cases with your
> version of read-tree and reported his results in
> <[EMAIL PROTECTED]>.

Oh, right. I'm clearly not paying enough attention here.

> The two cases my algorithm merges cleanly and git-resolve-script does
> not merge cleanly are 0e396ee43e445cb7c215a98da4e76d0ce354d9d7 and
> 0c168775709faa74c1b87f1e61046e0c51ade7f3. Both of them have two common
> ancestors. The second one has, as far as I know, not been tested with
> your read-tree.

Okay, I'll have to check whether the result I get seems right. I take it 
your result agrees with what the users actually produced by hand?

> The merge cases reported by Tony Luck and Len Brown are both cleanly
> merged by my code.

Do they come out correctly? Both of those have cases which cannot be 
decided correctly with only the ancestor trees, due to one branch 
reverting a patch that was only in one ancestor. The correct result is to 
revert that patch, but figuring that out requires looking at more trees. I 
think your algorithm should work for this case, but it would be good to 
have verification. (IIRC, Len got the correct result while Tony got the 
wrong result and then corrected it later.)

> You are probably right about the maintainer trees. I should have a
> look at some of them. Do you know any specific repositories with
> interesting merge cases?

Not especially, except that I would guess that people who have reported 
hitting bad cases would be more likely to have other interesting merges in 
their trees. You might also try merging maintainer trees with each other, 
since it's relatively likely that there would be complicating overlap that 
only doesn't cause confusion because things get rearranged in -mm. For 
that matter, I bet you'd get plenty of test cases out of trying to 
replicate -mm as a git tree.

-Daniel
*This .sig left intentionally blank*

Re: [PATCH 0/2] A new merge algorithm, take 3

2005-09-07 Thread Daniel Barkalow

On Wed, 7 Sep 2005, Fredrik Kuivinen wrote:

> Of the 500 merge commits that currently exist in the kernel
> repository, 19 produce non-clean merges with git-merge-script. The
> four merge cases listed in
> <[EMAIL PROTECTED]> are cleanly merged by
> git-merge-script. Every merge commit which is cleanly merged by
> git-resolve-script is also cleanly merged by git-merge-script,
> furthermore the results are identical. There are currently two merges
> in the kernel repository which are not cleanly merged by
> git-resolve-script but are cleanly merged by git-merge-script.

If you use my read-tree and change git-resolve-script to pass all of the 
ancestors to it, how does it do? I expect you'll still be slightly ahead, 
because we don't (yet) have content merge with multiple ancestors. You 
should also check the merge that Tony Luck reported, which undid a revert, 
as well as the one that Len Brown reported around the same time which had 
similar problems. I think maintainer trees are a much better test for a 
merge algorithm, because the kernel repository is relatively linear, while 
maintainers tend more to merge things back and forth.

-Daniel
*This .sig left intentionally blank*

Re: Multi-ancestor read-tree notes

2005-09-06 Thread Daniel Barkalow

On Tue, 6 Sep 2005, Junio C Hamano wrote:

> Daniel Barkalow <[EMAIL PROTECTED]> writes:
> 
> > Good. (Although that patch doesn't seem to be directly on top of my 
> > version; I can tell what it's doing anyway)
> 
> That one was against the proposed updates head.  I've updated it
> again to include the patch.
> 
> > I'm happy with the content in "pu"; the issue is just whether you want the 
> > history cleaned up more. In the series I sent, I kept forgetting parts 
> > that belonged in earlier patches.
> 
> Again, that is up to you.  I am not _that_ perfectionist but I
> do not mind reapplying updated ones if you are ;-).

What's there is fine with me.

(I'll work on improving the documentation as a further patch)

> > Could you look over the documentation in
> > Documentation/technical/trivial-merge.txt, and see if it's a
> > suitable replacement for the table in
> > t1000-read-tree-m-3way.sh?
> 
> I do not understand what you meant by '*' and 'index+' in
> one-way merge table.  I take the first row ('*') to mean "If the
> tree is missing a path, that path is removed from the index."

'*' means that that case applies regardless of what's there. 'index+' 
means that it's the index, with the stat information. I forgot to actually 
explain the table before going on to the interesting section.

> I like the second sentence in three-way merge description.  That
> is a very easy-to-understand description of what the index
> requirements are.
> 
> You have 2 2ALTs.  Also 14 and 14ALT look like they are the same
> rule now.

Ah, right. I had originally listed "index" in the table, with separate 
cases for having it match the head and having it match the result, but 
then ditched that when I figured out how that actually works.

> What's "(empty)^" in "ancest"?  All of them must be empty for
> this rule to apply?

The '^' means that all must be like that. 

I have to check, but I think that 8ALT and 10ALT should be '+'.

> I am not quite sure it is 'a suitable replacement' yet; the
> existing table you can see it covers all the cases, but with
> things like "'ancestor+' means one of them matches", I cannot
> really tell the table covers all the cases or some cases fall off
> the end of the chain.

All of the "any ancestor" spots are good for covering things. Case #11 
(which actually needs to be at the bottom) is basically "everything else".

> Also when we have more than one ancestor or remote and we
> say "no merge", it is still unspecified (and I have to admit I
> cannot readily say what the result should be for all of them,
> except that I agree #16 will be fine with an empty stage1) what
> are left in which stages.

Presently, except for case #16, only the first ancestor is used in "no 
merge" output. The right thing should be worked out and documented, of 
course.

I'm not at all convinced at this point that we can do much with multiple 
remotes in a single application of the rules; you won't necessarily have 
the same merge base for all pairs, and all sorts of things go wrong if you 
start including ancestors that aren't related to something, or not 
including common ancestors of some pair.

What might work is to have the error for an unmerged index only happen 
when you get to a "no merge" result, so that you can get as many conflicts 
as possible (in different files) resolved by the user at the same time.

> I personally think the exotic cases (i.e. no rule applies, or
> "no merge" result with more than one ancestor/remote) need to
> be handled outside read-tree anyway, by the script that drives
> read-tree to attempt trivial merges.

I think case #16 would benefit from doing more stuff, but there aren't any 
holes in the rules, and I think that, for the multiple ancestors in "no 
merge", we just want to use the one with the least conflict. (Or, if we 
write our own merge, do a #16/#13,#14/#11 decision per-hunk in our merge, 
which is the really right thing). I think the common case for multiple 
ancestors will really be that you've got a side branch that split before 
the split you're resolving, and was merged into both sides before now; in 
this case, there's no big problem, and it's not the exotic cross-merge 
case. Of course, we won't see this in projects like the kernel and git, 
which aren't that amorphous.

-Daniel
*This .sig left intentionally blank*

Re: Multi-ancestor read-tree notes

2005-09-06 Thread Daniel Barkalow

On Tue, 6 Sep 2005, Junio C Hamano wrote:

> Daniel Barkalow <[EMAIL PROTECTED]> writes:
> 
> > Do you know if there's anything like case #16 in there? I'd be interested 
> > to know if there's anything that gets handled automatically in different 
> > ways depending on which single base is used, and doesn't require manual 
> > intervention with multiple bases, because that's probably wrong.
> 
> Re-running the tests with the attached patch shows there weren't any.

Good. (Although that patch doesn't seem to be directly on top of my 
version; I can tell what it's doing anyway)

> > Great. Want me to send the patches with better organization, or are you 
> > set with what I've sent?
> 
> That's up to you.  If you are content with what I have in the pu
> branch, there is no need to bother resending.  OTOH if you have
> further clean-ups in mind, i.e. "better organization" above, I
> do not mind dropping the current ones from "pu" and replace them
> with another set from you.

I'm happy with the content in "pu"; the issue is just whether you want the 
history cleaned up more. In the series I sent, I kept forgetting parts 
that belonged in earlier patches.

Could you look over the documentation in 
Documentation/technical/trivial-merge.txt, and see if it's a suitable 
replacement for the table in t1000-read-tree-m-3way.sh? It should be the 
same, except for ALT or non-ALT versions that we're not using, combining a 
few matching cases, describing the rules behind index requirements rather 
than listing outcomes, and the addition of info on how multiple ancestors 
are handled.

-Daniel
*This .sig left intentionally blank*

Re: [PATCH] Make sure the diff machinery outputs "\ No newline ..." in english

2005-09-06 Thread Daniel Barkalow

On Mon, 5 Sep 2005, Linus Torvalds wrote:

> On Mon, 5 Sep 2005, Fredrik Kuivinen wrote:
> > 
> > After a quick look through the diff source I didn't find anything
> > else. It's quite possible that I have missed something though. Most
> > of the translated messages are related to error reporting, which I
> > guess might be nice to have in the user specified language.
> 
> Is it possible that we could integrate the "diff" algorithm into git, and 
> get rid of the dependency on an external GNU diff? It would also make the 
> portability problems go away (ie old diff's being broken).
> 
> It would also potentially speed up the normal built-in diff a lot, since
> we wouldn't have to execute a whole other program to generate a diff, just
> call a helper function the way we do for xdiff..
> 
> Unreasonable?

The algorithm actually used by GNU diff is pretty complicated, and I don't 
really understand the actual implementation, which evidently has a few 
important refinements over the original paper.

I've written my own diff, mainly to try a different algorithm, and it 
seems to work, but the code isn't yet appropriate to submit. This 
algorithm also has the advantage that it can identify moved sections and 
is less interested in interleaving a removed function with a new function 
to provide the shortest possible diff. I expect that I could get it to 
work if I put in a day on it; it's mostly a matter of writing a hash table 
implementation keyed on non-NUL-terminated strings.
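
A minimal sketch of the kind of table described above, assuming each key is a (pointer, length) pair into a file buffer rather than a NUL-terminated string. The names `line_table`, `lt_insert`, and `lt_find`, the FNV-1a hash, and the fixed bucket count are illustrative assumptions; none of them come from the unpublished diff code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define LT_BUCKETS 1024 /* hypothetical fixed size; a real table would resize */

struct lt_entry {
	const char *key;        /* points into the caller's buffer; no NUL needed */
	size_t len;             /* key length in bytes */
	int value;              /* e.g. the line number where this text occurs */
	struct lt_entry *next;  /* chain for entries sharing a bucket */
};

struct line_table {
	struct lt_entry *buckets[LT_BUCKETS];
};

/* FNV-1a over exactly len bytes, so it never reads past the key. */
static unsigned long lt_hash(const char *key, size_t len)
{
	unsigned long h = 2166136261UL;
	size_t i;

	assert(key || !len);
	for (i = 0; i < len; i++) {
		h ^= (unsigned char) key[i];
		h *= 16777619UL;
	}
	return h;
}

/* Keys match only if both the length and the bytes are equal. */
static struct lt_entry *lt_find(struct line_table *t, const char *key, size_t len)
{
	struct lt_entry *e = t->buckets[lt_hash(key, len) % LT_BUCKETS];

	for (; e; e = e->next)
		if (e->len == len && !memcmp(e->key, key, len))
			return e;
	return NULL;
}

/* Returns 0 on success, -1 if allocation fails; an existing key is updated. */
static int lt_insert(struct line_table *t, const char *key, size_t len, int value)
{
	struct lt_entry *e = lt_find(t, key, len);
	unsigned long b;

	if (e) {
		e->value = value;
		return 0;
	}
	e = malloc(sizeof(*e));
	if (!e)
		return -1;
	b = lt_hash(key, len) % LT_BUCKETS;
	e->key = key;
	e->len = len;
	e->value = value;
	e->next = t->buckets[b];
	t->buckets[b] = e;
	return 0;
}
```

Because the table stores pointers rather than copies, the lines of a file read into one buffer can be used as keys directly; the buffer just has to outlive the table.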

-Daniel
*This .sig left intentionally blank*

Re: bogus merges

2005-09-06 Thread Daniel Barkalow

On Mon, 5 Sep 2005, Linus Torvalds wrote:

> On Mon, 5 Sep 2005, Wayne Scott wrote:
> >
> > A recent commit in linux-2.6 looks like this:
> 
> It hopefully shouldn't happen any more with the improved and fixed
> git-merge-base.

Couldn't it also happen if there's stale data in MERGE_HEAD when you 
commit a normal patch? The description doesn't look like a merge at all, 
but rather like a normal patch that inappropriately picked up an extra 
head. I'd guess he tried to merge something, got a conflict, decided that 
he didn't really want to do that anyway, switched to a different branch, 
applied a patch, and committed without noticing the note that he seemed to 
be committing a merge.

Probably the right thing is actually to clean up more when switching 
tasks, but it would probably also be worth checking that merges make sense 
as well.

-Daniel
*This .sig left intentionally blank*

Re: Multi-ancestor read-tree notes

2005-09-06 Thread Daniel Barkalow

On Mon, 5 Sep 2005, Junio C Hamano wrote:

> Daniel Barkalow <[EMAIL PROTECTED]> writes:
> 
> > I've got a version of read-tree which accepts multiple ancestors and does 
> > a merge using information from all of them.
> 
> After disabling the debugging printf(), I used this read-tree to
> try resolving the parents of four commits Fredrik Kuivinen gave
> us in <[EMAIL PROTECTED]> using
> their two merge bases, and compared the resulting tree with the
> tree recorded in the commit.  The results are really promising.
> 
> For the following two commits, multi-base merge resolved their
> parents trivially and produced the same result as the tree in
> the commit.  The current "best-base merge" in the master branch
> performed far worse and left many conflicts.
> 
>  - 467ca22d3371f132ee225a5591a1ed0cd518cb3d 
>  - da28c12089dfcfb8695b6b555cdb8e03dda2b690
> 
> Another one, 0e396ee43e445cb7c215a98da4e76d0ce354d9d7,
> multi-base merge left only one conflicting path to be hand
> resolved.  The best-base merge again performed far worse.
> 
> The other one, 3190186362466658f01b2e354e639378ce07e1a9, is
> resolved trivially with both algorithms.

Do you know if there's anything like case #16 in there? I'd be interested 
to know if there's anything that gets handled automatically in different 
ways depending on which single base is used, and doesn't require manual 
intervention with multiple bases, because that's probably wrong.

> > In case #16, I'm not sure what I should produce. I think the best thing 
> > might be to not leave anything in stage 1.
> 
> Because?  I know it would affect the readers of index files if
> you did so, but it would seem the most natural in git
> architecture to have merge-cache look at the resulting cache
> with such multiple stage 1 entries (and other stages) and let
> the script make a decision.

I didn't want to break the assumption of only one entry per stage in the 
initial version. I'm also not sure that listing the ancestors is 
particularly useful in this case. They have to be exactly the contents of 
stages 2 and 3, plus possibly more stuff that's not been kept by either 
side. What you actually want is a two-way merge (i.e., a diff between the 
two sides, presented in "merge" format), so you don't really need any 
ancestors, unless it would fit some more general case that way.

> > The desired end effect is that the user is given a file with a
> > section like:
> >
> >   {
> >     *t = NULL;
> >     *m = 0;
> > <<<<<<<<
> >     return Z_DATA_ERROR;
> > ========
> >     return Z_OK;
> > >>>>>>>>
> >   }
> 
> Sounds fine.
> 
> Anyway, I really am happy to see this multi-base merge perform
> well on real-world data, and you are certainly the git hero of
> the week ;-).

Great. Want me to send the patches with better organization, or are you 
set with what I've sent?

-Daniel
*This .sig left intentionally blank*

[PATCH 4/4] Document the trivial merge rules for 3(+more ancestors)-way merges.

2005-09-04 Thread Daniel Barkalow

Signed-off-by: Daniel Barkalow
---

 Documentation/technical/trivial-merge.txt |   92 +
 1 files changed, 92 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/technical/trivial-merge.txt

7544be0a8eda7b796150729a7795c2639278da62
diff --git a/Documentation/technical/trivial-merge.txt 
b/Documentation/technical/trivial-merge.txt
new file mode 100644
--- /dev/null
+++ b/Documentation/technical/trivial-merge.txt
@@ -0,0 +1,92 @@
+Trivial merge rules
+===================
+
+This document describes the outcomes of the trivial merge logic in read-tree.
+
+One-way merge
+-------------
+
+This replaces the index with a different tree, keeping the stat info
+for entries that don't change, and allowing -u to make the minimum
+required changes to the working tree to have it match.
+
+   index   tree    result
+   -----------------------
+   *       (empty) (empty)
+   (empty) tree    tree
+   index+  tree    tree
+   index+  index   index+
+
+Two-way merge
+-------------
+
+
+
+Three-way merge
+---------------
+
+It is permitted for the index to lack an entry; this does not prevent
+any case from applying.
+
+If the index exists, it is an error for it not to match either the
+head or (if the merge is trivial) the result.
+
+If multiple cases apply, the one used is listed first.
+
+A result of "no merge" means that index is left in stage 0, ancest in
+stage 1, head in stage 2, and remote in stage 3 (if any of these are
+empty, no entry is left for that stage). Otherwise, the given entry is
+left in stage 0, and there are no other entries.
+
+A result of "no merge" is an error if the index is not empty and not
+up-to-date.
+
+*empty* means that the tree must not have a directory-file conflict
+ with the entry.
+
+For multiple ancestors or remotes, a '+' means that this case applies
+even if only one ancestor or remote fits; normally, all of the
+ancestors or remotes must be the same.
+
+case  ancest    head    remote    result
+----------------------------------------
+1     (empty)+  (empty) (empty)   (empty)
+2ALT  (empty)+  *empty* remote    remote
+2ALT  (empty)+  *empty* remote    remote
+2     (empty)^  (empty) remote    no merge
+3ALT  (empty)+  head    *empty*   head
+3     (empty)^  head    (empty)   no merge
+4     (empty)^  head    remote    no merge
+5ALT  *         head    head      head
+6     ancest^   (empty) (empty)   no merge
+8ALT  ancest    (empty) ancest    (empty)
+7     ancest+   (empty) remote    no merge
+9     ancest+   head    (empty)   no merge
+10ALT ancest    ancest  (empty)   (empty)
+11    ancest+   head    remote    no merge
+16    anc1/anc2 anc1    anc2      no merge
+13    ancest+   head    ancest    head
+14    ancest+   ancest  remote    remote
+14ALT ancest+   ancest  remote    remote
+
+Only #2ALT and #3ALT use *empty*, because these are the only cases
+where there can be conflicts that didn't exist before. Note that we
+allow directory-file conflicts between things in different stages
+after the trivial merge.
+
+A possible alternative for #6 is (empty), which would make it like
+#1. This is not used, due to the likelihood that it arises due to
+moving the file to multiple different locations or moving and deleting
+it in different branches.
+
+Case #1 is included for completeness, and also in case we decide to
+put on '+' markings; any path that is never mentioned at all isn't
+handled.
+
+Note that #16 is when both #13 and #14 apply; in this case, we refuse
+the trivial merge, because we can't tell from this data which is
+right. This is a case of a reverted patch (in some direction, maybe
+multiple times), and the right answer depends on looking at crossings
+of history or common ancestors of the ancestors.
+
+The status as of Sep 5 is that multiple remotes are not supported
\ No newline at end of file


[PATCH 3/4] Rewrite read-tree

2005-09-04 Thread Daniel Barkalow

Adds support for multiple ancestors, removes --emu23, much simplification.

Signed-off-by: Daniel Barkalow <[EMAIL PROTECTED]>
---

 read-tree.c   |  811 +++--
 t/t1005-read-tree-m-2way-emu23.sh |  422 ---
 2 files changed, 425 insertions(+), 808 deletions(-)
 delete mode 100755 t/t1005-read-tree-m-2way-emu23.sh

f196469bec156947038f1d3d00c899c9044334ca
diff --git a/read-tree.c b/read-tree.c
--- a/read-tree.c
+++ b/read-tree.c
@@ -5,73 +5,291 @@
  */
 #include "cache.h"
 
-static int stage = 0;
+#include "object.h"
+#include "tree.h"
+
+static int merge = 0;
 static int update = 0;
 
-static int unpack_tree(unsigned char *sha1)
-{
-   void *buffer;
-   unsigned long size;
-   int ret;
+static int head_idx = -1;
+static int merge_size = 0;
 
-   buffer = read_object_with_reference(sha1, "tree", &size, NULL);
-   if (!buffer)
-   return -1;
-   ret = read_tree(buffer, size, stage, NULL);
-   free(buffer);
+static struct object_list *trees = NULL;
+
+static struct cache_entry df_conflict_entry = { 
+};
+
+static struct tree_entry_list df_conflict_list = {
+   .name = NULL,
+   .next = &df_conflict_list
+};
+
+typedef int (*merge_fn_t)(struct cache_entry **src);
+
+static int entcmp(char *name1, int dir1, char *name2, int dir2)
+{
+   int len1 = strlen(name1);
+   int len2 = strlen(name2);
+   int len = len1 < len2 ? len1 : len2;
+   int ret = memcmp(name1, name2, len);
+   unsigned char c1, c2;
+   if (ret)
+   return ret;
+   c1 = name1[len];
+   c2 = name2[len];
+   if (!c1 && dir1)
+   c1 = '/';
+   if (!c2 && dir2)
+   c2 = '/';
+   ret = (c1 < c2) ? -1 : (c1 > c2) ? 1 : 0;
+   if (c1 && c2 && !ret)
+   ret = len1 - len2;
return ret;
 }
 
-static int path_matches(struct cache_entry *a, struct cache_entry *b)
+static int unpack_trees_rec(struct tree_entry_list **posns, int len,
+   const char *base, merge_fn_t fn, int *indpos)
 {
-   int len = ce_namelen(a);
-   return ce_namelen(b) == len &&
-   !memcmp(a->name, b->name, len);
+   int baselen = strlen(base);
+   int src_size = len + 1;
+   do {
+   int i;
+   char *first;
+   int firstdir = 0;
+   int pathlen;
+   unsigned ce_size;
+   struct tree_entry_list **subposns;
+   struct cache_entry **src;
+   int any_files = 0;
+   int any_dirs = 0;
+   char *cache_name;
+   int ce_stage;
+
+   /* Find the first name in the input. */
+
+   first = NULL;
+   cache_name = NULL;
+
+   /* Check the cache */
+   if (merge && *indpos < active_nr) {
+   /* This is a bit tricky: */
+   /* If the index has a subdirectory (with
+* contents) as the first name, it'll get a
+* filename like "foo/bar". But that's after
+* "foo", so the entry in trees will get
+* handled first, at which point we'll go into
+* "foo", and deal with "bar" from the index,
+* because the base will be "foo/". The only
+* way we can actually have "foo/bar" first of
+* all the things is if the trees don't
+* contain "foo" at all, in which case we'll
+* handle "foo/bar" without going into the
+* directory, but that's fine (and will return
+* an error anyway, with the added unknown
+* file case.
+*/
+
+   cache_name = active_cache[*indpos]->name;
+   if (strlen(cache_name) > baselen &&
+   !memcmp(cache_name, base, baselen)) {
+   cache_name += baselen;
+   first = cache_name;
+   } else {
+   cache_name = NULL;
+   }
+   }
+
+   if (first)
+   printf("index %s\n", first);
+
+   for (i = 0; i < len; i++) {
+   if (!posns[i] || posns[i] == &df_conflict_list)
+   continue;
+   printf("%d %s\n", i + 1, posns[i]->name);
+

[PATCH 2/4] Add function to append to an object_list.

2005-09-04 Thread Daniel Barkalow

Signed-off-by: Daniel Barkalow <[EMAIL PROTECTED]>
---

 object.c |   11 +++
 object.h |3 +++
 2 files changed, 14 insertions(+), 0 deletions(-)

88cf2db55848e7a2cf655171c7e9fd74c70a0281
diff --git a/object.c b/object.c
--- a/object.c
+++ b/object.c
@@ -184,6 +184,17 @@ struct object_list *object_list_insert(s
 return new_list;
 }
 
+void object_list_append(struct object *item,
+   struct object_list **list_p)
+{
+   while (*list_p) {
+   list_p = &((*list_p)->next);
+   }
+   *list_p = xmalloc(sizeof(struct object_list));
+   (*list_p)->next = NULL;
+   (*list_p)->item = item;
+}
+
 unsigned object_list_length(struct object_list *list)
 {
unsigned ret = 0;
diff --git a/object.h b/object.h
--- a/object.h
+++ b/object.h
@@ -41,6 +41,9 @@ void mark_reachable(struct object *obj, 
 struct object_list *object_list_insert(struct object *item, 
   struct object_list **list_p);
 
+void object_list_append(struct object *item,
+   struct object_list **list_p);
+
 unsigned object_list_length(struct object_list *list);
 
 int object_list_contains(struct object_list *list, struct object *obj);
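
A self-contained sketch of how the append function in this patch behaves, assuming cut-down stand-ins for git's `struct object` and `struct object_list`. Plain `malloc` with `abort` replaces git's `xmalloc`, and `list_ids` is a hypothetical helper used only to show the resulting order. The point of appending rather than inserting at the head is presumably that a list built this way keeps the order in which items were added.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-ins for git's types, reduced to what the example needs. */
struct object {
	int id;
};

struct object_list {
	struct object *item;
	struct object_list *next;
};

/* Same shape as the patch's function: walk to the tail, then add a node. */
static void object_list_append(struct object *item, struct object_list **list_p)
{
	assert(list_p != NULL);
	while (*list_p)
		list_p = &(*list_p)->next;
	*list_p = malloc(sizeof(struct object_list));
	if (!*list_p)
		abort(); /* git's xmalloc would die here instead */
	(*list_p)->next = NULL;
	(*list_p)->item = item;
}

/* Hypothetical helper: copy ids in list order into out[], return the count. */
static int list_ids(const struct object_list *list, int *out, int max)
{
	int n = 0;

	for (; list && n < max; list = list->next)
		out[n++] = list->item->id;
	return n;
}
```

Appending objects with ids 1, 2, 3 to an empty list (`struct object_list *list = NULL;`) leaves them in that order. Each call walks the whole list, so the cost grows with list length; that is fine for a handful of trees.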


[PATCH 1/4] Add a function for getting a struct tree for an ent.

2005-09-04 Thread Daniel Barkalow

Signed-off-by: Daniel Barkalow <[EMAIL PROTECTED]>
---

 tree.c |   21 +
 tree.h |3 +++
 2 files changed, 24 insertions(+), 0 deletions(-)

3bfcc20b6aeff3e1fbcce97a426383c9770a2105
diff --git a/tree.c b/tree.c
--- a/tree.c
+++ b/tree.c
@@ -1,5 +1,7 @@
 #include "tree.h"
 #include "blob.h"
+#include "commit.h"
+#include "tag.h"
 #include "cache.h"
 #include 
 
@@ -212,3 +214,22 @@ int parse_tree(struct tree *item)
free(buffer);
return ret;
 }
+
+struct tree *parse_tree_indirect(const unsigned char *sha1)
+{
+   struct object *obj = parse_object(sha1);
+   do {
+   if (!obj)
+   return NULL;
+   if (obj->type == tree_type)
+   return (struct tree *) obj;
+   else if (obj->type == commit_type)
+   obj = &(((struct commit *) obj)->tree->object);
+   else if (obj->type == tag_type)
+   obj = ((struct tag *) obj)->tagged;
+   else
+   return NULL;
+   if (!obj->parsed)
+   parse_object(obj->sha1);
+   } while (1);
+}
diff --git a/tree.h b/tree.h
--- a/tree.h
+++ b/tree.h
@@ -32,4 +32,7 @@ int parse_tree_buffer(struct tree *item,
 
 int parse_tree(struct tree *tree);
 
+/* Parses and returns the tree in the given ent, chasing tags and commits. */
+struct tree *parse_tree_indirect(const unsigned char *sha1);
+
 #endif /* TREE_H */
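
The "chasing tags and commits" loop can be illustrated without the rest of git. This is only a sketch under a hypothetical, simplified object model (`struct obj`, `enum obj_type`, and `peel_to_tree` are invented names), and it leaves out the on-demand parsing that the patch's `parse_tree_indirect` does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified object model; git's real structs carry much more. */
enum obj_type { T_BLOB, T_TREE, T_COMMIT, T_TAG };

struct obj {
	enum obj_type type;
	struct obj *target; /* tag: tagged object; commit: its tree; else unused */
};

/* Follow tags and commits until a tree is reached; NULL for anything else. */
static struct obj *peel_to_tree(struct obj *o)
{
	while (o) {
		switch (o->type) {
		case T_TREE:
			return o;
		case T_COMMIT:
		case T_TAG:
			o = o->target;
			break;
		default:
			return NULL;
		}
	}
	return NULL;
}

/* Usage: a tag of a tag of a commit peels to the commit's tree. */
static void peel_demo(void)
{
	struct obj tree = { T_TREE, NULL };
	struct obj commit = { T_COMMIT, &tree };
	struct obj inner = { T_TAG, &commit };
	struct obj outer = { T_TAG, &inner };
	struct obj blob = { T_BLOB, NULL };

	assert(peel_to_tree(&outer) == &tree);
	assert(peel_to_tree(&blob) == NULL);
}
```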


[PATCH 0/4] Support multiple ancestors in read-tree

2005-09-04 Thread Daniel Barkalow

Various messages have already described this series. There's still a 
memory leak that should get resolved, but otherwise it should work. I'm 
not entirely sure that all directory-file conflict cases are handled 
properly, and some undefined cases behave differently. Also, I was a bit 
careless with preparing the patches.

-Daniel
*This .sig left intentionally blank*

Multi-ancestor read-tree notes

2005-09-04 Thread Daniel Barkalow

I've got a version of read-tree which accepts multiple ancestors and does 
a merge using information from all of them.

The basic features are that it looks for an ancestor which would permit a 
trivial merge, and uses that. However, if it finds ancestors which permit 
different trivial merges, it does not merge (which I call case #16).
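
The rule above can be sketched for a single path as follows. This is a simplified illustration, not the read-tree code: it assumes the path exists in every tree, represents blob contents as small integers standing in for SHA-1s, and ignores the index requirements. The names `trivial_merge` and `enum outcome` are made up for the example; the case numbers in the comments refer to the table in trivial-merge.txt.

```c
#include <assert.h>
#include <stddef.h>

enum outcome { TAKE_HEAD, TAKE_REMOTE, NO_MERGE };

/*
 * Decide one path given n ancestor versions plus head and remote.
 * An ancestor equal to remote means only head changed (#13);
 * an ancestor equal to head means only remote changed (#14).
 * If different ancestors support both answers, refuse (#16).
 */
static enum outcome trivial_merge(const int *ancestors, int n,
				  int head, int remote)
{
	int head_unchanged = 0;   /* some ancestor matches head */
	int remote_unchanged = 0; /* some ancestor matches remote */
	int i;

	assert(ancestors || n == 0);
	if (head == remote)
		return TAKE_HEAD;               /* #5ALT: both sides agree */
	for (i = 0; i < n; i++) {
		if (ancestors[i] == head)
			head_unchanged = 1;
		if (ancestors[i] == remote)
			remote_unchanged = 1;
	}
	if (head_unchanged && remote_unchanged)
		return NO_MERGE;                /* #16: ancestors disagree */
	if (remote_unchanged)
		return TAKE_HEAD;               /* #13 */
	if (head_unchanged)
		return TAKE_REMOTE;             /* #14 */
	return NO_MERGE;                        /* #11: real conflict */
}
```

With ancestors {1, 2}, head 1, and remote 2, each single ancestor alone would give a different trivial answer, so the sketch returns `NO_MERGE`; with the single ancestor {1} the same inputs would return `TAKE_REMOTE`.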

In case #16, I'm not sure what I should produce. I think the best thing 
might be to not leave anything in stage 1. The desired end effect is that 
the user is given a file with a section like:

  {
    *t = NULL;
    *m = 0;
<<<<<<<<
    return Z_DATA_ERROR;
========
    return Z_OK;
>>>>>>>>
  }

In other news, the merge that was giving Len Brown problems a while ago 
turns out to have the above conflict, and he happened to end up doing the 
right thing and not reverting Linus's revert of an unnecessary (but 
harmless) change. I only noticed this just now, when I was testing that 
merge, and got it to generate only two conflicts regardless of order of 
ancestors (didn't try to resolve the other one, drivers/acpi/osl.c, with 
"merge" either way).

So this test is encouraging: I get fewer non-trivial cases than either of 
the ancestors alone gives, and I catch a case that both single ancestors 
get wrong.

Note that there are still some memory leaks for me to fix, but that's the 
only flaw I know of with this.

Patches against mainline to follow shortly.

-Daniel
*This .sig left intentionally blank*

Re: [PATCH 0/2] Reorganize read-tree

2005-09-04 Thread Daniel Barkalow

On Sun, 4 Sep 2005, Junio C Hamano wrote:

> Daniel Barkalow <[EMAIL PROTECTED]> writes:
> 
> > I got mostly done with this before Linus mentioned the possibility of
> > having multiple index entries in the same stage for a single path. I
> > finished it anyway, but I'm not sure that we won't want to know which of
> > the common ancestors contributed which, and, if some of them don't have a
> > path, we wouldn't be able to tell. The other advantages I see to this
> > approach are:
> 
> I've finished reading your patch, after beating it reasonably
> heavily by feeding combinations of nonsense trees to make sure
> it produces the same result as the original implementation.  I
> have not found any regression from the read-tree in "master"
> branch, after you fixed the path ordering issues.

Good.
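
For readers following along: the ordering fix appears to be the `entcmp()` function in the "[PATCH 3/2]" message, which compares names within one directory level as if directory names ended in '/'. That matches how the index sorts full paths, where "foo/bar" comes after "foo.c". The sketch below reproduces that logic with `const` added; `sign` is a hypothetical helper used only to normalize results.

```c
#include <assert.h>
#include <string.h>

/*
 * Compare two entry names from the same directory level. A directory
 * is treated as if its name had a trailing '/', so that it sorts the
 * way its contents would sort as full paths in the index.
 */
static int entcmp(const char *name1, int dir1, const char *name2, int dir2)
{
	int len1 = (int) strlen(name1);
	int len2 = (int) strlen(name2);
	int len = len1 < len2 ? len1 : len2;
	int ret = memcmp(name1, name2, len);
	unsigned char c1, c2;

	assert(name1 && name2);
	if (ret)
		return ret;
	c1 = name1[len];
	c2 = name2[len];
	if (!c1 && dir1)
		c1 = '/';
	if (!c2 && dir2)
		c2 = '/';
	ret = (c1 < c2) ? -1 : (c1 > c2) ? 1 : 0;
	if (c1 && c2 && !ret)
		ret = len1 - len2;
	return ret;
}

/* Hypothetical helper for examples: reduce a result to -1, 0, or 1. */
static int sign(int x)
{
	return (x > 0) - (x < 0);
}
```

Plain `strcmp("foo", "foo.c")` is negative, but `entcmp("foo", 1, "foo.c", 0)` is positive: the directory "foo" sorts after the file "foo.c" because '/' compares greater than '.'.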

> > There are various potential refinements, plus removing a bunch of memory
> > leaks, still to do, but I think this is sufficiently close to review.
> 
> I am not so worried about the leaks right now; they are
> something that could be fixed before it hits the "master"
> branch.

Right.

> I like your approach of reading the input trees, along with the
> existing index contents, and re-populating the index one path at
> a time.  It probably is more readable.
> 
> I further think that you can get the best of both worlds, by
> inventing a convention that mode=0 entry means 'this path does
> not exist in this tree'. This would allow you to have multiple
> entries at the same stage and still tell which one came from
> which tree.  Instead of calling fn in unpack_trees(), you could
> make it only unpack the tree into the index, and then after
> unpacking is done, call fn() repeatedly to resolve the resulting
> index. 

I think that almost all of the benefit actually comes from calling fn() in 
unpack_trees() and not putting anything in the index before merging. 
Without that, you need the complex index management and the complicated 
search for DF conflicts. The main point of not reading everything into the 
index before calling fn() on stuff is that the index is actually really 
difficult to deal with in this situation (because you are simultaneously 
moving through it, removing and modifying entries, and searching it for 
conflicts). The improvement in readability comes from not doing this.

-Daniel
*This .sig left intentionally blank*

Re: Moved files and merges

2005-09-04 Thread Daniel Barkalow

On Sun, 4 Sep 2005, Junio C Hamano wrote:

> Sam Ravnborg <[EMAIL PROTECTED]> writes:
> 
> > If the problem is not fully understood it can be difficult to come up
> > with the proper solution. And with the example above the problem should
> > be really easy to understand.
> > Then we have the tree as used by hpa with a few more merges in it. But
> > the above is what was initially attempted, with the added complexity of a
> > few more renames etc.
> 
> All true.  Let's redraw that simplified scenario, and see if
> what I said still holds.  It may be interesting to store my
> previous message and this one and run diff between them.  I
> suspect that the main difference to come out would be the
> problem description part and the merge machinery part would not
> be all that different.

I'm not quite so convinced, because I think that the actual situation is a 
bit more natural, and therefore our expectations at the end should be 
closer to right with less attention to detail. But I think the actual 
situation is more interesting, anyway, because it's more likely to happen 
and we're more likely to be able to help.

> 
> This is a simplified scenario of klibc vs klibc-kbuild HPA had
> trouble with, to help us think of a way to solve this
> interesting merge problem.
> 
>      #1 - #3 - #5 - #7
>    /    /         /
> #0 - #2 - #4 - #6
> 
> There are two lines of developments.  #0->#1 renames F to G and
> introduces K.  #0->#2 keeps F as F and does not introduce K.
> 
> At commit #3, #2 is merged into #1.  The changes made to the
> file contents of F between #0 and #2 are appreciated, but we
> would also want to keep our decision to rename F to G and our
> new file K.  So commit #3 has the resulting merge contents in G
> and has K, inherited from #1.  This _might_ be different from
> what we traditionally consider a 'merge', but from the use case
> point of view it is a valid thing one would want to do.

I think this is actually quite a regular merge, and I think we should be 
able to offer some assistance. The situation with K is normal: case #3ALT. 
If someone introduces a file and there's no file or directory with that 
name in other trees, we assume that the merge should include it.

F/G is trickier, and I don't think we can actually do much about it with 
the current structure of read-tree/merge-cache/etc, but, theoretically, we 
should recognize that #0->#1 is a rename plus content changes, and #0->#2 
is content changes, so the total should be the rename plus contents 
changes; I think we want to additionally signal a conflict, because 
there's a reasonable chance that the rename will interfere with the #0->#2 
changes, and need intervention. Most likely, this just means that we 
should not commit automatically, but have the user test the result first.

For now, of course, we don't get renames at any point in the merging 
procedure, so our code can't tell, and sees it as a big conflict that the 
user has to deal with. But we can agree on what the result is if the user 
"includes all the changes from the other branch" (and see the situation 
you reported first as "cherry-picking" the content and leaving the 
structural changes).

> Commit #4 is a continued development from #2; changes are made
> to F, and there is no K.  Commit #5 similarly is a continued
> development from #3; its changes are made to G and K also has
> further changes.
> 
> We are about to merge #6 into #5 to create #7.  We should be
> able to take advantage of what the user did when the merge #3
> was made; namely, we should be able to infer that the line of
> development that flows #0 .. #3 .. #7 prefers to rename F to G,
> and also wants the newly introduced K.  We should be able to
> tell it by looking at what the merge #3 did.

Again, K should be unexceptional, because we're keeping a file that was 
added to one side but not the other. (In the other situation, it still 
works; relative to the common ancestor, we're in #8ALT, since #5 doesn't 
have K, which was in #2 and #6; we see the rejection in a merge as a 
removal, which is effectively the same.)

> Now, how can we use git to figure that out?

First off, it should handle K automatically, because we're still including 
a file added by one side without interference from the other side.

> First, given our current head (#5) and the other head we are
> about to merge (#6), we need a way to tell if we merged from
> them before (i.e. the existence of #3) and if so the latest of
> such merge (i.e. #3).
> 
> The merge base between #5 and #6 is #2.  We can look at commits
> between us (#5) and the merge base (#2), find a merge (#3),
> which has two parents.  One of the parents is #2 which is
> reachable from #6, and the other is #1 which is not reachable
> from #6 but is reachable from #5.  Can we say that this reliably
> tells us that #2 is on their side and #1 is on our side?  Does
> the fact that #3 is the commit topologically closest to #5 tell
> us that #3

Re: Tool renames? was Re: First stab at glossary

2005-09-04 Thread Daniel Barkalow

On Sat, 3 Sep 2005, Junio C Hamano wrote:

> Daniel Barkalow <[EMAIL PROTECTED]> writes:
> 
> > I think "fetch" is more applicable to what they do.
> 
> OK.  then they are git-http-fetch and friends.  How about
> git-ssh-push?  The counterpart of fetch-pack/clone-pack is
> called upload-pack, so would git-ssh-upload make things more
> consistent?  I dunno.

I like that idea.

> > I don't think it matters very much whether something is a script or not; 
> > on the other hand, it would be good to have "git" list a reasonable set of 
> > commands to use through the interface, which would exclude, for example, 
> > git-merge-one-file-script, and include the above commands.
> 
> Are you suggesting to drop -script from git-merge-one-file?
> Then git-cherry should perhaps keep its current name.

I'd suggest it get a different ending, like .sh or -helper. That way, it's 
distinct both from binaries and from scripts that people run directly.

-Daniel
*This .sig left intentionally blank*

Re: Tool renames? was Re: First stab at glossary

2005-09-02 Thread Daniel Barkalow

On Fri, 2 Sep 2005, Junio C Hamano wrote:

> I said:
> 
> > I'll draw up a strawman tonight unless somebody else
> > does it first.
> 
> 1. Say 'index' when you are tempted to say 'cache'.
> 
> git-checkout-cache  git-checkout-index
> git-convert-cache   git-convert-index
> git-diff-cache  git-diff-index
> git-fsck-cache  git-fsck-index
> git-merge-cache git-merge-index
> git-update-cachegit-update-index

Agreed, except that git-convert-cache and git-fsck-cache actually have 
nothing to do with the index by any name, and should probably be 
git-convert-objects and git-fsck-objects.

> 2. The act of combining two or more heads is called 'merging';
>fetching immediately followed by merging is called 'pulling'.
> 
> git-resolve-script  git-merge-script
> 
>The commit walkers are called *-pull, but this is probably
>confusing.  They are not pulling.
> 
> git-http-pull   git-http-walk
> git-local-pull  git-local-walk
> git-ssh-pullgit-ssh-walk

I think "fetch" is more applicable to what they do.

> 3. Non-binaries are called '*-scripts'.
> 
>In earlier discussions some people seem to like the
>distinction between *-script and others; I did not
>particularly like it, but I am throwing this in for
>discussion.
> 
> git-applymbox   git-applymbox-script
> git-applypatch  git-applypatch-script
> git-cherry  git-cherry-script
> git-shortloggit-shortlog-script
> git-whatchanged git-whatchanged-script

I don't think it matters very much whether something is a script or not; 
on the other hand, it would be good to have "git" list a reasonable set of 
commands to use through the interface, which would exclude, for example, 
git-merge-one-file-script, and include the above commands.

> 4. To be removed shortly.
> 
> git-clone-dumb-http should be folded into git-clone-script

Agreed.

-Daniel
*This .sig left intentionally blank*

Re: Tool renames? was Re: First stab at glossary

2005-09-02 Thread Daniel Barkalow

On Thu, 1 Sep 2005, Junio C Hamano wrote:

> Tim Ottinger <[EMAIL PROTECTED]> writes:
> 
> > git-update-cache for instance?
> > I am not sure which 'cache' commands need to be 'index' now.
> 
> Logically you are right, but I suspect that may not fly well in
> practice.  Too many of us have already got our fingers wired to
> type cache, and the glossary is there to describe both cache and
> index.

My vote's for changing the official names, but keeping symlinks for the 
old names. As far as I know, there aren't any actual conflicts, and we 
might as well have new users pick up the logical names. I particularly 
think "git merge" would be really good to have.

-Daniel
*This .sig left intentionally blank*

[PATCH 3/2] Remove emu23, fix entry order

2005-09-01 Thread Daniel Barkalow

A few things to improve testing. I'll clean up the series as a whole once 
it's tested.

This removes the emu23 tests; I think that the only DF conflict tests were 
in that set, however, so these should be fished out and added to something 
else.

Signed-off-by: Daniel Barkalow <[EMAIL PROTECTED]>

---

 read-tree.c   |   89 +++-
 t/t1005-read-tree-m-2way-emu23.sh |  422 -
 2 files changed, 37 insertions(+), 474 deletions(-)
 delete mode 100755 t/t1005-read-tree-m-2way-emu23.sh

63092a4dfb2042e8fc21260b2f315b01e9163940
diff --git a/read-tree.c b/read-tree.c
--- a/read-tree.c
+++ b/read-tree.c
@@ -9,7 +9,6 @@
 #include "tree.h"
 
 static int merge = 0;
-static int emu23 = 0;
 static int update = 0;
 
 static struct object_list *trees = NULL;
@@ -19,19 +18,39 @@ typedef int (*merge_fn_t)(struct cache_e
  int df_conflicts_2,
  int df_conflicts_3);
 
+static int entcmp(char *name1, int dir1, char *name2, int dir2)
+{
+   int len1 = strlen(name1);
+   int len2 = strlen(name2);
+   int len = len1 < len2 ? len1 : len2;
+   int ret = memcmp(name1, name2, len);
+   unsigned char c1, c2;
+   if (ret)
+   return ret;
+   c1 = name1[len];
+   c2 = name2[len];
+   if (!c1 && dir1)
+   c1 = '/';
+   if (!c2 && dir2)
+   c2 = '/';
+   ret = (c1 < c2) ? -1 : (c1 > c2) ? 1 : 0;
+   if (c1 && c2 && !ret)
+   ret = len1 - len2;
+   return ret;
+}
+
 static int unpack_trees_rec(struct tree_entry_list **posns, int len,
const char *base, merge_fn_t fn, 
int file2, int file3, int *indpos)
 {
int baselen = strlen(base);
int src_size = len + 1;
-   if (emu23)
-   src_size++;
if (src_size > 4)
src_size = 4;
do {
int i;
char *first = NULL;
+   int firstdir = 0;
int pathlen;
unsigned ce_size;
int dir2 = 0;
@@ -73,11 +92,23 @@ static int unpack_trees_rec(struct tree_
}
}
 
+   /*
+   if (first)
+   printf("%s\n", first);
+   */
+
for (i = 0; i < len; i++) {
if (!posns[i])
continue;
-   if (!first || strcmp(first, posns[i]->name) > 0)
+   /*
+   printf("%d %s\n", i + 1, posns[i]->name);
+   */
+   if (!first || entcmp(first, firstdir,
+posns[i]->name, 
+posns[i]->directory) > 0) {
first = posns[i]->name;
+   firstdir = posns[i]->directory;
+   }
}
/* No name means we're done */
if (!first)
@@ -94,19 +125,6 @@ static int unpack_trees_rec(struct tree_
   src_size);
src[0] = active_cache[*indpos];
remove_cache_entry_at(*indpos);
-   if (emu23) {
-   // we need this in stage 2 as well as stage 0
-   struct cache_entry *copy =
-   xmalloc(ce_size);
-   memcpy(copy, src[0], ce_size);
-   copy->ce_flags = 
-   create_ce_flags(baselen + pathlen, 2);
-   if (dir2 || file2) {
-   die("cannot merge index and our head tree");
-   }
-   src[2] = copy;
-   subfile2 = 1;
-   }
}
 
for (i = 0; i < len; i++) {
@@ -125,8 +143,6 @@ static int unpack_trees_rec(struct tree_
} else {
ce_stage = i + merge;
}
-   if (emu23 && ce_stage == 2)
-   ce_stage = 3;
 
if (posns[i]->directory) {
if (!subposns) {
@@ -137,8 +153,6 @@ static int unpack_trees_rec(struct tree_
parse_tree(posns[i]->item.tree);
subposns[i] = posns[i]->item.tree->entries;
posns[i] = posns[i]->next;
-   if (emu23 && ce_stage == 1)
-
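
To make the entry-order fix concrete: the entcmp() added in the diff above compares names as if a directory entry carried a trailing '/', which is how the paths beneath it sort in the index. The following is a standalone restatement of that comparison for experimenting outside git. The name entcmp_sketch, the const qualifiers, and the size_t lengths are additions for the sketch and are not part of the patch.

```c
#include <string.h>

/*
 * Compare two tree entry names the way full paths sort in the index:
 * a directory "foo" is ordered as if it were spelled "foo/".
 * Follows the same steps as entcmp() in the patch; the name and the
 * const/size_t types are assumptions made for this sketch.
 */
static int entcmp_sketch(const char *name1, int dir1,
			 const char *name2, int dir2)
{
	size_t len1 = strlen(name1);
	size_t len2 = strlen(name2);
	size_t len = len1 < len2 ? len1 : len2;
	int ret = memcmp(name1, name2, len);
	unsigned char c1, c2;

	if (ret)
		return ret;
	/* The common prefix matches; look at the first character after it. */
	c1 = (unsigned char)name1[len];
	c2 = (unsigned char)name2[len];
	if (!c1 && dir1)
		c1 = '/';	/* a directory name ends in an implied slash */
	if (!c2 && dir2)
		c2 = '/';
	ret = (c1 < c2) ? -1 : (c1 > c2) ? 1 : 0;
	if (c1 && c2 && !ret)
		ret = (len1 > len2) - (len1 < len2);
	return ret;
}
```

With this ordering a directory "foo" sorts after a file "foo.c", because '/' compares greater than '.', whereas plain strcmp() would put "foo" first. That matches where "foo/bar" sits relative to "foo.c" in the index.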

Re: Reworked read-tree.

2005-09-01 Thread Daniel Barkalow

On Thu, 1 Sep 2005, Junio C Hamano wrote:

> Daniel, I do not know what your current status is, but I think
> you need something like this.

Yup, I forgot to actually test that functionality.

> ---
> diff --git a/tree.c b/tree.c
> --- a/tree.c
> +++ b/tree.c
> @@ -224,10 +224,12 @@ struct tree *parse_tree_indirect(const u
>   if (obj->type == tree_type)
>   return (struct tree *) obj;
>   else if (obj->type == commit_type)
> - return ((struct commit *) obj)->tree;
> + obj = (struct object *)(((struct commit *) obj)->tree);

obj = &((struct commit *) obj)->tree->object;

Multiple sequential casts always bother me, and we do actually have a 
field for this.
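
A simplified illustration of the point, with cut-down stand-ins for the real structs (the layouts and the helper name are assumptions for this sketch): because struct tree embeds a struct object, taking the address of that field gives the same pointer as the cast, but lets the compiler check the types.

```c
#include <stddef.h>

/* Cut-down stand-ins for git's structs; only the embedding matters here. */
struct object {
	unsigned parsed : 1;
};

struct tree {
	struct object object;	/* embedded generic header */
};

struct commit {
	struct object object;
	struct tree *tree;
};

/* Hypothetical helper: step from a commit object to its tree's object. */
static struct object *commit_tree_object(struct object *obj)
{
	struct commit *commit = (struct commit *) obj;	/* the one needed cast */

	/* Use the embedded field instead of casting struct tree * again. */
	return commit->tree ? &commit->tree->object : NULL;
}
```

The NULL check is an addition for the sketch; the one-liner above assumes the commit already has a tree.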

>   else if (obj->type == tag_type)
> - obj = ((struct tag *) obj)->tagged;
> + obj = deref_tag(obj);

Shouldn't be necessary (once you've got the parse_object below); we're 
already in a loop dereferencing things.

>   else
>   return NULL;
> + if (!obj->parsed)
> + parse_object(obj->sha1);
>   } while (1);
>  }
> 
> 

Re: Couple of read-tree questions

2005-08-31 Thread Daniel Barkalow

On Wed, 31 Aug 2005, Junio C Hamano wrote:

> Daniel Barkalow <[EMAIL PROTECTED]> writes:
> 
> > Is there any current use for read-tree with multiple trees without -m or 
> > equivalent?
> 
> I did not know it even allowed multiple trees without -m, but
> you are right.  It does not seem to complain.
> 
> I have never thought about using multiple trees without -m, and
> I do not remember hearing any plan nor purpose of using it to do
> something interesting from Linus.  I think its allowing multiple
> trees without -m is simply a bug.

I guess it was probably that its behavior was obvious and didn't require 
any extra code. It still follows entirely from one tree without -m, but it 
might be worth prohibiting unless someone has a reason to do it 
intentionally.

> > Why does --emu23 use I+H for stage 2, rather than just I? Wouldn't this 
> > just reintroduce removed files?
> 
> They are not "removed files", at least in the original context.
> 
> The original intention was that git was supposed to work without
> having _any_ files in the working tree.  The reason why
> multi-tree read-tree has so many special cases that says "must
> match *if* work file exists", is that not having a corresponding
> working file was supposed to be equivalent to having the file
> checked out *and* unmodified.

But they'd not only be missing from the working tree but also from the 
(pre-read-tree) index, which should only happen, assuming the index came 
from "read-tree H", if they were subsequently removed from the index. I'd 
understand treating index entries for files missing from the working tree 
as up to date.

(The thread you mention seems to say that we accept entries being missing 
from the index as if they were unchanged, but I don't see a good reason 
for this; you'd be dealing with the full set in the index for the merge, 
even if you don't have a populated working tree)

> I do not think anybody currently uses --emu23.  I did it because
> it has a potential of making the two-tree fast forward (which is
> used in "git checkout" to switch between branches) easier to
> manage when the working tree is dirty than doing straight
> two-tree merge, but that is just a theoretical potential never
> tested in the field.  Frankly, I do not mind, and I do not think
> anybody else minds, too much if you need to break or remove
> emu23 if that would make your code clean-up and redoing
> read-tree easier.

I should have asked sooner, then. :) There's a bunch of clutter to get it 
to work that I can remove if it's not actually necessary.

-Daniel
*This .sig left intentionally blank*

Re: [RFC] Stgit - patch history / add extra parents

2005-08-31 Thread Daniel Barkalow

On Tue, 30 Aug 2005, Catalin Marinas wrote:

> Back from holiday. Thanks to all who replied to this thread.
> 
> On Tue, 2005-08-23 at 14:05 -0400, Daniel Barkalow wrote:
> > Having a useful diff isn't really a requirement for a parent; the diff in
> > the case of a merge is going to be the total of everything that happened
> > elsewhere. The point is to be able to reach some commits between which
> > there are interesting diffs.
> > 
> > This also depends on how exactly freeze is used; if you use it before
> > committing a modification to the patch without rebasing, you get:
> > 
> > old-top -> new-top
> >   ^^
> >\  /
> >   bottom
> > 
> > bottom to old-top is the old patch
> > bottom to new-top is the new patch
> > old-top to new-top is the change to the patch
> > 
> > Then you want to keep new-top as a parent for rebasings until one of these
> > is frozen. These links are not interesting to look at, but preserve the
> > path to the old-top:new-top change, which is interesting.
> 
> This was my initial StGIT implementation (up to version 0.3), only that
> there was no freeze command. Since I want an StGIT tree to be clean to
> the outside world, I wouldn't keep multiple parents for the visible top
> of a patch.
> 
> As I understand from Junio's and Linus' e-mails (on the 23rd of August),
> there might be problems with merging the HEAD of an StGIT-managed tree
> if the above method is accessible via HEAD.

Right, you'd want a separate head which is what you ask people to merge; 
the rest is only visible to people who are working on preparing the patch. 
But you could keep both sets of stuff (sharing tree objects but not 
commits).

> > Ignoring the links to the corresponding bottoms, the development therefore
> > looks like:
> > 
> > local1 -> local2 -> merge -> local3 -> merge
> > ^   ^  ^
> > mainline>-->->-->-->->
> > 
> > And this is how development is normally supposed to look. The trick is to
> > only include a minimal number of merges.
> 
> A merge occurs every time a patch is rebased. Anyway, having the bottoms
> in the graph (which is the main idea of StGIT) together with the old-top
> (or frozen state) parents make the graph pretty complicated.

It should be possible to drop merges such that there's only one between 
any pair of local changes. That is, if you rebase at the end of the line 
above, it would get as parents local3 and the new bottom, not the last 
merge and the new bottom.

The mainline changes only come in through the bottoms, so higher levels 
should look the same, but with the lower levels in the place of mainline.

-Daniel
*This .sig left intentionally blank*

Re: [PATCH 0/2] Reorganize read-tree

2005-08-31 Thread Daniel Barkalow

On Wed, 31 Aug 2005, Catalin Marinas wrote:

> Daniel Barkalow <[EMAIL PROTECTED]> wrote:
> > I got mostly done with this before Linus mentioned the possibility of
> > having multiple index entries in the same stage for a single path. I
> > finished it anyway, but I'm not sure that we won't want to know which of
> > the common ancestors contributed which, and, if some of them don't have a
> > path, we wouldn't be able to tell.
> 
> I don't have time to look at the patch and I don't have a good
> knowledge of the GIT internals, so I will just ask. Does this patch
> change the call convention for git-merge-one-file-script? I have my
> own script for StGIT and I would need to know whether it is affected
> or not.

Nope, it only changes the trivial merge calling convention within 
read-tree.c; I think it's plausible that we might like to add information 
at some point, but the short-term goal is just to prevent a few bad cases 
in trivial merges.

Re: [PATCH 0/2] Reorganize read-tree

2005-08-31 Thread Daniel Barkalow

On Tue, 30 Aug 2005, Junio C Hamano wrote:

> Dan, I really really *REALLY* wanted to try this out in "pu"
> branch and even was about to rig some torture chamber for
> testing before applying the patch, but you got the shiny blue
> bat X-<.

I'll send a replacement with the settings correct.

> A patch to SubmittingPatches, MUA specific help section for
> users of Pine 4.63 would be very much appreciated.

Ah, it looks like a recent version changed the default behavior to do the 
right thing, and inverted the sense of the configuration option. (Either 
that or Gentoo did it.) So you need to set the 
"no-strip-whitespace-before-send" option, unless the option you have is 
"strip-whitespace-before-send", in which case you should avoid checking 
it.

I don't actually have things set up for preparing patches from work, 
although I can resend the patches I prepared earlier.

-Daniel
*This .sig left intentionally blank*

[PATCH 1/2 (resend)] Object model additions for read-tree

2005-08-31 Thread Daniel Barkalow

Adds object_list_append() and a function to get the struct tree from an ent.

Signed-off-by: Daniel Barkalow <[EMAIL PROTECTED]>

---

 object.c |   11 +++
 object.h |3 +++
 tree.c   |   19 +++
 tree.h   |3 +++
 4 files changed, 36 insertions(+), 0 deletions(-)

49d33c385aa69d17c991300f73e77c6718a2b4a6
diff --git a/object.c b/object.c
--- a/object.c
+++ b/object.c
@@ -184,6 +184,17 @@ struct object_list *object_list_insert(s
 return new_list;
 }
 
+void object_list_append(struct object *item,
+   struct object_list **list_p)
+{
+   while (*list_p) {
+   list_p = &((*list_p)->next);
+   }
+   *list_p = xmalloc(sizeof(struct object_list));
+   (*list_p)->next = NULL;
+   (*list_p)->item = item;
+}
+
 unsigned object_list_length(struct object_list *list)
 {
unsigned ret = 0;
diff --git a/object.h b/object.h
--- a/object.h
+++ b/object.h
@@ -41,6 +41,9 @@ void mark_reachable(struct object *obj, 
 struct object_list *object_list_insert(struct object *item, 
   struct object_list **list_p);
 
+void object_list_append(struct object *item,
+   struct object_list **list_p);
+
 unsigned object_list_length(struct object_list *list);
 
 int object_list_contains(struct object_list *list, struct object *obj);
diff --git a/tree.c b/tree.c
--- a/tree.c
+++ b/tree.c
@@ -1,5 +1,7 @@
 #include "tree.h"
 #include "blob.h"
+#include "commit.h"
+#include "tag.h"
 #include "cache.h"
 #include 
 
@@ -212,3 +214,20 @@ int parse_tree(struct tree *item)
free(buffer);
return ret;
 }
+
+struct tree *parse_tree_indirect(const unsigned char *sha1)
+{
+   struct object *obj = parse_object(sha1);
+   do {
+   if (!obj)
+   return NULL;
+   if (obj->type == tree_type)
+   return (struct tree *) obj;
+   else if (obj->type == commit_type)
+   return ((struct commit *) obj)->tree;
+   else if (obj->type == tag_type)
+   obj = ((struct tag *) obj)->tagged;
+   else
+   return NULL;
+   } while (1);
+}
diff --git a/tree.h b/tree.h
--- a/tree.h
+++ b/tree.h
@@ -32,4 +32,7 @@ int parse_tree_buffer(struct tree *item,
 
 int parse_tree(struct tree *tree);
 
+/* Parses and returns the tree in the given ent, chasing tags and commits. */
+struct tree *parse_tree_indirect(const unsigned char *sha1);
+
 #endif /* TREE_H */
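
As a standalone illustration of object_list_append() from the diff above: the loop walks a pointer to the "next" slot, not a pointer to the node, so an empty list needs no special case. The sketch uses plain malloc() in place of git's xmalloc() and a void * item so that it compiles without the rest of git. Those substitutions and the _sketch names are assumptions.

```c
#include <stdlib.h>

struct list_sketch {
	void *item;
	struct list_sketch *next;
};

/* Append item at the tail; returns 0 on success, -1 if allocation fails. */
static int list_append_sketch(void *item, struct list_sketch **list_p)
{
	/* Advance to the NULL slot that terminates the list. */
	while (*list_p)
		list_p = &(*list_p)->next;
	*list_p = malloc(sizeof(**list_p));
	if (!*list_p)
		return -1;
	(*list_p)->next = NULL;
	(*list_p)->item = item;
	return 0;
}
```

Each append walks the whole list, so building a list of n items this way costs O(n^2); that is fine for the handful of trees read-tree is given on its command line.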


[PATCH 2/2 (resend)] Change read-tree to merge before using the index.

2005-08-31 Thread Daniel Barkalow

Signed-off-by: Daniel Barkalow <[EMAIL PROTECTED]>

---

 read-tree.c |  522 ++-
 1 files changed, 297 insertions(+), 225 deletions(-)

d0f45ad81db2e133c49c23bd09c5615da344bb5c
diff --git a/read-tree.c b/read-tree.c
--- a/read-tree.c
+++ b/read-tree.c
@@ -5,28 +5,280 @@
  */
 #include "cache.h"
 
-static int stage = 0;
+#include "object.h"
+#include "tree.h"
+
+static int merge = 0;
+static int emu23 = 0;
 static int update = 0;
 
-static int unpack_tree(unsigned char *sha1)
+static struct object_list *trees = NULL;
+
+typedef int (*merge_fn_t)(struct cache_entry **src, 
+ struct cache_entry **dest, 
+ int df_conflicts_2,
+ int df_conflicts_3);
+
+static int unpack_trees_rec(struct tree_entry_list **posns, int len,
+   const char *base, merge_fn_t fn, 
+   int file2, int file3, int *indpos)
+{
+   int baselen = strlen(base);
+   int src_size = len + 1;
+   if (emu23)
+   src_size++;
+   if (src_size > 4)
+   src_size = 4;
+   do {
+   int i;
+   char *first = NULL;
+   int pathlen;
+   unsigned ce_size;
+   int dir2 = 0;
+   int dir3 = 0;
+   int subfile2 = file2;
+   int subfile3 = file3;
+   struct tree_entry_list **subposns = NULL;
+   struct cache_entry **src = NULL;
+   char *cache_name = NULL;
+
+   /* Find the first name in the input. */
+
+   /* Check the cache */
+   if (merge && *indpos < active_nr) {
+   /* This is a bit tricky: */
+   /* If the index has a subdirectory (with
+* contents) as the first name, it'll get a
+* filename like "foo/bar". But that's after
+* "foo", so the entry in trees will get
+* handled first, at which point we'll go into
+* "foo", and deal with "bar" from the index,
+* because the base will be "foo/". The only
+* way we can actually have "foo/bar" first of
+* all the things is if the trees don't
+* contain "foo" at all, in which case we'll
+* handle "foo/bar" without going into the
+* directory, but that's fine (and will return
+* an error anyway, with the added unknown
+* file case.
+*/
+
+   cache_name = active_cache[*indpos]->name;
+   if (strlen(cache_name) > baselen &&
+   !memcmp(cache_name, base, baselen)) {
+   cache_name += baselen;
+   first = cache_name;
+   } else {
+   cache_name = NULL;
+   }
+   }
+
+   for (i = 0; i < len; i++) {
+   if (!posns[i])
+   continue;
+   if (!first || strcmp(first, posns[i]->name) > 0)
+   first = posns[i]->name;
+   }
+   /* No name means we're done */
+   if (!first)
+   return 0;
+
+   pathlen = strlen(first);
+   ce_size = cache_entry_size(baselen + pathlen);
+
+   if (cache_name && !strcmp(cache_name, first)) {
+   src = xmalloc(sizeof(struct cache_entry *) * 
+ src_size);
+   memset(src, 0,
+  sizeof(struct cache_entry *) * 
+  src_size);
+   src[0] = active_cache[*indpos];
+   remove_cache_entry_at(*indpos);
+   if (emu23) {
+   // we need this in stage 2 as well as stage 0
+   struct cache_entry *copy =
+   xmalloc(ce_size);
+   memcpy(copy, src[0], ce_size);
+   copy->ce_flags = 
+   create_ce_flags(baselen + pathlen, 2);
+   if (dir2 || file2) {
+   die("cannot merge index and our head tree");
+   }
+   src[2] = copy;
+   subfile2 = 1;

[PATCH] Change read-tree to merge before using the index.

2005-08-30 Thread Daniel Barkalow

Signed-off-by: Daniel Barkalow <[EMAIL PROTECTED]>
---

 read-tree.c |  522 ++-
 1 files changed, 297 insertions(+), 225 deletions(-)

d0f45ad81db2e133c49c23bd09c5615da344bb5c
diff --git a/read-tree.c b/read-tree.c
--- a/read-tree.c
+++ b/read-tree.c
@@ -5,28 +5,280 @@
  */
 #include "cache.h"

-static int stage = 0;
+#include "object.h"
+#include "tree.h"
+
+static int merge = 0;
+static int emu23 = 0;
 static int update = 0;

-static int unpack_tree(unsigned char *sha1)
+static struct object_list *trees = NULL;
+
+typedef int (*merge_fn_t)(struct cache_entry **src,
+ struct cache_entry **dest,
+ int df_conflicts_2,
+ int df_conflicts_3);
+
+static int unpack_trees_rec(struct tree_entry_list **posns, int len,
+   const char *base, merge_fn_t fn,
+   int file2, int file3, int *indpos)
+{
+   int baselen = strlen(base);
+   int src_size = len + 1;
+   if (emu23)
+   src_size++;
+   if (src_size > 4)
+   src_size = 4;
+   do {
+   int i;
+   char *first = NULL;
+   int pathlen;
+   unsigned ce_size;
+   int dir2 = 0;
+   int dir3 = 0;
+   int subfile2 = file2;
+   int subfile3 = file3;
+   struct tree_entry_list **subposns = NULL;
+   struct cache_entry **src = NULL;
+   char *cache_name = NULL;
+
+   /* Find the first name in the input. */
+
+   /* Check the cache */
+   if (merge && *indpos < active_nr) {
+   /* This is a bit tricky: */
+   /* If the index has a subdirectory (with
+* contents) as the first name, it'll get a
+* filename like "foo/bar". But that's after
+* "foo", so the entry in trees will get
+* handled first, at which point we'll go into
+* "foo", and deal with "bar" from the index,
+* because the base will be "foo/". The only
+* way we can actually have "foo/bar" first of
+* all the things is if the trees don't
+* contain "foo" at all, in which case we'll
+* handle "foo/bar" without going into the
+* directory, but that's fine (and will return
+* an error anyway, with the added unknown
+* file case.
+*/
+
+   cache_name = active_cache[*indpos]->name;
+   if (strlen(cache_name) > baselen &&
+   !memcmp(cache_name, base, baselen)) {
+   cache_name += baselen;
+   first = cache_name;
+   } else {
+   cache_name = NULL;
+   }
+   }
+
+   for (i = 0; i < len; i++) {
+   if (!posns[i])
+   continue;
+   if (!first || strcmp(first, posns[i]->name) > 0)
+   first = posns[i]->name;
+   }
+   /* No name means we're done */
+   if (!first)
+   return 0;
+
+   pathlen = strlen(first);
+   ce_size = cache_entry_size(baselen + pathlen);
+
+   if (cache_name && !strcmp(cache_name, first)) {
+   src = xmalloc(sizeof(struct cache_entry *) *
+ src_size);
+   memset(src, 0,
+  sizeof(struct cache_entry *) *
+  src_size);
+   src[0] = active_cache[*indpos];
+   remove_cache_entry_at(*indpos);
+   if (emu23) {
+   // we need this in stage 2 as well as stage 0
+   struct cache_entry *copy =
+   xmalloc(ce_size);
+   memcpy(copy, src[0], ce_size);
+   copy->ce_flags =
+   create_ce_flags(baselen + pathlen, 2);
+   if (dir2 || file2) {
+   die("cannot merge index and our head tree");
+   }
+   src[2] = copy;
+   subfile2 = 1;

[PATCH 1/2] Object model additions for read-tree

2005-08-30 Thread Daniel Barkalow

Adds object_list_append() and a function to get the struct tree from an ent.

Signed-off-by: Daniel Barkalow <[EMAIL PROTECTED]>
---

 object.c |   11 +++
 object.h |3 +++
 tree.c   |   19 +++
 tree.h   |3 +++
 4 files changed, 36 insertions(+), 0 deletions(-)

49d33c385aa69d17c991300f73e77c6718a2b4a6
diff --git a/object.c b/object.c
--- a/object.c
+++ b/object.c
@@ -184,6 +184,17 @@ struct object_list *object_list_insert(s
 return new_list;
 }

+void object_list_append(struct object *item,
+   struct object_list **list_p)
+{
+   while (*list_p) {
+   list_p = &((*list_p)->next);
+   }
+   *list_p = xmalloc(sizeof(struct object_list));
+   (*list_p)->next = NULL;
+   (*list_p)->item = item;
+}
+
 unsigned object_list_length(struct object_list *list)
 {
unsigned ret = 0;
diff --git a/object.h b/object.h
--- a/object.h
+++ b/object.h
@@ -41,6 +41,9 @@ void mark_reachable(struct object *obj,
 struct object_list *object_list_insert(struct object *item,
   struct object_list **list_p);

+void object_list_append(struct object *item,
+   struct object_list **list_p);
+
 unsigned object_list_length(struct object_list *list);

 int object_list_contains(struct object_list *list, struct object *obj);
diff --git a/tree.c b/tree.c
--- a/tree.c
+++ b/tree.c
@@ -1,5 +1,7 @@
 #include "tree.h"
 #include "blob.h"
+#include "commit.h"
+#include "tag.h"
 #include "cache.h"
 #include 

@@ -212,3 +214,20 @@ int parse_tree(struct tree *item)
free(buffer);
return ret;
 }
+
+struct tree *parse_tree_indirect(const unsigned char *sha1)
+{
+   struct object *obj = parse_object(sha1);
+   do {
+   if (!obj)
+   return NULL;
+   if (obj->type == tree_type)
+   return (struct tree *) obj;
+   else if (obj->type == commit_type)
+   return ((struct commit *) obj)->tree;
+   else if (obj->type == tag_type)
+   obj = ((struct tag *) obj)->tagged;
+   else
+   return NULL;
+   } while (1);
+}
diff --git a/tree.h b/tree.h
--- a/tree.h
+++ b/tree.h
@@ -32,4 +32,7 @@ int parse_tree_buffer(struct tree *item,

 int parse_tree(struct tree *tree);

+/* Parses and returns the tree in the given ent, chasing tags and commits. */
+struct tree *parse_tree_indirect(const unsigned char *sha1);
+
 #endif /* TREE_H */


[PATCH 0/2] Reorganize read-tree

2005-08-30 Thread Daniel Barkalow

I got mostly done with this before Linus mentioned the possibility of
having multiple index entries in the same stage for a single path. I
finished it anyway, but I'm not sure that we won't want to know which of
the common ancestors contributed which, and, if some of them don't have a
path, we wouldn't be able to tell. The other advantages I see to this
approach are:

 - it uses the more common parser of tree objects, moving toward having
   only one (diff-cache still uses read_tree(), however).
 - it doesn't need to do very complicated things with the index; the
   original read-tree does a bunch of stuff with an index with a gap in
   the middle containing obsolete entries.
 - it uses a much simpler method of finding directory/file conflicts,
   which is possible because the struct trees represent directories as
   well as files.
 - it deals with each path completely before going on to the next one,
   instead of first dealing with each input tree and then dealing with
   each path.
 - it removes a lot of intimate knowledge of the index structure from the
   program.

The general idea is that it figures out what trees you want, and then
iterates through the entry lists together, recursing into directories, and
calls the merge function with an array of the index entries (not yet
added) for the path in each tree; the merge function adds the appropriate
things to the index.
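
A much-reduced sketch of that iteration, to show the shape of the loop and nothing more: each "tree" here is a flat, sorted, NULL-terminated array of names, there is no recursion into directories and no index, and the callback only learns which trees contain the path. All names (walk_together, merge_fn, count_common, MAX_TREES) are hypothetical and do not come from read-tree.c.

```c
#include <stddef.h>
#include <string.h>

#define MAX_TREES 4

/* Called once per path with a flag per tree saying whether it has the path. */
typedef void (*merge_fn)(const char *path, const int present[],
			 int ntrees, void *data);

/*
 * Walk several sorted name lists together, handling each path completely
 * before moving on to the next one. Returns the number of distinct paths
 * visited, or -1 if too many trees were given.
 */
static int walk_together(const char **trees[], int ntrees,
			 merge_fn fn, void *data)
{
	size_t pos[MAX_TREES] = { 0 };
	int present[MAX_TREES];
	int calls = 0;

	if (ntrees > MAX_TREES)
		return -1;
	for (;;) {
		const char *first = NULL;
		int i;

		/* Find the first (smallest) name among the current heads. */
		for (i = 0; i < ntrees; i++) {
			const char *name = trees[i][pos[i]];
			if (name && (!first || strcmp(name, first) < 0))
				first = name;
		}
		if (!first)
			return calls;	/* no name means we're done */

		/* Collect every tree that has this path and advance it. */
		for (i = 0; i < ntrees; i++) {
			const char *name = trees[i][pos[i]];
			present[i] = name && !strcmp(name, first);
			if (present[i])
				pos[i]++;
		}
		fn(first, present, ntrees, data);
		calls++;
	}
}

/* Example callback: counts the paths that appear in every tree. */
static void count_common(const char *path, const int present[],
			 int ntrees, void *data)
{
	int i;

	(void)path;
	for (i = 0; i < ntrees; i++)
		if (!present[i])
			return;
	++*(int *)data;
}
```

The real code has to compare names with directories taken into account and descend into subtrees; the point of the sketch is only that the trees form the inner loop and the paths the outer one.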

Note that this set doesn't include calling merge functions with multiple
ancestors or remotes; that can be done when we've decided on whether my
version of read-tree is worth using.

There are various potential refinements, plus removing a bunch of memory
leaks, still to do, but I think this is sufficiently close to review.

(Refinements: it ought to have two indices in memory, the old and the new,
and never modify the old and only append to the new, to simplify things
further; it ought to use a sentinel value for the index entry to indicate
that there is something in the tree to conflict with there being a file at
the given path; the --emu23 logic could be clearer)

The first patch adds a few functions to the object library.
The second patch changes read-tree around; it is essentially a rewrite,
except for the merge functions and main().

-Daniel
*This .sig left intentionally blank*

Re: Comments in read-tree about #nALT

2005-08-27 Thread Daniel Barkalow

On Sat, 27 Aug 2005, Linus Torvalds wrote:

> On Sat, 27 Aug 2005, Daniel Barkalow wrote:
> >
> > What I missed was that the effect of causes_df_conflict is to give "no
> > merge" for the entry, rather than giving an error overall. So I do need an
> > equivalent.
>
> Daniel,
>  I'm not 100% sure what you're trying to do, but one thing that might work
> out is to just having multiple "stage 3" entries with the same pathname.
>
> We currently use 4 stages:
>  - stage 0 is "resolved"
>  - stage 1 is "original"
>  - stage 2 is "one branch"
>  - stage 3 is "another branch"
>
> But if we allowed duplicate entries per stage, I think we could easily
> just fold stage 2/3 into one stage, and just have  entries in stage 2.
> That would immediately mean that a three-way merge could be  way.
>
> The only rule would be that when you add a entry to stage 2, you must
> always add it after any previous entry that is already in stage 2. That
> should be easy.

It looks like stage 2 is currently special as the stage that's similar to
the index/HEAD/working tree. However, I don't see any problem with 
entries in stage 3, except that, if you have a non-maximal number of them
for some reason, it'll be impossible to determine which came from which
tree.

> In fact, this extension might even allow us to solve the "multiple merge
> base" problem: we could allow multiple entries in "stage 1" too, ie one
> entry per merge base (and just collapse identical entries - there's no
> ordering involved in stage 1 entries).

That's actually the problem I was working on.

> So you could merge "n" trees with "m" bases, and all without really
> changing the current logic much at all.
>
> Maybe I'm missing something (like what you're trying to do in the first
> place), but this _seems_ doable.

I'd be afraid of confusing everything by removing the uniqueness
invariant, although I guess not too much does anything with entries in
stages other than 0. I probably just don't find the index as intuitive as
you do, or as intuitive as the struct tree representation.

I'm working on arranging the code to look at each path in sequence, with
the input trees as the inner loop, rather than with the loops in the other
order; using parse_tree to parse the objects instead of read_tree; and
doing trivial merges before putting things in the cache, rather than
after. I'd been thinking that this would avoid a limit on the number of
stages, since I hadn't considered whether multiple entries for the same
path and stage could be allowed.
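
For context on the stage limit: assuming the index flag layout of that era was a 12-bit name length with a 2-bit stage field above it (an assumption for this sketch, not something stated in this thread), only stages 0 through 3 can be represented. A sketch in host byte order, with hypothetical names:

```c
/* Assumed layout: bits 0-11 hold the name length, bits 12-13 the stage. */
#define SKETCH_NAMEMASK   0x0fffu
#define SKETCH_STAGESHIFT 12
#define SKETCH_STAGEMASK  (0x3u << SKETCH_STAGESHIFT)

/* Pack a name length and a stage number into one flags word. */
static unsigned sketch_flags(unsigned namelen, unsigned stage)
{
	return (namelen & SKETCH_NAMEMASK) |
	       ((stage << SKETCH_STAGESHIFT) & SKETCH_STAGEMASK);
}

/* Recover the stage number from a flags word. */
static unsigned sketch_stage(unsigned flags)
{
	return (flags & SKETCH_STAGEMASK) >> SKETCH_STAGESHIFT;
}
```

Under that assumption a fifth stage has nowhere to go: sketch_flags(len, 4) masks the stage down to 0. Merging more inputs therefore needs either several entries per stage or a merge done before the entries reach the index.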

I still think that my order is likely to be easier to understand and
involve read-tree relying less on tricky properties of the data
structures, but I'll have to get it done before I can say that for sure.

-Daniel
*This .sig left intentionally blank*

Re: Comments in read-tree about #nALT

2005-08-27 Thread Daniel Barkalow

On Sat, 27 Aug 2005, Daniel Barkalow wrote:

> Okay, so it looks to me like the only cases that care about the contents
> of the index, other than in stage 0 (which is effectively another tree,
> but already in index-form rather than tree-form), are 2 and 3, and these
> only care because read-tree modifies the stage of entries, rather
> than removing them and adding a stage-0 replacement entry; if it went
> through the add logic without SKIP_DFCHECK, that would reject the same
> things that causes_df_conflict rejects (at the point where whichever is
> second is done).
>
> So if I do the merge on tree entries (plus a stage-0 ce for the input from
> the index), and then add the result(s) to the cache, I can skip
> causes_df_conflict() in favor of just not using SKIP_DFCHECK. Is this
> right?

What I missed was that the effect of causes_df_conflict is to give "no
merge" for the entry, rather than giving an error overall. So I do need an
equivalent.

> Also, there doesn't actually seem to be a DF test in t1000; I think the
> t1005 DF test covers these cases (by the emu23 path into this code). Is
> this right?

Looks like stuff all over the place fails if causes_df_conflict is messed
up, so I should be covered.

-Daniel
*This .sig left intentionally blank*

Re: Comments in read-tree about #nALT

2005-08-27 Thread Daniel Barkalow

On Sat, 27 Aug 2005, Junio C Hamano wrote:

> Daniel Barkalow <[EMAIL PROTECTED]> writes:
>
> > Part of threeway_merge, however, wants to search the rest of the cache for
> > interfering entries in some cases, which would have to happen differently,
> > because I won't have the cache completely filled out beforehand. I'm
> > trying to figure out what the comments are talking about, and they seem to
> > refer to a list of the possible cases. Is that list somewhere convenient?
>
> Please look for END_OF_CASE_TABLE in t/t1000-read-tree-m-3way.sh;
> the table talks about some of the (ALT) not implemented, but
> some of them are ("git whatchanged t/t1000-read-tree-m-3way"
> would tell you which).

It looks like all of them are implemented:

#2ALT, #3ALT, #5ALT, and #14ALT, according to the commit comments, and the
others seem from the email you quote to have been done in the process of
getting #5ALT.

> Two way cases are described in Documentation/git-read-tree.txt,
> if you care.  If you were not touching the three-way case right
> now, I'd move/copy the three way cases there as well, but that
> can wait until after your changes.

I'd actually like to introduce Documentation/technical/trivial-merge for
this stuff; I think it would be good to have documentation for people who
need to know how the stuff works, rather than just how to use it, so we
get a balance between reams of information that users don't want to wade
through and being too vague for future developers.

Okay, so it looks to me like the only cases that care about the contents
of the index, other than in stage 0 (which is effectively another tree,
but already in index-form rather than tree-form), are 2 and 3, and these
only care because read-tree modifies the stage of entries, rather
than removing them and adding a stage-0 replacement entry; if it went
through the add logic without SKIP_DFCHECK, that would reject the same
things that causes_df_conflict rejects (at the point where whichever is
second is done).

So if I do the merge on tree entries (plus a stage-0 ce for the input from
the index), and then add the result(s) to the cache, I can skip
causes_df_conflict() in favor of just not using SKIP_DFCHECK. Is this
right?
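
For reference, the kind of thing a directory/file check has to reject can be sketched in a few lines of Python. This is only an illustration, not the read-tree code: has_df_conflict and the "set of paths" model of the cache are made up for the example.

```python
def has_df_conflict(existing_paths, new_path):
    """Return True if new_path cannot coexist with the existing paths.

    Hypothetical model: the cache is just a set of file paths. A conflict
    exists when new_path needs an existing file to be a directory
    ("a" exists, adding "a/b"), or when an existing entry lives underneath
    new_path ("a/b" exists, adding "a").
    """
    # Case 1: some leading directory of new_path is already a file.
    parts = new_path.split("/")
    for i in range(1, len(parts)):
        if "/".join(parts[:i]) in existing_paths:
            return True
    # Case 2: new_path is already in use as a directory.
    prefix = new_path + "/"
    return any(p.startswith(prefix) for p in existing_paths)
```

Running this kind of check once, at the point where the second of the two entries is added, is enough to catch either ordering.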

Also, there doesn't actually seem to be a DF test in t1000; I think the
t1005 DF test covers these cases (by the emu23 path into this code). Is
this right?

-Daniel
*This .sig left intentionally blank*

Re: Merges without bases

2005-08-27 Thread Daniel Barkalow

On Sat, 27 Aug 2005, Martin Langhoff wrote:

> On 8/27/05, Daniel Barkalow <[EMAIL PROTECTED]> wrote:
> > The problem with both of these (and doing it in the build system) is that,
> > when a project includes another project, you generally don't want whatever
> > revision of the included project happens to be the latest; you want the
> > revision of the included project that the revision of the including
> > project you're looking at matches. That is, if App includes Lib, and
>
> Exactly - so you do it on a tag, or a commit date with cvs. With Arch,
> GIT and others that have a stable id for each commit, you can use that
> or the more user-friendly tags.

I'm thinking of cases like openssl, openssh, and libcrypto. Openssl and
openssh both use libcrypto but not each other (looking at the ldd output,
rather than packaging). However, it would be too much of a pain to work
directly on libcrypto without working through some other package, because
the library doesn't have its own applications. Furthermore, if you're
doing much to libcrypto, you're likely doing it in the context of a
particular application (say, for example, ssh needs a new cipher that
isn't supported for SSL at the time). You'd want to make simultaneous
changes to libcrypto to implement the new feature and to openssh to use
it; neither can be validated until the other is written, which means that
you'll have both projects checked out and dirty (in the cache sense) at
the same time, and be building the using project.

It would also be good to be able to check in this whole thing through the
version control system, rather than partially through a change to the
build system. That is, if I change the included libcrypto, commit it, and
commit the including openssh, the system as a whole should understand that
I want to change which commit of libcrypto gets used. Similarly, it would
be good to merge changes into the libcrypto used by openssh with the same
procedure used to merge changes to openssh itself, including supporting
non-fast-forward when there's a local version in use.

(Of course, currently, libcrypto is strictly part of openssl, because it
would be too much of a pain with the present version control to make it
independent, and openssh depends on openssl, despite not even linking
against -lssl, because openssl got libcrypto first.)

> The good thing here is that a makefile will know how to handle the
> situation if the external lib is hosted in Arch, in SVN, or Visual
> SourceSafe. If your external lib is only available as a tarball in a
> url, you can fetch that and uncompress it too. Arch configurations are
> _cute_ but useless in any but the most narrow cases.

Certainly, if it's sufficiently external to be in a different SCM it
should be handled by the build system. Actually, if it's even nearly that
external, it's probably going to be handled best by requiring people to go
get it themselves.

I find it odd that you say that the standard approach is to have the build
system fetch a version of the included package; my experience is that
projects either just report (or fail to report) a dependency on having the
other package or they copy the project into their project. The former
means they can't change it (which is generally good, unless it becomes
necessary), while the latter causes update problems (cf. zlib).

I think that Arch configurations and the CVS equivalent are, in fact,
useless, but that this is only due to implementation being insufficiently
clever, not due to the concept being inherently bad; I feel the same way
about distributed development under Arch, which is really nice under git,
so I have hope that something better could be done.

-Daniel
*This .sig left intentionally blank*

Comments in read-tree about #nALT

2005-08-26 Thread Daniel Barkalow

I've gotten to the point of having all of the entries for a given path
ready to put into the cache at the same time, and now I want to convert the
merge functions to take their data directly, rather than in the cache, so
that they can take extra entries for extra ancestors.

Part of threeway_merge, however, wants to search the rest of the cache for
interfering entries in some cases, which would have to happen differently,
because I won't have the cache completely filled out beforehand. I'm
trying to figure out what the comments are talking about, and they seem to
refer to a list of the possible cases. Is that list somewhere convenient?

-Daniel
*This .sig left intentionally blank*

Re: [RFC, PATCH] A new merge algorithm (EXPERIMENTAL)

2005-08-26 Thread Daniel Barkalow

On Fri, 26 Aug 2005, Fredrik Kuivinen wrote:

> On Fri, Aug 26, 2005 at 04:48:32PM -0400, Daniel Barkalow wrote:
> > On Fri, 26 Aug 2005, Fredrik Kuivinen wrote:
> >
> > > I will try to describe how the algorithm works. The problem with the
> > > usual 3-way merge algorithm is that we sometimes do not have a unique
> > > common ancestor. In [1] B and C seems to be equally good. What this
> > > algorithm does is to _merge_ the common ancestors, in this case B and
> > > C, into a temporary tree lets call it T. It does then use this
> > > temporary tree T as the common ancestor for D and E to produce the
> > > final merge result. In the case described in [1] this will work out
> > > fine and we get a clean merge with the expected result.
> >
> > The only problem I can see with this is that it's likely to generate
> > conflicts between the shared heads, and the user is going to be confused
> > trying to resolve them, because the files with the conflicts will be
> > missing all of the more recent changes.
>
> I don't actually think that conflicts between shared heads is a
> problem. Given the criss-cross case (we want to merge A and B into M):
>
>  M
>  |\
>  | \
>  A  B
>  |\/|
>  |/\|
>  C  D
>  | /
>  |/
>  E
>
> Lets assume there is a merge conflict if we try to merge C and D
> (which are the two shared heads). Then both A and B must resolve this
> conflict. If they have done it in the same way we wont get a merge
> conflict at M, if they have resolved it differently we will get a
> merge conflict. In the first case there is no merge conflict at M, in
> the second case the user has to pick which one of the two different
> resolutions she wants.
>
> Note that the algorithm will happily write non-clean merge results to
> the object database during the "merge shared heads" stage. Hence, when
> we are merging C and D "internally" we will _not_ ask the user to
> resolve any eventual merge conflicts.

Oh, okay, didn't see that part. So the merge for M sees that the old
conflict is replaced entirely, either with the common resolution or with a
conflict between the different resolutions, but it doesn't report the old
conflict in either case, because that section's been replaced on both sides.
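
A toy model of that, in Python, treating the contents of a path as a single opaque value rather than merging line by line. The names (merge3, Conflict, merge_with_virtual_base) are invented for this sketch and are not taken from the patch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Conflict:
    """Stands in for a blob that still contains conflict markers."""
    ours: object
    theirs: object


def merge3(base, ours, theirs):
    """Three-way merge of one path, contents treated as opaque values."""
    if ours == theirs:
        return ours          # both sides agree
    if ours == base:
        return theirs        # only "theirs" changed
    if theirs == base:
        return ours          # only "ours" changed
    return Conflict(ours, theirs)


def merge_with_virtual_base(root, head1, head2, side1, side2):
    """Merge side1 and side2, whose shared heads are head1 and head2."""
    # Merge the shared heads first. A conflict here is kept as a value
    # and is never shown to the user.
    virtual_base = merge3(root, head1, head2)
    return merge3(virtual_base, side1, side2)
```

With the criss-cross above, merge_with_virtual_base("e", "c", "d", "r", "r") gives "r" when both sides resolved the old conflict the same way, and a Conflict between the two resolutions when they did not; the conflict between C and D itself never surfaces.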

> > Other than that, I think it should
> > give the right answer, although it will presumably involve a lot of
> > ancient history doing the internal merge. (Which would probably be really
> > painful if you've got two branches that cross-merge regularly and never
> > actually completely sync)
>
> The expensive part is the repeated merging. But as I wrote in my mail
> multiple shared heads seems to be pretty uncommon. As far as I can
> tell there is no reason for the number of shared heads to increase as
> a repository grows larger. However, this do probably depend on usage
> patterns.

I'd guess that the number of shared heads will increase as people's
usage gets more flexible. If people expected good results, I could see the
stable series being mostly done as patches to 2.6.X, which would then be
merged into various trees, and these would then be frequent common
ancestors in merges. I'd also not be surprised if Linus's tree were
abnormally straightforward, due to stuff getting serialized in -mm.

> > I'm getting pretty close to having a version of read-tree that does the
> > trivial merge portion based comparing the sides against all of the shared
> > heads. I think yours will be better for the cases we've identified, giving
> > the correct answer for Tony's case rather than reporting a conflict, but
> > it's clearly more complicated. I think my changes are worthwhile anyway,
> > since they make the merging logic more central, but obviously
> > insufficient.
> >
> > I've been thinking that could be useful to have read-tree figure out the
> > history itself, instead of being passed ancestors, in which case it could
> > use your method, except more efficiently (and only look further at the
> > history when needed).
>
> It will be interesting to have a look at the code when you are done.
> I find the Git architecture with respect to merging to be quite
> nice. A core which handles the simple cases _fast_ and let the more
> complicated cases be handled by someone else.

Right; I'm mostly just trying to get the fast path to not miss cases that
are more complicated than they look.

-Daniel
*This .sig left intentionally blank*

Re: [RFC, PATCH] A new merge algorithm (EXPERIMENTAL)

2005-08-26 Thread Daniel Barkalow

On Fri, 26 Aug 2005, Fredrik Kuivinen wrote:

> I will try to describe how the algorithm works. The problem with the
> usual 3-way merge algorithm is that we sometimes do not have a unique
> common ancestor. In [1] B and C seems to be equally good. What this
> algorithm does is to _merge_ the common ancestors, in this case B and
> C, into a temporary tree lets call it T. It does then use this
> temporary tree T as the common ancestor for D and E to produce the
> final merge result. In the case described in [1] this will work out
> fine and we get a clean merge with the expected result.

The only problem I can see with this is that it's likely to generate
conflicts between the shared heads, and the user is going to be confused
trying to resolve them, because the files with the conflicts will be
missing all of the more recent changes. Other than that, I think it should
give the right answer, although it will presumably involve a lot of
ancient history doing the internal merge. (Which would probably be really
painful if you've got two branches that cross-merge regularly and never
actually completely sync)

I'm getting pretty close to having a version of read-tree that does the
trivial merge portion based on comparing the sides against all of the shared
heads. I think yours will be better for the cases we've identified, giving
the correct answer for Tony's case rather than reporting a conflict, but
it's clearly more complicated. I think my changes are worthwhile anyway,
since they make the merging logic more central, but obviously
insufficient.

I've been thinking that it could be useful to have read-tree figure out the
history itself, instead of being passed ancestors, in which case it could
use your method, except more efficiently (and only look further at the
history when needed).

-Daniel
*This .sig left intentionally blank*

Re: Merges without bases

2005-08-26 Thread Daniel Barkalow

On Fri, 26 Aug 2005, Martin Langhoff wrote:

> On 8/26/05, Junio C Hamano <[EMAIL PROTECTED]> wrote:
> > their core GIT tools come from.  But how would _I_ pull from
> > that "My Project", if I did not want to pull unrelated stuff in?
>
> and then...
>
> > What I think _might_ deserve a bit more support would be a merge
> > of a foreign project as a subdirectory of a project.  Linus
>
> tla has an interesting implementation (and horrible name) for
> something like this. In Arch-speak, they are called 'configurations',
> a versioned control file that describes that in subdirectory foo we
> import from this other repo#branch.
>
> In cvs, you just do nested checkouts, and trust a `cvs update` done at
> the top will do the right thing;  and in fact recent cvs versions do.

The problem with both of these (and doing it in the build system) is that,
when a project includes another project, you generally don't want whatever
revision of the included project happens to be the latest; you want the
revision of the included project that the revision of the including
project you're looking at matches. That is, if App includes Lib, and
you're looking at an App commit, you want to have the version of Lib that
the commit was made with, not the latest version of Lib, which may not be
backwards compatible across non-release commits, or, in any case, won't
help in reconstructing an earlier state. I think a primary function of an
SCM is to be able to say, "It worked last Friday, and it's broken now.
What's different?" If the answer is, "On Saturday, we updated the
included Lib to their version from Thursday, which is broken", it'll be
really hard to track down without special tracking.

I think it's the lack of the special tracking, therefore, that makes this
not a good feature in most SCMs, and makes them not better than having the
build system do it (and potentially worse, if you've got your build system
checking out a version specified in a version-controlled file). But I
think that git can do better, including support for the required version
sometimes being a locally modified one and sometimes being the official
one when the local modifications have been accepted upstream.

-Daniel
*This .sig left intentionally blank*

Re: [RFC] Looking at multiple ancestors in merge

2005-08-26 Thread Daniel Barkalow

On Fri, 26 Aug 2005, Junio C Hamano wrote:

> Daniel Barkalow <[EMAIL PROTECTED]> writes:
>
> > I've started this, and have gotten as far as having read-tree accept > 3
> > trees and ignore everything but the last 3. Am I correct in assuming that
> > if I break read-tree in any way, some test will fail?
>
> If some test fails you would know you broke it, but the inverse
> is probably not always true.
>
> I think the current read-tree test suite has reasonably wide
> coverage of all the interesting cases.  But the definition of
> "interesting" was derived from the current world order (IOW, the
> test suite was designed around the way we do things right now as
> a whitebox test, not a blackbox test).  I would not be surprised
> if some of them did not catch breakage you may introduce during
> the development.

Okay; I think the only thing that I'm going to change with respect to how
it makes decisions will be with 4+ trees, and those will obviously need
new tests.

> I wonder however if extending the current way of doing things in
> the cache is the right thing.  Right now we use two bits out of
> the top four bits for recording stage, one bit for the update
> bit, so you have only one extra bit to extend the number of
> stages, which means you could hold at most 7 trees at once.
>
> You "ignore things but the last 3", so this may not be too much
> of a problem, but I am a bit puzzled what you meant by it
> though.  Are you talking about reading more than 3 trees and
> keeping only the 3 to be merged, discarding the rest, doing the
> selection per path?

For each path, I intend to look at all the entries and make trivial merge
judgements on them, but then only leave the usual stage 2 and stage 3, and
a chosen stage 1. The way I'm writing the changes is:

In the argument parsing loop, just form a list of the tree objects, and
actually read them after the whole list is ready. If there are more than
3, ignore all but the last 3. This lets you give an arbitrary number of
common ancestors to read-tree, and it won't mess up, but it will only use
one of them. I've done this.

Next, scan through the tree entry lists for all the trees together, and
generate cache entries for the same path in the different trees at the
same time. I've written this, but I've got a few bugs, and the 3way merge
tests are dutifully failing.
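
The scanning step looks roughly like the following Python sketch. It assumes each tree has already been flattened into a list of (path, blob) pairs sorted by plain string comparison, which is a simplification of how tree entries are really ordered; walk_trees is a name made up for the example.

```python
def walk_trees(trees):
    """Yield (path, row) in path order, where row has one slot per tree.

    trees: list of lists of (path, blob) pairs, each sorted by path.
    A slot is None when that tree has no entry for the path.
    """
    positions = [0] * len(trees)
    while True:
        # The smallest path not yet consumed in any tree comes next.
        pending = [t[i][0] for t, i in zip(trees, positions) if i < len(t)]
        if not pending:
            return
        path = min(pending)
        row = []
        for n, tree in enumerate(trees):
            i = positions[n]
            if i < len(tree) and tree[i][0] == path:
                row.append(tree[i][1])
                positions[n] += 1   # this tree's entry has been used
            else:
                row.append(None)
        yield path, row
```

Each row then holds everything known about one path, which is what a per-path trivial merge needs. Keeping only the last three trees, as in the first step, would just be trees[-3:] before the walk.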

Then, I'll do the trivial merge on tree entries rather than cache
entries.

Finally, I'll extend the trivial merge to use the extra ancestors.

Since merge(1) doesn't handle multiple common ancestors, having more than
3 stages in the cache after the trivial merge isn't going to be useful for
now.

-Daniel
*This .sig left intentionally blank*

Re: [RFC] Looking at multiple ancestors in merge

2005-08-25 Thread Daniel Barkalow

On Wed, 24 Aug 2005, Daniel Barkalow wrote:

> Of course, this is going to take a bit of work, because read-tree
> currently puts all of its arguments into the cache and then works on
> merging, and taking multiple ancestors requires putting them somewhere
> else, because they won't fit in the cache.

I've started this, and have gotten as far as having read-tree accept > 3
trees and ignore everything but the last 3. Am I correct in assuming that
if I break read-tree in any way, some test will fail?

-Daniel
*This .sig left intentionally blank*

Re: Storing state in $GIT_DIR

2005-08-25 Thread Daniel Barkalow

On Thu, 25 Aug 2005, Junio C Hamano wrote:

> Now, among the existing object types, there are only two kinds
> of objects you can use for this.  If the only thing you need to
> record is some textual information with one pointer to git
> branch head, then you can use tag that points at the git head,
> and store everything else as the tag comment.  This is doable
> but unwieldy.

I don't think this buys you anything, because then the tag needs to be
accessible from something, which is the same problem you were trying to
solve for the commit.

> You could abuse a commit object as well; you store commit
> objects (such as the corresponding git branch head) as parent
> commits, and put everything else in a tree that is associated
> with that commit.

If you want to go that way, you could add a new field to commits with
minimal effort: you just need to parse it in commit.c, generate it in
git-commit-tree (with an option), and pull it in pull.c, and everything
should work as far as making the git portion follow the metadata around.

-Daniel
*This .sig left intentionally blank*

Re: Merges without bases

2005-08-25 Thread Daniel Barkalow

On Thu, 25 Aug 2005, Junio C Hamano wrote:

> One thing that makes me reluctant to recommend this "merging
> unrelated projects" business is that I suspect that it makes
> things _much_ harder for the upstream project that is being
> merged, and should not be done without prior arrangement; Linus
> merged gitk after talking with paulus, so that was OK.

I'd still like to revive my idea of having projects overlaid on each
other, where the commits in the project that absorbed the other project
say, essentially, "also include this other commit, but any changes to
those files belong to that branch, not this one". That way, Linus could
have included gitk in git, but changes to it, even when done in a git
working tree, would show up in commits that only include gitk. (git
actually can handle this with the alternative index file mechanism that
Linus mentioned in a different thread.)

Definitely post-1.0, of course.

> Suppose the above "My Project" is published, people send patches
> for core GIT part to it, and you as the maintainer of that "My
> Project" accept those patches.  The users of "My Project" would
> be happy with the new features and wouldn't care less where
> their core GIT tools come from.  But how would _I_ pull from
> that "My Project", if I did not want to pull unrelated stuff in?

With the right info, the tools could be made to automatically generate
suitable commits, because those files would be tracked by a separate index
file and committed into a separate branch, which would then be reincluded
(by reference) in the containing project.

-Daniel
*This .sig left intentionally blank*

Re: [RFC] Stgit - patch history / add extra parents

2005-08-25 Thread Daniel Barkalow

On Thu, 25 Aug 2005, Jan Veldeman wrote:

> Daniel Barkalow wrote:
>
> > I'm not sure how applicable to this situation stgit really is; I see stgit
> > as optimized for the case of a patch set which is basically done, where
> > you want to keep it applicable to the mainline as the mainline advances.
>
> Maybe I forgot to mention this: I would also like to have my development
> tree split up in a patch stack. The separate patches makes tracking the
> mainline a lot easier (conflicts are a lot easier to solve)

I just try to keep things in this state sufficiently briefly that it
doesn't become a problem. I also split things up into a bunch of branches,
rather than into a stack of patches, and only work on parallel development
before I've actually got a candidate for a series.

> But this would assume that once the patch goes into stgit, it won't
> change except when the parent gets updated. I think we will still change
> the patches quite a bit and simultanious by a couple of people.

The extension I had proposed to stgit should work for this; it would let
you version control each patch just like other git projects. I just think
it wouldn't work so well before the group has agreed on what patches there
are.

-Daniel
*This .sig left intentionally blank*

Re: [RFC] Looking at multiple ancestors in merge

2005-08-24 Thread Daniel Barkalow

On Wed, 24 Aug 2005, A Large Angry SCM wrote:

> Daniel Barkalow wrote:
> > I'm starting to work on letting the merging process see multiple
> > ancestors, and I think it's messy enough that I should actually discuss
> > it.
> >
> > Review of the issue:
> >
> > It is possible to lost reverts in cases when merging two commits with
> > multiple ancestors, in the following pattern: (letters representing blobs
> > at some filename, children to the right)
> >
> > a-b-b-a-?
> >  \ X   /
> >   a-b-b
> >
> [Lots of stuff deleted]
>
> There seems to be a lot of effort being put into auto-magically choosing
> the "right" merge in the presence of multiple possible merge bases.
> Unfortunately, most (all?) of the proposals are attempting to divine
> intent, and so, are guaranteed to be 100% wrong at least some of the time.
>
> Wouldn't it be better, instead, to detect that current merge being
> attempted is ambiguous and require the user to specify the correct merge
> base? The alternative is a tool that appears to work all of the time but
> does the wrong thing some of the time.

My proposal is actually to detect when a merge is ambiguous. In order to
determine that, however, you have to evaluate multiple potential outcomes
and see if they are actually different. I'm working on an efficient way to
do that.
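
The brute-force version of that check, as a Python sketch: try every candidate base and see whether the answers agree. Contents are treated as opaque values, and the function names are invented for the example; the efficient version would avoid actually running each merge.

```python
def three_way(base, ours, theirs):
    """Trivial three-way result for one path, or None if the sides conflict."""
    if ours == theirs:
        return ours
    if ours == base:
        return theirs
    if theirs == base:
        return ours
    return None


def outcomes(bases, ours, theirs):
    """Set of results obtained by trying each candidate base in turn."""
    return {three_way(base, ours, theirs) for base in bases}


def is_ambiguous(bases, ours, theirs):
    """True when the choice of base changes the result of the merge."""
    return len(outcomes(bases, ours, theirs)) > 1
```

In the lost-revert pattern the sides are 'a' and 'b' and the candidate bases are 'b' and 'a', so the two outcomes differ and the merge is flagged instead of silently picking one.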

Then further work could look into eliminating possibilities when
information about the history excludes them. There were two issues in the
case that Tony hit: it ignored a potential correct outcome for the merge,
and it didn't ignore an outcome which could be demonstrated to be
incorrect. The priority is to resolve the first, but things which improve
the second or help with solutions to the second are worth understanding.

-Daniel
*This .sig left intentionally blank*

[RFC] Looking at multiple ancestors in merge

2005-08-24 Thread Daniel Barkalow

I'm starting to work on letting the merging process see multiple
ancestors, and I think it's messy enough that I should actually discuss
it.

Review of the issue:

It is possible to lose reverts in cases when merging two commits with
multiple ancestors, in the following pattern: (letters representing blobs
at some filename, children to the right)

a-b-b-a-?
 \ X   /
  a-b-b

You form a branch with unrelated changes, apply a patch in the top line,
separately merge both ways, do unrelated development in the bottom line,
and revert the patch in the top line. Then you're trying to merge the two
lines. There are two candidates for the common ancestor, the two in the
second column. If you pick the top one, you get the revert; if you pick
the bottom one, you don't. This is a bug, because it ignores the 'a'
version due to it being "unchanged", but it actually did change and
changed back.

Note that the revert is going to also be ignored if there isn't the "X" in
the middle of that diagram and the a->b change on the bottom is due to
independently applying the same patch. Users are more likely to expect
this, however, than the situation above, where the side that is causing
the patch to be included never applied it explicitly at all; it just
merged at an unfortunate moment.

My theory is that we should handle merges by passing all of the ancestors
to read-tree, and having it use the following additions to the rules for
trivial merges:

 - If any of the ancestors matches a side, don't use that side
 - If you eliminate both sides, don't do the trivial merge

(The first of these also means that it'll pick the best combination of
ancestors for maximizing trivial merges, as a nice side effect; the second
means that it'll avoid messing up with reverts when it has a chance of
understanding them)

If it doesn't do the trivial merge, it just puts the blob from the first
listed ancestor in stage 1, rather than trying anything fancy.
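
Per path, the rules above come out roughly as the following Python sketch. Blobs are opaque values here and the function names are hypothetical. The "both sides identical" shortcut is an assumption carried over from the ordinary three-way rules, not something stated in the two additions.

```python
def trivial_merge(ancestors, ours, theirs):
    """Return the merged blob, or None when no trivial merge applies."""
    if ours == theirs:
        # Assumption: identical sides need no ancestor at all.
        return ours
    # A side that matches any ancestor counts as unchanged and is dropped.
    use_ours = ours not in ancestors
    use_theirs = theirs not in ancestors
    if use_ours and not use_theirs:
        return ours
    if use_theirs and not use_ours:
        return theirs
    # Either both sides were eliminated (possible revert) or both really
    # changed; neither case is trivial.
    return None


def stage1_fallback(ancestors):
    """Blob to leave in stage 1 when the trivial merge is not done."""
    return ancestors[0]
```

For the diagram, the ancestors are 'b' and 'a' and the sides are 'a' and 'b': each side matches one ancestor, both are eliminated, and no trivial merge happens. With only 'a' as the ancestor the result would be 'b', which is the lost revert.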

(As a further improvement, we could actually look through the history for
reasons to disregard a similarity, which would determine that there isn't
a continuous line of similarity from the recent 'a' to the common ancestor
'a', and therefore that it should be retained; but I'll be satisfied for
now with having it just not do the incorrect trivial merge.)

Of course, this is going to take a bit of work, because read-tree
currently puts all of its arguments into the cache and then works on
merging, and taking multiple ancestors requires putting them somewhere
else, because they won't fit in the cache.

-Daniel
*This .sig left intentionally blank*

Re: [RFC] undo and redo

2005-08-24 Thread Daniel Barkalow

On Wed, 24 Aug 2005, Carl Baldwin wrote:

> This is interesting.  Can a ref be to a tree rather than a commit?  And
> it still works?  I guess it would.  I hadn't thought about that.

Generally, each subdirectory of refs/ has refs to objects of the same
type, and heads/ is commits, but other directories are other things. tags/
is all tag objects, and you could have undo/ be trees.

> Will prune preserve any tree mentioned in any file in refs?  How does
> this work exactly?

It keeps any object reachable from an object that there's a ref to in
refs.
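
As a sketch of what "reachable" means here, in Python, with the object database reduced to a dict from object id to the ids it points at. The data model and the names are made up for the example and say nothing about how prune is actually implemented.

```python
def reachable(refs, links):
    """Return every object id reachable from some ref.

    refs:  dict mapping ref name -> object id
    links: dict mapping object id -> list of object ids it refers to
    """
    keep = set()
    stack = list(refs.values())
    while stack:
        obj = stack.pop()
        if obj in keep:
            continue
        keep.add(obj)
        stack.extend(links.get(obj, ()))
    return keep


def prunable(all_objects, refs, links):
    """Objects that nothing under refs/ leads to."""
    return set(all_objects) - reachable(refs, links)
```

So a tree stored under something like refs/undo/ keeps itself and all of its blobs alive, exactly as a head keeps its commits alive.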

-Daniel
*This .sig left intentionally blank*

Re: [RFC] undo and redo

2005-08-24 Thread Daniel Barkalow

On Wed, 24 Aug 2005, Carl Baldwin wrote:

> This brings up a good point (indirectly).  "git prune" would destroy the
> undo objects.  I had thought of this but decided to ignore it for the
> time being.

If you made undo store the tree under refs somewhere, git prune would
preserve it.

-Daniel
*This .sig left intentionally blank*

Re: baffled again

2005-08-24 Thread Daniel Barkalow

On Wed, 24 Aug 2005, Linus Torvalds wrote:

> Now, if the shared patch hadn't been a patch, but a shared _commit_, then
> the thing would have been unambiguous - the shared commit would have been
> the merge point, and the revert would have clearly undone that shared
> commit.

Actually, it was a shared commit
(4aec0fb12267718c750475f3404337ad13caa8f5), which was (an ancestor of) a
candidate merge point, but wasn't the one selected. Since a different one
was chosen, it looked to the 3-way merge like a shared patch (since it
ignores the untaken parent in the merges in the history).

This should be fixable, but it'll require more cleverness in read-tree.

-Daniel
*This .sig left intentionally blank*

Re: Query about status of http-pull

2005-08-24 Thread Daniel Barkalow

On Wed, 24 Aug 2005, Martin Schlemmer wrote:

> Hi,
>
> Recently cogito again say that the rsync method will be deprecated in
> future (due to http-pull now supporting pack objects I suppose), but it
> seems to me that it still have other issues:
>
> -
> lycan linux-2.6 # git pull origin
> Fetching HEAD using http
> Getting pack list
> error: Couldn't get 0572e3da3ff5c3744b2f606ecf296d5f89a4bbdf: not separate or in any pack
> error: Tried http://www.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git/objects/05/72e3da3ff5c3744b2f606ecf296d5f89a4bbdf
> Cannot obtain needed object 0572e3da3ff5c3744b2f606ecf296d5f89a4bbdf
> while processing commit .

It looks like pack-c24bb5025e835a3d8733931ce7cc440f7bfbaaed isn't in the
pack list. I suspect that updating this file should really be done by
anything that creates pack files, because people forget to run the program
that does it otherwise and then http has problems.
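
A quick way to spot this on the server side is sketched below in Python. It assumes the list is objects/info/packs with one "P pack-<hash>.pack" line per pack, which is how I understand the layout that git-update-server-info writes; treat the paths and the function name unlisted_packs as assumptions rather than a description of any existing tool.

```python
import os


def unlisted_packs(git_dir):
    """Return pack files present in objects/pack but missing from the list."""
    pack_dir = os.path.join(git_dir, "objects", "pack")
    list_path = os.path.join(git_dir, "objects", "info", "packs")

    listed = set()
    if os.path.exists(list_path):
        with open(list_path) as f:
            for line in f:
                # Assumed format: lines naming a pack start with "P ".
                if line.startswith("P "):
                    listed.add(line[2:].strip())

    present = set()
    if os.path.isdir(pack_dir):
        present = {n for n in os.listdir(pack_dir) if n.endswith(".pack")}

    return sorted(present - listed)
```

Anything this returns is a pack that an http client has no way of discovering, which matches the failure above.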

-Daniel
*This .sig left intentionally blank*

Re: baffled again

2005-08-24 Thread Daniel Barkalow

On Wed, 24 Aug 2005, Junio C Hamano wrote:

> [EMAIL PROTECTED] writes:
>
> > So I have another anomaly in my GIT tree.  A patch to
> > back out a bogus change to arch/ia64/hp/sim/boot/bootloader.c
> > in my release branch at commit
> >
> >  62d75f3753647656323b0365faa43fc1a8f7be97
> >
> > appears to have been lost when I merged the release branch to
> > the test branch at commit
> >
> >  0c3e091838f02c537ccab3b6e8180091080f7df2
>
> : siamese; git cat-file commit 0c3e091838f02c537ccab3b6e8180091080f7df2
> tree 61a407356d1e897e0badea552ce69e657cab6108
> parent 7ffacc1a2527c219b834fe226a7a55dc67ca3637
> parent a4cce10492358b33d33bb43f98284c80482037e8
> author Tony Luck <[EMAIL PROTECTED]> 1124808655 -0700
> committer Tony Luck <[EMAIL PROTECTED]> 1124808655 -0700
>
> Pull release into test branch
>
> So I pulled 7ffacc and a4cce1 from your repository and started
> digging from there.  7ffacc was the head of "test" branch back
> then, and a4cce1 was the head of "release" branch.  I checked
> out 7ffacc in the repository and pulled a4cce1 into it, using
> the GIT with the "optimum merge-base" patch.
>
> : siamese; git pull . aegl-release
> Packing 0 objects
> Unpacking 0 objects
>
> * committish: a4cce10492358b33d33bb43f98284c80482037e8
> refs/heads/aegl-release from .
> Trying to find the optimum merge base.
> Trying to merge a4cce10492358b33d33bb43f98284c80482037e8 into 7ffacc1a2527c219b834fe226a7a55dc67ca3637 using c1ffb910f7a4e1e79d462bb359067d97ad1a8a25.
> Simple merge failed, trying Automatic merge
> Auto-merging arch/ia64/sn/kernel/io_init.c.
> Committed merge db376974c0aebb9e99e5cd0bce21088c6a9d927c
>  arch/ia64/hp/sim/boot/boot_head.S |2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> It is using c1ffb9 as the merge base.  The problematic path
> in the three trees involved are:
>
> : siamese; git ls-tree -r aegl-test-7ffacc1a | grep arch/ia64/hp/sim/boot/bootloader.c
> 100644 blob a7bed60b69f9e8de9a49944e22d03fb388ae93c7  arch/ia64/hp/sim/boot/bootloader.c
> : siamese; git ls-tree -r aegl-release-a4cce1 | grep arch/ia64/hp/sim/boot/bootloader.c
> 100644 blob 51a7b7b4dd0e7c5720683a40637cdb79a31ec4c4  arch/ia64/hp/sim/boot/bootloader.c
> : siamese; git ls-tree -r aegl-c1ffb9 | grep arch/ia64/hp/sim/boot/bootloader.c
> 100644 blob 51a7b7b4dd0e7c5720683a40637cdb79a31ec4c4  arch/ia64/hp/sim/boot/bootloader.c
>
> So the file did not change between the merge base and release,
> and test had the change.  merge-cache picked the one in the test
> release.  Your guess in the other message hits the mark.
>
> I wonder what _other_ candidates these two commits have in
> common and what would have happened if they were used as the
> base instead?
>
> : siamese; git merge-base -a aegl-test-7ffacc1a aegl-release-a4cce1
> f6fdd7d9c273bb2a20ab467cb57067494f932fa3
> 3a931d4cca1b6dabe1085cc04e909575df9219ae
> c1ffb910f7a4e1e79d462bb359067d97ad1a8a25
>
> You can check what variant of the file each of these commits
> contain.
>
> What is happening is:
>
> * the problematic patch 4aec0f is one before 3a931d.  Among the
>   three merge-base candidates, only 3a931d contains the wrongly
>   patched version.
>
> * the problematic change 4aec0f patch introduces is part of test
>   branch, because it was pulled via release.
>
> * the tip of release being merged into test has this patch
>   reverted, and the file is exactly the same as before 4aec0f
>   patch.
>
> So three-way trivial merge algorithm says, "hey, the file did
> not change between common ancestor and release but it is
> different in test, so the one in the test branch must be the
> merge result."
>
> This does not have much to do with which common ancestor
> merge-base chooses.  Sorry, I am not sure what is the right way
> to resolve this offhand.

If it picks 3a931d4cca1b6dabe1085cc04e909575df9219ae, it will determine
that the file didn't change between that and test, and is different in
release, so the one in release must be right. I believe that the hint that
something is going on is that different common ancestors give
different trivial merges (as opposed to some giving failure and some
giving the same result), and resolving it probably involves identifying
that the paths from f6f... and c1f... to release don't keep the same blob
through the middle, despite having the same ends.
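
As a rough illustration of that hint, here is a minimal Python sketch (not
git's own code) of the file-level trivial merge rule, run once per candidate
merge base. The labels "orig" and "patched" are made up for the example; the
assignment of versions to the three bases follows what the thread says
(only 3a931d carries the wrongly patched file).

```python
def trivial_merge(base, ours, theirs):
    """File-level three-way rule: return the merged version, or None if not trivial."""
    if ours == theirs:
        return ours          # both sides agree
    if base == ours:
        return theirs        # only "theirs" changed the file
    if base == theirs:
        return ours          # only "ours" changed the file
    return None              # both changed it differently: needs a real merge


# Hypothetical labels: "orig" is the file before the bad patch, "patched" after it.
test, release = "patched", "orig"
bases = {"f6fdd7": "orig", "3a931d": "patched", "c1ffb9": "orig"}

# Try every candidate base and see whether they agree on the result.
results = {name: trivial_merge(blob, test, release) for name, blob in bases.items()}
suspicious = len(set(results.values())) > 1

for name, merged in results.items():
    print(name, "->", merged)
print("bases disagree:", suspicious)
```

When the candidates disagree like this, each one individually looks like a
clean trivial merge, so the disagreement itself is the only visible signal.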

-Daniel
*This .sig left intentionally blank*

Re: Automatic merge failed, fix up by hand

2005-08-23 Thread Daniel Barkalow

On Tue, 23 Aug 2005, Junio C Hamano wrote:

> Only lightly tested, in the sense that I did only this one case
> and nothing else.  For a large repository and with complex
> merges, "merge-base -a" _might_ end up reporting many
> candidates, in which case the pre-merge step to figure out the
> best merge base may turn out to be disastrously slow.  I dunno.

I think it's the right thing to do for now (and what I was going to
suggest), and if people find it too slow, we can consider teaching
read-tree to take multiple common ancestors and use any of them that gives
a clear result on a per-file basis.

On the other hand, Tony might have hit a bad case with an ill-chosen
common ancestor for a patch/revert sequence, and we probably want to look
into that if we've got some history that demonstrates the problem. I think
that, if there are two common ancestors, one of which has applied a patch
and one of which hasn't, and on one side of the merge it gets reverted, we
should get the revert, but we'll only get it if we choose the ancestor
where it was applied.

(Letters are versions of the file, with 'b' being the bad patch; the
 second column is the two choices for common ancestor)

  a-b-a-?
 / X   /
a-b-b-b

Of course, you could have the two lines exactly flipped for a different
file in the same commits, or for a different hunk in the same file, and
there would be no single choice that doesn't lose the revert. The really
right thing to do is identify that there is a b->a transition that is not
a trivial merge and that is not beyond a common ancestor, but that's hard
to determine easily and with sufficient granularity to catch everything.
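
The flipped case can be played through with a small Python sketch. This is
only a toy model under stated assumptions: two hypothetical files "file1" and
"file2", versions labelled 'a' (good) and 'b' (bad patch) as in the diagram,
and a plain file-level trivial merge standing in for what read-tree does.

```python
def trivial_merge(base, ours, theirs):
    """File-level three-way rule: return the merged version, or None if not trivial."""
    if ours == theirs:
        return ours
    if base == ours:
        return theirs
    if base == theirs:
        return ours
    return None


# Tips of the two lines of development. file1 follows the diagram; file2 is
# the same diagram with the two lines flipped.
top_tip = {"file1": "a", "file2": "b"}
bottom_tip = {"file1": "b", "file2": "a"}

# The two candidate common ancestors (the second column of the diagram).
ancestors = {
    "top": {"file1": "a", "file2": "b"},
    "bottom": {"file1": "b", "file2": "a"},
}


def merge_tree(base):
    """Merge every path using one common ancestor for the whole tree."""
    return {p: trivial_merge(base[p], top_tip[p], bottom_tip[p]) for p in top_tip}


merged = {name: merge_tree(tree) for name, tree in ancestors.items()}
wanted = {"file1": "a", "file2": "a"}   # both reverts kept

for name, tree in merged.items():
    print(name, tree, "keeps both reverts:", tree == wanted)
```

Each whole-tree choice keeps one revert and loses the other, which is why the
selection would have to happen per file (or per hunk) to get this right.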

I still someday want to do a version of diff/merge for git that could
select common ancestors on a per-hunk basis and identify block moves and
avoid giving confusing (but marginally shorter) diffs, but that's a major
undertaking that I don't have time for right now.

-Daniel
*This .sig left intentionally blank*

Re: [RFC] Removing deleted files after checkout

2005-08-23 Thread Daniel Barkalow

On Tue, 23 Aug 2005, Carl Baldwin wrote:

> The thing that this doesn't do is remove empty directories when the last
> file is deleted.  I once expressed the opinion in a previous thread that
> directories should be added and removed explicitly in git.  (Thus
> allowing an empty directory to be added).  If this were to happen then
> this case would get handled correctly.  However, if git stays with the
> status quo then I think that git-read-tree -u should be changed to
> remove the empty directory.  This would make it consistent.

I think that git-read-tree -u ought to remove a directory if it removes
the last file (or directory) in it.
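
For illustration only, the behaviour suggested here could look roughly like
the following Python sketch. The helper name remove_and_prune is made up, and
this says nothing about how git-read-tree is actually implemented; it just
removes a file and then walks upward deleting directories that became empty,
stopping at the working tree root.

```python
import os
import tempfile


def remove_and_prune(root, relpath):
    """Remove root/relpath, then remove each parent directory left empty."""
    root = os.path.abspath(root)
    path = os.path.join(root, relpath)
    os.remove(path)
    parent = os.path.dirname(path)
    # Never remove the root itself; stop at the first non-empty directory.
    while parent != root and not os.listdir(parent):
        os.rmdir(parent)
        parent = os.path.dirname(parent)


with tempfile.TemporaryDirectory() as work:
    os.makedirs(os.path.join(work, "docs", "old"))
    for rel in ("docs/keep.txt", "docs/old/gone.txt"):
        with open(os.path.join(work, rel), "w") as f:
            f.write("x\n")

    remove_and_prune(work, "docs/old/gone.txt")
    remaining = sorted(os.listdir(os.path.join(work, "docs")))
    print(remaining)
```

Here docs/old goes away with its last file, while docs stays because it still
holds keep.txt.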

-Daniel
*This .sig left intentionally blank*

Re: [RFC] Stgit - patch history / add extra parents

2005-08-23 Thread Daniel Barkalow

On Tue, 23 Aug 2005, Jan Veldeman wrote:

> Daniel Barkalow wrote:
>
> > On Tue, 23 Aug 2005, Catalin Marinas wrote:
> >
> > Something is legitimate as a parent if someone took that commit and did
> > something to it to get the new commit. The operation which caused the
> > change is not specified. But you only want to include it if anyone cares
> > about the parent.
>
> This is indeed what I thought a parent should be used for. As an addition,
> I'll try to explain why I would sometimes want to care about some parents:
>
> I want to track a mainline tree, but have quite a few changes, which shouldn't
> be committed to the mainline immediately (let's call it my development tree).
> This is why I would use stgit. But I would also want to collaborate with
> other developers on this development tree, so I sometimes want to make
> updates available of this development tree to the others. This is where
> current stgit falls short. To easily share this development tree, I want
> some history (not all, only the ones I choose) of this development tree
> included, so that the other developers can easily follow my development.
>
> The parents which should be visible to the outside, will always be versions
> of my development tree, which I have previously pushed out. My way of
> working would become:
> * make changes, all over the place, using stgit
> * still make changes (none of these gets tracked, intermittent versions are
>   lost)
> * having a good day: changes look good, I want to push this out:
>   * push my tree out
>   * stgit-free (which makes the pushed out commits, the new parents of my
> stgit patches)
> * restart from top

I'm not sure how applicable to this situation stgit really is; I see stgit
as optimized for the case of a patch set which is basically done, where
you want to keep it applicable to the mainline as the mainline advances.

For your application, I'd just have a git branch full of various stuff,
and then generate clean commits by branching mainline, diffing development
against it, cutting the diff down to just what I want to push, and
applying that. Then the clean patch goes into stgit.

> [...]
> > This also depends on how exactly freeze is used; if you use it before
> > committing a modification to the patch without rebasing, you get:
> >
> > old-top -> new-top
> >    ^          ^
> >     \        /
> >      bottom
> >
> > bottom to old-top is the old patch
> > bottom to new-top is the new patch
> > old-top to new-top is the change to the patch
> >
> > Then you want to keep new-top as a parent for rebasings until one of these
> > is frozen. These links are not interesting to look at, but preserve the
> > path to the old-top:new-top change, which is interesting.
>
> my proposal does something like this, but a little more: not only does it
> keep track of the link between old-top and new-top, it also keeps track of
> the links between old-patch-in-between and new-patch-in-between.
> (This makes sense when the top is being removed or reordered)

I was thinking of this as being the top and bottom commits for a single
tracked patch, not as a whole series. I think patches lower wouldn't be
affected, and patches higher would see this as a rebase.

-Daniel
*This .sig left intentionally blank*

Re: [RFC] Removing deleted files after checkout

2005-08-23 Thread Daniel Barkalow

On Tue, 23 Aug 2005, Carl Baldwin wrote:

> The point is to push and use a post-update hook to do the checkout.  So,
> this won't be possible.

You could have the remote repository be something like
"~/git/website.git", and have a hook which does: "cd ~/www; git pull
~/git/website.git/". That is, have three things: the directory where you
work on stuff, the central storage location, and the area that the web
server serves, and have the storage location automatically update the web
server area. That's what I do with my website section that's still in CVS,
and the general concept is good (and means that the "real" repository
isn't somewhere the web server is poking around).

> > which will correctly identify before and after, and remove any files that
> > were removed.
> >
> > Alternatively, you could do, at point 1:
> >
> > cp .git/refs/master .git/refs/deployed
> > git checkout deployed
>
> How to get a post-update hook to do this?  I suppose an update script
> could set this up for the post-update to later use.

If you have "deployed" checked out, and you push to "master" in the same
repository, having the hook do "git resolve deployed master auto-update"
should work.

-Daniel
*This .sig left intentionally blank*

Re: [RFC] Removing deleted files after checkout

2005-08-23 Thread Daniel Barkalow

On Tue, 23 Aug 2005, Carl Baldwin wrote:

> On Tue, Aug 23, 2005 at 03:43:56PM -0400, Daniel Barkalow wrote:
> > On Tue, 23 Aug 2005, Carl Baldwin wrote:
> >
> > > Hello,
> > >
> > > I recently started using git to revision control the source for my
> > > web-page.  I wrote a post-update hook to checkout the files when I push
> > > to the 'live' repository.
> > >
> > > In this particular context I decided that it was important to me to remove
> > > deleted files after checking out the new HEAD.  I accomplished this by running
> > > git-ls-files before and after the checkout.
> > >
> > > Is there a better way?  Could there be some way built into git to easily
> > > find out what files disappear when replacing the current index with one
> > > from a new tree?  Is there already?  The behavior of git should NOT
> > > change to delete these files but I would argue that some way should
> > > exist to query what files disappeared if removing them is desired.
> >
> > If you don't use -f, git-checkout-script removes deleted files. Using -f
> > tells it to ignore the old index, which means that it can't tell the
> > difference between removed files and files that weren't tracked at all.
>
> Maybe I'm doing something wrong.  This does not happen for me.
>
> I tried a simple test with git v0.99.4...
>
> cd
> mkdir test-git && cd test-git/
> echo testing | cg-init
> echo contents > file
> git-add-script file
> git-commit-script -m 'testing'

[point 1]

> cd ..
> cg-clone test-git/.git/ test-git2
> cd test-git2
> cg-rm file
> git-commit-script -m 'testing'
> ls

> cg-push
> cd ../test-git
> git-checkout-script

Ah, okay. I think "push" and "checkout" don't play that well together;
"push" changes the ref, which "checkout" uses to determine what it expects
for the old contents, and then it's confused.

What you probably actually want is:

cd ../test-git
git pull ../test-git2

which will correctly identify before and after, and remove any files that
were removed.

Alternatively, you could do, at point 1:

cp .git/refs/master .git/refs/deployed
git checkout deployed

Then, after the push and cd:

git checkout master
cp .git/refs/master .git/refs/deployed
git checkout deployed

because checkout does remove files if you switch from a branch with them
(e.g., deployed) to one without them (master, after the push).

-Daniel
*This .sig left intentionally blank*

Re: [RFC] Removing deleted files after checkout

2005-08-23 Thread Daniel Barkalow

On Tue, 23 Aug 2005, Carl Baldwin wrote:

> Hello,
>
> I recently started using git to revision control the source for my
> web-page.  I wrote a post-update hook to checkout the files when I push
> to the 'live' repository.
>
> In this particular context I decided that it was important to me to remove
> deleted files after checking out the new HEAD.  I accomplished this by running
> git-ls-files before and after the checkout.
>
> Is there a better way?  Could there be some way built into git to easily
> find out what files disappear when replacing the current index with one
> from a new tree?  Is there already?  The behavior of git should NOT
> change to delete these files but I would argue that some way should
> exist to query what files disappeared if removing them is desired.

If you don't use -f, git-checkout-script removes deleted files. Using -f
tells it to ignore the old index, which means that it can't tell the
difference between removed files and files that weren't tracked at all.
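
The distinction can be sketched in a few lines of Python. This is a toy model
with made-up file lists, not the logic of git-checkout-script itself: the set
of files to delete is "tracked before, absent now", and without the old index
the best available approximation is "on disk, absent now".

```python
def files_to_remove(old_paths, new_tree):
    """Paths present in old_paths but absent from the new tree."""
    return sorted(set(old_paths) - set(new_tree))


old_index = ["file", "index.html"]               # tracked before the checkout
new_tree = ["index.html"]                        # tracked after it
work_dir = ["file", "index.html", "notes.txt"]   # notes.txt was never tracked

# With the old index, only the genuinely deleted file is selected.
with_index = files_to_remove(old_index, new_tree)

# Ignoring the old index leaves only the directory listing to compare against,
# which lumps the deleted file together with the untracked one.
without_index = files_to_remove(work_dir, new_tree)

print(with_index)
print(without_index)
```

Comparing git-ls-files output before and after the checkout, as in the
original question, amounts to the first call.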

-Daniel
*This .sig left intentionally blank*

Re: [RFC] Stgit - patch history / add extra parents

2005-08-23 Thread Daniel Barkalow

On Tue, 23 Aug 2005, Catalin Marinas wrote:

> > So the point is that there are things which are, in fact, parents, but we
> > don't want to list them, because it's not desired information.
>
> What's the definition of a parent in GIT terms? What are the
> restriction for a commit object to be a parent? Can a parent be an
> arbitrarily chosen commit?

Something is legitimate as a parent if someone took that commit and did
something to it to get the new commit. The operation which caused the
change is not specified. But you only want to include it if anyone cares
about the parent.

(For example, I often start with a chunk of work that does multiple things
and is committed; I take mainline and generate a series of commits from
there. It would be legitimate to list my development commit as a parent of
each of these, since I did actually take it and strip out the unrelated
changes. This would be a bit confusing in the log, but would make merges
between something based on the "messy" version and something based on the
"refined" version work well. On the other hand, I don't want to report the
existence of the messy version, so I don't include it.)

> An StGIT patch is a represented by a top and bottom commit
> objects. The bottom one is the same as the parent of the top
> commit. The patch is the diff between the top's tree id and the
> bottom's tree id.
>
> Jan's proposal is to allow a freeze command to save the current top
> hash and later be used as a second parent for the newly generated
> top. The problem I see with this approach is that (even for the
> internal view you described) the newly generated top will have two
> parents, new-bottom and old-top, but only the diff between new-top and
> new-bottom is meaningful. The diff between new-top and old-top (as a
> parent-child relation) wouldn't contain anything relevant to the patch
> but all the new changes to the base of the stack.

Having a useful diff isn't really a requirement for a parent; the diff in
the case of a merge is going to be the total of everything that happened
elsewhere. The point is to be able to reach some commits between which
there are interesting diffs.

This also depends on how exactly freeze is used; if you use it before
committing a modification to the patch without rebasing, you get:

old-top -> new-top
   ^          ^
    \        /
     bottom

bottom to old-top is the old patch
bottom to new-top is the new patch
old-top to new-top is the change to the patch

Then you want to keep new-top as a parent for rebasings until one of these
is frozen. These links are not interesting to look at, but preserve the
path to the old-top:new-top change, which is interesting.

Ignoring the links to the corresponding bottoms, the development therefore
looks like:

local1 -> local2 -> merge -> local3 -> merge
^                   ^                  ^
mainline>-->->-->-->->

And this is how development is normally supposed to look. The trick is to
only include a minimal number of merges.

-Daniel
*This .sig left intentionally blank*

Re: [RFC] Stgit - patch history / add extra parents

2005-08-22 Thread Daniel Barkalow

On Sun, 21 Aug 2005, Jan Veldeman wrote:

> Catalin Marinas wrote:
>
> > > So for example, you only tag (freeze) the history when exporting the
> > > patches.  When an error is being reported on that version, it's easy to 
> > > view
> > > it and also view the progress that was already been made on those patches.
> >
> > I agree that it is a useful feature to be able to individually tag the
> > patches. The problem is how to do this best. Your approach looks to me
> > like it's not following the GIT DAG structure recommendation. Maybe the
> > GIT designers could further comment on this but a commit object with
> > multiple parents should be a result of a merge operation. A commit with
> > a single parent should represent a transition of the tree from one state
> > to another. With the freeze command you proposed, a commit with multiple
> > parents is no longer a result of a merge operation, but just a
> > convenience for tracking the patch history with gitk.
>
> My interpretation of parents is broader than only merges, and reading the
> README file, I believe it is also the intention to do so (snippet from README
> file):
>
> A "commit" object ties such directory hierarchies together into
> a DAG of revisions - each "commit" is associated with exactly one tree
> (the directory hierarchy at the time of the commit). In addition, a
> "commit" refers to one or more "parent" commit objects that describe the
> history of how we arrived at that directory hierarchy.

One factor not mentioned there is that, as things move upstream, we often
want to discard a lot of history; if someone commits constantly to deal
with editor malfunction or something, we don't really want to take all of
this junk into the project history when it is cleaned up and accepted.

So the point is that there are things which are, in fact, parents, but we
don't want to list them, because it's not desired information.

Probably the right thing is to have two views of the stack: the internal
view, showing what actually happened, and the external view, showing what
would have happened if the developers had done everything right the first
time. When you make changes to the series, this adds to the internal view
and entirely replaces the external view.

I think that users will also want to discard the commits from the stack
before rebasing in favor of the commits after, because (a) rebasing isn't
all that interesting, especially if there's minimal merging, and (b)
otherwise you'd get a ton of boring commits that obscure the interesting
ones.

I think that the best rule would be that, when you modify a patch, the
previous version is the new version's parent, and when you rebase a
series, you include as a parent any parent of the input that isn't also in
the input (but never include the input itself as a parent of the output;
the point of rebasing is to pretend that it was the newer mainline that
you modified). This should mean that the internal history of a patch
consists of the present version, based on each version that was replaced
due to changing the patch rather than rebasing it.
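
One possible reading of this rule, as a minimal Python sketch. Everything
here is an assumption made for illustration: the Version record, the split
between a "bottom" and extra "history" parents, and the naming of rebased
versions with a trailing quote mark are not StGIT's data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    name: str
    bottom: str            # commit the patch is applied on top of
    history: tuple = ()    # extra parents recording earlier versions


def modify(old, new_name):
    """A changed patch keeps its bottom and records the previous version."""
    return Version(new_name, old.bottom, (old.name,))


def rebase(series, new_base):
    """Carry over history parents from outside the series, never the inputs."""
    inputs = {v.name for v in series}
    out, bottom = [], new_base
    for v in series:
        kept = tuple(p for p in v.history if p not in inputs)
        new = Version(v.name + "'", bottom, kept)
        out.append(new)
        bottom = new.name      # the next patch in the series sits on this one
    return out


v1 = Version("p1-v1", "main1")
v2 = modify(v1, "p1-v2")            # history: ("p1-v1",)
rebased = rebase([v2], "main2")     # history still ("p1-v1",), not "p1-v2"
print(rebased[0])
```

Under this reading the rebased patch still points back at the version that
was replaced by an edit, while the pre-rebase copy of itself is dropped.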

Of course, there's an interesting situation with the commits earlier in a
series from a patch that was changed not being ancestors of the newer
versions of those patches (because they weren't interesting in the
development of those patches) but accessible as the commits that an
interesting patch was based on.

A possible solution is just to consider the revision of any patch a
significant event in the history of the whole stack, causing all of the
patches to get a new retained version.

-Daniel
*This .sig left intentionally blank*

Re: [RFC] Importing from a patch-oriented SCM

2005-08-19 Thread Daniel Barkalow

On Fri, 19 Aug 2005, Martin Langhoff wrote:

> On 8/19/05, Junio C Hamano <[EMAIL PROTECTED]> wrote:
> > Martin Langhoff <[EMAIL PROTECTED]> writes:
> >
> > > If I remember correctly, Junio added some stuff in the merge & rebase
> > > code that will identify if a particular patch has been seen and
> > > applied, and skip it even if it's a bit out of order. But I don't know
> >
> > I think you are talking about git-patch-id.
>
> Is this used at commit time, and stored somewhere (doesn't seem to be)
> or do you select older patches from the destination branch at merge
> time?

If a patch is applied verbatim, or a merge results in no conflicts (i.e.,
only offsets), then you can run git-patch-id on the diff caused by it and
compare the result with the git-patch-id of the diff caused by your local
change to see if you've found it. Of course, if there was any modification
to the patch or a conflict was resolved, you won't see a match, but that's
plausibly correct anyway: you don't know whether the content change that
resulted from your patch really matched the change you wanted to make.
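
To make the comparison concrete, here is a rough Python stand-in for the
idea. It is explicitly not the git-patch-id algorithm: the function
rough_patch_id is hypothetical and simply hashes a diff while ignoring hunk
headers and whitespace, so that a patch applied at a different offset still
produces the same identifier.

```python
import hashlib


def rough_patch_id(diff_text):
    """Hash a diff, ignoring hunk line numbers and whitespace (a stand-in only)."""
    h = hashlib.sha1()
    for line in diff_text.splitlines():
        if line.startswith("@@"):
            continue                               # offsets may differ
        h.update("".join(line.split()).encode())   # drop all whitespace
    return h.hexdigest()


local = """--- a/f.c
+++ b/f.c
@@ -10,3 +10,3 @@
 ctx
-old
+new
"""

# Same change, applied upstream a couple of lines further down.
upstream = local.replace("@@ -10,3 +10,3 @@", "@@ -12,3 +12,3 @@")

# The change was edited before being applied: no match expected.
modified = local.replace("+new", "+newer")

same = rough_patch_id(local) == rough_patch_id(upstream)
different = rough_patch_id(local) != rough_patch_id(modified)
print(same, different)
```

The third case is the one described above: once the content of the patch was
touched, the identifiers no longer match.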

> If you only compare patches since the last merge, patches that were
> merged but somehow unreported will fall into a black hole and cause a
> conflict going forward anyway. Hmm.  That seems to be a problem I
> won't be able to avoid if merges happen out-of-order.

They might cause conflicts, but they're relatively unlikely to require
manual intervention, because the merging mechanism in git is stronger than
the one in arch (by virtue of identifying a common ancestor), and will
recognize when a section of changes made by both sides is the same and
produce a warning rather than a conflict. That's how the rebase stuff can
identify that your rebased patch is empty (when upstream applies your
patch): the content change that it would make has been made.

-Daniel
*This .sig left intentionally blank*

Re: Merge conflicts as .rej .orig files

2005-08-19 Thread Daniel Barkalow

On Fri, 19 Aug 2005, Martin Langhoff wrote:

> After using arch for a while, I've gotten used to getting .rej and
> .orig files instead of big ugly conflict markers inside the file.
> Emacs has a nice 'diff' mode that is a boon when dealing with
> conflicts this way.
>
> Is there a way to convince cogito/git to leave reject files around?
> What utility is git using to do the merges? Or at least: where should
> I look?

I believe you should be able to get that effect by having a version
of "git-merge-one-script" that does "diff -c $2 $3 | patch $1" or "diff -c
$2 $1 | patch $3", depending on which you want as the orig. (Or something
like that. I'm not sure exactly how to get the conflict files out of the
script and into the right place, or the arguments it gets.)

Of course, you'll probably have more conflicts to deal with, because the
merging code gets less information that way. (In particular, you'll lose
the "already contains changes" behavior, so you'll be unhappy if you have
patches merged upstream.)

-Daniel
*This .sig left intentionally blank*

Re: Subject: [PATCH] Updates to glossary

2005-08-18 Thread Daniel Barkalow

On Thu, 18 Aug 2005, Johannes Schindelin wrote:

>  tree object::
> - An object containing a list of blob and/or tree objects.
> - (A tree usually corresponds to a directory without
> - subdirectories).
> + An object containing a list of file names and modes along with refs
> + to the associated blob and/or tree objects. A tree object is
> + equivalent to a directory.

Actually, it contains object names, not refs, to be completely precise.
(refs would imply an additional indirection.)
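
The difference is just one lookup step, which a small Python sketch can show.
The TreeEntry record, the in-memory object store, its contents, and the ref
name refs/heads/test are all hypothetical; only the two object names are
taken from elsewhere in this archive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TreeEntry:
    mode: str          # file mode, e.g. "100644"
    name: str          # file name within the directory
    object_name: str   # SHA1 of the blob or tree, stored directly


# Hypothetical object store: object name -> contents.
objects = {
    "a7bed60b69f9e8de9a49944e22d03fb388ae93c7": b"bootloader source\n",
    "0c3e091838f02c537ccab3b6e8180091080f7df2": b"commit data\n",
}

# A ref is a separate name that maps to an object name.
refs = {"refs/heads/test": "0c3e091838f02c537ccab3b6e8180091080f7df2"}

entry = TreeEntry("100644", "bootloader.c",
                  "a7bed60b69f9e8de9a49944e22d03fb388ae93c7")

via_entry = objects[entry.object_name]        # one step: name -> object
via_ref = objects[refs["refs/heads/test"]]    # two steps: ref -> name -> object
print(via_entry, via_ref)
```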

-Daniel
*This .sig left intentionally blank*

Re: First stab at glossary

2005-08-17 Thread Daniel Barkalow

On Wed, 17 Aug 2005, Johannes Schindelin wrote:

> Hi,
>
> On Wed, 17 Aug 2005, Daniel Barkalow wrote:
>
> > On Wed, 17 Aug 2005, Johannes Schindelin wrote:
> >
> > > object name::
> > >   Synonym for SHA1.
> >
> > Have we killed the use of the third term "hash" for this? I'd say that
> > "object name" is the standard term, and "SHA1" is a nickname, if only
> > because "object name" is more descriptive of the particular use of the
> > term.
>
> Okay for "hash".

I think we only need at most two names for this, so this is more a matter
of fixing old usage than documenting it.

> > I think we might want to entirely kill the "cache" term, and talk only
> > about the "index" and "index entries". Of course, a bunch of the code will
> > have to be renamed to make this completely successful, but we could change
> > the glossary and documentation, and mention "cache" and "cache entry" as
> > old names for "index" and "index entry" respectively.
>
> For me, "index" is just the file named "index" (holding stat data and a
> ref for each cache entry). That is why I say an "index" contains "cache
> entries", not "index entries" (wee, that sounds wrong :-).

Well, it often contains information not present anywhere else (the status
of a merge; the set of files being committed, added, or removed), so it
isn't really a cache at all.

> > > working tree::
> > >   The set of files and directories currently being worked on.
> > >   Think "ls -laR"
> >
> > This is where the data is actually in the filesystem, and you can edit and
> > compile it (as opposed to a tree object or the index, which semantically
> > have the same contents, but aren't presented in the filesystem that way).
>
> Maybe I was too cautious. Linus very new idea was to think of the lowest
> level of an SCM as a file system. But I did not want to mention that.
> Thinking of it again, maybe I should.

You probably don't need to mention that tree objects and index files can
be thought of as filesystems, but you should specify that the working tree
really is in the Unix filesystem, in case people have heard of the idea.

It should be clear to say 'You can "cd" there and "ls" to list your
files.', rather than 'Think "ls -laR"' which makes me think of the output,
which is like the output from git-ls-files.

> > > checkout::
> >
> > Move after "revision"?
>
> Ultimately, the glossary terms will be sorted alphabetically. If you look
> at the file attached to my original mail, this is already sorted and
> marked up using asciidoc. However, I wanted you and the list to understand
> how I grouped terms. The asciidoc'ed file is generated by a perl script.

Ah, okay.

> > > resolve::
> > >   The action of fixing up manually what a failed automatic merge
> > >   left behind.
> >
> > "Resolve" is also used for the automatic case (e.g., in
> > "git-resolve-script", which goes from having two commits and a message to
> > having a new commit). I'm not sure what the distinction is supposed to be.
>
> I did not like that naming anyway. In reality, git-resolve-script does not
> resolve anything, but it merges two revisions, possibly leaving something
> to resolve.

Right; I think we should change the name of the script.

-Daniel
*This .sig left intentionally blank*

Re: First stab at glossary

2005-08-17 Thread Daniel Barkalow

On Wed, 17 Aug 2005, Johannes Schindelin wrote:

> Hi,
>
> long, long time. Here's my first stab at the glossary, attached the
> alphabetically sorted, asciidoc marked up txt file (Comments?
> Suggestions? Pizzas?):
>
> object::
>   The unit of storage in GIT. It is uniquely identified by
>   the SHA1 of its contents. Consequently, an object can not
>   be changed.
>
> SHA1::
>   A 20-byte sequence (or 41-byte file containing the hex
>   representation and a newline). It is calculated from the
>   contents of an object by the Secure Hash Algorithm 1.

It's also often a 40-character string (with whatever termination) in places
like commit objects, tag objects, command-line arguments, listings, and so
forth.
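
To make the two forms concrete, here is a minimal sketch in C of converting
between the 20-byte form and the 40-character hex form. The helper names
name_to_hex() and hex_to_name() are made up for this illustration and are not
git's own functions; the parser reads exactly 40 hex digits and does not look
at whatever terminates them.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative helpers only; these are not functions from git. */

/* Write the 40-character hex form of a 20-byte name, plus a NUL, into out. */
void name_to_hex(const unsigned char name[20], char out[41])
{
    static const char digits[] = "0123456789abcdef";
    size_t i;

    assert(name != NULL && out != NULL);
    for (i = 0; i < 20; i++) {
        out[2 * i] = digits[name[i] >> 4];
        out[2 * i + 1] = digits[name[i] & 0x0f];
    }
    out[40] = '\0';
}

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    if (c >= 'A' && c <= 'F')
        return c - 'A' + 10;
    return -1;
}

/*
 * Parse exactly 40 hex digits into a 20-byte name. The character after the
 * 40th digit (NUL, newline, space, ...) is never examined.
 * Returns 0 on success, -1 if a non-hex character is found first.
 */
int hex_to_name(const char *hex, unsigned char name[20])
{
    size_t i;

    assert(hex != NULL && name != NULL);
    for (i = 0; i < 20; i++) {
        int hi = hex_value(hex[2 * i]);
        int lo;

        if (hi < 0)
            return -1; /* also stops at an early NUL */
        lo = hex_value(hex[2 * i + 1]);
        if (lo < 0)
            return -1;
        name[i] = (unsigned char)((hi << 4) | lo);
    }
    return 0;
}
```

With these helpers, the same 40 digits parse identically whether they are
followed by a newline (the 41-byte file form) or by a NUL (a command-line
argument).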

> object database::
>   Stores a set of "objects", and an individual object is identified
>   by its SHA1 (its ref). The objects are either stored as single
>   files, or live inside of packs.
>
> object name::
>   Synonym for SHA1.

Have we killed the use of the third term "hash" for this? I'd say that
"object name" is the standard term, and "SHA1" is a nickname, if only
because "object name" is more descriptive of the particular use of the
term.

> blob object::
>   Untyped object, i.e. the contents of a file.

This "i.e." should be "e.g.", since symlink targets are also stored as
blobs, and any other bulk data stored by itself would be. (IIRC, Junio has
a tagged blob to hold his public key, for example)

> tree object::
>   An object containing a list of blob and/or tree objects.
>   (A tree usually corresponds to a directory without
>   subdirectories).
>
> tree::
>   Either a working tree, or a tree object together with the
>   dependent blob and tree objects (i.e. a stored representation
>   of a working tree).
>
> cache::
>   A collection of files whose contents are stored as objects.
>   The cache is a stored version of your working tree. Well, can
>   also contain a second, and even a third version of a working
>   tree, which are used when merging.
>
> cache entry::
>   The information regarding a particular file, stored in the index.
>   A cache entry can be unmerged, if a merge was started, but not
>   yet finished (i.e. if the cache contains multiple versions of
>   that file).
>
> index::
>   Contains information about the cache contents, in particular
>   timestamps and mode flags ("stat information") for the files
>   stored in the cache. An unmerged index is an index which contains
>   unmerged cache entries.

I think we might want to entirely kill the "cache" term, and talk only
about the "index" and "index entries". Of course, a bunch of the code will
have to be renamed to make this completely successful, but we could change
the glossary and documentation, and mention "cache" and "cache entry" as
old names for "index" and "index entry" respectively.

> working tree::
>   The set of files and directories currently being worked on.
>   Think "ls -laR"

This is where the data is actually in the filesystem, and you can edit and
compile it (as opposed to a tree object or the index, which semantically
have the same contents, but aren't presented in the filesystem that way).

> directory::
>   The list you get with "ls" :-)
>
> checkout::
>   The action of updating the working tree to a revision which was
>   stored in the object database.

Move after "revision"?

> revision::
>   A particular state of files and directories which was stored in
>   the object database. It is referenced by a commit object.
>
> commit::
>   The action of storing the current state of the cache in the
>   object database. The result is a revision.
>
> commit object::
>   An object which contains the information about a particular
>   revision, such as parents, committer, author, date and the
>   tree object which corresponds to the top directory of the
>   stored revision.

Move "parent" around here.

> changeset::
>   BitKeeper/cvsps speak for "commit". Since git does not store
>   changes, but states, it really does not make sense to use
>   the term "changesets" with git.
>
> ent::
>   Favorite synonym to "tree-ish" by some total geeks.

Move after "tree-ish".

> head::
>   The top of a branch. It contains a ref to the corresponding
>   commit object.
>
> branch::
>   A non-cyclical graph of revisions, i.e. the complete history of
>   a particular revision, which does not (yet) have children, which
>   is called the branch head. The branch heads are stored in
>   $GIT_DIR/refs/heads/.

A branch head might have children, if they're in another branch. (E.g., I
pull mainline, make a new branch based on it, and commit a change; the
head of mainline is still a branch head, even though it's the parent of my
new commit, because my new commit isn't in mainline.)

> ref::
>   A 40-byte hex representation of a SHA

Re: [RFC PATCH] Add support for figuring out where in the git archive we are

2005-08-16 Thread Daniel Barkalow

On Tue, 16 Aug 2005, Linus Torvalds wrote:

> If you use the GIT_DIR environment variable approach, it assumes that all
> filenames you give it are absolute and acts the way it always did before.
>
> Comments? Like? Dislike?

I'm all in favor, at least in the general case. I suspect there'll be some
things where we have to discuss the behavior, but we can argue that when
it comes up.

I think, slightly before 1.0, we should sort the library functions into a
new set of object files with matching header files, because "setup" is not
really distinctive, and there's at least one duplicate implementation
(the ssh subprocess code in your connect.c is the same as my rsh.c in what
it does, although yours uses two pipes and mine uses a socket).

-Daniel
*This .sig left intentionally blank*

Re: [RFC] Patches exchange is bad?

2005-08-16 Thread Daniel Barkalow

On Tue, 16 Aug 2005, Marco Costalba wrote:

> Martin Langhoff wrote:
>
> >From what I understand, you'll want the StGIT infrastructure. If you
> >use git/cogito, there is an underlying  assumption that you'll want
> >all the patches merged across, and a simple cg-update will bring in
> >all the pending stuff.
> >
>
> My concerns are both methodological and practical:
>
> 1) Method: Is using the 'free patching workflow' on git something foreseen in
> git's design, something coherent with the fork + develop + merge cycle that
> seems, at least to me, THE way git is meant to be used? Or is it stretching
> the tool toward something technically allowed but not suggested?

Patches are definitely meant to be part of how git is used; they are the
primary way of getting clean history out of messy history (that is, saving
a content change while discarding some history that isn't applicable).
There's relatively little support in git itself, but that's because the
point is to go outside the system's tracking. There have been various
discussions of more explicit support, and nobody's been able to come up
with a need.

> 2) Practical: Is the round trip git-format-patch + git-applymbox the logical
> and natural way to reach this goal or, also in this case, am I stretching
> some tools, designed for one thing, to do something else?

I'd guess that git-diff-tree + git-apply (without the rest of the
scripting) would be more effective when you're not doing anything with the
intermediate files, since it saves doing a bunch of formatting and
parsing.

-Daniel
*This .sig left intentionally blank*

Re: [PATCH] Alternate object pool mechanism updates.

2005-08-16 Thread Daniel Barkalow

On Tue, 16 Aug 2005, Linus Torvalds wrote:

> Finally, I have to say that that "info" directory is confusing. Namely,
> there's two of them - the "git info" and the "object info" directories are
> totally different directories - maybe logical, but to me it smells like
> "info" is here a code-name for "misc files that don't make sense anywhere
> else".
>
> What this all is leading up to is that I think we'd be better off with a
> totally new "git config" file, in ".git/config", and we'd have all the
> startup configuration there. Including things like alternate object
> directories, perhaps standard preferences for that particular repo, and
> things like the "grafts" thing.
>
> Wouldn't that be nice?

I'd originally proposed the .git/info directory because I keep multiple
working trees for the same repository, by having symlinks for .git/objects
and .git/refs, and I could also get other per-repository things to be
shared properly without knowing exactly what they are if they're in a
subdirectory of .git that could be a symlink. This would mean that a
".git/config" would be per-working-tree, like .git/index or .git/HEAD, not
per-repository like ".git/info/config". Of course, the core didn't have
anything to go in .git/info at the time, so it didn't really get nailed
down.

(I find it convenient to have mainline and my latest work both checked out
for reference while I'm generating a series of commits for a patch set,
and I don't want three different repositories which could be out of sync;
this also keeps the repository safely out of pwd, since I have the actual
repositories as ~/git/{project}.git/)

-Daniel
*This .sig left intentionally blank*

Re: Git 1.0 Synopis (Draft v4)

2005-08-16 Thread Daniel Barkalow

On Tue, 16 Aug 2005, Johannes Schindelin wrote:

> Hi,
>
> On Tue, 16 Aug 2005, Junio C Hamano wrote:
>
> >   - Are all the files in Documentation/ reachable from git(7)
> > or otherwise made into a standalone document using asciidoc
> > by the Makefile?  I haven't looked into documentation
> > generation myself (I use only the text files as they are);
> > help to update the Makefile by somebody handy with asciidoc
> > suite is greatly appreciated here.
> >
> > Volunteers?
>
> The attached script reveals:
>
> git-unpack-objects.txt is not reachable from git.txt
> git-cvsimport-script.txt is not reachable from git.txt
> git-send-email-script.txt is not reachable from git.txt
> git-rename-script.txt is not reachable from git.txt
> tutorial.txt is not reachable from git.txt
> git-show-index.txt is not reachable from git.txt
> cvs-migration.txt is not reachable from git.txt
> diffcore.txt is not reachable from git.txt
> git-ls-remote-script.txt is not reachable from git.txt
> git-apply.txt is not reachable from git.txt
> git-diff-stages.txt is not reachable from git.txt
> pack-protocol.txt is not reachable from git.txt

The ones that don't start with git probably don't belong in the same set;
perhaps there should be a "technical" (or something similar but shorter)
subdirectory for developer documentation instead of user documentation?
(And tutorial and cvs-migration can move to howto)

-Daniel
*This .sig left intentionally blank*

Re: Git 1.0 Synopis (Draft v4)

2005-08-16 Thread Daniel Barkalow

On Tue, 16 Aug 2005, Junio C Hamano wrote:

> Daniel Barkalow <[EMAIL PROTECTED]> writes:
>
> > It might be worth putting the list of things left to do before 1.0 in the
> > tree (since they clearly covary), and it would be useful to know what
> > you're thinking of as preventing the release at any particular stage.
>
> Yeah, yeah.  Call me lazy.
>
> Excerpts from my "last mile to 1.0", my Itchlist, and pieces from
> random other messages since then.
>
> - Documentation. [I really need help here --- among ~7000 lines
>   there, I've written around 2500 lines, David Greaves another
>   2500, and Linus 1400.  And it is not very easy to proofread
>   what you wrote yourself.]

I'm not sure how done this can actually get before some sort of feature
freeze; the best ways to do things keep changing as more convenient ways
are added. Once the new stuff is diverted to post-1.0, I'd be interested
in going through it.

> - git prune and git fsck-cache; think about their interactions
>   with an object database that borrows from another.  This
>   includes the case where .git/objects itself is symlinked to
>   somewhere else (i.e. running "git prune" that somewhere else
>   without consulting this repository would lose objects), and
>   alternates pointing at somewhere else (i.e. ditto).

It should be fine, but only if .git/refs is symlinked to the matching
place; this gives you the same repository with multiple working trees.
Having refs/ and objects/ directories that aren't always together would be
much less safe.

-Daniel
*This .sig left intentionally blank*

[PATCH] Support packs in local-pull

2005-08-15 Thread Daniel Barkalow

If it doesn't find an object, it looks for an index that contains it
and uses the same methods on that instead.

Signed-off-by: Daniel Barkalow <[EMAIL PROTECTED]>
---

 local-pull.c |  112 +++---
 1 files changed, 91 insertions(+), 21 deletions(-)

aafbc7fb9ae059b9c9afa42e8d2c0548ea960f9f
diff --git a/local-pull.c b/local-pull.c
--- a/local-pull.c
+++ b/local-pull.c
@@ -15,34 +15,54 @@ void prefetch(unsigned char *sha1)
 {
 }
 
-int fetch(unsigned char *sha1)
+static struct packed_git *packs = NULL;
+
+void setup_index(unsigned char *sha1)
 {
-   static int object_name_start = -1;
-   static char filename[PATH_MAX];
-   char *hex = sha1_to_hex(sha1);
-   const char *dest_filename = sha1_file_name(sha1);
+   struct packed_git *new_pack;
+   char filename[PATH_MAX];
+   strcpy(filename, path);
+   strcat(filename, "/objects/pack/pack-");
+   strcat(filename, sha1_to_hex(sha1));
+   strcat(filename, ".idx");
+   new_pack = parse_pack_index_file(sha1, filename);
+   new_pack->next = packs;
+   packs = new_pack;
+}
 
-   if (object_name_start < 0) {
-   strcpy(filename, path); /* e.g. git.git */
-   strcat(filename, "/objects/");
-   object_name_start = strlen(filename);
+int setup_indices()
+{
+   DIR *dir;
+   struct dirent *de;
+   char filename[PATH_MAX];
+   unsigned char sha1[20];
+   sprintf(filename, "%s/objects/pack/", path);
+   dir = opendir(filename);
+   while ((de = readdir(dir)) != NULL) {
+   int namelen = strlen(de->d_name);
+   if (namelen != 50 || 
+   strcmp(de->d_name + namelen - 5, ".pack"))
+   continue;
+   get_sha1_hex(sha1, de->d_name + 5);
+   setup_index(sha1);
}
-   filename[object_name_start+0] = hex[0];
-   filename[object_name_start+1] = hex[1];
-   filename[object_name_start+2] = '/';
-   strcpy(filename + object_name_start + 3, hex + 2);
+   return 0;
+}
+
+int copy_file(const char *source, const char *dest, const char *hex)
+{
if (use_link) {
-   if (!link(filename, dest_filename)) {
+   if (!link(source, dest)) {
pull_say("link %s\n", hex);
return 0;
}
/* If we got ENOENT there is no point continuing. */
if (errno == ENOENT) {
-   fprintf(stderr, "does not exist %s\n", filename);
+   fprintf(stderr, "does not exist %s\n", source);
return -1;
}
}
-   if (use_symlink && !symlink(filename, dest_filename)) {
+   if (use_symlink && !symlink(source, dest)) {
pull_say("symlink %s\n", hex);
return 0;
}
@@ -50,25 +70,25 @@ int fetch(unsigned char *sha1)
int ifd, ofd, status;
struct stat st;
void *map;
-   ifd = open(filename, O_RDONLY);
+   ifd = open(source, O_RDONLY);
if (ifd < 0 || fstat(ifd, &st) < 0) {
close(ifd);
-   fprintf(stderr, "cannot open %s\n", filename);
+   fprintf(stderr, "cannot open %s\n", source);
return -1;
}
map = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, ifd, 0);
close(ifd);
if (map == MAP_FAILED) {
-   fprintf(stderr, "cannot mmap %s\n", filename);
+   fprintf(stderr, "cannot mmap %s\n", source);
return -1;
}
-   ofd = open(dest_filename, O_WRONLY | O_CREAT | O_EXCL, 0666);
+   ofd = open(dest, O_WRONLY | O_CREAT | O_EXCL, 0666);
status = ((ofd < 0) ||
  (write(ofd, map, st.st_size) != st.st_size));
munmap(map, st.st_size);
close(ofd);
if (status)
-   fprintf(stderr, "cannot write %s\n", dest_filename);
+   fprintf(stderr, "cannot write %s\n", dest);
else
pull_say("copy %s\n", hex);
return status;
@@ -77,6 +97,56 @@ int fetch(unsigned char *sha1)
return -1;
 }
 
+int fetch_pack(unsigned char *sha1)
+{
+   struct packed_git *target;
+   char filename[PATH_MAX];
+   if (setup_indices())
+   return -1;
+   target = find_sha1_pack(sha1, packs);
+   if (!target)
+   return error("Couldn't find %s: not separate or in

[PATCH] Add function to read an index file from an arbitrary filename.

2005-08-15 Thread Daniel Barkalow

Note that the pack file has to be in the usual location if it gets
installed later.

Signed-off-by: Daniel Barkalow <[EMAIL PROTECTED]>
---

 cache.h |2 ++
 sha1_file.c |   10 --
 2 files changed, 10 insertions(+), 2 deletions(-)

59e5c6d163edae5da6136560d48a4750cceacdc6
diff --git a/cache.h b/cache.h
--- a/cache.h
+++ b/cache.h
@@ -319,6 +319,8 @@ extern int get_ack(int fd, unsigned char
 extern struct ref **get_remote_heads(int in, struct ref **list, int nr_match, char **match);
 
 extern struct packed_git *parse_pack_index(unsigned char *sha1);
+extern struct packed_git *parse_pack_index_file(unsigned char *sha1, 
+   char *idx_path);
 
 extern void prepare_packed_git(void);
 extern void install_packed_git(struct packed_git *pack);
diff --git a/sha1_file.c b/sha1_file.c
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -476,12 +476,18 @@ struct packed_git *add_packed_git(char *
 
 struct packed_git *parse_pack_index(unsigned char *sha1)
 {
+   char *path = sha1_pack_index_name(sha1);
+   return parse_pack_index_file(sha1, path);
+}
+
+struct packed_git *parse_pack_index_file(unsigned char *sha1, char *idx_path)
+{
struct packed_git *p;
unsigned long idx_size;
void *idx_map;
-   char *path = sha1_pack_index_name(sha1);
+   char *path;
 
-   if (check_packed_git_idx(path, &idx_size, &idx_map))
+   if (check_packed_git_idx(idx_path, &idx_size, &idx_map))
return NULL;
 
path = sha1_pack_name(sha1);


[PATCH 0/2] Fix local-pull on packed repository

2005-08-15 Thread Daniel Barkalow

This adds essentially the same logic to local-pull that http-pull has, 
with the exception that it reads the index out of the source directory, 
rather than copying it. This, in turn, requires the ability to use an 
index file in some other directory.

 1: Use index file in another directory
 2: Copy/link/symlink pack files as appropriate
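
As a rough illustration of the first step (a sketch, not code taken from the
patches), recognising a pack in the source directory and deriving the path of
its index could look like the following. The helper names is_pack_name() and
index_path_for_pack() are hypothetical, and the "pack-<40 hex digits>.pack"
naming is assumed from the file names the patch builds.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Illustrative sketch only: is_pack_name() and index_path_for_pack() are
 * hypothetical helpers, not functions from git.
 */

/* Return 1 if name looks like "pack-<40 lowercase hex digits>.pack", else 0. */
int is_pack_name(const char *name)
{
    size_t len, i;

    assert(name != NULL);
    len = strlen(name);
    /* 5 ("pack-") + 40 (hex digits) + 5 (".pack") = 50 characters */
    if (len != 50 || strncmp(name, "pack-", 5) || strcmp(name + 45, ".pack"))
        return 0;
    for (i = 5; i < 45; i++) {
        char c = name[i];

        if (!((c >= '0' && c <= '9') || (c >= 'a' && c <= 'f')))
            return 0;
    }
    return 1;
}

/*
 * Build "<repo>/objects/pack/pack-<hex>.idx" for a given pack file name.
 * Returns 0 on success, -1 if the name is not a pack name or the buffer
 * is too small to hold the whole path.
 */
int index_path_for_pack(const char *repo, const char *pack_name,
                        char *out, size_t out_size)
{
    int n;

    assert(repo != NULL && pack_name != NULL && out != NULL);
    if (!is_pack_name(pack_name))
        return -1;
    /* %.45s keeps "pack-" plus the 40 hex digits and drops ".pack". */
    n = snprintf(out, out_size, "%s/objects/pack/%.45s.idx", repo, pack_name);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

Using snprintf() with an explicit size is a choice made for this sketch so
that an over-long repository path is reported as an error rather than
overflowing the buffer.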

-Daniel
*This .sig left intentionally blank*

Re: Cloning speed comparison

2005-08-15 Thread Daniel Barkalow

On Mon, 15 Aug 2005, Junio C Hamano wrote:

> Daniel Barkalow <[EMAIL PROTECTED]> writes:
> 
> > I should be able to get http-pull down to the neighborhood of 
> > (current) ssh-pull; http-pull is that slow (when the source repository 
> > isn't packed) because it's entirely sequential, rather than overlapping 
> > requests like ssh-pull now does.
> 
> I like those prefetch() and process() code in pull.c very much.
> 
> I have been wondering if increasing parallelism more by
> prefetching beyond the immediate parents of the current commit,
> in "if (get_history)" part of process_commit().  Maybe it is not
> worth it because doing a commit, its associated tree(s) and its
> parents would already give us enough parallelism already.

It is actually already maxing out the parallelism; it has a FIFO of 
objects which it needs, and calls prefetch() when it enqueues an object 
and fetch() when it dequeues it. It only cares about the dependencies for 
this purpose, not the types.
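
The ordering described above can be sketched roughly as follows. This is an
illustration under assumptions, not the code in pull.c: the queue layout and
the names enqueue() and drain() are hypothetical, and the prefetch() and
fetch() shown here are stand-ins that only record the order in which they
are called.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative sketch only; this does not reproduce pull.c. */

struct want {
    unsigned char sha1[20];
    struct want *next;
};

static struct want *head;           /* next object to fetch() */
static struct want **tail = &head;  /* where the next enqueue() links in */

/* Call trace, e.g. "P1F1" = prefetch then fetch of an object starting 0x01. */
static char trace[64];
static size_t trace_len;

static void note(char what, const unsigned char *sha1)
{
    assert(trace_len + 2 < sizeof(trace));
    trace[trace_len++] = what;
    trace[trace_len++] = (char)('0' + sha1[0] % 10);
    trace[trace_len] = '\0';
}

/* Stand-in: a real transport would start a request here without waiting. */
void prefetch(const unsigned char *sha1)
{
    note('P', sha1);
}

/* Stand-in: a real transport would wait for the object here. */
int fetch(const unsigned char *sha1)
{
    note('F', sha1);
    return 0;
}

/* Queue an object and immediately ask the transport to start on it. */
int enqueue(const unsigned char *sha1)
{
    struct want *w = malloc(sizeof(*w));

    if (!w)
        return -1;
    memcpy(w->sha1, sha1, 20);
    w->next = NULL;
    *tail = w;
    tail = &w->next;
    prefetch(sha1);
    return 0;
}

/*
 * Fetch queued objects in FIFO order. After the first failure the remaining
 * entries are freed without being fetched. Returns 0 if every fetch worked.
 */
int drain(void)
{
    int status = 0;

    while (head) {
        struct want *w = head;

        head = w->next;
        if (!head)
            tail = &head; /* queue is empty again */
        if (!status && fetch(w->sha1))
            status = -1;
        free(w);
    }
    return status;
}
```

Enqueueing two objects whose first bytes are 1 and 2 and then calling drain()
leaves "P1P2F1F2" in trace: both requests are started before the first
fetch() has to wait, which is where the overlap comes from.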

-Daniel
*This .sig left intentionally blank*

Re: Git 1.0 Synopis (Draft v4)

2005-08-15 Thread Daniel Barkalow

On Mon, 15 Aug 2005, Junio C Hamano wrote:

> Ryan Anderson <[EMAIL PROTECTED]> writes:
> 
> > I was waiting until you said, "Ok, 1.00 tomorrow morning"
> 
> Makes sense.  There would be some weeks until that happens I am
> afraid.

It might be worth putting the list of things left to do before 1.0 in the 
tree (since they clearly covary), and it would be useful to know what 
you're thinking of as preventing the release at any particular stage.

-Daniel
*This .sig left intentionally blank*

Re: Cloning speed comparison

2005-08-15 Thread Daniel Barkalow

On Sat, 13 Aug 2005, Petr Baudis wrote:

>   Hello,
> 
>   I've wondered how slow the protocols other than rsync are, and the
> (well, a bit dubious; especially wrt. caching on the remote side)
> results are:
> 
>   git     clone-pack:ssh  25s
>   git     rsync           27s
>   git     http-pull       47s
>   git     dumb-http       54s
>   git     ssh-pull        660s
> 
>   cogito  clone-pack:ssh  35s (!)
>   cogito  rsync           140s
>   cogito  ssh-pull        480s
>   cogito  http-pull       extrapolated to about an hour!

I should be able to get http-pull down to the neighborhood of 
(current) ssh-pull; http-pull is that slow (when the source repository 
isn't packed) because it's entirely sequential, rather than overlapping 
requests like ssh-pull now does.

I should also be able to get ssh-pull down to the area of clone-pack, but 
that's lower-priority, since there's clone-pack.

(I've written an untested patch for local-pull, which I'll be testing, 
cleaning, and submitting tonight, assuming my newly-arrived monitor 
actually works)

>   PS:
>   With the latest git version as of time of writing this:
>   $ time cg-clone git+ssh://[EMAIL PROTECTED]/home/pasky/WWW/dev/git/.g cogito
>   ...
>   progress: 5759 objects, 10292457 bytes
>   $ time cg-clone http://localhost/~pasky/dev/git/.g cogito
>   ...
>   progress: 8681 objects, 14881571 bytes

I've noticed that ssh connections don't actually disconnect at the end 
with recent versions of ssh sometimes. In my experience, this occasionally 
happens with git, but always happens with scp, suggesting that it's an ssh 
bug of some sort; I've also only noticed this with openssh 3.9_p1 with 
some of Gentoo's -r2 patches.

-Daniel
*This .sig left intentionally blank*

Re: [OT?] git tools at SourceForge ?

2005-08-12 Thread Daniel Barkalow

On Sat, 13 Aug 2005, Martin Langhoff wrote:

> >Yes, developers can just merge with each other directly
> 
> I take it that you mean an exchange of patches that does not depend on
> having public repos. What are the mechanisms available on that front,
> other than patchbombs?

If each developer has a trivial web server, they can put their output 
there, and everyone else can pull from it, because it only needs to serve 
static files out of a directory structure that the programs create 
regularly. Of course, this is only strictly different from a public repo 
in that you don't advertise it beyond the other developers. But it's a 
within-system equivalent to posting a link to a web-hosted patch set, 
which people sometimes do to pass things around.

> > And so I'd be thrilled to have some site like SF support it.
> 
> Eduforge's charter is to host education-related projects, so that's
> not a free-for-all-comers, but I'm considering git support, as our
> usage of git is growing.

If you contribute the git support to gforge, presumably similar hosting 
sites will pick it up before too long.

-Daniel
*This .sig left intentionally blank*

Re: [OT?] git tools at SourceForge ?

2005-08-12 Thread Daniel Barkalow

On Fri, 12 Aug 2005, Linus Torvalds wrote:

> And it's possible that git usage won't expand all that much either. But
> quite frankly, I think git is a lot better than CVS (or even SVN) by now,
> and I wouldn't be surprised if it started getting some use outside of the
> git-only and kernel projects once people start getting more used to it. 
> And so I'd be thrilled to have some site like SF support it.

I certainly think it's going to happen; it's just not surprising that it 
hasn't happened yet. Once there's a stable release and some publicity, I'd 
expect SF to see it as worthwhile. But a hosting site with git-only shell 
access needs to know what the necessary programs are going to be, which we 
haven't committed to quite yet.

-Daniel
*This .sig left intentionally blank*

Re: [OT?] git tools at SourceForge ?

2005-08-12 Thread Daniel Barkalow

On Fri, 12 Aug 2005, Wolfgang Denk wrote:

> This is somewhat off topic here, so I apologize, but  I  didn't  know
> any better place to ask:
> 
> Has anybody any information if SourceForge is going to provide git  /
> cogito / ... for the projects they host? I asked SF, and they opened
> a new Feature Request (item #1252867); the message I received sounded
> as if I was the first person on the planet to ask...
> 
> Am I really alone with this?

The git architecture makes the central server less important, and it's 
easy to run your own. Also, kernel.org is providing space to a set of 
people with a large overlap with git users, since git hasn't been 
particularly publicized and kernel.org is hosting git.

-Daniel
*This .sig left intentionally blank*
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] Add "--sign" option to git-format-patch-script

2005-08-12 Thread Daniel Barkalow

On Fri, 12 Aug 2005, Junio C Hamano wrote:

> Good intentions, but I'd rather see these S-O-B lines in the
> actual commit objects.  Giving format-patch this option would
> discourage people to do so.  Maybe a patch to git commit would
> be more appropriate, methinks.

Maybe also something in format-patch to check that the commit has one? I, 
at least, tend to have unsigned commits for tracking stuff I've done but 
not cleaned up and signed ones that I want to send off as patches. I've 
confused the branches on occasion, although never when sending stuff, and 
it would be nice to have format-patch tell you if the commit didn't look 
right.
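
A check of this kind could be as small as the following sketch. It is only an
illustration: has_signoff() is a hypothetical helper, not part of
format-patch, and it assumes the commit message is available as a single
NUL-terminated string with lines separated by newlines.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper, not part of git: return 1 if any line of the commit
 * message starts with "Signed-off-by:", and 0 otherwise.
 */
int has_signoff(const char *msg)
{
    static const char tag[] = "Signed-off-by:";
    const char *line = msg;

    assert(msg != NULL);
    while (line && *line) {
        if (!strncmp(line, tag, sizeof(tag) - 1))
            return 1;
        line = strchr(line, '\n');
        if (line)
            line++; /* step past the newline to the start of the next line */
    }
    return 0;
}
```

Matching only at the start of a line is a deliberate choice in this sketch,
so that a message which merely mentions the tag in running text does not
count as signed.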

-Daniel
*This .sig left intentionally blank*

Re: [PATCH] Re: git-http-pull broken in latest git

2005-08-12 Thread Daniel Barkalow

On Thu, 11 Aug 2005, Junio C Hamano wrote:

> Daniel Barkalow <[EMAIL PROTECTED]> writes:
> 
> > Petr Baudis <[EMAIL PROTECTED]> writes:
> >> Yes, but cg-clone doesn't - it naively depended on the core git tools
> >> actually, er.. working. ;-)
> 
> Sorry about that.  I used to have a wrapper to deal with packs
> around http-pull before Daniel's pack enhancement, and yanking
> it before really checking that enhanced http-pull actually
> worked was my fault as well.

It was actually the patches after the http-pull fixes (the ones for 
parallelizing pull.c) that broke things; one advantage to fixing 
local-pull would be that you can set up tests for it reasonably 
effectively, which would have caught the regression.

> > At some point, I have to revisit getting git-ssh-* to generate exactly the 
> > required pack and transfer that, but that's an efficiency issue, not a 
> > correctness one, and shouldn't be relevant to the problem you're having.
> 
> Wouldn't enhancing ssh-push to generate packs on the fly involve
> reinventing send-pack and/or upload-pack?

The idea is that you wouldn't have to identify what situation applied 
yourself; you could just invoke git-ssh-pull/git-ssh-push, and it would 
happen faster due to the compression benefits. The point is that scripts 
can just pick which git-*-pull to use based on the format of the remote 
branch address, without variation in behavior.
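
For illustration, the kind of dispatch a script would do might look like the
sketch below. The function pull_program_for() is hypothetical, and the
mapping from address prefixes to programs is an assumption made for this
example rather than something any existing script is known to use.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return nonzero if s begins with prefix. */
static int starts_with(const char *s, const char *prefix)
{
    return strncmp(s, prefix, strlen(prefix)) == 0;
}

/*
 * Hypothetical helper: pick a pull program from the form of the remote
 * address. The prefix-to-program mapping is an assumption for this sketch.
 * Returns NULL when the address form is not recognised.
 */
const char *pull_program_for(const char *address)
{
    assert(address != NULL);
    if (starts_with(address, "http://"))
        return "git-http-pull";
    if (starts_with(address, "git+ssh://") || starts_with(address, "ssh://"))
        return "git-ssh-pull";
    if (address[0] == '/')
        return "git-local-pull"; /* absolute path on this machine */
    return NULL;
}
```

The caller only looks at the address; whether the remote side happens to be
packed does not enter into the choice.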

> The same thing can be said about local-pull to a lesser degree.
> Lesser because people, including Pasky who said so on the list
> recently, would like its hard-linking behaviour, and its not
> exploding the existing packs, which send-pack and upload-pack
> would not give.  So I would rate local-pull higher than
> ssh-push/pull on the priority scale if I were doing them.

This is a higher priority, but writing more than bugfixes is unpleasant at 
the moment due to my home workstation's monitor dying, so it'll probably 
be next week that I'll get to it. The git-ssh-* stuff is longer-term, 
since it works now, and isn't even all that slow with the overlapping 
requests.

You could, actually, probably do the local-pull fix if you wanted. I seem 
to recall that being your code originally; you just need to have fetch() 
identify that an object is in a pack, copy/link/symlink the index and 
pack instead of the object file, and add the pack to the list of 
registered packs. I've mostly been failing to deal with reading an index 
file that is in some directory that hasn't been registered as somewhere to 
read from (i.e. the source repository).

-Daniel
*This .sig left intentionally blank*

[PATCH] Re: git-http-pull broken in latest git

2005-08-11 Thread Daniel Barkalow

On Fri, 12 Aug 2005, Petr Baudis wrote:

> Dear diary, on Fri, Aug 12, 2005 at 01:21:46AM CEST, I got a letter
> where Junio C Hamano <[EMAIL PROTECTED]> told me that...
> > Petr Baudis <[EMAIL PROTECTED]> writes:
> > 
> > > $ git-cat-file commit bf570303153902ec3d85570ed24515bcf8948848 | grep tree
> > > tree 41f10531f1799bbb31a1e0f7652363154ce96f45
> > > $ git-read-tree 41f10531f1799bbb31a1e0f7652363154ce96f45
> > > fatal: failed to unpack tree object 41f10531f1799bbb31a1e0f7652363154ce96f45
> > 
> > > Kaboom. I think the issue might be that the reference dependency tree
> > > building is broken and it should've pulled the other pack as well.
> > 
> > Last time I checked, git-http-pull did not utilize the pack
> > dependency information, which indeed is wrong.  When it decides
> > to fetch a pack instead of an asked-for object, it should check
> > which commits the pack expects to have in your local repository
> > and add them to its list of things to slurp.
> > 
> > A good news is that "git clone" as a whole works fine.
> 
> Yes, but cg-clone doesn't - it naively depended on the core git tools
> actually, er.. working. ;-)
> 
> This became a nightmare to me by now - on two machines I tried to pull
> to over HTTP, that failed miserably, and I got stuck until I applied
> Daniel's patch there (and cleaned up after previous git-http-pulls).
> 
> So I have this packless git-pb repository and suspecting no evil, I pull
> from you (thankfully I have .git/objects/pack there from some historical
> pulls). I do a merge commit:
> 
>   packed
>... J
>   packed \
>> M
>  /
>... P
> 
> Now I want to pull on another machine. That pulls M and then fails since
> I have no .git/objects/pack there, bummer. So I mkdir it, but get no
> further w/o Daniel's patch - for git-*-pull, J is missing and that's it.
> So I apply the patch, and get friendly
> 
>   error: Unable to determine requirements of type (null) for M
> 
> and only after I delete M from the database, I finally succeed with
> git-http-pull. (That was with --repair.) That's not good since this
> might occur even naturally when the pull is interrupted.

Insufficient testing on my part; patch at the end.

> With git-ssh-pull, the situation is even more vexing - it refuses to
> fetch the packs for some reason yet unknown to me (I will debug it
> tomorrow).

git-ssh-pull doesn't deal in packs; it gets individual objects out of 
packs, which git-ssh-push (on the remote side) should be extracting. 
Perhaps the git-ssh-push on the remote side is from before I made packs 
work (it used to need to have the files for objects it was sending). 

At some point, I have to revisit getting git-ssh-* to generate exactly the 
required pack and transfer that, but that's an efficiency issue, not a 
correctness one, and shouldn't be relevant to the problem you're having.

---
[PATCH] Also parse objects we already have

In the case where we don't know from context what type an object is, but
we don't have to fetch it, we need to parse it to determine the type
before processing it.

Signed-off-by: Daniel Barkalow <[EMAIL PROTECTED]>
---

 pull.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

b8c382e76da25f45ff86176a6a6affdd9a28d60b
diff --git a/pull.c b/pull.c
--- a/pull.c
+++ b/pull.c
@@ -127,6 +127,7 @@ static int process(unsigned char *sha1, 
 {
 	struct object *obj = lookup_object_type(sha1, type);
 	if (has_sha1_file(sha1)) {
+		parse_object(sha1);
 		/* We already have it, so we should scan it now. */
 		return process_object(obj);
 	}

Re: [PATCH] Re: git-http-pull broken in latest git

2005-08-11 Thread Daniel Barkalow

On Thu, 11 Aug 2005, Junio C Hamano wrote:

> Daniel Barkalow <[EMAIL PROTECTED]> writes:
> 
> > It should work anyway,...
> 
> That is true.  Please forget about the "recommendation" to slurp
> packs and not falling back on commit walker.
> 
> Thanks for the patch.

No problem; I had been wondering what the rest of those lines were about 
anyway.

-Daniel
*This .sig left intentionally blank*

[PATCH] Re: git-http-pull broken in latest git

2005-08-11 Thread Daniel Barkalow

On Thu, 11 Aug 2005, Junio C Hamano wrote:

> Petr Baudis <[EMAIL PROTECTED]> writes:
> 
> > $ git-cat-file commit bf570303153902ec3d85570ed24515bcf8948848 | grep tree
> > tree 41f10531f1799bbb31a1e0f7652363154ce96f45
> > $ git-read-tree 41f10531f1799bbb31a1e0f7652363154ce96f45
> > fatal: failed to unpack tree object 41f10531f1799bbb31a1e0f7652363154ce96f45
> 
> > Kaboom. I think the issue might be that the reference dependency tree
> > building is broken and it should've pulled the other pack as well.
> 
> Last time I checked, git-http-pull did not utilize the pack
> dependency information, which indeed is wrong. 

Is there documentation on the format?

> When it decides to fetch a pack instead of an asked-for object, it 
> should check which commits the pack expects to have in your local 
> repository and add them to its list of things to slurp.

It should work anyway, except that I messed up some logic in the parallel 
pull stuff; when it finds it has something already, it ignores it 
entirely, rather than processing it. The following patch fixes this.
---
[PATCH] Fix parallel pull dependency tracking.

It didn't refetch an object it already had (good), but didn't process
it, either (bad). Synchronously process anything you already have.

Signed-off-by: Daniel Barkalow <[EMAIL PROTECTED]>
---

 pull.c |   57 -
 1 files changed, 32 insertions(+), 25 deletions(-)

9b6b4b259c6b00d5b2502c158bc800d7623352bc
diff --git a/pull.c b/pull.c
--- a/pull.c
+++ b/pull.c
@@ -98,12 +98,38 @@ static int process_tag(struct tag *tag)
 static struct object_list *process_queue = NULL;
 static struct object_list **process_queue_end = &process_queue;
 
-static int process(unsigned char *sha1, const char *type)
+static int process_object(struct object *obj)
 {
-	struct object *obj;
-	if (has_sha1_file(sha1))
+	if (obj->type == commit_type) {
+		if (process_commit((struct commit *)obj))
+			return -1;
+		return 0;
+	}
+	if (obj->type == tree_type) {
+		if (process_tree((struct tree *)obj))
+			return -1;
 		return 0;
-	obj = lookup_object_type(sha1, type);
+	}
+	if (obj->type == blob_type) {
+		return 0;
+	}
+	if (obj->type == tag_type) {
+		if (process_tag((struct tag *)obj))
+			return -1;
+		return 0;
+	}
+	return error("Unable to determine requirements "
+		     "of type %s for %s",
+		     obj->type, sha1_to_hex(obj->sha1));
+}
+
+static int process(unsigned char *sha1, const char *type)
+{
+	struct object *obj = lookup_object_type(sha1, type);
+	if (has_sha1_file(sha1)) {
+		/* We already have it, so we should scan it now. */
+		return process_object(obj);
+	}
 	if (object_list_contains(process_queue, obj))
 		return 0;
 	object_list_insert(obj, process_queue_end);
@@ -134,27 +160,8 @@ static int loop(void)
 			return -1;
 		if (!obj->type)
 			parse_object(obj->sha1);
-		if (obj->type == commit_type) {
-			if (process_commit((struct commit *)obj))
-				return -1;
-			continue;
-		}
-		if (obj->type == tree_type) {
-			if (process_tree((struct tree *)obj))
-				return -1;
-			continue;
-		}
-		if (obj->type == blob_type) {
-			continue;
-		}
-		if (obj->type == tag_type) {
-			if (process_tag((struct tag *)obj))
-				return -1;
-			continue;
-		}
-		return error("Unable to determine requirements "
-			     "of type %s for %s",
-			     obj->type, sha1_to_hex(obj->sha1));
+		if (process_object(obj))
+			return -1;
 	}
 	return 0;
 }

Re: Bootstrapping into git, commit gripes at me

2005-07-12 Thread Daniel Barkalow

On Mon, 11 Jul 2005, Junio C Hamano wrote:

> Linus Torvalds <[EMAIL PROTECTED]> writes:
> 
> > But what about the branch name? Should we just ask the user? Together with 
> > a flag, like
> >
> > git checkout -b new-branch v2.6.12
> >
> > for somebody who wants to specify the branch name? Or should we pick a 
> > random name and add a helper function to rename a branch later?
> >
> > Opinions?
> 
> How about treating "master" a temporary thing --- "whatever I
> happen to be working on right now"?

That conflicts with my usage, where I have a single repository for all of
my working directories, with .git/refs and .git/objects being symlinks to 
it, but .git/HEAD being different for each branch. The stuff in objects/
and refs/ really shouldn't depend on what you're currently doing for this
reason.

My way of thinking of "master" is that it's a real branch, which is for
all of the situations where you aren't using a specially-designated
branch. For many people, they only do stuff that's not designated
specially; Jeff only does stuff that is designated specially. But if you
do both, you'll want master to be left alone while you work on the side
branch.

-Daniel
*This .sig left intentionally blank*


Re: [PATCH 2/2] Demo support for packs via HTTP

2005-07-11 Thread Daniel Barkalow

On Mon, 11 Jul 2005, Darrin Thompson wrote:

> On Sun, 2005-07-10 at 15:56 -0400, Daniel Barkalow wrote:
> > +   curl_easy_setopt(curl, CURLOPT_FILE, indexfile);
> > +   curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, fwrite);
> > +   curl_easy_setopt(curl, CURLOPT_URL, url);
> 
> I was hoping to send in a patch which would turn on user auth and turn
> off ssl peer verification.
> 
> Your (preliminary obviously) patch puts curl handling in two places. Is
> there a place where I can safely start working on adding the needed
> setopts?

If I understand the curl documentation, you should be able to set options 
on the curl object when it has just been created, if those options aren't
going to change between requests. Note that I make requests from multiple
places, but I use the same curl object for all of them.

-Daniel
*This .sig left intentionally blank*


Re: [PATCH 0/2] Support for packs in HTTP

2005-07-11 Thread Daniel Barkalow

On Mon, 11 Jul 2005, Linus Torvalds wrote:

> 
> 
> On Mon, 11 Jul 2005, Daniel Barkalow wrote:
> > On Sun, 10 Jul 2005, Linus Torvalds wrote:
> > 
> > > 
> > > You really _mustn't_ try to create the pack directly to the
> > > $GIT_DIR/objects/pack subdirectory - that would make git itself start
> > > possibly using that pack before the index is all done, and that would be
> > > just wrong and nasty.
> > >
> > > So you really should _always_ generate the pack somewhere else, and then 
> > > move it (pack file first, index file second).
> > 
> > It's currently fine ignoring index files without corresponding
> > pack files (sha1_file.c, line 470).
> 
> That doesn't help.

Well, it means that the order you move them doesn't matter, because it
will ignore the pair if either hasn't been moved.
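
A rough sketch of that "ignore the pair unless both halves are there" rule follows. It is not the actual check in sha1_file.c; the helper name pack_pair_present() and the fixed buffer size are assumptions made for illustration.

```c
#include <stdio.h>
#include <unistd.h>

/*
 * Return 1 only when both "<base>.idx" and "<base>.pack" exist.
 * Hypothetical helper for illustration; not git's actual check.
 */
static int pack_pair_present(const char *base)
{
	char idx[4096], pack[4096];
	int n;

	n = snprintf(idx, sizeof(idx), "%s.idx", base);
	if (n < 0 || n >= (int)sizeof(idx))
		return 0;
	n = snprintf(pack, sizeof(pack), "%s.pack", base);
	if (n < 0 || n >= (int)sizeof(pack))
		return 0;

	/* Either file missing means the pair is skipped, whatever the move order. */
	return access(idx, F_OK) == 0 && access(pack, F_OK) == 0;
}
```

This only tests for existence, so it says nothing about a file that is present but not yet fully written.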

> Regardless of which order you write them (and you _will_ write the 
> pack-file first), you'll find that at some point you have both files, but 
> one or the other isn't fully written, ie they are unusable.

(Off topic: note that git-http-pull writes the _index_ first, because it
fetches it to determine if it should fetch the pack)

> And yes, you can handle that by always checking the SHA1 of the files when 
> you open them, but the fact is, you shouldn't need to, just to use it. 
> Checking the SHA1 of the pack-file in particular is very expensive (since 
> it's potentially a huge file, and you don't even want to read all of it).

IIRC, we check the size of the pack file and there are hashes around the
ends of the two files which have to match; but this is a die() check, not
an ignore check, so we just crash with a clear error message rather than
doing crazy stuff (like reading from beyond the end of the mmap).

> So that's what I decided the rule is: never ever have a partial file, and 
> thus you can by definition use them immediately when you see both files.
> 
> But that requires that you write them under another name than the final 
> one. And since you want that _anyway_ for other uses, you don't hide that 
> inside "git-pack-objects", but you make it an exported interface.

We should never write anything under the final name, anyway, for just this
reason; we already use open/write/close/rename for objects, refs, and
cache (maybe not working directory files, though). I think we're actually
agreeing on this.

My position is that the temporary location should be something like
{final-name}.part, such that it doesn't match *.idx or *.pack beforehand
(so it doesn't look like a complete file that you might want to send to
someone) and it doesn't have to worry about EXDEV on the rename. Also, I
would ideally like to be able to resume an interrupted download, which
means that it would have to find the partial file in a predictable
location, given what it's supposed to contain.
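
To make the {final-name}.part idea concrete, here is a minimal sketch using plain POSIX calls. The function name write_then_rename() is made up for this example, and it truncates any existing partial file rather than resuming it, so it only shows the naming and rename step.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Write len bytes to "<final>.part", then rename it to <final>.
 * Illustrative sketch only; the name and error handling are assumptions.
 */
static int write_then_rename(const char *final, const void *data, size_t len)
{
	char part[4096];
	const char *p = data;
	size_t done = 0;
	int n, fd;

	n = snprintf(part, sizeof(part), "%s.part", final);
	if (n < 0 || n >= (int)sizeof(part))
		return -1;

	fd = open(part, O_WRONLY | O_CREAT | O_TRUNC, 0666);
	if (fd < 0)
		return -1;
	while (done < len) {
		ssize_t w = write(fd, p + done, len - done);
		if (w <= 0) {
			if (w < 0 && errno == EINTR)
				continue;
			close(fd);
			return -1;	/* the .part file is left behind */
		}
		done += (size_t)w;
	}
	if (close(fd) < 0)
		return -1;

	/* Both names are in the same directory, so EXDEV cannot happen here. */
	return rename(part, final);
}
```

A resuming variant would presumably open the .part file without O_TRUNC and continue from its current length; that is left out here.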

-Daniel
*This .sig left intentionally blank*


Re: [PATCH 0/2] Support for packs in HTTP

2005-07-10 Thread Daniel Barkalow

On Sun, 10 Jul 2005, Linus Torvalds wrote:

> On Sun, 10 Jul 2005, Daniel Barkalow wrote:
> > 
> > Perhaps git-pack-objects should have the base as an optional argument,
> > with a default of the filename in $GIT_DIR/objects/pack and an option
> > for sending just the pack file to stdout?
> 
> You really _mustn't_ try to create the pack directly to the
> $GIT_DIR/objects/pack subdirectory - that would make git itself start
> possibly using that pack before the index is all done, and that would be
> just wrong and nasty.
>
> So you really should _always_ generate the pack somewhere else, and then 
> move it (pack file first, index file second).

It's currently fine ignoring index files without corresponding
pack files (sha1_file.c, line 470). Do you want to make the constraint
that the pack/ directory doesn't have index files for packs that aren't
also there? (I've been putting the index files for packs that might be
possible to get there, and relying on the above check to make sure that
they don't affect anything if it hasn't fetched the pack.)

Of course, we should never write to files in locations that anything looks
at; we want everything to appear atomically, completely written and
verified. But there's nothing wrong with having the C code place the
objects, which is certainly going to be necessary in the case of
downloading them by HTTP, since the program will want to place them and
enable them while running.

-Daniel
*This .sig left intentionally blank*


Re: [PATCH 0/2] Support for packs in HTTP

2005-07-10 Thread Daniel Barkalow

On Sun, 10 Jul 2005, Linus Torvalds wrote:

> 
> 
> On Sun, 10 Jul 2005, Daniel Barkalow wrote:
> 
> > On Sun, 10 Jul 2005, Linus Torvalds wrote:
> > > 
> > > Well, regardless, we want to be able to specify which directory to write 
> > > them to. We don't necessarily want to write them to the current working 
> > > directory, nor do we want to write them to their eventual destination in 
> > > .git/objects/pack.
> > > 
> > > In fact, the main current user ("git repack") really wants to write them 
> > > to a temporary file, and one that isn't even called "pack-xxx", since it 
> > > ends up doing cleanup with 
> > > 
> > >   rm -f .tmp-pack-*
> > > 
> > > in case a previous re-pack was interrupted (in which case it simply cannot
> > > know what the exact name was supposed to be).
> > > 
> > > So the "basename" ends up being necessary and meaningful regardless. We 
> > > do _not_ want to remove that capability.
> > 
> > Shouldn't we do the same thing we do with object files? I don't see any
> > difference in desired behavior.
> 
> Well, the main difference is that pack-files can be used for many things.
> 
> For example, a web interface for getting a pack-file between two releases: 
> say you knew you had version xyzzy, and you want to get version xyzzy+1, 
> you could do that through webgit some way even with a "stupid" interface. 
> Kay already had some patch to generate pack-files for something.
> 
> The point being that pack-files are _not_ like objects. Pack-files are 
> meant for communication. Having them in .git/objects/pack is just one 
> special case.

Okay, I can see the use for them getting written to arbitrary paths; but I
think that it's worth having a canonical location for a pack that's being
used by the system (either not having been sent anywhere, or after having
been received). Perhaps git-pack-objects should have the base as an
optional argument, with a default of the filename in $GIT_DIR/objects/pack
and an option for sending just the pack file to stdout? I think that
covers everything in order of usefulness, and means that the program deals
with any filename that the user doesn't know in advance.

> > Why not checksum it in a predictable order, either that of the pack file
> > or the index? We do care that it's something verifiable, so that people
> > can't cause intentional collisions (for a DoS) just by naming their packs
> > after existing packs that users might not have downloaded yet.
> 
> We could sha1-sum the "sorted by SHA1" list, I guess.

That'd be good; then git-http-pull can validate the hash on the index and
be sure that a matching pack file from a different location still has the
same contents.
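
As a sketch of why hashing the sorted list gives a name that does not depend on pack order: the code below sorts the binary object ids and then digests them in that order. FNV-1a is used here only as a dependency-free stand-in for SHA-1, and pack_name_digest() is a made-up name; a real version would need a cryptographic hash for the collision concern above to be addressed at all.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define ID_LEN 20	/* length of a binary SHA-1 */

static int cmp_id(const void *a, const void *b)
{
	return memcmp(a, b, ID_LEN);
}

/*
 * Sort the object ids in place, then hash them in that order.
 * FNV-1a stands in for SHA-1 purely to keep the sketch self-contained.
 */
static uint64_t pack_name_digest(unsigned char (*ids)[ID_LEN], size_t n)
{
	uint64_t h = 0xcbf29ce484222325ULL;	/* FNV-1a 64-bit offset basis */
	size_t i, j;

	qsort(ids, n, ID_LEN, cmp_id);
	for (i = 0; i < n; i++) {
		for (j = 0; j < ID_LEN; j++) {
			h ^= ids[i][j];
			h *= 0x100000001b3ULL;	/* FNV-1a 64-bit prime */
		}
	}
	return h;
}
```

Two lists holding the same ids in a different order come out with the same digest, which is the property the index-based validation would rely on.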

-Daniel
*This .sig left intentionally blank*


Re: [PATCH 0/2] Support for packs in HTTP

2005-07-10 Thread Daniel Barkalow

On Sun, 10 Jul 2005, Linus Torvalds wrote:

> On Sun, 10 Jul 2005, Junio C Hamano wrote:
> >
> > So I would suggest either:
> > 
> >   - dropping the packname parameter from git-pack-objects.  Make
> > the packs always named pack-X{40}.pack (or just X{40}.pack);
> 
> Well, regardless, we want to be able to specify which directory to write 
> them to. We don't necessarily want to write them to the current working 
> directory, nor do we want to write them to their eventual destination in 
> .git/objects/pack.
> 
> In fact, the main current user ("git repack") really wants to write them 
> to a temporary file, and one that isn't even called "pack-xxx", since it 
> ends up doing cleanup with 
> 
>   rm -f .tmp-pack-*
> 
> in case a previous re-pack was interrupted (in which case it simply cannot
> know what the exact name was supposed to be).
> 
> So the "basename" ends up being necessary and meaningful regardless. We do 
> _not_ want to remove that capability.

Shouldn't we do the same thing we do with object files? I don't see any
difference in desired behavior.

> > also have verify-pack to verify the name of the packfile,
> > and make sure X{40} part of the name matches what it claims
> > to contain;
> 
> Now, that would be fine, but it can't be done. Not the way things are laid 
> out. A SHA1 checksum depends on the order the data was checksummed in, and 
> we don't even save that.

Why not checksum it in a predictable order, either that of the pack file
or the index? We do care that it's something verifiable, so that people
can't cause intentional collisions (for a DoS) just by naming their packs
after existing packs that users might not have downloaded yet.

-Daniel
*This .sig left intentionally blank*


[PATCH 1/2] write_sha1_to_fd()

2005-07-10 Thread Daniel Barkalow

Add write_sha1_to_fd(), which writes an object to a file descriptor. This
includes support for unpacking it and recompressing it.

Signed-off-by: Daniel Barkalow <[EMAIL PROTECTED]>
---
commit 264ff9f3dcde5553728b34fa08e04643b2b55946
tree 353fe33ae9c7265d7b685bca864d657e3efe2849
parent c3eb461762b1d65e424fc4ede6a1d4f3e0a679f7
author Daniel Barkalow <[EMAIL PROTECTED]> 1121033477 -0400
committer Daniel Barkalow <[EMAIL PROTECTED](none)> 1121033477 -0400

Index: cache.h
===
--- 545ef8191b517b7f9e4ea558edaf526038ed1895/cache.h  (mode:100644 sha1:719a77dfabb24e58abd21b7f3a4b846a114e000a)
+++ 353fe33ae9c7265d7b685bca864d657e3efe2849/cache.h  (mode:100644 sha1:38dac6d6a413f1c788e5331ef4741fc15d72d9bd)
@@ -187,6 +187,7 @@
 extern int read_tree(void *buffer, unsigned long size, int stage);
 
 extern int write_sha1_from_fd(const unsigned char *sha1, int fd);
+extern int write_sha1_to_fd(int fd, const unsigned char *sha1);
 
 extern int has_sha1_pack(const unsigned char *sha1);
 extern int has_sha1_file(const unsigned char *sha1);
Index: sha1_file.c
===
--- 545ef8191b517b7f9e4ea558edaf526038ed1895/sha1_file.c  (mode:100644 sha1:27136fdba0fbf2dd943f2634cb49660cdbf95ec4)
+++ 353fe33ae9c7265d7b685bca864d657e3efe2849/sha1_file.c  (mode:100644 sha1:08560b2c7a6dff400a46160501c247081f9bb4c7)
@@ -1326,6 +1326,65 @@
 	return 0;
 }
 
+int write_sha1_to_fd(int fd, const unsigned char *sha1)
+{
+	ssize_t size;
+	unsigned long objsize;
+	int posn = 0;
+	char *buf = map_sha1_file_internal(sha1, &objsize, 0);
+	z_stream stream;
+	if (!buf) {
+		unsigned char *unpacked;
+		unsigned long len;
+		char type[20];
+		char hdr[50];
+		int hdrlen;
+		// need to unpack and recompress it by itself
+		unpacked = read_packed_sha1(sha1, type, &len);
+
+		hdrlen = sprintf(hdr, "%s %lu", type, len) + 1;
+
+		/* Set it up */
+		memset(&stream, 0, sizeof(stream));
+		deflateInit(&stream, Z_BEST_COMPRESSION);
+		size = deflateBound(&stream, len + hdrlen);
+		buf = xmalloc(size);
+
+		/* Compress it */
+		stream.next_out = buf;
+		stream.avail_out = size;
+		
+		/* First header.. */
+		stream.next_in = hdr;
+		stream.avail_in = hdrlen;
+		while (deflate(&stream, 0) == Z_OK)
+			/* nothing */;
+
+		/* Then the data itself.. */
+		stream.next_in = unpacked;
+		stream.avail_in = len;
+		while (deflate(&stream, Z_FINISH) == Z_OK)
+			/* nothing */;
+		deflateEnd(&stream);
+		
+		objsize = stream.total_out;
+	}
+
+	do {
+		size = write(fd, buf + posn, objsize - posn);
+		if (size <= 0) {
+			if (!size) {
+				fprintf(stderr, "write closed");
+			} else {
+				perror("write ");
+			}
+			return -1;
+		}
+		posn += size;
+	} while (posn < objsize);
+	return 0;
+}
+
 int write_sha1_from_fd(const unsigned char *sha1, int fd)
 {
 	char *filename = sha1_file_name(sha1);
Index: ssh-push.c
===
--- 545ef8191b517b7f9e4ea558edaf526038ed1895/ssh-push.c  (mode:100644 sha1:090d6f9f8fbde2d736ac5bf563415b0fa402b5aa)
+++ 353fe33ae9c7265d7b685bca864d657e3efe2849/ssh-push.c  (mode:100644 sha1:aac70af514e0dc5507fa4997ebad54352c973215)
@@ -7,13 +7,13 @@
 static unsigned char local_version = 1;
 static unsigned char remote_version = 0;
 
+static int verbose = 0;
+
 static int serve_object(int fd_in, int fd_out) {
 	ssize_t size;
-	int posn = 0;
 	unsigned char sha1[20];
-	unsigned long objsize;
-	void *buf;
 	signed char remote;
+	int posn = 0;
 	do {
 		size = read(fd_in, sha1 + posn, 20 - posn);
 		if (size < 0) {
@@ -25,12 +25,12 @@
 		posn += size;
 	} while (posn < 20);
 
-	/* fprintf(stderr, "Serving %s\n", sha1_to_hex(sha1)); */
+	if (verbose)
+		fprintf(stderr, "Serving %s\n", sha1_to_hex(sha1));
+
 	remote = 0;
 
-	buf = map_sha1_file(sha1, &objsize);
-	
-	if (!buf) {
+	if (!has_sha1_file(sha1)) {
 		fprintf(stderr, "git-ssh-push: could not find %s\n", 
 			sha1_to_hex(sha1));

[PATCH 2/2] Remove map_sha1_file

2005-07-10 Thread Daniel Barkalow

Remove map_sha1_file(), now unused.

Signed-off-by: Daniel Barkalow <[EMAIL PROTECTED]>
---
commit c21a02262f770a25b005378e06354e582aa1bfd8
tree 7ac9fabe666f00f37572e7b349fdb859bf8a6491
parent 264ff9f3dcde5553728b34fa08e04643b2b55946
author Daniel Barkalow <[EMAIL PROTECTED]> 1121033599 -0400
committer Daniel Barkalow <[EMAIL PROTECTED](none)> 1121033599 -0400

Index: cache.h
===
--- 353fe33ae9c7265d7b685bca864d657e3efe2849/cache.h  (mode:100644 sha1:38dac6d6a413f1c788e5331ef4741fc15d72d9bd)
+++ 7ac9fabe666f00f37572e7b349fdb859bf8a6491/cache.h  (mode:100644 sha1:11ba95c8aa9202fa3b1a3cbc07bc976641cd1908)
@@ -167,7 +167,6 @@
 int safe_create_leading_directories(char *path);
 
 /* Read and unpack a sha1 file into memory, write memory to a sha1 file */
-extern void * map_sha1_file(const unsigned char *sha1, unsigned long *size);
 extern int unpack_sha1_header(z_stream *stream, void *map, unsigned long mapsize, void *buffer, unsigned long size);
 extern int parse_sha1_header(char *hdr, char *type, unsigned long *sizep);
 extern int sha1_object_info(const unsigned char *, char *, unsigned long *);
Index: sha1_file.c
===
--- 353fe33ae9c7265d7b685bca864d657e3efe2849/sha1_file.c  (mode:100644 sha1:08560b2c7a6dff400a46160501c247081f9bb4c7)
+++ 7ac9fabe666f00f37572e7b349fdb859bf8a6491/sha1_file.c  (mode:100644 sha1:e082f2e6cb985caca11979311c291aa51d6c37fd)
@@ -578,8 +578,7 @@
 }
 
 static void *map_sha1_file_internal(const unsigned char *sha1,
-				    unsigned long *size,
-				    int say_error)
+				    unsigned long *size)
 {
 	struct stat st;
 	void *map;
@@ -587,8 +586,6 @@
 	char *filename = find_sha1_file(sha1, &st);
 
 	if (!filename) {
-		if (say_error)
-			error("cannot map sha1 file %s", sha1_to_hex(sha1));
 		return NULL;
 	}
 
@@ -602,8 +599,6 @@
 				break;
 		/* Fallthrough */
 		case 0:
-			if (say_error)
-				perror(filename);
 			return NULL;
 		}
 
@@ -620,11 +615,6 @@
 	return map;
 }
 
-void *map_sha1_file(const unsigned char *sha1, unsigned long *size)
-{
-	return map_sha1_file_internal(sha1, size, 1);
-}
-
 int unpack_sha1_header(z_stream *stream, void *map, unsigned long mapsize, void *buffer, unsigned long size)
 {
 	/* Get the data stream */
@@ -1112,7 +1102,7 @@
 	z_stream stream;
 	char hdr[128];
 
-	map = map_sha1_file_internal(sha1, &mapsize, 0);
+	map = map_sha1_file_internal(sha1, &mapsize);
 	if (!map) {
 		struct pack_entry e;
 
@@ -1151,7 +1141,7 @@
 	unsigned long mapsize;
 	void *map, *buf;
 
-	map = map_sha1_file_internal(sha1, &mapsize, 0);
+	map = map_sha1_file_internal(sha1, &mapsize);
 	if (map) {
 		buf = unpack_sha1_file(map, mapsize, type, size);
 		munmap(map, mapsize);
@@ -1331,7 +1321,7 @@
 	ssize_t size;
 	unsigned long objsize;
 	int posn = 0;
-	char *buf = map_sha1_file_internal(sha1, &objsize, 0);
+	char *buf = map_sha1_file_internal(sha1, &objsize);
 	z_stream stream;
 	if (!buf) {
 		unsigned char *unpacked;


[PATCH 0/2] Handling sending objects from packs

2005-07-10 Thread Daniel Barkalow

This series adds support for sending individual objects from packs in
git-ssh-push and removes map_sha1_file.

-Daniel
*This .sig left intentionally blank*


[PATCH] Make --recover cause pull to trace everything

2005-07-10 Thread Daniel Barkalow

Make the --recover flag check the parents of commits which are already
available. This is needed currently to deal with cases where a parent is
pulled along with a commit (in a pack, e.g.) and references above that
parent aren't also pulled together.

Signed-off-by: Daniel Barkalow <[EMAIL PROTECTED]>
---
commit 75e8c1be7a778e0a0fa119fe1bc408341932e7e5
tree ffbe708117543c356eb2981f1e0540b89b7a95e2
parent a7336ae514738f159dad314d6674961427f043a6
author Daniel Barkalow <[EMAIL PROTECTED]> 1121024019 -0400
committer Daniel Barkalow <[EMAIL PROTECTED](none)> 1121024019 -0400

Index: http-pull.c
===
--- 248f72f3e4dcb40693488b0c06f93d0b38122b8e/http-pull.c  (mode:100644 sha1:1f9d60b9b1d5eed85b24d96c240666bbfc5a22ed)
+++ ffbe708117543c356eb2981f1e0540b89b7a95e2/http-pull.c  (mode:100644 sha1:3fa56f08b0b8e7316afcaab3a7bfa3f2d26b550f)
@@ -146,7 +146,10 @@
 	int arg = 1;
 
 	while (arg < argc && argv[arg][0] == '-') {
-		if (argv[arg][1] == 't') {
+		if (argv[arg][1] == '-') {
+			if (!strcmp(argv[arg] + 2, "recover"))
+				careful = 1;
+		} else if (argv[arg][1] == 't') {
 			get_tree = 1;
 		} else if (argv[arg][1] == 'c') {
 			get_history = 1;
Index: local-pull.c
===
--- 248f72f3e4dcb40693488b0c06f93d0b38122b8e/local-pull.c  (mode:100644 sha1:2f06fbee8b840a7ae642f5a22e2cb993687f3470)
+++ ffbe708117543c356eb2981f1e0540b89b7a95e2/local-pull.c  (mode:100644 sha1:0d10c07844030bc7cb615cf916dce89592151be7)
@@ -116,7 +116,10 @@
 	int arg = 1;
 
 	while (arg < argc && argv[arg][0] == '-') {
-		if (argv[arg][1] == 't')
+		if (argv[arg][1] == '-') {
+			if (!strcmp(argv[arg] + 2, "recover"))
+				careful = 1;
+		} else if (argv[arg][1] == 't')
 			get_tree = 1;
 		else if (argv[arg][1] == 'c')
 			get_history = 1;
Index: pull.c
===
--- 248f72f3e4dcb40693488b0c06f93d0b38122b8e/pull.c  (mode:100644 sha1:ed3078e3b27c62c07558fd94f339801cbd685593)
+++ ffbe708117543c356eb2981f1e0540b89b7a95e2/pull.c  (mode:100644 sha1:d9763840c7ebcb1e5838c3b960695cafcca3ac73)
@@ -11,6 +11,7 @@
 
 const unsigned char *current_ref = NULL;
 
+int careful = 0;
 int get_tree = 0;
 int get_history = 0;
 int get_all = 0;
@@ -91,7 +92,8 @@
 	if (get_history) {
 		struct commit_list *parents = obj->parents;
 		for (; parents; parents = parents->next) {
-			if (has_sha1_file(parents->item->object.sha1))
+			if (!careful &&
+			    has_sha1_file(parents->item->object.sha1))
 				continue;
 			if (make_sure_we_have_it(NULL,
 						 parents->item->object.sha1)) {
Index: pull.h
===
--- 248f72f3e4dcb40693488b0c06f93d0b38122b8e/pull.h  (mode:100644 sha1:e173ae3337c4465da87d849f4e5c9da203fdf01d)
+++ ffbe708117543c356eb2981f1e0540b89b7a95e2/pull.h  (mode:100644 sha1:d1076468b71b31dd5e59ec55d98de830cf9df60e)
@@ -21,6 +21,12 @@
 /* If set, the hash that the current value of write_ref must be. */
 extern const unsigned char *current_ref;
 
+/* 
+ * Set to check on everything, instead of stopping at points where we think
+ * we must have everything.
+ */
+extern int careful;
+
 /* Set to fetch the target tree. */
 extern int get_tree;
 
Index: ssh-pull.c
===
--- 248f72f3e4dcb40693488b0c06f93d0b38122b8e/ssh-pull.c  (mode:100644 sha1:26356dd7d84ea1bc9f7320b18562ed4117d4fac0)
+++ ffbe708117543c356eb2981f1e0540b89b7a95e2/ssh-pull.c  (mode:100644 sha1:7ca4243f3bd84590e7bb94467fd5acccd7d4d6f9)
@@ -61,7 +61,10 @@
 	const char *prog = getenv("GIT_SSH_PUSH") ? : "git-ssh-push";
 
 	while (arg < argc && argv[arg][0] == '-') {
-		if (argv[arg][1] == 't') {
+		if (argv[arg][1] == '-') {
+			if (!strcmp(argv[arg] + 2, "recover"))
+				careful = 1;
+		} else if (argv[arg][1] == 't') {
 			get_tree = 1;
 		} else if (argv[arg][1] == 'c') {
 			get_history = 1;

