Re: refspecs with '*' as part of pattern

2015-07-06 Thread Daniel Barkalow
On Mon, 6 Jul 2015, Junio C Hamano wrote:

 Jacob Keller jacob.kel...@gmail.com writes:
 
  I've been looking at the refspecs for git fetch, and noticed that
  globs are partially supported. I wanted to use something like:
 
  refs/tags/some-prefix-*:refs/tags/some-prefix-*
 
  as a refspec, so that I can fetch only tags which have a specific
  prefix. I know that I could use namespaces to separate tags, but
  unfortunately, I am unable to fix the tag format. The specific
  repository in question is also generating several tags which are not
  relevant to me, in formats that are not really useful for human
  consumption. I am also not able to fix this less than useful practice.
 
  However, I noticed that refspecs only support * as a single component.
  The match algorithm works perfectly fine, as documented in
  abd2bde78bd9 (Support '*' in the middle of a refspec)
 
  What is the reason for not allowing slightly more arbitrary
  expressions? Obviously no more than one *...
 
 I cannot seem to be able to find related discussions around that
 patch, so this is only my guess, but I suspect that this is to
 discourage people from doing something like:
 
   refs/tags/*:refs/tags/foo-*
 
 which would open a can of worms (e.g. imagine you fetch with that
 refspec and then push with refs/tags/*:refs/tags/* back there;
 would you now get foo-v1.0.0 and foo-foo-v1.0.0 for their v1.0.0
 tag?) we'd prefer not having to worry about.

That wouldn't be it, since refs/tags/*:refs/tags/foo/* would have the same 
problem, assuming you didn't set up the push refspec carefully.

I think it was mostly that it would be too easy to accidentally do 
something you don't want by having some other character instead of a 
slash, like refs/heads/*:refs/heads-*.

Aside from the increased risk of hard-to-spot typos leading to very weird 
behavior, nothing actually goes wrong; in fact, I've been using git with 
that check removed for ages because I wanted a refspec like 
refs/heads/something-*:refs/heads/*. And it works fine as a local patch, 
since you don't need your refspec handling to interoperate with other 
repositories.
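
To make the matching rule discussed above concrete, here is a minimal sketch of mapping a ref through a refspec whose '*' may sit anywhere in the last component. It is not git's implementation; the function name map_ref and the choice to allow an empty match are assumptions made for illustration.

```python
def map_ref(refspec, ref):
    """Map `ref` through a `src:dst` refspec with at most one '*' per side.

    Returns the destination ref name, or None if `ref` does not match.
    Hypothetical helper for illustration, not git's own code.
    """
    src, dst = refspec.split(":", 1)
    if "*" not in src:
        # No wildcard on the source side: only an exact match applies.
        return dst if ref == src else None
    prefix, suffix = src.split("*", 1)
    # Prefix and suffix must both match without overlapping each other.
    if len(ref) < len(prefix) + len(suffix):
        return None
    if not (ref.startswith(prefix) and ref.endswith(suffix)):
        return None
    matched = ref[len(prefix):len(ref) - len(suffix)]
    # Substitute whatever '*' matched into the destination pattern.
    return dst.replace("*", matched, 1)


if __name__ == "__main__":
    spec = "refs/heads/something-*:refs/heads/*"
    print(map_ref(spec, "refs/heads/something-topic"))
```

Applying refs/tags/*:refs/tags/foo-* twice with this sketch turns v1.0.0 into foo-foo-v1.0.0, which is the round-trip surprise raised earlier in the thread; the same happens with a slash instead of a dash.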

-Daniel
*This .sig left intentionally blank*
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 00/13] New remote-hg helper

2012-10-31 Thread Daniel Barkalow
On Wed, 31 Oct 2012, Felipe Contreras wrote:

 Hi,
 
 On Wed, Oct 31, 2012 at 7:59 PM, Jonathan Nieder jrnie...@gmail.com wrote:
  Felipe Contreras wrote:
  On Wed, Oct 31, 2012 at 7:20 PM, Johannes Schindelin
  johannes.schinde...@gmx.de wrote:
 
  I just tested this with junio/next and it seems this issue is still
  unfixed: instead of
 
  reset refs/heads/blub
  from e7510461b7db54b181d07acced0ed3b1ada072c8
 
  I get
 
  reset refs/heads/blub
  from :0
 
  when running git fast-export ^master blub.
 
  That is not a problem. It has been discussed extensively, and the
  consensus seems to be that such command should throw nothing:
 
  http://article.gmane.org/gmane.comp.version-control.git/208729
 
  Um.  Are you claiming I have said that git fast-export ^master blub
  should silently emit nothing?  Or has this been discussed extensively
  with someone else?
 
 Maybe I misunderstood when you said:
  A patch meeting the above description would make perfect sense to me.
 
 Anyway, when you have:
 
 % git fast-export ^next next^{commit}
 # nothing
 % git fast-export ^next next~0
 # nothing
 % git fast-export ^next next~1
 # nothing
 % git fast-export ^next next~2
 # nothing
 
 It only makes sense that:
 
 % git fast-export ^next next
 # nothing
 
 It doesn't get any more obvious than that. But to each his own.

I think that may be true where you have next in both places, but I  
think:

$ git checkout -b new-branch master
$ git fast-export ^master new-branch

ought to emit no commit lines, but needs to emit a reset line. After 
all, you haven't told fast-export that the ref new-branch is up to date, 
and you have told it that you want it to be exported. If you create a new 
branch off of an existing commit, don't change it, and push it to hg, it 
shouldn't be up to remote-hg to figure out what should happen with no 
input; it should get a:

reset refs/heads/new-branch
from [something]

I don't know why Johannes seems to want [something] not to be a mark 
reference (unless he's complaining about getting an invalid mark 
reference when there aren't any marks defined), but surely something of 
the above form is necessary to tell remote-hg to create the new branch.

I think it would be worth testing that:

$ git checkout -b new-branch master
$ git push hg new-branch

creates the new branch successfully (which I think it does, but wouldn't 
if git fast-export ^master new-branch actually returned nothing; 
parsed_refs gets it from the reset line).

AFAICT, your code relies on getting the behavior that fast-export actually 
gives, not the behavior you seem to want or the behavior Johannes seems to 
want. And the reason that you don't need any changes to fast-export is 
that your process maps marks instead of sha1s.
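
As a rough illustration of why the reset line matters to a consumer, the sketch below collects refs from reset commands in a fast-export stream. It is not remote-hg's code; the helper name parsed_refs is borrowed from the mention above, and the parsing is deliberately naive (it ignores data blocks, so a commit message containing a line that starts with "reset " would confuse it).

```python
def parsed_refs(stream_text):
    """Collect {ref: from-value} for every reset command in the stream.

    A ref with no following 'from' line maps to None.
    Hypothetical sketch; a real parser must also skip 'data' payloads.
    """
    refs = {}
    current = None  # ref named by the reset line we just saw, if any
    for line in stream_text.splitlines():
        if line.startswith("reset "):
            current = line[len("reset "):]
            refs[current] = None
        elif line.startswith("from ") and current is not None:
            refs[current] = line[len("from "):]
            current = None
        else:
            current = None
    return refs


if __name__ == "__main__":
    stream = "reset refs/heads/new-branch\nfrom :5\n\n"
    print(parsed_refs(stream))
```

With an empty stream the result is an empty dictionary, so a helper written this way would have no way to learn that new-branch should be created.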

-Daniel
*This .sig left intentionally blank*


Re: git-clone ignores umask for working tree

2012-07-06 Thread Daniel Barkalow
On Fri, 6 Jul 2012, Alex Riesen wrote:

 Hi list,
 
 when git-clone was built in, its treatment of umask has changed: the shell
 version respected umask for newly created directories by using plain mkdir(1),
 and the builtin version just uses mkdir(work_tree, 0755).

 Is it intentional?

I have the vague feeling that it was intentional, but it's entirely 
plausible that I just overlooked that mkdir(2) applies umask and went for 
the mode that you normally want. I don't think there's any particular need 
for this operation to be more restrictive than umask.
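
A small sketch of the interaction between the mode argument and umask, assuming a POSIX system. The helper name mkdir_mode_under_umask is made up for this example; it only shows that the mode passed to mkdir is an upper bound that umask then narrows.

```python
import os
import stat
import tempfile


def mkdir_mode_under_umask(mask, mode=0o755):
    """Create a directory with `mode` while `mask` is the process umask.

    Returns the permission bits the directory actually ended up with.
    Hypothetical helper for illustration only.
    """
    old = os.umask(mask)
    try:
        with tempfile.TemporaryDirectory() as parent:
            path = os.path.join(parent, "work_tree")
            os.mkdir(path, mode)  # the kernel applies mode & ~umask
            return stat.S_IMODE(os.stat(path).st_mode)
    finally:
        os.umask(old)  # always restore the caller's umask


if __name__ == "__main__":
    # A group-writable umask: 0777 keeps group write, a fixed 0755 does not.
    print(oct(mkdir_mode_under_umask(0o002, 0o777)))
    print(oct(mkdir_mode_under_umask(0o002, 0o755)))
```

So a hard-coded 0755 still honors a restrictive umask such as 077, but it can never be more permissive than 0755, which is where it differs from a plain mkdir(1).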

-Daniel
*This .sig left intentionally blank*


Re: [RFH] Merge driver

2005-09-09 Thread Daniel Barkalow
On Fri, 9 Sep 2005, Junio C Hamano wrote:

 I have several requests to people who are interested in merges
 and read-tree changes.
 
 I am pretty much set to use the recent read-tree updates Daniel
 has been working on.  The only reason it has not hit the
 master branch yet, except that it still has known leaks that
 have not been plugged, is because read-tree is so fundamental to
 everything we do, and I am trying to be extremely conservative
 here.  I've beaten it myself reasonably well and have not found
 any regressions (except removal of --emu23 which I believe
 nobody uses anyway), but I'd appreciate people to try it out and
 see if it performs well for your dataset.
 
 If you are planning further surgery on read-tree code, please
 base your changes on Daniel's rewrite to avoid your effort being
 wasted.  This request goes both to Chuck (active_cache
 abstraction) and Fredrik (addition of 'ignore index and working
 tree matching rules' [*1*]).
 
 A proposed merge driver 'git-merge' is in the proposed updates
 branch.  This is intended to be the top-level user interface to
 the merge machinery which drives multiple merge strategy
 scripts, and I am hoping that I can eventually (1) retire
 'git-resolve' and 'git-octopus' (they simply become merge
 strategy scripts driven by 'git-merge') and (2) call 'git-merge'
 from 'git-pull'.  What I have in the proposed updates branch has
 been fixed since my earlier message to the list and has a new
 merge strategy script, in addition to 'resolve' and 'octopus',
 called 'git-merge-multibase'.  This uses Daniel's read-tree that
 can use more than one merge base.  I request Daniel to give OK
 to its name or suggest a better name for this script -- I would
 even accept 'git-merge-barkalow' if you want ;-).

I'd actually been thinking it would just go into the resolve driver, 
with that going back to before it chose among merge-base outputs and just 
sending the whole list to read-tree.

 If you are planning to implement a new merge strategy, please
 use the ones in the proposed updates branch as examples, and
 complain and suggest improvements if you find the interface
 between the strategy scripts and the driver lacking.  This
 request goes primarily to Fredrik.  I'm interested in doing the
 renaming merge that would have helped HPA's klibc-kbuild vs
 klibc case myself but if somebody else is so inclined please go
 wild.
 
 And finally, a request to everybody; please try out 'git-merge'
 and see how you like it.
 
 `git-merge` [-n] [-s strategy]... msg head remote remote...
 
 -n::
   Do not show diffstat at the end of the merge.
 
 -s strategy::
   use that merge strategy; can be given more than once to
   specify them in the order they should be tried.  If
   there is no `-s` option, built-in list of strategies is
   used instead.
 
 head::
   our branch head commit.
 
 remote::
   other branch head merged into our branch.  You need at
   least one remote.  Specifying more than one remote
   obviously means you are trying an Octopus.
 
 Here is a sample transcript from a test resolving one of the
 'more-than-one-merge-base' commits Fredrik found in the kernel
 repository (: siamese; is my $PS1;is my $PS2).
 
 : siamese; git reset --hard b8112df71cae7d6a86158caeb19d215f56c4f9ab
 : siamese; git merge -n \
   'reproduce 0e396ee43e445cb7c215a98da4e76d0ce354d9d7' \
   HEAD 2089a0d38bc9c2cdd084207ebf7082b18cf4bf58
 Trying merge strategy resolve...
 Trying to find the optimum merge base.
 Trying simple merge.
 Simple merge failed, trying Automatic merge.
 Removing drivers/net/fmv18x.c
 Auto-merging drivers/net/r8169.c.
 merge: warning: conflicts during merge
 ERROR: Merge conflict in drivers/net/r8169.c.
 Removing drivers/net/sk_g16.c
 Removing drivers/net/sk_g16.h
 fatal: merge program failed
 Rewinding the tree to pristine...
 Trying merge strategy multibase...
 Trying simple merge.
 Simple merge failed, trying Automatic merge.
 Removing drivers/net/fmv18x.c
 Auto-merging drivers/net/r8169.c.
 merge: warning: conflicts during merge
 ERROR: Merge conflict in drivers/net/r8169.c.
 Removing drivers/net/sk_g16.c
 Removing drivers/net/sk_g16.h
 fatal: merge program failed
 Rewinding the tree to pristine...
 Trying merge strategy octopus...
 Rewinding the tree to pristine...
 Using the multibase to prepare resolving by hand.
 Trying simple merge.
 Simple merge failed, trying Automatic merge.
 Removing drivers/net/fmv18x.c
 Auto-merging drivers/net/r8169.c.
 merge: warning: conflicts during merge
 ERROR: Merge conflict in drivers/net/r8169.c.
 Removing drivers/net/sk_g16.c
 Removing drivers/net/sk_g16.h
 fatal: merge program failed
 Automatic merge failed; fix up by hand
 : siamese; git-update-cache --refresh
 drivers/net/r8169.c: needs update
 : siamese; echo 

Re: [RFH] Merge driver

2005-09-09 Thread Daniel Barkalow
On Fri, 9 Sep 2005, Junio C Hamano wrote:

 Daniel Barkalow [EMAIL PROTECTED] writes:
 
  It tries to make sure that there is room to put stuff for resolving a 
  conflict without messing with modified files in the directory.
 
 I agree it can be used that way, but nobody seems to use it for
 that purpose as far as I can tell hence my earlier comment.  But
 let's leave the door open by having them as independent
 options.

Ah, okay. I hadn't realized that resolve used -u for that call to 
read-tree. You're entirely right.

-Daniel
*This .sig left intentionally blank*


Re: Multi-ancestor read-tree notes

2005-09-09 Thread Daniel Barkalow
On Fri, 9 Sep 2005, Junio C Hamano wrote:

 Daniel Barkalow [EMAIL PROTECTED] writes:
 
  In case #16, I'm not sure what I should produce. I think the best thing 
  might be to not leave anything in stage 1. The desired end effect is that 
  the user is given a file with a section like:
 
{
  *t = NULL;
  *m = 0;
<<<<<<<
  return Z_DATA_ERROR;
=======
  return Z_OK;
>>>>>>>
}
 
 I was thinking a bit more about this.  Let's rephrase case #16.
 I'll call merge bases O1, O2,... and merge heads A and B, and we
 are interested in one path.
 
 In O1 and O2, the path has quite different contents.  A has the
 same contents as O1 and B has the same contents as O2. 

There's a bit more subtlety here: since these are common ancestors, A must 
have somehow changed O2's version to O1's version, and B must have changed 
O1's version to O2's version. It isn't just that each side left the file 
the same, but from different ancestral versions; both of the other 
versions must have gotten rejected somehow. I think the real key is to 
identify what was going on in between.

 We should not just pick one or the other and do two-file merge
 between the version in A and B (we could prototype by massaging
 'diff A B' output to produce what is common between A and B and
 run (RCS) merge of A and B pretending that the common contents
 is the original to produce something like the above).
 
 If A has slight changes since O1 but B did not change since O2,
 ideally I think we would want the same thing to happen.  Let's
 call it case #16+.
 
 What does the current implementation do?  It is not case #16
 because A and O1 do not exactly match.  I suspect the result
 will be skewed because B has an exact match with O2. 

Yes, in this case we miss whatever caused A to reject O2, and we use the 
modified O2, because we don't realize that A's rejection of O2 should also 
apply to the version in B. Unfortunately, this looks just like the 
situation where both sides took O1, and B did a further modification to 
that.
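
The exact-match behavior described above can be sketched for a single path as follows. This is a simplified illustration, not the read-tree code: the function name trivial_merge, the use of plain strings as blob ids, and the assumption that the path exists everywhere are all mine.

```python
def trivial_merge(ancestors, head, remote):
    """Classify one path using exact-match rules only.

    `ancestors` lists the blob ids from the merge bases; `head` and
    `remote` are the blob ids on the two sides being merged.
    Returns a (result, blob) pair. Hypothetical, simplified sketch.
    """
    if head == remote:
        return ("take", head)          # both sides agree
    head_unchanged = head in ancestors
    remote_unchanged = remote in ancestors
    if head_unchanged and remote_unchanged:
        # Case #16: each side matches a different base exactly.
        return ("no merge", None)
    if remote_unchanged:
        return ("take", head)          # only head differs from every base
    if head_unchanged:
        return ("take", remote)        # only remote differs from every base
    return ("no merge", None)          # both sides changed independently


if __name__ == "__main__":
    bases = ["o1", "o2"]
    print(trivial_merge(bases, "o1", "o2"))           # case #16
    print(trivial_merge(bases, "o1", "o2-modified"))  # the #16+ shape
```

In the #16+ shape, the side that still matches a base exactly is treated as unchanged, so the slightly modified side wins silently. From these inputs alone the rule cannot tell that apart from one side simply editing a version both sides had agreed on.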

 The situation becomes more interesting if both A and B have slight
 changes since O1 and O2 respectively.  They do not exactly match
 with their bases, but I think ideally we would like something
 very similar to case #16 resolution to happen.

I think the right thing, ideally, is to have the content merge also take 
multiple ancestors and have a #16 case itself when it's deciding which 
version of a block to use. The #16+ case is actually trickier, because we 
have fewer cues.

 One way to solve this would be to try doing things entirely in
 read-tree by doing not just exact matches but also checking the
 amount of changes -- if each head has a similar but different
 base, call it case #16 and try two-file merge between the heads
 disregarding the bases.
 
 But I am a bit reluctant to suggest this.  My gut feeling tells
 me that these 'interesting' cases are easier if scripted outside
 read-tree machinery to later enhance and improve the heuristics.
 
 Of course, the current case #16 detected by the exact match rule
 should be something we can automatically handle, but to make
 things safer to use I think we should have a way to detect case
 #16+ situation and avoid mistakenly favoring A over B (or vice
 versa) only because one has slight modification while the other
 does not.

I think #16+ is extra uncommon, because it involves someone making an 
irrelevant modification to a patched version of a file while someone else 
reverts the patch. I'm actually interested in doing a big spiffy program 
to do merges with information drawn as needed from the history, stuff 
happening on a per-hunk level, and support for block moves. It'll take a 
while before it gets anywhere, but I still think it's likely that people 
won't hit #16+ and get unexpected behavior before it's ready.

The main thing I'm unsure of is whether Fredrik's algorithm is actually 
not a better solution: it is possible to understand what happened leading 
up to a merge either by looking at the time after the common ancestors or 
by looking at the time before them. I think that the more recent history 
is a better guide, but the older history is easier to use; the case his 
version isn't good for, I think, is when the common ancestors of the sides 
are even more complicated to merge.

-Daniel
*This .sig left intentionally blank*


Re: [PATCH 0/2] A new merge algorithm, take 3

2005-09-08 Thread Daniel Barkalow
On Thu, 8 Sep 2005, Fredrik Kuivinen wrote:

 On Wed, Sep 07, 2005 at 02:33:42PM -0400, Daniel Barkalow wrote:
  On Wed, 7 Sep 2005, Fredrik Kuivinen wrote:
  
   Of the 500 merge commits that currently exist in the kernel
   repository 19 produce non-clean merges with git-merge-script. The
   four merge cases listed in
   [EMAIL PROTECTED] are cleanly merged by
   git-merge-script. Every merge commit which is cleanly merged by
   git-resolve-script is also cleanly merged by git-merge-script,
   furthermore the results are identical. There are currently two merges
   in the kernel repository which are not cleanly merged by
   git-resolve-script but are cleanly merged by git-merge-script.
  
  If you use my read-tree and change git-resolve-script to pass all of the 
  ancestors to it, how does it do? I expect you'll still be slightly ahead, 
  because we don't (yet) have content merge with multiple ancestors. You 
  should also check the merge that Tony Luck reported, which undid a revert, 
  as well as the one that Len Brown reported around the same time which had 
  similar problems. I think maintainer trees are a much better test for a 
  merge algorithm, because the kernel repository is relatively linear, while 
  maintainers tend more to merge things back and forth.
 
 Junio tested some of the multiple common ancestor cases with your
 version of read-tree and reported his results in
 [EMAIL PROTECTED].

Oh, right. I'm clearly not paying enough attention here.

 The two cases my algorithm merges cleanly and git-resolve-script does
 not merge cleanly are 0e396ee43e445cb7c215a98da4e76d0ce354d9d7 and
 0c168775709faa74c1b87f1e61046e0c51ade7f3. Both of them have two common
 ancestors. The second one has, as far as I know, not been tested with
 your read-tree.

Okay, I'll have to check whether the result I get seems right. I take it 
your result agrees with what the users actually produced by hand?

 The merge cases reported by Tony Luck and Len Brown are both cleanly
 merged by my code.

Do they come out correctly? Both of those have cases which cannot be 
decided correctly with only the ancestor trees, due to one branch 
reverting a patch that was only in one ancestor. The correct result is to 
revert that patch, but figuring that out requires looking at more trees. I 
think your algorithm should work for this case, but it would be good to 
have verification. (IIRC, Len got the correct result while Tony got the 
wrong result and then corrected it later.)

 You are probably right about the maintainer trees. I should have a
 look at some of them. Do you know any specific repositories with
 interesting merge cases?

Not especially, except that I would guess that people who have reported 
hitting bad cases would be more likely to have other interesting merges in 
their trees. You might also try merging maintainer trees with each other, 
since it's relatively likely that there would be complicating overlap that 
only doesn't cause confusion because things get rearranged in -mm. For 
that matter, I bet you'd get plenty of test cases out of trying to 
replicate -mm as a git tree.

-Daniel
*This .sig left intentionally blank*


Re: [PATCH 0/2] A new merge algorithm, take 3

2005-09-08 Thread Daniel Barkalow
On Thu, 8 Sep 2005, Fredrik Kuivinen wrote:

 The first one agrees with what was actually committed. For the second
 one the difference between the tree produced by the algorithm and what
 was committed is:
 
 diff --git a/include/net/ieee80211.h b/include/net/ieee80211.h
 --- a/include/net/ieee80211.h
 +++ b/include/net/ieee80211.h
 @@ -425,9 +425,7 @@ struct ieee80211_stats {
  
  struct ieee80211_device;
  
 -#if 0 /* for later */
  #include ieee80211_crypt.h
 -#endif
  
  #define SEC_KEY_1 (10)
  #define SEC_KEY_2 (11)
 
 
 I have looked at the files and common ancestors involved and I think
 that this change has been introduced manually. I may have missed
 something when I analysed it though...

Certainly possible that it was done manually.

   The merge cases reported by Tony Luck and Len Brown are both cleanly
   merged by my code.
  
  Do they come out correctly? Both of those have cases which cannot be 
  decided correctly with only the ancestor trees, due to one branch 
  reverting a patch that was only in one ancestor. The correct result is to 
  revert that patch, but figuring that out requires looking at more trees. I 
  think your algorithm should work for this case, but it would be good to 
  have verification. (IIRC, Len got the correct result while Tony got the 
  wrong result and then corrected it later.)
 
 Len's merge case comes out identically to the tree he committed. I have
 described what I got for Tony's case in
 [EMAIL PROTECTED] (my merge algorithm
 produces the result Tony expected to get, but he didn't get that from
 git-resolve-script).

Good. It looks to me like this is a good algorithm in practice, then.

-Daniel
*This .sig left intentionally blank*


Re: Multi-ancestor read-tree notes

2005-09-08 Thread Daniel Barkalow
On Thu, 8 Sep 2005, Darrin Thompson wrote:

 On Mon, 2005-09-05 at 01:41 -0400, Daniel Barkalow wrote:
  I've got a version of read-tree which accepts multiple ancestors and does 
  a merge using information from all of them.
 
 Do the multiple ancestors have to share a common parent? More to the
 point, is this read-tree any more friendly to baseless merges?

read-tree doesn't care about the relationships between its inputs; it's 
only interested in the trees. But using ancestors which aren't common is 
unlikely to give you desired results. I think, if you do read-tree a^ b^ a 
b, you will get everything into the index, but it's all going to be 
conflicts.

I assume that what you want is something to include everything from two 
commits, which would give conflicts if a name is reused?

-Daniel
*This .sig left intentionally blank*
-
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Multi-ancestor read-tree notes

2005-09-08 Thread Daniel Barkalow
On Thu, 8 Sep 2005, Junio C Hamano wrote:

 Daniel Barkalow [EMAIL PROTECTED] writes:
 
  I assume that what you want is something to include everything from two 
  commits, which would give conflicts if a name is reused?
 
 My understanding is that Darrin wants to do what Linus did when
 he merged gitk into git.git.
 
 Personally I think that is a specialized application and
 something like the git-merge-projects script I posted as a
 follow-up would be more appropriate than adding it to the
 current merge discussion.

Well, it's an easy addition to read-tree; just need a merge function which 
takes two entries and adds the non-NULL one in stage 0, or adds both if 
they both exist. git-merge-script probably shouldn't be the entry point to 
it, of course, but that part isn't my area anyway.
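
The per-path merge function described above might look roughly like this. It is a sketch only: the name merge_projects_entry, the tuple layout, and the choice of stages 2 and 3 for the "both exist" case are assumptions, since the message does not say which stages would be used.

```python
def merge_projects_entry(path, ours, theirs):
    """Combine one path from two unrelated trees.

    `ours` and `theirs` are blob ids, or None when the tree lacks the path.
    Returns index entries as (path, stage, blob) tuples.
    Hypothetical sketch; the stage numbers for a clash are an assumption.
    """
    if ours is not None and theirs is not None:
        # The name is used by both projects: leave both as unmerged entries.
        return [(path, 2, ours), (path, 3, theirs)]
    blob = ours if ours is not None else theirs
    if blob is None:
        return []                      # path exists on neither side
    return [(path, 0, blob)]           # only one side has it: resolved


if __name__ == "__main__":
    print(merge_projects_entry("gitk", None, "blob-gitk"))
    print(merge_projects_entry("Makefile", "blob-a", "blob-b"))
```

Only reused names end up unmerged, which matches the "conflicts if a name is reused" behavior asked about earlier in the thread.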

-Daniel
*This .sig left intentionally blank*


Re: [PATCH 0/2] A new merge algorithm, take 3

2005-09-07 Thread Daniel Barkalow
On Wed, 7 Sep 2005, Fredrik Kuivinen wrote:

 Of the 500 merge commits that currently exist in the kernel
 repository 19 produce non-clean merges with git-merge-script. The
 four merge cases listed in
 [EMAIL PROTECTED] are cleanly merged by
 git-merge-script. Every merge commit which is cleanly merged by
 git-resolve-script is also cleanly merged by git-merge-script,
 furthermore the results are identical. There are currently two merges
 in the kernel repository which are not cleanly merged by
 git-resolve-script but are cleanly merged by git-merge-script.

If you use my read-tree and change git-resolve-script to pass all of the 
ancestors to it, how does it do? I expect you'll still be slightly ahead, 
because we don't (yet) have content merge with multiple ancestors. You 
should also check the merge that Tony Luck reported, which undid a revert, 
as well as the one that Len Brown reported around the same time which had 
similar problems. I think maintainer trees are a much better test for a 
merge algorithm, because the kernel repository is relatively linear, while 
maintainers tend more to merge things back and forth.

-Daniel
*This .sig left intentionally blank*


Re: Multi-ancestor read-tree notes

2005-09-06 Thread Daniel Barkalow
On Mon, 5 Sep 2005, Junio C Hamano wrote:

 Daniel Barkalow [EMAIL PROTECTED] writes:
 
  I've got a version of read-tree which accepts multiple ancestors and does 
  a merge using information from all of them.
 
 After disabling the debugging printf(), I used this read-tree to
 try resolving the parents of four commits Fredrik Kuivinen gave
 us in [EMAIL PROTECTED] using
 their two merge bases, and compared the resulting tree with the
 tree recorded in the commit.  The results are really promising.
 
 For the following two commits, multi-base merge resolved their
 parents trivially and produced the same result as the tree in
 the commit.  The current best-base merge in the master branch
 performed far worse and left many conflicts.
 
  - 467ca22d3371f132ee225a5591a1ed0cd518cb3d 
  - da28c12089dfcfb8695b6b555cdb8e03dda2b690
 
 Another one, 0e396ee43e445cb7c215a98da4e76d0ce354d9d7,
 multi-base merge left only one conflicting path to be hand
 resolved.  The best-base merge again performed far worse.
 
 The other one, 3190186362466658f01b2e354e639378ce07e1a9, is
 resolved trivially with both algorithms.

Do you know if there's anything like case #16 in there? I'd be interested 
to know if there's anything that gets handled automatically in different 
ways depending on which single base is used, and doesn't require manual 
intervention with multiple bases, because that's probably wrong.

  In case #16, I'm not sure what I should produce. I think the best thing 
  might be to not leave anything in stage 1.
 
 Because?  I know it would affect the readers of index files if
 you did so, but it would seem the most natural in git
 architecture to have merge-cache look at the resulting cache
 with such multiple stage 1 entries (and other stages) and let
 the script make a decision.

I didn't want to break the assumption of only one entry per stage in the 
initial version. I'm also not sure that listing the ancestors is 
particularly useful in this case. They have to be exactly the contents of 
stages 2 and 3, plus possibly more stuff that's not been kept by either 
side. What you actually want is a two-way merge (i.e., a diff between the 
two sides, presented in merge format), so you don't really need any 
ancestors, unless it would fit some more general case that way.
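
A two-way merge in this sense, meaning a diff between the two sides rendered in merge format, can be sketched with Python's difflib. This is only an illustration of the idea, not anything from the patches; the function name two_way_merge and the bare, unlabeled conflict markers are assumptions.

```python
import difflib


def two_way_merge(ours, theirs):
    """Render two lists of lines as one file, with conflict markers
    around every region where the sides differ (no ancestor involved).

    Hypothetical sketch using difflib, not git's merge code.
    """
    out = []
    matcher = difflib.SequenceMatcher(a=ours, b=theirs, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.extend(ours[i1:i2])      # common text is kept once
        else:
            out.append("<<<<<<<")
            out.extend(ours[i1:i2])      # what stage 2 has here
            out.append("=======")
            out.extend(theirs[j1:j2])    # what stage 3 has here
            out.append(">>>>>>>")
    return out


if __name__ == "__main__":
    ours = ["{", "  *t = NULL;", "  *m = 0;", "  return Z_DATA_ERROR;", "}"]
    theirs = ["{", "  *t = NULL;", "  *m = 0;", "  return Z_OK;", "}"]
    print("\n".join(two_way_merge(ours, theirs)))
```

Because every difference becomes a conflict, nothing is resolved automatically; the user just sees both candidates side by side, which is the point when neither ancestor can be trusted.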

  The desired end effect is that the user is given a file with a
  section like:
 
{
  *t = NULL;
  *m = 0;
<<<<<<<
  return Z_DATA_ERROR;
=======
  return Z_OK;
>>>>>>>
}
 
 Sounds fine.
 
 Anyway, I really am happy to see this multi-base merge perform
 well on real-world data, and you are certainly the git hero of
 the week ;-).

Great. Want me to send the patches with better organization, or are you 
set with what I've sent?

-Daniel
*This .sig left intentionally blank*


Re: [PATCH] Make sure the diff machinery outputs \ No newline ... in english

2005-09-06 Thread Daniel Barkalow
On Mon, 5 Sep 2005, Linus Torvalds wrote:

 On Mon, 5 Sep 2005, Fredrik Kuivinen wrote:
  
  After a quick look through the diff source I didn't find anything
  else. It's quite possible that I have missed something though. Most
  of the translated messages are related to error reporting, which I
  guess might be nice to have in the user specified language.
 
 Is it possible that we could integrate the diff algorithm into git, and 
 get rid of the dependency on an external GNU diff? It would also make the 
 portability problems go away (ie old diff's being broken).
 
 It would also potentially speed up the normal built-in diff a lot, since
 we wouldn't have to execute a whole other program to generate a diff, just
 call a helper function the way we do for xdiff..
 
 Unreasonable?

The algorithm actually used by GNU diff is pretty complicated, and I don't 
really understand the actual implementation, which evidently has a few 
important refinements over the original paper.

I've written my own diff, mainly to try a different algorithm, and it 
seems to work, but the code isn't yet appropriate to submit. This 
algorithm also has the advantage that it can identify moved sections and 
is less interested in interleaving a removed function with a new function 
to provide the shortest possible diff. I expect that I could get it to 
work if I put in a day on it; it's mostly writing a hashtable 
implementation for non-NULL-terminated string-keyed hash tables.

-Daniel
*This .sig left intentionally blank*


Re: Multi-ancestor read-tree notes

2005-09-06 Thread Daniel Barkalow
On Tue, 6 Sep 2005, Junio C Hamano wrote:

 Daniel Barkalow [EMAIL PROTECTED] writes:
 
  Do you know if there's anything like case #16 in there? I'd be interested 
  to know if there's anything that gets handled automatically in different 
  ways depending on which single base is used, and doesn't require manual 
  intervention with multiple bases, because that's probably wrong.
 
 Re-running the tests with the attached patch shows there weren't any.

Good. (Although that patch doesn't seem to be directly on top of my 
version; I can tell what it's doing anyway)

  Great. Want me to send the patches with better organization, or are you 
  set with what I've sent?
 
 That's up to you.  If you are content with what I have in the pu
 branch, there is no need to bother resending.  OTOH if you have
 further clean-ups in mind, i.e. better organization above, I
 do not mind dropping the current ones from pu and replacing them
 with another set from you.

I'm happy with the content in pu; the issue is just whether you want the 
history cleaned up more. In the series I sent, I kept forgetting parts 
that belonged in earlier patches.

Could you look over the documentation in 
Documentation/technical/trivial-merge.txt, and see if it's a suitable 
replacement for the table in t1000-read-tree-m-3way.sh? It should be the 
same, except for ALT or non-ALT versions that we're not using, combining a 
few matching cases, describing the rules behind index requirements rather 
than listing outcomes, and the addition of info on how multiple ancestors 
are handled.

-Daniel
*This .sig left intentionally blank*


Re: Multi-ancestor read-tree notes

2005-09-06 Thread Daniel Barkalow
On Tue, 6 Sep 2005, Junio C Hamano wrote:

 Daniel Barkalow [EMAIL PROTECTED] writes:
 
  Good. (Although that patch doesn't seem to be directly on top of my 
  version; I can tell what it's doing anyway)
 
 That one was against the proposed updates head.  I've updated it
 again to include the patch.
 
  I'm happy with the content in pu; the issue is just whether you want the 
  history cleaned up more. In the series I sent, I kept forgetting parts 
  that belonged in earlier patches.
 
 Again, that is up to you.  I am not _that_ perfectionist but I
 do not mind reapplying updated ones if you are ;-).

What's there is fine with me.

(I'll work on improving the documentation as a further patch)

  Could you look over the documentation in
  Documentation/technical/trivial-merge.txt, and see if it's a
  suitable replacement for the table in
  t1000-read-tree-m-3way.sh?
 
 I do not understand what you meant by '*' and 'index+' in the
 one-way merge table.  I take the first row ('*') to mean "If the
 tree is missing a path, that path is removed from the index."

'*' means that that case applies regardless of what's there. 'index+' 
means that it's the index, with the stat information. I forgot to actually 
explain the table before going on to the interesting section.
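
As a way of reading the one-way table row by row, here is a minimal sketch in C. It is not code from read-tree.c: `struct entry`, its fields, and `oneway_merge` are hypothetical names, and the object id is a plain string rather than a SHA-1.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an index or tree entry (assumption). */
struct entry {
	const char *id;  /* object id, as a string for illustration */
	int has_stat;    /* 1 if cached stat information is attached ('+') */
};

/*
 * Apply the one-way rules to a single path. A NULL pointer means the
 * path is absent on that side. Returns 0 if the result is (empty),
 * otherwise 1 with the resulting entry stored in *out.
 */
static int oneway_merge(const struct entry *index, const struct entry *tree,
			struct entry *out)
{
	assert(out != NULL);
	if (!tree)
		return 0;            /* '*' / (empty): path is dropped */
	if (index && !strcmp(index->id, tree->id)) {
		*out = *index;       /* index matches tree: keep stat info */
		return 1;
	}
	out->id = tree->id;          /* (empty) or differing index: take tree */
	out->has_stat = 0;
	return 1;
}
```

The only row that preserves stat information is the last one, which is what lets -u leave unchanged files alone.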

 I like the second sentence in three-way merge description.  That
 is a very easy-to-understand description of what the index
 requirements are.
 
 You have 2 2ALTs.  Also 14 and 14ALT look like they are the same
 rule now.

Ah, right. I had originally listed index in the table, with separate 
cases for having it match the head and having it match the result, but 
then ditched that when I figured out how that actually works.

 What's (empty)^ in ancest?  All of them must be empty for
 this rule to apply?

The '^' means that all must be like that. 

I have to check, but I think that 8ALT and 10ALT should be '+'.

 I am not quite sure it is 'a suitable replacement' yet; with the
 existing table you can see that it covers all the cases, but with
 things like 'ancestor+' meaning that one of them matches, I cannot
 really tell whether the table covers all the cases or some cases
 fall off the end of the chain.

All of the "any ancestor" spots are good for covering things. Case #11 
(which actually needs to be at the bottom) is basically everything else.

 Also when we have more than one ancestor or remote and we
 say "no merge", it is still unspecified (and I have to admit I
 cannot readily say what the result should be for all of them,
 except that I agree #16 will be fine with an empty stage1) what
 is left in which stages.

Presently, except for case #16, only the first ancestor is used in "no 
merge" output. The right thing should be worked out and documented, of 
course.

I'm not at all convinced at this point that we can do much with multiple 
remotes in a single application of the rules; you won't necessarily have 
the same merge base for all pairs, and all sorts of things go wrong if you 
start including ancestors that aren't related to something, or not 
including common ancestors of some pair.

What might work is to have the error for an unmerged index only happen 
when you get to a "no merge" result, so that you can get as many conflicts 
as possible (in different files) resolved by the user at the same time.

 I personally think the exotic cases (i.e. "no rule applies", or
 "no merge" result with more than one ancestor/remote) need to
 be handled outside read-tree anyway, by the script that drives
 read-tree to attempt trivial merges.

I think case #16 would benefit from doing more stuff, but there aren't any 
holes in the rules, and I think that, for the multiple ancestors in "no 
merge", we just want to use the one with the least conflict. (Or, if we 
write our own merge, do a #16/#13,#14/#11 decision per-hunk in our merge, 
which is the really right thing). I think the common case for multiple 
ancestors will really be that you've got a side branch that split before 
the split you're resolving, and was merged into both sides before now; in 
this case, there's no big problem, and it's not the exotic cross-merge 
case. Of course, we won't see this in projects like the kernel and git, 
which aren't that amorphous.
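
The #16/#13,#14/#11 decision for a path that exists on every side can be sketched as below. This is only an illustration under assumptions: `trivial_merge`, `matches_any`, and `enum outcome` are hypothetical names, ids are plain strings, and the empty and directory-file cases are left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum outcome { TAKE_HEAD, TAKE_REMOTE, NO_MERGE };

/* Returns 1 if id equals any of the n ancestor ids. */
static int matches_any(const char *const *anc, size_t n, const char *id)
{
	size_t i;
	for (i = 0; i < n; i++)
		if (!strcmp(anc[i], id))
			return 1;
	return 0;
}

/* Decide one path where head and remote both have an entry. */
static enum outcome trivial_merge(const char *const *anc, size_t n,
				  const char *head, const char *remote)
{
	int head_old, remote_old;

	assert(head != NULL && remote != NULL);
	if (!strcmp(head, remote))
		return TAKE_HEAD;    /* #5ALT: both sides agree */
	head_old = matches_any(anc, n, head);
	remote_old = matches_any(anc, n, remote);
	if (head_old && remote_old)
		return NO_MERGE;     /* #16: each side matches a different ancestor */
	if (remote_old)
		return TAKE_HEAD;    /* #13: only head changed */
	if (head_old)
		return TAKE_REMOTE;  /* #14: only remote changed */
	return NO_MERGE;             /* #11: everything else */
}
```

With a single ancestor #16 can never trigger, so the function reduces to the usual three-way rules; it is only with two or more ancestors that a revert on one side becomes visible as a refusal.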

-Daniel
*This .sig left intentionally blank*


[PATCH 1/4] Add a function for getting a struct tree for an ent.

2005-09-05 Thread Daniel Barkalow
Signed-off-by: Daniel Barkalow [EMAIL PROTECTED]
---

 tree.c |   21 +
 tree.h |3 +++
 2 files changed, 24 insertions(+), 0 deletions(-)

3bfcc20b6aeff3e1fbcce97a426383c9770a2105
diff --git a/tree.c b/tree.c
--- a/tree.c
+++ b/tree.c
@@ -1,5 +1,7 @@
 #include "tree.h"
 #include "blob.h"
+#include "commit.h"
+#include "tag.h"
 #include "cache.h"
 #include <stdlib.h>
 
@@ -212,3 +214,22 @@ int parse_tree(struct tree *item)
 	free(buffer);
 	return ret;
 }
+
+struct tree *parse_tree_indirect(const unsigned char *sha1)
+{
+	struct object *obj = parse_object(sha1);
+	do {
+		if (!obj)
+			return NULL;
+		if (obj->type == tree_type)
+			return (struct tree *) obj;
+		else if (obj->type == commit_type)
+			obj = &(((struct commit *) obj)->tree->object);
+		else if (obj->type == tag_type)
+			obj = ((struct tag *) obj)->tagged;
+		else
+			return NULL;
+		if (!obj->parsed)
+			parse_object(obj->sha1);
+	} while (1);
+}
diff --git a/tree.h b/tree.h
--- a/tree.h
+++ b/tree.h
@@ -32,4 +32,7 @@ int parse_tree_buffer(struct tree *item,
 
 int parse_tree(struct tree *tree);
 
+/* Parses and returns the tree in the given ent, chasing tags and commits. */
+struct tree *parse_tree_indirect(const unsigned char *sha1);
+
 #endif /* TREE_H */
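
The "chasing tags and commits" loop can be tried in isolation with a small model. This is a sketch under assumptions, not git's object code: `struct obj`, `enum kind`, and `peel_to_tree` are hypothetical, and parsing is left out so that only the dereferencing logic remains.

```c
#include <assert.h>
#include <stddef.h>

enum kind { KIND_BLOB, KIND_TREE, KIND_COMMIT, KIND_TAG };

/* Hypothetical, already-parsed object (assumption for the sketch). */
struct obj {
	enum kind kind;
	struct obj *target;  /* commit: its tree; tag: tagged object; else NULL */
};

/* Follow tags and commits until a tree is reached; NULL if none is. */
static struct obj *peel_to_tree(struct obj *o)
{
	while (o) {
		switch (o->kind) {
		case KIND_TREE:
			return o;
		case KIND_COMMIT:
			assert(o->target != NULL);  /* a commit always names a tree */
			o = o->target;
			break;
		case KIND_TAG:
			o = o->target;              /* may be a tag of a tag */
			break;
		default:
			return NULL;                /* blobs have no tree */
		}
	}
	return NULL;
}
```

A tag pointing at a tag pointing at a commit resolves to the commit's tree, while a tag of a blob yields NULL.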



[PATCH 2/4] Add function to append to an object_list.

2005-09-05 Thread Daniel Barkalow
Signed-off-by: Daniel Barkalow [EMAIL PROTECTED]
---

 object.c |   11 +++
 object.h |3 +++
 2 files changed, 14 insertions(+), 0 deletions(-)

88cf2db55848e7a2cf655171c7e9fd74c70a0281
diff --git a/object.c b/object.c
--- a/object.c
+++ b/object.c
@@ -184,6 +184,17 @@ struct object_list *object_list_insert(s
 	return new_list;
 }
 
+void object_list_append(struct object *item,
+			struct object_list **list_p)
+{
+	while (*list_p) {
+		list_p = &((*list_p)->next);
+	}
+	*list_p = xmalloc(sizeof(struct object_list));
+	(*list_p)->next = NULL;
+	(*list_p)->item = item;
+}
+
 unsigned object_list_length(struct object_list *list)
 {
 	unsigned ret = 0;
diff --git a/object.h b/object.h
--- a/object.h
+++ b/object.h
@@ -41,6 +41,9 @@ void mark_reachable(struct object *obj, 
 struct object_list *object_list_insert(struct object *item, 
   struct object_list **list_p);
 
+void object_list_append(struct object *item,
+   struct object_list **list_p);
+
 unsigned object_list_length(struct object_list *list);
 
 int object_list_contains(struct object_list *list, struct object *obj);
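
The append walks a pointer to the `next` slot rather than a pointer to the node, so the empty-list case needs no special handling. A self-contained sketch of the same idiom follows; `struct node`, `list_append`, and `list_free` are hypothetical names, and plain `malloc` with an error return stands in for git's `xmalloc`, which dies on failure.

```c
#include <assert.h>
#include <stdlib.h>

struct node {
	int value;
	struct node *next;
};

/* Append value at the tail of *list_p. Returns 0 on success, -1 on OOM. */
static int list_append(struct node **list_p, int value)
{
	assert(list_p != NULL);
	while (*list_p)
		list_p = &(*list_p)->next;  /* advance to the next link slot */
	*list_p = malloc(sizeof(**list_p));
	if (!*list_p)
		return -1;
	(*list_p)->value = value;
	(*list_p)->next = NULL;
	return 0;
}

/* Release every node of the list. */
static void list_free(struct node *n)
{
	while (n) {
		struct node *next = n->next;
		free(n);
		n = next;
	}
}
```

Each call walks the whole list, so building a list of n items this way is quadratic; that is fine for the handful of trees read-tree is given.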



[PATCH 3/4] Rewrite read-tree

2005-09-05 Thread Daniel Barkalow
Adds support for multiple ancestors, removes --emu23, much simplification.

Signed-off-by: Daniel Barkalow [EMAIL PROTECTED]
---

 read-tree.c   |  811 +++--
 t/t1005-read-tree-m-2way-emu23.sh |  422 ---
 2 files changed, 425 insertions(+), 808 deletions(-)
 delete mode 100755 t/t1005-read-tree-m-2way-emu23.sh

f196469bec156947038f1d3d00c899c9044334ca
diff --git a/read-tree.c b/read-tree.c
--- a/read-tree.c
+++ b/read-tree.c
@@ -5,73 +5,291 @@
  */
 #include "cache.h"
 
-static int stage = 0;
+#include "object.h"
+#include "tree.h"
+
+static int merge = 0;
 static int update = 0;
 
-static int unpack_tree(unsigned char *sha1)
-{
-	void *buffer;
-	unsigned long size;
-	int ret;
+static int head_idx = -1;
+static int merge_size = 0;
 
-	buffer = read_object_with_reference(sha1, "tree", &size, NULL);
-	if (!buffer)
-		return -1;
-	ret = read_tree(buffer, size, stage, NULL);
-	free(buffer);
+static struct object_list *trees = NULL;
+
+static struct cache_entry df_conflict_entry = { 
+};
+
+static struct tree_entry_list df_conflict_list = {
+	.name = NULL,
+	.next = &df_conflict_list
+};
+
+typedef int (*merge_fn_t)(struct cache_entry **src);
+
+static int entcmp(char *name1, int dir1, char *name2, int dir2)
+{
+	int len1 = strlen(name1);
+	int len2 = strlen(name2);
+	int len = len1 < len2 ? len1 : len2;
+	int ret = memcmp(name1, name2, len);
+	unsigned char c1, c2;
+	if (ret)
+		return ret;
+	c1 = name1[len];
+	c2 = name2[len];
+	if (!c1 && dir1)
+		c1 = '/';
+	if (!c2 && dir2)
+		c2 = '/';
+	ret = (c1 < c2) ? -1 : (c1 > c2) ? 1 : 0;
+	if (c1 && c2 && !ret)
+		ret = len1 - len2;
 	return ret;
 }
 
-static int path_matches(struct cache_entry *a, struct cache_entry *b)
+static int unpack_trees_rec(struct tree_entry_list **posns, int len,
+			    const char *base, merge_fn_t fn, int *indpos)
 {
-	int len = ce_namelen(a);
-	return ce_namelen(b) == len &&
-		!memcmp(a->name, b->name, len);
+	int baselen = strlen(base);
+	int src_size = len + 1;
+	do {
+		int i;
+		char *first;
+		int firstdir = 0;
+		int pathlen;
+		unsigned ce_size;
+		struct tree_entry_list **subposns;
+		struct cache_entry **src;
+		int any_files = 0;
+		int any_dirs = 0;
+		char *cache_name;
+		int ce_stage;
+
+		/* Find the first name in the input. */
+
+		first = NULL;
+		cache_name = NULL;
+
+		/* Check the cache */
+		if (merge && *indpos < active_nr) {
+			/* This is a bit tricky: */
+			/* If the index has a subdirectory (with
+			 * contents) as the first name, it'll get a
+			 * filename like "foo/bar". But that's after
+			 * "foo", so the entry in trees will get
+			 * handled first, at which point we'll go into
+			 * "foo", and deal with "bar" from the index,
+			 * because the base will be "foo/". The only
+			 * way we can actually have "foo/bar" first of
+			 * all the things is if the trees don't
+			 * contain "foo" at all, in which case we'll
+			 * handle "foo/bar" without going into the
+			 * directory, but that's fine (and will return
+			 * an error anyway, with the added unknown
+			 * file case.
+			 */
+
+			cache_name = active_cache[*indpos]->name;
+			if (strlen(cache_name) > baselen &&
+			    !memcmp(cache_name, base, baselen)) {
+				cache_name += baselen;
+				first = cache_name;
+			} else {
+				cache_name = NULL;
+			}
+		}
+
+		if (first)
+			printf("index %s\n", first);
+
+		for (i = 0; i < len; i++) {
+			if (!posns[i] || posns[i] == &df_conflict_list)
+				continue;
+			printf("%d %s\n", i + 1, posns[i]->name);
+			if (!first || entcmp(first, firstdir,
+					     posns[i]->name, 
+					     posns[i]->directory) > 0) {
+				first = posns[i]->name;
+				firstdir = posns[i]->directory;
+			}
+		}
+   /* No name means we're done
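
The ordering that the "This is a bit tricky" comment relies on can be checked on its own. The sketch below restates entcmp from this patch as a standalone function; the comparison operators are an assumed reading of the patch, and the const-qualified parameters are a change made for the example.

```c
#include <assert.h>
#include <string.h>

/*
 * Compare two entry names, treating a directory as if its name ended
 * in '/'. Returns <0, 0, or >0 like strcmp.
 */
static int entcmp(const char *name1, int dir1, const char *name2, int dir2)
{
	int len1, len2, len, ret;
	unsigned char c1, c2;

	assert(name1 != NULL && name2 != NULL);
	len1 = (int) strlen(name1);
	len2 = (int) strlen(name2);
	len = len1 < len2 ? len1 : len2;
	ret = memcmp(name1, name2, len);
	if (ret)
		return ret;
	/* Common prefix is equal: compare the first character after it. */
	c1 = (unsigned char) name1[len];
	c2 = (unsigned char) name2[len];
	if (!c1 && dir1)
		c1 = '/';
	if (!c2 && dir2)
		c2 = '/';
	ret = (c1 < c2) ? -1 : (c1 > c2) ? 1 : 0;
	/* "foo/bar" from the index versus directory "foo": longer sorts later. */
	if (c1 && c2 && !ret)
		ret = len1 - len2;
	return ret;
}
```

Under this comparison a file `foo.c` sorts before a directory `foo` (because '.' is less than '/'), a file `foo` sorts before a directory `foo`, and an index name `foo/bar` sorts after the directory `foo`, which is the case the comment describes.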

[PATCH 4/4] Document the trivial merge rules for 3(+more ancestors)-way merges.

2005-09-05 Thread Daniel Barkalow
Signed-off-by: Daniel Barkalow
---

 Documentation/technical/trivial-merge.txt |   92 +
 1 files changed, 92 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/technical/trivial-merge.txt

7544be0a8eda7b796150729a7795c2639278da62
diff --git a/Documentation/technical/trivial-merge.txt b/Documentation/technical/trivial-merge.txt
new file mode 100644
--- /dev/null
+++ b/Documentation/technical/trivial-merge.txt
@@ -0,0 +1,92 @@
+Trivial merge rules
+===================
+
+This document describes the outcomes of the trivial merge logic in read-tree.
+
+One-way merge
+-------------
+
+This replaces the index with a different tree, keeping the stat info
+for entries that don't change, and allowing -u to make the minimum
+required changes to the working tree to have it match.
+
+   index   tree    result
+   -----------------------
+   *       (empty) (empty)
+   (empty) tree    tree
+   index+  tree    tree
+   index+  index   index+
+
+Two-way merge
+-------------
+
+
+
+Three-way merge
+---------------
+
+It is permitted for the index to lack an entry; this does not prevent
+any case from applying.
+
+If the index exists, it is an error for it not to match either the
+head or (if the merge is trivial) the result.
+
+If multiple cases apply, the one used is listed first.
+
+A result of "no merge" means that index is left in stage 0, ancest in
+stage 1, head in stage 2, and remote in stage 3 (if any of these are
+empty, no entry is left for that stage). Otherwise, the given entry is
+left in stage 0, and there are no other entries.
+
+A result of "no merge" is an error if the index is not empty and not
+up-to-date.
+
+*empty* means that the tree must not have a directory-file conflict
+ with the entry.
+
+For multiple ancestors or remotes, a '+' means that this case applies
+even if only one ancestor or remote fits; normally, all of the
+ancestors or remotes must be the same.
+
+case  ancest    head    remote    result
+----------------------------------------
+1     (empty)+  (empty) (empty)   (empty)
+2ALT  (empty)+  *empty* remote    remote
+2ALT  (empty)+  *empty* remote    remote
+2     (empty)^  (empty) remote    no merge
+3ALT  (empty)+  head    *empty*   head
+3     (empty)^  head    (empty)   no merge
+4     (empty)^  head    remote    no merge
+5ALT  *         head    head      head
+6     ancest^   (empty) (empty)   no merge
+8ALT  ancest    (empty) ancest    (empty)
+7     ancest+   (empty) remote    no merge
+9     ancest+   head    (empty)   no merge
+10ALT ancest    ancest  (empty)   (empty)
+11    ancest+   head    remote    no merge
+16    anc1/anc2 anc1    anc2      no merge
+13    ancest+   head    ancest    head
+14    ancest+   ancest  remote    remote
+14ALT ancest+   ancest  remote    remote
+
+Only #2ALT and #3ALT use *empty*, because these are the only cases
+where there can be conflicts that didn't exist before. Note that we
+allow directory-file conflicts between things in different stages
+after the trivial merge.
+
+A possible alternative for #6 is (empty), which would make it like
+#1. This is not used, due to the likelihood that it arises due to
+moving the file to multiple different locations or moving and deleting
+it in different branches.
+
+Case #1 is included for completeness, and also in case we decide to
+put on '+' markings; any path that is never mentioned at all isn't
+handled.
+
+Note that #16 is when both #13 and #14 apply; in this case, we refuse
+the trivial merge, because we can't tell from this data which is
+right. This is a case of a reverted patch (in some direction, maybe
+multiple times), and the right answer depends on looking at crossings
+of history or common ancestors of the ancestors.
+
+The status as of Sep 5 is that multiple remotes are not supported
\ No newline at end of file



Re: Moved files and merges

2005-09-04 Thread Daniel Barkalow
On Sun, 4 Sep 2005, Junio C Hamano wrote:

 Sam Ravnborg [EMAIL PROTECTED] writes:
 
  If the problem is not fully understood it can be difficult to come up
  with the proper solution. And with the example above the problem should
  be really easy to understand.
  Then we have the tree as used by hpa with a few more merges in it. But
  the above is what was initially attempted, with the added complexity of a
  few more renames etc.
 
 All true.  Let's redraw that simplified scenario, and see if
 what I said still holds.  It may be interesting to store my
 previous message and this one and run diff between them.  I
 suspect that the main difference to come out would be in the
 problem description part, and the merge machinery part would not
 be all that different.

I'm not quite so convinced, because I think that the actual situation is a 
bit more natural, and therefore our expectations at the end should be 
closer to right with less attention to detail. But I think the actual 
situation is more interesting, anyway, because it's more likely to happen 
and we're more likely to be able to help.

 
 This is a simplified scenario of klibc vs klibc-kbuild HPA had
 trouble with, to help us think of a way to solve this
 interesting merge problem.
 
      #1 - #3 - #5 - #7
     /    /         /
  #0 - #2 - #4 - #6
 
 There are two lines of developments.  #0->#1 renames F to G and
 introduces K.  #0->#2 keeps F as F and does not introduce K.
 
 At commit #3, #2 is merged into #1.  The changes made to the
 file contents of F between #0 and #2 are appreciated, but we
 would also want to keep our decision to rename F to G and our
 new file K.  So commit #3 has the resulting merge contents in G
 and has K, inherited from #1.  This _might_ be different from
 what we traditionally consider a 'merge', but from the use case
 point of view it is a valid thing one would want to do.

I think this is actually quite a regular merge, and I think we should be 
able to offer some assistance. The situation with K is normal: case #3ALT. 
If someone introduces a file and there's no file or directory with that 
name in other trees, we assume that the merge should include it.

F/G is trickier, and I don't think we can actually do much about it with 
the current structure of read-tree/merge-cache/etc, but, theoretically, we 
should recognize that #0->#1 is a rename plus content changes, and #0->#2 
is content changes, so the total should be the rename plus contents 
changes; I think we want to additionally signal a conflict, because 
there's a reasonable chance that the rename will interfere with the #0->#2 
changes, and need intervention. Most likely, this just means that we 
should not commit automatically, but have the user test the result first.

For now, of course, we don't get renames at any point in the merging 
procedure, so our code can't tell, and sees it as a big conflict that the 
user has to deal with. But we can agree on what the result is if the user 
includes all the changes from the other branch (and see the situation 
you reported first as cherry-picking the content and leaving the 
structural changes).

 Commit #4 is a continued development from #2; changes are made
 to F, and there is no K.  Commit #5 similarly is a continued
 development from #3; its changes are made to G and K also has
 further changes.
 
 We are about to merge #6 into #5 to create #7.  We should be
 able to take advantage of what the user did when the merge #3
 was made; namely, we should be able to infer that the line of
 development that flows #0 .. #3 .. #7 prefers to rename F to G,
 and also wants the newly introduced K.  We should be able to
 tell it by looking at what the merge #3 did.

Again, K should be unexceptional, because we're keeping a file that was 
added to one side but not the other. (In the other situation, it still 
works; relative to the common ancestor, we're in #8ALT, since #5 doesn't 
have K, which was in #2 and #6; we see the rejection in a merge as a 
removal, which is effectively the same.)

 Now, how can we use git to figure that out?

First off, it should handle K automatically, because we're still including 
a file added by one side without interference from the other side.

 First, given our current head (#5) and the other head we are
 about to merge (#6), we need a way to tell if we merged from
 them before (i.e. the existence of #3) and if so the latest of
 such merge (i.e. #3).
 
 The merge base between #5 and #6 is #2.  We can look at commits
 between us (#5) and the merge base (#2), find a merge (#3),
 which has two parents.  One of the parents is #2 which is
 reachable from #6, and the other is #1 which is not reachable
 from #6 but is reachable from #5.  Can we say that this reliably
 tells us that #2 is on their side and #1 is on our side?  Does
 the fact that #3 is the commit topologically closest to #5 tell
 us that #3 is the one we want to look deeper?
 
 This is still handwaving, but 

[PATCH 0/4] Support multiple ancestors in read-tree

2005-09-04 Thread Daniel Barkalow
Various messages have already described this series. There's still a 
memory leak that should get resolved, but otherwise it should work. I'm 
not entirely sure that all directory-file conflict cases are handled 
properly, and some undefined cases behave differently. Also, I was a bit 
careless with preparing the patches.

-Daniel
*This .sig left intentionally blank*


[PATCH 3/2] Remove emu23, fix entry order

2005-09-02 Thread Daniel Barkalow
A few things to improve testing. I'll clean up the series as a whole once 
it's tested.

This removes the emu23 tests; I think that the only DF conflict tests were 
in that set, however, so these should be fished out and added to something 
else.

Signed-off-by: Daniel Barkalow [EMAIL PROTECTED]

---

 read-tree.c   |   89 +++-
 t/t1005-read-tree-m-2way-emu23.sh |  422 -
 2 files changed, 37 insertions(+), 474 deletions(-)
 delete mode 100755 t/t1005-read-tree-m-2way-emu23.sh

63092a4dfb2042e8fc21260b2f315b01e9163940
diff --git a/read-tree.c b/read-tree.c
--- a/read-tree.c
+++ b/read-tree.c
@@ -9,7 +9,6 @@
 #include "tree.h"
 
 static int merge = 0;
-static int emu23 = 0;
 static int update = 0;
 
 static struct object_list *trees = NULL;
@@ -19,19 +18,39 @@ typedef int (*merge_fn_t)(struct cache_e
  int df_conflicts_2,
  int df_conflicts_3);
 
+static int entcmp(char *name1, int dir1, char *name2, int dir2)
+{
+	int len1 = strlen(name1);
+	int len2 = strlen(name2);
+	int len = len1 < len2 ? len1 : len2;
+	int ret = memcmp(name1, name2, len);
+	unsigned char c1, c2;
+	if (ret)
+		return ret;
+	c1 = name1[len];
+	c2 = name2[len];
+	if (!c1 && dir1)
+		c1 = '/';
+	if (!c2 && dir2)
+		c2 = '/';
+	ret = (c1 < c2) ? -1 : (c1 > c2) ? 1 : 0;
+	if (c1 && c2 && !ret)
+		ret = len1 - len2;
+	return ret;
+}
+
 static int unpack_trees_rec(struct tree_entry_list **posns, int len,
 			    const char *base, merge_fn_t fn, 
 			    int file2, int file3, int *indpos)
 {
 	int baselen = strlen(base);
 	int src_size = len + 1;
-	if (emu23)
-		src_size++;
 	if (src_size > 4)
 		src_size = 4;
 	do {
 		int i;
 		char *first = NULL;
+		int firstdir = 0;
 		int pathlen;
 		unsigned ce_size;
 		int dir2 = 0;
@@ -73,11 +92,23 @@ static int unpack_trees_rec(struct tree_
 			}
 		}
 
+		/*
+		if (first)
+			printf("%s\n", first);
+		*/
+
 		for (i = 0; i < len; i++) {
 			if (!posns[i])
 				continue;
-			if (!first || strcmp(first, posns[i]->name) > 0)
+			/*
+			printf("%d %s\n", i + 1, posns[i]->name);
+			*/
+			if (!first || entcmp(first, firstdir,
+					     posns[i]->name, 
+					     posns[i]->directory) > 0) {
 				first = posns[i]->name;
+				firstdir = posns[i]->directory;
+			}
 		}
 		/* No name means we're done */
 		if (!first)
@@ -94,19 +125,6 @@ static int unpack_trees_rec(struct tree_
 					     src_size);
 			src[0] = active_cache[*indpos];
 			remove_cache_entry_at(*indpos);
-			if (emu23) {
-				// we need this in stage 2 as well as stage 0
-				struct cache_entry *copy =
-					xmalloc(ce_size);
-				memcpy(copy, src[0], ce_size);
-				copy->ce_flags = 
-					create_ce_flags(baselen + pathlen, 2);
-				if (dir2 || file2) {
-					die("cannot merge index and our head tree");
-				}
-				src[2] = copy;
-				subfile2 = 1;
-			}
 		}
 
 		for (i = 0; i < len; i++) {
@@ -125,8 +143,6 @@ static int unpack_trees_rec(struct tree_
 			} else {
 				ce_stage = i + merge;
 			}
-			if (emu23 && ce_stage == 2)
-				ce_stage = 3;
 
 			if (posns[i]->directory) {
 				if (!subposns) {
@@ -137,8 +153,6 @@ static int unpack_trees_rec(struct tree_
 				parse_tree(posns[i]->item.tree);
 				subposns[i] = posns[i]->item.tree->entries;
 				posns[i] = posns[i]->next;
-				if (emu23 && ce_stage == 1)
-					dir2 = 1;
 				if (ce_stage == 2)
 					dir2 = 1;
 				if (ce_stage == 3)
@@ -168,19 +182,6 @@ static int

Re: Tool renames? was Re: First stab at glossary

2005-09-02 Thread Daniel Barkalow
On Thu, 1 Sep 2005, Junio C Hamano wrote:

 Tim Ottinger [EMAIL PROTECTED] writes:
 
  git-update-cache for instance?
  I am not sure which 'cache' commands need to be 'index' now.
 
 Logically you are right, but I suspect that may not fly well in
 practice.  Too many of us have already got our fingers wired to
 type cache, and the glossary is there to describe both cache and
 index.

My vote's for changing the official names, but keeping symlinks for the 
old names. As far as I know, there aren't any actual conflicts, and we 
might as well have new users pick up the logical names. I particularly 
think git merge would be really good to have.

-Daniel
*This .sig left intentionally blank*


Re: Couple of read-tree questions

2005-09-01 Thread Daniel Barkalow
On Wed, 31 Aug 2005, Junio C Hamano wrote:

 Daniel Barkalow [EMAIL PROTECTED] writes:
 
  Is there any current use for read-tree with multiple trees without -m or 
  equivalent?
 
 I did not know it even allowed multiple trees without -m, but
 you are right.  It does not seem to complain.
 
 I have never thought about using multiple trees without -m, and
 I do not remember hearing any plan nor purpose of using it to do
 something interesting from Linus.  I think its allowing multiple
 trees without -m is simply a bug.

I guess it was probably that its behavior was obvious and didn't require 
any extra code. It still follows entirely from one tree without -m, but it 
might be worth prohibiting unless someone has a reason to do it 
intentionally.

  Why does --emu23 use I+H for stage 2, rather than just I? Wouldn't this 
  just reintroduce removed files?
 
 They are not removed files, at least in the original context.
 
 The original intention was that git was supposed to work without
 having _any_ files in the working tree.  The reason why
 multi-tree read-tree has so many special cases that say "must
 match *if* work file exists" is that not having a corresponding
 working file was supposed to be equivalent to having the file
 checked out *and* unmodified.

But they'd not only be missing from the working tree but also from the 
(pre-read-tree) index, which should only happen, assuming the index came 
from read-tree H, if they were subsequently removed from the index. I'd 
understand treating index entries for files missing from the working tree 
as up to date.

(The thread you mention seems to say that we accept entries being missing 
from the index as if they were unchanged, but I don't see a good reason 
for this; you'd be dealing with the full set in the index for the merge, 
even if you don't have a populated working tree)

 I do not think anybody currently uses --emu23.  I did it because
 it has a potential of making the two-tree fast forward (which is
 used in git checkout to switch between branches) easier to
 manage when the working tree is dirty than doing straight
 two-tree merge, but that is just a theoretical potential never
 tested in the field.  Frankly, I do not mind, and I do not think
 anybody else minds, too much if you need to break or remove
 emu23 if that would make your code clean-up and redoing
 read-tree easier.

I should have asked sooner, then. :) There's a bunch of clutter to get it 
to work that I can remove if it's not actually necessary.

-Daniel
*This .sig left intentionally blank*


Re: Reworked read-tree.

2005-09-01 Thread Daniel Barkalow
On Thu, 1 Sep 2005, Junio C Hamano wrote:

 Daniel, I do not know what your current status is, but I think
 you need something like this.

Yup, I forgot to actually test that functionality.

 ---
 diff --git a/tree.c b/tree.c
 --- a/tree.c
 +++ b/tree.c
 @@ -224,10 +224,12 @@ struct tree *parse_tree_indirect(const u
  		if (obj->type == tree_type)
  			return (struct tree *) obj;
  		else if (obj->type == commit_type)
 -			return ((struct commit *) obj)->tree;
 +			obj = (struct object *)(((struct commit *) obj)->tree);

obj = &((struct commit *) obj)->tree->object;

Multiple sequential casts always bother me, and we do actually have a 
field for this.

  		else if (obj->type == tag_type)
 -			obj = ((struct tag *) obj)->tagged;
 +			obj = deref_tag(obj);

Shouldn't be necessary (once you've got the parse_object below); we're 
already in a loop dereferencing things.

  		else
  			return NULL;
 +		if (!obj->parsed)
 +			parse_object(obj->sha1);
  	} while (1);
  }
 
 


Re: [PATCH 0/2] Reorganize read-tree

2005-08-31 Thread Daniel Barkalow
On Tue, 30 Aug 2005, Junio C Hamano wrote:

 Dan, I really really *REALLY* wanted to try this out in pu
 branch and even was about to rig some torture chamber for
 testing before applying the patch, but you got the shiny blue
 bat X-.

I'll send a replacement with the settings correct.

 A patch to SubmittingPatches, MUA specific help section for
 users of Pine 4.63 would be very much appreciated.

Ah, it looks like a recent version changed the default behavior to do the 
right thing, and inverted the sense of the configuration option. (Either 
that or Gentoo did it.) So you need to set the 
no-strip-whitespace-before-send option, unless the option you have is 
strip-whitespace-before-send, in which case you should avoid checking 
it.

I don't actually have things set up for preparing patches from work, 
although I can resend the patches I prepared earlier.

-Daniel
*This .sig left intentionally blank*


[PATCH 1/2 (resend)] Object model additions for read-tree

2005-08-31 Thread Daniel Barkalow
Adds object_list_append() and a function to get the struct tree from an ent.

Signed-off-by: Daniel Barkalow [EMAIL PROTECTED]

---

 object.c |   11 +++
 object.h |3 +++
 tree.c   |   19 +++
 tree.h   |3 +++
 4 files changed, 36 insertions(+), 0 deletions(-)

49d33c385aa69d17c991300f73e77c6718a2b4a6
diff --git a/object.c b/object.c
--- a/object.c
+++ b/object.c
@@ -184,6 +184,17 @@ struct object_list *object_list_insert(s
 	return new_list;
 }
 
+void object_list_append(struct object *item,
+			struct object_list **list_p)
+{
+	while (*list_p) {
+		list_p = &((*list_p)->next);
+	}
+	*list_p = xmalloc(sizeof(struct object_list));
+	(*list_p)->next = NULL;
+	(*list_p)->item = item;
+}
+
 unsigned object_list_length(struct object_list *list)
 {
 	unsigned ret = 0;
diff --git a/object.h b/object.h
--- a/object.h
+++ b/object.h
@@ -41,6 +41,9 @@ void mark_reachable(struct object *obj, 
 struct object_list *object_list_insert(struct object *item, 
   struct object_list **list_p);
 
+void object_list_append(struct object *item,
+   struct object_list **list_p);
+
 unsigned object_list_length(struct object_list *list);
 
 int object_list_contains(struct object_list *list, struct object *obj);
diff --git a/tree.c b/tree.c
--- a/tree.c
+++ b/tree.c
@@ -1,5 +1,7 @@
 #include "tree.h"
 #include "blob.h"
+#include "commit.h"
+#include "tag.h"
 #include "cache.h"
 #include <stdlib.h>
 
@@ -212,3 +214,20 @@ int parse_tree(struct tree *item)
 	free(buffer);
 	return ret;
 }
+
+struct tree *parse_tree_indirect(const unsigned char *sha1)
+{
+	struct object *obj = parse_object(sha1);
+	do {
+		if (!obj)
+			return NULL;
+		if (obj->type == tree_type)
+			return (struct tree *) obj;
+		else if (obj->type == commit_type)
+			return ((struct commit *) obj)->tree;
+		else if (obj->type == tag_type)
+			obj = ((struct tag *) obj)->tagged;
+		else
+			return NULL;
+	} while (1);
+}
diff --git a/tree.h b/tree.h
--- a/tree.h
+++ b/tree.h
@@ -32,4 +32,7 @@ int parse_tree_buffer(struct tree *item,
 
 int parse_tree(struct tree *tree);
 
+/* Parses and returns the tree in the given ent, chasing tags and commits. */
+struct tree *parse_tree_indirect(const unsigned char *sha1);
+
 #endif /* TREE_H */
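
For readers who want to try the append logic outside of git, here is a minimal standalone sketch of the pointer-to-pointer walk used by object_list_append(). The struct definitions are simplified stand-ins (the `id` field is hypothetical), and plain malloc() replaces git's xmalloc(), so treat this as an illustration rather than the code from the patch.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-ins for git's types; only the fields used here. */
struct object {
	int id;			/* hypothetical payload for the demo */
};

struct object_list {
	struct object *item;
	struct object_list *next;
};

/* Append item at the tail by walking a pointer to each "next" slot. */
static void object_list_append(struct object *item,
			       struct object_list **list_p)
{
	while (*list_p)
		list_p = &(*list_p)->next;
	*list_p = malloc(sizeof(struct object_list));
	assert(*list_p);	/* git's xmalloc() dies on failure instead */
	(*list_p)->next = NULL;
	(*list_p)->item = item;
}

/* Count the entries, to check the result of the appends. */
static unsigned object_list_length(const struct object_list *list)
{
	unsigned n = 0;

	for (; list; list = list->next)
		n++;
	return n;
}
```

Because list_p always points at the slot that should receive the new node, the empty-list case needs no special handling.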



Re: [PATCH 0/2] Reorganize read-tree

2005-08-31 Thread Daniel Barkalow
On Wed, 31 Aug 2005, Catalin Marinas wrote:

 Daniel Barkalow [EMAIL PROTECTED] wrote:
  I got mostly done with this before Linus mentioned the possibility of
  having multiple index entries in the same stage for a single path. I
  finished it anyway, but I'm not sure that we won't want to know which of
  the common ancestors contributed which, and, if some of them don't have a
  path, we wouldn't be able to tell.
 
 I don't have time to look at the patch and I don't have a good
 knowledge of the GIT internals, so I will just ask. Does this patch
 change the call convention for git-merge-one-file-script? I have my
 own script for StGIT and I would need to know whether it is affected
 or not.

Nope, it only changes the trivial merge calling convention within 
read-tree.c; I think it's plausible that we might like to add information 
at some point, but the short-term goal is just to prevent a few bad cases 
in trivial merges.


Re: [RFC] Stgit - patch history / add extra parents

2005-08-31 Thread Daniel Barkalow
On Tue, 30 Aug 2005, Catalin Marinas wrote:

 Back from holiday. Thanks to all who replied to this thread.
 
 On Tue, 2005-08-23 at 14:05 -0400, Daniel Barkalow wrote:
  Having a useful diff isn't really a requirement for a parent; the diff in
  the case of a merge is going to be the total of everything that happened
  elsewhere. The point is to be able to reach some commits between which
  there are interesting diffs.
  
  This also depends on how exactly freeze is used; if you use it before
  committing a modification to the patch without rebasing, you get:
  
  old-top -> new-top
      ^        ^
       \      /
        bottom
  
  bottom to old-top is the old patch
  bottom to new-top is the new patch
  old-top to new-top is the change to the patch
  
  Then you want to keep new-top as a parent for rebasings until one of these
  is frozen. These links are not interesting to look at, but preserve the
  path to the old-top:new-top change, which is interesting.
 
 This was my initial StGIT implementation (up to version 0.3), only that
 there was no freeze command. Since I want an StGIT tree to be clean to
 the outside world, I wouldn't keep multiple parents for the visible top
 of a patch.
 
 As I understand from Junio's and Linus' e-mails (on the 23rd of August),
 there might be problems with merging the HEAD of an StGIT-managed tree
 if the above method is accessible via HEAD.

Right, you'd want a separate head which is what you ask people to merge; 
the rest is only visible to people who are working on preparing the patch. 
But you could keep both sets of stuff (sharing tree objects but not 
commits).

  Ignoring the links to the corresponding bottoms, the development therefore
  looks like:
  
  local1 -> local2 -> merge -> local3 -> merge
  ^   ^  ^
  mainline
  
  And this is how development is normally supposed to look. The trick is to
  only include a minimal number of merges.
 
 A merge occurs every time a patch is rebased. Anyway, having the bottoms
 in the graph (which is the main idea of StGIT) together with the old-top
 (or frozen state) parents makes the graph pretty complicated.

It should be possible to drop merges such that there's only one between 
any pair of local changes. That is, if you rebase at the end of the line 
above, it would get as parents local3 and the new bottom, not the last 
merge and the new bottom.

The mainline changes only come in through the bottoms, so higher levels 
should look the same, but with the lower levels in the place of mainline.

-Daniel
*This .sig left intentionally blank*


[PATCH 0/2] Reorganize read-tree

2005-08-30 Thread Daniel Barkalow
I got mostly done with this before Linus mentioned the possibility of
having multiple index entries in the same stage for a single path. I
finished it anyway, but I'm not sure that we won't want to know which of
the common ancestors contributed which, and, if some of them don't have a
path, we wouldn't be able to tell. The other advantages I see to this
approach are:

 - it uses the more common parser of tree objects, moving toward having
   only one (diff-cache still uses read_tree(), however).
 - it doesn't need to do very complicated things with the index; the
   original read-tree does a bunch of stuff with an index with a gap in
   the middle containing obsolete entries.
 - it uses a much simpler method of finding directory/file conflicts,
   which is possible because the struct trees represent directories as
   well as files.
 - it deals with each path completely before going on to the next one,
   instead of first dealing with each input tree and then dealing with
   each path.
 - it removes a lot of intimate knowledge of the index structure from the
   program.

The general idea is that it figures out what trees you want, and then
iterates through the entry lists together, recursing into directories, and
calls the merge function with an array of the index entries (not yet
added) for the path in each tree; the merge function adds the appropriate
things to the index.
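
The lockstep iteration described above can be sketched roughly as follows. This is not the code from the patch: each "tree" is reduced to a sorted, NULL-terminated list of names, directories are not recursed into, and the names walk_together(), struct cursor and record_path() are all hypothetical. It only shows the idea of picking the first name across all inputs and handing every tree's view of that path to one callback.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_TREES 4

/* A sorted, NULL-terminated list of names standing in for one tree. */
struct cursor {
	const char **names;
};

/* Called once per path; present[i] says whether tree i has the path. */
typedef void (*path_fn)(const char *path, const int *present, int n,
			void *data);

static void walk_together(struct cursor *c, int n, path_fn fn, void *data)
{
	assert(n <= MAX_TREES);
	for (;;) {
		const char *first = NULL;
		int present[MAX_TREES] = { 0 };
		int i;

		/* Find the smallest name at the head of any list. */
		for (i = 0; i < n; i++) {
			const char *name = *c[i].names;
			if (name && (!first || strcmp(name, first) < 0))
				first = name;
		}
		if (!first)
			return;		/* every list is exhausted */

		/* Mark and advance every tree that has this name. */
		for (i = 0; i < n; i++) {
			const char *name = *c[i].names;
			if (name && !strcmp(name, first)) {
				present[i] = 1;
				c[i].names++;
			}
		}
		fn(first, present, n, data);
	}
}

/* Example callback: records the visit order and a bitmask of trees. */
struct record {
	const char *paths[8];
	int masks[8];
	int count;
};

static void record_path(const char *path, const int *present, int n,
			void *data)
{
	struct record *r = data;
	int i, mask = 0;

	for (i = 0; i < n; i++)
		if (present[i])
			mask |= 1 << i;
	assert(r->count < 8);
	r->paths[r->count] = path;
	r->masks[r->count] = mask;
	r->count++;
}
```

Each path is dealt with completely (one callback) before the walk moves on, which is the property the list of advantages above relies on.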

Note that this set doesn't include calling merge functions with multiple
ancestors or remotes; that can be done when we've decided on whether my
version of read-tree is worth using.

There are various potential refinements, plus removing a bunch of memory
leaks, still to do, but I think this is sufficiently close to review.

(Refinements: it ought to have two indices in memory, the old and the new,
and never modify the old and only append to the new, to simplify things
further; it ought to use a sentinel value for the index entry to indicate
that there is something in the tree to conflict with there being a file at
the given path; the --emu23 logic could be clearer)

The first patch adds a few functions to the object library.
The second patch changes read-tree around; it is essentially a rewrite,
except for the merge functions and main().

-Daniel
*This .sig left intentionally blank*


[PATCH 1/2] Object model additions for read-tree

2005-08-30 Thread Daniel Barkalow
Adds object_list_append() and a function to get the struct tree from an ent.

Signed-off-by: Daniel Barkalow [EMAIL PROTECTED]
---

 object.c |   11 +++
 object.h |3 +++
 tree.c   |   19 +++
 tree.h   |3 +++
 4 files changed, 36 insertions(+), 0 deletions(-)

49d33c385aa69d17c991300f73e77c6718a2b4a6
diff --git a/object.c b/object.c
--- a/object.c
+++ b/object.c
@@ -184,6 +184,17 @@ struct object_list *object_list_insert(s
 	return new_list;
 }
 
+void object_list_append(struct object *item,
+			struct object_list **list_p)
+{
+	while (*list_p) {
+		list_p = &((*list_p)->next);
+	}
+	*list_p = xmalloc(sizeof(struct object_list));
+	(*list_p)->next = NULL;
+	(*list_p)->item = item;
+}
+
 unsigned object_list_length(struct object_list *list)
 {
 	unsigned ret = 0;
diff --git a/object.h b/object.h
--- a/object.h
+++ b/object.h
@@ -41,6 +41,9 @@ void mark_reachable(struct object *obj,
 struct object_list *object_list_insert(struct object *item,
   struct object_list **list_p);

+void object_list_append(struct object *item,
+   struct object_list **list_p);
+
 unsigned object_list_length(struct object_list *list);

 int object_list_contains(struct object_list *list, struct object *obj);
diff --git a/tree.c b/tree.c
--- a/tree.c
+++ b/tree.c
@@ -1,5 +1,7 @@
 #include "tree.h"
 #include "blob.h"
+#include "commit.h"
+#include "tag.h"
 #include "cache.h"
 #include <stdlib.h>
 
@@ -212,3 +214,20 @@ int parse_tree(struct tree *item)
 	free(buffer);
 	return ret;
 }
+
+struct tree *parse_tree_indirect(const unsigned char *sha1)
+{
+	struct object *obj = parse_object(sha1);
+	do {
+		if (!obj)
+			return NULL;
+		if (obj->type == tree_type)
+			return (struct tree *) obj;
+		else if (obj->type == commit_type)
+			return ((struct commit *) obj)->tree;
+		else if (obj->type == tag_type)
+			obj = ((struct tag *) obj)->tagged;
+		else
+			return NULL;
+	} while (1);
+}
diff --git a/tree.h b/tree.h
--- a/tree.h
+++ b/tree.h
@@ -32,4 +32,7 @@ int parse_tree_buffer(struct tree *item,

 int parse_tree(struct tree *tree);

+/* Parses and returns the tree in the given ent, chasing tags and commits. */
+struct tree *parse_tree_indirect(const unsigned char *sha1);
+
 #endif /* TREE_H */
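
The "chasing tags and commits" behaviour of parse_tree_indirect() can be illustrated with a toy object model. In this sketch the enum, the single `target` field and the name peel_to_tree() are assumptions made for the example; the real function works on git's struct object, struct commit and struct tag and looks objects up by SHA-1.

```c
#include <assert.h>
#include <stddef.h>

enum obj_type { OBJ_TREE, OBJ_COMMIT, OBJ_TAG, OBJ_BLOB };

/* Toy object: one link is enough to model commits and tags. */
struct object {
	enum obj_type type;
	struct object *target;	/* commit: its tree; tag: the tagged object */
};

/* Follow tags until a commit or tree is reached, then return the tree. */
static struct object *peel_to_tree(struct object *obj)
{
	while (obj) {
		switch (obj->type) {
		case OBJ_TREE:
			return obj;
		case OBJ_COMMIT:
			return obj->target;
		case OBJ_TAG:
			obj = obj->target;	/* tags may point at tags */
			break;
		default:
			return NULL;		/* a blob has no tree */
		}
	}
	return NULL;
}
```

The loop only continues for tags, which is why a tag of a tag of a commit still resolves to the commit's tree.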



[PATCH] Change read-tree to merge before using the index.

2005-08-30 Thread Daniel Barkalow
Signed-off-by: Daniel Barkalow [EMAIL PROTECTED]
---

 read-tree.c |  522 ++-
 1 files changed, 297 insertions(+), 225 deletions(-)

d0f45ad81db2e133c49c23bd09c5615da344bb5c
diff --git a/read-tree.c b/read-tree.c
--- a/read-tree.c
+++ b/read-tree.c
@@ -5,28 +5,280 @@
  */
 #include "cache.h"

-static int stage = 0;
+#include "object.h"
+#include "tree.h"
+
+static int merge = 0;
+static int emu23 = 0;
 static int update = 0;

-static int unpack_tree(unsigned char *sha1)
+static struct object_list *trees = NULL;
+
+typedef int (*merge_fn_t)(struct cache_entry **src,
+ struct cache_entry **dest,
+ int df_conflicts_2,
+ int df_conflicts_3);
+
+static int unpack_trees_rec(struct tree_entry_list **posns, int len,
+   const char *base, merge_fn_t fn,
+   int file2, int file3, int *indpos)
+{
+   int baselen = strlen(base);
+   int src_size = len + 1;
+   if (emu23)
+   src_size++;
+   if (src_size  4)
+   src_size = 4;
+   do {
+   int i;
+   char *first = NULL;
+   int pathlen;
+   unsigned ce_size;
+   int dir2 = 0;
+   int dir3 = 0;
+   int subfile2 = file2;
+   int subfile3 = file3;
+   struct tree_entry_list **subposns = NULL;
+   struct cache_entry **src = NULL;
+   char *cache_name = NULL;
+
+   /* Find the first name in the input. */
+
+   /* Check the cache */
+   if (merge && *indpos < active_nr) {
+   /* This is a bit tricky: */
+   /* If the index has a subdirectory (with
+* contents) as the first name, it'll get a
+* filename like "foo/bar". But that's after
+* "foo", so the entry in trees will get
+* handled first, at which point we'll go into
+* "foo", and deal with "bar" from the index,
+* because the base will be "foo/". The only
+* way we can actually have "foo/bar" first of
+* all the things is if the trees don't
+* contain "foo" at all, in which case we'll
+* handle "foo/bar" without going into the
+* directory, but that's fine (and will return
+* an error anyway, with the added unknown
+* file case.
+*/
+
+   cache_name = active_cache[*indpos]->name;
+   if (strlen(cache_name) > baselen &&
+   !memcmp(cache_name, base, baselen)) {
+   cache_name += baselen;
+   first = cache_name;
+   } else {
+   cache_name = NULL;
+   }
+   }
+
+   for (i = 0; i < len; i++) {
+   if (!posns[i])
+   continue;
+   if (!first || strcmp(first, posns[i]->name) > 0)
+   first = posns[i]->name;
+   }
+   }
+   /* No name means we're done */
+   if (!first)
+   return 0;
+
+   pathlen = strlen(first);
+   ce_size = cache_entry_size(baselen + pathlen);
+
+   if (cache_name && !strcmp(cache_name, first)) {
+   src = xmalloc(sizeof(struct cache_entry *) *
+ src_size);
+   memset(src, 0,
+  sizeof(struct cache_entry *) *
+  src_size);
+   src[0] = active_cache[*indpos];
+   remove_cache_entry_at(*indpos);
+   if (emu23) {
+   // we need this in stage 2 as well as stage 0
+   struct cache_entry *copy =
+   xmalloc(ce_size);
+   memcpy(copy, src[0], ce_size);
+   copy->ce_flags =
+   create_ce_flags(baselen + pathlen, 2);
+   if (dir2 || file2) {
+   die("cannot merge index and our head tree");
+   }
+   }
+   src[2] = copy;
+   subfile2 = 1;
+   }
+   }
+
+   for (i = 0; i < len; i++) {
+   struct cache_entry *ce;
+   int ce_stage;
+   if (!posns[i] ||
+   strcmp(first, posns[i]->name

Comments in read-tree about #nALT

2005-08-27 Thread Daniel Barkalow
I've gotten to the point of having all of the entries for a given path
ready to put into the cache at the same time, and now I want to convert the
merge functions to take their data directly, rather than in the cache, so
that they can take extra entries for extra ancestors.

Part of threeway_merge, however, wants to search the rest of the cache for
interfering entries in some cases, which would have to happen differently,
because I won't have the cache completely filled out beforehand. I'm
trying to figure out what the comments are talking about, and they seem to
refer to a list of the possible cases. Is that list somewhere convenient?

-Daniel
*This .sig left intentionally blank*


Re: Merges without bases

2005-08-27 Thread Daniel Barkalow
On Sat, 27 Aug 2005, Martin Langhoff wrote:

 On 8/27/05, Daniel Barkalow [EMAIL PROTECTED] wrote:
  The problem with both of these (and doing it in the build system) is that,
  when a project includes another project, you generally don't want whatever
  revision of the included project happens to be the latest; you want the
  revision of the included project that the revision of the including
  project you're looking at matches. That is, if App includes Lib, and

 Exactly - so you do it on a tag, or a commit date with cvs. With Arch,
 GIT and others that have a stable id for each commit, you can use that
 or the more user-friendly tags.

I'm thinking of cases like openssl, openssh, and libcrypto. Openssl and
openssh both use libcrypto but not each other (looking at the ldd output,
rather than packaging). However, it would be too much of a pain to work
directly on libcrypto without working through some other package, because
the library doesn't have its own applications. Furthermore, if you're
doing much to libcrypto, you're likely doing it in the context of a
particular application (say, for example, ssh needs a new cipher that
isn't supported for SSL at the time). You'd want to make simultaneous
changes to libcrypto to implement the new feature and to openssh to use
it; neither can be validated until the other is written, which means that
you'll have both projects checked out and dirty (in the cache sense) at
the same time, and be building the using project.

It would also be good to be able to check in this whole thing through the
version control system, rather than partially through a change to the
build system. That is, if I change the included libcrypto, commit it, and
commit the including openssh, the system as a whole should understand that
I want to change which commit of libcrypto gets used. Similarly, it would
be good to merge changes into the libcrypto used by openssh with the same
procedure used to merge changes to openssh itself, including supporting
non-fast-forward when there's a local version in use.

(Of course, currently, libcrypto is strictly part of openssl, because it
would be too much of a pain with the present version control to make it
independent, and openssh depends on openssl, despite not even linking
against -lssl, because openssl got libcrypto first.)

 The good thing here is that a makefile will know how to handle the
 situation if the external lib is hosted in Arch, in SVN, or Visual
 SourceSafe. If your external lib is only available as a tarball in a
 url, you can fetch that and uncompress it too. Arch configurations are
 _cute_ but useless in any but the most narrow cases.

Certainly, if it's sufficiently external to be in a different SCM it
should be handled by the build system. Actually, if it's even nearly that
external, it's probably going to be handled best by requiring people to go
get it themselves.

I find it odd that you say that the standard approach is to have the build
system fetch a version of the included package; my experience is that
projects either just report (or fail to report) a dependency on having the
other package or they copy the project into their project. The former
means they can't change it (which is generally good, unless it becomes
necessary), while the latter causes update problems (cf. zlib).

I think that Arch configurations and the CVS equivalent are, in fact,
useless, but that this is only due to implementation being insufficiently
clever, not due to the concept being inherently bad; I feel the same way
about distributed development under Arch, which is really nice under git,
so I have hope that something better could be done.

-Daniel
*This .sig left intentionally blank*


Re: Comments in read-tree about #nALT

2005-08-27 Thread Daniel Barkalow
On Sat, 27 Aug 2005, Linus Torvalds wrote:

 On Sat, 27 Aug 2005, Daniel Barkalow wrote:
 
  What I missed was that the effect of causes_df_conflict is to give no
  merge for the entry, rather than giving an error overall. So I do need an
  equivalent.

 Daniel,
  I'm not 100% sure what you're trying to do, but one thing that might work
 out is to just have multiple stage 3 entries with the same pathname.

 We currently use 4 stages:
  - stage 0 is resolved
  - stage 1 is original
  - stage 2 is one branch
  - stage 3 is another branch

 But if we allowed duplicate entries per stage, I think we could easily
 just fold stage 2/3 into one stage, and just have n entries in stage 2.
 That would immediately mean that a three-way merge could be n way.

 The only rule would be that when you add an entry to stage 2, you must
 always add it after any previous entry that is already in stage 2. That
 should be easy.

It looks like stage 2 is currently special as the stage that's similar to
the index/HEAD/working tree. However, I don't see any problem with n
entries in stage 3, except that, if you have a non-maximal number of them
for some reason, it'll be impossible to determine which came from which
tree.
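
To make the ordering rule in Linus's suggestion concrete, here is a small sketch of an index-like array sorted by (path, stage) in which duplicate stage entries are allowed and a new entry always lands after existing entries for the same path and stage. The structures are toy stand-ins, not git's struct cache_entry; in particular the `source` field, which would answer the "which came from which tree" concern, is a hypothetical addition.

```c
#include <assert.h>
#include <string.h>

#define MAX_ENTRIES 16

struct entry {
	const char *path;
	int stage;	/* 0 resolved, 1 original, 2 and 3 the branches */
	int source;	/* hypothetical: which input tree it came from */
};

struct index {
	struct entry e[MAX_ENTRIES];
	int nr;
};

/* Order by path first, then by stage. */
static int entry_cmp(const struct entry *a, const struct entry *b)
{
	int c = strcmp(a->path, b->path);

	return c ? c : a->stage - b->stage;
}

/* Insert after every entry that sorts before or equal to the new one. */
static void index_add(struct index *idx, struct entry ent)
{
	int pos = 0;

	assert(idx->nr < MAX_ENTRIES);
	while (pos < idx->nr && entry_cmp(&idx->e[pos], &ent) <= 0)
		pos++;
	memmove(&idx->e[pos + 1], &idx->e[pos],
		(idx->nr - pos) * sizeof(struct entry));
	idx->e[pos] = ent;
	idx->nr++;
}
```

Using "less than or equal" in the scan is what keeps entries with the same path and stage in arrival order; without something like `source`, that order is the only record of where each one came from.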

 In fact, this extension might even allow us to solve the multiple merge
 base problem: we could allow multiple entries in stage 1 too, ie one
 entry per merge base (and just collapse identical entries - there's no
 ordering involved in stage 1 entries).

That's actually the problem I was working on.

 So you could merge n trees with m bases, and all without really
 changing the current logic much at all.

 Maybe I'm missing something (like what you're trying to do in the first
 place), but this _seems_ doable.

I'd be afraid of confusing everything by removing the uniqueness
invariant, although I guess not too much does anything with entries in
stages other than 0. I probably just don't find the index as intuitive as
you do and as the struct tree representation.

I'm working on arranging the code to look at each path in sequence, with
the input trees as the inner loop, rather than with the loops in the other
order; using parse_tree to parse the objects instead of read_tree; and
doing trivial merges before putting things in the cache, rather than
after. I'd been thinking that this would avoid a limit on the number of
stages, since I hadn't considered whether multiple entries for the same
path and stage could be allowed.

I still think that my order is likely to be easier to understand and
involve read-tree relying less on tricky properties of the data
structures, but I'll have to get it done before I can say that for sure.

-Daniel
*This .sig left intentionally blank*


Re: Merges without bases

2005-08-25 Thread Daniel Barkalow
On Thu, 25 Aug 2005, Junio C Hamano wrote:

 One thing that makes me reluctant to recommend this merging
 unrelated projects business is that I suspect that it makes
 things _much_ harder for the upstream project that is being
 merged, and should not be done without prior arrangement; Linus
 merged gitk after talking with paulus, so that was OK.

I'd still like to revive my idea of having projects overlaid on each
other, where the commits in the project that absorbed the other project
say, essentially, also include this other commit, but any changes to
those files belong to that branch, not this one. That way, Linus could
have included gitk in git, but changes to it, even when done in a git
working tree, would show up in commits that only include gitk. (git
actually can handle this with the alternative index file mechanism that
Linus mentioned in a different thread.)

Definitely post-1.0, of course.

 Suppose the above My Project is published, people send patches
 for core GIT part to it, and you as the maintainer of that My
 Project accept those patches.  The users of My Project would
 be happy with the new features and wouldn't care less where
 their core GIT tools come from.  But how would _I_ pull from
 that My Project, if I did not want to pull unrelated stuff in?

With the right info, the tools could be made to automatically generate
suitable commits, because those files would be tracked by a separate index
file and committed into a separate branch, which would then be reincluded
(by reference) in the containing project.

-Daniel
*This .sig left intentionally blank*


Re: [RFC] Looking at multiple ancestors in merge

2005-08-25 Thread Daniel Barkalow
On Wed, 24 Aug 2005, Daniel Barkalow wrote:

 Of course, this is going to take a bit of work, because read-tree
 currently puts all of its arguments into the cache and then works on
 merging, and taking multiple ancestors requires putting them somewhere
 else, because they won't fit in the cache.

I've started this, and have gotten as far as having read-tree accept > 3
trees and ignore everything but the last 3. Am I correct in assuming that
if I break read-tree in any way, some test will fail?

-Daniel
*This .sig left intentionally blank*


Re: baffled again

2005-08-24 Thread Daniel Barkalow
On Wed, 24 Aug 2005, Junio C Hamano wrote:

 [EMAIL PROTECTED] writes:

  So I have another anomaly in my GIT tree.  A patch to
  back out a bogus change to arch/ia64/hp/sim/boot/bootloader.c
  in my release branch at commit
 
   62d75f3753647656323b0365faa43fc1a8f7be97
 
  appears to have been lost when I merged the release branch to
  the test branch at commit
 
   0c3e091838f02c537ccab3b6e8180091080f7df2

 : siamese; git cat-file commit 0c3e091838f02c537ccab3b6e8180091080f7df2
 tree 61a407356d1e897e0badea552ce69e657cab6108
 parent 7ffacc1a2527c219b834fe226a7a55dc67ca3637
 parent a4cce10492358b33d33bb43f98284c80482037e8
 author Tony Luck [EMAIL PROTECTED] 1124808655 -0700
 committer Tony Luck [EMAIL PROTECTED] 1124808655 -0700

 Pull release into test branch

 So I pulled 7ffacc and a4cce1 from your repository and started
 digging from there.  7ffacc was the head of test branch back
 then, and a4cce1 was the head of release branch.  I checked
 out 7ffacc in the repository and pulled a4cce1 into it, using
 the GIT with the optimum merge-base patch.

 : siamese; git pull . aegl-release
 Packing 0 objects
 Unpacking 0 objects

 * committish: a4cce10492358b33d33bb43f98284c80482037e8
 refs/heads/aegl-release from .
 Trying to find the optimum merge base.
 Trying to merge a4cce10492358b33d33bb43f98284c80482037e8 into 
 7ffacc1a2527c219b834fe226a7a55dc67ca3637 using 
 c1ffb910f7a4e1e79d462bb359067d97ad1a8a25.
 Simple merge failed, trying Automatic merge
 Auto-merging arch/ia64/sn/kernel/io_init.c.
 Committed merge db376974c0aebb9e99e5cd0bce21088c6a9d927c
  arch/ia64/hp/sim/boot/boot_head.S |2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)

 It is using c1ffb9 as the merge base.  The problematic path
 in the three trees involved are:

 : siamese; git ls-tree -r aegl-test-7ffacc1a | grep 
 arch/ia64/hp/sim/boot/bootloader.c
 100644 blob a7bed60b69f9e8de9a49944e22d03fb388ae93c7  
 arch/ia64/hp/sim/boot/bootloader.c
 : siamese; git ls-tree -r aegl-release-a4cce1 | grep 
 arch/ia64/hp/sim/boot/bootloader.c
 100644 blob 51a7b7b4dd0e7c5720683a40637cdb79a31ec4c4  
 arch/ia64/hp/sim/boot/bootloader.c
 : siamese; git ls-tree -r aegl-c1ffb9 | grep 
 arch/ia64/hp/sim/boot/bootloader.c
 100644 blob 51a7b7b4dd0e7c5720683a40637cdb79a31ec4c4  
 arch/ia64/hp/sim/boot/bootloader.c

 So the file did not change between the merge base and release,
 and test had the change.  merge-cache picked the one in the test
 release.  Your guess in the other message hits the mark.

 I wonder what _other_ candidates these two commits have in
 common and what would have happened if they were used as the
 base instead?

 : siamese; git merge-base -a aegl-test-7ffacc1a aegl-release-a4cce1
 f6fdd7d9c273bb2a20ab467cb57067494f932fa3
 3a931d4cca1b6dabe1085cc04e909575df9219ae
 c1ffb910f7a4e1e79d462bb359067d97ad1a8a25

 You can check what variant of the file each of these commits
 contain.

 What is happening is:

 * the problematic patch 4aec0f is one before 3a931d.  Among the
   three merge-base candidates, only 3a931d contains the wrongly
   patched version.

 * the problematic change 4aec0f patch introduces is part of test
   branch, because it was pulled via release.

 * the tip of release being merged into test has this patch
   reverted, and the file is exactly the same as before 4aec0f
   patch.

 So three-way trivial merge algorithm says, hey, the file did
 not change between common ancestor and release but it is
 different in test, so the one in the test branch must be the
 merge result.

 This does not have much to do with which common ancestor
 merge-base chooses.  Sorry, I am not sure what is the right way
 to resolve this offhand.

If it picks 3a931d4cca1b6dabe1085cc04e909575df9219ae, it will determine
that the file didn't change between that and test, and is different in
release, so the one in release must be right. I believe that the hint that
something is going on is that different common ancestors give
different trivial merges (as opposed to some giving failure and some
giving the same result), and resolving it probably involves identifying
that the paths from f6f... and c1f... to release don't keep the same blob
through the middle, despite having the same ends.
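
As a rough sketch of the hint described above, the following applies the trivial three-way rule once per candidate base and reports when two bases give different trivial results. The function and enum names are hypothetical, blobs are compared as abbreviated id strings, and the blob assumed for base f6fdd7 is a guess (it is taken to match c1ffb9); this is not read-tree's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Blobs are compared by id; NULL means "no trivial answer". */
static const char *trivial_merge(const char *base, const char *ours,
				 const char *theirs)
{
	if (!strcmp(ours, theirs))
		return ours;		/* both sides agree */
	if (!strcmp(base, theirs))
		return ours;		/* only our side changed */
	if (!strcmp(base, ours))
		return theirs;		/* only their side changed */
	return NULL;			/* both changed, differently */
}

enum verdict { MERGE_AGREED, MERGE_CONFLICT, MERGE_AMBIGUOUS };

/*
 * Try every candidate base. Bases that give no trivial result are
 * skipped; two bases that give different results are ambiguous.
 */
static enum verdict merge_all_bases(const char **bases, int n,
				    const char *ours, const char *theirs,
				    const char **result)
{
	int i;

	*result = NULL;
	for (i = 0; i < n; i++) {
		const char *r = trivial_merge(bases[i], ours, theirs);

		if (!r)
			continue;
		if (*result && strcmp(*result, r)) {
			*result = NULL;
			return MERGE_AMBIGUOUS;
		}
		*result = r;
	}
	return *result ? MERGE_AGREED : MERGE_CONFLICT;
}
```

With test holding a7bed6 and release holding 51a7b7, base c1ffb9 (51a7b7) picks the test blob while base 3a931d (taken to hold a7bed6, since the file did not change between it and test) picks the release blob, so the sketch reports the merge as ambiguous rather than silently keeping one side.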

-Daniel
*This .sig left intentionally blank*


Re: Query about status of http-pull

2005-08-24 Thread Daniel Barkalow
On Wed, 24 Aug 2005, Martin Schlemmer wrote:

 Hi,

 Recently cogito again says that the rsync method will be deprecated in
 future (due to http-pull now supporting pack objects I suppose), but it
 seems to me that it still has other issues:

 -
 lycan linux-2.6 # git pull origin
 Fetching HEAD using http
 Getting pack list
 error: Couldn't get 0572e3da3ff5c3744b2f606ecf296d5f89a4bbdf: not separate or 
 in any pack
 error: Tried 
 http://www.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git/objects/05/72e3da3ff5c3744b2f606ecf296d5f89a4bbdf
 Cannot obtain needed object 0572e3da3ff5c3744b2f606ecf296d5f89a4bbdf
 while processing commit .

It looks like pack-c24bb5025e835a3d8733931ce7cc440f7bfbaaed isn't in the
pack list. I suspect that updating this file should really be done by
anything that creates pack files, because people forget to run the program
that does it otherwise and then http has problems.

-Daniel
*This .sig left intentionally blank*


Re: baffled again

2005-08-24 Thread Daniel Barkalow
On Wed, 24 Aug 2005, Linus Torvalds wrote:

 Now, if the shared patch hadn't been a patch, but a shared _commit_, then
 the thing would have been unambiguous - the shared commit would have been
 the merge point, and the revert would have clearly undone that shared
 commit.

Actually, it was a shared commit
(4aec0fb12267718c750475f3404337ad13caa8f5), which was (an ancestor of) a
candidate merge point, but wasn't the one selected. Since a different one
was chosen, it looked to the 3-way merge like a shared patch (since it
ignores the untaken parent in the merges in the history).

This should be fixable, but it'll require more cleverness in read-tree.

-Daniel
*This .sig left intentionally blank*


Re: [RFC] undo and redo

2005-08-24 Thread Daniel Barkalow
On Wed, 24 Aug 2005, Carl Baldwin wrote:

 This brings up a good point (indirectly).  git prune would destroy the
 undo objects.  I had thought of this but decided to ignore it for the
 time being.

If you made undo store the tree under refs somewhere, git prune would
preserve it.

-Daniel
*This .sig left intentionally blank*


Re: [RFC] Looking at multiple ancestors in merge

2005-08-24 Thread Daniel Barkalow
On Wed, 24 Aug 2005, A Large Angry SCM wrote:

 Daniel Barkalow wrote:
  I'm starting to work on letting the merging process see multiple
  ancestors, and I think it's messy enough that I should actually discuss
  it.
 
  Review of the issue:
 
  It is possible to lose reverts in cases when merging two commits with
  multiple ancestors, in the following pattern: (letters representing blobs
  at some filename, children to the right)
 
  a-b-b-a-?
   \ X   /
    a-b-b
 
 [Lots of stuff deleted]

 There seems to be a lot of effort being put into auto-magically choosing
 the right merge in the presence of multiple possible merge bases.
 Unfortunately, most (all?) of the proposals are attempting to divine
 intent, and so, are guaranteed to be 100% wrong at least some of the time.

 Wouldn't it be better, instead, to detect that current merge being
 attempted is ambiguous and require the user to specify the correct merge
 base? The alternative is a tool that appears to work all of the time but
 does the wrong thing some of the time.

My proposal is actually to detect when a merge is ambiguous. In order to
determine that, however, you have to evaluate multiple potential outcomes
and see if they are actually different. I'm working on an efficient way to
do that.

Then further work could look into eliminating possibilities when
information about the history excludes them. There were two issues in the
case that Tony hit: it ignored a potential correct outcome for the merge,
and it didn't ignore an outcome which could be demonstrated to be
incorrect. The priority is to resolve the first, but things which improve
the second or help with solutions to the second are worth understanding.

-Daniel
*This .sig left intentionally blank*


Re: [RFC] Stgit - patch history / add extra parents

2005-08-23 Thread Daniel Barkalow
On Tue, 23 Aug 2005, Catalin Marinas wrote:

  So the point is that there are things which are, in fact, parents, but we
  don't want to list them, because it's not desired information.

 What's the definition of a parent in GIT terms? What are the
 restrictions for a commit object to be a parent? Can a parent be an
 arbitrarily chosen commit?

Something is legitimate as a parent if someone took that commit and did
something to it to get the new commit. The operation which caused the
change is not specified. But you only want to include it if anyone cares
about the parent.

(For example, I often start with a chunk of work that does multiple things
and is committed; I take mainline and generate a series of commits from
there. It would be legitimate to list my development commit as a parent of
each of these, since I did actually take it and strip out the unrelated
changes. This would be a bit confusing in the log, but would make merges
between something based on the messy version and something based on the
refined version work well. On the other hand, I don't want to report the
existence of the messy version, so I don't include it.)

 An StGIT patch is represented by top and bottom commit
 objects. The bottom one is the same as the parent of the top
 commit. The patch is the diff between the top's tree id and the
 bottom's tree id.

 Jan's proposal is to allow a freeze command to save the current top
 hash and later be used as a second parent for the newly generated
 top. The problem I see with this approach is that (even for the
 internal view you described) the newly generated top will have two
 parents, new-bottom and old-top, but only the diff between new-top and
 new-bottom is meaningful. The diff between new-top and old-top (as a
 parent-child relation) wouldn't contain anything relevant to the patch
 but all the new changes to the base of the stack.

Having a useful diff isn't really a requirement for a parent; the diff in
the case of a merge is going to be the total of everything that happened
elsewhere. The point is to be able to reach some commits between which
there are interesting diffs.

This also depends on how exactly freeze is used; if you use it before
committing a modification to the patch without rebasing, you get:

old-top -> new-top
   ^          ^
    \        /
      bottom

bottom to old-top is the old patch
bottom to new-top is the new patch
old-top to new-top is the change to the patch

Then you want to keep new-top as a parent for rebasings until one of these
is frozen. These links are not interesting to look at, but preserve the
path to the old-top:new-top change, which is interesting.

Ignoring the links to the corresponding bottoms, the development therefore
looks like:

local1 -> local2 -> merge -> local3 -> merge
  ^                   ^                  ^
mainline

And this is how development is normally supposed to look. The trick is to
only include a minimal number of merges.

-Daniel
*This .sig left intentionally blank*


Re: [RFC] Removing deleted files after checkout

2005-08-23 Thread Daniel Barkalow
On Tue, 23 Aug 2005, Carl Baldwin wrote:

 Hello,

 I recently started using git to revision control the source for my
 web-page.  I wrote a post-update hook to checkout the files when I push
 to the 'live' repository.

 In this particular context I decided that it was important to me to remove
 deleted files after checking out the new HEAD.  I accomplished this by running
 git-ls-files before and after the checkout.

 Is there a better way?  Could there be some way built into git to easily
 find out what files disappear when replacing the current index with one
 from a new tree?  Is there already?  The behavior of git should NOT
 change to delete these files but I would argue that some way should
 exist to query what files disappeared if removing them is desired.

If you don't use -f, git-checkout-script removes deleted files. Using -f
tells it to ignore the old index, which means that it can't tell the
difference between removed files and files that weren't tracked at all.
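
As a rough sketch of why the old index matters: the set of files to remove
is simply the set of paths tracked before minus the set tracked after. The
helper name and the file names below are made up for illustration; this is
not how git-checkout-script is implemented.

```python
def removed_files(old_index, new_index):
    """Paths that were tracked before the checkout but are not in the new tree."""
    return sorted(set(old_index) - set(new_index))


before = ["index.html", "about.html", "img/logo.png"]
after = ["index.html", "img/logo.png"]

print(removed_files(before, after))

# With the old index ignored (the -f case) there is no "before" list, so
# nothing can be told apart from a file that was never tracked at all.
print(removed_files([], after))
```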

-Daniel
*This .sig left intentionally blank*


Re: [RFC] Removing deleted files after checkout

2005-08-23 Thread Daniel Barkalow
On Tue, 23 Aug 2005, Carl Baldwin wrote:

 On Tue, Aug 23, 2005 at 03:43:56PM -0400, Daniel Barkalow wrote:
  On Tue, 23 Aug 2005, Carl Baldwin wrote:
 
   Hello,
  
   I recently started using git to revision control the source for my
   web-page.  I wrote a post-update hook to checkout the files when I push
   to the 'live' repository.
  
   In this particular context I decided that it was important to me to remove
   deleted files after checking out the new HEAD.  I accomplished this by running
   git-ls-files before and after the checkout.
  
   Is there a better way?  Could there be some way built into git to easily
   find out what files disappear when replacing the current index with one
   from a new tree?  Is there already?  The behavior of git should NOT
   change to delete these files but I would argue that some way should
   exist to query what files disappeared if removing them is desired.
 
  If you don't use -f, git-checkout-script removes deleted files. Using -f
  tells it to ignore the old index, which means that it can't tell the
  difference between removed files and files that weren't tracked at all.

 Maybe I'm doing something wrong.  This does not happen for me.

 I tried a simple test with git v0.99.4...

 cd
 mkdir test-git && cd test-git/
 echo testing | cg-init
 echo contents > file
 git-add-script file
 git-commit-script -m 'testing'

[point 1]

 cd ..
 cg-clone test-git/.git/ test-git2
 cd test-git2
 cg-rm file
 git-commit-script -m 'testing'
 ls

 cg-push
 cd ../test-git
 git-checkout-script

Ah, okay. I think push and checkout don't play that well together;
push changes the ref, which checkout uses to determine what it expects
for the old contents, and then it's confused.

What you probably actually want is:

cd ../test-git
git pull ../test-git2

which will correctly identify before and after, and remove any files that
were removed.

Alternatively, you could do, at point 1:

cp .git/refs/heads/master .git/refs/heads/deployed
git checkout deployed

Then, after the push and cd:

git checkout master
cp .git/refs/heads/master .git/refs/heads/deployed
git checkout deployed

because checkout does remove files if you switch from a branch with them
(e.g., deployed) to one without them (master, after the push).

-Daniel
*This .sig left intentionally blank*


Re: [RFC] Removing deleted files after checkout

2005-08-23 Thread Daniel Barkalow
On Tue, 23 Aug 2005, Carl Baldwin wrote:

 The point is to push and use a post-update hook to do the checkout.  So,
 this won't be possible.

You could have the remote repository be something like
~/git/website.git, and have a hook which does: cd ~/www; git pull
~/git/website.git/. That is, have three things: the directory where you
work on stuff, the central storage location, and the area that the web
server serves, and have the storage location automatically update the web
server area. That's what I do with my website section that's still in CVS,
and the general concept is good (and means that the real repository
isn't somewhere the web server is poking around).

  which will correctly identify before and after, and remove any files that
  were removed.
 
  Alternatively, you could do, at point 1:
 
  cp .git/refs/heads/master .git/refs/heads/deployed
  git checkout deployed

 How to get a post-update hook to do this?  I suppose an update script
 could set this up for the post-update to later use.

If you have deployed checked out, and you push to master in the same
repository, having the hook do "git resolve deployed master 'auto-update'"
should work.

-Daniel
*This .sig left intentionally blank*


Re: [RFC] Stgit - patch history / add extra parents

2005-08-23 Thread Daniel Barkalow
On Tue, 23 Aug 2005, Jan Veldeman wrote:

 Daniel Barkalow wrote:

  On Tue, 23 Aug 2005, Catalin Marinas wrote:
 
  Something is legitimate as a parent if someone took that commit and did
  something to it to get the new commit. The operation which caused the
  change is not specified. But you only want to include it if anyone cares
  about the parent.

 This is indeed what I thought a parent should be used for. In addition,
 I'll try to explain why I would sometimes want to care about some parents:

 I want to track a mainline tree, but have quite a few changes, which shouldn't
 be committed to the mainline immediately (let's call it my development tree).
 This is why I would use stgit. But I would also want to collaborate with
 other developers on this development tree, so I sometimes want to make
 updates of this development tree available to the others. This is where
 current stgit falls short. To easily share this development tree, I want
 some history (not all, only the ones I choose) of this development tree
 included, so that the other developers can easily follow my development.

 The parents which should be visible to the outside, will always be versions
 of my development tree, which I have previously pushed out. My way of
 working would become:
 * make changes, all over the place, using stgit
 * still make changes (none of these gets tracked, intermittent versions are
   lost)
 * having a good day: changes look good, I want to push this out:
   * push my tree out
   * stgit-free (which makes the pushed-out commits the new parents of my
     stgit patches)
 * restart from top

I'm not sure how applicable to this situation stgit really is; I see stgit
as optimized for the case of a patch set which is basically done, where
you want to keep it applicable to the mainline as the mainline advances.

For your application, I'd just have a git branch full of various stuff,
and then generate clean commits by branching mainline, diffing development
against it, cutting the diff down to just what I want to push, and
applying that. Then the clean patch goes into stgit.

 [...]
  This also depends on how exactly freeze is used; if you use it before
  committing a modification to the patch without rebasing, you get:
 
  old-top -> new-top
     ^          ^
      \        /
        bottom
 
  bottom to old-top is the old patch
  bottom to new-top is the new patch
  old-top to new-top is the change to the patch
 
  Then you want to keep new-top as a parent for rebasings until one of these
  is frozen. These links are not interesting to look at, but preserve the
  path to the old-top:new-top change, which is interesting.

 my proposal does something like this, but a little more: not only does it
 keep track of the link between old-top and new-top, it also keeps track of
 the links between old-patch-in-between and new-patch-in-between.
 (This makes sense when the top is being removed or reordered)

I was thinking of this as being the top and bottom commits for a single
tracked patch, not as a whole series. I think patches lower wouldn't be
affected, and patches higher would see this as a rebase.

-Daniel
*This .sig left intentionally blank*


Re: [RFC] Removing deleted files after checkout

2005-08-23 Thread Daniel Barkalow
On Tue, 23 Aug 2005, Carl Baldwin wrote:

 The thing that this doesn't do is remove empty directories when the last
 file is deleted.  I once expressed the opinion in a previous thread that
 directories should be added and removed explicitly in git.  (Thus
 allowing an empty directory to be added).  If this were to happen then
 this case would get handled correctly.  However, if git stays with the
 status quo then I think that git-read-tree -u should be changed to
 remove the empty directory.  This would make it consistent.

I think that git-read-tree -u ought to remove a directory if it removes
the last file (or directory) in it.
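
A small sketch of that behaviour, assuming plain filesystem operations
rather than anything from git-read-tree itself. The function name is
hypothetical. After deleting a file, it walks up the parent directories
and removes each one that has become empty, stopping at the first
directory that still has content.

```python
import os
import tempfile


def remove_tracked_file(root, relpath):
    """Delete root/relpath, then prune parent directories that became empty."""
    os.remove(os.path.join(root, relpath))
    parent = os.path.dirname(relpath)
    while parent:
        try:
            os.rmdir(os.path.join(root, parent))  # raises if not empty
        except OSError:
            break  # still has files (tracked or not); leave it alone
        parent = os.path.dirname(parent)


def demo():
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "a", "b"))
        for rel in ("a/b/only.txt", "a/keep.txt"):
            with open(os.path.join(root, rel), "w") as f:
                f.write("x\n")
        remove_tracked_file(root, "a/b/only.txt")
        # (does a/b still exist?, does a still exist?)
        return (os.path.isdir(os.path.join(root, "a", "b")),
                os.path.isdir(os.path.join(root, "a")))


print(demo())
```

Because os.rmdir refuses to remove a non-empty directory, untracked files
left in a directory keep that directory in place.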

-Daniel
*This .sig left intentionally blank*


Re: Automatic merge failed, fix up by hand

2005-08-23 Thread Daniel Barkalow
On Tue, 23 Aug 2005, Junio C Hamano wrote:

 Only lightly tested, in the sense that I did only this one case
 and nothing else.  For a large repository and with complex
 merges, merge-base -a _might_ end up reporting many
 candidates, in which case the pre-merge step to figure out the
 best merge base may turn out to be disastrously slow.  I dunno.

I think it's the right thing to do for now (and what I was going to
suggest), and if people find it too slow, we can consider teaching
read-tree to take multiple common ancestors and use any of them that gives
clear result on a per-file basis.

On the other hand, Tony might have hit a bad case with an ill-chosen
common ancestor for a patch/revert sequence, and we probably want to look
into that if we've got some history that demonstrates the problem. I think
that, if there are two common ancestors, one of which has applied a patch
and one of which hasn't, and on one side of the merge it gets reverted, we
should get the revert, but we'll only get it if we choose the ancestor
where it was applied.

(Letters are versions of the file, with 'b' being the bad patch; the
 second column is the two choices for common ancestor)

  a-b-a-?
 / X   /
a-b-b-b

Of course, you could have the two lines exactly flipped for a different
file in the same commits, or for a different hunk in the same file, and
there would be no single choice that doesn't lose the revert. The really
right thing to do is identify that there is a b-a transition that is not
a trivial merge and that is not beyond a common ancestor, but that's hard
to determine easily and with sufficient granularity to catch everything.
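
The "exactly flipped" case can be sketched with two files, again treating
each file version as a letter. The file names f and g, the two ancestor
trees and the helper functions are illustrative assumptions, not part of
git. Whichever single ancestor is chosen for the whole tree, one of the
two reverts is lost.

```python
def three_way(base, ours, theirs):
    """File-level three-way merge: the merged version, or None on conflict."""
    if ours == theirs:
        return ours
    if ours == base:
        return theirs
    if theirs == base:
        return ours
    return None


def merge_tree(base, ours, theirs):
    """Merge every path using one common ancestor for the whole tree."""
    return {path: three_way(base[path], ours[path], theirs[path])
            for path in ours}


base_x = {"f": "b", "g": "a"}   # bad patch applied to f only
base_y = {"f": "a", "g": "b"}   # bad patch applied to g only
ours = {"f": "a", "g": "b"}     # our side reverted f
theirs = {"f": "b", "g": "a"}   # their side reverted g
wanted = {"f": "a", "g": "a"}   # both reverts kept

print(merge_tree(base_x, ours, theirs))
print(merge_tree(base_y, ours, theirs))
```

Ancestor X keeps the revert in f but brings "b" back in g; ancestor Y does
the opposite. Only a choice made per file (or per hunk) can produce the
wanted result.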

I still someday want to do a version of diff/merge for git that could
select common ancestors on a per-hunk basis and identify block moves and
avoid giving confusing (but marginally shorter) diffs, but that's a major
undertaking that I don't have time for right now.

-Daniel
*This .sig left intentionally blank*


Re: [RFC] Stgit - patch history / add extra parents

2005-08-22 Thread Daniel Barkalow
On Sun, 21 Aug 2005, Jan Veldeman wrote:

 Catalin Marinas wrote:

   So for example, you only tag (freeze) the history when exporting the
   patches.  When an error is being reported on that version, it's easy to view
   it and also view the progress that has already been made on those patches.
 
  I agree that it is a useful feature to be able to individually tag the
  patches. The problem is how to do this best. Your approach looks to me
  like it's not following the GIT DAG structure recommendation. Maybe the
  GIT designers could further comment on this but a commit object with
  multiple parents should be a result of a merge operation. A commit with
  a single parent should represent a transition of the tree from one state
  to another. With the freeze command you proposed, a commit with multiple
  parents is no longer a result of a merge operation, but just a
  convenience for tracking the patch history with gitk.

 My interpretation of parents is broader than only merges, and reading the
 README file, I believe that is also the intention (snippet from the README
 file):

 A commit object ties such directory hierarchies together into
 a DAG of revisions - each commit is associated with exactly one tree
 (the directory hierarchy at the time of the commit). In addition, a
 commit refers to one or more parent commit objects that describe the
 history of how we arrived at that directory hierarchy.

One factor not mentioned there is that, as things move upstream, we often
want to discard a lot of history; if someone commits constantly to deal
with editor malfunction or something, we don't really want to take all of
this junk into the project history when it is cleaned up and accepted.

So the point is that there are things which are, in fact, parents, but we
don't want to list them, because it's not desired information.

Probably the right thing is to have two views of the stack: the internal
view, showing what actually happened, and the external view, showing what
would have happened if the developers had done everything right the first
time. When you make changes to the series, this adds to the internal view
and entirely replaces the external view.

I think that users will also want to discard the commits from the stack
before rebasing in favor of the commits after, because (a) rebasing isn't
all that interesting, especially if there's minimal merging, and (b)
otherwise you'd get a ton of boring commits that obscure the interesting
ones.

I think that the best rule would be that, when you modify a patch, the
previous version is the new version's parent, and when you rebase a
series, you include as a parent any parent of the input that isn't also in
the input (but never include the input itself as a parent of the output;
the point of rebasing is to pretend that it was the newer mainline that
you modified). This should mean that the internal history of a patch
consists of the present version, based on each version that was replaced
due to changing the patch rather than rebasing it.

Of course, there's an interesting situation with the commits earlier in a
series from a patch that was changed not being ancestors of the newer
versions of those patches (because they weren't interesting in the
development of those patches) but accessible as the commits that an
interesting patch was based on.

A possible solution is just to consider the revision of any patch a
significant event in the history of the whole stack, causing all of the
patches to get a new retained version.

-Daniel
*This .sig left intentionally blank*


Re: Subject: [PATCH] Updates to glossary

2005-08-18 Thread Daniel Barkalow
On Thu, 18 Aug 2005, Johannes Schindelin wrote:

  tree object::
 - An object containing a list of blob and/or tree objects.
 - (A tree usually corresponds to a directory without
 - subdirectories).
 + An object containing a list of file names and modes along with refs
 + to the associated blob and/or tree objects. A tree object is
 + equivalent to a directory.

Actually, it contains object names, not refs, to be completely precise.
(refs would imply an additional indirection.)
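
To make the distinction concrete, here is a sketch of how a tree entry
embeds an object name directly. It assumes the layout used by git as
generally documented (a "type size" header followed by a NUL, and tree
entries of the form "mode name NUL 20-byte-name"); whether every detail
matches the version under discussion is an assumption. The file name is
made up, and sorting by plain name is a simplification.

```python
import hashlib


def object_name(kind, payload):
    """Raw 20-byte object name: SHA1 over a 'kind size\\0' header plus payload."""
    header = kind.encode() + b" " + str(len(payload)).encode() + b"\0"
    return hashlib.sha1(header + payload).digest()


def tree_payload(entries):
    """entries: (mode, filename, raw 20-byte object name) tuples."""
    out = b""
    for mode, name, raw in sorted(entries, key=lambda e: e[1]):
        # The object name is stored inline; no ref is looked up.
        out += mode.encode() + b" " + name.encode() + b"\0" + raw
    return out


blob = object_name("blob", b"hello\n")
payload = tree_payload([("100644", "greeting.txt", blob)])
tree = object_name("tree", payload)

print(blob.hex())
print(len(payload), tree.hex())
```

A ref, by contrast, would be a separate named file holding an object name,
so following it costs one more lookup before reaching the object.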

-Daniel
*This .sig left intentionally blank*


Re: First stab at glossary

2005-08-17 Thread Daniel Barkalow
On Wed, 17 Aug 2005, Johannes Schindelin wrote:

 Hi,

 long, long time. Here's my first stab at the glossary, attached as an
 alphabetically sorted, asciidoc marked-up txt file (Comments?
 Suggestions? Pizzas?):

 object::
   The unit of storage in GIT. It is uniquely identified by
   the SHA1 of its contents. Consequently, an object can not
   be changed.

 SHA1::
   A 20-byte sequence (or 41-byte file containing the hex
   representation and a newline). It is calculated from the
   contents of an object by the Secure Hash Algorithm 1.

It's also often a 40-character string (with whatever termination) in places
like commit objects, tag objects, command-line arguments, listings, and so
forth.
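
The three sizes can be checked quickly. This sketch uses the object name
of an empty blob, assuming the usual "blob 0" header followed by a NUL;
the variable names are only for illustration.

```python
import hashlib

raw = hashlib.sha1(b"blob 0\0").digest()      # binary form, as used inside objects
hex_name = raw.hex()                          # text form, as typed on a command line
ref_file = (hex_name + "\n").encode("ascii")  # hex plus newline, as stored in a file

print(len(raw), len(hex_name), len(ref_file))
print(hex_name)
```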

 object database::
   Stores a set of objects, and an individual object is identified
   by its SHA1 (its ref). The objects are either stored as single
   files, or live inside of packs.

 object name::
   Synonym for SHA1.

Have we killed the use of the third term "hash" for this? I'd say that
"object name" is the standard term, and "SHA1" is a nickname, if only
because "object name" is more descriptive of the particular use of the
term.

 blob object::
   Untyped object, i.e. the contents of a file.

This "i.e." should be "e.g.", since symlink targets are also stored as
blobs, and any other bulk data stored by itself would be. (IIRC, Junio has
a tagged blob to hold his public key, for example)

 tree object::
   An object containing a list of blob and/or tree objects.
   (A tree usually corresponds to a directory without
   subdirectories).

 tree::
   Either a working tree, or a tree object together with the
   dependent blob and tree objects (i.e. a stored representation
   of a working tree).

 cache::
   A collection of files whose contents are stored as objects.
   The cache is a stored version of your working tree. Well, can
   also contain a second, and even a third version of a working
   tree, which are used when merging.

 cache entry::
   The information regarding a particular file, stored in the index.
   A cache entry can be unmerged, if a merge was started, but not
   yet finished (i.e. if the cache contains multiple versions of
   that file).

 index::
   Contains information about the cache contents, in particular
   timestamps and mode flags (stat information) for the files
   stored in the cache. An unmerged index is an index which contains
   unmerged cache entries.

I think we might want to entirely kill the "cache" term, and talk only
about the "index" and "index entries". Of course, a bunch of the code will
have to be renamed to make this completely successful, but we could change
the glossary and documentation, and mention "cache" and "cache entry" as
old names for "index" and "index entry" respectively.

 working tree::
   The set of files and directories currently being worked on.
   Think ls -laR

This is where the data is actually in the filesystem, and you can edit and
compile it (as opposed to a tree object or the index, which semantically
have the same contents, but aren't presented in the filesystem that way).

 directory::
   The list you get with ls :-)

 checkout::
   The action of updating the working tree to a revision which was
   stored in the object database.

Move after revision?

 revision::
   A particular state of files and directories which was stored in
   the object database. It is referenced by a commit object.

 commit::
   The action of storing the current state of the cache in the
   object database. The result is a revision.

 commit object::
   An object which contains the information about a particular
   revision, such as parents, committer, author, date and the
   tree object which corresponds to the top directory of the
   stored revision.

Move "parent" around here.

 changeset::
   BitKeeper/cvsps speak for commit. Since git does not store
   changes, but states, it really does not make sense to use
   the term changesets with git.

 ent::
   Favorite synonym to tree-ish by some total geeks.

Move after tree-ish.

 head::
   The top of a branch. It contains a ref to the corresponding
   commit object.

 branch::
   A non-cyclical graph of revisions, i.e. the complete history of
   a particular revision, which does not (yet) have children, which
   is called the branch head. The branch heads are stored in
   $GIT_DIR/refs/heads/.

A branch head might have children, if they're in another branch. (E.g., I
pull mainline, make a new branch based on it, and commit a change; the
head of mainline is still a branch head, even though it's the parent of my
new commit, because my new commit isn't in mainline.)
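
The example in the parentheses can be modelled with a tiny commit graph.
The commit and branch names are invented for the illustration, and the
helpers are not git functions.

```python
# commit -> list of parent commits
parents = {
    "m1": [],
    "m2": ["m1"],   # current head of mainline
    "t1": ["m2"],   # new commit on a branch based on mainline
}
heads = {"mainline": "m2", "topic": "t1"}


def history(commit):
    """The commit plus everything reachable through parent links."""
    seen, todo = set(), [commit]
    while todo:
        c = todo.pop()
        if c not in seen:
            seen.add(c)
            todo.extend(parents[c])
    return seen


def children(commit):
    return sorted(c for c, ps in parents.items() if commit in ps)


def children_within_branch(branch):
    """Children of the branch head that are themselves part of the branch."""
    head = heads[branch]
    return [c for c in children(head) if c in history(head)]


print(children(heads["mainline"]))
print(children_within_branch("mainline"))
```

The head of mainline has a child (t1), but no child inside mainline's own
history, which is the sense in which it is still a branch head.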

 ref::
   A 40-byte hex representation of a SHA1 pointing to a particular
   object. These are stored in $GIT_DIR/refs/.

 head ref::
   A ref pointing to a head. Often, this 

Re: Git 1.0 Synopis (Draft v4)

2005-08-16 Thread Daniel Barkalow
On Tue, 16 Aug 2005, Johannes Schindelin wrote:

 Hi,

 On Tue, 16 Aug 2005, Junio C Hamano wrote:

- Are all the files in Documentation/ reachable from git(7)
  or otherwise made into a standalone document using asciidoc
  by the Makefile?  I haven't looked into documentation
  generation myself (I use only the text files as they are);
  help to update the Makefile by somebody handy with asciidoc
  suite is greatly appreciated here.
 
  Volunteers?

 The attached script reveals:

 git-unpack-objects.txt is not reachable from git.txt
 git-cvsimport-script.txt is not reachable from git.txt
 git-send-email-script.txt is not reachable from git.txt
 git-rename-script.txt is not reachable from git.txt
 tutorial.txt is not reachable from git.txt
 git-show-index.txt is not reachable from git.txt
 cvs-migration.txt is not reachable from git.txt
 diffcore.txt is not reachable from git.txt
 git-ls-remote-script.txt is not reachable from git.txt
 git-apply.txt is not reachable from git.txt
 git-diff-stages.txt is not reachable from git.txt
 pack-protocol.txt is not reachable from git.txt

The ones that don't start with "git" probably don't belong in the same set;
perhaps there should be a "technical" (or something similar but shorter)
subdirectory for developer documentation instead of user documentation?
(And tutorial and cvs-migration can move to "howto".)

-Daniel
*This .sig left intentionally blank*


Re: [PATCH] Alternate object pool mechanism updates.

2005-08-16 Thread Daniel Barkalow
On Tue, 16 Aug 2005, Linus Torvalds wrote:

 Finally, I have to say that that info directory is confusing. Namely,
 there's two of them - the git info and the object info directories are
 totally different directories - maybe logical, but to me it smells like
 info is here a code-name for misc files that don't make sense anywhere
 else.

 What this all is leading up to is that I think we'd be better off with a
 totally new git config file, in .git/config, and we'd have all the
 startup configuration there. Including things like alternate object
 directories, perhaps standard preferences for that particular repo, and
 things like the grafts thing.

 Wouldn't that be nice?

I'd originally proposed the .git/info directory because I keep multiple
working trees for the same repository, by having symlinks for .git/objects
and .git/refs, and I could also get other per-repository things to be
shared properly without knowing exactly what they are if they're in a
subdirectory of .git that could be a symlink. This would mean that a
.git/config would be per-working-tree, like .git/index or .git/HEAD, not
per-repository like .git/info/config. Of course, the core didn't have
anything to go in .git/info at the time, so it didn't really get tacked
down.

(I find it convenient to have mainline and my latest work both checked out
for reference while I'm generating a series of commits for a patch set,
and I don't want three different repositories which could be out of sync;
this also keeps the repository safely out of pwd, since I have the actual
repositories as ~/git/{project}.git/)
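
A sketch of that layout, built in a temporary directory. The directory
names, the helper functions and the empty placeholder index are
assumptions for illustration; only the idea of symlinking the
per-repository parts (objects, refs, info) into each working tree's .git
comes from the description above.

```python
import os
import tempfile

SHARED = ("objects", "refs", "info")  # per-repository parts, shared via symlinks


def make_repository(path):
    for name in SHARED:
        os.makedirs(os.path.join(path, name))


def attach_working_tree(repo, worktree):
    """Create worktree/.git whose shared parts are symlinks into repo."""
    gitdir = os.path.join(worktree, ".git")
    os.makedirs(gitdir)
    for name in SHARED:
        os.symlink(os.path.join(repo, name), os.path.join(gitdir, name))
    # Placeholder for per-working-tree state; a real index is written by git.
    open(os.path.join(gitdir, "index"), "wb").close()
    return gitdir


def demo():
    with tempfile.TemporaryDirectory() as top:
        repo = os.path.join(top, "project.git")
        make_repository(repo)
        first = attach_working_tree(repo, os.path.join(top, "mainline"))
        second = attach_working_tree(repo, os.path.join(top, "work"))
        # Something written under info/ through one working tree...
        with open(os.path.join(first, "info", "config"), "w") as f:
            f.write("shared\n")
        # ...is visible through the other, while each index stays private.
        shared_seen = os.path.exists(os.path.join(second, "info", "config"))
        private_index = not os.path.samefile(
            os.path.join(first, "index"), os.path.join(second, "index"))
        return shared_seen, private_index


print(demo())
```

A file placed directly in .git (such as .git/config) would be a separate
copy per working tree, which is the difference being pointed out.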

-Daniel
*This .sig left intentionally blank*


Re: [RFC PATCH] Add support for figuring out where in the git archive we are

2005-08-16 Thread Daniel Barkalow
On Tue, 16 Aug 2005, Linus Torvalds wrote:

 If you use the GIT_DIR environment variable approach, it assumes that all
 filenames you give it are absolute and acts the way it always did before.

 Comments? Like? Dislike?

I'm all in favor, at least in the general case. I suspect there'll be some
things where we have to discuss the behavior, but we can argue that when
it comes up.

I think, slightly before 1.0, we should sort the library functions into a
new set of object files with matching header files, because setup is not
really distinctive, and there's at least one duplicate implementation
(the ssh subprocess code in your connect.c is the same as my rsh.c in what
it does, although yours uses two pipes and mine uses a socket).

-Daniel
*This .sig left intentionally blank*


Re: Cloning speed comparison

2005-08-15 Thread Daniel Barkalow
On Sat, 13 Aug 2005, Petr Baudis wrote:

   Hello,
 
   I've wondered how slow the protocols other than rsync are, and the
 (well, a bit dubious; especially wrt. caching on the remote side)
 results are:
 
   git     clone-pack:ssh  25s
   git     rsync           27s
   git     http-pull       47s
   git     dumb-http       54s
   git     ssh-pull        660s
 
   cogito  clone-pack:ssh  35s (!)
   cogito  rsync           140s
   cogito  ssh-pull        480s
   cogito  http-pull       extrapolated to about an hour!

I should be able to get http-pull down to the neighborhood of 
(current) ssh-pull; http-pull is that slow (when the source repository 
isn't packed) because it's entirely sequential, rather than overlapping 
requests like ssh-pull now does.

I should also be able to get ssh-pull down to the area of clone-pack, but 
that's lower-priority, since there's clone-pack.

(I've written an untested patch for local-pull, which I'll be testing, 
cleaning, and submitting tonight, assuming my newly-arrived monitor 
actually works)

   PS:
   With the latest git version as of time of writing this:
   $ time cg-clone git+ssh://[EMAIL PROTECTED]/home/pasky/WWW/dev/git/.g cogito
   ...
   progress: 5759 objects, 10292457 bytes
   $ time cg-clone http://localhost/~pasky/dev/git/.g cogito
   ...
   progress: 8681 objects, 14881571 bytes

I've noticed that ssh connections don't actually disconnect at the end 
with recent versions of ssh sometimes. In my experience, this occasionally 
happens with git, but always happens with scp, suggesting that it's an ssh 
bug of some sort; I've also only noticed this with openssh 3.9_p1 with 
some of Gentoo's -r2 patches.

-Daniel
*This .sig left intentionally blank*


Re: Git 1.0 Synopis (Draft v4)

2005-08-15 Thread Daniel Barkalow
On Mon, 15 Aug 2005, Junio C Hamano wrote:

 Ryan Anderson [EMAIL PROTECTED] writes:
 
  I was waiting until you said, Ok, 1.00 tomorrow morning
 
 Makes sense.  There would be some weeks until that happens I am
 afraid.

It might be worth putting the list of things left to do before 1.0 in the 
tree (since they clearly covary), and it would be useful to know what 
you're thinking of as preventing the release at any particular stage.

-Daniel
*This .sig left intentionally blank*


Re: Cloning speed comparison

2005-08-15 Thread Daniel Barkalow
On Mon, 15 Aug 2005, Junio C Hamano wrote:

 Daniel Barkalow [EMAIL PROTECTED] writes:
 
  I should be able to get http-pull down to the neighborhood of 
  (current) ssh-pull; http-pull is that slow (when the source repository 
  isn't packed) because it's entirely sequential, rather than overlapping 
  requests like ssh-pull now does.
 
 I like those prefetch() and process() code in pull.c very much.
 
 I have been wondering if increasing parallelism more by
 prefetching beyond the immediate parents of the current commit,
 in if (get_history) part of process_commit().  Maybe it is not
 worth it because doing a commit, its associated tree(s) and its
 parents would already give us enough parallelism already.

It is actually already maxing out the parallelism; it has a FIFO of 
objects which it needs, and calls prefetch() when it enqueues an object 
and fetch() when it dequeues it. It only cares about the dependencies for 
this purpose, not the types.
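
To make the queueing concrete, here is a minimal sketch in C of that kind
of FIFO. It is not the actual pull.c code: objects are plain ints, and
prefetch() and fetch() are hypothetical stubs that only count their calls,
standing in for whatever a real transport would do.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for a transport's hooks; they only count calls. */
static int prefetch_calls, fetch_calls;

/* Start a request for the object without waiting for it. */
static void prefetch(int id) { (void)id; prefetch_calls++; }

/* Wait until the object has arrived; 0 means success. */
static int fetch(int id) { (void)id; fetch_calls++; return 0; }

struct node {
	int id;
	struct node *next;
};

static struct node *queue_head;
static struct node **queue_tail = &queue_head;

/* Append an object we now know we need and start fetching it right away. */
static int enqueue(int id)
{
	struct node *n = malloc(sizeof(*n));
	if (!n)
		return -1;
	n->id = id;
	n->next = NULL;
	*queue_tail = n;
	queue_tail = &n->next;
	prefetch(id);
	return 0;
}

/* Take the oldest object and wait for it. Returns its id, or -1 if the
 * queue is empty or the fetch failed. */
static int dequeue(void)
{
	struct node *n = queue_head;
	int id;

	if (!n)
		return -1;
	queue_head = n->next;
	if (!queue_head)
		queue_tail = &queue_head;
	id = n->id;
	free(n);
	return fetch(id) ? -1 : id;
}
```

Since enqueue() starts the request and dequeue() only waits for it, every
object still in the queue can be in flight while an earlier one is being
processed.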

-Daniel
*This .sig left intentionally blank*


[PATCH] Add function to read an index file from an arbitrary filename.

2005-08-15 Thread Daniel Barkalow
Note that the pack file has to be in the usual location if it gets
installed later.

Signed-off-by: Daniel Barkalow [EMAIL PROTECTED]
---

 cache.h     |    2 ++
 sha1_file.c |   10 ++++++++--
 2 files changed, 10 insertions(+), 2 deletions(-)

59e5c6d163edae5da6136560d48a4750cceacdc6
diff --git a/cache.h b/cache.h
--- a/cache.h
+++ b/cache.h
@@ -319,6 +319,8 @@ extern int get_ack(int fd, unsigned char
 extern struct ref **get_remote_heads(int in, struct ref **list, int nr_match, char **match);
 
 extern struct packed_git *parse_pack_index(unsigned char *sha1);
+extern struct packed_git *parse_pack_index_file(unsigned char *sha1, 
+   char *idx_path);
 
 extern void prepare_packed_git(void);
 extern void install_packed_git(struct packed_git *pack);
diff --git a/sha1_file.c b/sha1_file.c
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -476,12 +476,18 @@ struct packed_git *add_packed_git(char *
 
 struct packed_git *parse_pack_index(unsigned char *sha1)
 {
+   char *path = sha1_pack_index_name(sha1);
+   return parse_pack_index_file(sha1, path);
+}
+
+struct packed_git *parse_pack_index_file(unsigned char *sha1, char *idx_path)
+{
struct packed_git *p;
unsigned long idx_size;
void *idx_map;
-   char *path = sha1_pack_index_name(sha1);
+   char *path;
 
-   if (check_packed_git_idx(path, &idx_size, &idx_map))
+   if (check_packed_git_idx(idx_path, &idx_size, &idx_map))
return NULL;
 
path = sha1_pack_name(sha1);



[PATCH] Support packs in local-pull

2005-08-15 Thread Daniel Barkalow
If it doesn't find an object, it looks for an index that contains it
and uses the same methods on that instead.

Signed-off-by: Daniel Barkalow [EMAIL PROTECTED]
---

 local-pull.c |  112 +++---
 1 files changed, 91 insertions(+), 21 deletions(-)

aafbc7fb9ae059b9c9afa42e8d2c0548ea960f9f
diff --git a/local-pull.c b/local-pull.c
--- a/local-pull.c
+++ b/local-pull.c
@@ -15,34 +15,54 @@ void prefetch(unsigned char *sha1)
 {
 }
 
-int fetch(unsigned char *sha1)
+static struct packed_git *packs = NULL;
+
+void setup_index(unsigned char *sha1)
 {
-   static int object_name_start = -1;
-   static char filename[PATH_MAX];
-   char *hex = sha1_to_hex(sha1);
-   const char *dest_filename = sha1_file_name(sha1);
+   struct packed_git *new_pack;
+   char filename[PATH_MAX];
+   strcpy(filename, path);
+   strcat(filename, "/objects/pack/pack-");
+   strcat(filename, sha1_to_hex(sha1));
+   strcat(filename, ".idx");
+   new_pack = parse_pack_index_file(sha1, filename);
+   new_pack->next = packs;
+   packs = new_pack;
+}
 
-   if (object_name_start < 0) {
-   strcpy(filename, path); /* e.g. git.git */
-   strcat(filename, "/objects/");
-   object_name_start = strlen(filename);
+int setup_indices()
+{
+   DIR *dir;
+   struct dirent *de;
+   char filename[PATH_MAX];
+   unsigned char sha1[20];
+   sprintf(filename, "%s/objects/pack/", path);
+   dir = opendir(filename);
+   while ((de = readdir(dir)) != NULL) {
+   int namelen = strlen(de->d_name);
+   if (namelen != 50 ||
+   strcmp(de->d_name + namelen - 5, ".pack"))
+   continue;
+   get_sha1_hex(sha1, de->d_name + 5);
+   setup_index(sha1);
}
-   filename[object_name_start+0] = hex[0];
-   filename[object_name_start+1] = hex[1];
-   filename[object_name_start+2] = '/';
-   strcpy(filename + object_name_start + 3, hex + 2);
+   return 0;
+}
+
+int copy_file(const char *source, const char *dest, const char *hex)
+{
if (use_link) {
-   if (!link(filename, dest_filename)) {
+   if (!link(source, dest)) {
pull_say("link %s\n", hex);
return 0;
}
/* If we got ENOENT there is no point continuing. */
if (errno == ENOENT) {
-   fprintf(stderr, "does not exist %s\n", filename);
+   fprintf(stderr, "does not exist %s\n", source);
return -1;
}
}
-   if (use_symlink && !symlink(filename, dest_filename)) {
+   if (use_symlink && !symlink(source, dest)) {
pull_say("symlink %s\n", hex);
return 0;
}
@@ -50,25 +70,25 @@ int fetch(unsigned char *sha1)
int ifd, ofd, status;
struct stat st;
void *map;
-   ifd = open(filename, O_RDONLY);
+   ifd = open(source, O_RDONLY);
if (ifd < 0 || fstat(ifd, &st) < 0) {
close(ifd);
-   fprintf(stderr, "cannot open %s\n", filename);
+   fprintf(stderr, "cannot open %s\n", source);
return -1;
}
map = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, ifd, 0);
close(ifd);
if (map == MAP_FAILED) {
-   fprintf(stderr, "cannot mmap %s\n", filename);
+   fprintf(stderr, "cannot mmap %s\n", source);
return -1;
}
-   ofd = open(dest_filename, O_WRONLY | O_CREAT | O_EXCL, 0666);
+   ofd = open(dest, O_WRONLY | O_CREAT | O_EXCL, 0666);
status = ((ofd < 0) ||
  (write(ofd, map, st.st_size) != st.st_size));
munmap(map, st.st_size);
close(ofd);
if (status)
-   fprintf(stderr, "cannot write %s\n", dest_filename);
+   fprintf(stderr, "cannot write %s\n", dest);
else
pull_say("copy %s\n", hex);
return status;
@@ -77,6 +97,56 @@ int fetch(unsigned char *sha1)
return -1;
 }
 
+int fetch_pack(unsigned char *sha1)
+{
+   struct packed_git *target;
+   char filename[PATH_MAX];
+   if (setup_indices())
+   return -1;
+   target = find_sha1_pack(sha1, packs);
+   if (!target)
+   return error("Couldn't find %s: not separate or in any pack",
+sha1_to_hex(sha1));
+   if (get_verbosely) {
+   fprintf(stderr, "Getting pack %s\n",
+   sha1_to_hex(target->sha1));
+   fprintf(stderr, " which contains %s\n",
+   sha1_to_hex(sha1

Re: [OT?] git tools at SourceForge ?

2005-08-12 Thread Daniel Barkalow
On Fri, 12 Aug 2005, Wolfgang Denk wrote:

 This is somewhat off topic here, so I apologize, but  I  didn't  know
 any better place to ask:
 
 Has anybody any information if SourceForge is going to provide git  /
 cogito / ... for the projects they host? I asked SF, and they opened
 a new Feature Request (item #1252867); the message I received sounded
 as if I was the first person on the planet to ask...
 
 Am I really alone with this?

The git architecture makes the central server less important, and it's 
easy to run your own. Also, kernel.org is providing space to a set of 
people with a large overlap with git users, since git hasn't been 
particularly publicized and kernel.org is hosting git.

-Daniel
*This .sig left intentionally blank*


Re: [OT?] git tools at SourceForge ?

2005-08-12 Thread Daniel Barkalow
On Fri, 12 Aug 2005, Linus Torvalds wrote:

 And it's possible that git usage won't expand all that much either. But
 quite frankly, I think git is a lot better than CVS (or even SVN) by now,
 and I wouldn't be surprised if it started getting some use outside of the
 git-only and kernel projects once people start getting more used to it. 
 And so I'd be thrilled to have some site like SF support it.

I certainly think it's going to happen; it's just not surprising that it 
hasn't happened yet. Once there's a stable release and some publicity, I'd 
expect SF to see it as worthwhile. But a hosting site with git-only shell 
access needs to know what the necessary programs are going to be, which we 
haven't committed to quite yet.

-Daniel
*This .sig left intentionally blank*


[PATCH] Re: git-http-pull broken in latest git

2005-08-11 Thread Daniel Barkalow
On Thu, 11 Aug 2005, Junio C Hamano wrote:

 Petr Baudis [EMAIL PROTECTED] writes:
 
  $ git-cat-file commit bf570303153902ec3d85570ed24515bcf8948848 | grep tree
  tree 41f10531f1799bbb31a1e0f7652363154ce96f45
  $ git-read-tree 41f10531f1799bbb31a1e0f7652363154ce96f45
  fatal: failed to unpack tree object 41f10531f1799bbb31a1e0f7652363154ce96f45
 
  Kaboom. I think the issue might be that the reference dependency tree
  building is broken and it should've pulled the other pack as well.
 
 Last time I checked, git-http-pull did not utilize the pack
 dependency information, which indeed is wrong. 

Is there documentation on the format?

 When it decides to fetch a pack instead of an asked-for object, it 
 should check which commits the pack expects to have in your local 
 repository and add them to its list of things to slurp.

It should work anyway, except that I messed up some logic in the parallel 
pull stuff; when it finds it has something already, it ignores it 
entirely, rather than processing it. The following patch fixes this.
---
[PATCH] Fix parallel pull dependancy tracking.

It didn't refetch an object it already had (good), but didn't process
it, either (bad). Synchronously process anything you already have.

Signed-off-by: Daniel Barkalow [EMAIL PROTECTED]
---

 pull.c |   57 -
 1 files changed, 32 insertions(+), 25 deletions(-)

9b6b4b259c6b00d5b2502c158bc800d7623352bc
diff --git a/pull.c b/pull.c
--- a/pull.c
+++ b/pull.c
@@ -98,12 +98,38 @@ static int process_tag(struct tag *tag)
 static struct object_list *process_queue = NULL;
 static struct object_list **process_queue_end = &process_queue;
 
-static int process(unsigned char *sha1, const char *type)
+static int process_object(struct object *obj)
 {
-   struct object *obj;
-   if (has_sha1_file(sha1))
+   if (obj->type == commit_type) {
+   if (process_commit((struct commit *)obj))
+   return -1;
+   return 0;
+   }
+   if (obj->type == tree_type) {
+   if (process_tree((struct tree *)obj))
+   return -1;
return 0;
-   obj = lookup_object_type(sha1, type);
+   }
+   if (obj->type == blob_type) {
+   return 0;
+   }
+   if (obj->type == tag_type) {
+   if (process_tag((struct tag *)obj))
+   return -1;
+   return 0;
+   }
+   return error("Unable to determine requirements "
+"of type %s for %s",
+obj->type, sha1_to_hex(obj->sha1));
+}
+
+static int process(unsigned char *sha1, const char *type)
+{
+   struct object *obj = lookup_object_type(sha1, type);
+   if (has_sha1_file(sha1)) {
+   /* We already have it, so we should scan it now. */
+   return process_object(obj);
+   }
if (object_list_contains(process_queue, obj))
return 0;
object_list_insert(obj, process_queue_end);
@@ -134,27 +160,8 @@ static int loop(void)
return -1;
if (!obj->type)
parse_object(obj->sha1);
-   if (obj->type == commit_type) {
-   if (process_commit((struct commit *)obj))
-   return -1;
-   continue;
-   }
-   if (obj->type == tree_type) {
-   if (process_tree((struct tree *)obj))
-   return -1;
-   continue;
-   }
-   if (obj->type == blob_type) {
-   continue;
-   }
-   if (obj->type == tag_type) {
-   if (process_tag((struct tag *)obj))
-   return -1;
-   continue;
-   }
-   return error("Unable to determine requirements "
-"of type %s for %s",
-obj->type, sha1_to_hex(obj->sha1));
+   if (process_object(obj))
+   return -1;
}
return 0;
 }


Re: [PATCH] Re: git-http-pull broken in latest git

2005-08-11 Thread Daniel Barkalow
On Thu, 11 Aug 2005, Junio C Hamano wrote:

 Daniel Barkalow [EMAIL PROTECTED] writes:
 
  It should work anyway,...
 
 That is true.  Please forget about the recommendation to slurp
 packs and not falling back on commit walker.
 
 Thanks for the patch.

No problem; I had been wondering what the rest of those lines were about 
anyway.

-Daniel
*This .sig left intentionally blank*


Re: Bootstrapping into git, commit gripes at me

2005-07-12 Thread Daniel Barkalow
On Mon, 11 Jul 2005, Junio C Hamano wrote:

 Linus Torvalds [EMAIL PROTECTED] writes:
 
  But what about the branch name? Should we just ask the user? Together with 
  a flag, like
 
  git checkout -b new-branch v2.6.12
 
  for somebody who wants to specify the branch name? Or should we pick a 
  random name and add a helper function to rename a branch later?
 
  Opinions?
 
 How about treating master a temporary thing --- whatever I
 happen to be working on right now?

That conflicts with my usage, where I have a single repository for all of
my working directories, with .git/refs and .git/objects being symlinks to 
it, but .git/HEAD being different for each branch. The stuff in objects/
and refs/ really shouldn't depend on what you're currently doing for this
reason.

My way of thinking of master is that it's a real branch, which is for
all of the situations where you aren't using a specially-designated
branch. For many people, they only do stuff that's not designated
specially; Jeff only does stuff that is designated specially. But if you
do both, you'll want master to be left alone while you work on the side
branch.

-Daniel
*This .sig left intentionally blank*



Re: [PATCH 0/2] Support for packs in HTTP

2005-07-11 Thread Daniel Barkalow
On Mon, 11 Jul 2005, Linus Torvalds wrote:

 
 
 On Mon, 11 Jul 2005, Daniel Barkalow wrote:
  On Sun, 10 Jul 2005, Linus Torvalds wrote:
  
   
   You really _mustn't_ try to create the pack directly to the
   $GIT_DIR/objects/pack subdirectory - that would make git itself start
   possibly using that pack before the index is all done, and that would be
   just wrong and nasty.
  
   So you really should _always_ generate the pack somewhere else, and then 
   move it (pack file first, index file second).
  
  It's currently fine ignoring index files without corresponding
  pack files (sha1_file.c, line 470).
 
 That doesn't help.

Well, it means that the order you move them doesn't matter, because it
will ignore the pair if either hasn't been moved.

 Regardless of which order you write them (and you _will_ write the 
 pack-file first), you'll find that at some point you have both files, but 
 one or the other isn't fully written, ie they are unusable.

(Off topic: note that git-http-pull writes the _index_ first, because it
fetches it to determine if it should fetch the pack)

 And yes, you can handle that by always checking the SHA1 of the files when 
 you open them, but the fact is, you shouldn't need to, just to use it. 
 Checking the SHA1 of the pack-file in particular is very expensive (since 
 it's potentially a huge file, and you don't even want to read all of it).

IIRC, we check the size of the pack file and there are hashes around the
ends of the two files which have to match; but this is a die() check, not
an ignore check, so we just crash with a clear error message rather than
doing crazy stuff (like reading from beyond the end of the mmap).
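
As an illustration of that check, here is a sketch in C. It assumes the
layout implied by the memcmp() in use_packed_git(): the pack ends with a
20-byte hash, and the index records that same hash in the 20 bytes that
start 40 bytes before its own end. The function name and the in-memory
buffers are hypothetical, and unlike the library this returns 0 instead of
calling die().

```c
#include <stddef.h>
#include <string.h>

#define HASH_LEN 20

/*
 * Return 1 if the index and the pack appear to belong together, else 0.
 * Assumption: the pack's last 20 bytes are its hash, and the index stores
 * a copy of it in the 20 bytes beginning 40 bytes before the index's end.
 */
static int pack_matches_index(const unsigned char *idx, size_t idx_size,
			      const unsigned char *pack, size_t pack_size)
{
	/* Files too short to hold the trailers cannot be complete. */
	if (idx_size < 2 * HASH_LEN || pack_size < HASH_LEN)
		return 0;
	return memcmp(idx + idx_size - 2 * HASH_LEN,
		      pack + pack_size - HASH_LEN, HASH_LEN) == 0;
}
```

A comparison like this only touches the ends of the two files, which is why
it is cheap compared with hashing the whole pack.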

 So that's what I decided the rule is: never ever have a partial file, and 
 thus you can by definition use them immediately when you see both files.
 
 But that requires that you write them under another name than the final 
 one. And since you want that _anyway_ for other uses, you don't hide that 
 inside git-pack-objects, but you make it an exported interface.

We should never write anything under the final name, anyway, for just this
reason; we already use open/write/close/rename for objects, refs, and
cache (maybe not working directory files, though). I think we're actually
agreeing on this.

My position is that the temporary location should be something like
{final-name}.part, such that it doesn't match *.idx or *.pack beforehand
(so it doesn't look like a complete file that you might want to send to
someone) and it doesn't have to worry about EXDEV on the rename. Also, I
would ideally like to be able to resume an interrupted download, which
means that it would have to find the partial file in a predictable
location, given what it's supposed to contain.
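
A rough sketch of that scheme in C, using only stdio. The function name
write_then_rename() and the fixed-size name buffer are assumptions made for
the example; it does not implement resuming, it just leaves the .part file
behind on failure so that something else could.

```c
#include <stdio.h>

/*
 * Write `len` bytes to "<final_name>.part" and rename it to final_name
 * only after the data is completely written. Returns 0 on success, -1 on
 * error. On a write error the .part file is left in place.
 */
static int write_then_rename(const char *final_name, const void *data,
			     size_t len)
{
	char tmp[4096];
	FILE *f;
	int n;

	n = snprintf(tmp, sizeof(tmp), "%s.part", final_name);
	if (n < 0 || (size_t)n >= sizeof(tmp))
		return -1; /* name does not fit in the buffer */

	f = fopen(tmp, "wb");
	if (!f)
		return -1;
	if (fwrite(data, 1, len, f) != len) {
		fclose(f);
		return -1;
	}
	if (fclose(f) != 0)
		return -1;

	/* Both names are in the same directory, so no EXDEV here. */
	if (rename(tmp, final_name) != 0)
		return -1;
	return 0;
}
```

Because the temporary name ends in .part, nothing that looks for *.idx or
*.pack will pick it up before the rename.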

-Daniel
*This .sig left intentionally blank*



Re: [PATCH 2/2] Demo support for packs via HTTP

2005-07-11 Thread Daniel Barkalow
On Mon, 11 Jul 2005, Darrin Thompson wrote:

 On Sun, 2005-07-10 at 15:56 -0400, Daniel Barkalow wrote:
  +   curl_easy_setopt(curl, CURLOPT_FILE, indexfile);
  +   curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, fwrite);
  +   curl_easy_setopt(curl, CURLOPT_URL, url);
 
 I was hoping to send in a patch which would turn on user auth and turn
 off ssl peer verification.
 
 Your (preliminary obviously) patch puts curl handling in two places. Is
 there a place where I can safely start working on adding the needed
 setopts?

If I understand the curl documentation, you should be able to set options 
on the curl object when it has just been created, if those options aren't
going to change between requests. Note that I make requests from multiple
places, but I use the same curl object for all of them.

-Daniel
*This .sig left intentionally blank*



[PATCH 1/2] Management of packs not yet installed

2005-07-10 Thread Daniel Barkalow
Support for parsing index files without pack files, installing pack
files while running, and checking what pack files are available.

Signed-off-by: Daniel Barkalow [EMAIL PROTECTED]

---
commit b686d7a0377c24e05dbed0dafe909dda6c3dfb48
tree ce285b1a0adb4f8d415f72668a77bc1f1f92e1e1
parent 167a4a3308f4a1606e268c2204c98d6999046ae0
author Daniel Barkalow [EMAIL PROTECTED] 1121024924 -0400
committer Daniel Barkalow [EMAIL PROTECTED](none) 1121024924 -0400

Index: cache.h
===
--- dbae854c7c91182c8a124d0b85d802945d1c6223/cache.h  (mode:100644 sha1:84d43d366c6145a30865aa65d92ada88ab95bb9f)
+++ ce285b1a0adb4f8d415f72668a77bc1f1f92e1e1/cache.h  (mode:100644 sha1:719a77dfabb24e58abd21b7f3a4b846a114e000a)
@@ -161,6 +161,8 @@
 extern char *mkpath(const char *fmt, ...);
 extern char *git_path(const char *fmt, ...);
 extern char *sha1_file_name(const unsigned char *sha1);
+extern char *sha1_pack_name(const unsigned char *sha1);
+extern char *sha1_pack_index_name(const unsigned char *sha1);
 
 int safe_create_leading_directories(char *path);
 
@@ -189,6 +191,9 @@
 extern int has_sha1_pack(const unsigned char *sha1);
 extern int has_sha1_file(const unsigned char *sha1);
 
+extern int has_pack_file(const unsigned char *sha1);
+extern int has_pack_index(const unsigned char *sha1);
+
 /* Convert to/from hex/sha1 representation */
 extern int get_sha1(const char *str, unsigned char *sha1);
 extern int get_sha1_hex(const char *hex, unsigned char *sha1);
@@ -260,6 +265,7 @@
void *pack_base;
unsigned int pack_last_used;
unsigned int pack_use_cnt;
+   unsigned char sha1[20];
char pack_name[0]; /* something like .git/objects/pack/x.pack */
 } *packed_git;
 
@@ -274,7 +280,14 @@
 extern int path_match(const char *path, int nr, char **match);
 extern int get_ack(int fd, unsigned char *result_sha1);
 
+extern struct packed_git *parse_pack_index(unsigned char *sha1);
+
 extern void prepare_packed_git(void);
+extern void install_packed_git(struct packed_git *pack);
+
+struct packed_git *find_sha1_pack(const unsigned char *sha1, 
+ struct packed_git *packs);
+
 extern int use_packed_git(struct packed_git *);
 extern void unuse_packed_git(struct packed_git *);
 extern struct packed_git *add_packed_git(char *, int);
Index: sha1_file.c
===
--- dbae854c7c91182c8a124d0b85d802945d1c6223/sha1_file.c  (mode:100644 sha1:b2914dd2ea629ae974fd4b4c272e77cb04e5c0e0)
+++ ce285b1a0adb4f8d415f72668a77bc1f1f92e1e1/sha1_file.c  (mode:100644 sha1:27136fdba0fbf2dd943f2634cb49660cdbf95ec4)
@@ -200,6 +200,56 @@
return base;
 }
 
+char *sha1_pack_name(const unsigned char *sha1)
+{
+   static const char hex[] = "0123456789abcdef";
+   static char *name, *base, *buf;
+   int i;
+
+   if (!base) {
+   const char *sha1_file_directory = get_object_directory();
+   int len = strlen(sha1_file_directory);
+   base = xmalloc(len + 60);
+   sprintf(base, "%s/pack/pack-1234567890123456789012345678901234567890.pack", sha1_file_directory);
+   name = base + len + 11;
+   }
+
+   buf = name;
+
+   for (i = 0; i < 20; i++) {
+   unsigned int val = *sha1++;
+   *buf++ = hex[val >> 4];
+   *buf++ = hex[val & 0xf];
+   }
+   
+   return base;
+}
+
+char *sha1_pack_index_name(const unsigned char *sha1)
+{
+   static const char hex[] = "0123456789abcdef";
+   static char *name, *base, *buf;
+   int i;
+
+   if (!base) {
+   const char *sha1_file_directory = get_object_directory();
+   int len = strlen(sha1_file_directory);
+   base = xmalloc(len + 60);
+   sprintf(base, "%s/pack/pack-1234567890123456789012345678901234567890.idx", sha1_file_directory);
+   name = base + len + 11;
+   }
+
+   buf = name;
+
+   for (i = 0; i < 20; i++) {
+   unsigned int val = *sha1++;
+   *buf++ = hex[val >> 4];
+   *buf++ = hex[val & 0xf];
+   }
+   
+   return base;
+}
+
 struct alternate_object_database *alt_odb;
 
 /*
@@ -360,6 +410,14 @@
 
 int use_packed_git(struct packed_git *p)
 {
+   if (!p->pack_size) {
+   struct stat st;
+   // We created the struct before we had the pack
+   stat(p->pack_name, &st);
+   if (!S_ISREG(st.st_mode))
+   die("packfile %s not a regular file", p->pack_name);
+   p->pack_size = st.st_size;
+   }
if (!p->pack_base) {
int fd;
struct stat st;
@@ -387,8 +445,10 @@
 * this is cheap.
 */
if (memcmp((char*)(p->index_base) + p->index_size - 40,
-  p->pack_base + p->pack_size - 20, 20))
+  p

[PATCH 0/2] Support for packs in HTTP

2005-07-10 Thread Daniel Barkalow
This series has one patch which is ready to go in and one that's not
(although it's a reasonable phony for the current state of the git world).

 1: Several additional functions are needed in the library to support
progressively getting pack data from some remote location and using it
to determine what else to get.

 2: git-http-pull can get packs as appropriate by getting all the index
files first, and then using them to figure out whether the object it's
looking for is in some pack it could get.

Currently, there's no sane way to figure out what pack/index files are
available from an HTTP server. But there only seems to be one pack file
available on an HTTP server at the moment, so this tries to get that
one.
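
For illustration, the lookup that item 2 depends on could be sketched in C
as below. This is a simplified stand-in, not the library's
find_sha1_pack(): struct pack_index, find_pack() and the in-memory sorted
array of object names are all hypothetical, and the real index file format
is not modelled.

```c
#include <stdlib.h>
#include <string.h>

#define HASH_LEN 20

/* Hypothetical, simplified view of one downloaded index file. */
struct pack_index {
	const unsigned char (*names)[HASH_LEN]; /* sorted ascending by memcmp */
	size_t nr;                              /* number of names */
	struct pack_index *next;
};

static int cmp_name(const void *key, const void *elem)
{
	return memcmp(key, elem, HASH_LEN);
}

/* Return the first index in the list that mentions `sha1`, or NULL. */
static struct pack_index *find_pack(const unsigned char *sha1,
				    struct pack_index *list)
{
	for (; list; list = list->next)
		if (list->nr &&
		    bsearch(sha1, list->names, list->nr, HASH_LEN, cmp_name))
			return list;
	return NULL;
}
```

Only the index files need to be present for this search; the pack itself is
fetched once an index says it holds the wanted object.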

-Daniel
*This .sig left intentionally blank*



[PATCH 2/2] Demo support for packs via HTTP

2005-07-10 Thread Daniel Barkalow
Support for downloading the pack file
e3117bbaf6a59cb53c3f6f0d9b17b9433f0e4135 when appropriate. (Will
support other pack files when the repository has a list of them.)

Signed-off-by: Daniel Barkalow [EMAIL PROTECTED]

---
commit 74132562a2f6cfce9690a5091de7e85bd51d88af
tree c0ae9cb936abac4412aa4a89928f4609d111fd2c
parent b686d7a0377c24e05dbed0dafe909dda6c3dfb48
author Daniel Barkalow [EMAIL PROTECTED] 1121024943 -0400
committer Daniel Barkalow [EMAIL PROTECTED](none) 1121024943 -0400

Index: http-pull.c
===
--- ce285b1a0adb4f8d415f72668a77bc1f1f92e1e1/http-pull.c  (mode:100644 sha1:1f9d60b9b1d5eed85b24d96c240666bbfc5a22ed)
+++ c0ae9cb936abac4412aa4a89928f4609d111fd2c/http-pull.c  (mode:100644 sha1:2a8d7e71d9447483668cb4a1eb01a096e736f8e3)
@@ -56,6 +56,126 @@
return size;
 }
 
+static int got_indices = 0;
+
+static struct packed_git *packs = NULL;
+
+static int fetch_index(unsigned char *sha1)
+{
+   char *filename;
+   char *url;
+
+   FILE *indexfile;
+
+   if (has_pack_index(sha1))
+   return 0;
+
+   if (get_verbosely)
+   fprintf(stderr, "Getting index for pack %s\n",
+   sha1_to_hex(sha1));
+
+   url = xmalloc(strlen(base) + 64);
+   sprintf(url, "%s/objects/pack/pack-%s.idx",
+   base, sha1_to_hex(sha1));
+
+   filename = sha1_pack_index_name(sha1);
+   indexfile = fopen(filename, "w");
+   if (!indexfile)
+   return error("Unable to open local file %s for pack index",
+filename);
+
+   curl_easy_setopt(curl, CURLOPT_FILE, indexfile);
+   curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, fwrite);
+   curl_easy_setopt(curl, CURLOPT_URL, url);
+
+   if (curl_easy_perform(curl)) {
+   fclose(indexfile);
+   return error("Unable to get pack index %s", url);
+   }
+
+   fclose(indexfile);
+   return 0;
+}
+
+static int setup_index(unsigned char *sha1)
+{
+   struct packed_git *new_pack;
+   if (has_pack_file(sha1))
+   return 0; // don't list this as something we can get
+
+   if (fetch_index(sha1))
+   return -1;
+
+   new_pack = parse_pack_index(sha1);
+   new_pack->next = packs;
+   packs = new_pack;
+   return 0;
+}
+
+static int fetch_indices(void)
+{
+   unsigned char sha1[20];
+   if (got_indices)
+   return 0;
+   get_sha1_hex("e3117bbaf6a59cb53c3f6f0d9b17b9433f0e4135", sha1);
+   setup_index(sha1);
+   got_indices = 1;
+   return 0;
+}
+
+static int fetch_pack(unsigned char *sha1)
+{
+   char *url;
+   struct packed_git *target;
+   struct packed_git **lst;
+   FILE *packfile;
+   char *filename;
+
+   if (fetch_indices())
+   return -1;
+   target = find_sha1_pack(sha1, packs);
+   if (!target)
+   return error("Couldn't get %s: not separate or in any pack",
+sha1_to_hex(sha1));
+
+   if (get_verbosely) {
+   fprintf(stderr, "Getting pack %s\n",
+   sha1_to_hex(target->sha1));
+   fprintf(stderr, " which contains %s\n",
+   sha1_to_hex(sha1));
+   }
+
+   url = xmalloc(strlen(base) + 65);
+   sprintf(url, "%s/objects/pack/pack-%s.pack",
+   base, sha1_to_hex(target->sha1));
+
+   filename = sha1_pack_name(target->sha1);
+   packfile = fopen(filename, "w");
+   if (!packfile)
+   return error("Unable to open local file %s for pack",
+filename);
+
+   curl_easy_setopt(curl, CURLOPT_FILE, packfile);
+   curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, fwrite);
+   curl_easy_setopt(curl, CURLOPT_URL, url);
+
+   if (curl_easy_perform(curl)) {
+   fclose(packfile);
+   return error("Unable to get pack file %s", url);
+   }
+
+   fclose(packfile);
+
+   install_packed_git(target);
+
+   lst = &packs;
+   while (*lst != target)
+   lst = &((*lst)->next);
+   *lst = (*lst)->next;
+
+   return 0;
+}
+
 int fetch(unsigned char *sha1)
 {
char *hex = sha1_to_hex(sha1);
@@ -67,7 +187,7 @@
local = open(filename, O_WRONLY | O_CREAT | O_EXCL, 0666);
 
if (local < 0)
-   return error("Couldn't open %s\n", filename);
+   return error("Couldn't open local object %s\n", filename);
 
memset(&stream, 0, sizeof(stream));
 
@@ -75,6 +195,7 @@
 
SHA1_Init(&c);
 
+   curl_easy_setopt(curl, CURLOPT_FAILONERROR, 1);
curl_easy_setopt(curl, CURLOPT_FILE, NULL);
curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, fwrite_sha1_file);
 
@@ -90,8 +211,12 @@
 
curl_easy_setopt(curl, CURLOPT_URL, url);
 
-   if (curl_easy_perform(curl))
-   return error("Couldn't get %s for %s\n", url, hex

Re: [RFC] Design for http-pull on repo with packs

2005-07-10 Thread Daniel Barkalow
On Sun, 10 Jul 2005, Dan Holmsand wrote:

 Daniel Barkalow wrote:
  I have a design for using http-pull on a packed repository, and it only
  requires one extra file in the repository: an append-only list of the pack
  files (because getting the directory listing is very painful and
  failure-prone).
 
 A few comments (as I've been tinkering with a way to solve the problem 
 myself).
 
 As long as the pack files are named sensibly (i.e. if they are created 
 by git-repack-script), it's not very error-prone to just get the 
 directory listing, and look for matches for pack-sha1.idx. It seems to 
 work quite well (see below). It isn't beautiful in any way, but it works...

I may grab your code for that; the version I just sent seems to be working
except for that.

   If an individual file is not available, figure out what packs are
available:
  
 Get the list of pack files the repository has
  (currently, I just use e3117bbaf6a59cb53c3f6f0d9b17b9433f0e4135)
 For any packs we don't have, get the index files.
 
 This part might be slightly expensive, for large repositories. If one 
 assumes that packs are named as by git-repack-script, however, one might 
 cache indexes we've already seen (again, see below). Or, if you go for 
 the mandatory pack-index-file, require that it has a reliable order, 
 so that you can get the last added index first.

Nothing bad happens if you have index files for pack files you don't have,
as it turns out; the library ignores them. So we can keep the index files
around so we can quickly check if they have the objects we want. That way,
we don't have to worry about skipping something now (because it's not
needed) and then ignoring it when the branch gets merged in.

So what I actually do is make a list of the pack files that aren't already
downloaded that are available from the server, and download the index
files for any where the index file isn't downloaded, either.
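
As a sketch of that selection step in C (struct remote_pack and
plan_index_fetches() are names made up for this example, not anything in
the patch):

```c
#include <stddef.h>

/* Hypothetical per-pack state for a pack the server offers. */
struct remote_pack {
	int have_pack;   /* pack file is already downloaded */
	int have_index;  /* index file is already downloaded */
	int candidate;   /* output: a pack we might still want to fetch */
	int fetch_index; /* output: its index must be downloaded first */
};

/* Fill in the output fields; return how many index files to download. */
static size_t plan_index_fetches(struct remote_pack *packs, size_t nr)
{
	size_t i, to_fetch = 0;

	for (i = 0; i < nr; i++) {
		/* Packs we already have are not candidates at all. */
		packs[i].candidate = !packs[i].have_pack;
		/* Keep any index we already have; fetch only missing ones. */
		packs[i].fetch_index =
			packs[i].candidate && !packs[i].have_index;
		to_fetch += packs[i].fetch_index;
	}
	return to_fetch;
}
```

A pack whose index is kept from an earlier run stays a candidate without
costing another download, which is the point of keeping the index files
around.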

 Keep a list of the struct packed_gits for the packs the server has
  (these are not used as places to look for objects)
  
   Each time we need an object, check the list for it. If it is in there,
download the corresponding pack and report success.
 
 Here you will need some strategy to deal with packs that overlap with 
 what we've already got. Basically, small and overlapping packs should be 
 unpacked, big and non-overlapping ones saved as is (since 
 git-unpack-objects is painfully slow and memory-hungry...).

I don't think there's an issue with having overlapping packs, either with
each other or with separate objects. If the user wants, stuff can be
repacked outside of the pull operation (note, though, that the index files
should be truncated rather than removed, so that the program doesn't fetch
them again next time some object can't be found easily).

 One could also optimize the pack-download bit, by figuring out the last 
 object in the pack that we need (easy enough to do from the index file), 
   and just get the part of the pack file leading up to that object. That 
 could be a huge win for independently packed repositories (I don't do 
 that in my code below, though).

That's only possible if you can figure out what you want to have before
you get it. My code is walking the reachability graph on the client; it
can only figure out what other objects it needs after it's mapped the pack
file.

 Anyway, here's my attempt at the same thing. It introduces 
 git-dumb-fetch, with usage like git-fetch-pack (except that it works 
 with http and rsync). And it adds some uglyness to git-cat-file, for 
 figuring out which objects we already have.

I might use that method for listing the available packs, although I'd sort
of like to encourage a clean solution first.

-Daniel
*This .sig left intentionally blank*

-
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Make --recover cause pull to trace everything

2005-07-10 Thread Daniel Barkalow
Make the --recover flag check the parents of commits which are already
available. This is needed currently to deal with cases where a parent is
pulled along with a commit (in a pack, e.g.) and references above that
parent aren't also pulled together.

Signed-off-by: Daniel Barkalow [EMAIL PROTECTED]
---
commit 75e8c1be7a778e0a0fa119fe1bc408341932e7e5
tree ffbe708117543c356eb2981f1e0540b89b7a95e2
parent a7336ae514738f159dad314d6674961427f043a6
author Daniel Barkalow [EMAIL PROTECTED] 1121024019 -0400
committer Daniel Barkalow [EMAIL PROTECTED](none) 1121024019 -0400
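
To illustrate what the flag changes, here is a toy model of the history walk. It is only a sketch: struct toy_commit and count_missing() are hypothetical stand-ins for git's commit objects and for the loop in pull.c, which fetches objects rather than counting them.

```c
#include <stddef.h>

/* Toy model of a commit; not git's struct commit. */
struct toy_commit {
	int present;                  /* do we have the object locally? */
	int seen;                     /* already visited during this walk */
	struct toy_commit *parent[2]; /* unused slots are NULL */
};

/*
 * Walk history from c and return how many missing commits were found.
 * Without careful, a parent that is already present ends the walk along
 * that line, on the assumption that everything behind it is present too.
 */
int count_missing(struct toy_commit *c, int careful)
{
	int missing = 0;
	int i;

	if (!c || c->seen)
		return 0;
	c->seen = 1;
	if (!c->present)
		missing++;
	for (i = 0; i < 2; i++) {
		struct toy_commit *p = c->parent[i];

		if (!p)
			continue;
		if (!careful && p->present)
			continue; /* trust that its ancestry is complete */
		missing += count_missing(p, careful);
	}
	return missing;
}
```

With a tip and its parent present but the grandparent missing, the walk without careful stops at the parent and never notices the gap; with careful set it continues past the parent and finds the missing grandparent.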

Index: http-pull.c
===
--- 248f72f3e4dcb40693488b0c06f93d0b38122b8e/http-pull.c  (mode:100644 
sha1:1f9d60b9b1d5eed85b24d96c240666bbfc5a22ed)
+++ ffbe708117543c356eb2981f1e0540b89b7a95e2/http-pull.c  (mode:100644 
sha1:3fa56f08b0b8e7316afcaab3a7bfa3f2d26b550f)
@@ -146,7 +146,10 @@
 	int arg = 1;
 
 	while (arg < argc && argv[arg][0] == '-') {
-		if (argv[arg][1] == 't') {
+		if (argv[arg][1] == '-') {
+			if (!strcmp(argv[arg] + 2, "recover"))
+				careful = 1;
+		} else if (argv[arg][1] == 't') {
 			get_tree = 1;
 		} else if (argv[arg][1] == 'c') {
 			get_history = 1;
Index: local-pull.c
===
--- 248f72f3e4dcb40693488b0c06f93d0b38122b8e/local-pull.c  (mode:100644 
sha1:2f06fbee8b840a7ae642f5a22e2cb993687f3470)
+++ ffbe708117543c356eb2981f1e0540b89b7a95e2/local-pull.c  (mode:100644 
sha1:0d10c07844030bc7cb615cf916dce89592151be7)
@@ -116,7 +116,10 @@
 	int arg = 1;
 
 	while (arg < argc && argv[arg][0] == '-') {
-		if (argv[arg][1] == 't')
+		if (argv[arg][1] == '-') {
+			if (!strcmp(argv[arg] + 2, "recover"))
+				careful = 1;
+		} else if (argv[arg][1] == 't')
 			get_tree = 1;
 		else if (argv[arg][1] == 'c')
 			get_history = 1;
Index: pull.c
===
--- 248f72f3e4dcb40693488b0c06f93d0b38122b8e/pull.c  (mode:100644 
sha1:ed3078e3b27c62c07558fd94f339801cbd685593)
+++ ffbe708117543c356eb2981f1e0540b89b7a95e2/pull.c  (mode:100644 
sha1:d9763840c7ebcb1e5838c3b960695cafcca3ac73)
@@ -11,6 +11,7 @@
 
 const unsigned char *current_ref = NULL;
 
+int careful = 0;
 int get_tree = 0;
 int get_history = 0;
 int get_all = 0;
@@ -91,7 +92,8 @@
 	if (get_history) {
 		struct commit_list *parents = obj->parents;
 		for (; parents; parents = parents->next) {
-			if (has_sha1_file(parents->item->object.sha1))
+			if (!careful &&
+			    has_sha1_file(parents->item->object.sha1))
 				continue;
 			if (make_sure_we_have_it(NULL,
 						 parents->item->object.sha1)) {
Index: pull.h
===
--- 248f72f3e4dcb40693488b0c06f93d0b38122b8e/pull.h  (mode:100644 
sha1:e173ae3337c4465da87d849f4e5c9da203fdf01d)
+++ ffbe708117543c356eb2981f1e0540b89b7a95e2/pull.h  (mode:100644 
sha1:d1076468b71b31dd5e59ec55d98de830cf9df60e)
@@ -21,6 +21,12 @@
 /* If set, the hash that the current value of write_ref must be. */
 extern const unsigned char *current_ref;
 
+/* 
+ * Set to check on everything, instead of stopping at points where we think
+ * we must have everything.
+ */
+extern int careful;
+
 /* Set to fetch the target tree. */
 extern int get_tree;
 
Index: ssh-pull.c
===
--- 248f72f3e4dcb40693488b0c06f93d0b38122b8e/ssh-pull.c  (mode:100644 
sha1:26356dd7d84ea1bc9f7320b18562ed4117d4fac0)
+++ ffbe708117543c356eb2981f1e0540b89b7a95e2/ssh-pull.c  (mode:100644 
sha1:7ca4243f3bd84590e7bb94467fd5acccd7d4d6f9)
@@ -61,7 +61,10 @@
 	const char *prog = getenv("GIT_SSH_PUSH") ? : "git-ssh-push";
 
 	while (arg < argc && argv[arg][0] == '-') {
-		if (argv[arg][1] == 't') {
+		if (argv[arg][1] == '-') {
+			if (!strcmp(argv[arg] + 2, "recover"))
+				careful = 1;
+		} else if (argv[arg][1] == 't') {
 			get_tree = 1;
 		} else if (argv[arg][1] == 'c') {
 			get_history = 1;



[PATCH 2/2] Remove map_sha1_file

2005-07-10 Thread Daniel Barkalow
Remove map_sha1_file(), now unused.

Signed-off-by: Daniel Barkalow [EMAIL PROTECTED]
---
commit c21a02262f770a25b005378e06354e582aa1bfd8
tree 7ac9fabe666f00f37572e7b349fdb859bf8a6491
parent 264ff9f3dcde5553728b34fa08e04643b2b55946
author Daniel Barkalow [EMAIL PROTECTED] 1121033599 -0400
committer Daniel Barkalow [EMAIL PROTECTED](none) 1121033599 -0400

Index: cache.h
===
--- 353fe33ae9c7265d7b685bca864d657e3efe2849/cache.h  (mode:100644 
sha1:38dac6d6a413f1c788e5331ef4741fc15d72d9bd)
+++ 7ac9fabe666f00f37572e7b349fdb859bf8a6491/cache.h  (mode:100644 
sha1:11ba95c8aa9202fa3b1a3cbc07bc976641cd1908)
@@ -167,7 +167,6 @@
 int safe_create_leading_directories(char *path);
 
 /* Read and unpack a sha1 file into memory, write memory to a sha1 file */
-extern void * map_sha1_file(const unsigned char *sha1, unsigned long *size);
 extern int unpack_sha1_header(z_stream *stream, void *map, unsigned long 
mapsize, void *buffer, unsigned long size);
 extern int parse_sha1_header(char *hdr, char *type, unsigned long *sizep);
 extern int sha1_object_info(const unsigned char *, char *, unsigned long *);
Index: sha1_file.c
===
--- 353fe33ae9c7265d7b685bca864d657e3efe2849/sha1_file.c  (mode:100644 
sha1:08560b2c7a6dff400a46160501c247081f9bb4c7)
+++ 7ac9fabe666f00f37572e7b349fdb859bf8a6491/sha1_file.c  (mode:100644 
sha1:e082f2e6cb985caca11979311c291aa51d6c37fd)
@@ -578,8 +578,7 @@
 }
 
 static void *map_sha1_file_internal(const unsigned char *sha1,
-   unsigned long *size,
-   int say_error)
+   unsigned long *size)
 {
struct stat st;
void *map;
@@ -587,8 +586,6 @@
 	char *filename = find_sha1_file(sha1, &st);
 
 	if (!filename) {
-		if (say_error)
-			error("cannot map sha1 file %s", sha1_to_hex(sha1));
 		return NULL;
 	}
 
@@ -602,8 +599,6 @@
break;
/* Fallthrough */
case 0:
-   if (say_error)
-   perror(filename);
return NULL;
}
 
@@ -620,11 +615,6 @@
return map;
 }
 
-void *map_sha1_file(const unsigned char *sha1, unsigned long *size)
-{
-   return map_sha1_file_internal(sha1, size, 1);
-}
-
 int unpack_sha1_header(z_stream *stream, void *map, unsigned long mapsize, 
void *buffer, unsigned long size)
 {
/* Get the data stream */
@@ -1112,7 +1102,7 @@
 	z_stream stream;
 	char hdr[128];
 
-	map = map_sha1_file_internal(sha1, &mapsize, 0);
+	map = map_sha1_file_internal(sha1, &mapsize);
 	if (!map) {
 		struct pack_entry e;
 
@@ -1151,7 +1141,7 @@
 	unsigned long mapsize;
 	void *map, *buf;
 
-	map = map_sha1_file_internal(sha1, &mapsize, 0);
+	map = map_sha1_file_internal(sha1, &mapsize);
 	if (map) {
 		buf = unpack_sha1_file(map, mapsize, type, size);
 		munmap(map, mapsize);
@@ -1331,7 +1321,7 @@
 	ssize_t size;
 	unsigned long objsize;
 	int posn = 0;
-	char *buf = map_sha1_file_internal(sha1, &objsize, 0);
+	char *buf = map_sha1_file_internal(sha1, &objsize);
 	z_stream stream;
 	if (!buf) {
 		unsigned char *unpacked;



[PATCH 1/2] write_sha1_to_fd()

2005-07-10 Thread Daniel Barkalow
Add write_sha1_to_fd(), which writes an object to a file descriptor. This
includes support for unpacking it and recompressing it.

Signed-off-by: Daniel Barkalow [EMAIL PROTECTED]
---
commit 264ff9f3dcde5553728b34fa08e04643b2b55946
tree 353fe33ae9c7265d7b685bca864d657e3efe2849
parent c3eb461762b1d65e424fc4ede6a1d4f3e0a679f7
author Daniel Barkalow [EMAIL PROTECTED] 1121033477 -0400
committer Daniel Barkalow [EMAIL PROTECTED](none) 1121033477 -0400
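
The loop at the end of write_sha1_to_fd() has to cope with write() accepting fewer bytes than it was given. A minimal standalone sketch of that pattern follows. write_fully() is a hypothetical name, and the EINTR retry is an addition that the patch itself does not make.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: write all len bytes of buf to fd, or return -1. */
int write_fully(int fd, const void *buf, size_t len)
{
	const char *p = buf;
	size_t done = 0;

	while (done < len) {
		ssize_t n = write(fd, p + done, len - done);

		if (n < 0) {
			if (errno == EINTR)
				continue; /* interrupted before writing; retry */
			return -1;
		}
		if (n == 0)
			return -1; /* no progress; treat the peer as closed */
		done += (size_t)n;
	}
	return 0;
}
```

Keeping the byte count in a size_t avoids the signed/unsigned comparison that arises when an int position is compared against an unsigned long object size.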

Index: cache.h
===
--- 545ef8191b517b7f9e4ea558edaf526038ed1895/cache.h  (mode:100644 
sha1:719a77dfabb24e58abd21b7f3a4b846a114e000a)
+++ 353fe33ae9c7265d7b685bca864d657e3efe2849/cache.h  (mode:100644 
sha1:38dac6d6a413f1c788e5331ef4741fc15d72d9bd)
@@ -187,6 +187,7 @@
 extern int read_tree(void *buffer, unsigned long size, int stage);
 
 extern int write_sha1_from_fd(const unsigned char *sha1, int fd);
+extern int write_sha1_to_fd(int fd, const unsigned char *sha1);
 
 extern int has_sha1_pack(const unsigned char *sha1);
 extern int has_sha1_file(const unsigned char *sha1);
Index: sha1_file.c
===
--- 545ef8191b517b7f9e4ea558edaf526038ed1895/sha1_file.c  (mode:100644 
sha1:27136fdba0fbf2dd943f2634cb49660cdbf95ec4)
+++ 353fe33ae9c7265d7b685bca864d657e3efe2849/sha1_file.c  (mode:100644 
sha1:08560b2c7a6dff400a46160501c247081f9bb4c7)
@@ -1326,6 +1326,65 @@
 	return 0;
 }
 
+int write_sha1_to_fd(int fd, const unsigned char *sha1)
+{
+	ssize_t size;
+	unsigned long objsize;
+	int posn = 0;
+	char *buf = map_sha1_file_internal(sha1, &objsize, 0);
+	z_stream stream;
+	if (!buf) {
+		unsigned char *unpacked;
+		unsigned long len;
+		char type[20];
+		char hdr[50];
+		int hdrlen;
+		// need to unpack and recompress it by itself
+		unpacked = read_packed_sha1(sha1, type, &len);
+
+		hdrlen = sprintf(hdr, "%s %lu", type, len) + 1;
+
+		/* Set it up */
+		memset(&stream, 0, sizeof(stream));
+		deflateInit(&stream, Z_BEST_COMPRESSION);
+		size = deflateBound(&stream, len + hdrlen);
+		buf = xmalloc(size);
+
+		/* Compress it */
+		stream.next_out = buf;
+		stream.avail_out = size;
+		
+		/* First header.. */
+		stream.next_in = hdr;
+		stream.avail_in = hdrlen;
+		while (deflate(&stream, 0) == Z_OK)
+			/* nothing */;
+
+		/* Then the data itself.. */
+		stream.next_in = unpacked;
+		stream.avail_in = len;
+		while (deflate(&stream, Z_FINISH) == Z_OK)
+			/* nothing */;
+		deflateEnd(&stream);
+		
+		objsize = stream.total_out;
+	}
+
+	do {
+		size = write(fd, buf + posn, objsize - posn);
+		if (size <= 0) {
+			if (!size) {
+				fprintf(stderr, "write closed");
+			} else {
+				perror("write ");
+			}
+			return -1;
+		}
+		posn += size;
+	} while (posn < objsize);
+	return 0;
+}
+
 int write_sha1_from_fd(const unsigned char *sha1, int fd)
 {
 	char *filename = sha1_file_name(sha1);
Index: ssh-push.c
===
--- 545ef8191b517b7f9e4ea558edaf526038ed1895/ssh-push.c  (mode:100644 
sha1:090d6f9f8fbde2d736ac5bf563415b0fa402b5aa)
+++ 353fe33ae9c7265d7b685bca864d657e3efe2849/ssh-push.c  (mode:100644 
sha1:aac70af514e0dc5507fa4997ebad54352c973215)
@@ -7,13 +7,13 @@
 static unsigned char local_version = 1;
 static unsigned char remote_version = 0;
 
+static int verbose = 0;
+
 static int serve_object(int fd_in, int fd_out) {
 	ssize_t size;
-	int posn = 0;
 	unsigned char sha1[20];
-	unsigned long objsize;
-	void *buf;
 	signed char remote;
+	int posn = 0;
 	do {
 		size = read(fd_in, sha1 + posn, 20 - posn);
 		if (size < 0) {
@@ -25,12 +25,12 @@
 		posn += size;
 	} while (posn < 20);

-	/* fprintf(stderr, "Serving %s\n", sha1_to_hex(sha1)); */
+	if (verbose)
+		fprintf(stderr, "Serving %s\n", sha1_to_hex(sha1));
+
 	remote = 0;

-	buf = map_sha1_file(sha1, &objsize);
-	
-	if (!buf) {
+	if (!has_sha1_file(sha1)) {
 		fprintf(stderr, "git-ssh-push: could not find %s\n", 
 			sha1_to_hex(sha1));
 		remote = -1;
@@ -41,20 +41,7 @@
 	if (remote < 0)
 		return 0;

-	posn = 0;
-	do {
-		size = write(fd_out, buf + posn

[PATCH] Better error message from git-ssh-push

2005-07-04 Thread Daniel Barkalow
If git-ssh-push can't interpret the commit-id, there are various possible
issues. Just giving the usage message makes it hard to identify what could
be wrong.

Signed-off-by: Daniel Barkalow [EMAIL PROTECTED]

---
commit 7a274ce1f93e6092dcf226d546a58d2d6df9d13c
tree 1f045fa8aa017cabbac613cf8c1ea2bd63ccc46c
parent 8934c88118c900fe38abbf60f893ee9ef4e83b3c
author Daniel Barkalow [EMAIL PROTECTED] 1120507167 -0400
committer Daniel Barkalow [EMAIL PROTECTED](none) 1120507167 -0400

Index: ssh-push.c
===
--- 62a74516551505e5fd2b5c2fd14486f3ac8a400e/ssh-push.c  (mode:100644 
sha1:10390948efacfa06f4f6fc6b2f3631cec6fcb876)
+++ 1f045fa8aa017cabbac613cf8c1ea2bd63ccc46c/ssh-push.c  (mode:100644 
sha1:6b1406b527ba6ede8602a04ab031003edb7da2b0)
@@ -257,8 +257,12 @@
 		usage(ssh_push_usage);
 	commit_id = argv[arg];
 	url = argv[arg + 1];
-	if (get_sha1(commit_id, sha1))
-		usage(ssh_push_usage);
+	if (get_sha1(commit_id, sha1)) {
+		fprintf(stderr, 
+			"Unable to interpret %s as something to push.\n",
+			commit_id);
+		return 1;
+	}
 	memcpy(hex, sha1_to_hex(sha1), sizeof(hex));
 	argv[arg] = hex;
 



Re: Last mile for 1.0 again

2005-07-04 Thread Daniel Barkalow
On Mon, 4 Jul 2005, Linus Torvalds wrote:

 On Mon, 4 Jul 2005, Daniel Barkalow wrote:
  
  How about an option to git-rev-list to take a path, and (1) exclude any
  branch where the version at that path ends up ignored in a merge and
  (2) not list any revision where the version at that path is identical to a
  parent?
 
 Hmm. How is that different from git-whatchanged path, really?

It would short-circuit going up areas of the history which don't
contribute (i.e., lead up to a merge which took its version from a
different parent). It could also stop when it ran out of branches that
have the file at all. Neither of these is all that significant, I guess.

Junio: what's missing from annotate/blame?

-Daniel
*This .sig left intentionally blank*



Re: Last mile for 1.0 again

2005-07-04 Thread Daniel Barkalow
On Mon, 4 Jul 2005, Junio C Hamano wrote:

  DB == Daniel Barkalow [EMAIL PROTECTED] writes:
 
 DB Junio: what's missing from annotate/blame?
 
 Which one are you talking about?
 
 What I use to generate http://members.cox.net/junkio/Summary.txt
 is an implementation of an algorithm I consider complete in
 that it does rename/copy and complete rewrite correctly.  What
 is missing from the implementation is efficiency.

[perl script]

 How does this work, and what do we do about merges?

I've got that part, but I'm not clear on how the rename/copy and complete
rewrite stuff works.

-Daniel
*This .sig left intentionally blank*



Re: [3/5] Add http-pull

2005-04-23 Thread Daniel Barkalow
On Sat, 23 Apr 2005, Petr Baudis wrote:

 Dear diary, on Sat, Apr 23, 2005 at 01:00:33AM CEST, I got a letter
 where Daniel Barkalow [EMAIL PROTECTED] told me that...
  On Sat, 23 Apr 2005, Petr Baudis wrote:
  
   Dear diary, on Fri, Apr 22, 2005 at 09:46:35PM CEST, I got a letter
   where Daniel Barkalow [EMAIL PROTECTED] told me that...
   
   Huh. Why? You just go back to history until you find a commit you
   already have. If you did it the way as Tony described, if you have that
   commit, you can be sure that you have everything it depends on too.
  
  But if you download 1000 files of the 1010 you need, and then your network
  goes down, you will need to download those 1000 again when it comes back,
  because you can't save them unless you have the full history. 
 
 Why can't I? I think I can do that perfectly fine. The worst thing that
 can happen is that fsck-cache will complain a bit.

Not if you're using the fact that you don't have them to tell you that you
still need the other 10, which is what Tony's scheme would do.

-Daniel
*This .sig left intentionally blank*



Re: Change pull to _only_ download, and git update=pull+merge?

2005-04-19 Thread Daniel Barkalow
On Tue, 19 Apr 2005, Petr Baudis wrote:

 I disagree. This already forces you to have two branches (one to pull
 from to get the data, mirroring the remote branch, one for your real
 work) uselessly and needlessly.

If you pull in a non-tracked tree, it certainly won't apply the
changes, so you can just have your local tree and pull other people's
trees as desired.

 I think there is just no good name for what pull is doing now, and
 update seems like a great name for what pull-and-merge really is. Pull
 really is pull - it _pulls_ the data, while update also updates the
 given tree. No surprises.

I'm actually getting suspicious that the right thing is to hide pull in
the id scheme. That is, instead of saying "linus" to refer to the
"linus" head that you currently have, you say "+linus" to refer to the
head Linus has on his server currently, and this will cause you to
download anything necessary to perform the operation with the resulting
value.

See, I don't think you ever want to just pull. You want to
pull-and-do-something, but the something could be any operation that uses
a commit, not necessarily update. So you could do "git diff -r +linus" to
compare your head against current linus. You'd want "git update" to take a
working directory from "linus" to "+linus" (just because you know Linus's
more recent head doesn't mean you're automatically using it). You could
just "git merge +linus" in your working directory to sync with Linus. Even
"git log +linus" to see his recent changes.

I think the only reason not to just make any reference to a head pull it
is performance on looking up the head; you don't really want to hammer the
server getting these 40-byte files constantly or wait for a connection
every time (not to mention the possibility of not being able to
connect). But there's no reason to want to not have the latest data, since
the older data doesn't go away.

-Daniel
*This .sig left intentionally blank*



Re: [0/5] Parsers for git objects, porting some programs

2005-04-18 Thread Daniel Barkalow
On Mon, 18 Apr 2005, Junio C Hamano wrote:

 I was looking at the tree part and am thinking that it would
 make it much nicer if your tree object records path for each
 entry. 

You're entirely right, and I've actually now written the code that does
it. I'm planning to send out a patch for that shortly.

 Currently it just borrows from object.refs to represent
 its children

Note that object.refs needs to get filled out for those
applications, even if the information is also included in the
parse; object.refs is for finding what you can reach without worrying
about how you do it.

-Daniel
*This .sig left intentionally blank*



More patches

2005-04-18 Thread Daniel Barkalow
Here are the things I was saving for after the previous set:

 1: Report the actual contents of trees
 2: Add functions for scanning history by date
 3: Add http-pull, a program to fetch the objects you need by HTTP
 4: Change merge-base to find the most recent common ancestor

1 and 2 are core extensions. 3 might be best for the pasky tree. 4 is
mostly a demo of 2, and is included because Linus thought it was a better
algorithm.

-Daniel
*This .sig left intentionally blank*



[1/4] Report info from trees

2005-04-18 Thread Daniel Barkalow
This patch adds actual information to struct tree, making it possible to
tell what sorts of things the referenced objects are. This is needed for
http-pull, and Junio wanted something of the sort.

Signed-Off-By: Daniel Barkalow [EMAIL PROTECTED]
Index: tree.c
===
--- 1172a9b8f45b2fd640985595cc5258db3b027828/tree.c  (mode:100644 
sha1:7c5e5e46f4967b0812b06c0114946c3a6432c8d8)
+++ 7e5a0d93117ecadfb15de3a6bebdb1aa94234fde/tree.c  (mode:100644 
sha1:39f9cbd1908e9046c148339f816025c9313ec142)
@@ -27,6 +27,7 @@
 	char type[20];
 	void *buffer, *bufptr;
 	unsigned long size;
+	struct tree_entry_list **list_p;
 	if (item->object.parsed)
 		return 0;
 	item->object.parsed = 1;
@@ -38,8 +39,10 @@
 	if (strcmp(type, tree_type))
 		return error("Object %s not a tree",
 			     sha1_to_hex(item->object.sha1));
+	list_p = &item->entries;
 	while (size) {
 		struct object *obj;
+		struct tree_entry_list *entry;
 		int len = 1+strlen(bufptr);
 		unsigned char *file_sha1 = bufptr + len;
 		char *path = strchr(bufptr, ' ');
@@ -48,6 +51,11 @@
 		    sscanf(bufptr, "%o", &mode) != 1)
 			return -1;
 
+		entry = malloc(sizeof(struct tree_entry_list));
+		entry->directory = S_ISDIR(mode);
+		entry->executable = mode & S_IXUSR;
+		entry->next = NULL;
+
 		/* Warn about trees that don't do the recursive thing.. */
 		if (strchr(path, '/')) {
 			item->has_full_path = 1;
@@ -56,12 +64,17 @@
 		bufptr += len + 20;
 		size -= len + 20;
 
-		if (S_ISDIR(mode)) {
-			obj = &lookup_tree(file_sha1)->object;
+		if (entry->directory) {
+			entry->item.tree = lookup_tree(file_sha1);
+			obj = &entry->item.tree->object;
 		} else {
-			obj = &lookup_blob(file_sha1)->object;
+			entry->item.blob = lookup_blob(file_sha1);
+			obj = &entry->item.blob->object;
 		}
 		add_ref(&item->object, obj);
+
+		*list_p = entry;
+		list_p = &entry->next;
 	}
 	return 0;
 }
Index: tree.h
===
--- 1172a9b8f45b2fd640985595cc5258db3b027828/tree.h  (mode:100644 
sha1:14ebbacded09d5e058c7f94652dcb9e12bc31cae)
+++ 7e5a0d93117ecadfb15de3a6bebdb1aa94234fde/tree.h  (mode:100644 
sha1:985500e2a9130fe8c33134ca121838af9320c465)
@@ -5,9 +5,20 @@
 
 extern const char *tree_type;
 
+struct tree_entry_list {
+   struct tree_entry_list *next;
+   unsigned directory : 1;
+   unsigned executable : 1;
+   union {
+   struct tree *tree;
+   struct blob *blob;
+   } item;
+};
+
 struct tree {
struct object object;
unsigned has_full_path : 1;
+   struct tree_entry_list *entries;
 };
 
 struct tree *lookup_tree(unsigned char *sha1);
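
The entry list is built through a tail pointer (list_p in the patch), so entries stay in tree order without walking the list on each append. A minimal sketch of that technique on a simplified entry type follows. struct entry, append_entry() and count_directories() are hypothetical, and the union of tree and blob pointers is left out.

```c
#include <stdlib.h>

/* Simplified entry; the patch's union of tree/blob pointers is omitted. */
struct entry {
	struct entry *next;
	unsigned directory : 1;
	unsigned executable : 1;
};

/*
 * Append a new entry through a tail pointer and return the new tail
 * pointer, or NULL if allocation fails.
 */
struct entry **append_entry(struct entry **tail, unsigned is_dir,
			    unsigned is_exec)
{
	struct entry *e = malloc(sizeof(*e));

	if (!e)
		return NULL;
	e->directory = is_dir ? 1 : 0;   /* normalize to 0/1 for the bit-field */
	e->executable = is_exec ? 1 : 0;
	e->next = NULL;
	*tail = e;
	return &e->next;
}

/* Count the subdirectories in an entry list. */
int count_directories(const struct entry *list)
{
	int n = 0;

	for (; list; list = list->next)
		if (list->directory)
			n++;
	return n;
}
```

One detail that may be worth checking against the patch: an unsigned one-bit field keeps only the lowest bit of what is assigned to it, so a value such as mode & S_IXUSR would need to be reduced to 0 or 1 first, which is why the sketch normalizes its flags before storing them.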



[2/4] Sorting commits by date

2005-04-18 Thread Daniel Barkalow
Functions for a date-ordered queue of commits, pulled out of the history
incrementally. Linus wanted this for finding the most recent common
ancestor, and it might be relevant to logging.

Signed-Off-By: Daniel Barkalow [EMAIL PROTECTED]
Index: commit.c
===
--- b3cf8daf9b619ae9f06a28f42a4ae01b69729206/commit.c  (mode:100644 
sha1:0099baa63971d86ee30ef2a7da25057f0f45a964)
+++ 7e5a0d93117ecadfb15de3a6bebdb1aa94234fde/commit.c  (mode:100644 
sha1:ef9af397471817837e1799d72f6707e0ccc949b9)
@@ -83,3 +83,47 @@
 		free(temp);
 	}
 }
+
+static void insert_by_date(struct commit_list **list, struct commit *item)
+{
+	struct commit_list **pp = list;
+	struct commit_list *p;
+	while ((p = *pp) != NULL) {
+		if (p->item->date < item->date) {
+			break;
+		}
+		pp = &p->next;
+	}
+	struct commit_list *insert = malloc(sizeof(struct commit_list));
+	insert->next = *pp;
+	*pp = insert;
+	insert->item = item;
+}
+
+	
+void sort_by_date(struct commit_list **list)
+{
+	struct commit_list *ret = NULL;
+	while (*list) {
+		insert_by_date(&ret, (*list)->item);
+		*list = (*list)->next;
+	}
+	*list = ret;
+}
+
+struct commit *pop_most_recent_commit(struct commit_list **list)
+{
+	struct commit *ret = (*list)->item;
+	struct commit_list *parents = ret->parents;
+	struct commit_list *old = *list;
+
+	*list = (*list)->next;
+	free(old);
+
+	while (parents) {
+		parse_commit(parents->item);
+		insert_by_date(list, parents->item);
+		parents = parents->next;
+	}
+	return ret;
+}
Index: commit.h
===
--- b3cf8daf9b619ae9f06a28f42a4ae01b69729206/commit.h  (mode:100644 
sha1:8cd20b046875f5f7e534b0607fdd97f330f53272)
+++ 7e5a0d93117ecadfb15de3a6bebdb1aa94234fde/commit.h  (mode:100644 
sha1:35679482132ae5a6b7d72bbb684f21472470717c)
@@ -24,4 +24,8 @@
 
 void free_commit_list(struct commit_list *list);
 
+void sort_by_date(struct commit_list **list);
+
+struct commit *pop_most_recent_commit(struct commit_list **list);
+
 #endif /* COMMIT_H */
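
The queue discipline can be tried out in isolation with a sketch like the one below. The types are toy stand-ins (struct toy_commit, struct toy_list) rather than git's own, and pop_newest() is a hypothetical simplification: unlike pop_most_recent_commit() it does not parse and enqueue the parents of the commit it returns.

```c
#include <stdlib.h>

/* Toy commit and list types; hypothetical stand-ins for git's own. */
struct toy_commit {
	unsigned long date;
};

struct toy_list {
	struct toy_commit *item;
	struct toy_list *next;
};

/* Insert item so the list stays ordered newest first. Returns 0 on success. */
int insert_by_date(struct toy_list **list, struct toy_commit *item)
{
	struct toy_list **pp = list;
	struct toy_list *node;

	/* Skip every entry that is at least as recent as the new one. */
	while (*pp && (*pp)->item->date >= item->date)
		pp = &(*pp)->next;
	node = malloc(sizeof(*node));
	if (!node)
		return -1;
	node->item = item;
	node->next = *pp;
	*pp = node;
	return 0;
}

/* Remove and return the most recent commit, or NULL if the list is empty. */
struct toy_commit *pop_newest(struct toy_list **list)
{
	struct toy_list *old = *list;
	struct toy_commit *ret;

	if (!old)
		return NULL;
	ret = old->item;
	*list = old->next;
	free(old);
	return ret;
}
```

Inserting commits dated 10, 30 and 20 and then popping repeatedly yields them in the order 30, 20, 10.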



[3/4] Add http-pull

2005-04-18 Thread Daniel Barkalow
This adds a command to pull a commit and dependent objects from an HTTP
server.

Signed-Off-By: Daniel Barkalow [EMAIL PROTECTED]
Index: Makefile
===
--- 50afb5dd4184842d8da1da8dcb9ca6a591dfc5b0/Makefile  (mode:100644 
sha1:803f1d49c436efa570d779db6d350efbceb29ddd)
+++ f7f62e0d2a822ad0937fd98a826f65ac7f938217/Makefile  (mode:100644 
sha1:a3d26213c085e8b6bbc1ec352df0996e558e7c38)
@@ -15,7 +15,7 @@
 
 PROG=   update-cache show-diff init-db write-tree read-tree commit-tree \
cat-file fsck-cache checkout-cache diff-tree rev-tree show-files \
-   check-files ls-tree merge-base merge-cache unpack-file
+   check-files ls-tree merge-base merge-cache unpack-file http-pull
 
 all: $(PROG)
 
@@ -81,6 +81,11 @@
 unpack-file: unpack-file.o $(LIB_FILE)
$(CC) $(CFLAGS) -o unpack-file unpack-file.o $(LIBS)
 
+http-pull: LIBS += -lcurl
+
+http-pull: http-pull.o $(LIB_FILE)
+   $(CC) $(CFLAGS) -o http-pull http-pull.o $(LIBS)
+
 blob.o: $(LIB_H)
 cat-file.o: $(LIB_H)
 check-files.o: $(LIB_H)
@@ -105,6 +110,7 @@
 usage.o: $(LIB_H)
 unpack-file.o: $(LIB_H)
 write-tree.o: $(LIB_H)
+http-pull.o: $(LIB_H)
 
 clean:
rm -f *.o $(PROG) $(LIB_FILE)
Index: http-pull.c
===
--- /dev/null  (tree:50afb5dd4184842d8da1da8dcb9ca6a591dfc5b0)
+++ f7f62e0d2a822ad0937fd98a826f65ac7f938217/http-pull.c  (mode:100644 
sha1:bd251f9e0748784bbd2cd5cf720f126d852fe888)
@@ -0,0 +1,170 @@
+#include <fcntl.h>
+#include <unistd.h>
+#include <string.h>
+#include <stdlib.h>
+#include "cache.h"
+#include "commit.h"
+#include <errno.h>
+#include <stdio.h>
+
+#include <curl/curl.h>
+#include <curl/easy.h>
+
+static CURL *curl;
+
+static char *base;
+
+static int tree = 0;
+static int commits = 0;
+static int all = 0;
+
+static int has(unsigned char *sha1)
+{
+	char *filename = sha1_file_name(sha1);
+	struct stat st;
+
+	if (!stat(filename, &st))
+		return 1;
+	return 0;
+}
+
+static int fetch(unsigned char *sha1)
+{
+	char *hex = sha1_to_hex(sha1);
+	char *filename = sha1_file_name(sha1);
+
+	char *url;
+	char *posn;
+	FILE *local;
+	struct stat st;
+
+	if (!stat(filename, &st)) {
+		return 0;
+	}
+
+	local = fopen(filename, "w");
+
+	if (!local)
+		return error("Couldn't open %s\n", filename);
+
+	curl_easy_setopt(curl, CURLOPT_FILE, local);
+
+	url = malloc(strlen(base) + 50);
+	strcpy(url, base);
+	posn = url + strlen(base);
+	strcpy(posn, "objects/");
+	posn += 8;
+	memcpy(posn, hex, 2);
+	posn += 2;
+	*(posn++) = '/';
+	strcpy(posn, hex + 2);
+
+	curl_easy_setopt(curl, CURLOPT_URL, url);
+
+	printf("Getting %s\n", hex);
+
+	if (curl_easy_perform(curl))
+		return error("Couldn't get %s for %s\n", url, hex);
+
+	fclose(local);
+	
+	return 0;
+}
+
+static int process_tree(unsigned char *sha1)
+{
+	struct tree *tree = lookup_tree(sha1);
+	struct tree_entry_list *entries;
+
+	if (parse_tree(tree))
+		return -1;
+
+	for (entries = tree->entries; entries; entries = entries->next) {
+		if (fetch(entries->item.tree->object.sha1))
+			return -1;
+		if (entries->directory) {
+			if (process_tree(entries->item.tree->object.sha1))
+				return -1;
+		}
+	}
+	return 0;
+}
+
+static int process_commit(unsigned char *sha1)
+{
+	struct commit *obj = lookup_commit(sha1);
+
+	if (fetch(sha1))
+		return -1;
+
+	if (parse_commit(obj))
+		return -1;
+
+	if (tree) {
+		if (fetch(obj->tree->object.sha1))
+			return -1;
+		if (process_tree(obj->tree->object.sha1))
+			return -1;
+		if (!all)
+			tree = 0;
+	}
+	if (commits) {
+		struct commit_list *parents = obj->parents;
+		for (; parents; parents = parents->next) {
+			if (has(parents->item->object.sha1))
+				continue;
+			if (fetch(parents->item->object.sha1)) {
+				/* The server might not have it, and
+				 * we don't mind. 
+				 */
+				continue;
+			}
+			if (process_commit(parents->item->object.sha1))
+				return -1;
+		}
+	}
+	return 0;
+}
+
+int main(int argc, char **argv)
+{
+	char *commit_id;
+	char *url;
+	int arg = 1;
+	unsigned char sha1[20];
+
+	while (arg < argc && argv[arg][0] == '-') {
+		if (argv[arg][1] == 't
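
fetch() assembles the object URL by hand as the base URL, then "objects/", the first two hex digits, a slash, and the remaining 38 digits. A bounds-checked sketch of the same layout follows. object_url() is a hypothetical helper, not part of the patch, and it assumes base already ends in a slash.

```c
#include <stdio.h>
#include <string.h>

/*
 * Build "<base>objects/<xx>/<remaining 38 hex digits>" into out.
 * base is assumed to end in '/'; hex is a 40-character object name.
 * Returns 0 on success, -1 if hex is malformed or out is too small.
 */
int object_url(char *out, size_t outlen, const char *base, const char *hex)
{
	int n;

	if (strlen(hex) != 40)
		return -1;
	n = snprintf(out, outlen, "%sobjects/%.2s/%s", base, hex, hex + 2);
	if (n < 0 || (size_t)n >= outlen)
		return -1;
	return 0;
}
```

For the base http://example.org/repo/ (a placeholder) and the object name bd251f9e0748784bbd2cd5cf720f126d852fe888, this produces http://example.org/repo/objects/bd/251f9e0748784bbd2cd5cf720f126d852fe888.
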

[2/5] Add merge-base

2005-04-17 Thread Daniel Barkalow
merge-base finds one of the best common ancestors of a pair of commits. In
particular, it finds one of those that is the fewest commits away from the
farther of the two heads.

Signed-Off-By: Daniel Barkalow [EMAIL PROTECTED]
Index: Makefile
===
--- 37a0b01b85c2999243674d48bfc71cdba0e5518e/Makefile  (mode:100644 
sha1:346e3850de026485802e41e16a1180be2df85e4a)
+++ d662b707e11391f6cfe597fd4d0bf9c41d34d01a/Makefile  (mode:100644 
sha1:b2ce7c5b63fffca59653b980d98379909f893d44)
@@ -14,7 +14,7 @@
 
 PROG=   update-cache show-diff init-db write-tree read-tree commit-tree \
cat-file fsck-cache checkout-cache diff-tree rev-tree show-files \
-   check-files ls-tree
+   check-files ls-tree merge-base
 
 SCRIPT=parent-id tree-id git gitXnormid.sh gitadd.sh gitaddremote.sh \
gitcommit.sh gitdiff-do gitdiff.sh gitlog.sh gitls.sh gitlsobj.sh \
Index: merge-base.c
===
--- /dev/null  (tree:37a0b01b85c2999243674d48bfc71cdba0e5518e)
+++ d662b707e11391f6cfe597fd4d0bf9c41d34d01a/merge-base.c  (mode:100644 
sha1:0f85e7d9e9a896d1142a54170ddf1159f11f9cdd)
@@ -0,0 +1,108 @@
+#include <stdlib.h>
+#include "cache.h"
+#include "revision.h"
+
+struct revision *common_ancestor(struct revision *rev1, struct revision *rev2)
+{
+	struct parent *parent;
+
+	struct parent *rev1list = malloc(sizeof(struct parent));
+	struct parent *rev2list = malloc(sizeof(struct parent));
+
+	struct parent *posn, *temp;
+
+	rev1list->parent = rev1;
+	rev1list->next = NULL;
+
+	rev2list->parent = rev2;
+	rev2list->next = NULL;
+
+	while (rev1list || rev2list) {
+		posn = rev1list;
+		rev1list = NULL;
+		while (posn) {
+			parse_commit_object(posn->parent);
+			if (posn->parent->flags & 0x0001) {
+				/*
+				printf("1 already seen %s %x\n",
+				       sha1_to_hex(posn->parent->sha1),
+				       posn->parent->flags);
+				*/
+				// do nothing
+			} else if (posn->parent->flags & 0x0002) {
+				//  free lists
+				return posn->parent;
+			} else {
+				/*
+				printf("1 based on %s\n",
+				       sha1_to_hex(posn->parent->sha1));
+				*/
+				posn->parent->flags |= 0x0001;
+
+				parent = posn->parent->parent;
+				while (parent) {
+					temp = malloc(sizeof(struct parent));
+					temp->next = rev1list;
+					temp->parent = parent->parent;
+					rev1list = temp;
+					parent = parent->next;
+				}
+			}
+			posn = posn->next;
+		}
+		posn = rev2list;
+		rev2list = NULL;
+		while (posn) {
+			parse_commit_object(posn->parent);
+			if (posn->parent->flags & 0x0002) {
+				/*
+				printf("2 already seen %s\n",
+				       sha1_to_hex(posn->parent->sha1));
+				*/
+				// do nothing
+			} else if (posn->parent->flags & 0x0001) {
+				//  free lists
+				return posn->parent;
+			} else {
+				/*
+				printf("2 based on %s\n",
+				       sha1_to_hex(posn->parent->sha1));
+				*/
+				posn->parent->flags |= 0x0002;
+
+				parent = posn->parent->parent;
+				while (parent) {
+					temp = malloc(sizeof(struct parent));
+					temp->next = rev2list;
+					temp->parent = parent->parent;
+					rev2list = temp;
+					parent = parent->next;
+				}
+			}
+			posn = posn->next;
+		}
+	}
+	return NULL;
+}
+
+int main(int argc, char **argv)
+{
+	struct revision *rev1, *rev2, *ret;
+	unsigned char rev1key[20], rev2key[20];
+	if (argc != 3 ||
+	    get_sha1_hex(argv[1], rev1key

[3/5] Add http-pull

2005-04-17 Thread Daniel Barkalow
http-pull is a program that downloads from a (normal) HTTP server a commit
and all of the tree and blob objects it refers to (but not other commits,
etc.). Options could be used to make it download a larger or different
selection of objects. It depends on libcurl, which I forgot to mention in
the README again.

Signed-Off-By: Daniel Barkalow [EMAIL PROTECTED]
Index: Makefile
===
--- d662b707e11391f6cfe597fd4d0bf9c41d34d01a/Makefile  (mode:100644 
sha1:b2ce7c5b63fffca59653b980d98379909f893d44)
+++ 157b46ce1d82b3579e2e1258927b0d9bdbc033ab/Makefile  (mode:100644 
sha1:940ef8578cf469354002cd8feaec25d907015267)
@@ -14,7 +14,7 @@
 
 PROG=   update-cache show-diff init-db write-tree read-tree commit-tree \
cat-file fsck-cache checkout-cache diff-tree rev-tree show-files \
-   check-files ls-tree merge-base
+   check-files ls-tree http-pull merge-base
 
 SCRIPT=parent-id tree-id git gitXnormid.sh gitadd.sh gitaddremote.sh \
gitcommit.sh gitdiff-do gitdiff.sh gitlog.sh gitls.sh gitlsobj.sh \
@@ -35,6 +35,7 @@
 
 LIBS= -lssl -lz
 
+http-pull: LIBS += -lcurl
 
 $(PROG):%: %.o $(COMMON)
$(CC) $(CFLAGS) -o $@ $^ $(LIBS)
Index: http-pull.c
===
--- /dev/null  (tree:d662b707e11391f6cfe597fd4d0bf9c41d34d01a)
+++ 157b46ce1d82b3579e2e1258927b0d9bdbc033ab/http-pull.c  (mode:100644 
sha1:106ca31239e6afe6784e7c592234406f5c149e44)
@@ -0,0 +1,126 @@
+#include <fcntl.h>
+#include <unistd.h>
+#include <string.h>
+#include <stdlib.h>
+#include "cache.h"
+#include "revision.h"
+#include <errno.h>
+#include <stdio.h>
+
+#include <curl/curl.h>
+#include <curl/easy.h>
+
+static CURL *curl;
+
+static char *base;
+
+static int fetch(unsigned char *sha1)
+{
+	char *hex = sha1_to_hex(sha1);
+	char *filename = sha1_file_name(sha1);
+
+	char *url;
+	char *posn;
+	FILE *local;
+	struct stat st;
+
+	if (!stat(filename, &st)) {
+		return 0;
+	}
+
+	local = fopen(filename, "w");
+
+	if (!local) {
+		fprintf(stderr, "Couldn't open %s\n", filename);
+		return -1;
+	}
+
+	curl_easy_setopt(curl, CURLOPT_FILE, local);
+
+	url = malloc(strlen(base) + 50);
+	strcpy(url, base);
+	posn = url + strlen(base);
+	strcpy(posn, "objects/");
+	posn += 8;
+	memcpy(posn, hex, 2);
+	posn += 2;
+	*(posn++) = '/';
+	strcpy(posn, hex + 2);
+
+	curl_easy_setopt(curl, CURLOPT_URL, url);
+
+	curl_easy_perform(curl);
+
+	fclose(local);
+	
+	return 0;
+}
+
+static int process_tree(unsigned char *sha1)
+{
+	void *buffer;
+	unsigned long size;
+	char type[20];
+
+	buffer = read_sha1_file(sha1, type, &size);
+	if (!buffer)
+		return -1;
+	if (strcmp(type, "tree"))
+		return -1;
+	while (size) {
+		int len = strlen(buffer) + 1;
+		unsigned char *sha1 = buffer + len;
+		unsigned int mode;
+		int retval;
+
+		if (size < len + 20 || sscanf(buffer, "%o", &mode) != 1)
+			return -1;
+
+		buffer = sha1 + 20;
+		size -= len + 20;
+
+		retval = fetch(sha1);
+		if (retval)
+			return -1;
+
+		if (S_ISDIR(mode)) {
+			retval = process_tree(sha1);
+			if (retval)
+				return -1;
+		}
+	}
+	return 0;
+}
+
+static int process_commit(unsigned char *sha1)
+{
+	struct revision *rev = lookup_rev(sha1);
+	if (parse_commit_object(rev))
+		return -1;
+	
+	fetch(rev->tree);
+	process_tree(rev->tree);
+	return 0;
+}
+
+int main(int argc, char **argv)
+{
+	char *commit_id = argv[1];
+	char *url = argv[2];
+
+	unsigned char sha1[20];
+
+	get_sha1_hex(commit_id, sha1);
+
+	curl_global_init(CURL_GLOBAL_ALL);
+
+	curl = curl_easy_init();
+
+	base = url;
+
+	fetch(sha1);
+	process_commit(sha1);
+
+	curl_global_cleanup();
+	return 0;
+}

-
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[2.1/5] Add merge-base

2005-04-17 Thread Daniel Barkalow
merge-base finds one of the best common ancestors of a pair of commits. In
particular, it finds one of the ones which is fewest commits away from the
further of the heads.
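
For readers who want the idea without the git data structures, here is a rough sketch of the search described above: walk back from both heads one generation at a time, marking each commit with the side that reached it, and stop at the first commit the other side has already marked. The graph type, the node numbering, and the names toy_graph, expand and toy_common_ancestor are all made up for illustration; this is not the code in the patch.

```c
#include <assert.h>

#define MAX_NODES 16
#define MAX_PARENTS 2
#define MAX_FRONTIER (MAX_NODES * MAX_PARENTS)
#define MARK_A 0x1u
#define MARK_B 0x2u

/* Toy commit graph: parent[i][k] is a node index, or -1 for "no parent". */
struct toy_graph {
	int nr;
	int parent[MAX_NODES][MAX_PARENTS];
	unsigned flags[MAX_NODES];
};

/*
 * Advance one side by one generation. Returns a node the other side has
 * already marked, or -1 if this wave found none. On -1 the frontier is
 * replaced by the parents of the nodes that were newly marked.
 */
static int expand(struct toy_graph *g, int *frontier, int *count,
		  unsigned this_mark, unsigned other_mark)
{
	int next[MAX_FRONTIER];
	int nr_next = 0;

	for (int i = 0; i < *count; i++) {
		int node = frontier[i];

		if (g->flags[node] & this_mark)
			continue;	/* this side already came through here */
		if (g->flags[node] & other_mark)
			return node;	/* reachable from both heads */
		g->flags[node] |= this_mark;
		for (int k = 0; k < MAX_PARENTS; k++) {
			if (g->parent[node][k] < 0)
				continue;
			assert(nr_next < MAX_FRONTIER);
			next[nr_next++] = g->parent[node][k];
		}
	}
	for (int i = 0; i < nr_next; i++)
		frontier[i] = next[i];
	*count = nr_next;
	return -1;
}

/* Returns a common ancestor of a and b, or -1 if the histories are disjoint. */
static int toy_common_ancestor(struct toy_graph *g, int a, int b)
{
	int fa[MAX_FRONTIER] = { a }, fb[MAX_FRONTIER] = { b };
	int na = 1, nb = 1;

	for (int i = 0; i < g->nr; i++)
		g->flags[i] = 0;
	while (na || nb) {
		int hit = expand(g, fa, &na, MARK_A, MARK_B);

		if (hit >= 0)
			return hit;
		hit = expand(g, fb, &nb, MARK_B, MARK_A);
		if (hit >= 0)
			return hit;
	}
	return -1;
}
```

Because the two sides take turns, the search ends as soon as the slower side has gone back far enough, which is roughly the "fewest commits away from the further of the heads" property. The fixed-size arrays are only there to keep the sketch short.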

Signed-Off-By: Daniel Barkalow [EMAIL PROTECTED]
Index: Makefile
===
--- 45f926575d2c44072bfcf2317dbf3f0fbb513a4e/Makefile  (mode:100644 
sha1:346e3850de026485802e41e16a1180be2df85e4a)
+++ 7d806c2d3be8f87d3d4d87e5254500d7fc24476b/Makefile  (mode:100644 
sha1:0e84e3cd12f836602b420c197e08fabefe975493)
@@ -14,7 +17,7 @@
 
 PROG=   update-cache show-diff init-db write-tree read-tree commit-tree \
cat-file fsck-cache checkout-cache diff-tree rev-tree show-files \
-   check-files ls-tree
+   check-files ls-tree merge-base
 
 SCRIPT=parent-id tree-id git gitXnormid.sh gitadd.sh gitaddremote.sh \
gitcommit.sh gitdiff-do gitdiff.sh gitlog.sh gitls.sh gitlsobj.sh \
Index: merge-base.c
===
--- /dev/null  (tree:45f926575d2c44072bfcf2317dbf3f0fbb513a4e)
+++ 7d806c2d3be8f87d3d4d87e5254500d7fc24476b/merge-base.c  (mode:100644 
sha1:ee979c7532cbdf823e9930993b0dd8f97aadb21f)
@@ -0,0 +1,95 @@
+#include <stdlib.h>
+#include "cache.h"
+#include "revision.h"
+
+static struct revision *process_list(struct parent **list_p, int this_mark,
+				     int other_mark)
+{
+	struct parent *parent, *temp;
+	struct parent *posn = *list_p;
+	*list_p = NULL;
+	while (posn) {
+		parse_commit_object(posn->parent);
+		if (posn->parent->flags & this_mark) {
+			/*
+			  printf("%d already seen %s %x\n",
+			  this_mark
+			  sha1_to_hex(posn->parent->sha1),
+			  posn->parent->flags);
+			*/
+			/* do nothing; this indicates that this side
+			 * split and reformed, and we only need to
+			 * mark it once.
+			 */
+		} else if (posn->parent->flags & other_mark) {
+			return posn->parent;
+		} else {
+			/*
+			  printf("%d based on %s\n",
+			  this_mark,
+			  sha1_to_hex(posn->parent->sha1));
+			*/
+			posn->parent->flags |= this_mark;
+			
+			parent = posn->parent->parent;
+			while (parent) {
+				temp = malloc(sizeof(struct parent));
+				temp->next = *list_p;
+				temp->parent = parent->parent;
+				*list_p = temp;
+				parent = parent->next;
+			}
+		}
+		posn = posn->next;
+	}
+	return NULL;
+}
+
+struct revision *common_ancestor(struct revision *rev1, struct revision *rev2)
+{
+	struct parent *rev1list = malloc(sizeof(struct parent));
+	struct parent *rev2list = malloc(sizeof(struct parent));
+
+	rev1list->parent = rev1;
+	rev1list->next = NULL;
+
+	rev2list->parent = rev2;
+	rev2list->next = NULL;
+
+	while (rev1list || rev2list) {
+		struct revision *ret;
+		ret = process_list(&rev1list, 0x1, 0x2);
+		if (ret) {
+			/*  free lists */
+			return ret;
+		}
+		ret = process_list(&rev2list, 0x2, 0x1);
+		if (ret) {
+			/*  free lists */
+			return ret;
+		}
+	}
+	return NULL;
+}
+
+int main(int argc, char **argv)
+{
+	struct revision *rev1, *rev2, *ret;
+	unsigned char rev1key[20], rev2key[20];
+
+	if (argc != 3 ||
+	    get_sha1_hex(argv[1], rev1key) ||
+	    get_sha1_hex(argv[2], rev2key)) {
+		usage("merge-base <commit-id> <commit-id>");
+	}
+	rev1 = lookup_rev(rev1key);
+	rev2 = lookup_rev(rev2key);
+	ret = common_ancestor(rev1, rev2);
+	if (ret) {
+		printf("%s\n", sha1_to_hex(ret->sha1));
+		return 0;
+	} else {
+		return 1;
+	}
+	
+}



Re: [patch] fork optional branch point normazilation

2005-04-17 Thread Daniel Barkalow
On Sun, 17 Apr 2005, Linus Torvalds wrote:

 On Sun, 17 Apr 2005, Brad Roberts wrote:
 
  (ok, author looks better, but committer doesn't obey the AUTHOR_ vars yet)
 
 They shouldn't, but maybe I should add COMMITTER_xxx overrides. I just do 
 _not_ want people to think that they should claim to be somebody else: 
 it's not a security issue (you could compile your own commit-tree.c 
 after all), it's more of a social rule thing. I prefer seeing bad email 
 addresses that at least match the system setup to seeing good email 
 addresses that people made up just to make them look clean.

It seems to me like there should be a set of variables for the user in
general, and the various git scripts should arrange them appropriately
(e.g., git apply could look for a first Signed-Off-By, and make the
AUTHOR_ variables match that (for the next commit), while making the
COMMITTER match the user, etc). It seems to me like the current situation
is likely to lead to people claiming to be other people when applying
their patches, just due to having set up their correct info for handling
their own patches.

Actually, if the scripts are reorganizing them, they might as well send
them on the command line.

-Daniel
*This .sig left intentionally blank*



Re: [3/5] Add http-pull

2005-04-17 Thread Daniel Barkalow
On Sun, 17 Apr 2005, Petr Baudis wrote:

  Index: Makefile
  ===
  --- d662b707e11391f6cfe597fd4d0bf9c41d34d01a/Makefile  (mode:100644 
  sha1:b2ce7c5b63fffca59653b980d98379909f893d44)
  +++ 157b46ce1d82b3579e2e1258927b0d9bdbc033ab/Makefile  (mode:100644 
  sha1:940ef8578cf469354002cd8feaec25d907015267)
  @@ -35,6 +35,7 @@
   
   LIBS= -lssl -lz
   
  +http-pull: LIBS += -lcurl
   
   $(PROG):%: %.o $(COMMON)
  $(CC) $(CFLAGS) -o $@ $^ $(LIBS)
 
 Whew. Looks like an awful trick, you say this works?! :-)
 
 At times, I wouldn't want to be a GNU make parser.

Yup. GNU make is big on the features which do the obvious thing, even when
you can't believe they work. This is probably why nobody's managed to
replace it.

  Index: http-pull.c
  ===
  --- /dev/null  (tree:d662b707e11391f6cfe597fd4d0bf9c41d34d01a)
  +++ 157b46ce1d82b3579e2e1258927b0d9bdbc033ab/http-pull.c  (mode:100644 
  sha1:106ca31239e6afe6784e7c592234406f5c149e44)
  +   url = malloc(strlen(base) + 50);
 
 Off-by-one. What about the trailing NUL?

I get strlen(base) + 8 for "objects/" + 40 for the SHA1 hex + 1 for '/' + 1 for NUL = strlen(base) + 50.
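
To make the count easy to check, here is a small stand-alone sketch that builds the same kind of URL and asserts that the result fills the buffer exactly. The helper name object_url and the example URL are made up; the patch itself uses strcpy() and memcpy() rather than snprintf(), and this sketch assumes base already ends in '/'.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define HEX_SHA1_LEN 40

/*
 * Build "<base>objects/<2 hex digits>/<38 hex digits>" in fresh memory.
 * Returns NULL on allocation failure or if hex is not 40 characters long.
 * The caller frees the result.
 */
static char *object_url(const char *base, const char *hex)
{
	/* 8 for "objects/", 40 hex digits, 1 for '/', 1 for the NUL: 50 */
	size_t len = strlen(base) + 8 + HEX_SHA1_LEN + 1 + 1;
	char *url;
	int n;

	if (strlen(hex) != HEX_SHA1_LEN)
		return NULL;
	url = malloc(len);
	if (!url)
		return NULL;
	n = snprintf(url, len, "%sobjects/%.2s/%s", base, hex, hex + 2);
	/* The text must use every byte except the one reserved for the NUL. */
	assert(n >= 0 && (size_t)n == len - 1);
	return url;
}
```

So object_url("http://example.org/repo/", hex) needs strlen(base) + 49 characters plus the terminator, which matches the + 50 in the malloc() call under discussion.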

 I think you should have at least two disjunct modes - either you are
 downloading everything related to the given commit, or you are
 downloading all commit records for commit predecessors.
 
 Even if you might not want all the intermediate trees, you definitely
 want the intermediate commits, to keep the history graph contiguous.
 
 So in git pull, I'd imagine to do
 
   http-pull -c $new_head
   http-pull -t $(tree-id $new_head)
 
 So, -c would fetch a given commit and all its predecessors until it hits
 what you already have on your side. -t would fetch a given tree with all
 files and subtrees and everything. http-pull shouldn't default on
 either, since they are mutually exclusive.
 
 What do you think?

I think I'd rather keep the current behavior and add a -c for getting the
history of commits, and maybe a -a for getting the history of commits and
their trees.

There's some trickiness for the history of commits thing for stopping at
the point where you have everything, but also behaving appropriately if
you try once, fail partway through, and then try again. It's on my queue
of things to think about.

-Daniel
*This .sig left intentionally blank*



[3.1/5] Add http-pull

2005-04-17 Thread Daniel Barkalow
http-pull is a program that downloads from a (normal) HTTP server a commit
and all of the tree and blob objects it refers to (but not other commits,
etc.). Options could be used to make it download a larger or different
selection of objects.

Signed-Off-By: Daniel Barkalow [EMAIL PROTECTED]
Index: Makefile
===
--- 45f926575d2c44072bfcf2317dbf3f0fbb513a4e/Makefile  (mode:100644 
sha1:346e3850de026485802e41e16a1180be2df85e4a)
+++ 3eae85f66143160a26f5545d197862c89e2a8fb8/Makefile  (mode:100644 
sha1:0e84e3cd12f836602b420c197e08fabefe975493)
@@ -14,7 +17,7 @@
 
 PROG=   update-cache show-diff init-db write-tree read-tree commit-tree \
cat-file fsck-cache checkout-cache diff-tree rev-tree show-files \
-   check-files ls-tree merge-base
+   check-files ls-tree http-pull merge-base
 
 SCRIPT=parent-id tree-id git gitXnormid.sh gitadd.sh gitaddremote.sh \
gitcommit.sh gitdiff-do gitdiff.sh gitlog.sh gitls.sh gitlsobj.sh \
@@ -35,6 +38,7 @@
 
 LIBS= -lssl -lz
 
+http-pull: LIBS += -lcurl
 
 $(PROG):%: %.o $(COMMON)
$(CC) $(CFLAGS) -o $@ $^ $(LIBS)
Index: README
===
--- 45f926575d2c44072bfcf2317dbf3f0fbb513a4e/README  (mode:100664 
sha1:0170eafb60ad9009ca41c6536cecd6d1fdee5b86)
+++ 3eae85f66143160a26f5545d197862c89e2a8fb8/README  (mode:100664 
sha1:921d552d810394e665323ec82b4826914918689c)
@@ -120,7 +120,7 @@
diff, patch
libssl
rsync
-
+   curl (later than 7.7, according to the docs)
 
 
The core GIT
Index: http-pull.c
===
--- /dev/null  (tree:45f926575d2c44072bfcf2317dbf3f0fbb513a4e)
+++ 3eae85f66143160a26f5545d197862c89e2a8fb8/http-pull.c  (mode:100644 
sha1:7ba4ad67f6dac34addb537ee147ae3de0550a484)
@@ -0,0 +1,139 @@
+#include <fcntl.h>
+#include <unistd.h>
+#include <string.h>
+#include <stdlib.h>
+#include "cache.h"
+#include "revision.h"
+#include <errno.h>
+#include <stdio.h>
+
+#include <curl/curl.h>
+#include <curl/easy.h>
+
+static CURL *curl;
+
+static char *base;
+
+static int fetch(unsigned char *sha1)
+{
+	char *hex = sha1_to_hex(sha1);
+	char *filename = sha1_file_name(sha1);
+
+	char *url;
+	char *posn;
+	FILE *local;
+
+	if (!access(filename, R_OK)) {
+		return 0;
+	}
+
+	local = fopen(filename, "w");
+
+	if (!local) {
+		return error("Couldn't open %s", filename);
+	}
+
+	curl_easy_setopt(curl, CURLOPT_FILE, local);
+
+	url = malloc(strlen(base) + 50);
+	strcpy(url, base);
+	posn = url + strlen(base);
+	strcpy(posn, "objects/");
+	posn += 8;
+	memcpy(posn, hex, 2);
+	posn += 2;
+	*(posn++) = '/';
+	strcpy(posn, hex + 2);
+
+	curl_easy_setopt(curl, CURLOPT_URL, url);
+
+	if (curl_easy_perform(curl)) {
+		fclose(local);
+		unlink(filename);
+		return error("Error downloading %s from %s",
+			     sha1_to_hex(sha1), url);
+	}
+
+	fclose(local);
+	
+	return 0;
+}
+
+static int process_tree(unsigned char *sha1)
+{
+	void *buffer;
+	unsigned long size;
+	char type[20];
+
+	buffer = read_sha1_file(sha1, type, &size);
+	if (!buffer)
+		return error("Couldn't read %s.",
+			     sha1_to_hex(sha1));
+	if (strcmp(type, "tree"))
+		return error("Expected %s to be a tree, but was a %s.",
+			     sha1_to_hex(sha1), type);
+	while (size) {
+		int len = strlen(buffer) + 1;
+		unsigned char *sha1 = buffer + len;
+		unsigned int mode;
+		int retval;
+
+		if (size < len + 20 || sscanf(buffer, "%o", &mode) != 1)
+			return error("Invalid tree object");
+
+		buffer = sha1 + 20;
+		size -= len + 20;
+
+		retval = fetch(sha1);
+		if (retval)
+			return retval;
+
+		if (S_ISDIR(mode)) {
+			retval = process_tree(sha1);
+			if (retval)
+				return retval;
+		}
+	}
+	return 0;
+}
+
+static int process_commit(unsigned char *sha1)
+{
+	int retval;
+	struct revision *rev = lookup_rev(sha1);
+	if (parse_commit_object(rev))
+		return error("Couldn't parse commit %s\n", sha1_to_hex(sha1));
+
+	retval = fetch(rev->tree);
+	if (retval)
+		return retval;
+	retval = process_tree(rev->tree);
+	return retval;
+}
+
+int main(int argc, char **argv)
+{
+	char *commit_id = argv[1];
+	char *url = argv[2];
+	int retval;
+
+	unsigned char sha1[20];
+
+	get_sha1_hex(commit_id, sha1);
+
+	curl_global_init(CURL_GLOBAL_ALL

Re: [1/5] Parsing code in revision.h

2005-04-17 Thread Daniel Barkalow
On Sun, 17 Apr 2005, Linus Torvalds wrote:

 On Sun, 17 Apr 2005, Daniel Barkalow wrote:
 
  --- 45f926575d2c44072bfcf2317dbf3f0fbb513a4e/revision.h  (mode:100644 
  sha1:28d0de3261a61f68e4e0948a25a416a515cd2e83)
  +++ 37a0b01b85c2999243674d48bfc71cdba0e5518e/revision.h  (mode:100644 
  sha1:523bde6e14e18bb0ecbded8f83ad4df93fc467ab)
  @@ -24,6 +24,7 @@
  unsigned int flags;
  unsigned char sha1[20];
  unsigned long date;
  +   unsigned char tree[20];
  struct parent *parent;
   };
   
 
 I think this is really wrong.
 
 The whole point of revision.h is that it's a generic framework for 
 keeping track of relationships between different objects. And those 
 objects are in no way just commit objects.

 For example, fsck uses this struct revision to create a full tree of 
 _all_ the object dependencies, which means that a struct revision can be 
 any object at all - it's not in any way limited to commit objects, and 
 there is no tree object that is associated with these things at all.

I entirely missed this. No wonder my fsck-cache conversion wasn't going
so well...

 Besides, why do you want the tree? There's really nothing you can do with 
 the tree to a first approximation - you need to _first_ do the 
 reachability analysis entirely on the commit dependencies, and then when 
 you've selected a set of commits, you can just output those.

I actually want the tree for http-pull, not merging stuff. I was trying to
get a commit parser, not reachability at that point.

I think the right thing is to make a separate struct commit that has the
stuff I want in it, and probably do a struct tree at the same time.
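
One possible shape for that split, sketched only to make the idea concrete: a small generic struct object embedded as the first member of each specific type, with a checked cast on the way back. The field names and the helpers object_as_commit and object_as_tree are guesses for illustration, not a settled interface.

```c
#include <assert.h>
#include <stddef.h>

/* Type tags are compared by pointer, so each type has one canonical string. */
static const char *const commit_type = "commit";
static const char *const tree_type = "tree";

/* Generic part shared by every kind of object. */
struct object {
	unsigned parsed : 1;
	const char *type;
	unsigned char sha1[20];
};

struct tree {
	struct object object;	/* must stay the first member */
};

struct commit {
	struct object object;	/* must stay the first member */
	unsigned long date;
	struct tree *tree;
};

/* Checked downcast: NULL unless obj is tagged as a commit. */
static struct commit *object_as_commit(struct object *obj)
{
	assert(offsetof(struct commit, object) == 0);
	if (!obj || obj->type != commit_type)
		return NULL;
	return (struct commit *)obj;
}

/* Checked downcast: NULL unless obj is tagged as a tree. */
static struct tree *object_as_tree(struct object *obj)
{
	assert(offsetof(struct tree, object) == 0);
	if (!obj || obj->type != tree_type)
		return NULL;
	return (struct tree *)obj;
}
```

Code that only cares about reachability (fsck, for instance) can then work on struct object alone, while http-pull gets a struct commit that carries the tree it needs.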

-Daniel
*This .sig left intentionally blank*



Re: [3/5] Add http-pull

2005-04-17 Thread Daniel Barkalow
On Sun, 17 Apr 2005, Petr Baudis wrote:

 Dear diary, on Sun, Apr 17, 2005 at 08:49:11PM CEST, I got a letter
 where Daniel Barkalow [EMAIL PROTECTED] told me that...
 
 I'm not too keen on this. Either make it totally separate commands, or
 make a required switch specifying what to do. Otherwise it implies the
 switches would just modify what it does, but they make it do something
 completely different.

That's a good point. I'll require a -t for now, and add more later.

 -a would be fine too - basically a combination of -c and -t. I'd imagine
 that is what Linus would want to use, e.g.

Well, -c -t would give you the current tree and the whole commit log, but
not old trees. -a would additionally give you old trees.

  There's some trickiness for the history of commits thing for stopping at
  the point where you have everything, but also behaving appropriately if
  you try once, fail partway through, and then try again. It's on my queue
  of things to think about.
 
 Can't you just stop the recursion when you hit a commit you already
 have?

The problem is that, if you've fetched the final commit already, and then
the server dies, and you try again later, you already have the last one,
and so you think you've got everything.
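
A toy model of the failure, in case it helps: objects are numbered, each one refers to at most two others, and a walk that stops at the first object it already has never notices that the objects behind it are missing. Everything here (toy_store, toy_fetch, the two walk functions) is invented for the illustration; the second walk is just one possible answer, and it pays for its thoroughness by looking at the whole history every time.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_OBJS 8
#define MAX_REFS 2

/* Toy object store: object i refers to refs[i][k]; -1 means "no reference". */
struct toy_store {
	bool have[MAX_OBJS];		/* present locally? */
	int refs[MAX_OBJS][MAX_REFS];	/* what each object points at */
	int fetched;			/* downloads performed so far */
};

/* Stand-in for a download; a real one could fail partway through. */
static void toy_fetch(struct toy_store *s, int id)
{
	assert(id >= 0 && id < MAX_OBJS);
	s->have[id] = true;
	s->fetched++;
}

/* Naive walk: stop as soon as an object is already present. */
static void pull_stop_at_present(struct toy_store *s, int id)
{
	if (id < 0 || s->have[id])
		return;
	toy_fetch(s, id);
	for (int k = 0; k < MAX_REFS; k++)
		pull_stop_at_present(s, s->refs[id][k]);
}

/* Thorough walk: fetch what is missing, but always follow the references. */
static void pull_check_everything(struct toy_store *s, int id)
{
	if (id < 0)
		return;
	if (!s->have[id])
		toy_fetch(s, id);
	for (int k = 0; k < MAX_REFS; k++)
		pull_check_everything(s, s->refs[id][k]);
}
```

With object 0 already present and objects 1 and 2 missing behind it, the naive walk downloads nothing, while the thorough walk fetches both.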

At this point, I also want to put off doing much further with recursion
and commits until revision.h and such are sorted out.

-Daniel
*This .sig left intentionally blank*



Re: [2.1/5] Add merge-base

2005-04-17 Thread Daniel Barkalow
On Sun, 17 Apr 2005, Petr Baudis wrote:

 Dear diary, on Sun, Apr 17, 2005 at 06:51:59PM CEST, I got a letter
 where Daniel Barkalow [EMAIL PROTECTED] told me that...
  merge-base finds one of the best common ancestors of a pair of commits. In
  particular, it finds one of the ones which is fewest commits away from the
  further of the heads.
  
  Signed-Off-By: Daniel Barkalow [EMAIL PROTECTED]
 
 Note that during merge with Linus (probably the most complicated I've
 got so far, but still thankfully not too painful thanks to the rej
 tool) I've decided to revert your merge-base in favour of Linus'
 version. I did this mainly to make me merging Linus less awful; we
 should probably clean it up first and decide which solution to go for in
 the first place before possibly replacing it again, I think.

Sure. I'm working on the rearrangement now.

-Daniel
*This .sig left intentionally blank*



[2/5] Implementations of parsing functions

2005-04-17 Thread Daniel Barkalow
This implements the parsing functions.

Signed-Off-By: Daniel Barkalow [EMAIL PROTECTED]
Index: blob.c
===
--- /dev/null  (tree:5ca133e1b74aee39b2124c0ec9fd51539babb5e0)
+++ 1172a9b8f45b2fd640985595cc5258db3b027828/blob.c  (mode:100644 
sha1:04e0c1da9b1f4cdb1d1c5881b785babd3b0ceb09)
@@ -0,0 +1,24 @@
+#include "blob.h"
+#include "cache.h"
+#include <stdlib.h>
+
+const char *blob_type = "blob";
+
+struct blob *lookup_blob(unsigned char *sha1)
+{
+	struct object *obj = lookup_object(sha1);
+	if (!obj) {
+		struct blob *ret = malloc(sizeof(struct blob));
+		bzero(ret, sizeof(struct blob));
+		created_object(sha1, &ret->object);
+		ret->object.type = blob_type;
+		ret->object.parsed = 1;
+		return ret;
+	}
+	if (obj->parsed && obj->type != blob_type) {
+		error("Object %s is a %s, not a blob", 
+		      sha1_to_hex(sha1), obj->type);
+		return NULL;
+	}
+	return (struct blob *) obj;
+}
Index: commit.c
===
--- /dev/null  (tree:5ca133e1b74aee39b2124c0ec9fd51539babb5e0)
+++ 1172a9b8f45b2fd640985595cc5258db3b027828/commit.c  (mode:100644 
sha1:0099baa63971d86ee30ef2a7da25057f0f45a964)
@@ -0,0 +1,85 @@
+#include "commit.h"
+#include "cache.h"
+#include <string.h>
+
+const char *commit_type = "commit";
+
+struct commit *lookup_commit(unsigned char *sha1)
+{
+	struct object *obj = lookup_object(sha1);
+	if (!obj) {
+		struct commit *ret = malloc(sizeof(struct commit));
+		bzero(ret, sizeof(struct commit));
+		created_object(sha1, &ret->object);
+		return ret;
+	}
+	if (obj->parsed && obj->type != commit_type) {
+		error("Object %s is a %s, not a commit", 
+		      sha1_to_hex(sha1), obj->type);
+		return NULL;
+	}
+	return (struct commit *) obj;
+}
+
+static unsigned long parse_commit_date(const char *buf)
+{
+	unsigned long date;
+
+	if (memcmp(buf, "author", 6))
+		return 0;
+	while (*buf++ != '\n')
+		/* nada */;
+	if (memcmp(buf, "committer", 9))
+		return 0;
+	while (*buf++ != '>')
+		/* nada */;
+	date = strtoul(buf, NULL, 10);
+	if (date == ULONG_MAX)
+		date = 0;
+	return date;
+}
+
+int parse_commit(struct commit *item)
+{
+	char type[20];
+	void * buffer, *bufptr;
+	unsigned long size;
+	unsigned char parent[20];
+	if (item->object.parsed)
+		return 0;
+	item->object.parsed = 1;
+	buffer = bufptr = read_sha1_file(item->object.sha1, type, &size);
+	if (!buffer)
+		return error("Could not read %s",
+			     sha1_to_hex(item->object.sha1));
+	if (strcmp(type, commit_type))
+		return error("Object %s not a commit",
+			     sha1_to_hex(item->object.sha1));
+	item->object.type = commit_type;
+	get_sha1_hex(bufptr + 5, parent);
+	item->tree = lookup_tree(parent);
+	add_ref(&item->object, &item->tree->object);
+	bufptr += 46; /* "tree " + "hex sha1" + "\n" */
+	while (!memcmp(bufptr, "parent ", 7) &&
+	       !get_sha1_hex(bufptr + 7, parent)) {
+		struct commit_list *new_parent = 
+			malloc(sizeof(struct commit_list));
+		new_parent->next = item->parents;
+		new_parent->item = lookup_commit(parent);
+		add_ref(&item->object, &new_parent->item->object);
+		item->parents = new_parent;
+		bufptr += 48;
+	}
+	item->date = parse_commit_date(bufptr);
+	free(buffer);
+	return 0;
+}
+
+void free_commit_list(struct commit_list *list)
+{
+	while (list) {
+		struct commit_list *temp = list;
+		list = temp->next;
+		free(temp);
+	}
+}
Index: object.c
===
--- /dev/null  (tree:5ca133e1b74aee39b2124c0ec9fd51539babb5e0)
+++ 1172a9b8f45b2fd640985595cc5258db3b027828/object.c  (mode:100644 
sha1:986624ac7a7fd9229e05e1f181fd500640298d9e)
@@ -0,0 +1,96 @@
+#include "object.h"
+#include "cache.h"
+#include <stdlib.h>
+#include <string.h>
+
+struct object **objs;
+int nr_objs;
+static int obj_allocs;
+
+static int find_object(unsigned char *sha1)
+{
+	int first = 0, last = nr_objs;
+
+	while (first < last) {
+		int next = (first + last) / 2;
+		struct object *obj = objs[next];
+		int cmp;
+
+		cmp = memcmp(sha1, obj->sha1, 20);
+		if (!cmp)
+			return next;
+		if (cmp < 0) {
+			last = next;
+			continue;
+		}
+		first

Re: [PATCH] Get commits from remote repositories by HTTP

2005-04-16 Thread Daniel Barkalow
On Sat, 16 Apr 2005, Tony Luck wrote:

 On 4/16/05, Daniel Barkalow [EMAIL PROTECTED] wrote:
  +buffer = read_sha1_file(sha1, type, &size);
 
 You never free this buffer.

Ideally, this should all be rearranged to share the code with
read-tree, and it should be fixed in common.

 It would also be nice if you saved tree objects in some temporary file
 and did not install them until after you had fetched all the blobs and
 trees that this tree references.  Then if your connection is interrupted
 you can just restart it.

It looks over everything relevant, even if it doesn't need to download
anything, so it should work to continue if it stops in between.

-Daniel
*This .sig left intentionally blank*



Re: [PATCH] Get commits from remote repositories by HTTP

2005-04-16 Thread Daniel Barkalow
On Sun, 17 Apr 2005, Martin Mares wrote:

 Hello!
 
  This adds a program to download a commit, the trees, and the blobs in them
  from a remote repository using HTTP. It skips anything you already have.
 
 Is it really necessary to write your own HTTP downloader? If so, is it
 necessary to forget basic stuff like the Host: header? ;-)

I wanted to get something hacked quickly; can you suggest a good one to
use?

 If you feel that it should be optimized for speed, then at least use
 persistent connections.

That's the next step.

-Daniel
*This .sig left intentionally blank*



Re: [PATCH] Get commits from remote repositories by HTTP

2005-04-16 Thread Daniel Barkalow
On Sat, 16 Apr 2005, Adam Kropelin wrote:

 Tony Luck wrote:
  Otherwise this looks really nice.  I was going to script something
  similar using wget ... but that would have made zillions of separate
  connections.  Not so kind to the server.
 
 How about building a file list and doing a batch download via 'wget -i 
 /tmp/foo'? A quick test (on my ancient wget-1.7) indicates that it reuses 
 connections when successive URLs point to the same server.

You need to look at some of the files before you know what other files to
get. You could do it in waves, but that would be excessively complicated
to code and not the most efficient anyway.
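
To show why the list of files cannot be known up front, here is a rough stand-alone sketch of walking one tree object. It assumes each entry is laid out as an octal mode, a space, the name, a NUL byte, and then 20 raw SHA1 bytes, which is what the parsing loop in the patch implies. The names walk_tree, tree_entry_fn and count_dirs are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define SHA1_RAW_LEN 20

/* Called once per entry; name and sha1 point into the tree buffer. */
typedef void (*tree_entry_fn)(unsigned mode, const char *name,
			      const unsigned char *sha1, void *cb_data);

/* Returns the number of entries visited, or -1 if the buffer is malformed. */
static int walk_tree(const unsigned char *buf, size_t size,
		     tree_entry_fn fn, void *cb_data)
{
	int count = 0;

	assert(fn != NULL);
	while (size) {
		const unsigned char *nul = memchr(buf, '\0', size);
		unsigned long mode;
		char *end;
		size_t len;

		if (!nul)
			return -1;
		len = (size_t)(nul - buf) + 1;	/* "mode name" plus the NUL */
		if (size - len < SHA1_RAW_LEN)
			return -1;
		mode = strtoul((const char *)buf, &end, 8);
		if (end == (const char *)buf || *end != ' ')
			return -1;
		fn((unsigned)mode, end + 1, nul + 1, cb_data);
		buf = nul + 1 + SHA1_RAW_LEN;
		size -= len + SHA1_RAW_LEN;
		count++;
	}
	return count;
}

/* Example callback: count the entries whose mode says "directory". */
static void count_dirs(unsigned mode, const char *name,
		       const unsigned char *sha1, void *cb_data)
{
	(void)name;
	(void)sha1;
	if ((mode & 0170000) == 0040000)
		++*(int *)cb_data;
}
```

Every directory entry names another tree that has to be downloaded and walked in turn, so the next batch of URLs only exists once the current object has arrived.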

-Daniel
*This .sig left intentionally blank*



Re: Re: Add clone support to lntree

2005-04-16 Thread Daniel Barkalow
On Sun, 17 Apr 2005, Petr Baudis wrote:

 Dear diary, on Sat, Apr 16, 2005 at 05:06:54AM CEST, I got a letter
 where Daniel Barkalow [EMAIL PROTECTED] told me that...

  I think fork is as good as anything for describing the operation. I had
  thought about clone because it seemed to fill the role that bk
  clone had (although I never used BK, so I'm not sure). It doesn't seem
  useful to me to try cloning multiple remote repositories, since you'd get
  a copy of anything common from each; you just want to suck everything into
  the same .git/objects and split off working directories.
 
 Actually, what about if git pull outside of repository did what git
 clone does now? I'd kinda like clone instead of fork too.

This seems like the best solution to me, too. Although that would make
pull take a URL when making a new repository and not otherwise, which
might be confusing. init-remote perhaps, or maybe just have init do it
if given a URL?

-Daniel
*This .sig left intentionally blank*



Re: Re: Re: Add clone support to lntree

2005-04-16 Thread Daniel Barkalow
On Sun, 17 Apr 2005, Petr Baudis wrote:

 Dear diary, on Sat, Apr 16, 2005 at 05:17:00AM CEST, I got a letter
 where Daniel Barkalow [EMAIL PROTECTED] told me that...
  On Sat, 16 Apr 2005, Petr Baudis wrote:
  
   Dear diary, on Sat, Apr 16, 2005 at 04:47:55AM CEST, I got a letter
   where Petr Baudis [EMAIL PROTECTED] told me that...
git branch --- creates a branch from a given commit
(when passed empty commit, creates a branch
from the current commit and sets the working
tree to that branch)
Note that there is a bug in current git update - it will allow you to
bring several of your trees to follow the same branch, or even a remote
branch. This is not even supposed to work, and will be fixed when I get
some sleep. You will be able to do git pull even on local branches, and
the proper solution for this will be just tracking the branch you want
to follow.
   
   I must admit that I'm not entirely decided yet, so I'd love to hear your
   opinion.
   
   I'm wondering, whether each tree should be fixed to a certain branch.
   That is, you decide a name when you do git fork, and then the tree
   always follows that branch. (It always has to follow [be bound to]
   *some* branch, and each branch can be followed by only a single tree at
   a time.)
  
  I don't think I'm following the use of branches. Currently, what I do is
  have a git-pasky and a git-linus, and fork off a working directory from
  one of these for each thing I want to work on. I do some work, commit as I
  make progress, and then do a diff against the remote head to get a patch
  to send off. If I want to do a series of patches which depend on each
  other, I fork my next directory off of my previous one rather than off of
  a remote base. I haven't done much rebasing, so I haven't worked out how I
  would do that most effectively.
 
 Yes. And that's exactly what the branches allow you to do. You just do
 
   git fork myhttpclient ~/myhttpclientdir
 
 then you do some hacking, and when you have something usable, you can
 go back to your main working directory and do
 
   git merge -b when_you_started myhttpclient
 
 Since you consider the code perfect, you can now just rm -rf
 ~/myhttpclient.
 
 Suddenly, you get a mail from mj pointing out some bugs, and it looks
 like there are more to come. What to do?
 
   git fork myhttpclient ~/myhttpclientdir
 
 (Ok, this does not work, but that's a bug, will fix tomorrow.) This will
 let you take off when you left in your work on the branch.

Ah, I think that's what made me think I wasn't understanding branches; the
first thing I tried hit this bug.

 git update for seeking between commits is probably extremely important
 for any kind of binary search when you are wondering when this bug
 first appeared, or when you are exploring how a certain branch evolved
 over time. Doing git fork for each successive iteration sounds horrible.

Even if there isn't a performance hit, it's semantically wrong, because
you're looking at different versions that were in the same place at
different times.

 Now, what about git branch and git update for switching between
 branches? I think this is the most controversial part; these are
 basically just shortcuts for not having to do git fork, and I wouldn't
 mind so much removing them, if you people really consider them too ugly
 a wart for the soft clean git skin. I admit that they both come from a
 hidden prejudice that git fork is going to be slow and eat a lot of
 disk.

I think that this just confuses matters.

 The idea for git update for switching between branches is that
 especially when you have two rather similar branches and mostly do stuff
 on one of them, but sometimes you want to do something on the other one,
 you can just do a quick git update, do stuff, and git update back, without
 any forking.

I still think that fork should be quick enough, or you could leave the
extra tree around. I'm not against having such a command, but I think it
should be a separate command rather than a different use of update, since
it would be used by people working in different ways.

  I think I can make this space efficient by hardlinking unmodified blobs to
  a directory of cached expanded blobs.
 
 I don't know, but I really feel *very* unsafe when doing that. What if
 something screws up and corrupts my base... way too easy. And it gets
 pretty inconvenient and even more dangerous when you get the idea to do
 some modifications on your tree with something other than your favorite
 editor (which you've already checked does the right thing).

It should only be an option, not required and maybe not even
default. I think it should be possible to prevent stuff from screwing up,
since we really don't want anything to ever modify those inodes (as
opposed to some cases, where you want to modify inodes only in certain
ways). For that matter, relatively

[PATCH] Use libcurl to use HTTP to get repositories

2005-04-16 Thread Daniel Barkalow
This enables the use of HTTP to download commits and associated objects
from remote repositories. It now uses libcurl instead of local hack code.

Still causes warnings for fsck-cache and rev-tree, due to unshared code.

Still leaks a bit of memory due to bug copied from read-tree.

Needs libcurl post 7.7 or so.

Signed-Off-By: Daniel Barkalow [EMAIL PROTECTED]
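
For orientation, fetch() in http-get.c below derives each object's URL from
the base URL as objects/, the first two hex digits of the object name, a
slash, and the remaining 38 digits. A bounds-checked sketch of the same
layout follows. The helper name object_url and its return convention are
assumptions for illustration and are not part of the patch, which builds the
string with malloc and strcpy.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "<base>objects/xx/yyyy..." for a 40-character hex object name.
 * Like fetch() in the patch, this assumes base already ends with '/'.
 * Returns 0 on success, -1 if the name is malformed or out is too small.
 */
static int object_url(const char *base, const char *hex,
		      char *out, size_t outlen)
{
	int n;

	assert(base && hex && out);
	if (strlen(hex) != 40)
		return -1;
	/* %.2s emits the two-digit directory, hex + 2 the file name. */
	n = snprintf(out, outlen, "%sobjects/%.2s/%s", base, hex, hex + 2);
	if (n < 0 || (size_t)n >= outlen)
		return -1;
	return 0;
}
```

Calling it with base http://example.org/repo/ (a placeholder) and the object
name 106ca31239e6afe6784e7c592234406f5c149e44 yields
http://example.org/repo/objects/10/6ca31239e6afe6784e7c592234406f5c149e44.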

Index: Makefile
===
--- ed4f6e454b40650b904ab72048b2f93a068dccc3/Makefile  (mode:100644 sha1:b39b4ea37586693dd707d1d0750a9b580350ec50)
+++ d332a8ddffb50c1247491181af458970bf639942/Makefile  (mode:100644 sha1:ca5dfd41b750cb1339128e4431afbbbc21bf57bb)
@@ -14,7 +14,7 @@
 
 PROG=   update-cache show-diff init-db write-tree read-tree commit-tree \
cat-file fsck-cache checkout-cache diff-tree rev-tree show-files \
-   check-files ls-tree merge-tree
+   check-files ls-tree merge-tree http-get
 
 all: $(PROG)
 
@@ -23,6 +23,11 @@
 
 LIBS= -lssl -lz
 
+http-get: LIBS += -lcurl
+
+http-get:%:%.o read-cache.o
+   $(CC) $(CFLAGS) -o $@ $^ $(LIBS)
+
 init-db: init-db.o
 
 update-cache: update-cache.o read-cache.o
Index: http-get.c
===
--- /dev/null  (tree:ed4f6e454b40650b904ab72048b2f93a068dccc3)
+++ d332a8ddffb50c1247491181af458970bf639942/http-get.c  (mode:100644 sha1:106ca31239e6afe6784e7c592234406f5c149e44)
@@ -0,0 +1,126 @@
+#include <fcntl.h>
+#include <unistd.h>
+#include <string.h>
+#include <stdlib.h>
+#include "cache.h"
+#include "revision.h"
+#include <errno.h>
+#include <stdio.h>
+
+#include <curl/curl.h>
+#include <curl/easy.h>
+
+static CURL *curl;
+
+static char *base;
+
+static int fetch(unsigned char *sha1)
+{
+   char *hex = sha1_to_hex(sha1);
+   char *filename = sha1_file_name(sha1);
+
+   char *url;
+   char *posn;
+   FILE *local;
+   struct stat st;
+
+   if (!stat(filename, &st)) {
+   return 0;
+   }
+
+   local = fopen(filename, "w");
+
+   if (!local) {
+   fprintf(stderr, "Couldn't open %s\n", filename);
+   return -1;
+   }
+
+   curl_easy_setopt(curl, CURLOPT_FILE, local);
+
+   url = malloc(strlen(base) + 50);
+   strcpy(url, base);
+   posn = url + strlen(base);
+   strcpy(posn, "objects/");
+   posn += 8;
+   memcpy(posn, hex, 2);
+   posn += 2;
+   *(posn++) = '/';
+   strcpy(posn, hex + 2);
+
+   curl_easy_setopt(curl, CURLOPT_URL, url);
+
+   curl_easy_perform(curl);
+
+   fclose(local);
+   
+   return 0;
+}
+
+static int process_tree(unsigned char *sha1)
+{
+   void *buffer;
+unsigned long size;
+char type[20];
+
+buffer = read_sha1_file(sha1, type, &size);
+   if (!buffer)
+   return -1;
+   if (strcmp(type, "tree"))
+   return -1;
+   while (size) {
+   int len = strlen(buffer) + 1;
+   unsigned char *sha1 = buffer + len;
+   unsigned int mode;
+   int retval;
+
+   if (size < len + 20 || sscanf(buffer, "%o", &mode) != 1)
+   return -1;
+
+   buffer = sha1 + 20;
+   size -= len + 20;
+
+   retval = fetch(sha1);
+   if (retval)
+   return -1;
+
+   if (S_ISDIR(mode)) {
+   retval = process_tree(sha1);
+   if (retval)
+   return -1;
+   }
+   }
+   return 0;
+}
+
+static int process_commit(unsigned char *sha1)
+{
+   struct revision *rev = lookup_rev(sha1);
+   if (parse_commit_object(rev))
+   return -1;
+   
+   fetch(rev->tree);
+   process_tree(rev->tree);
+   return 0;
+}
+
+int main(int argc, char **argv)
+{
+   char *commit_id = argv[1];
+   char *url = argv[2];
+
+   unsigned char sha1[20];
+
+   get_sha1_hex(commit_id, sha1);
+
+   curl_global_init(CURL_GLOBAL_ALL);
+
+   curl = curl_easy_init();
+
+   base = url;
+
+   fetch(sha1);
+   process_commit(sha1);
+
+   curl_global_cleanup();
+   return 0;
+}
Index: revision.h
===
--- ed4f6e454b40650b904ab72048b2f93a068dccc3/revision.h  (mode:100664 sha1:28d0de3261a61f68e4e0948a25a416a515cd2e83)
+++ d332a8ddffb50c1247491181af458970bf639942/revision.h  (mode:100664 sha1:523bde6e14e18bb0ecbded8f83ad4df93fc467ab)
@@ -24,6 +24,7 @@
unsigned int flags;
unsigned char sha1[20];
unsigned long date;
+   unsigned char tree[20];
struct parent *parent;
 };
 
@@ -111,4 +112,29 @@
}
 }
 
+static int parse_commit_object(struct revision *rev)
+{
+   if (!(rev->flags & SEEN)) {
+   void *buffer, *bufptr;
+   unsigned long size;
+   char type[20];
+   unsigned char parent[20
