Re: [RFC] Possible strategy cleanup for git add/remove/diff etc.

2005-04-19 Thread Linus Torvalds


On Tue, 19 Apr 2005, Junio C Hamano wrote:
 
> Let's for a moment forget what git-pasky currently does, which
> is not to touch .git/index until the user says "Ok, let's
> commit".

I think git-pasky is wrong.

It's true that we often (almost always) want to diff against the last 
released thing, and I actually think git-pasky does what it does because 
I never wrote a tool to diff the current working directory against a 
tree.

At the same time, I very much worked with a model where you do _not_ have 
a traditional work file, but the index really _is_ the work file.

> I'd like to start from a different premise and see what happens:
> 
>  - What .git/index records is *not* the state as the last
>    commit.  It is just a cache Cogito uses to speed up access
>    to the user's working tree.  From the user's point of view,
>    it does not even exist.

Yes. Yes. YES.

That is indeed the whole point of the index file. In my world-view, the
index file does _everything_. It's the staging area (work file), it's
the merging area (merge directory) and it's the cache file (stat
cache).

I'll immediately write a tool to diff the current working directory 
against a tree object, and hopefully that will just make pasky happy with 
this model too. 

Is there any other reason why git-pasky wants to have a work file?

Linus
-
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] Possible strategy cleanup for git add/remove/diff etc.

2005-04-19 Thread Linus Torvalds


On Tue, 19 Apr 2005, Linus Torvalds wrote:
 
> That is indeed the whole point of the index file. In my world-view, the
> index file does _everything_. It's the staging area (work file), it's
> the merging area (merge directory) and it's the cache file (stat
> cache).
> 
> I'll immediately write a tool to diff the current working directory
> against a tree object, and hopefully that will just make pasky happy with
> this model too.

Ok, "immediately" took a bit longer than I wanted to, and quite frankly,
the end result is not very well tested. It was a bit more complex than I
was hoping for to match up the index file against a tree object, since
unlike the tree-tree comparison in diff-tree, you have to compare two
cases where the layout isn't the same.
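
To illustrate the layout mismatch: a tree object is nested (one object per directory), while the index is a flat list of full pathnames. The following is a minimal Python sketch, not the actual diff-cache code. It assumes a toy in-memory representation (nested dicts for trees, `(mode, sha1)` tuples for blobs), and the names `flatten_tree`, `head_tree` and `flat_head` are made up for the illustration. It flattens a tree so it can be compared path by path against index-style entries.

```python
def flatten_tree(tree, prefix=""):
    """Return {full_path: (mode, sha1)} for every blob under a nested tree.

    `tree` is a hypothetical stand-in for a tree object: a dict mapping a
    name either to a (mode, sha1) tuple (a blob) or to another dict (a
    subdirectory).
    """
    flat = {}
    for name, entry in tree.items():
        path = prefix + name
        if isinstance(entry, dict):
            # Subdirectory: recurse, extending the path prefix.
            flat.update(flatten_tree(entry, path + "/"))
        else:
            # Blob: record it under its full pathname, as the index would.
            flat[path] = entry
    return flat


# Made-up example data; the sha1 values are placeholders, not real objects.
head_tree = {
    "Makefile": ("100644", "a" * 40),
    "kernel": {"sched.c": ("100644", "b" * 40)},
}
flat_head = flatten_tree(head_tree)
```

Once both sides are keyed by full pathname, the comparison itself becomes a plain walk over the union of the two sets of paths.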

No matter. It seems to work to a first approximation, and the result is
such a cool tool that it's worth committing and pushing out immediately. 

The code ain't exactly pretty, but hey, maybe that's just me having higher 
standards of beauty than most. Or maybe you just shudder at what I 
consider pretty in the first place, in which case you probably shouldn't 
look too closely at this one.

What the new diff-cache does is basically emulate diff-tree, except 
one of the trees is always the index file.

You can also choose whether you want to trust the index file entirely
(using the --cached flag) or ask the diff logic to show any files that
don't match the stat state as being tentatively changed.  Both of these
operations are very useful indeed.

For example, let's say that you have worked on your index file, and are
ready to commit. You want to see exactly _what_ you are going to commit
without having to write a new tree object and compare it that way, and to
do that, you just do

diff-cache --cached $(cat .git/HEAD)

(another difference between diff-tree and diff-cache is that the new 
diff-cache can take a commit object, and it automatically just extracts 
the tree information from there).

Example: let's say I had renamed commit.c to git-commit.c, and I had 
done an update-cache to make that effective in the index file. 
show-diff wouldn't show anything at all, since the index file matches 
my working directory. But doing a diff-cache does:

[EMAIL PROTECTED]:~/git diff-cache --cached $(cat .git/HEAD)
-100644 blob    4161aecc6700a2eb579e842af0b7f22b98443f74    commit.c
+100644 blob    4161aecc6700a2eb579e842af0b7f22b98443f74    git-commit.c

So what the above diff-cache command line does is to say

   "show me the differences between HEAD and the current index contents
    (the ones I'd write with a write-tree)"

And as you can see, the output matches diff-tree -r output (we always do
-r, since the index is always fully populated). All the same rules: +  
means added file, - means removed file, and * means changed file. You 
can trivially see that the above is a rename.
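
The +, - and * rules can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the real implementation: both sides are modelled as dicts mapping a pathname to `(mode, sha1)`, the function names `diff_entries` and `likely_renames` are invented here, and the exact column layout (tab-separated, with `->` between old and new values on a `*` line) is assumed.

```python
def diff_entries(old, new):
    """Compare two {path: (mode, sha1)} maps and return diff-tree style lines."""
    lines = []
    for path in sorted(old.keys() | new.keys()):
        before, after = old.get(path), new.get(path)
        if before == after:
            continue  # identical mode and sha1: nothing to report
        if before is None:
            lines.append("+%s\tblob\t%s\t%s" % (after[0], after[1], path))
        elif after is None:
            lines.append("-%s\tblob\t%s\t%s" % (before[0], before[1], path))
        else:
            lines.append("*%s->%s\tblob\t%s->%s\t%s"
                         % (before[0], after[0], before[1], after[1], path))
    return lines


def likely_renames(old, new):
    """Pair each removed path with an added path that has the same (mode, sha1)."""
    removed = {p: e for p, e in old.items() if p not in new}
    added = {p: e for p, e in new.items() if p not in old}
    return [(r, a) for r in sorted(removed) for a in sorted(added)
            if removed[r] == added[a]]


# The rename from the example above: same blob, different name.
SHA = "4161aecc6700a2eb579e842af0b7f22b98443f74"
head = {"commit.c": ("100644", SHA)}
index = {"git-commit.c": ("100644", SHA)}
for line in diff_entries(head, index):
    print(line)
print(likely_renames(head, index))
```

The rename shows up as one removed and one added line carrying the same sha1, which is why it can be spotted by eye.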

In fact, diff-cache --cached _should_ always be entirely equivalent to
actually doing a write-tree and comparing that. Except this one is much
nicer for the case where you just want to check. Maybe you don't want to
do the tree.

So doing a diff-cache --cached is basically very useful when you are 
asking yourself "what have I already marked for being committed, and 
what's the difference to a previous tree".

However, the non-cached version takes a different approach, and is
potentially the even more useful of the two in that what it does can't be
emulated with a write-tree + diff-tree. Thus that's the default mode.  
The non-cached version asks the question

   "show me the differences between HEAD and the currently checked out
    tree - index contents _and_ files that aren't up-to-date"

which is obviously a very useful question too, since that tells you what
you _could_ commit. Again, the output matches the diff-tree -r output to
a tee, but with a twist.

The twist is that if some file doesn't match the cache, we don't have a
backing store thing for it, and we use the magic all-zero sha1 to show
that. So let's say that you have edited kernel/sched.c, but have not
actually done an update-cache on it yet - there is no object associated
with the new state, and you get:

[EMAIL PROTECTED]:~/v2.6/linux diff-cache $(cat .git/HEAD )
*100644->100664 blob    7476bbcfe5ef5a1dd87d745f298b831143e4d77e->0000000000000000000000000000000000000000    kernel/sched.c

ie it shows that the tree has changed, and that kernel/sched.c is
not up-to-date and may contain new stuff. The all-zero sha1 means that to
get the real diff, you need to look at the object in the working directory
directly rather than do an object-to-object diff.
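
A rough Python sketch of the non-cached rule follows. It is only an illustration: the index is modelled as a dict of `path -> (mode, sha1, stat_sig)` and the working directory as `path -> (mode, stat_sig)`, where `stat_sig` is a hypothetical stand-in for whatever stat fields the index caches. The name `checked_out_view`, the timestamps and the sizes are made up, and skipping files that are missing from the working directory is an assumption of the sketch, not a statement about what diff-cache does.

```python
NULL_SHA1 = "0" * 40  # the "magic" all-zero sha1: no object backs this state


def checked_out_view(index, worktree):
    """Return {path: (mode, sha1)} as the non-cached comparison would see it."""
    view = {}
    for path, (mode, sha1, cached_sig) in index.items():
        current = worktree.get(path)
        if current is None:
            # Assumption of this sketch: files missing on disk are left out.
            continue
        work_mode, work_sig = current
        if work_sig == cached_sig:
            # Stat info matches the index: trust the cached mode and sha1.
            view[path] = (mode, sha1)
        else:
            # Not up-to-date: report the on-disk mode and the all-zero sha1.
            view[path] = (work_mode, NULL_SHA1)
    return view


SCHED = "7476bbcfe5ef5a1dd87d745f298b831143e4d77e"
index = {"kernel/sched.c": ("100644", SCHED, (1113900000, 120000))}
worktree = {"kernel/sched.c": ("100664", (1113990000, 120450))}
view = checked_out_view(index, worktree)
```

Comparing `view` against the HEAD tree then yields a changed entry whose new sha1 is all zeros, which is the signal to go and look at the file in the working directory.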

NOTE! As with other commands of this type, diff-cache does not actually 
look at the contents of the file at all. So maybe kernel/sched.c hasn't 
actually changed, and it's just that you touched it. In either case, it's 
a note that you need to update-cache it to make the cache be in sync.
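
The "touched but not changed" case is easy to reproduce. The Python sketch below is an illustration only: it assumes a simplified stat signature of (mtime, size), which is not necessarily the set of fields the real index caches, and the helper names `stat_signature` and `demo_touch` are invented for the example.

```python
import os
import tempfile


def stat_signature(path):
    """Simplified stand-in for cached stat info: (mtime in ns, size)."""
    st = os.stat(path)
    return (st.st_mtime_ns, st.st_size)


def demo_touch():
    """Return (looks_changed, content_changed) after touching a file."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sched.c")
        with open(path, "w") as f:
            f.write("int main(void) { return 0; }\n")
        cached = stat_signature(path)  # what an update-cache would record
        with open(path) as f:
            before = f.read()

        # "touch" the file: move its mtime one hour forward, contents untouched.
        st = os.stat(path)
        os.utime(path, ns=(st.st_atime_ns, st.st_mtime_ns + 3600 * 10**9))

        with open(path) as f:
            after = f.read()
        looks_changed = stat_signature(path) != cached
        content_changed = before != after
        return looks_changed, content_changed


print(demo_touch())
```

A stat-only check flags the file even though its contents are byte-for-byte identical, which is the price of never reading file contents.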

NOTE 2! You can have a mixture