Re: [RFC] Possible strategy cleanup for git add/remove/diff etc.

Linus Torvalds Tue, 19 Apr 2005 21:31:10 -0700


On Tue, 19 Apr 2005, Linus Torvalds wrote:
> 
> That is indeed the whole point of the index file. In my world-view, the
> index file does _everything_. It's the staging area ("work file"), it's
> the merging area ("merge directory") and it's the cache file ("stat
> cache").
> 
> I'll immediately write a tool to diff the current working directory 
> against a tree object, and hopefully that will just make pasky happy with 
> this model too.


Ok, "immediately" took a bit longer than I wanted to, and quite frankly,
the end result is not very well tested. Matching up the index file against
a tree object was a bit more complex than I was hoping for, since unlike
the tree<->tree comparison in diff-tree, the two sides don't have the same
layout: the index is one flat, sorted list of full pathnames, while a tree
object is nested one directory at a time.
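
The layout mismatch can be illustrated with a small sketch. It assumes a
hypothetical in-memory model, not git's real data structures: a tree is a
nested dict mapping names to either `(mode, sha1)` leaves or sub-dicts, and
the index is a flat list of `(path, mode, sha1)` tuples sorted by path.
Flattening the tree recursively gives both sides the same shape, after which
a single sorted merge walk finds removed, added and changed paths. This is
an illustration of the idea, not the actual diff-cache code.

```python
# Hypothetical in-memory model (not git's real data structures):
#   tree  = nested dict {name: (mode, sha1) or subtree-dict}
#   index = flat list of (path, mode, sha1), sorted by full path

def flatten_tree(tree, prefix=""):
    """Recursively flatten a nested tree into a path-sorted list like the index."""
    entries = []
    for name, entry in tree.items():
        path = prefix + name
        if isinstance(entry, dict):
            # Subdirectory: recurse with the directory name as a prefix.
            entries.extend(flatten_tree(entry, path + "/"))
        else:
            mode, sha1 = entry
            entries.append((path, mode, sha1))
    # Sort by full path so the order matches a flat, path-sorted index
    # (e.g. "a.c" sorts before "a/b", because '.' < '/').
    return sorted(entries)


def diff_tree_index(tree_entries, index_entries):
    """Merge-walk two path-sorted lists, yielding (sign, path, old, new)."""
    i = j = 0
    while i < len(tree_entries) or j < len(index_entries):
        if j == len(index_entries) or (
            i < len(tree_entries) and tree_entries[i][0] < index_entries[j][0]
        ):
            # Path only in the tree: it was removed from the index.
            path, mode, sha1 = tree_entries[i]
            yield ("-", path, (mode, sha1), None)
            i += 1
        elif i == len(tree_entries) or index_entries[j][0] < tree_entries[i][0]:
            # Path only in the index: it was added.
            path, mode, sha1 = index_entries[j]
            yield ("+", path, None, (mode, sha1))
            j += 1
        else:
            # Same path on both sides: report it only if mode or sha1 differ.
            path = tree_entries[i][0]
            old, new = tree_entries[i][1:], index_entries[j][1:]
            if old != new:
                yield ("*", path, old, new)
            i += 1
            j += 1
```

The merge walk touches each entry once, which is why getting both sides
into the same flat, sorted order is the part that matters.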

No matter. It seems to work to a first approximation, and the result is
such a cool tool that it's worth committing and pushing out immediately. 

The code ain't exactly pretty, but hey, maybe that's just me having higher 
standards of beauty than most. Or maybe you just shudder at what I 
consider pretty in the first place, in which case you probably shouldn't 
look too closely at this one.

What the new "diff-cache" does is basically emulate "diff-tree", except 
one of the trees is always the index file.

You can also choose whether you want to trust the index file entirely
(using the "--cached" flag) or ask the diff logic to show any files that
don't match the stat state as being "tentatively changed".  Both of these
operations are very useful indeed.
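
The choice between the two modes can be sketched per index entry. Only the
"--cached" flag comes from the text; the function name `index_side` and its
`stat_matches` argument are hypothetical, standing in for whatever the real
code does when it compares a working file's stat data with its index entry.

```python
ZERO_SHA1 = "0" * 40  # the "all-zero" sha1: no object backs this state


def index_side(index_mode, index_sha1, work_mode, stat_matches, cached):
    """Return the (mode, sha1) the index side contributes for one path.

    cached=True:  trust the index entry as-is ("diff-cache --cached").
    cached=False: if the working file's stat data no longer matches the
                  index entry, report the file as tentatively changed,
                  using its current mode and the all-zero sha1.
    """
    if cached or stat_matches:
        return index_mode, index_sha1
    return work_mode, ZERO_SHA1
```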

For example, let's say that you have worked on your index file, and are
ready to commit. You want to see exactly _what_ you are going to commit
without having to write a new tree object and compare it that way, and to
do that, you just do

        diff-cache --cached $(cat .git/HEAD)

(another difference between diff-tree and diff-cache is that the new 
diff-cache can take a "commit" object, and it automatically just extracts 
the tree information from there).
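
A commit object's text begins with a "tree <sha1>" line, so pulling the
tree out of a commit can be as simple as the sketch below. It assumes the
commit body has already been read and decompressed; fetching the object
from .git/objects is left out, and the function name `tree_of_commit` is
hypothetical.

```python
import re

# A commit body starts with "tree " followed by 40 hex digits and a newline.
_TREE_LINE = re.compile(r"tree ([0-9a-f]{40})\n")


def tree_of_commit(commit_body: str) -> str:
    """Return the tree sha1 named on the first line of a commit object's body."""
    m = _TREE_LINE.match(commit_body)
    if m is None:
        raise ValueError("not a commit object: missing leading 'tree' line")
    return m.group(1)
```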

Example: let's say I had renamed "commit.c" to "git-commit.c", and I had 
done an "update-cache" to make that effective in the index file. 
"show-diff" wouldn't show anything at all, since the index file matches 
my working directory. But doing a diff-cache does:

        [EMAIL PROTECTED]:~/git> diff-cache --cached $(cat .git/HEAD)
        -100644 blob    4161aecc6700a2eb579e842af0b7f22b98443f74        commit.c
        +100644 blob    4161aecc6700a2eb579e842af0b7f22b98443f74        git-commit.c

So what the above "diff-cache" command line does is to say

   "show me the differences between HEAD and the current index contents 
    (the ones I'd write with a "write-tree")"

And as you can see, the output matches "diff-tree -r" output (we always do
"-r", since the index is always fully populated). All the same rules: "+"  
means added file, "-" means removed file, and "*" means changed file. You 
can trivially see that the above is a rename.
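
To make the output rules concrete, here is a sketch that formats change
records in the same shape as the example above (tab-separated columns are
an assumption) and flags a "-"/"+" pair carrying the same blob sha1 as a
likely rename. The rename check is only a helper for the reader: the text
says you can *see* the rename, not that the tool reports it. The names
`format_change` and `likely_renames` are hypothetical.

```python
def format_change(sign, path, old, new):
    """Format one (sign, path, old, new) record; old/new are (mode, sha1) or None."""
    if sign == "-":
        mode, sha1 = old
        return f"-{mode}\tblob\t{sha1}\t{path}"
    if sign == "+":
        mode, sha1 = new
        return f"+{mode}\tblob\t{sha1}\t{path}"
    # "*": changed file, shown as old->new for both mode and sha1.
    (omode, osha), (nmode, nsha) = old, new
    return f"*{omode}->{nmode}\tblob\t{osha}->{nsha}\t{path}"


def likely_renames(changes):
    """Pair removed and added paths that carry the identical blob sha1."""
    removed = {old[1]: path for sign, path, old, new in changes if sign == "-"}
    return [
        (removed[new[1]], path)
        for sign, path, old, new in changes
        if sign == "+" and new[1] in removed
    ]
```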

In fact, "diff-cache --cached" _should_ always be entirely equivalent to
actually doing a "write-tree" and comparing that. It's just much nicer
for the case where all you want is to check, and you don't want to
actually write out the tree.

So doing a "diff-cache --cached" is basically very useful when you are 
asking yourself "what have I already marked for being committed, and 
what's the difference to a previous tree".

However, the "non-cached" version takes a different approach, and is
potentially the more useful of the two, since what it does can't be
emulated with a "write-tree + diff-tree". Thus that's the default mode.
The non-cached version asks the question

   "show me the differences between HEAD and the currently checked out 
    tree - index contents _and_ files that aren't up-to-date"

which is obviously a very useful question too, since that tells you what
you _could_ commit. Again, the output matches the "diff-tree -r" output to
a T, but with a twist.

The twist is that if some file doesn't match the cache, we don't have a
backing store thing for it, and we use the magic "all-zero" sha1 to show
that. So let's say that you have edited "kernel/sched.c", but have not
actually done an update-cache on it yet - there is no "object" associated
with the new state, and you get:

        [EMAIL PROTECTED]:~/v2.6/linux> diff-cache $(cat .git/HEAD )
        *100644->100664 blob    7476bbcfe5ef5a1dd87d745f298b831143e4d77e->0000000000000000000000000000000000000000        kernel/sched.c

i.e. it shows that the tree has changed, and that "kernel/sched.c" is
not up-to-date and may contain new stuff. The all-zero sha1 means that to
get the real diff, you need to look at the file in the working directory
directly rather than do an object-to-object diff.
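
A tool that wants the actual content diff has to branch on that all-zero
sha1: the old side comes from an object, but the new side must be read
from the working file. A minimal sketch using Python's `difflib`; the
`read_blob` callback is hypothetical and stands in for whatever reads an
object's contents from the object store.

```python
import difflib
from pathlib import Path

ZERO_SHA1 = "0" * 40


def content_diff(path, old_sha1, new_sha1, read_blob, workdir="."):
    """Return a unified diff between the old object and the new state of path."""
    old_text = read_blob(old_sha1)
    if new_sha1 == ZERO_SHA1:
        # No object exists for the new state: read the working file directly.
        new_text = (Path(workdir) / path).read_text()
    else:
        new_text = read_blob(new_sha1)
    return "".join(difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile="a/" + path,
        tofile="b/" + path,
    ))
```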

NOTE! As with other commands of this type, "diff-cache" does not actually 
look at the contents of the file at all. So maybe "kernel/sched.c" hasn't 
actually changed, and it's just that you touched it. In either case, it's 
a note that you need to update-cache it to bring the cache back in sync.
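
A stat-only check like the one described can be sketched with `os.stat`.
The fields compared here (mtime, ctime, size, inode, mode) are an
assumption about what is worth recording, and the `CachedStat` type is
hypothetical. Touching a file changes its mtime, so the check reports it
even when its bytes are identical, which is exactly the caveat above.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class CachedStat:
    """Hypothetical stat snapshot recorded when a file is added to the index."""
    mtime_ns: int
    ctime_ns: int
    size: int
    ino: int
    mode: int

    @classmethod
    def of(cls, path):
        st = os.stat(path)
        return cls(st.st_mtime_ns, st.st_ctime_ns, st.st_size, st.st_ino, st.st_mode)


def stat_matches(path, cached: CachedStat) -> bool:
    """True if the file's stat data still matches the snapshot.

    File contents are never read; a missing file counts as a mismatch.
    """
    try:
        return CachedStat.of(path) == cached
    except FileNotFoundError:
        return False
```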

NOTE 2! You can have a mixture of files show up as "has been updated" and
"is still dirty in the working directory" together. You can always tell
which file is in which state, since the "has been updated" ones show a
valid sha1, and the "not in sync with the index" ones will always have the
special all-zero sha1.
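
Telling the two states apart mechanically only needs the last sha1 on each
line. A sketch that splits output lines into "updated" and "dirty" paths,
assuming tab-separated columns with the path last and the sha1 (or
old->new sha1 pair) just before it; the function name is hypothetical.

```python
ZERO_SHA1 = "0" * 40


def split_by_state(lines):
    """Split diff-cache style output lines into (updated, dirty) path lists."""
    updated, dirty = [], []
    for line in lines:
        fields = line.split("\t")
        path, sha_field = fields[-1], fields[-2]
        # For "*" lines the field is "old->new"; the new sha1 decides the state.
        new_sha1 = sha_field.split("->")[-1]
        (dirty if new_sha1 == ZERO_SHA1 else updated).append(path)
    return updated, dirty
```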

I think this should obviate the need for Pasky keeping a separate work 
file. You can always tell what the difference to the last commit is with 
this, and you don't need to have a separate file to tell you about what 
you're supposed to do.

                        Linus
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
