Re: Handling renames.

2005-04-14 Thread Linus Torvalds


On Thu, 14 Apr 2005, David Woodhouse wrote:

 I've been looking at tracking file revisions. One proposed solution was
 to have a separate revision history for individual files, with a new
 kind of 'filecommit' object which parallels the existing 'commit',
 referencing a blob instead of a tree. Then trees would reference such
 objects instead of referencing blobs directly.

Please don't.  It fundamentally breaks the git notion that content determines
objects.

It also has no relevance. A rename really doesn't exist in the git 
model. The git model really is about tracking data, not about tracking 
what happened to _create_ that data.
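A minimal sketch of why a rename "doesn't exist" at this layer: if an object's name is derived only from its bytes, the same content under two file names is one object. The framing used here ("blob <length>" plus a NUL byte, hashed with SHA-1) is the one present-day git uses; the code being discussed in this thread may have hashed differently. The dict-based trees and the helper name blob_id are purely illustrative.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Name a blob purely by its bytes (present-day git framing, assumed here)."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Hypothetical trees: plain name -> blob id mappings.
data = b"int foo_init(void) { return 0; }\n"
tree_before = {"foo.c": blob_id(data)}
tree_after = {"bar.c": blob_id(data)}

# Same bytes, same object: only the name recorded in the tree differs.
same_object = tree_before["foo.c"] == tree_after["bar.c"]
print(same_object)
```

Nothing in the object itself records that foo.c became bar.c; that knowledge has to live somewhere else.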

The one exception is the commit log. That's where you put the explanations 
of _why_ the data changed. And git itself doesn't care what the format is, 
apart from the git header.

So, you really need to think of git as a filesystem. You can then 
implement an SCM _on_top_of_it_, which means that your second suggestion 
is not only acceptable, it really is the _only_ way to handle this in git:

 So a commit involving a rename would look something like this...
 
   tree 82ba574c85e9a2e4652419c88244e9dd1bfa8baa
   parent bb95843a5a0f397270819462812735ee29796fb4
   rename foo.c bar.c
   author David Woodhouse [EMAIL PROTECTED] 1113499881 +0100
   committer David Woodhouse [EMAIL PROTECTED] 1113499881 +0100
   Rename foo.c to bar.c and s/foo_/bar_/g

Except I want that empty line in there, and I want it in the free-form  
section. The rename part really isn't part of the git header. It's not 
what git tracks, it was tracked by an SCM system on top of git.

So the git header is an inode in the git filesystem, and like an inode 
it has a ctime and an mtime, and pointers to the data. So as far as git is 
concerned, this part:

tree 82ba574c85e9a2e4652419c88244e9dd1bfa8baa
parent bb95843a5a0f397270819462812735ee29796fb4
author David Woodhouse [EMAIL PROTECTED] 1113499881 +0100
committer David Woodhouse [EMAIL PROTECTED] 1113499881 +0100

really is the filesystem inode. The rest is whatever the filesystem user
puts into it, and git won't care.
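The split described above can be sketched as follows: everything before the first empty line is the header that git interprets, everything after it is free-form text that only a tool layered on top would read. The "rename <old> <new>" line format comes from the proposal quoted in this message; the function names split_commit and scm_renames are hypothetical, and filename escaping is ignored, as it is in the thread.

```python
def split_commit(raw: str):
    """Split commit text into (header fields, free-form message) at the first empty line."""
    head, _, message = raw.partition("\n\n")
    fields = []
    for line in head.splitlines():
        key, _, value = line.partition(" ")
        fields.append((key, value))
    return fields, message


def scm_renames(message: str):
    """SCM-layer convention (an assumption): 'rename <old> <new>' lines in the message."""
    renames = []
    for line in message.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[0] == "rename":
            renames.append((parts[1], parts[2]))
    return renames


raw = (
    "tree 82ba574c85e9a2e4652419c88244e9dd1bfa8baa\n"
    "parent bb95843a5a0f397270819462812735ee29796fb4\n"
    "author David Woodhouse [EMAIL PROTECTED] 1113499881 +0100\n"
    "committer David Woodhouse [EMAIL PROTECTED] 1113499881 +0100\n"
    "\n"
    "rename foo.c bar.c\n"
    "\n"
    "Rename foo.c to bar.c and s/foo_/bar_/g\n"
)
fields, message = split_commit(raw)
print([key for key, _ in fields])
print(scm_renames(message))
```

The header parser never looks at the rename line, which is the point: the lower layer stays ignorant of it.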

 Opinions? Dissent? We'd probably need to escape the filenames in some
 way -- handwave over that for now.

The fact that git handles arbitrary filenames (stuff starting with . 
excepted) doesn't mean that the SCM above it needs to. Quite frankly, I 
think an SCM that handles newlines in filenames is being silly. But a 
_filesystem_ needs to not care.

There are too many messy SCMs out there that do not have a philosophy. 
Dammit, I'm not interested in creating another one. This thing has a 
mental model, and we keep to that model.

The reason UNIX is beautiful is that it has a mental model of processes 
and files. Git has a mental model of objects and certain very very limited 
relationships. The relationships git cares about are encoded in the C 
files, the extra crap (like rename info) is just that - stuff that 
random scripts wrote, and that is just informational and not central to 
the model.

Linus
-
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Handling renames.

2005-04-14 Thread Ingo Molnar

* David Woodhouse [EMAIL PROTECTED] wrote:

 I've been looking at tracking file revisions. One proposed solution 
 was to have a separate revision history for individual files, with a 
 new kind of 'filecommit' object which parallels the existing 'commit', 
 referencing a blob instead of a tree. Then trees would reference such 
 objects instead of referencing blobs directly.
 
 I think that introduces a lot of redundancy though, because 99% of the 
 time, the revision history of the individual file is entirely 
 reproducible from the revision history of the tree. It's only when 
 files are renamed that we fall over -- and I think we can handle 
 renames fairly well if we just log them in the commit object.

how about the following structure:

  tree_new ---.
  tree_old ---+--> rename_commit --> blob

the rename_commit object just contains a pointer to the file content 
blob. If a rename happens then the old tree references the rename_commit 
object (instead of the blob), and the new tree references it too. This 
way there's no need to list the rename via namespace means: if a tree 
entry points to a rename_commit object then a rename happened and the 
rename_commit object is looked up in the old tree to get the old name.

there's no redundancy caused by this method: only renames (which are 
rare) go through the rename_commit redirection. (to speed up the lookup 
the rename_commit object could cache the offset of the two names within 
their tree objects.)
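A rough sketch of the lookup this proposal implies, under heavy assumptions: trees are modelled as plain dicts, the class names Blob and RenameCommit and the function old_name are made up for illustration, and value equality of frozen dataclasses stands in for comparing object hashes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Blob:
    data: bytes


@dataclass(frozen=True)
class RenameCommit:
    blob: Blob  # just a pointer to the file content


def old_name(new_tree: dict, old_tree: dict, name: str):
    """Return the previous name of `name`, or None if the entry is not a rename."""
    entry = new_tree[name]
    if not isinstance(entry, RenameCommit):
        return None  # ordinary blob entry: no rename recorded
    for candidate, old_entry in old_tree.items():
        if old_entry == entry:  # same rename_commit object in the old tree
            return candidate
    return None


content = Blob(b"static int foo;\n")
marker = RenameCommit(content)
tree_old = {"foo.c": marker, "Makefile": Blob(b"all:\n")}
tree_new = {"bar.c": marker, "Makefile": Blob(b"all:\n")}

print(old_name(tree_new, tree_old, "bar.c"))
print(old_name(tree_new, tree_old, "Makefile"))
```

The linear scan over the old tree is what the suggested offset cache would avoid.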

Ingo


Re: Handling renames.

2005-04-14 Thread H. Peter Anvin
Linus Torvalds wrote:
 On Thu, 14 Apr 2005, H. Peter Anvin wrote:
 
 Although Linus is correct in that an SCM doesn't *have* to handle this, 
 it really feels like shooting for mediocrity to me.  We might as well 
 design it right from the beginning.
 
 No. git is not an SCM. it's a filesystem designed to _host_ an SCM, and 
 that _is_ doing it right from the beginning.
 
 Keep the abstractions clean. Do _not_ get confused into thinking that git 
 is an SCM. If you think of it that way, you'll end up with crap you can't 
 think about.
 
 And at a filesystem layer, rename already exists. It's moving an object 
 to a new name in a tree. git already does that very well, thank you very 
 much.
 
 But a filesystem rename is _not_ the same thing as an SCM rename.  An SCM 
 rename is built on top of a filesystem rename, but it has its own issues 
 that may or may not make sense for the filesystem.

I wasn't referring to git per se, I was referring to the hosted SCM.

-hpa


Re: Handling renames.

2005-04-14 Thread David Woodhouse
On Thu, 2005-04-14 at 20:58 +0200, Ingo Molnar wrote:
 The thing i tried to avoid was to list long filenames in the commit 
 (because of the tree hierarchy we'd need to do tree-absolute pathnames 
 or something like that, and escape things, and do lookups - duplicating 
 a VFS which is quite bad) - it would be better to identify the rename 
 source and target via its tree object hash and its offset within that 
 tree. Such information could be embedded in the commit object just fine.  
 Something like:

Actually I'm not sure that's true. Let's consider the two main users of
this information.

Firstly, because it's what I've been playing with: to list a given
file's revision history, I currently work with its filename -- walk the
commit objects, inspecting the tree and selecting those commits where
the file has changed. If my filename is 'fs/jffs2/inode.c' then I can
immediately skip over a commit where the 'fs' entry in the top-level
tree is identical to that in the parent, or I can skip a commit where
the 'jffs2' entry in the 'fs' subtree is identical to the parent... it's
all done on filename, and the {parent, entry} tuple wouldn't help much
here; I'd probably have to convert back to a filename anyway.
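The pruning described in this paragraph can be sketched roughly as below. This is an illustration under assumptions, not the code being described: object ids are hand-assigned labels rather than hashes, trees are dicts in a toy store, history is linear, and the names changed and history are invented for the example.

```python
def changed(store, tree_a, tree_b, path):
    """Compare two root trees along `path`, stopping at the first identical entry."""
    a, b = tree_a, tree_b
    for part in path.split("/"):
        if a == b:
            return False  # identical (sub)tree: nothing below it can differ
        ta, tb = store.get(a), store.get(b)
        a = ta.get(part) if isinstance(ta, dict) else None
        b = tb.get(part) if isinstance(tb, dict) else None
    return a != b


def history(store, commits, head, path):
    """Walk parents from `head`, keeping the commits in which `path` changed."""
    found = []
    cid = head
    while cid is not None:
        commit = commits[cid]
        parent = commit["parent"]
        parent_tree = commits[parent]["tree"] if parent is not None else None
        if changed(store, commit["tree"], parent_tree, path):
            found.append(cid)
        cid = parent
    return found


store = {
    "j1": {"inode.c": "b1"}, "j2": {"inode.c": "b2"},
    "f1": {"jffs2": "j1"}, "f2": {"jffs2": "j2"},
    "t1": {"fs": "f1"},
    "t2": {"fs": "f1", "README": "bR"},  # only README added: 'fs' entry unchanged
    "t3": {"fs": "f2", "README": "bR"},  # inode.c modified
}
commits = {
    "c1": {"tree": "t1", "parent": None},
    "c2": {"tree": "t2", "parent": "c1"},
    "c3": {"tree": "t3", "parent": "c2"},
}
print(history(store, commits, "c3", "fs/jffs2/inode.c"))
```

Commit c2 is skipped as soon as its 'fs' entry is seen to match the parent's, without descending further.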

Secondly, there's merges. I've paid less attention to these (see mail 5
minutes ago) but I think they'd end up operating on the rename
information in a very similar way. To find a common ancestor for a given
file, we want to track its name as it changed during history; at that
point it's all string compares.
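Tracking a name back through history with logged renames might look like the sketch below. It assumes each commit already carries a parsed list of (old, new) rename pairs and that history is linear; the dict layout and the name name_history are illustrative only.

```python
def name_history(commits, head, name):
    """Walk parents from `head`, recording the name the file had in each commit."""
    trail = []
    cid = head
    while cid is not None:
        commit = commits[cid]
        trail.append((cid, name))
        for old, new in commit["renames"]:
            if new == name:  # plain string compare, as described above
                name = old   # before this commit the file had its old name
                break
        cid = commit["parent"]
    return trail


commits = {
    "c1": {"parent": None, "renames": []},
    "c2": {"parent": "c1", "renames": [("foo.c", "bar.c")]},
    "c3": {"parent": "c2", "renames": []},
}
print(name_history(commits, "c3", "bar.c"))
```

A merge tool could run the same walk on both sides and compare the names found at the common ancestor commit.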

-- 
dwmw2




Re: Handling renames.

2005-04-14 Thread Zach Welch
Linus Torvalds wrote:
 
 On Thu, 14 Apr 2005, H. Peter Anvin wrote:
 
 Although Linus is correct in that an SCM doesn't *have* to handle 
 this, it really feels like shooting for mediocrity to me.  We might
 as well design it right from the beginning.
 
 
 No. git is not an SCM. it's a filesystem designed to _host_ an SCM, 
 and that _is_ doing it right from the beginning.

I imagine quite a few folks expect something not entirely unlike an SCM
to emerge from these current efforts. Moreover, Petr's 'git' scripts
wrap your filesystem plumbing to that very end.

To avoid confusion, I think it would be better to distinguish the two
layers, perhaps by calling the low-level plumbing... 'gitfs', of course.

Cheers,

Zach Welch
Superlucidity Services


Re: Re: Naming the SCM (was Re: Handling renames.)

2005-04-14 Thread Petr Baudis
Dear diary, on Thu, Apr 14, 2005 at 10:58:52PM CEST, I got a letter
where H. Peter Anvin [EMAIL PROTECTED] told me that...
 Petr Baudis wrote:
 
 Cogito.  Git inside can be the first slogan.
 
 What about tig?
 
 I like Cogito; it's a real name, plus it'd be a good use for the 
 otherwise-pretty-useless two-letter combination cg.

Duh, believe me or not but I completely missed the Cogito part of
Steven's mail. Of course, I like it too.

I'll commit my poor man's git-merge-in-separate-tree and finally get
some sleep. I promise.

-- 
Petr Pasky Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor


Re: Handling renames.

2005-04-14 Thread Daniel Barkalow
On Thu, 14 Apr 2005, David Woodhouse wrote:

 Opinions? Dissent? We'd probably need to escape the filenames in some
 way -- handwave over that for now.

I personally think renames are a minor thing that doesn't happen
much. What actually happens, in my opinion, is that some chunk of a file
is moved to a different, possibly new, file. If this is supported (as
something that the SCM notices), then a rename is just a special case
where the moved chunk is a whole file.

I think that it should be possible to identify and tag big
enough deletions and insertions, and compare them to find moves, where a
further change may be applied in the middle if two chunks are very
similar but not the same.
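One way to picture the idea, as a sketch only: collect large deleted and inserted runs of lines with Python's difflib, then pair up runs that are similar enough. The size cutoff, the similarity threshold, and the function names are arbitrary assumptions, not anything proposed in this thread.

```python
import difflib


def removed_blocks(old, new, min_lines):
    """Runs of at least `min_lines` lines present in `old` but gone from `new`."""
    sm = difflib.SequenceMatcher(None, old, new, autojunk=False)
    return [old[i1:i2] for tag, i1, i2, j1, j2 in sm.get_opcodes()
            if tag in ("delete", "replace") and i2 - i1 >= min_lines]


def added_blocks(old, new, min_lines):
    """Runs of at least `min_lines` lines present in `new` but not in `old`."""
    sm = difflib.SequenceMatcher(None, old, new, autojunk=False)
    return [new[j1:j2] for tag, i1, i2, j1, j2 in sm.get_opcodes()
            if tag in ("insert", "replace") and j2 - j1 >= min_lines]


def find_moves(deleted, inserted, threshold=0.7):
    """Pair deleted and inserted blocks that are very similar but maybe not identical."""
    moves = []
    for gone in deleted:
        for came in inserted:
            score = difflib.SequenceMatcher(None, gone, came, autojunk=False).ratio()
            if score >= threshold:
                moves.append((gone, came, score))
    return moves


src_old = ["int a;", "void helper(void)", "{", "    work();", "}", "int b;"]
src_new = ["int a;", "int b;"]
dst_old = []
dst_new = ["void helper(void)", "{", "    work_harder();", "}"]

moves = find_moves(removed_blocks(src_old, src_new, 3),
                   added_blocks(dst_old, dst_new, 3))
print(len(moves))
```

A whole-file rename then falls out as the case where the deleted block is the entire old file and the inserted block is the entire new one.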

On the other hand, I think that the SCM will need to cache its
understanding of what a commit did in order to give reasonable
performance for operations like annotate, and it may be advantageous to
distribute things from this cache, since the committer might want to tell
the system something that it didn't guess.

At some point, I'm going to argue for core support for back pointers,
where a file can be created which is about some other file(s), and
someone looking for files about a particular file can find them without
searching the entire database. I think this will turn out to be important
for a variety of cases where some later participant wants to say something
about an existing file without changing the content of the file.
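A back-pointer lookup of this kind could be sketched as a reverse index from a target object to the objects that are about it. Everything here is hypothetical: the class name, the string ids, and the in-memory dict standing in for whatever the core would actually store.

```python
from collections import defaultdict


class BackPointerIndex:
    """Reverse index: target object id -> ids of objects that are about it."""

    def __init__(self):
        self._about = defaultdict(set)

    def add(self, note_id, targets):
        """Record that `note_id` says something about each id in `targets`."""
        for target in targets:
            self._about[target].add(note_id)

    def notes_about(self, target):
        """Look up notes for `target` without scanning the whole database."""
        return sorted(self._about.get(target, ()))


index = BackPointerIndex()
index.add("note-review", ["blob-inode"])
index.add("note-annotate-cache", ["blob-inode", "blob-super"])
print(index.notes_about("blob-inode"))
```

The annotated object is never rewritten, so its content, and therefore its name, stays the same.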

-Daniel
*This .sig left intentionally blank*



Re: Naming the SCM (was Re: Handling renames.)

2005-04-14 Thread Peter Williams
Steven Cole wrote:
 On Thursday 14 April 2005 01:40 pm, Andrew Timberlake-Newell wrote:
 Zach Welch pontificated:
 I imagine quite a few folks expect something not entirely unlike an SCM
 to emerge from these current efforts. Moreover, Petr's 'git' scripts
 wrap your filesystem plumbing to that very end.
 
 To avoid confusion, I think it would be better to distinguish the two
 layers, perhaps by calling the low-level plumbing... 'gitfs', of course.
 
 Or perhaps to come up with a name (or at least nickname) for the SCM.
 GitMaster?
 
 Cogito.  Git inside can be the first slogan.
 Differentiating the SCM built on top of git from git itself is probably
 worthwhile to avoid confusion.  Other SCMs may be developed later, built
 on git, and these can come up with their own clever names.

And the logo could be a dove which, as everybody knows, coos.

Peter
--
Peter Williams   [EMAIL PROTECTED]
Learning, n. The kind of ignorance distinguishing the studious.
 -- Ambrose Bierce