Re: Handling renames.
On Thu, 14 Apr 2005, David Woodhouse wrote:
> I've been looking at tracking file revisions. One proposed solution was to have a separate revision history for individual files, with a new kind of 'filecommit' object which parallels the existing 'commit', referencing a blob instead of a tree. Then trees would reference such objects instead of referencing blobs directly.

Please don't. It's fundamentally against the git notion of "content determines objects".

It also has no relevance. A rename really doesn't exist in the git model. The git model really is about tracking data, not about tracking what happened to _create_ that data.

The one exception is the commit log. That's where you put the explanations of _why_ the data changed. And git itself doesn't care what the format is, apart from the git header.

So, you really need to think of git as a filesystem. You can then implement an SCM _on_top_of_it_, which means that your second suggestion is not only acceptable, it really is the _only_ way to handle this in git:

> So a commit involving a rename would look something like this...
>
>	tree 82ba574c85e9a2e4652419c88244e9dd1bfa8baa
>	parent bb95843a5a0f397270819462812735ee29796fb4
>	rename foo.c bar.c
>	author David Woodhouse [EMAIL PROTECTED] 1113499881 +0100
>	committer David Woodhouse [EMAIL PROTECTED] 1113499881 +0100
>
>	Rename foo.c to bar.c and s/foo_/bar_/g

Except I want that empty line in there, and I want it in the free-form section. The rename part really isn't part of the git header. It's not what git tracks, it was tracked by an SCM system on top of git.

So the git header is an inode in the git filesystem, and like an inode it has a ctime and an mtime, and pointers to the data. So as far as git is concerned, this part:

	tree 82ba574c85e9a2e4652419c88244e9dd1bfa8baa
	parent bb95843a5a0f397270819462812735ee29796fb4
	author David Woodhouse [EMAIL PROTECTED] 1113499881 +0100
	committer David Woodhouse [EMAIL PROTECTED] 1113499881 +0100

really is the filesystem inode. The rest is whatever the filesystem user puts into it, and git won't care.

> Opinions? Dissent? We'd probably need to escape the filenames in some way -- handwave over that for now.

The fact that git handles arbitrary filenames (stuff starting with . excepted) doesn't mean that the SCM above it needs to.

Quite frankly, I think an SCM that handles newlines in filenames is being silly. But a _filesystem_ needs to not care.

There are too many messy SCM's out there that do not have a philosophy. Dammit, I'm not interested in creating another one. This thing has a mental model, and we keep to that model.

The reason UNIX is beautiful is that it has a mental model of processes and files. Git has a mental model of objects and certain very very limited relationships. The relationships git cares about are encoded in the C files, the extra crap (like rename info) is just that - stuff that random scripts wrote, and that is just informational and not central to the model.

		Linus
-
To unsubscribe from this list: send the line unsubscribe git in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
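[Editorial sketch, not part of the original mail.] The split Linus describes, a fixed git header followed by a blank line and a free-form section that only the SCM layer interprets, can be illustrated roughly as follows. The function names and the "rename OLD NEW" line convention in the free-form part are assumptions for illustration only; filename escaping is left out, as it is in the thread.

```python
def split_commit(text):
    """Split commit text at the first empty line: (header lines, free-form part)."""
    header, _, free_form = text.partition("\n\n")
    return header.splitlines(), free_form


def parse_header(header_lines):
    """Collect header fields. 'parent' may repeat, so every value is a list."""
    fields = {}
    for line in header_lines:
        key, _, value = line.partition(" ")
        fields.setdefault(key, []).append(value)
    return fields


def scm_renames(free_form):
    """Hypothetical SCM-layer convention: 'rename OLD NEW' lines in the free-form part.

    git itself would never look at these; only scripts on top of it would.
    """
    renames = []
    for line in free_form.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[0] == "rename":
            renames.append((parts[1], parts[2]))
    return renames


COMMIT = """\
tree 82ba574c85e9a2e4652419c88244e9dd1bfa8baa
parent bb95843a5a0f397270819462812735ee29796fb4
author David Woodhouse [EMAIL PROTECTED] 1113499881 +0100
committer David Woodhouse [EMAIL PROTECTED] 1113499881 +0100

rename foo.c bar.c

Rename foo.c to bar.c and s/foo_/bar_/g
"""

header_lines, free_form = split_commit(COMMIT)
fields = parse_header(header_lines)
renames = scm_renames(free_form)
```

With this layout the header fields contain only tree, parent, author and committer, while the rename pair is recovered purely from the free-form text.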
Re: Handling renames.
* David Woodhouse [EMAIL PROTECTED] wrote:

> I've been looking at tracking file revisions. One proposed solution was to have a separate revision history for individual files, with a new kind of 'filecommit' object which parallels the existing 'commit', referencing a blob instead of a tree. Then trees would reference such objects instead of referencing blobs directly.
>
> I think that introduces a lot of redundancy though, because 99% of the time, the revision history of the individual file is entirely reproducible from the revision history of the tree. It's only when files are renamed that we fall over -- and I think we can handle renames fairly well if we just log them in the commit object.

how about the following structure:

	tree_new ---.
	             >--- rename_commit --- blob
	tree_old ---'

the rename_commit object just contains a pointer to the file content blob. If a rename happens then the old tree references the rename_commit object (instead of the blob), and the new tree references it too.

This way there's no need to list the rename via namespace means: if a tree entry points to a rename_commit object then a rename happened and the rename_commit object is looked up in the old tree to get the old name.

there's no redundancy caused by this method: only renames (which are rare) go through the rename_commit redirection.

(to speed up the lookup the rename_commit object could cache the offset of the two names within their tree objects.)

	Ingo
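[Editorial sketch, not part of the original mail.] The lookup Ingo proposes might work roughly like this. The dictionary-based object store, the short ids such as "rc1", and the function name are all hypothetical, chosen only to make the idea concrete.

```python
def find_renames(objects, old_tree, new_tree):
    """Return (old_name, new_name) pairs implied by shared rename_commit entries."""
    # Index the old tree by referenced object id. If two old names referenced
    # the same object the last one would win; the sketch ignores that case.
    old_names = {ref: name for name, ref in old_tree.items()}
    renames = []
    for new_name, ref in new_tree.items():
        if objects.get(ref, {}).get("type") != "rename_commit":
            continue  # ordinary blob entry: no rename recorded here
        old_name = old_names.get(ref)
        if old_name is not None and old_name != new_name:
            renames.append((old_name, new_name))
    return renames


# Hypothetical object store: ids are placeholders, not real hashes.
objects = {
    "rc1": {"type": "rename_commit", "blob": "b1"},
    "b1": {"type": "blob"},
    "b2": {"type": "blob"},
}
tree_old = {"foo.c": "rc1", "Makefile": "b2"}
tree_new = {"bar.c": "rc1", "Makefile": "b2"}

renames = find_renames(objects, tree_old, tree_new)
```

Only entries that go through the rename_commit redirection are examined further, which matches the claim that ordinary entries pay no extra cost.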
Re: Handling renames.
Linus Torvalds wrote:
> On Thu, 14 Apr 2005, H. Peter Anvin wrote:
>> Although Linus is correct in that an SCM doesn't *have* to handle this, it really feels like shooting for mediocrity to me. We might as well design it right from the beginning.
>
> No. git is not an SCM. It's a filesystem designed to _host_ an SCM, and that _is_ doing it right from the beginning.
>
> Keep the abstractions clean. Do _not_ get confused into thinking that git is an SCM. If you think of it that way, you'll end up with crap you can't think about.
>
> And at a filesystem layer, rename already exists. It's moving an object to a new name in a tree. git already does that very well, thank you very much.
>
> But a filesystem rename is _not_ the same thing as an SCM rename. An SCM rename is built on top of a filesystem rename, but it has its own issues that may or may not make sense for the filesystem.

I wasn't referring to git per se, I was referring to the hosted SCM.

	-hpa
Re: Handling renames.
On Thu, 2005-04-14 at 20:58 +0200, Ingo Molnar wrote:
> The thing i tried to avoid was to list long filenames in the commit (because of the tree hierarchy we'd need to do tree-absolute pathnames or something like that, and escape things, and do lookups - duplicating a VFS which is quite bad) - it would be better to identify the rename source and target via its tree object hash and its offset within that tree. Such information could be embedded in the commit object just fine. Something like:

Actually I'm not sure that's true. Let's consider the two main users of this information.

Firstly, because it's what I've been playing with: to list a given file's revision history, I currently work with its filename -- walk the commit objects, inspecting the tree and selecting those commits where the file has changed. If my filename is 'fs/jffs2/inode.c' then I can immediately skip over a commit where the 'fs' entry in the top-level tree is identical to that in the parent, or I can skip a commit where the 'jffs2' entry in the 'fs' subtree is identical to the parent... it's all done on filename, and the {parent, entry} tuple wouldn't help much here; I'd probably have to convert back to a filename anyway.

Secondly, there's merges. I've paid less attention to these (see mail 5 minutes ago) but I think they'd end up operating on the rename information in a very similar way. To find a common ancestor for a given file, we want to track its name as it changed during history; at that point it's all string compares.

--
dwmw2
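[Editorial sketch, not part of the original mail.] The filename-based history walk David describes, including the early skip when a subtree entry is identical to the parent's, could look something like this. The in-memory `trees` and `commits` tables, the short ids, and the single-parent chain are simplifying assumptions; renames and merges are not handled.

```python
def path_changed(trees, old_root, new_root, path):
    """True if `path` resolves to different objects under the two root trees."""
    old, new = old_root, new_root
    for name in path.split("/"):
        if old == new:
            return False  # identical (sub)tree id: nothing below can differ
        # A missing tree or a blob id simply resolves to None.
        old = trees.get(old, {}).get(name)
        new = trees.get(new, {}).get(name)
    return old != new


def file_history(commits, trees, head, path):
    """Walk a single-parent chain from `head`, keeping commits that touch `path`."""
    touched = []
    commit = head
    while commit is not None:
        tree, parent = commits[commit]
        parent_tree = commits[parent][0] if parent is not None else None
        if path_changed(trees, parent_tree, tree, path):
            touched.append(commit)
        commit = parent
    return touched


# Hypothetical store: tree id -> {entry name: object id}.
trees = {
    "r1": {"fs": "fs1", "README": "d1"},
    "r2": {"fs": "fs1", "README": "d2"},   # only README changed
    "r3": {"fs": "fs2", "README": "d2"},   # inode.c changed
    "fs1": {"jffs2": "j1"}, "j1": {"inode.c": "b1"},
    "fs2": {"jffs2": "j2"}, "j2": {"inode.c": "b2"},
}
# commit id -> (root tree id, parent commit id or None)
commits = {"c1": ("r1", None), "c2": ("r2", "c1"), "c3": ("r3", "c2")}

history = file_history(commits, trees, "c3", "fs/jffs2/inode.c")
```

In this example commit c2 is skipped after comparing only the top-level 'fs' entry, without descending into 'jffs2' at all.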
Re: Handling renames.
Linus Torvalds wrote:
> On Thu, 14 Apr 2005, H. Peter Anvin wrote:
>> Although Linus is correct in that an SCM doesn't *have* to handle this, it really feels like shooting for mediocrity to me. We might as well design it right from the beginning.
>
> No. git is not an SCM. It's a filesystem designed to _host_ an SCM, and that _is_ doing it right from the beginning.

I imagine quite a few folks expect something not entirely unlike an SCM to emerge from these current efforts. Moreover, Petr's 'git' scripts wrap your filesystem plumbing to that very end.

To avoid confusion, I think it would be better to distinguish the two layers, perhaps by calling the low-level plumbing... 'gitfs', of course.

Cheers,

Zach Welch
Superlucidity Services
Re: Naming the SCM (was Re: Handling renames.)
Dear diary, on Thu, Apr 14, 2005 at 10:58:52PM CEST, I got a letter where H. Peter Anvin [EMAIL PROTECTED] told me that...
> Petr Baudis wrote:
>>> Cogito. Git inside can be the first slogan.
>>
>> What about tig?
>
> I like Cogito; it's a real name, plus it'd be a good use for the otherwise-pretty-useless two-letter combination cg.

Duh, believe me or not but I completely missed the Cogito part of Steven's mail. Of course, I like it too.

I'll commit my poor man's git-merge-in-separate-tree and finally get some sleep. I promise.

--
				Petr Pasky Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor
Re: Handling renames.
On Thu, 14 Apr 2005, David Woodhouse wrote:
> Opinions? Dissent? We'd probably need to escape the filenames in some way -- handwave over that for now.

I personally think renames are a minor thing that doesn't happen much. What actually happens, in my opinion, is that some chunk of a file is moved to a different, possibly new, file. If this is supported (as something that the SCM notices), then a rename is just a special case where the moved chunk is a whole file.

I think that it should be possible to identify and tag big enough deletions and insertions, and compare them to find moves, where a further change may be applied in the middle if two chunks are very similar but not the same.

On the other hand, I think that the SCM will need to cache its understanding of what a commit did in order to give reasonable performance for operations like annotate, and it may be advantageous to distribute things from this cache, since the committer might want to tell the system something that it didn't guess.

At some point, I'm going to argue for core support for back pointers, where a file can be created which is about some other file(s), and someone looking for files about a particular file can find them without searching the entire database. I think this will turn out to be important for a variety of cases where some later participant wants to say something about an existing file without changing the content of the file.

	-Daniel
*This .sig left intentionally blank*
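[Editorial sketch, not part of the original mail.] Daniel's idea of pairing "big enough" deletions with similar insertions can be approximated with Python's standard difflib. The chunk representation, the function name, and both thresholds are assumptions picked for illustration, not anything proposed in the thread.

```python
from difflib import SequenceMatcher


def find_moves(deleted, inserted, min_lines=3, min_ratio=0.7):
    """Pair deleted chunks with similar inserted chunks.

    `deleted` and `inserted` are lists of (filename, list_of_lines).
    Chunks shorter than `min_lines` are ignored; pairs whose line-level
    similarity is at least `min_ratio` are reported as probable moves.
    """
    moves = []
    for src, old in deleted:
        if len(old) < min_lines:
            continue
        for dst, new in inserted:
            if len(new) < min_lines:
                continue
            ratio = SequenceMatcher(None, old, new).ratio()
            if ratio >= min_ratio:
                moves.append((src, dst, ratio))
    return moves


# A whole-file rename with a small edit in the middle (the s/foo_/bar_/ case).
deleted = [("foo.c", ["int foo_init(void)", "{", "    return 0;", "}"])]
inserted = [
    ("bar.c", ["int bar_init(void)", "{", "    return 0;", "}"]),
    ("baz.c", ["x", "y", "z"]),
]

moves = find_moves(deleted, inserted)
```

A plain rename then falls out as the special case where the deleted chunk is an entire file and the inserted chunk is the entire new file.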
Re: Naming the SCM (was Re: Handling renames.)
Steven Cole wrote:
> On Thursday 14 April 2005 01:40 pm, Andrew Timberlake-Newell wrote:
>> Zach Welch pontificated:
>>> I imagine quite a few folks expect something not entirely unlike an SCM to emerge from these current efforts. Moreover, Petr's 'git' scripts wrap your filesystem plumbing to that very end.
>>>
>>> To avoid confusion, I think it would be better to distinguish the two layers, perhaps by calling the low-level plumbing... 'gitfs', of course.
>>
>> Or perhaps to come up with a name (or at least nickname) for the SCM. GitMaster?
>
> Cogito. Git inside can be the first slogan.
>
> Differentiating the SCM built on top of git from git itself is probably worthwhile to avoid confusion. Other SCMs may be developed later, built on git, and these can come up with their own clever names.

And the logo could be a dove which, as everybody knows, coos.

Peter
--
Peter Williams [EMAIL PROTECTED]

Learning, n. The kind of ignorance distinguishing the studious. -- Ambrose Bierce