[patch] git: clean up add_file_to_cache() in update-cache.c
this patch cleans up add_file_to_cache() to free up all memory it allocates. This has no significance right now as the only user of add_file_to_cache() die()s immediately in the 'leak' paths - but if the function is ever used without dying then this uncleanliness could lead to a memory leak. Ingo Signed-off-by: Ingo Molnar [EMAIL PROTECTED] --- update-cache.c.orig +++ update-cache.c @@ -120,10 +120,17 @@ static int add_file_to_cache(char *path) ce-st_size = st.st_size; ce-namelen = namelen; - if (index_fd(path, namelen, ce, fd, st) 0) + if (index_fd(path, namelen, ce, fd, st) 0) { + free(ce); return -1; + } - return add_cache_entry(ce, allow_add); + if (add_cache_entry(ce, allow_add)) { + free(ce); + return -1; + } + + return 0; } static int match_data(int fd, void *buffer, unsigned long size) - To unsubscribe from this list: send the line unsubscribe git in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch] git: fix memory leak #2 in read-cache.c
* Ingo Molnar [EMAIL PROTECTED] wrote: this patch fixes a memory leak in read-cache.c: when there's cache entry collision we should free the previous one. + free(active_cache[pos]); active_cache[pos] = ce; i'm having second thoughs about this one: active_cache entries are not always malloc()-ed - e.g. read_cache() will construct them from the mmap() of the index file. Which must not be free()d! one safe solution would be to malloc() all these entries and copy them over from the index file? Slightly slower but safer and free()-able when update-cache.c notices a collision. The (tested) patch below does this. this would also make Martin Schlemmer's update-cache.c fix safe. (without this second patch, free(active_cache[pos]) might crash, and that crash is would possibly be remote exploitable via a special repository that tricks the index file to look in a certain way.) Ingo Signed-off-by: Ingo Molnar [EMAIL PROTECTED] --- read-cache.c.orig +++ read-cache.c @@ -453,10 +453,17 @@ int read_cache(void) offset = sizeof(*hdr); for (i = 0; i hdr-entries; i++) { - struct cache_entry *ce = map + offset; + struct cache_entry *ce = map + offset, *tmp; offset = offset + ce_size(ce); - active_cache[i] = ce; + + tmp = malloc(ce_size(ce)); + if (!tmp) + return error(malloc failed); + memcpy(tmp, ce, ce_size(ce)); + active_cache[i] = tmp; } + munmap(map, size); + return active_nr; unmap: - To unsubscribe from this list: send the line unsubscribe git in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch] git: fix memory leak #2 in read-cache.c
* Linus Torvalds [EMAIL PROTECTED] wrote: In other words, if the common case is that we update a couple of entries in the active cache, we actually saved 1.6MB (+ malloc overhead for the 17 _thousand_ allocations) by my approach. And the leak? There's none. We never actually update an existing entry that was allocated with malloc(), unless the user does something stupid. In other words, the only case where there is a leak is when the user does something like update-cache file file file file file file .. with the same file listed several times. fair enough - as long as this is only used in a scripted environment, and not via some library and not within a repository server, web backend, etc. Ingo - To unsubscribe from this list: send the line unsubscribe git in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Reorganize common code
I forgot to include util.c. On my to-do list is something to make this more obvious. Signed-Off-By: Daniel Barkalow [EMAIL PROTECTED] Index: util.c === --- /dev/null (tree:e8194c62bfc68725972a6847fa2c6d529ca64137) +++ 420a52c504bf712cd898a11746a54ea4349a9386/util.c (mode:100644 sha1:d1d497c6709503f71138e816b2599d5105cc4915) @@ -0,0 +1,37 @@ +#include util.h +#include stdarg.h +#include stdio.h +#include stdlib.h + +void usage(const char *err) +{ + fprintf(stderr, usage: %s\n, err); + exit(1); +} + +static void report(const char *prefix, const char *err, va_list params) +{ + fputs(prefix, stderr); + vfprintf(stderr, err, params); + fputs(\n, stderr); +} + +void die(const char *err, ...) +{ + va_list params; + + va_start(params, err); + report(fatal: , err, params); + va_end(params); + exit(1); +} + +int error(const char *err, ...) +{ + va_list params; + + va_start(params, err); + report(error: , err, params); + va_end(params); + return -1; +} - To unsubscribe from this list: send the line unsubscribe git in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Yet another base64 patch
Linus Torvalds wrote: So why is base64 worse than the stock one? As mentioned, the flat version may be faster, but it really isn't an option. 32000 objects is peanuts. Any respectable source tree may hit that in a short time, and will break in horrible ways on many Linux filesystems. If it does, it's not because of n_link; see previous email. I have used ext2 filesystems with hundreds of thousands of files per directory back in 1996. It was slow but didn't break anything. The only filesystem I know of which has a 2^16 entry limit is FAT. So you need at least a single level of subdirectory. What I don't get is why the stock hex version would be better than base64. I like the result, I just don't _understand_ it. The base64 version has 2^12 subdirectories instead of 2^8 (I just used 2 characters as the hash key just like the hex version.) So it ascerbates the performance penalty of subdirectory hashing. -hpa - To unsubscribe from this list: send the line unsubscribe git in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] merge-base
On Wed, 13 Apr 2005, Linus Torvalds wrote: I agree. But I did the silly common revision tracking part slightly differently and in particular I already made fsck and rev-tree use the same exact code. I think I only saw a cut-and-paste version, and I didn't want to follow that pattern. Also, I don't see why you did the common parent thing as part of the library, since that really does seem to be a very specific to this problem, and neither fsck nor rev-tree really wants it. That was just silly; on the other hand, I think I'll eventually want to have a support-for-merging library file, so that people can write alternative merging programs which call library functions to pick out history details. But that obviously shouldn't be the same file that rev-tree and fsck share, and I'll probably do that by pulling main() out of the program later. Also, the date parsing really is a separate issue from the revision tracking (fsck does not want date parsing, but rev-tool does), so I think you might want to do for date parsing what I just did for the revision.h thing? No point in tying them together. I think there is some value in having a library file that completely parses commit-tagged files. Having the date field in struct revision without the code to parse it in the file that defines the struct seems poor to me. So could I ask you to re-factor it and base it on my current tree? Make the merge-base program have that common parent thing in it, and factor out the common date parsing into parse-date.c or something? I'm not actually using the date in merge-base, either, so I'll just leave that alone for now (merge-base is based on generations, not time, currently). -Daniel *This .sig left intentionally blank* - To unsubscribe from this list: send the line unsubscribe git in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Date handling.
On Thu, 2005-04-14 at 02:12 -0700, Linus Torvalds wrote: I take that back. I'd be much happier with you doing and testing it, because now I'm crashing. OK. commit-tree now eats RFC2822 dates as AUTHOR_DATE because that's what you're going to want to feed it. We store seconds since UTC epoch, we add the author's or committer's timezone as auxiliary data so that dates can be pretty-printed in the original timezone later if anyone cares. I left the date parsing in rev-tree.c for backward compatibility but it can be dropped when we change to base64 :) Yes, glibc sucks and strptime is a pile of crap. We have to parse it ourselves. Index: commit-tree.c --- 1756b578489f93999ded68ae347bef7d6063101c/commit-tree.c (mode:100664 sha1:12196c79f31d004dff0df1f50dda67d8204f5568) +++ 82ba574c85e9a2e4652419c88244e9dd1bfa8baa/commit-tree.c (mode:100644 sha1:35cb09402c9868499bcaf6de42afbad9fdfebe05) @@ -7,6 +7,9 @@ #include pwd.h #include time.h +#include string.h +#include ctype.h +#include time.h #define BLOCKING (1ul 14) #define ORIG_OFFSET (40) @@ -95,6 +98,148 @@ } } +static const char *month_names[] = { +Jan, Feb, Mar, Apr, May, Jun, +Jul, Aug, Sep, Oct, Nov, Dec +}; + +static const char *weekday_names[] = { +Sun, Mon, Tue, Wed, Thu, Fri, Sat +}; + + +static char *skipfws(char *str) +{ + while (isspace(*str)) + str++; + return str; +} + + +/* Gr. strptime is crap for this; it doesn't have a way to require RFC2822 + (i.e. English) day/month names, and it doesn't work correctly with %z. */ +static void parse_rfc2822_date(char *date, char *result, int maxlen) +{ + struct tm tm; + char *p; + int i, offset; + time_t then; + + memset(tm, 0, sizeof(tm)); + + /* Skip day-name */ + p = skipfws(date); + if (!isdigit(*p)) { + for (i=0; i7; i++) { + if (!strncmp(p,weekday_names[i],3) p[3] == ',') { + p = skipfws(p+4); + goto day; + } + } + return; + } + + /* day */ + day: + tm.tm_mday = strtoul(p, p, 10); + + if (tm.tm_mday 1 || tm.tm_mday 31) + return; + + if (!isspace(*p)) + return; + + p = skipfws(p); + + /* month */ + + for (i=0; i12; i++) { + if (!strncmp(p, month_names[i], 3) isspace(p[3])) { + tm.tm_mon = i; + p = skipfws(p+strlen(month_names[i])); + goto year; + } + } + return; /* Error -- bad month */ + + /* year */ + year: + tm.tm_year = strtoul(p, p, 10); + + if (!tm.tm_year !isspace(*p)) + return; + + if (tm.tm_year 1900) + tm.tm_year -= 1900; + + p=skipfws(p); + + /* hour */ + if (!isdigit(*p)) + return; + tm.tm_hour = strtoul(p, p, 10); + + if (!tm.tm_hour 23) + return; + + if (*p != ':') + return; /* Error -- bad time */ + p++; + + /* minute */ + if (!isdigit(*p)) + return; + tm.tm_min = strtoul(p, p, 10); + + if (!tm.tm_min 59) + return; + + if (isspace(*p)) + goto zone; + + if (*p != ':') + return; /* Error -- bad time */ + p++; + + /* second */ + if (!isdigit(*p)) + return; + tm.tm_sec = strtoul(p, p, 10); + + if (!tm.tm_sec 59) + return; + + if (!isspace(*p)) + return; + + zone: + p = skipfws(p); + + if (*p == '-') + offset = -60; + else if (*p == '+') + offset = 60; + else + return; + + if (!isdigit(p[1]) || !isdigit(p[2]) || !isdigit(p[3]) || !isdigit(p[4])) + return; + + i = strtoul(p+1, NULL, 10); + offset *= ((i % 100) + ((i / 100) * 60)); + + if (*(skipfws(p + 5))) + return; + + then = mktime(tm); /* mktime appears to ignore the GMT offset, stupidly */ + if (then == -1) + return; + + then -= offset; + + snprintf(result, maxlen, %lu %5.5s, then, p); +} + /* * Having more than two parents may be strange, but hey, there's * no conceptual reason why the file format couldn't accept multi-way @@ -114,10 +259,12 @@ unsigned char commit_sha1[20]; char *gecos, *realgecos; char *email, realemail[1000]; - char *date, *realdate; + char date[20], realdate[20]; + char *audate; char comment[1000]; struct passwd *pw; time_t now; + struct tm *tm; char *buffer; unsigned int size; @@ -142,15 +289,19 @@ realemail[len] = '@';
Re: Handling renames.
On Thu, 14 Apr 2005, David Woodhouse wrote: I've been looking at tracking file revisions. One proposed solution was to have a separate revision history for individual files, with a new kind of 'filecommit' object which parallels the existing 'commit', referencing a blob instead of a tree. Then trees would reference such objects instead of referencing blobs directly. Please don't. It's fundamentally the git notion of content determines objects. It also has no relevance. A rename really doesn't exist in the git model. The git model really is about tracking data, not about tracking what happened to _create_ that data. The one exception is the commit log. That's where you put the explanations of _why_ the data changed. And git itself doesn't care what the format is, apart from the git header. So, you really need to think of git as a filesystem. You can then implement an SCM _on_top_of_it_, which means that your second suggestion is not only acceptable, it really is the _only_ way to handle this in git: So a commit involving a rename would look something like this... tree 82ba574c85e9a2e4652419c88244e9dd1bfa8baa parent bb95843a5a0f397270819462812735ee29796fb4 rename foo.c bar.c author David Woodhouse [EMAIL PROTECTED] 1113499881 +0100 committer David Woodhouse [EMAIL PROTECTED] 1113499881 +0100 Rename foo.c to bar.c and s/foo_/bar_/g Except I want that empty line in there, and I want it in the free-form section. The rename part really isn't part of the git header. It's not what git tracks, it was tracked by an SCM system on top of git. So the git header is an inode in the git filesystem, and like an inode it has a ctime and an mtime, and pointers to the data. So as far as git is concerned, this part: tree 82ba574c85e9a2e4652419c88244e9dd1bfa8baa parent bb95843a5a0f397270819462812735ee29796fb4 author David Woodhouse [EMAIL PROTECTED] 1113499881 +0100 committer David Woodhouse [EMAIL PROTECTED] 1113499881 +0100 really is the filesystem inode. The rest is whatever the filesystem user puts into it, and git won't care. Opinions? Dissent? We'd probably need to escape the filenames in some way -- handwave over that for now. The fact that git handles arbitrary filenames (stuff starting with . excepted) doesn't mean that the SCM above it needs to. Quite frankly, I think an SCM that handles newlines in filenames is being silly. But a _filesystem_ needs to not care. There are too many messy SCM's out there that do not hav ea philosophy. Dammit, I'm not interested in creating another one. This thing has a mental model, and we keep to that model. The reason UNIX is beautiful is that it has a mental model of processes and files. Git has a mental model of objects and certain very very limited relationships. The relationships git cares about are encoded in the C files, the extra crap (like rename info) is just that - stuff that random scripts wrote, and that is just informational and not central to the model. Linus - To unsubscribe from this list: send the line unsubscribe git in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Handling renames.
* David Woodhouse [EMAIL PROTECTED] wrote: I've been looking at tracking file revisions. One proposed solution was to have a separate revision history for individual files, with a new kind of 'filecommit' object which parallels the existing 'commit', referencing a blob instead of a tree. Then trees would reference such objects instead of referencing blobs directly. I think that introduces a lot of redundancy though, because 99% of the time, the revision history of the individual file is entirely reproducible from the revision history of the tree. It's only when files are renamed that we fall over -- and I think we can handle renames fairly well if we just log them in the commit object. how about the following structure: - tree_new --- - tree_old --- rename_commit - blob the rename_commit object just contains a pointer to the file content blob. If a rename happens then the old tree references the rename_commit object (instead of the blob), and the new tree references it too. This way there's no need to list the rename via namespace means: if a tree entry points to a rename_commit object then a rename happened and the rename_commit object is looked up in the old tree to get the old name. there's no redundancy caused by this method: only renames (which are rare) go through the rename_commit redirection. (to speed up the lookup the rename_commit object could cache the offset of the two names within their tree objects.) Ingo - To unsubscribe from this list: send the line unsubscribe git in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Merge with git-pasky II.
PB == Petr Baudis [EMAIL PROTECTED] writes: PB Bah, you outran me. ;-) Just being in a different timezone, I guess. PB I'll change it to use the cool git-pasky stuff (commit-id etc) and its PB style of committing - that is, it will merely record the update-caches PB to be done upon commit, and it will read-tree the branch we are merging PB to instead of the ancestor. (So that git diff gives useful output.) Sorry, I have not seen what you have been doing since pasky 0.3, and I have not even started to understand the mental model of the world your tool is building. That said, my gut feeling is that telling this script about git-pasky's world model might be a mistake. I'd rather see you consider the script as mere part of the plumbing. Maybe adding an extra parameter to the script to let the user explicitly specify the common ancestor to use would be needed, but I would prefer git-pasky-merge to do its own magic (converting symbolic commit names into raw commit names and such) before calling this low level script. That way people like me who have not migrated to your framework can still keep using it. All the script currently needs is a bare git object database; i.e., nothing other than what is in .git/objects and a couple of commit record SHA1s as its parameters. No .git/heads/, no .git/HEAD.local, no .git/tags, are involved for it to work, and I would prefer to keep things that way if possible. * show-diff updates to add -r flag to squelch diffs for files not in the working directory. This is mainly useful when verifying the result of an automated merge. PB -r traditionally means recursive - what's the reasoning behind the PB choice of this letter? Well, '-r' is not necessarily recursive. ls -r is reverse, sort -r is reverse. less -r is raw. cat -r is reversible. nethack -r is race ;-). You are thinking as an SCM person so it may look that way. diff -r is recursive. darcs add -r is recursive. But even in the SCM world, cvs add -r is not (it means read-only) neither co -r (explicit revision) ;-). I would rather pick '-q' if I were doing the patch today, but I was too tired and did not think of a letter when I wrote it. I guess '-r' stood for removed, but I agree it is a bad choice. Any objections to '-q'? PB use strict? Not in this iteration but eventually yes. PB It'd be simpler to do just PB my @common = (map { $common{$_} } PB sort { $b = $a } PB keys %common) Well, actually you spotted a bug between the implementation and what I wanted to do. It should have been: map { $_-[0] } sort { $b-[1] = $a-[1] } map { [ $common{$_} = $_ ] } keys %common That is, sort [anscestor = number of times it appears] tuple by the number of times it appears in decreasing order, and project the resulting list to a list of ancestors. It is trying to deal with the following pattern in rev-tree output: TIMESTAMP1 EDGE1:1 ANCESTOR1:3 ANCESTOR2:3 TIMESTAMP2 EDGE2:2 ANCESTOR1:3 and when the above happens I wanted to pick up ANCESTOR1, but that was without no sound reason. PB But I really think this is a horrible heuristic. I believe you should PB take the latest commit in the --edges output, and from that choose the PB base whose rev-tree --edges the_base merged_branch has the least lines PB on output. (That is, the path to it is shortest - ideally it's already PB part of the merged_branch.) I'll try something along that line. Honestly the ancestor selection part was what I had most trouble with. Thanks. PB What about PB sub OLDMODE { 0 } PB sub NEWMODE { 1 } PB sub NEWSHA { 2 } PB and then using that when accessing the tuple? Would make the code PB much more readable. Totally agreed; readability cleanup is needed, just as use strict you mentioned, before it is ready for public consumption. Remember, however, the primary purpose of the message was to share it with Linus so that I can ask his opinion while the script was still slushy; the contents that array contained was still changing then and was too early for symbolic constants. I'll do that in the next round. PB It is a good idea to check merge's exit code and give a notice at the PB end if there were any conflicts. In principle yes, but I noticed that merge already gave me a nice warning message when it found conflicts, so there was no need to do so myself in this case. See sample output: $ perl ./git-merge.perl \ 71796686221a0a56ccc25b02386ed8ea648da14d \ bb95843a5a0f397270819462812735ee29796fb4 Common ancestor: 9f02d4d233223462d3f6217b5837b786e6286ba4 O - COPYING O - README ... O - write-tree.c A M write-blob.c A M show-diff.c ... A M update-cache.c A M git-merge.perl B M merge-tree.c MRG Makefile merge: warning: conflicts during merge $ +# Create a temporary directory and go there. +system 'rm', '-rf', ',,merge-temp'; PB Can't we call it just ,,merge? I'd rather have a command line option
Re: Handling renames.
Linus Torvalds wrote: On Thu, 14 Apr 2005, H. Peter Anvin wrote: Although Linus is correct in that an SCM doesn't *have* to handle this, it really feels like shooting for mediocracy to me. We might as well design it right from the beginning. No. git is not an SCM. it's a filesystem designed to _host_ an SCM, and that _is_ doing it right from the beginning. Keep the abstractions clean. Do _not_ get confused into thinking that git is an SCM. If you think of it that way, you'll end up with crap you can't think about. And at a filesystem layer, rename already exists. It's moving an object to a new name in a tree. git already does that very well, thank you very much. But a filesystem rename is _not_ the same thing as an SCM rename. An SCM rename is built on top of a filesystem rename, but it has its own issues that may or may not make sense for the filesystem. I wasn't referring to git per se, I was referring to the hosted SCM. -hpa - To unsubscribe from this list: send the line unsubscribe git in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Handling renames.
On Thu, 2005-04-14 at 20:58 +0200, Ingo Molnar wrote: The thing i tried to avoid was to list long filenames in the commit (because of the tree hierarchy we'd need to do tree-absolute pathnames or something like that, and escape things, and do lookups - duplicating a VFS which is quite bad) - it would be better to identify the rename source and target via its tree object hash and its offset within that tree. Such information could be embedded in the commit object just fine. Something like: Actually I'm not sure that's true. Let's consider the two main users of this information. Firstly, because it's what I've been playing with: to list a given file's revision history, I currently work with its filename -- walk the commit objects, inspecting the tree and selecting those commits where the file has changed. If my filename is 'fs/jffs2/inode.c' then I can immediately skip over a commit where the 'fs' entry in the top-level tree is identical to that in the parent, or I can skip a commit where the 'jffs2' entry in the 'fs' subtree is identical to the parent... it's all done on filename, and the {parent, entry} tuple wouldn't help much here; I'd probably have to convert back to a filename anyway. Secondly, there's merges. I've paid less attention to these (see mail 5 minutes ago) but I think they'd end up operating on the rename information in a very similar way. To find a common ancestor for a given file,, we want to track its name as it changed during history; at that point it's all string compares. -- dwmw2 - To unsubscribe from this list: send the line unsubscribe git in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Handling renames.
Linus Torvalds wrote: On Thu, 14 Apr 2005, H. Peter Anvin wrote: Although Linus is correct in that an SCM doesn't *have* to handle this, it really feels like shooting for mediocracy to me. We might as well design it right from the beginning. No. git is not an SCM. it's a filesystem designed to _host_ an SCM, and that _is_ doing it right from the beginning. I imagine quite a few folks expect something not entirely unlike an SCM to emerge from these current efforts. Moreover, Petr's 'git' scripts wrap your filesystem plumbing to that very end. To avoid confusion, I think it would be better to distinguish the two layers, perhaps by calling the low-level plumbing... 'gitfs', of course. Cheers, Zach Welch Superlucidity Services - To unsubscribe from this list: send the line unsubscribe git in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Date handling.
On Thu, 2005-04-14 at 12:19 -0700, [EMAIL PROTECTED] wrote: With a UTC date, why would anyone care in which timezone the commit was made? Any pretty printing would most likely be prettiest if it is done relative to the timezone of the person looking at the commit record, not the person who created the record. I'd prefer not to lose the information. If someone has committed a change at 2am, I like to know that it was 2am for _them_. It helps me decide where to look first for the cause of problems. :) It also helps disambiguate certain comments, especially those involving words or phrases such as yesterday or this afternoon. -- dwmw2 - To unsubscribe from this list: send the line unsubscribe git in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: Date handling.
I'd prefer not to lose the information. If someone has committed a change at 2am, I like to know that it was 2am for _them_. It helps me decide where to look first for the cause of problems. :) I'd think the 8:00am-before-the-first-coffee checkins would be the most worrying :-) It also helps disambiguate certain comments, especially those involving words or phrases such as yesterday or this afternoon. This is a very good point ... but this still has problems with the git is a filesystem, not a SCM mantra. Timezone comments don't belong in the git inode. -Tony - To unsubscribe from this list: send the line unsubscribe git in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Re: Re: Merge with git-pasky II.
Dear diary, on Thu, Apr 14, 2005 at 10:23:26PM CEST, I got a letter where Erik van Konijnenburg [EMAIL PROTECTED] told me that... On Thu, Apr 14, 2005 at 09:35:07PM +0200, Petr Baudis wrote: Hmm. I actually don't like this naming. I think it's not too consistent, is irregular, therefore parsing it would be ugly. What I propose: 12c\tname - legend - original file D - tree #1 removed file D- tree #2 removed file DD- both trees removed file M - tree #1 modified file M DM* - conflict, tree #1 removed file, tree #2 modified file MD* MM- exact same modification MM* - different modifications, merging This is generic, theoretically scales well even to more trees, is easy to parse trivially, still is human readable (actually the asterisk in the 'conflict' column is there basically only for the humans), is completely regular and consistent. Detail: perhaps use underscore instead of space, to avoid space/tab typos that are invisible on paper and user friendly mail clients? I'd go for dots in that case. Looks less intrusive. :^) -- Petr Pasky Baudis Stuff: http://pasky.or.cz/ C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor - To unsubscribe from this list: send the line unsubscribe git in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Re: Naming the SCM (was Re: Handling renames.)
Dear diary, on Thu, Apr 14, 2005 at 10:58:52PM CEST, I got a letter where H. Peter Anvin [EMAIL PROTECTED] told me that... Petr Baudis wrote: Cogito. Git inside can be the first slogan. What about tig? I like Cogito; it's a real name, plus it'd be a good use for the otherwise-pretty-useless two-letter combination cg. Duh, believe me or not but I completely missed the Cogito part of Steven's mail. Of course, I like it too. I'll commit my poor man's git-merge-in-separate-tree and finally get some sleep. I promise. -- Petr Pasky Baudis Stuff: http://pasky.or.cz/ C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor - To unsubscribe from this list: send the line unsubscribe git in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Handling renames.
On Thu, 14 Apr 2005, David Woodhouse wrote: Opinions? Dissent? We'd probably need to escape the filenames in some way -- handwave over that for now. I personally think renames are a minor thing that doesn't happen much. What actually happens, in my opinion, is that some chunk of a file is moved to a different, possibly new, file. If this is supported (as something that the SCM notices), then a rename is just a special case where the moved chunk is a whole file. I think that it should be possible to identify and tag big enough deletions and insertions, and compare them to find moves, where a further change may be applied in the middle if two chunks are very similar but not the same. On the other hand, I think that the SCM will need to cache its understanding of what a commit did in order to give reasonable performance for operations like annotate, and it may be advantegous to distribute things from this cache, since the committer might want to tell the system something that it didn't guess. At some point, I'm going to argue for core support for back pointers, where a file can be created which is about some other file(s), and someone looking for files about a particular file can find them without searching the entire database. I think this will turn out to be important for a variety of cases where some later participant wants to say something about an existing file without changing the content of the file. -Daniel *This .sig left intentionally blank* - To unsubscribe from this list: send the line unsubscribe git in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[no subject]
subscribe git - To unsubscribe from this list: send the line unsubscribe git in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[patch pasky] update gitcancel.sh to handle modes as well
Hi, gitcancel.sh do not handle mode changes: $ chmod -x Makefile $ git cancel patch: Only garbage was found in the patch input. Rather use checkout-cache to sync our tree, as should do the right thing instead of diffing (cancel imply just blow away everything). Signed-off-by: Martin Schlemmer [EMAIL PROTECTED] gittrack.sh: 03d6db1fb3a70605ef249c632c04e542457f0808 --- 03d6db1fb3a70605ef249c632c04e542457f0808/gittrack.sh +++ uncommitted/gittrack.sh @@ -51,6 +51,7 @@ read-tree $(tree-id $name) gitdiff.sh local $name | gitapply.sh + update-cache --refresh else [ $tracking ] || \ @@ -61,6 +62,7 @@ if [ -s .git/HEAD.local ]; then gitdiff.sh $tracking local | gitapply.sh read-tree $(tree-id local) + update-cache --refresh head=$(cat .git/HEAD) branchhead=$(cat .git/heads/$tracking) -- Martin Schlemmer gittrack.sh: 03d6db1fb3a70605ef249c632c04e542457f0808 --- 03d6db1fb3a70605ef249c632c04e542457f0808/gittrack.sh +++ uncommitted/gittrack.sh @@ -51,6 +51,7 @@ read-tree $(tree-id $name) gitdiff.sh local $name | gitapply.sh + update-cache --refresh else [ $tracking ] || \ @@ -61,6 +62,7 @@ if [ -s .git/HEAD.local ]; then gitdiff.sh $tracking local | gitapply.sh read-tree $(tree-id local) + update-cache --refresh head=$(cat .git/HEAD) branchhead=$(cat .git/heads/$tracking) signature.asc Description: This is a digitally signed message part
Re: Remove need to untrack before tracking new branch
On Fri, 2005-04-15 at 00:42 +0200, Petr Baudis wrote: Dear diary, on Thu, Apr 14, 2005 at 11:40:09AM CEST, I got a letter where Martin Schlemmer [EMAIL PROTECTED] told me that... (PS, can you check the fact that your mail client keeps on adding a 'Re: ' ...) Hmm. I guess my ancient reply_regexp ^((\\[|\\()([^B]|B([^u]|u([^g]|g([^ ]|AnTiMaTcH[^]]+(\\]|\\)))?[ \t]*((re([\\[0-9\\]+])*|aw):[ \t]*)? is broken... ;-) On Thu, 2005-04-14 at 11:11 +0200, Petr Baudis wrote: I'm lost. Why do you do --update-modes? That makes no sense to me. You introduce them to the cache out-of-order w.r.t. commits, that means in the normal git usage they are already unrevertable. Right, afterwards I thought I did add it to the wrong place. So, could you please do something with it? :-) What are you trying to do? Mode changes _are_ real changes. You _don't_ want to silence them. What you want is to even show them more explicitly in show-diff. No, you do not understand. If you actually change the mode, it will show. What now happens, is that say I track the 'linus' branch, then untrack, and then track 'pasky' again, the Patches will be applied, but not the mode changes which are stored in the cache ... Let me show you: - $ ls -l $(./show-diff -s | cut -d: -f1) ..directroy listing with no 'x' bit.. - (Note no 'x' bit ...) And that is _after_ doing: $ git track linus; git track So basically the modes that are stored in the cache are not applied ... Although, yes, I prob should add the relevant code to checkout-cache. This should be fixed now, BTW. git apply didn't correctly apply the mode changes, but now it should. Several bugs prevented it to, in fact. ;-) Yep, I saw - thought you scrapped this, so mailed a new patch (or was busy doing the touch ups to the email when this came in. show-diff.c: a531ca4078525d1c8dcf84aae0bfa89fed6e5d96 --- a531ca4078525d1c8dcf84aae0bfa89fed6e5d96/show-diff.c +++ uncommitted/show-diff.c @@ -5,13 +5,18 @@ */ #include cache.h -static void show_differences(char *name, +static void show_differences(struct cache_entry *ce, void *old_contents, unsigned long long old_size) { static char cmd[1000]; + static char sha1[41]; + int n; FILE *f; - snprintf(cmd, sizeof(cmd), diff -L %s -u -N - %s, name, name); + for (n = 0; n 20; n++) + snprintf((sha1[n*2]), 3, %02x, ce-sha1[n]); + snprintf(cmd, sizeof(cmd), diff -L %s/%s -L uncommitted/%s -u -N - %s, + sha1, ce-name, ce-name, ce-name); The directory sha1 is the sha1 of the tree, not of the particular file - that one is in the attributes list (parentheses after the filename), together with mode. Does it really matter? It is more just to get the patch prefix right, and I did it as it went nicely with the printed: show-diff.c: a531ca4078525d1c8dcf84aae0bfa89fed6e5d96 for example ... Yes, it matters, and I don't care how nicely it wents with what you print before. hah ;p Either print there some nonsense which is clear not to be a tree ID, or (much more preferably) print the real tree ID there. If some tool ever uses it (e.g. to help resolve conflicts, perhaps even actually doing a real merge based on the patch), you just confused it. Ok, understood. Do you think it will be scripted? If not I guess we can just do labels like: --- committed/ +++ uncommitted/ ? Also, do you think you could separate this patch from the other (--update-modes) patch? (If we actually still need the --update-modes patch after git apply was fixed.) Yeah, already split it out locally, just waiting on above response ... Thanks, -- Martin Schlemmer signature.asc Description: This is a digitally signed message part
Re: update gitcancel.sh to handle modes as well
Dear diary, on Fri, Apr 15, 2005 at 12:57:25AM CEST, I got a letter where Martin Schlemmer [EMAIL PROTECTED] told me that... Hi, gitcancel.sh do not handle mode changes: $ chmod -x Makefile $ git cancel patch: Only garbage was found in the patch input. Rather use checkout-cache to sync our tree, as should do the right thing instead of diffing (cancel imply just blow away everything). Signed-off-by: Martin Schlemmer [EMAIL PROTECTED] gittrack.sh: 03d6db1fb3a70605ef249c632c04e542457f0808 --- 03d6db1fb3a70605ef249c632c04e542457f0808/gittrack.sh +++ uncommitted/gittrack.sh @@ -51,6 +51,7 @@ read-tree $(tree-id $name) gitdiff.sh local $name | gitapply.sh + update-cache --refresh else [ $tracking ] || \ @@ -61,6 +62,7 @@ if [ -s .git/HEAD.local ]; then gitdiff.sh $tracking local | gitapply.sh read-tree $(tree-id local) + update-cache --refresh head=$(cat .git/HEAD) branchhead=$(cat .git/heads/$tracking) The patch looks familiar, but not right. ;-) -- Petr Pasky Baudis Stuff: http://pasky.or.cz/ C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor - To unsubscribe from this list: send the line unsubscribe git in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Remove need to untrack before tracking new branch
Dear diary, on Fri, Apr 15, 2005 at 01:01:27AM CEST, I got a letter where Martin Schlemmer [EMAIL PROTECTED] told me that... On Fri, 2005-04-15 at 00:42 +0200, Petr Baudis wrote: Dear diary, on Thu, Apr 14, 2005 at 11:40:09AM CEST, I got a letter where Martin Schlemmer [EMAIL PROTECTED] told me that... So basically the modes that are stored in the cache are not applied ... Although, yes, I prob should add the relevant code to checkout-cache. This should be fixed now, BTW. git apply didn't correctly apply the mode changes, but now it should. Several bugs prevented it to, in fact. ;-) Yep, I saw - thought you scrapped this, so mailed a new patch (or was busy doing the touch ups to the email when this came in. Hmm, I must've missed the new patch. The latest I have still puts the stuff to update-cache and combines it with the show-diff change. show-diff.c: a531ca4078525d1c8dcf84aae0bfa89fed6e5d96 --- a531ca4078525d1c8dcf84aae0bfa89fed6e5d96/show-diff.c +++ uncommitted/show-diff.c @@ -5,13 +5,18 @@ ..snip.. - snprintf(cmd, sizeof(cmd), diff -L %s -u -N - %s, name, name); + for (n = 0; n 20; n++) + snprintf((sha1[n*2]), 3, %02x, ce-sha1[n]); + snprintf(cmd, sizeof(cmd), diff -L %s/%s -L uncommitted/%s -u -N - %s, + sha1, ce-name, ce-name, ce-name); The directory sha1 is the sha1 of the tree, not of the particular file - that one is in the attributes list (parentheses after the filename), together with mode. Does it really matter? It is more just to get the patch prefix right, and I did it as it went nicely with the printed: show-diff.c: a531ca4078525d1c8dcf84aae0bfa89fed6e5d96 for example ... Yes, it matters, and I don't care how nicely it wents with what you print before. hah ;p Either print there some nonsense which is clear not to be a tree ID, or (much more preferably) print the real tree ID there. If some tool ever uses it (e.g. to help resolve conflicts, perhaps even actually doing a real merge based on the patch), you just confused it. Ok, understood. Do you think it will be scripted? If not I guess we can just do labels like: --- committed/ +++ uncommitted/ ? Heh. Well, of course this could do. But is there any technical reason why not just carry the sha1 id of the tree around and stuff it there? -- Petr Pasky Baudis Stuff: http://pasky.or.cz/ C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor - To unsubscribe from this list: send the line unsubscribe git in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Reorganize common code
Dear diary, on Thu, Apr 14, 2005 at 12:00:25AM CEST, I got a letter where Daniel Barkalow [EMAIL PROTECTED] told me that... This splits read-cache.c into util.c, cache.c, and objects.c, based on what the code is used for; similarly, cache.h becomes util.h, cache.h, and objects.h. For simplicity, cache.h includes the other two to give the previous overall behavior. Signed-Off-By: Daniel Barkalow [EMAIL PROTECTED] FYI, given the scale of this change, I'm waiting for Linus to either ack it, pick it himself or reject it. -- Petr Pasky Baudis Stuff: http://pasky.or.cz/ C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor - To unsubscribe from this list: send the line unsubscribe git in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Naming the SCM (was Re: Handling renames.)
Steven Cole wrote: On Thursday 14 April 2005 01:40 pm, Andrew Timberlake-Newell wrote: Zach Welch pontificated: I imagine quite a few folks expect something not entirely unlike an SCM to emerge from these current efforts. Moreover, Petr's 'git' scripts wrap your filesystem plumbing to that very end. To avoid confusion, I think it would be better to distinguish the two layers, perhaps by calling the low-level plumbing... 'gitfs', of course. Or perhaps to come up with a name (or at least nickname) for the SCM. GitMaster? Cogito. Git inside can be the first slogan. Differentiating the SCM built on top of git from git itself is probably worthwhile to avoid confusion. Other SCMs may be developed later, built on git, and these can come up with their own clever names. And the logo could be a dove which, as everybody knows, coos. Peter -- Peter Williams [EMAIL PROTECTED] Learning, n. The kind of ignorance distinguishing the studious. -- Ambrose Bierce - To unsubscribe from this list: send the line unsubscribe git in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Re: Merge with git-pasky II.
Is that some thing you want to see? Maybe clean up the error printing. Chris --- /dev/null 2003-01-30 05:24:37.0 -0500 +++ merge.py2005-04-14 16:34:39.0 -0400 @@ -0,0 +1,76 @@ +#!/usr/bin/env python + +import re +import sys +import os +from pprint import pprint + +def get_tree(commit): +data = os.popen(cat-file commit %s%commit).read() +return re.findall(r(?m)^tree (\w+), data)[0] + +PREFIX = 0 +PATH = -1 +SHA = -2 +ORIGSHA = -3 + +def get_difftree(old, new): +lines = os.popen(diff-tree %s %s%(old, new)).read().split(\x00) +patterns = (r(\*)(\d+)-(\d+)\s(\w+)\s(\w+)-(\w+)\s(.*), + r([+-])(\d+)\s(\w+)\s(\w+)\s(.*)) +res = {} +for l in lines: + if not l: continue + for p in patterns: + m = re.findall(p, l) + if m: + m = m[0] + res[m[-1]] = m + break + else: + raise difftree: unknow line, l +return res + +def analyze(diff1, diff2): +diff1only = [ diff1[k] for k in diff1 if k not in diff2 ] +diff2only = [ diff2[k] for k in diff2 if k not in diff1 ] +both = [ (diff1[k],diff2[k]) for k in diff2 if k in diff1 ] + +action(diff1only) +action(diff2only) +action_two(both) + +def action(diffs): +for act in diffs: + if act[PREFIX] == *: + print modify, act[PATH], act[SHA] + elif act[PREFIX] == '-': + print remove, act[PATH], act[SHA] + elif act[PREFIX] == '+': + print add, act[PATH], act[SHA] + else: + raise unknow action + +def action_two(diffs): +for act1, act2 in diffs: + if len(act1) == len(act2): # same kind type + if act1[PREFIX] == act2[PREFIX]: + if act1[SHA] == act2[SHA] or act1[PREFIX] == '-': + return action(act1) + if act1[PREFIX]=='*': + print do_merge, act1[PATH], act1[ORIGSHA], act1[SHA], act2[SHA] + return + print unable to handle, act[PATH] + print one side wants, act1[PREFIX] + print the other side wants, act2[PREFIX] + + +args = sys.argv[1:] +if len(args)!=3: +print Usage merge.py common rev1 rev2 +trees = map(get_tree, args) +print checkout-tree, trees[0] +diff1 = get_difftree(trees[0], trees[1]) +diff2 = get_difftree(trees[0], trees[2]) +analyze(diff1, diff2) + - To unsubscribe from this list: send the line unsubscribe git in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html