[patch] git: clean up add_file_to_cache() in update-cache.c

2005-04-14 Thread Ingo Molnar

this patch cleans up add_file_to_cache() to free up all memory it 
allocates. This has no significance right now as the only user of 
add_file_to_cache() die()s immediately in the 'leak' paths - but if the 
function is ever used without dying then this uncleanliness could lead 
to a memory leak.

Ingo

Signed-off-by: Ingo Molnar [EMAIL PROTECTED]

--- update-cache.c.orig
+++ update-cache.c
@@ -120,10 +120,17 @@ static int add_file_to_cache(char *path)
ce-st_size = st.st_size;
ce-namelen = namelen;
 
-   if (index_fd(path, namelen, ce, fd, st)  0)
+   if (index_fd(path, namelen, ce, fd, st)  0) {
+   free(ce);
return -1;
+   }
 
-   return add_cache_entry(ce, allow_add);
+   if (add_cache_entry(ce, allow_add)) {
+   free(ce);
+   return -1;
+   }
+
+   return 0;
 }
 
 static int match_data(int fd, void *buffer, unsigned long size)

-
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch] git: fix memory leak #2 in read-cache.c

2005-04-14 Thread Ingo Molnar

* Ingo Molnar [EMAIL PROTECTED] wrote:

 this patch fixes a memory leak in read-cache.c: when there's cache 
 entry collision we should free the previous one.

 + free(active_cache[pos]);
   active_cache[pos] = ce;

i'm having second thoughs about this one: active_cache entries are not 
always malloc()-ed - e.g. read_cache() will construct them from the 
mmap() of the index file. Which must not be free()d!

one safe solution would be to malloc() all these entries and copy them 
over from the index file? Slightly slower but safer and free()-able when 
update-cache.c notices a collision. The (tested) patch below does this.

this would also make Martin Schlemmer's update-cache.c fix safe.

(without this second patch, free(active_cache[pos]) might crash, and 
that crash is would possibly be remote exploitable via a special 
repository that tricks the index file to look in a certain way.)

Ingo

Signed-off-by: Ingo Molnar [EMAIL PROTECTED]

--- read-cache.c.orig
+++ read-cache.c
@@ -453,10 +453,17 @@ int read_cache(void)
 
offset = sizeof(*hdr);
for (i = 0; i  hdr-entries; i++) {
-   struct cache_entry *ce = map + offset;
+   struct cache_entry *ce = map + offset, *tmp;
offset = offset + ce_size(ce);
-   active_cache[i] = ce;
+
+   tmp = malloc(ce_size(ce));
+   if (!tmp)
+   return error(malloc failed);
+   memcpy(tmp, ce, ce_size(ce));
+   active_cache[i] = tmp;
}
+   munmap(map, size);
+
return active_nr;
 
 unmap:

-
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch] git: fix memory leak #2 in read-cache.c

2005-04-14 Thread Ingo Molnar

* Linus Torvalds [EMAIL PROTECTED] wrote:

 In other words, if the common case is that we update a couple of 
 entries in the active cache, we actually saved 1.6MB (+ malloc 
 overhead for the 17 _thousand_ allocations) by my approach.
 
 And the leak? There's none. We never actually update an existing entry 
 that was allocated with malloc(), unless the user does something 
 stupid.  In other words, the only case where there is a leak is when 
 the user does something like
 
   update-cache file file file file file file .. 
 
 with the same file listed several times.

fair enough - as long as this is only used in a scripted environment, 
and not via some library and not within a repository server, web 
backend, etc.

Ingo
-
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Reorganize common code

2005-04-14 Thread Daniel Barkalow
I forgot to include util.c. On my to-do list is something to make this
more obvious.

Signed-Off-By: Daniel Barkalow [EMAIL PROTECTED]

Index: util.c
===
--- /dev/null  (tree:e8194c62bfc68725972a6847fa2c6d529ca64137)
+++ 420a52c504bf712cd898a11746a54ea4349a9386/util.c  (mode:100644 
sha1:d1d497c6709503f71138e816b2599d5105cc4915)
@@ -0,0 +1,37 @@
+#include util.h
+#include stdarg.h
+#include stdio.h
+#include stdlib.h
+
+void usage(const char *err)
+{
+   fprintf(stderr, usage: %s\n, err);
+   exit(1);
+}
+
+static void report(const char *prefix, const char *err, va_list params)
+{
+   fputs(prefix, stderr);
+   vfprintf(stderr, err, params);
+   fputs(\n, stderr);
+}
+
+void die(const char *err, ...)
+{
+   va_list params;
+
+   va_start(params, err);
+   report(fatal: , err, params);
+   va_end(params);
+   exit(1);
+}
+
+int error(const char *err, ...)
+{
+   va_list params;
+
+   va_start(params, err);
+   report(error: , err, params);
+   va_end(params);
+   return -1;
+}

-
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Yet another base64 patch

2005-04-14 Thread H. Peter Anvin
Linus Torvalds wrote:
So why is base64 worse than the stock one?
As mentioned, the flat version may be faster, but it really isn't an
option. 32000 objects is peanuts. Any respectable source tree may hit that
in a short time, and will break in horrible ways on many Linux
filesystems.
If it does, it's not because of n_link; see previous email.
I have used ext2 filesystems with hundreds of thousands of files per 
directory back in 1996.  It was slow but didn't break anything.

The only filesystem I know of which has a 2^16 entry limit is FAT.
So you need at least a single level of subdirectory. 

What I don't get is why the stock hex version would be better than base64.
I like the result, I just don't _understand_ it.
The base64 version has 2^12 subdirectories instead of 2^8 (I just used 2 
characters as the hash key just like the hex version.)  So it ascerbates 
the performance penalty of subdirectory hashing.

-hpa
-
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] merge-base

2005-04-14 Thread Daniel Barkalow
On Wed, 13 Apr 2005, Linus Torvalds wrote:

 I agree. But I did the silly common revision tracking part slightly
 differently and in particular I already made fsck and rev-tree use the 
 same exact code.

I think I only saw a cut-and-paste version, and I didn't want to follow
that pattern.

 Also, I don't see why you did the common parent thing as part of the
 library, since that really does seem to be a very specific to this 
 problem, and neither fsck nor rev-tree really wants it. 

That was just silly; on the other hand, I think I'll eventually want to
have a support-for-merging library file, so that people can write
alternative merging programs which call library functions to pick out
history details. But that obviously shouldn't be the same file that
rev-tree and fsck share, and I'll probably do that by pulling main() out
of the program later.

 Also, the date parsing really is a separate issue from the revision 
 tracking (fsck does not want date parsing, but rev-tool does), so I think 
 you might want to do for date parsing what I just did for the revision.h 
 thing? No point in tying them together.

I think there is some value in having a library file that completely
parses commit-tagged files. Having the date field in struct revision
without the code to parse it in the file that defines the struct seems
poor to me.

 So could I ask you to re-factor it and base it on my current tree? Make 
 the merge-base program have that common parent thing in it, and factor 
 out the common date parsing into parse-date.c or something?

I'm not actually using the date in merge-base, either, so I'll just leave
that alone for now (merge-base is based on generations, not time,
currently).

-Daniel
*This .sig left intentionally blank*

-
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Date handling.

2005-04-14 Thread David Woodhouse
On Thu, 2005-04-14 at 02:12 -0700, Linus Torvalds wrote:
 I take that back. I'd be much happier with you doing and testing it, 
 because now I'm crashing.

OK. commit-tree now eats RFC2822 dates as AUTHOR_DATE because that's
what you're going to want to feed it. We store seconds since UTC epoch,
we add the author's or committer's timezone as auxiliary data so that
dates can be pretty-printed in the original timezone later if anyone
cares. I left the date parsing in rev-tree.c for backward compatibility
but it can be dropped when we change to base64 :)

Yes, glibc sucks and strptime is a pile of crap. We have to parse it
ourselves.

Index: commit-tree.c
--- 1756b578489f93999ded68ae347bef7d6063101c/commit-tree.c  (mode:100664 
sha1:12196c79f31d004dff0df1f50dda67d8204f5568)
+++ 82ba574c85e9a2e4652419c88244e9dd1bfa8baa/commit-tree.c  (mode:100644 
sha1:35cb09402c9868499bcaf6de42afbad9fdfebe05)
@@ -7,6 +7,9 @@
 
 #include pwd.h
 #include time.h
+#include string.h
+#include ctype.h
+#include time.h
 
 #define BLOCKING (1ul  14)
 #define ORIG_OFFSET (40)
@@ -95,6 +98,148 @@
}
 }
 
+static const char *month_names[] = {
+Jan, Feb, Mar, Apr, May, Jun,
+Jul, Aug, Sep, Oct, Nov, Dec
+};
+
+static const char *weekday_names[] = {
+Sun, Mon, Tue, Wed, Thu, Fri, Sat
+};
+
+
+static char *skipfws(char *str)
+{
+   while (isspace(*str))
+   str++;
+   return str;
+}
+
+   
+/* Gr. strptime is crap for this; it doesn't have a way to require RFC2822
+   (i.e. English) day/month names, and it doesn't work correctly with %z. */
+static void parse_rfc2822_date(char *date, char *result, int maxlen)
+{
+   struct tm tm;
+   char *p;
+   int i, offset;
+   time_t then;
+
+   memset(tm, 0, sizeof(tm));
+
+   /* Skip day-name */
+   p = skipfws(date);
+   if (!isdigit(*p)) {
+   for (i=0; i7; i++) {
+   if (!strncmp(p,weekday_names[i],3)  p[3] == ',') {
+   p = skipfws(p+4);
+   goto day;
+   }
+   }
+   return;
+   }   
+
+   /* day */
+ day:
+   tm.tm_mday = strtoul(p, p, 10);
+
+   if (tm.tm_mday  1 || tm.tm_mday  31)
+   return;
+
+   if (!isspace(*p))
+   return;
+
+   p = skipfws(p);
+
+   /* month */
+
+   for (i=0; i12; i++) {
+   if (!strncmp(p, month_names[i], 3)  isspace(p[3])) {
+   tm.tm_mon = i;
+   p = skipfws(p+strlen(month_names[i]));
+   goto year;
+   }
+   }
+   return; /* Error -- bad month */
+
+   /* year */
+ year: 
+   tm.tm_year = strtoul(p, p, 10);
+
+   if (!tm.tm_year  !isspace(*p))
+   return;
+
+   if (tm.tm_year  1900)
+   tm.tm_year -= 1900;
+   
+   p=skipfws(p);
+
+   /* hour */
+   if (!isdigit(*p))
+   return;
+   tm.tm_hour = strtoul(p, p, 10);
+   
+   if (!tm.tm_hour  23)
+   return;
+
+   if (*p != ':')
+   return; /* Error -- bad time */
+   p++;
+
+   /* minute */
+   if (!isdigit(*p))
+   return;
+   tm.tm_min = strtoul(p, p, 10);
+   
+   if (!tm.tm_min  59)
+   return;
+
+   if (isspace(*p))
+   goto zone;
+
+   if (*p != ':')
+   return; /* Error -- bad time */
+   p++;
+
+   /* second */
+   if (!isdigit(*p))
+   return;
+   tm.tm_sec = strtoul(p, p, 10);
+   
+   if (!tm.tm_sec  59)
+   return;
+
+   if (!isspace(*p))
+   return;
+
+ zone:
+   p = skipfws(p);
+
+   if (*p == '-')
+   offset = -60;
+   else if (*p == '+')
+   offset = 60;
+   else
+  return;
+
+   if (!isdigit(p[1]) || !isdigit(p[2]) || !isdigit(p[3]) || 
!isdigit(p[4]))
+   return;
+
+   i = strtoul(p+1, NULL, 10);
+   offset *= ((i % 100) + ((i / 100) * 60));
+
+   if (*(skipfws(p + 5)))
+   return;
+
+   then = mktime(tm); /* mktime appears to ignore the GMT offset, 
stupidly */
+   if (then == -1)
+   return;
+
+   then -= offset;
+
+   snprintf(result, maxlen, %lu %5.5s, then, p);
+}
+
 /*
  * Having more than two parents may be strange, but hey, there's
  * no conceptual reason why the file format couldn't accept multi-way
@@ -114,10 +259,12 @@
unsigned char commit_sha1[20];
char *gecos, *realgecos;
char *email, realemail[1000];
-   char *date, *realdate;
+   char date[20], realdate[20];
+   char *audate;
char comment[1000];
struct passwd *pw;
time_t now;
+   struct tm *tm;
char *buffer;
unsigned int size;
 
@@ -142,15 +289,19 @@
realemail[len] = '@';
   

Re: Handling renames.

2005-04-14 Thread Linus Torvalds


On Thu, 14 Apr 2005, David Woodhouse wrote:

 I've been looking at tracking file revisions. One proposed solution was
 to have a separate revision history for individual files, with a new
 kind of 'filecommit' object which parallels the existing 'commit',
 referencing a blob instead of a tree. Then trees would reference such
 objects instead of referencing blobs directly.

Please don't.  It's fundamentally the git notion of content determines
objects.

It also has no relevance. A rename really doesn't exist in the git 
model. The git model really is about tracking data, not about tracking 
what happened to _create_ that data.

The one exception is the commit log. That's where you put the explanations 
of _why_ the data changed. And git itself doesn't care what the format is, 
apart from the git header.

So, you really need to think of git as a filesystem. You can then 
implement an SCM _on_top_of_it_, which means that your second suggestion 
is not only acceptable, it really is the _only_ way to handle this in git:

 So a commit involving a rename would look something like this...
 
   tree 82ba574c85e9a2e4652419c88244e9dd1bfa8baa
   parent bb95843a5a0f397270819462812735ee29796fb4
   rename foo.c bar.c
   author David Woodhouse [EMAIL PROTECTED] 1113499881 +0100
   committer David Woodhouse [EMAIL PROTECTED] 1113499881 +0100
   Rename foo.c to bar.c and s/foo_/bar_/g

Except I want that empty line in there, and I want it in the free-form  
section. The rename part really isn't part of the git header. It's not 
what git tracks, it was tracked by an SCM system on top of git.

So the git header is an inode in the git filesystem, and like an inode 
it has a ctime and an mtime, and pointers to the data. So as far as git is 
concerned, this part:

tree 82ba574c85e9a2e4652419c88244e9dd1bfa8baa
parent bb95843a5a0f397270819462812735ee29796fb4
author David Woodhouse [EMAIL PROTECTED] 1113499881 +0100
committer David Woodhouse [EMAIL PROTECTED] 1113499881 +0100

really is the filesystem inode. The rest is whatever the filesystem user
puts into it, and git won't care.

 Opinions? Dissent? We'd probably need to escape the filenames in some
 way -- handwave over that for now.

The fact that git handles arbitrary filenames (stuff starting with . 
excepted) doesn't mean that the SCM above it needs to. Quite frankly, I 
think an SCM that handles newlines in filenames is being silly. But a 
_filesystem_ needs to not care.

There are too many messy SCM's out there that do not hav ea philosophy. 
Dammit, I'm not interested in creating another one. This thing has a 
mental model, and we keep to that model.

The reason UNIX is beautiful is that it has a mental model of processes 
and files. Git has a mental model of objects and certain very very limited 
relationships. The relationships git cares about are encoded in the C 
files, the extra crap (like rename info) is just that - stuff that 
random scripts wrote, and that is just informational and not central to 
the model.

Linus
-
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Handling renames.

2005-04-14 Thread Ingo Molnar

* David Woodhouse [EMAIL PROTECTED] wrote:

 I've been looking at tracking file revisions. One proposed solution 
 was to have a separate revision history for individual files, with a 
 new kind of 'filecommit' object which parallels the existing 'commit', 
 referencing a blob instead of a tree. Then trees would reference such 
 objects instead of referencing blobs directly.
 
 I think that introduces a lot of redundancy though, because 99% of the 
 time, the revision history of the individual file is entirely 
 reproducible from the revision history of the tree. It's only when 
 files are renamed that we fall over -- and I think we can handle 
 renames fairly well if we just log them in the commit object.

how about the following structure:

- tree_new ---
- tree_old --- rename_commit - blob

the rename_commit object just contains a pointer to the file content 
blob. If a rename happens then the old tree references the rename_commit 
object (instead of the blob), and the new tree references it too. This 
way there's no need to list the rename via namespace means: if a tree 
entry points to a rename_commit object then a rename happened and the 
rename_commit object is looked up in the old tree to get the old name.

there's no redundancy caused by this method: only renames (which are 
rare) go through the rename_commit redirection. (to speed up the lookup 
the rename_commit object could cache the offset of the two names within 
their tree objects.)

Ingo
-
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Merge with git-pasky II.

2005-04-14 Thread Junio C Hamano
 PB == Petr Baudis [EMAIL PROTECTED] writes:

PB Bah, you outran me. ;-)

Just being in a different timezone, I guess.

PB I'll change it to use the cool git-pasky stuff (commit-id etc) and its
PB style of committing - that is, it will merely record the update-caches
PB to be done upon commit, and it will read-tree the branch we are merging
PB to instead of the ancestor. (So that git diff gives useful output.)

Sorry, I have not seen what you have been doing since pasky 0.3,
and I have not even started to understand the mental model of
the world your tool is building.  That said, my gut feeling is
that telling this script about git-pasky's world model might be
a mistake.  I'd rather see you consider the script as mere part
of the plumbing.  Maybe adding an extra parameter to the script
to let the user explicitly specify the common ancestor to use
would be needed, but I would prefer git-pasky-merge to do its
own magic (converting symbolic commit names into raw commit
names and such) before calling this low level script.

That way people like me who have not migrated to your framework
can still keep using it.  All the script currently needs is a
bare git object database; i.e., nothing other than what is in
.git/objects and a couple of commit record SHA1s as its
parameters.  No .git/heads/, no .git/HEAD.local, no .git/tags,
are involved for it to work, and I would prefer to keep things
that way if possible.

 * show-diff updates to add -r flag to squelch diffs for files not in
 the working directory.  This is mainly useful when verifying the
 result of an automated merge.

PB -r traditionally means recursive - what's the reasoning behind the
PB choice of this letter?

Well, '-r' is not necessarily recursive. ls -r is reverse, sort
-r is reverse.  less -r is raw.  cat -r is reversible.
nethack -r is race ;-).  You are thinking as an SCM person so
it may look that way.  diff -r is recursive.  darcs add -r
is recursive.  But even in the SCM world, cvs add -r is not
(it means read-only) neither co -r (explicit revision) ;-).

I would rather pick '-q' if I were doing the patch today, but I
was too tired and did not think of a letter when I wrote it.  I
guess '-r' stood for removed, but I agree it is a bad choice.
Any objections to '-q'?

PB use strict?

Not in this iteration but eventually yes.

PB It'd be simpler to do just

PB my @common = (map { $common{$_} }
PB   sort { $b = $a }
PB   keys %common)

Well, actually you spotted a bug between the implementation and
what I wanted to do.  It should have been:

map { $_-[0] }
sort { $b-[1] = $a-[1] }
map { [ $common{$_} = $_ ] } keys %common

That is, sort [anscestor = number of times it appears] tuple by
the number of times it appears in decreasing order, and
project the resulting list to a list of ancestors.  It is trying
to deal with the following pattern in rev-tree output:

TIMESTAMP1 EDGE1:1 ANCESTOR1:3 ANCESTOR2:3
TIMESTAMP2 EDGE2:2 ANCESTOR1:3

and when the above happens I wanted to pick up ANCESTOR1, but
that was without no sound reason.

PB But I really think this is a horrible heuristic. I believe you should
PB take the latest commit in the --edges output, and from that choose the
PB base whose rev-tree --edges the_base merged_branch has the least lines
PB on output. (That is, the path to it is shortest - ideally it's already
PB part of the merged_branch.)

I'll try something along that line.  Honestly the ancestor
selection part was what I had most trouble with.  Thanks.

PB What about

PB sub OLDMODE { 0 }
PB sub NEWMODE { 1 }
PB sub NEWSHA { 2 }

PB and then using that when accessing the tuple? Would make the code
PB much more readable.

Totally agreed; readability cleanup is needed, just as use
strict you mentioned, before it is ready for public
consumption.  Remember, however, the primary purpose of the
message was to share it with Linus so that I can ask his opinion
while the script was still slushy; the contents that array
contained was still changing then and was too early for symbolic
constants.  I'll do that in the next round.

PB It is a good idea to check merge's exit code and give a notice at the
PB end if there were any conflicts.

In principle yes, but I noticed that merge already gave me a
nice warning message when it found conflicts, so there was no
need to do so myself in this case.  See sample output:

$ perl ./git-merge.perl \
71796686221a0a56ccc25b02386ed8ea648da14d \
bb95843a5a0f397270819462812735ee29796fb4 
Common ancestor: 9f02d4d233223462d3f6217b5837b786e6286ba4
O - COPYING
O - README
...
O - write-tree.c
A M write-blob.c
A M show-diff.c
...
A M update-cache.c
A M git-merge.perl
B M merge-tree.c
MRG Makefile
merge: warning: conflicts during merge
$ 

 +# Create a temporary directory and go there.
 +system 'rm', '-rf', ',,merge-temp';

PB Can't we call it just ,,merge?

I'd rather have a command line option 

Re: Handling renames.

2005-04-14 Thread H. Peter Anvin
Linus Torvalds wrote:
On Thu, 14 Apr 2005, H. Peter Anvin wrote:
Although Linus is correct in that an SCM doesn't *have* to handle this, 
it really feels like shooting for mediocracy to me.  We might as well 
design it right from the beginning.
No. git is not an SCM. it's a filesystem designed to _host_ an SCM, and 
that _is_ doing it right from the beginning.

Keep the abstractions clean. Do _not_ get confused into thinking that git 
is an SCM. If you think of it that way, you'll end up with crap you can't 
think about.

And at a filesystem layer, rename already exists. It's moving an object 
to a new name in a tree. git already does that very well, thank you very 
much.

But a filesystem rename is _not_ the same thing as an SCM rename.  An SCM 
rename is built on top of a filesystem rename, but it has its own issues 
that may or may not make sense for the filesystem.

I wasn't referring to git per se, I was referring to the hosted SCM.
-hpa
-
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Handling renames.

2005-04-14 Thread David Woodhouse
On Thu, 2005-04-14 at 20:58 +0200, Ingo Molnar wrote:
 The thing i tried to avoid was to list long filenames in the commit 
 (because of the tree hierarchy we'd need to do tree-absolute pathnames 
 or something like that, and escape things, and do lookups - duplicating 
 a VFS which is quite bad) - it would be better to identify the rename 
 source and target via its tree object hash and its offset within that 
 tree. Such information could be embedded in the commit object just fine.  
 Something like:

Actually I'm not sure that's true. Let's consider the two main users of
this information.

Firstly, because it's what I've been playing with: to list a given
file's revision history, I currently work with its filename -- walk the
commit objects, inspecting the tree and selecting those commits where
the file has changed. If my filename is 'fs/jffs2/inode.c' then I can
immediately skip over a commit where the 'fs' entry in the top-level
tree is identical to that in the parent, or I can skip a commit where
the 'jffs2' entry in the 'fs' subtree is identical to the parent... it's
all done on filename, and the {parent, entry} tuple wouldn't help much
here; I'd probably have to convert back to a filename anyway.

Secondly, there's merges. I've paid less attention to these (see mail 5
minutes ago) but I think they'd end up operating on the rename
information in a very similar way. To find a common ancestor for a given
file,, we want to track its name as it changed during history; at that
point it's all string compares.

-- 
dwmw2


-
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Handling renames.

2005-04-14 Thread Zach Welch
Linus Torvalds wrote:
 
 On Thu, 14 Apr 2005, H. Peter Anvin wrote:
 
 Although Linus is correct in that an SCM doesn't *have* to handle 
 this, it really feels like shooting for mediocracy to me.  We might
  as well design it right from the beginning.
 
 
 No. git is not an SCM. it's a filesystem designed to _host_ an SCM, 
 and that _is_ doing it right from the beginning.

I imagine quite a few folks expect something not entirely unlike an SCM
to emerge from these current efforts. Moreover, Petr's 'git' scripts
wrap your filesystem plumbing to that very end.

To avoid confusion, I think it would be better to distinguish the two
layers, perhaps by calling the low-level plumbing... 'gitfs', of course.

Cheers,

Zach Welch
Superlucidity Services
-
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Date handling.

2005-04-14 Thread David Woodhouse
On Thu, 2005-04-14 at 12:19 -0700, [EMAIL PROTECTED] wrote:
 With a UTC date, why would anyone care in which timezone the commit was
 made?  Any pretty printing would most likely be prettiest if it is done
 relative to the timezone of the person looking at the commit record, not
 the person who created the record.

I'd prefer not to lose the information. If someone has committed a
change at 2am, I like to know that it was 2am for _them_. It helps me
decide where to look first for the cause of problems. :)

It also helps disambiguate certain comments, especially those involving
words or phrases such as yesterday or this afternoon.

-- 
dwmw2


-
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: Date handling.

2005-04-14 Thread Luck, Tony
I'd prefer not to lose the information. If someone has committed a
change at 2am, I like to know that it was 2am for _them_. It helps me
decide where to look first for the cause of problems. :)

I'd think the 8:00am-before-the-first-coffee checkins would be the
most worrying :-)

It also helps disambiguate certain comments, especially those involving
words or phrases such as yesterday or this afternoon.

This is a very good point ... but this still has problems with the
git is a filesystem, not a SCM mantra.  Timezone comments don't
belong in the git inode.

-Tony
-
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Re: Re: Merge with git-pasky II.

2005-04-14 Thread Petr Baudis
Dear diary, on Thu, Apr 14, 2005 at 10:23:26PM CEST, I got a letter
where Erik van Konijnenburg [EMAIL PROTECTED] told me that...
 On Thu, Apr 14, 2005 at 09:35:07PM +0200, Petr Baudis wrote:
  Hmm. I actually don't like this naming. I think it's not too consistent,
  is irregular, therefore parsing it would be ugly. What I propose:
  
  12c\tname - legend
- original file
  D - tree #1 removed file
   D- tree #2 removed file
  DD- both trees removed file
  M - tree #1 modified file
   M
  DM*   - conflict, tree #1 removed file, tree #2 modified file
  MD*
  MM- exact same modification
  MM*   - different modifications, merging
  
  This is generic, theoretically scales well even to more trees, is easy
  to parse trivially, still is human readable (actually the asterisk in
  the 'conflict' column is there basically only for the humans), is
  completely regular and consistent.
 
 Detail: perhaps use underscore instead of space, to avoid space/tab typos
 that are invisible on paper and user friendly mail clients?

I'd go for dots in that case. Looks less intrusive. :^)

-- 
Petr Pasky Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor
-
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Re: Naming the SCM (was Re: Handling renames.)

2005-04-14 Thread Petr Baudis
Dear diary, on Thu, Apr 14, 2005 at 10:58:52PM CEST, I got a letter
where H. Peter Anvin [EMAIL PROTECTED] told me that...
 Petr Baudis wrote:
 
 Cogito.  Git inside can be the first slogan.
 
 What about tig?
 
 I like Cogito; it's a real name, plus it'd be a good use for the 
 otherwise-pretty-useless two-letter combination cg.

Duh, believe me or not but I completely missed the Cogito part of
Steven's mail. Of course, I like it too.

I'll commit my poor man's git-merge-in-separate-tree and finally get
some sleep. I promise.

-- 
Petr Pasky Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor
-
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Handling renames.

2005-04-14 Thread Daniel Barkalow
On Thu, 14 Apr 2005, David Woodhouse wrote:

 Opinions? Dissent? We'd probably need to escape the filenames in some
 way -- handwave over that for now.

I personally think renames are a minor thing that doesn't happen
much. What actually happens, in my opinion, is that some chunk of a file
is moved to a different, possibly new, file. If this is supported (as
something that the SCM notices), then a rename is just a special case
where the moved chunk is a whole file.

I think that it should be possible to identify and tag big
enough deletions and insertions, and compare them to find moves, where a
further change may be applied in the middle if two chunks are very
similar but not the same.

On the other hand, I think that the SCM will need to cache its
understanding of what a commit did in order to give reasonable
performance for operations like annotate, and it may be advantegous to
distribute things from this cache, since the committer might want to tell
the system something that it didn't guess.

At some point, I'm going to argue for core support for back pointers,
where a file can be created which is about some other file(s), and
someone looking for files about a particular file can find them without
searching the entire database. I think this will turn out to be important
for a variety of cases where some later participant wants to say something
about an existing file without changing the content of the file.

-Daniel
*This .sig left intentionally blank*

-
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[no subject]

2005-04-14 Thread Timo Hirvonen
subscribe git

-
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[patch pasky] update gitcancel.sh to handle modes as well

2005-04-14 Thread Martin Schlemmer
Hi,

gitcancel.sh do not handle mode changes:


$ chmod -x Makefile
$ git cancel
patch:  Only garbage was found in the patch input.


Rather use checkout-cache to sync our tree, as should do the right thing
instead of diffing (cancel imply just blow away everything).

Signed-off-by: Martin Schlemmer [EMAIL PROTECTED]

gittrack.sh:  03d6db1fb3a70605ef249c632c04e542457f0808
--- 03d6db1fb3a70605ef249c632c04e542457f0808/gittrack.sh
+++ uncommitted/gittrack.sh
@@ -51,6 +51,7 @@

read-tree $(tree-id $name)
gitdiff.sh local $name | gitapply.sh
+   update-cache --refresh

 else
[ $tracking ] || \
@@ -61,6 +62,7 @@
if [ -s .git/HEAD.local ]; then
gitdiff.sh $tracking local | gitapply.sh
read-tree $(tree-id local)
+   update-cache --refresh

head=$(cat .git/HEAD)
branchhead=$(cat .git/heads/$tracking)


-- 
Martin Schlemmer

gittrack.sh:  03d6db1fb3a70605ef249c632c04e542457f0808
--- 03d6db1fb3a70605ef249c632c04e542457f0808/gittrack.sh
+++ uncommitted/gittrack.sh
@@ -51,6 +51,7 @@
 
 	read-tree $(tree-id $name)
 	gitdiff.sh local $name | gitapply.sh
+	update-cache --refresh
 
 else
 	[ $tracking ] || \
@@ -61,6 +62,7 @@
 	if [ -s .git/HEAD.local ]; then
 		gitdiff.sh $tracking local | gitapply.sh
 		read-tree $(tree-id local)
+		update-cache --refresh
 
 		head=$(cat .git/HEAD)
 		branchhead=$(cat .git/heads/$tracking)


signature.asc
Description: This is a digitally signed message part


Re: Remove need to untrack before tracking new branch

2005-04-14 Thread Martin Schlemmer
On Fri, 2005-04-15 at 00:42 +0200, Petr Baudis wrote:
 Dear diary, on Thu, Apr 14, 2005 at 11:40:09AM CEST, I got a letter
 where Martin Schlemmer [EMAIL PROTECTED] told me that...
  (PS, can you check the fact that your mail client keeps on adding a 'Re:
  ' ...)
 
 Hmm. I guess my ancient reply_regexp
 ^((\\[|\\()([^B]|B([^u]|u([^g]|g([^ ]|AnTiMaTcH[^]]+(\\]|\\)))?[
  \t]*((re([\\[0-9\\]+])*|aw):[ \t]*)? is broken... ;-)
 
  On Thu, 2005-04-14 at 11:11 +0200, Petr Baudis wrote:
   I'm lost. Why do you do --update-modes? That makes no sense to me.
   You introduce them to the cache out-of-order w.r.t. commits, that means
   in the normal git usage they are already unrevertable.
   
  
  Right, afterwards I thought I did add it to the wrong place.
 
 So, could you please do something with it? :-)
 
   What are you trying to do? Mode changes _are_ real changes. You _don't_
   want to silence them. What you want is to even show them more explicitly
   in show-diff.
   
  
  No, you do not understand.  If you actually change the mode, it will
  show.  What now happens, is that say I track the 'linus' branch, then
  untrack, and then track 'pasky' again, the Patches will be applied, but
  not the mode changes which are stored in the cache ...  Let me show you:
  
  -
  $ ls -l $(./show-diff -s | cut -d: -f1)
 ..directroy listing with no 'x' bit..
  -
  
  (Note no 'x' bit ...)
  
  And that is _after_ doing:
  
   $ git track linus; git track
  
  So basically the modes that are stored in the cache are not applied ...
  Although, yes, I prob should add the relevant code to checkout-cache.
 
 This should be fixed now, BTW. git apply didn't correctly apply the
 mode changes, but now it should. Several bugs prevented it to, in fact.
 ;-)
 

Yep, I saw - thought you scrapped this, so mailed a new patch (or was
busy doing the touch ups to the email when this came in.

show-diff.c:  a531ca4078525d1c8dcf84aae0bfa89fed6e5d96
--- a531ca4078525d1c8dcf84aae0bfa89fed6e5d96/show-diff.c
+++ uncommitted/show-diff.c
@@ -5,13 +5,18 @@
  */
 #include cache.h

-static void show_differences(char *name,
+static void show_differences(struct cache_entry *ce,
void *old_contents, unsigned long long old_size)
 {
static char cmd[1000];
+   static char sha1[41];
+   int n;
FILE *f;

-   snprintf(cmd, sizeof(cmd), diff -L %s -u -N  - %s, name, 
name);
+   for (n = 0; n  20; n++)
+   snprintf((sha1[n*2]), 3, %02x, ce-sha1[n]);
+   snprintf(cmd, sizeof(cmd), diff -L %s/%s -L uncommitted/%s -u 
-N  - %s,
+   sha1, ce-name, ce-name, ce-name);
   
   The directory sha1 is the sha1 of the tree, not of the particular
   file - that one is in the attributes list (parentheses after the
   filename), together with mode.
   
  
  Does it really matter?  It is more just to get the patch prefix right,
  and I did it as it went nicely with the printed:
  
  
  show-diff.c:  a531ca4078525d1c8dcf84aae0bfa89fed6e5d96
  
  
  for example ...
 
 Yes, it matters, and I don't care how nicely it wents with what you
 print before.
 

hah ;p

 Either print there some nonsense which is clear not to be a tree ID, or
 (much more preferably) print the real tree ID there. If some tool ever
 uses it (e.g. to help resolve conflicts, perhaps even actually doing a
 real merge based on the patch), you just confused it.
 

Ok, understood.  Do you think it will be scripted?  If not I guess we
can just do labels like:

--- committed/
+++ uncommitted/

?

 Also, do you think you could separate this patch from the other
 (--update-modes) patch? (If we actually still need the --update-modes
 patch after git apply was fixed.)
 

Yeah, already split it out locally, just waiting on above response ...


Thanks,

-- 
Martin Schlemmer



signature.asc
Description: This is a digitally signed message part


Re: update gitcancel.sh to handle modes as well

2005-04-14 Thread Petr Baudis
Dear diary, on Fri, Apr 15, 2005 at 12:57:25AM CEST, I got a letter
where Martin Schlemmer [EMAIL PROTECTED] told me that...
 Hi,
 
 gitcancel.sh do not handle mode changes:
 
 
 $ chmod -x Makefile
 $ git cancel
 patch:  Only garbage was found in the patch input.
 
 
 Rather use checkout-cache to sync our tree, as should do the right thing
 instead of diffing (cancel imply just blow away everything).
 
 Signed-off-by: Martin Schlemmer [EMAIL PROTECTED]
 
 gittrack.sh:  03d6db1fb3a70605ef249c632c04e542457f0808
 --- 03d6db1fb3a70605ef249c632c04e542457f0808/gittrack.sh
 +++ uncommitted/gittrack.sh
 @@ -51,6 +51,7 @@
 
 read-tree $(tree-id $name)
 gitdiff.sh local $name | gitapply.sh
 +   update-cache --refresh
 
  else
 [ $tracking ] || \
 @@ -61,6 +62,7 @@
 if [ -s .git/HEAD.local ]; then
 gitdiff.sh $tracking local | gitapply.sh
 read-tree $(tree-id local)
 +   update-cache --refresh
 
 head=$(cat .git/HEAD)
 branchhead=$(cat .git/heads/$tracking)

The patch looks familiar, but not right. ;-)

-- 
Petr Pasky Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor
-
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Remove need to untrack before tracking new branch

2005-04-14 Thread Petr Baudis
Dear diary, on Fri, Apr 15, 2005 at 01:01:27AM CEST, I got a letter
where Martin Schlemmer [EMAIL PROTECTED] told me that...
 On Fri, 2005-04-15 at 00:42 +0200, Petr Baudis wrote:
  Dear diary, on Thu, Apr 14, 2005 at 11:40:09AM CEST, I got a letter
  where Martin Schlemmer [EMAIL PROTECTED] told me that...
   So basically the modes that are stored in the cache are not applied ...
   Although, yes, I prob should add the relevant code to checkout-cache.
  
  This should be fixed now, BTW. git apply didn't correctly apply the
  mode changes, but now it should. Several bugs prevented it to, in fact.
  ;-)
 
 Yep, I saw - thought you scrapped this, so mailed a new patch (or was
 busy doing the touch ups to the email when this came in.

Hmm, I must've missed the new patch. The latest I have still puts the
stuff to update-cache and combines it with the show-diff change.

 show-diff.c:  a531ca4078525d1c8dcf84aae0bfa89fed6e5d96
 --- a531ca4078525d1c8dcf84aae0bfa89fed6e5d96/show-diff.c
 +++ uncommitted/show-diff.c
 @@ -5,13 +5,18 @@
..snip..
 -   snprintf(cmd, sizeof(cmd), diff -L %s -u -N  - %s, name, 
 name);
 +   for (n = 0; n  20; n++)
 +   snprintf((sha1[n*2]), 3, %02x, ce-sha1[n]);
 +   snprintf(cmd, sizeof(cmd), diff -L %s/%s -L uncommitted/%s 
 -u -N  - %s,
 +   sha1, ce-name, ce-name, ce-name);

The directory sha1 is the sha1 of the tree, not of the particular
file - that one is in the attributes list (parentheses after the
filename), together with mode.

   
   Does it really matter?  It is more just to get the patch prefix right,
   and I did it as it went nicely with the printed:
   
   
   show-diff.c:  a531ca4078525d1c8dcf84aae0bfa89fed6e5d96
   
   
   for example ...
  
  Yes, it matters, and I don't care how nicely it wents with what you
  print before.
  
 
 hah ;p
 
  Either print there some nonsense which is clear not to be a tree ID, or
  (much more preferably) print the real tree ID there. If some tool ever
  uses it (e.g. to help resolve conflicts, perhaps even actually doing a
  real merge based on the patch), you just confused it.
  
 
 Ok, understood.  Do you think it will be scripted?  If not I guess we
 can just do labels like:
 
 --- committed/
 +++ uncommitted/
 
 ?

Heh. Well, of course this could do. But is there any technical reason
why not just carry the sha1 id of the tree around and stuff it there?

-- 
Petr Pasky Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor
-
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Reorganize common code

2005-04-14 Thread Petr Baudis
Dear diary, on Thu, Apr 14, 2005 at 12:00:25AM CEST, I got a letter
where Daniel Barkalow [EMAIL PROTECTED] told me that...
 This splits read-cache.c into util.c, cache.c, and objects.c, based on
 what the code is used for; similarly, cache.h becomes util.h, cache.h, and
 objects.h. For simplicity, cache.h includes the other two to give the
 previous overall behavior.
 
 Signed-Off-By: Daniel Barkalow [EMAIL PROTECTED]

FYI, given the scale of this change, I'm waiting for Linus to either ack
it, pick it himself or reject it.

-- 
Petr Pasky Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor
-
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Naming the SCM (was Re: Handling renames.)

2005-04-14 Thread Peter Williams
Steven Cole wrote:
On Thursday 14 April 2005 01:40 pm, Andrew Timberlake-Newell wrote:
Zach Welch pontificated:
I imagine quite a few folks expect something not entirely unlike an SCM
to emerge from these current efforts. Moreover, Petr's 'git' scripts
wrap your filesystem plumbing to that very end.
To avoid confusion, I think it would be better to distinguish the two
layers, perhaps by calling the low-level plumbing... 'gitfs', of course.
Or perhaps to come up with a name (or at least nickname) for the SCM.
GitMaster?

Cogito.  Git inside can be the first slogan.
Differentiating the SCM built on top of git from git itself is probably 
worthwhile
to avoid confusion.  Other SCMs may be developed later, built on git, and these
can come up with their own clever names.
And the logo could be a dove which, as everybody knows, coos.
Peter
--
Peter Williams   [EMAIL PROTECTED]
Learning, n. The kind of ignorance distinguishing the studious.
 -- Ambrose Bierce
-
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Re: Merge with git-pasky II.

2005-04-14 Thread Christopher Li
Is that some thing you want to see? Maybe clean up the error printing.


Chris

--- /dev/null   2003-01-30 05:24:37.0 -0500
+++ merge.py2005-04-14 16:34:39.0 -0400
@@ -0,0 +1,76 @@
+#!/usr/bin/env python
+
+import re
+import sys
+import os
+from pprint import pprint
+
+def get_tree(commit):
+data = os.popen(cat-file commit %s%commit).read()
+return re.findall(r(?m)^tree (\w+), data)[0]
+
+PREFIX = 0
+PATH = -1
+SHA = -2
+ORIGSHA = -3
+
+def get_difftree(old, new):
+lines = os.popen(diff-tree %s %s%(old, new)).read().split(\x00)
+patterns = (r(\*)(\d+)-(\d+)\s(\w+)\s(\w+)-(\w+)\s(.*),
+   r([+-])(\d+)\s(\w+)\s(\w+)\s(.*))
+res = {}
+for l in lines:
+   if not l: continue
+   for p in patterns:
+   m = re.findall(p, l)
+   if m:
+   m = m[0]
+   res[m[-1]] = m
+   break
+   else:
+   raise difftree: unknow line, l
+return res
+
+def analyze(diff1, diff2):
+diff1only = [ diff1[k] for k in diff1 if k not in diff2 ]
+diff2only = [ diff2[k] for k in diff2 if k not in diff1 ]
+both = [ (diff1[k],diff2[k]) for k in diff2 if k in diff1 ]
+
+action(diff1only)
+action(diff2only)
+action_two(both)
+
+def action(diffs):
+for act in diffs:
+   if act[PREFIX] == *:
+   print modify, act[PATH], act[SHA]
+   elif act[PREFIX] == '-':
+   print remove, act[PATH], act[SHA]
+   elif act[PREFIX] == '+':
+   print add, act[PATH], act[SHA]
+   else:
+   raise unknow action
+
+def action_two(diffs):
+for act1, act2 in diffs:
+   if len(act1) == len(act2):  # same kind type
+   if act1[PREFIX] == act2[PREFIX]:
+   if act1[SHA] == act2[SHA] or act1[PREFIX] == '-': 
+   return action(act1)
+   if act1[PREFIX]=='*':
+   print do_merge, act1[PATH], act1[ORIGSHA], act1[SHA], 
act2[SHA]
+   return
+   print unable to handle, act[PATH]
+   print one side wants, act1[PREFIX]
+   print the other side wants, act2[PREFIX]
+   
+
+args = sys.argv[1:]
+if len(args)!=3:
+print Usage merge.py common rev1 rev2
+trees = map(get_tree, args)
+print checkout-tree, trees[0]
+diff1 = get_difftree(trees[0], trees[1])
+diff2 = get_difftree(trees[0], trees[2])
+analyze(diff1, diff2)
+
-
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html