[PATCH 3/4] Use tree_from_tree_or_commit() in diff-tree.

2005-04-20 Thread Junio C Hamano
This patch makes diff-tree accept either tree or commit.

Signed-off-by: Junio C Hamano [EMAIL PROTECTED]
---

 diff-tree.c |   12 +++-
 1 files changed, 7 insertions(+), 5 deletions(-)

--- a/diff-tree.c
+++ b/diff-tree.c
@@ -160,18 +160,20 @@ static int diff_tree(void *tree1, unsign
 	return 0;
 }
 
-static int diff_tree_sha1(const unsigned char *old, const unsigned char *new, const char *base)
+static int diff_tree_sha1(const unsigned char *old,
+			  const unsigned char *new,
+			  const char *base)
 {
 	void *tree1, *tree2;
 	unsigned long size1, size2;
 	char type[20];
 	int retval;
 
-	tree1 = read_sha1_file(old, type, &size1);
-	if (!tree1 || strcmp(type, "tree"))
+	tree1 = tree_from_tree_or_commit(old, type, &size1);
+	if (!tree1)
 		die("unable to read source tree (%s)", sha1_to_hex(old));
-	tree2 = read_sha1_file(new, type, &size2);
-	if (!tree2 || strcmp(type, "tree"))
+	tree2 = tree_from_tree_or_commit(new, type, &size2);
+	if (!tree2)
 		die("unable to read destination tree (%s)", sha1_to_hex(new));
 	retval = diff_tree(tree1, size1, tree2, size2, base);
 	free(tree1);

-
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/4] Use tree_from_tree_or_commit() in read-tree.

2005-04-20 Thread Junio C Hamano
This patch makes read-tree accept either tree or commit.

Signed-off-by: Junio C Hamano [EMAIL PROTECTED]
---

 read-tree.c |4 +---
 1 files changed, 1 insertion(+), 3 deletions(-)

Makefile: needs update
--- a/read-tree.c
+++ b/read-tree.c
@@ -29,11 +29,9 @@ static int read_tree(unsigned char *sha1
 	unsigned long size;
 	char type[20];
 
-	buffer = read_sha1_file(sha1, type, &size);
+	buffer = tree_from_tree_or_commit(sha1, type, &size);
 	if (!buffer)
 		return -1;
-	if (strcmp(type, "tree"))
-		return -1;
 	while (size) {
 		int len = strlen(buffer)+1;
 		unsigned char *sha1 = buffer + len;



[PATCH 4/4] Use tree_from_tree_or_commit() in ls-tree.

2005-04-20 Thread Junio C Hamano
This patch makes ls-tree accept either tree or commit.

Signed-off-by: Junio C Hamano [EMAIL PROTECTED]
---

 ls-tree.c |2 +-
 1 files changed, 1 insertion(+), 1 deletion(-)

Makefile: needs update
cache.h: needs update
sha1_file.c: needs update
--- a/ls-tree.c
+++ b/ls-tree.c
@@ -74,7 +74,7 @@ static int list(unsigned char *sha1)
 	unsigned long size;
 	char type[20];
 
-	buffer = read_sha1_file(sha1, type, &size);
+	buffer = tree_from_tree_or_commit(sha1, type, &size);
 	if (!buffer)
 		die("unable to read sha1 file");
 	list_recursive(buffer, type, size, NULL);



(fixed) [PATCH 1/4] Accept commit in some places when tree is needed.

2005-04-20 Thread Junio C Hamano
<cover-paragraph>
_BLUSH_

The 1/4 in the series was a buggy one I sent by mistake.  Please
replace it with this fixed one.  The other three are OK.

BTW, do you have a preferred patch-mail convention to mark the
cover paragraph like this to be excluded from the commit log,
like the three-dash one you mentioned to exclude the tail of
the message?
</cover-paragraph>

Similar to diff-cache which was introduced recently, when the
intent is obvious we should accept commit ID when tree ID is
required.  This patch lifts the tree-from-tree-or-commit logic
from diff-cache.c and moves it to sha1_file.c, which is a common
library source for the SHA1 storage part.

Signed-off-by: Junio C Hamano [EMAIL PROTECTED]
---

 cache.h  |1 +
 diff-cache.c |   19 ++-
 sha1_file.c  |   29 +
 3 files changed, 32 insertions(+), 17 deletions(-)

--- a/cache.h
+++ b/cache.h
@@ -124,5 +124,6 @@ extern void die(const char *err, ...);
 extern int error(const char *err, ...);
 
 extern int cache_name_compare(const char *name1, int len1, const char *name2, int len2);
+extern void *tree_from_tree_or_commit(const unsigned char *sha1, char *type, unsigned long *size);
 
 #endif /* CACHE_H */
--- a/diff-cache.c
+++ b/diff-cache.c
@@ -245,23 +245,8 @@ int main(int argc, char **argv)
 	if (argc != 2 || get_sha1_hex(argv[1], tree_sha1))
 		usage("diff-cache [-r] [-z] <tree sha1>");
 
-	tree = read_sha1_file(tree_sha1, type, &size);
+	tree = tree_from_tree_or_commit(tree_sha1, type, &size);
 	if (!tree)
-		die("bad tree object %s", argv[1]);
-
-	/* We allow people to feed us a commit object, just because we're nice */
-	if (!strcmp(type, "commit")) {
-		/* tree sha1 is always at offset 5 ("tree ") */
-		if (get_sha1_hex(tree + 5, tree_sha1))
-			die("bad commit object %s", argv[1]);
-		free(tree);
-		tree = read_sha1_file(tree_sha1, type, &size);
-		if (!tree)
-			die("unable to read tree object %s", sha1_to_hex(tree_sha1));
-	}
-
-	if (strcmp(type, "tree"))
-		die("bad tree object %s (%s)", sha1_to_hex(tree_sha1), type);
-
+		die("cannot get tree object from %s", argv[1]);
 	return diff_cache(tree, size, active_cache, active_nr, "");
 }
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -245,3 +245,32 @@ int write_sha1_buffer(const unsigned cha
 	close(fd);
 	return 0;
 }
+
+void *tree_from_tree_or_commit(const unsigned char *sha1, char *type,
+			       unsigned long *size)
+{
+	void *tree = read_sha1_file(sha1, type, size);
+	if (!tree)
+		return tree;
+
+	/* We allow people to feed us a commit object,
+	 * just because we're nice.
+	 */
+	if (!strcmp(type, "commit")) {
+		/* tree sha1 is always at offset 5 ("tree ") */
+		char tree_sha1[20];
+		if (get_sha1_hex(tree + 5, tree_sha1)) {
+			free(tree);
+			return NULL;
+		}
+		free(tree);
+		tree = read_sha1_file(tree_sha1, type, size);
+		if (!tree)
+			return NULL;
+	}
+	if (strcmp(type , "tree")) {
+		free(tree);
+		return NULL;
+	}
+	return tree;
+}
+}
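
The commit-to-tree step in this patch depends on a commit object beginning with "tree " followed by the 40-digit hex id of its tree, as the patch comment notes. A minimal, self-contained sketch of only that parsing step is given here, assuming the commit text is already in memory. The name `commit_tree_hex` is hypothetical and is not a function in git; the real code uses `get_sha1_hex()` and `read_sha1_file()`.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of git): copy the 40-digit tree id
 * out of an in-memory commit object into out[41].
 * Returns 0 on success, -1 if the buffer does not start with
 * "tree " followed by 40 hex digits.
 */
int commit_tree_hex(const char *commit, size_t len, char out[41])
{
	size_t i;

	/* "tree " is 5 bytes, so the id always starts at offset 5 */
	if (len < 5 + 40 || memcmp(commit, "tree ", 5))
		return -1;
	for (i = 0; i < 40; i++) {
		if (!isxdigit((unsigned char)commit[5 + i]))
			return -1;
		out[i] = commit[5 + i];
	}
	out[40] = '\0';
	return 0;
}
```

A caller would pass the resulting id to whatever reads the tree object; any other leading text (a blob, a truncated buffer) is rejected rather than guessed at.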



Re: [PATCH] write-tree performance problems

2005-04-20 Thread Linus Torvalds


On Tue, 19 Apr 2005, Chris Mason wrote:
> 
> I'll finish off the patch once you ok the basics below.  My current code
> works like this:

Chris, before you do anything further, let me re-consider.

Assuming that the real cost of write-tree is the compression (and I think
it is), I really suspect that this ends up being the death-knell to my
"use the sha1 of the _compressed_ object" approach. I thought it was
clever, and I was ready to ignore the other arguments against it, but if
it turns out that we can speed up write-tree a lot by just doing the SHA1
on the uncompressed data, and noticing that we already have the tree
before we need to compress it and write it out, then that may be a good
enough reason for me to just admit that I was wrong about that decision.

So I'll see if I can turn the current fsck into a "convert into
uncompressed format", and do a nice clean format conversion.

Most of git is very format-agnostic, so that shouldn't be that painful. 
Knock wood.

Linus
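
The saving described above comes from ordering: name the object by a hash of its uncompressed bytes, look that name up, and compress only when it is missing. A rough sketch of that ordering follows. Everything in it is an assumption for illustration: a 64-bit FNV-1a hash stands in for SHA1, an in-memory table stands in for the object directory, and `compress_and_write` only counts calls rather than running zlib.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_OBJECTS 64

static uint64_t known_ids[MAX_OBJECTS];	/* ids of objects already stored */
static int nr_known;
static int compress_calls;		/* how often the costly step ran */

/* Stand-in for SHA1: 64-bit FNV-1a over the *uncompressed* bytes. */
static uint64_t object_id(const unsigned char *buf, size_t len)
{
	uint64_t h = 0xcbf29ce484222325ULL;
	size_t i;

	for (i = 0; i < len; i++) {
		h ^= buf[i];
		h *= 0x100000001b3ULL;
	}
	return h;
}

/* Stand-in for zlib compression plus writing the file out. */
static void compress_and_write(const unsigned char *buf, size_t len)
{
	(void)buf;
	(void)len;
	compress_calls++;
}

/*
 * Store an object.  Returns 1 if it was newly written, 0 if it was
 * already present (no compression done), -1 if the table is full.
 */
int store_object(const unsigned char *buf, size_t len)
{
	uint64_t id = object_id(buf, len);
	int i;

	for (i = 0; i < nr_known; i++)
		if (known_ids[i] == id)
			return 0;	/* already have it: skip compression */
	if (nr_known == MAX_OBJECTS)
		return -1;
	compress_and_write(buf, len);
	known_ids[nr_known++] = id;
	return 1;
}

/* Number of times the expensive step actually ran. */
int compress_count(void)
{
	return compress_calls;
}
```

If the name were instead the hash of the compressed bytes, the lookup could not happen until after compression, which is the cost the message is weighing.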


Re: Change pull to _only_ download, and git update=pull+merge?

2005-04-20 Thread Ingo Molnar

* Petr Baudis [EMAIL PROTECTED] wrote:

> > I think pull is pull.  If you are doing lots of local stuff and do not
> > want it overwritten, it should have been in a forked branch.
> 
> I disagree. This already forces you to have two branches (one to pull
> from to get the data, mirroring the remote branch, one for your real
> work) uselessly and needlessly.
> 
> I think there is just no good name for what pull is doing now, and
> update seems like a great name for what pull-and-merge really is. Pull
> really is pull - it _pulls_ the data, while update also updates the
> given tree. No surprises.

yeah. In fact most of the times i did 'git pull pasky' in the past, the 
'merge' phase was unsuccessful, and i had to nuke the tree and recreate 
it.  All i did with the snapshots was to build them, so there were no 
local changes. Waiting a couple of days with doing a 'git pull pasky', 
or installing Linus' tree is a sure way to break the merging.

e.g. to reproduce the last such failure i had today, do:

 cd git-pasky-base
 echo 8568e1a88c086d1b72b0e84ab24fa6888b5861b9 > .git/HEAD
 read-tree $(tree-id $(cat .git/HEAD))
 checkout-cache -a -f
 make
 make install   # make sure to use the older tools
 rm -rf .git/objects
 git pull pasky

and i get:

 [...]
 fatal: unable to execute 'gitmerge-file.sh'
 fatal: merge program failed

Conflicts during merge. Do git commit after resolving them.

note that with earlier versions of pasky, i had other merge conflicts.  
Sometimes there were .rej files, sometimes some sort of script failure.  
So it seems rather unrobust at the moment. Especially if i happen to 
install Linus' tree and try to sync the pasky tree with those tools.

another thing: it's confusing that during 'git pull', the rsync output 
is not visible. Especially during large rsyncs, it would be nice to see 
some progress. So i usually use a raw rsync not 'git pull', due to this.

yet another thing: what is the canonical 'pasky way' of simply nuking 
the current files and checking out the latest tree (according to 
.git/HEAD). Right now i'm using a script to:

  read-tree $(tree-id $(cat .git/HEAD))
  checkout-cache -a

(i first do an 'rm -f *' in the working directory)

i guess there's an existing command for this already?

Ingo


Re: [PATCH] write-tree performance problems

2005-04-20 Thread H. Peter Anvin
Linus Torvalds wrote:
> So I'll see if I can turn the current fsck into a "convert into
> uncompressed format", and do a nice clean format conversion.

Just let me know what you want to do, and I can trivially change the 
conversion scripts I've already written to do what you want.

-hpa


enforcing DB immutability

2005-04-20 Thread Ingo Molnar

* Linus Torvalds [EMAIL PROTECTED] wrote:

> On Wed, 13 Apr 2005, Ingo Molnar wrote:
> > 
> > well, the 'owned by another user' solution is valid though, and doesnt
> > have this particular problem. (We've got a secure multiuser OS, so can
> > as well use it to protect the DB against corruption.)
> 
> So now you need root to set up new repositories? No thanks.

yeah, it's a bit awkward to protect uncompressed repositories - but it 
will need some sort of kernel enforcement. (if userspace finds out the 
DB contains uncompressed blobs, it _will_ try to use them.)

(perhaps having an in-kernel GIT-alike versioned filesystem will help - 
but that brings up the same 'I have to be root' issues. The FS will 
enforce the true immutability of objects.)

perhaps having a new 'immutable hardlink' feature in the Linux VFS would 
help? I.e. a hardlink that can only be readonly followed, and can be 
removed, but cannot be chmod-ed to a writeable hardlink. That i think 
would be a large enough barrier for editors/build-tools not to play the 
tricks they already do that makes 'readonly' files virtually 
meaningless.

Ingo


[PATCH] Unify usage() strings.

2005-04-20 Thread Junio C Hamano
This patch changes identical cut-and-paste usage strings into a
single instance of static string, to make maintenance easier.

Signed-off-by: Junio C Hamano [EMAIL PROTECTED]
---

 commit-tree.c |   14 ++
 diff-cache.c  |6 --
 diff-tree.c   |6 --
 read-tree.c   |   10 ++
 4 files changed, 20 insertions(+), 16 deletions(-)



commit-tree.c: 2eee2fe5b14f1f2d86b8d41b501a879b190bf08f
--- a/commit-tree.c
+++ b/commit-tree.c
@@ -268,15 +268,13 @@ static void check_valid(unsigned char *s
 }
 
 /*
- * Having more than two parents may be strange, but hey, there's
- * no conceptual reason why the file format couldn't accept multi-way
- * merges. It might be the union of several packages, for example.
- *
- * I don't really expect that to happen, but this is here to make
- * it clear that _conceptually_ it's ok..
+ * Having more than two parents is not strange at all, and this is
+ * how multi-way merges are represented.
  */
 #define MAXPARENT (16)
 
+static char *commit_tree_usage = "commit-tree <sha1> [-p <sha1>]* < changelog";
+
 int main(int argc, char **argv)
 {
 	int i, len;
@@ -296,14 +294,14 @@ int main(int argc, char **argv)
 	unsigned int size;
 
 	if (argc < 2 || get_sha1_hex(argv[1], tree_sha1) < 0)
-		usage("commit-tree <sha1> [-p <sha1>]* < changelog");
+		usage(commit_tree_usage);
 
 	check_valid(tree_sha1, "tree");
 	for (i = 2; i < argc; i += 2) {
 		char *a, *b;
 		a = argv[i]; b = argv[i+1];
 		if (!b || strcmp(a, "-p") || get_sha1_hex(b, parent_sha1[parents]))
-			usage("commit-tree <sha1> [-p <sha1>]* < changelog");
+			usage(commit_tree_usage);
 		check_valid(parent_sha1[parents], "commit");
 		parents++;
 	}


diff-cache.c: 48bcec1230365e12b9fb6df65c15540caea24029
--- a/diff-cache.c
+++ b/diff-cache.c
@@ -215,6 +215,8 @@ static int diff_cache(void *tree, unsign
 	return 0;
 }
 
+static char *diff_cache_usage = "diff-cache [-r] [-z] [--cached] <tree sha1>";
+
 int main(int argc, char **argv)
 {
 	unsigned char tree_sha1[20];
@@ -239,11 +241,11 @@ int main(int argc, char **argv)
 			cached_only = 1;
 			continue;
 		}
-		usage("diff-cache [-r] [-z] <tree sha1>");
+		usage(diff_cache_usage);
 	}
 
 	if (argc != 2 || get_sha1_hex(argv[1], tree_sha1))
-		usage("diff-cache [-r] [-z] <tree sha1>");
+		usage(diff_cache_usage);
 
 	tree = tree_from_tree_or_commit(tree_sha1, type, &size);
 	if (!tree)
diff-tree.c: 8720ce75b72cdf9c8d189f9edf41e0920bd72767
--- a/diff-tree.c
+++ b/diff-tree.c
@@ -193,6 +193,8 @@ static void commit_to_tree(unsigned char
 	}
 }
 
+static char *diff_tree_usage = "diff-tree [-r] [-z] <tree sha1> <tree sha1>";
+
 int main(int argc, char **argv)
 {
 	unsigned char old[20], new[20];
@@ -209,11 +211,11 @@ int main(int argc, char **argv)
 			line_termination = '\0';
 			continue;
 		}
-		usage("diff-tree [-r] [-z] <tree sha1> <tree sha1>");
+		usage(diff_tree_usage);
 	}
 
 	if (argc != 3 || get_sha1_hex(argv[1], old) || get_sha1_hex(argv[2], new))
-		usage("diff-tree <tree sha1> <tree sha1>");
+		usage(diff_tree_usage);
 	commit_to_tree(old);
 	commit_to_tree(new);
 	return diff_tree_sha1(old, new, "");


read-tree.c: e438579d63fb090209eaf4c864586afaeb52ae0f
--- a/read-tree.c
+++ b/read-tree.c
@@ -201,6 +201,8 @@ static void merge_stat_info(struct cache
 	}
 }
 
+static char *read_tree_usage = "read-tree (<sha> | -m <sha1> [<sha2> <sha3>])";
+
 int main(int argc, char **argv)
 {
 	int i, newfd, merge;
@@ -220,20 +222,20 @@ int main(int argc, char **argv)
 		if (!strcmp(arg, "-m")) {
 			int i;
 			if (stage)
-				usage("-m needs to come first");
+				die("-m needs to come first");
 			read_cache();
 			for (i = 0; i < active_nr; i++) {
 				if (ce_stage(active_cache[i]))
-					usage("you need to resolve your current index first");
+					die("you need to resolve your current index first");
 			}
 			stage = 1;
 			merge = 1;
 			continue;
 		}
 		if (get_sha1_hex(arg, sha1) < 0)
-			usage("read-tree [-m] <sha1>");
+			usage(read_tree_usage);
 		if (stage > 3)
-			usage("can't merge more than two trees");
+			usage(read_tree_usage);
 		if (read_tree(sha1, "", 0) < 0)
 			die("failed to unpack tree object %s", arg);

Re: enforcing DB immutability

2005-04-20 Thread Ingo Molnar

* Ingo Molnar [EMAIL PROTECTED] wrote:

> perhaps having a new 'immutable hardlink' feature in the Linux VFS
> would help? I.e. a hardlink that can only be readonly followed, and
> can be removed, but cannot be chmod-ed to a writeable hardlink. That i
> think would be a large enough barrier for editors/build-tools not to
> play the tricks they already do that makes 'readonly' files virtually
> meaningless.

immutable hardlinks have the following advantage: a hardlink by design 
hides the information where the link comes from. So even if an editor 
wanted to play stupid games and override the immutability - it doesnt 
know where the DB object is. (sure, it could find it if it wants to, but 
that needs real messing around - editors wont do _that_)

i think this might work.

(the current chattr +i flag isnt quite what we need though because it 
works on the inode, and it's also a root-only feature so it puts us back 
to square one. What would be needed is an immutability flag on 
hardlinks, settable by unprivileged users.)

Ingo
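
The point that today's permission bits live on the inode, not on the link, can be checked with a few POSIX calls. The sketch below is only an illustration under assumptions: the file names and the `/tmp` location are made up, and it shows existing behaviour, not the proposed immutable-hardlink feature. It creates a read-only "object", hard-links it as a "working copy", makes the working copy writable, and then looks at the object again.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Returns 1 if a chmod() through one name is visible through the
 * other hard link, 0 if not, -1 on setup error.
 */
int chmod_leaks_through_hardlink(void)
{
	char dir[] = "/tmp/hardlink-demo-XXXXXX";
	char obj[64], work[64];
	struct stat st;
	int fd, result = -1;

	if (!mkdtemp(dir))
		return -1;
	snprintf(obj, sizeof(obj), "%s/object", dir);
	snprintf(work, sizeof(work), "%s/working-copy", dir);

	/* the read-only "DB object" */
	fd = open(obj, O_CREAT | O_WRONLY | O_EXCL, 0444);
	if (fd < 0)
		goto out;
	close(fd);
	if (link(obj, work))		/* second name, same inode */
		goto out;
	if (chmod(work, 0644))		/* what an editor might do */
		goto out;
	if (stat(obj, &st))
		goto out;
	result = (st.st_mode & S_IWUSR) ? 1 : 0;
out:
	unlink(work);
	unlink(obj);
	rmdir(dir);
	return result;
}
```

On an ordinary Linux filesystem this is expected to return 1: making one name writable makes the shared object writable too, which is why a per-link flag is being asked for.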


wit - demo site

2005-04-20 Thread Christian Meder
Hi,

thanks to my friend Frank Sattelberger I got access to a site where I
could set up a demo for wit:

http://grmso.net:8090

Couple of notes wrt why I work on another git web interface compared
with Kay's work:

* I was already experimenting and implementing for a couple of days when
Kay's tool was first announced and I didn't want to throw away my
feature set

* the Web API: wit has a different philosophy when it comes to URIs: The
stable URI mapping should translate in a straightforward fashion to
git: /blob/sha1 /tree/sha1, /tree/sha/diff/sha1, etc.; no URL
parameters

* wit is more of a git view right now: it only uses git and tries to
stay close to the repository browsing paradigm (see the API issue above)

* wit provides tarballs and patches but that's an easy one for Kay

* wit looks uglier but that will hopefully change soon ;-)

* I'm not a Perl guy

I'm still seeking feedback ;-)

Greetings,


Christian

-- 
Christian Meder, email: [EMAIL PROTECTED]

The Way-Seeking Mind of a tenzo is actualized 
by rolling up your sleeves.

(Eihei Dogen Zenji)



[PATCH] Give better default modes to merge results.

2005-04-20 Thread Junio C Hamano
As shipped, the example git-merge-one-file-script often leaves
the merge result with not-so-useful mode bits, especially with
glibc 2.0.7 or later, whose mkstemp() creates the temporary file
with mode 0600.  This contradicts the way checkout-cache creates new
files, which is to use 0666 (or 0777 for files with the executable
bit on) and let the umask mechanism take care of adjusting it
to the user's preference.

This patch fixes this problem by (1) passing the executable bits
for 3 stages from merge-cache to the merge script, and by (2)
adjusting the example script to make use of that information.

For backward compatibility with existing merge-one-file-script
people may already have developed, the additional 3 arguments
are passed after the filename (i.e. as $5, $6 and $7).  This
does not logically look so nice, but the older scripts can and
would just ignore these new parameters.

The patch also fixes some shell quoting problems the original
sample script had with the resulting filename $4.  Unlike all
the other arguments, this must be quoted to prevent it from
being split via shell's $IFS mechanism.

Signed-off-by: Junio C Hamano [EMAIL PROTECTED]
---

 git-merge-one-file-script |   35 +++
 merge-cache.c |   18 ++
 2 files changed, 41 insertions(+), 12 deletions(-)

--- a/git-merge-one-file-script
+++ b/git-merge-one-file-script
@@ -6,7 +6,9 @@
 #   $2 - file in branch1 SHA1 (or empty)
 #   $3 - file in branch2 SHA1 (or empty)
 #   $4 - pathname in repository
-#
+#   $5 - original file executable bit ('x' or '-' or empty)
+#   $6 - file in branch1  executable bit ('x' or '-' or empty)
+#   $7 - file in branch2  executable bit ('x' or '-' or empty)
 #
 # Handle some trivial cases.. The _really_ trivial cases have
 # been handled already by read-tree, but that one doesn't
@@ -24,17 +26,29 @@ case "${1:-.}${2:-.}${3:-.}" in
 #
 "$1.." | "$1.$1" | "$1$1.")
 	rm -f -- "$4"
-	update-cache --remove -- "$4"
-	exit 0
+	exec update-cache --remove -- "$4"
 	;;
 
 #
 # added in one, or added identically in both
 #
 ".$2." | "..$3" | ".$2$2")
-	mv $(unpack-file ${2:-$3}) $4
-	update-cache --add -- $4 ;# needs filemode fix.
-	exit 0
+
+	# This part is convoluted but necessary to get a sane
+	# default mode bits.  We let the shell to honor default
+	# umask when creating the file, and then rely on chmod +x
+	# to again honor umask.  It used to mv the file created
+	# in mode 0600 by unpack-file to $4, which was almost
+	# always wrong.
+
+	tmp=$(unpack-file "${2:-$3}") &&
+	rm -f "$4" &&
+	cat "$tmp" >"$4" &&
+	case "$6$7" in
+	*x*)	chmod +x "$4" ;;
+	esac &&
+	rm -f "$tmp" || exit
+	exec update-cache --add -- "$4"
 	;;
 
 #
@@ -50,11 +64,16 @@ case "${1:-.}${2:-.}${3:-.}" in
 		echo Leaving conflict merge in $src2
 		exit 1
 	fi
-	cp "$src2" "$4" && update-cache --add -- "$4" && exit 0
+	rm -f "$4" &&
+	cat "$src2" >"$4" &&
+	case "$5$6$7" in
+	*x*)	chmod +x "$4" ;;
+	esac || exit
+	exec update-cache --add -- "$4"
 	;;
 
 *)
-	echo "Not handling case $1 -> $2 -> $3"
+	echo "Not handling case $1($5) -> $2($6) -> $3($7)"
 	;;
 esac
 exit 1
--- a/merge-cache.c
+++ b/merge-cache.c
@@ -4,7 +4,7 @@
 #include "cache.h"
 
 static const char *pgm = NULL;
-static const char *arguments[5];
+static const char *arguments[8];
 
 static void run_program(void)
 {
@@ -18,6 +18,9 @@ static void run_program(void)
 			arguments[2],
 			arguments[3],
 			arguments[4],
+			arguments[5],
+			arguments[6],
+			arguments[7],
 			NULL);
 		die("unable to execute '%s'", pgm);
 	}
@@ -36,17 +39,24 @@ static int merge_entry(int pos, const ch
 	arguments[2] = "";
 	arguments[3] = "";
 	arguments[4] = path;
+	arguments[5] = "";
+	arguments[6] = "";
+	arguments[7] = "";
 	found = 0;
 	do {
-		static char hexbuf[4][60];
+		static char hexbuf[3][41];
+		static char xbit[3][2];
 		struct cache_entry *ce = active_cache[pos];
 		int stage = ce_stage(ce);
 
 		if (strcmp(ce->name, path))
 			break;
 		found++;
-		strcpy(hexbuf[stage], sha1_to_hex(ce->sha1));
-		arguments[stage] = hexbuf[stage];
+		strcpy(hexbuf[stage-1], sha1_to_hex(ce->sha1));
+		arguments[stage] = hexbuf[stage-1];
+		xbit[stage-1][0] = (ntohl(ce->ce_mode) & 0100) ? 'x' : '-';
+		xbit[stage-1][1] = 0;
+		arguments[stage+4] = xbit[stage-1];
 	} while (++pos < active_nr);
 	if (!found)
 		die("merge-cache: %s not in the cache", path);
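
The mode arithmetic behind this patch is small enough to spell out. The sketch below is an illustration, not code from the patch: `xbit_char` mirrors the `'x'`/`'-'` test that merge-cache applies to the owner-execute bit, and `mode_after_umask` (a hypothetical name) computes what a file created with 0666 or 0777 ends up with once a given umask is applied.

```c
#include <sys/stat.h>
#include <sys/types.h>

/* 'x' if the owner-execute bit is set, '-' otherwise. */
char xbit_char(mode_t mode)
{
	return (mode & 0100) ? 'x' : '-';
}

/*
 * Mode a newly created file ends up with when the creator asks for
 * 0666 (or 0777 for executables) and the given umask is applied.
 */
mode_t mode_after_umask(int executable, mode_t mask)
{
	mode_t wanted = executable ? 0777 : 0666;

	return wanted & ~mask;
}
```

With the common umask 022 this gives 0644 for plain files and 0755 for executables, whereas a file renamed into place from mkstemp() keeps its 0600 no matter what the umask is.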




Re: wit 0.0.3 - a web interface for git available

2005-04-20 Thread Christian Meder
On Wed, 2005-04-20 at 10:42 +0100, Christoph Hellwig wrote:
> On Tue, Apr 19, 2005 at 09:18:29PM -0700, Greg KH wrote:
> > On Wed, Apr 20, 2005 at 02:29:11AM +0200, Christian Meder wrote:
> > > Hi,
> > > 
> > > ok it's starting to look like spam ;-)
> > > 
> > > I uploaded a new version of wit to http://www.absolutegiganten.org/wit
> > 
> > Why not work together with Kay's tool:
> > http://ehlo.org/~kay/gitweb.pl?project=linux-2.6action=show_log
> 
> That one looks really nice.  One major feature I'd love to see would
> be a "show all diffs" link for a changeset.

Hi, 

wit only has "show all diffs" right now but I like the "show file diffs"
of Kay's tool. I'll implement it tonight ;-)


Christian


-- 
Christian Meder, email: [EMAIL PROTECTED]

The Way-Seeking Mind of a tenzo is actualized 
by rolling up your sleeves.

(Eihei Dogen Zenji)



Re: WARNING! Object DB conversion (was Re: [PATCH] write-tree performance problems)

2005-04-20 Thread Ingo Molnar

* Linus Torvalds [EMAIL PROTECTED] wrote:

> So to convert your old git setup to a new git setup, do the following:
> [...]

did this for two repositories (git and kernel-git), it works as 
advertised.

Ingo


[ANNOUNCEMENT] /Arch/ embraces `git'

2005-04-20 Thread Tom Lord

`git', by Linus Torvalds, contains some very good ideas and some
very entertaining source code -- recommended reading for hackers.

/GNU Arch/ will adopt `git':

From the /Arch/ perspective: `git' technology will form the
basis of a new archive/revlib/cache format and the basis
of new network transports.

From the `git' perspective, /Arch/ will replace the lame directory
cache component of `git' with a proper revision control system.

In my view, the core ideas in `git' are quite profound and deserve
an impeccable implementation.   This is practical because those ideas
are also pretty simple.

I started here:

   http://www.seyza.com/=clients/linus/tree/index.html

and for those interested in `git'-theory, a good place to start is

   http://www.seyza.com/=clients/linus/tree/src/liblob/index.html

(Linus is not literally a client of mine.  That's just the directory 
where this goes.)

-t


Re: [GNU-arch-dev] [ANNOUNCEMENT] /Arch/ embraces `git'

2005-04-20 Thread Miles Bader
Way to go.

-Miles
-- 
Do not taunt Happy Fun Ball.


Re: [darcs-devel] Darcs and git: plan of action

2005-04-20 Thread David Roundy
On Tue, Apr 19, 2005 at 09:49:12AM -0700, Linus Torvalds wrote:
> On Tue, 19 Apr 2005, Tupshin Harper wrote:
> > I suspect that any use of wildcards in a new format would be impossible
> > for darcs since it wouldn't allow darcs to construct dependencies,
> > though I'll leave it to david to respond to that.
> 
> Note that git _does_ very efficiently (and I mean _very_) expose the
> changed files.
> 
> So if this kind of darcs patch is always the same pattern just repeated
> over n files, then you really don't need to even list the files at all.
> Git gives you a very efficient file listing by just doing a diff-tree
> (which does not diff the _contents_ - it really just gives you a pretty
> much zero-cost "which files changed" listing).

The catch is that it's possible to have a darcs patch that doesn't change
any files, or that affects files without changing them.  If I rename
function foo to bar, I might want to do

darcs replace foo bar *.c

which would issue a replace on all files, which means that when this patch
is merged with any patches that add occurrences of foo in a file, that will
get modified to a bar, regardless of whether there was previously an
occurrence of foo in that file.

I think we might (when working with git--it would be problematic within
darcs straight) be able to work out some sort of a wildcard replace
scheme, so it could be something like

replace foo bar in: mm/*.c

The regexp bit could be left out, if we restrict the definition of tokens
in token replaces--which probably isn't a troublesome limitation.  By
default darcs uses two tokenizing schemes, one which allows . in tokens
(usually relevant in Makefiles), and one which doesn't, and basically
matches C identifiers.  We could allow for both of these if we had a second
option:

replace filename foo.h bar.h in: mm/*.c

We'd just need to expand the wildcards when translating from the git
repository into darcs patches.
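
To make the token-replace idea concrete, here is a small sketch of a whole-token replace using the second tokenizer described above, the one that matches C identifiers and does not allow `.` in tokens. It is an illustration only: `replace_token` and `is_token_char` are hypothetical names, and this is not how darcs itself implements the operation.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Token characters for the C-identifier tokenizer: [A-Za-z0-9_]. */
static int is_token_char(int c)
{
	return isalnum(c) || c == '_';
}

/*
 * Copy src to dst, replacing every whole token equal to `from` with
 * `to`.  Returns the number of replacements, or -1 if dst (dstlen
 * bytes) is too small.
 */
int replace_token(const char *src, const char *from, const char *to,
		  char *dst, size_t dstlen)
{
	size_t flen = strlen(from), tlen = strlen(to), out = 0;
	int count = 0;

	while (*src) {
		size_t n = 0;

		if (!is_token_char((unsigned char)*src)) {
			/* separator: copy one byte through */
			if (out + 1 >= dstlen)
				return -1;
			dst[out++] = *src++;
			continue;
		}
		/* measure the whole token starting here */
		while (is_token_char((unsigned char)src[n]))
			n++;
		if (n == flen && !memcmp(src, from, flen)) {
			if (out + tlen >= dstlen)
				return -1;
			memcpy(dst + out, to, tlen);
			out += tlen;
			count++;
		} else {
			if (out + n >= dstlen)
				return -1;
			memcpy(dst + out, src, n);
			out += n;
		}
		src += n;
	}
	if (out >= dstlen)
		return -1;
	dst[out] = '\0';
	return count;
}
```

Because matching is by whole token, `foobar` is left alone while `foo` is rewritten, and the same rule can be applied later to lines that other patches add, which is the merge behaviour described for `darcs replace`.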

> So that combination would be 100% reliable _if_ you always split up darcs
> patches to common elements.
> 
> And note that there does not have to be a 1:1 relationship between a git
> commit and a darcs patch. For example, say that you have a darcs patch
> that does a combination of "change token x to token y in 100 files" and
> "rename file a into b". I don't know if you do those kind of combination
> patches at all, but if you do, why not just split them up into two? That
> way the list of files changed _does_ 100% determine the list of files for
> the token exchange.

We do allow multiple sorts of changes (in darcs terminology, multiple
primitive patches) in a single patch.

One *could* have multiple git commits for a single darcs patch, but that
seems ugly and I'd rather avoid it.  In my view, a revision control system is
more about communication than history (which is why by default, darcs
doesn't do history), and grouping changes together is how we express
which changes go together.  Of course, we could still have a grouping at
a higher level, so that a single changeset could consist of multiple git
commits (for example by recognizing that identical commit logs mean that
it's a single change), but that adds a layer of complexity that I'd like to
avoid if possible.
-- 
David Roundy
http://www.darcs.net


Re: [darcs-devel] Darcs and git: plan of action

2005-04-20 Thread David Roundy
On Tue, Apr 19, 2005 at 02:25:18PM +0200, Petr Baudis wrote:
> Dear diary, on Tue, Apr 19, 2005 at 02:20:55PM CEST, I got a letter
> where Juliusz Chroboczek [EMAIL PROTECTED] told me that...
> > > The problem is that there is no sequence of alien versions that one
> > > can differentiate.  Git has a branched history, with each version
> > > that follows a merge having multiple parents.
> > 
> > Yep.  I've just realised that this morning.  Is there some notion of
> > ``primary parent'' as in Arch?  Can a changeset have 0 parents?
> 
> Yes, the root commit. Usually, there is only one, but there may be
> multiple of them theoretically.

Incidentally (and completely off-topic for this thread), wouldn't there be
a sha1 tree hash corresponding to a completely empty directory, and
couldn't one use that as the parent for the root? Would there be any reason
to do so? Just a silly thought...
-- 
David Roundy
http://www.darcs.net


Re: [darcs-devel] Darcs and git: plan of action

2005-04-20 Thread David Roundy
On Tue, Apr 19, 2005 at 02:20:55PM +0200, Juliusz Chroboczek wrote:
> [Removing Linus from CC, keeping the Git list -- or should we remove it?]

I think leaving much of this on git would be appropriate, since there are
issues of how to relate to git that should be relevant.

> > If we do it right (automatically tagging like crazy people), darcs
> > users between themselves can cherry-pick all they like, without
> > introducing inconsistencies or losing interoperability with git.
> 
> You've lost me here.  How can you cherry-pick if every tag depends on
> the preceding patches?  Or are you thinking of pulling just the patch
> and not the tag -- in that case, what happens when you push to git a
> Darcs patch that depends on a patch that originated with git?

Yes, I'm thinking of pulling patches from one darcs repo to another.  If we
cherry-pick in this way, we need to create a git-tag for each patch that
we pull without its associated tag.  To git, this would look like two
separate changes that have the same commit log, except that they have
different parents and different committers and commit dates.

I don't think this will be a problem for git, and since darcs will
recognize the two patches as the identical darcs patch (we'll need to put
somewhere in the git commit log a magic word indicating that this patch
originated in darcs), there won't be a problem for darcs either.

In case I haven't been clear (which seems likely), the scenario is that
darcs user 1 makes the following changes to his darcs version of a
git-based repository:

changes in 1: A -> B
tags in 1:    A1   B1

Darcs user 2 wants B, but not A, and didn't do any development:

changes in 2: B
tags in 2:    B2

User 2 pushes to git, and now git has (where P is the parent of both of the
above):

git:
P -> B/B2  (where B/B2 is the commit log with B2 as committer info and B
as the author info and long comment)

User 1 pushes (everything) to git and merges the two (patch M, which has
two parents, B1 and B2):

git:

   ------- B/B2 -------
  /                    \
P --> A/A1 --> B/B1 --> M

It's a little lame, and if user 2 doesn't do any real work, the git-using
person might be annoyed, but I think it's doable.
-- 
David Roundy
http://www.darcs.net


Re: WARNING! Object DB conversion (was Re: [PATCH] write-tree performance problems)

2005-04-20 Thread Jon Seymour
On 4/20/05, Linus Torvalds [EMAIL PROTECTED] wrote:
 
 
 I converted my git archives (kernel and git itself) to do the SHA1 hash
 _before_ the compression phase.
 

Linus,
 
 Am I correct to understand that with this change, all the objects in
the database are still being compressed (so no net performance benefit
now), but by doing the SHA1 calculations before compression you are
keeping open the possibility that at some point in the future you may
use a different compression technique (including none at all) for some
or all of the objects?

jon.

[ reposted to list, because list post was bounced because of rich text
formatting ]


[git] simplify Makefile

2005-04-20 Thread Andre Noll
Use a generic rule for executables that depend only on the corresponding
.o and on $(LIB_FILE).

Signed-Off-By: Andre Noll [EMAIL PROTECTED]
---

Makefile |   49 ++---
 1 files changed, 2 insertions(+), 47 deletions(-)

Makefile: cd299f850679b2456e360d3aa6a2d529855ba7a5
--- a/Makefile
+++ b/Makefile
@@ -34,62 +34,17 @@ LIBS= $(LIB_FILE) -lssl -lz
 
 init-db: init-db.o
 
-update-cache: update-cache.o $(LIB_FILE)
-   $(CC) $(CFLAGS) -o update-cache update-cache.o $(LIBS)
-
-show-diff: show-diff.o $(LIB_FILE)
-   $(CC) $(CFLAGS) -o show-diff show-diff.o $(LIBS)
-
-write-tree: write-tree.o $(LIB_FILE)
-   $(CC) $(CFLAGS) -o write-tree write-tree.o $(LIBS)
-
-read-tree: read-tree.o $(LIB_FILE)
-   $(CC) $(CFLAGS) -o read-tree read-tree.o $(LIBS)
-
-commit-tree: commit-tree.o $(LIB_FILE)
-   $(CC) $(CFLAGS) -o commit-tree commit-tree.o $(LIBS)
-
-cat-file: cat-file.o $(LIB_FILE)
-   $(CC) $(CFLAGS) -o cat-file cat-file.o $(LIBS)
-
 fsck-cache: fsck-cache.o $(LIB_FILE) object.o commit.o tree.o blob.o
$(CC) $(CFLAGS) -o fsck-cache fsck-cache.o $(LIBS)
 
-checkout-cache: checkout-cache.o $(LIB_FILE)
-   $(CC) $(CFLAGS) -o checkout-cache checkout-cache.o $(LIBS)
-
-diff-tree: diff-tree.o $(LIB_FILE)
-   $(CC) $(CFLAGS) -o diff-tree diff-tree.o $(LIBS)
-
 rev-tree: rev-tree.o $(LIB_FILE) object.o commit.o tree.o blob.o
$(CC) $(CFLAGS) -o rev-tree rev-tree.o $(LIBS)
 
-show-files: show-files.o $(LIB_FILE)
-   $(CC) $(CFLAGS) -o show-files show-files.o $(LIBS)
-
-check-files: check-files.o $(LIB_FILE)
-   $(CC) $(CFLAGS) -o check-files check-files.o $(LIBS)
-
-ls-tree: ls-tree.o $(LIB_FILE)
-   $(CC) $(CFLAGS) -o ls-tree ls-tree.o $(LIBS)
-
 merge-base: merge-base.o $(LIB_FILE) object.o commit.o tree.o blob.o
$(CC) $(CFLAGS) -o merge-base merge-base.o $(LIBS)
 
-merge-cache: merge-cache.o $(LIB_FILE)
-   $(CC) $(CFLAGS) -o merge-cache merge-cache.o $(LIBS)
-
-unpack-file: unpack-file.o $(LIB_FILE)
-   $(CC) $(CFLAGS) -o unpack-file unpack-file.o $(LIBS)
-
-git-export: git-export.o $(LIB_FILE)
-   $(CC) $(CFLAGS) -o git-export git-export.o $(LIBS)
-
-diff-cache: diff-cache.o $(LIB_FILE)
-   $(CC) $(CFLAGS) -o diff-cache diff-cache.o $(LIBS)
-
-convert-cache: convert-cache.o $(LIB_FILE)
-   $(CC) $(CFLAGS) -o convert-cache convert-cache.o $(LIBS)
+%: %.o $(LIB_FILE)
+   $(CC) $(CFLAGS) -o $@ $< $(LIBS)
 
 blob.o: $(LIB_H)
 cat-file.o: $(LIB_H)


Re: WARNING! Object DB conversion (was Re: [PATCH] write-tree performance problems)

2005-04-20 Thread Martin Uecker
On Wed, Apr 20, 2005 at 10:11:10PM +1000, Jon Seymour wrote:
 On 4/20/05, Linus Torvalds [EMAIL PROTECTED] wrote:
  
  
  I converted my git archives (kernel and git itself) to do the SHA1 hash
  _before_ the compression phase.
  
 
 Linus,
  
  Am I correct to understand that with this change, all the objects in
 the database are still being compressed (so no net performance benefit
 now), but by doing the SHA1 calculations before compression you are
 keeping open the possibility that at some point in the future you may
 use a different compression technique (including none at all) for some
 or all of the objects?

The main point is not about trying different compression
techniques but that you don't need to compress at all just
to calculate the hash of some data. (to know if it is
unchanged for example)

There are still some other design decisions I am worried
about:

The storage method of the database of a collection of
files in the underlying file system. Because of the
random nature of the hashes this leads to a horrible
amount of seeking for all operations which walk the
logical structure of some tree stored in the database.

Why not store all objects linearized in one or more
flat file?


The other thing I don't like is the use of a sha1
for a complete file. Switching to some kind of hash
tree would allow to introduce chunks later. This has
two advantages:

It would allow git to scale to repositories of large
binary files. And it would allow to build a very cool
content transport algorithm for those repositories.
This algorithm could combine all the advantages of
bittorrent and rsync (without the cpu load).


And it would allow trivial merging of patches which
apply to different chunks of a file in exact the same
way as merging changesets which apply to different
files in a tree.


Martin

-- 
One night, when little Giana from Milano was fast asleep,
she had a strange dream.





Blob chunking code. [First look.]

2005-04-20 Thread C. Scott Ananian
So I wrote up my ideas regarding blob chunking as code; see attached.
This is against git-0.4 (I know, ancient, but I had to start somewhere.)
The idea here is that blobs are chunked using a rolling checksum (so the 
chunk boundaries are content-dependent and stay fixed even if you mutate 
pieces of the file).  The chunks are then tree-structured as 'treaps', 
which will ensure that chunk trees can be profitably reused.  (If you 
create a flat 'chunk index' instead of tree-structuring it, then you need 
to write two files even if you make a small change to a small file.  If 
you use a full binary tree, then insertions at the beginning (say) still 
change the entire tree structure.  The treap ensures that on avg O(ln N) 
chunks need to be written per change, where N is the number of chunks in 
the file).  More details are in the code.
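
The content-dependent boundaries described above can be illustrated with a
minimal sketch. This is not the attached chunk.c: the window size, the mask
and the plain byte sum (a stand-in for a real rolling checksum) are all
assumptions chosen for brevity, and find_cuts() is a hypothetical name. The
point is only that a cut depends on the last WINDOW bytes and nothing else,
so inserting a byte at the front of a file shifts the later cuts without
otherwise changing them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define WINDOW 16    /* bytes covered by the rolling sum (assumed value) */
#define MASK   0x3f  /* cut when (sum & MASK) == MASK, about 1 position in 64 */

/*
 * Record the offsets (exclusive) at which a chunk ends.  The rolling sum
 * always covers the last WINDOW bytes.  Returns the number of cuts stored.
 */
static size_t find_cuts(const unsigned char *buf, size_t size,
                        size_t *cuts, size_t max_cuts)
{
    unsigned sum = 0;
    size_t i, n = 0;

    for (i = 0; i < size; i++) {
        sum += buf[i];
        if (i >= WINDOW)
            sum -= buf[i - WINDOW];      /* slide the window forward */
        /* Only a full window may end a chunk. */
        if (i + 1 >= WINDOW && (sum & MASK) == MASK && n < max_cuts)
            cuts[n++] = i + 1;
    }
    return n;
}

/*
 * Returns 1 if every cut found in a buffer reappears, shifted by one,
 * after a single byte has been inserted at the front of that buffer.
 */
static int cuts_survive_insertion(void)
{
    static unsigned char a[2048], b[2049];
    static size_t ca[2048], cb[2049];
    size_t na, nb, i, j = 0;
    unsigned x = 12345;

    for (i = 0; i < sizeof(a); i++) {
        x = x * 1103515245u + 12345u;    /* small LCG for test data */
        a[i] = (unsigned char)(x >> 16);
    }
    b[0] = '!';
    memcpy(b + 1, a, sizeof(a));

    na = find_cuts(a, sizeof(a), ca, 2048);
    nb = find_cuts(b, sizeof(b), cb, 2049);
    for (i = 0; i < na; i++) {
        while (j < nb && cb[j] < ca[i] + 1)
            j++;
        if (j == nb || cb[j] != ca[i] + 1)
            return 0;
    }
    return 1;
}
```

With fixed-size blocks the same one-byte insertion would move every block
boundary relative to the content, and no chunk after the insertion point
could be reused.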

Compatibility with existing archives in git-0.4 was tricky, because of 
git's 'compress-before-hash' thingy.  Moving to 'hash before compress' is 
*much* better, although because the file size is included in the hash, I will 
need to perform (the equivalent of) O(ln N) hashes of the complete file.
If the file size weren't included, or if it were put at the end, then 2 
hashes would suffice. (Basically, we can save work hashing subranges which 
are prefix-identical, but including the length means that no subtrees are
prefix-identical.)

I'll work on bringing this forward to the latest git, but I thought I'd 
post it here for early reviews and comments.  My informal testing shows 
that 1) my chunk size is currently too small, and 2) subrange sharing 
works well even on relatively small files.  I'll be working on getting 
concrete numbers for larger archives.
 --scott

DNC NRA Kojarena ESCOBILLA QKENCHANT STANDEL shotgun ESGAIN KGB Mossad 
overthrow ASW cracking HOPEFUL KUBARK counter-intelligence Yakima
 ( http://cscott.net/ )
--- begin chunk.c --
#include <stdlib.h>

/* we could be clever and do this even if we don't fit in memory...
 * ... but we're going to be quick and dirty. */
/* C source has approx 5 bits per character of entropy.
 * We'd like to get 32 bits of good entropy; that means 7 bytes is a
 * reasonable minimum for the window size. */
#define ROLLING_WINDOW 30
#define CHUNK_SIZE 1023 /* desired block size */
#include <assert.h>
#include "cache.h"
/*
 * This file implements a treap-based chunked content store.  The
 * idea is that every stored file is broken down into tree-structured
 * chunks (that is, every chunk has an optional 'prefix' and 'suffix'
 * chunk), and these chunks are put in the object store.  This way
 * similar files will be expected to share chunks, saving space.
 * Files less than one disk block long are expected to fit in a single
 * chunk, so there is no extra indirection overhead for this case.
 */
/* First, some data structures: */
struct chunk {
    /* a chunk represents some range of the underlying file */
    size_t start /* inclusive */, end /*exclusive*/;
    unsigned char sha1[20]; /* sha1 for this chunk; used as the heap key */
};
struct chunklist {
    /* a dynamically-sized list of chunks */
    struct chunk *chunk; /* an array of chunks */
    size_t num_items; /* how many items are currently in the list */
    size_t allocd;/* how many items we've allocated space for */
};
struct treap {
    /* A treap node represents a run of consecutive chunks. */
    struct chunk *chunk; /* some chunk in the run. */
    /* treaps representing the run before 'chunk' (left) and
     * after 'chunk' (right).  */
    struct treap *left, *right;
    /* sha1 for the run represented by this treap */
    unsigned char sha1[20];
};
static struct chunklist *
create_chunklist(int expected_items) {
    struct chunklist *cl = malloc(sizeof(*cl));
    cl->num_items = 0;
    cl->allocd = expected_items;
    cl->chunk = malloc(sizeof(cl->chunk[0]) * cl->allocd);
    return cl;
}
static void
free_chunklist(struct chunklist *cl) {
    free(cl->chunk);
    free(cl);
}
/* Add a chunk to the chunk list, calculating its SHA1 in the process. */
/* The chunk includes buf[start] to buf[end-1].*/
static void
add_chunk(struct chunklist *cl, char *buf, size_t start, size_t end) {
    struct chunk *ch;
    SHA_CTX c;
    assert(start<end); assert(cl); assert(buf);
    if (cl->num_items >= cl->allocd) {
        cl->allocd = cl->allocd*3/2;
        cl->chunk = realloc(cl->chunk, cl->allocd * sizeof(*(cl->chunk)));
    }
    assert(cl->num_items < cl->allocd);
    ch = cl->chunk + (cl->num_items++);
    ch->start = start;
    ch->end = end;
    // compute SHA-1
    SHA1_Init(&c);
    SHA1_Update(&c, buf+start, end-start);
    SHA1_Final(ch->sha1, &c);
    // done!
}
/* Split a buffer into chunks, using a rolling checksum over ROLLING_WINDOW
 * bytes to determine chunk boundaries.  We try to split chunks into pieces
 * whose size averages out to be 'CHUNK_SIZE'. */
static void
chunkify(struct chunklist *cl, char *buf, size_t size) {
int i, 

Re: WARNING! Object DB conversion (was Re: [PATCH] write-tree performance problems)

2005-04-20 Thread Morten Welinder
On 4/20/05, Martin Uecker [EMAIL PROTECTED] wrote:

 The storage method of the database of a collection of
 files in the underlying file system. Because of the
 random nature of the hashes this leads to a horrible
 amount of seeking for all operations which walk the
 logical structure of some tree stored in the database.
 
 Why not store all objects linearized in one or more
 flat file?

I've been thinking along the same lines and it doesn't look too hard
to factor out the
back end, i.e., provide methods to
read/write/stat/remove/mmap/whatever objects.
(Note the mmap there.  Apart from that, the backend could be an http connection
or worse.)
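
A factored-out back end of the kind described here could be sketched as a
small table of function pointers. Everything below is hypothetical: the
struct, the member names and the in-memory store are illustrations, not
code from git, and only write/read/exists are shown (stat, remove and mmap
would follow the same pattern).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical back-end interface; the names are illustrative only. */
struct object_backend {
    int (*write)(void *ctx, const char *id, const void *buf, size_t len);
    const void *(*read)(void *ctx, const char *id, size_t *len);
    int (*exists)(void *ctx, const char *id);
    void *ctx;                          /* back-end private state */
};

/* A trivial in-memory back end with a fixed-size table. */
#define MAX_OBJS 16
struct mem_store {
    struct { char id[41]; void *buf; size_t len; } obj[MAX_OBJS];
    int nr;
};

static int mem_find(struct mem_store *s, const char *id)
{
    int i;
    for (i = 0; i < s->nr; i++)
        if (!strcmp(s->obj[i].id, id))
            return i;
    return -1;
}

static int mem_exists(void *ctx, const char *id)
{
    return mem_find(ctx, id) >= 0;
}

static int mem_write(void *ctx, const char *id, const void *buf, size_t len)
{
    struct mem_store *s = ctx;
    void *copy;

    if (mem_find(s, id) >= 0)
        return 0;                       /* content-addressed: already there */
    if (s->nr == MAX_OBJS || strlen(id) > 40)
        return -1;
    copy = malloc(len ? len : 1);
    if (!copy)
        return -1;
    memcpy(copy, buf, len);
    strcpy(s->obj[s->nr].id, id);
    s->obj[s->nr].buf = copy;
    s->obj[s->nr].len = len;
    s->nr++;
    return 0;
}

static const void *mem_read(void *ctx, const char *id, size_t *len)
{
    struct mem_store *s = ctx;
    int i = mem_find(s, id);

    if (i < 0)
        return NULL;
    *len = s->obj[i].len;
    return s->obj[i].buf;
}

/* Bind the in-memory functions to a store. */
static struct object_backend mem_backend(struct mem_store *s)
{
    struct object_backend be = { mem_write, mem_read, mem_exists, NULL };
    be.ctx = s;
    return be;
}
```

A flat-file or network back end would supply its own three functions and
context pointer; callers would only ever see struct object_backend.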

It will, however, seriously break rsync as transport for people who
commit to their trees.
Thus you need an alternative in place before you can present it as an
alternative.

Morten


Re: WARNING! Object DB conversion (was Re: [PATCH] write-tree performance problems)

2005-04-20 Thread Jon Seymour
 The main point is not about trying different compression
 techniques but that you don't need to compress at all just
 to calculate the hash of some data. (to know if it is
 unchanged for example)
 

Ah, ok, I didn't understand that there were extra compresses being
performed for that reason. Thanks for the explanation.

jon.


Re: WARNING! Object DB conversion (was Re: [PATCH] write-tree performance problems)

2005-04-20 Thread David Woodhouse
On Wed, 2005-04-20 at 02:08 -0700, Linus Torvalds wrote:
 I converted my git archives (kernel and git itself) to do the SHA1
 hash _before_ the compression phase.

I'm happy to see that -- because I'm going to be asking you to make
another change which will also require a simple repository conversion. 

We are working on getting the complete history since 2.4.0 into git
form. When it's done and checked (which should be RSN) I'd like you to
edit the first commit object in your tree -- the import of 2.6.12-rc2,
and give it a parent. That parent will be the sha1 hash of the
2.6.12-rc2 commit in the newly-provided history, and of course will
change the sha1 hash of your first commit, and all subsequent commits. 
We'll provide a tool to do that, of course.

The history itself will be absent from your tree. Obviously we'll need
to make sure that the tools can cope with an absentee parent, probably
by just treating that case as if no parent exists. That won't be hard,
it'll be useful for people to prune their trees of unwanted older
history in the general case too. That history won't be lost or undone --
it'll just be archived elsewhere.
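
The "absentee parent" handling could look roughly like the sketch below. It
is a toy model under stated assumptions: commits are identified by small
integers instead of sha1 hashes, each commit has at most one parent, and
lookup() and history_length() are hypothetical names. A parent that is
named but not present in the local store simply ends the walk, exactly as
a root commit would.

```c
#include <assert.h>
#include <stddef.h>

struct commit {
    int id;
    int parent;     /* 0 means "no parent recorded" (a true root) */
};

/* Find a commit in the local store; NULL if it is not present. */
static const struct commit *lookup(const struct commit *store, size_t n, int id)
{
    size_t i;
    for (i = 0; i < n; i++)
        if (store[i].id == id)
            return &store[i];
    return NULL;
}

/*
 * Count the commits reachable from 'tip'.  A parent that is named but
 * missing locally (archived elsewhere) is treated like "no parent".
 */
static int history_length(const struct commit *store, size_t n, int tip)
{
    const struct commit *c = lookup(store, n, tip);
    int count = 0;

    while (c) {
        count++;
        if (!c->parent)
            break;
        c = lookup(store, n, c->parent);   /* NULL: parent is absent */
    }
    return count;
}
```

With a store holding commits 30 -> 20 -> 10, where 10 names an absent
parent 99, history_length() reports 3 rather than failing.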

The reason for doing this is that without it, we can't ever have a full
history actually connected to the current trees. There'd always be a
break at 2.6.12-rc2, at which point you'd have to switch to an entirely
different git repository.

-- 
dwmw2



Re: WARNING! Object DB conversion (was Re: [PATCH] write-tree performance problems)

2005-04-20 Thread Linus Torvalds


On Wed, 20 Apr 2005, Jon Seymour wrote:
 
 Am I correct to understand that with this change, all the objects in the 
 database are still being compressed (so no net performance benefit), but by 
 doing the SHA1 calculations before compression you are keeping open the 
 possibility that at some point in the future you may use a different 
 compression technique (including none at all) for some or all of the 
 objects?

Correct. There is zero performance benefit to this right now, and the only 
reason for doing it is because it will allow other things to happen.

Note that the other things include:
 - change the compression format to make it cheaper
 - _keep_ the same compression format, but notice that we already have an 
   object by looking at the uncompressed one.

I'm actually leaning towards just #2 at this time. I like how things
compress, and it sure is simple. The fact that we use the equivalent of
-9 may be expensive, but the thing is, we don't actually write new files
that often, and it's just CPU time (no seeking on disk or anything like
that), which tends to get cheaper over time.

So I suspect that once I optimize the tree writing to notice that "oh, I
already have this tree object", and thus build it up but never compressing
it, write-tree performance will go up _hugely_ even without removing the
compression. Because most of the time, write-tree actually only needs to
create a couple of small new tree objects.
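
The second option can be sketched in a few lines. The assumption here is
that an object's name is the SHA-1 of a "<type> <size>\0" header followed
by the uncompressed contents, which is the layout present-day git uses;
the code of April 2005 may differ in detail. Under that assumption the name
is known before any compression happens, so an existing object can be
detected first. The SHA-1 routine is a compact from-scratch version for
illustration only, and object_id() is a hypothetical helper, not a git
function.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

static uint32_t rol(uint32_t v, int n) { return (v << n) | (v >> (32 - n)); }

/* Fold one 64-byte block into the running state h[]. */
static void sha1_block(uint32_t h[5], const unsigned char *p)
{
    uint32_t w[80], a, b, c, d, e, f, k, t;
    int i;

    for (i = 0; i < 16; i++)
        w[i] = (uint32_t)p[4*i] << 24 | (uint32_t)p[4*i+1] << 16 |
               (uint32_t)p[4*i+2] << 8 | (uint32_t)p[4*i+3];
    for (; i < 80; i++)
        w[i] = rol(w[i-3] ^ w[i-8] ^ w[i-14] ^ w[i-16], 1);
    a = h[0]; b = h[1]; c = h[2]; d = h[3]; e = h[4];
    for (i = 0; i < 80; i++) {
        if (i < 20)      { f = (b & c) | (~b & d);          k = 0x5A827999; }
        else if (i < 40) { f = b ^ c ^ d;                   k = 0x6ED9EBA1; }
        else if (i < 60) { f = (b & c) | (b & d) | (c & d); k = 0x8F1BBCDC; }
        else             { f = b ^ c ^ d;                   k = 0xCA62C1D6; }
        t = rol(a, 5) + f + e + k + w[i];
        e = d; d = c; c = rol(b, 30); b = a; a = t;
    }
    h[0] += a; h[1] += b; h[2] += c; h[3] += d; h[4] += e;
}

static void sha1(const unsigned char *data, size_t len, unsigned char out[20])
{
    uint32_t h[5] = { 0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0 };
    unsigned char tail[128];
    size_t i, rem = len % 64, tail_len = (rem < 56) ? 64 : 128;
    uint64_t bits = (uint64_t)len * 8;

    for (i = 0; i + 64 <= len; i += 64)
        sha1_block(h, data + i);
    /* Padding: 0x80, zeros, then the bit length as a big-endian 64-bit value. */
    memset(tail, 0, sizeof(tail));
    memcpy(tail, data + i, rem);
    tail[rem] = 0x80;
    for (i = 0; i < 8; i++)
        tail[tail_len - 1 - i] = (unsigned char)(bits >> (8 * i));
    for (i = 0; i < tail_len; i += 64)
        sha1_block(h, tail + i);
    for (i = 0; i < 5; i++) {
        out[4*i]   = (unsigned char)(h[i] >> 24);
        out[4*i+1] = (unsigned char)(h[i] >> 16);
        out[4*i+2] = (unsigned char)(h[i] >> 8);
        out[4*i+3] = (unsigned char)h[i];
    }
}

/* Hypothetical helper: name an object from its *uncompressed* form. */
static void object_id(const char *type, const unsigned char *body,
                      size_t len, char hex[41])
{
    unsigned char buf[4096], digest[20];
    /* "+ 1" keeps the terminating NUL as part of the header. */
    int hdr = snprintf((char *)buf, sizeof(buf), "%s %lu", type,
                       (unsigned long)len) + 1;
    int i;

    assert((size_t)hdr + len <= sizeof(buf));
    memcpy(buf + hdr, body, len);
    sha1(buf, (size_t)hdr + len, digest);
    for (i = 0; i < 20; i++)
        sprintf(hex + 2 * i, "%02x", digest[i]);
}
```

A writer built on this would call object_id() first, check whether a file
with that name already exists in the object directory, and only run the
compressor when it does not. Under the same assumption an empty tree also
has a well-defined name, since the hashed input is just "tree 0\0".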

Linus


Re: WARNING! Object DB conversion (was Re: [PATCH] write-tree performance problems)

2005-04-20 Thread Martin Uecker
On Wed, Apr 20, 2005 at 10:30:15AM -0400, C. Scott Ananian wrote:

Hi,

your code looks pretty cool. thank you!

 On Wed, 20 Apr 2005, Martin Uecker wrote:
 
 The other thing I don't like is the use of a sha1
 for a complete file. Switching to some kind of hash
 tree would allow to introduce chunks later. This has
 two advantages:
 
 You can (and my code demonstrates/will demonstrate) still use a whole-file 
 hash to use chunking.  With content prefixes, this takes O(N ln M) time 
 (where N is the file size and M is the number of chunks) to compute all 
 hashes; if subtrees can share the same prefix, then you can do this in 
 O(N) time (ie, as fast as possible, modulo a constant factor, which is 
 '2').  You don't *need* internal hashing functions.

I don't understand this paragraph. What is an internal
hash function? Your code seems to do exactly what I want.
The hashes are computed recursively as in a hash tree
with O(N ln N). The only difference between your design
and a design based on a conventional (binary) hash tree
seems to be that data is stored in the intermediate nodes
too. 
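
For readers unfamiliar with treaps, the shape being discussed can be
sketched as follows. This is an illustration, not the posted chunk.c: the
posted code uses each chunk's sha1 as the heap key, while this sketch
substitutes a small unsigned priority per chunk, and build() is a
hypothetical name. The chunk with the largest priority in a run becomes
the root (so data lives in interior nodes too), and the runs before and
after it become the prefix and suffix subtrees.

```c
#include <assert.h>
#include <stdlib.h>

struct node {
    size_t idx;                  /* position of this chunk in the file */
    struct node *left, *right;   /* runs of chunks before / after it */
};

/*
 * Build the treap for chunks [lo, hi).  The shape depends only on the
 * priorities, i.e. only on the content of the chunks.
 */
static struct node *build(const unsigned *prio, size_t lo, size_t hi)
{
    struct node *n;
    size_t i, best = lo;

    if (lo >= hi)
        return NULL;
    for (i = lo + 1; i < hi; i++)
        if (prio[i] > prio[best])
            best = i;
    n = malloc(sizeof(*n));
    assert(n);
    n->idx = best;
    n->left = build(prio, lo, best);
    n->right = build(prio, best + 1, hi);
    return n;
}

static void free_tree(struct node *n)
{
    if (!n)
        return;
    free_tree(n->left);
    free_tree(n->right);
    free(n);
}
```

For priorities {5, 9, 2, 7, 1} the root is chunk 1, its prefix is chunk 0,
and its suffix subtree is rooted at chunk 3 with chunks 2 and 4 below it.
Prepending a new low-priority chunk leaves that suffix subtree untouched,
which is what makes subtree reuse possible.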

 It would allow git to scale to repositories of large
 binary files. And it would allow to build a very cool
 content transport algorithm for those repositories.
 This algorithm could combine all the advantages of
 bittorrent and rsync (without the cpu load).
 
 Yes, the big benefit of internal hashing is that it lets you check 
 validity of a chunk w/o having the entire file available.  I'm not sure 
 that's terribly useful in this case.  [And, if it is, then it can 
 obviously be done w/ other means.]

If I don't miss anything essential, you can validate
each treap piece at the moment you get it from the
network with its SHA1 hash and then proceed with
downloading the prefix and suffix tree (in parallel
if you have more than one peer a la bittorrent).

 And it would allow trivial merging of patches which
 apply to different chunks of a file in exact the same
 way as merging changesets which apply to different
 files in a tree.
 
 I'm not sure anyone should be looking at chunks.  To me, at least, they 
 are an object-store-implementation detail only.  For merging, etc, we 
 should be looking at whole files, or (better) the whole repository.
 The chunking algorithm is guaranteed not to respect semantic boundaries 
 (for *some* semantics of *some* file).

You might be right. I just wanted to point out this
possibility because it would allow to avoid calling
external merging code for a lot of trivial merges.

bye,
Martin



-- 
One night, when little Giana from Milano was fast asleep,
she had a strange dream.





Re: [PATCH] write-tree performance problems

2005-04-20 Thread Chris Mason
On Wednesday 20 April 2005 02:43, Linus Torvalds wrote:
 On Tue, 19 Apr 2005, Chris Mason wrote:
  I'll finish off the patch once you ok the basics below.  My current code
  works like this:

 Chris, before you do anything further, let me re-consider.

 Assuming that the real cost of write-tree is the compression (and I think
 it is), I really suspect that this ends up being the death-knell to my
 "use the sha1 of the _compressed_ object" approach.

Thanks for looking at this.  Your new tree is faster, it gets the commit 100 
patches time down from 1m5s to 50s.  I've attached my patch from last night, 
which is mostly a rough guess of the changes we would need, I haven't 
validated or cleaned things up.

With the basic changes I described before, the  100 patch time only goes down 
to 40s.  Certainly not fast enough to justify the changes.  In this case, the 
bulk of the extra time comes from write-tree writing the index file, so I 
split write-tree.c up into libwrite-tree.c, and created update-cache 
--write-tree.

This gets our time back down to 21s.

The attached patch is not against your latest revs.  After updating I would 
need to sprinkle a few S_ISDIR checks into diff-cache.c and checkout-cache.c, 
but the changes should be small.

-chris
Index: Makefile
===
--- dbeacafeb442bcfd39dfdc90c360d47d4215c185/Makefile  (mode:100644 sha1:6a04941a337ec50da06cf4cf52aa58f3b1435776)
+++ 27e71cd40ff1dccfbbd996427833fd7bac714dde/Makefile  (mode:100644 sha1:2ba6d49196e8a2335cfcd77ec0dbe9cda3e402dd)
@@ -29,7 +29,7 @@
 
 VERSION= VERSION
 
-LIB_OBJS=read-cache.o sha1_file.o usage.o object.o commit.o tree.o blob.o
+LIB_OBJS=read-cache.o sha1_file.o usage.o object.o commit.o tree.o blob.o libwrite-tree.o
 LIB_FILE=libgit.a
 LIB_H=cache.h object.h
 
Index: cache.h
===
--- dbeacafeb442bcfd39dfdc90c360d47d4215c185/cache.h  (mode:100644 sha1:c182ea0c5c1def37d899f9a05f8884ebe17c9d92)
+++ 27e71cd40ff1dccfbbd996427833fd7bac714dde/cache.h  (mode:100644 sha1:0882b713222b71e67c9dab5d58ab6f15c3c49ed6)
@@ -74,7 +74,7 @@
 #define ce_stage(ce) ((CE_STAGEMASK & ntohs((ce)->ce_flags)) >> CE_STAGESHIFT)
 
 #define ce_permissions(mode) (((mode) & 0100) ? 0755 : 0644)
-#define create_ce_mode(mode) htonl(S_IFREG | ce_permissions(mode))
+#define create_ce_mode(mode) htonl((mode & (S_IFREG|S_IFDIR)) | ce_permissions(mode))
 
 #define cache_entry_size(len) ((offsetof(struct cache_entry,name) + (len) + 8) & ~7)
 
Index: libwrite-tree.c
===
--- /dev/null  (tree:dbeacafeb442bcfd39dfdc90c360d47d4215c185)
+++ 27e71cd40ff1dccfbbd996427833fd7bac714dde/libwrite-tree.c  (mode:100644 sha1:52202930d02b3721f5a388ae1178c5a4d99ec1b4)
@@ -0,0 +1,174 @@
+/*
+ * GIT - The information manager from hell
+ *
+ * Copyright (C) Linus Torvalds, 2005
+ */
+#include "cache.h"
+
+struct new_ce {
+	struct new_ce *next;
+	struct cache_entry ce;
+};
+
+static struct new_ce *add_list = NULL;
+
+static int check_valid_sha1(unsigned char *sha1)
+{
+	char *filename = sha1_file_name(sha1);
+	int ret;
+
+	/* If we were anal, we'd check that the sha1 of the contents actually matches */
+	ret = access(filename, R_OK);
+	if (ret)
+		perror(filename);
+	return ret;
+}
+
+static int prepend_integer(char *buffer, unsigned val, int i)
+{
+	buffer[--i] = '\0';
+	do {
+		buffer[--i] = '0' + (val % 10);
+		val /= 10;
+	} while (val);
+	return i;
+}
+
+#define ORIG_OFFSET (40)	/* Enough space to add the header of "tree <size>\0" */
+
+static int write_tree(struct cache_entry **cachep, int maxentries, const char *base, int baselen, unsigned char *returnsha1)
+{
+	unsigned char subdir_sha1[20];
+	unsigned long size, offset;
+	char *buffer;
+	int i, nr;
+
+	/* Guess at some random initial size */
+	size = 8192;
+	buffer = malloc(size);
+	offset = ORIG_OFFSET;
+
+	nr = 0;
+	do {
+		struct cache_entry *ce = cachep[nr];
+		const char *pathname = ce->name, *filename, *dirname;
+		int pathlen = ce_namelen(ce), entrylen;
+		unsigned char *sha1;
+		unsigned int mode;
+
+		/* Did we hit the end of the directory? Return how many we wrote */
+		if (baselen >= pathlen || memcmp(base, pathname, baselen))
+			break;
+
+		sha1 = ce->sha1;
+		mode = ntohl(ce->ce_mode);
+
+		/* Do we have _further_ subdirectories? */
+		filename = pathname + baselen;
+		dirname = strchr(filename, '/');
+		if (dirname) {
+			int subdir_written;
+			int len = dirname - pathname;
+			unsigned int size = cache_entry_size(len);
+			struct new_ce *new_ce = malloc(size + sizeof(struct new_ce *));
+			struct cache_entry *c = &new_ce->ce;
+			subdir_written = write_tree(cachep + nr, maxentries - nr, pathname, dirname-pathname+1, subdir_sha1);
+			nr += subdir_written - 1;
+
+			/* Now we need to write out the directory entry into this tree.. */
+			mode = S_IFDIR;
+			pathlen = dirname - pathname;
+
+			sha1 = subdir_sha1;
+
+			memset(c, 0, size);
+
+			/* 

Re: [PATCH 1/4] Accept commit in some places when tree is needed.

2005-04-20 Thread Linus Torvalds


On Tue, 19 Apr 2005, Junio C Hamano wrote:
 
 This patch lifts the tree-from-tree-or-commit logic from
 diff-cache.c and moves it to sha1_file.c, which is a common
 library source for the SHA1 storage part.

I don't think that's a good interface. It changes the sha1 passed into it: 
that may actually be nice, since you may want to know what it changed to, 
but I think you'd want to have that as an (optional) separate 
sha1_result parameter. 

Also, the type or size things make no sense to have as a parameter 
at all.

IOW, it was fine when it was an internal hacky thing in diff-cache, but 
once it's promoted to be a real library function it should definitely be 
cleaned up to have sane interfaces that make sense in general, and not 
just within the original context.
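
One possible reading of this suggestion, sketched against a toy in-memory
object table rather than the real object store: the input sha1 is const
and never modified, the resolved tree's sha1 is returned through an
optional separate parameter, and no type or size arguments appear in the
signature. The struct, the enum and the function name resolve_tree() are
all hypothetical; this is not the interface that was eventually merged.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum obj_type { OBJ_TREE, OBJ_COMMIT };

/* Toy object record standing in for the real object database. */
struct object {
    unsigned char sha1[20];
    enum obj_type type;
    unsigned char tree[20];     /* commits only: the tree they point at */
};

static const struct object *find(const struct object *db, size_t n,
                                 const unsigned char *sha1)
{
    size_t i;
    for (i = 0; i < n; i++)
        if (!memcmp(db[i].sha1, sha1, 20))
            return &db[i];
    return NULL;
}

/*
 * Accept either a tree or a commit and return the tree.  'sha1' is left
 * untouched; if 'tree_sha1_ret' is non-NULL it receives the tree's sha1.
 */
static const struct object *resolve_tree(const struct object *db, size_t n,
                                         const unsigned char *sha1,
                                         unsigned char *tree_sha1_ret)
{
    const struct object *o = find(db, n, sha1);

    if (o && o->type == OBJ_COMMIT)
        o = find(db, n, o->tree);       /* follow the commit to its tree */
    if (!o || o->type != OBJ_TREE)
        return NULL;
    if (tree_sha1_ret)
        memcpy(tree_sha1_ret, o->sha1, 20);
    return o;
}
```

Callers that do not care which tree was chosen pass NULL for the last
argument; callers that do care get the answer without their input buffer
being overwritten.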

Linus


Re: [PATCH] write-tree performance problems

2005-04-20 Thread C. Scott Ananian
On Wed, 20 Apr 2005, Chris Mason wrote:
 With the basic changes I described before, the  100 patch time only goes down
 to 40s.  Certainly not fast enough to justify the changes.  In this case, the
 bulk of the extra time comes from write-tree writing the index file, so I
 split write-tree.c up into libwrite-tree.c, and created update-cache
 --write-tree.

Hmm.  Are our index files too large, or is there some other factor?
I was considering using a chunked representation for *all* files (not just 
blobs), which would avoid the original 'trees must reference other trees 
or they become too large' issue -- and maybe the performance issue you're 
referring to, as well?
 --scott

Boston MI6 quiche LPMEDLEY BLUEBIRD PBSUCCESS jihad biowarfare non-violent protest 
Yakima NRA EZLN DES hack SARANAC KMPLEBE Echelon PBCABOOSE security
 ( http://cscott.net/ )


Re: [PATCH] write-tree performance problems

2005-04-20 Thread Linus Torvalds


On Wed, 20 Apr 2005, C. Scott Ananian wrote:
 
 Hmm.  Are our index files too large, or is there some other factor?

They _are_ pretty large, but they have to be.

For the kernel, the index file is about 1.6MB. That's 

 - 17,000+ files and filenames
 - stat information for all of them
 - the sha1 for them all

ie for the kernel it averages to 93.5 bytes per file. Which is actually 
pretty dense (just the sha1 and stat information is about half of it, and 
those are required).
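
The per-entry cost can be worked through with the cache_entry_size()
macro from cache.h. The struct layout below is an assumption (fixed-width
types are used so the offsets are unambiguous), as is the 25-character
average path length in the usage note; index_bytes() is a hypothetical
helper and ignores the index header and trailing signature.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct cache_time { uint32_t sec, nsec; };

/* Assumed layout of one index entry: stat data, sha1, flags, then the path. */
struct cache_entry {
    struct cache_time ce_ctime, ce_mtime;                 /* 16 bytes */
    uint32_t ce_dev, ce_ino, ce_mode, ce_uid, ce_gid, ce_size; /* 24 bytes */
    unsigned char sha1[20];                               /* 20 bytes */
    uint16_t ce_flags;                                    /*  2 bytes */
    char name[];                                          /* NUL-terminated path */
};

/* Entry size for a path of 'len' characters, rounded to 8 bytes. */
#define cache_entry_size(len) \
    ((offsetof(struct cache_entry, name) + (len) + 8) & ~7)

/* Rough index size for 'nfiles' entries with the same path length. */
static size_t index_bytes(size_t nfiles, size_t path_len)
{
    return nfiles * cache_entry_size(path_len);
}
```

With this layout the fixed part of an entry is 62 bytes, a 25-character
path costs 88 bytes, and index_bytes(17000, 25) comes to 1,496,000 bytes,
which is in the same range as the 1.6MB quoted above.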

 I was considering using a chunked representation for *all* files (not just 
 blobs), which would avoid the original 'trees must reference other trees 
 or they become too large' issue -- and maybe the performance issue you're 
 referring to, as well?

No. The most common index file operation is reading, and that's the one 
that has to be _fast_. And it is - it's a single mmap and some parsing.

In fact, writing it is pretty fast too, exactly because the index file is 
totally linear and isn't compressed or anything fancy like that. It's a 
_lot_ faster than the tree objects, exactly because it doesn't need to 
be as careful.

The main cost of the index file is probably the fact that I add a sha1 
signature of the file into itself to verify that it's ok. The advantage is 
that the signature means that the file is ok, and the parsing of it can be 
much more relaxed. You win some, you lose some.

Linus


Re: [PATCH] write-tree performance problems

2005-04-20 Thread C. Scott Ananian
On Wed, 20 Apr 2005, Linus Torvalds wrote:
  I was considering using a chunked representation for *all* files (not just
  blobs), which would avoid the original 'trees must reference other trees
  or they become too large' issue -- and maybe the performance issue you're
  referring to, as well?

 No. The most common index file operation is reading, and that's the one
 that has to be _fast_. And it is - it's a single mmap and some parsing.

OK, sure.  But how 'bout chunking trees?  Are you grown happy with the new 
trees-reference-other-trees paradigm, or is there a deep longing in your 
heart for the simplicity of 'trees-reference-blobs-period'?  I'm fairly
certain that chunking could get you the space-savings you need without 
multi-level trees, if the simplicity of that is still appealing.

Not necessarily for rev.1 of the chunking code, but I'm curious as to 
whether it's still of interest at all.  I don't know exactly how far
ingrained multilevel trees have become since they were adopted.
 --scott

Japan explosion BLUEBIRD Honduras jihad D5 SLBM Diplomat overthrow 
JMTIDE CABOUNCE AMTHUG ESODIC Kennedy AVBRANDY CLOWER mail drop PHOENIX
 ( http://cscott.net/ )


Re: WARNING! Object DB conversion (was Re: [PATCH] write-tree performance problems)

2005-04-20 Thread Martin Uecker
On Wed, Apr 20, 2005 at 11:28:20AM -0400, C. Scott Ananian wrote:

Hi,
 
 A merkle-tree (which I think you initially pointed me at) makes the hash 
 of the internal nodes be a hash of the chunk's hashes; ie not a straight 
 content hash.  This is roughly what my current implementation does, but
 I would like to identify each subtree with the hash of the 
 *(expanded) contents of that subtree* (ie no explicit reference to 
 subtree hashes).  This makes it interoperable with non-chunked or 
 differently-chunked representations, in that the top-level hash is *just 
 the hash of the complete content*, not some hash-of-subtree-hashes.  Does 
 that make more sense?

Yes, thank you. But I would like to argue against this:

You can make the representations interoperable
if you calculate the hash for the non-chunked
representations exactly as if this file is stored
chunked but simply do not store it in that way.

Of course this is not backward compatible to the
monolithic hash and not compatible with a differently
chunked representation (but you could store subtrees
unchunked if you think your chunks are too small).

 The code I posted doesn't demonstrate this very well, but now that Linus 
 has abandoned the 'hash of compressed content' stuff, my next code posting 
 should show this more clearly.

I think the hash of the treap piece should be calculated
from the hashes of the prefix and suffix trees and the already
calculated hash of the uncompressed data. This makes hashing
nearly as cheap as in Linus' version, which is important
because checking whether a given file has identical
content to a stored version should be fast.
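
A minimal sketch of that scheme, under loud assumptions: FNV-1a stands
in for SHA-1 so that no crypto library is needed, and struct piece,
toy_hash() and piece_hash() are hypothetical names, not code from this
thread. The order in which the three digests are combined is also an
assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_SEED 14695981039346656037ULL   /* FNV-1a 64-bit offset basis */

/* FNV-1a: a stand-in for SHA-1, NOT a cryptographic hash. */
uint64_t toy_hash(const void *data, size_t len, uint64_t seed)
{
    const unsigned char *p = data;
    uint64_t h = seed;

    assert(data != NULL || len == 0);
    while (len--) {
        h ^= *p++;
        h *= 1099511628211ULL;             /* FNV-1a 64-bit prime */
    }
    return h;
}

/* Hypothetical treap piece: its own data plus optional subtrees. */
struct piece {
    const char *data;
    size_t len;
    const struct piece *prefix, *suffix;   /* either may be NULL */
};

/*
 * The hash of a piece is computed from three fixed-size digests
 * (prefix tree, own data, suffix tree), so the expanded contents of
 * the subtrees are never rehashed. The digests are hashed as raw
 * bytes, so the value depends on the machine's byte order.
 */
uint64_t piece_hash(const struct piece *p)
{
    uint64_t parts[3];

    if (!p)
        return 0;                          /* absent subtree */
    parts[0] = piece_hash(p->prefix);
    parts[1] = toy_hash(p->data, p->len, TOY_SEED);
    parts[2] = piece_hash(p->suffix);
    return toy_hash(parts, sizeof parts, TOY_SEED);
}
```

With this construction the same bytes stored as one piece ("abcdef")
and as three pieces ("ab", "cd", "ef") get different top-level hashes,
which is exactly the dependence on chunk makeup discussed here.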

 If I don't miss anything essential, you can validate
 each treap piece at the moment you get it from the
 network with its SHA1 hash and then proceed with
 downloading the prefix and suffix tree (in parallel
 if you have more than one peer a la bittorrent).
 
 Yes, I guess this is the detail I was going to abandon. =)
 
 I viewed the fact that the top-level hash was dependent on the exact chunk 
 makeup a 'misfeature', because it doesn't allow easy interoperability with 
 existing non-chunked repos.

I thought of this as a misfeature too before I realized how
many advantages this has.

Martin
 

-- 
One night, when little Giana from Milano was fast asleep,
she had a strange dream.





Re: [PATCH] write-tree performance problems

2005-04-20 Thread David Willmore
On 4/20/05, Linus Torvalds [EMAIL PROTECTED] wrote:
 It really _shouldn't_ be faster. It still does the compression, and throws
 the end result away.

Am I misunderstanding or is the problem that doing:
file with unknown status -> compress -> sha1 -> compare with existing hash

is expensive?

What about doing:
file it's supposed to be equal to -> uncompress -> compare with
unknown status file

It's more file I/O, but the uncompress is much cheaper than the compress.

On a second issue, what's the format of the main 'index' file?  Is it:
pathspec sha1hash
pathspec sha1hash 
?
If so, that's not going to compress well.  A file like:
pathspec1
pathspec2

sha1hash1
sha1hash2

Will compress better.

Stop me if I'm way off base--I'm just following the mailing list, I
haven't tried out the code.

Cheers,
David


Re: [PATCH] write-tree performance problems

2005-04-20 Thread Linus Torvalds


On Wed, 20 Apr 2005, C. Scott Ananian wrote:
 
 OK, sure.  But how 'bout chunking trees?  Are you grown happy with the new 
 trees-reference-other-trees paradigm, or is there a deep longing in your 
 heart for the simplicity of 'trees-reference-blobs-period'?

I'm pretty sure we do better chunking on a subdirectory basis, especially 
as it allows us to do various optimizations (avoid diffing common parts).

Yes, you could try to do the same optimizations with chunking, but then 
you'd need to make sure that the chunking was always on a full tree entry 
boundary etc - ie much harder than blob chunking. 

But hey, numbers talk, bullshit walks. 

Linus


Re: [PATCH] write-tree performance problems

2005-04-20 Thread Linus Torvalds


On Wed, 20 Apr 2005, Linus Torvalds wrote:
 
 To actually go faster, it _should_ need this patch. Untested. See if it 
 works..

NO! Don't see if this works. For the "sha1 file already exists" case, it 
forgot to return the SHA1 value in "returnsha1", and would thus corrupt 
the trees it wrote.

So don't apply, don't test. You won't corrupt your archive (you'll just
write bogus tree objects), but if you commit the bogus trees you're going
to be in a world of hurt and will have to undo everything you did.

It's a good test for fsck though. It core-dumps because it tries to add 
references to NULL objects.

Linus


Re: [PATCH] write-tree performance problems

2005-04-20 Thread Chris Mason
On Wednesday 20 April 2005 11:40, Linus Torvalds wrote:
 On Wed, 20 Apr 2005, Chris Mason wrote:
  Thanks for looking at this.  Your new tree is faster, it gets the commit
  100 patches time down from 1m5s to 50s.

 It really _shouldn't_ be faster. It still does the compression, and throws
 the end result away.

Well, that's a little odd.  I had thought about making sure you did this 
change and forgotten.  1 minute benchmarks are a horrible idea since they run 
into noise with cache writebacks.  I should know better...

At any rate, the time for a single write-tree is pretty consistent.  Before it 
was around .5 seconds, and with this change it goes down to .128s.  My patch 
was .024.

The 100 patch time is down to 32s (3 run average).  This is close enough that 
I don't think my patch is worth it if no other part of git can benefit from 
having trees in the index.


 To actually go faster, it _should_ need this patch. Untested. See if it
 works..

Thanks. This one missed filling in returnsha1.  New patch attached.

-chris
diff -u linus.back/sha1_file.c linus/sha1_file.c
--- linus.back/sha1_file.c	2005-04-20 12:31:00.240181016 -0400
+++ linus/sha1_file.c	2005-04-20 12:13:56.339837528 -0400
@@ -173,12 +173,27 @@
 	z_stream stream;
 	unsigned char sha1[20];
 	SHA_CTX c;
+	char *filename;
+	int fd;
 
 	/* Sha1.. */
 	SHA1_Init(&c);
 	SHA1_Update(&c, buf, len);
 	SHA1_Final(sha1, &c);
 
+	filename = sha1_file_name(sha1);
+	fd = open(filename, O_WRONLY | O_CREAT | O_EXCL, 0666);
+	if (fd < 0) {
+		if (errno != EEXIST)
+			return -1;
+
+		/*
+		 * We might do collision checking here, but we'd need to
+		 * uncompress the old file and check it. Later.
+		 */
+		goto out;
+	}
+
 	/* Set it up */
 	memset(&stream, 0, sizeof(stream));
 	deflateInit(&stream, Z_BEST_COMPRESSION);
@@ -195,8 +210,10 @@
 	deflateEnd(&stream);
 	size = stream.total_out;
 
-	if (write_sha1_buffer(sha1, compressed, size) < 0)
-		return -1;
+	if (write(fd, compressed, size) != size)
+		die("unable to write file");
+	close(fd);
+out:		
 	if (returnsha1)
 		memcpy(returnsha1, sha1, 20);
 	return 0;


Re: [PATCH] write-tree performance problems

2005-04-20 Thread Linus Torvalds


On Wed, 20 Apr 2005, Linus Torvalds wrote:
 
 NO! Don't see if this works. For the "sha1 file already exists" case, it 
 forgot to return the SHA1 value in "returnsha1", and would thus corrupt 
 the trees it wrote.

Proper version with fixes checked in. For me, it brings down the time to
write a kernel tree from 0.34s to 0.24s, so a third of the time was just
compressing objects that we ended up already having.
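
A minimal sketch of the idea behind that fix, assuming nothing about
the checked-in code beyond what the patches in this thread show: try
to create the object file exclusively first, and only do the expensive
work when the file did not exist yet. store_once() is a hypothetical
helper name; the real code computes the file name from the SHA1 and
compresses the buffer before writing it.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper. Returns 1 if the file was newly written,
 * 0 if a file of that name already existed (nothing is written),
 * and -1 on any other error.
 */
int store_once(const char *path, const void *buf, size_t len)
{
    ssize_t written;
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0666);

    if (fd < 0)
        return errno == EEXIST ? 0 : -1;

    /* Only at this point would compressing the buffer pay off. */
    written = write(fd, buf, len);
    if (close(fd) < 0 || written < 0 || (size_t)written != len) {
        unlink(path);   /* do not leave a partial object behind */
        return -1;
    }
    return 1;
}
```

Calling it twice with the same path returns 1 and then 0; the second
call never reaches the write, which is where the saved time comes from.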

Two thirds to go ;)

Linus


Re: [GIT PATCH] I2C and W1 bugfixes for 2.6.12-rc2

2005-04-20 Thread Zlatko Calusic
Linus Torvalds [EMAIL PROTECTED] writes:

 Real merges have no patches taking place _anywhere_. And they take about 
 half a second. Doing an update of your tree should _literally_ boil down 
 to

   #
   # repo needs to point to the repo we update from
   #
   rsync -avz --ignore-existing $repo/objects/. .git/objects/.

I see this -avz incantation mentioned every time rsync is
involved. But, is the -z part (compression) really necessary knowing
that we're dealing with an already compressed tree? Doesn't it put
additional strain on the rsync server without any benefit in this
case?

Or I might be too ignorant and not understand some internals well, but
then... I would like to know the reason. :)

Regards,
-- 
Zlatko


Re: [PATCH] write-tree performance problems

2005-04-20 Thread Linus Torvalds


On Wed, 20 Apr 2005, Chris Mason wrote:
 
 At any rate, the time for a single write-tree is pretty consistent.  Before it
 was around .5 seconds, and with this change it goes down to .128s.

Oh, wow.

I bet your SHA1 implementation is done with hand-optimized and scheduled
x86 MMX code or something, while my poor G5 is probably using some slow
generic routine. As a result, it only improved by 33% for me since the
compression was just part of the picture, but with your cheap SHA1 the
compression costs really dominated, and so it's almost four times faster
for you.

Anyway, that's good. It definitely means that I consider tree writing to 
be fast enough. You can commit patches in a third of a second on your 
machine.

I'll consider the problem solved for now. Yeah, I realize that it still 
takes you half a minute to commit the 100 quilt patches, but I just can't 
bring myself to think it's a huge problem in the kind of usage patterns I 
think are realistic.

If somebody really wants to replace quilt with git, he'd need to spend
some effort on it. If you just want to work together reasonably well, I
think 3 patches per second is pretty much there.

Linus


[PATCH] Some documentation...

2005-04-20 Thread David Greaves
Hi
I'm starting to write some docs...
Comments... even "yep, looks OK, carry on" :)
I plan on putting the 'git command' ones into the 'git help ...' 
structure once Petr accepts it.
I guess the low level ones go into a README.reference until they 
stabilise and become man pages...

In doing this I noticed a couple of points:
* update-cache won't accept ./file or fred/./file
* checkout-cache doesn't seem to preserve mode
Are these bugs or should they be documented?
I've taken the approach of documenting behaviour for now.
Signed-off-by: David Greaves [EMAIL PROTECTED]
---

Index: README.reference
===
--- /dev/null  (tree:cf6a46a2199777c3dac32fa4479b97c0752cdf07)
+++ 30de093673d44c7ea8c56a0194fb792e47225ac8/README.reference  (mode:100644 sha1:2ec6683b22e5672ea46d27770fcb1a4b4c37aa0e)
@@ -0,0 +1,158 @@
+Terminology: - see README for description
+Each line contains terms used interchangeably
+
+object database, .git directory
+directory cache, index
+id, sha1, sha1-id, sha1 hash
+type, tag, tagname
+blob, blob object
+tree, tree object
+commit, commit object
+parent
+root object
+changeset
+
+
+cat-file
+	cat-file -t | tagname sha1
+
+Provide contents or type of objects in the repository. The tagname is
+required if it is not being interrogated.
+
+
+sha1
+	The sha1 identifier of the object.
+	(This is the sha1 of the uncompressed content.)
+
+-t
+	show the object type identified by sha1
+	One of: blob/tree/commit
+
+tagname
+	One of: blob/tree/commit
+
+
+
+check-files
+	check-files file...
+
+Check that a list of files are up-to-date between the filesystem and
+the cache. Used to verify a patch target before doing a patch.
+
+Files that do not exist on the filesystem are considered up-to-date
+(whether or not they are in the cache).
+
+Emits an error message on failure.
+
+exits with a status code indicating success if all files are
+up-to-date.
+
+
+see also: update-cache
+
+
+
+checkout-cache
+	checkout-cache [-q] [-a] [-f] [--] file...
+
+Will copy all files listed from the cache to the working directory
+(not overwriting existing files). Note that the file contents are
+restored - NOT the file permissions.
+
+-q
+	be quiet if files exist or are not in the cache
+
+-f
+	forces overwrite of existing files
+
+-a
+	checks out all files in the cache before processing listed
+	files.
+
+Note that the order of the flags matters:
+
+	checkout-cache -a -f file.c
+
+will first check out all files listed in the cache (but not overwrite
+any old ones), and then force-checkout file.c a second time (ie that
+one _will_ overwrite any old contents with the same filename).
+
+Also, just doing checkout-cache does nothing. You probably meant
+checkout-cache -a. And if you want to force it, you want
+checkout-cache -f -a.
+
+Intuitiveness is not the goal here. Repeatability is. The reason for
+the no arguments means no work thing is that from scripts you are
+supposed to be able to do things like
+
+	find . -name '*.h' -print0 | xargs -0 checkout-cache -f --
+
+which will force all existing *.h files to be replaced with their
+cached copies. If an empty command line implied all, then this would
+force-refresh everything in the cache, which was not the point.
+
+Oh, and the -- is just a good idea when you know the rest will be
+filenames. Just so that you wouldn't have a filename of -a causing
+problems (not possible in the above example, but get used to it in
+scripting!).
+
+
+
+commit-id
+	commit-id [tag]
+
+Returns the sha1-id of the commit object associated with given tag.
+
+tag
+	tag of commit object - defaults to the current HEAD.
+
+
+
+commit-tree
+	commit-tree sha1 [-p sha1]*  changelog
+
+
+
+diff-tree
+	diff-tree [-r] [-z] tree sha1 tree sha1
+
+
+
+ls-tree
+	ls-tree [-r] [-z] key
+
+
+
+merge-base
+	merge-base commit-id commit-id
+
+
+
+merge-cache
+	merge-cache merge-program (-a | filename*)
+
+
+
+read-tree
+	read-tree [-m] sha1
+
+
+
+rev-tree
+	rev-tree [--edges] [--cache cache-file] commit-id [commit-id]
+
+
+
+show-diff
+	show-diff [-q] [-s] [-z] [paths...]
+
+
+
+show-files
+	show-files [-z] [-t] 

Re: [GIT PATCH] I2C and W1 bugfixes for 2.6.12-rc2

2005-04-20 Thread Linus Torvalds


On Wed, 20 Apr 2005, Zlatko Calusic wrote:
 
 I see this -avz incantation mentioned every time rsync is
 involved. But, is the -z part (compression) really necessary knowing
 that we're dealing with an already compressed tree? Doesn't it put
 additional strain on the rsync server without any benefit in this
 case?
 
 Or I might be too ignorant and not understand some internals well, but
 then... I would like to know the reason. :)

I'm not a big rsync user, so I just copied the examples of others.

You're right, for git, you should not use compression for files (I don't 
know if rsync compresses the directory listings by default, I assume it 
does). 

Linus


Re: [darcs-devel] Darcs and git: plan of action

2005-04-20 Thread Ralph Corderoy

Hi Ray,

 Give me a case where assuming it's a replace will do the wrong thing,
 for C code, where it's a variable or function name.

How about two patches.

1.  s/foo/bar/ throughout file because foo() has been decided upon
as the name of a new globally visible forthcoming function but was
already in use as a static function.

2.  Add definition of new foo().

Patch 1 mustn't be a `darcs replace' despite it changing every occurrence
of the C token foo into bar.

Cheers,


Ralph.



RE: missing: git api, reference, user manual and mission statement

2005-04-20 Thread Andrew Timberlake-Newell
Petr Baudis graced us with:
 Dear diary, on Tue, Apr 19, 2005 at 02:36:32PM CEST, I got a letter
 where Klaus Robert Suetterlin [EMAIL PROTECTED] told me that...
  1) There is no clear (e.g. by name) distinction between ``git as done
  by Linus'', which is a kind of content addressable database with added
  semantics, and ``git as done by the rest of You'', which is a kind of
  SCM on top of Linuses stuff.
 
 There is git and git-pasky (git-pasky is superset; therefore various
 patches floating around either get to git-pasky or to both). I'm not
 sure what else do you mean.

This goes back to the question of whether to rename git-pasky to cogito.  

Perhaps the crucial question is:  will the git plumbing be used for anything
other than SCM?

If so, then it could be useful to differentiate by program name, so that we
would know whether another project was utilizing git-plumbing or git-SCM.

If not, then there is effectively only one tool and the plumbing is a
[crucial] portion thereof:  a git (SCM and the file system around which it
was built).

So what's the answer to the question?  Anyone planning to use git (the file
system) outside of the SCM?




Re: [script] ge: export commits as patches

2005-04-20 Thread Zlatko Calusic
Ingo Molnar [EMAIL PROTECTED] writes:

 TREE1=$(cat-file commit 2>/dev/null $1 | head -4 | grep ^tree | cut -d' ' -f2)
 --

And to make it easier on your eyes, you can always rewrite stuff like
that (mentioned everywhere these days :)) like:

TREE1=$(cat-file commit 2>/dev/null $1 | awk '/^tree/ {print $2}')
 

No, I'm definitely not trying to save some CPU cycles, CPU cycles are
cheap, eyes are expensive! :)
-- 
Zlatko


Re: [PATCH] Some documentation...

2005-04-20 Thread C. Scott Ananian
On Wed, 20 Apr 2005, David Greaves wrote:
In doing this I noticed a couple of points:
* update-cache won't accept ./file or fred/./file
The comment in update-cache.c reads:
/*
 * We fundamentally don't like some paths: we don't want
 * dot or dot-dot anywhere, and in fact, we don't even want
 * any other dot-files (.git or anything else). They
 * are hidden, for chist sake.
 *
 * Also, we don't want double slashes or slashes at the
 * end that can make pathnames ambiguous.
 */
It could be argued that './' is a special case... but at the moment this 
is definitely a designed 'feature' not a 'bug'.
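
A rough sketch of the rule that comment describes, not the actual
update-cache.c code: path_is_acceptable() is a hypothetical name, and
the checks are only what the comment spells out (no component that
starts with a dot, no empty component from a double slash, no trailing
slash).

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if the path passes the checks, 0 otherwise. */
int path_is_acceptable(const char *path)
{
    const char *p = path;

    assert(path != NULL);
    for (;;) {
        /* p points at the first character of a path component */
        if (*p == '.' || *p == '/' || *p == '\0')
            return 0;   /* dot-file, empty component or trailing slash */
        while (*p != '/' && *p != '\0')
            p++;        /* walk to the end of this component */
        if (*p == '\0')
            return 1;   /* reached the end without complaints */
        p++;            /* step over the slash */
    }
}
```

Under these rules "file.c" and "dir/file.h" pass, while "./file",
"fred/./file", ".git/HEAD", "a//b" and "a/" are all rejected.
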
 --scott

BLUEBIRD SEQUIN SECANT Waihopai Honduras KUDOVE genetic KUJUMP SCRANTON 
DES AMLASH Indonesia SLINC cracking ESMERALDITE mustard Uzi KUSODA
 ( http://cscott.net/ )


Re: [GNU-arch-dev] [ANNOUNCEMENT] /Arch/ embraces `git'

2005-04-20 Thread duchier
Hi Tom,

just as a datapoint, here is an experiment I carried out.  I wanted to evaluate
how much overhead is incurred by using several levels of directories to
implement a discriminating index.  I used the key format you specified:

SHA1,SIZE

As data, I used my /usr/src/linux which uses 301M and contains 20753 files and
1389 directories.  To compute the key for a directory, I considered that its
contents were a mapping from names to keys.

When constructing the indexed archive, I actually stored empty files instead of
blobs because I am only interested in overhead.

Using your suggested indexing method that uses [0:4] as the 1st level key and
[4:8] as the 2nd level key, I obtain an indexed archive that occupies 159M,
where the top level contains 18665 1st level keys, the largest first level dir
contains 5 entries, and all 2nd level dirs contain exactly 1 entry.

Using Linus' suggested 1 level [0:2] indexing, I obtain an indexed archive that
occupies 1.8M, where the top level contains 256 1st level keys, and where the
largest 1st level dir contains 110 entries.
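
To make the difference concrete, here is a small sketch of the two
quantities involved. fanout_dirs() and fanout_path() are hypothetical
helpers, and using the remainder of the key as the file name is an
assumption about the layout, not something measured above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Number of distinct directory names for a prefix of n hex digits. */
unsigned long fanout_dirs(unsigned n)
{
    unsigned long dirs = 1;

    while (n--)
        dirs *= 16;
    return dirs;
}

/*
 * Writes "<first prefix_len digits>/<rest of key>" into out.
 * Returns 0 on success, -1 if the key is too short or out too small.
 */
int fanout_path(const char *hexkey, size_t prefix_len,
                char *out, size_t outsz)
{
    size_t len;
    int n;

    assert(hexkey != NULL && out != NULL);
    len = strlen(hexkey);
    if (prefix_len == 0 || prefix_len >= len)
        return -1;
    n = snprintf(out, outsz, "%.*s/%s",
                 (int)prefix_len, hexkey, hexkey + prefix_len);
    return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```

A 2-digit prefix allows 256 directories, a 4-digit prefix 65536. With
roughly 22000 keys, the second case leaves most keys in a directory of
their own, so the per-directory cost dominates.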

This experiment was performed on an ext3 file system.

Cheers,

--Denys



Re: [PATCH] Some documentation...

2005-04-20 Thread David Greaves
C. Scott Ananian wrote:
On Wed, 20 Apr 2005, David Greaves wrote:
In doing this I noticed a couple of points:
* update-cache won't accept ./file or fred/./file

The comment in update-cache.c reads:
/*
 * We fundamentally don't like some paths: we don't want
 * dot or dot-dot anywhere, and in fact, we don't even want
 * any other dot-files (.git or anything else). They
 * are hidden, for chist sake.
 *
 * Also, we don't want double slashes or slashes at the
 * end that can make pathnames ambiguous.
 */
It could be argued that './' is a special case... but at the moment this 
is definitely a designed 'feature' not a 'bug'.
Indeed - I've been reading the code to document it as correctly as possible.
But I actually found this by running:
  find . -type f | xargs git add
for a new project - so I'd class it as user unfriendly...
Yes, I know how to get round it :)
I have ensured that my next perl version of gitadd.pl (that I submitted 
to Petr) doesn't allow these files to be added - and it could even 
cleanse leading ./ and any /./ constructs.

So maybe it's left as documented behaviour and higher level tools must 
manage the data they feed to it...
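
For illustration only, the kind of cleansing a higher level tool could
do before handing paths over might look like this in C. It is not the
gitadd.pl code, and strip_dot_components() is a hypothetical name.

```c
#include <assert.h>
#include <string.h>

/*
 * Rewrites path in place: drops a leading "./" and collapses every
 * "/./" into "/". Dot-files such as ".git" are left untouched.
 * Returns path.
 */
char *strip_dot_components(char *path)
{
    char *src = path, *dst = path;
    int at_component_start = 1;

    assert(path != NULL);
    while (*src) {
        if (at_component_start && src[0] == '.' && src[1] == '/') {
            src += 2;                   /* skip a "./" component */
            continue;
        }
        at_component_start = (*src == '/');
        *dst++ = *src++;
    }
    *dst = '\0';
    return path;
}
```

This turns "./file" into "file" and "fred/./file" into "fred/file",
which is the output "find . -type f" needs before it can be fed on.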

I hope it's useful to raise these niggles now before changing them is 
too hard.

David
--


Blob chunking code. [Second look]

2005-04-20 Thread C. Scott Ananian
Here's a quick rev of the chunking code.  This is compatible with 
git-current, where the hashes are of the *uncompressed* file.
The 'chunk' file gets dropped in at the same SHA1 filename as the
'blob' file, as it represents identical contents.  Martin won't like
this (because of how the hash is computed), but this is the short-term
direction I want to pursue to validate the concept: it means I can
run a simple converter over all the blob objects and don't have to
rewrite tree and commit objects.

If the approach is seen to have merit, then we can perhaps think about 
doing another bulk repository format conversion where all the hashes
change.  But (IMO) it's a little early to be thinking of this yet.
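
For readers who have not seen content-defined chunking before, here is
a toy sketch of how a rolling window can pick chunk boundaries. It is
not the code in chunk.c: the polynomial hash, BASE, TARGET and
find_boundaries() are all assumptions made for illustration, and only
the 30-byte window is taken from the comments in the file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WINDOW 30   /* window size, as in the chunk.c comments */
#define TARGET 61   /* toy average chunk size (assumption) */
#define BASE 257u   /* multiplier of the polynomial hash (assumption) */

/*
 * Stores every offset at which a chunk would end (at most max_out of
 * them) and returns how many were stored. A boundary depends only on
 * the WINDOW bytes before it, so an edit early in a file does not
 * move the boundaries that come after it relative to the content.
 */
size_t find_boundaries(const unsigned char *buf, size_t len,
                       size_t *out, size_t max_out)
{
    uint32_t top = 1;   /* BASE^(WINDOW-1), modulo 2^32 */
    uint32_t h = 0;     /* hash of the last WINDOW bytes */
    size_t n = 0, i;

    assert(buf != NULL && out != NULL);
    for (i = 0; i < WINDOW - 1; i++)
        top *= BASE;
    for (i = 0; i < len; i++) {
        if (i >= WINDOW)
            h -= top * buf[i - WINDOW];   /* drop the byte that left */
        h = h * BASE + buf[i];            /* shift in the newest byte */
        if (i + 1 >= WINDOW && h % TARGET == TARGET - 1 && n < max_out)
            out[n++] = i + 1;             /* chunk ends after byte i */
    }
    return n;
}
```

Inserting one byte at the front of a buffer shifts every boundary by
one instead of moving them to new places, so the later chunks keep
their content and would keep their hashes.
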
 --scott

nuclear RUCKUS KUPALM ODACID LA STANDEL Mossad LITEMPO atomic mail drop 
Hussein JUBILIST class struggle SSBN 731 Bush quiche Nazi MKULTRA
 ( http://cscott.net/ )
-  chunk.c --
/*
 * This file implements a treap-based chunked content store.  The
 * idea is that every stored file is broken down into tree-structured
 * chunks (that is, every chunk has an optional 'prefix' and 'suffix'
 * chunk), and these chunks are put in the object store.  This way
 * similar files will be expected to share chunks, saving space.
 * Files less than one disk block long are expected to fit in a single
 * chunk, so there is no extra indirection overhead for this case.
 *
 * Copyright (C) 2005 C. Scott Ananian [EMAIL PROTECTED]
 */

/*
 * We assume that the file and the chunk information all fits in memory.
 * A slightly more-clever implementation would work even if the file
 * didn't fit.  Basically, we could scan it and keep the
 * 'N' lowest heap keys (chunk hashes), where 'N' is chosen to fit
 * comfortably in memory.  These would form the root and top
 * of the resulting treap, constructing it top-down.  Then we'd scan
 * again and only keep the next 'N' lowest heap keys, etc.
 *
 * But we're going to keep things simple.  We do try to maintain locality
 * where possible, so if you need to swap things still shouldn't be too bad.
 */
#include <assert.h>
#include <stdlib.h>
#include "cache.h"
#include "chunk.h"
typedef unsigned long ch_size_t;
/* Our magic numbers: these can be tuned without breaking files already
 * in the archive, although space re-use is only expected between files which
 * have these constants set to the same values. */
/* The window size determines how much context we use when looking for a
 * chunk boundary.
 * C source has approx 5 bits per character of entropy.
 * We'd like to get 32 bits of good entropy into our boundary checksum;
 * that means 7 bytes is a rough minimum for the window size.
 * 30 bytes is what 'rsyncable zlib' uses; that should be fine. */
#define ROLLING_WINDOW 30
/* The ideal chunk size will fit most chunks into a disk block.  A typical
 * disk block size is 4k, and we expect (say) 50% compression. */
#define CHUNK_SIZE 7901 /* primes are nice to use */
/* Data structures: */
struct chunk {
    /* a chunk represents some range of the underlying file */
    ch_size_t start /* inclusive */, end /*exclusive*/;
    unsigned char sha1[20]; /* sha1 for this chunk; used as the heap key */
};
struct chunklist {
    /* a dynamically-sized list of chunks */
    struct chunk *chunk; /* an array of chunks */
    ch_size_t num_items; /* how many items are currently in the list */
    ch_size_t allocd;    /* how many items we've allocated space for */
};
struct treap {
    /* A treap node represents a run of consecutive chunks. */
    /* the start and end of the run: */
    ch_size_t start /* inclusive */, end /*exclusive*/;
    struct chunk *chunk; /* some chunk in the run. */
    /* treaps representing the run before 'chunk' (left) and
     * after 'chunk' (right).  */
    struct treap *left, *right;
    /* sha1 for the run represented by this treap */
    unsigned char sha1[20];
};
static struct chunklist *
create_chunklist(int expected_items) {
    struct chunklist *cl = malloc(sizeof(*cl));
    cl->num_items = 0;
    cl->allocd = expected_items;
    cl->chunk = malloc(sizeof(cl->chunk[0]) * cl->allocd);
    return cl;
}
static void
free_chunklist(struct chunklist *cl) {
    free(cl->chunk);
    free(cl);
}
/* Add a chunk to the chunk list, calculating its SHA1 in the process. */
/* The chunk includes buf[start] to buf[end-1].*/
static void
add_chunk(struct chunklist *cl, char *buf, ch_size_t start, ch_size_t end) {
    struct chunk *ch;
    SHA_CTX c;
    assert(start<end); assert(cl); assert(buf);
    if (cl->num_items >= cl->allocd) {
        cl->allocd = cl->allocd*3/2;
        cl->chunk = realloc(cl->chunk, cl->allocd * sizeof(*(cl->chunk)));
    }
    assert(cl->num_items < cl->allocd);
    ch = cl->chunk + (cl->num_items++);
    ch->start = start;
    ch->end = end;
    /* compute SHA-1 of the chunk. */
    SHA1_Init(&c);
    SHA1_Update(&c, buf+start, end-start);
    SHA1_Final(ch->sha1, &c);
    /* done! */
}
/* Split a buffer into chunks, using 

Re: [PATCH] write-tree performance problems

2005-04-20 Thread David S. Miller
On Wed, 20 Apr 2005 10:06:15 -0700 (PDT)
Linus Torvalds [EMAIL PROTECTED] wrote:

 I bet your SHA1 implementation is done with hand-optimized and scheduled
 x86 MMX code or something, while my poor G5 is probably using some slow
 generic routine. As a result, it only improved by 33% for me since the
 compression was just part of the picture, but with your cheap SHA1 the
 compression costs really dominated, and so it's almost four times faster
 for you.

The openssl tree has a i586 optimized SHA1 implementation.
A quick scan of the 0.9.7e tree I happen to have lying around
shows there aren't optimized versions for other CPUs in there, just i586.


Re: SHA1 hash safety

2005-04-20 Thread David Meybohm
On Tue, Apr 19, 2005 at 06:48:57PM -0400, C. Scott Ananian wrote:
 On Tue, 19 Apr 2005, David Meybohm wrote:
 
 But doesn't this require assuming the distribution of MD5 is uniform,
 and don't the papers finding collisions in less show it's not? So, your
 birthday-argument for calculating the probability wouldn't apply, because
 it rests on the assumption MD5 is uniform, and it isn't.
 
 No, the collision papers don't show this at all.

I didn't mean they showed it directly. I meant by finding collisions in
MD5 quickly, MD5 would have to have some non-uniformity. But that's
nevertheless wrong because uniformness and collision finding ability
aren't related. Sorry to have wasted everyone's time.

Dave


Re: [PATCH] write-tree performance problems

2005-04-20 Thread Linus Torvalds


On Wed, 20 Apr 2005, Chris Mason wrote:
 
 Well, the difference there should be pretty hard to see with any benchmark.
 But I was being lazy...new patch attached.  This one gets the same perf 
 numbers, if this is still wrong then I really need some more coffee.

I did my preferred version. Makes a big difference here too.

It would be nicer for the cache to make the index file header be a 
footer, and write it out last - that way we'd be able to do the SHA1 as 
we write rather than doing a two-pass thing. That's for another time.

Linus


Re: git-viz tool for visualising commit trees

2005-04-20 Thread Petr Baudis
Dear diary, on Wed, Apr 20, 2005 at 12:08:24PM CEST, I got a letter
where Ingo Molnar [EMAIL PROTECTED] told me that...
 * Petr Baudis [EMAIL PROTECTED] wrote:
just FYI, Olivier Andrieu was kind enough to port his monotone-viz 
  tool to git (http://oandrieu.nerim.net/monotone-viz/ - use the one 
  from the monotone repository). The tool visualizes the history flow 
  nicely; see
  for some screenshots.
 
 really nice stuff! Any plans to include it in git-pasky, via 'git gui' 
 option or so? Also, which particular version has this included - the 
 freshest tarball on the monotone-viz download site doesnt seem to 
 include it.

AFAIK you need Monotone and grab it from the monotone repository.

git gui sounds interesting, but perhaps on a longer horizon, and perhaps
not as an integral part of git-pasky. I don't know ocaml and it's a rather
large thing.

Point'n'drag merges, anyone? ;-))

-- 
Petr Pasky Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor


[PATCH] gittrack.sh accepts invalid branch names

2005-04-20 Thread Pavel Roskin
Hello, Petr and everybody!

gittrack.sh allows abbreviated branch names, e.g. it's possible to run
"git track lin" when there is a branch called "linus".

I believe it's a bug, not a feature.  Please look at this line from
gittrack.sh:

grep -q $(echo -e "^$name\t" | sed 's/\./\\./g') .git/remotes

The result of command expansion is subjected to word splitting, which
means the trailing tab is removed as a space.  So grep doesn't see the
tab.

The way to avoid word splitting would be to quote $(), but it would
make the shell code too hairy.  I'm not even sure all shells would
interpret "$("$name")" correctly.

So I decided to use tab directly in the sed expression.  I cannot think
of any portable way to avoid grep completely (q is a GNU sed
extension, and we want to support BSD, I think), so it's still there,
looking for any output from sed.

Signed-off-by: Pavel Roskin [EMAIL PROTECTED]

--- a/gittrack.sh
+++ b/gittrack.sh
@@ -35,7 +35,7 @@ die () {
 mkdir -p .git/heads
 
 if [ "$name" ]; then
-   grep -q $(echo -e "^$name\t" | sed 's/\./\\./g') .git/remotes || \
+   sed -ne "/^$name\t/p" .git/remotes | grep -q . || \
        [ -s ".git/heads/$name" ] || \
        die "unknown branch \"$name\""
 

-- 
Regards,
Pavel Roskin



Re: git-viz tool for visualising commit trees

2005-04-20 Thread Olivier Andrieu
  Ingo Molnar [EMAIL PROTECTED] [Wed, 20 Apr 2005]:
  
  * Petr Baudis [EMAIL PROTECTED] wrote:
  
 Hi,
   
 just FYI, Olivier Andrieu was kind enough to port his monotone-viz 
   tool to git (http://oandrieu.nerim.net/monotone-viz/ - use the one 
   from the monotone repository). The tool visualizes the history flow 
   nicely; see
   
  http://rover.dkm.cz/~pasky/gitviz1.png
  http://rover.dkm.cz/~pasky/gitviz2.png
  http://rover.dkm.cz/~pasky/gitviz3.png
  http://rover.dkm.cz/~pasky/gitviz4.png
  http://rover.dkm.cz/~pasky/gitviz5.png
  http://rover.dkm.cz/~pasky/gitviz6.png
  http://rover.dkm.cz/~pasky/gitviz7.png
   
   for some screenshots.
  
  really nice stuff! Any plans to include it in git-pasky, via 'git gui' 
  option or so? Also, which particular version has this included - the 
  freshest tarball on the monotone-viz download site doesn't seem to 
  include it.

I'll post a tarball soon. You can also get it from the monotone
repository, but I wouldn't recommend it unless you want to try
monotone as well: that involves a rather large download.

-- 
   Olivier


Re: Change pull to _only_ download, and git update=pull+merge?

2005-04-20 Thread Petr Baudis
Dear diary, on Wed, Apr 20, 2005 at 09:01:57AM CEST, I got a letter
where Ingo Molnar [EMAIL PROTECTED] told me that...
  [...]
  fatal: unable to execute 'gitmerge-file.sh'
  fatal: merge program failed

Pure stupidity of mine, I forgot to add gitmerge-file.sh to the list of
scripts which get installed.

 another thing: it's confusing that during 'git pull', the rsync output 
 is not visible. Especially during large rsyncs, it would be nice to see 
 some progress. So i usually use a raw rsync not 'git pull', due to this.

Fixed. For further reference, you can also set RSYNC_FLAGS and put
whatever pleases you there.

 yet another thing: what is the canonical 'pasky way' of simply nuking 
 the current files and checking out the latest tree (according to 
 .git/HEAD). Right now i'm using a script to:
 
   read-tree $(tree-id $(cat .git/HEAD))
   checkout-cache -a
 
 (i first do an 'rm -f *' in the working directory)
 
 i guess there's an existing command for this already?

git cancel

-- 
Petr Pasky Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor


Re: [PATCH] Some documentation...

2005-04-20 Thread Linus Torvalds


On Wed, 20 Apr 2005, David Greaves wrote:
 
 So maybe it's left as documented behaviour and higher level tools must 
 manage the data they feed to it...

That was the plan.

I agree that "find . -type f | xargs update-cache --add --" in _theory_ is
a nice thing to do. But in practice, you want to make sure that find 
doesn't include the .git directory and that we always use the canonical 
names for all files etc etc.

I could do it in the low-level tools (ie do pathname cleanup there), and
indeed I did exactly that in the original code sequence. However, it very
quickly became obvious that the low-level code really doesn't want to
care, and that it's a lot easier to just do it at a higher level when 
necessary.

For example, if you have to add a sed-script or something that just 
removes '^./' and ^.git/, then that's trivial to do, and it leaves the 
core tools with a very clear agenda in life.
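
A filter of the kind described above could look roughly like this. It is
a sketch only: the function name canonical_paths and the scratch-directory
layout are made up for illustration, and no such helper is claimed to
exist in the tree.

```shell
#!/bin/sh
# Strip a leading "./" from each path and drop anything under ".git/".
canonical_paths() {
    sed -e 's|^\./||' -e '/^\.git\//d'
}

# Demo in a scratch directory (the layout is invented for this example).
work=$(mktemp -d)
mkdir -p "$work/.git/objects" "$work/src"
: > "$work/.git/HEAD"
: > "$work/Makefile"
: > "$work/src/hello.c"

listing=$(cd "$work" && find . -type f | canonical_paths | LC_ALL=C sort)
echo "$listing"
rm -rf "$work"
```

The cleaned list is what a higher-level tool would then hand to
"xargs update-cache --add --", leaving the core tool free of path cleanup.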

Linus


Re: Change pull to _only_ download, and git update=pull+merge?

2005-04-20 Thread Ingo Molnar

* Petr Baudis [EMAIL PROTECTED] wrote:

 Dear diary, on Wed, Apr 20, 2005 at 09:01:57AM CEST, I got a letter
 where Ingo Molnar [EMAIL PROTECTED] told me that...
   [...]
   fatal: unable to execute 'gitmerge-file.sh'
   fatal: merge program failed
 
 Pure stupidity of mine, I forgot to add gitmerge-file.sh to the list of
 scripts which get installed.

another thing is this annoying message:

 rsync: link_stat /linux/kernel/people/torvalds/git.git/tags (in pub) 
 failed: No such file or directory (2)
 rsync error: some files could not be transferred (code 23) at 
 main.c(812)
 client: nothing to do: perhaps you need to specify some filenames or 
 the --recursive option?

you said before that it's harmless, but it's annoying nevertheless as 
one doesn't know for sure whether the pull went fine.

Ingo


[ANNOUNCE] git-pasky-0.6.2 heads-up on upcoming changes

2005-04-20 Thread Petr Baudis
  Hello,

  so I've released git-pasky-0.6.2 (my SCMish layer on top of Linus
Torvalds' git tree history storage system), find it at the usual

http://pasky.or.cz/~pasky/dev/git/

  git-pasky-0.6 has a couple of big changes; mainly enhanced git diff,
git patch (to be renamed to cg mkpatch), enhanced git pull and
completely reworked git merge - it now uses the git-core facilities for
merging, and does the merges in-tree. Plenty of smaller stuff, some
bugfixes and some new bugs, and of course regular merging with Linus.

  The most important change for current users is the objects database
SHA1 keys change and (comparatively minor) directory cache format
change. This makes pulling up from older revisions rather difficult.
Linus' instructions _should_ work for you too, basically (you should
replace cat .git/HEAD with cat .git/heads/* or equivalent - note that
convert-tree does not accept multiple arguments so you need to invoke it
multiple times), but I didn't test it well (I did it the lowlevel way
completely since I needed to simultaneously merge with Linus).

  But if you can't be bothered with this or fear touching stuff like that,
and you do not have any local commits in your tree (it would be pretty
strange if you had and still feared), just fetch the tarball (which I
prefer to git init since it eats up a _significantly_
smaller portion of my bandwidth).

  I had to release git-pasky-0.6.1 since Linus changed the directory
cache format while I was releasing git-pasky-0.6. And git-pasky-0.6.2
fixes the gitmerge-file.sh script missing from the list of scripts to
install.


  So, now for the heads-up part. We will undergo at least two major
changes now. First, I'll probably make git-pasky use the directory
cache for the add/rm queues now that we have diff-cache.

  Second, I've decided to straighten up the naming now that we still
have a chance. There will be no git-pasky-0.7, sorry. You'll get
cogito-0.7 instead. I decided on this since, after some consideration,
having it named differently is the right thing (tm).

  The short command version will change from 'git' to 'cg', which should
be shorter to type and free the 'git' command for possible eventual
entry gate for the git commands (so that they are more
namespace-friendly, and it might make most sense anyway if we get fully
libgitized; but this is more of long-term ideas).

  The usage changes:

  cg patch -> cg mkpatch   ('patch' is the program which _applies_ it)
  cg apply -> cg patch     (analogous to diff | patch)

  cg pull will now always only pull, never merge.

  cg update will do pull + merge.

  cg track will either just set the default for cg update if you pass it
no parameters, or disappear altogether; I think it could default to the
'origin' branch (or 'master' branch for non-master branches if no 'origin'
branch is around), and I'd rather set up some cg admin where you could
set all this stuff - from this to e.g. the committer details [*1*]. You
likely don't need to change the default every day.

  I must say that I'm pretty happy with Cogito's command set
otherwise, though. I actually think it has now (almost?) all commands
it needs, and it is not too likely that (many) more will be added -
simple means easy to use, which is Cogito's goal. Compare with
the command set of GNU arch clones. ;-)


  [*1*] The committer details in .git would override the environment
variables to discourage people from trying to alter them based on
whatever, since that's not what they are supposed to do. They can always
just change the .git stuff if they _really_ need to.


  Comments welcome, as well as new ideas. Persuading me to change what
I sketched here will need some good arguments, though. ;-)

  Thanks,

-- 
Petr Pasky Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor


Re: [ANNOUNCE] git-pasky-0.6.2 heads-up on upcoming changes

2005-04-20 Thread Petr Baudis
Dear diary, on Wed, Apr 20, 2005 at 10:56:33PM CEST, I got a letter
where Petr Baudis [EMAIL PROTECTED] told me that...
   cg pull will now always only pull, never merge.
 
   cg update will do pull + merge.

Note that what you will probably do _most_ by far is cg update.
You generally do cg pull only when you want to make sure you have the
latest and greatest when doing some cg diff or whatever, or on your
notebook when getting on an airplane. And you do direct cg merge generally
only on the airplane.

I also forgot one last usage change:

  cg fork BNAME BRANCH_DIR [COMMIT_ID]
  ->
  cg fork BRANCH_DIR [BNAME] [COMMIT_ID]

This will bring its usage in sync with both cg export and cg tag.
The branch name will also default to the last element in the
BRANCH_DIR path (having to write the same thing twice on a single
line annoyed me a lot).
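
The defaulting rule for the branch name can be sketched in shell as
follows. The helper name fork_bname and the example paths are invented
for illustration; how cg fork will actually implement this is not shown
in this message.

```shell
#!/bin/sh
# Hypothetical helper: choose the branch name for "cg fork BRANCH_DIR [BNAME]".
# If BNAME is omitted, fall back to the last path element of BRANCH_DIR.
fork_bname() {   # fork_bname BRANCH_DIR [BNAME]
    if [ -n "${2:-}" ]; then
        printf '%s\n' "$2"
    else
        basename "$1"
    fi
}

fork_bname /home/me/src/git-experimental
fork_bname /home/me/src/git-experimental testing
```

The first call prints the directory's last element, the second prints the
explicitly given name.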

-- 
Petr Pasky Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor


Re: Change pull to _only_ download, and git update=pull+merge?

2005-04-20 Thread Petr Baudis
Dear diary, on Wed, Apr 20, 2005 at 10:32:35PM CEST, I got a letter
where Ingo Molnar [EMAIL PROTECTED] told me that...
 
 * Petr Baudis [EMAIL PROTECTED] wrote:
 
   yet another thing: what is the canonical 'pasky way' of simply nuking 
   the current files and checking out the latest tree (according to 
   .git/HEAD). Right now i'm using a script to:
   
 read-tree $(tree-id $(cat .git/HEAD))
 checkout-cache -a
   
   (i first do an 'rm -f *' in the working directory)
   
   i guess there's an existing command for this already?
  
  git cancel
 
 hm, that's a pretty unintuitive name though. How about making it 'git 
 checkout' and providing a 'git checkout -f' option to force the 
 checkout? (or something like this)

Because it does not really check out. OK, it does, but that's only a small
part of it. It just cancels whatever local changes you are making in the
tree and brings it to a consistent state. When you have a merge in progress
and, after seeing the sheer number of conflicts, you decide to take your
hands off, you just type git cancel. Basically anything done to your
tree (not only the local changes a checkout would fix, but also various git
operations, including git add/rm and git seek) can be easily undone by
git cancel.

Dear diary, on Wed, Apr 20, 2005 at 10:45:51PM CEST, I got a letter
where Ingo Molnar [EMAIL PROTECTED] told me that...
 
 * Petr Baudis [EMAIL PROTECTED] wrote:
 
  Dear diary, on Wed, Apr 20, 2005 at 09:01:57AM CEST, I got a letter
  where Ingo Molnar [EMAIL PROTECTED] told me that...
[...]
fatal: unable to execute 'gitmerge-file.sh'
fatal: merge program failed
  
  Pure stupidity of mine, I forgot to add gitmerge-file.sh to the list of
  scripts which get installed.
 
 another thing is this annoying message:
 
  rsync: link_stat /linux/kernel/people/torvalds/git.git/tags (in pub) 
  failed: No such file or directory (2)
  rsync error: some files could not be transferred (code 23) at 
  main.c(812)
  client: nothing to do: perhaps you need to specify some filenames or 
  the --recursive option?
 
 you said before that it's harmless, but it's annoying nevertheless as 
 one doesn't know for sure whether the pull went fine.

Already fixed. (Well, fixed... sent to /dev/null. ;-)

-- 
Petr Pasky Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor


Re: [ANNOUNCE] git-pasky-0.6.2 heads-up on upcoming changes

2005-04-20 Thread Greg KH
On Wed, Apr 20, 2005 at 10:56:33PM +0200, Petr Baudis wrote:
   The short command version will change from 'git' to 'cg', which should
 be shorter to type and free the 'git' command for possible eventual
 entry gate for the git commands (so that they are more
 namespace-friendly, and it might make most sense anyway if we get fully
 libgitized; but this is more of long-term ideas).

Hm, but there already is a 'cg' program out there:
http://uzix.org/cgvg.html
I use it every day :(

How about 'cog' instead?

Or I can just rename my local copy of cg and try to retrain my
fingers...

thanks,

greg k-h


Git hangs while executing commit-tree

2005-04-20 Thread Rhys Hardwick
Hey,

The following is a copy of the terminal session in question:

[EMAIL PROTECTED]:~/repo/tmp.repo$ ls
[EMAIL PROTECTED]:~/repo/tmp.repo$ init-db
defaulting to local storage area
[EMAIL PROTECTED]:~/repo/tmp.repo$ ls -l .git
total 4
drwxr-xr-x  258 rhys rhys 4096 2005-04-20 20:52 objects
[EMAIL PROTECTED]:~/repo/tmp.repo$ ls .git/objects/
00  0d  1a  27  34  41  4e  5b  68  75  82  8f  9c  a9  b6  c3  d0  dd  ea  f7
01  0e  1b  28  35  42  4f  5c  69  76  83  90  9d  aa  b7  c4  d1  de  eb  f8
02  0f  1c  29  36  43  50  5d  6a  77  84  91  9e  ab  b8  c5  d2  df  ec  f9
03  10  1d  2a  37  44  51  5e  6b  78  85  92  9f  ac  b9  c6  d3  e0  ed  fa
04  11  1e  2b  38  45  52  5f  6c  79  86  93  a0  ad  ba  c7  d4  e1  ee  fb
05  12  1f  2c  39  46  53  60  6d  7a  87  94  a1  ae  bb  c8  d5  e2  ef  fc
06  13  20  2d  3a  47  54  61  6e  7b  88  95  a2  af  bc  c9  d6  e3  f0  fd
07  14  21  2e  3b  48  55  62  6f  7c  89  96  a3  b0  bd  ca  d7  e4  f1  fe
08  15  22  2f  3c  49  56  63  70  7d  8a  97  a4  b1  be  cb  d8  e5  f2  ff
09  16  23  30  3d  4a  57  64  71  7e  8b  98  a5  b2  bf  cc  d9  e6  f3
0a  17  24  31  3e  4b  58  65  72  7f  8c  99  a6  b3  c0  cd  da  e7  f4
0b  18  25  32  3f  4c  59  66  73  80  8d  9a  a7  b4  c1  ce  db  e8  f5
0c  19  26  33  40  4d  5a  67  74  81  8e  9b  a8  b5  c2  cf  dc  e9  f6
[EMAIL PROTECTED]:~/repo/tmp.repo$ find . -type f
[EMAIL PROTECTED]:~/repo/tmp.repo$ mkdir src
[EMAIL PROTECTED]:~/repo/tmp.repo$ pico src/hello.c
[EMAIL PROTECTED]:~/repo/tmp.repo$ pico Makefile
[EMAIL PROTECTED]:~/repo/tmp.repo$ update-cache -add Makefile src/hello.c
fatal: unknown option -add
[EMAIL PROTECTED]:~/repo/tmp.repo$ update-cache --add Makefile src/hello.c
[EMAIL PROTECTED]:~/repo/tmp.repo$ write-tree
c80156fafbac377ab35beb076090c8320f874f91
[EMAIL PROTECTED]:~/repo/tmp.repo$ commit-tree 
c80156fafbac377ab35beb076090c8320f874f91
Committing initial tree c80156fafbac377ab35beb076090c8320f874f91
 


At this point, the command seems to be just waiting.  I have had it waiting 
for around 2 hours now!  I have tried removing ~/repo/tmp.repo and starting 
over, with exactly the same results.

I was testing git by following the tutorial posted by Tony Luck on this list.  
I updated and built the latest version of git, using git, at around 2000 GMT 
today.  I have attached the Makefile and hello.c if anyone finds them useful.

Thanks for any help,

Rhys
hello: src/hello.c
	cc -o hello -O src/hello.c

#include <stdio.h>

int main(void)
{
	printf("Hello, world!\n");
	return 0;
}


Re: Git hangs while executing commit-tree

2005-04-20 Thread Petr Baudis
Dear diary, on Wed, Apr 20, 2005 at 11:28:35PM CEST, I got a letter
where Rhys Hardwick [EMAIL PROTECTED] told me that...
 Hey,

Hi,

 [EMAIL PROTECTED]:~/repo/tmp.repo$ commit-tree 
 c80156fafbac377ab35beb076090c8320f874f91
 Committing initial tree c80156fafbac377ab35beb076090c8320f874f91
  
 
 
 At this point, the command seems to be just waiting.  I have had it waiting 
 for around 2 hours now!  I have tried removing ~/repo/tmp.repo and starting 
 over, with exactly the same results.

just type in your commit message and press ctrl-D now. ;-)

If you can't get along by peeking at the source when you get stuck, etc,
you might prefer using git-pasky (http://pasky.or.cz/~pasky/dev/git/),
which will guide you nicely.

-- 
Petr Pasky Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor


Re: Git hangs while executing commit-tree

2005-04-20 Thread Rhys Hardwick
Cheers for the help!

Rhys

On Wednesday 20 Apr 2005 22:35, Petr Baudis wrote:
 Dear diary, on Wed, Apr 20, 2005 at 11:28:35PM CEST, I got a letter
 where Rhys Hardwick [EMAIL PROTECTED] told me that...

  Hey,

 Hi,

  [EMAIL PROTECTED]:~/repo/tmp.repo$ commit-tree
  c80156fafbac377ab35beb076090c8320f874f91
  Committing initial tree c80156fafbac377ab35beb076090c8320f874f91
 
 
 
  At this point, the command seems to be just waiting.  I have had it
  waiting for around 2 hours now!  I have tried removing ~/repo/tmp.repo
  and starting over, with exactly the same results.

 just type in your commit message and press ctrl-D now. ;-)

 If you can't get along by peeking at the source when you get stuck, etc,
 you might prefer using git-pasky (http://pasky.or.cz/~pasky/dev/git/),
 which will guide you nicely.




Re: Git hangs while executing commit-tree

2005-04-20 Thread Linus Torvalds


On Wed, 20 Apr 2005, Rhys Hardwick wrote:

 [EMAIL PROTECTED]:~/repo/tmp.repo$ commit-tree  
 c80156fafbac377ab35beb076090c8320f874f91
 Committing initial tree c80156fafbac377ab35beb076090c8320f874f91
  
 At this point, the command seems to be just waiting.

That's _exactly_ what it's doing. It's waiting for you to write a commit 
message.

Something like

This is my initial commit of Hello World!
^D

will make it happy.

Alternatively, you can certainly just write your message beforehand with 
an editor and just pipe it into commit-tree.
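
The difference between typing the message and feeding it in can be shown
without commit-tree itself. In this sketch, fake_commit_tree is a made-up
stand-in that only mimics the one behaviour discussed here (reading the
message from stdin until end-of-file); its output format is illustrative,
not the real object format.

```shell
#!/bin/sh
# Hypothetical stand-in for commit-tree: reads the commit message from
# stdin until end-of-file, then prints it after the tree id.
fake_commit_tree() {   # fake_commit_tree TREE_ID
    message=$(cat)
    printf 'tree %s\n\n%s\n' "$1" "$message"
}

# Write the message ahead of time, then supply it on stdin. Because the
# file ends, the reader sees end-of-file and never waits for ^D.
msgfile=$(mktemp)
printf 'This is my initial commit of Hello World!\n' > "$msgfile"
result=$(fake_commit_tree c80156fafbac377ab35beb076090c8320f874f91 < "$msgfile")
rm -f "$msgfile"
echo "$result"
```

Run interactively with no redirection, the same function would sit and
wait exactly as described, until a message is typed and ^D is pressed.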

Linus


Re: [ANNOUNCE] git-pasky-0.6.2 heads-up on upcoming changes

2005-04-20 Thread Petr Baudis
Dear diary, on Wed, Apr 20, 2005 at 11:19:19PM CEST, I got a letter
where Greg KH [EMAIL PROTECTED] told me that...
 On Wed, Apr 20, 2005 at 10:56:33PM +0200, Petr Baudis wrote:
The short command version will change from 'git' to 'cg', which should
  be shorter to type and free the 'git' command for possible eventual
  entry gate for the git commands (so that they are more
  namespace-friendly, and it might make most sense anyway if we get fully
  libgitized; but this is more of long-term ideas).
 
 Hm, but there already is a 'cg' program out there:
   http://uzix.org/cgvg.html
 I use it every day :(
 
 How about 'cog' instead?

Grm. Cg is also name of some scary NVidia thing, and cog is GNOME
Configurator. CGT are Chimera Grid Tools, but I think we can clash
with those - at least *I* wouldn't mind. ;-)

-- 
Petr Pasky Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor


Re: [ANNOUNCE] git-pasky-0.6.2 heads-up on upcoming changes

2005-04-20 Thread Mike Taht
I keep thinking perversely that we need something as obtuse as possible
in the unix tradition, but easy to type... git requires that the fingers
move off the home row...
how about asdf or jkl?  :)
cg is singularly uncomfortable to type. I think that's why it isn't 
commonly used.

Greg KH wrote:
On Wed, Apr 20, 2005 at 10:56:33PM +0200, Petr Baudis wrote:
 The short command version will change from 'git' to 'cg', which should
be shorter to type and free the 'git' command for possible eventual
entry gate for the git commands (so that they are more
namespace-friendly, and it might make most sense anyway if we get fully
libgitized; but this is more of long-term ideas).

Hm, but there already is a 'cg' program out there:
http://uzix.org/cgvg.html
I use it every day :(
How about 'cog' instead?
Or I can just rename my local copy of cg and try to retrain my
fingers...
thanks,
greg k-h

--
Mike Taht
  New systems generate new problems.


Re: [ANNOUNCE] git-pasky-0.6.2 heads-up on upcoming changes

2005-04-20 Thread Randy.Dunlap
On Wed, 20 Apr 2005 23:51:18 +0200 Petr Baudis wrote:

| Dear diary, on Wed, Apr 20, 2005 at 11:19:19PM CEST, I got a letter
| where Greg KH [EMAIL PROTECTED] told me that...
|  On Wed, Apr 20, 2005 at 10:56:33PM +0200, Petr Baudis wrote:
| The short command version will change from 'git' to 'cg', which should
|   be shorter to type and free the 'git' command for possible eventual
|   entry gate for the git commands (so that they are more
|   namespace-friendly, and it might make most sense anyway if we get fully
|   libgitized; but this is more of long-term ideas).
|  
|  Hm, but there already is a 'cg' program out there:
|  http://uzix.org/cgvg.html
|  I use it every day :(
|  
|  How about 'cog' instead?
| 
| Grm. Cg is also name of some scary NVidia thing, and cog is GNOME
| Configurator. CGT are Chimera Grid Tools, but I think we can clash
| with those - at least *I* wouldn't mind. ;-)

I'd rather see you go back to 'tig'...

is there a tig out there?

---
~Randy


Re: [ANNOUNCE] git-pasky-0.6.2 heads-up on upcoming changes

2005-04-20 Thread Joshua T. Corbin
On 20 April 2005 17:51, Mike Taht wrote:
 I keep thinking perversely that we need something as obtuse as possible
 in the unix tradition, but easy to type... git requires that the fingers
 move off the home row...

 how about asdf or jkl?  :)

 cg is singularly uncomfortable to type. I think that's why it isn't
 commonly used.
Hmm...got to disagree, cg is perfectly comfortable to type here on my dvorak, 
whilst asdf and jkl are uncomfortable deviations across the board ;-)

-- 
Regards,
Joshua T. Corbin [EMAIL PROTECTED]


Re: [ANNOUNCE] git-pasky-0.6.2 heads-up on upcoming changes

2005-04-20 Thread Linus Torvalds


On Wed, 20 Apr 2005, Petr Baudis wrote:
 
 Grm. Cg is also name of some scary NVidia thing, and cog is GNOME
 Configurator. CGT are Chimera Grid Tools, but I think we can clash
 with those - at least *I* wouldn't mind. ;-)

I realize that there is probably a law that there has to be a space, but I 
actually personally use tab-completion all the time, and in many ways 
prefer a name that can be completed without having to play games with 
magic bash completion files.

So how about using a dash instead of a space, and making things be

cg-pull
cg-update

etc? You can link them all to the same script if you don't like having 
multiple scripts, and just match with

	case "$0" in
	*-pull)
		...
		;;
	*-update)
		...
		;;
	esac

or something.
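
A filled-in version of that dispatch might look like the sketch below.
The function name dispatch and the echoed strings are placeholders made
up for this example; a real script would call the function with its own
"$0" and do the actual work in each branch.

```shell
#!/bin/sh
# Dispatch on the name the script was invoked under. All command names
# (cg-pull, cg-update, ...) would be links to this one script.
dispatch() {   # dispatch INVOKED_NAME
    case "$1" in
    *-pull)
        echo "would pull"
        ;;
    *-update)
        echo "would pull, then merge"
        ;;
    *)
        echo "unknown command: $1" >&2
        return 1
        ;;
    esac
}

# In the real script this would be: dispatch "$0"
dispatch /usr/local/bin/cg-pull
dispatch /usr/local/bin/cg-update
```

Since every command is then an ordinary file name on the PATH, plain
filename completion finds them without any shell-specific setup.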

Yeah, yeah, it looks different from "cvs update", but dammit, wouldn't it 
be cool to just write "cg-<tab><tab>" and see the command choices? Or 
"cg-up<tab>" and get cg-update done for you..

Just because rcs/cvs/everybody-and-his-dog thinks it is cool to have a 
space there and have different meaning for flags depending on whether they 
are before the command or after the command doesn't mean that they are 
necessarily right..

Just an idea,

Linus


Re: [ANNOUNCE] git-pasky-0.6.2 heads-up on upcoming changes

2005-04-20 Thread Steven Cole
Randy.Dunlap wrote:
On Wed, 20 Apr 2005 23:51:18 +0200 Petr Baudis wrote:
| Dear diary, on Wed, Apr 20, 2005 at 11:19:19PM CEST, I got a letter
| where Greg KH [EMAIL PROTECTED] told me that...
|  On Wed, Apr 20, 2005 at 10:56:33PM +0200, Petr Baudis wrote:
| The short command version will change from 'git' to 'cg', which should
|   be shorter to type and free the 'git' command for possible eventual
|   entry gate for the git commands (so that they are more
|   namespace-friendly, and it might make most sense anyway if we get fully
|   libgitized; but this is more of long-term ideas).
|  
|  Hm, but there already is a 'cg' program out there:
|  	http://uzix.org/cgvg.html
|  I use it every day :(
|  
|  How about 'cog' instead?
| 
| Grm. Cg is also name of some scary NVidia thing, and cog is GNOME
| Configurator. CGT are Chimera Grid Tools, but I think we can clash
| with those - at least *I* wouldn't mind. ;-)

I'd rather see you go back to 'tig'...
is there a tig out there?
---
~Randy

Since I was the one who came up with the cogito name, I'll suggest
some alternatives if cogito is unworkable.  This was posted once before,
mostly as a joke, but here goes.

agitato   ag         Since Beethoven's Moonlight 3rd mvmt is Presto agitato
                     and very, very fast, just like git.
legit     le or lg   Since git is GPLv2, it's now legit.

Steven


Re: Git hangs while executing commit-tree

2005-04-20 Thread David Greaves
Linus Torvalds wrote:
On Wed, 20 Apr 2005, Rhys Hardwick wrote:
[EMAIL PROTECTED]:~/repo/tmp.repo$ commit-tree  
c80156fafbac377ab35beb076090c8320f874f91
Committing initial tree c80156fafbac377ab35beb076090c8320f874f91
At this point, the command seems to be just waiting.

That's _exactly_ what it's doing. It's waiting for you to write a commit 
message.

Something like
This is my initial commit of Hello World!
^D
will make it happy.
Alternatively, you can certainly just write your message beforehand with 
an editor and just pipe it into commit-tree.

			Linus

When someone commits the docs I'll submit the next patch for the README:

commit-tree
	commit-tree <sha1> [-p <parent sha1>...] < <changelog>

Creates a new commit object based on the provided tree object and
emits the new commit object id on stdout. If no parent is given then
it is considered to be an initial tree.

A commit comment is read from stdin (max 999 chars)

A commit object usually has 1 parent (a commit after a change) or 2
parents (a merge) although there is no reason it cannot have more than
2 parents.

While a tree represents a particular directory state of a working
directory, a commit represents that state in time, and explains how
to get there.

Normally a commit would identify a new HEAD state, and while git
doesn't care where you save the note about that state, in practice we
tend to just write the result to the file .git/HEAD, so that we can
always see what the last committed state was.

Options

<sha1>
	An existing tree object

-p <parent sha1>
	Each -p indicates the id of a parent commit object.

Commit Information

A commit encapsulates:
	all parent object ids
	author name, email and date
	committer name and email and the commit time.

If not provided, commit-tree uses your name, hostname and domain to
provide author and committer info. This can be overridden using the
following environment variables.
	AUTHOR_NAME
	AUTHOR_EMAIL
	AUTHOR_DATE
	COMMIT_AUTHOR_NAME
	COMMIT_AUTHOR_EMAIL
(nb <, > and CRs are stripped)

see also: write-tree
David
--


Re: WARNING! Object DB conversion (was Re: [PATCH] write-tree performance problems)

2005-04-20 Thread David Woodhouse
On Wed, 2005-04-20 at 07:59 -0700, Linus Torvalds wrote:
 external-parent commit-hash external-parent-ID
 comment for this parent
 
 and the nice thing about that is that now that information allows you to 
 add external parents at any point. 
 
 Why do it like this? First off, I think that the initial import ends up
 being just one special case of the much more _generic_ issue of having
 patches come in from other source control systems 

This isn't about patches coming in from other systems -- it's about
_history_, and the fact that it's imported from another system is just
an implementation detail. It's git history now, and what we have here is
just a special case of wanting to prune ancient git history to keep the
size of our working trees down. You refer to this yourself...

 Secondly, we do need something like this for pruning off history anyway, 
 so that the tools have a better way of saying history has been pruned 
 off than just hitting a missing commit. 

Having a more explicit way of saying history is pruned than just a
reference to a missing commit is a reasonable request -- but I really
don't see how we can do that by changing the now-oldest commit object to
contain an 'external-parent' field. Doing that would change the sha1 of
the commit object in question, and then ripple through all the
subsequent commits.
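
The ripple effect can be seen in a toy model. This is a sketch under
loud assumptions: a "commit" here is just a parent id plus a payload,
cksum stands in for SHA-1 purely for portability, and the
"external-parent abc123" marker is invented. None of this is the real
object format.

```shell
#!/bin/sh
# Toy commit id: a checksum over the parent id and the payload.
toy_id() {   # toy_id PARENT PAYLOAD
    printf 'parent %s\n%s\n' "$1" "$2" | cksum | cut -d' ' -f1
}

# Build a three-commit chain on top of whatever the oldest commit records
# as its parent, and print the id of the newest commit.
chain_tip() {   # chain_tip OLDEST_PARENT_FIELD
    c1=$(toy_id "$1" "first commit")
    c2=$(toy_id "$c1" "second commit")
    toy_id "$c2" "third commit"
}

tip_plain=$(chain_tip "none")
tip_marked=$(chain_tip "external-parent abc123")
echo "tip without marker: $tip_plain"
echo "tip with marker:    $tip_marked"
```

Only the oldest commit's content differs between the two runs, yet the
newest id changes too, because each id is computed over its parent's id.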

Come this time next year, if I decide I want to prune anything older
than 2.6.40 from all the trees on my laptop, it has to happen _without_
changing the commit objects which occur after my arbitrarily-chosen
cutoff point.

If we want to have an explicit record of pruning rather than just
coping with a missing object, then I think we'd need to do it with an
external note to say "It's OK that commit XXX is missing".

 Thirdly, I don't actually want my new tree to depend on a conversion of
 the old BK tree.
 
 Two reasons: if it's a really full conversion, there are definitely going
 to be issues with BitMover. They do not want people to try to reverse
 engineer how they do namespace merges

Don't think of it as a conversion of the old BK tree. It's just an
import of Linux's development history. This isn't going to help
reverse-engineer how BK does merges; it's just our own revision history.
I'm not sure exactly how Thomas is extracting it, but AIUI it's all
obtainable from the SCCS files anyway without actually resorting to
using BK itself. 

There's nothing here for Larry to worry about. It's not as if we're
actually using BK to develop git by observing BK's behaviour w.r.t
merges and trying to emulate it. Besides -- if we wanted to do that,
we'd need to use the _BK_ version of the tree; the git version wouldn't
help us much anyway.

And given that BK's merges are based on individual files and we're not
going that route with git, it's not clear how much we could lift
directly from BK even if we _were_ going to try that.

 The other reason is just the really obvious one: in the last week, I've
 already changed the format _twice_ in ways that change the hash. As long
 as it's 119MB of data, it's not going to be too nasty to do again.

That's fine. But by the time we settle on a format and actually start
using it in anger, it'd be good to be sure that it _is_ possible to
track development from current trees all the way back -- be that with
explicit reference to pruned history as you suggest, or with absent
parents as I still prefer.

 it's not that it's necessarily the wrong thing to do, but I think it
 is the wrong thing to do _now_.

OK, time for us to keep arguing over the implementation details of how
we prune history then :)

-- 
dwmw2



Re: [Gnu-arch-users] Re: [GNU-arch-dev] [ANNOUNCEMENT] /Arch/ embraces `git'

2005-04-20 Thread Tomas Mraz
On Wed, 2005-04-20 at 19:15 +0200, [EMAIL PROTECTED] wrote:
...
 As data, I used my /usr/src/linux which uses 301M and contains 20753 files and
 1389 directories.  To compute the key for a directory, I considered that its
 contents were a mapping from names to keys.
I suppose if you used the blob archive for storing many revisions the
number of stored blobs would be much higher. However, even then we can
estimate that the maximum number of stored blobs will be on the order of
millions.

 When constructing the indexed archive, I actually stored empty files instead of
 blobs because I am only interested in overhead.
 
 Using your suggested indexing method that uses [0:4] as the 1st level key and
 [0:3]
 [4:8] as the 2nd level key, I obtain an indexed archive that occupies 159M,
 where the top level contains 18665 1st level keys, the largest first level dir
 contains 5 entries, and all 2nd level dirs contain exactly 1 entry.
Yes, it really doesn't make much sense to have such big keys on the
directories. If we assume that SHA1 is a really good hash function, so
that every hash value is equally probable, this would allow storing
2^16 * 2^16 * 2^16 blobs with approximately the same directory usage.

 Using Linus suggested 1 level [0:2] indexing, I obtain an indexed archive that
[0:1] I suppose
 occupies 1.8M, where the top level contains 256 1st level keys, and where the
 largest 1st level dir contains 110 entries.
The question is how many entries per directory is the optimal compromise
between space and the speed of access to its files.

If we suppose the maximum number of stored blobs to be on the order of
millions, the optimal indexing would probably be 1 level [0:2] indexing or 2
levels [0:1] [2:3]. However, it would be necessary to do some
benchmarking first before setting this in stone.
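
The indexing schemes compared above differ only in how many leading hex
digits of the SHA1 become directory names. A minimal sketch of that mapping
follows. It assumes the [a:b] ranges are inclusive character positions, so
that [0:1] means two hex digits (256 directories). The function name
fanout_path(), the widths[] parameter and the "objects" root are
illustrative and do not come from any code posted in the thread.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "<root>/<level 1>/.../<remaining hex digits>" from a 40-character
 * hex SHA1.  widths[i] is the number of hex digits used for directory
 * level i (hypothetical parameter for this sketch).
 * Returns 0 on success, -1 on malformed input or a too-small buffer.
 */
int fanout_path(const char *root, const char *hex,
		const int *widths, int nlevels,
		char *out, size_t outlen)
{
	size_t used;
	int i, n, pos = 0;

	assert(root && hex && out);
	if (strlen(hex) != 40)
		return -1;
	n = snprintf(out, outlen, "%s", root);
	if (n < 0 || (size_t)n >= outlen)
		return -1;
	used = (size_t)n;
	for (i = 0; i < nlevels; i++) {
		/* always leave at least one digit for the file name */
		if (widths[i] <= 0 || pos + widths[i] >= 40)
			return -1;
		n = snprintf(out + used, outlen - used, "/%.*s",
			     widths[i], hex + pos);
		if (n < 0 || (size_t)n >= outlen - used)
			return -1;
		used += (size_t)n;
		pos += widths[i];
	}
	/* whatever is left of the hash becomes the file name */
	n = snprintf(out + used, outlen - used, "/%s", hex + pos);
	if (n < 0 || (size_t)n >= outlen - used)
		return -1;
	return 0;
}
```

With widths {2} this yields the one-level [0:1] layout; with {2, 2} it
yields the two-level [0:1] [2:3] layout. Which one performs better on a
given filesystem is exactly the open benchmarking question.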

-- 
Tomas Mraz [EMAIL PROTECTED]



Re: [ANNOUNCEMENT] /Arch/ embraces `git'

2005-04-20 Thread C. Scott Ananian
On Wed, 20 Apr 2005, Petr Baudis wrote:
I think one thing git's objects database is not very well suited for are
network transports. You want to have something smart doing the
transports, comparing trees so that it can do some delta compression;
that could probably reduce the amount of data needed to be sent
significantly.
I'm hoping my 'chunking' patches will fix this.  This ought to reduce the 
size of the object store by (in effect) doing delta compression; rsync
will then Do The Right Thing and only transfer the needed deltas.
Running some benchmarks right now to see how well it lives up to this 
promise...
 --scott

terrorist AEROPLANE munitions PAPERCLIP MI5 Morwenstow WSHOOFS CABOUNCE 
colonel Yakima AES MI6 nuclear NSA Cocaine Columbia plastique LICOZY
 ( http://cscott.net/ )


chunking (Re: [ANNOUNCEMENT] /Arch/ embraces `git')

2005-04-20 Thread Linus Torvalds


On Wed, 20 Apr 2005, C. Scott Ananian wrote:
 
 I'm hoping my 'chunking' patches will fix this.  This ought to reduce the 
 size of the object store by (in effect) doing delta compression; rsync
 will then Do The Right Thing and only transfer the needed deltas.
 Running some benchmarks right now to see how well it lives up to this 
 promise...

What's the disk usage results? I'm on ext3, for example, which means that
even small files invariably take up 4.125kB on disk (with the inode).

Even uncompressed, most source files tend to be small. Compressed, I'm 
seeing the median blob size being ~1.6kB in my trivial checks. That's 
blobs only, btw.

My point being that about 75% of all blobs already take up less than the
minimal amount of space that most filesystems can sanely allocate. And I'm
_not_ going to say you have to use reiserfs with git.

So the disk fragmentation really does matter. It doesn't help to make a 
file smaller than 4kB, it hurts - while that can be offset by sharing 
chunks, it might not be.
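
The arithmetic behind this point can be sketched as follows. The 4096-byte
block size matches the 4kB figure above; the 128-byte inode is an
assumption inferred from the quoted 4.125kB total (4096 + 128 = 4224
bytes). The function names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

enum { BLOCK_SIZE = 4096, INODE_SIZE = 128 };	/* assumed ext3-like values */

/* Bytes consumed on disk by one file of `len` bytes, inode included. */
size_t on_disk_bytes(size_t len)
{
	size_t blocks = (len + BLOCK_SIZE - 1) / BLOCK_SIZE;	/* round up */

	return blocks * BLOCK_SIZE + INODE_SIZE;
}

/* Total cost when the same data is split evenly into `nchunks` files. */
size_t chunked_on_disk_bytes(size_t len, size_t nchunks)
{
	size_t total = 0, i, base, extra;

	assert(nchunks > 0);
	base = len / nchunks;
	extra = len % nchunks;
	for (i = 0; i < nchunks; i++)
		total += on_disk_bytes(base + (i < extra ? 1 : 0));
	return total;
}
```

Under these assumptions a 1600-byte blob costs 4224 bytes as one file and
8448 bytes as two 800-byte chunks, so splitting only pays off when enough
of the chunks are shared between objects.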

Also, while network performance is important, so is the handshaking on
which objects to get. Lots of small objects potentially need lots of
handshaking to figure out _which_ of the objects to do.

Linus


Re: on when to checksum

2005-04-20 Thread Tom Lord

   From: Linus Torvalds [EMAIL PROTECTED]

   On Wed, 20 Apr 2005, Tom Lord wrote:

I think you have made a mistake by moving the sha1 checksum from the
zipped form to the inflated form.  Here is why:

   I'd have agreed with you (and I did, violently) if it wasn't for the
   performance issues. It makes a huge difference for write-tree, and to me,
   clearly performance _does_ matter.

   Fractions of seconds may not sound like a lot, but they add up. I work 
   with 200-patch series myself all the time, so I'm very sensitive to a 0.3 
   second difference in performance.


How many times per day do you invoke `write-tree' and why?

It takes a large multiple of `0.3s' to get me to take you seriously
on this point.

I have long harbored the suspicion that your perceived bandwidth
implies that you process a lot of patches unread or barely read --
implying that your day-to-day bitslinging could/should largely be
handled by an Arch-style patch-queue-manager (a script).

-t


Re: on when to checksum

2005-04-20 Thread Linus Torvalds


On Wed, 20 Apr 2005, Tom Lord wrote:
 
 How many times per day do you invoke `write-tree' and why?

Every single commit does a write-tree, so when I merge with Andrew, it's 
usually a series of 100-250 of them in a row.

(Actually, _usually_ it's smaller series, but it's the big series that can
be painful enough to matter).

 It takes a large multiple of `0.3s' to get me to take you seriously
 on this point.

The thing is, I don't trickle things in. That would be horribly 
inefficient for me. So I go over the patches, make a mbox, and do them all 
in one go. And then they need to happen _fast_. If it takes 20 minutes, I 
go away for coffee or something, and then if something didn't apply 
half-way through, I will have lost my context.

That's why I want things instant. Not because I have huge daily throughput 
issues, but I have huge _latency_ issues. 

I considered doing a two-level thing, where I first did the stuff in a
light-weight patch manager, and then batched things up in the background
for the real thing. But the fact is, I don't think it's needed. Not the
way git performs now. If I can apply a hundred patches in a minute or two,
I have not lost the context if it turns out that there is some silly
glitch with one of them.

Linus


Re: [ANNOUNCE] git-pasky-0.6.2 heads-up on upcoming changes

2005-04-20 Thread Greg KH
On Thu, Apr 21, 2005 at 12:28:15AM +0200, Petr Baudis wrote:
 Dear diary, on Thu, Apr 21, 2005 at 12:09:06AM CEST, I got a letter
 where Linus Torvalds [EMAIL PROTECTED] told me that...
  Yeah, yeah, it looks different from cvs update, but dammit, wouldn't it 
  be cool to just write cg-<tab><tab> and see the command choices? Or 
  cg-up<tab> and get cg-update done for you..
 
 I like this idea! :-) I guess that is in fact exactly what I have been
 looking for, and (as probably apparent from the current git-pasky
 structure) I prefer to have the scripts separated anyway.

I agree, it would solve the issue with 'cg' being overloaded, and I too
like the <tab><tab> completion idea.

thanks,

greg k-h


Possible problem with git-pasky-0.6.2 (patch: **** Only garbage was found in the patch input.)I

2005-04-20 Thread Steven Cole
After getting the latest tarball, and make, make install:

[EMAIL PROTECTED] git-pasky-0.6.2]$ git pull pasky
MOTD:  Welcome to Petr Baudis' rsync archive.
MOTD:
MOTD:  If you are pulling my git branch, please do not repeat that
MOTD:  every five minutes or so - new stuff is likely not going to
MOTD:  appear so fast, and my line is not that thick. Nothing wrong
MOTD:  with pulling every half an hour or so, of course.
MOTD:
MOTD:  Feel free to contact me at [EMAIL PROTECTED], shall you have
MOTD:  any questions or suggestions.


receiving file list ... done
2e/1f16579fdcd9cd5d242f53a3cfaad52ac5d207
3e/f49665799151ced5e03ae1d544b1d67a6b7e5b
74/b4083d67eda87d88a6f92c6c66877bba8bda8a
7f/621eae988378ee776c040a5856e873e41691e1
a2/44b27ac61489b7d7fa4246e82479897d3bb886
a3/87546d148df5718a9c53bbe0cbea441e793d98
a4/6844fcb6afef1f7a2d93f391c82f08ea31
a6/7b79e97f9db01bc270a07f3be9cda610845128
ba/4c6268d14989801b15e87cab98f6a236cc5e7f
f9/3b5e3d8a427d93e7e5125b55b17cd1a9479af9

wrote 228 bytes  read 6 bytes  6466.06 bytes/sec
total size is 1753925  speedup is 17.50

receiving file list ... done

wrote 62 bytes  read 633 bytes  198.57 bytes/sec
total size is 369  speedup is 0.53
Tree change: 
55f9d5042603fff4ddfaf4e5f004d2995286d6d3:a46844fcb6afef1f7a2d93f391c82f08ea31
*100755-100755 blob
a78cf8ccab98861ef7aecb4cb5a79e47d3a84b67-74b4083d67eda87d88a6f92c6c66877bba8bda8a
 gitcancel.sh
Tracked branch, applying changes...
Fast-forwarding 55f9d5042603fff4ddfaf4e5f004d2995286d6d3 - 
a46844fcb6afef1f7a2d93f391c82f08ea31
on top of 55f9d5042603fff4ddfaf4e5f004d2995286d6d3...
patch: **** Only garbage was found in the patch input.

This may be a harmless message, but I thought I'd bring it to your attention.

Steven


Re: [GNU-arch-dev] [ANNOUNCEMENT] /Arch/ embraces `git'

2005-04-20 Thread Tom Lord



   From: [EMAIL PROTECTED]

Thank you for your experiment.  I'm not surprised by the 
result but it is very nice to know that my expectations
are right.

I think that to a large extent you are seeing artifacts
of the questionable trade-offs that (reports tell me) the
ext* filesystems make.   With a different filesystem, the 
results would be very different.

I'm imagining a blob database containing many revisions of the linux
kernel.  It will contain millions of blobs.

It's fine that some filesystems and some blob operations work fine
on a directory with millions of files but what about other operations
on the database?   I pity the poor program that has to `readdir' through
millions of files.

That said: I may add an optional flat-directory format to my library,
just to avoid issues such as those you raise over the next couple 
years.

-t



Re: Possible problem with git-pasky-0.6.2 (patch: **** Only garbage was found in the patch input.)I

2005-04-20 Thread Petr Baudis
Dear diary, on Thu, Apr 21, 2005 at 01:06:09AM CEST, I got a letter
where Steven Cole [EMAIL PROTECTED] told me that...
 After getting the latest tarball, and make, make install:
 
 Tree change: 
 55f9d5042603fff4ddfaf4e5f004d2995286d6d3:a46844fcb6afef1f7a2d93f391c82f08ea31
 *100755-100755 blob
 a78cf8ccab98861ef7aecb4cb5a79e47d3a84b67-74b4083d67eda87d88a6f92c6c66877bba8bda8a
  gitcancel.sh
 Tracked branch, applying changes...
 Fast-forwarding 55f9d5042603fff4ddfaf4e5f004d2995286d6d3 - 
 a46844fcb6afef1f7a2d93f391c82f08ea31
 on top of 55f9d5042603fff4ddfaf4e5f004d2995286d6d3...
 patch: **** Only garbage was found in the patch input.
 
 This may be a harmless message, but I thought I'd bring it to your attention.

This _is_ weird. What does

$ git diff -r 
55f9d5042603fff4ddfaf4e5f004d2995286d6d3:a46844fcb6afef1f7a2d93f391c82f08ea3

tell you? What if you feed it to patch -p1? What if you feed it to git
apply?

Thanks,

-- 
Petr Pasky Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor


Re: [PATCH] gittrack.sh accepts invalid branch names

2005-04-20 Thread Petr Baudis
Dear diary, on Wed, Apr 20, 2005 at 09:48:30PM CEST, I got a letter
where Pavel Roskin [EMAIL PROTECTED] told me that...
 --- a/gittrack.sh
 +++ b/gittrack.sh
 @@ -35,7 +35,7 @@ die () {
  mkdir -p .git/heads
  
  if [ "$name" ]; then
 - grep -q $(echo -e "^$name\t" | sed 's/\./\\./g') .git/remotes || \
 + sed -ne "/^$name\t/p" .git/remotes | grep -q . || \
   [ -s ".git/heads/$name" ] || \
   die "unknown branch \"$name\""

This fixes the acceptance, but not the choice.

What does the grep -q . exactly do? Just sets error code based on
whether the sed output is non-empty? What about [] instead?

-- 
Petr Pasky Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor


Re: [PATCH] Add help details to git help command. (This time with Perl)

2005-04-20 Thread Petr Baudis
Dear diary, on Tue, Apr 19, 2005 at 09:04:16PM CEST, I got a letter
where David Greaves [EMAIL PROTECTED] told me that...
 I don't love the 'require gitadd.pl' but it's a gradual start...

I hate it, for one. ;-)

 Cogito.pm seems to be a good place for the library stuff.

Sounds sensible.

 git.pl
 passes everything to scripts except gitadd.pl

We've decided to go for the individual scripts directly. :-)

Unfortunately, you didn't send the attachments inline, so I can't
comment on them sensibly.

Perhaps my main problem now is style. I'd prefer that you format it like
the C sources of git, with 8-char indentation and such. Also make sure
you use spaces around (or after) operators. Also, for just a few short
functions I prefer putting the functions before the code itself.

 use IO::File;   # leads to less perlish syntax and is standard in perl dists

Oh come on. Are you writing Perl or not? I think it looks pretty awful,
and you are using Perl filehandle idioms anyway, so...

-- 
Petr Pasky Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor


Re: on when to checksum

2005-04-20 Thread Tom Lord

(I'll have to study/think about that for a while before a proper
reply.  Tomorrow, probably.)

Thanks,
-t



Re: chunking (Re: [ANNOUNCEMENT] /Arch/ embraces `git')

2005-04-20 Thread C. Scott Ananian
On Wed, 20 Apr 2005, Linus Torvalds wrote:
What's the disk usage results? I'm on ext3, for example, which means that
even small files invariably take up 4.125kB on disk (with the inode).
Even uncompressed, most source files tend to be small. Compressed, I'm
seeing the median blob size being ~1.6kB in my trivial checks. That's
blobs only, btw.
I'm working on it.  The format was chosen so that blobs under 1 block long 
*stay* 1 block long; i.e. there's no 'chunk plus index file' overhead.
So the chunking should only kick in on multiple-block files.
I hacked 'convert-cache' to do the conversion and it's running out of
memory on linux-2.6.git, however --- I found a few memory leaks in your 
code =) but I certainly seem to be missing a big one still (maybe it's in 
my code!).

When I get this working properly, my plan is to do a number of runs over 
the linux-2.6 archive trying out various combinations of chunking 
parameters.  I *will* be watching both 'real' disk usage (bunged up to 
block boundaries) and 'ideal' disk usage (on a reiserfs-type system).
The goal is to improve both, but if I can improve 'ideal' usage 
significantly with a minimal penalty in 'real' usage then I would argue 
it's still worth doing, since that will improve network times.

The handshaking penalties you mention are significant, but that's why 
rsync uses a pipelined approach.  The 'upstream' part of your full-duplex 
pipe is 'free' while you've got bits clogging your 'downstream' 
pipe.  The wonders of full-duplex...

Anyway, numbers talk, etc.  I'm working on them.
 --scott
LIONIZER LCPANES shortwave MKSEARCH ESGAIN Saddam Hussein Rijndael 
WASHTUB Morwenstow ZPSEMANTIC SKIMMER cryptographic FJHOPEFUL assassination
 ( http://cscott.net/ )


Re: Change pull to _only_ download, and git update=pull+merge?

2005-04-20 Thread David Mansfield
Petr Baudis wrote:
Dear diary, on Wed, Apr 20, 2005 at 10:32:35PM CEST, I got a letter
where Ingo Molnar [EMAIL PROTECTED] told me that...
* Petr Baudis [EMAIL PROTECTED] wrote:

yet another thing: what is the canonical 'pasky way' of simply nuking 
the current files and checking out the latest tree (according to 
.git/HEAD). Right now i'm using a script to:

 read-tree $(tree-id $(cat .git/HEAD))
 checkout-cache -a
(i first do an 'rm -f *' in the working directory)
i guess there's an existing command for this already?
git cancel
hm, that's a pretty unintuitive name though. How about making it 'git 
checkout' and providing a 'git checkout -f' option to force the 
checkout? (or something like this)

Since it does not really checkout. Ok, it does, but that's only a small
part of it. It just cancels whatever local changes you are making in the
tree and brings it to a consistent state. When you have a merge in progress
and after you see the sheer number of conflicts you decide to get your
hands off, you just type git cancel. Doing basically anything with your
tree (not only local changes checkout would fix, but also various git
operations, including git add/rm and git seek) can be easily fixed by
git cancel.

How about 'git revert'?
Most editors and word processors use that idiom for "revert to saved 
copy", with the obvious parallel here.

David


Re: [ANNOUNCE] git-pasky-0.6.2 heads-up on upcoming changes

2005-04-20 Thread Linus Torvalds


Pasky,
 what do you think about this change to git log?

It makes it a _lot_ easier to parse the result, as it indents all the
comments by two spaces, meaning that the header is clearly marked, and you
can then do various 'sed'/'grep' things with nice normal regular
expressions like '^parent' without having to worry about there being a 
line that starts with "parent" in the free-form part..
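
A sketch of what this buys a parser, assuming only the convention proposed
here (header lines start in column 0, comment lines are indented by two
spaces). The helper names are hypothetical and are not part of gitlog.sh.

```c
#include <assert.h>
#include <string.h>

/* With the proposed format, every free-form comment line is indented. */
int is_comment_line(const char *line)
{
	return strncmp(line, "  ", 2) == 0;
}

/*
 * Returns 1 when `line` is a header whose key is exactly `key`,
 * e.g. is_header(line, "parent").  An indented line can never match,
 * which is the whole point of the indentation.
 */
int is_header(const char *line, const char *key)
{
	size_t klen = strlen(key);

	assert(klen > 0);
	if (strncmp(line, key, klen) != 0)
		return 0;
	/* the key must be followed by a separator or the end of the line */
	return line[klen] == ' ' || line[klen] == '\n' || line[klen] == '\0';
}
```

A commit message line such as "  parent of this idea ..." is classified as
commentary and is never mistaken for a parent header.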

I also think the end result is more readable from a human standpoint, with 
indentation as the way to distinguish the headers from the commentary, 
and less ugly ASCII barfic's with -- etc.

I'm doing a 2.6.12-rc3 release, so I care more than usual about the 
changelog ;)

Linus

---
gitlog.sh: a496a864f9586e47a4d7bd3ae0af0b3e07b7deb8
--- a/gitlog.sh
+++ b/gitlog.sh
@@ -28,7 +28,7 @@ rev-tree $base | sort -rn | while read t
fi
;;
)
-   echo; cat
+   echo; sed 's/^/  /'
;;
*)
echo $key $rest
@@ -36,5 +36,5 @@ rev-tree $base | sort -rn | while read t
esac
 
done
-   echo -e \n--
+   echo
 done


(rework) [PATCH 1/5] Accept commit in some places when tree is needed.

2005-04-20 Thread Junio C Hamano
Linus,

sorry for bringing up an issue that is already 8 hours old.

LT I don't think that's a good interface. It changes the sha1 passed into it: 
LT that may actually be nice, since you may want to know what it changed to, 
LT but I think you'd want to have that as an (optional) separate 
LT sha1_result parameter. 

Point taken about _changing_ _is_ _bad_ part.  It was a mistake.

LT Also, the type or size things make no sense to have as a parameter 
LT at all.

Well, the semantics is "I want to read the raw data of a tree
and I do not know nor care if this sha1 I got from my user is
for a commit or a tree".  So type does not matter (if it returns
a non NULL we know it is a tree), but the size matters.

And that semantics is not such a hacky thing specific to diff-cache.
Rather, it applies in general if you structure the way those
recursive walkers do things.  The recursive walkers in ls-tree,
diff-cache, and diff-tree all expect the caller to supply the
buffer read by sha1_read_buffer, and when it calls itself it
does the same (read-tree's recursing convention is an oddball
that needs to be addressed, though).

When the recursion is structured this way, the only thing you
need to do to allow a commit ID from the user when a tree ID is
needed, without breaking the error checking done by the part
that recurses down (i.e. we must error on a commit object ID
when we are expecting a tree object ID stored in objects we read
from the tree downwards), is to change the top-level caller to
use "I want tree with this tree/commit ID" instead of "I want a
buffer with this ID and I'll make sure it is a tree myself".
Instead, you make the recursor "Give me a buffer and its type,
I'll barf if it does not say a tree".  When the recursor
calls itself, it reads with read_sha1_file and feeds the result
to itself and has the callee do the checking.

The commit_to_tree() thing you introduced in diff-tree.c is
simple to use.  IMHO it is however conceptually a wrong thing to
use in these contexts.  When the user supplies a tree ID, you
first read that object only to see if it is not a commit and
throw it away, then immediately read it again for your real
processing.  In these particular cases of four tree-related
files, "I want tree with this tree/commit ID" semantics is a
_far_ _better_ match for the problem.
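
Both approaches discussed above rely on a commit object's text starting
with "tree " followed by 40 hex digits, which is why the code reads the
tree ID at offset 5. A self-contained sketch of just that step is below.
hex_to_sha1() is a stand-in written for this example (the thread's code
calls get_sha1_hex(), whose implementation is not shown here), and
commit_tree_sha1() is likewise a hypothetical name.

```c
#include <assert.h>
#include <string.h>

static int hexval(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/* Decode 40 hex digits into 20 raw bytes; 0 on success, -1 on bad input. */
int hex_to_sha1(const char *hex, unsigned char *sha1)
{
	int i;

	for (i = 0; i < 20; i++) {
		int hi = hexval(hex[2 * i]);
		int lo = hexval(hex[2 * i + 1]);

		if (hi < 0 || lo < 0)
			return -1;
		sha1[i] = (unsigned char)((hi << 4) | lo);
	}
	return 0;
}

/*
 * If `buf` looks like the start of a commit object ("tree <40 hex>..."),
 * store the tree ID in tree_sha1 and return 0; otherwise return -1.
 */
int commit_tree_sha1(const char *buf, size_t len, unsigned char *tree_sha1)
{
	assert(buf && tree_sha1);
	if (len < 5 + 40 || memcmp(buf, "tree ", 5) != 0)
		return -1;
	return hex_to_sha1(buf + 5, tree_sha1);
}
```

A caller wanting the "tree or commit" semantics would read the object once,
call something like commit_tree_sha1() only when the type says "commit",
and then read the tree itself, so a plain tree ID costs a single read.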

Having said that, here is a reworked version.  This first one 
introduces read_tree_with_tree_or_commit_sha1() function.

end-of-cover-letter

This patch implements read_tree_with_tree_or_commit_sha1(),
which can be used when you are interested in reading an unpacked
raw tree data but you do not know nor care if the SHA1 you
obtained from your user is a tree ID or a commit ID.  Before this
function's introduction, you would have called read_sha1_file(),
examined its type, parsed it to call read_sha1_file() again if
it is a commit, and verified that the resulting object is a
tree.  Instead, this function does that for you.  It returns
NULL if the given SHA1 is neither a tree nor a commit.

Signed-off-by: Junio C Hamano [EMAIL PROTECTED]
---

 cache.h |4 
 sha1_file.c |   40 
 2 files changed, 44 insertions(+)

cache.h: eab355da5d2f6595053f28f0cca61181ac314ee9
--- a/cache.h
+++ b/cache.h
@@ -124,4 +124,8 @@ extern int error(const char *err, ...);
 
 extern int cache_name_compare(const char *name1, int len1, const char *name2, int len2);
 
+extern void *read_tree_with_tree_or_commit_sha1(const unsigned char *sha1,
+   unsigned long *size,
+   unsigned char *tree_sha1_ret);
+
 #endif /* CACHE_H */


sha1_file.c: eee3598bb75e2199045b823f007e7933c0fb9cfe
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -166,6 +166,46 @@ void * read_sha1_file(const unsigned cha
return NULL;
 }
 
+void *read_tree_with_tree_or_commit_sha1(const unsigned char *sha1,
+unsigned long *size,
+unsigned char *tree_sha1_return)
+{
+   char type[20];
+   void *buffer;
+   unsigned long isize;
+   int was_commit = 0;
+   char tree_sha1[20];
+
+   buffer = read_sha1_file(sha1, type, &isize);
+
+   /*
+* We might have read a commit instead of a tree, in which case
+* we parse out the tree_sha1 and attempt to read from there.
+* (buffer + 5) is because the tree sha1 is always at offset 5
+* in a commit record ("tree ").
+*/
+   if (buffer &&
+   !strcmp(type, "commit") &&
+   !get_sha1_hex(buffer + 5, tree_sha1)) {
+   free(buffer);
+   buffer = read_sha1_file(tree_sha1, type, &isize);
+   was_commit = 1;
+   }
+
+   /*
+* Now do we have something and if so is it a tree?
+*/
+   if (!buffer || strcmp(type, "tree")) {
+   free(buffer);
+   return NULL;
+   }
+
+   *size = isize;
+   if 

(rework) [PATCH 3/5] Accept commit in some places when tree is needed.

2005-04-20 Thread Junio C Hamano
Updates diff-tree.c to use read_tree_with_tree_or_commit_sha1()
function.  The command can take either tree or commit IDs with this patch.

Signed-off-by: Junio C Hamano [EMAIL PROTECTED]
---

 diff-tree.c |   25 -
 1 files changed, 4 insertions(+), 21 deletions(-)

diff-tree.c: 65bb9d66c5610b2ede11f03a9120da48c59629f8
--- a/diff-tree.c
+++ b/diff-tree.c
@@ -164,14 +164,13 @@ static int diff_tree_sha1(const unsigned
 {
void *tree1, *tree2;
unsigned long size1, size2;
-   char type[20];
int retval;
 
-   tree1 = read_sha1_file(old, type, &size1);
-   if (!tree1 || strcmp(type, "tree"))
+   tree1 = read_tree_with_tree_or_commit_sha1(old, &size1, 0);
+   if (!tree1)
die("unable to read source tree (%s)", sha1_to_hex(old));
-   tree2 = read_sha1_file(new, type, &size2);
-   if (!tree2 || strcmp(type, "tree"))
+   tree2 = read_tree_with_tree_or_commit_sha1(new, &size2, 0);
+   if (!tree2)
die("unable to read destination tree (%s)", sha1_to_hex(new));
retval = diff_tree(tree1, size1, tree2, size2, base);
free(tree1);
@@ -179,20 +178,6 @@ static int diff_tree_sha1(const unsigned
return retval;
 }
 
-static void commit_to_tree(unsigned char *sha1)
-{
-   void *buf;
-   char type[20];
-   unsigned long size;
-
-   buf = read_sha1_file(sha1, type, &size);
-   if (buf) {
-   if (!strcmp(type, "commit"))
-   get_sha1_hex(buf+5, sha1);
-   free(buf);
-   }
-}
-
 int main(int argc, char **argv)
 {
unsigned char old[20], new[20];
@@ -214,7 +199,5 @@ int main(int argc, char **argv)
 
if (argc != 3 || get_sha1_hex(argv[1], old) || get_sha1_hex(argv[2], new))
usage("diff-tree <tree sha1> <tree sha1>");
-   commit_to_tree(old);
-   commit_to_tree(new);
return diff_tree_sha1(old, new, "");
 }



(rework) [PATCH 3/4] Accept commit in some places when tree is needed.

2005-04-20 Thread Junio C Hamano
Updates ls-tree.c to use read_tree_with_tree_or_commit_sha1()
function.  The command can take either tree or commit IDs with
this patch.

Signed-off-by: Junio C Hamano [EMAIL PROTECTED]
---

 ls-tree.c |   11 +--
 1 files changed, 5 insertions(+), 6 deletions(-)

ls-tree.c: c063640c114634dc7cf950ce44863dd17ddf83c1
--- a/ls-tree.c
+++ b/ls-tree.c
@@ -24,9 +24,9 @@ static void print_path_prefix(struct pat
 }
 
 static void list_recursive(void *buffer,
- unsigned char *type,
- unsigned long size,
- struct path_prefix *prefix)
+  const unsigned char *type,
+  unsigned long size,
+  struct path_prefix *prefix)
 {
struct path_prefix this_prefix;
this_prefix.prev = prefix;
@@ -72,12 +72,11 @@ static int list(unsigned char *sha1)
 {
void *buffer;
unsigned long size;
-   char type[20];
 
-   buffer = read_sha1_file(sha1, type, &size);
+   buffer = read_tree_with_tree_or_commit_sha1(sha1, &size, 0);
if (!buffer)
die("unable to read sha1 file");
-   list_recursive(buffer, type, size, NULL);
+   list_recursive(buffer, "tree", size, NULL);
return 0;
 }



Re: Possible problem with git-pasky-0.6.2 (patch: **** Only garbage was found in the patch input.)I

2005-04-20 Thread Steven Cole
On Wednesday 20 April 2005 05:15 pm, Steven Cole wrote:
 On Wednesday 20 April 2005 05:12 pm, Petr Baudis wrote:
  Dear diary, on Thu, Apr 21, 2005 at 01:06:09AM CEST, I got a letter
  where Steven Cole [EMAIL PROTECTED] told me that...
   After getting the latest tarball, and make, make install:
   
   Tree change: 
   55f9d5042603fff4ddfaf4e5f004d2995286d6d3:a46844fcb6afef1f7a2d93f391c82f08ea31
   *100755-100755 blob
   a78cf8ccab98861ef7aecb4cb5a79e47d3a84b67-74b4083d67eda87d88a6f92c6c66877bba8bda8a
gitcancel.sh
   Tracked branch, applying changes...
   Fast-forwarding 55f9d5042603fff4ddfaf4e5f004d2995286d6d3 - 
   a46844fcb6afef1f7a2d93f391c82f08ea31
   on top of 55f9d5042603fff4ddfaf4e5f004d2995286d6d3...
   patch: **** Only garbage was found in the patch input.
   
   This may be a harmless message, but I thought I'd bring it to your 
   attention.
  
  This _is_ weird. What does
  
  $ git diff -r 
  55f9d5042603fff4ddfaf4e5f004d2995286d6d3:a46844fcb6afef1f7a2d93f391c82f08ea3
  
  tell you? 
 
 [EMAIL PROTECTED] git-pasky-0.6.2]$ git diff -r 
 55f9d5042603fff4ddfaf4e5f004d2995286d6d3:a46844fcb6afef1f7a2d93f391c82f08ea3
 Index: gitcancel.sh
[ output snipped, see previous message for output]
 
  What if you feed it to patch -p1? 
 I haven't done that yet, awaiting response to above.
 
  What if you feed it to git  
  apply?
  
  Thanks,
  
 You're welcome.  I'll do the git patch -p1 stuff_from_above if that's
 what's needed, same with git apply.  Corrections to syntax appreciated.
 Steven

Actually, I meant patch -p1 < stuff_from_above.

But before doing that, I did a fsck-cache as follows, with these results.
This seems damaged.

[EMAIL PROTECTED] git-pasky-0.6.2]$ fsck-cache --unreachable $(cat .git/HEAD)
root 1bf00e46973f7f1c40bc898f1346a1273f0a347f
unreachable commit 0128396de7ca8a7dc74f6fbff59a68bb781bb9b2
unreachable blob 012c82312c99606f914bda5c501b616237a3b7e9
unreachable tree 02a1b5337f78b807d4404f473e55c44f4273d2f8

[ lots of snippage...]

unreachable blob fee26cc5b378819ff48ef8cb54c35744c0f1c17f
unreachable tree fff7294434014ea68153770da3965ed315806499

[EMAIL PROTECTED] git-pasky-0.6.2]$ fsck-cache --unreachable $(cat .git/HEAD) | wc -l
467

I renamed the repo to git-pasky-0.6.2-damaged, untarred the 0.6.2 tarball
again, ran make (no make install this time), and repeated git pull pasky,
with results similar to before.

[EMAIL PROTECTED] git-pasky-0.6.2-damaged]$ cat .git/HEAD
a46844fcb6afef1f7a2d93f391c82f08ea31
[EMAIL PROTECTED] git-pasky-0.6.2-damaged]$ cd ../git-pasky-0.6.2
[EMAIL PROTECTED] git-pasky-0.6.2]$ cat .git/HEAD
7a4c67965de68ae7bc7aa1fde33f8eb9d8114697

Hope this helps,
Steven



(rework) [PATCH 5/5] Accept commit in some places when tree is needed.

2005-04-20 Thread Junio C Hamano
Updates read-tree to use read_tree_with_tree_or_commit_sha1()
function.  The command can take either tree or commit IDs with
this patch.

The change involves a slight modification of how it recurses down
the tree.  Earlier the caller only supplied SHA1 and the recurser
read the object using it, but now it is the caller's responsibility
to read the object and give it to the recurser.  This matches the
way recursive behaviour is done in other tree-related commands.

Signed-off-by: Junio C Hamano [EMAIL PROTECTED]
---

 read-tree.c |   34 ++++++++++++++++++++++++----------
 1 files changed, 24 insertions(+), 10 deletions(-)

read-tree.c: 46747b5e99b102ed547e87f55a8ee734c9ddb074
--- a/read-tree.c
+++ b/read-tree.c
@@ -23,16 +23,11 @@ static int read_one_entry(unsigned char 
return add_cache_entry(ce, 1);
 }
 
-static int read_tree(unsigned char *sha1, const char *base, int baselen)
+static int read_tree_recursive(void *buffer, const char *type,
+  unsigned long size,
+  const char *base, int baselen)
 {
-   void *buffer;
-   unsigned long size;
-   char type[20];
-
-   buffer = read_sha1_file(sha1, type, &size);
-   if (!buffer)
-   return -1;
-   if (strcmp(type, "tree"))
+   if (!buffer || strcmp(type, "tree"))
return -1;
while (size) {
int len = strlen(buffer)+1;
@@ -50,10 +45,20 @@ static int read_tree(unsigned char *sha1
int retval;
int pathlen = strlen(path);
char *newbase = malloc(baselen + 1 + pathlen);
+   void *eltbuf;
+   char elttype[20];
+   unsigned long eltsize;
+
+   eltbuf = read_sha1_file(sha1, elttype, &eltsize);
+   if (!eltbuf)
+   return -1;
memcpy(newbase, base, baselen);
memcpy(newbase + baselen, path, pathlen);
newbase[baselen + pathlen] = '/';
-   retval = read_tree(sha1, newbase, baselen + pathlen + 1);
+   retval = read_tree_recursive(eltbuf, elttype, eltsize,
+newbase,
+baselen + pathlen + 1);
+   free(eltbuf);
free(newbase);
if (retval)
return -1;
@@ -65,6 +70,15 @@ static int read_tree(unsigned char *sha1
return 0;
 }
 
+static int read_tree(unsigned char *sha1, const char *base, int baselen)
+{
+   void *buffer;
+   unsigned long size;
+
+   buffer = read_tree_with_tree_or_commit_sha1(sha1, &size, 0);
+   return read_tree_recursive(buffer, "tree", size, base, baselen);
+}
+
 static int remove_lock = 0;
 
 static void remove_lock_file(void)
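
The shape described in the commit message (the caller reads the object, the recurser only walks the buffer it is handed) can be tried in isolation. The following is a rough sketch under stated assumptions, not the read-tree code: toy_store, toy_read, toy_walk and toy_walk_root are hypothetical names, objects are keyed by small integers instead of SHA1s, and a toy tree entry is a NUL-terminated name followed by a single ID byte.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy object store: ids are array indexes, not SHA1s. */
struct toy_object { const char *type; const char *data; unsigned long size; };

static const struct toy_object toy_store[] = {
	{ "tree", "a.c\0\1src\0\2", 10 },	/* 0: root tree */
	{ "blob", "hello", 5 },			/* 1: a.c */
	{ "tree", "b.c\0\3", 5 },		/* 2: subtree "src" */
	{ "blob", "x", 1 },			/* 3: src/b.c */
};

/* Stand-in for read_sha1_file(): malloc'd copy, fills in type and size. */
static void *toy_read(unsigned id, char *type, unsigned long *size)
{
	char *buf;

	if (id >= sizeof(toy_store) / sizeof(toy_store[0]))
		return NULL;
	buf = malloc(toy_store[id].size + 1);
	if (!buf)
		return NULL;
	memcpy(buf, toy_store[id].data, toy_store[id].size);
	buf[toy_store[id].size] = '\0';	/* keeps strlen() below bounded */
	strcpy(type, toy_store[id].type);
	*size = toy_store[id].size;
	return buf;
}

/* The recurser never reads its own object; the caller hands it in. */
static int toy_walk(const void *buffer, const char *type, unsigned long size,
		    const char *base, char *out, size_t outlen)
{
	const char *p = buffer;

	if (!buffer || strcmp(type, "tree"))
		return -1;
	while (size) {
		unsigned long len = strlen(p) + 1;	/* name plus NUL */
		char path[128], elttype[20];
		unsigned long eltsize;
		void *eltbuf;
		unsigned id;
		int retval = 0;

		if (size < len + 1)
			return -1;			/* truncated entry */
		id = (unsigned char)p[len];
		snprintf(path, sizeof(path), "%s%s", base, p);
		p += len + 1;
		size -= len + 1;

		/* The caller's job: read the element, then pass it down. */
		eltbuf = toy_read(id, elttype, &eltsize);
		if (!eltbuf)
			return -1;
		if (!strcmp(elttype, "tree")) {
			strncat(path, "/", sizeof(path) - strlen(path) - 1);
			retval = toy_walk(eltbuf, elttype, eltsize, path, out, outlen);
		} else {
			strncat(out, path, outlen - strlen(out) - 1);
			strncat(out, "\n", outlen - strlen(out) - 1);
		}
		free(eltbuf);
		if (retval)
			return -1;
	}
	return 0;
}

/* Top-level entry: read the root once, then let the recurser walk it. */
static int toy_walk_root(unsigned id, char *out, size_t outlen)
{
	char type[20] = "";
	unsigned long size = 0;
	void *buffer = toy_read(id, type, &size);
	int retval;

	out[0] = '\0';
	retval = toy_walk(buffer, type, size, "", out, outlen);
	free(buffer);
	return retval;
}
```

Because the recurser checks the type it is given, a wrapper that resolves a commit to its tree first only has to hand over the tree buffer; the recursion itself does not change.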



Re: (rework) [PATCH 3/4] Accept commit in some places when tree is needed.

2005-04-20 Thread Junio C Hamano
>>>>> "JCH" == Junio C Hamano [EMAIL PROTECTED] writes:

JCH> Updates ls-tree.c to use read_tree_with_tree_or_commit_sha1()
JCH> function.  The command can take either tree or commit IDs with
JCH> this patch.

Sorry, but the numbering is wrong; this should have been [4/5],
not [3/4].  The contents should be fine, though.



Re: (rework) [PATCH 3/4] Accept commit in some places when tree is needed.

2005-04-20 Thread Linus Torvalds


On Wed, 20 Apr 2005, Junio C Hamano wrote:
 
 Sorry, but the numbering is wrong; this should have been [4/5],
 not [3/4].  The contents should be fine, though.

Applied and pushed out.

Btw, I edited your subject lines to make them be more specific
to one particular patch.

Linus


Re: [ANNOUNCE] git-pasky-0.6.2 heads-up on upcoming changes

2005-04-20 Thread Linus Torvalds


On Wed, 20 Apr 2005, Linus Torvalds wrote:
 
 Pasky,
  what do you think about this change to git log?

Here's a slightly updated version.

It's identical to the previous one, except that it also feeds the result 
through | ${PAGER:-less} which makes it a lot more useful, in my 
opinion.

If you redirect the output to a non-tty, both less and more do the 
right thing and just feed the output straight through. But if the output 
is a tty, this makes git log a lot more friendly than a quickly 
scrolling mess..

Linus


gitlog.sh: a496a864f9586e47a4d7bd3ae0af0b3e07b7deb8
--- a/gitlog.sh
+++ b/gitlog.sh
@@ -28,7 +28,7 @@ rev-tree $base | sort -rn | while read t
fi
;;
)
-   echo; cat
+   echo; sed 's/^/  /'
;;
*)
echo $key $rest
@@ -36,5 +36,5 @@ rev-tree $base | sort -rn | while read t
esac
 
done
-   echo -e \n--
-done
+   echo
+done | ${PAGER:-less}
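
For anyone wanting the same behaviour from a C program rather than from the shell script, here is a small sketch. It is an illustration only, not part of the patch: pager_or_default and open_pager are hypothetical names, and popen() is assumed to be available (POSIX). The first function mirrors the shell expansion ${PAGER:-less}, which falls back to the default when the variable is unset or empty.

```c
#define _POSIX_C_SOURCE 200809L	/* for popen()/pclose() */
#include <stdio.h>
#include <stdlib.h>

/* Same rule as ${PAGER:-less}: use the default when unset OR empty. */
static const char *pager_or_default(const char *env_value)
{
	return (env_value && *env_value) ? env_value : "less";
}

/*
 * Open a write pipe to the chosen pager.  The caller writes the log
 * to the returned stream and finishes with pclose().  Returns NULL
 * if the pipe could not be created.
 */
static FILE *open_pager(void)
{
	return popen(pager_or_default(getenv("PAGER")), "w");
}
```

Like the shell version, this does not test whether stdout is a tty; it relies on the pager itself passing output straight through when it is not.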

