Re: [PATCH] doc: Modify git-add doc to say "staging area"

2017-12-14 Thread David A. Wheeler
On December 14, 2017 1:50:00 PM EST, Junio C Hamano  wrote:
>I agree with that.  I do not consider the proposed change "good".

Why is "index" better? It is a confusing name, one that has many other 
unrelated meanings.  In particular, many projects managed by git also have an 
index, but few have a staging area.

Also, the phrase "staging area" is already in use, so this is not a new term 
(e.g., git-staging).


--- David A. Wheeler


Re: [PATCH] doc: Modify git-add doc to say "staging area"

2017-12-14 Thread David A. Wheeler
On December 13, 2017 7:54:04 AM EST, "Ævar Arnfjörð Bjarmason" wrote:
>After your patch the majority of the docs will still talk about
>"index", is this part of some larger series, perhaps it would be good
>to see it all at once...

Yes, this would be part of a larger series.

I'm happy to do the work, but I don't want to do it if it's just going to be 
rejected.

The work is very straightforward: in almost all cases you simply replace the
word "index" with the phrase "staging area".  The change is similar for the word
"cache".  So I'm not sure what seeing it all at once would do for anybody.

Are there one or two other files that you would like to see transformed as an
example?  If you're just looking for a sense of it, that should be enough.




--- David A. Wheeler


Re: [PATCH] doc: Modify git-add doc to say "staging area"

2017-12-13 Thread David A. Wheeler
On Wed, 13 Dec 2017 09:02:42 -0800, Junio C Hamano <gits...@pobox.com> wrote:
> .. But that is not the only thing the index does.  When "git merge"
> finds conflicting changes, it adds the contents for common, our and
> their variants to the index for the path.  This is quite different
> from how you use the index "as staging area"; the index is being
> used as the "merging area".  When "git clean" wants to see which
> paths it finds on the filesystem are not of interest, it consults
> the index, which acts as the list of paths that are of interest.

If the phrase "staging area" is consistently used *instead* of index,
there's no problem. E.g., "git clean consults the staging area"
conveys exactly the same information as "git clean consults the index"
when index == staging area.

The term "index" has too many *other* meanings.
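To make Junio's point about the several roles of the index concrete, here is a minimal C sketch. The `struct index_entry` type and both helper functions are hypothetical simplifications, not git's actual data structures: each entry records only a path and a merge stage (0 for a normally staged entry; 1, 2 and 3 for the common-ancestor, "ours" and "theirs" versions left behind by a conflicting merge). The same table answers both "is this path unmerged?" and "is this path tracked at all?", the latter being the question `git clean` asks.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified index entry: git's real index also stores stat
 * data, object ids and flags. Only the path and merge stage are modelled. */
struct index_entry {
	const char *path;
	int stage; /* 0 = staged; 1 = common ancestor; 2 = ours; 3 = theirs */
};

/* Return 1 if any entry for `path` has a non-zero stage, i.e. the path is
 * still unmerged after a conflicting merge ("merging area" role). */
static int path_is_unmerged(const struct index_entry *entries, size_t n,
			    const char *path)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (entries[i].stage != 0 && !strcmp(entries[i].path, path))
			return 1;
	return 0;
}

/* Return 1 if `path` appears in the index at all ("list of paths of
 * interest" role, as consulted by a clean-like command). */
static int path_is_tracked(const struct index_entry *entries, size_t n,
			   const char *path)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (!strcmp(entries[i].path, path))
			return 1;
	return 0;
}
```

For example, an index holding `README` at stages 1, 2 and 3 reports that path as unmerged until the conflict is resolved and a single stage-0 entry replaces the three.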

--- David A. Wheeler


Re: [PATCH] doc: Modify git-add doc to say "staging area"

2017-12-12 Thread David A. Wheeler
On December 13, 2017 12:40:12 AM EST, Jacob Keller wrote:
>I know we've used various terms for this concept across a lot of the
>documentation. However, I was under the impression that we most
>explicitly used "index" rather than "staging area".

I think "staging area" is the better term. It focuses on its purpose, and it is 
also less confusing ("index" and "cache" have other meanings in many of the 
repos managed by git).


--- David A. Wheeler


[PATCH] doc: Modify git-add doc to say "staging area"

2017-12-12 Thread David A. Wheeler
Change the documentation of git-add so that it consistently uses
the phrase "staging area".  The current git documentation uses
inconsistent terminology ("index", "cache", and "staging area").
This commit switches git-add's documentation to consistently use
the phrase "staging area", which is higher-level and should be less
confusing for new users.

Signed-off-by: David A. Wheeler <dwhee...@dwheeler.com>
---
 Documentation/git-add.txt | 104 --
 1 file changed, 54 insertions(+), 50 deletions(-)

diff --git a/Documentation/git-add.txt b/Documentation/git-add.txt
index d50fa339d..927a152b0 100644
--- a/Documentation/git-add.txt
+++ b/Documentation/git-add.txt
@@ -3,7 +3,7 @@ git-add(1)
 
 NAME
 
-git-add - Add file contents to the index
+git-add - Add file contents to the staging area
 
 SYNOPSIS
 
@@ -15,23 +15,24 @@ SYNOPSIS
 
 DESCRIPTION
 ---
-This command updates the index using the current content found in
-the working tree, to prepare the content staged for the next commit.
-It typically adds the current content of existing paths as a whole,
+This command updates the staging area using the current content found
+in the working tree.
+This command typically adds the current content of existing paths as a whole,
 but with some options it can also be used to add content with
 only part of the changes made to the working tree files applied, or
 remove paths that do not exist in the working tree anymore.
 
-The "index" holds a snapshot of the content of the working tree, and it
-is this snapshot that is taken as the contents of the next commit.  Thus
-after making any changes to the working tree, and before running
-the commit command, you must use the `add` command to add any new or
-modified files to the index.
+The staging area (historically called the "index" or "cache")
+holds a snapshot of the content of the working tree, and it
+is this snapshot that is taken by default as the contents of the next commit.
+Thus after making any changes to the working tree, and before running
+the commit command, you can use the `add` command to add any new or
+modified files to the staging area.
 
 This command can be performed multiple times before a commit.  It only
 adds the content of the specified file(s) at the time the add command is
 run; if you want subsequent changes included in the next commit, then
-you must run `git add` again to add the new content to the index.
+you must run `git add` again to add the new content to the staging area.
 
 The `git status` command can be used to obtain a summary of which
 files have changes that are staged for the next commit.
@@ -45,7 +46,9 @@ be used to add ignored files with the `-f` (force) option.
 
 Please see linkgit:git-commit[1] for alternative ways to add content to a
 commit.
-
+For example, you can use the git commit `-a` option to first automatically
+add to the staging area all the files that have been
+modified or deleted in the working tree.
 
 OPTIONS
 ---
@@ -53,7 +56,7 @@ OPTIONS
Files to add content from.  Fileglobs (e.g. `*.c`) can
be given to add all matching files.  Also a
leading directory name (e.g. `dir` to add `dir/file1`
-   and `dir/file2`) can be given to update the index to
+   and `dir/file2`) can be given to update the staging area to
match the current state of the directory as a whole (e.g.
specifying `dir` will record not just a file `dir/file1`
modified in the working tree, a file `dir/file2` added to
@@ -81,16 +84,16 @@ in linkgit:gitglossary[7].
 -i::
 --interactive::
Add modified contents in the working tree interactively to
-   the index. Optional path arguments may be supplied to limit
+   the staging area. Optional path arguments may be supplied to limit
operation to a subset of the working tree. See ``Interactive
mode'' for details.
 
 -p::
 --patch::
-   Interactively choose hunks of patch between the index and the
-   work tree and add them to the index. This gives the user a chance
+   Interactively choose hunks of patch between the staging area and the
+   work tree and add them to the staging area. This gives the user a chance
to review the difference before adding modified contents to the
-   index.
+   staging area.
 +
 This effectively runs `add --interactive`, but bypasses the
 initial command menu and directly jumps to the `patch` subcommand.
@@ -98,20 +101,20 @@ See ``Interactive mode'' for details.
 
 -e::
 --edit::
-   Open the diff vs. the index in an editor and let the user
+   Open the diff vs. the staging area in an editor and let the user
edit it.  After the editor was closed, adjust the hunk headers
-   and apply the patch to the index.
+   and apply the patch to the staging area.
 +
 The intent of this opt

[PATCH] Expand documentation describing --signoff

2016-01-05 Thread David A. Wheeler
Modify various documentation (man page) files to explain
in more detail what --signoff means.

This was inspired by https://lwn.net/Articles/669976/ where
paulj noted, "adding [the] '-s' argument to [a] git commit
doesn't really mean you have even heard of the DCO...".
Extending git's documentation will make it easier to argue
that developers understood --signoff when they use it.

Signed-off-by: David A. Wheeler <dwhee...@dwheeler.com>
---
 Documentation/git-am.txt   | 1 +
 Documentation/git-cherry-pick.txt  | 1 +
 Documentation/git-commit.txt   | 6 +-
 Documentation/git-format-patch.txt | 1 +
 Documentation/git-revert.txt   | 1 +
 5 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-am.txt b/Documentation/git-am.txt
index 452c1fe..13cdd7f 100644
--- a/Documentation/git-am.txt
+++ b/Documentation/git-am.txt
@@ -35,6 +35,7 @@ OPTIONS
 --signoff::
Add a `Signed-off-by:` line to the commit message, using
the committer identity of yourself.
+   See the signoff option in linkgit:git-commit[1] for more information.
 
 -k::
 --keep::
diff --git a/Documentation/git-cherry-pick.txt b/Documentation/git-cherry-pick.txt
index 77da29a..6154e57 100644
--- a/Documentation/git-cherry-pick.txt
+++ b/Documentation/git-cherry-pick.txt
@@ -100,6 +100,7 @@ effect to your index in a row.
 -s::
 --signoff::
Add Signed-off-by line at the end of the commit message.
+   See the signoff option in linkgit:git-commit[1] for more information.
 
 -S[]::
 --gpg-sign[=]::
diff --git a/Documentation/git-commit.txt b/Documentation/git-commit.txt
index 7f34a5b..9ec6b3c 100644
--- a/Documentation/git-commit.txt
+++ b/Documentation/git-commit.txt
@@ -154,7 +154,11 @@ OPTIONS
 -s::
 --signoff::
Add Signed-off-by line by the committer at the end of the commit
-   log message.
+   log message.  The meaning of a signoff depends on the project,
+   but it typically certifies that the committer has
+   the rights to submit this work under the same license and
+   agrees to a Developer Certificate of Origin
+   (see http://developercertificate.org/ for more information).
 
 -n::
 --no-verify::
diff --git a/Documentation/git-format-patch.txt b/Documentation/git-format-patch.txt
index e3cdaeb..b149e09 100644
--- a/Documentation/git-format-patch.txt
+++ b/Documentation/git-format-patch.txt
@@ -109,6 +109,7 @@ include::diff-options.txt[]
 --signoff::
Add `Signed-off-by:` line to the commit message, using
the committer identity of yourself.
+   See the signoff option in linkgit:git-commit[1] for more information.
 
 --stdout::
Print all commits to the standard output in mbox format,
diff --git a/Documentation/git-revert.txt b/Documentation/git-revert.txt
index b15139f..573616a 100644
--- a/Documentation/git-revert.txt
+++ b/Documentation/git-revert.txt
@@ -89,6 +89,7 @@ effect to your index in a row.
 -s::
 --signoff::
Add Signed-off-by line at the end of the commit message.
+   See the signoff option in linkgit:git-commit[1] for more information.
 
 --strategy=::
Use the given merge strategy.  Should only be used once.
-- 
2.5.3
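The option text above says only that a Signed-off-by line is added at the end of the commit message. As a rough illustration of that behavior, the following C sketch appends such a trailer to a message string. It is a simplification under stated assumptions, not git's implementation: git's own trailer handling is more elaborate (for instance, it also recognizes other trailer lines in the final paragraph), whereas this sketch only checks whether the last line is already a Signed-off-by line. The function name `add_signoff_line` is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SOB_PREFIX "Signed-off-by: "

/*
 * Hypothetical helper, not git's code: return a newly allocated copy of
 * msg with "Signed-off-by: name <email>" as its last line, or NULL on
 * error. A blank line separates the trailer from the body unless the last
 * line is already a Signed-off-by line; nothing is appended when the
 * identical line is already last. The caller frees the result.
 */
static char *add_signoff_line(const char *msg, const char *name,
			      const char *email)
{
	char trailer[512];
	size_t end, start, pos;
	int tlen, already, needs_blank;
	char *out;

	tlen = snprintf(trailer, sizeof(trailer), SOB_PREFIX "%s <%s>\n",
			name, email);
	if (tlen < 0 || (size_t)tlen >= sizeof(trailer))
		return NULL;

	/* Locate the last non-empty line: msg[start..end). */
	end = strlen(msg);
	while (end > 0 && msg[end - 1] == '\n')
		end--;
	start = end;
	while (start > 0 && msg[start - 1] != '\n')
		start--;

	already = end - start == (size_t)tlen - 1 &&
		  !memcmp(msg + start, trailer, (size_t)tlen - 1);
	needs_blank = strncmp(msg + start, SOB_PREFIX,
			      strlen(SOB_PREFIX)) != 0;

	/* body + '\n' + optional blank line + trailer + NUL */
	out = malloc(end + (size_t)tlen + 3);
	if (!out)
		return NULL;
	memcpy(out, msg, end);
	pos = end;
	if (end > 0) {
		out[pos++] = '\n';
		if (!already && needs_blank)
			out[pos++] = '\n';
	}
	if (already)
		out[pos] = '\0';
	else
		memcpy(out + pos, trailer, (size_t)tlen + 1); /* includes NUL */
	return out;
}
```

Calling it twice with the same identity leaves the message unchanged the second time, while a second, different identity is stacked directly under the first Signed-off-by line.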


--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] clone: Warn if clone lacks LICENSE or COPYING file

2015-04-03 Thread David A. Wheeler
On Sun, 22 Mar 2015 18:56:52 +0100, Ævar Arnfjörð Bjarmason <ava...@gmail.com> wrote:
> However perhaps an interesting generalization of this would be
> something like a post-clone hook, obviously you couldn't store that in
> .git/hooks/ like other githooks(5) since there's no repo yet, but
> having it configured via the user/system config might be an
> interesting feature.

Would that be acceptable to the wider group?

--- David A. Wheeler



Re: [PATCH] clone: Warn if clone lacks LICENSE or COPYING file

2015-03-23 Thread David A. Wheeler
Junio C Hamano:
> An open source hosting site can help better by checking at the
> project creation time, because the people who interact with that
> interface are solely in the position to set and publish licensing terms.

That doesn't help with the many projects that have *already* been created.
E.g., GitHub now has a license chooser, but it didn't for years, and it is
still optional.  Also, repos stored on shared filesystems don't do that kind
of checking.

More importantly, focusing on the hosting site doesn't warn people
who *clone* from repos. The people who take on legal risks are often not
the posters, but the people who clone *from* the sites.  Thus, *they* are the
ones who need the warning, and git is in an especially good spot to detect
the issue.


> The general consumer who are cloning and fetching do not
> have direct control over this, and the only thing the could do to
> nudge the publishers is with an out-of-line communication...

That's an option, but another option is to NOT use the project.  Often
people have no idea there's an issue, and in their rush, with no warning,
they forget to check the basics.


> An approach that checks only the top-level directory for fixed
> filename pattern would not be an effective way to protect the
> cloners, either.

I disagree; I think it's remarkably effective. *Many* projects
do this, including git itself. After all, many humans need to find out the
licensing basics too; having a simple convention for *finding* it helps
humans and tools alike. It's not even limited to open source software;
developers of proprietary materials (software or not) *also* typically want
to declare licensing.

Sure, the top-level licensing text might be incomplete, but having that
information provides a big help, and it's what most people rely on anyway.
Indeed, a *lack* of this is a sign of trouble, which is exactly what warnings
are good for.

--- David A. Wheeler

(P.S. I posted this previously but it seems to have failed for some reason,
so I'm resending this in a different way.)


[PATCH] clone: Warn if clone lacks LICENSE or COPYING file

2015-03-21 Thread David A. Wheeler
Warn cloners if there is no LICENSE* or COPYING* file that makes
the license clear.  This is a useful warning, because if there is
no license somewhere, then local copyright laws (which forbid many uses)
and terms of service apply - and the cloner may not be expecting that.
Many projects accidentally omit a license, so this is common enough to note.
For more info on the issue, feel free to see:
http://choosealicense.com/no-license/
http://www.wired.com/2013/07/github-licenses/
https://twitter.com/stephenrwalli/status/247597785069789184

Signed-off-by: David A. Wheeler <dwhee...@dwheeler.com>
---
 builtin/clone.c | 38 ++
 1 file changed, 38 insertions(+)

diff --git a/builtin/clone.c b/builtin/clone.c
index 9572467..9863b04 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -748,6 +748,41 @@ static void dissociate_from_references(void)
die_errno(_("cannot unlink temporary alternates file"));
 }
 
+static int starts_with_ignore_case(const char *str, const char *prefix)
+{
+   for (; ; str++, prefix++)
+   if (!*prefix)
+   return 1;
+   else if (tolower(*str) != tolower(*prefix))
+   return 0;
+}
+
+static int contains_license(void)
+{
+   DIR *dir = opendir("."); /* Examine current directory for license. */
+   struct dirent *e;
+   struct stat st;
+   int ret = 0;
+
+   if (!dir)
+   return 0;
+
+   while ((e = readdir(dir)) != NULL)
+   if (starts_with_ignore_case(e->d_name, "license") ||
+   starts_with_ignore_case(e->d_name, "copying")) {
+   if (stat(e->d_name, &st))
+   continue;
+   if (st.st_size > 1) {
+   ret = 1;
+   break;
+   }
+   }
+
+   closedir(dir);
+   return ret;
+}
+
+
 int cmd_clone(int argc, const char **argv, const char *prefix)
 {
int is_bundle = 0, is_local;
@@ -1016,6 +1051,9 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
junk_mode = JUNK_LEAVE_REPO;
err = checkout();
 
+   if (!option_no_checkout && !contains_license())
+   warning(_("Repository has no LICENSE or COPYING file with content."));
+
strbuf_release(&reflog_msg);
strbuf_release(&branch_top);
strbuf_release(&key);
-- 
2.1.4




[PATCH] clone: Warn if LICENSE or COPYING file lacking and !clone.skiplicensecheck

2015-03-21 Thread David A. Wheeler
Warn cloners if there is no LICENSE* or COPYING* file that makes
the license clear.  This is a useful warning, because if there is
no license somewhere, then local copyright laws (which forbid many uses)
and terms of service apply - and the cloner may not be expecting that.
Many projects accidentally omit a license, so this is common enough to note.

You can disable this warning by setting clone.skiplicensecheck to true.

For more info on the issue, feel free to see:
http://choosealicense.com/no-license/
http://www.wired.com/2013/07/github-licenses/
https://twitter.com/stephenrwalli/status/247597785069789184

Signed-off-by: David A. Wheeler <dwhee...@dwheeler.com>
---
 builtin/clone.c | 44 
 1 file changed, 44 insertions(+)

diff --git a/builtin/clone.c b/builtin/clone.c
index 9572467..a3e8584 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -50,6 +50,7 @@ static int option_progress = -1;
 static struct string_list option_config;
 static struct string_list option_reference;
 static int option_dissociate;
+static int skip_license_check;
 
static int opt_parse_reference(const struct option *opt, const char *arg, int unset)
 {
@@ -748,6 +749,44 @@ static void dissociate_from_references(void)
die_errno(_("cannot unlink temporary alternates file"));
 }
 
+static int starts_with_ignore_case(const char *str, const char *prefix)
+{
+   for (; ; str++, prefix++)
+   if (!*prefix)
+   return 1;
+   else if (tolower(*str) != tolower(*prefix))
+   return 0;
+}
+
+static int missing_license(void)
+{
+   DIR *dir = opendir("."); /* Examine current directory for license. */
+   struct dirent *e;
+   struct stat st;
+   int ret = 0;
+
+   if (!dir)
+   return 0; /* Cannot open directory; do not warn. */
+
+   while ((e = readdir(dir)) != NULL) {
+   if (starts_with_ignore_case(e->d_name, "license") ||
+   starts_with_ignore_case(e->d_name, "copying")) {
+   if (stat(e->d_name, &st) || st.st_size < 2)
+   continue;
+   ret = 0;
+   break;
+   }
+   if (!strcmp(e->d_name, ".") || !strcmp(e->d_name, "..") ||
+   !strcmp(e->d_name, ".git"))
+   continue;
+   ret = 1; /* Non-empty directory */
+   }
+
+   closedir(dir);
+   return ret;
+}
+
+
 int cmd_clone(int argc, const char **argv, const char *prefix)
 {
int is_bundle = 0, is_local;
@@ -1016,6 +1055,11 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
junk_mode = JUNK_LEAVE_REPO;
err = checkout();
 
+   git_config_get_bool("clone.skiplicensecheck", &skip_license_check);
+   if (!option_no_checkout && !skip_license_check &&
+   missing_license())
+   warning(_("Repository has no LICENSE or COPYING file with content."));
+
strbuf_release(&reflog_msg);
strbuf_release(&branch_top);
strbuf_release(&key);
-- 
2.3.3.221.g33aa87e.dirty
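For readers who want to experiment with the check outside of git, here is a standalone POSIX C sketch of the logic in `missing_license()` above. It is an approximation, not the patch itself: it takes a directory path instead of scanning the current directory, matches `LICENSE*` and `COPYING*` case-insensitively with `strncasecmp()`, requires the match to be a regular file of at least 2 bytes, and treats a directory holding nothing but `.git` as empty, so no warning is issued. The name `missing_license_in` is made up for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <strings.h>
#include <sys/stat.h>

/*
 * Hypothetical standalone variant of the patch's check. Returns 1 ("warn")
 * when dirpath contains something other than ".", ".." and ".git" but no
 * regular LICENSE* or COPYING* file (case-insensitive) of at least 2 bytes.
 * Returns 0 otherwise, including when dirpath cannot be opened.
 */
static int missing_license_in(const char *dirpath)
{
	DIR *dir = opendir(dirpath);
	struct dirent *e;
	struct stat st;
	char path[4096];
	int n, nonempty = 0;

	if (!dir)
		return 0;
	while ((e = readdir(dir)) != NULL) {
		if (!strcmp(e->d_name, ".") || !strcmp(e->d_name, "..") ||
		    !strcmp(e->d_name, ".git"))
			continue;
		nonempty = 1;
		if (strncasecmp(e->d_name, "license", 7) != 0 &&
		    strncasecmp(e->d_name, "copying", 7) != 0)
			continue;
		n = snprintf(path, sizeof(path), "%s/%s", dirpath, e->d_name);
		if (n < 0 || (size_t)n >= sizeof(path))
			continue;
		if (!stat(path, &st) && S_ISREG(st.st_mode) && st.st_size >= 2) {
			closedir(dir);
			return 0; /* found a license file with content */
		}
	}
	closedir(dir);
	return nonempty;
}
```

Returning 0 when the directory cannot be opened mirrors the patch's choice to stay quiet rather than warn about something it could not inspect.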




Re: [PATCH] #!/bin/sh -- #!/usr/bin/env bash

2005-04-23 Thread David A. Wheeler

Alecs King wrote:
> And as for bash, only gitdiff-do and gitlog.sh 'explicitly' use bash
> instead of /bin/sh.  On most Linux distros, /bin/sh is just a symbolic
> link to bash.  But not on some others.  I found gitlsobj.sh could not
> work using a plain /bin/sh on fbsd.  To make life easier, i think it
> might be better if we all explicitly use bash for all shell scripts.

H. Peter Anvin wrote:
> How about #!/bin/bash (build from .in files if you feel it necessary to
> support systems which don't have bash in /bin) instead of doubling the
> number of execs?

If the number of execs is that critical, it probably should not be in
bash anyway.  OpenBSD (at least 3.1)'s bash appears to be in
/usr/local/bin/bash, NOT /bin/bash.
I'd go with the /usr/bin/env solution for now;
it maximizes the "it just works" factor, and
when it comes time for .in files much of the cogito code (at least)
will probably be rewritten in Perl, and anything performance-sensitive
will be in C.
--- David A. Wheeler


[PATCH] Eliminate use of mktemp's -t option

2005-04-21 Thread David A. Wheeler
It turns out that mktemp's -t option, while useful, isn't
available on many systems (Mandrake & Red Hat Linux 9 at least,
and probably piles of others).  So, here's a portability
patch that removes all use of mktemp's -t option.
Unlike the quick hack I posted earlier, this should be
clean everywhere (assuming you have mktemp).
This is a patch against git-pasky 0.6.3.
This is my first attempt to _post_ a patch using git itself,
and I'm not entirely sure how you want it.   Let me know
if you have a problem with it!
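The same portability idea can be expressed in C: build the template from `$TMPDIR` when it is set and non-empty, fall back to `/tmp` otherwise, and let `mkstemp()` replace the trailing `XXXXXX`. This is a sketch under those assumptions, not part of the patch; `make_temp` is a hypothetical helper name. (In shell, the same fallback can also be written as `${TMPDIR:-/tmp}`.)

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a private temporary file named
 * "<dir>/<prefix>.XXXXXX", where <dir> is $TMPDIR if set and non-empty,
 * else /tmp. On success returns an open file descriptor and leaves the
 * generated name in buf; returns -1 on failure.
 */
static int make_temp(const char *prefix, char *buf, size_t size)
{
	const char *dir = getenv("TMPDIR");
	int n;

	if (!dir || !*dir)
		dir = "/tmp";
	n = snprintf(buf, size, "%s/%s.XXXXXX", dir, prefix);
	if (n < 0 || (size_t)n >= size)
		return -1;
	return mkstemp(buf); /* fills in the XXXXXX and creates the file */
}
```

Because `mkstemp()` chooses the name and creates the file (with mode 0600) in one step, there is no window in which another process can create the same name first.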
--- David A. Wheeler

commit 5f926b684025b83e34386bf8e4ef30a97e2ae5ec
tree 61059575269ed1027cfb66543251e182f87d1064
parent dd69ca5f806c8b10bb29ecb8d77c88be007c981c
author David A. Wheeler [EMAIL PROTECTED] 1114138972 -0400
committer David A. Wheeler [EMAIL PROTECTED] 1114138972 -0400

Eliminated use of mktemp's -t option; older mktemps don't support it.

Index: README
===
--- 6a612d42afdba20fd2150e319a689ed451b010e4/README  (mode:100644 sha1:a71b5fbdbdac0bf2e2d021e422b9f49dd5481165)
+++ 61059575269ed1027cfb66543251e182f87d1064/README  (mode:100644 sha1:80952e2f67b28f64c10cfb913df375a5dd244cd9)
@@ -141,7 +141,7 @@
C compiler
bash
basic shell environment (sed, grep, textutils, ...)
-   mktemp 1.5+ (Mandrake users beware!)
+   mktemp
diff, patch
libssl
rsync
Index: gitapply.sh
===
--- 6a612d42afdba20fd2150e319a689ed451b010e4/gitapply.sh  (mode:100755 sha1:7703809dc0743c6e4c1fa5b7d922a4efc16b4276)
+++ 61059575269ed1027cfb66543251e182f87d1064/gitapply.sh  (mode:100755 sha1:14a13ff23cff2a80f9a44c053002f837fec13e2c)
@@ -8,9 +8,13 @@
 #
 # Takes the diff on stdin.
 
-gonefile=$(mktemp -t gitapply.XXXXXX)
-todo=$(mktemp -t gitapply.XXXXXX)
-patchfifo=$(mktemp -t gitapply.XXXXXX)
+if [ -z "$TMPDIR" ]; then
+   TMPDIR=/tmp
+fi
+
+gonefile=$(mktemp "$TMPDIR/gitapply.XXXXXX")
+todo=$(mktemp "$TMPDIR/gitapply.XXXXXX")
+patchfifo=$(mktemp "$TMPDIR/gitapply.XXXXXX")
 rm $patchfifo && mkfifo -m 600 $patchfifo
 
 show-files --deleted >$gonefile
Index: gitcommit.sh
===
--- 6a612d42afdba20fd2150e319a689ed451b010e4/gitcommit.sh  (mode:100755 sha1:a13bef2c84492ed75679d7d52bb710df35544f8a)
+++ 61059575269ed1027cfb66543251e182f87d1064/gitcommit.sh  (mode:100755 sha1:ee777605dccdc9737cf743f4f8c96b9bacd97f10)
@@ -16,6 +16,9 @@
exit 1
 }
 
+if [ -z "$TMPDIR" ]; then
+   TMPDIR=/tmp
+fi
 
 [ -s .git/blocked ] && die "committing blocked: $(cat .git/blocked)"
 
@@ -67,7 +70,7 @@
 fi
 
 echo Enter commit message, terminated by ctrl-D on a separate line:
-LOGMSG=$(mktemp -t gitci.XXXXXX)
+LOGMSG=$(mktemp "$TMPDIR/gitci.XXXXXX")
 if [ $merging ]; then
echo -n 'Merge with ' >>$LOGMSG
echo -n 'Merge with '
@@ -111,7 +114,7 @@
 if [ ! $customfiles ]; then
rm -f .git/add-queue .git/rm-queue
 else
-   greptmp=$(mktemp -t gitci.XXXXXX)
+   greptmp=$(mktemp "$TMPDIR/gitci.XXXXXX")
for file in $customfiles; do
if [ -s .git/add-queue ]; then
fgrep -vx "$file" .git/add-queue >$greptmp
Index: gitdiff-do
===
--- 6a612d42afdba20fd2150e319a689ed451b010e4/gitdiff-do  (mode:100755 sha1:218dfabeb4a5dcbd2cf58bd6f672f385690ec397)
+++ 61059575269ed1027cfb66543251e182f87d1064/gitdiff-do  (mode:100755 sha1:caf20ae034b8dc9f88922ee9f82809bb32a56231)
@@ -32,7 +32,11 @@
[ "$labelapp" ] && label="$label  ($labelapp)"
 }
 
-diffdir=$(mktemp -d -t gitdiff.XXXXXX)
+if [ -z "$TMPDIR" ]; then
+   TMPDIR=/tmp
+fi
+
+diffdir=$(mktemp -d "$TMPDIR/gitdiff.XXXXXX")
 diffdir1=$diffdir/$id1
 diffdir2=$diffdir/$id2
 mkdir $diffdir1 $diffdir2
Index: gitdiff.sh
===
--- 6a612d42afdba20fd2150e319a689ed451b010e4/gitdiff.sh  (mode:100755 sha1:195c3b9962c764855ec6168a78babf5867ea3046)
+++ 61059575269ed1027cfb66543251e182f87d1064/gitdiff.sh  (mode:100755 sha1:278511a3f491ed7d5e41bbd642acfd9b5a1d8257)
@@ -80,6 +80,10 @@
shift
 fi
 
+if [ -z "$TMPDIR" ]; then
+   TMPDIR=/tmp
+fi
+
 if [ $parent ]; then
id2=$id1
id1=$(parent-id $id2 | head -n 1)
@@ -88,7 +92,7 @@
 
 if [ $id2 =   ]; then
if [ $id1 !=   ]; then
-   export GIT_INDEX_FILE=$(mktemp -t gitdiff.XXXXXX)
+   export GIT_INDEX_FILE=$(mktemp "$TMPDIR/gitdiff.XXXXXX")
read-tree $(gitXnormid.sh $id1)
update-cache --refresh
fi
Index: gitmerge.sh
===
--- 6a612d42afdba20fd2150e319a689ed451b010e4/gitmerge.sh  (mode:100755 sha1:683755729b6f689ea43c692712fad6e51eeac354)
+++ 61059575269ed1027cfb66543251e182f87d1064/gitmerge.sh  (mode:100755 sha1:1c733bbdb9fe54c41787d962d0f55bb5f67d4c63)
@@ -19,6 +19,10 @@
exit 1

Re: ia64 git pull

2005-04-21 Thread David A. Wheeler
Petr Baudis [EMAIL PROTECTED] writes:
> Still, why would you escape it? My shell will not take # as a
> comment start if it is immediately after an alphanumeric character.
I guess there MIGHT be some command shell implementation
that stupidly _DID_ accept # as a comment character,
even immediately after an alphanumeric.
If that's true, then using # there would be a pain for portability.
But I think that's highly improbable.  A quick peek
at the Single Unix Specification as posted by the Open Group
seems to say that, according to the standards, that's NOT okay:
http://www.opengroup.org/onlinepubs/009695399/utilities/xcu_chap02.html#tag_02
Basically, the command shell is supposed to tokenize, and #
only means comment if it's at the beginning of a token.
And as far as I can tell, it's not an issue in practice either.
I did a few quick tests on Fedora Core 3 and OpenBSD.
On Fedora Core 3, I can say that bash, ash & csh all do NOT
consider # as a comment start if an alpha precedes it.
The same is true for OpenBSD /bin/sh, /bin/csh, and /bin/rksh.
If such different shells do the same thing (this stuff isn't even
legal C-shell text!), it's likely others do too.
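The tokenizing rule cited above can be modelled in a few lines of C. This is a deliberately simplified sketch: it treats `#` as a comment start only at the beginning of a line or right after a blank, and it ignores quoting, backslash escapes, and operators such as `;` that also end a token in a real POSIX shell. The helper names `starts_comment` and `comment_start` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model: '#' at index i starts a comment only if it is the
 * first character of a token, i.e. at the start of the line or right
 * after a space or tab. Quotes, escapes and operators are ignored. */
static int starts_comment(const char *line, size_t i)
{
	if (line[i] != '#')
		return 0;
	return i == 0 || line[i - 1] == ' ' || line[i - 1] == '\t';
}

/* Index of the first comment-starting '#', or -1 if there is none. */
static long comment_start(const char *line)
{
	size_t i;

	for (i = 0; line[i]; i++)
		if (starts_comment(line, i))
			return (long)i;
	return -1;
}
```

Under this rule a word such as `host#branch` keeps its `#`, which is why escaping it is unnecessary in a shell that follows the specification.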
--- David A. Wheeler


Re: Change pull to _only_ download, and git update=pull+merge?

2005-04-19 Thread David A. Wheeler
Daniel Barkalow wrote:
> See, I don't think you ever want to just pull. You want to
> pull-and-do-something, but the something could be any operation...
In a _logical_ sense that's true; I'd only want to pull data if I intended
to (possibly) do something with it.  But as a _practical_ matter,
I can see lots of reasons for doing a pull as a separate operation.
One is disconnected operation; I may want to pull the data now, to
prepare for disconnection, and then work later while disconnected.
Another is using lots of data compared to the pipesize; if I have a
dial-in modem, or I want the history of the linux kernel since 0.0.1,
I might want to pull & go away/go to sleep for the night. I might
use cron/at to automatically pull at 3am from some interesting branches.
The next day, I could then pull again to update just what changed,
and/or do the operation I intended to do if the operation auto-pulls the
missing data.
> I'm actually getting suspicious that the right thing is to hide pull in
> the id scheme. That is, instead of saying "linus" to refer to the
> linus head that you currently have, you say "+linus" to refer to the
> head Linus has on his server currently, and this will cause you to
> download anything necessary to perform the operation with the resulting value.


That's an interesting idea.  I'll have to think about that.
What command would you suggest for the common case
of "update with current track"?  I've proposed "git update [NAME]".
"git merge" with update-from-current-track as default seems unclear, and
I worry that I might accidentally press RETURN too soon & merge with
the wrong thing.  And I like the idea of "git update" doing the same thing
(essentially) as "cvs update" and "svn update"; LOTS of people know
what "update" does, so using the same command name for one of the most
common operations smooths the transition (GNU Arch's "tla update"
is almost, though not exactly, the same too).
I still think it's important to have a very simple command that updates
your current branch with a tracked branch (because it's common to stay
in sync with a master branch), and a way to just download the data without
doing things with it YET (because you want to do things in stages).
The commands "update" and "pull" come to mind when thinking that way,
though any names are fine as long as the commands are simple & clear
(I think it's a GOOD idea to use the same commands as CVS and
Subversion when the results are essentially the same, just because so many
people are already familiar with them, but only where it makes sense.)
--- David A. Wheeler


Re: [script] ge: export commits as patches

2005-04-19 Thread David A. Wheeler
Petr Baudis wrote:
> Dear diary, on Tue, Apr 19, 2005 at 03:48:43PM CEST, I got a letter
> where Ingo Molnar [EMAIL PROTECTED] told me that...
>> is there any 'export commit as patch' support in git-pasky?
> Nice idea. I will add it, probably as 'git patch'.

Eek!
It's a nice idea, and it'd be great as a subcommand.  But PLEASE
don't name it "patch".  I already know what "patch" does; "patch"
ACCEPTS a patch... it doesn't create one ;-).
How about naming it "aspatch" or "asdiff" instead?  Or something else
(good names, anyone?).
<soapbox_to_choir>Good externally-viewed names are critical... good
command names that are similar to what people already know
can really help make the tool a joy to use.</soapbox_to_choir>
--- David A. Wheeler


Re: [2/4] Sorting commits by date

2005-04-18 Thread David A. Wheeler
Petr Baudis wrote:
[Re: Daniel Barkalow [EMAIL PROTECTED]'s patch] 
> Note that you are breaking gcc-2.95 compatibility when using declarator
> in the middle of a block. Not that it might be a necessarily bad thing
> ;-) (although I still use gcc-2.95 a lot), just to ring a bell so that
> it doesn't slip through unnoticed and we can decide on a policy
> regarding this.
I, at least, would REALLY like to see _highly_ portable C code;
I'm looking at git as a potential long-term useful SCM tool for
LOTS of projects, and if you're going to write C, it'd be nice
to just write it portably to start with. There's certainly
no crisis in using separate declarators.
In fact, in the LONG term I'd like to see the shell code
replaced with code that easily runs everywhere (Windows, etc.),
again, for portability's sake.  I think it would be unwise to
do that right now; the shell is an excellent prototyping tool.
But once things have settled down & there's been some experience
with the tools, the pieces could be slowly recoded.
(Yes, I know of & use Cygwin. And I prefer Python over Perl,
but I'm really uninterested in language wars.)
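On the gcc-2.95 point above: C code stays C89-compatible if every block declares its variables before its first statement. The sketch below shows that style in a date-ordering routine for commits. The `struct commit_rec` type and both functions are hypothetical, since the patch under discussion is not quoted here. The comparator returns -1/0/1 explicitly instead of subtracting the dates, which avoids overflow and signedness surprises with unsigned timestamps.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, minimal commit record for illustration only. */
struct commit_rec {
	const char *id;
	unsigned long date; /* seconds since the epoch */
};

/* qsort() comparator: newest commit first. All declarations come before
 * the first statement, so this also compiles as C89 (e.g. gcc-2.95). */
static int compare_by_date_desc(const void *a, const void *b)
{
	const struct commit_rec *ca = a;
	const struct commit_rec *cb = b;

	if (ca->date > cb->date)
		return -1;
	if (ca->date < cb->date)
		return 1;
	return 0;
}

static void sort_commits_by_date(struct commit_rec *list, size_t n)
{
	qsort(list, n, sizeof(*list), compare_by_date_desc);
}
```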
--- David A. Wheeler


Re: Storing permissions

2005-04-17 Thread David A. Wheeler
Linus Torvalds wrote:
> On Sat, 16 Apr 2005, Paul Jackson wrote:
>> Morten wrote:
>>> It makes some sense in principle, but without storing what they mean
>>> (i.e., group==?) it certainly makes no sense.
>> There's no "they" there.
>> I think Martin's proposal, to which I agreed, was to store a _single_
>> bit.  If any of the execute permissions of the incoming file are set,
>> then the bit is stored ON, else it is stored OFF.  On 'checkout', if the
>> bit is ON, then the file permission is set mode 0777 (modulo umask),
>> else it is set mode 0666 (modulo umask).
>
> I think I agree.
> Anybody willing to send me a patch? One issue is that if done the obvious
> way it's an incompatible change, and old tree objects won't be valid any
> more. It might be ok to just change the compare cache check to only care
> about a few bits, though: S_IXUSR and S_IFDIR.
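The single-bit scheme Paul describes fits in a few lines of C. This is an illustration under assumptions, not git code. The names `exec_bit_from_mode` and `checkout_mode` are hypothetical; only the standard `<sys/stat.h>` permission macros are real.

```c
#include <sys/types.h>
#include <sys/stat.h>

/* Store a single bit: ON if any execute permission is set. */
int exec_bit_from_mode(mode_t mode)
{
    return (mode & (S_IXUSR | S_IXGRP | S_IXOTH)) != 0;
}

/* On checkout, expand the bit to 0777 or 0666 and apply the umask.
 * (open() and creat() already apply the process umask themselves;
 * it is spelled out here only to show the resulting mode.) */
mode_t checkout_mode(int exec_bit, mode_t mask)
{
    return (mode_t)((exec_bit ? 0777 : 0666) & ~mask);
}
```

With the common umask of 022, a file recorded with the bit ON comes back as 0755 and one recorded OFF comes back as 0644.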
There's a minor reason to write out ALL the perm bit data, but
only care about a few bits coming back in: Some people use
SCM systems as a generalized backup system, so you can roll
your system back to an arbitrary known state in the past
(e.g., "Change my /etc files to the state I was at
just before I installed that *#@ program!").
For more on this, see:
 http://www.onlamp.com/pub/a/onlamp/2005/01/06/svn_homedir.html
If you store all the bits, then you CAN restore things
more exactly the way they were.  This is imperfect, since
it doesn't cover more exotic permission
values from SELinux, xattrs, whatever.  For some, that's enough.
Yeah, I know, not the main purpose of git.  But what the heck,
I _like_ flexible infrastructures.
--- David A. Wheeler


Re: Storing permissions

2005-04-17 Thread David A. Wheeler
Linus Torvalds wrote:
> On Sun, 17 Apr 2005, David A. Wheeler wrote:
>> There's a minor reason to write out ALL the perm bit data, but
>> only care about a few bits coming back in: Some people use
>> SCM systems as a generalized backup system
> Yes. I was actually thinking about having system config files in a git
> repository when I started it, since I noticed how nicely it would do
> exactly that.
>
> However, since the mode bits also end up being part of the name of the
> tree object (ie they are most certainly part of the hash), it's really
> basically impossible to only care about one bit but writing out many bits:
> it's the same issue of having multiple identical blocks with different
> names.
> ...
> One solution is to tell git with a command line flag and/or config file
> entry that "for this repo, I want you to honor all bits". That should be
> easy enough to add at some point, and then you really get what you want.
Yes, I thought of that too.  And I agree, that should do the job.
My real concern is the early design of the storage format:
I want to make sure it's POSSIBLE to extend git in obvious ways.
As long as it's possible later, that's a great thing.
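Linus's point that the mode bits are part of the tree object's hash shows up clearly in how a tree entry is laid out. The sketch below assumes the `<octal mode> <name>\0<binary object id>` layout git uses for tree entries and leaves out the object id. `tree_entry_prefix` is a hypothetical helper. Two entries that differ only in permission bits serialize to different bytes, so the trees hash differently. That is why honoring all bits has to be an explicit per-repository choice.

```c
#include <stdio.h>
#include <string.h>

/* Write the "<octal mode> <name>" text plus its terminating NUL, the
 * part of a tree entry that precedes the entry's binary object id.
 * Returns the byte count including the NUL, or 0 if buf is too small. */
size_t tree_entry_prefix(char *buf, size_t n, unsigned int mode, const char *name)
{
    int len = snprintf(buf, n, "%o %s", mode, name);

    if (len < 0 || (size_t)len + 1 > n)
        return 0;
    return (size_t)len + 1;   /* snprintf already wrote the NUL */
}
```

For example, mode 0100644 and mode 0100755 for the same `hello.c` produce `100644 hello.c` and `100755 hello.c`, which are different inputs to the hash.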
> ...
> Also, I made a design decision that git only cares about non-dotfiles. Git
> literally never sees or looks at _anything_ that starts with a ".". I
> think that's absolutely the right thing to do for an SCM (if you hide your
> files, I really don't think you should expect the SCM to see it), but it's
> obviously not the right thing for a backup thing.
Again, a command line flag or config file entry could change that
in the future, if desired.  So this is a decision that could be
changed later... the best kind of decision :-).
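Linus's rule and the escape hatch suggested here can be sketched together. The `include_dotfiles` argument stands in for a hypothetical command line flag or config entry. Nothing here is an existing git option, and `should_track` is a made-up name.

```c
#include <string.h>

/* Return 1 if any component of a '/'-separated path starts with '.'. */
static int path_has_dot_component(const char *path)
{
    const char *p = path;

    while (*p != '\0') {
        if (*p == '.')
            return 1;
        p = strchr(p, '/');
        if (p == NULL)
            break;
        p++;                  /* step to the start of the next component */
    }
    return 0;
}

/* include_dotfiles stands in for a hypothetical per-repository setting. */
int should_track(const char *path, int include_dotfiles)
{
    return include_dotfiles || !path_has_dot_component(path);
}
```

By default `etc/.bashrc` is skipped but `a.b/c` is tracked, because only the start of each path component matters. A backup-style repository would pass a nonzero `include_dotfiles`.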
--- David A. Wheeler


Re: Parseable commit header

2005-04-17 Thread David A. Wheeler
Stefan-W. Hahn wrote:
> Hi,
> after playing a while with git-pasky it is a crap to interpret the date of
> commit logs. Though it was a good idea to put the date in a parseable format
> (seconds since), but the format of the commit itself is not good parseable.
>
> Should be:
> ...
> Committer-Dater: 1113684324 +0200
I'm probably coming in late to the game, but exactly
why is seconds-since-epoch format used instead of a format
more easily understood by humans?  Yes, I know tools
can easily convert that, but you're already using an ASCII format;
why not just record it in a format that's easily eyeballed like ISO 8601's
yyyymmddThhmmss [timezone]? E.G.:
 20050417T171520 +0200
or some such?  I'm SURE that people will mention things
like the patch I posted on April 17, 2005, and having the
patch format record times that way, directly, would be convenient
to the poor slobs^H^H^H^H^H developers who come later.
Yes, a tool can handle the conversion, but choosing formats
so a tool is unneeded for simple stuff is often better!
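For comparison, converting the `1113684324 +0200` value quoted above into the ISO 8601 basic form needs only the standard C library. This is a minimal sketch. `format_iso_basic` is a hypothetical name, and `tz` is assumed to arrive as the decimal number written in the header (+200 for +0200).

```c
#include <stdio.h>
#include <time.h>

/* Format "<seconds since epoch> <+hhmm offset>" as ISO 8601 basic local
 * time, e.g. "20050416T224524 +0200".  tz is the offset written as a
 * decimal number (+200 means +0200).  Returns snprintf's result, or -1. */
int format_iso_basic(char *buf, size_t n, time_t epoch, int tz)
{
    int mag = tz < 0 ? -tz : tz;
    long offset = (mag / 100) * 3600L + (mag % 100) * 60L;
    time_t local;
    struct tm *tm;

    local = tz < 0 ? epoch - offset : epoch + offset;
    tm = gmtime(&local);      /* UTC fields of the shifted time = local clock */
    if (tm == NULL)
        return -1;
    return snprintf(buf, n, "%04d%02d%02dT%02d%02d%02d %c%04d",
                    tm->tm_year + 1900, tm->tm_mon + 1, tm->tm_mday,
                    tm->tm_hour, tm->tm_min, tm->tm_sec,
                    tz < 0 ? '-' : '+', mag);
}
```

For the header above it produces `20050416T224524 +0200`. Shifting the epoch value by the offset before calling `gmtime()` keeps the result independent of the machine's own time zone.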
--- David A. Wheeler


Re: Yet another base64 patch

2005-04-17 Thread David A. Wheeler
Paul Jackson wrote:
> David wrote:
>> My list would be:
>> ext2, ext3, NFS, and Windows' NTFS (stupid short filenames,
>> case-insensitive/case-preserving).
>
> I'm no mind reader, but I'd bet a pretty penny that what you have in
> mind and what Linus has in mind have no overlaps in their solution sets.
Sadly, I lack the mind reading ability as well.
Our goals are, I suspect, somewhat different.
Linus wants to build a tool that meets his specific needs
(managing kernel development), and he has particular requirements
(such as fast simple merging when working at large scales).
In contrast, I'm hoping for a more
general OSS/FS SCM tool that many others can use as well.
But I think there's heavy overlap in the solution space.
The Linux kernel project is, to my knowledge, the largest
project using a truly distributed SCM process.
Anyone else who is considering a distributed SCM process
would at _least_ want to think about how the Linux kernel
project works, and if they're doing so, they
might also want to reuse the development tools.
I'm just taking a peek, and
looking for situations where a design decision is irrelevant
for his purposes, but a particular direction would be of
particular help to other projects.  I'm more worried about the
storage format; if the code doesn't support some particular
feature but it could be added later without great pain, no big deal.
If something would imply a complete rewrite, that's undesirable.
--- David A. Wheeler


Re: Re-done kernel archive - real one?

2005-04-17 Thread David A. Wheeler
On Sun, 17 Apr 2005, Russell King wrote:
> One thing which definitely needs to be considered is - what character
> encoding are the comments to be stored as?
> ...
I replied:
> I would _heartily_ recommend moving towards UTF-8 as the
> internal charset for all comments.
Petr said:
> Not that the plumbing should actually _care_ at all; anyone who uses it
> should take the care, so this is more of a social thing.
The _plumbing_ shouldn't care, but the stuff above needs to know
how to interpret the stuff that the plumbing produces.
Russell King said:
> Except, I believe, MicroEMACS, which both Linus and myself use.  As
> far as I know, there aren't any patches to make it UTF-8 compliant.
Since plain ASCII is a subset of UTF-8,
as long as MicroEMACS users only create ASCII comments,
those comments will still be valid UTF-8.
No big deal.
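Checking that a comment really is UTF-8 is cheap, which matters for the tools above the plumbing. Here is a minimal validator sketch that follows the Unicode Standard's table of well-formed UTF-8 byte sequences. `utf8_valid` is a hypothetical name. Any pure-ASCII buffer passes trivially, which is the point made above.

```c
#include <stddef.h>

/* Return 1 if s[0..n) is well-formed UTF-8, 0 otherwise.  Rejects
 * overlong forms, UTF-16 surrogates (U+D800..U+DFFF) and code points
 * above U+10FFFF. */
int utf8_valid(const unsigned char *s, size_t n)
{
    size_t i = 0;

    while (i < n) {
        unsigned char c = s[i];
        unsigned char lo = 0x80, hi = 0xBF;  /* allowed range, 2nd byte */
        size_t len, k;

        if (c < 0x80) {                      /* ASCII is already UTF-8 */
            i++;
            continue;
        } else if (c >= 0xC2 && c <= 0xDF) {
            len = 2;
        } else if (c >= 0xE0 && c <= 0xEF) {
            len = 3;
            if (c == 0xE0) lo = 0xA0;        /* overlong 3-byte forms */
            if (c == 0xED) hi = 0x9F;        /* surrogates */
        } else if (c >= 0xF0 && c <= 0xF4) {
            len = 4;
            if (c == 0xF0) lo = 0x90;        /* overlong 4-byte forms */
            if (c == 0xF4) hi = 0x8F;        /* beyond U+10FFFF */
        } else {
            return 0;                        /* 0x80..0xC1, 0xF5..0xFF */
        }
        if (n - i < len)
            return 0;                        /* truncated sequence */
        if (s[i + 1] < lo || s[i + 1] > hi)
            return 0;
        for (k = 2; k < len; k++)
            if (s[i + k] < 0x80 || s[i + k] > 0xBF)
                return 0;
        i += len;
    }
    return 1;
}
```

A Latin-1 comment such as `\xE9t\xE9` ("été") is rejected. A tool could use that result to decide when to fall back to iconv.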
For reading comments, if the text is almost entirely
plain ASCII, you could just ignore the problem and have the
occasional character scramble.  If you need more, you'll
need a tool that's more internationalized or a working iconv,
but if that's important you'd be motivated.
Again, I'm looking for more generalized solutions, where
non-English comments are more common than in Linux kernel code.
--- David A. Wheeler


Re: Yet another base64 patch

2005-04-16 Thread David A. Wheeler
Paul Jackson wrote:
> Earlier, hpa wrote:
>> The base64 version has 2^12 subdirectories instead of 2^8 (I just used 2
>> characters as the hash key just like the hex version.)
> Later, hpa wrote:
>> Ultimately the question is: do we care about old (broken) filesystems?
>
> I'd imagine we care a little - just not alot.
Some people (e.g., me) would really like for git
to be more forgiving of nasty filesystems,
so that git can be used very widely.
I.e., be forgiving about case insensitivity,
poor performance or problems with a large # of files
in a directory, etc.  You're already working to make
sure git handles filenames with spaces and i18n filenames,
a common failing of many other SCM systems.
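hpa's figures check out. Two hex characters give 16^2 = 256 = 2^8 fan-out directories, and two base64 characters give 64^2 = 4096 = 2^12. Because base64 uses both upper and lower case, however, two different base64 directory names can refer to the same directory on a case-insensitive filesystem. Lowercase-only hex names cannot. Below is a minimal C check for that situation; `case_fold_collision` is a hypothetical name.

```c
#include <ctype.h>
#include <string.h>

/* Return 1 if a and b are different names that a case-insensitive
 * filesystem would nevertheless treat as the same file. */
int case_fold_collision(const char *a, const char *b)
{
    if (strcmp(a, b) == 0)
        return 0;                     /* same name: no collision */
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == '\0' && *b == '\0';  /* same length, equal when folded */
}
```

The base64 directories `Ab` and `aB` collide, for instance. No two distinct lowercase hex directory names ever do.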
If git is used for Linux kernel development and nothing else,
it's still a success.  But it'd be even better from
my point of view if git was a useful tool for MANY
other projects.  I think there are advantages, even if you
only plan to use git for the kernel, to making git easier
to use for other projects.  By making git less
sensitive to the filesystem, you'll attract more (non-kernel-dev)
users, some of whom will become new git developers who
add cool new functionality.
As noted in my SCM survey (http://www.dwheeler.com/essays/scm.html),
I think SCM Windows support is really important to a lot of
OSS projects.  Many OSS projects, even if they start
Unix/Linux only, spin off a Windows port, and it's
painful if their SCM can't run on Windows then.
Trouble running on NFS filesystems has bitten
GNU Arch users (there are workarounds, but now you
need to learn about workarounds instead of things
just working).  If nothing else, look at the history
of other SCM projects: all too many have undergone radical and
painful surgeries so that they can be more portable to
various filesystems.
It's a trade-off, I know.
--- David A. Wheeler


Re: SHA1 hash safety

2005-04-16 Thread David A. Wheeler
Paul Jackson wrote:
>> what I'm talking about is the chance that somewhere, sometime there will
>> be two different documents that end up with the same hash
> I have vastly greater chance of a file colliding due to hardware or
> software glitch than a random message digest collision of two legitimate
> documents.
The probability of an accidental overlap for SHA-1 for two
different files is absurdly remote; it's just not worth worrying about.
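To put a number on "absurdly remote": suppose SHA-1 outputs behaved like independent, uniformly random 160-bit values. That is an idealizing assumption, and it is exactly the one an intentional attack breaks. Under it, the union bound over all pairs of n objects gives the following, so even about a billion objects leave an accidental-collision probability below 2^-101:

```latex
P(\text{any collision among } n \text{ objects})
  \le \binom{n}{2} 2^{-160} = \frac{n(n-1)}{2}\, 2^{-160},
\qquad
n = 2^{30}:\quad P < \frac{2^{60}}{2}\, 2^{-160} = 2^{-101}.
```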
However, the possibility of an INTENTIONAL overlap is a completely
different matter.  I think the hash algorithm should change in the
future; I have a proposal below.
Someone has ALREADY broken into a server to modify the Linux kernel
code, so the idea of an attack on kernel code
is not an idle fantasy. MD5 is dead, and SHA-1's work factor has
already been reduced enough that people have been told to
"walk to the exits" (i.e., DO NOT USE SHA-1 for new programs like git).
The fact that blobs are compressed first, with a length header
in front, _may_ make it harder to attack.  But maybe not.
I haven't checked for this case, but most decompression algorithms
I know of have a "don't change" mode that essentially just copies the
data behind it.  If the one used in git has such a mode
(I bet it does!), an attacker could use that mode to
make it MUCH easier to create an attack vector than it would
appear at first.  Now the attacker just needs to create a collision
(hmmm, where was that paper?).  Remember, you don't need to
run a hash algorithm over an entire file; you can precompute
to near the end, and then try your iterations from there.
A little hardware (inc. FPGAs) would speed the attack.
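Git stores objects with zlib, and zlib's deflate format (RFC 1950/1951) does have such a mode: a "stored" block, whose payload is copied through unchanged. The minimal C sketch below wraps up to 65535 bytes in a zlib stream that contains one stored block. It shows that chosen bytes can appear verbatim in the "compressed" output. `zlib_store` is a hypothetical name, and this is not how git itself compresses objects.

```c
#include <stddef.h>
#include <string.h>

/* Adler-32 checksum, which the zlib trailer requires (RFC 1950). */
static unsigned long adler32_sum(const unsigned char *p, size_t n)
{
    unsigned long a = 1, b = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        a = (a + p[i]) % 65521;
        b = (b + a) % 65521;
    }
    return (b << 16) | a;
}

/* Wrap len (<= 65535) bytes in a zlib stream whose only deflate block
 * is a stored (uncompressed) block.  Returns the number of bytes
 * written to out, or 0 if len is too large or out is too small. */
size_t zlib_store(const unsigned char *data, size_t len,
                  unsigned char *out, size_t outcap)
{
    size_t o = 0;
    unsigned long adler;

    if (len > 65535 || outcap < len + 11)
        return 0;
    out[o++] = 0x78;                          /* CMF: deflate, 32K window */
    out[o++] = 0x01;                          /* FLG: 0x7801 % 31 == 0 */
    out[o++] = 0x01;                          /* BFINAL=1, BTYPE=00 (stored) */
    out[o++] = (unsigned char)(len & 0xff);   /* LEN, little-endian */
    out[o++] = (unsigned char)(len >> 8);
    out[o++] = (unsigned char)(~len & 0xff);  /* NLEN = ~LEN */
    out[o++] = (unsigned char)((~len >> 8) & 0xff);
    memcpy(out + o, data, len);               /* payload, copied verbatim */
    o += len;
    adler = adler32_sum(data, len);
    out[o++] = (unsigned char)(adler >> 24);  /* Adler-32, big-endian */
    out[o++] = (unsigned char)(adler >> 16);
    out[o++] = (unsigned char)(adler >> 8);
    out[o++] = (unsigned char)adler;
    return o;
}
```

The only overhead is 11 bytes of framing. Everything between the block header and the checksum is exactly what the attacker supplied.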
Of course, that assumes you actually
check everything to make sure that an attacker can't slip
in something different. After each rsync, are all new files'
hash values checked?  Do they uncompress to the right length?
Do they have excess data after the decompression?
I'm hoping that sort of input-checking (since the data
might be from an attacker, if indirectly!) is already going on,
though I haven't reviewed the git source code.
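One of those input checks is cheap to do: confirming that an object decompresses to exactly the length its header declares, with nothing left over. The sketch below assumes the decompressed object looks like `<type> <decimal length>\0<payload>` (git's object header layout). The function name is hypothetical, and the hash comparison itself is left out.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if buf[0..n) is "<type> <length>\0<payload>" and the payload
 * is exactly <length> bytes (no truncation, no trailing excess). */
int object_length_ok(const unsigned char *buf, size_t n)
{
    const unsigned char *sp, *nul, *p;
    unsigned long declared = 0;

    nul = memchr(buf, '\0', n);
    if (nul == NULL)
        return 0;                            /* no header terminator */
    sp = memchr(buf, ' ', (size_t)(nul - buf));
    if (sp == NULL || sp + 1 == nul)
        return 0;                            /* no space, or no digits */
    for (p = sp + 1; p < nul; p++) {
        if (*p < '0' || *p > '9')
            return 0;
        if (declared > (ULONG_MAX - 9) / 10)
            return 0;                        /* absurdly long length field */
        declared = declared * 10 + (unsigned long)(*p - '0');
    }
    return declared == (unsigned long)(n - (size_t)(nul + 1 - buf));
}
```

`blob 5\0hello` passes. The same header followed by `hello!` (excess data) or `hell` (truncated) fails.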
While the jury's still out, the current belief by most folks
I talk to is that longer hashes from the SHA-2 family, such as SHA-256,
are the way to go now.  The SHA-1 attack simply reduces
the work factor (it's not a COMPLETE break), so adding
more bits is believed to increase the work factor
enough to counter it.
Adding more information to the hash can make attacking even harder.
Here's one idea: whenever that hash algorithm
switch occurs, create a new hash value as follows:
  SHA-256 + uncompressed-length
where SHA-256 is computed just like SHA-1 is now, i.e.,
SHA-256(file) where file = typecode + length + compressed data.
Leave the internal format as-is (with the length embedded as well).
This means that an attacker has to come up with an attack
that creates the same length uncompressed, yet has the same hash
of the compressed result. That's harder to do.
Length is also really, really cheap to compute :-).
That also might help convince the "what happens if there's
an accidental collision" crowd: now, if the file lengths
are different, you're GUARANTEED that the hash values are different,
though that's not the best reason to do that.
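The proposed name can be modeled as a two-part key. This is a sketch under assumptions: `struct object_key` and `object_key_equal` are hypothetical, and the SHA-256 digest is assumed to be computed elsewhere.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical object name: SHA-256 digest plus uncompressed length.
 * The digest is assumed to be computed elsewhere. */
struct object_key {
    unsigned char sha256[32];
    uint64_t uncompressed_len;
};

/* Including the length in the key means objects of different sizes never
 * compare equal, whatever their digests; checking it first is also cheap. */
int object_key_equal(const struct object_key *a, const struct object_key *b)
{
    if (a->uncompressed_len != b->uncompressed_len)
        return 0;
    return memcmp(a->sha256, b->sha256, sizeof a->sha256) == 0;
}
```

An attacker would now need a digest collision between two inputs of the same uncompressed length, not merely any two inputs.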
One reason to think about switching sooner rather than later
is that it'd be really nice if the object store also included
signatures, so that in one fell swoop you could check who signed what
(and thus you could later on CONFIRM with much more certainty who
REALLY submitted a given change... say if it was clearly malicious).
If you switch hash algorithms, the signatures might not work,
depending on how you do it.
--- David A. Wheeler


Re: Storing permissions

2005-04-16 Thread David A. Wheeler
Paul Jackson wrote:
> Junio wrote:
>> Sounds like svn
>
> I have no idea what svn is.
svn = common abbreviation for Subversion, a
widely-used centralized SCM tool intentionally
similar to CVS.
--- David A. Wheeler