Re: Cygwin + git log = no pager!

2014-02-26 Thread Jeff King
On Wed, Feb 26, 2014 at 09:54:39AM -0600, Robert Dailey wrote:

  That _should_ turn on the pager, but I think it does not due to a bug
  with setup_pager and aliases. Something like the patch below would make
  it work (but if you are having to use -p manually, there is something
  to fix in your cygwin environment, which does not think you are on a
  terminal).
 
 I have tried `git -p log` and `git log` and neither uses the pager.

I thought Git's behavior was a bug, but it seems to be the intended
effect: "-p" merely cancels "--no-pager"; it does not turn on the pager
when stdout is not a tty.

So the patch I sent is not something we would want to apply, but it
might help debugging your situation (if my hunch is right that isatty()
is returning false, then git -p log would work with it). Or perhaps a
simpler way to check that is just:

diff --git a/pager.c b/pager.c
index 0cc75a8..6d870ac 100644
--- a/pager.c
+++ b/pager.c
@@ -41,8 +41,10 @@ const char *git_pager(int stdout_is_tty)
 {
const char *pager;
 
-   if (!stdout_is_tty)
+   if (!stdout_is_tty) {
+   warning("disabling pager, stdout is not a tty");
return NULL;
+   }
 
 pager = getenv("GIT_PAGER");
if (!pager) {

 Should I apply the provided patch to the Git for Windows master
 branch? Also I'm not sure if there is a convenient way to apply a
 patch via email, so should I copy & paste it to a file & then apply
 the file?

The usual way is to save the patch to an mbox, then use git am to
apply it. Most Unix-y mail clients have mbox support, but I suspect many
Windows ones do not.
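
For example, assuming you have saved the whole message (headers and all)
to a file, something like:

  git am /path/to/saved-message

should apply it and create the commit (the path above is just a
placeholder). git am will also read the message from its standard input,
so piping straight from a capable mail client works, too.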

-Peff


Re: [PATCH] upload-pack: allow shallow fetching from a read-only repository

2014-02-27 Thread Jeff King
On Thu, Feb 27, 2014 at 02:13:03PM +0700, Nguyễn Thái Ngọc Duy wrote:

 Before cdab485 (upload-pack: delegate rev walking in shallow fetch to
 pack-objects - 2013-08-16) upload-pack does not write to the source
 repository. cdab485 starts to write $GIT_DIR/shallow_XXXXXX if it's a
 shallow fetch, so the source repo must be writable.
 
 Fall back to $TMPDIR if $GIT_DIR/shallow_XXXXXX cannot be created in
 this case. Note that in other cases that write $GIT_DIR/shallow_XXXXXX
 and eventually rename it to $GIT_DIR/shallow, there is no fallback to
 $TMPDIR.

That makes sense.

 @@ -224,7 +224,16 @@ char *setup_temporary_shallow(const struct sha1_array *extra)
 	if (write_shallow_commits(&sb, 0, extra)) {
 		struct strbuf path = STRBUF_INIT;
 		strbuf_addstr(&path, git_path("shallow_XXXXXX"));
 -		fd = xmkstemp(path.buf);
 +		if (read_only) {
 +			fd = mkstemp(path.buf);
 +			if (fd < 0) {
 +				char buf[PATH_MAX];
 +				fd = git_mkstemp(buf, sizeof(buf), "shallow_XXXXXX");
 +			}
 +			if (fd < 0)
 +				die_errno("Unable to create temporary shallow file");
 +		} else
 +			fd = xmkstemp(path.buf);

This is not introduced by your patch, but I notice that we do not seem
to do anything with the tempfiles when the program dies prematurely.
We've started collecting stale shallow_XXXXXX files in our server repos.

For the writable cases, should we be using the lockfile API to take
shallow.lock? It looks like we do take a lock on it, but only after the
fetch, and then we have to do a manual check for it having moved (by the
way, shouldn't we do that check under the lock? I think
setup_alternate_shallow has a race condition).

I guess the reason to take the lock at the last minute is that the whole
fetch is heavyweight. But if the fetches are going to conflict, perhaps
it is better to have lock contention at the beginning, rather than
failing only at the end. I don't know very much about the shallow
system; does each operation rewrite the shallow file, or is it often
left untouched (in which case two simultaneous fetches could coexist
most of the time)?

At any rate, if we used the lockfile API, it handles tempfile cleanup
automatically. Otherwise, I think we need something like the patch
below (in the interest of simplicity, I just drop all of the explicit
unlinks and let them use the atexit handler to clean up; you could also
add a function to explicitly unlink the tempfile and clear the handler).
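
For reference, lockfile API usage looks roughly like the following (an
illustration only, not part of any patch here; the function name is made
up and real callers would need their own error handling):

  static struct lock_file shallow_lock;

  static void example_write_shallow_file(const char *buf, size_t len)
  {
          /* creates $GIT_DIR/shallow.lock and registers cleanup handlers */
          int fd = hold_lock_file_for_update(&shallow_lock,
                                             git_path("shallow"),
                                             LOCK_DIE_ON_ERROR);
          if (write_in_full(fd, buf, len) != len)
                  die_errno("failed to write shallow file");
          /*
           * renames the lockfile into place; if we die() or take a signal
           * before this point, the .lock file is removed automatically
           */
          if (commit_lock_file(&shallow_lock))
                  die_errno("failed to commit shallow file");
  }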

diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index 85bba35..ac1d9b5 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -833,8 +833,6 @@ static void execute_commands(struct command *commands,
 			error("BUG: run 'git fsck' for safety.\n"
 			      "If there are errors, try to remove "
 			      "the reported refs above");
-		if (alt_shallow_file && *alt_shallow_file)
-   unlink(alt_shallow_file);
}
 }
 
@@ -1087,10 +1085,6 @@ static void update_shallow_info(struct command *commands,
 			cmd->skip_update = 1;
}
}
-	if (alt_shallow_file && *alt_shallow_file) {
-   unlink(alt_shallow_file);
-   alt_shallow_file = NULL;
-   }
free(ref_status);
 }
 
diff --git a/fetch-pack.c b/fetch-pack.c
index 90fdd49..ae8550e 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -947,17 +947,6 @@ static void update_shallow(struct fetch_pack_args *args,
 	if (!si->shallow || !si->shallow->nr)
return;
 
-   if (alternate_shallow_file) {
-   /*
-* The temporary shallow file is only useful for
-* index-pack and unpack-objects because it may
-* contain more roots than we want. Delete it.
-*/
-   if (*alternate_shallow_file)
-   unlink(alternate_shallow_file);
-   free((char *)alternate_shallow_file);
-   }
-
 	if (args->cloning) {
/*
 * remote is shallow, but this is a clone, there are
diff --git a/shallow.c b/shallow.c
index bbc98b5..0f2bb48 100644
--- a/shallow.c
+++ b/shallow.c
@@ -8,6 +8,7 @@
 #include "diff.h"
 #include "revision.h"
 #include "commit-slab.h"
+#include "sigchain.h"
 
 static int is_shallow = -1;
 static struct stat shallow_stat;
@@ -216,27 +217,49 @@ int write_shallow_commits(struct strbuf *out, int use_pack_protocol,
return write_shallow_commits_1(out, use_pack_protocol, extra, 0);
 }
 
+static struct strbuf shallow_tmpfile = STRBUF_INIT;
+
+static void remove_shallow_tmpfile(void)
+{
+   if (shallow_tmpfile.len) {
+   unlink_or_warn(shallow_tmpfile.buf);
+		strbuf_reset(&shallow_tmpfile);
+   }

[PATCH] shallow: verify shallow file after taking lock

2014-02-27 Thread Jeff King
Before writing the shallow file, we stat() the existing file
to make sure it has not been updated since our operation
began. However, we do not do so under a lock, so there is a
possible race:

  1. Process A takes the lock.

  2. Process B calls check_shallow_file_for_update and finds
 no update.

  3. Process A commits the lockfile.

  4. Process B takes the lock, then overwrites process A's
 changes.

We can fix this by doing our check while we hold the lock.

Signed-off-by: Jeff King p...@peff.net
---
On Thu, Feb 27, 2014 at 04:04:26AM -0500, Jeff King wrote:

 by the way, shouldn't we do that check under the lock? I think
 setup_alternate_shallow has a race condition).

Here is the fix. I didn't actually reproduce the race experimentally,
but it seems pretty straightforward.

I also notice that check_shallow_file_for_update returns early if
!is_shallow. Is that safe? Is it possible for another process to have
made us shallow since the program began? In that case, we would have to
stat() the file always, then complain if it exists and !is_shallow.

 shallow.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/shallow.c b/shallow.c
index bbc98b5..75da07a 100644
--- a/shallow.c
+++ b/shallow.c
@@ -246,9 +246,9 @@ void setup_alternate_shallow(struct lock_file *shallow_lock,
struct strbuf sb = STRBUF_INIT;
int fd;
 
-   check_shallow_file_for_update();
 	fd = hold_lock_file_for_update(shallow_lock, git_path("shallow"),
 				       LOCK_DIE_ON_ERROR);
+	check_shallow_file_for_update();
 	if (write_shallow_commits(&sb, 0, extra)) {
 		if (write_in_full(fd, sb.buf, sb.len) != sb.len)
 			die_errno("failed to write to %s",
@@ -293,9 +293,9 @@ void prune_shallow(int show_only)
strbuf_release(sb);
return;
}
-   check_shallow_file_for_update();
 	fd = hold_lock_file_for_update(&shallow_lock, git_path("shallow"),
 				       LOCK_DIE_ON_ERROR);
+	check_shallow_file_for_update();
 	if (write_shallow_commits_1(&sb, 0, NULL, SEEN_ONLY)) {
 		if (write_in_full(fd, sb.buf, sb.len) != sb.len)
 			die_errno("failed to write to %s",
-- 
1.8.5.2.500.g8060133





Re: [PATCH] shallow: verify shallow file after taking lock

2014-02-27 Thread Jeff King
On Thu, Feb 27, 2014 at 04:10:12AM -0500, Jeff King wrote:

 I also notice that check_shallow_file_for_update returns early if
 !is_shallow. Is that safe? Is it possible for another process to have
 made us shallow since the program began? In that case, we would have to
 stat() the file always, then complain if it exists and !is_shallow.

That patch would look like this:

diff --git a/shallow.c b/shallow.c
index 75da07a..e05a241 100644
--- a/shallow.c
+++ b/shallow.c
@@ -139,13 +139,13 @@ void check_shallow_file_for_update(void)
 {
struct stat st;
 
-   if (!is_shallow)
-   return;
-   else if (is_shallow == -1)
+   if (is_shallow == -1)
 		die("BUG: shallow must be initialized by now");
 
 	if (stat(git_path("shallow"), &st))
 		die("shallow file was removed during fetch");
+	else if (!is_shallow)
+		die("shallow file appeared during fetch");
else if (st.st_mtime != shallow_stat.st_mtime
 #ifdef USE_NSEC
 || ST_MTIME_NSEC(st) != ST_MTIME_NSEC(shallow_stat)

but again, I'm not really clear on whether this is possible.

-Peff


[PATCH] shallow: use stat_validity to check for up-to-date file

2014-02-27 Thread Jeff King
On Thu, Feb 27, 2014 at 05:18:58PM +0700, Duy Nguyen wrote:

 On Thu, Feb 27, 2014 at 4:22 PM, Jeff King p...@peff.net wrote:
  On Thu, Feb 27, 2014 at 04:10:12AM -0500, Jeff King wrote:
 
  I also notice that check_shallow_file_for_update returns early if
  !is_shallow. Is that safe? Is it possible for another process to have
  made us shallow since the program began? In that case, we would have to
  stat() the file always, then complain if it exists and !is_shallow.
 
 I think it's safer to do it your way.

Yeah, I played around a bit and found that using git fetch --depth in
a non-shallow repo could run into this case.

  	if (stat(git_path("shallow"), &st))
  		die("shallow file was removed during fetch");
  +	else if (!is_shallow)
  +		die("shallow file appeared during fetch");

Note that this is wrong; when the file is missing (the first part of the
conditional), we need to check is_shallow before dying. Otherwise we
erroneously complain when creating the file for the first time.

As I was fixing it, though, I recalled that we had to write a similar
system for the packed-refs file. Fortunately, it was easy to reuse, and
I ended up with the patch below.
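
For anyone who has not looked at it, the stat_validity API is roughly
the following (my paraphrase of the declarations and their meaning, not
a verbatim quote of cache.h):

  struct stat_validity {
          struct stat_data *sd;   /* NULL means "file did not exist" */
  };

  /* forget any recorded state (equivalent to "file does not exist") */
  void stat_validity_clear(struct stat_validity *sv);

  /* return 1 if path still matches the recorded state, 0 otherwise */
  int stat_validity_check(struct stat_validity *sv, const char *path);

  /* record the current state of the open file referred to by fd */
  void stat_validity_update(struct stat_validity *sv, int fd);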

-- >8 --
Subject: shallow: use stat_validity to check for up-to-date file

When we are about to write the shallow file, we check that
it has not changed since we last read it. Instead of
hand-rolling this, we can use stat_validity. This is built
around the index stat-check, so it is more robust than just
checking the mtime, as we do now (it uses the same check as
we do for index files).

The new code also handles the case of a shallow file
appearing unexpectedly. With the current code, two
simultaneous processes making us shallow (e.g., two git
fetch --depth=1 running at the same time in a non-shallow
repository) can race to overwrite each other.

As a bonus, we also remove a race in determining the stat
information of what we read (we stat and then open, leaving
a race window; instead we should open and then fstat the
descriptor).

Signed-off-by: Jeff King p...@peff.net
---
 shallow.c | 24 +++-
 1 file changed, 7 insertions(+), 17 deletions(-)

diff --git a/shallow.c b/shallow.c
index 75da07a..9668b39 100644
--- a/shallow.c
+++ b/shallow.c
@@ -10,7 +10,7 @@
 #include "commit-slab.h"
 
 static int is_shallow = -1;
-static struct stat shallow_stat;
+static struct stat_validity shallow_stat;
 static char *alternate_shallow_file;
 
 void set_alternate_shallow_file(const char *path, int override)
@@ -52,12 +52,12 @@ int is_repository_shallow(void)
 * shallow file should be used. We could just open it and it
 * will likely fail. But let's do an explicit check instead.
 */
-	if (!*path ||
-	    stat(path, &shallow_stat) ||
-	    (fp = fopen(path, "r")) == NULL) {
+	if (!*path || (fp = fopen(path, "r")) == NULL) {
+		stat_validity_clear(&shallow_stat);
 		is_shallow = 0;
 		return is_shallow;
 	}
+	stat_validity_update(&shallow_stat, fileno(fp));
is_shallow = 1;
 
while (fgets(buf, sizeof(buf), fp)) {
@@ -137,21 +137,11 @@ struct commit_list *get_shallow_commits(struct object_array *heads, int depth,
 
 void check_shallow_file_for_update(void)
 {
-   struct stat st;
-
-   if (!is_shallow)
-   return;
-   else if (is_shallow == -1)
+   if (is_shallow == -1)
die(BUG: shallow must be initialized by now);
 
-	if (stat(git_path("shallow"), &st))
-		die("shallow file was removed during fetch");
-	else if (st.st_mtime != shallow_stat.st_mtime
-#ifdef USE_NSEC
-		 || ST_MTIME_NSEC(st) != ST_MTIME_NSEC(shallow_stat)
-#endif
-		)
-		die("shallow file was changed during fetch");
+	if (!stat_validity_check(&shallow_stat, git_path("shallow")))
+		die("shallow file has changed since we read it");
 }
 
 #define SEEN_ONLY 1
-- 
1.8.5.2.500.g8060133



[PATCH] shallow: automatically clean up shallow tempfiles

2014-02-27 Thread Jeff King
On Thu, Feb 27, 2014 at 05:11:41PM +0700, Duy Nguyen wrote:

 We only update shallow file in these cases: clone --depth, fetch
 --update-shallow, fetch --depth, and push when receive.shallowupdate
 is set. All of them may end up not updating shallow file though.

OK, that last sentence is what I wasn't sure of. If they might sometimes
not update the shallow file, then I think locking early is wrong. It
means we create lock contention where it might not exist.

 For the read-only case in upload-pack, locking only reduces the
 availability of clone/fetch.

Right, those should never lock. So even if we did want to tweak the
locking, we would still need a separate tempfile cleanup for those.

Here it is as a completed patch. It conflicts textually but not
semantically with the patch that started this thread (I think one of my
earlier patches has a minor textual conflict, too). Should we roll them
into a single series to help Junio? If so, do you want to do it, or
should I?

-- >8 --
Subject: shallow: automatically clean up shallow tempfiles

We sometimes write tempfiles of the form shallow_XXXXXX
during fetch/push operations with shallow repositories.
Under normal circumstances, we clean up the result when we
are done. However, we do not take steps to clean up after
ourselves when we exit due to die() or signal death.

This patch teaches the tempfile creation code to register
handlers to clean up after ourselves. To handle this, we
change the ownership semantics of the filename returned by
setup_temporary_shallow. It now keeps a copy of the filename
itself, and returns only a const pointer to it.

We can also do away with explicit tempfile removal in the
callers. They all exit not long after finishing with the
file, so they can rely on the auto-cleanup, simplifying the
code.

Note that we keep things simple and maintain only a single
filename to be cleaned. This is sufficient for the current
caller, but we future-proof it with a die("BUG").

Signed-off-by: Jeff King p...@peff.net
---
 builtin/receive-pack.c | 16 
 commit.h   |  2 +-
 fetch-pack.c   | 11 ---
 shallow.c  | 41 ++---
 upload-pack.c  |  7 +--
 5 files changed, 40 insertions(+), 37 deletions(-)

diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index 85bba35..c323081 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -828,14 +828,10 @@ static void execute_commands(struct command *commands,
}
}
 
-   if (shallow_update) {
-   if (!checked_connectivity)
-			error("BUG: run 'git fsck' for safety.\n"
-			      "If there are errors, try to remove "
-			      "the reported refs above");
-		if (alt_shallow_file && *alt_shallow_file)
-			unlink(alt_shallow_file);
-	}
+	if (shallow_update && !checked_connectivity)
+		error("BUG: run 'git fsck' for safety.\n"
+		      "If there are errors, try to remove "
+		      "the reported refs above");
 }
 
 static struct command *read_head_info(struct sha1_array *shallow)
@@ -1087,10 +1083,6 @@ static void update_shallow_info(struct command *commands,
 			cmd->skip_update = 1;
 		}
 	}
-	if (alt_shallow_file && *alt_shallow_file) {
-   unlink(alt_shallow_file);
-   alt_shallow_file = NULL;
-   }
free(ref_status);
 }
 
diff --git a/commit.h b/commit.h
index 16d9c43..55631f1 100644
--- a/commit.h
+++ b/commit.h
@@ -209,7 +209,7 @@ extern int write_shallow_commits(struct strbuf *out, int use_pack_protocol,
 extern void setup_alternate_shallow(struct lock_file *shallow_lock,
const char **alternate_shallow_file,
const struct sha1_array *extra);
-extern char *setup_temporary_shallow(const struct sha1_array *extra);
+extern const char *setup_temporary_shallow(const struct sha1_array *extra);
 extern void advertise_shallow_grafts(int);
 
 struct shallow_info {
diff --git a/fetch-pack.c b/fetch-pack.c
index 90fdd49..ae8550e 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -947,17 +947,6 @@ static void update_shallow(struct fetch_pack_args *args,
 	if (!si->shallow || !si->shallow->nr)
return;
 
-   if (alternate_shallow_file) {
-   /*
-* The temporary shallow file is only useful for
-* index-pack and unpack-objects because it may
-* contain more roots than we want. Delete it.
-*/
-   if (*alternate_shallow_file)
-   unlink(alternate_shallow_file);
-   free((char *)alternate_shallow_file);
-   }
-
 	if (args->cloning) {
/*
 * remote is shallow, but this is a clone, there are
diff --git a/shallow.c b/shallow.c

Re: [PATCH] repack: add `repack.honorpackkeep` config var

2014-02-27 Thread Jeff King
On Wed, Feb 26, 2014 at 12:30:36PM -0800, Junio C Hamano wrote:

  pack-kept-objects then?
 
  Hmm. That does address my point above, but somehow the word "kept" feels
  awkward to me. I'm ambivalent between the two.
 
 That word does make my backside somewhat itchy ;-)
 
 Would it help to take a step back and think what the option really
 does?  Perhaps we should call it --pack-all-objects, which is short
 for --pack-all-objects-regardless-of-where-they-currently-are-stored,
 or something?  The word "all" gives a wrong connotation in a
 different way (e.g. "regardless of reachability" is a possible wrong
 interpretation), so that does not sound too good, either.

I do not think "all" is what we want to say. It only affects kept
objects, not any of the other exclusions (e.g., -l).

 --repack-kept-objects?  --include-kept-objects?

Of all of them, I think --pack-kept-objects is probably the best. And I
think we are hitting diminishing returns in thinking too much more on
the name. :)

-Peff


Re: [PATCH v4] tag: support --sort=spec

2014-02-27 Thread Jeff King
On Thu, Feb 27, 2014 at 07:56:52PM +0700, Nguyễn Thái Ngọc Duy wrote:

 --sort=version:refname (or --sort=v:refname for short) sorts tags as
 if they are versions. --sort=-refname reverses the order (with or
 without :version).
 
 versioncmp() is copied from string/strverscmp.c in glibc commit
 ee9247c38a8def24a59eb5cfb7196a98bef8cfdc, reformatted to Git coding
 style. The implementation is under LGPL-2.1 and according to [1] I can
 relicense it to GPLv2.

This looks good to me. One minor typo:

 diff --git a/Documentation/git-tag.txt b/Documentation/git-tag.txt
 index 404257d..9b05931 100644
 --- a/Documentation/git-tag.txt
 +++ b/Documentation/git-tag.txt
 @@ -95,6 +95,12 @@ OPTIONS
   using fnmatch(3)).  Multiple patterns may be given; if any of
   them matches, the tag is shown.
  
 +--sort=<type>::
 +	Sort in a specific order. Supported type is "refname"
 +	(lexicographic order), "version:refname" or "v:refname" (tags
 +	name are treated as versions). Prepend "-" to reverse sort

s/tags name/tag names/

You had mentioned earlier tweaking the version comparison to handle
things like -rc better. I think that can come on top of this initial
patch, but we should probably figure out the final sort order before
including this in a release.

-Peff


Re: [PATCH 1/2] t3200-branch: test setting branch as own upstream

2014-02-27 Thread Jeff King
On Fri, Feb 28, 2014 at 12:04:18PM +0900, Brian Gesiak wrote:

 No test asserts that git branch -u refs/heads/my-branch my-branch
 emits a warning. Add a test that does so.

For an operation like git branch foo origin where setting up the
tracking is a side effect, a warning makes sense. But the sole purpose
of the command above is to set the upstream, and we didn't do it; should
this warning actually be upgraded to an error?

 +test_expect_success '--set-upstream-to shows warning if used to set branch as own upstream' '
 +	git branch --set-upstream-to refs/heads/my13 my13 2>actual &&
 +	cat >expected <<EOF &&
 +warning: Not setting branch my13 as its own upstream.
 +EOF
 +	test_cmp expected actual
 +'

This should use test_i18ncmp, as the string you are matching is
internationalized.

-Peff


Re: [PATCH 2/2] branch: use skip_prefix

2014-02-27 Thread Jeff King
On Fri, Feb 28, 2014 at 12:04:19PM +0900, Brian Gesiak wrote:

 From: modocache modoca...@gmail.com

Both your emailed patches have this, which is due to your author name
not matching your sending identity. You probably want to set user.name,
or if you already have (which it looks like you might have from your
Signed-off-by), use git commit --amend --reset-author to update the
author information.
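
For example (substituting your own details, of course):

  git config user.name "Brian Gesiak"
  git commit --amend --reset-author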

 The install_branch_config function reimplemented the skip_prefix
 function inline. Use skip_prefix function instead for brevity.
 
 Signed-off-by: Brian Gesiak modoca...@gmail.com
 Reported-by: Michael Haggerty mhag...@alum.mit.edu

It's a minor thing, but usually these footer lines try to follow a
chronological order. So the report would come before the signoff (and a
further signoff from the maintainer would go after yours).

 diff --git a/branch.c b/branch.c
 index 723a36b..e163f3c 100644
 --- a/branch.c
 +++ b/branch.c
 [...]

The patch itself looks OK to me.

-Peff


Re: [PATCH] Rewrite git-compat-util.h:skip_prefix() as a loop

2014-02-27 Thread Jeff King
On Thu, Feb 27, 2014 at 09:33:45PM +0100, David Kastrup wrote:

  diff --git a/git-compat-util.h b/git-compat-util.h
  index cbd86c3..4daa6cf 100644
  --- a/git-compat-util.h
  +++ b/git-compat-util.h
  @@ -357,8 +357,8 @@ extern int suffixcmp(const char *str, const char *suffix);
   
   static inline const char *skip_prefix(const char *str, const char *prefix)
   {
  -  size_t len = strlen(prefix);
  -  return strncmp(str, prefix, len) ? NULL : str + len;
  +while( *prefix != '\0' && *str++ == *prefix++ );
  +return *prefix == '\0' ? str : NULL;
 
  Documentation/CodingGuidelines?
 
 Mostly relevant for tabification here, not helping much otherwise.  In
 particular, does not contain the advice "empty statements should appear
 on a line of their own" which would help with readability here.

Also whitespace in the while, which I could not find mentioned in
CodingGuidelines either. Maybe:

-- >8 --
Subject: [PATCH] CodingGuidelines: mention C whitespace rules

We are fairly consistent about these, so most are covered by
"follow existing style", but it doesn't hurt to be explicit.

Signed-off-by: Jeff King p...@peff.net
---
 Documentation/CodingGuidelines | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/Documentation/CodingGuidelines b/Documentation/CodingGuidelines
index ef67b53..ed432a8 100644
--- a/Documentation/CodingGuidelines
+++ b/Documentation/CodingGuidelines
@@ -126,6 +126,17 @@ For C programs:
   "char * string".  This makes it easier to understand code
   like "char *string, c;".
 
+ - Use whitespace around operators and keywords, but not inside
+   parentheses and not around functions. So:
+
+while (condition)
+   func(bar + 1);
+
+   and not:
+
+while( condition )
+   func (bar+1);
+
  - We avoid using braces unnecessarily.  I.e.
 
if (bla) {
-- 
1.8.5.2.500.g8060133



Re: [PATCH 1/2] t3200-branch: test setting branch as own upstream

2014-02-27 Thread Jeff King
On Fri, Feb 28, 2014 at 03:17:28PM +0900, Brian Gesiak wrote:

  For an operation like git branch foo origin where setting up the
  tracking is a side effect, a warning makes sense. But the sole purpose
  of the command above is to set the upstream, and we didn't do it; should
  this warning actually be upgraded to an error?
 
 I agree. I originally wrote the test using test_expect_failure--imagine my
 surprise when the exit status was 0, despite the fact that the upstream wasn't
 set!

For reference, you don't want test_expect_failure here; it is about "we
want this to succeed, but git is currently buggy and we know it, so mark
it as a failing test". You want test_must_fail to indicate a command
inside a test that must exit non-zero:

  test_expect_success 'pointing upstream to itself fails' '
  test_must_fail git branch -u ...
  '

I notice that the warning comes from install_branch_config, which gets
used both for branch -u, but also in the side effect case I
mentioned above. Is it possible to trigger this as part of such a case?
I think maybe git branch -f --track foo foo would do it. If so, we
should perhaps include a test that it does not break if we upgrade the
-u case to an error.
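
If it does trigger it, an untested sketch of such a test might look
like (branch name chosen only for illustration):

  test_expect_success '--track does not die when branch tracks itself' '
  	git branch my14 &&
  	git branch -f --track my14 my14 2>actual &&
  	test_i18ngrep "as its own upstream" actual
  '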

 Patch is on the way, just waiting for the tests to complete. Thanks
 for pointing that out! Also, sorry if it's in the Makefile somewhere,
 but is there an easy way to run just a single test file in the t
 directory?

You can run an individual script directly, e.g. ./t3200-branch.sh.

-Peff


Re: [PATCH/RFC] rebase: new convenient option to edit a single commit

2014-02-27 Thread Jeff King
On Thu, Feb 27, 2014 at 08:01:18PM +0700, Nguyễn Thái Ngọc Duy wrote:

 I find myself often do git rebase -i xxx and replace one pick line
 with edit to amend just one commit when I see something I don't like
 in that commit. This happens often while cleaning up a series. This
 automates the replace step so it sends me straight to that commit.

Yeah, I do this a lot, too.  The interface you propose makes sense to
me, though I'm not sure how much I would use it, as I often do not know
the specifier of the commit I want to change (was it HEAD~3 or
HEAD~4?). I guess using :/ could make that easier.

One comment on the option name:

 +1,edit-one!generate todo list to edit this commit

I'd expect -$n to mean rebase the last $n commits (as opposed to
everything not in the upstream). That does not work currently, of
course, but:

  1. It has the potential to confuse people who read it, since it's
 unlike what -1 means in most of the rest of git.

  2. It closes the door if we want to support -$n in the future.

-Peff


Re: [PATCH 1/2] t3200-branch: test setting branch as own upstream

2014-02-27 Thread Jeff King
On Fri, Feb 28, 2014 at 07:55:25AM +0100, Johannes Sixt wrote:

  This should use test_i18ncmp, as the string you are matching is
  internationalized.
 
 More generally, stderr output shouldn't be tested with test_cmp or
 test_i18ncmp at all, but with grep and test_i18ngrep. The reason is that
 when you run the test with 'sh -x t3200* -v -i', the trace output is also
 in stderr, and the test_cmp/test_i18ncmp fails due to the unexpected extra
 text.

I didn't think we bothered to make sh -x work robustly. I don't mind
if we do, but git grep -E 'test_(i18n)?cmp .*err' shows many potential
problem spots.

Hmm. Looks like it is only a problem if you are calling a shell function
(since it is the shell function's trace output you are seeing). So this
test would be OK as-is, but testing for an error, like:

  test_must_fail git branch -u foo foo 2>stderr

would not be, because we see the trace from test_must_fail. So some of
the callsites found by my grep are actually probably fine.

-Peff


Re: [PATCH 1/2] t3200-branch: test setting branch as own upstream

2014-02-27 Thread Jeff King
On Fri, Feb 28, 2014 at 02:14:01AM -0500, Jeff King wrote:

 I didn't think we bothered to make sh -x work robustly. I don't mind
 if we do, but git grep -E 'test_(i18n)?cmp .*err' shows many potential
 problem spots.

Just for fun:

  cd t
  make SHELL_PATH='sh -x' prove

causes 326 test failures across 43 scripts. That's slightly misleading,
because 200 of the failures are all in t0008, and updating one function
would probably clear up all of them.

-Peff


Re: [PATCH/RFC] rebase: new convenient option to edit a single commit

2014-02-27 Thread Jeff King
On Fri, Feb 28, 2014 at 02:34:16PM +0700, Duy Nguyen wrote:

  Yeah, I do this a lot, too.  The interface you propose makes sense to
  me, though I'm not sure how much I would use it, as I often do not know
  the specifier of the commit I want to change (was it HEAD~3 or
  HEAD~4?). I guess using :/ could make that easier.
 
 In my case, I just copy/paste the commit ID from git log -lp. I
 think :/ already works with rebase..

I think it should work. I just meant I will have to get in the habit of
starting to use :/. :)

-Peff


Re: [PATCH 1/2] t3200-branch: test setting branch as own upstream

2014-02-28 Thread Jeff King
On Fri, Feb 28, 2014 at 04:28:38PM +0900, Brian Gesiak wrote:

 I would be in favor of using test_i18ngrep, but it seems like this
 test file overwhelmingly uses test_(i18n)cmp, even when inspecting
 stderr output.

We generally prefer cmp checks to grep checks, because they are more
rigorous. However, when testing human-readable output which may change,
sometimes being too specific can simply make the tests brittle and
annoying. Using a forgiving regex that matches keywords can be helpful.
So there's definitely some room for judgement.

I think what you posted as v2 looks OK.

 Making double-sure that all tests pass when run with sh -x seems
 like a larger endeavor.
 
 Of course, I'd be happy to submit several patches if there's support
 for such a change. But as Peff points out it will be a large diff.

Yeah, I don't think it's worth the effort.

If you feel like continuing on this series, converting the warning()
into a die() would be a much more productive use of time (but if you
don't, I do not see any reason not to take the patches you've posted).

-Peff


Re: [PATCH] repack: add `repack.honorpackkeep` config var

2014-02-28 Thread Jeff King
On Thu, Feb 27, 2014 at 10:04:44AM -0800, Junio C Hamano wrote:

 I wonder if it makes sense to link it with pack.writebitmaps more
 tightly, without even exposing it as a seemingly orthogonal knob
 that can be tweaked, though.
 
 I think that is because I do not fully understand the ", because ..."
 part of the below:
 
  This patch introduces an option to disable the
  `--honor-pack-keep` option.  It is not triggered by default,
  even when pack.writeBitmaps is turned on, because its use
  depends on your overall packing strategy and use of .keep
  files.
 
 If you ask --write-bitmap-index (or have pack.writeBitmaps on), you
 do want the bitmap-index to be written, and unless you tell
 pack-objects to ignore the .keep marker, it cannot do so, no?
 
 Does the ", because ..." part above mean you may have an overall
 packing strategy to use .keep file to not ever repack some subset of
 the objects, so we will not silently explode the kept objects into a
 new pack?

Exactly. The two features (bitmaps and .keep) are not compatible with
each other, so you have to prioritize one. If you are using static .keep
files, you might want them to continue being respected at the expense of
using bitmaps for that repo. So I think you want a separate option from
--write-bitmap-index to allow the appropriate flexibility.

The default is another matter.  I think most people using .bitmaps on a
server would probably want to set repack.packKeptObjects.  They would
want to repack often to take advantage of the .bitmaps anyway, so they
probably don't care about .keep files (any they see are due to races
with incoming pushes).

So we could do something like falling back to turning the option on if
--write-bitmap-index is on _and_ the user didn't specify
--pack-kept-objects. The existing default is mostly there because it is
the conservative choice (.keep files continue to do their thing as
normal unless you say otherwise). But the fallback thing would be one
less knob that bitmap users would need to turn in the common case.

Here's the interdiff for doing the fallback:

---
diff --git a/Documentation/config.txt b/Documentation/config.txt
index 3a3d84f..a8ddc7f 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -2139,7 +2139,9 @@ repack.usedeltabaseoffset::
 repack.packKeptObjects::
If set to true, makes `git repack` act as if
`--pack-kept-objects` was passed. See linkgit:git-repack[1] for
-   details. Defaults to false.
+   details. Defaults to `false` normally, but `true` if a bitmap
+   index is being written (either via `--write-bitmap-index` or
+   `pack.writeBitmaps`).
 
 rerere.autoupdate::
When set to true, `git-rerere` updates the index with the
diff --git a/builtin/repack.c b/builtin/repack.c
index 49947b2..6b0b62d 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -9,7 +9,7 @@
 #include argv-array.h
 
 static int delta_base_offset = 1;
-static int pack_kept_objects;
+static int pack_kept_objects = -1;
 static char *packdir, *packtmp;
 
 static const char *const git_repack_usage[] = {
@@ -190,6 +190,9 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
argc = parse_options(argc, argv, prefix, builtin_repack_options,
git_repack_usage, 0);
 
+	if (pack_kept_objects < 0)
+   pack_kept_objects = write_bitmap;
+
 	packdir = mkpathdup("%s/pack", get_object_directory());
 	packtmp = mkpathdup("%s/.tmp-%d-pack", packdir, (int)getpid());
 
diff --git a/t/t7700-repack.sh b/t/t7700-repack.sh
index f8431a8..b1eed5c 100755
--- a/t/t7700-repack.sh
+++ b/t/t7700-repack.sh
@@ -21,7 +21,7 @@ test_expect_success 'objects in packs marked .keep are not repacked' '
 	objsha1=$(git verify-pack -v pack-$packsha1.idx | head -n 1 |
 		sed -e "s/^\([0-9a-f]\{40\}\).*/\1/") &&
mv pack-* .git/objects/pack/ 
-	git repack -A -d -l &&
+	git repack --no-pack-kept-objects -A -d -l &&
 	git prune-packed &&
for p in .git/objects/pack/*.idx; do
idx=$(basename $p)
@@ -35,9 +35,9 @@ test_expect_success 'objects in packs marked .keep are not repacked' '
 	test -z "$found_duplicate_object"
 '
 
-test_expect_success '--pack-kept-objects duplicates objects' '
+test_expect_success 'writing bitmaps duplicates .keep objects' '
# build on $objsha1, $packsha1, and .keep state from previous
-	git repack -Adl --pack-kept-objects &&
+	git repack -Adl &&
 	test_when_finished "found_duplicate_object=" &&
for p in .git/objects/pack/*.idx; do
idx=$(basename $p)


Re: [PATCH] archive: add archive.restrictRemote option

2014-02-28 Thread Jeff King
On Thu, Feb 27, 2014 at 10:37:30AM -0800, Junio C Hamano wrote:

  Signed-off-by: Jeff King p...@peff.net
 
 Thanks.
 
 Do GitHub people have general aversion against signing off (or
 sending out, for that matter) their own patches, unless they were
 already active here before they joined GitHub, by the way?

Mostly it is that I clean up the patches and commit messages before
sending them out. Michael sends out his own patches because they are
already perfect by the time I see them. :)

I can certainly get S-O-B from GitHubbers, but my impression of the DCO
is that it does not matter; as the first link in the signoff chain, I am
certifying that the patch meets the licensing requirements. Of course, I
know that because of my relationship to the author and our employer,
which is something that isn't encoded in the headers. A S-O-B from the
author would perhaps make it more obvious what happened.

 I like the general idea and this escape hatch would be a good thing
 to have.
 
 A few comments:
 
  - Seeing the word combination restrict+remote before reading
    the explanation made me think "hmph, only allow remote archive
    from certain hosts but not from others?"  We are restricting the
objects and only on the remote usage, not restricting remotes, so
somebody else may be able to come up with a less misleading name.

  - It might be better to call the escape hatch "allow" something
that defaults to false.  It is merely the issue of perception,
but having a knob to be limiting that defaults to true gives a
wrong impression that in an ideal world remote archive ought to
be loose and we are artificially limiting it by default.

After reading your first point, I came up with
archive.allowRemoteUnreachable, which also satisfies the second. I do
not have a strong opinion.

  +archive.restrictRemote::
  +   If true, archives can only be requested by refnames. If false,
 
 As this does not affect local use of git archive, "requested by
 refnames" may need to be clarified further.  Perhaps "remote
 archives can be requested only for published refnames" or something.

I was hoping to be vague. If we really want to get into specifics, we
should probably document the current rules (refnames, and sub-trees of
refnames). It might be a good thing to document that anyway, though. And
by doing so, it would become obvious why one would want to set this
option. I'll see what I can come up with.

-Peff


[PATCH v2 0/2] lifting upload-archive restrictions

2014-02-28 Thread Jeff King
On Fri, Feb 28, 2014 at 04:07:09AM -0500, Jeff King wrote:

  As this does not affect local use of git archive, "requested by
  refnames" may need to be clarified further.  Perhaps "remote
  archives can be requested only for published refnames" or something.
 
 I was hoping to be vague. If we really want to get into specifics, we
 should probably document the current rules (refnames, and sub-trees of
 refnames). It might be a good thing to document that anyway, though. And
 by doing so, it would become obvious why one would want to set this
 option. I'll see what I can come up with.

Here's a series to handle this; I think the end result is much nicer. I
ended up calling the option uploadarchive.allowUnreachable. By moving
it there instead of under "archive", it is clearer that this is about
the _serving_ end of the remote connection, and the word "remote"
becomes redundant.

  [1/2]: docs: clarify remote restrictions for git-upload-archive
  [2/2]: add uploadarchive.allowUnreachable option

-Peff


[PATCH v2 1/2] docs: clarify remote restrictions for git-upload-archive

2014-02-28 Thread Jeff King
Commits ee27ca4 and 0f544ee introduced rules by which
git-upload-archive would restrict clients from accessing
unreachable objects. However, we never documented those
rules anywhere, nor their reason for being. Let's do so now.

Signed-off-by: Jeff King p...@peff.net
---
 Documentation/git-archive.txt|  5 -
 Documentation/git-upload-archive.txt | 26 ++
 2 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-archive.txt b/Documentation/git-archive.txt
index b97aaab..cfa1e4e 100644
--- a/Documentation/git-archive.txt
+++ b/Documentation/git-archive.txt
@@ -65,7 +65,10 @@ OPTIONS
 
 --remote=<repo>::
Instead of making a tar archive from the local repository,
-   retrieve a tar archive from a remote repository.
+   retrieve a tar archive from a remote repository. Note that the
+   remote repository may place restrictions on which sha1
+   expressions may be allowed in `tree-ish`. See
+   linkgit:git-upload-archive[1] for details.
 
 --exec=<git-upload-archive>::
Used with --remote to specify the path to the
diff --git a/Documentation/git-upload-archive.txt b/Documentation/git-upload-archive.txt
index d09bbb5..8ae65d8 100644
--- a/Documentation/git-upload-archive.txt
+++ b/Documentation/git-upload-archive.txt
@@ -20,6 +20,32 @@ This command is usually not invoked directly by the end 
user.  The UI
 for the protocol is on the 'git archive' side, and the program pair
 is meant to be used to get an archive from a remote repository.
 
+SECURITY
+--------
+
+In order to protect the privacy of objects that have been removed from
+history but may not yet have been pruned, `git-upload-archive` avoids
+serving archives for commits and trees that are not reachable from the
+repository's refs.  However, because calculating object reachability is
+computationally expensive, `git-upload-archive` implements a stricter
+but easier-to-check set of rules:
+
+  1. Clients may request a commit or tree that is pointed to directly by
+ a ref. E.g., `git archive --remote=origin v1.0`.
+
+  2. Clients may request a sub-tree within a commit or tree using the
+ `ref:path` syntax. E.g., `git archive --remote=origin v1.0:Documentation`.
+
+  3. Clients may _not_ use other sha1 expressions, even if the end
+ result is reachable. E.g., neither a relative commit like `master^`
+ nor a literal sha1 like `abcd1234` is allowed, even if the result
+ is reachable from the refs.
+
+Note that rule 3 disallows many cases that do not have any privacy
+implications. These rules are subject to change in future versions of
+git, and the server accessed by `git archive --remote` may or may not
+follow these exact rules.
+
 OPTIONS
 -------
 <directory>::
-- 
1.8.5.2.500.g8060133



[PATCH v2 2/2] add uploadarchive.allowUnreachable option

2014-02-28 Thread Jeff King
From: Scott J. Goldman scot...@github.com

In commit ee27ca4, we started restricting remote git-archive
invocations to only accessing reachable commits. This
matches what upload-pack allows, but does restrict some
useful cases (e.g., HEAD:foo). We loosened this in 0f544ee,
which allows `foo:bar` as long as `foo` is a ref tip.
However, that still doesn't allow many useful things, like:

  1. Commits accessible from a ref, like `foo^:bar`, which
 are reachable

  2. Arbitrary sha1s, even if they are reachable.

We can do a full object-reachability check for these cases,
but it can be quite expensive if the client has sent us the
sha1 of a tree; we have to visit every sub-tree of every
commit in the worst case.

Let's instead give site admins an escape hatch, in case they
prefer the more liberal behavior.  For many sites, the full
object database is public anyway (e.g., if you allow dumb
walker access), or the site admin may simply decide the
security/convenience tradeoff is not worth it.

This patch adds a new config option to disable the
restrictions added in ee27ca4. It defaults to off, meaning
there is no change in behavior by default.

Signed-off-by: Jeff King p...@peff.net
---
 Documentation/config.txt |  7 +++
 Documentation/git-upload-archive.txt |  6 ++
 archive.c| 13 +++--
 t/t5000-tar-tree.sh  |  9 +
 4 files changed, 33 insertions(+), 2 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 040197b..62f0a4e 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -2334,6 +2334,13 @@ transfer.unpackLimit::
not set, the value of this variable is used instead.
The default value is 100.
 
+uploadarchive.allowUnreachable::
+   If true, allow clients to use `git archive --remote` to request
+   any tree, whether reachable from the ref tips or not. See the
+   discussion in the `SECURITY` section of
+   linkgit:git-upload-archive[1] for more details. Defaults to
+   `false`.
+
 uploadpack.hiderefs::
String(s) `upload-pack` uses to decide which refs to omit
from its initial advertisement.  Use more than one
diff --git a/Documentation/git-upload-archive.txt b/Documentation/git-upload-archive.txt
index 8ae65d8..cbef61b 100644
--- a/Documentation/git-upload-archive.txt
+++ b/Documentation/git-upload-archive.txt
@@ -46,6 +46,12 @@ implications. These rules are subject to change in future 
versions of
 git, and the server accessed by `git archive --remote` may or may not
 follow these exact rules.
 
+If the config option `uploadArchive.allowUnreachable` is true, these
+rules are ignored, and clients may use arbitrary sha1 expressions.
+This is useful if you do not care about the privacy of unreachable
+objects, or if your object database is already publicly available for
+access via non-smart-http.
+
 OPTIONS
 -------
 <directory>::
diff --git a/archive.c b/archive.c
index 346f3b2..7d0976f 100644
--- a/archive.c
+++ b/archive.c
@@ -17,6 +17,7 @@ static char const * const archive_usage[] = {
 static const struct archiver **archivers;
 static int nr_archivers;
 static int alloc_archivers;
+static int remote_allow_unreachable;
 
 void register_archiver(struct archiver *ar)
 {
@@ -257,7 +258,7 @@ static void parse_treeish_arg(const char **argv,
unsigned char sha1[20];
 
/* Remotes are only allowed to fetch actual refs */
-   if (remote) {
+	if (remote && !remote_allow_unreachable) {
char *ref = NULL;
const char *colon = strchr(name, ':');
int refnamelen = colon ? colon - name : strlen(name);
@@ -401,6 +402,14 @@ static int parse_archive_args(int argc, const char **argv,
return argc;
 }
 
+static int git_default_archive_config(const char *var, const char *value,
+ void *cb)
+{
+	if (!strcmp(var, "uploadarchive.allowunreachable"))
+		remote_allow_unreachable = git_config_bool(var, value);
+   return git_default_config(var, value, cb);
+}
+
 int write_archive(int argc, const char **argv, const char *prefix,
  int setup_prefix, const char *name_hint, int remote)
 {
@@ -411,7 +420,7 @@ int write_archive(int argc, const char **argv, const char *prefix,
if (setup_prefix  prefix == NULL)
prefix = setup_git_directory_gently(nongit);
 
-   git_config(git_default_config, NULL);
+   git_config(git_default_archive_config, NULL);
init_tar_archiver();
init_zip_archiver();
 
diff --git a/t/t5000-tar-tree.sh b/t/t5000-tar-tree.sh
index 05f011d..1cf0a4e 100755
--- a/t/t5000-tar-tree.sh
+++ b/t/t5000-tar-tree.sh
@@ -213,6 +213,15 @@ test_expect_success 'clients cannot access unreachable commits' '
 	test_must_fail git archive --remote=. $sha1 >remote.tar
 '
 
+test_expect_success 'upload-archive can allow unreachable commits' '
+   test_commit

Re: [PATCH 1/2] t3200-branch: test setting branch as own upstream

2014-02-28 Thread Jeff King
On Fri, Feb 28, 2014 at 07:44:10PM +0900, Brian Gesiak wrote:

  I notice that the warning comes from install_branch_config, which gets
  used both for branch -u, but also in the side effect case I
  mentioned above. Is it possible to trigger this as part of such a case?
  I think maybe git branch -f --track foo foo would do it. If so, we
  should perhaps include a test that it does not break if we upgrade the
  -u case to an error.
 
 Do you mean that install_branch_config should continue to emit a
 warning in the side effect case? I'm not sure I agree--how is git
 branch -f --track foo foo less erroneous than git branch -u foo
 refs/heads/foo? Perhaps I'm missing some insight on how --track is
 used.

I'd be more worried about triggering it via the config. E.g.:

  git config branch.autosetupmerge always
  git branch -f foo foo

Should the second command die? I admit I'm having a hard time coming up
with a feasible reason why anyone would do branch -f foo foo in the
first place. I just don't want to regress somebody else's workflow due
to my lack of imagination.

-Peff


Re: GSoC idea: allow git rebase --interactive todo lines to take options

2014-02-28 Thread Jeff King
On Thu, Feb 27, 2014 at 01:10:30PM -0500, Brandon McCaig wrote:

 On Wed, Feb 26, 2014 at 5:52 AM, Jeff King p...@peff.net wrote:
  This seems like a reasonable feature to me. All of your examples are
  possible with an edit and another git command, but the convenience may
  be worth it (though personally, most of the examples you gave are
  particularly interesting to me[1]).
 
 This strikes me as over-complicating the rebase --interactive
 interface.

Sorry, I missed an important word in my final sentence. It should have
been "the examples you gave are NOT particularly interesting to me".

 Particularly all of the ideas expressed later on about
 merge commits and resetting authors, etc. It seems like you're trying
 to define a whole new command set (i.e., API) for Git, but within the
 context of rebase --interactive. I think it would be hard to document
 this, and hard to learn it, and harder still to remember it (even
 though it would obviously try to mirror the existing Git command API).

I agree some of the examples are getting esoteric. Things like --signoff
and --reset-author are a fairly straightforward convenience feature:
they save you from writing "exec git commit --amend --signoff".
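
For example, today the todo list for that would look something like this
(commit ids and subjects made up):

  pick deadbee Fix the frobnicator
  exec git commit --amend --signoff
  pick fa1afe1 Update the docs

whereas the idea here is to let the option ride along on the pick line
itself.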

For others that cannot currently be done with a simple option to git
commit, I think a reasonable first step would be to implement them
there. For example, you cannot currently "git commit --tree". Maybe that
is too insane and low-level an option for git commit. But if it is,
then it is almost certainly too insane and low-level for a rebase
instruction.

For others from Michael's list, I expect they may not make _sense_
outside of a rebase. That is, they are operations whose input is not a
single commit, but a sequence of commits (e.g., if you had some
high-level command that allowed swapping two commits without having to
redo the conflicts from the second commit). Those ones might make sense
to exist as part of rebase and nowhere else (but then they are not
necessarily just options, but rather new instructions).

 That said, I do think that this is probably a bad direction and
 shouldn't be rushed into too fast.

I'm not sure whether it is a good idea or not. But I think it is looking
decreasingly like a good GSoC project.

-Peff


Re: [PATCH 1/2] t3200-branch: test setting branch as own upstream

2014-02-28 Thread Jeff King
On Fri, Feb 28, 2014 at 08:16:13PM +0900, Brian Gesiak wrote:

  I just don't want to regress somebody else's workflow due
  to my lack of imagination.
 
 This makes a lot of sense to me, although as-is the function emits a
 warning and returns immediately (although with a successful status
 code), so I'm also stumped as to what kind of workflow this would be
 included in.

I'm cc-ing Matthieu, who wrote 85e2233, which introduces the check. Its
commit message says:

  branch: warn and refuse to set a branch as a tracking branch of
  itself.

  Previous patch allows commands like git branch --set-upstream foo
  foo, which doesn't make much sense. Warn the user and don't change
  the configuration in this case. Don't die to let the caller finish its
  job in such case.

For those just joining us, we are focused on the final statement, and
deciding whether we should die() in this case. But I am not clear what
job it would want to be finishing (creating the branch, I guess, but you
cannot be doing anything useful, since by definition the branch already
exists and you are not changing its tip). There wasn't any relevant
discussion on the list[1]. Matthieu, can you remember anything else that
led to that decision?

 In any case, if the jury's out on this one, I suppose the two patches
 I submitted are good to merge? Part of me thinks the bump from warning
 to error belongs in its own patch anyway.

Yeah, it should definitely be a separate patch on top.

-Peff

[1] Threads are:

  http://thread.gmane.org/gmane.comp.version-control.git/137297/focus=137299

and

  http://thread.gmane.org/gmane.comp.version-control.git/137401/focus=137403

but the discussion focused on the first part of the series.


Re: [PATCH] repack: add `repack.honorpackkeep` config var

2014-02-28 Thread Jeff King
On Fri, Feb 28, 2014 at 10:45:39AM -0800, Junio C Hamano wrote:

  Exactly. The two features (bitmaps and .keep) are not compatible with
  each other, so you have to prioritize one. If you are using static .keep
  files, you might want them to continue being respected at the expense of
  using bitmaps for that repo. So I think you want a separate option from
  --write-bitmap-index to allow the appropriate flexibility.
 
 What is the appropriate flexibility, though?  If the user wants to
 use bitmap, we would need to drop .keep, no?

Or the flip side: if the user wants to use .keep, we should drop
bitmaps. My point is that we do not know which way the user wants to
go, so we should not tie the options together.

 Doesn't always having two copies in two packs degrade performance
 unnecessarily (without even talking about wasted diskspace)?  An
 explicit .keep exists in the repository because it is expensive and
 undesirable to duplicate what is in there in the first place, so it
 feels to me that either

The benefits of static .keep files are (I think):

  1. less I/O during repacks, as you do not rewrite a static set of
 objects

  2. less turnover of packfiles, which can make dumb access more
 efficient (both for dumb clients, but also for things like
 non-git-aware backups).

I think the existence of smart-http more or less nullifies (2). For (1),
it helps at first, but you get diminishing returns as your non-keep
packfile grows. I think it only helps in pathological cases (e.g., you
mark 10GB worth of giant blobs in a .keep pack, and then pack the other
10MB of trees, commits, and normal-sized blobs as usual).

  - Disable with warning, or outright refuse, the -b option if
there is .keep (if we want to give precedence to .keep); or
 
  - Remove .keep with warning when -b option is given (if we want
to give precedence to -b).
 
 and nothing else would be a reasonable option.  Unfortunately, we
 can do neither automatically because there could be a transient .keep
 file in an active repository.

Right, the transient ones complicate the issue. But I think even for
static .keep versus bitmaps, there is a question. See below...

  The default is another matter.  I think most people using .bitmaps on a
  server would probably want to set repack.packKeptObjects.  They would
  want to repack often to take advantage of the .bitmaps anyway, so they
  probably don't care about .keep files (any they see are due to races
  with incoming pushes).
 
 ... which makes me think that repack.packKeptObjects is merely a
 distraction---it should be enough to just pass --pack-kept-objects
 when -b is asked, without giving any extra configurability, no?

But you do not necessarily ask for -b explicitly; it might come from
the config, too. Imagine you have a server with many repos. You want to
use bitmaps when you can, so you set pack.writeBitmaps in
/etc/gitconfig. But in a few repos, you want to use .keep files, and
it's more important for you to use it than bitmaps (e.g., because it is
one of the pathological cases above). So you set repack.packKeptObjects
to false in /etc/gitconfig, to prefer .keep to bitmaps where
appropriate.

If you did not have that config option, your alternative would be to
turn off pack.writeBitmaps in the repositories with .keep files. But
then you need to per-repo keep that flag in sync with whether or not the
repo has .keep files.
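
Concretely, the host-wide setup I have in mind would be roughly (shown
as commands rather than editing /etc/gitconfig by hand):

  git config --system pack.writeBitmaps true        # use bitmaps where we can
  git config --system repack.packKeptObjects false  # but let .keep packs win

The idea being that repos with .keep files keep honoring them (and just
do without bitmaps), while everything else gets bitmaps as usual.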

To be clear, at GitHub we do not plan on ever having
repack.packKeptObjects off (for now we have it on explicitly, but if it
were connected to pack.writeBitmaps, then we would be happy with that
default). I am mostly trying to give an escape hatch to let people use
different optimization strategies if they want.

If we are going to have --pack-kept-objects (and I think we should),
I think we also should have a matching config option. Because it is
useful when matched with the bitmap code, and that can be turned on both
from the command-line or from the config. Wherever you are doing that,
you would want to be able to make the matching .keep decision.

And I don't think it hurts much.  With the fallback-to-on behavior, you
do not have to even care that it is there unless you are doing something
clever.

  So we could do something like falling back to turning the option on if
  --write-bitmap-index is on _and_ the user didn't specify
  --pack-kept-objects.
 
 If you mean didn't specify --no-pack-kept-objects, then I think
 that is sensible.  I still do not know why we would want the
 configuration variable, though.

Right, I meant if the pack-kept-objects variable is set, either on or
off, either via the command line or in the config.

As far as what the config is good for, see above.

-Peff


Re: [PATCH] repack: add `repack.honorpackkeep` config var

2014-02-28 Thread Jeff King
On Fri, Feb 28, 2014 at 10:09:08AM -0700, Nasser Grainawi wrote:

  Exactly. The two features (bitmaps and .keep) are not compatible with
  each other, so you have to prioritize one. If you are using static .keep
  files, you might want them to continue being respected at the expense of
  using bitmaps for that repo. So I think you want a separate option from
  --write-bitmap-index to allow the appropriate flexibility.
 
 Has anyone thought about how to make them compatible?

Yes, but it's complicated and not likely to happen soon.

Having .keep files means that you are not including some objects in the
newly created pack. Each bit in a commit's bitmap corresponds to one
object in the pack, and whether it is reachable from that commit. The
bitmap is only useful if we can calculate the full reachability from it,
and it has no way to specify objects outside of the pack.

To fix this, you would need to change the on-disk format of the bitmaps
to somehow reference objects outside of the pack. Either by having the
bitmaps index a repo-global set of objects, or by permitting a list of
edge objects that are referenced from the pack, but not included (and
then when assembling the full reachable list, you would have to recurse
across edge objects to find their reachable list in another pack,
etc).

So it's possible, but it would complicate the scheme quite a bit, and
would not be backwards compatible with either JGit or C Git.

 We're using Martin Fick's git-exproll script which makes heavy use of
 keeps to reduce pack file churn. In addition to the on-disk benefits
 we get there, the driving factor behind creating exproll was to
 prevent Gerrit from having two large (30GB+) mostly duplicated pack
 files open in memory at the same time. Repacking in JGit would help in
 a single-master environment, but we'd be back to having this problem
 once we go to a multi-master setup.
 
 Perhaps the solution here is actually something in JGit where it could
 aggressively try to close references to pack files

In C git we don't worry about this too much, because our programs tend
to be short-lived, and references to the old pack will go away quickly.
Plus it is all mmap'd, so as we simply stop accessing the pages of the
old pack, they should eventually be dropped if there is memory pressure.

I seem to recall that JGit does not mmap its packfiles. Does it pread?
In that case, I'd expect unused bits from the duplicated packfile to get
dropped from the disk cache over time. If it loads whole packfiles into
memory, then yes, it should probably close more aggressively.

 , but that still
 doesn't help the disk churn problem. As Peff says below, we would want
 to repack often to get up-to-date bitmaps, but ideally we could do
 that without writing hundreds of GBs to disk (which is obviously worse
 when disk is a NFS mount).

Ultimately I think the solution to the churn problem is a packfile-like
storage that allows true appending of deltas. You can come up with a
scheme to allow deltas between on-disk packs (i.e., thin packs on
disk). The trick there is handling the dependencies and cycles. I think
you could get by with a strict ordering of packs and a few rules:

  1. An object in a pack with weight A cannot have as a base an object
     in a pack with weight >= A.

  2. A pack with weight A cannot be deleted if there exists a pack with
     weight > A.

But you'd want to also add in a single update-able index over all the
packfiles, and even then you'd still want to pack occasionally (because
you'd end up with deltas on bases going back in time, but you really
prefer your bases to be near the tip of history).

So I am not volunteering to work on it. :)

At GitHub we mostly deal with the churn by throwing more server
resources at it. But we have the advantage of having a very large number
of small-to-medium repos, which is relatively easy to scale up. A small
number of huge repos is trickier.

-Peff


Re: [BUG] Halt during fetch on MacOS

2014-02-28 Thread Jeff King
On Fri, Feb 28, 2014 at 03:26:28PM -0800, Conley Owens wrote:

 test.sh
 
 #!/bin/bash
 rungit() {
 mkdir $1
 GIT_DIR=$1 git init --bare
 echo '[remote aosp]' > $1/config
 echo 'url =
 https://android.googlesource.com/platform/external/tinyxml2' >> $1/config
 GIT_DIR=$1 git fetch aosp +refs/heads/master:refs/remotes/aosp/master

I don't think this is affecting your test, but you probably want  to
append to the config for the first line, too. Otherwise you are
overwriting some of git's default settings.

 When everything cools, you can see that there are some fetches hanging
 (typically).
 $ ps | grep 'git fetch'
 ...
 63310 ttys004    0:00.01 git fetch aosp
 +refs/heads/master:refs/remotes/aosp/master
 [...]

I can't reproduce here on Linux. Can you find out what the processes are
doing with something like strace?
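
(There is no strace on OS X, but something like these, pointed at one of
the stuck pids from your ps output, should give a similar picture; the
invocations are from memory and untested:)

  sample 63310 5         # user-space stack samples for 5 seconds
  sudo dtruss -p 63310   # syscall trace, roughly equivalent to strace -p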

 You can look at the parent process of each and see that one half
 spawned the other half, or you can look at the environment variables
 for each to see that there are two processes operating in the same
 directory for each directory where there's an issue.
 $ echo $(for pid in $(ps | grep 'git fetch' | grep -o '^[0-9]*'); do
 ps -p $pid -wwwE | grep 'GIT_DIR=[^ ]*' -o; done) | sort
 GIT_DIR=testdir14
 GIT_DIR=testdir14
 GIT_DIR=testdir32
 GIT_DIR=testdir32
 GIT_DIR=testdir47
 GIT_DIR=testdir47

A fetch will start many sub-processes. Try:

  GIT_TRACE=1 git fetch \
https://android.googlesource.com/platform/external/tinyxml2

which shows git-fetch starting the git-remote-https helper, which in
turn starts git-fetch-pack to do the actual protocol, which uses
git-unpack-objects to manage the incoming pack.

-Peff


Re: t9200 cvsexportcommit test fails on Ubuntu server 12.04.4 LTS

2014-02-28 Thread Jeff King
On Fri, Feb 28, 2014 at 07:45:07PM +0100, Fabio D'Alfonso wrote:

 I get 12 of 15 tests faling.
 
 Any idea? the same build works fine on 11.04 where I have a desktop version.

No clue. It works fine for me here (Debian sid). Perhaps try running the
script like:

./t9200-git-cvsexportcommit.sh -v -i

which should give more information about what exactly is failing?

-Peff


Re: [PATCH v2 01/11] builtin/pack-objects.c: change check_pbase_path() to use ALLOC_GROW()

2014-02-28 Thread Jeff King
On Fri, Feb 28, 2014 at 11:03:19AM -0800, Junio C Hamano wrote:

 Michael Haggerty mhag...@alum.mit.edu writes:
 
  So my vote is that the patches are OK the way Dmitry wrote them (mind, I
  have only read through 05/11 so far).
 
 Seconded ;-)
 
 By the way, I do not like these long subjects.  change is a
 redundant word when one sends a patch---as all patches are about
 changing something.
 
   Subject: builtin/pack-objects.c: use ALLOC_GROW() in check_pbase_path()
 
 would be a lot more appropriate for git shortlog consumption.

I would actually go one step further and drop or shorten the filename in
the subject. It is very long, it is already easy to see which file was
changed from the diffstat, and it doesn't give any useful context for
other parts of the subject.

I really like the "foo:" convention for starting a subject line, because
it immediately makes clear what area you are working in without having
to waste space on English conjunctions or prepositions. But it does not
have to be a filename. It can be a subsystem, a command, a function, an
area of the project, or anything that gives context to the rest of the
line.

So I would suggest one of:

  Subject: use ALLOC_GROW() in check_pbase_path()

Talking about the filename is redundant; there's only one
check_pbase_path.

  Subject: check_pbase_path: use ALLOW_GROW

Even shorter.

  Subject: builtin/pack-objects.c: use ALLOC_GROW

This one implies to me that the point of the commit is to convert
the whole file to use ALLOC_GROW where appropriate, not just that
function (even if that function may be the only spot changed).

I'd probably not use:

  Subject: pack-objects: use ALLOC_GROW

as the scope is not about the command, but about the C file.

I realize that I just bikeshedded on subject lines for half a page, and
part of me wants to go kill myself in shame. But I feel like I see the
technique misapplied often enough that maybe some guidance is merited.
Feel free to ignore. :)

-Peff


Re: RFC GSoC idea: new git config features

2014-02-28 Thread Jeff King
On Sat, Mar 01, 2014 at 01:19:32AM +0100, Michael Haggerty wrote:

 I absolutely understand that changing all of the config parsers is not
 feasible.  But I had imagined a third route:
 
 (3) parse the config once, storing the raw values to records in
 memory.  When an unset is seen, delete any previous records that
 have accumulated for that key.  After the whole config has been
 read, iterate through the records, feeding the surviving values
 into the callback in the order they were originally read (minus
 deletions).
 
 Do you see any problems with this way of implementing the functionality
 (aside from slightly increased overhead)?

Yeah, this is something I have considered many times. It does have some
overhead, but the existing system is not great either. As you noted, we
often read the config several times for a given program invocation.

But moreover, we linearly strcmp each config key we find against each
one we know about. In some cases we return early if a sub-function is
looking for `starts_with(key, "foo.")`, but in most cases we just look
for "foo.bar", "foo.baz", and so on.

If we had the keys in-memory, we could reverse this: config code asks
for keys it cares about, and we can do an optimized lookup (binary
search, hash, etc).

That also makes many constructs easier to express. Recently we had a
problem where the parsing order of remote.pushdefault and
branch.*.pushremote mattered, because they were read into the same
variable. The solution is to use two variables and reconcile them after
all config is read. But if you can just query the config subsystem
directly, the boilerplate of reading them into strings goes away, and
you can just do:

  x = config_string_getf("branch.%s.pushremote", current_branch);
  if (!x)
          x = config_string_get("remote.pushdefault");
  if (!x)
          x = config_string_getf("branch.%s.remote", current_branch);
  if (!x)
          x = "origin";

As it is now, the code that does this has a lot more boilerplate, and is
split across several functions.

Another example is the way we have to manage deferred config in
git-status (see 84b4202). This might be more clear if we could simply
`config_get_bool("status.branch")` at the right moment.

The talk of efficiency is probably a red herring here. I do not think
config-reading is a significant source of slow-down in the current code.
But I'd be in favor of something that reduced boilerplate and made the
code easier to read.

  But the side effects these callbacks may cause are not limited to
  setting a simple scaler variable (like 'frotz' in the illustration)
  but would include things that are hard to undo once done
  (e.g. calling a set-up function with a lot of side effects).

Most callbacks would convert to a query system in a pretty
straightforward way, but some that have side effects might be tricky.
Converting them all may be too large for a GSoC project, but I think you
could do it gradually:

  1. Convert the parser to read into an in-memory representation, but
 leave git_config() as a wrapper which iterates over it.

  2. Add query functions like config_string_get() above.

  3. Convert callbacks to query functions one by one.

  4. Eventually drop git_config().

A GSoC project could take us partway through (3).

-Peff


Re: [PATCH 2/3] rebase: accept -number as another way of saying HEAD~number

2014-03-03 Thread Jeff King
On Mon, Mar 03, 2014 at 10:37:11AM +0100, Matthieu Moy wrote:

 Michael Haggerty mhag...@alum.mit.edu writes:
 
  Or perhaps -NUM should fail with an error message if any of the last
  NUM commits are merges.  In that restricted scenario (which probably
  accounts for 99% of rebases), -NUM is equivalent to HEAD~NUM.
 
 Makes sense to me. So, -NUM would actually mean rebase the last NUM
 commits (as well as being an alias for HEAD~NUM), but would fail when
 it does not make sense (with an error message explaining the situation
 and pointing the user to HEAD~N if this is what he wanted).

Yeah, I agree with this. I think -NUM is only useful for linear
string-of-pearls history. With merges, it either means:

  - linearize history (a la git-log), go back NUM commits, and treat the
result as the upstream. This is useless, because it's difficult to
predict where you're going to end up. Using explicit ~ and ^
relative-commit operators to specify the upstream is more flexible
if you really want to do this (but I don't know why you would).

  - do not use an UNINTERESTING upstream as the cutoff, but instead
try to rebase exactly NUM commits. In the face of non-linear
history, I'm not even sure what this would mean, or how we would
implement it.

Neither of those is useful, so I don't think erroring out on them is a
bad thing. And by erroring out (rather than documenting some weird and
useless behavior), we don't have to worry about backwards compatibility
if we later _do_ come up with a useful behavior (the best I can think of
is that --first-parent -3 might have a use, but I think that should
already be covered, as the results of --first-parent are by-definition
linear).
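
So on a plain string-of-pearls history the proposal is just shorthand
(the -3 spelling is hypothetical, of course):

  git rebase -i HEAD~3   # what we type today
  git rebase -i -3       # proposed; would error out if a merge is in those 3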

 This would actually be a feature for me: I often want to rebase recent
 enough history, and when my @{upstream} isn't well positioned, I
 randomly type HEAD~N without remembering what N should be. When N is too
 small, the rebase doesn't reach the interesting commit, and when N is
 too big, it reaches a merge commit and I get a bunch of commits I'm not
 allowed to edit in my todo-list. Then I have to abort the commit
 manually. With -N failing on merge commits, the rebase would abort
 itself automatically.

I do the same thing.

-Peff


Re: [PATCH] repack: add `repack.honorpackkeep` config var

2014-03-03 Thread Jeff King
On Mon, Mar 03, 2014 at 10:13:47AM -0800, Junio C Hamano wrote:

  Or the flip side: if the user wants to use .keep, we should drop
  bitmaps. My point is that we do not know which way the user wants to
  go, so we should not tie the options together.
 
 Hmph.  I think the short of your later explanation is global config
 may tell us to use bitmap, in which case we would need a way to
 defeat that and have existing .keep honored, and it makes it easier
 to do so if these two are kept separate, because you do not want to
 run around and selectively disable bitmaps in these repositories.
 We can instead do so with repack.packKeptObjects in the global
 configuration. and I tend to agree with the reasoning.

Yes. Do you need a re-roll from me? I think the last version I sent +
the squash to tie the default to bitmap-writing makes the most sense.

-Peff


Re: [PATCH] repack: add `repack.honorpackkeep` config var

2014-03-03 Thread Jeff King
On Mon, Mar 03, 2014 at 11:51:06AM -0800, Junio C Hamano wrote:

  Yes. Do you need a re-roll from me? I think the last version I sent +
  the squash to tie the default to bitmap-writing makes the most sense.
 
 I have 9e20b390 (repack: add `repack.packKeptObjects` config var,
 2014-02-26); I do not recall I've squashed anything into it, though.
 
 Do you mean this one?
 
 Here's the interdiff for doing the fallback:
 [...]

Yes. Though I just noticed that the commit message needs updating if
that is squashed in. Here is the whole patch, with that update.

And I am dropping Vicent as the author, since I think there are now
literally zero lines of his left. ;)

-- >8 --
Subject: [PATCH] repack: add `repack.packKeptObjects` config var

The git-repack command always passes `--honor-pack-keep`
to pack-objects. This has traditionally been a good thing,
as we do not want to duplicate those objects in a new pack,
and we are not going to delete the old pack.

However, when bitmaps are in use, it is important for a full
repack to include all reachable objects, even if they may be
duplicated in a .keep pack. Otherwise, we cannot generate
the bitmaps, as the on-disk format requires the set of
objects in the pack to be fully closed.

Even if the repository does not generally have .keep files,
a simultaneous push could cause a race condition in which a
.keep file exists at the moment of a repack. The repack may
try to include those objects in one of two situations:

  1. The pushed .keep pack contains objects that were
 already in the repository (e.g., blobs due to a revert of
 an old commit).

  2. Receive-pack updates the refs, making the objects
 reachable, but before it removes the .keep file, the
 repack runs.

In either case, we may prefer to duplicate some objects in
the new, full pack, and let the next repack (after the .keep
file is cleaned up) take care of removing them.

This patch introduces both a command-line and config option
to disable the `--honor-pack-keep` option.  By default, it
is triggered when pack.writeBitmaps (or `--write-bitmap-index`)
is turned on, but specifying it explicitly can override the
behavior (e.g., in cases where you prefer .keep files to
bitmaps, but only when they are present).

Note that this option just disables the pack-objects
behavior. We still leave packs with a .keep in place, as we
do not necessarily know that we have duplicated all of their
objects.

Signed-off-by: Jeff King p...@peff.net
---
 Documentation/config.txt |  7 +++
 Documentation/git-repack.txt |  8 
 builtin/repack.c | 13 -
 t/t7700-repack.sh| 18 +-
 4 files changed, 44 insertions(+), 2 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index becbade..a8ddc7f 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -2136,6 +2136,13 @@ repack.usedeltabaseoffset::
false and repack. Access from old Git versions over the
native protocol are unaffected by this option.
 
+repack.packKeptObjects::
+   If set to true, makes `git repack` act as if
+   `--pack-kept-objects` was passed. See linkgit:git-repack[1] for
+   details. Defaults to `false` normally, but `true` if a bitmap
+   index is being written (either via `--write-bitmap-index` or
+   `pack.writeBitmaps`).
+
 rerere.autoupdate::
When set to true, `git-rerere` updates the index with the
resulting contents after it cleanly resolves conflicts using
diff --git a/Documentation/git-repack.txt b/Documentation/git-repack.txt
index 002cfd5..4786a78 100644
--- a/Documentation/git-repack.txt
+++ b/Documentation/git-repack.txt
@@ -117,6 +117,14 @@ other objects in that pack they already have locally.
must be able to refer to all reachable objects. This option
overrides the setting of `pack.writebitmaps`.
 
+--pack-kept-objects::
+   Include objects in `.keep` files when repacking.  Note that we
+   still do not delete `.keep` packs after `pack-objects` finishes.
+   This means that we may duplicate objects, but this makes the
+   option safe to use when there are concurrent pushes or fetches.
+   This option is generally only useful if you are writing bitmaps
+   with `-b` or `pack.writebitmaps`, as it ensures that the
+   bitmapped packfile has the necessary objects.
 
 Configuration
 -
diff --git a/builtin/repack.c b/builtin/repack.c
index 49f5857..6b0b62d 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -9,6 +9,7 @@
 #include argv-array.h
 
 static int delta_base_offset = 1;
+static int pack_kept_objects = -1;
 static char *packdir, *packtmp;
 
 static const char *const git_repack_usage[] = {
@@ -22,6 +23,10 @@ static int repack_config(const char *var, const char *value, 
void *cb)
delta_base_offset = git_config_bool(var, value);
return 0;
}
+   if (!strcmp(var

Re: [GSoC14][RFC] Proposal Draft: Refactor tempfile handling

2014-03-03 Thread Jeff King
On Sun, Mar 02, 2014 at 06:04:39AM +0900, Brian Gesiak wrote:

 My name is Brian Gesiak. I'm a research student at the University of
 Tokyo, and I'm hoping to participate in this year's Google Summer of
 Code by contributing to Git. I'm a longtime user, first-time
 contributor--some of you may have noticed my microproject
 patches.[1][2]

Yes, we did notice them. Thanks, and welcome. :)

 The ideas page points out that while lock files are closed and
 unlinked[3] when the program exits[4], object pack files implement
 their own brand of temp file creation and deletion. This
 implementation doesn't share the same guarantees as lock files--it is
 possible that the program terminates before the temp file is
 unlinked.[5]
 
 Lock file references are stored in a linked list. When the program
 exits, this list is traversed and each file is closed and unlinked. It
 seems to me that this mechanism is appropriate for temp files in
 general, not just lock files. Thus, my proposal would be to extract
 this logic into a separate module--tempfile.h, perhaps. Lock and
 object files would share the tempfile implementation.
 
 That is, both object and lock temp files would be stored in a linked
 list, and all of these would be deleted at program exit.

Yes, I think this is definitely the right way to go. We should be able
to unify the tempfile handling for all of git. Once the logic is
extracted into a nice API, there are several other places that can use
it, too:

  - the external diff code creates tempfiles and uses its own cleanup
routines

  - the shallow_XX tempfiles (these are not cleaned right now,
though I sent a patch recently for them to do their own cleanup)

Those are just off the top of my head. There may be other spots, too.

It is worth thinking in your proposal about some of the things that the
API will want to handle. What are the mismatches in how lockfiles and
object files are handled? E.g., how do we finalize them into place?
How should the API be designed to minimize race conditions (e.g., if we
get a signal delivered while we are committing or cleaning up a file)?

 Please let me know if this seems like it would make for an interesting
 proposal, or if perhaps there is something I am overlooking. Any
 feedback at all would be appreciated. Thank you!

You definitely have a grasp of what the project is aiming for, and which
areas need to be touched.

-Peff


[PATCH] disable grafts during fetch/push/bundle

2014-03-04 Thread Jeff King
When we are creating a pack to send to a remote, we should
make sure that we are not respecting grafts or replace refs.
Otherwise, we may end up sending a broken pack to the other
side that does not contain all objects (either omitting them
entirely, or using objects that the other side does not have
as delta bases).

We already make an attempt to do the right thing in several
places by turning off read_replace_refs. However, we missed
at least one case (during bundle creation), and we do
nothing anywhere to handle grafts.

This patch introduces a new function to disable both grafts
and replace refs both for the current process and for its
children. We add a call anywhere that packs objects for
sending: bundle create, upload-pack, and push.

This may seem like a rather heavy hammer, as we could just
teach pack-objects not to respect grafts. But this catches
more possible failures:

  1. We may actually feed pack-objects with the results of
 rev-list (e.g., bundle creation does this).

  2. We may be negotiating the HAVEs and WANTs with the
 other side, and our view should be consistent with what
 we feed pack-objects.

  3. It may be useful to let pack-objects use grafts in some
 instances, as evidenced by --keep-true-parents.

Signed-off-by: Jeff King p...@peff.net
---
This is motivated by a real-world case of somebody trying to push to
GitHub with a graft on their local end.

I suspect many other spots that use read_replace_refs = 0 probably
want to disable grafts, too (e.g., fsck and index-pack) but I do not
know of any particular breakage offhand.

I am tempted to say that not using --keep-true-parents is insane, and it
should be the default, but perhaps there is some case I'm missing.

Note that disabling grafts here just turns off .git/info/grafts, which
explicitly leaves the shallow file enabled (even though it ends up in
the same graft list). I'm assuming that the shallow file is handled
properly where it's appropriate, and it would not want to be included in
any of this disabling (certainly fetch/push should be handling it
explicitly already).
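
For reference, the failing setup is nothing more exotic than (sha1s made
up):

  # one line per grafted commit: child sha1, then its fake parent(s)
  echo "$commit $fake_parent" >>.git/info/grafts
  git push origin master   # may now build a pack assuming the grafted parentage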

 builtin/push.c   |  1 +
 bundle.c |  2 +
 cache.h  |  1 +
 commit.c |  5 +++
 commit.h |  1 +
 environment.c| 22 ++
 t/t6051-replace-grafts-remote.sh | 94 
 upload-pack.c|  2 +-
 8 files changed, 127 insertions(+), 1 deletion(-)
 create mode 100755 t/t6051-replace-grafts-remote.sh

diff --git a/builtin/push.c b/builtin/push.c
index 0e50ddb..448dcb5 100644
--- a/builtin/push.c
+++ b/builtin/push.c
@@ -527,6 +527,7 @@ int cmd_push(int argc, const char **argv, const char 
*prefix)
OPT_END()
};
 
+   disable_grafts_and_replace_refs();
packet_trace_identity(push);
git_config(git_default_config, NULL);
argc = parse_options(argc, argv, prefix, options, push_usage, 0);
diff --git a/bundle.c b/bundle.c
index e99065c..37b00a6 100644
--- a/bundle.c
+++ b/bundle.c
@@ -248,6 +248,8 @@ int create_bundle(struct bundle_header *header, const char 
*path,
struct child_process rls;
FILE *rls_fout;
 
+   disable_grafts_and_replace_refs();
+
bundle_to_stdout = !strcmp(path, -);
if (bundle_to_stdout)
bundle_fd = 1;
diff --git a/cache.h b/cache.h
index db223e8..f538cc2 100644
--- a/cache.h
+++ b/cache.h
@@ -410,6 +410,7 @@ extern const char *get_git_work_tree(void);
 extern const char *read_gitfile(const char *path);
 extern const char *resolve_gitdir(const char *suspect);
 extern void set_git_work_tree(const char *tree);
+extern void disable_grafts_and_replace_refs(void);
 
 #define ALTERNATE_DB_ENVIRONMENT GIT_ALTERNATE_OBJECT_DIRECTORIES
 
diff --git a/commit.c b/commit.c
index 6bf4fe0..886dbfe 100644
--- a/commit.c
+++ b/commit.c
@@ -114,6 +114,11 @@ static unsigned long parse_commit_date(const char *buf, 
const char *tail)
 static struct commit_graft **commit_graft;
 static int commit_graft_alloc, commit_graft_nr;
 
+int commit_grafts_loaded(void)
+{
+   return !!commit_graft_nr;
+}
+
 static int commit_graft_pos(const unsigned char *sha1)
 {
int lo, hi;
diff --git a/commit.h b/commit.h
index 16d9c43..dde0618 100644
--- a/commit.h
+++ b/commit.h
@@ -186,6 +186,7 @@ typedef int (*each_commit_graft_fn)(const struct 
commit_graft *, void *);
 struct commit_graft *read_graft_line(char *buf, int len);
 int register_commit_graft(struct commit_graft *, int);
 struct commit_graft *lookup_commit_graft(const unsigned char *sha1);
+int commit_grafts_loaded(void);
 
 extern struct commit_list *get_merge_bases(struct commit *rev1, struct commit 
*rev2, int cleanup);
 extern struct commit_list *get_merge_bases_many(struct commit *one, int n, 
struct commit **twos, int cleanup);
diff --git a/environment.c b/environment.c
index 4a3437d..b960293 100644

[RFC/PATCH] diff: simplify cpp funcname regex

2014-03-04 Thread Jeff King
The current funcname matcher for C files requires one or
more words before the function name, like:

  static int foo(int arg)
  {

However, some coding styles look like this:

  static int
  foo(int arg)
  {

and we do not match, even though the default regex would.

This patch simplifies the regex significantly. Rather than
trying to handle all the variants of keywords and return
types, we simply look for an identifier at the start of the
line that contains a "(", meaning it is either a function
definition or a function call, and that does not contain a ";",
which would indicate it is a call or declaration.

This can be fooled by something like:

if (foo)

but it is unlikely to find that at the very start of a line
with no indentation (and without having a complete set of
keywords, we cannot distinguish that from a function called
"if" taking "foo" anyway).
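
An easy way to eyeball what the driver picks for a given file is to look
at the hunk headers it generates, e.g. (assuming the same .gitattributes
setup as the tests below):

  echo "*.c diff=cpp" >.gitattributes
  git diff -U1 foo.c   # the @@ ... @@ header shows the chosen funcname line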

We had no tests at all for .c files, so this attempts to add
a few obvious cases (including the one we are fixing here).

Signed-off-by: Jeff King p...@peff.net
---
I tried accommodating this one case in the current regex, but it just
kept getting more complicated and unreadable. Maybe I am being naive to
think that this much simpler version can work. We don't have a lot of
tests or a known-good data set to check.

I did try git log -1000 -p before and after on git.git, diff'd the
results and manually inspected. The results were a mix of strict
improvement (mostly instances of the style I was trying to fix here) and
cases where there is no good answer. For example, for top-level changes
outside functions, we might find:

  N_("some text that is long

that is part of:

  const char *foo =
          N_("some text that is long "
             "and spans multiple lines");

Before this change, we would skip past it (using the cpp regex, that is;
the default one tends to find the same line) and either report nothing,
or whatever random function was before us. So it's a behavior change,
but the existing behavior is really no better.

 t/t4018-diff-funcname.sh | 36 
 userdiff.c   |  2 +-
 2 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/t/t4018-diff-funcname.sh b/t/t4018-diff-funcname.sh
index 38a092a..1e80b15 100755
--- a/t/t4018-diff-funcname.sh
+++ b/t/t4018-diff-funcname.sh
@@ -93,6 +93,29 @@ sed -e '
s/song;/song();/
 ' <Beer.perl >Beer-correct.perl
 
+cat >Beer.c <<\EOF
+static int foo(void)
+{
+label:
+   int x = old;
+}
+
+struct foo; /* catch failure below */
+static int
+gnu(int arg)
+{
+   int x = old;
+}
+
+struct foo; /* catch failure below */
+int multiline(int arg,
+ char *arg2)
+{
+   int x = old;
+}
+EOF
+sed s/old/new/ <Beer.c >Beer-correct.c
+
 test_expect_funcname () {
lang=${2-java}
test_expect_code 1 git diff --no-index -U1 \
@@ -127,6 +150,7 @@ test_expect_success 'set up .gitattributes declaring 
drivers to test' '
cat >.gitattributes <<-\EOF
*.java diff=java
*.perl diff=perl
+   *.c diff=cpp
EOF
 '
 
@@ -158,6 +182,18 @@ test_expect_success 'perl pattern is not distracted by 
forward declaration' '
test_expect_funcname "package Beer;\$" perl
 '
 
+test_expect_success 'c pattern skips labels' '
+   test_expect_funcname "static int foo(void)" c
+'
+
+test_expect_success 'c pattern matches GNU-style functions' '
+   test_expect_funcname "gnu(int arg)\$" c
+'
+
+test_expect_success 'c pattern matches multiline functions' '
+   test_expect_funcname "int multiline(int arg,\$" c
+'
+
 test_expect_success 'custom pattern' '
test_config diff.java.funcname !static
 !String
diff --git a/userdiff.c b/userdiff.c
index 10b61ec..b9d52b7 100644
--- a/userdiff.c
+++ b/userdiff.c
@@ -127,7 +127,7 @@
 /* Jump targets or access declarations */
 "!^[ \t]*[A-Za-z_][A-Za-z_0-9]*:.*$\n"
 /* C/++ functions/methods at top level */
-"^([A-Za-z_][A-Za-z_0-9]*([ \t*]+[A-Za-z_][A-Za-z_0-9]*([ \t]*::[ \t]*[^[:space:]]+)?){1,}[ \t]*\\([^;]*)$\n"
+"^([A-Za-z_].*\\([^;]*)$\n"
 /* compound type at top level */
 "^((struct|class|enum)[^;]*)$",
 /* -- */
-- 
1.8.5.2.500.g8060133


Re: [PATCH] disable grafts during fetch/push/bundle

2014-03-04 Thread Jeff King
On Tue, Mar 04, 2014 at 06:36:07PM -0500, Eric Sunshine wrote:

 On Tue, Mar 4, 2014 at 12:48 PM, Jeff King p...@peff.net wrote:
  diff --git a/commit.c b/commit.c
  index 6bf4fe0..886dbfe 100644
  --- a/commit.c
  +++ b/commit.c
  @@ -114,6 +114,11 @@ static unsigned long parse_commit_date(const char 
  *buf, const char *tail)
   static struct commit_graft **commit_graft;
   static int commit_graft_alloc, commit_graft_nr;
 
  +int commit_grafts_loaded(void)
  +{
  +   return !!commit_graft_nr;
  +}
 
 Did you mean !!commit_graft ?

Shouldn't they produce the same results?

-Peff


Re: [PATCH] disable grafts during fetch/push/bundle

2014-03-04 Thread Jeff King
On Tue, Mar 04, 2014 at 12:52:18PM -0800, Junio C Hamano wrote:

  We already make an attempt to do the right thing in several
  places by turning off read_replace_refs. However, we missed
  at least one case (during bundle creation), and we do
  nothing anywhere to handle grafts.
 
 Doing nothing for grafts has been pretty much a deliberate
 omission.  Because we have no way to transfer how histories are
 grafted together, people cloning from a repository that grafts away
 a commit that records a mistakenly committed sekrit will end up with
 a disjoint history, instead of exposing the sekrit to them, and are
 expected to join the history by recreating grafts (perhaps a README
 of such a project instructs them to do so).  That was deemed far
 better than exposing the hidden history, I think.

I see your point, but I would be tempted to say that the person trying
to hide a secret with grafting is simply wrong to do so. You need to
cement that history with a rewrite if you want to share with people.

I do not recall any past discussion on this topic, and searching the
archive only shows people echoing what I said above. Is this something
we've promised to work in the past?

I'm certainly sympathetic to systems failing to a secure default rather
than doing something that the user does not expect. But at the same
time, if using grafts for security isn't something people reasonably
expect, then failing only hurts the non-security cases.

 And replace tries to do the right thing was an attempt to rectify
 that misfeature of grafts in that we now do have a way to transfer
 how the history is grafted together, so that project README does not
 have to instruct the fetcher of doing anything special.

Perhaps the right response is grafts are broken, use git-replace
instead. But then should we think about deprecating grafts? Again, this
patch was spurred by a real user with a graft trying to push and getting
a confusing error message.

 It _might_ be a misfeature, however, for the object connectivity
 layer to expose a part of the history replaced away to the party
 that fetches from such a repository.  Ideally, the right thing
 ought to be to include history that would be omitted if we did not
 have the replacement (i.e. adding parents the underlying commit does
 not record), while not following the history that replacement wants
 to hide (i.e. excluding the commits replacement commits overlay).

I don't really think it's worth the complexity. It's fairly common
knowledge (or at least I think so) that replace refs are a _view_ onto
the history. When you share the history graph, you share the true
objects. You can _also_ share your views in refs/replace/, but it is up
to the client to fetch them. If you want to hide things, then you need
to rewrite the true objects, end of story.
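
That opt-in is explicit; a client that wants your view would do
something like:

  git fetch origin 'refs/replace/*:refs/replace/*'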

I dunno. Maybe there are people who have different expectations.

-Peff


Re: [PATCH] disable grafts during fetch/push/bundle

2014-03-04 Thread Jeff King
On Tue, Mar 04, 2014 at 08:00:44PM -0500, Eric Sunshine wrote:

   +int commit_grafts_loaded(void)
   +{
   +   return !!commit_graft_nr;
   +}
 
  Did you mean !!commit_graft ?
 
  Shouldn't they produce the same results?
 
 Yes they should, but the use of !! seemed to imply that you wanted to
 apply it to the pointer value. (If you indeed intended to use
 commit_graft_nr, then 'return commit_graft_nr', without !!, would have
 been sufficient and idiomatic C.)

I just wanted to normalize the return value to a boolean 0/1. Even when
the input is an int, it eliminates surprises when you try to assign the
result to a bitfield or other smaller-width type.

-Peff


Re: New directory lost by git am

2014-03-05 Thread Jeff King
On Wed, Mar 05, 2014 at 09:26:43AM -0500, Phillip Susi wrote:

 
 On 3/5/2014 3:10 AM, Chris Packham wrote:
  My example is creating a commit on the temp branch then applying
  it to the master branch using git am.
  
  Do a reset HEAD~1 --hard, and git clean -x -f -d before git am.
  I didn't notice the missing file myself for some time because it
  is left in the working tree, just not added to the index and
  included in the commit.
  
 
 Right... so the file is left in the directory, even though it is not
 checked in.  A git status should show it is an unknown file, and a
 clean should remove it.

I don't think those steps are necessary for Chris's example. When he
switches back to the master branch, git removes the subdirectory (the
file is tracked in temp but not master, so we remove it when
switching branches, and then the directory is empty, so we clean it up,
too). You can verify with an extra ls after the checkout but before
the am.

  * git apply parsed patches that add new files, generated by
  programs other than Git, incorrectly.  This is an old breakage in
  v1.7.11.
  
  Does that sound like your problem? If you can I'd suggest
  updating, ideally to the recent 1.9.0 release but if you're feeling
  conservative try 1.8.3.4.
 
 Vaguely, except for the "other than git" part.  This patch was
 generated by git-format-patch (I didn't think apply handled patches
 that weren't).

I can't get Chris's script to fail on any version of git. Can you show
us an example of a patch that does not behave (or better yet, a
reproduction recipe to generate the patch with format-patch)?

-Peff


Re: New directory lost by git am

2014-03-05 Thread Jeff King
On Wed, Mar 05, 2014 at 11:47:12AM -0500, Phillip Susi wrote:

  I can't get Chris's script to fail on any version of git. Can you
  show us an example of a patch that does not behave (or better yet,
  a reproduction recipe to generate the patch with format-patch)?
 
 AHA!  It requires a conflict.  There were simple conflicts in the NEWS
 file so I applied the patch with git am --reject and fixed up the
 NEWS, and ran git am --resolved.  The git am --reject fails to add the
 new directory to the index.

Thanks, I can reproduce here. I do not think it has anything to do with
being in a subdirectory; any new file does not get added to the index.
In fact, I do not think we update the index at all with --reject. For
example, try this:

git init repo &&
cd repo &&

echo base >conflict &&
echo base >modified &&
git add . &&
git commit -m base &&

echo master >conflict &&
git add . &&
git commit -m master &&

git checkout -b other HEAD^ &&
echo other >conflict &&
echo other >modified &&
echo other >new &&
git add . &&
git commit -m other &&

git checkout master &&
git format-patch other -1 --stdout >patch &&
git am --reject patch

Running git status -s shows:

   M modified
   ?? conflict.rej
   ?? new
   ?? patch

We apply the changes to modified and new to the working tree, but we
do not stage anything in the index. I suspect this is because our
invocation of apply --index (which is what is doing the real work with
--reject here) bails before touching the index. In theory it should be
able to update the index for files that applied cleanly and leave the
other ones alone.

But I have not thought hard about it, so maybe there is a good reason
not to (it is a little weird just because the resulting index is a
partial application of the patch).  The am -3 path does what you want
here, but it is much simpler: it knows it can represent the 3-way
conflict in the index. So the index represents the complete state of the
patch application at the end, including conflicts.
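
So as a workaround in the meantime, something like this (assuming the
pre-image blobs are available in your repository) should leave you with
a usable index:

  git am -3 patch   # 3-way fallback; cleanly-applied files and conflicts
                    # both end up in the index, instead of scattered .rej files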

-Peff


[PATCH 0/3] push: detect local refspec errors early

2014-03-05 Thread Jeff King
On Wed, Mar 05, 2014 at 01:36:12PM +0400, Dmitry wrote:

 Here's my usecase. I have created a BranchWithVeryLongName and I want
 to have it pushed to origin. So I use this command with version
 1.8.1.2:
 
 git push origin BranchMistypedLongName
 
 (note that I mistyped the branch name). The following happens:
 1. git asks me for username and password
 2. it authenticates on the origin server
 3. it bails out with error: src refspec BranchMistypedLongName does not
 match any
 
 Can't git perhaps check that the branch exists before it asks me for
 credentials and just say there's no such branch?

We can't fully process the refspecs until we have talked to the other
side, because they may involve matching refs from the remote; I don't
think git even really looks at them until after we've connected.

But I think there are some obvious cases, like a bogus left-hand side
(i.e., what you have here) that cannot ever succeed, no matter what the
other side has. We could sanity check the refspecs before doing anything
else.
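
With such a check, the failure in your example would look roughly like
this, before any credential prompt or network traffic:

  $ git push origin BranchMistypedLongName
  error: src refspec BranchMistypedLongName does not match any.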

Here's a patch series that does that.

  [1/3]: match_explicit: hoist refspec lhs checks into their own function
  [2/3]: match_explicit_lhs: allow a verify only mode
  [3/3]: push: detect local refspec errors early

-Peff


[PATCH 1/3] match_explicit: hoist refspec lhs checks into their own function

2014-03-05 Thread Jeff King
In preparation for being able to check the left-hand side of
our push refspecs separately, this pulls the examination of
them out into its own function. There should be no behavior
change.

Signed-off-by: Jeff King p...@peff.net
---
 remote.c | 49 ++---
 1 file changed, 30 insertions(+), 19 deletions(-)

diff --git a/remote.c b/remote.c
index e41251e..6aa9dd2 100644
--- a/remote.c
+++ b/remote.c
@@ -1096,12 +1096,36 @@ static char *guess_ref(const char *name, struct ref 
*peer)
return strbuf_detach(buf, NULL);
 }
 
+static int match_explicit_lhs(struct ref *src,
+ struct refspec *rs,
+ struct ref **match,
+ int *allocated_match)
+{
+   switch (count_refspec_match(rs->src, src, match)) {
+   case 1:
+   *allocated_match = 0;
+   return 0;
+   case 0:
+   /* The source could be in the get_sha1() format
+* not a reference name.  :refs/other is a
+* way to delete 'other' ref at the remote end.
+*/
+   *match = try_explicit_object_name(rs->src);
+   if (!*match)
+   return error("src refspec %s does not match any.", rs->src);
+   *allocated_match = 1;
+   return 0;
+   default:
+   return error("src refspec %s matches more than one.", rs->src);
+   }
+}
+
 static int match_explicit(struct ref *src, struct ref *dst,
  struct ref ***dst_tail,
  struct refspec *rs)
 {
struct ref *matched_src, *matched_dst;
-   int copy_src;
+   int allocated_src;
 
const char *dst_value = rs-dst;
char *dst_guess;
@@ -1110,23 +1134,8 @@ static int match_explicit(struct ref *src, struct ref 
*dst,
return 0;
 
matched_src = matched_dst = NULL;
-   switch (count_refspec_match(rs-src, src, matched_src)) {
-   case 1:
-   copy_src = 1;
-   break;
-   case 0:
-   /* The source could be in the get_sha1() format
-* not a reference name.  :refs/other is a
-* way to delete 'other' ref at the remote end.
-*/
-   matched_src = try_explicit_object_name(rs-src);
-   if (!matched_src)
-   return error(src refspec %s does not match any., 
rs-src);
-   copy_src = 0;
-   break;
-   default:
-   return error(src refspec %s matches more than one., rs-src);
-   }
+   if (match_explicit_lhs(src, rs, &matched_src, &allocated_src) < 0)
+   return -1;
 
if (!dst_value) {
unsigned char sha1[20];
@@ -1171,7 +1180,9 @@ static int match_explicit(struct ref *src, struct ref 
*dst,
return error("dst ref %s receives from more than one src.",
  matched_dst->name);
else {
-   matched_dst->peer_ref = copy_src ? copy_ref(matched_src) : matched_src;
+   matched_dst->peer_ref = allocated_src ?
+   matched_src :
+   copy_ref(matched_src);
matched_dst->force = rs->force;
}
return 0;
-- 
1.8.5.2.500.g8060133



[PATCH 2/3] match_explicit_lhs: allow a verify only mode

2014-03-05 Thread Jeff King
The match_explicit_lhs function has all of the logic
necessary to verify the refspecs without actually doing any
work. This patch lets callers pass a NULL match pointer to
indicate they want a verify only operation.

For the most part, we just need to avoid writing to the NULL
pointer. However, we also have to refactor the
try_explicit_object_name sub-function; it indicates success by
allocating and returning a new ref. Instead, we give it an
out parameter for the match and return a numeric status.

Signed-off-by: Jeff King p...@peff.net
---
 remote.c | 38 --
 1 file changed, 24 insertions(+), 14 deletions(-)

diff --git a/remote.c b/remote.c
index 6aa9dd2..b6586c0 100644
--- a/remote.c
+++ b/remote.c
@@ -1031,11 +1031,13 @@ int count_refspec_match(const char *pattern,
}
}
if (!matched) {
-   *matched_ref = matched_weak;
+   if (matched_ref)
+   *matched_ref = matched_weak;
return weak_match;
}
else {
-   *matched_ref = matched;
+   if (matched_ref)
+   *matched_ref = matched;
return match;
}
 }
@@ -1055,18 +1057,25 @@ static struct ref *alloc_delete_ref(void)
return ref;
 }
 
-static struct ref *try_explicit_object_name(const char *name)
+static int try_explicit_object_name(const char *name,
+   struct ref **match)
 {
unsigned char sha1[20];
-   struct ref *ref;
 
-   if (!*name)
-   return alloc_delete_ref();
+   if (!*name) {
+   if (match)
+   *match = alloc_delete_ref();
+   return 0;
+   }
+
if (get_sha1(name, sha1))
-   return NULL;
-   ref = alloc_ref(name);
-   hashcpy(ref-new_sha1, sha1);
-   return ref;
+   return -1;
+
+   if (match) {
+   *match = alloc_ref(name);
+   hashcpy((*match)-new_sha1, sha1);
+   }
+   return 0;
 }
 
 static struct ref *make_linked_ref(const char *name, struct ref ***tail)
@@ -1103,17 +1112,18 @@ static int match_explicit_lhs(struct ref *src,
 {
switch (count_refspec_match(rs->src, src, match)) {
case 1:
-   *allocated_match = 0;
+   if (allocated_match)
+   *allocated_match = 0;
return 0;
case 0:
/* The source could be in the get_sha1() format
 * not a reference name.  :refs/other is a
 * way to delete 'other' ref at the remote end.
 */
-   *match = try_explicit_object_name(rs->src);
-   if (!*match)
+   if (try_explicit_object_name(rs->src, match) < 0)
return error("src refspec %s does not match any.", rs->src);
-   *allocated_match = 1;
+   if (allocated_match)
+   *allocated_match = 1;
return 0;
default:
return error("src refspec %s matches more than one.", rs->src);
-- 
1.8.5.2.500.g8060133



[PATCH 3/3] push: detect local refspec errors early

2014-03-05 Thread Jeff King
When pushing, we do not even look at our push refspecs until
after we have made contact with the remote receive-pack and
gotten its list of refs. This means that we may go to some
work, including asking the user to log in, before realizing
we have simple errors like git push origin matser.

We cannot catch all refspec problems, since fully evaluating
the refspecs requires knowing what the remote side has. But
we can do a quick sanity check of the local side and catch a
few simple error cases.

Signed-off-by: Jeff King p...@peff.net
---
 remote.c   | 25 +
 remote.h   |  1 +
 t/t5529-push-errors.sh | 48 
 transport.c|  8 ++--
 4 files changed, 80 insertions(+), 2 deletions(-)
 create mode 100755 t/t5529-push-errors.sh

diff --git a/remote.c b/remote.c
index b6586c0..8471fb1 100644
--- a/remote.c
+++ b/remote.c
@@ -1374,6 +1374,31 @@ static void prepare_ref_index(struct string_list 
*ref_index, struct ref *ref)
 }
 
 /*
+ * Given only the set of local refs, sanity-check the set of push
+ * refspecs. We can't catch all errors that match_push_refs would,
+ * but we can catch some errors early before even talking to the
+ * remote side.
+ */
+int check_push_refs(struct ref *src, int nr_refspec, const char 
**refspec_names)
+{
+   struct refspec *refspec = parse_push_refspec(nr_refspec, refspec_names);
+   int ret = 0;
+   int i;
+
+   for (i = 0; i < nr_refspec; i++) {
+   struct refspec *rs = refspec + i;
+
+   if (rs->pattern || rs->matching)
+   continue;
+
+   ret |= match_explicit_lhs(src, rs, NULL, NULL);
+   }
+
+   free_refspec(nr_refspec, refspec);
+   return ret;
+}
+
+/*
  * Given the set of refs the local repository has, the set of refs the
  * remote repository has, and the refspec used for push, determine
  * what remote refs we will update and with what value by setting
diff --git a/remote.h b/remote.h
index fb7647f..917d383 100644
--- a/remote.h
+++ b/remote.h
@@ -166,6 +166,7 @@ extern int query_refspecs(struct refspec *specs, int nr, 
struct refspec *query);
 char *apply_refspecs(struct refspec *refspecs, int nr_refspec,
 const char *name);
 
+int check_push_refs(struct ref *src, int nr_refspec, const char **refspec);
 int match_push_refs(struct ref *src, struct ref **dst,
int nr_refspec, const char **refspec, int all);
 void set_ref_status_for_push(struct ref *remote_refs, int send_mirror,
diff --git a/t/t5529-push-errors.sh b/t/t5529-push-errors.sh
new file mode 100755
index 000..9871307
--- /dev/null
+++ b/t/t5529-push-errors.sh
@@ -0,0 +1,48 @@
+#!/bin/sh
+
+test_description='detect some push errors early (before contacting remote)'
+. ./test-lib.sh
+
+test_expect_success 'setup commits' '
+   test_commit one
+'
+
+test_expect_success 'setup remote' '
+   git init --bare remote.git &&
+   git remote add origin remote.git
+'
+
+test_expect_success 'setup fake receive-pack' '
+   FAKE_RP_ROOT=$(pwd) &&
+   export FAKE_RP_ROOT &&
+   write_script fake-rp <<-\EOF &&
+   echo yes >"$FAKE_RP_ROOT/rp-ran"
+   exit 1
+   EOF
+   git config remote.origin.receivepack "\"$FAKE_RP_ROOT/fake-rp\""
+'
+
+test_expect_success 'detect missing branches early' '
+   echo no >rp-ran &&
+   echo no >expect &&
+   test_must_fail git push origin missing &&
+   test_cmp expect rp-ran
+'
+
+test_expect_success 'detect missing sha1 expressions early' '
+   echo no >rp-ran &&
+   echo no >expect &&
+   test_must_fail git push origin master~2:master &&
+   test_cmp expect rp-ran
+'
+
+test_expect_success 'detect ambiguous refs early' '
+   git branch foo &&
+   git tag foo &&
+   echo no >rp-ran &&
+   echo no >expect &&
+   test_must_fail git push origin foo &&
+   test_cmp expect rp-ran
+'
+
+test_done
diff --git a/transport.c b/transport.c
index ca7bb44..325f03e 100644
--- a/transport.c
+++ b/transport.c
@@ -1132,8 +1132,7 @@ int transport_push(struct transport *transport,
 
return transport->push(transport, refspec_nr, refspec, flags);
} else if (transport->push_refs) {
-   struct ref *remote_refs =
-   transport->get_refs_list(transport, 1);
+   struct ref *remote_refs;
struct ref *local_refs = get_local_heads();
int match_flags = MATCH_REFS_NONE;
int verbose = (transport->verbose > 0);
@@ -1142,6 +1141,11 @@ int transport_push(struct transport *transport,
int pretend = flags  TRANSPORT_PUSH_DRY_RUN;
int push_ret, ret, err;
 
+   if (check_push_refs(local_refs, refspec_nr, refspec) < 0)
+   return -1;
+
+   remote_refs = transport->get_refs_list(transport, 1);
+
if (flags & TRANSPORT_PUSH_ALL)
match_flags

Re: Negation in refspecs

2014-03-05 Thread Jeff King
On Wed, Mar 05, 2014 at 10:06:26AM -0800, Mickey Killianey wrote:

 Is there any syntax to support partial negations of refspecs, such as:
 
 +refs/heads/*:refs/remotes/origin/*
 !refs/heads/dont-pull:
 !:refs/remotes/origin/dont-push
 
 If not now, is negation something that might be possible/reasonable in
 a future version of Git, or is it difficult/unlikely to change?

No, it doesn't exist now. It's something that people have asked for
occasionally, but I don't think anybody is actively working on it.

I posted a rough patch here:

  http://thread.gmane.org/gmane.comp.version-control.git/240997/focus=241057

about a month ago, but I do not have any immediate plans to take it
further (that email details some of the issues). If you're interested in
working on it, I think people are receptive to the idea; it just needs a
clean implementation.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] disable grafts during fetch/push/bundle

2014-03-05 Thread Jeff King
On Wed, Mar 05, 2014 at 11:18:17AM -0800, Junio C Hamano wrote:

 Jeff King p...@peff.net writes:
 
  On Wed, Mar 05, 2014 at 10:49:24AM -0800, Junio C Hamano wrote:
 
  ... the plan, at least in my mind, has always been exactly that: grafts
  were a nice little attempt but is broken---if you really wanted to
  muck with the history without rewriting (which is still discouraged,
  by the way), do not use graft, but use replace.
 
  I certainly had in the back of my mind that grafts were a lesser form of
  replace, and that eventually we could get rid of the former. Perhaps
  my question should have been: why haven't we deprecated grafts yet?.
 
 Given that we discourage grafts strongly and replace less so
 (but still discourage it), telling the users that biting the bullet
 and rewriting the history is _the_ permanent solution, I think it is
 understandable why nobody has bothered to.

Perhaps the patch below would help discourage grafts more?

The notable place in the documentation where grafts are still used is
git-filter-branch.txt.  But since the example there is about cementing
rewritten history, it might be OK to leave.

I used "outdated" below. We could also up the ante to "deprecated".

-- >8 --
Subject: [PATCH] docs: mark info/grafts as outdated

We should be encouraging people to use git-replace instead.

Signed-off-by: Jeff King p...@peff.net
---
 Documentation/gitrepository-layout.txt | 4 
 Documentation/glossary-content.txt | 4 
 2 files changed, 8 insertions(+)

diff --git a/Documentation/gitrepository-layout.txt 
b/Documentation/gitrepository-layout.txt
index aa03882..17d2ea6 100644
--- a/Documentation/gitrepository-layout.txt
+++ b/Documentation/gitrepository-layout.txt
@@ -176,6 +176,10 @@ info/grafts::
per line describes a commit and its fake parents by
listing their 40-byte hexadecimal object names separated
by a space and terminated by a newline.
++
+Note that the grafts mechanism is outdated and can lead to problems
+transferring objects between repositories; see linkgit:git-replace[1]
+for a more flexible and robust system to do the same thing.
 
 info/exclude::
This file, by convention among Porcelains, stores the
diff --git a/Documentation/glossary-content.txt 
b/Documentation/glossary-content.txt
index 378306f..be0858c 100644
--- a/Documentation/glossary-content.txt
+++ b/Documentation/glossary-content.txt
@@ -176,6 +176,10 @@ current branch integrates with) obviously do not work, as 
there is no
you can make Git pretend the set of <<def_parent,parents>> a
<<def_commit,commit>> has
is different from what was recorded when the commit was
created. Configured via the `.git/info/grafts` file.
++
+Note that the grafts mechanism is outdated and can lead to problems
+transferring objects between repositories; see linkgit:git-replace[1]
+for a more flexible and robust system to do the same thing.
 
 [[def_hash]]hash::
In Git's context, synonym for <<def_object_name,object name>>.
-- 
1.8.5.2.500.g8060133

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/3] push: detect local refspec errors early

2014-03-05 Thread Jeff King
On Wed, Mar 05, 2014 at 12:51:06PM -0800, Junio C Hamano wrote:

 Jeff King p...@peff.net writes:
 
  We can't fully process the refspecs until we have talked to the other
  side, because they may involve matching refs from the remote; I don't
  think git even really looks at them until after we've connected.
 
  But I think there are some obvious cases, like a bogus left-hand side
  (i.e., what you have here) that cannot ever succeed, no matter what the
  other side has. We could sanity check the refspecs before doing anything
  else.
 
 The user's wallclock time is more important than machine cycles,
 checking things we could check before having the user do things is a
 good principle to follow.
 
 I wish that the solution did not have to involve doing the same
 computation twice, but I do not think there is a clean way around
 that in this codepath.

Yeah, there are two inefficiencies here:

  1. We parse the refspecs twice. In theory we could parse them once,
 feed the result to check_push_refspecs, then again to
 match_push_refspecs. That wouldn't be too hard a refactor.

  2. We match the src side to local refs twice. Getting rid of this
 would involve splitting match_push_refs into two (analyzing the
     src half and the dst half), somehow storing the intermediate state
     between the two calls, and only contacting the remote after the first step
 is done. This is probably trickier.

I'd be happy if somebody wanted to do those cleanups on top, but I don't
personally have an interest in spending time on them. The amount of
duplicated computation we're talking about here is not very much (and
the number of refspecs tends to be small, anyway).

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Unable to shrink repository size

2014-03-06 Thread Jeff King
On Wed, Mar 05, 2014 at 08:55:30PM -0600, Robert Dailey wrote:

 What I'd like to do is somehow hunt down the largest commit (*not*
 blob) in the entire history of the repository to hopefully find out
 where huge directories have been checked in.
 
 I can't do a search for largest file (which most google results seem
 to show to do) since the culprit is really thousands of unnecessary
 files checked into a single subdirectory somewhere in history.

Other people have offered scripts to look at commit sizes. But it might
also be useful to see sizes by subdirectory. Sort of a du across all
of history. Script is below.

Note that this script also uses cat-file's %(objectsize:disk). So it
is finding the actual on-disk storage, taking into account delta
storage. You will need git v1.8.5 or later to use this feature.

  git rev-list --objects --all |
  git cat-file --batch-check='%(objectsize:disk) %(rest)' |
  perl -lne '
    my ($size, $path) = split / /, $_, 2;
    next unless defined $path; # commit obj
    do {
      $sizes{$path} += $size;
    } while ($path =~ s{/[^/]+$}{});

    END { print "$sizes{$_} $_" for (keys %sizes) }
  ' |
  sort -rn

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] disable grafts during fetch/push/bundle

2014-03-06 Thread Jeff King
On Thu, Mar 06, 2014 at 09:42:46AM +0100, Michael Haggerty wrote:

 Replace objects are better than grafts in *almost* every dimension.  The
 exception is that it is dead simple to create grafts, whereas I always
 have to break open the man pages to remember how to create a replace
 object that does the same thing.
 
 So I think a helpful step towards deprecating grafts would be to offer a
 couple of convenience features to help people kick the grafts habit:

I agree that better tool support would make git replace more pleasant
to use.

 * A tool that converts grafts (i.e., the grafts read from
 $GIT_DIR/info/grafts) into the equivalent replacements.

I don't know if this is strictly necessary, if we make your command
below pleasant to use. I.e., it should just be:

  while read sha1 parents; do
git replace --graft $sha1 $parents
  done <.git/info/grafts

We can wrap that in git replace --convert-grafts, but I do not think
grafts are so common that there would be a big demand for it.

 * A tool that creates a new replacement object that is the equivalent of
 a graft.  I.e., it should do, using replace references, the equivalent
 of the following command:
 
   echo SHA1 [PARENT1...] >>$GIT_DIR/info/grafts
 
 These features could be added to git replace or could be built into a
 new git grafts command.

I think it would be nice to have a set of mode options for
git-replace to do basic editing of a sha1 and install the result
(technically you could split the editing into a separate command, but I
do not see the point in editing a sha1 and then _not_ replacing it).

Perhaps:

  # pretty-print sha1 based on type, start $EDITOR, create a
  # type-appropriate object from the result (e.g., using hash-object,
  # mktree, or mktag), and then set up the object as a replacement for
  # SHA1
  git replace --edit SHA1

  # ditto, but replace the $EDITOR step with the parent list
  git replace --graft SHA1 PARENT1 PARENT2

  # ...or remove entries from a tree
  git replace --remove-entry SHA1 foo bar

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] disable grafts during fetch/push/bundle

2014-03-06 Thread Jeff King
On Thu, Mar 06, 2014 at 05:41:27PM +0100, Michael Haggerty wrote:

  We can wrap that in git replace --convert-grafts, but I do not think
  grafts are so common that there would be a big demand for it.
 
 It's probably easier to wrap it than to explain to Windows users what
 they have to do.

How would Windows users get a graft file in the first-place? There's no
GUI for it! ;)

It should be easy to do --convert-grafts, though, and I think it fits
into the scheme we're discussing below.

  I think it would be nice to have a set of mode options for
  git-replace to do basic editing of a sha1 and install the result
  (technically you could split the editing into a separate command, but I
  do not see the point in editing a sha1 and then _not_ replacing it).
 
 If modifying without replacing is needed, it would be pretty easy to add
 an option --stdout that writes the SHA1 of the modified object to stdout
 instead of creating a replace reference.  That way what you want 95% of
 the time is the default but there is still an escape hatch.

Agreed. I had originally though that perhaps something like this should
be part of hash-object, and that replace should farm out the work.
But thinking on it more, it doesn't really make sense as part of
hash-object.

  Perhaps:
  
# pretty-print sha1 based on type, start $EDITOR, create a
# type-appropriate object from the result (e.g., using hash-object,
# mktree, or mktag), and then set up the object as a replacement for
# SHA1
git replace --edit SHA1

Here's a rough series that gets us this far:

  [1/4]: replace: refactor command-mode determination
  [2/4]: replace: use OPT_CMDMODE to handle modes
  [3/4]: replace: factor object resolution out of replace_object
  [4/4]: replace: add --edit option

It shouldn't be too hard to do --graft or --convert-grafts on top.

I also noticed that doing:

git replace foo foo

is less than friendly (we notice the cycle, but just barf). It's
especially easy to do with git replace --edit, if you just exit the
editor without making changes.  Or if you make changes to an
already-replaced object to revert it back, in which case we would want
to notice and delete the replacement.

So I think we want to have git replace foo foo silently converted into
git replace -d foo (but without an error if there is no existing
replacement), and then --edit will just do the right thing, as it's
built on top.

I also noticed that the diff engine does not play well with replacements
of blobs. When we are diffing the trees, we see that the sha1 for path
foo is the same on either side, and do not look further, even though
feeding those sha1s to builtin_diff would fetch the replacements.  I
think compare_tree_entry would have to learn lookup_replace_object (and
I suspect it would make tree diffs noticeably slower when you have even
one replace ref).

 git replace could support some of the options that git filter-branch
 can take, like --env-filter, --msg-filter, etc. (at least if the target
 is a commit object).
 
 All of this would make it possible to build up the changes that you want
 to integrate via filter-branch piecemeal instead of having to have a
 single monster filter-branch invocation.  For example,

Right. I was tempted to suggest that, too, but I think it can get rather
tricky, as you need to replace in a loop, and sometimes the exact
objects you need aren't obvious.  For example, a common use of
--index-filter is to remove a single file. But to remove
foo/bar/baz, you would need to loop over each commit, find the tree
for foo/bar, and then remove the baz entry in 

Still, I really like the workflow of having decent replace tools,
followed by cementing the changes into place with a filter-branch
run (which, btw, does not yet know how to cement trees and blobs into
place). It lets you work on the filtering incrementally, and even share
or work collaboratively on it (by pushing refs/replace).

And as you mention, it could be a heck of a lot faster than what we have
now.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC/PATCH 1/4] replace: refactor command-mode determination

2014-03-06 Thread Jeff King
The git-replace command has three modes: listing, deleting,
and replacing. The first two are selected explicitly. If
none is selected, we fallback to listing when there are no
arguments, and replacing otherwise.

Let's figure out up front which operation we are going to
do, before getting into the application logic. That lets us
simplify our option checks (e.g., we currently have to check
whether a useless --force is given both along with an
explicit list, as well as with an implicit one).

This saves some lines, makes the logic easier to follow, and
will facilitate further cleanups.

Signed-off-by: Jeff King p...@peff.net
---
 builtin/replace.c | 14 ++
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/builtin/replace.c b/builtin/replace.c
index 2336325..6a0e8bd 100644
--- a/builtin/replace.c
+++ b/builtin/replace.c
@@ -182,12 +182,16 @@ int cmd_replace(int argc, const char **argv, const char 
*prefix)
 
argc = parse_options(argc, argv, prefix, options, git_replace_usage, 0);
 
+	if (!list && !delete)
+   if (!argc)
+   list = 1;
+
	if (list && delete)
		usage_msg_opt("-l and -d cannot be used together",
			      git_replace_usage, options);
 
-	if (format && delete)
-		usage_msg_opt("--format and -d cannot be used together",
+	if (format && !list)
+		usage_msg_opt("--format cannot be used when not listing",
			      git_replace_usage, options);
 
	if (force && (list || delete))
@@ -207,9 +211,6 @@ int cmd_replace(int argc, const char **argv, const char 
*prefix)
if (argc != 2)
		usage_msg_opt("bad number of arguments",
			      git_replace_usage, options);
-	if (format)
-		usage_msg_opt("--format cannot be used when not listing",
-			      git_replace_usage, options);
return replace_object(argv[0], argv[1], force);
}
 
@@ -217,9 +218,6 @@ int cmd_replace(int argc, const char **argv, const char 
*prefix)
	if (argc > 1)
		usage_msg_opt("only one pattern can be given with -l",
			      git_replace_usage, options);
-	if (force)
-		usage_msg_opt("-f needs some arguments",
- git_replace_usage, options);
 
return list_replace_refs(argv[0], format);
 }
-- 
1.8.5.2.500.g8060133

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC/PATCH 2/4] replace: use OPT_CMDMODE to handle modes

2014-03-06 Thread Jeff King
By using OPT_CMDMODE, the mutual exclusion between modes is
taken care of for us. It also makes it easy for us to
maintain a single variable with the mode, which makes its
intent more clear. We can use a single switch() to make sure
we have covered all of the modes.

This ends up breaking even in code size, but the win will be
much bigger when we start adding more modes.

Signed-off-by: Jeff King p...@peff.net
---
 builtin/replace.c | 49 +
 1 file changed, 25 insertions(+), 24 deletions(-)

diff --git a/builtin/replace.c b/builtin/replace.c
index 6a0e8bd..0b5cb17 100644
--- a/builtin/replace.c
+++ b/builtin/replace.c
@@ -168,11 +168,17 @@ static int replace_object(const char *object_ref, const 
char *replace_ref,
 
 int cmd_replace(int argc, const char **argv, const char *prefix)
 {
-   int list = 0, delete = 0, force = 0;
+   int force = 0;
const char *format = NULL;
+   enum {
+   MODE_UNSPECIFIED = 0,
+   MODE_LIST,
+   MODE_DELETE,
+   MODE_REPLACE
+   } cmdmode = MODE_UNSPECIFIED;
struct option options[] = {
-   OPT_BOOL('l', list, list, N_(list replace refs)),
-   OPT_BOOL('d', delete, delete, N_(delete replace refs)),
+   OPT_CMDMODE('l', list, cmdmode, N_(list replace refs), 
MODE_LIST),
+   OPT_CMDMODE('d', delete, cmdmode, N_(delete replace refs), 
MODE_DELETE),
OPT_BOOL('f', force, force, N_(replace the ref if it 
exists)),
OPT_STRING(0, format, format, N_(format), N_(use this 
format)),
OPT_END()
@@ -182,42 +188,37 @@ int cmd_replace(int argc, const char **argv, const char 
*prefix)
 
argc = parse_options(argc, argv, prefix, options, git_replace_usage, 0);
 
-	if (!list && !delete)
-   if (!argc)
-   list = 1;
+   if (!cmdmode)
+   cmdmode = argc ? MODE_REPLACE : MODE_DELETE;
 
-	if (list && delete)
-		usage_msg_opt("-l and -d cannot be used together",
-			      git_replace_usage, options);
-
-	if (format && !list)
+	if (format && cmdmode != MODE_LIST)
		usage_msg_opt("--format cannot be used when not listing",
			      git_replace_usage, options);
 
-	if (force && (list || delete))
-		usage_msg_opt("-f cannot be used with -d or -l",
+	if (force && cmdmode != MODE_REPLACE)
+		usage_msg_opt("-f only makes sense when writing a replacement",
			      git_replace_usage, options);
 
-   /* Delete refs */
-   if (delete) {
+   switch (cmdmode) {
+   case MODE_DELETE:
		if (argc < 1)
usage_msg_opt(-d needs at least one argument,
  git_replace_usage, options);
return for_each_replace_name(argv, delete_replace_ref);
-   }
 
-   /* Replace object */
-	if (!list && argc) {
+   case MODE_REPLACE:
if (argc != 2)
usage_msg_opt(bad number of arguments,
  git_replace_usage, options);
return replace_object(argv[0], argv[1], force);
-   }
 
-   /* List refs, even if list is not set */
-	if (argc > 1)
-		usage_msg_opt("only one pattern can be given with -l",
-			      git_replace_usage, options);
+
+	case MODE_LIST:
+		if (argc > 1)
+			usage_msg_opt("only one pattern can be given with -l",
+   return list_replace_refs(argv[0], format);
 
-   return list_replace_refs(argv[0], format);
+   default:
+		die("BUG: invalid cmdmode %d", (int)cmdmode);
+   }
 }
-- 
1.8.5.2.500.g8060133

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC/PATCH 3/4] replace: factor object resolution out of replace_object

2014-03-06 Thread Jeff King
As we add new options that operate on objects before
replacing them, we'll want to be able to feed raw sha1s
straight into replace_object. Split replace_object into the
object-resolution part and the actual replacement.

Signed-off-by: Jeff King p...@peff.net
---
 builtin/replace.c | 26 ++
 1 file changed, 18 insertions(+), 8 deletions(-)

diff --git a/builtin/replace.c b/builtin/replace.c
index 0b5cb17..a090302 100644
--- a/builtin/replace.c
+++ b/builtin/replace.c
@@ -123,19 +123,17 @@ static int delete_replace_ref(const char *name, const 
char *ref,
return 0;
 }
 
-static int replace_object(const char *object_ref, const char *replace_ref,
- int force)
+static int replace_object_sha1(const char *object_ref,
+  unsigned char object[20],
+  const char *replace_ref,
+  unsigned char repl[20],
+  int force)
 {
-   unsigned char object[20], prev[20], repl[20];
+   unsigned char prev[20];
enum object_type obj_type, repl_type;
char ref[PATH_MAX];
struct ref_lock *lock;
 
-	if (get_sha1(object_ref, object))
-		die("Failed to resolve '%s' as a valid ref.", object_ref);
-	if (get_sha1(replace_ref, repl))
-		die("Failed to resolve '%s' as a valid ref.", replace_ref);
-
	if (snprintf(ref, sizeof(ref),
		     "refs/replace/%s",
		     sha1_to_hex(object)) > sizeof(ref) - 1)
@@ -166,6 +164,18 @@ static int replace_object(const char *object_ref, const 
char *replace_ref,
return 0;
 }
 
+static int replace_object(const char *object_ref, const char *replace_ref, int 
force)
+{
+   unsigned char object[20], repl[20];
+
+	if (get_sha1(object_ref, object))
+		die("Failed to resolve '%s' as a valid ref.", object_ref);
+	if (get_sha1(replace_ref, repl))
+		die("Failed to resolve '%s' as a valid ref.", replace_ref);
+
+   return replace_object_sha1(object_ref, object, replace_ref, repl, 
force);
+}
+
 int cmd_replace(int argc, const char **argv, const char *prefix)
 {
int force = 0;
-- 
1.8.5.2.500.g8060133

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC/PATCH 4/4] replace: add --edit option

2014-03-06 Thread Jeff King
This allows you to run:

git replace --edit SHA1

to get dumped in an editor with the contents of the object
for SHA1. The result is then read back in and used as a
replace object for SHA1. The writing/reading is
type-aware, so you get to edit ls-tree output rather than
the binary tree format.

Missing documentation and tests.

Signed-off-by: Jeff King p...@peff.net
---
Besides missing docs and tests, we might find that we want to factor the
code a little differently when we start adding other helpers (like
--graft). I will probably push this forward at some point, but I'm not
planning on working on it for the rest of the day, so if you want to
pick it up as a base in the meantime and try --graft, --env-filter,
or anything else clever on top, please go ahead.

 builtin/replace.c | 110 +-
 1 file changed, 109 insertions(+), 1 deletion(-)

diff --git a/builtin/replace.c b/builtin/replace.c
index a090302..3ed5f75 100644
--- a/builtin/replace.c
+++ b/builtin/replace.c
@@ -12,6 +12,7 @@
 #include builtin.h
 #include refs.h
 #include parse-options.h
+#include run-command.h
 
 static const char * const git_replace_usage[] = {
N_(git replace [-f] object replacement),
@@ -176,6 +177,105 @@ static int replace_object(const char *object_ref, const 
char *replace_ref, int f
return replace_object_sha1(object_ref, object, replace_ref, repl, 
force);
 }
 
+/*
+ * Write the contents of the object named by sha1 to the file filename,
+ * pretty-printed for human editing based on its type.
+ */
+static void export_object(const unsigned char *sha1, const char *filename)
+{
+	const char *argv[] = { "cat-file", "-p", NULL, NULL };
+	struct child_process cmd = { argv };
+	int fd;
+
+	fd = open(filename, O_WRONLY | O_CREAT | O_TRUNC, 0666);
+	if (fd < 0)
+		die_errno("unable to open %s for writing", filename);
+
+   argv[2] = sha1_to_hex(sha1);
+   cmd.git_cmd = 1;
+   cmd.out = fd;
+
+	if (run_command(&cmd))
+		die("cat-file reported failure");
+
+   close(fd);
+}
+
+/*
+ * Read a previously-exported (and possibly edited) object back from 
filename,
+ * interpreting it as type, and writing the result to the object database.
+ * The sha1 of the written object is returned via sha1.
+ */
+static void import_object(unsigned char *sha1, enum object_type type,
+ const char *filename)
+{
+   int fd;
+
+	fd = open(filename, O_RDONLY);
+	if (fd < 0)
+		die_errno("unable to open %s for reading", filename);
+
+	if (type == OBJ_TREE) {
+		const char *argv[] = { "mktree", NULL };
+   struct child_process cmd = { argv };
+   struct strbuf result = STRBUF_INIT;
+
+   cmd.argv = argv;
+   cmd.git_cmd = 1;
+   cmd.in = fd;
+   cmd.out = -1;
+
+		if (start_command(&cmd))
+			die("unable to spawn mktree");
+
+		if (strbuf_read(&result, cmd.out, 41) < 0)
+			die_errno("unable to read from mktree");
+		close(cmd.out);
+
+		if (finish_command(&cmd))
+			die("mktree reported failure");
+		if (get_sha1_hex(result.buf, sha1) < 0)
+			die("mktree did not return an object name");
+   } else {
+   struct stat st;
+   int flags = HASH_FORMAT_CHECK | HASH_WRITE_OBJECT;
+
+		if (fstat(fd, &st) < 0)
+			die_errno("unable to fstat %s", filename);
+		if (index_fd(sha1, fd, &st, type, NULL, flags) < 0)
+			die("unable to write object to database");
+   /* index_fd close()s fd for us */
+   }
+
+   /*
+* No need to close(fd) here; both run-command and index-fd
+* will have done it for us.
+*/
+}
+
+static int edit_and_replace(const char *object_ref, int force)
+{
+   char *tmpfile = git_pathdup(REPLACE_EDITOBJ);
+   enum object_type type;
+   unsigned char old[20], new[20];
+
+	if (get_sha1(object_ref, old) < 0)
+		die("Not a valid object name: '%s'", object_ref);
+
+	type = sha1_object_info(old, NULL);
+	if (type < 0)
+		die("unable to get object type for %s", sha1_to_hex(old));
+
+	export_object(old, tmpfile);
+	if (launch_editor(tmpfile, NULL, NULL) < 0)
+		die("editing object file failed");
+   import_object(new, type, tmpfile);
+
+   free(tmpfile);
+
+	return replace_object_sha1(object_ref, old, "replacement", new, force);
+}
+
 int cmd_replace(int argc, const char **argv, const char *prefix)
 {
int force = 0;
@@ -184,11 +284,13 @@ int cmd_replace(int argc, const char **argv, const char 
*prefix)
MODE_UNSPECIFIED = 0,
MODE_LIST,
MODE_DELETE,
+   MODE_EDIT

Re: [RFC/PATCH 2/4] replace: use OPT_CMDMODE to handle modes

2014-03-06 Thread Jeff King
On Thu, Mar 06, 2014 at 07:35:19PM +0100, Christian Couder wrote:

  +   if (!cmdmode)
  +   cmdmode = argc ? MODE_REPLACE : MODE_DELETE;
 
 Shouldn't it be MODE_LIST instead of MODE_DELETE?

Argh, yes, thank you for catching. My original iteration used chars like
'd' and 'l' (similar to other uses of OPT_CMDMODE). I switched it at the
last minute to an enum, and somehow thinko'd that completely (and
obviously did not run the tests again afterwards).

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] disable grafts during fetch/push/bundle

2014-03-06 Thread Jeff King
On Thu, Mar 06, 2014 at 11:00:08AM -0800, Junio C Hamano wrote:

 Jeff King p...@peff.net writes:
 
  I also noticed that the diff engine does not play well with replacements
  of blobs. When we are diffing the trees, we see that the sha1 for path
  foo is the same on either side, and do not look further, even though
  feeding those sha1s to builtin_diff would fetch the replacements.
 
 Sorry, I do not quite understand.
 
 In git diff A B -- path, if the object name recorded for A:path
 and B:path are the same, but the replacement mechanism maps the
 object name for that blob object to some other blob object, wouldn't
 the result be empty because both sides replace to the same thing
 anyway?

Oh, right, I was being stupid. I did:

  git replace --edit HEAD:some-file

and expected git show to find the diff. But that doesn't make sense.
On top of that, I need to do:

  git replace --edit HEAD^{tree}

to replace the sha1 of the entry in the tree. In which case diff would
find it just fine.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RFC GSoC idea: git configuration caching (needs co-mentor!)

2014-03-06 Thread Jeff King
On Thu, Mar 06, 2014 at 11:24:18AM -0800, Junio C Hamano wrote:

  * Add new API calls that allow the cache to be inquired easily and
efficiently.  Rewrite other functions like `git_config_int()` to be
cache-aware.
 
 Are you sure about the second sentence of this item is what you
 want?
 
 git_config_<type>(name, value) are all about parsing value (string
 or NULL) as <type>, return the parsed value or complain against a
 bad value for name.  They do not care where these name and
 value come from right now, and there is no reason for them to
 start caring about caching.  They will still be the underlying
 helper functions the git_config() callbacks will depend on even
 after the second item in your list happens.

Yeah, I agree we want a _new_ set of helpers for retrieving values in a
non-callback way. We could call those helpers git_config_int (and
rename the existing pure functions), but I'd rather not, as it simply
invites confusion with the existing ones.

 A set of new API calls would look more like this, I would think:
 
   extern int git_get_config_string_multi(const char *, int *, const char ***);

Not important at this stage, but I was hoping we could keep the names of
the new helpers shorter. :)

   const char *git_get_config_string(const char *name)
   {
        const char **values, *result;
        int num_values;
 
        if (git_get_config_string_multi("sample.str", &num_values, &values))
                return NULL;
        result = num_values ? values[num_values - 1] : NULL;
        free(values);
        return result;
   }
 
 that implements the last one wins semantics.  The real thing would
 need to avoid allocation and free overhead.

One of the things that needs to be figured out by the student is the
format of the internal cache. I had actually envisioned a mapping of
keys to values, where values are represented as a full list of strings.
Then your string_multi can just return a pointer to that list, and a
last-one-wins lookup can grab the final value, with no allocation or
ownership complexity. We'd lose the relative order of different config
keys, but those should never be important (only the order of single
keys, but that is reflected in the order of the value list).

Another approach would be to actually represent the syntax tree of the
config file in memory. That would make lookups of individual keys more
expensive, but would enable other manipulation. E.g., if your syntax
tree included nodes for comments and other non-semantic constructs, then
we can use it for a complete rewrite. And git config becomes:

  1. Read the tree.

  2. Perform operations on the tree (add nodes, delete nodes, etc).

  3. Write out the tree.

and things like remove the section header when the last item in the
section is removed become trivial during step 2.

But comparing those approaches is something for the student to figure
out, I think.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Testing for commit reachability through plumbing commands

2014-03-06 Thread Jeff King
On Thu, Mar 06, 2014 at 12:17:34PM -0500, Martin Langhoff wrote:

 I have a shell script that trims old history on a cronjob. This is for
 a repo that is used to track reports that have limited life (like
 logs). Old history is trimmed with grafts pointing to an empty root
 commit.
 
 Right now, info/graft grows unbound. I am looking for a way to trim
 unreachable grafts, I would like to be able to say something like:
 
  git is-reachable <treeish>

How about:

git rev-list --objects --all |
cut -d' ' -f1 |
grep $(git rev-parse <treeish>)

Add --reflog to the rev-list invocation if you want to catch things
referenced by the reflog, too.

If you're looking for a commit, you can drop the --objects, and it
will run much faster.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] t3200-branch: test setting branch as own upstream

2014-03-06 Thread Jeff King
On Wed, Mar 05, 2014 at 04:31:55PM +0900, Brian Gesiak wrote:

 No test asserts that git branch -u refs/heads/my-branch my-branch
 emits a warning. Add a test that does so.
 
 Signed-off-by: Brian Gesiak modoca...@gmail.com

Thanks, this looks good. Two minor points that may or may not be worth
addressing:

 +test_expect_success '--set-upstream-to shows warning if used to set branch 
 as own upstream' '
 + git branch --set-upstream-to refs/heads/my13 my13 2>actual &&
 + cat >expected <<EOF &&
 +warning: Not setting branch my13 as its own upstream.
 +EOF

If you spell the EOF marker as:

    cat >expect <<-\EOF

then:

  1. The shell does not interpolate the contents (it does not matter
 here, but it is a good habit to be in, so we typically do it unless
 there is a need to interpolate).

  2. Using - will strip leading tabs, so the content can be indented
 properly along with the rest of the test.

 + test_i18ncmp expected actual &&
 + test_must_fail git config branch.my13.remote &&
 + test_must_fail git config branch.my13.merge

I think we could tighten these to:

  test_expect_code 1 git config branch.my13.remote

to eliminate a false-positive success on other config errors. It's
highly improbable for it to ever matter, though (and it looks like we
are not so careful in most other places that call git config looking
for a missing entry, either).

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG] Halt during fetch on MacOS

2014-03-06 Thread Jeff King
On Thu, Mar 06, 2014 at 10:24:49AM -0800, Junio C Hamano wrote:

  OK, I've tried using my own build from master, and I still get the same 
  results.
 
  I've done a little more investigation and discovered it always hangs at:
  `atexit(notify_parent);` in `run-command.c:start_command`
  when running:
  trace: run_command: 'git-remote-https' 'aosp'
  'https://android.googlesource.com/platform/external/tinyxml2'
 
  Could this have to do with the atexit implementation?  (eg. limit on
  the number of functions that can be registered, etc)
 
 Thanks.
 
 An interesting theory indeed.  I read that an implementation is
 supposed to take at least ATEXIT_MAX (32) calls to atexit(3); while
 I do think we register functions with atexit(3) from multiple places
 in our code, I doubt we would be making that many.

It seems awfully weird that it would _hang_ in such a case, though. That
sounds more like hitting a mutex that's internal to atexit(), or
something similar.

Conley, can you see if dropping that atexit clears up the problem (you
should be OK without it; git will just fail to notice the child's
exec failure with as much detail).

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC/PATCH] diff: simplify cpp funcname regex

2014-03-06 Thread Jeff King
On Wed, Mar 05, 2014 at 08:58:26AM +0100, Johannes Sixt wrote:

 Here is a patch that I'm carrying around since... a while.
 What do you think?
 
 The pattern I chose also catches variable definition, not just
 functions. That is what I need, but it hurts grep --function-context
 That's the reason I didn't forward the patch, yet.

If by variable definition you mean:

   struct foo bar = {
  -   old
  +   new
   };

I'd think that would be covered by the existing struct|class|enum.
Though I think we'd want to also allow keywords in front of it, like
static. I suspect the original was more meant to find:

   struct foo {
  -old
  +new
   };

 The parts of the pattern have the following flaws:
 
 - The first part matches an identifier followed immediately by a colon and
   arbitrary text and is intended to reject goto labels and C++ access
   specifiers (public, private, protected). But this pattern also rejects
   C++ constructs, which look like this:
 
 MyClass::MyClass()
 MyClass::~MyClass()
 MyClass::Item MyClass::Find(...

Makes sense. I noticed your fix is to look for end-of-line or comments
afterwards.  Would it be simpler to just check for a non-colon, like:

  !^[ \t]*[A-Za-z_][A-Za-z_0-9]*:($|[^:])

 - The second part matches an identifier followed by a list of qualified
   names (i.e. identifiers separated by the C++ scope operator '::')
 [...]

A tried to keep the looks like a function definition bit in mine, and
yours loosens this quite a bit more. I think that may be OK. That is, I
do not think there is any reason for somebody to do:

void foo() {
call_to_bar();
   -old
   +new
}

That is, nobody would put a function _call_ without indentation. If
something has alphanumerics at the left-most column, then it is probably
interesting no matter what.

 - The third part of the pattern finally matches compound definitions. But
   it forgets about unions and namespaces, and also skips single-line
   definitions
 
 struct random_iterator_tag {};
 
   because no semicolon can occur on the line.

I don't see how that is an interesting line. The point is to find a
block that is surrounding the changes, but that is not surrounding
the lines below.

 Notice that all interesting anchor points begin with an identifier or
 keyword. But since there is a large variety of syntactical constructs after
 the first word, the simplest is to require only this word and accept
 everything else. Therefore, this boils down to a line that begins with a
 letter or underscore (optionally preceded by the C++ scope operator '::'
 to accept functions returning a type anchored at the global namespace).
 Replace the second and third part by a single pattern that picks such a
 line.

Yeah, this bit makes sense to me.

Both yours and mine will find the first line here in things like:

   void foo(void);
  -void bar(void);
  +void bar(int arg);

but I think that is OK. There _isn't_ any interesting surrounding
context here. The current code will sometimes come up with an empty
funcline (which is good), but it may just as easily come up with a
totally bogus funcline in a case like:

   void unrelated(void)
   {
   }

   void foo(void);
  -void bar(void);
  +void bar(int arg);

So trying to be very restrictive and say that doesn't look like a
function does not really buy us anything (and it creates tons of false
negatives, as you documented, because C++ syntax has all kinds of crazy
stuff).

_If_ the backwards search learned to terminate (e.g., seeing the ^}
line and saying well, we can't be inside a function), then such
negative lines might be useful for coming up with an empty funcname
rather than the bogus void foo(void);. But we do not do that
currently, and I do not think it is that useful (the funcname above is
arguably just as or more useful than an empty one).

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] show_ident_date: fix always-false conditional

2014-03-07 Thread Jeff King
On Thu, Mar 06, 2014 at 08:35:24PM -0500, Eric Sunshine wrote:

 1dca155fe3fa (log: handle integer overflow in timestamps, 2014-02-24)
 assigns the result of strtol() to an 'int' and then checks it against
 LONG_MIN and LONG_MAX, indicating underflow or overflow, even though
 'int' may not be large enough to represent those values.
 
 On Mac, the compiler complains:
 
 warning: comparison of constant 9223372036854775807 with
   expression of type 'int' is always false
   [-Wtautological-constant-out-of-range-compare]
   if (tz == LONG_MAX || tz == LONG_MIN)
 
 Similarly for the LONG_MIN case. Fix this.

Yeah, this is definitely a potential bug. When I added the overflow
check, I blindly assumed that the existing code was at least using a
sufficiently large type to store the result of strtol, but it's not.

I don't think your fix catches all overflow, though:

 + else if (ident->tz_begin && ident->tz_end) {
 + errno = 0;
 + tz = strtol(ident-tz_begin, NULL, 10);
 + if (errno)

Errno will trigger if we overflowed a long, but then we assign the
result into an int, possibly truncating the result.

 Alternately, the result of strtol() could be assigned temporarily to a
 'long', compared against LONG_MIN and LONG_MAX, and then assigned to the
 'int' tz variable.

That catches overflow from strtol, but we'd then truncate when we pass
it as an int to show_date.

I think we want this instead:

-- >8 --
Subject: show_ident_date: fix tz range check

Commit 1dca155fe3fa (log: handle integer overflow in
timestamps, 2014-02-24) tried to catch integer overflow
coming from strtol() on the timezone field by comparing against
LONG_MIN/LONG_MAX. However, the intermediate tz variable
is an int, which means it can never be LONG_MAX on LP64
systems; we would truncate the output from strtol before the
comparison.

Clang's -Wtautological-constant-out-of-range-compare notices
this and rightly complains.

Let's instead store the result of strtol in a long, and then
compare it against INT_MIN/INT_MAX. This will catch overflow
from strtol, and also overflow when we pass the result as an
int to show_date.

Reported-by: Eric Sunshine sunsh...@sunshineco.com
Signed-off-by: Jeff King p...@peff.net
---
 pretty.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/pretty.c b/pretty.c
index 3b811ed..6e266dd 100644
--- a/pretty.c
+++ b/pretty.c
@@ -397,7 +397,7 @@ static const char *show_ident_date(const struct ident_split 
*ident,
   enum date_mode mode)
 {
unsigned long date = 0;
-   int tz = 0;
+   long tz = 0;
 
	if (ident->date_begin && ident->date_end)
		date = strtoul(ident->date_begin, NULL, 10);
@@ -406,7 +406,7 @@ static const char *show_ident_date(const struct ident_split 
*ident,
	else {
		if (ident->tz_begin && ident->tz_end)
			tz = strtol(ident->tz_begin, NULL, 10);
-		if (tz == LONG_MAX || tz == LONG_MIN)
+		if (tz >= INT_MAX || tz <= INT_MIN)
tz = 0;
}
return show_date(date, tz, mode);
-- 
1.8.5.2.500.g8060133

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC/PATCH 4/4] replace: add --edit option

2014-03-07 Thread Jeff King
On Thu, Mar 06, 2014 at 08:57:58PM -0500, Eric Sunshine wrote:

  +		if (strbuf_read(&result, cmd.out, 41) < 0)
  +			die_errno("unable to read from mktree");
  +		close(cmd.out);
  +
  +		if (finish_command(&cmd))
  +			die("mktree reported failure");
  +		if (get_sha1_hex(result.buf, sha1) < 0)
  +			die("mktree did not return an object name");
 
 strbuf_release(&result);

Thanks for catching. I'll include it in any re-roll.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] disable grafts during fetch/push/bundle

2014-03-07 Thread Jeff King
On Fri, Mar 07, 2014 at 08:08:37AM +0100, Christian Couder wrote:

  Be it graft or replace, I do not think we want to invite people to
  use these mechansims too lightly to locally rewrite their history
  willy-nilly without fixing their mistakes at the object layer with
  commit --amend, rebase, bfg, etc. in the longer term.  So in
  that sense, adding a command to make it easy is not something I am
  enthusiastic about.
 
  On the other hand, if the user does need to use graft or replace
  (perhaps to prepare for casting the fixed history in stone with
  filter-branch), it would be good to help them avoid making mistakes
  while doing so and tool support may be a way to do so.
 
  So, ... I am of two minds.
 
 Maybe if we add a new command (or maybe a script) with a name long and
 cryptic-looking enough like git create-replacement-object it will
 scare casual users from touching it, while power users will be happy
 to benefit from it.

I do not think the features we are talking about are significantly more
dangerous than git replace is in the first place. If we want to make
people aware of the dangers, perhaps git-replace.1 is the right place to
do it.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Trust issues with hooks and config files

2014-03-07 Thread Jeff King
On Thu, Mar 06, 2014 at 10:47:43PM +0100, Julian Brost wrote:

 I've noticed some behavior of git that might lead to some security
 issues if the user is not aware of this.
 
 Assume we have an evil user on a system, let's call him eve. He
 prepares a repository where he allows other user to push changes to.
 If he now adds a post-receive hook, git will happly execute it as
 whatever user pushes to this repository:

Yes, this is a well-known issue. The only safe operation on a repository
for which somebody else controls hooks and config is to fetch from it
(upload-pack on the remote repository does not respect any dangerous
config or hooks).

 Something similiar might happen if eve adds some alias to the config
 file in the repository and grants any other user read access to the
 repository. These aliases will be executed when some other user is
 running any git command in this repository. Even though git does not
 allow defining aliases for existing commands, you might mistype
 something, so adding an alias for lg instead of log might succeed:

Much easier is to define an external diff or textconv tool; then the
victim does not even have to typo.

 I'd suggest taking a similar approach as Mercurial [1], i.e. ignoring
 configuration files and hooks owned by another user unless the owner
 is explicitly trusted

It has been discussed, but nobody has produced patches. I think that
nobody is really interested in doing so because:

  1. It introduces complications into previously-working setups (e.g., a
 daemon running as nobody serving repos owned by a git user
 needs to mark git as trusted).

  2. In most cases, cross-server boundaries provide sufficient
 insulation (e.g., I might not push to an evil person's repo, but rather
 to a shared repo whose hooks and config are controlled by root on
 the remote server).

If you want to work on it, I think it's an interesting area. But any
development would need to think about the transition plan for existing
sites that will be broken.

At the very least, the current trust model could stand to be documented
much better (I do not think the rule of fetching is safe, everything
else is not is mentioned anywhere explicitly).

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC/WIP] Pluggable reference backends

2014-03-10 Thread Jeff King
On Mon, Mar 10, 2014 at 07:30:45AM -0700, Shawn Pearce wrote:

  * Store references in a SQLite database, to get correct transaction
handling.
 
 No to SQLLite in git-core. Using it from JGit requires building
 SQLLite and a JNI wrapper, which makes JGit significantly less
 portable. I know SQLLite is pretty amazing, but implementing
 compatibility with it from JGit will be a big nightmare for us.

That seems like a poor reason not to implement a pluggable feature for
git-core. If we implement it, then a site using only git-core can take
advantage of it. Sites with JGit cannot, and would use a different
pluggable storage mechanism that's supported by both. But if we don't
implement, it hurts people using only git-core, and it does not help
sites using JGit at all.

That's assuming that attention spent on implementing the feature does
not take away from implementing some other parallel scheme that does the
same thing but does not use SQLite. I don't know what that would be
offhand; mapping the ref and reflog into a relational database is pretty
simple, and we get a lot of robustness and efficiency benefits for free.
We could perhaps have some kind of relational backend could use an
ODBC-like abstraction to point to a database. I have no idea if people
would want to ever store refs in a real server-backend RDBMS, but I
suspect Java has native support for such things.

Certainly I think we should aim for compatibility where we can, but if
there's not a compatible way to do something, I don't think the
limitations of one platform should drag other ones down. And that goes
both ways; we had to reimplement disk-compatible EWAH from scratch in C
for git-core to have bitmaps, whereas JGit just got to use a ready-made
library. I don't think that was a bad thing.  People in
mixed-implementation environments couldn't use it, but people with
JGit-only environments were free to take advantage of it.

At any rate, the repository needs to advertise this is the ref storage
mechanism I use in the config. We're going to need to bump
core.repositoryformatversion for such cases (because an old version of
git should not blindly lock and write to a refs/ directory that nobody
else is ever going to look at). And I'd suggest with that bump adding in
something like core.refstorage, so that an implementation can say
foobar ref storage? Never heard of it and barf. Whether it's because
that implementation doesn't support foobar, because it's an old
version that doesn't understand foobar yet, or because it was simply
built without foobar support.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] clean: respect pathspecs with -d

2014-03-10 Thread Jeff King
git-clean uses read_directory to fill in a `struct dir` with
potential hits. However, read_directory does not actually
check against our pathspec. It uses a simplified version
that may turn up false positives. As a result, we need to
check that any hits match our pathspec. We do so reliably
for non-directories. For directories, if -d is not given
we check that the pathspec matched exactly (i.e., we are
even stricter, and require an explicit git clean foo to
clean foo/). But if -d is given, rather than relaxing
the exact match to allow a recursive match, we do not check
the pathspec at all.

This regression was introduced in 113f10f (Make git-clean a
builtin, 2007-11-11).

Signed-off-by: Jeff King p...@peff.net
---
On Mon, Mar 10, 2014 at 11:31:37AM +0100, Robin Pedersen wrote:

 I accidentially deleted a directory using git clean. I would think
 this is a bug, but I'm not sure. Was using 1.8.1, but upgraded to
 1.9.0 just to see if it was still reproducable, and it was.

Definitely a bug, and it dates back quite a while.  Thanks for a very
clear bug report.

-- >8 --
 builtin/clean.c  | 5 +++--
 t/t7300-clean.sh | 8 
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/builtin/clean.c b/builtin/clean.c
index 114d7bf..31c1488 100644
--- a/builtin/clean.c
+++ b/builtin/clean.c
@@ -947,14 +947,15 @@ int cmd_clean(int argc, const char **argv, const char 
*prefix)
if (pathspec.nr)
			matches = dir_path_match(ent, &pathspec, 0, NULL);
 
+		if (pathspec.nr && !matches)
+			continue;
+
		if (S_ISDIR(st.st_mode)) {
			if (remove_directories || (matches == MATCHED_EXACTLY)) {
				rel = relative_path(ent->name, prefix, &buf);
				string_list_append(&del_list, rel);
			}
		} else {
-			if (pathspec.nr && !matches)
-				continue;
			rel = relative_path(ent->name, prefix, &buf);
			string_list_append(&del_list, rel);
}
diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh
index 710be90..0c602de 100755
--- a/t/t7300-clean.sh
+++ b/t/t7300-clean.sh
@@ -511,4 +511,12 @@ test_expect_success SANITY 'git clean -d with an 
unreadable empty directory' '
! test -d foo
 '
 
+test_expect_success 'git clean -d respects pathspecs' '
+	mkdir foo &&
+	mkdir foobar &&
+	git clean -df foobar &&
+	test_path_is_dir foo &&
+   test_path_is_missing foobar
+'
+
 test_done
-- 
1.9.0.403.g7a2f4b0

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] clean: respect pathspecs with -d

2014-03-10 Thread Jeff King
On Mon, Mar 10, 2014 at 01:20:02PM -0400, Jeff King wrote:

 On Mon, Mar 10, 2014 at 11:31:37AM +0100, Robin Pedersen wrote:
 
  I accidentially deleted a directory using git clean. I would think
  this is a bug, but I'm not sure. Was using 1.8.1, but upgraded to
  1.9.0 just to see if it was still reproducable, and it was.
 
 Definitely a bug, and it dates back quite a while.  Thanks for a very
 clear bug report.
 
  -- >8 --

Whoops, accidentally included a scissors line here that will break
people using git am --scissors to pick up the patch. Here it is
correctly formatted.

-- >8 --
Subject: clean: respect pathspecs with -d

git-clean uses read_directory to fill in a `struct dir` with
potential hits. However, read_directory does not actually
check against our pathspec. It uses a simplified version
that may turn up false positives. As a result, we need to
check that any hits match our pathspec. We do so reliably
for non-directories. For directories, if -d is not given
we check that the pathspec matched exactly (i.e., we are
even stricter, and require an explicit git clean foo to
clean foo/). But if -d is given, rather than relaxing
the exact match to allow a recursive match, we do not check
the pathspec at all.

This regression was introduced in 113f10f (Make git-clean a
builtin, 2007-11-11).

Signed-off-by: Jeff King p...@peff.net
---
 builtin/clean.c  | 5 +++--
 t/t7300-clean.sh | 8 
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/builtin/clean.c b/builtin/clean.c
index 114d7bf..31c1488 100644
--- a/builtin/clean.c
+++ b/builtin/clean.c
@@ -947,14 +947,15 @@ int cmd_clean(int argc, const char **argv, const char 
*prefix)
if (pathspec.nr)
			matches = dir_path_match(ent, &pathspec, 0, NULL);
 
+		if (pathspec.nr && !matches)
+			continue;
+
		if (S_ISDIR(st.st_mode)) {
			if (remove_directories || (matches == MATCHED_EXACTLY)) {
				rel = relative_path(ent->name, prefix, &buf);
				string_list_append(&del_list, rel);
			}
		} else {
-			if (pathspec.nr && !matches)
-				continue;
			rel = relative_path(ent->name, prefix, &buf);
			string_list_append(&del_list, rel);
}
diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh
index 710be90..0c602de 100755
--- a/t/t7300-clean.sh
+++ b/t/t7300-clean.sh
@@ -511,4 +511,12 @@ test_expect_success SANITY 'git clean -d with an 
unreadable empty directory' '
! test -d foo
 '
 
+test_expect_success 'git clean -d respects pathspecs' '
+	mkdir foo &&
+	mkdir foobar &&
+	git clean -df foobar &&
+	test_path_is_dir foo &&
+   test_path_is_missing foobar
+'
+
 test_done
-- 
1.9.0.403.g7a2f4b0

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] clean: simplify dir/not-dir logic

2014-03-10 Thread Jeff King
On Mon, Mar 10, 2014 at 01:20:02PM -0400, Jeff King wrote:

 git-clean uses read_directory to fill in a `struct dir` with
 potential hits. However, read_directory does not actually
 check against our pathspec. It uses a simplified version
 that may turn up false positives. As a result, we need to
 check that any hits match our pathspec. We do so reliably
 for non-directories. For directories, if -d is not given
 we check that the pathspec matched exactly (i.e., we are
 even stricter, and require an explicit git clean foo to
 clean foo/). But if -d is given, rather than relaxing
 the exact match to allow a recursive match, we do not check
 the pathspec at all.
 
 This regression was introduced in 113f10f (Make git-clean a
 builtin, 2007-11-11).

The code has been cleaned up quite a bit from that original version, and
it was pretty easy to see the discrepancy between the two code paths.
However, if the code were structured like the cleanup patch below, I
think it would have been even easier.

This comes on top of my other patch. So the bug is already fixed, but I
think the end result is more readable.

-- >8 --
When we get a list of paths from read_directory, we further
prune it to create the final list of items to remove. The
code paths for directories and non-directories repeat the
same add to list code.

This patch restructures the code so that we don't repeat
ourselves. Also, by following a if (condition) continue
pattern like the pathspec check above, it makes it more
obvious that the conditional is about excluding directories
under certain circumstances.

Signed-off-by: Jeff King p...@peff.net
---
 builtin/clean.c | 15 ++-
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/builtin/clean.c b/builtin/clean.c
index 31c1488..cf76b1f 100644
--- a/builtin/clean.c
+++ b/builtin/clean.c
@@ -950,15 +950,12 @@ int cmd_clean(int argc, const char **argv, const char 
*prefix)
if (pathspec.nr && !matches)
continue;
 
-   if (S_ISDIR(st.st_mode)) {
-   if (remove_directories || (matches == MATCHED_EXACTLY)) {
-   rel = relative_path(ent->name, prefix, &buf);
-   string_list_append(&del_list, rel);
-   }
-   } else {
-   rel = relative_path(ent->name, prefix, &buf);
-   string_list_append(&del_list, rel);
-   }
+   if (S_ISDIR(st.st_mode) && !remove_directories &&
+   matches != MATCHED_EXACTLY)
+   continue;
+
+   rel = relative_path(ent->name, prefix, &buf);
+   string_list_append(&del_list, rel);
}
 
if (interactive && del_list.nr > 0)
-- 
1.9.0.403.g7a2f4b0



Re: [RFC/WIP] Pluggable reference backends

2014-03-10 Thread Jeff King
On Mon, Mar 10, 2014 at 10:46:01AM -0700, Junio C Hamano wrote:

  No to SQLLite in git-core. Using it from JGit requires building
  SQLLite and a JNI wrapper, which makes JGit significantly less
  portable. I know SQLLite is pretty amazing, but implementing
  compatibility with it from JGit will be a big nightmare for us.
 
  That seems like a poor reason not to implement a pluggable feature for
  git-core. If we implement it, then a site using only git-core can take
  advantage of it. Sites with JGit cannot, and would use a different
  pluggable storage mechanism that's supported by both. But if we don't
  implement, it hurts people using only git-core, and it does not help
  sites using JGit at all.
 
 We would need to eventually have at least one backend that we know
 will play well with different Git implementations that matter
 (namely, git-core, Jgit and libgit2) before the feature can be
 widely adopted.

I assumed that the current refs/ and logs/ code, massaged into pluggable
backend form, would be the first such. And I wouldn't be surprised to
see some iteration on that once it is easier to move from scheme to
scheme (e.g., to use some encoding of the names on the filesystem to
avoid D/F conflicts, and thus allow reflogs for deleted refs).
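
To put "pluggable backend form" in slightly more concrete terms, the
shape would presumably be something like the sketch below; the struct
and function names are invented for illustration and are not taken
from Michael's series:

	/*
	 * Illustrative sketch only: a pluggable backend is essentially a
	 * table of operations that the generic refs code dispatches through.
	 */
	struct ref_backend {
		int (*read_ref)(const char *refname, unsigned char *sha1);
		int (*update_ref)(const char *refname,
				  const unsigned char *old_sha1,
				  const unsigned char *new_sha1,
				  const char *logmsg);
		int (*delete_ref)(const char *refname,
				  const unsigned char *old_sha1);
		int (*for_each_ref)(const char *prefix,
				    int (*fn)(const char *refname,
					      const unsigned char *sha1,
					      void *cb_data),
				    void *cb_data);
	};

	/* the existing loose/packed code would provide the default table */
	extern const struct ref_backend refs_files_backend;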

 The first backend that is used while the plugging-interface is in
 development can be anything and does not have to be one that
 eventual ubiquitous one, however; as long as it is something that we
 do not mind carrying it forever, along with that final reference
 backend.  I take the objection from Shawn only as against making the
 sqlite that final one.

Sure, I'd agree with that. I'd think something like an sqlite interface
would be mainly of interest to people running busy servers. I don't know
that it would make a good default.

-Peff


Re: [RFC/WIP] Pluggable reference backends

2014-03-10 Thread Jeff King
On Mon, Mar 10, 2014 at 05:14:02PM +0100, David Kastrup wrote:

 [storing refs in sqlite]

 Of course, the basic premise for this feature is let's assume that our
 file and/or operating system suck at providing file system functionality
 at file name granularity.  There have been two historically approaches
 to that problem that are not independent: a) use Linux b) kick Linus.

You didn't define suck here, but there are a number of issues with the
current ref storage system. Here is a sampling:

  1. The filesystem does not present an atomic view of the data (e.g.,
 you read a, then while you are reading b, somebody else updates
 a; your view is one that never existed at any point in time).

  2. Using the filesystem creates D/F conflicts between branches foo
 and foo/bar. Because this name is a primary key even for the
 reflogs, we cannot easily persist reflogs after the ref is removed.

  3. We use packed-refs in conjunction with loose ones to achieve
 reasonable performance when there are a large number of refs. The
 scheme for determining the current value of a ref is complicated
 and error-prone (we had several race conditions that caused real
 data loss).

Those things can be solved through better support from the filesystem.
But they were also solved decades ago by relational databases.

I generally avoid databases where possible. They lock your data up in a
binary format that you can't easily touch with standard unix tools. And
they introduce complexity and opportunity for bugs.

But they are also a proven technology for solving exactly the sorts of
problems that some people are having with git. I do not see a reason not
to consider them as an option for a pluggable refs system. But I also do
not see a reason to inflict their costs on people who do not have those
problems. And that is why Michael's email is about _pluggable_ ref
backends, and not let's convert git to sqlite.

I do not even know if sqlite is going to end up as an interesting
option. But it will be nice to be able to experiment with it easily due
to git's ref code becoming more modular.

-Peff


Re: [PATCH 0/2] fix status_printf_ln calls zero-length format warnings

2014-03-10 Thread Jeff King
On Mon, Mar 10, 2014 at 08:27:25PM +0100, Benoit Pierre wrote:

 Those happens with gcc -Wformat-zero-length. Since passing NULL does not
 generate a warning (as __attribute__((printf())) does not imply nonull), 
 modify
 status_printf/status_printf_ln to allow a NULL format and update calls with an
 empty string.

Most of us who compile with -Wall decided a while ago to use
-Wno-format-zero-length, because it really is a silly complaint (it
assumes there are no side effects of the function besides printing the
format string, which is obviously not true in this case).

It would be nice to compile out of the box with -Wall -Werror, and I
think your solution looks relatively clean. But I am slightly concerned
about the assumption that it is OK to pass a NULL fmt parameter to a
function marked with __attribute__((format)).  It certainly seems to be
the case now, and I do not know of any plans for that to change, but it
seems like a potentially obvious thing for the compiler to check.

I dunno; perhaps it has already been considered and rejected by gcc
folks to allow the exact escape hatch we are using here.
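
For anyone following along, here is a minimal, compilable illustration
of the behavior in question; the function name is made up, only the
attribute matters, and the warning comments describe what gcc reports
when -Wformat-zero-length is in effect (it comes along with -Wall's
-Wformat):

	#include <stdarg.h>
	#include <stdio.h>

	__attribute__((format(printf, 1, 2)))
	static void status_msg(const char *fmt, ...)
	{
		va_list ap;
		if (!fmt)
			return;  /* the "suppress output" case being discussed */
		va_start(ap, fmt);
		vfprintf(stderr, fmt, ap);
		va_end(ap);
	}

	int main(void)
	{
		status_msg("");   /* warns: zero-length format string */
		status_msg(NULL); /* no warning: format does not imply nonnull */
		return 0;
	}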

-Peff


Re: [PATCH 4/7] commit: fix patch hunk editing with commit -p -m

2014-03-10 Thread Jeff King
On Mon, Mar 10, 2014 at 07:49:34PM +0100, Benoit Pierre wrote:

 Don't change git environment: move the GIT_EDITOR=: override to the
 hook command subprocess, like it's already done for GIT_INDEX_FILE.
 
 Signed-off-by: Benoit Pierre benoit.pie...@gmail.com
 ---
  builtin/checkout.c|  8 +++
  builtin/clone.c   |  4 ++--
  builtin/commit.c  | 35 ---
  builtin/gc.c  |  2 +-
  builtin/merge.c   |  6 +++---
  commit.h  |  3 +++
  run-command.c | 44 
 ---
  run-command.h |  6 +-
  t/t7513-commit_-p_-m_hunk_edit.sh |  2 +-
  9 files changed, 79 insertions(+), 31 deletions(-)

This is a lot of change, and in some ways I think it is making things
better overall. However, the simplest fix for this is basically to move
the setting of GIT_EDITOR down to after we prepare the index.

Jun Hao (cc'd) has been preparing a series for this based on the
Bloomberg git hackday a few weeks ago, but it hasn't been sent to the
list yet.

Commits are here:

  https://github.com/bloomberg/git/compare/commit-patch-allow-hunk-editing

if you care to look. I'm not sure which solution is technically
superior, but it's worth considering both.

I regret not encouraging Jun to post to the list sooner, as we might
have avoided some duplicated effort. But that's a sunk cost, and we
should pick up whichever is the best for the project.

-Peff


Re: [PATCH] clean: respect pathspecs with -d

2014-03-10 Thread Jeff King
On Mon, Mar 10, 2014 at 09:02:35PM +0100, Simon Ruderich wrote:

 On Mon, Mar 10, 2014 at 01:22:15PM -0400, Jeff King wrote:
  +test_expect_success 'git clean -d respects pathspecs' '
  +   mkdir foo &&
  +   mkdir foobar &&
  +   git clean -df foobar &&
  +   test_path_is_dir foo &&
  +   test_path_is_missing foobar
  +'
  +
   test_done
 
 I think we should also test removing foo, which was also in the
 original report, to make sure we don't match prefixes, e.g.:
 
 test_expect_success 'git clean -d respects pathspecs' '
   mkdir foo &&
   mkdir foobar &&
   git clean -df foo &&
   test_path_is_missing foo &&
   test_path_is_dir foobar
 '

Yeah, it probably makes sense to test both ways (though the root cause
and fix are the same). Those mkdirs need to be mkdir -p, though.

Here's an updated patch with your suggestion:

-- >8 --
Subject: clean: respect pathspecs with -d

git-clean uses read_directory to fill in a `struct dir` with
potential hits. However, read_directory does not actually
check against our pathspec. It uses a simplified version
that may turn up false positives. As a result, we need to
check that any hits match our pathspec. We do so reliably
for non-directories. For directories, if -d is not given
we check that the pathspec matched exactly (i.e., we are
even stricter, and require an explicit git clean foo to
clean foo/). But if -d is given, rather than relaxing
the exact match to allow a recursive match, we do not check
the pathspec at all.

This regression was introduced in 113f10f (Make git-clean a
builtin, 2007-11-11).

Signed-off-by: Jeff King p...@peff.net
---
 builtin/clean.c  |  5 +++--
 t/t7300-clean.sh | 16 
 2 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/builtin/clean.c b/builtin/clean.c
index 114d7bf..31c1488 100644
--- a/builtin/clean.c
+++ b/builtin/clean.c
@@ -947,14 +947,15 @@ int cmd_clean(int argc, const char **argv, const char 
*prefix)
if (pathspec.nr)
matches = dir_path_match(ent, &pathspec, 0, NULL);
 
+   if (pathspec.nr && !matches)
+   continue;
+
if (S_ISDIR(st.st_mode)) {
if (remove_directories || (matches == MATCHED_EXACTLY)) {
rel = relative_path(ent->name, prefix, &buf);
string_list_append(&del_list, rel);
}
} else {
-   if (pathspec.nr && !matches)
-   continue;
rel = relative_path(ent->name, prefix, &buf);
string_list_append(&del_list, rel);
}
diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh
index 710be90..74de814 100755
--- a/t/t7300-clean.sh
+++ b/t/t7300-clean.sh
@@ -511,4 +511,20 @@ test_expect_success SANITY 'git clean -d with an 
unreadable empty directory' '
! test -d foo
 '
 
+test_expect_success 'git clean -d respects pathspecs (dir is prefix of 
pathspec)' '
+   mkdir -p foo &&
+   mkdir -p foobar &&
+   git clean -df foobar &&
+   test_path_is_dir foo &&
+   test_path_is_missing foobar
+'
+
+test_expect_success 'git clean -d respects pathspecs (pathspec is prefix of 
dir)' '
+   mkdir -p foo &&
+   mkdir -p foobar &&
+   git clean -df foo &&
+   test_path_is_missing foo &&
+   test_path_is_dir foobar
+'
+
 test_done
-- 
1.9.0.403.g7a2f4b0



Re: [GSoC14][RFC] Proposal Draft: Refactor tempfile handling

2014-03-10 Thread Jeff King
On Sun, Mar 09, 2014 at 02:04:16AM +0900, Brian Gesiak wrote:

  Once the logic is extracted into a nice API, there are
  several other places that can use it, too: ...
 
 I've found the following four areas so far:
 
 1. lockfile.lock_file
 2. git-compat-util.odb_mkstemp
 3. git-compat-util.odb_pack_keep
 4. diff.prepare_temp_file
 
 Tons of files use (1) and (2). (3) is less common, and (4) is only
 used for external diffs.

Yeah, I would expect (1) and (2) to be the most frequent. (3) gets
written on every push and fetch, but only for a short period. (4) is
also used for diff's textconv, though like external diffs, they are
relatively rare.

In my experience, most of the cruft that gets left is from (2), since a
push or fetch will spool to a tmpfile, then verify the results via git
index-pack. Any failure there leaves the file in place.

There are a few other potential candidates we can find by grepping for
mkstemp. Not all of those might want cleanup, but it's a starting point
for investigation.

  the shallow_XX tempfiles
 
 I'm not sure I was able to find this one. Are you referring to the
 lock files used when fetching, such as in fetch-pack.c?

I mean the xmkstemp from setup_temporary_shallow in shallow.c.

 I'd say the biggest difference between lockfiles and object files is
 that tempfile methods like odb_mkstemp need to know the location of
 the object directory. Aside from that, lockfiles and the external diff
 files appear to be cleaned up at exit, while temporary object files
 tend to have a more finely controlled lifecycle. I'm still
 investigating this aspect of the proposal, though.

The diff tempfiles are true tempfiles; they always go away in the end
(though of course we want to clean them up as we finish with them,
rather than doing it all at the end). Lockfiles may get committed into
place (i.e., via atomic rename) or rolled back (deleted).

Object files should generally be hard-linked into place, but there is
some extra magic in move_temp_to_file to fallback to renames.  Some of
that we may be able to get rid of (e.g., we try to avoid doing
cross-directory renames at all these days, so the comment there may be
out of date).

 One question, though: the idea on the ideas page specifies that
 temporary pack and object files may optionally be cleaned up in case
 of error during program execution. How will users specify their
 preference? I think the API for creating temporary files should allow
 cleanup options to be specified on a per-file basis. That way each
 part of the program that creates tempfiles can specify a different
 config value to determine the cleanup policy.

That probably makes sense. I certainly had a config option in mind. I
mentioned above that the most common cruft is leftover packfiles from
pushes and fetches. We haven't deleted those historically because the
same person often controls both the client and the server, and they
would want to possibly do forensics on the packfile sent to the remote,
or even rescue objects out of it. But the remote end may simply have
rejected the pack by some policy, and has no interest in forensics.

Having a config option for each type of file may be cool, but I don't
know how useful it would be in practice. Still, it's certainly worth
thinking about and looking into.

-Peff


Re: GSoC idea: allow git rebase --interactive todo lines to take options

2014-03-10 Thread Jeff King
On Fri, Feb 28, 2014 at 03:03:52PM +0100, Michael Haggerty wrote:

  I'm not sure whether it is a good idea or not. But I think it is looking
  decreasingly like a good GSoC project.
 
 I guess I misread the sentiment of the mailing list, because I merged
 this idea into the list about two hours ago.

Yeesh, sorry to be so slow on the reply to this. It floated to the
bottom of my to respond list.

 But if you think that even the proposal's simpler sub-ideas are
 controversial, then let me know and I will delete the idea from the list
 again.  I don't want a GSoC student to have to fight battles of my own
 creation :-)

I'd say keep it at this point. I think there _are_ some good ideas here,
and part of a project is figuring out what is good. And part of the role
of the mentor is applying some taste. There are probably students who
would be a good fit, and students who would not. That is true for just
about every project, of course, but I think this one is just a little
trickier than some.

-Peff


Re: [PATCH v3 0/8] Hiding refs

2014-03-10 Thread Jeff King
On Sun, Feb 23, 2014 at 09:44:14AM +0700, Duy Nguyen wrote:

 (Digging up an old thread about initial refs listing in git protocol)

And now I am responding to it slowly. :)

  For that to work, the new server needs to wait for the client to
  speak first.  How would that server handle old clients who expect to
  be spoken first?  Wait with a read timeout (no timeout is the right
  timeout for everybody)?
 
 I think the client always speaks first when it asks for a remote
 service. Earlier in this thread you described the new protocol
 upload-pack-2. Why can't it be a new service upload-pack-2 in
 git-daemon?
 
 So new client will try requesting upload-pack-2 service with client
 capability advertisement before ref listing. Old servers do not
 recognize this service and disconnect so the new client falls back to
 the good old upload-pack (one more round trip though, but you could
 configure new client to use old protocol for certain old hosts).
 Similar thing happens for ssh transport. upload-pack service is
 always there for old clients.

Right, I recall the general feeling being that such a system would work,
and the transition would be managed by a config variable like
remote.*.useUploadPack2. Probably with settings like:

  true:
always try, but allow fall back to upload-pack

  false:
never try, always use upload-pack

  auto:
try, but if we fail, set remote.*.uploadPackTimestamp, and do not
try again for N days

The default would start at false, and people who know their server is
very up-to-date can turn it on. And then when many server
implementations support it, flip the default to auto. And either leave
it there forever, or eventually just set it to true and drop auto
entirely as a code cleanup.

In theory we could do more radical protocol changes here, but I think
most people are just interested in adding an opportunity for the client
to speak before the ref advertisement in order to set a few
flags/variables.  That should be relatively simple, and for http we can
probably pass those flags via url parameters without any extra
compatibility/round-trip at all.

I think the main flag of interest is giving an fnmatch pattern to limit
the advertised refs. There could potentially be others, but I do not
know of any offhand.
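
For illustration, the kind of matching meant here is just POSIX
fnmatch over the ref names; this is only a sketch of the idea, not a
proposed implementation, and the helper name is made up:

	#include <fnmatch.h>

	/* advertise only refs matching a client-supplied pattern */
	int ref_is_advertised(const char *refname, const char *pattern)
	{
		return !fnmatch(pattern, refname, 0);
	}

	/* e.g. ref_is_advertised("refs/heads/topic", "refs/heads/*") returns 1 */

Without FNM_PATHNAME the "*" also matches across slashes, which is
probably what a server operator wants when limiting the advertisement
to a whole hierarchy.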

-Peff


Re: [PATCH] mv: prevent mismatched data when ignoring errors.

2014-03-10 Thread Jeff King
On Sat, Mar 08, 2014 at 07:21:39PM +, brian m. carlson wrote:

 We shrink the source and destination arrays, but not the modes or
 submodule_gitfile arrays, resulting in potentially mismatched data.  Shrink
 all the arrays at the same time to prevent this.
 
 Signed-off-by: brian m. carlson sand...@crustytoothpaste.net
 ---
  builtin/mv.c | 5 +
  1 file changed, 5 insertions(+)
 
 diff --git a/builtin/mv.c b/builtin/mv.c
 index f99c91e..b20cd95 100644
 --- a/builtin/mv.c
 +++ b/builtin/mv.c
 @@ -230,6 +230,11 @@ int cmd_mv(int argc, const char **argv, const char 
 *prefix)
   memmove(destination + i,
   destination + i + 1,
   (argc - i) * sizeof(char *));
 + memmove(modes + i, modes + i + 1,
 + (argc - i) * sizeof(char *));
 + memmove(submodule_gitfile + i,
 + submodule_gitfile + i + 1,
 + (argc - i) * sizeof(char *));

I haven't looked that closely, but would it be crazy to suggest that
these arrays all be squashed into one array-of-struct? It would be less
error prone and perhaps more readable.
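
Roughly, the idea would be something like this sketch (field and
function names are only illustrative, not from an actual patch):

	#include <string.h>

	struct mv_entry {
		const char *source;
		const char *destination;
		int mode;                 /* stands in for git's enum update_mode */
		const char *submodule_gitfile;
	};

	/* dropping entry i then shrinks everything in one step */
	static void drop_entry(struct mv_entry *entries, int *nr, int i)
	{
		(*nr)--;
		memmove(entries + i, entries + i + 1,
			(*nr - i) * sizeof(*entries));
	}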

-Peff


Re: [PATCH 4/7] commit: fix patch hunk editing with commit -p -m

2014-03-11 Thread Jeff King
On Tue, Mar 11, 2014 at 06:56:02PM +0100, Benoit Pierre wrote:

 According to the original commit, the change to GIT_EDITOR is only
 here for hooks:
 
 commit 406400ce4f69e79b544dd3539a71b85d99331820
 Author: Paolo Bonzini bonz...@gnu.org
 Date:   Tue Feb 5 11:01:45 2008 +0100
 
 git-commit: set GIT_EDITOR=: if editor will not be launched
 
 This is a preparatory patch that provides a simple way for the future
 prepare-commit-msg hook to discover if the editor will be launched.
 
 Signed-off-by: Junio C Hamano gits...@pobox.com
 
 So there is really no reason to set it earlier, and not just in the
 hook subprocess environment.

Ah, you're right. I was thinking that our invocation of launch_editor
also respected it. It does, but we never get there at all because
use_editor is set to 0. So yeah, it really is just needed for the hook.

Your patch, even though it is a bigger change, keeps the setting to the
minimal area, which is cleaner and more maintainable in the long run.

-Peff


Re: [GSoC14][RFC] Proposal Draft: Refactor tempfile handling

2014-03-11 Thread Jeff King
On Tue, Mar 11, 2014 at 05:27:05PM +0100, Michael Haggerty wrote:

 Thanks for your proposal.  I have a technical point that I think your
 proposal should address:
 
 Currently the linked list of lockfiles only grows, never shrinks.  Once
 an object has been linked into the list, there is no way to remove it
 again even after the lock has been released.  So if a lock needs to be
 created dynamically at a random place in the code, its memory is
 unavoidably leaked.

Thanks, I remember thinking about this when I originally conceived of
the idea, but I forgot to mention it in the idea writeup.

In most cases the potential leaks are finite and small, but object
creation and diff tempfiles could both be unbounded. So this is
definitely something to consider. In both cases we have a bounded number
of _simultaneous_ tempfiles, so one strategy could be to continue using
static objects. But it should not be hard to do it dynamically, and I
suspect the resulting API will be a lot easier to comprehend.
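
Something along these lines, perhaps (a rough sketch with invented
names, not a design): the important property is that an entry can be
taken off the cleanup list again once the file is committed or
deleted, so nothing is leaked.

	struct tempfile {
		struct tempfile *prev, *next; /* doubly linked for O(1) removal */
		char *path;
		int fd;
		unsigned active : 1;
	};

	static struct tempfile *tempfile_list;

	static void deactivate_tempfile(struct tempfile *t)
	{
		if (!t->active)
			return;
		t->active = 0;
		if (t->prev)
			t->prev->next = t->next;
		else
			tempfile_list = t->next;
		if (t->next)
			t->next->prev = t->prev;
	}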

-Peff


Re: [PATCH v3 0/8] Hiding refs

2014-03-11 Thread Jeff King
On Tue, Mar 11, 2014 at 12:32:37PM -0700, Junio C Hamano wrote:

 Jeff King p...@peff.net writes:
 
  I think the main flag of interest is giving an fnmatch pattern to limit
  the advertised refs. There could potentially be others, but I do not
  know of any offhand.
 
 One thing that comes to mind is where symrefs point at, which we
 failed to add the last time around because we ran out of the
 hidden-space behind NUL.

Yeah, good idea. I might be misremembering some complications, but we
can probably do it with:

  1. Teach the client to send an advertise-symrefs flag before the ref
 advertisement.

  2. Teach the server to include symrefs in the ref advertisement; we
 can invent a new syntax because we know the client has asked for
 it.

That does not have to come immediately, though. Done correctly,
upload-pack2 is not about implementing the fnmatch feature, but allowing
arbitrary capability strings from the client before the ref
advertisement starts. So this just becomes an extension that we can add
and advertise during that new phase.

-Peff


Re: [PATCH v3 0/8] Hiding refs

2014-03-11 Thread Jeff King
On Tue, Mar 11, 2014 at 01:25:23PM -0700, Junio C Hamano wrote:

  Yeah, good idea. I might be misremembering some complications, but we
  can probably do it with:
 
1. Teach the client to send an advertise-symrefs flag before the ref
   advertisement.
 
2. Teach the server to include symrefs in the ref advertisement; we
   can invent a new syntax because we know the client has asked for
   it.
 
 I was thinking more about the underlying protocol, not advertisement
 in particular, and I think we came to the same conclusion.
 
 The capability advertisement deserves to have its own separate
 packet message type, when both sides say that they understand it, so
 that we do not have to be limited by the pkt-line length limit.  We
 could do one message per capability, and at the same time can lift
 the traditional "capability hidden after the NUL is purged every
 time, so we need to repeat them if we want to later change it,
 because that is how older clients and servers use that information"
 insanity, for example.

So this may be entering the more radical changes realm I mentioned
earlier.

If the client is limited to setting a few flags, then something like
http can get away with:

  GET 
foo.git/info/refs?service=git-upload-pack&advertise-symrefs&refspec=refs/heads/*

And it does not need to worry about upload-pack2 at all. Either the
server recognizes and acts on them, or it ignores them.

But given that we do not have such a magic out-of-band method for
passing values over ssh and git, maybe it is not worth worrying about.
Http can move to upload-pack2 along with the rest.

One thing that _is_ worth considering for http is how the protocol
starts. We do not want to introduce an extra http round-trip to the
protocol if we can help it. If the initial GET becomes a POST, then it
could pass along the pkt-line of client capabilities with the initial
request, and the server would respond with the ref advertisement as
usual.

-Peff


Re: [PATCH] implement submodule config cache for lookup of submodule names

2014-03-11 Thread Jeff King
On Mon, Mar 10, 2014 at 10:24:12PM +0100, Heiko Voigt wrote:

 I have also moved all functions into the new submodule-config-cache
 module. I am not completely satisfied with the naming since it is quite
 long. If someone comes up with some different naming I am open for it.
 Maybe simply submodule-config (submodule_config prefix for functions)?

Since the cache is totally internal to the submodule-config code, I do
not know that you even need to have a nice submodule-config-cache API.
Can it just be a singleton?

That is bad design in a sense (it becomes harder later if you ever want
to pull submodule config from two sources simultaneously), but it
matches many other subsystems in git which cache behind the scenes.

I also suspect you could call submodule_config simply submodule, and
let it be a stand-in for the submodule (for now, only data from the
config, but potentially you could keep other data on it, too).

So with all that, the entry point into your code is just:

  const struct submodule *submodule_from_path(const char *path);

and the caching magically happens behind-the-scenes.
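
A caller might then look something like this (the struct layout and
field names here are hypothetical, just to show the shape of the API):

	#include <stdio.h>

	struct submodule {
		const char *path;
		const char *name;
		unsigned char gitmodule_sha1[20];
	};

	const struct submodule *submodule_from_path(const char *path);

	static void show_submodule_name(const char *path)
	{
		const struct submodule *sm = submodule_from_path(path);
		if (sm)
			printf("%s is named %s\n", sm->path, sm->name);
	}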

But take all of this with a giant grain of salt, as I am not too
familiar with the needs of the callers.

 +/* one submodule_config_cache entry */
 +struct submodule_config {
 + struct strbuf path;
 + struct strbuf name;
 + unsigned char gitmodule_sha1[20];
 +};

Do these strings need changed after they are written once? If not, you
may want to just make these bare pointers (you can still use strbufs to
create them, and then strbuf_detach at the end). That may just be a matter of
taste, though.
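
For reference, the pattern in question is simply (this uses git's
existing strbuf API; the wrapper function is made up for illustration):

	#include "strbuf.h" /* git's strbuf API */

	/*
	 * Build the string with a strbuf, then keep only a bare,
	 * heap-owned pointer; strbuf_detach() hands ownership of the
	 * buffer to the caller and reinitializes the strbuf.
	 */
	static char *dup_config_value(const char *value)
	{
		struct strbuf buf = STRBUF_INIT;
		strbuf_addstr(&buf, value);
		return strbuf_detach(&buf, NULL);
	}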

-Peff


Re: [PATCH] general style: replaces memcmp() with proper starts_with()

2014-03-12 Thread Jeff King
On Wed, Mar 12, 2014 at 10:43:54AM -0400, Quint Guvernator wrote:

 memcmp() is replaced with negated starts_with() when comparing strings
 from the beginning. starts_with() looks nicer and it saves the extra
 argument of the length of the comparing string.

Thanks, I think this is a real readability improvement in most cases.

One of the things to do when reviewing a patch like this is make sure
that there aren't any silly arithmetic mistakes. Checking that can be
tedious, so I thought about how _I_ would do it programatically, and
then compared our results.

I tried:

  perl -i -lpe '
s/memcmp\(([^,]+), "(.*?)", (\d+)\)/
 length($2) == $3 ?
   qq{!starts_with($1, "$2")} :
   $&
/ge
  ' $(git ls-files '*.c')

That comes up with almost the same results as yours, but misses a few
cases that you caught (in addition to the need to simplify
!!starts_with()):

  - memcmp(foo, "bar", strlen("bar"))
 
  - strings with interpolation, like "foo\n", which is actually 4
characters in length, not 5.

But there were only a few of those, and I hand-verified them. So I feel
pretty good that the changes are all correct transformations.

That leaves the question of whether they are all improvements in
readability (more on that below).

 note: this commit properly handles negation in starts_with()
 
 Signed-off-by: Quint Guvernator quintus.pub...@gmail.com
 ---

This note should go after the --- line so that it is not part of the
commit message (it is only interesting to people seeing v2 and wondering
what changed from v1 earlier on the list, not people reading the history
much later).

 diff --git a/builtin/commit-tree.c b/builtin/commit-tree.c
 index 987a4c3..2777519 100644
 --- a/builtin/commit-tree.c
 +++ b/builtin/commit-tree.c
 @@ -66,7 +66,7 @@ int cmd_commit_tree(int argc, const char **argv, const char 
 *prefix)
   continue;
   }
  
 - if (!memcmp(arg, "-S", 2)) {
 + if (starts_with(arg, "-S")) {
   sign_commit = arg + 2;
   continue;
   }

This is an improvement, but we still have the magic 2 below. Would
skip_prefix be a better match here, like:

  if ((val = skip_prefix(arg, "-S"))) {
  sign_commit = val;
  continue;
  }

We've also talked about refactoring skip_prefix to return a boolean, and
put the value in an out-parameter. Which would make this even more
readable:

  if (skip_prefix(arg, "-S", &sign_commit))
  continue;

Maybe the time has come to do that.

 --- a/builtin/mailinfo.c
 +++ b/builtin/mailinfo.c
 @@ -626,7 +626,7 @@ static int handle_boundary(void)
   strbuf_addch(newline, '\n');
  again:
   if (line.len >= (*content_top)->len + 2 &&
 - !memcmp(line.buf + (*content_top)->len, "--", 2)) {
 + starts_with(line.buf + (*content_top)->len, "--")) {

I'm not sure if this improves readability or not. It's certainly nice to
get rid of the magic 2, but starts_with is a bit of a misnomer, since
we are indexing deep into the buffer anyway. And we still have the magic
2 above anyway.

I'm on the fence.

 @@ -727,8 +727,8 @@ static int is_scissors_line(const struct strbuf *line)
   continue;
   }
   if (i + 1 < len &&
 - (!memcmp(buf + i, ">8", 2) || !memcmp(buf + i, "8<", 2) ||
 -  !memcmp(buf + i, "%<", 2) || !memcmp(buf + i, "<%", 2))) {
 + (starts_with(buf + i, ">8") || starts_with(buf + i, "8<") ||
 +  starts_with(buf + i, "%<") || starts_with(buf + i, "<%"))) {

Same as above.

 @@ -66,7 +66,7 @@ static int verify_tag(char *buffer, unsigned long size)
   return error("char%"PRIuMAX": could not find next \"\\n\"",
   (uintmax_t) (type_line - buffer));
   tag_line++;
 - if (memcmp(tag_line, "tag ", 4) || tag_line[4] == '\n')
 + if (!starts_with(tag_line, "tag ") || tag_line[4] == '\n')
   return error("char%"PRIuMAX": no \"tag \" found",
   (uintmax_t) (tag_line - buffer));

I think this is another that could benefit from an enhanced skip_prefix:

  if (!skip_prefix(tag_line, "tag ", &tag_name) || *tag_name == '\n')
 ...

and then we can get rid of the tag_line += 4 that is used much later
(in fact, I suspect that whole function could be improved in this
respect).

 @@ -269,7 +269,7 @@ int parse_commit_buffer(struct commit *item, const void 
 *buffer, unsigned long s
   return 0;
   item->object.parsed = 1;
   tail += size;
 - if (tail <= bufptr + 46 || memcmp(bufptr, "tree ", 5) || bufptr[45] != '\n')
 + if (tail <= bufptr + 46 || !starts_with(bufptr, "tree ") || bufptr[45] != '\n')
   return error("bogus commit object %s", sha1_to_hex(item->object.sha1));
   if (get_sha1_hex(bufptr + 5, parent) < 0)

Again, we just use bufptr + 5 a bit later here. So a candidate for
skip_prefix.

   graft = lookup_commit_graft(item->object.sha1);
 - while (bufptr + 48  

Re: [PATCH] general style: replaces memcmp() with starts_with()

2014-03-12 Thread Jeff King
On Wed, Mar 12, 2014 at 06:22:24PM +0100, Jens Lehmann wrote:

  Let me know if you still think the hunk should be dropped there.
 
 Yes, I think so. That spot uses memcmp() because ce->name may
 not be 0-terminated. If that assumption isn't correct, it should
 be replaced with a plain strcmp() instead (while also dropping
 the ce_namelen() comparison in the line above). But starts_with()
 points into the wrong direction there.

I think the length-check and memcmp is an optimization[1] here. But we
should be able to encapsulate that pattern and avoid magic numbers
entirely with something like mem_equals(). See my other response for
more details.

-Peff

[1] Getting rid of the magic number entirely means we have to call
strlen(".gitmodules"), which seems like it is working against this
optimization. But I think past experiments have shown that decent
compilers will optimize strlen on a string literal to a constant, so
as long as mem_equals is an inline, it should be equivalent.
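
For concreteness, such a helper could be as small as the sketch below
(mem_equals does not exist in git today; this is just one possible
shape for it): compare a counted buffer against a string literal,
length included, with no magic numbers at the call site.

	#include <string.h>

	static inline int mem_equals(const char *buf, size_t len, const char *str)
	{
		return len == strlen(str) && !memcmp(buf, str, len);
	}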


Re: New GSoC microproject ideas

2014-03-12 Thread Jeff King
On Wed, Mar 12, 2014 at 08:16:53PM +0100, David Kastrup wrote:

 Junio C Hamano gits...@pobox.com writes:
 
  Here is another, as I seem to have managed to kill another one ;-)
 
  -- >8 --
 
  VAR=VAL command is sufficient to run 'command' with environment
  variable VAR set to value VAL without affecting the environment of
  the shell itself, but we cannot do the same with a shell function
  (most notably, test_must_fail);
 
 No? bash:
 
 dak@lola:/usr/local/tmp/lilypond$ zippo()
  {
  echo $XXX
  echo $XXX
  }
 dak@lola:/usr/local/tmp/lilypond$ XXX=8 zippo
 8
 8

Try:

  zippo() {
echo $XXX
  }
  XXX=8 zippo
  zippo

XXX remains set after the first call under dash (but not bash). I
believe ash has the same behavior.

-Peff


Re: What's cooking in git.git (Mar 2014, #02; Tue, 11)

2014-03-12 Thread Jeff King
On Tue, Mar 11, 2014 at 03:12:11PM -0700, Junio C Hamano wrote:

 * jk/warn-on-object-refname-ambiguity (2014-01-09) 6 commits
  - get_sha1: drop object/refname ambiguity flag
  - get_sha1: speed up ambiguous 40-hex test
  - FIXUP: teach DO_FOR_EACH_NO_RECURSE to prime_ref_dir()
  - refs: teach for_each_ref a flag to avoid recursion
  - cat-file: fix a minor memory leak in batch_objects
  - cat-file: refactor error handling of batch_objects
 
  Expecting a reroll.

I finally got a chance to return to this one. Michael had some good
comments on the refactoring that was going on in the middle patches. He
ended with:

  Yes.  Still, the code is really piling up for this one warning for the
  contrived eventuality that somebody wants to pass SHA-1s and branch
  names together in a single cat-file invocation *and* wants to pass
  lots of inputs at once and so is worried about performance *and* has
  reference names that look like SHA-1s.  Otherwise we could just leave
  the warning disabled in this case, as now.  Or we could add a new
  --hashes-only option that tells cat-file to treat all of its
  arguments/inputs as SHA-1s; such an option would permit an even faster
  code path for bulk callers.

Having looked at it again, I really think it is not worth pursuing. The
end goal is not that interesting, there is a lot of code introduced, and
a reasonable chance of accidentally introducing regressions and/or
making the code less maintainable.  Keeping the existing code (which
just disables the check for cat-file) is probably the sanest course of
action. We can do a similar thing for rev-list --stdin if we want, or
we can wait until somebody complains.

The bottom two patches are reasonable cleanups we should keep, though
(and the rest can just be discarded).

-Peff


Re: [PATCH] general style: replaces memcmp() with proper starts_with()

2014-03-12 Thread Jeff King
On Wed, Mar 12, 2014 at 12:39:01PM -0700, Junio C Hamano wrote:

 Jeff King p...@peff.net writes:
 
   static inline int standard_header_field(const char *field, size_t len)
   {
  -  return ((len == 4 && !memcmp(field, "tree ", 5)) ||
  -  (len == 6 && !memcmp(field, "parent ", 7)) ||
  -  (len == 6 && !memcmp(field, "author ", 7)) ||
  -  (len == 9 && !memcmp(field, "committer ", 10)) ||
  -  (len == 8 && !memcmp(field, "encoding ", 9)));
  +  return ((len == 4 && starts_with(field, "tree ")) ||
  +  (len == 6 && starts_with(field, "parent ")) ||
  +  (len == 6 && starts_with(field, "author ")) ||
  +  (len == 9 && starts_with(field, "committer ")) ||
  +  (len == 8 && starts_with(field, "encoding ")));
 
  These extra len checks are interesting.  They look like an attempt to
  optimize lookup, since the caller will already have scanned forward to
  the space.
 
 If one really wants to remove the magic constants from this, then
 one must take advantage of the pattern
 
   len == strlen(S) - 1 && !memcmp(field, S, strlen(S))
 
 that appears here, and come up with a simple abstraction to express
 that we are only using the string S (e.g. tree ), length len and
 location field of the counted string.
 
 Blindly replacing starts_with() with !memcmp() in the above part is
 a readability regression otherwise.

I actually think the right solution is:

  static inline int standard_header_field(const char *field, size_t len)
  {
  return mem_equals(field, len, "tree ") ||
 mem_equals(field, len, "parent ") ||
 ...;
  }

and the caller should tell us it's OK to look at field[len]:

  standard_header_field(line, eof - line + 1)

We could also omit the space from the standard_header_field. The caller
just ran strchr() looking for the space, so we know that either it is
there, or we are at the end of the line/buffer. Arguably a string like
parent\n should be a parent header with no data (but right now it is
not matched by this function). I'm not aware of an implementation that
writes such a thing, but it seems to fall in the be liberal in what you
accept category.

-Peff


Re: [PATCH v2 1/2] fsck.c: Change the type of fsck_ident()'s first argument

2014-03-12 Thread Jeff King
On Thu, Mar 13, 2014 at 02:51:29AM +0800, Yuxuan Shui wrote:

 Since fsck_ident doesn't change the content of **ident, the type of
 ident could be const char **.

Unfortunately, const double-pointers in C are a bit tricky, and a
pointer to char * cannot automatically be passed as a pointer to
const char *. 

I think you want this on top:

diff --git a/fsck.c b/fsck.c
index 1789c34..7776660 100644
--- a/fsck.c
+++ b/fsck.c
@@ -281,7 +281,7 @@ static int fsck_ident(const char **ident, struct object 
*obj, fsck_error error_f
 
 static int fsck_commit(struct commit *commit, fsck_error error_func)
 {
-   char *buffer = commit->buffer;
+   const char *buffer = commit->buffer;
unsigned char tree_sha1[20], sha1[20];
struct commit_graft *graft;
int parents = 0;

Otherwise, gcc will complain about incompatible pointer types.
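
The underlying C rule is easy to trip over, so here is a toy example
unrelated to git's code (takes_const is just a stand-in):

	void takes_const(const char **p) { (void)p; }

	void caller(void)
	{
		char buf[] = "hello";
		char *s = buf;
		const char *cs = buf;

		takes_const(&cs);  /* fine */
		takes_const(&s);   /* gcc: incompatible pointer type, needs a cast */
	}

The restriction exists because, if the implicit conversion were
allowed, the callee could legally store a pointer to const data
through the alias and the caller could then write through it.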

-Peff


Re: New GSoC microproject ideas

2014-03-12 Thread Jeff King
On Wed, Mar 12, 2014 at 09:37:41PM +0100, David Kastrup wrote:

  Try:
 
zippo() {
  echo $XXX
}
XXX=8 zippo
zippo
 
  XXX remains set after the first call under dash (but not bash). I
  believe ash has the same behavior.
 
 Yes.  I would lean towards considering this a bug.  But I agree that it
 does not help.

Dash's behavior is POSIX (and bash --posix behaves the same way).

  http://article.gmane.org/gmane.comp.version-control.git/137095

-Peff


Re: [PATCH] general style: replaces memcmp() with proper starts_with()

2014-03-12 Thread Jeff King
On Wed, Mar 12, 2014 at 01:08:03PM -0700, Junio C Hamano wrote:

 Jeff King p...@peff.net writes:
 
  Blindly replacing starts_with() with !memcmp() in the above part is
  a readability regression otherwise.
 
  I actually think the right solution is:
 
static inline int standard_header_field(const char *field, size_t len)
{
return mem_equals(field, len, "tree ") ||
   mem_equals(field, len, "parent ") ||
   ...;
}
 
  and the caller should tell us it's OK to look at field[len]:
 
standard_header_field(line, eof - line + 1)
 
  We could also omit the space from the standard_header_field.
 
 Yes, that was what I had in mind.  The only reason why the callee
 (over-)optimizes the SP must follow these know keywords part by
 using the extra len parameter is because the caller has to do a
 single strchr() to skip an arbitrary field name anyway so computing
 len is essentially free.

One thing that bugs me about the current code is that the sub-function
looks one past the end of the length given to it by the caller.
Switching it to pass eof - line + 1 resolves that, but is it right?

The character pointed at by eof is either:

  1. space, if our strchr(line, ' ') found something

  2. the first character of the next line, if our
 memchr(line, '\n', eob - line) found something

  3. Whatever is at eob (end-of-buffer)

There are two questionable things here. In (1), we use strchr on a
sized buffer.  And in (3), we look one past the size that was passed in.

In both cases, we are saved by the fact that the buffer is actually NUL
terminated at the end of size (because it comes from read_sha1_file).
But we may find a space much further than the line ending which is
supposed to be our boundary, and end up having to do a comparison to
cover this case.

So I think the current code is correct, but we could make it more
obvious by:

  1. Restricting our search for the field separator to the current line.

  2. Explicitly avoid looking for headers when we did not find a space,
 since we cannot match anything anyway.

Like:

diff --git a/commit.c b/commit.c
index 6bf4fe0..9383cc1 100644
--- a/commit.c
+++ b/commit.c
@@ -1325,14 +1325,14 @@ static struct commit_extra_header 
*read_commit_extra_header_lines(
strbuf_reset(&buf);
it = NULL;
 
-   eof = strchr(line, ' ');
-   if (next <= eof)
+   eof = memchr(line, ' ', next - line);
+   if (eof) {
+   if (standard_header_field(line, eof - line + 1) ||
+   excluded_header_field(line, eof - line, exclude))
+   continue;
+   } else
eof = next;
 
-   if (standard_header_field(line, eof - line) ||
-   excluded_header_field(line, eof - line, exclude))
-   continue;
-
it = xcalloc(1, sizeof(*it));
it->key = xmemdupz(line, eof-line);
*tail = it;

I also think that eof = next (which I retained here) is off-by-one.
next here is not the newline, but the start of the next line. And I'm
guessing the code actually wanted the newline (otherwise it-key ends
up with the newline in it). But we cannot just subtract one, because if
we hit eob, it really is in the right spot.

-Peff


Re: [PATCH] general style: replaces memcmp() with proper starts_with()

2014-03-12 Thread Jeff King
On Wed, Mar 12, 2014 at 05:14:15PM -0400, Jeff King wrote:

 I also think that eof = next (which I retained here) is off-by-one.
 next here is not the newline, but the start of the next line. And I'm
 guessing the code actually wanted the newline (otherwise it-key ends
 up with the newline in it). But we cannot just subtract one, because if
 we hit eob, it really is in the right spot.

I started on a patch for this, but found another interesting corner
case. If we do not find a newline and hit end-of-buffer (which
_shouldn't_ happen, as that is a malformed commit object), we set next
to eob. But then we copy the whole string, including *next into the
value of the header.

So we intend to capture the trailing newline in the value, and do in the
normal case. But in the end-of-buffer case, we capture an extra NUL. I
think that's OK, because the eventual fate of the buffer is to become a
C-string. But it does mean that the result sometimes has a
newline-terminator and sometimes doesn't, and the calling code needs to
handle this when printing, or risk lines running together.

Should this function append a newline if there is not already one? If
it's a mergetag header, we feed the result to gpg, etc, and expect to
get the data out verbatim. We would not want to mess that up. OTOH, the
commit object is bogus (and possibly truncated) in the first place, so
it probably doesn't really matter. And the GPG signature on tag objects
has its own delimiters, so a stray newline present or not at the end
should not matter.

-Peff


Re: [PATCH] general style: replaces memcmp() with proper starts_with()

2014-03-12 Thread Jeff King
On Wed, Mar 12, 2014 at 05:14:15PM -0400, Jeff King wrote:

 One thing that bugs me about the current code is that the sub-function
 looks one past the end of the length given to it by the caller.
 Switching it to pass eof - line + 1 resolves that, but is it right?
 
 The character pointed at by eof is either:
 
   1. space, if our strchr(line, ' ') found something
 
   2. the first character of the next line, if our
  memchr(line, '\n', eob - line) found something
 
   3. Whatever is at eob (end-of-buffer)

This misses a case. We might find no space at all, and eof is NULL. We
never dereference it, so we don't segfault, but it does cause a bogus
giant allocation.

I'm out of time for the day, but here is a test I started on that
demonstrates the failure:

diff --git a/t/t7513-commit-bogus-extra-headers.sh 
b/t/t7513-commit-bogus-extra-headers.sh
index e69de29..834db08 100755
--- a/t/t7513-commit-bogus-extra-headers.sh
+++ b/t/t7513-commit-bogus-extra-headers.sh
@@ -0,0 +1,26 @@
+#!/bin/sh
+
+test_description='check parsing of badly formed commit objects'
+. ./test-lib.sh
+
+EMPTY_TREE=4b825dc642cb6eb9a060e54bf8d69288fbee4904
+
+test_expect_success 'newline right after mergetag header' '
+   test_tick &&
+   cat >commit <<-EOF &&
+   tree $EMPTY_TREE
+   author $GIT_AUTHOR_NAME <$GIT_AUTHOR_EMAIL> $GIT_AUTHOR_DATE
+   committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+   mergetag
+
+   foo
+   EOF
+   commit=$(git hash-object -w -t commit commit) &&
+   cat >expect <<-EOF &&
+   todo...
+   EOF
+   git --no-pager show --show-signature $commit >actual &&
+   test_cmp expect actual
+'
+
+test_done

The git show fails with:

  fatal: Out of memory, malloc failed (tried to allocate 18446744073699764927 
bytes)

So I think the whole function could use some refactoring to handle
corner cases better. I'll try to take a look tomorrow, but please feel
free if somebody else wants to take a crack at it.

-Peff


Re: [PATCH 0/2] Fix possible buffer overflow in remove_subtree()

2014-03-13 Thread Jeff King
On Thu, Mar 13, 2014 at 10:19:06AM +0100, Michael Haggerty wrote:

 These patches are proposed for maint (but also apply cleanly to
 master).
 
 I presume that this is exploitable via Git commands, though I haven't
 verified it explicitly [1].

It's possible to overflow this buffer, like:

git init repo && cd repo &&
blob=$(git hash-object -w /dev/null) &&
big=$(perl -e 'print "a" x 250')
(for i in $(seq 1 17); do mkdir $big && cd $big; done)
printf "100644 blob $blob\t$big\n" >tree &&
tree=$(git mktree <tree) &&
git read-tree $tree &&
git checkout-index -f --all

but I'm not sure if it is easily exploitable for two reasons:

  1. We never actually return from the function with the smashed stack.
 Immediately after overflowing the buffer, we call lstat(), which
 will fail with ENAMETOOLONG (at least on Linux), causing us to call
 into die_errno and exit.

  2. The overflow relies on us trying to move a very deep hierarchy out
 of the way, but I could not convince git to _create_ such a
 hierarchy in the first place. Here I do it using relative paths and
 recursion, but git generally tries to create paths from the top of
 the repo using full pathnames. So it gets ENAMETOOLONG trying to
 create the paths in the first place.

Of course somebody may be more clever than I am, or perhaps some systems
have a PATH_MAX that is not enforced by the kernel. It's still a
suspicious bit of code, and I think your patches are a strict
improvement. Besides fixing this potential problem, I notice that we
currently put 4096 bytes on the stack for a recursive function. Removing
a/a/a... can put up to 8M on the stack, which might be too much on
some systems (besides just being silly and wasteful).

-Peff


Re: [PATCH] general style: replaces memcmp() with proper starts_with()

2014-03-13 Thread Jeff King
On Thu, Mar 13, 2014 at 10:47:28AM -0700, Junio C Hamano wrote:

  --- a/builtin/cat-file.c
  +++ b/builtin/cat-file.c
  @@ -82,7 +82,7 @@ static int cat_one_file(int opt, const char *exp_type, 
  const char *obj_name)
 enum object_type type;
 unsigned long size;
 char *buffer = read_sha1_file(sha1, type, 
  size);
  -  if (memcmp(buffer, "object ", 7) ||
  +  if (!starts_with(buffer, "object ") ||
 
  [...]
 
  The original hunks show that the code knows and relies on magic
  numbers 7 and 8 very clearly and there are rooms for improvement.
 
  Like: what if the file is empty?
 
 Yes.

I think this one is actually OK. The result of read_sha1_file is
NUL-terminated, and we know that starts_with reads byte-by-byte (the
prior memcmp is wrong, but only if you care about accessing bytes past
the end of the NUL).

That whole piece of code seems silly, though, IMHO. It should be using
parse_tag or peel_to_type.

-Peff


Re: No progress from push when using bitmaps

2014-03-13 Thread Jeff King
On Wed, Mar 12, 2014 at 05:21:21PM -0700, Shawn Pearce wrote:

 Today I tried pushing a copy of linux.git from a client that had
 bitmaps into a JGit server. The client stalled for a long time with no
 progress, because it reused the existing pack. No progress appeared
 while it was sending the existing file on the wire:
 
   $ git push git://localhost/linux.git master
   Reusing existing pack: 2938117, done.
   Total 2938117 (delta 0), reused 0 (delta 0)
   remote: Resolving deltas:  66% (1637269/2455727)
 
 This is not the best user experience. :-(

Yeah, I agree that sucks. I hadn't noticed it, as I don't typically have
my client repos bitmapped (and on fetch, the interesting progress is
coming from the local index-pack).

It would definitely be good to have throughput measurements while
writing out the pack. However, I'm not sure we have anything useful to
count. We know the total number of objects we're reusing, but we're not
actually parsing the data; we're just blitting it out as a stream. I
think the progress code will need some refactoring to handle a
throughput-only case.

-Peff


Re: No progress from push when using bitmaps

2014-03-13 Thread Jeff King
On Thu, Mar 13, 2014 at 03:01:09PM -0700, Shawn Pearce wrote:

  It would definitely be good to have throughput measurements while
  writing out the pack. However, I'm not sure we have anything useful to
  count. We know the total number of objects we're reusing, but we're not
  actually parsing the data; we're just blitting it out as a stream. I
  think the progress code will need some refactoring to handle a
  throughput-only case.
 
 Yes. I think JGit suffers from this same bug, and again we never
 noticed it because usually only the servers are bitmapped, not the
 clients.
 
 pack-objects writes a throughput meter when its writing objects.
 Really just the bytes out/second would be enough to let the user know
 the client is working. Unfortunately I think that is still tied to the
 overall progress system having some other counter?

Yes, I'm looking at it right now. The throughput meter is actually
connected to the sha1fd output. So really we just need to call
display_progress periodically as we loop through the data. It's a
one-liner fix.

_But_ it still looks ugly, because, as you mention, it's tied to the
progress meter, which is counting up to N objects. So we basically sit
there at 0, pumping data, and then after the pack is done, we can say
we sent N. :)

There are a few ways around this:

  1. Add a new phase Writing packs which counts from 0 to 1. Even
 though it's more accurate, moving from 0 to 1 really isn't that
 useful (the throughput is, but the 0/1 just looks like noise).

  2. Add a new phase Writing reused objects that counts from 0 bytes
 up to N bytes. This looks stupid, though, because we are repeating
 the current byte count both here and in the throughput.

  3. Use the regular Writing objects progress, but fake the object
 count. We know we are writing M bytes with N objects. Bump the
 counter by 1 for every M/N bytes we write.

The first two require some non-trivial surgery to the progress code. I
am leaning towards the third. Not just because it's easy, but because I
think it actually shows the most intuitive display. Yes, it's fudging
the object numbers, but those are largely meaningless anyway (in fact,
it makes them _better_ because now they're even, instead of getting 95%
done and then hitting some blob that is as big as the rest of the repo
combined).

-Peff


Re: No progress from push when using bitmaps

2014-03-13 Thread Jeff King
On Thu, Mar 13, 2014 at 06:07:54PM -0400, Jeff King wrote:

   3. Use the regular Writing objects progress, but fake the object
  count. We know we are writing M bytes with N objects. Bump the
  counter by 1 for every M/N bytes we write.

Here is that strategy. I think it looks pretty nice, and it seamlessly
handles the case where you have extra objects to send on top of the
reused pack (we just keep the same progress meter counting up).

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 831dd05..f187859 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -709,7 +709,7 @@ static struct object_entry **compute_write_order(void)
 static off_t write_reused_pack(struct sha1file *f)
 {
unsigned char buffer[8192];
-   off_t to_write;
+   off_t to_write, total;
int fd;
 
if (!is_pack_valid(reuse_packfile))
@@ -726,7 +726,7 @@ static off_t write_reused_pack(struct sha1file *f)
if (reuse_packfile_offset < 0)
reuse_packfile_offset = reuse_packfile->pack_size - 20;
 
-   to_write = reuse_packfile_offset - sizeof(struct pack_header);
+   total = to_write = reuse_packfile_offset - sizeof(struct pack_header);
 
while (to_write) {
int read_pack = xread(fd, buffer, sizeof(buffer));
@@ -739,10 +739,23 @@ static off_t write_reused_pack(struct sha1file *f)
 
sha1write(f, buffer, read_pack);
to_write -= read_pack;
+
+   /*
+* We don't know the actual number of objects written,
+* only how many bytes written, how many bytes total, and
+* how many objects total. So we can fake it by pretending all
+* objects we are writing are the same size. This gives us a
+* smooth progress meter, and at the end it matches the true
+* answer.
+*/
+   written = reuse_packfile_objects *
+   (((double)(total - to_write)) / total);
+   display_progress(progress_state, written);
}
 
close(fd);
-   written += reuse_packfile_objects;
+   written = reuse_packfile_objects;
+   display_progress(progress_state, written);
return reuse_packfile_offset - sizeof(struct pack_header);
 }
 


<    1   2   3   4   5   6   7   8   9   10   >