Re: Cygwin + git log = no pager!
On Wed, Feb 26, 2014 at 09:54:39AM -0600, Robert Dailey wrote: That _should_ turn on the pager, but I think it does not due to a bug with setup_pager and aliases. Something like the patch below would make it work (but if you are having to use -p manually, there is something to fix in your cygwin environment, which does not think you are on a terminal). I have tried `git -p log` and `git log` and neither use the pager. I thought Git's behavior was a bug, but it seems to be the intended effect that -p is just cancel --no-pager and not turn on the pager, even if stdout is not a tty. So the patch I sent is not something we would want to apply, but it might help debugging your situation (if my hunch is right that isatty() is returning false, then git -p log would work with it). Or perhaps a simpler way to check that is just: diff --git a/pager.c b/pager.c index 0cc75a8..6d870ac 100644 --- a/pager.c +++ b/pager.c @@ -41,8 +41,10 @@ const char *git_pager(int stdout_is_tty) { const char *pager; - if (!stdout_is_tty) + if (!stdout_is_tty) { + warning(disabling pager, stdout is not a tty); return NULL; + } pager = getenv(GIT_PAGER); if (!pager) { Should I apply the provided patch to the Git for Windows master branch? Also I'm not sure if there is a convenient way to apply a patch via email, so should I copy paste it to a file then apply the file? The usual way is to save the patch to an mbox, then use git am to apply it. Most Unix-y mail clients have mbox support, but I suspect many Windows ones do not. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] upload-pack: allow shallow fetching from a read-only repository
On Thu, Feb 27, 2014 at 02:13:03PM +0700, Nguyễn Thái Ngọc Duy wrote: Before cdab485 (upload-pack: delegate rev walking in shallow fetch to pack-objects - 2013-08-16) upload-pack does not write to the source repository. cdab485 starts to write $GIT_DIR/shallow_XX if it's a shallow fetch, so the source repo must be writable. Fall back to $TMPDIR if $GIT_DIR/shallow_XX cannot be created in this case. Note that in other cases that write $GIT_DIR/shallow_XX and eventually rename it to $GIT_DIR/shallow, there is no fallback to $TMPDIR. That makes sense. @@ -224,7 +224,16 @@ char *setup_temporary_shallow(const struct sha1_array *extra) if (write_shallow_commits(sb, 0, extra)) { struct strbuf path = STRBUF_INIT; strbuf_addstr(path, git_path(shallow_XX)); - fd = xmkstemp(path.buf); + if (read_only) { + fd = mkstemp(path.buf); + if (fd 0) { + char buf[PATH_MAX]; + fd = git_mkstemp(buf, sizeof(buf), shallow_XX); + } + if (fd 0) + die_errno(Unable to create temporary shallow file); + } else + fd = xmkstemp(path.buf); This is not introduced by your patch, but I notice that we do not seem to do anything with the tempfiles when the program dies prematurely. We've started collecting stale shallow_XX files in our server repos. For the writable cases, should we be using the lockfile API to take shallow.lock? It looks like we do take a lock on it, but only after the fetch, and then we have to do a manual check for it having moved (by the way, shouldn't we do that check under the lock? I think setup_alternate_shallow has a race condition). I guess the reason to take the lock at the last minute is that the whole fetch is heavyweight. But if the fetches are going to conflict, perhaps it is better to have lock contention at the beginning, rather than failing only at the end. I don't know very much about the shallow system; does each operation rewrite the shallow file, or is it often left untouched (in which case two simultaneous fetches could coexist most of the time). At any rate, if we used the lockfile API, it handles tempfile cleanup automatically. Otherwise, I think we need something like the patch below (in the interest of simplicity, I just drop all of the explicit unlinks and let them use the atexit handler to clean up; you could also add a function to explicitly unlink the tempfile and clear the handler). diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c index 85bba35..ac1d9b5 100644 --- a/builtin/receive-pack.c +++ b/builtin/receive-pack.c @@ -833,8 +833,6 @@ static void execute_commands(struct command *commands, error(BUG: run 'git fsck' for safety.\n If there are errors, try to remove the reported refs above); - if (alt_shallow_file *alt_shallow_file) - unlink(alt_shallow_file); } } @@ -1087,10 +1085,6 @@ static void update_shallow_info(struct command *commands, cmd-skip_update = 1; } } - if (alt_shallow_file *alt_shallow_file) { - unlink(alt_shallow_file); - alt_shallow_file = NULL; - } free(ref_status); } diff --git a/fetch-pack.c b/fetch-pack.c index 90fdd49..ae8550e 100644 --- a/fetch-pack.c +++ b/fetch-pack.c @@ -947,17 +947,6 @@ static void update_shallow(struct fetch_pack_args *args, if (!si-shallow || !si-shallow-nr) return; - if (alternate_shallow_file) { - /* -* The temporary shallow file is only useful for -* index-pack and unpack-objects because it may -* contain more roots than we want. Delete it. -*/ - if (*alternate_shallow_file) - unlink(alternate_shallow_file); - free((char *)alternate_shallow_file); - } - if (args-cloning) { /* * remote is shallow, but this is a clone, there are diff --git a/shallow.c b/shallow.c index bbc98b5..0f2bb48 100644 --- a/shallow.c +++ b/shallow.c @@ -8,6 +8,7 @@ #include diff.h #include revision.h #include commit-slab.h +#include sigchain.h static int is_shallow = -1; static struct stat shallow_stat; @@ -216,27 +217,49 @@ int write_shallow_commits(struct strbuf *out, int use_pack_protocol, return write_shallow_commits_1(out, use_pack_protocol, extra, 0); } +static struct strbuf shallow_tmpfile = STRBUF_INIT; + +static void remove_shallow_tmpfile(void) +{ + if (shallow_tmpfile.len) { + unlink_or_warn(shallow_tmpfile.buf); + strbuf_reset(shallow_tmpfile); + }
[PATCH] shallow: verify shallow file after taking lock
Before writing the shallow file, we stat() the existing file to make sure it has not been updated since our operation began. However, we do not do so under a lock, so there is a possible race: 1. Process A takes the lock. 2. Process B calls check_shallow_file_for_update and finds no update. 3. Process A commits the lockfile. 4. Process B takes the lock, then overwrite's process A's changes. We can fix this by doing our check while we hold the lock. Signed-off-by: Jeff King p...@peff.net --- On Thu, Feb 27, 2014 at 04:04:26AM -0500, Jeff King wrote: by the way, shouldn't we do that check under the lock? I think setup_alternate_shallow has a race condition). Here is the fix. I didn't actually reproduce the race experimentally, but it seems pretty straightforward. I also notice that check_shallow_file_for_update returns early if !is_shallow. Is that safe? Is it possible for another process to have made us shallow since the program began? In that case, we would have to stat() the file always, then complain if it exists and !is_shallow. shallow.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/shallow.c b/shallow.c index bbc98b5..75da07a 100644 --- a/shallow.c +++ b/shallow.c @@ -246,9 +246,9 @@ void setup_alternate_shallow(struct lock_file *shallow_lock, struct strbuf sb = STRBUF_INIT; int fd; - check_shallow_file_for_update(); fd = hold_lock_file_for_update(shallow_lock, git_path(shallow), LOCK_DIE_ON_ERROR); + check_shallow_file_for_update(); if (write_shallow_commits(sb, 0, extra)) { if (write_in_full(fd, sb.buf, sb.len) != sb.len) die_errno(failed to write to %s, @@ -293,9 +293,9 @@ void prune_shallow(int show_only) strbuf_release(sb); return; } - check_shallow_file_for_update(); fd = hold_lock_file_for_update(shallow_lock, git_path(shallow), LOCK_DIE_ON_ERROR); + check_shallow_file_for_update(); if (write_shallow_commits_1(sb, 0, NULL, SEEN_ONLY)) { if (write_in_full(fd, sb.buf, sb.len) != sb.len) die_errno(failed to write to %s, -- 1.8.5.2.500.g8060133 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] shallow: verify shallow file after taking lock
On Thu, Feb 27, 2014 at 04:10:12AM -0500, Jeff King wrote: I also notice that check_shallow_file_for_update returns early if !is_shallow. Is that safe? Is it possible for another process to have made us shallow since the program began? In that case, we would have to stat() the file always, then complain if it exists and !is_shallow. That patch would look like this: diff --git a/shallow.c b/shallow.c index 75da07a..e05a241 100644 --- a/shallow.c +++ b/shallow.c @@ -139,13 +139,13 @@ void check_shallow_file_for_update(void) { struct stat st; - if (!is_shallow) - return; - else if (is_shallow == -1) + if (is_shallow == -1) die(BUG: shallow must be initialized by now); if (stat(git_path(shallow), st)) die(shallow file was removed during fetch); + else if (!is_shallow) + die(shallow file appeared during fetch); else if (st.st_mtime != shallow_stat.st_mtime #ifdef USE_NSEC || ST_MTIME_NSEC(st) != ST_MTIME_NSEC(shallow_stat) but again, I'm not really clear on whether this is possible. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] shallow: use stat_validity to check for up-to-date file
On Thu, Feb 27, 2014 at 05:18:58PM +0700, Duy Nguyen wrote: On Thu, Feb 27, 2014 at 4:22 PM, Jeff King p...@peff.net wrote: On Thu, Feb 27, 2014 at 04:10:12AM -0500, Jeff King wrote: I also notice that check_shallow_file_for_update returns early if !is_shallow. Is that safe? Is it possible for another process to have made us shallow since the program began? In that case, we would have to stat() the file always, then complain if it exists and !is_shallow. I think it's safer to do it your way. Yeah, I played around a bit and found that using git fetch --depth in a non-shallow repo could run into this case. if (stat(git_path(shallow), st)) die(shallow file was removed during fetch); + else if (!is_shallow) + die(shallow file appeared during fetch); Note that this is wrong; when the file is missing (the first part of the conditional), we need to check is_shallow before dying. Otherwise we erroneously complain when creating the file for the first time. As I was fixing it, though, I recalled that we had to write a similar system for the packed-refs file. Fortunately, it was easy to reuse, and I ended up with the patch below. -- 8 -- Subject: shallow: use stat_validity to check for up-to-date file When we are about to write the shallow file, we check that it has not changed since we last read it. Instead of hand-rolling this, we can use stat_validity. This is built around the index stat-check, so it is more robust than just checking the mtime, as we do now (it uses the same check as we do for index files). The new code also handles the case of a shallow file appearing unexpectedly. With the current code, two simultaneous processes making us shallow (e.g., two git fetch --depth=1 running at the same time in a non-shallow repository) can race to overwrite each other. As a bonus, we also remove a race in determining the stat information of what we read (we stat and then open, leaving a race window; instead we should open and then fstat the descriptor). Signed-off-by: Jeff King p...@peff.net --- shallow.c | 24 +++- 1 file changed, 7 insertions(+), 17 deletions(-) diff --git a/shallow.c b/shallow.c index 75da07a..9668b39 100644 --- a/shallow.c +++ b/shallow.c @@ -10,7 +10,7 @@ #include commit-slab.h static int is_shallow = -1; -static struct stat shallow_stat; +static struct stat_validity shallow_stat; static char *alternate_shallow_file; void set_alternate_shallow_file(const char *path, int override) @@ -52,12 +52,12 @@ int is_repository_shallow(void) * shallow file should be used. We could just open it and it * will likely fail. But let's do an explicit check instead. */ - if (!*path || - stat(path, shallow_stat) || - (fp = fopen(path, r)) == NULL) { + if (!*path || (fp = fopen(path, r)) == NULL) { + stat_validity_clear(shallow_stat); is_shallow = 0; return is_shallow; } + stat_validity_update(shallow_stat, fileno(fp)); is_shallow = 1; while (fgets(buf, sizeof(buf), fp)) { @@ -137,21 +137,11 @@ struct commit_list *get_shallow_commits(struct object_array *heads, int depth, void check_shallow_file_for_update(void) { - struct stat st; - - if (!is_shallow) - return; - else if (is_shallow == -1) + if (is_shallow == -1) die(BUG: shallow must be initialized by now); - if (stat(git_path(shallow), st)) - die(shallow file was removed during fetch); - else if (st.st_mtime != shallow_stat.st_mtime -#ifdef USE_NSEC -|| ST_MTIME_NSEC(st) != ST_MTIME_NSEC(shallow_stat) -#endif - ) - die(shallow file was changed during fetch); + if (!stat_validity_check(shallow_stat, git_path(shallow))) + die(shallow file has changed since we read it); } #define SEEN_ONLY 1 -- 1.8.5.2.500.g8060133 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] shallow: automatically clean up shallow tempfiles
On Thu, Feb 27, 2014 at 05:11:41PM +0700, Duy Nguyen wrote: We only update shallow file in these cases: clone --depth, fetch --update-shallow, fetch --depth, and push when receive.shallowupdate is set. All of them may end up not updating shallow file though. OK, that last sentence is what I wasn't sure of. If they might sometimes not update the shallow file, then I think locking early is wrong. It means we create lock contention where it might not exist. For read-only case in upload-file, locking only reduces the availability of clone/fetch. Right, those should never lock. So even if we did want to tweak the locking, we would still need a separate tempfile cleanup for those. Here it is as a completed patch. It conflicts textually but not semantically with the patch that started this thread (I think one of my earlier patches has a minor textual conflict, too). Should we roll them into a single series to help Junio? If so, do you want to do it, or should I? -- 8 -- Subject: shallow: automatically clean up shallow tempfiles We sometimes write tempfiles of the form shallow_XX during fetch/push operations with shallow repositories. Under normal circumstances, we clean up the result when we are done. However, we do no take steps to clean up after ourselves when we exit due to die() or signal death. This patch teaches the tempfile creation code to register handlers to clean up after ourselves. To handle this, we change the ownership semantics of the filename returned by setup_temporary_shallow. It now keeps a copy of the filename itself, and returns only a const pointer to it. We can also do away with explicit tempfile removal in the callers. They all exit not long after finishing with the file, so they can rely on the auto-cleanup, simplifying the code. Note that we keep things simple and maintain only a single filename to be cleaned. This is sufficient for the current caller, but we future-proof it with a die(BUG). Signed-off-by: Jeff King p...@peff.net --- builtin/receive-pack.c | 16 commit.h | 2 +- fetch-pack.c | 11 --- shallow.c | 41 ++--- upload-pack.c | 7 +-- 5 files changed, 40 insertions(+), 37 deletions(-) diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c index 85bba35..c323081 100644 --- a/builtin/receive-pack.c +++ b/builtin/receive-pack.c @@ -828,14 +828,10 @@ static void execute_commands(struct command *commands, } } - if (shallow_update) { - if (!checked_connectivity) - error(BUG: run 'git fsck' for safety.\n - If there are errors, try to remove - the reported refs above); - if (alt_shallow_file *alt_shallow_file) - unlink(alt_shallow_file); - } + if (shallow_update !checked_connectivity) + error(BUG: run 'git fsck' for safety.\n + If there are errors, try to remove + the reported refs above); } static struct command *read_head_info(struct sha1_array *shallow) @@ -1087,10 +1083,6 @@ static void update_shallow_info(struct command *commands, cmd-skip_update = 1; } } - if (alt_shallow_file *alt_shallow_file) { - unlink(alt_shallow_file); - alt_shallow_file = NULL; - } free(ref_status); } diff --git a/commit.h b/commit.h index 16d9c43..55631f1 100644 --- a/commit.h +++ b/commit.h @@ -209,7 +209,7 @@ extern int write_shallow_commits(struct strbuf *out, int use_pack_protocol, extern void setup_alternate_shallow(struct lock_file *shallow_lock, const char **alternate_shallow_file, const struct sha1_array *extra); -extern char *setup_temporary_shallow(const struct sha1_array *extra); +extern const char *setup_temporary_shallow(const struct sha1_array *extra); extern void advertise_shallow_grafts(int); struct shallow_info { diff --git a/fetch-pack.c b/fetch-pack.c index 90fdd49..ae8550e 100644 --- a/fetch-pack.c +++ b/fetch-pack.c @@ -947,17 +947,6 @@ static void update_shallow(struct fetch_pack_args *args, if (!si-shallow || !si-shallow-nr) return; - if (alternate_shallow_file) { - /* -* The temporary shallow file is only useful for -* index-pack and unpack-objects because it may -* contain more roots than we want. Delete it. -*/ - if (*alternate_shallow_file) - unlink(alternate_shallow_file); - free((char *)alternate_shallow_file); - } - if (args-cloning) { /* * remote is shallow, but this is a clone, there are diff --git a/shallow.c b/shallow.c
Re: [PATCH] repack: add `repack.honorpackkeep` config var
On Wed, Feb 26, 2014 at 12:30:36PM -0800, Junio C Hamano wrote: pack-kept-objects then? Hmm. That does address my point above, but somehow the word kept feels awkward to me. I'm ambivalent between the two. That word does make my backside somewhat itchy ;-) Would it help to take a step back and think what the option really does? Perhaps we should call it --pack-all-objects, which is short for --pack-all-objectsregardless-of-where-they-currently-are-stored, or something? The word all gives a wrong connotation in a different way (e.g. regardless of reachability is a possible wrong interpretation), so that does not sound too good, either. I do not think all is what we want to say. It only affects kept objects, not any of the other exclusions (e.g., -l). --repack-kept-objects? --include-kept-objects? Of all of them, I think --pack-kept-objects is probably the best. And I think we are hitting diminishing returns in thinking too much more on the name. :) -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4] tag: support --sort=spec
On Thu, Feb 27, 2014 at 07:56:52PM +0700, Nguyễn Thái Ngọc Duy wrote: --sort=version:refname (or --sort=v:refname for short) sorts tags as if they are versions. --sort=-refname reverses the order (with or without :version). versioncmp() is copied from string/strverscmp.c in glibc commit ee9247c38a8def24a59eb5cfb7196a98bef8cfdc, reformatted to Git coding style. The implementation is under LGPL-2.1 and according to [1] I can relicense it to GPLv2. This looks good to me. One minor typo: diff --git a/Documentation/git-tag.txt b/Documentation/git-tag.txt index 404257d..9b05931 100644 --- a/Documentation/git-tag.txt +++ b/Documentation/git-tag.txt @@ -95,6 +95,12 @@ OPTIONS using fnmatch(3)). Multiple patterns may be given; if any of them matches, the tag is shown. +--sort=type:: + Sort in a specific order. Supported type is refname + (lexicographic order), version:refname or v:refname (tags + name are treated as versions). Prepend - to reverse sort s/tags name/tag names/ You had mentioned earlier tweaking the version comparison to handle things like -rc better. I think that can come on top of this initial patch, but we should probably figure out the final sort order before including this in a release. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] t3200-branch: test setting branch as own upstream
On Fri, Feb 28, 2014 at 12:04:18PM +0900, Brian Gesiak wrote: No test asserts that git branch -u refs/heads/my-branch my-branch emits a warning. Add a test that does so. For an operation like git branch foo origin where setting up the tracking is a side effect, a warning makes sense. But the sole purpose of the command above is to set the upstream, and we didn't do it; should this warning actually be upgraded to an error? +test_expect_success '--set-upstream-to shows warning if used to set branch as own upstream' ' + git branch --set-upstream-to refs/heads/my13 my13 2actual + cat expected EOF +warning: Not setting branch my13 as its own upstream. +EOF + test_cmp expected actual +' This should use test_i18ncmp, as the string you are matching is internationalized. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] branch: use skip_prefix
On Fri, Feb 28, 2014 at 12:04:19PM +0900, Brian Gesiak wrote: From: modocache modoca...@gmail.com Both your emailed patches have this, which is due to your author name not matching your sending identity. You probably want to set user.name, or if you already have (which it looks like you might have from your Signed-off-by), use git commit --amend --reset-author to update the author information. The install_branch_config function reimplemented the skip_prefix function inline. Use skip_prefix function instead for brevity. Signed-off-by: Brian Gesiak modoca...@gmail.com Reported-by: Michael Haggerty mhag...@alum.mit.edu It's a minor thing, but usually these footer lines try to follow a chronological order. So the report would come before the signoff (and a further signoff from the maintainer would go after yours). diff --git a/branch.c b/branch.c index 723a36b..e163f3c 100644 --- a/branch.c +++ b/branch.c [...] The patch itself looks OK to me. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Rewrite git-compat-util.h:skip_prefix() as a loop
On Thu, Feb 27, 2014 at 09:33:45PM +0100, David Kastrup wrote: diff --git a/git-compat-util.h b/git-compat-util.h index cbd86c3..4daa6cf 100644 --- a/git-compat-util.h +++ b/git-compat-util.h @@ -357,8 +357,8 @@ extern int suffixcmp(const char *str, const char *suffix); static inline const char *skip_prefix(const char *str, const char *prefix) { - size_t len = strlen(prefix); - return strncmp(str, prefix, len) ? NULL : str + len; +while( *prefix != '\0' *str++ == *prefix++ ); +return *prefix == '\0' ? str : NULL; Documentation/CodingGuidelines? Mostly relevant for tabification here, not helping much otherwise. In particular, does not contain the advice empty statements should appear on a line of their own which would help with readability here. Also whitespace in the while, which I could not find mentioned in CodingGuidelines either. Maybe: -- 8 -- Subject: [PATCH] CodingGuidelines: mention C whitespace rules We are fairly consistent about these, so most are covered by follow existing style, but it doesn't hurt to be explicit. Signed-off-by: Jeff King p...@peff.net --- Documentation/CodingGuidelines | 11 +++ 1 file changed, 11 insertions(+) diff --git a/Documentation/CodingGuidelines b/Documentation/CodingGuidelines index ef67b53..ed432a8 100644 --- a/Documentation/CodingGuidelines +++ b/Documentation/CodingGuidelines @@ -126,6 +126,17 @@ For C programs: char * string. This makes it easier to understand code like char *string, c;. + - Use whitespace around operators and keywords, but not inside + parentheses and not around functions. So: + +while (condition) + func(bar + 1); + + and not: + +while( condition ) + func (bar+1); + - We avoid using braces unnecessarily. I.e. if (bla) { -- 1.8.5.2.500.g8060133 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] t3200-branch: test setting branch as own upstream
On Fri, Feb 28, 2014 at 03:17:28PM +0900, Brian Gesiak wrote: For an operation like git branch foo origin where setting up the tracking is a side effect, a warning makes sense. But the sole purpose of the command above is to set the upstream, and we didn't do it; should this warning actually be upgraded to an error? I agree. I originally wrote the test using test_expect_failure--imagine my surprise when the exit status was 0, despite the fact that the upstream wasn't set! For reference, you don't want test_expect_failure here; it is about we want this to succeed, but git is currently buggy and we know it, so mark it as a failing test. You want test_must_fail to indicate a command inside a test that must exit non-zero: test_expect_success 'pointing upstream to itself fails' ' test_must_fail git branch -u ... ' I notice that the warning comes from install_branch_config, which gets used both for branch -u, but also in the side effect case I mentioned above. Is it possible to trigger this as part of such a case? I think maybe git branch -f --track foo foo would do it. If so, we should perhaps include a test that it does not break if we upgrade the -u case to an error. Patch is on the way, just waiting for the tests to complete. Thanks for pointing that out! Also, sorry if it's in the Makefile somewhere, but is there an easy way to run just a single test file in the t directory? You can run ./t-sh explicitly. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH/RFC] rebase: new convenient option to edit a single commit
On Thu, Feb 27, 2014 at 08:01:18PM +0700, Nguyễn Thái Ngọc Duy wrote: I find myself often do git rebase -i xxx and replace one pick line with edit to amend just one commit when I see something I don't like in that commit. This happens often while cleaning up a series. This automates the replace step so it sends me straight to that commit. Yeah, I do this a lot, too. The interface you propose makes sense to me, though I'm not sure how much I would use it, as I often do not know the specifier of the commit I want to change (was it HEAD~3 or HEAD~4?). I guess using :/ could make that easier. One comment on the option name: +1,edit-one!generate todo list to edit this commit I'd expect -$n to mean rebase the last $n commits (as opposed to everything not in the upstream). That does not work currently, of course, but: 1. It has the potential to confuse people who read it, since it's unlike what -1 means in most of the rest of git. 2. It closes the door if we want to support -$n in the future. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] t3200-branch: test setting branch as own upstream
On Fri, Feb 28, 2014 at 07:55:25AM +0100, Johannes Sixt wrote: This should use test_i18ncmp, as the string you are matching is internationalized. More generally, stderr output shouldn't be tested with test_cmp or test_i18ncmp at all, but with grep and test_i18ngrep. The reason is that when you run the test with 'sh -x t3200* -v -i', the trace output is also in stderr, and the test_cmp/test_i18ncmp fails due to the unexpected extra text. I didn't think we bothered to make sh -x work robustly. I don't mind if we do, but git grep -E 'test_(i18n)?cmp .*err shows many potential problem spots. Hmm. Looks like it is only a problem if you are calling a shell function (since it is the shell function's trace output you are seeing). So this test would be OK as-is, but testing for an error, like: test_must_fail git branch -u foo foo 2stderr would not be, because we see the trace from test_must_fail. So some of the callsites found by my grep are actually probably fine. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] t3200-branch: test setting branch as own upstream
On Fri, Feb 28, 2014 at 02:14:01AM -0500, Jeff King wrote: I didn't think we bothered to make sh -x work robustly. I don't mind if we do, but git grep -E 'test_(i18n)?cmp .*err shows many potential problem spots. Just for fun: cd t make SHELL_PATH=sh -x prove causes 326 test failures across 43 scripts. That's slightly misleading, because 200 of the failures are all in t0008, and updating one function would probably clear up all of them. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH/RFC] rebase: new convenient option to edit a single commit
On Fri, Feb 28, 2014 at 02:34:16PM +0700, Duy Nguyen wrote: Yeah, I do this a lot, too. The interface you propose makes sense to me, though I'm not sure how much I would use it, as I often do not know the specifier of the commit I want to change (was it HEAD~3 or HEAD~4?). I guess using :/ could make that easier. In my case, I just copy/paste the commit ID from git log -lp. I think :/ already works with rebase.. I think it should work. I just meant I will have to get in the habit of starting to use :/. :) -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] t3200-branch: test setting branch as own upstream
On Fri, Feb 28, 2014 at 04:28:38PM +0900, Brian Gesiak wrote: I would be in favor of using test_i18ngrep, but it seems like this test file overwhelmingly uses test_(i18n)cmp, even when inspecting stderr output. We generally prefer cmp checks to grep checks, because they are more rigorous. However, when testing human-readable output which may change, sometimes being too specific can simply make the tests brittle and annoying. Using a forgiving regex that matches keywords can be helpful. So there's definitely some room for judgement. I think what you posted as v2 looks OK. Making double-sure that all tests pass when run with sh -x seems like a larger endeavor. Of course, I'd be happy to submit several patches if there's support for such a change. But as Peff points out it will be a large diff. Yeah, I don't think it's worth the effort. If you feel like continuing on this series, converting the warning() into a die() would be a much more productive use of time (but if you don't, I do not see any reason not to take the patches you've posted). -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] repack: add `repack.honorpackkeep` config var
On Thu, Feb 27, 2014 at 10:04:44AM -0800, Junio C Hamano wrote: I wonder if it makes sense to link it with pack.writebitmaps more tightly, without even exposing it as a seemingly orthogonal knob that can be tweaked, though. I think that is because I do not fully understand the , because ... part of the below: This patch introduces an option to disable the `--honor-pack-keep` option. It is not triggered by default, even when pack.writeBitmaps is turned on, because its use depends on your overall packing strategy and use of .keep files. If you ask --write-bitmap-index (or have pack.writeBitmaps on), you do want the bitmap-index to be written, and unless you tell pack-objects to ignore the .keep marker, it cannot do so, no? Does the , because ... part above mean you may have an overall packing strategy to use .keep file to not ever repack some subset of the objects, so we will not silently explode the kept objects into a new pack? Exactly. The two features (bitmaps and .keep) are not compatible with each other, so you have to prioritize one. If you are using static .keep files, you might want them to continue being respected at the expense of using bitmaps for that repo. So I think you want a separate option from --write-bitmap-index to allow the appropriate flexibility. The default is another matter. I think most people using .bitmaps on a server would probably want to set repack.packKeptObjects. They would want to repack often to take advantage of the .bitmaps anyway, so they probably don't care about .keep files (any they see are due to races with incoming pushes). So we could do something like falling back to turning the option on if --write-bitmap-index is on _and_ the user didn't specify --pack-kept-objects. The existing default is mostly there because it is the conservative choice (.keep files continue to do their thing as normal unless you say otherwise). But the fallback thing would be one less knob that bitmap users would need to turn in the common case. Here's the interdiff for doing the fallback: --- diff --git a/Documentation/config.txt b/Documentation/config.txt index 3a3d84f..a8ddc7f 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -2139,7 +2139,9 @@ repack.usedeltabaseoffset:: repack.packKeptObjects:: If set to true, makes `git repack` act as if `--pack-kept-objects` was passed. See linkgit:git-repack[1] for - details. Defaults to false. + details. Defaults to `false` normally, but `true` if a bitmap + index is being written (either via `--write-bitmap-index` or + `pack.writeBitmaps`). rerere.autoupdate:: When set to true, `git-rerere` updates the index with the diff --git a/builtin/repack.c b/builtin/repack.c index 49947b2..6b0b62d 100644 --- a/builtin/repack.c +++ b/builtin/repack.c @@ -9,7 +9,7 @@ #include argv-array.h static int delta_base_offset = 1; -static int pack_kept_objects; +static int pack_kept_objects = -1; static char *packdir, *packtmp; static const char *const git_repack_usage[] = { @@ -190,6 +190,9 @@ int cmd_repack(int argc, const char **argv, const char *prefix) argc = parse_options(argc, argv, prefix, builtin_repack_options, git_repack_usage, 0); + if (pack_kept_objects 0) + pack_kept_objects = write_bitmap; + packdir = mkpathdup(%s/pack, get_object_directory()); packtmp = mkpathdup(%s/.tmp-%d-pack, packdir, (int)getpid()); diff --git a/t/t7700-repack.sh b/t/t7700-repack.sh index f8431a8..b1eed5c 100755 --- a/t/t7700-repack.sh +++ b/t/t7700-repack.sh @@ -21,7 +21,7 @@ test_expect_success 'objects in packs marked .keep are not repacked' ' objsha1=$(git verify-pack -v pack-$packsha1.idx | head -n 1 | sed -e s/^\([0-9a-f]\{40\}\).*/\1/) mv pack-* .git/objects/pack/ - git repack -A -d -l + git repack --no-pack-kept-objects -A -d -l git prune-packed for p in .git/objects/pack/*.idx; do idx=$(basename $p) @@ -35,9 +35,9 @@ test_expect_success 'objects in packs marked .keep are not repacked' ' test -z $found_duplicate_object ' -test_expect_success '--pack-kept-objects duplicates objects' ' +test_expect_success 'writing bitmaps duplicates .keep objects' ' # build on $objsha1, $packsha1, and .keep state from previous - git repack -Adl --pack-kept-objects + git repack -Adl test_when_finished found_duplicate_object= for p in .git/objects/pack/*.idx; do idx=$(basename $p) -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] archive: add archive.restrictRemote option
On Thu, Feb 27, 2014 at 10:37:30AM -0800, Junio C Hamano wrote: Signed-off-by: Jeff King p...@peff.net Thanks. Do GitHub people have general aversion against signing off (or sending out, for that matter) their own patches, unless they were already active here before they joined GitHub, by the way? Mostly it is that I clean up the patches and commit messages before sending them out. Michael sends out his own patches because they are already perfect by the time I see them. :) I can certainly get S-O-B from GitHubbers, but my impression of the DCO is that it does not matter; as the first link in the signoff chain, I am certifying that the patch meets the licensing requirements. Of course, I know that because of my relationship to the author and our employer, which is something that isn't encoded in the headers. A S-O-B from the author would perhaps make it more obvious what happened. I like the general idea and this escape hatch would be a good thing to have. A few comments: - Seeing the word combination restrict+remote before reading the explanation made me think hmph, only allow remote archive from certain hosts but not from others? We are restricting the objects and only on the remote usage, not restricting remotes, so somebody else may be able to come up with a less misleading name. - It might be better to call the escape hatch allow something that defaults to false. It is merely the issue of perception, but having a knob to be limiting that defaults to true gives a wrong impression that in an ideal world remote archive ought to be loose and we are artificially limiting it by default. After reading your first point, I came up with archive.allowRemoteUnreachable, which also satisfies the second. I do not have a strong opinion. +archive.restrictRemote:: + If true, archives can only be requested by refnames. If false, As this does not affect local use of git archive, requested by refnames may need to be clarified further. Perhaps remote archives can be requested only for published refnames or something. I was hoping to be vague. If we really want to get into specifics, we should probably document the current rules (refnames, and sub-trees of refnames). It might be a good thing to document that anyway, though. And by doing so, it would become obvious why one would want to set this option. I'll see what I can come up with. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 0/2] lifting upload-archive restrictions
On Fri, Feb 28, 2014 at 04:07:09AM -0500, Jeff King wrote: As this does not affect local use of git archive, requested by refnames may need to be clarified further. Perhaps remote archives can be requested only for published refnames or something. I was hoping to be vague. If we really want to get into specifics, we should probably document the current rules (refnames, and sub-trees of refnames). It might be a good thing to document that anyway, though. And by doing so, it would become obvious why one would want to set this option. I'll see what I can come up with. Here's a series to handle this; I think the end result is much nicer. I ended up calling the option uploadarchive.allowUnreachable. By moving it there instead of under archive, it is more clear that this is about the _serving_ end of the remote connection, and the word remote becomes redundant. [1/2]: docs: clarify remote restrictions for git-upload-archive [2/2]: add uploadarchive.allowUnreachable option -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 1/2] docs: clarify remote restrictions for git-upload-archive
Commits ee27ca4 and 0f544ee introduced rules by which git-upload-archive would restrict clients from accessing unreachable objects. However, we never documented those rules anywhere, nor their reason for being. Let's do so now. Signed-off-by: Jeff King p...@peff.net --- Documentation/git-archive.txt| 5 - Documentation/git-upload-archive.txt | 26 ++ 2 files changed, 30 insertions(+), 1 deletion(-) diff --git a/Documentation/git-archive.txt b/Documentation/git-archive.txt index b97aaab..cfa1e4e 100644 --- a/Documentation/git-archive.txt +++ b/Documentation/git-archive.txt @@ -65,7 +65,10 @@ OPTIONS --remote=repo:: Instead of making a tar archive from the local repository, - retrieve a tar archive from a remote repository. + retrieve a tar archive from a remote repository. Note that the + remote repository may place restrictions on which sha1 + expressions may be allowed in `tree-ish`. See + linkgit:git-upload-archive[1] for details. --exec=git-upload-archive:: Used with --remote to specify the path to the diff --git a/Documentation/git-upload-archive.txt b/Documentation/git-upload-archive.txt index d09bbb5..8ae65d8 100644 --- a/Documentation/git-upload-archive.txt +++ b/Documentation/git-upload-archive.txt @@ -20,6 +20,32 @@ This command is usually not invoked directly by the end user. The UI for the protocol is on the 'git archive' side, and the program pair is meant to be used to get an archive from a remote repository. +SECURITY + + +In order to protect the privacy of objects that have been removed from +history but may not yet have been pruned, `git-upload-archive` avoids +serving archives for commits and trees that are not reachable from the +repository's refs. However, because calculating object reachability is +computationally expensive, `git-upload-archive` implements a stricter +but easier-to-check set of rules: + + 1. Clients may request a commit or tree that is pointed to directly by + a ref. E.g., `git archive --remote=origin v1.0`. + + 2. Clients may request a sub-tree within a commit or tree using the + `ref:path` syntax. E.g., `git archive --remote=origin v1.0:Documentation`. + + 3. Clients may _not_ use other sha1 expressions, even if the end + result is reachable. E.g., neither a relative commit like `master^` + nor a literal sha1 like `abcd1234` is allowed, even if the result + is reachable from the refs. + +Note that rule 3 disallows many cases that do not have any privacy +implications. These rules are subject to change in future versions of +git, and the server accessed by `git archive --remote` may or may not +follow these exact rules. + OPTIONS --- directory:: -- 1.8.5.2.500.g8060133 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 2/2] add uploadarchive.allowUnreachable option
From: Scott J. Goldman scot...@github.com In commit ee27ca4, we started restricting remote git-archive invocations to only accessing reachable commits. This matches what upload-pack allows, but does restrict some useful cases (e.g., HEAD:foo). We loosened this in 0f544ee, which allows `foo:bar` as long as `foo` is a ref tip. However, that still doesn't allow many useful things, like: 1. Commits accessible from a ref, like `foo^:bar`, which are reachable 2. Arbitrary sha1s, even if they are reachable. We can do a full object-reachability check for these cases, but it can be quite expensive if the client has sent us the sha1 of a tree; we have to visit every sub-tree of every commit in the worst case. Let's instead give site admins an escape hatch, in case they prefer the more liberal behavior. For many sites, the full object database is public anyway (e.g., if you allow dumb walker access), or the site admin may simply decide the security/convenience tradeoff is not worth it. This patch adds a new config option to disable the restrictions added in ee27ca4. It defaults to off, meaning there is no change in behavior by default. Signed-off-by: Jeff King p...@peff.net --- Documentation/config.txt | 7 +++ Documentation/git-upload-archive.txt | 6 ++ archive.c| 13 +++-- t/t5000-tar-tree.sh | 9 + 4 files changed, 33 insertions(+), 2 deletions(-) diff --git a/Documentation/config.txt b/Documentation/config.txt index 040197b..62f0a4e 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -2334,6 +2334,13 @@ transfer.unpackLimit:: not set, the value of this variable is used instead. The default value is 100. +uploadarchive.allowUnreachable:: + If true, allow clients to use `git archive --remote` to request + any tree, whether reachable from the ref tips or not. See the + discussion in the `SECURITY` section of + linkgit:git-upload-archive[1] for more details. Defaults to + `false`. + uploadpack.hiderefs:: String(s) `upload-pack` uses to decide which refs to omit from its initial advertisement. Use more than one diff --git a/Documentation/git-upload-archive.txt b/Documentation/git-upload-archive.txt index 8ae65d8..cbef61b 100644 --- a/Documentation/git-upload-archive.txt +++ b/Documentation/git-upload-archive.txt @@ -46,6 +46,12 @@ implications. These rules are subject to change in future versions of git, and the server accessed by `git archive --remote` may or may not follow these exact rules. +If the config option `uploadArchive.allowUnreachable` is true, these +rules are ignored, and clients may use arbitrary sha1 expressions. +This is useful if you do not care about the privacy of unreachable +objects, or if your object database is already publicly available for +access via non-smart-http. + OPTIONS --- directory:: diff --git a/archive.c b/archive.c index 346f3b2..7d0976f 100644 --- a/archive.c +++ b/archive.c @@ -17,6 +17,7 @@ static char const * const archive_usage[] = { static const struct archiver **archivers; static int nr_archivers; static int alloc_archivers; +static int remote_allow_unreachable; void register_archiver(struct archiver *ar) { @@ -257,7 +258,7 @@ static void parse_treeish_arg(const char **argv, unsigned char sha1[20]; /* Remotes are only allowed to fetch actual refs */ - if (remote) { + if (remote !remote_allow_unreachable) { char *ref = NULL; const char *colon = strchr(name, ':'); int refnamelen = colon ? colon - name : strlen(name); @@ -401,6 +402,14 @@ static int parse_archive_args(int argc, const char **argv, return argc; } +static int git_default_archive_config(const char *var, const char *value, + void *cb) +{ + if (!strcmp(var, uploadarchive.allowunreachable)) + remote_allow_unreachable = git_config_bool(var, value); + return git_default_config(var, value, cb); +} + int write_archive(int argc, const char **argv, const char *prefix, int setup_prefix, const char *name_hint, int remote) { @@ -411,7 +420,7 @@ int write_archive(int argc, const char **argv, const char *prefix, if (setup_prefix prefix == NULL) prefix = setup_git_directory_gently(nongit); - git_config(git_default_config, NULL); + git_config(git_default_archive_config, NULL); init_tar_archiver(); init_zip_archiver(); diff --git a/t/t5000-tar-tree.sh b/t/t5000-tar-tree.sh index 05f011d..1cf0a4e 100755 --- a/t/t5000-tar-tree.sh +++ b/t/t5000-tar-tree.sh @@ -213,6 +213,15 @@ test_expect_success 'clients cannot access unreachable commits' ' test_must_fail git archive --remote=. $sha1 remote.tar ' +test_expect_success 'upload-archive can allow unreachable commits' ' + test_commit
Re: [PATCH 1/2] t3200-branch: test setting branch as own upstream
On Fri, Feb 28, 2014 at 07:44:10PM +0900, Brian Gesiak wrote: I notice that the warning comes from install_branch_config, which gets used both for branch -u, but also in the side effect case I mentioned above. Is it possible to trigger this as part of such a case? I think maybe git branch -f --track foo foo would do it. If so, we should perhaps include a test that it does not break if we upgrade the -u case to an error. Do you mean that install_branch_config should continue to emit a warning in the side effect case? I'm not sure I agree--how is git branch -f --track foo foo less erroneous than git branch -u foo refs/heads/foo? Perhaps I'm missing some insight on how --track is used. I'd be more worried about triggering it via the config. E.g.: git config branch.autosetupmerge always git branch -f foo foo Should the second command die? I admit I'm having a hard time coming up with a feasible reason why anyone would do branch -f foo foo in the first place. I just don't want to regress somebody else's workflow due to my lack of imagination. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: GSoC idea: allow git rebase --interactive todo lines to take options
On Thu, Feb 27, 2014 at 01:10:30PM -0500, Brandon McCaig wrote: On Wed, Feb 26, 2014 at 5:52 AM, Jeff King p...@peff.net wrote: This seems like a reasonable feature to me. All of your examples are possible with an edit and another git command, but the convenience may be worth it (though personally, most of the examples you gave are particularly interesting to me[1]). This strikes me as over-complicating the rebase --interactive interface. Sorry, I missed an important word in my final sentence. It should have been the examples you gave are NOT particularly interesting to me. Particularly all of the ideas expressed later on about merge commits and resetting authors, etc. It seems like you're trying to define a whole new command set (i.e., API) for Git, but within the context of rebase --interactive. I think it would be hard to document this, and hard to learn it, and harder still to remember it (even though it would obviously try to mirror the existing Git command API). I agree some of the examples are getting esoteric. Things like --signoff and --reset-author are a fairly straightforward convenience feature: they save you from writing exec git commit --amend --signoff. For others that cannot currently be done with a simple option to git commit, I think a reasonable first step would be to implement them there. For example, you cannot currently git commit --tree. Maybe that is too insane and low-level an option for git commit. But if it is, then it is almost certainly too insane and low-level for a rebase instruction. For others from Michael's list, I expect they may not make _sense_ outside of a rebase. That is, they are operations whose input is not a single commit, but a sequence of commits (e.g., if you had some high-level command that allowed swapping two commits without having to redo the conflicts from the second commit). Those ones might make sense to exist as part of rebase and nowhere else (but then they are not necessarily just options, but rather new instructions). That said, I do think that this is probably a bad direction and shouldn't be rushed into too fast. I'm not sure whether it is a good idea or not. But I think it is looking decreasingly like a good GSoC project. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] t3200-branch: test setting branch as own upstream
On Fri, Feb 28, 2014 at 08:16:13PM +0900, Brian Gesiak wrote: I just don't want to regress somebody else's workflow due to my lack of imagination. This makes a lot of sense to me, although as-is the function emits a warning and returns immediately (although with a successful status code), so I'm also stumped as to what kind of workflow this would be included in. I'm cc-ing Matthieu, who wrote 85e2233, which introduces the check. Its commit message says: branch: warn and refuse to set a branch as a tracking branch of itself. Previous patch allows commands like git branch --set-upstream foo foo, which doesn't make much sense. Warn the user and don't change the configuration in this case. Don't die to let the caller finish its job in such case. For those just joining us, we are focused on the final statement, and deciding whether we should die() in this case. But I am not clear what job it would want to be finishing (creating the branch, I guess, but you cannot be doing anything useful, since by definition the branch already exists and you are not changing its tip). There wasn't any relevant discussion on the list[1]. Matthieu, can you remember anything else that led to that decision? In any case, if the jury's out on this one, I suppose the two patches I submitted are good to merge? Part of me thinks the bump from warning to error belongs in its own patch anyway. Yeah, it should definitely be a separate patch on top. -Peff [1] Threads are: http://thread.gmane.org/gmane.comp.version-control.git/137297/focus=137299 and http://thread.gmane.org/gmane.comp.version-control.git/137401/focus=137403 but the discussion focused on the first part of the series. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] repack: add `repack.honorpackkeep` config var
On Fri, Feb 28, 2014 at 10:45:39AM -0800, Junio C Hamano wrote: Exactly. The two features (bitmaps and .keep) are not compatible with each other, so you have to prioritize one. If you are using static .keep files, you might want them to continue being respected at the expense of using bitmaps for that repo. So I think you want a separate option from --write-bitmap-index to allow the appropriate flexibility. What is the appropriate flexibility, though? If the user wants to use bitmap, we would need to drop .keep, no? Or the flip side: if the user wants to use .keep, we should drop bitmaps. My point is that we do not know which way the user wants to go, so we should not tie the options together. Doesn't always having two copies in two packs degrade performance unnecessarily (without even talking about wasted diskspace)? An explicit .keep exists in the repository because it is expensive and undesirable to duplicate what is in there in the first place, so it feels to me that either The benefits of static .keep files are (I think): 1. less I/O during repacks, as you do not rewrite a static set of objects 2. less turnover of packfiles, which can make dumb access more efficient (both for dumb clients, but also for things like non-git-aware backups). I think the existence of smart-http more or less nullifies (2). For (1), it helps at first, but you get diminishing returns as your non-keep packfile grows. I think it only helps in pathological cases (e.g., you mark 10GB worth of giant blobs in a .keep pack, and then pack the other 10MB of trees, commits, and normal-sized blobs as usual). - Disable with warning, or outright refuse, the -b option if there is .keep (if we want to give precedence to .keep); or - Remove .keep with warning when -b option is given (if we want to give precedence to -b). and nothing else would be a reasonable option. Unfortunately, we can do neither automatically because there could be a transient .keep file in an active repository. Right, the transient ones complicate the issue. But I think even for static .keep versus bitmaps, there is question. See below... The default is another matter. I think most people using .bitmaps on a server would probably want to set repack.packKeptObjects. They would want to repack often to take advantage of the .bitmaps anyway, so they probably don't care about .keep files (any they see are due to races with incoming pushes). ... which makes me think that repack.packKeptObjects is merely a distraction---it should be enough to just pass --pack-kept-objects when -b is asked, without giving any extra configurability, no? But you do not necessarily ask for -b explicitly; it might come from the config, too. Imagine you have a server with many repos. You want to use bitmaps when you can, so you set pack.writeBitmaps in /etc/gitconfig. But in a few repos, you want to use .keep files, and it's more important for you to use it than bitmaps (e.g., because it is one of the pathological cases above). So you set repack.packKeptObjects to false in /etc/gitconfig, to prefer .keep to bitmaps where appropriate. If you did not have that config option, your alternative would be to turn off pack.writeBitmaps in the repositories with .keep files. But then you need to per-repo keep that flag in sync with whether or not the repo has .keep files. To be clear, at GitHub we do not plan on ever having repack.packKeptObjects off (for now we have it on explicitly, but if it were connected to pack.writeBitmaps, then we would be happy with that default). I am mostly trying to give an escape hatch to let people use different optimization strategies if they want. If we are going to have --pack-kept-objects (and I think we should), I think we also should have a matching config option. Because it is useful when matched with the bitmap code, and that can be turned on both from the command-line or from the config. Wherever you are doing that, you would want to be able to make the matching .keep decision. And I don't think it hurts much. With the fallback-to-on behavior, you do not have to even care that it is there unless you are doing something clever. So we could do something like falling back to turning the option on if --write-bitmap-index is on _and_ the user didn't specify --pack-kept-objects. If you mean didn't specify --no-pack-kept-objects, then I think that is sensible. I still do not know why we would want the configuration variable, though. Right, I meant if the pack-kept-objects variable is set, either on or off, either via the command line or in the config. As far as what the config is good for, see above. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] repack: add `repack.honorpackkeep` config var
On Fri, Feb 28, 2014 at 10:09:08AM -0700, Nasser Grainawi wrote: Exactly. The two features (bitmaps and .keep) are not compatible with each other, so you have to prioritize one. If you are using static .keep files, you might want them to continue being respected at the expense of using bitmaps for that repo. So I think you want a separate option from --write-bitmap-index to allow the appropriate flexibility. Has anyone thought about how to make them compatible? Yes, but it's complicated and not likely to happen soon. Having .keep files means that you are not including some objects in the newly created pack. Each bit in a commit's bitmap corresponds to one object in the pack, and whether it is reachable from that commit. The bitmap is only useful if we can calculate the full reachability from it, and it has no way to specify objects outside of the pack. To fix this, you would need to change the on-disk format of the bitmaps to somehow reference objects outside of the pack. Either by having the bitmaps index a repo-global set of objects, or by permitting a list of edge objects that are referenced from the pack, but not included (and then when assembling the full reachable list, you would have to recurse across edge objects to find their reachable list in another pack, etc). So it's possible, but it would complicate the scheme quite a bit, and would not be backwards compatible with either JGit or C Git. We're using Martin Fick's git-exproll script which makes heavy use of keeps to reduce pack file churn. In addition to the on-disk benefits we get there, the driving factor behind creating exproll was to prevent Gerrit from having two large (30GB+) mostly duplicated pack files open in memory at the same time. Repacking in JGit would help in a single-master environment, but we'd be back to having this problem once we go to a multi-master setup. Perhaps the solution here is actually something in JGit where it could aggressively try to close references to pack files In C git we don't worry about this too much, because our programs tend to be short-lived, and references to the old pack will go away quickly. Plus it is all mmap'd, so as we simply stop accessing the pages of the old pack, they should eventually be dropped if there is memory pressure. I seem to recall that JGit does not mmap its packfiles. Does it pread? In that case, I'd expect unused bits from the duplicated packfile to get dropped from the disk cache over time. If it loads whole packfiles into memory, then yes, it should probably close more aggressively. , but that still doesn't help the disk churn problem. As Peff says below, we would want to repack often to get up-to-date bitmaps, but ideally we could do that without writing hundreds of GBs to disk (which is obviously worse when disk is a NFS mount). Ultimately I think the solution to the churn problem is a packfile-like storage that allows true appending of deltas. You can come up with a scheme to allow deltas between on-disk packs (i.e., thin packs on disk). The trick there is handling the dependencies and cycles. I think you could get by with a strict ordering of packs and a few rules: 1. An object in a pack with weight A cannot have as a base an object in a pack with weight = A. 2. A pack with weight A cannot be deleted if there exists a pack with weight A. But you'd want to also add in a single update-able index over all the packfiles, and even then you'd still want to pack occasionally (because you'd end up with deltas on bases going back in time, but you really prefer your bases to be near the tip of history). So I am not volunteering to work on it. :) At GitHub we mostly deal with the churn by throwing more server resources at it. But we have the advantage of having a very large number of small-to-medium repos, which is relatively easy to scale up. A small number of huge repos is trickier. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [BUG] Halt during fetch on MacOS
On Fri, Feb 28, 2014 at 03:26:28PM -0800, Conley Owens wrote: test.sh #!/bin/bash rungit() { mkdir $1 GIT_DIR=$1 git init --bare echo '[remote aosp]' $1/config echo 'url = https://android.googlesource.com/platform/external/tinyxml2' $1/config GIT_DIR=$1 git fetch aosp +refs/heads/master:refs/remotes/aosp/master I don't think this is affecting your test, but you probably want to append to the config for the first line, too. Otherwise you are overwriting some of git's default settings. When everything cools, you can see that there are some fetches hanging (typically). $ ps | grep 'git fetch' ... 63310 ttys0040:00.01 git fetch aosp +refs/heads/master:refs/remotes/aosp/master [...] I can't reproduce here on Linux. Can you find out what the processes are doing with something like strace? You can look at the parent process of each and see that one half spawned the other half, or you can look at the environment variables for each to see that there are two processes operating in the same directory for each directory where there's an issue. $ echo $(for pid in $(ps | grep 'git fetch' | grep -o '^[0-9]*'); do ps -p $pid -wwwE | grep 'GIT_DIR=[^ ]*' -o; done) | sort GIT_DIR=testdir14 GIT_DIR=testdir14 GIT_DIR=testdir32 GIT_DIR=testdir32 GIT_DIR=testdir47 GIT_DIR=testdir47 A fetch will start many sub-processes. Try: GIT_TRACE=1 git fetch \ https://android.googlesource.com/platform/external/tinyxml2 which shows git-fetch starting the git-remote-https helper, which in turn starts git-fetch-pack to do the actual protocol, which uses git-unpack-objects to manage the incoming pack. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: t9200 cvsexportcommit test fails on Ubuntu server 12.04.4 LTS
On Fri, Feb 28, 2014 at 07:45:07PM +0100, Fabio D'Alfonso wrote: I get 12 of 15 tests faling. Any idea? the same build works fine on 11.04 where I have a desktop version. No clue. It works fine for me here (Debian sid). Perhaps try running the script like: ./t9200-git-cvsexportcommit.sh -v -i which should give more information about what exactly is failing? -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 01/11] builtin/pack-objects.c: change check_pbase_path() to use ALLOC_GROW()
On Fri, Feb 28, 2014 at 11:03:19AM -0800, Junio C Hamano wrote: Michael Haggerty mhag...@alum.mit.edu writes: So my vote is that the patches are OK the way Dmitry wrote them (mind, I have only read through 05/11 so far). Seconded ;-) By the way, I do not like these long subjects. change is a redundant word when one sends a patch---as all patches are about changing something. Subject: builtin/pack-objects.c: use ALLOC_GROW() in check_pbase_path() would be a lot more appropriate for git shortlog consumption. I would actually go one step further and drop or shorten the filename in the subject. It is very long, it is already easy to see which file was changed from the diffstat, and it doesn't give any useful context for other parts of the subject. I really like the foo: convention for starting a subject line, because it immediately makes clear what area you are working in without having to waste space on English conjunctions or prepositions. But it does not have to be a filename. It can be a subsystem, a command, a function, an area of the project, or anything that gives context to the rest of the line. So I would suggest one of: Subject: use ALLOC_GROW() in check_pbase_path() Talking about the filename is redundant; there's only one check_pbase_path. Subject: check_pbase_path: use ALLOW_GROW Even shorter. Subject: builtin/pack-objects.c: use ALLOC_GROW This one implies to me that the point of the commit is to convert the whole file to use ALLOC_GROW where appropriate, not just that function (even if that function may be the only spot changed). I'd probably not use: Subject: pack-objects: use ALLOC_GROW as the scope is not about the command, but about the C file. I realize that I just bikeshedded on subject lines for half a page, and part of me wants to go kill myself in shame. But I feel like I see the technique misapplied often enough that maybe some guidance is merited. Feel free to ignore. :) -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RFC GSoC idea: new git config features
On Sat, Mar 01, 2014 at 01:19:32AM +0100, Michael Haggerty wrote: I absolutely understand that changing all of the config parsers is not feasible. But I had imagined a third route: (3) parse the config once, storing the raw values to records in memory. When an unset is seen, delete any previous records that have accumulated for that key. After the whole config has been read, iterate through the records, feeding the surviving values into the callback in the order they were originally read (minus deletions). Do you see any problems with this way of implementing the functionality (aside from slightly increased overhead)? Yeah, this is something I have considered many times. It does have some overhead, but the existing system is not great either. As you noted, we often read the config several times for a given program invocation. But moreover, we linearly strcmp each config key we find against each one we know about. In some cases we return early if a sub-function is looking for `starts_with(key, foo.)`, but in most cases we just look for foo.bar, foo.baz, and so on. If we had the keys in-memory, we could reverse this: config code asks for keys it cares about, and we can do an optimized lookup (binary search, hash, etc). That also makes many constructs easier to express. Recently we had a problem where the parsing order of remote.pushdefault and branch.*.pushremote mattered, because they were read into the same variable. The solution is to use two variables and reconcile them after all config is read. But if you can just query the config subsystem directly, the boilerplate of reading them into strings goes away, and you can just do: x = config_string_getf(branch.%s.pushremote, current_branch); if (!x) x = config_string_get(remote.pushdefault); if (!x) x = config_string_getf(branch.%s.remote, current_branch); if (!x) x = origin; As it is now, the code that does this has a lot more boilerplate, and is split across several functions. Another example is the way we have to manage deferred config in git-status (see 84b4202). This might be more clear if we could simply `config_get_bool(status.branch)` at the right moment. The talk of efficiency is probably a red-herring here. I do not think config-reading is a significant source of slow-down in the current code. But I'd be in favor of something that reduced boilerplate and made the code easier to read. But the side effects these callbacks may cause are not limited to setting a simple scaler variable (like 'frotz' in the illustration) but would include things that are hard to undo once done (e.g. calling a set-up function with a lot of side effects). Most callbacks would convert to a query system in a pretty straightforward way, but some that have side effects might be tricky. Converting them all may be too large for a GSoC project, but I think you could do it gradually: 1. Convert the parser to read into an in-memory representation, but leave git_config() as a wrapper which iterates over it. 2. Add query functions like config_string_get() above. 3. Convert callbacks to query functions one by one. 4. Eventually drop git_config(). A GSoC project could take us partway through (3). -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] rebase: accept -number as another way of saying HEAD~number
On Mon, Mar 03, 2014 at 10:37:11AM +0100, Matthieu Moy wrote: Michael Haggerty mhag...@alum.mit.edu writes: Or perhaps -NUM should fail with an error message if any of the last NUM commits are merges. In that restricted scenario (which probably accounts for 99% of rebases), -NUM is equivalent to HEAD~NUM. Makes sense to me. So, -NUM would actually mean rebase the last NUM commits (as well as being an alias for HEAD~NUM), but would fail when it does not make sense (with an error message explaining the situation and pointing the user to HEAD~N if this is what he wanted). Yeah, I agree with this. I think -NUM is only useful for linear string-of-pearls history. With merges, it either means: - linearize history (a la git-log), go back NUM commits, and treat the result as the upstream. This is useless, because it's difficult to predict where you're going to end up. Using explicit ~ and ^ relative-commit operators to specify the upstream is more flexible if you really want to do this (but I don't know why you would). - do not use an UNINTERESTING upstream as the cutoff, but instead try to rebase exactly NUM commits. In the face of non-linear history, I'm not even sure what this would mean, or how we would implement it. Neither of those is useful, so I don't think erroring out on them is a bad thing. And by erroring out (rather than documenting some weird and useless behavior), we don't have to worry about backwards compatibility if we later _do_ come up with a useful behavior (the best I can think of is that --first-parent -3 might have a use, but I think that should already be covered, as the results of --first-parent are by-definition linear). This would actually be a feature for me: I often want to rebase recent enough history, and when my @{upstream} isn't well positionned, I randomly type HEAD~N without remembering what N should be. When N is too small, the rebase doesn't reach the interesting commit, and when N is too big, it reaches a merge commit and I get a bunch of commits I'm not allowed to edit in my todo-list. Then I have to abort the commit manually. With -N failing on merge commits, the rebase would abort itself automatically. I do the same thing. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] repack: add `repack.honorpackkeep` config var
On Mon, Mar 03, 2014 at 10:13:47AM -0800, Junio C Hamano wrote: Or the flip side: if the user wants to use .keep, we should drop bitmaps. My point is that we do not know which way the user wants to go, so we should not tie the options together. Hmph. I think the short of your later explanation is global config may tell us to use bitmap, in which case we would need a way to defeat that and have existing .keep honored, and it makes it easier to do so if these two are kept separate, because you do not want to run around and selectively disable bitmaps in these repositories. We can instead do so with repack.packKeptObjects in the global configuration. and I tend to agree with the reasoning. Yes. Do you need a re-roll from me? I think the last version I sent + the squash to tie the default to bitmap-writing makes the most sense. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] repack: add `repack.honorpackkeep` config var
On Mon, Mar 03, 2014 at 11:51:06AM -0800, Junio C Hamano wrote: Yes. Do you need a re-roll from me? I think the last version I sent + the squash to tie the default to bitmap-writing makes the most sense. I have 9e20b390 (repack: add `repack.packKeptObjects` config var, 2014-02-26); I do not recall I've squashed anything into it, though. Do you mean this one? Here's the interdiff for doing the fallback: [...] Yes. Though I just noticed that the commit message needs updating if that is squashed in. Here is the whole patch, with that update. And I am dropping Vicent as the author, since I think there are now literally zero lines of his left. ;) -- 8 -- Subject: [PATCH] repack: add `repack.packKeptObjects` config var The git-repack command always passes `--honor-pack-keep` to pack-objects. This has traditionally been a good thing, as we do not want to duplicate those objects in a new pack, and we are not going to delete the old pack. However, when bitmaps are in use, it is important for a full repack to include all reachable objects, even if they may be duplicated in a .keep pack. Otherwise, we cannot generate the bitmaps, as the on-disk format requires the set of objects in the pack to be fully closed. Even if the repository does not generally have .keep files, a simultaneous push could cause a race condition in which a .keep file exists at the moment of a repack. The repack may try to include those objects in one of two situations: 1. The pushed .keep pack contains objects that were already in the repository (e.g., blobs due to a revert of an old commit). 2. Receive-pack updates the refs, making the objects reachable, but before it removes the .keep file, the repack runs. In either case, we may prefer to duplicate some objects in the new, full pack, and let the next repack (after the .keep file is cleaned up) take care of removing them. This patch introduces both a command-line and config option to disable the `--honor-pack-keep` option. By default, it is triggered when pack.writeBitmaps (or `--write-bitmap-index` is turned on), but specifying it explicitly can override the behavior (e.g., in cases where you prefer .keep files to bitmaps, but only when they are present). Note that this option just disables the pack-objects behavior. We still leave packs with a .keep in place, as we do not necessarily know that we have duplicated all of their objects. Signed-off-by: Jeff King p...@peff.net --- Documentation/config.txt | 7 +++ Documentation/git-repack.txt | 8 builtin/repack.c | 13 - t/t7700-repack.sh| 18 +- 4 files changed, 44 insertions(+), 2 deletions(-) diff --git a/Documentation/config.txt b/Documentation/config.txt index becbade..a8ddc7f 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -2136,6 +2136,13 @@ repack.usedeltabaseoffset:: false and repack. Access from old Git versions over the native protocol are unaffected by this option. +repack.packKeptObjects:: + If set to true, makes `git repack` act as if + `--pack-kept-objects` was passed. See linkgit:git-repack[1] for + details. Defaults to `false` normally, but `true` if a bitmap + index is being written (either via `--write-bitmap-index` or + `pack.writeBitmaps`). + rerere.autoupdate:: When set to true, `git-rerere` updates the index with the resulting contents after it cleanly resolves conflicts using diff --git a/Documentation/git-repack.txt b/Documentation/git-repack.txt index 002cfd5..4786a78 100644 --- a/Documentation/git-repack.txt +++ b/Documentation/git-repack.txt @@ -117,6 +117,14 @@ other objects in that pack they already have locally. must be able to refer to all reachable objects. This option overrides the setting of `pack.writebitmaps`. +--pack-kept-objects:: + Include objects in `.keep` files when repacking. Note that we + still do not delete `.keep` packs after `pack-objects` finishes. + This means that we may duplicate objects, but this makes the + option safe to use when there are concurrent pushes or fetches. + This option is generally only useful if you are writing bitmaps + with `-b` or `pack.writebitmaps`, as it ensures that the + bitmapped packfile has the necessary objects. Configuration - diff --git a/builtin/repack.c b/builtin/repack.c index 49f5857..6b0b62d 100644 --- a/builtin/repack.c +++ b/builtin/repack.c @@ -9,6 +9,7 @@ #include argv-array.h static int delta_base_offset = 1; +static int pack_kept_objects = -1; static char *packdir, *packtmp; static const char *const git_repack_usage[] = { @@ -22,6 +23,10 @@ static int repack_config(const char *var, const char *value, void *cb) delta_base_offset = git_config_bool(var, value); return 0; } + if (!strcmp(var
Re: [GSoC14][RFC] Proposal Draft: Refactor tempfile handling
On Sun, Mar 02, 2014 at 06:04:39AM +0900, Brian Gesiak wrote: My name is Brian Gesiak. I'm a research student at the University of Tokyo, and I'm hoping to participate in this year's Google Summer of Code by contributing to Git. I'm a longtime user, first-time contributor--some of you may have noticed my microproject patches.[1][2] Yes, we did notice them. Thanks, and welcome. :) The ideas page points out that while lock files are closed and unlinked[3] when the program exits[4], object pack files implement their own brand of temp file creation and deletion. This implementation doesn't share the same guarantees as lock files--it is possible that the program terminates before the temp file is unlinked.[5] Lock file references are stored in a linked list. When the program exits, this list is traversed and each file is closed and unlinked. It seems to me that this mechanism is appropriate for temp files in general, not just lock files. Thus, my proposal would be to extract this logic into a separate module--tempfile.h, perhaps. Lock and object files would share the tempfile implementation. That is, both object and lock temp files would be stored in a linked list, and all of these would be deleted at program exit. Yes, I think this is definitely the right way to go. We should be able to unify the tempfile handling for all of git. Once the logic is extracted into a nice API, there are several other places that can use it, too: - the external diff code creates tempfiles and uses its own cleanup routines - the shallow_XX tempfiles (these are not cleaned right now, though I sent a patch recently for them to do their own cleanup) Those are just off the top of my head. There may be other spots, too. It is worth thinking in your proposal about some of the things that the API will want to handle. What are the mismatches in how lockfiles and object files are handled? E.g., how do we finalize them into place? How should the API be designed to minimize race conditions (e.g., if we get a signal delivered while we are committing or cleaning up a file)? Please let me know if this seems like it would make for an interesting proposal, or if perhaps there is something I am overlooking. Any feedback at all would be appreciated. Thank you! You definitely have a grasp of what the project is aiming for, and which areas need to be touched. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] disable grafts during fetch/push/bundle
When we are creating a pack to send to a remote, we should make sure that we are not respecting grafts or replace refs. Otherwise, we may end up sending a broken pack to the other side that does not contain all objects (either omitting them entirely, or using objects that the other side does not have as delta bases). We already make an attempt to do the right thing in several places by turning off read_replace_refs. However, we missed at least one case (during bundle creation), and we do nothing anywhere to handle grafts. This patch introduces a new function to disable both grafts and replace refs both for the current process and for its children. We add a call anywhere that packs objects for sending: bundle create, upload-pack, and push. This may seem like a rather heavy hammer, as we could just teach pack-objects not to respect grafts. But this catches more possible failures: 1. We may actually feed pack-objects with the results of rev-list (e.g., bundle creation does this). 2. We may be negotiating the HAVEs and WANTs with the other side, and our view should be consistent with what we feed pack-objects. 3. It may be useful to let pack-objects use grafts in some instances, as evidenced by --keep-true-parents. Signed-off-by: Jeff King p...@peff.net --- This is motivated by a real-world case of somebody trying to push to GitHub with a graft on their local end. I suspect many other spots that use read_replace_refs = 0 probably want to disable grafts, too (e.g., fsck and index-pack) but I do not know of any particular breakage offhand. I am tempted to say that not using --keep-true-parents is insane, and it should be the default, but perhaps there is some case I'm missing. Note that disabling grafts here just turns off .git/info/grafts, which explicitly leaves the shallow file enabled (even though it ends up in the same graft list). I'm assuming that the shallow file is handled properly where it's appropriate, and it would not want to be included in any of this disabling (certainly fetch/push should be handling it explicitly already). builtin/push.c | 1 + bundle.c | 2 + cache.h | 1 + commit.c | 5 +++ commit.h | 1 + environment.c| 22 ++ t/t6051-replace-grafts-remote.sh | 94 upload-pack.c| 2 +- 8 files changed, 127 insertions(+), 1 deletion(-) create mode 100755 t/t6051-replace-grafts-remote.sh diff --git a/builtin/push.c b/builtin/push.c index 0e50ddb..448dcb5 100644 --- a/builtin/push.c +++ b/builtin/push.c @@ -527,6 +527,7 @@ int cmd_push(int argc, const char **argv, const char *prefix) OPT_END() }; + disable_grafts_and_replace_refs(); packet_trace_identity(push); git_config(git_default_config, NULL); argc = parse_options(argc, argv, prefix, options, push_usage, 0); diff --git a/bundle.c b/bundle.c index e99065c..37b00a6 100644 --- a/bundle.c +++ b/bundle.c @@ -248,6 +248,8 @@ int create_bundle(struct bundle_header *header, const char *path, struct child_process rls; FILE *rls_fout; + disable_grafts_and_replace_refs(); + bundle_to_stdout = !strcmp(path, -); if (bundle_to_stdout) bundle_fd = 1; diff --git a/cache.h b/cache.h index db223e8..f538cc2 100644 --- a/cache.h +++ b/cache.h @@ -410,6 +410,7 @@ extern const char *get_git_work_tree(void); extern const char *read_gitfile(const char *path); extern const char *resolve_gitdir(const char *suspect); extern void set_git_work_tree(const char *tree); +extern void disable_grafts_and_replace_refs(void); #define ALTERNATE_DB_ENVIRONMENT GIT_ALTERNATE_OBJECT_DIRECTORIES diff --git a/commit.c b/commit.c index 6bf4fe0..886dbfe 100644 --- a/commit.c +++ b/commit.c @@ -114,6 +114,11 @@ static unsigned long parse_commit_date(const char *buf, const char *tail) static struct commit_graft **commit_graft; static int commit_graft_alloc, commit_graft_nr; +int commit_grafts_loaded(void) +{ + return !!commit_graft_nr; +} + static int commit_graft_pos(const unsigned char *sha1) { int lo, hi; diff --git a/commit.h b/commit.h index 16d9c43..dde0618 100644 --- a/commit.h +++ b/commit.h @@ -186,6 +186,7 @@ typedef int (*each_commit_graft_fn)(const struct commit_graft *, void *); struct commit_graft *read_graft_line(char *buf, int len); int register_commit_graft(struct commit_graft *, int); struct commit_graft *lookup_commit_graft(const unsigned char *sha1); +int commit_grafts_loaded(void); extern struct commit_list *get_merge_bases(struct commit *rev1, struct commit *rev2, int cleanup); extern struct commit_list *get_merge_bases_many(struct commit *one, int n, struct commit **twos, int cleanup); diff --git a/environment.c b/environment.c index 4a3437d..b960293 100644
[RFC/PATCH] diff: simplify cpp funcname regex
The current funcname matcher for C files requires one or more words before the function name, like: static int foo(int arg) { However, some coding styles look like this: static int foo(int arg) { and we do not match, even though the default regex would. This patch simplifies the regex significantly. Rather than trying to handle all the variants of keywords and return types, we simply look for an identifier at the start of the line that contains a (, meaning it is either a function definition or a function call, and then not containing ; which would indicate it is a call or declaration. This can be fooled by something like: if (foo) but it is unlikely to find that at the very start of a line with no identation (and without having a complete set of keywords, we cannot distinguish that from a function called if taking foo anyway). We had no tests at all for .c files, so this attempts to add a few obvious cases (including the one we are fixing here). Signed-off-by: Jeff King p...@peff.net --- I tried accommodating this one case in the current regex, but it just kept getting more complicated and unreadable. Maybe I am being naive to think that this much simpler version can work. We don't have a lot of tests or a known-good data set to check. I did try git log -1000 -p before and after on git.git, diff'd the results and manually inspected. The results were a mix of strict improvement (mostly instances of the style I was trying to fix here) and cases where there is no good answer. For example, for top-level changes outside functions, we might find: N_(some text that is long that is part of: const char *foo = N_(some text that is long and spans multiple lines); Before this change, we would skip past it (using the cpp regex, that is; the default one tends to find the same line) and either report nothing, or whatever random function was before us. So it's a behavior change, but the existing behavior is really no better. t/t4018-diff-funcname.sh | 36 userdiff.c | 2 +- 2 files changed, 37 insertions(+), 1 deletion(-) diff --git a/t/t4018-diff-funcname.sh b/t/t4018-diff-funcname.sh index 38a092a..1e80b15 100755 --- a/t/t4018-diff-funcname.sh +++ b/t/t4018-diff-funcname.sh @@ -93,6 +93,29 @@ sed -e ' s/song;/song();/ ' Beer.perl Beer-correct.perl +cat Beer.c \EOF +static int foo(void) +{ +label: + int x = old; +} + +struct foo; /* catch failure below */ +static int +gnu(int arg) +{ + int x = old; +} + +struct foo; /* catch failure below */ +int multiline(int arg, + char *arg2) +{ + int x = old; +} +EOF +sed s/old/new/ Beer.c Beer-correct.c + test_expect_funcname () { lang=${2-java} test_expect_code 1 git diff --no-index -U1 \ @@ -127,6 +150,7 @@ test_expect_success 'set up .gitattributes declaring drivers to test' ' cat .gitattributes -\EOF *.java diff=java *.perl diff=perl + *.c diff=cpp EOF ' @@ -158,6 +182,18 @@ test_expect_success 'perl pattern is not distracted by forward declaration' ' test_expect_funcname package Beer;\$ perl ' +test_expect_success 'c pattern skips labels' ' + test_expect_funcname static int foo(void) c +' + +test_expect_success 'c pattern matches GNU-style functions' ' + test_expect_funcname gnu(int arg)\$ c +' + +test_expect_success 'c pattern matches multiline functions' ' + test_expect_funcname int multiline(int arg,\$ c +' + test_expect_success 'custom pattern' ' test_config diff.java.funcname !static !String diff --git a/userdiff.c b/userdiff.c index 10b61ec..b9d52b7 100644 --- a/userdiff.c +++ b/userdiff.c @@ -127,7 +127,7 @@ /* Jump targets or access declarations */ !^[ \t]*[A-Za-z_][A-Za-z_0-9]*:.*$\n /* C/++ functions/methods at top level */ -^([A-Za-z_][A-Za-z_0-9]*([ \t*]+[A-Za-z_][A-Za-z_0-9]*([ \t]*::[ \t]*[^[:space:]]+)?){1,}[ \t]*\\([^;]*)$\n +^([A-Za-z_].*\\([^;]*)$\n /* compound type at top level */ ^((struct|class|enum)[^;]*)$, /* -- */ -- 1.8.5.2.500.g8060133 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] disable grafts during fetch/push/bundle
On Tue, Mar 04, 2014 at 06:36:07PM -0500, Eric Sunshine wrote: On Tue, Mar 4, 2014 at 12:48 PM, Jeff King p...@peff.net wrote: diff --git a/commit.c b/commit.c index 6bf4fe0..886dbfe 100644 --- a/commit.c +++ b/commit.c @@ -114,6 +114,11 @@ static unsigned long parse_commit_date(const char *buf, const char *tail) static struct commit_graft **commit_graft; static int commit_graft_alloc, commit_graft_nr; +int commit_grafts_loaded(void) +{ + return !!commit_graft_nr; +} Did you mean !!commit_graft ? Shouldn't they produce the same results? -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] disable grafts during fetch/push/bundle
On Tue, Mar 04, 2014 at 12:52:18PM -0800, Junio C Hamano wrote: We already make an attempt to do the right thing in several places by turning off read_replace_refs. However, we missed at least one case (during bundle creation), and we do nothing anywhere to handle grafts. Doing nothing for grafts has been pretty much a deliberate omission. Because we have no way to transfer how histories are grafted together, people cloning from a repository that grafts away a commit that records a mistakenly committed sekrit will end up with a disjoint history, instead of exposing the sekrit to them, and are expected to join the history by recreating grafts (perhaps a README of such a project instructs them to do so). That was deemed far better than exposing the hidden history, I think. I see your point, but I would be tempted to say that the person trying to hide a secret with grafting is simply wrong to do so. You need to cement that history with a rewrite if you want to share with people. I do not recall any past discussion on this topic, and searching the archive only shows people echoing what I said above. Is this something we've promised to work in the past? I'm certainly sympathetic to systems failing to a secure default rather than doing something that the user does not expect. But at the same time, if using grafts for security isn't something people reasonably expect, then failing only hurts the non-security cases. And replace tries to do the right thing was an attempt to rectify that misfeature of grafts in that we now do have a way to transfer how the history is grafted together, so that project README does not have to instruct the fetcher of doing anything special. Perhaps the right response is grafts are broken, use git-replace instead. But then should we think about deprecating grafts? Again, this patch was spurred by a real user with a graft trying to push and getting a confusing error message. It _might_ be a misfeature, however, for the object connectivity layer to expose a part of the history replaced away to the party that fetches from such a repository. Ideally, the right thing ought to be to include history that would be omitted if we did not have the replacement (i.e. adding parents the underlying commit does not record), while not following the history that replacement wants to hide (i.e. excluding the commits replacement commits overlay). I don't really think it's worth the complexity. It's fairly common knowledge (or at least I think so) that replace refs are a _view_ onto the history. When you share the history graph, you share the true objects. You can _also_ share your views in replace/refs, but it is up to the client to fetch them. If you want to hide things, then you need to rewrite the true objects, end of story. I dunno. Maybe there are people who have different expectations. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] disable grafts during fetch/push/bundle
On Tue, Mar 04, 2014 at 08:00:44PM -0500, Eric Sunshine wrote: +int commit_grafts_loaded(void) +{ + return !!commit_graft_nr; +} Did you mean !!commit_graft ? Shouldn't they produce the same results? Yes they should, but the use of !! seemed to imply that you wanted to apply it to the pointer value. (If you indeed intended to use commit_graft_nr, then 'return commit_graft_nr', without !!, would have been sufficient and idiomatic C.) I just wanted to normalize the return value to a boolean 0/1. Even when the input is an int, it eliminates surprises when you try to assign the result to a bitfield or other smaller-width type. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: New directory lost by git am
On Wed, Mar 05, 2014 at 09:26:43AM -0500, Phillip Susi wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 3/5/2014 3:10 AM, Chris Packham wrote: My example is creating a commit on the temp branch then applying it to the master branch using git am. Do a reset HEAD~1 --hard, and git clean -x -f -d before git am. I didn't notice the missing file myself for some time because it is left in the working tree, just not added to the index and included in the commit. Right... so the file is left in the directory, even though it is not checked in. A git status should show it is an unknown file, and a clean should remove it. I don't think those steps are necessary for Chris's example. When he switches back to the master branch, git removes the subdirectory (the file is tracked in temp but not master, so we remove it when switching branches, and then the directory is empty, so we clean it up, too). You can verify with an extra ls after the checkout but before the am. * git apply parsed patches that add new files, generated by programs other than Git, incorrectly. This is an old breakage in v1.7.11. Does that sound like your problem? If you can I'd suggest updating, ideally to the recent 1.9.0 release but if you're feeling conservative try 1.8.3.4. Vaguely, except for the other than git part. This patch was generated by git-format-patch ( I didn't think apply handled patches that weren't ). I can't get Chris's script to fail on any version of git. Can you show us an example of a patch that does not behave (or better yet, a reproduction recipe to generate the patch with format-patch)? -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: New directory lost by git am
On Wed, Mar 05, 2014 at 11:47:12AM -0500, Phillip Susi wrote: I can't get Chris's script to fail on any version of git. Can you show us an example of a patch that does not behave (or better yet, a reproduction recipe to generate the patch with format-patch)? AHA! It requires a conflict. There were simple conflicts in the NEWS file so I applied the patch with git am --reject and fixed up the NEWS, and ran git am --resolved. The git am --reject fails to add the new directory to the index. Thanks, I can reproduce here. I do not think it has anything to do with being in a subdirectory; any new file does not get added to the index. In fact, I do not think we update the index at all with --reject. For example, try this: git init repo cd repo echo base conflict echo base modified git add . git commit -m base echo master conflict git add . git commit -m master git checkout -b other HEAD^ echo other conflict echo other modified echo other new git add . git commit -m other git checkout master git format-patch other -1 --stdout patch git am --reject patch Running git status -s shows: M modified ?? conflict.rej ?? new ?? patch We apply the changes to modified and new to the working tree, but we do not stage anything in the index. I suspect this is because our invocation of apply --index (which is what is doing the real work with --reject here) bails before touching the index. In theory it should be able to update the index for files that applied cleanly and leave the other ones alone. But I have not thought hard about it, so maybe there is a good reason not to (it is a little weird just because the resulting index is a partial application of the patch). The am -3 path does what you want here, but it is much simpler: it knows it can represent the 3-way conflict in the index. So the index represents the complete state of the patch application at the end, including conflicts. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/3] push: detect local refspec errors early
On Wed, Mar 05, 2014 at 01:36:12PM +0400, Dmitry wrote: Here's my usecase. I have created a BranchWithVeryLongName and I want to have it pushed to origin. So I use this command with version 1.8.1.2: git push origin BranchMistypedLongName (note that I mistyped the branch name). The following happens: 1. git asks me for username and password 2. it authenticates on the origin server 3. it bails out with error: sfc refspec BranchMistypedLongName does not match any Can't git perhaps check that the branch exists before it asks me for credentials and just say there's no such branch? We can't fully process the refspecs until we have talked to the other side, because they may involve matching refs from the remote; I don't think git even really looks at them until after we've connected. But I think there are some obvious cases, like a bogus left-hand side (i.e., what you have here) that cannot ever succeed, no matter what the other side has. We could sanity check the refspecs before doing anything else. Here's a patch series that does that. [1/3]: match_explicit: hoist refspec lhs checks into their own function [2/3]: match_explicit_lhs: allow a verify only mode [3/3]: push: detect local refspec errors early -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/3] match_explicit: hoist refspec lhs checks into their own function
In preparation for being able to check the left-hand side of our push refspecs separately, this pulls the examination of them out into its own function. There should be no behavior change. Signed-off-by: Jeff King p...@peff.net --- remote.c | 49 ++--- 1 file changed, 30 insertions(+), 19 deletions(-) diff --git a/remote.c b/remote.c index e41251e..6aa9dd2 100644 --- a/remote.c +++ b/remote.c @@ -1096,12 +1096,36 @@ static char *guess_ref(const char *name, struct ref *peer) return strbuf_detach(buf, NULL); } +static int match_explicit_lhs(struct ref *src, + struct refspec *rs, + struct ref **match, + int *allocated_match) +{ + switch (count_refspec_match(rs-src, src, match)) { + case 1: + *allocated_match = 0; + return 0; + case 0: + /* The source could be in the get_sha1() format +* not a reference name. :refs/other is a +* way to delete 'other' ref at the remote end. +*/ + *match = try_explicit_object_name(rs-src); + if (!*match) + return error(src refspec %s does not match any., rs-src); + *allocated_match = 1; + return 0; + default: + return error(src refspec %s matches more than one., rs-src); + } +} + static int match_explicit(struct ref *src, struct ref *dst, struct ref ***dst_tail, struct refspec *rs) { struct ref *matched_src, *matched_dst; - int copy_src; + int allocated_src; const char *dst_value = rs-dst; char *dst_guess; @@ -1110,23 +1134,8 @@ static int match_explicit(struct ref *src, struct ref *dst, return 0; matched_src = matched_dst = NULL; - switch (count_refspec_match(rs-src, src, matched_src)) { - case 1: - copy_src = 1; - break; - case 0: - /* The source could be in the get_sha1() format -* not a reference name. :refs/other is a -* way to delete 'other' ref at the remote end. -*/ - matched_src = try_explicit_object_name(rs-src); - if (!matched_src) - return error(src refspec %s does not match any., rs-src); - copy_src = 0; - break; - default: - return error(src refspec %s matches more than one., rs-src); - } + if (match_explicit_lhs(src, rs, matched_src, allocated_src) 0) + return -1; if (!dst_value) { unsigned char sha1[20]; @@ -1171,7 +1180,9 @@ static int match_explicit(struct ref *src, struct ref *dst, return error(dst ref %s receives from more than one src., matched_dst-name); else { - matched_dst-peer_ref = copy_src ? copy_ref(matched_src) : matched_src; + matched_dst-peer_ref = allocated_src ? + matched_src : + copy_ref(matched_src); matched_dst-force = rs-force; } return 0; -- 1.8.5.2.500.g8060133 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/3] match_explicit_lhs: allow a verify only mode
The match_explicit_lhs function has all of the logic necessary to verify the refspecs without actually doing any work. This patch lets callers pass a NULL match pointer to indicate they want a verify only operation. For the most part, we just need to avoid writing to the NULL pointer. However, we also have to refactor the try_explicit_object_name sub-function; it indicates success by allocating and returning a new ref. Instead, we give it an out parameter for the match and return a numeric status. Signed-off-by: Jeff King p...@peff.net --- remote.c | 38 -- 1 file changed, 24 insertions(+), 14 deletions(-) diff --git a/remote.c b/remote.c index 6aa9dd2..b6586c0 100644 --- a/remote.c +++ b/remote.c @@ -1031,11 +1031,13 @@ int count_refspec_match(const char *pattern, } } if (!matched) { - *matched_ref = matched_weak; + if (matched_ref) + *matched_ref = matched_weak; return weak_match; } else { - *matched_ref = matched; + if (matched_ref) + *matched_ref = matched; return match; } } @@ -1055,18 +1057,25 @@ static struct ref *alloc_delete_ref(void) return ref; } -static struct ref *try_explicit_object_name(const char *name) +static int try_explicit_object_name(const char *name, + struct ref **match) { unsigned char sha1[20]; - struct ref *ref; - if (!*name) - return alloc_delete_ref(); + if (!*name) { + if (match) + *match = alloc_delete_ref(); + return 0; + } + if (get_sha1(name, sha1)) - return NULL; - ref = alloc_ref(name); - hashcpy(ref-new_sha1, sha1); - return ref; + return -1; + + if (match) { + *match = alloc_ref(name); + hashcpy((*match)-new_sha1, sha1); + } + return 0; } static struct ref *make_linked_ref(const char *name, struct ref ***tail) @@ -1103,17 +1112,18 @@ static int match_explicit_lhs(struct ref *src, { switch (count_refspec_match(rs-src, src, match)) { case 1: - *allocated_match = 0; + if (allocated_match) + *allocated_match = 0; return 0; case 0: /* The source could be in the get_sha1() format * not a reference name. :refs/other is a * way to delete 'other' ref at the remote end. */ - *match = try_explicit_object_name(rs-src); - if (!*match) + if (try_explicit_object_name(rs-src, match) 0) return error(src refspec %s does not match any., rs-src); - *allocated_match = 1; + if (allocated_match) + *allocated_match = 1; return 0; default: return error(src refspec %s matches more than one., rs-src); -- 1.8.5.2.500.g8060133 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/3] push: detect local refspec errors early
When pushing, we do not even look at our push refspecs until after we have made contact with the remote receive-pack and gotten its list of refs. This means that we may go to some work, including asking the user to log in, before realizing we have simple errors like git push origin matser. We cannot catch all refspec problems, since fully evaluating the refspecs requires knowing what the remote side has. But we can do a quick sanity check of the local side and catch a few simple error cases. Signed-off-by: Jeff King p...@peff.net --- remote.c | 25 + remote.h | 1 + t/t5529-push-errors.sh | 48 transport.c| 8 ++-- 4 files changed, 80 insertions(+), 2 deletions(-) create mode 100755 t/t5529-push-errors.sh diff --git a/remote.c b/remote.c index b6586c0..8471fb1 100644 --- a/remote.c +++ b/remote.c @@ -1374,6 +1374,31 @@ static void prepare_ref_index(struct string_list *ref_index, struct ref *ref) } /* + * Given only the set of local refs, sanity-check the set of push + * refspecs. We can't catch all errors that match_push_refs would, + * but we can catch some errors early before even talking to the + * remote side. + */ +int check_push_refs(struct ref *src, int nr_refspec, const char **refspec_names) +{ + struct refspec *refspec = parse_push_refspec(nr_refspec, refspec_names); + int ret = 0; + int i; + + for (i = 0; i nr_refspec; i++) { + struct refspec *rs = refspec + i; + + if (rs-pattern || rs-matching) + continue; + + ret |= match_explicit_lhs(src, rs, NULL, NULL); + } + + free_refspec(nr_refspec, refspec); + return ret; +} + +/* * Given the set of refs the local repository has, the set of refs the * remote repository has, and the refspec used for push, determine * what remote refs we will update and with what value by setting diff --git a/remote.h b/remote.h index fb7647f..917d383 100644 --- a/remote.h +++ b/remote.h @@ -166,6 +166,7 @@ extern int query_refspecs(struct refspec *specs, int nr, struct refspec *query); char *apply_refspecs(struct refspec *refspecs, int nr_refspec, const char *name); +int check_push_refs(struct ref *src, int nr_refspec, const char **refspec); int match_push_refs(struct ref *src, struct ref **dst, int nr_refspec, const char **refspec, int all); void set_ref_status_for_push(struct ref *remote_refs, int send_mirror, diff --git a/t/t5529-push-errors.sh b/t/t5529-push-errors.sh new file mode 100755 index 000..9871307 --- /dev/null +++ b/t/t5529-push-errors.sh @@ -0,0 +1,48 @@ +#!/bin/sh + +test_description='detect some push errors early (before contacting remote)' +. ./test-lib.sh + +test_expect_success 'setup commits' ' + test_commit one +' + +test_expect_success 'setup remote' ' + git init --bare remote.git + git remote add origin remote.git +' + +test_expect_success 'setup fake receive-pack' ' + FAKE_RP_ROOT=$(pwd) + export FAKE_RP_ROOT + write_script fake-rp -\EOF + echo yes $FAKE_RP_ROOT/rp-ran + exit 1 + EOF + git config remote.origin.receivepack \\$FAKE_RP_ROOT/fake-rp\ +' + +test_expect_success 'detect missing branches early' ' + echo no rp-ran + echo no expect + test_must_fail git push origin missing + test_cmp expect rp-ran +' + +test_expect_success 'detect missing sha1 expressions early' ' + echo no rp-ran + echo no expect + test_must_fail git push origin master~2:master + test_cmp expect rp-ran +' + +test_expect_success 'detect ambiguous refs early' ' + git branch foo + git tag foo + echo no rp-ran + echo no expect + test_must_fail git push origin foo + test_cmp expect rp-ran +' + +test_done diff --git a/transport.c b/transport.c index ca7bb44..325f03e 100644 --- a/transport.c +++ b/transport.c @@ -1132,8 +1132,7 @@ int transport_push(struct transport *transport, return transport-push(transport, refspec_nr, refspec, flags); } else if (transport-push_refs) { - struct ref *remote_refs = - transport-get_refs_list(transport, 1); + struct ref *remote_refs; struct ref *local_refs = get_local_heads(); int match_flags = MATCH_REFS_NONE; int verbose = (transport-verbose 0); @@ -1142,6 +1141,11 @@ int transport_push(struct transport *transport, int pretend = flags TRANSPORT_PUSH_DRY_RUN; int push_ret, ret, err; + if (check_push_refs(local_refs, refspec_nr, refspec) 0) + return -1; + + remote_refs = transport-get_refs_list(transport, 1); + if (flags TRANSPORT_PUSH_ALL) match_flags
Re: Negation in refspecs
On Wed, Mar 05, 2014 at 10:06:26AM -0800, Mickey Killianey wrote: Is there any syntax to support partial negations of refspecs, such as: +refs/heads/*:refs/remotes/origin/* !refs/heads/dont-pull: !:refs/remotes/origin/dont-push If not now, is negation something that might be possible/reasonable in a future version of Git, or is it difficult/unlikely to change? No, it doesn't exist now. It's something that people have asked for occasionally, but I don't think anybody is actively working on it. I posted a rough patch here: http://thread.gmane.org/gmane.comp.version-control.git/240997/focus=241057 about a month ago, but I do not have any immediate plans to take it further (that email details some of the issues). If you're interested in working on it, I think people are receptive to the idea; it just needs a clean implementation. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] disable grafts during fetch/push/bundle
On Wed, Mar 05, 2014 at 11:18:17AM -0800, Junio C Hamano wrote: Jeff King p...@peff.net writes: On Wed, Mar 05, 2014 at 10:49:24AM -0800, Junio C Hamano wrote: ... the plan, at least in my mind, has always been exactly that: grafts were a nice little attempt but is broken---if you really wanted to muck with the history without rewriting (which is still discouraged, by the way), do not use graft, but use replace. I certainly had in the back of my mind that grafts were a lesser form of replace, and that eventually we could get rid of the former. Perhaps my question should have been: why haven't we deprecated grafts yet?. Given that we discourage grafts strongly and replace less so (but still discourage it), telling the users that biting the bullet and rewriting the history is _the_ permanent solution, I think it is understandable why nobody has bothered to. Perhaps the patch below would help discourage grafts more? The notable place in the documentation where grafts are still used is git-filter-branch.txt. But since the example there is about cementing rewritten history, it might be OK to leave. I used outdated below. We could also up the ante to deprecated. -- 8 -- Subject: [PATCH] docs: mark info/grafts as outdated We should be encouraging people to use git-replace instead. Signed-off-by: Jeff King p...@peff.net --- Documentation/gitrepository-layout.txt | 4 Documentation/glossary-content.txt | 4 2 files changed, 8 insertions(+) diff --git a/Documentation/gitrepository-layout.txt b/Documentation/gitrepository-layout.txt index aa03882..17d2ea6 100644 --- a/Documentation/gitrepository-layout.txt +++ b/Documentation/gitrepository-layout.txt @@ -176,6 +176,10 @@ info/grafts:: per line describes a commit and its fake parents by listing their 40-byte hexadecimal object names separated by a space and terminated by a newline. ++ +Note that the grafts mechanism is outdated and can lead to problems +transferring objects between repositories; see linkgit:git-replace[1] +for a more flexible and robust system to do the same thing. info/exclude:: This file, by convention among Porcelains, stores the diff --git a/Documentation/glossary-content.txt b/Documentation/glossary-content.txt index 378306f..be0858c 100644 --- a/Documentation/glossary-content.txt +++ b/Documentation/glossary-content.txt @@ -176,6 +176,10 @@ current branch integrates with) obviously do not work, as there is no you can make Git pretend the set of def_parent,parents a def_commit,commit has is different from what was recorded when the commit was created. Configured via the `.git/info/grafts` file. ++ +Note that the grafts mechanism is outdated and can lead to problems +transferring objects between repositories; see linkgit:git-replace[1] +for a more flexible and robust system to do the same thing. [[def_hash]]hash:: In Git's context, synonym for def_object_name,object name. -- 1.8.5.2.500.g8060133 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/3] push: detect local refspec errors early
On Wed, Mar 05, 2014 at 12:51:06PM -0800, Junio C Hamano wrote: Jeff King p...@peff.net writes: We can't fully process the refspecs until we have talked to the other side, because they may involve matching refs from the remote; I don't think git even really looks at them until after we've connected. But I think there are some obvious cases, like a bogus left-hand side (i.e., what you have here) that cannot ever succeed, no matter what the other side has. We could sanity check the refspecs before doing anything else. The user's wallclock time is more important than machine cycles, checking things we could check before having the user do things is a good principle to follow. I wish that the solution did not have to involve doing the same computation twice, but I do not think there is a clean way around that in this codepath. Yeah, there are two inefficiencies here: 1. We parse the refspecs twice. In theory we could parse them once, feed the result to check_push_refspecs, then again to match_push_refspecs. That wouldn't be too hard a refactor. 2. We match the src side to local refs twice. Getting rid of this would involve splitting match_push_refs into two (analyzing the src half and the dst half), somehow storing the intermediate the two calls, and only contacting the remote after the first step is done. This is probably trickier. I'd be happy if somebody wanted to do those cleanups on top, but I don't personally have an interest in spending time on them. The amount of duplicated computation we're talking about here is not very much (and the number of refspecs tends to be small, anyway). -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Unable to shrink repository size
On Wed, Mar 05, 2014 at 08:55:30PM -0600, Robert Dailey wrote: What I'd like to do is somehow hunt down the largest commit (*not* blob) in the entire history of the repository to hopefully find out where huge directories have been checked in. I can't do a search for largest file (which most google results seem to show to do) since the culprit is really thousands of unnecessary files checked into a single subdirectory somewhere in history. Other people have offered scripts to look at commit sizes. But it might also be useful to see sizes by subdirectory. Sort of a du across all of history. Script is below. Note that this script also uses cat-file's %(objectsize:disk). So it is finding the actual on-disk storage, taking into account delta storage. You will need git v1.8.5 or later to use this feature. git rev-list --objects --all | git cat-file --batch-check=%(objectsize:disk) %(rest) | perl -lne ' my ($size, $path) = split / /, $_, 2; next unless defined $path; # commit obj do { $sizes{$path} += $size; } while ($path =~ s{/[^/]+$}{}); END { print $sizes{$_} $_ for (keys %sizes) } ' | sort -rn -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] disable grafts during fetch/push/bundle
On Thu, Mar 06, 2014 at 09:42:46AM +0100, Michael Haggerty wrote: Replace objects are better than grafts in *almost* every dimension. The exception is that it is dead simple to create grafts, whereas I always have to break open the man pages to remember how to create a replace object that does the same thing. So I think a helpful step towards deprecating grafts would be to offer a couple of convenience features to help people kick the grafts habit: I agree that better tool support would make git replace more pleasant to use. * A tool that converts grafts (i.e., the grafts read from $GIT_DIR/info/grafts) into the equivalent replacements. I don't know if this is strictly necessary, if we make your command below pleasant to use. I.e., it should just be: while read sha1 parents; do git replace --graft $sha1 $parents done .git/info/grafts We can wrap that in git replace --convert-grafts, but I do not think grafts are so common that there would be a big demand for it. * A tool that creates a new replacement object that is the equivalent of a graft. I.e., it should do, using replace references, the equivalent of the following command: echo SHA1 [PARENT1...] $GIT_DIR/info/grafts These features could be added to git replace or could be built into a new git grafts command. I think it would be nice to have a set of mode options for git-replace to do basic editing of a sha1 and install the result (technically you could split the editing into a separate command, but I do not see the point in editing a sha1 and then _not_ replacing it). Perhaps: # pretty-print sha1 based on type, start $EDITOR, create a # type-appropriate object from the result (e.g., using hash-object, # mktree, or mktag), and then set up the object as a replacement for # SHA1 git replace --edit SHA1 # ditto, but replace the $EDITOR step with the parent list git replace --graft SHA1 PARENT1 PARENT2 # ...or remove entries from a tree git replace --remove-entry SHA1 foo bar -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] disable grafts during fetch/push/bundle
On Thu, Mar 06, 2014 at 05:41:27PM +0100, Michael Haggerty wrote: We can wrap that in git replace --convert-grafts, but I do not think grafts are so common that there would be a big demand for it. It's probably easier to wrap it than to explain to Windows users what they have to do. How would Windows users get a graft file in the first-place? There's no GUI for it! ;) It should be easy to do --convert-grafts, though, and I think it fits into the scheme we're discussing below. I think it would be nice to have a set of mode options for git-replace to do basic editing of a sha1 and install the result (technically you could split the editing into a separate command, but I do not see the point in editing a sha1 and then _not_ replacing it). If modifying without replacing is needed, it would be pretty easy to add an option --stdout that writes the SHA1 of the modified object to stdout instead of creating a replace reference. That way what you want 95% of the time is the default but there is still an escape hatch. Agreed. I had originally though that perhaps something like this should be part of hash-object, and that replace should farm out the work. But thinking on it more, it doesn't really make sense as part of hash-object. Perhaps: # pretty-print sha1 based on type, start $EDITOR, create a # type-appropriate object from the result (e.g., using hash-object, # mktree, or mktag), and then set up the object as a replacement for # SHA1 git replace --edit SHA1 Here's a rough series that gets us this far: [1/4]: replace: refactor command-mode determination [2/4]: replace: use OPT_CMDMODE to handle modes [3/4]: replace: factor object resolution out of replace_object [4/4]: replace: add --edit option It shouldn't be too hard to do --graft or --convert-grafts on top. I also noticed that doing: git replace foo foo is less than friendly (we notice the cycle, but just barf). It's especially easy to do with git replace --edit, if you just exit the editor without making changes. Or if you make changes to an already-replaced object to revert it back, in which case we would want to notice and delete the replacement. So I think we want to have git replace foo foo silently converted into git replace -d foo (but without an error if there is no existing replacement), and then --edit will just do the right thing, as it's built on top. I also noticed that the diff engine does not play well with replacements of blobs. When we are diffing the trees, we see that the sha1 for path foo is the same on either side, and do not look further, even though feeding those sha1s to builtin_diff would fetch the replacements. I think compare_tree_entry would have to learn lookup_replace_object (and I suspect it would make tree diffs noticeably slower when you have even one replace ref). git replace could support some of the options that git filter-branch can take, like --env-filter, --msg-filter, etc. (at least if the target is a commit object). All of this would make it possible to build up the changes that you want to integrate via filter-branch piecemeal instead of having to have a single monster filter-branch invocation. For example, Right. I was tempted to suggest that, too, but I think it can get rather tricky, as you need to replace in a loop, and sometimes the exact objects you need aren't obvious. For example, a common use of --index-filter is to remove a single file. But to remove foo/bar/baz, you would need to loop over each commit, find the tree for foo/bar, and then remove the baz entry in Still, I really like the workflow of having decent replace tools, followed by cementing the changes into place with a filter-branch run (which, btw, does not yet know how to cement trees and blobs into place). It lets you work on the filtering incrementally, and even share or work collaboratively on it by pushing refs/replace). And as you mention, it could be a heck of a lot faster than what we have now. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[RFC/PATCH 1/4] replace: refactor command-mode determination
The git-replace command has three modes: listing, deleting, and replacing. The first two are selected explicitly. If none is selected, we fallback to listing when there are no arguments, and replacing otherwise. Let's figure out up front which operation we are going to do, before getting into the application logic. That lets us simplify our option checks (e.g., we currently have to check whether a useless --force is given both along with an explicit list, as well as with an implicit one). This saves some lines, makes the logic easier to follow, and will facilitate further cleanups. Signed-off-by: Jeff King p...@peff.net --- builtin/replace.c | 14 ++ 1 file changed, 6 insertions(+), 8 deletions(-) diff --git a/builtin/replace.c b/builtin/replace.c index 2336325..6a0e8bd 100644 --- a/builtin/replace.c +++ b/builtin/replace.c @@ -182,12 +182,16 @@ int cmd_replace(int argc, const char **argv, const char *prefix) argc = parse_options(argc, argv, prefix, options, git_replace_usage, 0); + if (!list !delete) + if (!argc) + list = 1; + if (list delete) usage_msg_opt(-l and -d cannot be used together, git_replace_usage, options); - if (format delete) - usage_msg_opt(--format and -d cannot be used together, + if (format !list) + usage_msg_opt(--format cannot be used when not listing, git_replace_usage, options); if (force (list || delete)) @@ -207,9 +211,6 @@ int cmd_replace(int argc, const char **argv, const char *prefix) if (argc != 2) usage_msg_opt(bad number of arguments, git_replace_usage, options); - if (format) - usage_msg_opt(--format cannot be used when not listing, - git_replace_usage, options); return replace_object(argv[0], argv[1], force); } @@ -217,9 +218,6 @@ int cmd_replace(int argc, const char **argv, const char *prefix) if (argc 1) usage_msg_opt(only one pattern can be given with -l, git_replace_usage, options); - if (force) - usage_msg_opt(-f needs some arguments, - git_replace_usage, options); return list_replace_refs(argv[0], format); } -- 1.8.5.2.500.g8060133 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[RFC/PATCH 2/4] replace: use OPT_CMDMODE to handle modes
By using OPT_CMDMODE, the mutual exclusion between modes is taken care of for us. It also makes it easy for us to maintain a single variable with the mode, which makes its intent more clear. We can use a single switch() to make sure we have covered all of the modes. This ends up breaking even in code size, but the win will be much bigger when we start adding more modes. Signed-off-by: Jeff King p...@peff.net --- builtin/replace.c | 49 + 1 file changed, 25 insertions(+), 24 deletions(-) diff --git a/builtin/replace.c b/builtin/replace.c index 6a0e8bd..0b5cb17 100644 --- a/builtin/replace.c +++ b/builtin/replace.c @@ -168,11 +168,17 @@ static int replace_object(const char *object_ref, const char *replace_ref, int cmd_replace(int argc, const char **argv, const char *prefix) { - int list = 0, delete = 0, force = 0; + int force = 0; const char *format = NULL; + enum { + MODE_UNSPECIFIED = 0, + MODE_LIST, + MODE_DELETE, + MODE_REPLACE + } cmdmode = MODE_UNSPECIFIED; struct option options[] = { - OPT_BOOL('l', list, list, N_(list replace refs)), - OPT_BOOL('d', delete, delete, N_(delete replace refs)), + OPT_CMDMODE('l', list, cmdmode, N_(list replace refs), MODE_LIST), + OPT_CMDMODE('d', delete, cmdmode, N_(delete replace refs), MODE_DELETE), OPT_BOOL('f', force, force, N_(replace the ref if it exists)), OPT_STRING(0, format, format, N_(format), N_(use this format)), OPT_END() @@ -182,42 +188,37 @@ int cmd_replace(int argc, const char **argv, const char *prefix) argc = parse_options(argc, argv, prefix, options, git_replace_usage, 0); - if (!list !delete) - if (!argc) - list = 1; + if (!cmdmode) + cmdmode = argc ? MODE_REPLACE : MODE_DELETE; - if (list delete) - usage_msg_opt(-l and -d cannot be used together, - git_replace_usage, options); - - if (format !list) + if (format cmdmode != MODE_LIST) usage_msg_opt(--format cannot be used when not listing, git_replace_usage, options); - if (force (list || delete)) - usage_msg_opt(-f cannot be used with -d or -l, + if (force cmdmode != MODE_REPLACE) + usage_msg_opt(-f only makes sense when writing a replacement, git_replace_usage, options); - /* Delete refs */ - if (delete) { + switch (cmdmode) { + case MODE_DELETE: if (argc 1) usage_msg_opt(-d needs at least one argument, git_replace_usage, options); return for_each_replace_name(argv, delete_replace_ref); - } - /* Replace object */ - if (!list argc) { + case MODE_REPLACE: if (argc != 2) usage_msg_opt(bad number of arguments, git_replace_usage, options); return replace_object(argv[0], argv[1], force); - } - /* List refs, even if list is not set */ - if (argc 1) - usage_msg_opt(only one pattern can be given with -l, - git_replace_usage, options); + case MODE_LIST: + if (argc 1) + usage_msg_opt(only one pattern can be given with -l, + git_replace_usage, options); + return list_replace_refs(argv[0], format); - return list_replace_refs(argv[0], format); + default: + die(BUG: invalid cmdmode %d, (int)cmdmode); + } } -- 1.8.5.2.500.g8060133 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[RFC/PATCH 3/4] replace: factor object resolution out of replace_object
As we add new options that operate on objects before replacing them, we'll want to be able to feed raw sha1s straight into replace_object. Split replace_object into the object-resolution part and the actual replacement. Signed-off-by: Jeff King p...@peff.net --- builtin/replace.c | 26 ++ 1 file changed, 18 insertions(+), 8 deletions(-) diff --git a/builtin/replace.c b/builtin/replace.c index 0b5cb17..a090302 100644 --- a/builtin/replace.c +++ b/builtin/replace.c @@ -123,19 +123,17 @@ static int delete_replace_ref(const char *name, const char *ref, return 0; } -static int replace_object(const char *object_ref, const char *replace_ref, - int force) +static int replace_object_sha1(const char *object_ref, + unsigned char object[20], + const char *replace_ref, + unsigned char repl[20], + int force) { - unsigned char object[20], prev[20], repl[20]; + unsigned char prev[20]; enum object_type obj_type, repl_type; char ref[PATH_MAX]; struct ref_lock *lock; - if (get_sha1(object_ref, object)) - die(Failed to resolve '%s' as a valid ref., object_ref); - if (get_sha1(replace_ref, repl)) - die(Failed to resolve '%s' as a valid ref., replace_ref); - if (snprintf(ref, sizeof(ref), refs/replace/%s, sha1_to_hex(object)) sizeof(ref) - 1) @@ -166,6 +164,18 @@ static int replace_object(const char *object_ref, const char *replace_ref, return 0; } +static int replace_object(const char *object_ref, const char *replace_ref, int force) +{ + unsigned char object[20], repl[20]; + + if (get_sha1(object_ref, object)) + die(Failed to resolve '%s' as a valid ref., object_ref); + if (get_sha1(replace_ref, repl)) + die(Failed to resolve '%s' as a valid ref., replace_ref); + + return replace_object_sha1(object_ref, object, replace_ref, repl, force); +} + int cmd_replace(int argc, const char **argv, const char *prefix) { int force = 0; -- 1.8.5.2.500.g8060133 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[RFC/PATCH 4/4] replace: add --edit option
This allows you to run: git replace --edit SHA1 to get dumped in an editor with the contents of the object for SHA1. The result is then read back in and used as a replace object for SHA1. The writing/reading is type-aware, so you get to edit ls-tree output rather than the binary tree format. Missing documentation and tests. Signed-off-by: Jeff King p...@peff.net --- Besides missing docs and tests, we might find that we want to factor the code a little differently when we start adding other helpers (like --graft). I will probably push this forward at some point, but I'm not planning on working on it for the rest of the day, so if you want to pick it up as a base in the meantime and try --graft, --env-filter, or anything else clever on top, please go ahead. builtin/replace.c | 110 +- 1 file changed, 109 insertions(+), 1 deletion(-) diff --git a/builtin/replace.c b/builtin/replace.c index a090302..3ed5f75 100644 --- a/builtin/replace.c +++ b/builtin/replace.c @@ -12,6 +12,7 @@ #include builtin.h #include refs.h #include parse-options.h +#include run-command.h static const char * const git_replace_usage[] = { N_(git replace [-f] object replacement), @@ -176,6 +177,105 @@ static int replace_object(const char *object_ref, const char *replace_ref, int f return replace_object_sha1(object_ref, object, replace_ref, repl, force); } +/* + * Write the contents of the object named by sha1 to the file filename, + * pretty-printed for human editing based on its type. + */ +static void export_object(const unsigned char *sha1, const char *filename) +{ + const char *argv[] = { cat-file, -p, NULL, NULL }; + struct child_process cmd = { argv }; + int fd; + + fd = open(filename, O_WRONLY | O_CREAT | O_TRUNC, 0666); + if (fd 0) + die_errno(unable to open %s for writing, filename); + + argv[2] = sha1_to_hex(sha1); + cmd.git_cmd = 1; + cmd.out = fd; + + if (run_command(cmd)) + die(cat-file reported failure); + + close(fd); +} + +/* + * Read a previously-exported (and possibly edited) object back from filename, + * interpreting it as type, and writing the result to the object database. + * The sha1 of the written object is returned via sha1. + */ +static void import_object(unsigned char *sha1, enum object_type type, + const char *filename) +{ + int fd; + + fd = open(filename, O_RDONLY); + if (fd 0) + die_errno(unable to open %s for reading, filename); + + if (type == OBJ_TREE) { + const char *argv[] = { mktree, NULL }; + struct child_process cmd = { argv }; + struct strbuf result = STRBUF_INIT; + + cmd.argv = argv; + cmd.git_cmd = 1; + cmd.in = fd; + cmd.out = -1; + + if (start_command(cmd)) + die(unable to spawn mktree); + + if (strbuf_read(result, cmd.out, 41) 0) + die_errno(unable to read from mktree); + close(cmd.out); + + if (finish_command(cmd)) + die(mktree reported failure); + if (get_sha1_hex(result.buf, sha1) 0) + die(mktree did not return an object name); + } else { + struct stat st; + int flags = HASH_FORMAT_CHECK | HASH_WRITE_OBJECT; + + if (fstat(fd, st) 0) + die_errno(unable to fstat %s, filename); + if (index_fd(sha1, fd, st, type, NULL, flags) 0) + die(unable to write object to database); + /* index_fd close()s fd for us */ + } + + /* +* No need to close(fd) here; both run-command and index-fd +* will have done it for us. +*/ +} + +static int edit_and_replace(const char *object_ref, int force) +{ + char *tmpfile = git_pathdup(REPLACE_EDITOBJ); + enum object_type type; + unsigned char old[20], new[20]; + + if (get_sha1(object_ref, old) 0) + die(Not a valid object name: '%s', object_ref); + + type = sha1_object_info(old, NULL); + if (type 0) + die(unable to get object type for %s, sha1_to_hex(old)); + + export_object(old, tmpfile); + if (launch_editor(tmpfile, NULL, NULL) 0) + die(editing object file failed); + import_object(new, type, tmpfile); + + free(tmpfile); + + return replace_object_sha1(object_ref, old, replacement, new, force); +} + int cmd_replace(int argc, const char **argv, const char *prefix) { int force = 0; @@ -184,11 +284,13 @@ int cmd_replace(int argc, const char **argv, const char *prefix) MODE_UNSPECIFIED = 0, MODE_LIST, MODE_DELETE, + MODE_EDIT
Re: [RFC/PATCH 2/4] replace: use OPT_CMDMODE to handle modes
On Thu, Mar 06, 2014 at 07:35:19PM +0100, Christian Couder wrote: + if (!cmdmode) + cmdmode = argc ? MODE_REPLACE : MODE_DELETE; Shouldn't it be MODE_LIST instead of MODE_DELETE? Argh, yes, thank you for catching. My original iteration used chars like 'd' and 'l' (similar to other uses of OPT_CMDMODE). I switched it at the last minute to an enum, and somehow thinko'd that completely (and obviously did not run the tests again afterwards). -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] disable grafts during fetch/push/bundle
On Thu, Mar 06, 2014 at 11:00:08AM -0800, Junio C Hamano wrote: Jeff King p...@peff.net writes: I also noticed that the diff engine does not play well with replacements of blobs. When we are diffing the trees, we see that the sha1 for path foo is the same on either side, and do not look further, even though feeding those sha1s to builtin_diff would fetch the replacements. Sorry, I do not quite understand. In git diff A B -- path, if the object name recorded for A:path and B:path are the same, but the replacement mechanism maps the object name for that blob object to some other blob object, wouldn't the result be empty because both sides replace to the same thing anyway? Oh, right, I was being stupid. I did: git replace --edit HEAD:some-file and expected git show to find the diff. But that doesn't make sense. On top of that, I need to do: git replace --edit HEAD^{tree} to replace the sha1 of the entry in the tree. In which case diff would find it just fine. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RFC GSoC idea: git configuration caching (needs co-mentor!)
On Thu, Mar 06, 2014 at 11:24:18AM -0800, Junio C Hamano wrote: * Add new API calls that allow the cache to be inquired easily and efficiently. Rewrite other functions like `git_config_int()` to be cache-aware. Are you sure about the second sentence of this item is what you want? git_config_type(name, value) are all about parsing value (string or NULL) as type, return the parsed value or complain against a bad value for name. They do not care where these name and value come from right now, and there is no reason for them to start caring about caching. They will still be the underlying helper functions the git_config() callbacks will depend on even after the second item in your list happens. Yeah, I agree we want a _new_ set of helpers for retrieving values in a non-callback way. We could call those helpers git_config_int (and rename the existing pure functions), but I'd rather not, as it simply invites confusion with the existing ones. A set of new API calls would look more like this, I would think: extern int git_get_config_string_multi(const char *, int *, const char ***); Not important at this stage, but I was hoping we could keep the names of the new helpers shorter. :) const char *git_get_config_string(const char *name) { const char **values, *result; int num_values; if (git_get_config_string_multi(sample.str, num_values, values)) return NULL; result = num_values ? values[num_values - 1] : NULL; free(values); return result; } that implements the last one wins semantics. The real thing would need to avoid allocation and free overhead. One of the things that needs to be figured out by the student is the format of the internal cache. I had actually envisioned a mapping of keys to values, where values are represented as a full list of strings. Then your string_multi can just return a pointer to that list, and a last-one-wins lookup can grab the final value, with no allocation or ownership complexity. We'd lose the relative order of different config keys, but those should never be important (only the order of single keys, but that is reflected in the order of the value list). Another approach would be to actually represent the syntax tree of the config file in memory. That would make lookups of individual keys more expensive, but would enable other manipulation. E.g., if your syntax tree included nodes for comments and other non-semantic constructs, then we can use it for a complete rewrite. And git config becomes: 1. Read the tree. 2. Perform operations on the tree (add nodes, delete nodes, etc). 3. Write out the tree. and things like remove the section header when the last item in the section is removed become trivial during step 2. But comparing those approaches is something for the student to figure out, I think. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Testing for commit reachability through plumbing commands
On Thu, Mar 06, 2014 at 12:17:34PM -0500, Martin Langhoff wrote: I have a shell script that trims old history on a cronjob. This is for a repo that is used to track reports that have limited life (like logs). Old history is trimmed with grafts pointing to an empty root commit. Right now, info/graft grows unbound. I am looking for a way to trim unreachable grafts, I would like to be able to say something like: git is-reachable treeish How about: git rev-list --objects --all | cut -d' ' -f1 | grep $(git rev-parse treeish) Add --reflog to the rev-list invocation if you want to catch things referenced by the reflog, too. If you're looking for a commit, you can drop the --objects, and it will run much faster. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] t3200-branch: test setting branch as own upstream
On Wed, Mar 05, 2014 at 04:31:55PM +0900, Brian Gesiak wrote: No test asserts that git branch -u refs/heads/my-branch my-branch emits a warning. Add a test that does so. Signed-off-by: Brian Gesiak modoca...@gmail.com Thanks, this looks good. Two minor points that may or may not be worth addressing: +test_expect_success '--set-upstream-to shows warning if used to set branch as own upstream' ' + git branch --set-upstream-to refs/heads/my13 my13 2actual + cat expected EOF +warning: Not setting branch my13 as its own upstream. +EOF If you spell the EOF marker as: cat expect -\EOF then: 1. The shell does not interpolate the contents (it does not matter here, but it is a good habit to be in, so we typically do it unless there is a need to interpolate). 2. Using - will strip leading tabs, so the content can be indented properly along with the rest of the test. + test_i18ncmp expected actual + test_must_fail git config branch.my13.remote + test_must_fail git config branch.my13.merge I think we could tighten these to: test_expect_code 1 git config branch.my13.remote to eliminate a false-positive success on other config errors. It's highly improbable for it to ever matter, though (and it looks like we are not so careful in most other places that call git config looking for a missing entry, either). -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [BUG] Halt during fetch on MacOS
On Thu, Mar 06, 2014 at 10:24:49AM -0800, Junio C Hamano wrote: OK, I've tried using my own build from master, and I still get the same results. I've done a little more investigation and discovered it always hangs at: `atexit(notify_parent);` in `run-command.c:start_command` when running: trace: run_command: 'git-remote-https' 'aosp' 'https://android.googlesource.com/platform/external/tinyxml2' Could this have to do with the atexit implementation? (eg. limit on the number of functions that can be registered, etc) Thanks. An interesting theory indeed. I read that an implementation is supposed to take at least ATEXIT_MAX (32) calls to atexit(3); while I do think we register functions with atexit(3) from multiple places in our code, I doubt we would be making that many. It seems awfully weird that it would _hang_ in such a case, though. That sounds more like hitting a mutex that's internal to atexit(), or something similar. Conley, can you see if dropping that atexit clears up the problem (you should be OK without it; git will just fail to notice the child's exec failure with as much detail). -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC/PATCH] diff: simplify cpp funcname regex
On Wed, Mar 05, 2014 at 08:58:26AM +0100, Johannes Sixt wrote: Here is a patch that I'm carrying around since... a while. What do you think? The pattern I chose also catches variable definition, not just functions. That is what I need, but it hurts grep --function-context That's the reason I didn't forward the patch, yet. If by variable definition you mean: struct foo bar = { - old + new }; I'd think that would be covered by the existing struct|class|enum. Though I think we'd want to also allow keywords in front of it, like static. I suspect the original was more meant to find: struct foo { -old +new }; The parts of the pattern have the following flaws: - The first part matches an identifier followed immediately by a colon and arbitrary text and is intended to reject goto labels and C++ access specifiers (public, private, protected). But this pattern also rejects C++ constructs, which look like this: MyClass::MyClass() MyClass::~MyClass() MyClass::Item MyClass::Find(... Makes sense. I noticed your fix is to look for end-of-line or comments afterwards. Would it be simpler to just check for a non-colon, like: !^[ \t]*[A-Za-z_][A-Za-z_0-9]*:($|[^:]) - The second part matches an identifier followed by a list of qualified names (i.e. identifiers separated by the C++ scope operator '::') [...] A tried to keep the looks like a function definition bit in mine, and yours loosens this quite a bit more. I think that may be OK. That is, I do not think there is any reason for somebody to do: void foo() { call_to_bar(); -old +new } That is, nobody would put a function _call_ without indentation. If something has alphanumerics at the left-most column, then it is probably interesting no matter what. - The third part of the pattern finally matches compound definitions. But it forgets about unions and namespaces, and also skips single-line definitions struct random_iterator_tag {}; because no semicolon can occur on the line. I don't see how that is an interesting line. The point is to find a block that is surrounding the changes, but that is not surrounding the lines below. Notice that all interesting anchor points begin with an identifier or keyword. But since there is a large variety of syntactical constructs after the first word, the simplest is to require only this word and accept everything else. Therefore, this boils down to a line that begins with a letter or underscore (optionally preceded by the C++ scope operator '::' to accept functions returning a type anchored at the global namespace). Replace the second and third part by a single pattern that picks such a line. Yeah, this bit makes sense to me. Both yours and mine will find the first line here in things like: void foo(void); -void bar(void); +void bar(int arg); but I think that is OK. There _isn't_ any interesting surrounding context here. The current code will sometimes come up with an empty funcline (which is good), but it may just as easily come up with a totally bogus funcline in a case like: void unrelated(void) { } void foo(void); -void bar(void); +void bar(int arg); So trying to be very restrictive and say that doesn't look like a function does not really buy us anything (and it creates tons of false negatives, as you documented, because C++ syntax has all kinds of crazy stuff). _If_ the backwards search learned to terminate (e.g., seeing the ^} line and saying well, we can't be inside a function), then such negative lines might be useful for coming up with an empty funcname rather than the bogus void foo(void);. But we do not do that currently, and I do not think it is that useful (the funcname above is arguably just as or more useful than an empty one). -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] show_ident_date: fix always-false conditional
On Thu, Mar 06, 2014 at 08:35:24PM -0500, Eric Sunshine wrote: 1dca155fe3fa (log: handle integer overflow in timestamps, 2014-02-24) assigns the result of strtol() to an 'int' and then checks it against LONG_MIN and LONG_MAX, indicating underflow or overflow, even though 'int' may not be large enough to represent those values. On Mac, the compiler complains: warning: comparison of constant 9223372036854775807 with expression of type 'int' is always false [-Wtautological-constant-out-of-range-compare] if (tz == LONG_MAX || tz == LONG_MIN) Similarly for the LONG_MIN case. Fix this. Yeah, this is definitely a potential bug. When I added the overflow check, I blindly assumed that the existing code was at least using a sufficiently large type to store the result of strtol, but it's not. I don't think your fix catches all overflow, though: + else if (ident-tz_begin ident-tz_end) { + errno = 0; + tz = strtol(ident-tz_begin, NULL, 10); + if (errno) Errno will trigger if we overflowed a long, but then we assign the result into an int, possibly truncating the result. Alternately, the result of strtol() could be assigned temporarily to a 'long', compared against LONG_MIN and LONG_MAX, and then assigned to the 'int' tz variable. That catches overflow from strtol, but we'd then truncate when we pass it as an int to show_date. I think we want this instead: -- 8 -- Subject: show_ident_date: fix tz range check Commit 1dca155fe3fa (log: handle integer overflow in timestamps, 2014-02-24) tried to catch integer overflow coming from strtol() on the timezone field by comparing against LONG_MIN/LONG_MAX. However, the intermediate tz variable is an int, which means it can never be LONG_MAX on LP64 systems; we would truncate the output from strtol before the comparison. Clang's -Wtautological-constant-out-of-range-compare notices this and rightly complains. Let's instead store the result of strtol in a long, and then compare it against INT_MIN/INT_MAX. This will catch overflow from strtol, and also overflow when we pass the result as an int to show_date. Reported-by: Eric Sunshine sunsh...@sunshineco.com Signed-off-by: Jeff King p...@peff.net --- pretty.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/pretty.c b/pretty.c index 3b811ed..6e266dd 100644 --- a/pretty.c +++ b/pretty.c @@ -397,7 +397,7 @@ static const char *show_ident_date(const struct ident_split *ident, enum date_mode mode) { unsigned long date = 0; - int tz = 0; + long tz = 0; if (ident-date_begin ident-date_end) date = strtoul(ident-date_begin, NULL, 10); @@ -406,7 +406,7 @@ static const char *show_ident_date(const struct ident_split *ident, else { if (ident-tz_begin ident-tz_end) tz = strtol(ident-tz_begin, NULL, 10); - if (tz == LONG_MAX || tz == LONG_MIN) + if (tz = INT_MAX || tz = INT_MIN) tz = 0; } return show_date(date, tz, mode); -- 1.8.5.2.500.g8060133 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC/PATCH 4/4] replace: add --edit option
On Thu, Mar 06, 2014 at 08:57:58PM -0500, Eric Sunshine wrote: + if (strbuf_read(result, cmd.out, 41) 0) + die_errno(unable to read from mktree); + close(cmd.out); + + if (finish_command(cmd)) + die(mktree reported failure); + if (get_sha1_hex(result.buf, sha1) 0) + die(mktree did not return an object name); strbuf_release(result); Thanks for catching. I'll include it in any re-roll. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] disable grafts during fetch/push/bundle
On Fri, Mar 07, 2014 at 08:08:37AM +0100, Christian Couder wrote: Be it graft or replace, I do not think we want to invite people to use these mechansims too lightly to locally rewrite their history willy-nilly without fixing their mistakes at the object layer with commit --amend, rebase, bfg, etc. in the longer term. So in that sense, adding a command to make it easy is not something I am enthusiastic about. On the other hand, if the user does need to use graft or replace (perhaps to prepare for casting the fixed history in stone with filter-branch), it would be good to help them avoid making mistakes while doing so and tool support may be a way to do so. So, ... I am of two minds. Maybe if we add a new command (or maybe a script) with a name long and cryptic-looking enough like git create-replacement-object it will scare casual users from touching it, while power users will be happy to benefit from it. I do not think the features we are talking about are significantly more dangerous than git replace is in the first place. If we want to make people aware of the dangers, perhaps git-replace.1 is the right place to do it. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Trust issues with hooks and config files
On Thu, Mar 06, 2014 at 10:47:43PM +0100, Julian Brost wrote: I've noticed some behavior of git that might lead to some security issues if the user is not aware of this. Assume we have an evil user on a system, let's call him eve. He prepares a repository where he allows other user to push changes to. If he now adds a post-receive hook, git will happly execute it as whatever user pushes to this repository: Yes, this is a well-known issue. The only safe operation on a repository for which somebody else controls hooks and config is to fetch from it (upload-pack on the remote repository does not respect any dangerous config or hooks). Something similiar might happen if eve adds some alias to the config file in the repository and grants any other user read access to the repository. These aliases will be executed when some other user is running any git command in this repository. Even though git does not allow defining aliases for existing commands, you might mistype something, so adding an alias for lg instead of log might succeed: Much easier is to define an external diff or textconv tool; then the victim does not even have to typo. I'd suggest taking a similar approach as Mercurial [1], i.e. ignoring configuration files and hooks owned by another user unless the owner is explicitly trusted It has been discussed, but nobody has produced patches. I think that nobody is really interested in doing so because: 1. It introduces complications into previously-working setups (e.g., a daemon running as nobody serving repos owned by a git user needs to mark git as trusted). 2. In most cases, cross-server boundaries provide sufficient insulation (e.g., I might not push to an evil person's repo, but rather to a shared repo whose hooks and config are controlled by root on the remote server). If you want to work on it, I think it's an interesting area. But any development would need to think about the transition plan for existing sites that will be broken. At the very least, the current trust model could stand to be documented much better (I do not think the rule of fetching is safe, everything else is not is mentioned anywhere explicitly). -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC/WIP] Pluggable reference backends
On Mon, Mar 10, 2014 at 07:30:45AM -0700, Shawn Pearce wrote: * Store references in a SQLite database, to get correct transaction handling. No to SQLLite in git-core. Using it from JGit requires building SQLLite and a JNI wrapper, which makes JGit significantly less portable. I know SQLLite is pretty amazing, but implementing compatibility with it from JGit will be a big nightmare for us. That seems like a poor reason not to implement a pluggable feature for git-core. If we implement it, then a site using only git-core can take advantage of it. Sites with JGit cannot, and would use a different pluggable storage mechanism that's supported by both. But if we don't implement, it hurts people using only git-core, and it does not help sites using JGit at all. That's assuming that attention spent on implementing the feature does not take away from implementing some other parallel scheme that does the same thing but does not use SQLite. I don't know what that would be offhand; mapping the ref and reflog into a relational database is pretty simple, and we get a lot of robustness and efficiency benefits for free. We could perhaps have some kind of relational backend could use an ODBC-like abstraction to point to a database. I have no idea if people would want to ever store refs in a real server-backend RDBMS, but I suspect Java has native support for such things. Certainly I think we should aim for compatibility where we can, but if there's not a compatible way to do something, I don't think the limitations of one platform should drag other ones down. And that goes both ways; we had to reimplement disk-compatible EWAH from scratch in C for git-core to have bitmaps, whereas JGit just got to use a ready-made library. I don't think that was a bad thing. People in mixed-implementation environments couldn't use it, but people with JGit-only environments were free to take advantage of it. At any rate, the repository needs to advertise this is the ref storage mechanism I use in the config. We're going to need to bump core.repositoryformatversion for such cases (because an old version of git should not blindly lock and write to a refs/ directory that nobody else is ever going to look at). And I'd suggest with that bump adding in something like core.refstorage, so that an implementation can say foobar ref storage? Never heard of it and barf. Whether it's because that implementation doesn't support foobar, because it's an old version that doesn't understand foobar yet, or because it was simply built without foobar support. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] clean: respect pathspecs with -d
git-clean uses read_directory to fill in a `struct dir` with potential hits. However, read_directory does not actually check against our pathspec. It uses a simplified version that may turn up false positives. As a result, we need to check that any hits match our pathspec. We do so reliably for non-directories. For directories, if -d is not given we check that the pathspec matched exactly (i.e., we are even stricter, and require an explicit git clean foo to clean foo/). But if -d is given, rather than relaxing the exact match to allow a recursive match, we do not check the pathspec at all. This regression was introduced in 113f10f (Make git-clean a builtin, 2007-11-11). Signed-off-by: Jeff King p...@peff.net --- On Mon, Mar 10, 2014 at 11:31:37AM +0100, Robin Pedersen wrote: I accidentially deleted a directory using git clean. I would think this is a bug, but I'm not sure. Was using 1.8.1, but upgraded to 1.9.0 just to see if it was still reproducable, and it was. Definitely a bug, and it dates back quite a while. Thanks for a very clear bug report. -- 8 -- builtin/clean.c | 5 +++-- t/t7300-clean.sh | 8 2 files changed, 11 insertions(+), 2 deletions(-) diff --git a/builtin/clean.c b/builtin/clean.c index 114d7bf..31c1488 100644 --- a/builtin/clean.c +++ b/builtin/clean.c @@ -947,14 +947,15 @@ int cmd_clean(int argc, const char **argv, const char *prefix) if (pathspec.nr) matches = dir_path_match(ent, pathspec, 0, NULL); + if (pathspec.nr !matches) + continue; + if (S_ISDIR(st.st_mode)) { if (remove_directories || (matches == MATCHED_EXACTLY)) { rel = relative_path(ent-name, prefix, buf); string_list_append(del_list, rel); } } else { - if (pathspec.nr !matches) - continue; rel = relative_path(ent-name, prefix, buf); string_list_append(del_list, rel); } diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh index 710be90..0c602de 100755 --- a/t/t7300-clean.sh +++ b/t/t7300-clean.sh @@ -511,4 +511,12 @@ test_expect_success SANITY 'git clean -d with an unreadable empty directory' ' ! test -d foo ' +test_expect_success 'git clean -d respects pathspecs' ' + mkdir foo + mkdir foobar + git clean -df foobar + test_path_is_dir foo + test_path_is_missing foobar +' + test_done -- 1.9.0.403.g7a2f4b0 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] clean: respect pathspecs with -d
On Mon, Mar 10, 2014 at 01:20:02PM -0400, Jeff King wrote: On Mon, Mar 10, 2014 at 11:31:37AM +0100, Robin Pedersen wrote: I accidentially deleted a directory using git clean. I would think this is a bug, but I'm not sure. Was using 1.8.1, but upgraded to 1.9.0 just to see if it was still reproducable, and it was. Definitely a bug, and it dates back quite a while. Thanks for a very clear bug report. -- 8 -- Whoops, accidentally included a scissors line here that will break people using git am --scissors to pick up the patch. Here it is correctly formatted. -- 8 -- Subject: clean: respect pathspecs with -d git-clean uses read_directory to fill in a `struct dir` with potential hits. However, read_directory does not actually check against our pathspec. It uses a simplified version that may turn up false positives. As a result, we need to check that any hits match our pathspec. We do so reliably for non-directories. For directories, if -d is not given we check that the pathspec matched exactly (i.e., we are even stricter, and require an explicit git clean foo to clean foo/). But if -d is given, rather than relaxing the exact match to allow a recursive match, we do not check the pathspec at all. This regression was introduced in 113f10f (Make git-clean a builtin, 2007-11-11). Signed-off-by: Jeff King p...@peff.net --- builtin/clean.c | 5 +++-- t/t7300-clean.sh | 8 2 files changed, 11 insertions(+), 2 deletions(-) diff --git a/builtin/clean.c b/builtin/clean.c index 114d7bf..31c1488 100644 --- a/builtin/clean.c +++ b/builtin/clean.c @@ -947,14 +947,15 @@ int cmd_clean(int argc, const char **argv, const char *prefix) if (pathspec.nr) matches = dir_path_match(ent, pathspec, 0, NULL); + if (pathspec.nr !matches) + continue; + if (S_ISDIR(st.st_mode)) { if (remove_directories || (matches == MATCHED_EXACTLY)) { rel = relative_path(ent-name, prefix, buf); string_list_append(del_list, rel); } } else { - if (pathspec.nr !matches) - continue; rel = relative_path(ent-name, prefix, buf); string_list_append(del_list, rel); } diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh index 710be90..0c602de 100755 --- a/t/t7300-clean.sh +++ b/t/t7300-clean.sh @@ -511,4 +511,12 @@ test_expect_success SANITY 'git clean -d with an unreadable empty directory' ' ! test -d foo ' +test_expect_success 'git clean -d respects pathspecs' ' + mkdir foo + mkdir foobar + git clean -df foobar + test_path_is_dir foo + test_path_is_missing foobar +' + test_done -- 1.9.0.403.g7a2f4b0 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] clean: simplify dir/not-dir logic
On Mon, Mar 10, 2014 at 01:20:02PM -0400, Jeff King wrote: git-clean uses read_directory to fill in a `struct dir` with potential hits. However, read_directory does not actually check against our pathspec. It uses a simplified version that may turn up false positives. As a result, we need to check that any hits match our pathspec. We do so reliably for non-directories. For directories, if -d is not given we check that the pathspec matched exactly (i.e., we are even stricter, and require an explicit git clean foo to clean foo/). But if -d is given, rather than relaxing the exact match to allow a recursive match, we do not check the pathspec at all. This regression was introduced in 113f10f (Make git-clean a builtin, 2007-11-11). The code has been cleaned up quite a bit from that original version, and it was pretty easy to see the discrepancy between the two code paths. However, if the code were structured like the cleanup patch below, I think it would have been even easier. This comes on top of my other patch. So the bug is already fixed, but I think the end result is more readable. -- 8 -- When we get a list of paths from read_directory, we further prune it to create the final list of items to remove. The code paths for directories and non-directories repeat the same add to list code. This patch restructures the code so that we don't repeat ourselves. Also, by following a if (condition) continue pattern like the pathspec check above, it makes it more obvious that the conditional is about excluding directories under certain circumstances. Signed-off-by: Jeff King p...@peff.net --- builtin/clean.c | 15 ++- 1 file changed, 6 insertions(+), 9 deletions(-) diff --git a/builtin/clean.c b/builtin/clean.c index 31c1488..cf76b1f 100644 --- a/builtin/clean.c +++ b/builtin/clean.c @@ -950,15 +950,12 @@ int cmd_clean(int argc, const char **argv, const char *prefix) if (pathspec.nr !matches) continue; - if (S_ISDIR(st.st_mode)) { - if (remove_directories || (matches == MATCHED_EXACTLY)) { - rel = relative_path(ent-name, prefix, buf); - string_list_append(del_list, rel); - } - } else { - rel = relative_path(ent-name, prefix, buf); - string_list_append(del_list, rel); - } + if (S_ISDIR(st.st_mode) !remove_directories + matches != MATCHED_EXACTLY) + continue; + + rel = relative_path(ent-name, prefix, buf); + string_list_append(del_list, rel); } if (interactive del_list.nr 0) -- 1.9.0.403.g7a2f4b0 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC/WIP] Pluggable reference backends
On Mon, Mar 10, 2014 at 10:46:01AM -0700, Junio C Hamano wrote: No to SQLLite in git-core. Using it from JGit requires building SQLLite and a JNI wrapper, which makes JGit significantly less portable. I know SQLLite is pretty amazing, but implementing compatibility with it from JGit will be a big nightmare for us. That seems like a poor reason not to implement a pluggable feature for git-core. If we implement it, then a site using only git-core can take advantage of it. Sites with JGit cannot, and would use a different pluggable storage mechanism that's supported by both. But if we don't implement, it hurts people using only git-core, and it does not help sites using JGit at all. We would need to eventually have at least one backend that we know will play well with different Git implementations that matter (namely, git-core, Jgit and libgit2) before the feature can be widely adopted. I assumed that the current refs/ and logs/ code, massaged into pluggable backend form, would be the first such. And I wouldn't be surprised to see some iteration on that once it is easier to move from scheme to scheme (e.g., to use some encoding of the names on the filesystem to avoid D/F conflicts, and thus allow reflogs for deleted refs). The first backend that is used while the plugging-interface is in development can be anything and does not have to be one that eventual ubiquitous one, however; as long as it is something that we do not mind carrying it forever, along with that final reference backend. I take the objection from Shawn only as against making the sqlite that final one. Sure, I'd agree with that. I'd think something like an sqlite interface would be mainly of interest to people running busy servers. I don't know that it would make a good default. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC/WIP] Pluggable reference backends
On Mon, Mar 10, 2014 at 05:14:02PM +0100, David Kastrup wrote: [storing refs in sqlite] Of course, the basic premise for this feature is let's assume that our file and/or operating system suck at providing file system functionality at file name granularity. There have been two historically approaches to that problem that are not independent: a) use Linux b) kick Linus. You didn't define suck here, but there are a number of issues with the current ref storage system. Here is a sampling: 1. The filesystem does not present an atomic view of the data (e.g., you read a, then while you are reading b, somebody else updates a; your view is one that never existed at any point in time). 2. Using the filesystem creates D/F conflicts between branches foo and foo/bar. Because this name is a primary key even for the reflogs, we cannot easily persist reflogs after the ref is removed. 3. We use packed-refs in conjunction with loose ones to achieve reasonable performance when there are a large number of refs. The scheme for determining the current value of a ref is complicated and error-prone (we had several race conditions that caused real data loss). Those things can be solved through better support from the filesystem. But they were also solved decades ago by relational databases. I generally avoid databases where possible. They lock your data up in a binary format that you can't easily touch with standard unix tools. And they introduce complexity and opportunity for bugs. But they are also a proven technology for solving exactly the sorts of problems that some people are having with git. I do not see a reason not to consider them as an option for a pluggable refs system. But I also do not see a reason to inflict their costs on people who do not have those problems. And that is why Michael's email is about _pluggable_ ref backends, and not let's convert git to sqlite. I do not even know if sqlite is going to end up as an interesting option. But it will be nice to be able to experiment with it easily due to git's ref code becoming more modular. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] fix status_printf_ln calls zero-length format warnings
On Mon, Mar 10, 2014 at 08:27:25PM +0100, Benoit Pierre wrote: Those happens with gcc -Wformat-zero-length. Since passing NULL does not generate a warning (as __attribute__((printf())) does not imply nonull), modify status_printf/status_printf_ln to allow a NULL format and update calls with an empty string. Most of us who compile with -Wall decided a while ago to use -Wno-format-zero-length, because it really is a silly complaint (it assumes there are no side effects of the function besides printing the format string, which is obviously not true in this case). It would be nice to compile out of the box with -Wall -Werror, and I think your solution looks relatively clean. But I am slightly concerned about the assumption that it is OK to pass a NULL fmt parameter to a function marked with __attribute__((format)). It certainly seems to be the case now, and I do not know of any plans for that to change, but it seems like a potentially obvious thing for the compiler to check. I dunno; perhaps it has already been considered and rejected by gcc folks to allow the exact escape hatch we are using here. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/7] commit: fix patch hunk editing with commit -p -m
On Mon, Mar 10, 2014 at 07:49:34PM +0100, Benoit Pierre wrote: Don't change git environment: move the GIT_EDITOR=: override to the hook command subprocess, like it's already done for GIT_INDEX_FILE. Signed-off-by: Benoit Pierre benoit.pie...@gmail.com --- builtin/checkout.c| 8 +++ builtin/clone.c | 4 ++-- builtin/commit.c | 35 --- builtin/gc.c | 2 +- builtin/merge.c | 6 +++--- commit.h | 3 +++ run-command.c | 44 --- run-command.h | 6 +- t/t7513-commit_-p_-m_hunk_edit.sh | 2 +- 9 files changed, 79 insertions(+), 31 deletions(-) This is a lot of change, and in some ways I think it is making things better overall. However, the simplest fix for this is basically to move the setting of GIT_EDITOR down to after we prepare the index. Jun Hao (cc'd) has been preparing a series for this based on the Bloomberg git hackday a few weeks ago, but it hasn't been sent to the list yet. Commits are here: https://github.com/bloomberg/git/compare/commit-patch-allow-hunk-editing if you care to look. I'm not sure which solution is technically superior, but it's worth considering both. I regret not encouraging Jun to post to the list sooner, as we might have avoided some duplicated effort. But that's a sunk cost, and we should pick up whichever is the best for the project. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] clean: respect pathspecs with -d
On Mon, Mar 10, 2014 at 09:02:35PM +0100, Simon Ruderich wrote: On Mon, Mar 10, 2014 at 01:22:15PM -0400, Jeff King wrote: +test_expect_success 'git clean -d respects pathspecs' ' + mkdir foo + mkdir foobar + git clean -df foobar + test_path_is_dir foo + test_path_is_missing foobar +' + test_done I think we should also test removing foo, which was also in the original report, to make sure we don't match prefixes, e.g.: test_expect_success 'git clean -d respects pathspecs' ' mkdir foo mkdir foobar git clean -df foo test_path_is_missing foo test_path_is_dir foobar ' Yeah, it probably makes sense to test both ways (though the root cause and fix are the same). Those mkdirs need to be mkdir -p, though. Here's an updated patch with your suggestion: -- 8 -- Subject: clean: respect pathspecs with -d git-clean uses read_directory to fill in a `struct dir` with potential hits. However, read_directory does not actually check against our pathspec. It uses a simplified version that may turn up false positives. As a result, we need to check that any hits match our pathspec. We do so reliably for non-directories. For directories, if -d is not given we check that the pathspec matched exactly (i.e., we are even stricter, and require an explicit git clean foo to clean foo/). But if -d is given, rather than relaxing the exact match to allow a recursive match, we do not check the pathspec at all. This regression was introduced in 113f10f (Make git-clean a builtin, 2007-11-11). Signed-off-by: Jeff King p...@peff.net --- builtin/clean.c | 5 +++-- t/t7300-clean.sh | 16 2 files changed, 19 insertions(+), 2 deletions(-) diff --git a/builtin/clean.c b/builtin/clean.c index 114d7bf..31c1488 100644 --- a/builtin/clean.c +++ b/builtin/clean.c @@ -947,14 +947,15 @@ int cmd_clean(int argc, const char **argv, const char *prefix) if (pathspec.nr) matches = dir_path_match(ent, pathspec, 0, NULL); + if (pathspec.nr !matches) + continue; + if (S_ISDIR(st.st_mode)) { if (remove_directories || (matches == MATCHED_EXACTLY)) { rel = relative_path(ent-name, prefix, buf); string_list_append(del_list, rel); } } else { - if (pathspec.nr !matches) - continue; rel = relative_path(ent-name, prefix, buf); string_list_append(del_list, rel); } diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh index 710be90..74de814 100755 --- a/t/t7300-clean.sh +++ b/t/t7300-clean.sh @@ -511,4 +511,20 @@ test_expect_success SANITY 'git clean -d with an unreadable empty directory' ' ! test -d foo ' +test_expect_success 'git clean -d respects pathspecs (dir is prefix of pathspec)' ' + mkdir -p foo + mkdir -p foobar + git clean -df foobar + test_path_is_dir foo + test_path_is_missing foobar +' + +test_expect_success 'git clean -d respects pathspecs (pathspec is prefix of dir)' ' + mkdir -p foo + mkdir -p foobar + git clean -df foo + test_path_is_missing foo + test_path_is_dir foobar +' + test_done -- 1.9.0.403.g7a2f4b0 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GSoC14][RFC] Proposal Draft: Refactor tempfile handling
On Sun, Mar 09, 2014 at 02:04:16AM +0900, Brian Gesiak wrote: Once the logic is extracted into a nice API, there are several other places that can use it, too: ... I've found the following four areas so far: 1. lockfile.lock_file 2. git-compat-util.odb_mkstemp 3. git-compat-util.odb_pack_keep 4. diff.prepare_temp_file Tons of files use (1) and (2). (3) is less common, and (4) is only used for external diffs. Yeah, I would expect (1) and (2) to be the most frequent. (3) gets written on every push and fetch, but only for a short period. (4) is also used for diff's textconv, though like external diffs, they are relatively rare. In my experience, most of the cruft that gets left is from (2), since a push or fetch will spool to a tmpfile, then verify the results via git index-pack. Any failure there leaves the file in place. There are a few other potential candidates we can find by grepping for mkstemp. Not all of those might want cleanup, but it's a starting point for investigation. the shallow_XX tempfiles I'm not sure I was able to find this one. Are you referring to the lock files used when fetching, such as in fetch-pack.c? I mean the xmkstemp from setup_temporary_shallow in shallow.c. I'd say the biggest difference between lockfiles and object files is that tempfile methods like odb_mkstemp need to know the location of the object directory. Aside from that, lockfiles and the external diff files appear to be cleaned up at exit, while temporary object files tend to have a more finely controlled lifecycle. I'm still investigating this aspect of the proposal, though. The diff tempfiles are true tempfiles; they always go away in the end (though of course we want to clean them up as we finish with them, rather than doing it all at the end). Lockfiles may get committed into place (i.e., via atomic rename) or rolled back (deleted). Object files should generally be hard-linked into place, but there is some extra magic in move_temp_to_file to fallback to renames. Some of that we may be able to get rid of (e.g., we try to avoid doing cross-directory renames at all these days, so the comment there may be out of date). One question, though: the idea on the ideas page specifies that temporary pack and object files may optionally be cleaned up in case of error during program execution. How will users specify their preference? I think the API for creating temporary files should allow cleanup options to be specified on a per-file basis. That way each part of the program that creates tempfiles can specify a different config value to determine the cleanup policy. That probably makes sense. I certainly had a config option in mind. I mentioned above that the most common cruft is leftover packfiles from pushes and fetches. We haven't deleted those historically because the same person often controls both the client and the server, and they would want to possibly do forensics on the packfile sent to the remote, or even rescue objects out of it. But the remote end may simply have rejected the pack by some policy, and has no interest in forensics. Having a config option for each type of file may be cool, but I don't know how useful it would be in practice. Still, it's certainly worth thinking about and looking into. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: GSoC idea: allow git rebase --interactive todo lines to take options
On Fri, Feb 28, 2014 at 03:03:52PM +0100, Michael Haggerty wrote: I'm not sure whether it is a good idea or not. But I think it is looking decreasingly like a good GSoC project. I guess I misread the sentiment of the mailing list, because I merged this idea into the list about two hours ago. Yeesh, sorry to be so slow on the reply to this. It floated to the bottom of my to respond list. But if you think that even the proposal's simpler sub-ideas are controversial, then let me know and I will delete the idea from the list again. I don't want a GSoC student to have to fight battles of my own creation :-) I'd say keep it at this point. I think there _are_ some good ideas here, and part of a project is figuring out what is good. And part of the role of the mentor is applying some taste. There are probably students who would be a good fit, and students who would not. That is true for just about every project, of course, but I think this one is just a little trickier than some. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 0/8] Hiding refs
On Sun, Feb 23, 2014 at 09:44:14AM +0700, Duy Nguyen wrote: (Digging up an old thread about initial refs listing in git protocol) And now I am responding to it slowly. :) For that to work, the new server needs to wait for the client to speak first. How would that server handle old clients who expect to be spoken first? Wait with a read timeout (no timeout is the right timeout for everybody)? I think the client always speaks first when it asks for a remote service. Earlier in this thread you described the new protocol upload-pack-2. Why can't it be a new service upload-pack-2 in git-daemon? So new client will try requesting upload-pack-2 service with client capability advertisement before ref listing. Old servers do not recognize this service and disconnect so the new client falls back to the good old upload-pack (one more round trip though, but you could configure new client to use old protocol for certain old hosts). Similar thing happens for ssh transport. upload-pack service is always there for old clients. Right, I recall the general feeling being that such a system would work, and the transition would be managed by a config variable like remote.*.useUploadPack2. Probably with settings like: true: always try, but allow fall back to upload-pack false: never try, always use upload-pack auto: try, but if we fail, set remote.*.uploadPackTimestamp, and do not try again for N days The default would start at false, and people who know their server is very up-to-date can turn it on. And then when many server implementations support it, flip the default to auto. And either leave it there forever, or eventually just set it to true and drop auto entirely as a code cleanup. In theory we could do more radical protocol changes here, but I think most people are just interested in adding an opportunity for the client to speak before the ref advertisement in order to set a few flags/variables. That should be relatively simple, and for http we can probably pass those flags via url parameters without any extra compatibility/round-trip at all. I think the main flag of interest is giving an fnmatch pattern to limit the advertised refs. There could potentially be others, but I do not know of any offhand. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] mv: prevent mismatched data when ignoring errors.
On Sat, Mar 08, 2014 at 07:21:39PM +, brian m. carlson wrote: We shrink the source and destination arrays, but not the modes or submodule_gitfile arrays, resulting in potentially mismatched data. Shrink all the arrays at the same time to prevent this. Signed-off-by: brian m. carlson sand...@crustytoothpaste.net --- builtin/mv.c | 5 + 1 file changed, 5 insertions(+) diff --git a/builtin/mv.c b/builtin/mv.c index f99c91e..b20cd95 100644 --- a/builtin/mv.c +++ b/builtin/mv.c @@ -230,6 +230,11 @@ int cmd_mv(int argc, const char **argv, const char *prefix) memmove(destination + i, destination + i + 1, (argc - i) * sizeof(char *)); + memmove(modes + i, modes + i + 1, + (argc - i) * sizeof(char *)); + memmove(submodule_gitfile + i, + submodule_gitfile + i + 1, + (argc - i) * sizeof(char *)); I haven't looked that closely, but would it be crazy to suggest that these arrays all be squashed into one array-of-struct? It would be less error prone and perhaps more readable. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/7] commit: fix patch hunk editing with commit -p -m
On Tue, Mar 11, 2014 at 06:56:02PM +0100, Benoit Pierre wrote: According to the original commit, the change to GIT_EDITOR is only here for hooks: commit 406400ce4f69e79b544dd3539a71b85d99331820 Author: Paolo Bonzini bonz...@gnu.org Date: Tue Feb 5 11:01:45 2008 +0100 git-commit: set GIT_EDITOR=: if editor will not be launched This is a preparatory patch that provides a simple way for the future prepare-commit-msg hook to discover if the editor will be launched. Signed-off-by: Junio C Hamano gits...@pobox.com So there is really no reason to set it earlier, and not just in the hook subprocess environment. Ah, you're right. I was thinking that our invocation of launch_editor also respected it. It does, but we never get there at all because use_editor is set to 0. So yeah, it really is just needed for the hook. Your patch, even though it is a bigger change, keeps the setting to the minimal area, which is cleaner and more maintainable in the long run. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GSoC14][RFC] Proposal Draft: Refactor tempfile handling
On Tue, Mar 11, 2014 at 05:27:05PM +0100, Michael Haggerty wrote: Thanks for your proposal. I have a technical point that I think your proposal should address: Currently the linked list of lockfiles only grows, never shrinks. Once an object has been linked into the list, there is no way to remove it again even after the lock has been released. So if a lock needs to be created dynamically at a random place in the code, its memory is unavoidably leaked. Thanks, I remember thinking about this when I originally conceived of the idea, but I forgot to mention it in the idea writeup. In most cases the potential leaks are finite and small, but object creation and diff tempfiles could both be unbounded. So this is definitely something to consider. In both cases we have a bounded number of _simultaneous_ tempfiles, so one strategy could be to continue using static objects. But it should not be hard to do it dynamically, and I suspect the resulting API will be a lot easier to comprehend. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 0/8] Hiding refs
On Tue, Mar 11, 2014 at 12:32:37PM -0700, Junio C Hamano wrote: Jeff King p...@peff.net writes: I think the main flag of interest is giving an fnmatch pattern to limit the advertised refs. There could potentially be others, but I do not know of any offhand. One thing that comes to mind is where symrefs point at, which we failed to add the last time around because we ran out of the hidden-space behind NUL. Yeah, good idea. I might be misremembering some complications, but we can probably do it with: 1. Teach the client to send an advertise-symrefs flag before the ref advertisement. 2. Teach the server to include symrefs in the ref advertisement; we can invent a new syntax because we know the client has asked for it. That does not have to come immediately, though. Done correctly, upload-pack2 is not about implementing the fnmatch feature, but allowing arbitrary capability strings from the client before the ref advertisement starts. So this just becomes an extension that we can add and advertise during that new phase. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 0/8] Hiding refs
On Tue, Mar 11, 2014 at 01:25:23PM -0700, Junio C Hamano wrote: Yeah, good idea. I might be misremembering some complications, but we can probably do it with: 1. Teach the client to send an advertise-symrefs flag before the ref advertisement. 2. Teach the server to include symrefs in the ref advertisement; we can invent a new syntax because we know the client has asked for it. I was thinking more about the underlying protocol, not advertisement in particular, and I think we came to the same conclusion. The capability advertisement deserves to have its own separate packet message type, when both sides say that they understand it, so that we do not have to be limited by the pkt-line length limit. We could do one message per capability, and at the same time can lift the traditional capability hidden after the NUL is purged every time, so we need to repeat them if we want to later change it, because that is how older clients and servers use that information insanity, for example. So this may be entering the more radical changes realm I mentioned earlier. If the client is limited to setting a few flags, then something like http can get away with: GET foo.git/info/refs?service=git-upload-packadvertise-symrefsrefspec=refs/heads/* And it does not need to worry about upload-pack2 at all. Either the server recognizes and acts on them, or it ignores them. But given that we do not have such a magic out-of-band method for passing values over ssh and git, maybe it is not worth worrying about. Http can move to upload-pack2 along with the rest. One thing that _is_ worth considering for http is how the protocol starts. We do not want to introduce an extra http round-trip to the protocol if we can help it. If the initial GET becomes a POST, then it could pass along the pkt-line of client capabilities with the initial request, and the server would respond with the ref advertisement as usual. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] implement submodule config cache for lookup of submodule names
On Mon, Mar 10, 2014 at 10:24:12PM +0100, Heiko Voigt wrote: I have also moved all functions into the new submodule-config-cache module. I am not completely satisfied with the naming since it is quite long. If someone comes up with some different naming I am open for it. Maybe simply submodule-config (submodule_config prefix for functions)? Since the cache is totally internal to the submodule-config code, I do not know that you even need to have a nice submodule-config-cache API. Can it just be a singleton? That is bad design in a sense (it becomes harder later if you ever want to pull submodule config from two sources simultaneously), but it matches many other subsystems in git which cache behind the scenes. I also suspect you could call submodule_config simply submodule, and let it be a stand-in for the submodule (for now, only data from the config, but potentially you could keep other data on it, too). So with all that, the entry point into your code is just: const struct submodule *submodule_from_path(const char *path); and the caching magically happens behind-the-scenes. But take all of this with a giant grain of salt, as I am not too familiar with the needs of the callers. +/* one submodule_config_cache entry */ +struct submodule_config { + struct strbuf path; + struct strbuf name; + unsigned char gitmodule_sha1[20]; +}; Do these strings need changed after they are written once? If not, you may want to just make these bare pointers (you can still use strbufs to create them, and then strbuf_detach at the end). That may just be a matter of taste, though. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] general style: replaces memcmp() with proper starts_with()
On Wed, Mar 12, 2014 at 10:43:54AM -0400, Quint Guvernator wrote: memcmp() is replaced with negated starts_with() when comparing strings from the beginning. starts_with() looks nicer and it saves the extra argument of the length of the comparing string. Thanks, I think this is a real readability improvement in most cases. One of the things to do when reviewing a patch like this is make sure that there aren't any silly arithmetic mistakes. Checking that can be tedious, so I thought about how _I_ would do it programatically, and then compared our results. I tried: perl -i -lpe ' s/memcmp\(([^,]+), (.*?), (\d+)\)/ length($2) == $3 ? qq{!starts_with($1, $2)} : $ /ge ' $(git ls-files '*.c') That comes up with almost the same results as yours, but misses a few cases that you caught (in addition to the need to simplify !!starts_with()): - memcmp(foo, bar, strlen(bar)) - strings with interpolation, like foo\n, which is actually 4 characters in length, not 5. But there were only a few of those, and I hand-verified them. So I feel pretty good that the changes are all correct transformations. That leaves the question of whether they are all improvements in readability (more on that below). note: this commit properly handles negation in starts_with() Signed-off-by: Quint Guvernator quintus.pub...@gmail.com --- This note should go after the --- line so that it is not part of the commit message (it is only interesting to people seeing v2 and wondering what changed from v1 earlier on the list, not people reading the history much later). diff --git a/builtin/commit-tree.c b/builtin/commit-tree.c index 987a4c3..2777519 100644 --- a/builtin/commit-tree.c +++ b/builtin/commit-tree.c @@ -66,7 +66,7 @@ int cmd_commit_tree(int argc, const char **argv, const char *prefix) continue; } - if (!memcmp(arg, -S, 2)) { + if (starts_with(arg, -S)) { sign_commit = arg + 2; continue; } This is an improvement, but we still have the magic 2 below. Would skip_prefix be a better match here, like: if ((val = skip_prefix(arg, -S))) { sign_commit = val; continue; } We've also talked about refactoring skip_prefix to return a boolean, and put the value in an out-parameter. Which would make this even more readable: if (skip_prefix(arg, -S, sign_commit)) continue; Maybe the time has come to do that. --- a/builtin/mailinfo.c +++ b/builtin/mailinfo.c @@ -626,7 +626,7 @@ static int handle_boundary(void) strbuf_addch(newline, '\n'); again: if (line.len = (*content_top)-len + 2 - !memcmp(line.buf + (*content_top)-len, --, 2)) { + starts_with(line.buf + (*content_top)-len, --)) { I'm not sure if this improves readability or not. It's certainly nice to get rid of the magic 2, but starts_with is a bit of a misnomer, since we are indexing deep into the buffer anyway. And we still have the magic 2 above anyway. I'm on the fence. @@ -727,8 +727,8 @@ static int is_scissors_line(const struct strbuf *line) continue; } if (i + 1 len - (!memcmp(buf + i, 8, 2) || !memcmp(buf + i, 8, 2) || - !memcmp(buf + i, %, 2) || !memcmp(buf + i, %, 2))) { + (starts_with(buf + i, 8) || starts_with(buf + i, 8) || + starts_with(buf + i, %) || starts_with(buf + i, %))) { Same as above. @@ -66,7 +66,7 @@ static int verify_tag(char *buffer, unsigned long size) return error(char%PRIuMAX: could not find next \\\n\, (uintmax_t) (type_line - buffer)); tag_line++; - if (memcmp(tag_line, tag , 4) || tag_line[4] == '\n') + if (!starts_with(tag_line, tag ) || tag_line[4] == '\n') return error(char%PRIuMAX: no \tag \ found, (uintmax_t) (tag_line - buffer)); I think this is another that could benefit from an enhanced skip_prefix: if (!skip_prefix(tag_line, tag , tag_name) || *tag_name == '\n') ... and then we can get rid of the tag_line += 4 that is used much later (in fact, I suspect that whole function could be improved in this respect). @@ -269,7 +269,7 @@ int parse_commit_buffer(struct commit *item, const void *buffer, unsigned long s return 0; item-object.parsed = 1; tail += size; - if (tail = bufptr + 46 || memcmp(bufptr, tree , 5) || bufptr[45] != '\n') + if (tail = bufptr + 46 || !starts_with(bufptr, tree ) || bufptr[45] != '\n') return error(bogus commit object %s, sha1_to_hex(item-object.sha1)); if (get_sha1_hex(bufptr + 5, parent) 0) Again, we just use bufptr + 5 a bit later here. So a candidate for skip_prefix. graft = lookup_commit_graft(item-object.sha1); - while (bufptr + 48
Re: [PATCH] general style: replaces memcmp() with starts_with()
On Wed, Mar 12, 2014 at 06:22:24PM +0100, Jens Lehmann wrote: Let me know if you still think the hunk should be dropped there. Yes, I think so. That spot uses memcmp() because ce-name may not be 0-terminated. If that assumption isn't correct, it should be replaced with a plain strcmp() instead (while also dropping the ce_namelen() comparison in the line above). But starts_with() points into the wrong direction there. I think the length-check and memcmp is an optimization[1] here. But we should be able to encapsulate that pattern and avoid magic numbers entirely with something like mem_equals(). See my other response for more details. -Peff [1] Getting rid of the magic number entirely means we have to call strlen(.gitmodules), which seems like it is working against this optimization. But I think past experiments have shown that decent compilers will optimize strlen on a string literal to a constant, so as long as mem_equals is an inline, it should be equivalent. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: New GSoC microproject ideas
On Wed, Mar 12, 2014 at 08:16:53PM +0100, David Kastrup wrote: Junio C Hamano gits...@pobox.com writes: Here is another, as I seem to have managed to kill another one ;-) -- 8 -- VAR=VAL command is sufficient to run 'command' with environment variable VAR set to value VAL without affecting the environment of the shell itself, but we cannot do the same with a shell function (most notably, test_must_fail); No? bash: dak@lola:/usr/local/tmp/lilypond$ zippo() { echo $XXX echo $XXX } dak@lola:/usr/local/tmp/lilypond$ XXX=8 zippo 8 8 Try: zippo() { echo $XXX } XXX=8 zippo zippo XXX remains set after the first call under dash (but not bash). I believe ash has the same behavior. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: What's cooking in git.git (Mar 2014, #02; Tue, 11)
On Tue, Mar 11, 2014 at 03:12:11PM -0700, Junio C Hamano wrote: * jk/warn-on-object-refname-ambiguity (2014-01-09) 6 commits - get_sha1: drop object/refname ambiguity flag - get_sha1: speed up ambiguous 40-hex test - FIXUP: teach DO_FOR_EACH_NO_RECURSE to prime_ref_dir() - refs: teach for_each_ref a flag to avoid recursion - cat-file: fix a minor memory leak in batch_objects - cat-file: refactor error handling of batch_objects Expecting a reroll. I finally got a chance to return to this one. Michael had some good comments on the refactoring that was going on in the middle patches. He ended with: Yes. Still, the code is really piling up for this one warning for the contrived eventuality that somebody wants to pass SHA-1s and branch names together in a single cat-file invocation *and* wants to pass lots of inputs at once and so is worried about performance *and* has reference names that look like SHA-1s. Otherwise we could just leave the warning disabled in this case, as now. Or we could add a new --hashes-only option that tells cat-file to treat all of its arguments/inputs as SHA-1s; such an option would permit an even faster code path for bulk callers. Having looked at it again, I really think it is not worth pursuing. The end goal is not that interesting, there is a lot of code introduced, and a reasonable chance of accidentally introducing regressions and/or making the code less maintainable. Keeping the existing code (which just disables the check for cat-file) is probably the sanest course of action. We can do a similar thing for rev-list --stdin if we want, or we can wait until somebody complains. The bottom two patches are reasonable cleanups we should keep, though (and the rest can just be discarded). -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] general style: replaces memcmp() with proper starts_with()
On Wed, Mar 12, 2014 at 12:39:01PM -0700, Junio C Hamano wrote: Jeff King p...@peff.net writes: static inline int standard_header_field(const char *field, size_t len) { - return ((len == 4 !memcmp(field, tree , 5)) || - (len == 6 !memcmp(field, parent , 7)) || - (len == 6 !memcmp(field, author , 7)) || - (len == 9 !memcmp(field, committer , 10)) || - (len == 8 !memcmp(field, encoding , 9))); + return ((len == 4 starts_with(field, tree )) || + (len == 6 starts_with(field, parent )) || + (len == 6 starts_with(field, author )) || + (len == 9 starts_with(field, committer )) || + (len == 8 starts_with(field, encoding ))); These extra len checks are interesting. They look like an attempt to optimize lookup, since the caller will already have scanned forward to the space. If one really wants to remove the magic constants from this, then one must take advantage of the pattern len == strlen(S) - 1 !memcmp(field, S, strlen(S)) that appears here, and come up with a simple abstraction to express that we are only using the string S (e.g. tree ), length len and location field of the counted string. Blindly replacing starts_with() with !memcmp() in the above part is a readability regression otherwise. I actually think the right solution is: static inline int standard_header_field(const char *field, size_t len) { return mem_equals(field, len, tree ) || mem_equals(field, len, parent ) || ...; } and the caller should tell us it's OK to look at field[len]: standard_header_field(line, eof - line + 1) We could also omit the space from the standard_header_field. The caller just ran strchr() looking for the space, so we know that either it is there, or we are at the end of the line/buffer. Arguably a string like parent\n should be a parent header with no data (but right now it is not matched by this function). I'm not aware of an implementation that writes such a thing, but it seems to fall in the be liberal in what you accept category. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 1/2] fsck.c: Change the type of fsck_ident()'s first argument
On Thu, Mar 13, 2014 at 02:51:29AM +0800, Yuxuan Shui wrote: Since fsck_ident doesn't change the content of **ident, the type of ident could be const char **. Unfortunately, const double-pointers in C are a bit tricky, and a pointer to char * cannot automatically be passed as a pointer to const char *. I think you want this on top: diff --git a/fsck.c b/fsck.c index 1789c34..7776660 100644 --- a/fsck.c +++ b/fsck.c @@ -281,7 +281,7 @@ static int fsck_ident(const char **ident, struct object *obj, fsck_error error_f static int fsck_commit(struct commit *commit, fsck_error error_func) { - char *buffer = commit-buffer; + const char *buffer = commit-buffer; unsigned char tree_sha1[20], sha1[20]; struct commit_graft *graft; int parents = 0; Otherwise, gcc will complain about incompatible pointer types. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: New GSoC microproject ideas
On Wed, Mar 12, 2014 at 09:37:41PM +0100, David Kastrup wrote: Try: zippo() { echo $XXX } XXX=8 zippo zippo XXX remains set after the first call under dash (but not bash). I believe ash has the same behavior. Yes. I would lean towards considering this a bug. But I agree that it does not help. Dash's behavior is POSIX (and bash --posix behaves the same way). http://article.gmane.org/gmane.comp.version-control.git/137095 -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] general style: replaces memcmp() with proper starts_with()
On Wed, Mar 12, 2014 at 01:08:03PM -0700, Junio C Hamano wrote: Jeff King p...@peff.net writes: Blindly replacing starts_with() with !memcmp() in the above part is a readability regression otherwise. I actually think the right solution is: static inline int standard_header_field(const char *field, size_t len) { return mem_equals(field, len, tree ) || mem_equals(field, len, parent ) || ...; } and the caller should tell us it's OK to look at field[len]: standard_header_field(line, eof - line + 1) We could also omit the space from the standard_header_field. Yes, that was what I had in mind. The only reason why the callee (over-)optimizes the SP must follow these know keywords part by using the extra len parameter is because the caller has to do a single strchr() to skip an arbitrary field name anyway so computing len is essentially free. One thing that bugs me about the current code is that the sub-function looks one past the end of the length given to it by the caller. Switching it to pass eof - line + 1 resolves that, but is it right? The character pointed at by eof is either: 1. space, if our strchr(line, ' ') found something 2. the first character of the next line, if our memchr(line, '\n', eob - line) found something 3. Whatever is at eob (end-of-buffer) There are two questionable things here. In (1), we use strchr on a sized buffer. And in (3), we look one past the size that was passed in. In both cases, we are saved by the fact that the buffer is actually NUL terminated at the end of size (because it comes from read_sha1_file). But we may find a space much further than the line ending which is supposed to be our boundary, and end up having to do a comparison to cover this case. So I think the current code is correct, but we could make it more obvious by: 1. Restricting our search for the field separator to the current line. 2. Explicitly avoid looking for headers when we did not find a space, since we cannot match anything anyway. Like: diff --git a/commit.c b/commit.c index 6bf4fe0..9383cc1 100644 --- a/commit.c +++ b/commit.c @@ -1325,14 +1325,14 @@ static struct commit_extra_header *read_commit_extra_header_lines( strbuf_reset(buf); it = NULL; - eof = strchr(line, ' '); - if (next = eof) + eof = memchr(line, ' ', next - line); + if (eof) { + if (standard_header_field(line, eof - line + 1) || + excluded_header_field(line, eof - line, exclude)) + continue; + } else eof = next; - if (standard_header_field(line, eof - line) || - excluded_header_field(line, eof - line, exclude)) - continue; - it = xcalloc(1, sizeof(*it)); it-key = xmemdupz(line, eof-line); *tail = it; I also think that eof = next (which I retained here) is off-by-one. next here is not the newline, but the start of the next line. And I'm guessing the code actually wanted the newline (otherwise it-key ends up with the newline in it). But we cannot just subtract one, because if we hit eob, it really is in the right spot. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] general style: replaces memcmp() with proper starts_with()
On Wed, Mar 12, 2014 at 05:14:15PM -0400, Jeff King wrote: I also think that eof = next (which I retained here) is off-by-one. next here is not the newline, but the start of the next line. And I'm guessing the code actually wanted the newline (otherwise it-key ends up with the newline in it). But we cannot just subtract one, because if we hit eob, it really is in the right spot. I started on a patch for this, but found another interesting corner case. If we do not find a newline and hit end-of-buffer (which _shouldn't_ happen, as that is a malformed commit object), we set next to eob. But then we copy the whole string, including *next into the value of the header. So we intend to capture the trailing newline in the value, and do in the normal case. But in the end-of-buffer case, we capture an extra NUL. I think that's OK, because the eventual fate of the buffer is to become a C-string. But it does mean that the result sometimes has a newline-terminator and sometimes doesn't, and the calling code needs to handle this when printing, or risk lines running together. Should this function append a newline if there is not already one? If it's a mergetag header, we feed the result to gpg, etc, and expect to get the data out verbatim. We would not want to mess that up. OTOH, the commit object is bogus (and possibly truncated) in the first place, so it probably doesn't really matter. And the GPG signature on tag objects has its own delimiters, so a stray newline present or not at the end should not matter. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] general style: replaces memcmp() with proper starts_with()
On Wed, Mar 12, 2014 at 05:14:15PM -0400, Jeff King wrote: One thing that bugs me about the current code is that the sub-function looks one past the end of the length given to it by the caller. Switching it to pass eof - line + 1 resolves that, but is it right? The character pointed at by eof is either: 1. space, if our strchr(line, ' ') found something 2. the first character of the next line, if our memchr(line, '\n', eob - line) found something 3. Whatever is at eob (end-of-buffer) This misses a case. We might find no space at all, and eof is NULL. We never dereference it, so we don't segfault, but it does cause a bogus giant allocation. I'm out of time for the day, but here is a test I started on that demonstrates the failure: diff --git a/t/t7513-commit-bogus-extra-headers.sh b/t/t7513-commit-bogus-extra-headers.sh index e69de29..834db08 100755 --- a/t/t7513-commit-bogus-extra-headers.sh +++ b/t/t7513-commit-bogus-extra-headers.sh @@ -0,0 +1,26 @@ +#!/bin/sh + +test_description='check parsing of badly formed commit objects' +. ./test-lib.sh + +EMPTY_TREE=4b825dc642cb6eb9a060e54bf8d69288fbee4904 + +test_expect_success 'newline right after mergetag header' ' + test_tick + cat commit -EOF + tree $EMPTY_TREE + author $GIT_AUTHOR_NAME $GIT_AUTHOR_EMAIL $GIT_AUTHOR_DATE + committer $GIT_COMMITTER_NAME $GIT_COMMITTER_EMAIL $GIT_COMMITTER_DATE + mergetag + + foo + EOF + commit=$(git hash-object -w -t commit commit) + cat expect -EOF + todo... + EOF + git --no-pager show --show-signature $commit actual + test_cmp expect actual +' + +test_done The git show fails with: fatal: Out of memory, malloc failed (tried to allocate 18446744073699764927 bytes) So I think the whole function could use some refactoring to handle corner cases better. I'll try to take a look tomorrow, but please feel free if somebody else wants to take a crack at it. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] Fix possible buffer overflow in remove_subtree()
On Thu, Mar 13, 2014 at 10:19:06AM +0100, Michael Haggerty wrote: These patches are proposed for maint (but also apply cleanly to master). I presume that this is exploitable via Git commands, though I haven't verified it explicitly [1]. It's possible to overflow this buffer, like: git init repo cd repo blob=$(git hash-object -w /dev/null) big=$(perl -e 'print a x 250') (for i in $(seq 1 17); do mkdir $big cd $big; done) printf 100644 blob $blob\t$big\n tree tree=$(git mktree tree) git read-tree $tree git checkout-index -f --all but I'm not sure if it is easily exploitable for two reasons: 1. We never actually return from the function with the smashed stack. Immediately after overflowing the buffer, we call lstat(), which will fail with ENAMETOOLONG (at least on Linux), causing us to call into die_errno and exit. 2. The overflow relies on us trying to move a very deep hierarchy out of the way, but I could not convince git to _create_ such a hierarchy in the first place. Here I do it using relative paths and recursion, but git generally tries to create paths from the top of the repo using full pathnames. So it gets ENAMETOOLONG trying to create the paths in the first place. Of course somebody may be more clever than I am, or perhaps some systems have a PATH_MAX that is not enforced by the kernel. It's still a suspicious bit of code, and I think your patches are a strict improvement. Besides fixing this potential problem, I notice that we currently put 4096 bytes on the stack for a recursive function. Removing a/a/a... can put up to 8M on the stack, which might be too much on some systems (besides just being silly and wasteful). -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] general style: replaces memcmp() with proper starts_with()
On Thu, Mar 13, 2014 at 10:47:28AM -0700, Junio C Hamano wrote: --- a/builtin/cat-file.c +++ b/builtin/cat-file.c @@ -82,7 +82,7 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name) enum object_type type; unsigned long size; char *buffer = read_sha1_file(sha1, type, size); - if (memcmp(buffer, object , 7) || + if (!starts_with(buffer, object ) || [...] The original hunks show that the code knows and relies on magic numbers 7 and 8 very clearly and there are rooms for improvement. Like: what if the file is empty? Yes. I think this one is actually OK. The result of read_sha1_file is NUL-terminated, and we know that starts_with reads byte-by-byte (the prior memcmp is wrong, but only if you care about accessing bytes past the end of the NUL). That whole piece of code seems silly, though, IMHO. It should be using parse_tag or peel_to_type. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: No progress from push when using bitmaps
On Wed, Mar 12, 2014 at 05:21:21PM -0700, Shawn Pearce wrote: Today I tried pushing a copy of linux.git from a client that had bitmaps into a JGit server. The client stalled for a long time with no progress, because it reused the existing pack. No progress appeared while it was sending the existing file on the wire: $ git push git://localhost/linux.git master Reusing existing pack: 2938117, done. Total 2938117 (delta 0), reused 0 (delta 0) remote: Resolving deltas: 66% (1637269/2455727) This is not the best user experience. :-( Yeah, I agree that sucks. I hadn't noticed it, as I don't typically have my client repos bitmapped (and on fetch, the interesting progress is coming from the local index-pack). It would definitely be good to have throughput measurements while writing out the pack. However, I'm not sure we have anything useful to count. We know the total number of objects we're reusing, but we're not actually parsing the data; we're just blitting it out as a stream. I think the progress code will need some refactoring to handle a throughput-only case. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: No progress from push when using bitmaps
On Thu, Mar 13, 2014 at 03:01:09PM -0700, Shawn Pearce wrote: It would definitely be good to have throughput measurements while writing out the pack. However, I'm not sure we have anything useful to count. We know the total number of objects we're reusing, but we're not actually parsing the data; we're just blitting it out as a stream. I think the progress code will need some refactoring to handle a throughput-only case. Yes. I think JGit suffers from this same bug, and again we never noticed it because usually only the servers are bitmapped, not the clients. pack-objects writes a throughput meter when its writing objects. Really just the bytes out/second would be enough to let the user know the client is working. Unfortunately I think that is still tied to the overall progress system having some other counter? Yes, I'm looking at it right now. The throughput meter is actually connected to the sha1fd output. So really we just need to call display_progress periodically as we loop through the data. It's a one-liner fix. _But_ it still looks ugly, because, as you mention, it's tied to the progress meter, which is counting up to N objects. So we basically sit there at 0, pumping data, and then after the pack is done, we can say we sent N. :) There are a few ways around this: 1. Add a new phase Writing packs which counts from 0 to 1. Even though it's more accurate, moving from 0 to 1 really isn't that useful (the throughput is, but the 0/1 just looks like noise). 2. Add a new phase Writing reused objects that counts from 0 bytes up to N bytes. This looks stupid, though, because we are repeating the current byte count both here and in the throughput. 3. Use the regular Writing objects progress, but fake the object count. We know we are writing M bytes with N objects. Bump the counter by 1 for every M/N bytes we write. The first two require some non-trivial surgery to the progress code. I am leaning towards the third. Not just because it's easy, but because I think it actually shows the most intuitive display. Yes, it's fudging the object numbers, but those are largely meaningless anyway (in fact, it makes them _better_ because now they're even, instead of getting 95% done and then hitting some blob that is as big as the rest of the repo combined). -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: No progress from push when using bitmaps
On Thu, Mar 13, 2014 at 06:07:54PM -0400, Jeff King wrote: 3. Use the regular Writing objects progress, but fake the object count. We know we are writing M bytes with N objects. Bump the counter by 1 for every M/N bytes we write. Here is that strategy. I think it looks pretty nice, and it seamlessly handles the case where you have extra objects to send on top of the reused pack (we just keep the same progress meter counting up). diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 831dd05..f187859 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -709,7 +709,7 @@ static struct object_entry **compute_write_order(void) static off_t write_reused_pack(struct sha1file *f) { unsigned char buffer[8192]; - off_t to_write; + off_t to_write, total; int fd; if (!is_pack_valid(reuse_packfile)) @@ -726,7 +726,7 @@ static off_t write_reused_pack(struct sha1file *f) if (reuse_packfile_offset 0) reuse_packfile_offset = reuse_packfile-pack_size - 20; - to_write = reuse_packfile_offset - sizeof(struct pack_header); + total = to_write = reuse_packfile_offset - sizeof(struct pack_header); while (to_write) { int read_pack = xread(fd, buffer, sizeof(buffer)); @@ -739,10 +739,23 @@ static off_t write_reused_pack(struct sha1file *f) sha1write(f, buffer, read_pack); to_write -= read_pack; + + /* +* We don't know the actual number of objects written, +* only how many bytes written, how many bytes total, and +* how many objects total. So we can fake it by pretending all +* objects we are writing are the same size. This gives us a +* smooth progress meter, and at the end it matches the true +* answer. +*/ + written = reuse_packfile_objects * + (((double)(total - to_write)) / total); + display_progress(progress_state, written); } close(fd); - written += reuse_packfile_objects; + written = reuse_packfile_objects; + display_progress(progress_state, written); return reuse_packfile_offset - sizeof(struct pack_header); } -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html