Re: insteadOf and git-request-pull output
"brian m. carlson" writes: >> $ git request-pull HEAD^ git://foo.example.com/example | grep example >> ssh://bar.example.com/example >> >> I think that if we use the "principle of least surprise," insteadOf >> rules shouldn't be applied for git-request-pull URLs. > > I'd like to point out a different use that may change your view. I have > an insteadOf alias, gh:, that points to GitHub. Performing the rewrite > is definitely the right thing to do, since other users may not have my > alias available. > > I agree that in your case, a rewrite seems less appropriate, but I think > we should only skip the rewrite if the value already matches a valid > URL. It would be tricky to define what a valid URL is, though. Do we need some way to say "this is a private URL that should not be given preference when composing a request-pull message"? E.g. [url "git://git.dev/"] insteadOf = https://git.dev/ [url "https://github.com/;] insteadOf = gh: private The former does not mark https://git.dev/ a private one, so a "request-pull https://git.dev/$thing; would show the original "https://git.dev/$thing; without rewriting. The latter marks gh: a private one so "request-pull gh:$thing" would be rewritten before exposed to the public as "https://github.com/$thing; Or something like that?
Re: [PATCH] technical doc: add a design doc for the evolve command
sxe...@google.com writes: > +Detailed design > +=== > +Obsolescence information is stored as a graph of meta-commits. A meta-commit > is > +a specially-formatted merge commit that describes how one commit was created > +from others. > + > +Meta-commits look like this: > + > +$ git cat-file -p > +tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904 > +parent aa7ce55545bf2c14bef48db91af1a74e2347539a > +parent d64309ee51d0af12723b6cb027fc9f195b15a5e9 > +parent 7e1bbcd3a0fa854a7a9eac9bf1eea6465de98136 > +author Stefan Xenos 1540841596 -0700 > +committer Stefan Xenos 1540841596 -0700 > +parent-type content > +parent-type obsolete > +parent-type origin > + > +This says “commit aa7ce555 makes commit d64309ee obsolete. It was created by > +cherry-picking commit 7e1bbcd3”. > + > +The tree for meta-commits is always the empty tree whose hash matches > +4b825dc642cb6eb9a060e54bf8d69288fbee4904 exactly, but future versions of git > may > +attach other trees here. For forward-compatibility fsck should ignore such > trees > +if found on future repository versions. Similarly, current versions of git > +should always fill in an empty commit comment and tools like fsck should > ignore > +the content of the commit comment if present in a future repository version. > +This will allow future versions of git to add metadata to the meta-commit > +comments or tree without breaking forwards compatibility. Am I correct to understand that the reason why a commit object is (ab|re)used to represent a meta-commit is because by doing so we would get connectivity (i.e. fetching & pushing would transfer all the associated objects along) for free, and by not representing it as a new and different object type, existing implementations can just pass them along without understanding what they are, and as long as these are not mixed as parts of the main history of the project (e.g. when enumerating commits that has aa7ce5 as its parents, because somebody else obsoleted aa7ce5 and you want to evolve anything that built on it, you do not want to mistake the above "meta" commit as a commit that is part of the ordinary history and rebuild on top of the new version of aa7ce5, which would lead to a disaster), everything would work just fine? Perhaps you'd use something like "presence of parent-type header marks that a commit is a meta-commit and not part of the main history". How are these meta commits anchored so that it won't be reclaimed by repack? I do not see any "parent" field used to chain them together, but I do not think we can afford to spend one ref per meta commit, as refs are not designed to point into each and every object in the repository. I have a moderately strong opposition against "origin" thing. If aa7ce555 replaces d664309ee, in order for the tool to be able to "evolve" other histories that build on top of d664309ee, it only needs the history between aa7ce555 and d664309ee and it would not matter how aa7ce555 was built relative to its parent. The user may have typed/developed it from scratch, the user may have borrowed 70% of its change from 7e1bbcd while remaining 30% was done from scratch, or it was a concatenation of the change made in 7e1bbcd and another commit. One half of my point being that we can do _without_ it, and in all cases, aa7ce555, if leaving the fact that it was derived from 7e1bbcd is so important, can mention that in its log message how it relates to the "origin" thing. And the other half is that while I consider the "origin" thing is unnecessary for the above reasons, having it means we need to not just transfer the history reading to aa7ce555 and d664309ee (which are necessary anyway while we have histories to transplant from d664309ee to aa7ce555) but also have to pull in the history leading to 7e1bbcd and we cannot discard it.
Re: [PATCH] bundle: dup() output descriptor closer to point-of-use
Jeff King writes: > Actually, I realized there's an even simpler way to do this. Here it is. > > -- >8 -- > Subject: [PATCH] bundle: dup() output descriptor closer to point-of-use > > When writing a bundle to a file, the bundle code actually creates > "your.bundle.lock" using our lockfile interface. We feed that output > descriptor to a child git-pack-objects via run-command, which has the > quirk that it closes the output descriptor in the parent. > > To avoid confusing the lockfile code (which still thinks the descriptor > is valid), we dup() it, and operate on the duplicate. Yes... > We can solve this by moving the dup() much closer to start_command(), > shrinking the window in which we have the second descriptor open. It's > easy to place this in such a way that no die() is possible. We could > still die due to a signal in the exact wrong moment, but we already > tolerate races there (e.g., a signal could come before we manage to put > the file on the cleanup list in the first place). > > As a bonus, this shields create_bundle() itself from the duplicate-fd > trick, and we can simplify its error handling (note that the lock > rollback now happens unconditionally, but that's OK; it's a noop if we > didn't open the lock in the first place). ... I found this way too clever for me, but by following the codeflow, it was easy to convince myself that this does the right thing. Almost magical ;-) Will queue, with a Tested-by: with Dscho's name on it. TAhanks, both.
Re: [RFC/PATCH] read-cache: write all indexes with the same permissions
On Fri, Nov 16, 2018 at 8:20 PM Duy Nguyen wrote: > > On Fri, Nov 16, 2018 at 8:07 PM SZEDER Gábor wrote: > > > With the default 20% threshold a new shared index is written rather > > frequently with our usual small test-repos: > > Side note. Split index is definitely not meant for small repos. I very much agree with that. It makes sense to use them only for big repos and big repos usually don't pass a 20% threshold very often. > But > maybe we should have a lower limit (in terms of absolute number of > entries) that prevent splitting. This splitting seems excessive. I would agree if split index was the default mode or if our goal was to eventually make it the default mode. Or it could be a new "mixed" mode for core.splitIndex (which might eventually become the default mode) to have no split-index as long as the repo stays under a lower limit and to automatically use split-index when the repo gets over the limit.
Re: [RFC/PATCH 0/5] stop installing old libexec aliases like "git-init"
On Fri, Nov 16, 2018 at 08:22:11PM +0100, Ævar Arnfjörð Bjarmason wrote: > So maybe we should just document this interface better. It seems one > implicit dependency is that we expect a manpage for the tool to exist > for --help. Yeah, I think this really the only problematic assumption. The rest of "-c", "--git-dir", etc, are just manipulating the environment, and that gets passed through to sub-invocations of Git (so if I have a script which calls git-config, it will pick up the "-c" config). It would be nice if there was a way to ask "is there a manpage?", and fallback to running "git-cmd --help". But: 1. I don't think there is a portable way to query that via man. And anyway, depending platform and config, it may be opening a browser to show HTML documentation (or I think even texinfo?). 2. We can just ask whether "man git-sizer" (or whatever help display command) returned a non-zero exit code, and fall back to "git-sizer --help". But there's an infinite-loop possibility here: running "git-sizer --help" does what we want. But if "man git-log" failed, we'd run "git-log --help", which in turn runs "git help log", which runs "man git-log", and so on. You can break that loop with an environment variable for "I already tried to show the manpage", which would I guess convert "--help" to "-h". So it's maybe do-able, but not quite as trivial as one might hope. > But I don't remember the details, and can't reproduce it now with: > > $ cat ~/bin/git-sigint > #!/usr/bin/env perl > $SIG{INT} = sub { warn localtime . " " . $$ }; > sleep 1 while 1; > $ git sigint # or git-sigint > [... behaves the same either way ...] > > So maybe it was something else, or I'm misremembering... I think that generally works because hitting ^C is going to send SIGINT to the whole process group. A more interesting case is: git sigint & kill -INT $! There $! is a parent "git" process that is just waiting on git-sigint to die. But that works, too, because the parent relays common signals due to 10c6cddd92 (dashed externals: kill children on exit, 2012-01-08). So you might have been remembering issues prior to that commit (or uncommon signals; these come from sigchain_push_common). -Peff
Re: [PATCH] technical doc: add a design doc for the evolve command
On Thu, Nov 15, 2018 at 2:00 AM wrote: > +Goals > +- > +Legend: Goals marked with P0 are required. Goals marked with Pn should be > +attempted unless they interfere with goals marked with Pn-1. > + > +P0. All commands that modify commits (such as the normal commit --amend or > +rebase command) should mark the old commit as being obsolete and > replaced by > +the new one. No additional commands should be required to keep the > +obsolescence graph up-to-date. I sometimes "modify" a commit by "git reset @^", pick up the changes then "git commit -c @{1}". I don't think this counts as a typical modification and is probably hard to detect automatically. But I hope there's some way for me to tell git "yes this is a modified commit of that one, record that!". > +Example usage > +- > +# First create three dependent changes > +$ echo foo>bar.txt && git add . > +$ git commit -m "This is a test" > +created change metas/this_is_a_test I guess as an example, how the name metas/this_is_a_test is constructed does not matter much. But it's probably better to stick with some sort of id because subject line will change over time and the original one may become irrelevant. Perhaps we could use the original commit id as name. > +$ echo foo2>bar2.txt && git add . > +$ git commit -m "This is also a test" > +created change metas/this_is_also_a_test > +$ echo foo3>bar3.txt && git add . > +$ git commit -m "More testing" > +created change metas/more_testing > + > +# List all our changes in progress > +$ git change -l > +metas/this_is_a_test > +metas/this_is_also_a_test > +* metas/more_testing > +metas/some_change_already_merged_upstream > + > +# Now modify the earliest change, using its stable name > +$ git reset --hard metas/this_is_a_test > +$ echo morefoo>>bar.txt && git add . && git commit --amend --no-edit > + > +# Use git-evolve to fix up any dependent changes > +$ git evolve > +rebasing metas/this_is_also_a_test onto metas/this_is_a_test > +rebasing metas/more_testing onto metas/this_is_also_a_test > +Done > + > +# Use git-obslog to view the history of the this_is_a_test change > +$ git obslog > +93f110 metas/this_is_a_test@{0} commit (amend): This is a test > +930219 metas/this_is_a_test@{1} commit: This is a test > + > +# Now create an unrelated change > +$ git reset --hard origin/master > +$ echo newchange>unrelated.txt && git add . > +$ git commit -m "Unrelated change" > +created change metas/unrelated_change > + > +# Fetch the latest code from origin/master and use git-evolve > +# to rebase all dependent changes. > +$ git fetch origin master > +$ git evolve origin/master > +deleting metas/some_change_already_merged_upstream > +rebasing metas/this_is_a_test onto origin/master > +rebasing metas/this_is_also_a_test onto metas/this_is_a_test > +rebasing metas/more_testing onto metas/this_is_also_a_test > +rebasing metas/unrelated_change onto origin/master > +Conflict detected! Resolve it and then use git evolve --continue to resume. > + > +# Sort out the conflict > +$ git mergetool > +$ git evolve --continue > +Done > + > +# Share the full history of edits for the this_is_a_test change > +# with a review server > +$ git push origin metas/this_is_a_test:refs/for/master > +# Share the lastest commit for “Unrelated change”, without history > +$ git push origin HEAD:refs/for/master How do we group changes of a topic together? I think branch-diff could take advantage of that. > +Detailed design > +=== > +Obsolescence information is stored as a graph of meta-commits. A meta-commit > is > +a specially-formatted merge commit that describes how one commit was created > +from others. > + > +Meta-commits look like this: > + > +$ git cat-file -p > +tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904 > +parent aa7ce55545bf2c14bef48db91af1a74e2347539a > +parent d64309ee51d0af12723b6cb027fc9f195b15a5e9 > +parent 7e1bbcd3a0fa854a7a9eac9bf1eea6465de98136 > +author Stefan Xenos 1540841596 -0700 > +committer Stefan Xenos 1540841596 -0700 > +parent-type content > +parent-type obsolete > +parent-type origin > + > +This says “commit aa7ce555 makes commit d64309ee obsolete. It was created by > +cherry-picking commit 7e1bbcd3”. This feels a bit forced. Could we just organize it like a normal history? Something like * |\ | * last version of the commit * |\ | * second last version of the commit * |\ Basically all commits will be linked in a new merge history. Real commits are on the second parent, first parent is to link changes together. This makes it possible to just use "git log --first-parent --patch" (or "git log --oneline --graph") to examine the change. More details (e.g. parent-type) could be stored as normal trailers in the commit message of these merges. -- Duy
[PATCH-2] Change writing style
--- parse-options.c | 4 ++-- t/t0040-parse-options.sh | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/parse-options.c b/parse-options.c index 3b874a83a0c89..f5ef24155950b 100644 --- a/parse-options.c +++ b/parse-options.c @@ -352,7 +352,7 @@ static void check_typos(const char *arg, const struct option *options) return; if (starts_with(arg, "no-")) { - error ("did you mean `--%s` (with two dashes ?)", arg); + error("did you mean `--%s` (with two dashes)?", arg); exit(129); } @@ -360,7 +360,7 @@ static void check_typos(const char *arg, const struct option *options) if (!options->long_name) continue; if (starts_with(options->long_name, arg)) { - error ("did you mean `--%s` (with two dashes ?)", arg); + error("did you mean `--%s` (with two dashes)?", arg); exit(129); } } diff --git a/t/t0040-parse-options.sh b/t/t0040-parse-options.sh index 5b0560fa20e34..c442f9ae15ff8 100755 --- a/t/t0040-parse-options.sh +++ b/t/t0040-parse-options.sh @@ -222,7 +222,7 @@ test_expect_success 'non ambiguous option (after two options it abbreviates)' ' ' cat >typo.err <<\EOF -error: did you mean `--boolean` (with two dashes ?) +error: did you mean `--boolean` (with two dashes)? EOF test_expect_success 'detect possible typos' ' @@ -232,7 +232,7 @@ test_expect_success 'detect possible typos' ' ' cat >typo.err <<\EOF -error: did you mean `--ambiguous` (with two dashes ?) +error: did you mean `--ambiguous` (with two dashes)? EOF test_expect_success 'detect possible typos' ' -- https://github.com/git/git/pull/540
[PATCH v5 1/1] protocol: advertise multiple supported versions
Currently the client advertises that it supports the wire protocol version set in the protocol.version config. However, not all services support the same set of protocol versions. For example, git-receive-pack supports v1 and v0, but not v2. If a client connects to git-receive-pack and requests v2, it will instead be downgraded to v0. Other services, such as git-upload-archive, do not do any version negotiation checks. This patch creates a protocol version registry. Individual client and server programs register all the protocol versions they support prior to communicating with a remote instance. Versions should be listed in preference order; the version specified in protocol.version will automatically be moved to the front of the registry. The protocol version registry is passed to remote helpers via the GIT_PROTOCOL environment variable. Clients now advertise the full list of registered versions. Servers select the first allowed version from this advertisement. Additionally, remove special cases around advertising version=0. Previously we avoided adding version negotiation fields in server responses if it looked like the client wanted v0. However, including these fields does not change behavior, so it's better to have simpler code. While we're at it, remove unnecessary externs from function declarations in protocol.h. Signed-off-by: Josh Steadmon --- builtin/archive.c | 3 + builtin/clone.c | 4 ++ builtin/fetch-pack.c| 4 ++ builtin/fetch.c | 5 ++ builtin/ls-remote.c | 5 ++ builtin/pull.c | 5 ++ builtin/push.c | 4 ++ builtin/receive-pack.c | 3 + builtin/send-pack.c | 3 + builtin/upload-archive.c| 3 + builtin/upload-pack.c | 4 ++ connect.c | 52 +++ protocol.c | 122 +--- protocol.h | 23 ++- remote-curl.c | 27 +--- t/t5551-http-fetch-smart.sh | 1 + t/t5570-git-daemon.sh | 2 +- t/t5601-clone.sh| 38 +-- t/t5700-protocol-v1.sh | 8 +-- t/t5702-protocol-v2.sh | 16 +++-- transport-helper.c | 6 ++ 21 files changed, 256 insertions(+), 82 deletions(-) diff --git a/builtin/archive.c b/builtin/archive.c index e74f675390..8adb9f381b 100644 --- a/builtin/archive.c +++ b/builtin/archive.c @@ -8,6 +8,7 @@ #include "transport.h" #include "parse-options.h" #include "pkt-line.h" +#include "protocol.h" #include "sideband.h" static void create_output_file(const char *output_file) @@ -94,6 +95,8 @@ int cmd_archive(int argc, const char **argv, const char *prefix) OPT_END() }; + register_allowed_protocol_version(protocol_v0); + argc = parse_options(argc, argv, prefix, local_opts, NULL, PARSE_OPT_KEEP_ALL); diff --git a/builtin/clone.c b/builtin/clone.c index fd2c3ef090..1651a950b6 100644 --- a/builtin/clone.c +++ b/builtin/clone.c @@ -900,6 +900,10 @@ int cmd_clone(int argc, const char **argv, const char *prefix) struct refspec rs = REFSPEC_INIT_FETCH; struct argv_array ref_prefixes = ARGV_ARRAY_INIT; + register_allowed_protocol_version(protocol_v2); + register_allowed_protocol_version(protocol_v1); + register_allowed_protocol_version(protocol_v0); + fetch_if_missing = 0; packet_trace_identity("clone"); diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c index 1a1bc63566..cba935e4d3 100644 --- a/builtin/fetch-pack.c +++ b/builtin/fetch-pack.c @@ -57,6 +57,10 @@ int cmd_fetch_pack(int argc, const char **argv, const char *prefix) fetch_if_missing = 0; + register_allowed_protocol_version(protocol_v2); + register_allowed_protocol_version(protocol_v1); + register_allowed_protocol_version(protocol_v0); + packet_trace_identity("fetch-pack"); memset(, 0, sizeof(args)); diff --git a/builtin/fetch.c b/builtin/fetch.c index 61bec5d213..2a20cf8bfd 100644 --- a/builtin/fetch.c +++ b/builtin/fetch.c @@ -21,6 +21,7 @@ #include "argv-array.h" #include "utf8.h" #include "packfile.h" +#include "protocol.h" #include "list-objects-filter-options.h" static const char * const builtin_fetch_usage[] = { @@ -1476,6 +1477,10 @@ int cmd_fetch(int argc, const char **argv, const char *prefix) int prune_tags_ok = 1; struct argv_array argv_gc_auto = ARGV_ARRAY_INIT; + register_allowed_protocol_version(protocol_v2); + register_allowed_protocol_version(protocol_v1); + register_allowed_protocol_version(protocol_v0); + packet_trace_identity("fetch"); fetch_if_missing = 0; diff --git a/builtin/ls-remote.c b/builtin/ls-remote.c index 1a25df7ee1..ea685e8bb9 100644 --- a/builtin/ls-remote.c +++ b/builtin/ls-remote.c @@ -1,5 +1,6 @@ #include "builtin.h" #include "cache.h" +#include "protocol.h"
[PATCH v5 0/1] Advertise multiple supported proto versions
Changes from V4: remove special cases around advertising version=0. Add a registry of supported wire protocol versions that individual commands can use to declare supported versions before contacting a server. The client will then advertise all supported versions, while the server will choose the first allowed version from the advertised list. Every command that acts as a client or server must now register its supported protocol versions. Josh Steadmon (1): protocol: advertise multiple supported versions builtin/archive.c | 3 + builtin/clone.c | 4 ++ builtin/fetch-pack.c| 4 ++ builtin/fetch.c | 5 ++ builtin/ls-remote.c | 5 ++ builtin/pull.c | 5 ++ builtin/push.c | 4 ++ builtin/receive-pack.c | 3 + builtin/send-pack.c | 3 + builtin/upload-archive.c| 3 + builtin/upload-pack.c | 4 ++ connect.c | 52 +++ protocol.c | 122 +--- protocol.h | 23 ++- remote-curl.c | 27 +--- t/t5551-http-fetch-smart.sh | 1 + t/t5570-git-daemon.sh | 2 +- t/t5601-clone.sh| 38 +-- t/t5700-protocol-v1.sh | 8 +-- t/t5702-protocol-v2.sh | 16 +++-- transport-helper.c | 6 ++ 21 files changed, 256 insertions(+), 82 deletions(-) Range-diff against v4: 1: 3c023991c0 ! 1: 60f6f2fbd8 protocol: advertise multiple supported versions @@ -21,6 +21,12 @@ Clients now advertise the full list of registered versions. Servers select the first allowed version from this advertisement. +Additionally, remove special cases around advertising version=0. +Previously we avoided adding version negotiation fields in server +responses if it looked like the client wanted v0. However, including +these fields does not change behavior, so it's better to have simpler +code. + While we're at it, remove unnecessary externs from function declarations in protocol.h. @@ -245,18 +251,21 @@ { struct child_process *conn; @@ + prog, path, 0, target_host, 0); - /* If using a new version put that stuff here after a second null byte */ +- /* If using a new version put that stuff here after a second null byte */ - if (version > 0) { -+ if (strcmp(version_advert->buf, "version=0")) { - strbuf_addch(, '\0'); +- strbuf_addch(, '\0'); - strbuf_addf(, "version=%d%c", - version, '\0'); -+ strbuf_addf(, "%s%c", version_advert->buf, '\0'); - } +- } ++ /* Add version fields after a second null byte */ ++ strbuf_addch(, '\0'); ++ strbuf_addf(, "%s%c", version_advert->buf, '\0'); packet_write(fd[1], request.buf, request.len); + @@ */ static void push_ssh_options(struct argv_array *args, struct argv_array *env, @@ -264,9 +273,9 @@ - enum protocol_version version, int flags) + const struct strbuf *version_advert, int flags) { - if (variant == VARIANT_SSH && +- if (variant == VARIANT_SSH && - version > 0) { -+ strcmp(version_advert->buf, "version=0")) { ++ if (variant == VARIANT_SSH) { argv_array_push(args, "-o"); argv_array_push(args, "SendEnv=" GIT_PROTOCOL_ENVIRONMENT); - argv_array_pushf(env, GIT_PROTOCOL_ENVIRONMENT "=version=%d", @@ -346,13 +355,13 @@ - if (version > 0) { - argv_array_pushf(>env_array, GIT_PROTOCOL_ENVIRONMENT "=version=%d", - version); -+ if (strcmp(version_advert.buf, "version=0")) { -+ argv_array_pushf(>env_array, -+ GIT_PROTOCOL_ENVIRONMENT "=%s", -+ version_advert.buf); - } +- } ++ argv_array_pushf(>env_array, ++ GIT_PROTOCOL_ENVIRONMENT "=%s", ++ version_advert.buf); } argv_array_push(>args, cmd.buf); + diff --git a/protocol.c b/protocol.c --- a/protocol.c @@ -573,8 +582,7 @@ +static int get_client_protocol_http_header(const struct strbuf *version_advert, + struct strbuf *header) +{ -+ if (version_advert->len > 0 && -+ strcmp(version_advert->buf, "version=0")) { ++ if (version_advert->len > 0) { + strbuf_addf(header, GIT_PROTOCOL_HEADER ": %s", +
Re: [PATCH v1 8/9] diff --color-moved-ws: modify allow-indentation-change
On Fri, Nov 16, 2018 at 3:04 AM Phillip Wood wrote: > > From: Phillip Wood > > Currently diff --color-moved-ws=allow-indentation-change does not > support indentation that contains a mix of tabs and spaces. For > example in commit 546f70f377 ("convert.h: drop 'extern' from function > declaration", 2018-06-30) the function parameters in the following > lines are not colored as moved [1]. > > -extern int stream_filter(struct stream_filter *, > -const char *input, size_t *isize_p, > -char *output, size_t *osize_p); > +int stream_filter(struct stream_filter *, > + const char *input, size_t *isize_p, > + char *output, size_t *osize_p); > > This commit changes the way the indentation is handled to track the > visual size of the indentation rather than the characters in the > indentation. This has they benefit that any whitespace errors do not s/they/the/ > interfer with the move detection (the whitespace errors will still be > highlighted according to --ws-error-highlight). During the discussion > of this feature there were concerns about the correct detection of > indentation for python. However those concerns apply whether or not > we're detecting moved lines so no attempt is made to determine if the > indentation is 'pythonic'. > > [1] Note that before the commit to fix the erroneous coloring of moved > lines each line was colored as a different block, since that commit > they are uncolored. > > Signed-off-by: Phillip Wood > --- > > Notes: > Changes since rfc: > - It now replaces the existing implementation rather than adding a new >mode. > - The indentation deltas are now calculated once for each line and >cached. > - Optimized the whitespace delta comparison to compare string lengths >before comparing the actual strings. > - Modified the calculation of tabs as suggested by Stefan. > - Split out the blank line handling into a separate commit as suggest >by Stefan. > - Fixed some comments pointed out by Stefan. > > diff.c | 130 + > t/t4015-diff-whitespace.sh | 56 > 2 files changed, 129 insertions(+), 57 deletions(-) > > diff --git a/diff.c b/diff.c > index c378ce3daf..89559293e7 100644 > --- a/diff.c > +++ b/diff.c > @@ -750,6 +750,8 @@ struct emitted_diff_symbol { > const char *line; > int len; > int flags; > + int indent_off; > + int indent_width; So this is the trick how we compute the ws related data only once per line. :-) On the other hand, we do not save memory by disabling the ws detection, but I guess that is not a problem for now. Would it make sense to have the new variables be unsigned? (Also a comment on what they are, I needed to read the code to understand off to be offset into the line, where the content starts, and width to be the visual width, as I did not recall the RFC.) > +static void fill_es_indent_data(struct emitted_diff_symbol *es) > [...] > + if (o->color_moved_ws_handling & > + COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE) > + fill_es_indent_data(>emitted_symbols->buf[n]); Nice. By reducing the information kept around to ints, we also do not need to alloc/free memory for each line. > +++ b/t/t4015-diff-whitespace.sh > @@ -1901,4 +1901,60 @@ test_expect_success 'compare whitespace delta > incompatible with other space opti > test_i18ngrep allow-indentation-change err > ' > > +test_expect_success 'compare mixed whitespace delta across moved blocks' ' Looks good, Thanks! Stefan
Re: [PATCH] technical doc: add a design doc for the evolve command
On 11/14/2018 7:55 PM, sxe...@google.com wrote: From: Stefan Xenos This document describes what an obsolescence graph for git would look like, the behavior of the evolve command, and the changes planned for other commands. Thanks for putting this together! diff --git a/Documentation/technical/evolve.txt b/Documentation/technical/evolve.txt ... +Git Obsolescence Graph +== + +Objective +- +Track the edits to a commit over time in an obsolescence graph. The file name and the title are in a mismatch. I'd prefer if the title was "Git Evolve Design Document" and this opening paragraph was about the reasons we want a 'git evolve' command. Here is my attempt: The proposed 'git evolve' command will help users craft a high-quality commit history in their topic branches. By working to improve commits one at a time, then running 'git evolve', users can rewrite recent history with more options than interactive rebase. The core benefit is that users can pause their progress and move to other branches before returning to where they left off. Users can also share progress with others using standard 'push', 'fetch', and 'format-patch' commands. +Background +-- Perhaps you can call this "Example"? +Imagine you have three dependent changes up for review and you receive feedback +that requires editing all three changes. While you're editing one, more feedback +arrives on one of the others. What do you do? "three dependent changes" sounds a bit vague enough to me to possibly confuse readers. Perhaps "three sequential patches"? +- Users can view the history of a commit directly (the sequence of amends and + rebases it has undergone, orthogonal to the history of the branch it is on). "the history of a commit" doesn't semantically work, as a commit is an immutable Git object. Instead, I would try to use the term "patch" to describe a change to the codebase, and that takes the form as a list of commits that are improving on each other (but don't actually have each other in their commit history). This means that the lifetime of a patch is described by the commits that are amended or rebased. +- By pushing and pulling the obsolescence graph, users can collaborate more + easily on changes-in-progress. This is better than pushing and pulling the + changes themselves since the obsolescence graph can be used to locate a more + specific merge base, allowing for better merges between different versions of + the same change. (Making a note so I come back to this. I hope to learn what you mean by this "more specific merge base".) + +Similar technologies + ... It can't handle the case where you have +multiple changes sharing the same parent when that parent needs to be rebased Perhaps this could be made more concrete by describing commit history and a specific workflow change using 'git evolve'. Suppose we have two topic branches, topic1 and topic2, that point to commits A and B, respectively.Suppose further that A and B have a common parent C with parent D. If we rebase topic1 relativeto D, then we create new commits C' and A' that are newer versions of commits C and A. It would benice to easily update topic2 to be on a new commit B' with parent C'. Currently, a user needs to knowthat C updated to C', and use 'git rebase --onto C' C topic2'. Instead, if we have a marker showing thatC' is an updated version of C, 'git log topic2' would show that topic2 can be updated, and the 'gitevolve' command would perform the correct action to make B' with parent C'. (This paragraph above is an example of "what can happen now is complicated and demands that the user keep some information in their memory" and "the new workflow is simpler and helps users make the right decision". I think we could use more of these at the start to sell the idea.) +and won't let you collaborate with others on resolving a complicated interactive +rebase. In the same sentence, we have an even more complicated workflow mentioned as an aside. This could be fleshed out more concretely. It could help describing that the current model is for usersto share "!fixup" commits and then one performs an interactive rebase to apply those fixups inthe correct order. If a user instead shares an amended commit, then we are in a difficult state toapply those changes. The new workflow would be to share amended commits and 'git evolve'inserts the correct amended commits in the right order. I'm a big proponent of the teaching philosophy of "examples first". It's easier to talk abstractlyafter going through some concrete examples. You can think of rebase -i as a top-down approach and the evolve command +as the bottom-up approach to the same problem. This comparison is important. Perhaps it is more specific to say "interactive rebase splits a plan torewrite history into independent units of work, while evolve collects independent
Re: [PATCH v1 7/9] diff --color-moved-ws: optimize allow-indentation-change
On Fri, Nov 16, 2018 at 3:04 AM Phillip Wood wrote: > > From: Phillip Wood > > When running > > git diff --color-moved-ws=allow-indentation-change v2.18.0 v2.19.0 > > cmp_in_block_with_wsd() is called 694908327 times. Of those 42.7% > return after comparing a and b. By comparing the lengths first we can > return early in all but 0.03% of those cases without dereferencing the > string pointers. The comparison between a and c fails in 6.8% of > calls, by comparing the lengths first we reject all the failing calls > without dereferencing the string pointers. > > This reduces the time to run the command above by by 42% from 14.6s to > 8.5s. This is still much slower than the normal --color-moved which > takes ~0.6-0.7s to run but is a significant improvement. > > The next commits will replace the current implementation with one that > works with mixed tabs and spaces in the indentation. I think it is > worth optimizing the current implementation first to enable a fair > comparison between the two implementations. Up to here the series looks good and I think we could take it as a preparatory self-standing series. I'll read on. Thanks, Stefan
Re: [PATCH 2/3] remote-curl: tighten "version 2" check for smart-http
On 2018.11.16 03:48, Jeff King wrote: > In a v2 smart-http conversation, the server should reply to our initial > request with a pkt-line saying "version 2" (this is the start of the > "capabilities advertisement"). We check for the string using > starts_with(), but that's overly permissive (it would match "version > 20", for example). > > Let's tighten this check to use strcmp(). Note that we don't need to > worry about a trailing newline here, because the ptk-line code will have > chomped it for us already. > > Signed-off-by: Jeff King > --- > remote-curl.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/remote-curl.c b/remote-curl.c > index dd9bc41aa1..3c9c4a07c3 100644 > --- a/remote-curl.c > +++ b/remote-curl.c > @@ -375,7 +375,7 @@ static void check_smart_http(struct discovery *d, const > char *service, > d->len = src_len; > d->proto_git = 1; > > - } else if (starts_with(line, "version 2")) { > + } else if (!strcmp(line, "version 2")) { > /* >* v2 smart http; do not consume version packet, which will >* be handled elsewhere. > -- > 2.19.1.1636.gc7a073d580 > Looks good to me. Reviewed-by: Josh Steadmon
Re: [PATCH 1/3] remote-curl: refactor smart-http discovery
On 2018.11.16 03:47, Jeff King wrote: > After making initial contact with an http server, we have to decide if > the server supports smart-http, and if so, which version. Our rules are > a bit inconsistent: > > 1. For v0, we require that the content-type indicates a smart-http > response. We also require the response to look vaguely like a > pkt-line starting with "#". If one of those does not match, we fall > back to dumb-http. > > But according to our http protocol spec[1]: > >Dumb servers MUST NOT return a return type starting with >`application/x-git-`. > > If we see the expected content-type, we should consider it > smart-http. At that point we can parse the pkt-line for real, and > complain if it is not syntactically valid. > > 2. For v2, we do not actually check the content-type. Our v2 protocol > spec says[2]: > >When using the http:// or https:// transport a client makes a >"smart" info/refs request as described in `http-protocol.txt`[...] > > and the http spec is clear that for a smart-http[3]: > >The Content-Type MUST be `application/x-$servicename-advertisement`. > > So it is required according to the spec. > > These inconsistencies were easy to miss because of the way the original > code was written as an inline conditional. Let's pull it out into its > own function for readability, and improve a few things: > > - we now predicate the smart/dumb decision entirely on the presence of >the correct content-type > > - we do a real pkt-line parse before deciding how to proceed (and die >if it isn't valid) > > - use skip_prefix() for comparing service strings, instead of >constructing expected output in a strbuf; this avoids dealing with >memory cleanup > > Note that this _is_ tightening what the client will allow. It's all > according to the spec, but it's possible that other implementations > might violate these. However, violating these particular rules seems > like an odd choice for a server to make. > > [1] Documentation/technical/http-protocol.txt, l. 166-167 > [2] Documentation/technical/protocol-v2.txt, l. 63-64 > [3] Documentation/technical/http-protocol.txt, l. 247 > > Helped-by: Josh Steadmon > Signed-off-by: Jeff King > --- > remote-curl.c | 93 --- > 1 file changed, 59 insertions(+), 34 deletions(-) > > diff --git a/remote-curl.c b/remote-curl.c > index 762a55a75f..dd9bc41aa1 100644 > --- a/remote-curl.c > +++ b/remote-curl.c > @@ -330,9 +330,65 @@ static int get_protocol_http_header(enum > protocol_version version, > return 0; > } > > +static void check_smart_http(struct discovery *d, const char *service, > + struct strbuf *type) > +{ > + char *src_buf; > + size_t src_len; > + char *line; > + const char *p; > + > + /* > + * If we don't see x-$service-advertisement, then it's not smart-http. > + * But once we do, we commit to it and assume any other protocol > + * violations are hard errors. > + */ > + if (!skip_prefix(type->buf, "application/x-", ) || > + !skip_prefix(p, service, ) || > + strcmp(p, "-advertisement")) > + return; > + > + /* > + * "Peek" at the first packet by using a separate buf/len pair; some > + * cases below require us leaving the originals intact. > + */ > + src_buf = d->buf; > + src_len = d->len; > + line = packet_read_line_buf(_buf, _len, NULL); > + if (!line) > + die("invalid server response; expected service, got flush > packet"); > + > + if (skip_prefix(line, "# service=", ) && !strcmp(p, service)) { > + /* > + * The header can include additional metadata lines, up > + * until a packet flush marker. Ignore these now, but > + * in the future we might start to scan them. > + */ > + while (packet_read_line_buf(_buf, _len, NULL)) > + ; > + > + /* > + * v0 smart http; callers expect us to soak up the > + * service and header packets > + */ > + d->buf = src_buf; > + d->len = src_len; > + d->proto_git = 1; > + > + } else if (starts_with(line, "version 2")) { > + /* > + * v2 smart http; do not consume version packet, which will > + * be handled elsewhere. > + */ > + d->proto_git = 1; > + > + } else { > + die("invalid server response; got '%s'", line); > + } > +} > + > static struct discovery *discover_refs(const char *service, int for_push) > { > - struct strbuf exp = STRBUF_INIT; > struct strbuf type = STRBUF_INIT; > struct strbuf charset = STRBUF_INIT; > struct strbuf buffer = STRBUF_INIT; > @@ -405,38 +461,8 @@ static struct discovery
Re: [PATCH 0/3] remote-curl smart-http discovery cleanup
On 2018.11.16 03:44, Jeff King wrote: [...] > Amusingly, this does break the test you just added, because it tries to > issue an ERR after claiming "text/html" (and after my patch, we > correctly fall back to dumb-http). Heh yeah, I copied the script from a dumb-http test without reading the spec.
Re: [PATCH v3 1/1] protocol: advertise multiple supported versions
On 2018.11.16 11:45, Junio C Hamano wrote: > Josh Steadmon writes: > > >> What I was alludding to was a lot simpler, though. An advert string > >> "version=0:version=1" from a client that prefers version 0 won't be > >> !strcmp("version=0", advert) but seeing many strcmp() in the patch > >> made me wonder. > > > > Ah I see. The strcmp()s against "version=0" are special cases for where > > it looks like the client does not understand any sort of version > > negotiation. If we see multiple versions listed in the advert, then the > > rest of the selection logic should do the right thing. > > OK, let me try to see if I understand correctly by rephrasing. > > If the client does not say anything about which version it prefers > (because it only knows about version 0 without even realizing that > there is a version negotiation exchange), we substitute the lack of > proposed versions with string "version=0", and the strcmp()s I saw > and was puzzled by are all about special casing such a client. But > we would end up behaving the same way (at least when judged only by > externally visible effects) to a client that is aware of version > negotiation and tells us it prefers version 0 (and nothing else) > with the selection logic anyway. > > Did I get it right? If so, yeah, I think it makes sense to avoid > two codepaths that ends up doing the same thing by removing the > special case. Yes, that is correct. The next version will remove the special cases here. > > However, I think that it might work to remove the special cases. In the > > event that the client is so old that it doesn't understand any form of > > version negotiation, it should just ignore the version fields / > > environment vars. If you think it's cleaner to remove the special cases, > > let me know.
Re: [PATCH v2] read-cache: write all indexes with the same permissions
On Fri, Nov 16, 2018 at 7:29 PM SZEDER Gábor wrote: > > On Fri, Nov 16, 2018 at 06:31:05PM +0100, Christian Couder wrote: > > diff --git a/t/t1700-split-index.sh b/t/t1700-split-index.sh > > index 2ac47aa0e4..fa1d3d468b 100755 > > --- a/t/t1700-split-index.sh > > +++ b/t/t1700-split-index.sh > > @@ -381,6 +381,26 @@ test_expect_success 'check > > splitIndex.sharedIndexExpire set to "never" and "now" > > test $(ls .git/sharedindex.* | wc -l) -le 2 > > ' > > > > +test_expect_success POSIXPERM 'same mode for index & split index' ' > > + git init same-mode && > > + ( > > + cd same-mode && > > + test_commit A && > > + test_modebits .git/index >index_mode && > > + test_must_fail git config core.sharedRepository && > > + git -c core.splitIndex=true status && > > + shared=$(ls .git/sharedindex.*) && > > I think the command substitution and 'ls' are unnecessary, and > > shared=.git/sharedindex.* > > would work as well. If there is no shared index file with the above we would get: shared=.git/sharedindex.* $ echo $shared .git/sharedindex.* which seems bug prone to me.
Re: [PATCH v2] read-cache: write all indexes with the same permissions
On Fri, Nov 16 2018, SZEDER Gábor wrote: > On Fri, Nov 16, 2018 at 06:31:05PM +0100, Christian Couder wrote: >> diff --git a/t/t1700-split-index.sh b/t/t1700-split-index.sh >> index 2ac47aa0e4..fa1d3d468b 100755 >> --- a/t/t1700-split-index.sh >> +++ b/t/t1700-split-index.sh >> @@ -381,6 +381,26 @@ test_expect_success 'check splitIndex.sharedIndexExpire >> set to "never" and "now" >> test $(ls .git/sharedindex.* | wc -l) -le 2 >> ' >> >> +test_expect_success POSIXPERM 'same mode for index & split index' ' >> +git init same-mode && >> +( >> +cd same-mode && >> +test_commit A && >> +test_modebits .git/index >index_mode && >> +test_must_fail git config core.sharedRepository && >> +git -c core.splitIndex=true status && >> +shared=$(ls .git/sharedindex.*) && > > I think the command substitution and 'ls' are unnecessary, and > > shared=.git/sharedindex.* > > would work as well. Looks like it. FWIW I just copy/pasted what an adjacent test was doing for consistency, which was added in one of Christian's earlier changes to this behavior. But yeah, if the test can be made simpler in a portable way it would make sense to make this a two-parter test cleanup & bug fix series. >> +case "$shared" in >> +*" "*) >> +# we have more than one??? >> +false ;; >> +*) >> +test_modebits "$shared" >split_index_mode && >> +test_cmp index_mode split_index_mode ;; >> +esac >> +) >> +' >> + >> while read -r mode modebits >> do >> test_expect_success POSIXPERM "split index respects >> core.sharedrepository $mode" ' >> -- >> 2.19.1.1053.g063ed687ac >>
Re: [RFC/PATCH 0/5] stop installing old libexec aliases like "git-init"
On Fri, Nov 16 2018, Michael Haggerty wrote: > On Fri, Nov 16, 2018 at 11:38 AM Ævar Arnfjörð Bjarmason > wrote: >> [...] >> A follow-up on this: We should really fix this for other >> reasons. I.e. compile in some "this is stuff we ourselves think is in >> git". >> >> There's other manifestations of this, e.g.: >> >> git-sizer --help # => shows you help >> git sizer --help # => says it doesn't have a manpage >> >> Because we aren't aware that git-sizer is some external tool, and that >> we should route --help to it. > > That would be nice. This has been an annoying for several tools named > `git-foo` that I have worked on (e.g., git-sizer, git-imerge, > git-when-merged, plus many internal tools). > >> Non-withstanding the arguable bug that things like git-sizer shouldn't >> be allowing themselves to be invoked by "git" like that without >> guaranteeing that it can consume all the options 'git' expects. When I >> had to deal with a similar problem in an external git-* command I was >> maintaining I simply made it an error to invoke it as "git mything" >> instead of "git-mything". > > Hmmm, I always thought that it was intended and supported that an > external tool can name itself `git-foo` so that it can be invoked as > `git foo`. > > Which git options do you think that such a tool should be expected to > support? Many useful ones, like `-C `, `--git-dir=`, > `--work-tree=`, `-c =`, and `--no-replace-objects`, > work pretty much for free if the external tool uses `git` to interact > with the repository. I use such options regularly with external tools. > IMO it would be a regression for these tools to refuse to run when > invoked as, say, `git -C path/to/repo foo`. I don't mean we should forbid it, just that if you have an external git-foo tool meant to be invoked like "git-foo" that and not "git foo" it might be worth to save yourself the trouble and not support the latter. I thought git-sizer was one such tool, since it docs just mention "git-sizer". But yeah, "-c" etc. are useful, and we definitely want to support this in the general case. E.g. for "git-annex" and others that are meant to be used like that. So maybe we should just document this interface better. It seems one implicit dependency is that we expect a manpage for the tool to exist for --help. Or, keep some list of internal git tools and treat them specially. But I see now that would break "git annex --help" in the other direction, but maybe that would be better. I don't know. As I recall I stopped supporting "git" invocation for the tool I was fixing a long time ago because of some funny interaction with terminals & signals. I.e. git itself would eat some of them, but the tool wanted to handle it instead. But I don't remember the details, and can't reproduce it now with: $ cat ~/bin/git-sigint #!/usr/bin/env perl $SIG{INT} = sub { warn localtime . " " . $$ }; sleep 1 while 1; $ git sigint # or git-sigint [... behaves the same either way ...] So maybe it was something else, or I'm misremembering...
Re: [RFC/PATCH] read-cache: write all indexes with the same permissions
On Fri, Nov 16, 2018 at 8:07 PM SZEDER Gábor wrote: > > On Fri, Nov 16, 2018 at 06:41:43PM +0100, Christian Couder wrote: > > On Tue, Nov 13, 2018 at 6:34 PM Ævar Arnfjörð Bjarmason > > wrote: > > > > I'm asking whether the bug in this patch isn't revealing an existing > > > issue with us not having any tests for N number of sharedindex.* > > > files. I.e. we have >1 of them, merge them and prune them, don't we? > > We don't merge shared index files, but write a new one. True. They are immutable like git objects. > With the default 20% threshold a new shared index is written rather > frequently with our usual small test-repos: Side note. Split index is definitely not meant for small repos. But maybe we should have a lower limit (in terms of absolute number of entries) that prevent splitting. This splitting seems excessive. > $ git init > $ git update-index --split-index > $ ls -1 .git/*index* > .git/index > .git/sharedindex.4370042739b31cd17a5c5cd6043a77c9a00df113 > $ echo 1 >file > $ git add file > $ git commit -q -m 1 > $ echo 2 >file > $ git commit -q -m 2 file > $ echo 3 >file > $ git commit -q -m 3 file > $ ls -1 .git/*index* > .git/index > .git/sharedindex.4370042739b31cd17a5c5cd6043a77c9a00df113 > .git/sharedindex.6aedbf71b1a6bdc0018078ec7571e1b21ba4b954 > .git/sharedindex.b9106e9b82a818a0e4e9148224fc44ea98f488a1 > .git/sharedindex.bad0b75d270a431b9e961cfc15df6ec935a67be8 -- Duy
Re: [PATCH v2] read-cache: write all indexes with the same permissions
On Fri, Nov 16, 2018 at 8:10 PM Christian Couder wrote: > > On Fri, Nov 16, 2018 at 7:03 PM Duy Nguyen wrote: > > > > On Fri, Nov 16, 2018 at 6:31 PM Christian Couder > > wrote: > > > diff --git a/read-cache.c b/read-cache.c > > > index 8c924506dd..ea80600bff 100644 > > > --- a/read-cache.c > > > +++ b/read-cache.c > > > @@ -3165,7 +3165,8 @@ int write_locked_index(struct index_state *istate, > > > struct lock_file *lock, > > > struct tempfile *temp; > > > int saved_errno; > > > > > > - temp = mks_tempfile(git_path("sharedindex_XX")); > > > + /* Same permissions as the main .git/index file */ > > > > If the permission is already correct from the beginning (of this temp > > file), should df801f3f9f be reverted since we don't need to adjust > > permission anymore? > > df801f3f9f (read-cache: use shared perms when writing shared index, > 2017-06-25) was fixing the bug that permissions of the shared index > file did not take into account the shared permissions (which are about > core.sharedRepository; "shared" has a different meaning in "shared > index file" and in "shared permissions"). > > This fix only changes permissions before the shared permissions are > taken into account (so before adjust_shared_perm() is called). > > > Or does $GIT_DIR/index go through the same adjust_shared_perm() anyway > > in the end, which means df801f3f9f must stay? > > Yeah, $GIT_DIR/index goes through adjust_shared_perm() too because > create_tempfile() calls adjust_shared_perm(). So indeed df801f3f9f > must stay. Ah thanks. By the time I got to this part > Let's instead make the two consistent by using mks_tempfile_sm() and > passing 0666 in its `mode` argument. went look at that function and back, I forgot about the paragraph above it. -- Duy
Re: [PATCH v2] read-cache: write all indexes with the same permissions
On Fri, Nov 16, 2018 at 7:03 PM Duy Nguyen wrote: > > On Fri, Nov 16, 2018 at 6:31 PM Christian Couder > wrote: > > diff --git a/read-cache.c b/read-cache.c > > index 8c924506dd..ea80600bff 100644 > > --- a/read-cache.c > > +++ b/read-cache.c > > @@ -3165,7 +3165,8 @@ int write_locked_index(struct index_state *istate, > > struct lock_file *lock, > > struct tempfile *temp; > > int saved_errno; > > > > - temp = mks_tempfile(git_path("sharedindex_XX")); > > + /* Same permissions as the main .git/index file */ > > If the permission is already correct from the beginning (of this temp > file), should df801f3f9f be reverted since we don't need to adjust > permission anymore? df801f3f9f (read-cache: use shared perms when writing shared index, 2017-06-25) was fixing the bug that permissions of the shared index file did not take into account the shared permissions (which are about core.sharedRepository; "shared" has a different meaning in "shared index file" and in "shared permissions"). This fix only changes permissions before the shared permissions are taken into account (so before adjust_shared_perm() is called). > Or does $GIT_DIR/index go through the same adjust_shared_perm() anyway > in the end, which means df801f3f9f must stay? Yeah, $GIT_DIR/index goes through adjust_shared_perm() too because create_tempfile() calls adjust_shared_perm(). So indeed df801f3f9f must stay. > If so, perhaps clarify > that in the commit message. There is already the following about df801f3f9f: --- A bug related to this was spotted, fixed and tested for in df801f3f9f ("read-cache: use shared perms when writing shared index", 2017-06-25) and 3ee83f48e5 ("t1700: make sure split-index respects core.sharedrepository", 2017-06-25). However, as noted in those commits we'd still create the file as 0600, and would just re-chmod it depending on the setting of core.sharedRepository. --- So I would think that df801f3f9f should perhaps have explained that create_tempfile() calls adjust_shared_perm(), but I don't think that it is very relevant in this commit message. We are already talking about df801f3f9f which should be enough to explain the issue df801f3f9f fixed, and I think we should not need to explain in more details why df801f3f9f did a good job. It's not as if we are reverting it. We are just complementing it with another fix related to what happens before adjust_shared_perm() is called. I think rewording the comment a bit might help though, for example maybe: /* Same initial permissions as the main .git/index file */ instead of: /* Same permissions as the main .git/index file */ Thanks, Christian.
Re: [RFC/PATCH] read-cache: write all indexes with the same permissions
On Fri, Nov 16, 2018 at 06:41:43PM +0100, Christian Couder wrote: > On Tue, Nov 13, 2018 at 6:34 PM Ævar Arnfjörð Bjarmason > wrote: > > I'm asking whether the bug in this patch isn't revealing an existing > > issue with us not having any tests for N number of sharedindex.* > > files. I.e. we have >1 of them, merge them and prune them, don't we? We don't merge shared index files, but write a new one. > I think we shouldn't have many of them. Usually we should have just > one, though it's possible that switching the shared index files > feature on and off several times or using temporary index files could > create more than one. > > And there is clean_shared_index_files() which calls > should_delete_shared_index() to make sure they are regularly cleaned > up. I would think that it's more common to have more than one shared index files, because a new shared index is written when the number of entries in the split index reaches the threshold given in 'splitIndex.maxPercentChange'. The old ones are kept until they expire, and 'splitIndex.sharedIndexExpire' defaults to 2 weeks (and can even be be set to "never"). With the default 20% threshold a new shared index is written rather frequently with our usual small test-repos: $ git init $ git update-index --split-index $ ls -1 .git/*index* .git/index .git/sharedindex.4370042739b31cd17a5c5cd6043a77c9a00df113 $ echo 1 >file $ git add file $ git commit -q -m 1 $ echo 2 >file $ git commit -q -m 2 file $ echo 3 >file $ git commit -q -m 3 file $ ls -1 .git/*index* .git/index .git/sharedindex.4370042739b31cd17a5c5cd6043a77c9a00df113 .git/sharedindex.6aedbf71b1a6bdc0018078ec7571e1b21ba4b954 .git/sharedindex.b9106e9b82a818a0e4e9148224fc44ea98f488a1 .git/sharedindex.bad0b75d270a431b9e961cfc15df6ec935a67be8 > Anyway it's a different topic and according to what we privately > discussed I just sent > https://public-inbox.org/git/20181116173105.21784-1-chrisc...@tuxfamily.org/ > to fix the current issue.
Re: [PATCH v1 2/9] diff: use whitespace consistently
On Fri, Nov 16, 2018 at 3:04 AM Phillip Wood wrote: > > From: Phillip Wood > > Most of the documentation uses 'whitespace' rather than 'white space' > or 'white spaces' convert to latter two to the former for consistency. Makes sense; this doesn't touch docs, but also code. $ git grep "white space" yields some other places as well (Documentation/git-cat-file.txt and lots in t/) But I guess we keep it to this feature for now instead of a tree wide cleanup. Stefan
Re: [PATCH v2] read-cache: write all indexes with the same permissions
On Fri, Nov 16, 2018 at 06:31:05PM +0100, Christian Couder wrote: > diff --git a/t/t1700-split-index.sh b/t/t1700-split-index.sh > index 2ac47aa0e4..fa1d3d468b 100755 > --- a/t/t1700-split-index.sh > +++ b/t/t1700-split-index.sh > @@ -381,6 +381,26 @@ test_expect_success 'check splitIndex.sharedIndexExpire > set to "never" and "now" > test $(ls .git/sharedindex.* | wc -l) -le 2 > ' > > +test_expect_success POSIXPERM 'same mode for index & split index' ' > + git init same-mode && > + ( > + cd same-mode && > + test_commit A && > + test_modebits .git/index >index_mode && > + test_must_fail git config core.sharedRepository && > + git -c core.splitIndex=true status && > + shared=$(ls .git/sharedindex.*) && I think the command substitution and 'ls' are unnecessary, and shared=.git/sharedindex.* would work as well. > + case "$shared" in > + *" "*) > + # we have more than one??? > + false ;; > + *) > + test_modebits "$shared" >split_index_mode && > + test_cmp index_mode split_index_mode ;; > + esac > + ) > +' > + > while read -r mode modebits > do > test_expect_success POSIXPERM "split index respects > core.sharedrepository $mode" ' > -- > 2.19.1.1053.g063ed687ac >
Re: [PATCH v2] read-cache: write all indexes with the same permissions
On Fri, Nov 16, 2018 at 6:31 PM Christian Couder wrote: > diff --git a/read-cache.c b/read-cache.c > index 8c924506dd..ea80600bff 100644 > --- a/read-cache.c > +++ b/read-cache.c > @@ -3165,7 +3165,8 @@ int write_locked_index(struct index_state *istate, > struct lock_file *lock, > struct tempfile *temp; > int saved_errno; > > - temp = mks_tempfile(git_path("sharedindex_XX")); > + /* Same permissions as the main .git/index file */ If the permission is already correct from the beginning (of this temp file), should df801f3f9f be reverted since we don't need to adjust permission anymore? Or does $GIT_DIR/index go through the same adjust_shared_perm() anyway in the end, which means df801f3f9f must stay? If so, perhaps clarify that in the commit message. > + temp = mks_tempfile_sm(git_path("sharedindex_XX"), 0, > 0666); > if (!temp) { > oidclr(>base_oid); > ret = do_write_locked_index(istate, lock, flags); -- Duy
Re: [RFC/PATCH] read-cache: write all indexes with the same permissions
On Tue, Nov 13, 2018 at 6:34 PM Ævar Arnfjörð Bjarmason wrote: > > On Tue, Nov 13 2018, Duy Nguyen wrote: > > > On Tue, Nov 13, 2018 at 4:32 PM Ævar Arnfjörð Bjarmason > > wrote: > > I don't have any bright idea how to catch the literal _X file. > > It's a temporary file and will not last long enough for us to verify > > unless we intercept open() calls with LD_PRELOAD. > > Sorry for being unclear. I don't mean how can we catch this specific > bug, that would be uninteresting and hard to test for. > > I'm asking whether the bug in this patch isn't revealing an existing > issue with us not having any tests for N number of sharedindex.* > files. I.e. we have >1 of them, merge them and prune them, don't we? I think we shouldn't have many of them. Usually we should have just one, though it's possible that switching the shared index files feature on and off several times or using temporary index files could create more than one. And there is clean_shared_index_files() which calls should_delete_shared_index() to make sure they are regularly cleaned up. Anyway it's a different topic and according to what we privately discussed I just sent https://public-inbox.org/git/20181116173105.21784-1-chrisc...@tuxfamily.org/ to fix the current issue.
[PATCH v2] read-cache: write all indexes with the same permissions
From: Ævar Arnfjörð Bjarmason Change the code that writes out the shared index to use mks_tempfile_sm() instead of mks_tempfile(). The create_tempfile() function is used to write out the main ".git/index" (via ".git/index.lock") using lock_file(). The create_tempfile() function respects the umask, as it uses open() with 0666, whereas the mks_tempfile() function uses open() with 0600. So mks_tempfile() which is used to create the shared index file is likely to create such a file with restricted permissions compared to the main ".git/index" file. A bug related to this was spotted, fixed and tested for in df801f3f9f ("read-cache: use shared perms when writing shared index", 2017-06-25) and 3ee83f48e5 ("t1700: make sure split-index respects core.sharedrepository", 2017-06-25). However, as noted in those commits we'd still create the file as 0600, and would just re-chmod it depending on the setting of core.sharedRepository. So without core.splitIndex a system with e.g. the umask set to group writeability would work for the members of the group, but not with core.splitIndex set, as members of the group would not be able to access the shared index file. Let's instead make the two consistent by using mks_tempfile_sm() and passing 0666 in its `mode` argument. Note that we cannot use the create_tempfile() function itself that is used to write the main ".git/index" file because we want the XX part of the "sharedindex_XX" argument to be replaced by a pseudo random value and create_tempfile() doesn't do that. Ideally we'd split up the adjust_shared_perm() function to one that can give us the mode we want so we could just call open() instead of open() followed by chmod(), but that's an unrelated cleanup. We already have that minor issue with the "index" file #leftoverbits. Signed-off-by: Ævar Arnfjörð Bjarmason Signed-off-by: Christian Couder --- This is a simpler fix iterating from Ævar's RFC patch and the following discussions: https://public-inbox.org/git/20181113153235.25402-1-ava...@gmail.com/ read-cache.c | 3 ++- t/t1700-split-index.sh | 20 2 files changed, 22 insertions(+), 1 deletion(-) diff --git a/read-cache.c b/read-cache.c index 8c924506dd..ea80600bff 100644 --- a/read-cache.c +++ b/read-cache.c @@ -3165,7 +3165,8 @@ int write_locked_index(struct index_state *istate, struct lock_file *lock, struct tempfile *temp; int saved_errno; - temp = mks_tempfile(git_path("sharedindex_XX")); + /* Same permissions as the main .git/index file */ + temp = mks_tempfile_sm(git_path("sharedindex_XX"), 0, 0666); if (!temp) { oidclr(>base_oid); ret = do_write_locked_index(istate, lock, flags); diff --git a/t/t1700-split-index.sh b/t/t1700-split-index.sh index 2ac47aa0e4..fa1d3d468b 100755 --- a/t/t1700-split-index.sh +++ b/t/t1700-split-index.sh @@ -381,6 +381,26 @@ test_expect_success 'check splitIndex.sharedIndexExpire set to "never" and "now" test $(ls .git/sharedindex.* | wc -l) -le 2 ' +test_expect_success POSIXPERM 'same mode for index & split index' ' + git init same-mode && + ( + cd same-mode && + test_commit A && + test_modebits .git/index >index_mode && + test_must_fail git config core.sharedRepository && + git -c core.splitIndex=true status && + shared=$(ls .git/sharedindex.*) && + case "$shared" in + *" "*) + # we have more than one??? + false ;; + *) + test_modebits "$shared" >split_index_mode && + test_cmp index_mode split_index_mode ;; + esac + ) +' + while read -r mode modebits do test_expect_success POSIXPERM "split index respects core.sharedrepository $mode" ' -- 2.19.1.1053.g063ed687ac
Re: [RFC/PATCH 0/5] stop installing old libexec aliases like "git-init"
On Fri, Nov 16, 2018 at 11:38 AM Ævar Arnfjörð Bjarmason wrote: > [...] > A follow-up on this: We should really fix this for other > reasons. I.e. compile in some "this is stuff we ourselves think is in > git". > > There's other manifestations of this, e.g.: > > git-sizer --help # => shows you help > git sizer --help # => says it doesn't have a manpage > > Because we aren't aware that git-sizer is some external tool, and that > we should route --help to it. That would be nice. This has been an annoying for several tools named `git-foo` that I have worked on (e.g., git-sizer, git-imerge, git-when-merged, plus many internal tools). > Non-withstanding the arguable bug that things like git-sizer shouldn't > be allowing themselves to be invoked by "git" like that without > guaranteeing that it can consume all the options 'git' expects. When I > had to deal with a similar problem in an external git-* command I was > maintaining I simply made it an error to invoke it as "git mything" > instead of "git-mything". Hmmm, I always thought that it was intended and supported that an external tool can name itself `git-foo` so that it can be invoked as `git foo`. Which git options do you think that such a tool should be expected to support? Many useful ones, like `-C `, `--git-dir=`, `--work-tree=`, `-c =`, and `--no-replace-objects`, work pretty much for free if the external tool uses `git` to interact with the repository. I use such options regularly with external tools. IMO it would be a regression for these tools to refuse to run when invoked as, say, `git -C path/to/repo foo`. Michael
[PATCH v2 2/2] merge: add scissors line on merge conflict
This fixes a bug where the scissors line is placed after the Conflicts: section, in the case where a merge conflict occurs and merge.cleanup = scissors. In addition, we give pull the passthrough option of --cleanup so that it can also take advantage of this change. Signed-off-by: Denton Liu --- Documentation/merge-options.txt | 6 + builtin/merge.c | 16 +++ builtin/pull.c | 6 + t/t7600-merge.sh| 48 + 4 files changed, 76 insertions(+) diff --git a/Documentation/merge-options.txt b/Documentation/merge-options.txt index 63a3fc0954..115e0ca6f0 100644 --- a/Documentation/merge-options.txt +++ b/Documentation/merge-options.txt @@ -27,6 +27,12 @@ they run `git merge`. To make it easier to adjust such scripts to the updated behaviour, the environment variable `GIT_MERGE_AUTOEDIT` can be set to `no` at the beginning of them. +--cleanup=:: + This option determines how the merge message will be cleaned up + before being passed on. Specifically, if the '' is given a + value of `scissors`, scissors will be prepended to the message in + the case of a merge conflict. See also linkgit:git-commit[1]. + --ff:: When the merge resolves as a fast-forward, only update the branch pointer, without creating a merge commit. This is the default diff --git a/builtin/merge.c b/builtin/merge.c index 8f4a5065c2..23a6e6bb93 100644 --- a/builtin/merge.c +++ b/builtin/merge.c @@ -36,6 +36,7 @@ #include "packfile.h" #include "tag.h" #include "alias.h" +#include "wt-status.h" #define DEFAULT_TWOHEAD (1<<0) #define DEFAULT_OCTOPUS (1<<1) @@ -96,6 +97,9 @@ enum ff_type { static enum ff_type fast_forward = FF_ALLOW; +static const char *cleanup_arg; +static int put_scissors; + static int option_parse_message(const struct option *opt, const char *arg, int unset) { @@ -243,6 +247,7 @@ static struct option builtin_merge_options[] = { N_("perform a commit if the merge succeeds (default)")), OPT_BOOL('e', "edit", _edit, N_("edit message before committing")), + OPT_STRING(0, "cleanup", _arg, N_("default"), N_("how to strip spaces and #comments from message")), OPT_SET_INT(0, "ff", _forward, N_("allow fast-forward (default)"), FF_ALLOW), OPT_SET_INT_F(0, "ff-only", _forward, N_("abort if fast-forward is not possible"), @@ -606,6 +611,8 @@ static int git_merge_config(const char *k, const char *v, void *cb) return git_config_string(_twohead, k, v); else if (!strcmp(k, "pull.octopus")) return git_config_string(_octopus, k, v); + else if (!strcmp(k, "commit.cleanup")) + return git_config_string(_arg, k, v); else if (!strcmp(k, "merge.renormalize")) option_renormalize = git_config_bool(k, v); else if (!strcmp(k, "merge.ff")) { @@ -894,6 +901,13 @@ static int suggest_conflicts(void) filename = git_path_merge_msg(the_repository); fp = xfopen(filename, "a"); + if (put_scissors) { + fputc('\n', fp); + wt_status_add_cut_line(fp); + /* comments out the newline from append_conflicts_hint */ + fputc(comment_line_char, fp); + } + append_conflicts_hint(); fputs(msgbuf.buf, fp); strbuf_release(); @@ -1402,6 +1416,8 @@ int cmd_merge(int argc, const char **argv, const char *prefix) if (option_edit < 0) option_edit = default_edit_option(); + put_scissors = cleanup_arg && !strcmp(cleanup_arg, "scissors"); + if (!use_strategies) { if (!remoteheads) ; /* already up-to-date */ diff --git a/builtin/pull.c b/builtin/pull.c index 681c127a07..88245bce0e 100644 --- a/builtin/pull.c +++ b/builtin/pull.c @@ -95,6 +95,7 @@ static char *opt_signoff; static char *opt_squash; static char *opt_commit; static char *opt_edit; +static char *opt_cleanup; static char *opt_ff; static char *opt_verify_signatures; static int opt_autostash = -1; @@ -162,6 +163,9 @@ static struct option pull_options[] = { OPT_PASSTHRU(0, "edit", _edit, NULL, N_("edit message before committing"), PARSE_OPT_NOARG), + OPT_PASSTHRU(0, "cleanup", _cleanup, NULL, + N_("how to strip spaces and #comments from message"), + PARSE_OPT_NOARG), OPT_PASSTHRU(0, "ff", _ff, NULL, N_("allow fast-forward"), PARSE_OPT_NOARG), @@ -625,6 +629,8 @@ static int run_merge(void) argv_array_push(, opt_commit); if (opt_edit) argv_array_push(, opt_edit); + if (opt_cleanup) + argv_array_push(, opt_cleanup); if (opt_ff) argv_array_push(, opt_ff); if (opt_verify_signatures) diff --git
[PATCH v2 1/2] commit: don't add scissors line if one exists in MERGE_MSG
If commit.cleanup = scissors is specified, don't produce a scissors line if one already exists in the MERGE_MSG file. Signed-off-by: Denton Liu --- builtin/commit.c | 15 +-- 1 file changed, 13 insertions(+), 2 deletions(-) diff --git a/builtin/commit.c b/builtin/commit.c index 0d9828e29e..a74a324b7a 100644 --- a/builtin/commit.c +++ b/builtin/commit.c @@ -659,6 +659,7 @@ static int prepare_to_commit(const char *index_file, const char *prefix, const char *hook_arg2 = NULL; int clean_message_contents = (cleanup_mode != COMMIT_MSG_CLEANUP_NONE); int old_display_comment_prefix; + int merge_contains_scissors = 0; /* This checks and barfs if author is badly specified */ determine_author_info(author_ident); @@ -719,6 +720,8 @@ static int prepare_to_commit(const char *index_file, const char *prefix, strbuf_addbuf(, ); hook_arg1 = "message"; } else if (!stat(git_path_merge_msg(the_repository), )) { + size_t merge_msg_start; + /* * prepend SQUASH_MSG here if it exists and a * "merge --squash" was originally performed @@ -729,8 +732,14 @@ static int prepare_to_commit(const char *index_file, const char *prefix, hook_arg1 = "squash"; } else hook_arg1 = "merge"; + + merge_msg_start = sb.len; if (strbuf_read_file(, git_path_merge_msg(the_repository), 0) < 0) die_errno(_("could not read MERGE_MSG")); + + if (cleanup_mode == COMMIT_MSG_CLEANUP_SCISSORS && + wt_status_locate_end(sb.buf + merge_msg_start, sb.len - merge_msg_start) < sb.len - merge_msg_start) + merge_contains_scissors = 1; } else if (!stat(git_path_squash_msg(the_repository), )) { if (strbuf_read_file(, git_path_squash_msg(the_repository), 0) < 0) die_errno(_("could not read SQUASH_MSG")); @@ -798,7 +807,8 @@ static int prepare_to_commit(const char *index_file, const char *prefix, struct ident_split ci, ai; if (whence != FROM_COMMIT) { - if (cleanup_mode == COMMIT_MSG_CLEANUP_SCISSORS) + if (cleanup_mode == COMMIT_MSG_CLEANUP_SCISSORS && + !merge_contains_scissors) wt_status_add_cut_line(s->fp); status_printf_ln(s, GIT_COLOR_NORMAL, whence == FROM_MERGE @@ -824,7 +834,8 @@ static int prepare_to_commit(const char *index_file, const char *prefix, " Lines starting\nwith '%c' will be ignored, and an empty" " message aborts the commit.\n"), comment_line_char); else if (cleanup_mode == COMMIT_MSG_CLEANUP_SCISSORS && -whence == FROM_COMMIT) +whence == FROM_COMMIT && +!merge_contains_scissors) wt_status_add_cut_line(s->fp); else /* COMMIT_MSG_CLEANUP_SPACE, that is. */ status_printf(s, GIT_COLOR_NORMAL, -- 2.19.1
[PATCH v2 0/2] Fix scissors bug during merge conflict
Thanks for your feedback, Junio. I tried to reroll the patch by adding another option into the MERGE_MODE file but unfortunately, it didn't work completely because doing `merge --squash` doesn't produce a MERGE_MODE. In addition to this, because of the way merge and commit were structured, I needed to reorder a lot of calls because some variables were only being set after I needed them. Unless we want to produce a MERGE_MODE during --squash (which I don't think is worth it) I don't think that this is the way to go. Instead, I just refined my first approach and only checked the contents of MERGE_MSG for a scissors line. The MERGE_MSG is going to be machine-generated anyway so we should be safe from accidentally ignoring a human-placed scissors line. Changes since V1: - * Only check MERGE_MSG for a scissors line instead of all prepended files * Make a variable static in merge where appropriate * Add passthrough options in pull * Add documentation for the new option * Add tests to ensure desired behaviour Denton Liu (2): commit: don't add scissors line if one exists merge: add scissors line on merge conflict Documentation/merge-options.txt | 6 + builtin/commit.c| 15 +-- builtin/merge.c | 16 +++ builtin/pull.c | 6 + t/t7600-merge.sh| 48 + 5 files changed, 89 insertions(+), 2 deletions(-) -- 2.19.1
Re: [PATCH] bundle: dup() output descriptor closer to point-of-use
Hi Peff, On Fri, 16 Nov 2018, Jeff King wrote: > On Thu, Nov 15, 2018 at 09:01:07PM +0100, Johannes Schindelin wrote: > > > > It seems like we should be checking that the stale lockfile isn't left, > > > which is the real problem (the warning is annoying, but is a symptom of > > > the same thing). I.e., something like: > > > > > > test_must_fail git bundle create foobar.bundle master..master && > > > test_path_is_missing foobar.bundle.lock > > > > > > That would already pass on non-Windows platforms, but that's OK. It will > > > never give a false failure. > > > > > > If you don't mind, can you confirm that the test above fails without > > > either of the two patches under discussion? > > > > This test succeeds with your patch as well as with Gaël's, and fails when > > neither patch is applied. So you're right, it is the much better test. > > Thanks for checking! > > > > > Do you want to integrate this test into your patch and run with it, or > > > > do you want me to shepherd your patch? > > > > > > I'll wrap it up with a commit message and a test. > > Actually, I realized there's an even simpler way to do this. Here it is. > > -- >8 -- > Subject: [PATCH] bundle: dup() output descriptor closer to point-of-use > > When writing a bundle to a file, the bundle code actually creates > "your.bundle.lock" using our lockfile interface. We feed that output > descriptor to a child git-pack-objects via run-command, which has the > quirk that it closes the output descriptor in the parent. > > To avoid confusing the lockfile code (which still thinks the descriptor > is valid), we dup() it, and operate on the duplicate. > > However, this has a confusing side effect: after the dup() but before we > call pack-objects, we have _two_ descriptors open to the lockfile. If we > call die() during that time, the lockfile code will try to clean up the > partially-written file. It knows to close() the file before unlinking, > since on some platforms (i.e., Windows) the open file would block the > deletion. But it doesn't know about the duplicate descriptor. On > Windows, triggering an error at the right part of the code will result > in the cleanup failing and the lockfile being left in the filesystem. > > We can solve this by moving the dup() much closer to start_command(), > shrinking the window in which we have the second descriptor open. It's > easy to place this in such a way that no die() is possible. We could > still die due to a signal in the exact wrong moment, but we already > tolerate races there (e.g., a signal could come before we manage to put > the file on the cleanup list in the first place). > > As a bonus, this shields create_bundle() itself from the duplicate-fd > trick, and we can simplify its error handling (note that the lock > rollback now happens unconditionally, but that's OK; it's a noop if we > didn't open the lock in the first place). > > The included test uses an empty bundle to cause a failure at the right > spot in the code, because that's easy to trigger (the other likely > errors are write() problems like ENOSPC). Note that it would already > pass on non-Windows systems (because they are happy to unlink an > already-open file). Thanks, this is a very nice explanation (and now that I do not feel so stressed as I did yesterday, I can easily wrap my head around it). I can confirm that the test fails without the changes to bundle.c, and succeeds with the changes. Thank you so much! Dscho > Based-on-a-patch-by: Gaël Lhez > Signed-off-by: Jeff King > --- > bundle.c| 39 ++- > t/t5607-clone-bundle.sh | 6 ++ > 2 files changed, 24 insertions(+), 21 deletions(-) > > diff --git a/bundle.c b/bundle.c > index 1ef584b93b..6b0f6d8f10 100644 > --- a/bundle.c > +++ b/bundle.c > @@ -243,7 +243,7 @@ static int is_tag_in_date_range(struct object *tag, > struct rev_info *revs) > } > > > -/* Write the pack data to bundle_fd, then close it if it is > 1. */ > +/* Write the pack data to bundle_fd */ > static int write_pack_data(int bundle_fd, struct rev_info *revs) > { > struct child_process pack_objects = CHILD_PROCESS_INIT; > @@ -256,6 +256,20 @@ static int write_pack_data(int bundle_fd, struct > rev_info *revs) > pack_objects.in = -1; > pack_objects.out = bundle_fd; > pack_objects.git_cmd = 1; > + > + /* > + * start_command() will close our descriptor if it's >1. Duplicate it > + * to avoid surprising the caller. > + */ > + if (pack_objects.out > 1) { > + pack_objects.out = dup(pack_objects.out); > + if (pack_objects.out < 0) { > + error_errno(_("unable to dup bundle descriptor")); > + child_process_clear(_objects); > + return -1; > + } > + } > + > if (start_command(_objects)) > return error(_("Could not spawn pack-objects")); > > @@ -421,21 +435,10 @@ int
Re: [PATCH] ref-filter: don't look for objects when outside of a repository
On Fri, Nov 16, 2018 at 02:09:07PM +0900, Junio C Hamano wrote: > Jeff King writes: > > > On Thu, Nov 15, 2018 at 04:38:44AM -0500, Jeff King wrote: > > > >> Is SOURCE_NONE a complete match for what we want? > >> > >> I see problems in both directions: > >> > >> - sorting by "objectname" works now, but it's marked with SOURCE_OBJ, > >>and would be forbidden with your patch. I'm actually not sure if > >>SOURCE_OBJ is accurate; we shouldn't need to access the object to > >>show it (and we are probably wasting effort loading the full contents > >>for tools like for-each-ref). > >> > >>However, that's not the full story. For objectname:short, it _does_ call > >>find_unique_abbrev(). So we expect to have an object directory. > > > > Oops, I'm apparently bad at reading. It is in fact SOURCE_OTHER, which > > makes sense (outside of this whole "--sort outside a repo thing"). > > > > But we'd ideally distinguish between "objectname" (which should be OK > > outside a repo) and "objectname:short" (which currently segfaults). > > Arguably, use of ref-filter machinery in ls-remote, whether it is > given from inside or outside a repo, was a mistake in 1fb20dfd > ("ls-remote: create '--sort' option", 2018-04-09), as the whole > point of "ls-remote" is to peek the list of refs and it is perfectly > normal that the objects listed are not available. I hope that one day 'git ls-remote' will learn to '--format=...' its output, and I think that (re)using the ref-filter machinery would be the right way to go to achive that. Sure, ref-filter supports a lot of format specifiers that don't at all make sense in the context of 'ls-remote' (perhaps we should have a dedicated set of valid_atoms for that), but I think it's perfectly reasonable to do something like: git ls-remote --format=%(refname:strip=2) remote A concrete use case for that could be to eliminate the last remaining shell loops from refs completion.
Re: [PATCHv3 00/23] Bring more repository handles into our code base
On 11/13/2018 7:12 PM, Stefan Beller wrote: Please have a look at the last 4 patches specifically as they were new in the last iteration (but did not receive any comment), as they demonstrate and fix a problem that is only exposed when using GIT_TEST_COMMIT_GRAPH=1 for the test suite. I took a look at these last four and didn't find anything wrong. Rest of the series looks good. While all of the TODOs in the last patch are an imperfect solution, I think it's probably the best we can do for now. Thanks, -Stolee
Re: git pull --rebase=preserve is always rebasing something, even on up-to-date branch
Hi Mikhail, On Mon, 17 Sep 2018, Mikhail Matrosov wrote: > Please try the following: > > mmatrosov@Mikhail-PC:~/test$ git init --bare server > Initialized empty Git repository in /home/mmatrosov/test/server/ > mmatrosov@Mikhail-PC:~/test$ git clone server local > Cloning into 'local'... > warning: You appear to have cloned an empty repository. > done. > mmatrosov@Mikhail-PC:~/test$ cd local > mmatrosov@Mikhail-PC:~/test/local$ echo a > a && git add . && git commit -m A > [master (root-commit) a34c21f] A > 1 file changed, 1 insertion(+) > create mode 100644 a > mmatrosov@Mikhail-PC:~/test/local$ git push > Counting objects: 3, done. > Writing objects: 100% (3/3), 205 bytes | 0 bytes/s, done. > Total 3 (delta 0), reused 0 (delta 0) > To /home/mmatrosov/test/server > * [new branch] master -> master > mmatrosov@Mikhail-PC:~/test/local$ git pull > Already up-to-date. > mmatrosov@Mikhail-PC:~/test/local$ git pull --rebase=preserve > Rebasing (1/1) > Successfully rebased and updated refs/heads/master. > > As you can see, running bare "git pull" just tells me everything is up > to date. However, running "git pull --rebase=preserve" triggers > rebasing of something. This is most likely a `noop` command. > It wont be a problem if it didn't take significant time (especially on > Windows). Why this rebase happens? It is completely redundant and slows > down the pull operation. Looks like a bug to me. It is an implementation detail, and if you want to work on fixing it, please let me know about your availability and C skill level. > > Note that it is important to me, because I want to set "git config > --global pull.rebase preserve". Please note that `preserve` is on its way to deprecation, superseded by `pull.rebase = merges`. Which may, or may not, share the performance penalty you reported. Probably not, though. Ciao, Johannes > But because of this issue, pulling on an up-to-date repository takes a > lot of time. Which is very frustrating. > > Tested with: > * git version 2.19.0.windows.1 in Windows 10 Version 1803 > * git version 2.7.4 in Ubuntu 16.04.3 LTS (inside WSL) > > - > Best regards, Mikhail Matrosov >
Hello Dear I Need An Investment Partner
-- Hello Dear , I came across your contact during my private search Mrs Aisha Al-Qaddafi is my name, the only daughter of late Libyan president, I have funds the sum of $27.5 million USD for investment, I am interested in you for investment project assistance in your country, i shall compensate you 30% of the total sum after the funds are transfer into your account, Reply me urgent for more details Mrs Aisha Al-Qaddafi --
Re: [PATCH v3 11/11] fast-export: add a --show-original-ids option to show original names
On Thu, Nov 15, 2018 at 11:59:56PM -0800, Elijah Newren wrote: > diff --git a/t/t9350-fast-export.sh b/t/t9350-fast-export.sh > index d7d73061d0..5690fe2810 100755 > --- a/t/t9350-fast-export.sh > +++ b/t/t9350-fast-export.sh > @@ -77,6 +77,23 @@ test_expect_success 'fast-export > --reference-excluded-parents master~2..master' >test $MASTER = $(git rev-parse --verify refs/heads/rewrite)) > ' > > +test_expect_success 'fast-export --show-original-ids' ' > + > + git fast-export --show-original-ids master >output && > + grep ^original-oid output| sed -e s/^original-oid.// | sort >actual && Nit: 'sed' can do what this 'grep' does: sed -n -e s/^original-oid.//p output | sort >actual && thus sparing a process. > + git rev-list --objects master muss >objects-and-names && > + awk "{print \$1}" objects-and-names | sort >commits-trees-blobs && > + comm -23 actual commits-trees-blobs >unfound && > + test_must_be_empty unfound > +' > + > +test_expect_success 'fast-export --show-original-ids | git fast-import' ' > + > + git fast-export --show-original-ids master muss | git fast-import > --quiet && > + test $MASTER = $(git rev-parse --verify refs/heads/master) && > + test $MUSS = $(git rev-parse --verify refs/tags/muss) > +' > + > test_expect_success 'iso-8859-1' ' > > git config i18n.commitencoding ISO8859-1 && > -- > 2.19.1.1063.g1796373474.dirty >
Re: insteadOf and git-request-pull output
On Thu, Nov 15, 2018 at 01:28:26PM -0500, Konstantin Ryabitsev wrote: > Hi, all: > > Looks like setting url.insteadOf rules alters the output of > git-request-pull. I'm not sure that's the intended use of insteadOf, > which is supposed to replace URLs for local use, not to expose them > publicly (but I may be wrong). E.g.: > > $ git request-pull HEAD^ git://foo.example.com/example | grep example > git://foo.example.com/example > > $ git config url.ssh://bar.insteadOf git://foo > > $ git request-pull HEAD^ git://foo.example.com/example | grep example > ssh://bar.example.com/example > > I think that if we use the "principle of least surprise," insteadOf > rules shouldn't be applied for git-request-pull URLs. I'd like to point out a different use that may change your view. I have an insteadOf alias, gh:, that points to GitHub. Performing the rewrite is definitely the right thing to do, since other users may not have my alias available. I agree that in your case, a rewrite seems less appropriate, but I think we should only skip the rewrite if the value already matches a valid URL. -- brian m. carlson: Houston, Texas, US OpenPGP: https://keybase.io/bk2204 signature.asc Description: PGP signature
[PATCH v1 1/9] diff: document --no-color-moved
From: Phillip Wood Add documentation for --no-color-moved. Signed-off-by: Phillip Wood --- Documentation/diff-options.txt | 4 1 file changed, 4 insertions(+) diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt index 0378cd574e..151690f814 100644 --- a/Documentation/diff-options.txt +++ b/Documentation/diff-options.txt @@ -293,6 +293,10 @@ dimmed-zebra:: `dimmed_zebra` is a deprecated synonym. -- +--no-color-moved:: + Turn off move detection. This can be used to override configuration + settings. It is the same as `--color-moved=no`. + --color-moved-ws=:: This configures how white spaces are ignored when performing the move detection for `--color-moved`. -- 2.19.1
[PATCH v1 4/9] diff --color-moved-ws: demonstrate false positives
From: Phillip Wood 'diff --color-moved-ws=allow-indentation-change' can highlight lines that have internal whitespace changes rather than indentation changes. For example in commit 1a07e59c3e ("Update messages in preparation for i18n", 2018-07-21) the lines - die (_("must end with a color")); + die(_("must end with a color")); are highlighted as moved when they should not be. Modify an existing test to show the problem that will be fixed in the next commit. Signed-off-by: Phillip Wood --- t/t4015-diff-whitespace.sh | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh index a9fb226c5a..eee81a1987 100755 --- a/t/t4015-diff-whitespace.sh +++ b/t/t4015-diff-whitespace.sh @@ -1809,7 +1809,7 @@ test_expect_success 'only move detection ignores white spaces' ' test_cmp expected actual ' -test_expect_success 'compare whitespace delta across moved blocks' ' +test_expect_failure 'compare whitespace delta across moved blocks' ' git reset --hard && q_to_tab <<-\EOF >text.txt && @@ -1827,6 +1827,7 @@ test_expect_success 'compare whitespace delta across moved blocks' ' QQQthat has similar lines QQQto previous blocks, but with different indent QQQYetQAnotherQoutlierQ + QLine with internal w h i t e s p a c e change EOF git add text.txt && @@ -1847,6 +1848,7 @@ test_expect_success 'compare whitespace delta across moved blocks' ' QQthat has similar lines QQto previous blocks, but with different indent QQYetQAnotherQoutlier + QLine with internal whitespace change EOF git diff --color --color-moved --color-moved-ws=allow-indentation-change >actual.raw && @@ -1856,7 +1858,7 @@ test_expect_success 'compare whitespace delta across moved blocks' ' diff --git a/text.txt b/text.txt --- a/text.txt +++ b/text.txt - @@ -1,14 +1,14 @@ + @@ -1,15 +1,15 @@ -QIndented -QText across -Qsome lines @@ -1871,6 +1873,7 @@ test_expect_success 'compare whitespace delta across moved blocks' ' -QQQthat has similar lines -QQQto previous blocks, but with different indent -QQQYetQAnotherQoutlierQ + -QLine with internal w h i t e s p a c e change +QQIndented +QQText across +QQsome lines @@ -1885,6 +1888,7 @@ test_expect_success 'compare whitespace delta across moved blocks' ' +QQthat has similar lines +QQto previous blocks, but with different indent +QQYetQAnotherQoutlier + +QLine with internal whitespace change EOF test_cmp expected actual -- 2.19.1
[PATCH v1 3/9] diff: allow --no-color-moved-ws
From: Phillip Wood Allow --no-color-moved-ws and --color-moved-ws=no to cancel any previous --color-moved-ws option. Signed-off-by: Phillip Wood --- Documentation/diff-options.txt | 7 +++ diff.c | 6 +- 2 files changed, 12 insertions(+), 1 deletion(-) diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt index 57a2f4cb7a..e1744fa80d 100644 --- a/Documentation/diff-options.txt +++ b/Documentation/diff-options.txt @@ -306,6 +306,8 @@ endif::git-diff[] These modes can be given as a comma separated list: + -- +no:: + Do not ignore whitespace when performing move detection. ignore-space-at-eol:: Ignore changes in whitespace at EOL. ignore-space-change:: @@ -322,6 +324,11 @@ allow-indentation-change:: other modes. -- +--no-color-moved-ws:: + Do not ignore whitespace when performing move detection. This can be + used to override configuration settings. It is the same as + `--color-moved-ws=no`. + --word-diff[=]:: Show a word diff, using the to delimit changed words. By default, words are delimited by whitespace; see diff --git a/diff.c b/diff.c index 78cd3958f4..9b9811988b 100644 --- a/diff.c +++ b/diff.c @@ -304,7 +304,9 @@ static int parse_color_moved_ws(const char *arg) strbuf_addstr(, i->string); strbuf_trim(); - if (!strcmp(sb.buf, "ignore-space-change")) + if (!strcmp(sb.buf, "no")) + ret = 0; + else if (!strcmp(sb.buf, "ignore-space-change")) ret |= XDF_IGNORE_WHITESPACE_CHANGE; else if (!strcmp(sb.buf, "ignore-space-at-eol")) ret |= XDF_IGNORE_WHITESPACE_AT_EOL; @@ -5008,6 +5010,8 @@ int diff_opt_parse(struct diff_options *options, if (cm < 0) die("bad --color-moved argument: %s", arg); options->color_moved = cm; + } else if (!strcmp(arg, "--no-color-moved-ws")) { + options->color_moved_ws_handling = 0; } else if (skip_prefix(arg, "--color-moved-ws=", )) { options->color_moved_ws_handling = parse_color_moved_ws(arg); } else if (skip_to_optional_arg_default(arg, "--color-words", >word_regex, NULL)) { -- 2.19.1
[PATCH v1 2/9] diff: use whitespace consistently
From: Phillip Wood Most of the documentation uses 'whitespace' rather than 'white space' or 'white spaces' convert to latter two to the former for consistency. Signed-off-by: Phillip Wood --- Documentation/diff-options.txt | 4 ++-- diff.c | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt index 151690f814..57a2f4cb7a 100644 --- a/Documentation/diff-options.txt +++ b/Documentation/diff-options.txt @@ -298,7 +298,7 @@ dimmed-zebra:: settings. It is the same as `--color-moved=no`. --color-moved-ws=:: - This configures how white spaces are ignored when performing the + This configures how whitespace is ignored when performing the move detection for `--color-moved`. ifdef::git-diff[] It can be set by the `diff.colorMovedWS` configuration setting. @@ -316,7 +316,7 @@ ignore-all-space:: Ignore whitespace when comparing lines. This ignores differences even if one line has whitespace where the other line has none. allow-indentation-change:: - Initially ignore any white spaces in the move detection, then + Initially ignore any whitespace in the move detection, then group the moved code blocks only into a block if the change in whitespace is the same per line. This is incompatible with the other modes. diff --git a/diff.c b/diff.c index c29b1cce14..78cd3958f4 100644 --- a/diff.c +++ b/diff.c @@ -320,7 +320,7 @@ static int parse_color_moved_ws(const char *arg) if ((ret & COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE) && (ret & XDF_WHITESPACE_FLAGS)) - die(_("color-moved-ws: allow-indentation-change cannot be combined with other white space modes")); + die(_("color-moved-ws: allow-indentation-change cannot be combined with other whitespace modes")); string_list_clear(, 0); -- 2.19.1
[PATCH v1 8/9] diff --color-moved-ws: modify allow-indentation-change
From: Phillip Wood Currently diff --color-moved-ws=allow-indentation-change does not support indentation that contains a mix of tabs and spaces. For example in commit 546f70f377 ("convert.h: drop 'extern' from function declaration", 2018-06-30) the function parameters in the following lines are not colored as moved [1]. -extern int stream_filter(struct stream_filter *, -const char *input, size_t *isize_p, -char *output, size_t *osize_p); +int stream_filter(struct stream_filter *, + const char *input, size_t *isize_p, + char *output, size_t *osize_p); This commit changes the way the indentation is handled to track the visual size of the indentation rather than the characters in the indentation. This has they benefit that any whitespace errors do not interfer with the move detection (the whitespace errors will still be highlighted according to --ws-error-highlight). During the discussion of this feature there were concerns about the correct detection of indentation for python. However those concerns apply whether or not we're detecting moved lines so no attempt is made to determine if the indentation is 'pythonic'. [1] Note that before the commit to fix the erroneous coloring of moved lines each line was colored as a different block, since that commit they are uncolored. Signed-off-by: Phillip Wood --- Notes: Changes since rfc: - It now replaces the existing implementation rather than adding a new mode. - The indentation deltas are now calculated once for each line and cached. - Optimized the whitespace delta comparison to compare string lengths before comparing the actual strings. - Modified the calculation of tabs as suggested by Stefan. - Split out the blank line handling into a separate commit as suggest by Stefan. - Fixed some comments pointed out by Stefan. diff.c | 130 + t/t4015-diff-whitespace.sh | 56 2 files changed, 129 insertions(+), 57 deletions(-) diff --git a/diff.c b/diff.c index c378ce3daf..89559293e7 100644 --- a/diff.c +++ b/diff.c @@ -750,6 +750,8 @@ struct emitted_diff_symbol { const char *line; int len; int flags; + int indent_off; + int indent_width; enum diff_symbol s; }; #define EMITTED_DIFF_SYMBOL_INIT {NULL} @@ -780,44 +782,68 @@ struct moved_entry { struct moved_entry *next_line; }; -/** - * The struct ws_delta holds white space differences between moved lines, i.e. - * between '+' and '-' lines that have been detected to be a move. - * The string contains the difference in leading white spaces, before the - * rest of the line is compared using the white space config for move - * coloring. The current_longer indicates if the first string in the - * comparision is longer than the second. - */ -struct ws_delta { - char *string; - unsigned int current_longer : 1; -}; -#define WS_DELTA_INIT { NULL, 0 } - struct moved_block { struct moved_entry *match; - struct ws_delta wsd; + int wsd; /* The whitespace delta of this block */ }; static void moved_block_clear(struct moved_block *b) { - FREE_AND_NULL(b->wsd.string); - b->match = NULL; + memset(b, 0, sizeof(*b)); +} + +static void fill_es_indent_data(struct emitted_diff_symbol *es) +{ + unsigned int off = 0; + int width = 0, tab_width = es->flags & WS_TAB_WIDTH_MASK; + const char *s = es->line; + const int len = es->len; + + /* skip any \v \f \r at start of indentation */ + while (s[off] == '\f' || s[off] == '\v' || + (s[off] == '\r' && off < len - 1)) + off++; + + /* calculate the visual width of indentation */ + while(1) { + if (s[off] == ' ') { + width++; + off++; + } else if (s[off] == '\t') { + width += tab_width - (width % tab_width); + while (s[++off] == '\t') + width += tab_width; + } else { + break; + } + } + + es->indent_off = off; + es->indent_width = width; } static int compute_ws_delta(const struct emitted_diff_symbol *a, -const struct emitted_diff_symbol *b, -struct ws_delta *out) + const struct emitted_diff_symbol *b, + int *out) { - const struct emitted_diff_symbol *longer = a->len > b->len ? a : b; - const struct emitted_diff_symbol *shorter = a->len > b->len ? b : a; - int d = longer->len - shorter->len; + int a_len = a->len, + b_len = b->len, + a_off = a->indent_off, + a_width = a->indent_width, + b_off =
[PATCH v1 0/9] diff --color-moved-ws fixes and enhancment
From: Phillip Wood When trying out the new --color-moved-ws=allow-indentation-change I was disappointed to discover it did not work if the indentation contains a mix of spaces and tabs. This series reworks it so that it does. Since the rfc this series has grown a few fixes at the beginning. The implementation has been reworked, the last two patches correspond to a heavily reworked version the last patch of the rfc version, all the other patches are new. Phillip Wood (9): diff: document --no-color-moved diff: use whitespace consistently diff: allow --no-color-moved-ws diff --color-moved-ws: demonstrate false positives diff --color-moved-ws: fix false positives diff --color-moved=zebra: be stricter with color alternation diff --color-moved-ws: optimize allow-indentation-change diff --color-moved-ws: modify allow-indentation-change diff --color-moved-ws: handle blank lines Documentation/diff-options.txt | 15 ++- diff.c | 219 + t/t4015-diff-whitespace.sh | 99 ++- 3 files changed, 251 insertions(+), 82 deletions(-) -- 2.19.1
[PATCH v1 5/9] diff --color-moved-ws: fix false positives
From: Phillip Wood 'diff --color-moved-ws=allow-indentation-change' can color lines as moved when they are in fact different. For example in commit 1a07e59c3e ("Update messages in preparation for i18n", 2018-07-21) the lines - die (_("must end with a color")); + die(_("must end with a color")); are colored as moved even though they are different. This is because if there is a fuzzy match for the first line of a potential moved block the line is marked as moved before the potential match is checked to see if it actually matches. The fix is to delay marking the line as moved until after we have checked that there really is at least one matching potential moved block. Note that the test modified in the last commit still fails because adding an unmoved line between two moved blocks that are already separated by unmoved lines changes the color of the block following the addition. This should not be the case and will be fixed in the next commit. Signed-off-by: Phillip Wood --- diff.c | 15 +-- 1 file changed, 9 insertions(+), 6 deletions(-) diff --git a/diff.c b/diff.c index 9b9811988b..53a7ab5aca 100644 --- a/diff.c +++ b/diff.c @@ -1104,10 +1104,10 @@ static void mark_color_as_moved(struct diff_options *o, continue; } - l->flags |= DIFF_SYMBOL_MOVED_LINE; - - if (o->color_moved == COLOR_MOVED_PLAIN) + if (o->color_moved == COLOR_MOVED_PLAIN) { + l->flags |= DIFF_SYMBOL_MOVED_LINE; continue; + } if (o->color_moved_ws_handling & COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE) @@ -1141,10 +1141,13 @@ static void mark_color_as_moved(struct diff_options *o, block_length = 0; } - block_length++; + if (pmb_nr) { + block_length++; - if (flipped_block && o->color_moved != COLOR_MOVED_BLOCKS) - l->flags |= DIFF_SYMBOL_MOVED_LINE_ALT; + l->flags |= DIFF_SYMBOL_MOVED_LINE; + if (flipped_block && o->color_moved != COLOR_MOVED_BLOCKS) + l->flags |= DIFF_SYMBOL_MOVED_LINE_ALT; + } } adjust_last_block(o, n, block_length); -- 2.19.1
[PATCH v1 9/9] diff --color-moved-ws: handle blank lines
From: Phillip Wood When using --color-moved-ws=allow-indentation-change allow lines with the same indentation change to be grouped across blank lines. For now this only works if the blank lines have been moved as well, not for blocks that have just had their indentation changed. This completes the changes to the implementation of --color-moved=allow-indentation-change. Running git diff --color-moved=allow-indentation-change v2.18.0 v2.19.0 now takes 5.0s. This is a saving of 41% from 8.5s for the optimized version of the previous implementation and 66% from the original which took 14.6s. Signed-off-by: Phillip Wood --- Notes: Changes since rfc: - Split these changes into a separate commit. - Detect blank lines when processing the indentation rather than parsing each line twice. - Tweaked the test to make it harder as suggested by Stefan. - Added timing data to the commit message. diff.c | 34 --- t/t4015-diff-whitespace.sh | 41 ++ 2 files changed, 68 insertions(+), 7 deletions(-) diff --git a/diff.c b/diff.c index 89559293e7..072b5bced6 100644 --- a/diff.c +++ b/diff.c @@ -792,9 +792,11 @@ static void moved_block_clear(struct moved_block *b) memset(b, 0, sizeof(*b)); } +#define INDENT_BLANKLINE INT_MIN + static void fill_es_indent_data(struct emitted_diff_symbol *es) { - unsigned int off = 0; + unsigned int off = 0, i; int width = 0, tab_width = es->flags & WS_TAB_WIDTH_MASK; const char *s = es->line; const int len = es->len; @@ -818,8 +820,18 @@ static void fill_es_indent_data(struct emitted_diff_symbol *es) } } - es->indent_off = off; - es->indent_width = width; + /* check if this line is blank */ + for (i = off; i < len; i++) + if (!isspace(s[i])) + break; + + if (i == len) { + es->indent_width = INDENT_BLANKLINE; + es->indent_off = len; + } else { + es->indent_off = off; + es->indent_width = width; + } } static int compute_ws_delta(const struct emitted_diff_symbol *a, @@ -834,6 +846,11 @@ static int compute_ws_delta(const struct emitted_diff_symbol *a, b_width = b->indent_width; int delta; + if (a_width == INDENT_BLANKLINE && b_width == INDENT_BLANKLINE) { + *out = INDENT_BLANKLINE; + return 1; + } + if (a->s == DIFF_SYMBOL_PLUS) delta = a_width - b_width; else @@ -877,6 +894,10 @@ static int cmp_in_block_with_wsd(const struct diff_options *o, if (al != bl) return 1; + /* If 'l' and 'cur' are both blank then they match. */ + if (a_width == INDENT_BLANKLINE && c_width == INDENT_BLANKLINE) + return 0; + /* * The indent changes of the block are known and stored in pmb->wsd; * however we need to check if the indent changes of the current line @@ -888,6 +909,13 @@ static int cmp_in_block_with_wsd(const struct diff_options *o, else delta = c_width - a_width; + /* +* If the previous lines of this block were all blank then set its +* whitespace delta. +*/ + if (pmb->wsd == INDENT_BLANKLINE) + pmb->wsd = delta; + return !(delta == pmb->wsd && al - a_off == cl - c_off && !memcmp(a, b, al) && ! memcmp(a + a_off, c + c_off, al - a_off)); diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh index e023839ba6..9d6f88b07f 100755 --- a/t/t4015-diff-whitespace.sh +++ b/t/t4015-diff-whitespace.sh @@ -1901,10 +1901,20 @@ test_expect_success 'compare whitespace delta incompatible with other space opti test_i18ngrep allow-indentation-change err ' +EMPTY='' test_expect_success 'compare mixed whitespace delta across moved blocks' ' git reset --hard && tr Q_ "\t " <<-EOF >text.txt && + ${EMPTY} + too short without + ${EMPTY} + ___being grouped across blank line + ${EMPTY} + context + lines + to + anchor Indented text to _Qbe further indented by four spaces across Qseveral lines @@ -1918,9 +1928,18 @@ test_expect_success 'compare mixed whitespace delta across moved blocks' ' git commit -m "add text.txt" && tr Q_ "\t " <<-EOF >text.txt && + context + lines + to + anchor QIndented text to QQbe further indented by four spaces across Qseveral lines + ${EMPTY} + QQtoo short without + ${EMPTY} + Q___being grouped across blank line + ${EMPTY} Q_QThese two lines have had their indentation reduced by four spaces QQdifferent indentation change @@
[PATCH v1 6/9] diff --color-moved=zebra: be stricter with color alternation
From: Phillip Wood Currently when using --color-moved=zebra the color of moved blocks depends on the number of lines separating them. This means that adding an odd number of unmoved lines between blocks that are already separated by one or more unmoved lines will change the color of subsequent moved blocks. This does not make much sense as the blocks were already separated by unmoved lines and causes problems when adding lines to test cases. Fix this by only using the alternate colors for adjacent moved blocks. Signed-off-by: Phillip Wood --- Notes: An alternative would be to always alternate the color of blocks whether are not they are adjacent to each other. diff.c | 27 +++ t/t4015-diff-whitespace.sh | 6 +++--- 2 files changed, 22 insertions(+), 11 deletions(-) diff --git a/diff.c b/diff.c index 53a7ab5aca..8c08dd68df 100644 --- a/diff.c +++ b/diff.c @@ -1038,26 +1038,30 @@ static int shrink_potential_moved_blocks(struct moved_block *pmb, * The last block consists of the (n - block_length)'th line up to but not * including the nth line. * + * Returns 0 if the last block is empty or is unset by this function, non zero + * otherwise. + * * NEEDSWORK: This uses the same heuristic as blame_entry_score() in blame.c. * Think of a way to unify them. */ -static void adjust_last_block(struct diff_options *o, int n, int block_length) +static int adjust_last_block(struct diff_options *o, int n, int block_length) { int i, alnum_count = 0; if (o->color_moved == COLOR_MOVED_PLAIN) - return; + return block_length; for (i = 1; i < block_length + 1; i++) { const char *c = o->emitted_symbols->buf[n - i].line; for (; *c; c++) { if (!isalnum(*c)) continue; alnum_count++; if (alnum_count >= COLOR_MOVED_MIN_ALNUM_COUNT) - return; + return 1; } } for (i = 1; i < block_length + 1; i++) o->emitted_symbols->buf[n - i].flags &= ~DIFF_SYMBOL_MOVED_LINE; + return 0; } /* Find blocks of moved code, delegate actual coloring decision to helper */ @@ -1067,14 +1071,15 @@ static void mark_color_as_moved(struct diff_options *o, { struct moved_block *pmb = NULL; /* potentially moved blocks */ int pmb_nr = 0, pmb_alloc = 0; - int n, flipped_block = 1, block_length = 0; + int n, flipped_block = 0, block_length = 0; for (n = 0; n < o->emitted_symbols->nr; n++) { struct hashmap *hm = NULL; struct moved_entry *key; struct moved_entry *match = NULL; struct emitted_diff_symbol *l = >emitted_symbols->buf[n]; + enum diff_symbol last_symbol = 0; switch (l->s) { case DIFF_SYMBOL_PLUS: @@ -1090,7 +1095,7 @@ static void mark_color_as_moved(struct diff_options *o, free(key); break; default: - flipped_block = 1; + flipped_block = 0; } if (!match) { @@ -1101,10 +1106,13 @@ static void mark_color_as_moved(struct diff_options *o, moved_block_clear([i]); pmb_nr = 0; block_length = 0; + flipped_block = 0; + last_symbol = l->s; continue; } if (o->color_moved == COLOR_MOVED_PLAIN) { + last_symbol = l->s; l->flags |= DIFF_SYMBOL_MOVED_LINE; continue; } @@ -1135,19 +1143,22 @@ static void mark_color_as_moved(struct diff_options *o, } } - flipped_block = (flipped_block + 1) % 2; + if (adjust_last_block(o, n, block_length) && + pmb_nr && last_symbol != l->s) + flipped_block = (flipped_block + 1) % 2; + else + flipped_block = 0; - adjust_last_block(o, n, block_length); block_length = 0; } if (pmb_nr) { block_length++; - l->flags |= DIFF_SYMBOL_MOVED_LINE; if (flipped_block && o->color_moved != COLOR_MOVED_BLOCKS) l->flags |= DIFF_SYMBOL_MOVED_LINE_ALT; } + last_symbol = l->s; } adjust_last_block(o, n, block_length); diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh index
[PATCH v1 7/9] diff --color-moved-ws: optimize allow-indentation-change
From: Phillip Wood When running git diff --color-moved-ws=allow-indentation-change v2.18.0 v2.19.0 cmp_in_block_with_wsd() is called 694908327 times. Of those 42.7% return after comparing a and b. By comparing the lengths first we can return early in all but 0.03% of those cases without dereferencing the string pointers. The comparison between a and c fails in 6.8% of calls, by comparing the lengths first we reject all the failing calls without dereferencing the string pointers. This reduces the time to run the command above by by 42% from 14.6s to 8.5s. This is still much slower than the normal --color-moved which takes ~0.6-0.7s to run but is a significant improvement. The next commits will replace the current implementation with one that works with mixed tabs and spaces in the indentation. I think it is worth optimizing the current implementation first to enable a fair comparison between the two implementations. Signed-off-by: Phillip Wood --- diff.c | 19 +++ 1 file changed, 11 insertions(+), 8 deletions(-) diff --git a/diff.c b/diff.c index 8c08dd68df..c378ce3daf 100644 --- a/diff.c +++ b/diff.c @@ -829,20 +829,23 @@ static int cmp_in_block_with_wsd(const struct diff_options *o, int n) { struct emitted_diff_symbol *l = >emitted_symbols->buf[n]; - int al = cur->es->len, cl = l->len; + int al = cur->es->len, bl = match->es->len, cl = l->len; const char *a = cur->es->line, *b = match->es->line, *c = l->line; - + const char *orig_a = a; int wslen; /* -* We need to check if 'cur' is equal to 'match'. -* As those are from the same (+/-) side, we do not need to adjust for -* indent changes. However these were found using fuzzy matching -* so we do have to check if they are equal. +* We need to check if 'cur' is equal to 'match'. As those +* are from the same (+/-) side, we do not need to adjust for +* indent changes. However these were found using fuzzy +* matching so we do have to check if they are equal. Here we +* just check the lengths. We delay calling memcmp() to check +* the contents until later as if the length comparison for a +* and c fails we can avoid the call all together. */ - if (strcmp(a, b)) + if (al != bl) return 1; if (!pmb->wsd.string) @@ -870,7 +873,7 @@ static int cmp_in_block_with_wsd(const struct diff_options *o, al -= wslen; } - if (al != cl || memcmp(a, c, al)) + if (al != cl || memcmp(orig_a, b, bl) || memcmp(a, c, al)) return 1; return 0; -- 2.19.1
Re: [PATCH v4 0/1] Advertise multiple supported proto versions
Josh Steadmon writes: >> This one unfortunately seems to interact rather badly with your >> js/remote-archive-v2 topic which is already in 'next', when this >> topic is merged to 'pu', and my attempt to mechanically resolve >> conflicts breaks t5000. > > Hmm, should we drop js/remote-archive-v2 then? Any solution that > unblocks js/remote-archive-v2 will almost certainly depend on this > patch. That one is already in 'next' for quite some time. If you are retrating it, that's OK. Let me revert the topic out of 'next' and discard it, and then queue this one in 'pu'. Thanks.
Re: [RFC/PATCH 0/5] stop installing old libexec aliases like "git-init"
On Fri, Nov 02 2018, Ævar Arnfjörð Bjarmason wrote: > I think up to patch 4 here should be near a state that's ready for > inclusion. > > Although I'm on the fence with the approach in 1/5. Should this be a > giant getopt switch statement like that in a helper script? An > alternative would be to write out a shell file similar to > GIT-BUILD-OPTIONS and source that from this thing. I don't know, what > do you all think? > > The idea with 4/5 was to make this symlink mode the default in > config.mak.uname and have a blacklist of systems like Windows that > couldn't deal with it. > > Since my ad874608d8 ("Makefile: optionally symlink libexec/git-core > binaries to bin/git", 2018-03-13) I see that e.g. Debian and GitLab > have started shipping with the INSTALL_SYMLINKS flag, so making that > unconditional is the next logical step. > > The 5th one is more radical. See > https://public-inbox.org/git/87woyfdkoi@evledraar.gmail.com/ from > back in March for context. > > I'd like to say it's ready, but I've spotted some fallout: > > * Help like "git ninit" suggesting "git init" doesn't work, this is >because load_command_list() in help.c doesn't look out our >in-memory idea of builtins, it reads the libexecdir, so if we don't >have the programs there it doesn't know about it. A follow-up on this: We should really fix this for other reasons. I.e. compile in some "this is stuff we ourselves think is in git". There's other manifestations of this, e.g.: git-sizer --help # => shows you help git sizer --help # => says it doesn't have a manpage Because we aren't aware that git-sizer is some external tool, and that we should route --help to it. Non-withstanding the arguable bug that things like git-sizer shouldn't be allowing themselves to be invoked by "git" like that without guaranteeing that it can consume all the options 'git' expects. When I had to deal with a similar problem in an external git-* command I was maintaining I simply made it an error to invoke it as "git mything" instead of "git-mything". > * GIT_TEST_INSTALLED breaks entirely under this, as early as the >heuristic for "are we built?" being "do we have git-init in >libexecdir?". I tried a bit to make this work, but there's a lot of >dependencies there. > > * We still (and this is also true of my ad874608d8) hardlink >everything in the build dir via a different part of the Makefile, >ideally we should do exactly the same thing there so also normal >tests and not just GIT_TEST_INSTALLED (if that worked) would test >in the same mode. > >I gave making that work a bit of a try and gave up in the Makefile >jungle. > > Ævar Arnfjörð Bjarmason (5): > Makefile: move long inline shell loops in "install" into helper > Makefile: conform some of the code to our coding standards > Makefile: stop hiding failures during "install" > Makefile: add NO_INSTALL_SYMLINKS_FALLBACK switch > Makefile: Add a NO_INSTALL_BUILTIN_EXECDIR_ALIASES flag > > Makefile | 65 +++-- > install_programs | 124 +++ > 2 files changed, 151 insertions(+), 38 deletions(-) > create mode 100755 install_programs
Re: [RFC/PATCH 1/5] Makefile: move long inline shell loops in "install" into helper
On Mon, Nov 12 2018, Johannes Schindelin wrote: > Hi Ævar, > > On Mon, 12 Nov 2018, Ævar Arnfjörð Bjarmason wrote: > >> On Mon, Nov 12 2018, Johannes Schindelin wrote: >> >> > On Fri, 2 Nov 2018, Ævar Arnfjörð Bjarmason wrote: >> > >> >> Move a 37 line for-loop mess out of "install" and into a helper >> >> script. This started out fairly innocent but over the years has grown >> >> into a hard-to-maintain monster, and my recent ad874608d8 ("Makefile: >> >> optionally symlink libexec/git-core binaries to bin/git", 2018-03-13) >> >> certainly didn't help. >> >> >> >> The shell code is ported pretty much as-is (with getopts added), it'll >> >> be fixed & prettified in subsequent commits. >> >> >> >> Signed-off-by: Ævar Arnfjörð Bjarmason >> >> --- >> >> Makefile | 52 >> >> install_programs | 89 >> >> 2 files changed, 103 insertions(+), 38 deletions(-) >> >> create mode 100755 install_programs >> >> >> >> diff --git a/Makefile b/Makefile >> >> index bbfbb4292d..aa6ca1fa68 100644 >> >> --- a/Makefile >> >> +++ b/Makefile >> >> @@ -2808,44 +2808,20 @@ endif >> >> bindir=$$(cd '$(DESTDIR_SQ)$(bindir_SQ)' && pwd) && \ >> >> execdir=$$(cd '$(DESTDIR_SQ)$(gitexec_instdir_SQ)' && pwd) && \ >> >> destdir_from_execdir_SQ=$$(echo '$(gitexecdir_relative_SQ)' | sed -e >> >> 's|[^/][^/]*|..|g') && \ >> >> - { test "$$bindir/" = "$$execdir/" || \ >> >> - for p in git$X $(filter $(install_bindir_programs),$(ALL_PROGRAMS)); >> >> do \ >> >> - $(RM) "$$execdir/$$p" && \ >> >> - test -n "$(INSTALL_SYMLINKS)" && \ >> >> - ln -s "$$destdir_from_execdir_SQ/$(bindir_relative_SQ)/$$p" >> >> "$$execdir/$$p" || \ >> >> - { test -z >> >> "$(NO_INSTALL_HARDLINKS)$(NO_CROSS_DIRECTORY_HARDLINKS)" && \ >> >> - ln "$$bindir/$$p" "$$execdir/$$p" 2>/dev/null || \ >> >> - cp "$$bindir/$$p" "$$execdir/$$p" || exit; } \ >> >> - done; \ >> >> - } && \ >> >> - for p in $(filter $(install_bindir_programs),$(BUILT_INS)); do \ >> >> - $(RM) "$$bindir/$$p" && \ >> >> - test -n "$(INSTALL_SYMLINKS)" && \ >> >> - ln -s "git$X" "$$bindir/$$p" || \ >> >> - { test -z "$(NO_INSTALL_HARDLINKS)" && \ >> >> - ln "$$bindir/git$X" "$$bindir/$$p" 2>/dev/null || \ >> >> - ln -s "git$X" "$$bindir/$$p" 2>/dev/null || \ >> >> - cp "$$bindir/git$X" "$$bindir/$$p" || exit; } \ >> >> - done && \ >> >> - for p in $(BUILT_INS); do \ >> >> - $(RM) "$$execdir/$$p" && \ >> >> - test -n "$(INSTALL_SYMLINKS)" && \ >> >> - ln -s "$$destdir_from_execdir_SQ/$(bindir_relative_SQ)/git$X" >> >> "$$execdir/$$p" || \ >> >> - { test -z "$(NO_INSTALL_HARDLINKS)" && \ >> >> - ln "$$execdir/git$X" "$$execdir/$$p" 2>/dev/null || \ >> >> - ln -s "git$X" "$$execdir/$$p" 2>/dev/null || \ >> >> - cp "$$execdir/git$X" "$$execdir/$$p" || exit; } \ >> >> - done && \ >> >> - remote_curl_aliases="$(REMOTE_CURL_ALIASES)" && \ >> >> - for p in $$remote_curl_aliases; do \ >> >> - $(RM) "$$execdir/$$p" && \ >> >> - test -n "$(INSTALL_SYMLINKS)" && \ >> >> - ln -s "git-remote-http$X" "$$execdir/$$p" || \ >> >> - { test -z "$(NO_INSTALL_HARDLINKS)" && \ >> >> - ln "$$execdir/git-remote-http$X" "$$execdir/$$p" 2>/dev/null >> >> || \ >> >> - ln -s "git-remote-http$X" "$$execdir/$$p" 2>/dev/null || \ >> >> - cp "$$execdir/git-remote-http$X" "$$execdir/$$p" || exit; } \ >> >> - done && \ >> > >> > This indeed looks like a mess... >> > >> >> + ./install_programs \ >> >> + --X="$$X" \ >> >> + --RM="$(RM)" \ >> >> + --bindir="$$bindir" \ >> >> + --bindir-relative="$(bindir_relative_SQ)" \ >> >> + --execdir="$$execdir" \ >> >> + --destdir-from-execdir="$$destdir_from_execdir_SQ" \ >> >> + --flag-install-symlinks="$(INSTALL_SYMLINKS)" \ >> >> + --flag-no-install-hardlinks="$(NO_INSTALL_HARDLINKS)" \ >> >> + >> >> --flag-no-cross-directory-hardlinks="$(NO_CROSS_DIRECTORY_HARDLINKS)" \ >> >> + --list-bindir-standalone="git$X $(filter >> >> $(install_bindir_programs),$(ALL_PROGRAMS))" \ >> >> + --list-bindir-git-dashed="$(filter >> >> $(install_bindir_programs),$(BUILT_INS))" \ >> >> + --list-execdir-git-dashed="$(BUILT_INS)" \ >> >> + --list-execdir-curl-aliases="$(REMOTE_CURL_ALIASES)" && \ >> >> ./check_bindir "z$$bindir" "z$$execdir" "$$bindir/git-add$X" >> >> >> >> .PHONY: install-gitweb install-doc install-man install-man-perl >> >> install-html install-info install-pdf >> >> diff --git a/install_programs b/install_programs >> >> new file mode 100755 >> >> index 00..e287108112 >> >> --- /dev/null >> >> +++ b/install_programs >> >> @@ -0,0 +1,89 @@ >> >> +#!/bin/sh >> >> + >> >> +while test $# != 0 >> >> +do >> >> + case "$1" in >> >> + --X=*) >> >> +
Re: [PATCH v2 1/2] [Outreachy] t3903-stash: test without configured user.name and user.email
Slavica Djukic writes: >>> + git var GIT_COMMITTER_IDENT >actual && >>> + test_cmp expected actual && >> I am not sure what you are testing with this step. There is nothing >> that changed environment variables or configuration since we ran >> "git var" above. Why does this test suspect that somebody in the >> future may break the expectation that after running 'git add' and/or >> 'git stash', our committer identity may have been changed, and how >> would such a breakage happen? > In previous re-roll, you suggested that test should be improved so > that when > reasonable identity is present, git stash executes under that > identity, and not > under the fallback one. Yes, but for that you'd need to be checking the resulting commit object that represents the stash entry. It should be created under the substitute identity. > Here I'm just making sure that after calling > git stash, > we still have same reasonable identity present. I do not think such a test would detect it, even when "git stash" incorrectly used the fallback identity to create the stash entry.
Re: [PATCH] ref-filter: don't look for objects when outside of a repository
Jeff King writes: > If you are arguing that even in a repo we should reject "authorname" > early (just as we would outside of a repo), I could buy that. Yup, that (and replace 'authorname' with anything that won't work with missing objects) for consistency was what I meant.
Re: [PATCH v2 1/2] rebase doc: document rebase.useBuiltin
Johannes Schindelin writes: > Meaning: essentially, `rebase.useBuiltin` was defaulting to `false`, and > if a user installed Git for Windows with the experimental built-in rebase, > it was set to `true` in the system config. Oh, that changes the picture entirely. If that was what was shipped to Windows users for 2.19.X, then I'd say we should be able to trust the switch well enough. I just somehow thought that everybody in the Windows land has been using the -in-C version. >> Having said that, assuming that the switching back to scripted >> version works correctly and assuming that we want to expose this to >> end users, I think the patch text makes sense. > > Indeed. > > I would still love to see the built-in rebase to be used by default in > v2.20.0, and I am reasonably sure that the escape hatch works (because, as > I told you above, it worked in the reverse, making the built-in rebase an > opt-in in Git for Windows v2.19.1). Good. That makes things a lot simpler. Thanks.
[PATCH] bundle: dup() output descriptor closer to point-of-use
On Thu, Nov 15, 2018 at 09:01:07PM +0100, Johannes Schindelin wrote: > > It seems like we should be checking that the stale lockfile isn't left, > > which is the real problem (the warning is annoying, but is a symptom of > > the same thing). I.e., something like: > > > > test_must_fail git bundle create foobar.bundle master..master && > > test_path_is_missing foobar.bundle.lock > > > > That would already pass on non-Windows platforms, but that's OK. It will > > never give a false failure. > > > > If you don't mind, can you confirm that the test above fails without > > either of the two patches under discussion? > > This test succeeds with your patch as well as with Gaël's, and fails when > neither patch is applied. So you're right, it is the much better test. Thanks for checking! > > > Do you want to integrate this test into your patch and run with it, or > > > do you want me to shepherd your patch? > > > > I'll wrap it up with a commit message and a test. Actually, I realized there's an even simpler way to do this. Here it is. -- >8 -- Subject: [PATCH] bundle: dup() output descriptor closer to point-of-use When writing a bundle to a file, the bundle code actually creates "your.bundle.lock" using our lockfile interface. We feed that output descriptor to a child git-pack-objects via run-command, which has the quirk that it closes the output descriptor in the parent. To avoid confusing the lockfile code (which still thinks the descriptor is valid), we dup() it, and operate on the duplicate. However, this has a confusing side effect: after the dup() but before we call pack-objects, we have _two_ descriptors open to the lockfile. If we call die() during that time, the lockfile code will try to clean up the partially-written file. It knows to close() the file before unlinking, since on some platforms (i.e., Windows) the open file would block the deletion. But it doesn't know about the duplicate descriptor. On Windows, triggering an error at the right part of the code will result in the cleanup failing and the lockfile being left in the filesystem. We can solve this by moving the dup() much closer to start_command(), shrinking the window in which we have the second descriptor open. It's easy to place this in such a way that no die() is possible. We could still die due to a signal in the exact wrong moment, but we already tolerate races there (e.g., a signal could come before we manage to put the file on the cleanup list in the first place). As a bonus, this shields create_bundle() itself from the duplicate-fd trick, and we can simplify its error handling (note that the lock rollback now happens unconditionally, but that's OK; it's a noop if we didn't open the lock in the first place). The included test uses an empty bundle to cause a failure at the right spot in the code, because that's easy to trigger (the other likely errors are write() problems like ENOSPC). Note that it would already pass on non-Windows systems (because they are happy to unlink an already-open file). Based-on-a-patch-by: Gaël Lhez Signed-off-by: Jeff King --- bundle.c| 39 ++- t/t5607-clone-bundle.sh | 6 ++ 2 files changed, 24 insertions(+), 21 deletions(-) diff --git a/bundle.c b/bundle.c index 1ef584b93b..6b0f6d8f10 100644 --- a/bundle.c +++ b/bundle.c @@ -243,7 +243,7 @@ static int is_tag_in_date_range(struct object *tag, struct rev_info *revs) } -/* Write the pack data to bundle_fd, then close it if it is > 1. */ +/* Write the pack data to bundle_fd */ static int write_pack_data(int bundle_fd, struct rev_info *revs) { struct child_process pack_objects = CHILD_PROCESS_INIT; @@ -256,6 +256,20 @@ static int write_pack_data(int bundle_fd, struct rev_info *revs) pack_objects.in = -1; pack_objects.out = bundle_fd; pack_objects.git_cmd = 1; + + /* +* start_command() will close our descriptor if it's >1. Duplicate it +* to avoid surprising the caller. +*/ + if (pack_objects.out > 1) { + pack_objects.out = dup(pack_objects.out); + if (pack_objects.out < 0) { + error_errno(_("unable to dup bundle descriptor")); + child_process_clear(_objects); + return -1; + } + } + if (start_command(_objects)) return error(_("Could not spawn pack-objects")); @@ -421,21 +435,10 @@ int create_bundle(struct bundle_header *header, const char *path, bundle_to_stdout = !strcmp(path, "-"); if (bundle_to_stdout) bundle_fd = 1; - else { + else bundle_fd = hold_lock_file_for_update(, path, LOCK_DIE_ON_ERROR); - /* -* write_pack_data() will close the fd passed to it, -* but commit_lock_file() will also try to close
Re: [PATCH v2 1/2] rebase doc: document rebase.useBuiltin
Hi Junio, On Fri, 16 Nov 2018, Junio C Hamano wrote: > Ævar Arnfjörð Bjarmason writes: > > > The rebase.useBuiltin variable introduced in 55071ea248 ("rebase: > > start implementing it as a builtin", 2018-08-07) was turned on by > > default in 5541bd5b8f ("rebase: default to using the builtin rebase", > > 2018-08-08), but had no documentation. > > I actually thought that everybody understood that this was merely an > aid to be used during the development of the feature and never meant > to be given to the end users. It may have been from git.git's point of view, but from Git for Windows' point of view it was always meant to be a real feature flag. > With my devil's advocate hat on, how much do we trust it as an > escape hatch? As you know, only a fraction of the bug reports about the built-in rebase came in from Git for Windows: the autostash-with-submodules bug and the perf-regression one. By my counting that is 2 out of 5 bugs coming in via that route. One of the reasons for that was that the built-in rebase that was shipped in Git for Windows v2.19.1 was marked as experimental. And the way I could mark it experimental was by flipping the default to executing the scripted version: https://github.com/git-for-windows/git/commit/cff1a96cfe (you will note that I added the same escape hatch for `git stash` by adding back `git-stash.sh` as `git-legacy-stash.sh` and imitating the same dance as for built-in `rebase`, and I also added back the scripted `git-rebase--interactive.sh` for use by `git-legacy-rebase.sh`). Meaning: essentially, `rebase.useBuiltin` was defaulting to `false`, and if a user installed Git for Windows with the experimental built-in rebase, it was set to `true` in the system config. There was not a single complaint about the scripted `git rebase` being broken in any way. So we *do* have some real-world testing of that feature. (Obviously I have no numbers about Git for Windows' usage, apart from download numbers, and they do not say how many users opted in and how many did not, but Git for Windows v2.19.1 was downloaded more than 2.7 million times so far and I think it is safe to assume that some percentage tested that feature.) > After all, the codepath to hide the "rebase in C" implementation and use > the scripted version were never in 'master' (or 'next' for that matter) > with this variable turned off, so I am reasonably sure it had no serious > exposure to real world usage. See above for a counter-argument. > Having said that, assuming that the switching back to scripted > version works correctly and assuming that we want to expose this to > end users, I think the patch text makes sense. Indeed. I would still love to see the built-in rebase to be used by default in v2.20.0, and I am reasonably sure that the escape hatch works (because, as I told you above, it worked in the reverse, making the built-in rebase an opt-in in Git for Windows v2.19.1). Ciao, Dscho
Re: [PATCH] ref-filter: don't look for objects when outside of a repository
On Fri, Nov 16, 2018 at 02:09:07PM +0900, Junio C Hamano wrote: > >> I see problems in both directions: > >> > >> - sorting by "objectname" works now, but it's marked with SOURCE_OBJ, > >>and would be forbidden with your patch. I'm actually not sure if > >>SOURCE_OBJ is accurate; we shouldn't need to access the object to > >>show it (and we are probably wasting effort loading the full contents > >>for tools like for-each-ref). > >> > >>However, that's not the full story. For objectname:short, it _does_ call > >>find_unique_abbrev(). So we expect to have an object directory. > > > > Oops, I'm apparently bad at reading. It is in fact SOURCE_OTHER, which > > makes sense (outside of this whole "--sort outside a repo thing"). > > > > But we'd ideally distinguish between "objectname" (which should be OK > > outside a repo) and "objectname:short" (which currently segfaults). > > Arguably, use of ref-filter machinery in ls-remote, whether it is > given from inside or outside a repo, was a mistake in 1fb20dfd > ("ls-remote: create '--sort' option", 2018-04-09), as the whole > point of "ls-remote" is to peek the list of refs and it is perfectly > normal that the objects listed are not available. I think it's conceptually reasonable to use the ref-filter machinery. It's just that it was underprepared to handle this out-of-repo case. I think we're not too far off, though. > "ls-remote --sort=authorname" that is run in a repository may not > segfault on a ref that points at a yet-to-be-fetched commit, but it > cannot be doing anything sensible. Is it still better to silently > produce a nonsense result than refusing to --sort no matter what the > sort keys are, whether we are inside or outside a repository? I don't think we produce silent nonsense in the current code (or after any of the discussed solutions), either in a repo or out. We say "fatal: missing object ..." inside a repo if the request cannot be fulfilled. That's not incredibly illuminating, perhaps, but it means we fulfill whatever we _can_ on behalf of the user's request, and bail otherwise. If you are arguing that even in a repo we should reject "authorname" early (just as we would outside of a repo), I could buy that. Technically we can make it work sometimes (if we happen to have fetched everything the other side has), but behaving consistently (and with a decent error message) may trump that. -Peff
[PATCH 3/3] remote-curl: die on server-side errors
From: Josh Steadmon When a smart HTTP server sends an error message via pkt-line, remote-curl will fail to detect the error (which usually results in incorrectly falling back to dumb-HTTP mode). This patch adds a check in check_smart_http() for server-side error messages, as well as a test case for this issue. Signed-off-by: Josh Steadmon Signed-off-by: Jeff King --- remote-curl.c | 3 +++ t/lib-httpd.sh | 1 + t/lib-httpd/apache.conf | 4 t/lib-httpd/error-smart-http.sh | 3 +++ t/t5551-http-fetch-smart.sh | 5 + 5 files changed, 16 insertions(+) create mode 100644 t/lib-httpd/error-smart-http.sh diff --git a/remote-curl.c b/remote-curl.c index 3c9c4a07c3..b1309f2bdc 100644 --- a/remote-curl.c +++ b/remote-curl.c @@ -382,6 +382,9 @@ static void check_smart_http(struct discovery *d, const char *service, */ d->proto_git = 1; + } else if (skip_prefix(line, "ERR ", )) { + die(_("remote error: %s"), p); + } else { die("invalid server response; got '%s'", line); } diff --git a/t/lib-httpd.sh b/t/lib-httpd.sh index a8729f8232..4e946623bb 100644 --- a/t/lib-httpd.sh +++ b/t/lib-httpd.sh @@ -131,6 +131,7 @@ prepare_httpd() { mkdir -p "$HTTPD_DOCUMENT_ROOT_PATH" cp "$TEST_PATH"/passwd "$HTTPD_ROOT_PATH" install_script broken-smart-http.sh + install_script error-smart-http.sh install_script error.sh install_script apply-one-time-sed.sh diff --git a/t/lib-httpd/apache.conf b/t/lib-httpd/apache.conf index 581c010d8f..6de2bc775c 100644 --- a/t/lib-httpd/apache.conf +++ b/t/lib-httpd/apache.conf @@ -117,6 +117,7 @@ Alias /auth/dumb/ www/auth/dumb/ ScriptAliasMatch /smart_*[^/]*/(.*) ${GIT_EXEC_PATH}/git-http-backend/$1 ScriptAlias /broken_smart/ broken-smart-http.sh/ +ScriptAlias /error_smart/ error-smart-http.sh/ ScriptAlias /error/ error.sh/ ScriptAliasMatch /one_time_sed/(.*) apply-one-time-sed.sh/$1 @@ -125,6 +126,9 @@ ScriptAliasMatch /one_time_sed/(.*) apply-one-time-sed.sh/$1 Options ExecCGI + + Options ExecCGI + Options ExecCGI diff --git a/t/lib-httpd/error-smart-http.sh b/t/lib-httpd/error-smart-http.sh new file mode 100644 index 00..e65d447fc4 --- /dev/null +++ b/t/lib-httpd/error-smart-http.sh @@ -0,0 +1,3 @@ +echo "Content-Type: application/x-git-upload-pack-advertisement" +echo +printf "%s" "0019ERR server-side error" diff --git a/t/t5551-http-fetch-smart.sh b/t/t5551-http-fetch-smart.sh index 8630b0cc39..ba83e567e5 100755 --- a/t/t5551-http-fetch-smart.sh +++ b/t/t5551-http-fetch-smart.sh @@ -429,5 +429,10 @@ test_expect_success 'GIT_TRACE_CURL_NO_DATA prevents data from being traced' ' ! grep "=> Send data" err ' +test_expect_success 'server-side error detected' ' + test_must_fail git clone $HTTPD_URL/error_smart/repo.git 2>actual && + grep "server-side error" actual +' + stop_httpd test_done -- 2.19.1.1636.gc7a073d580
Re: [PATCH v3 00/11] fast export and import fixes and features
On Thu, Nov 15, 2018 at 11:59:45PM -0800, Elijah Newren wrote: > This is a series of small fixes and features for fast-export and > fast-import, mostly on the fast-export side. > > Changes since v2 (full range-diff below): > * Dropped the final patch; going to try to use Peff's suggestion of > rev-list and diff-tree to get what I need instead > * Inserted a new patch at the beginning to convert pre-existing sha1 > stuff to oid (rename sha1_to_hex() -> oid_to_hex(), rename > anonymize_sha1() to anonymize_oid(), etc.) > * Modified other patches in the series to add calls to oid_to_hex() rather > than sha1_to_hex() Thanks, these changes all look good to me. I have no more nits to pick. :) -Peff
[PATCH 2/3] remote-curl: tighten "version 2" check for smart-http
In a v2 smart-http conversation, the server should reply to our initial request with a pkt-line saying "version 2" (this is the start of the "capabilities advertisement"). We check for the string using starts_with(), but that's overly permissive (it would match "version 20", for example). Let's tighten this check to use strcmp(). Note that we don't need to worry about a trailing newline here, because the ptk-line code will have chomped it for us already. Signed-off-by: Jeff King --- remote-curl.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/remote-curl.c b/remote-curl.c index dd9bc41aa1..3c9c4a07c3 100644 --- a/remote-curl.c +++ b/remote-curl.c @@ -375,7 +375,7 @@ static void check_smart_http(struct discovery *d, const char *service, d->len = src_len; d->proto_git = 1; - } else if (starts_with(line, "version 2")) { + } else if (!strcmp(line, "version 2")) { /* * v2 smart http; do not consume version packet, which will * be handled elsewhere. -- 2.19.1.1636.gc7a073d580
[PATCH 1/3] remote-curl: refactor smart-http discovery
After making initial contact with an http server, we have to decide if the server supports smart-http, and if so, which version. Our rules are a bit inconsistent: 1. For v0, we require that the content-type indicates a smart-http response. We also require the response to look vaguely like a pkt-line starting with "#". If one of those does not match, we fall back to dumb-http. But according to our http protocol spec[1]: Dumb servers MUST NOT return a return type starting with `application/x-git-`. If we see the expected content-type, we should consider it smart-http. At that point we can parse the pkt-line for real, and complain if it is not syntactically valid. 2. For v2, we do not actually check the content-type. Our v2 protocol spec says[2]: When using the http:// or https:// transport a client makes a "smart" info/refs request as described in `http-protocol.txt`[...] and the http spec is clear that for a smart-http[3]: The Content-Type MUST be `application/x-$servicename-advertisement`. So it is required according to the spec. These inconsistencies were easy to miss because of the way the original code was written as an inline conditional. Let's pull it out into its own function for readability, and improve a few things: - we now predicate the smart/dumb decision entirely on the presence of the correct content-type - we do a real pkt-line parse before deciding how to proceed (and die if it isn't valid) - use skip_prefix() for comparing service strings, instead of constructing expected output in a strbuf; this avoids dealing with memory cleanup Note that this _is_ tightening what the client will allow. It's all according to the spec, but it's possible that other implementations might violate these. However, violating these particular rules seems like an odd choice for a server to make. [1] Documentation/technical/http-protocol.txt, l. 166-167 [2] Documentation/technical/protocol-v2.txt, l. 63-64 [3] Documentation/technical/http-protocol.txt, l. 247 Helped-by: Josh Steadmon Signed-off-by: Jeff King --- remote-curl.c | 93 --- 1 file changed, 59 insertions(+), 34 deletions(-) diff --git a/remote-curl.c b/remote-curl.c index 762a55a75f..dd9bc41aa1 100644 --- a/remote-curl.c +++ b/remote-curl.c @@ -330,9 +330,65 @@ static int get_protocol_http_header(enum protocol_version version, return 0; } +static void check_smart_http(struct discovery *d, const char *service, +struct strbuf *type) +{ + char *src_buf; + size_t src_len; + char *line; + const char *p; + + /* +* If we don't see x-$service-advertisement, then it's not smart-http. +* But once we do, we commit to it and assume any other protocol +* violations are hard errors. +*/ + if (!skip_prefix(type->buf, "application/x-", ) || + !skip_prefix(p, service, ) || + strcmp(p, "-advertisement")) + return; + + /* +* "Peek" at the first packet by using a separate buf/len pair; some +* cases below require us leaving the originals intact. +*/ + src_buf = d->buf; + src_len = d->len; + line = packet_read_line_buf(_buf, _len, NULL); + if (!line) + die("invalid server response; expected service, got flush packet"); + + if (skip_prefix(line, "# service=", ) && !strcmp(p, service)) { + /* +* The header can include additional metadata lines, up +* until a packet flush marker. Ignore these now, but +* in the future we might start to scan them. +*/ + while (packet_read_line_buf(_buf, _len, NULL)) + ; + + /* +* v0 smart http; callers expect us to soak up the +* service and header packets +*/ + d->buf = src_buf; + d->len = src_len; + d->proto_git = 1; + + } else if (starts_with(line, "version 2")) { + /* +* v2 smart http; do not consume version packet, which will +* be handled elsewhere. +*/ + d->proto_git = 1; + + } else { + die("invalid server response; got '%s'", line); + } +} + static struct discovery *discover_refs(const char *service, int for_push) { - struct strbuf exp = STRBUF_INIT; struct strbuf type = STRBUF_INIT; struct strbuf charset = STRBUF_INIT; struct strbuf buffer = STRBUF_INIT; @@ -405,38 +461,8 @@ static struct discovery *discover_refs(const char *service, int for_push) last->buf_alloc = strbuf_detach(, >len); last->buf = last->buf_alloc; - strbuf_addf(, "application/x-%s-advertisement", service); -
[PATCH 0/3] remote-curl smart-http discovery cleanup
On Thu, Nov 15, 2018 at 01:51:52PM -0800, Josh Steadmon wrote: > > This patch tightens both of those (I also made a few stylistic tweaks, > > and added the ERR condition to show where it would go). I dunno. Part of > > me sees this as a nice cleanup, but maybe it is better to just leave it > > alone. A lot of these behaviors are just how it happens to work now, and > > not part of the spec, but we don't know what might be relying on them. > > At least according to the protocol-v2 and http-protocol docs, the > stricter behavior seems correct: > > For the first point above, dumb servers should never use an > "application/x-git-*" content type (http-protocol.txt line 163-167). > > For the second point, the docs require v2 servers to use > "application/x-git-*" content types. protocol-v2.txt lines 63-65 state > that v2 clients should make a smart http request, while > http-protocol.txt lines 247-252 state that a smart server's response > type must be "application/x-git-*". Thanks for digging into the spec. I agree that it's pretty clear that the appropriate content-type is expected. > Of course we don't know if other implementations follow the spec, but > ISTM that this patch at least doesn't contradict how we've promised the > protocols should work. These seem like pretty unlikely ways for a buggy server to behave, so I think it's a reasonable risk. I also checked GitHub's implementation (which recently learned to speak v2) and made sure that it works. :) I didn't check JGit, but given the provenance, I assume it's fine. Amusingly, this does break the test you just added, because it tries to issue an ERR after claiming "text/html" (and after my patch, we correctly fall back to dumb-http). > If no one has any objections, I'll include the diff below in v2. Thanks > for the help Jeff! I think it makes sense to do the refactoring first as a separate step. And of course it needs a commit message. So how about this series (your original is rebased on top)? [1/3]: remote-curl: refactor smart-http discovery [2/3]: remote-curl: tighten "version 2" check for smart-http [3/3]: remote-curl: die on server-side errors remote-curl.c | 96 + t/lib-httpd.sh | 1 + t/lib-httpd/apache.conf | 4 ++ t/lib-httpd/error-smart-http.sh | 3 ++ t/t5551-http-fetch-smart.sh | 5 ++ 5 files changed, 75 insertions(+), 34 deletions(-) create mode 100644 t/lib-httpd/error-smart-http.sh -Peff
From:Miss:Fatima Yusuf.
From:Miss:Fatima Yusuf. For sure this mail would definitely come to you as a surprise, but do take your good time to go through it, My name is Ms.Fatima Yusuf,i am from Ivory Coast. I lost my parents a year and couple of months ago. My father was a serving director of the Agro-exporting board until his death. He was assassinated by his business partners.Before his death, he made a deposit of US$9.7 Million Dollars here in Cote d'ivoire which was for the purchase of cocoa processing machine and development of another factory before his untimely death. Being that this part of the world experiences political and crises time without number, there is no guarantee of lives and properties. I cannot invest this money here any long, despite the fact it had been my late father's industrial plans. I want you to do me a favor to receive this funds into your country or any safer place as the beneficiary, I have plans to invest this money in continuation with the investment vision of my late father, but not in this place again rather in your country. I have the vision of going into real estate and industrial production or any profitable business venture. I will be ready to compensate you with 20% of the total Amount, now all my hope is banked on you and i really wants to invest this money in your country, where there is stability of Government, political and economic welfare. My greatest worry now is how to move out of this country because my uncle is threatening to kill me as he killed my father,Please do not let anybody hear about this, it is between me and you alone because of my security reason. I am waiting to hear from you. Yours Sincerely, Miss.Fatima Yusuf.
Re: [PATCH v2 1/2] [Outreachy] t3903-stash: test without configured user.name and user.email
Hi Junio, On 16-Nov-18 6:55 AM, Junio C Hamano wrote: Slavica Djukic writes: +test_expect_failure 'stash works when user.name and user.email are not set' ' + git reset && + git var GIT_COMMITTER_IDENT >expected && All the other existing test pieces in this file calls the expected result "expect"; is there a reason why this patch needs to be different (e.g. 'expect' file left by the earlier step needs to be kept unmodified for later use, or something like that)? If not, please avoid making a difference in irrelevant details, as that would waste time of readers by forcing them to guess if there is such a reason that readers cannot immediately see. There is no specific reason for file to be "expected", I'll update that. Anyway, we grab the committer ident we use by default during the test with this command. OK. + >1 && + git add 1 && + git stash && And we make sure we can create stash. + git var GIT_COMMITTER_IDENT >actual && + test_cmp expected actual && I am not sure what you are testing with this step. There is nothing that changed environment variables or configuration since we ran "git var" above. Why does this test suspect that somebody in the future may break the expectation that after running 'git add' and/or 'git stash', our committer identity may have been changed, and how would such a breakage happen? In previous re-roll, you suggested that test should be improved so that when reasonable identity is present, git stash executes under that identity, and not under the fallback one. Here I'm just making sure that after calling git stash, we still have same reasonable identity present. + >2 && + git add 2 && + test_config user.useconfigonly true && + test_config stash.usebuiltin true && Now we start using use-config-only, so unsetting environment variables will cause trouble when Git insists on having an explicitly configured identities. Makes sense. + ( + sane_unset GIT_AUTHOR_NAME && + sane_unset GIT_AUTHOR_EMAIL && + sane_unset GIT_COMMITTER_NAME && + sane_unset GIT_COMMITTER_EMAIL && + test_unconfig user.email && + test_unconfig user.name && And then we try the same test, but without environment or config. Since we are unsetting the environment, in order to be nice for future test writers, we do this in a subshell, so that we do not have to restore the original values of environment variables. Don't we need to be nice the same way for configuration variables, though? We _know_ that nobody sets user.{email,name} config up to this point in the test sequence, so that is why we do not do a "save before test and then restore to the original" dance on them. Even though we are relying on the fact that these two variables are left unset in the configuration file, we unconfig them here anyway, and I do think it is a good idea for documentation purposes (i.e. we are not documenting what we assume the config before running this test would be; we are documenting what state we want these two variables are in when running this "git stash"---that is, they are both unset). So these later part of this test piece makes sense. I still do not know what you wanted to check in the earlier part of the test, though. + git stash + ) +' + test_done Thank you, Slavica
[PATCH v3 00/11] fast export and import fixes and features
This is a series of small fixes and features for fast-export and fast-import, mostly on the fast-export side. Changes since v2 (full range-diff below): * Dropped the final patch; going to try to use Peff's suggestion of rev-list and diff-tree to get what I need instead * Inserted a new patch at the beginning to convert pre-existing sha1 stuff to oid (rename sha1_to_hex() -> oid_to_hex(), rename anonymize_sha1() to anonymize_oid(), etc.) * Modified other patches in the series to add calls to oid_to_hex() rather than sha1_to_hex() Elijah Newren (11): fast-export: convert sha1 to oid git-fast-import.txt: fix documentation for --quiet option git-fast-export.txt: clarify misleading documentation about rev-list args fast-export: use value from correct enum fast-export: avoid dying when filtering by paths and old tags exist fast-export: move commit rewriting logic into a function for reuse fast-export: when using paths, avoid corrupt stream with non-existent mark fast-export: ensure we export requested refs fast-export: add --reference-excluded-parents option fast-import: remove unmaintained duplicate documentation fast-export: add a --show-original-ids option to show original names Documentation/git-fast-export.txt | 23 +++- Documentation/git-fast-import.txt | 23 +++- builtin/fast-export.c | 190 +- fast-import.c | 166 ++ t/t9350-fast-export.sh| 80 - 5 files changed, 268 insertions(+), 214 deletions(-) -: -- > 1: 4c3370c85f fast-export: convert sha1 to oid 1: 8870fb1340 = 2: 6ffa30e3c7 git-fast-import.txt: fix documentation for --quiet option 2: 16d1c3e22d = 3: 1e278f009a git-fast-export.txt: clarify misleading documentation about rev-list args 3: e19f6b36f9 = 4: 9d7b2aef49 fast-export: use value from correct enum 4: 2b305561d5 ! 5: b65a591d4d fast-export: avoid dying when filtering by paths and old tags exist @@ -29,7 +29,7 @@ - oid_to_hex(>object.oid)); + if (!p->parents) { + printf("reset %s\nfrom %s\n\n", -+ name, sha1_to_hex(null_sha1)); ++ name, oid_to_hex(_oid)); + free(buf); + return; + } 5: 607b1dc2b2 ! 6: dde52c9cb6 fast-export: move commit rewriting logic into a function for reuse @@ -47,7 +47,7 @@ - break; - if (!p->parents) { - printf("reset %s\nfrom %s\n\n", -- name, sha1_to_hex(null_sha1)); +- name, oid_to_hex(_oid)); - free(buf); - return; - } @@ -55,7 +55,7 @@ + p = rewrite_commit((struct commit *)tagged); + if (!p) { + printf("reset %s\nfrom %s\n\n", -+ name, sha1_to_hex(null_sha1)); ++ name, oid_to_hex(_oid)); + free(buf); + return; } 6: ec1862e858 ! 7: d9b2e326f0 fast-export: when using paths, avoid corrupt stream with non-existent mark @@ -35,7 +35,7 @@ + * it. + */ + printf("reset %s\nfrom %s\n\n", -+ name, sha1_to_hex(null_sha1)); ++ name, oid_to_hex(_oid)); + continue; + } printf("reset %s\nfrom :%d\n\n", name, 7: 9da26e3ccb ! 8: 9ddb155a70 fast-export: ensure we export requested refs @@ -97,7 +97,7 @@ case OBJ_TAG: handle_tag(name, (struct tag *)object); @@ - name, sha1_to_hex(null_sha1)); + name, oid_to_hex(_oid)); continue; } - printf("reset %s\nfrom :%d\n\n", name, @@ -114,7 +114,7 @@ + * like a ref deletion to me. + */ + printf("reset %s\nfrom %s\n\n", -+ name, sha1_to_hex(null_sha1)); ++ name, oid_to_hex(_oid)); + continue; + } + 8: 7e5fe2f02e ! 9: 595d2e5d30 fast-export: add
[PATCH v3 10/11] fast-import: remove unmaintained duplicate documentation
fast-import.c has started with a comment for nine and a half years re-directing the reader to Documentation/git-fast-import.txt for maintained documentation. Instead of leaving the unmaintained documentation in place, just excise it. Signed-off-by: Elijah Newren --- fast-import.c | 154 -- 1 file changed, 154 deletions(-) diff --git a/fast-import.c b/fast-import.c index 95600c78e0..555d49ad23 100644 --- a/fast-import.c +++ b/fast-import.c @@ -1,157 +1,3 @@ -/* -(See Documentation/git-fast-import.txt for maintained documentation.) -Format of STDIN stream: - - stream ::= cmd*; - - cmd ::= new_blob -| new_commit -| new_tag -| reset_branch -| checkpoint -| progress -; - - new_blob ::= 'blob' lf -mark? -file_content; - file_content ::= data; - - new_commit ::= 'commit' sp ref_str lf -mark? -('author' (sp name)? sp '<' email '>' sp when lf)? -'committer' (sp name)? sp '<' email '>' sp when lf -commit_msg -('from' sp commit-ish lf)? -('merge' sp commit-ish lf)* -(file_change | ls)* -lf?; - commit_msg ::= data; - - ls ::= 'ls' sp '"' quoted(path) '"' lf; - - file_change ::= file_clr -| file_del -| file_rnm -| file_cpy -| file_obm -| file_inm; - file_clr ::= 'deleteall' lf; - file_del ::= 'D' sp path_str lf; - file_rnm ::= 'R' sp path_str sp path_str lf; - file_cpy ::= 'C' sp path_str sp path_str lf; - file_obm ::= 'M' sp mode sp (hexsha1 | idnum) sp path_str lf; - file_inm ::= 'M' sp mode sp 'inline' sp path_str lf -data; - note_obm ::= 'N' sp (hexsha1 | idnum) sp commit-ish lf; - note_inm ::= 'N' sp 'inline' sp commit-ish lf -data; - - new_tag ::= 'tag' sp tag_str lf -'from' sp commit-ish lf -('tagger' (sp name)? sp '<' email '>' sp when lf)? -tag_msg; - tag_msg ::= data; - - reset_branch ::= 'reset' sp ref_str lf -('from' sp commit-ish lf)? -lf?; - - checkpoint ::= 'checkpoint' lf -lf?; - - progress ::= 'progress' sp not_lf* lf -lf?; - - # note: the first idnum in a stream should be 1 and subsequent - # idnums should not have gaps between values as this will cause - # the stream parser to reserve space for the gapped values. An - # idnum can be updated in the future to a new object by issuing - # a new mark directive with the old idnum. - # - mark ::= 'mark' sp idnum lf; - data ::= (delimited_data | exact_data) -lf?; - -# note: delim may be any string but must not contain lf. -# data_line may contain any data but must not be exactly -# delim. - delimited_data ::= 'data' sp '<<' delim lf -(data_line lf)* -delim lf; - - # note: declen indicates the length of binary_data in bytes. - # declen does not include the lf preceding the binary data. - # - exact_data ::= 'data' sp declen lf -binary_data; - - # note: quoted strings are C-style quoting supporting \c for - # common escapes of 'c' (e..g \n, \t, \\, \") or \nnn where nnn - # is the signed byte value in octal. Note that the only - # characters which must actually be escaped to protect the - # stream formatting is: \, " and LF. Otherwise these values - # are UTF8. - # - commit-ish ::= (ref_str | hexsha1 | sha1exp_str | idnum); - ref_str ::= ref; - sha1exp_str ::= sha1exp; - tag_str ::= tag; - path_str::= path| '"' quoted(path)'"' ; - mode::= '100644' | '644' -| '100755' | '755' -| '12' -; - - declen ::= # unsigned 32 bit value, ascii base10 notation; - bigint ::= # unsigned integer value, ascii base10 notation; - binary_data ::= # file content, not interpreted; - - when ::= raw_when | rfc2822_when; - raw_when ::= ts sp tz; - rfc2822_when ::= # Valid RFC 2822 date and time; - - sp ::= # ASCII space character; - lf ::= # ASCII newline (LF) character; - - # note: a colon (':') must precede the numerical value assigned to - # an idnum. This is to distinguish it from a ref or tag name as - # GIT does not permit ':' in ref or tag strings. - # - idnum ::= ':' bigint; - path::= # GIT style file path, e.g. "a/b/c"; - ref ::= # GIT ref name, e.g. "refs/heads/MOZ_GECKO_EXPERIMENT"; - tag ::= # GIT tag name, e.g. "FIREFOX_1_5"; - sha1exp ::= # Any valid GIT SHA1 expression; - hexsha1 ::= # SHA1 in hexadecimal format; - - # note: name and email are UTF8 strings, however name must not - # contain '<' or lf and email must not contain any of the - # following: '<', '>', lf. - # - name ::= # valid GIT author/committer name; - email ::= # valid GIT author/committer email; - ts::= # time since the epoch in seconds, ascii base10 notation; - tz::= # GIT style timezone; - - # note: comments, get-mark, ls-tree, and cat-blob requests may - # appear anywhere in the input, except within a data
[PATCH v3 06/11] fast-export: move commit rewriting logic into a function for reuse
Logic to replace a filtered commit with an unfiltered ancestor is useful elsewhere; put it into a function we can call. Signed-off-by: Elijah Newren --- builtin/fast-export.c | 37 ++--- 1 file changed, 22 insertions(+), 15 deletions(-) diff --git a/builtin/fast-export.c b/builtin/fast-export.c index 7d50f5414e..43e98a38a8 100644 --- a/builtin/fast-export.c +++ b/builtin/fast-export.c @@ -187,6 +187,22 @@ static int get_object_mark(struct object *object) return ptr_to_mark(decoration); } +static struct commit *rewrite_commit(struct commit *p) +{ + for (;;) { + if (p->parents && p->parents->next) + break; + if (p->object.flags & UNINTERESTING) + break; + if (!(p->object.flags & TREESAME)) + break; + if (!p->parents) + return NULL; + p = p->parents->item; + } + return p; +} + static void show_progress(void) { static int counter = 0; @@ -767,21 +783,12 @@ static void handle_tag(const char *name, struct tag *tag) oid_to_hex(>object.oid), type_name(tagged->type)); } - p = (struct commit *)tagged; - for (;;) { - if (p->parents && p->parents->next) - break; - if (p->object.flags & UNINTERESTING) - break; - if (!(p->object.flags & TREESAME)) - break; - if (!p->parents) { - printf("reset %s\nfrom %s\n\n", - name, oid_to_hex(_oid)); - free(buf); - return; - } - p = p->parents->item; + p = rewrite_commit((struct commit *)tagged); + if (!p) { + printf("reset %s\nfrom %s\n\n", + name, oid_to_hex(_oid)); + free(buf); + return; } tagged_mark = get_object_mark(>object); } -- 2.19.1.1063.g1796373474.dirty
[PATCH v3 09/11] fast-export: add --reference-excluded-parents option
git filter-branch has a nifty feature allowing you to rewrite, e.g. just the last 8 commits of a linear history git filter-branch $OPTIONS HEAD~8..HEAD If you try the same with git fast-export, you instead get a history of only 8 commits, with HEAD~7 being rewritten into a root commit. There are two alternatives: 1) Don't use the negative revision specification, and when you're filtering the output to make modifications to the last 8 commits, just be careful to not modify any earlier commits somehow. 2) First run 'git fast-export --export-marks=somefile HEAD~8', then run 'git fast-export --import-marks=somefile HEAD~8..HEAD'. Both are more error prone than I'd like (the first for obvious reasons; with the second option I have sometimes accidentally included too many revisions in the first command and then found that the corresponding extra revisions were not exported by the second command and thus were not modified as I expected). Also, both are poor from a performance perspective. Add a new --reference-excluded-parents option which will cause fast-export to refer to commits outside the specified rev-list-args range by their sha1sum. Such a stream will only be useful in a repository which already contains the necessary commits (much like the restriction imposed when using --no-data). Note from Peff: I think we might be able to do a little more optimization here. If we're exporting HEAD^..HEAD and there's an object in HEAD^ which is unchanged in HEAD, I think we'd still print it (because it would not be marked SHOWN), but we could omit it (by walking the tree of the boundary commits and marking them shown). I don't think it's a blocker for what you're doing here, but just a possible future optimization. Signed-off-by: Elijah Newren --- Documentation/git-fast-export.txt | 17 +++-- builtin/fast-export.c | 42 +++ t/t9350-fast-export.sh| 11 3 files changed, 58 insertions(+), 12 deletions(-) diff --git a/Documentation/git-fast-export.txt b/Documentation/git-fast-export.txt index fda55b3284..f65026662a 100644 --- a/Documentation/git-fast-export.txt +++ b/Documentation/git-fast-export.txt @@ -110,6 +110,18 @@ marks the same across runs. the shape of the history and stored tree. See the section on `ANONYMIZING` below. +--reference-excluded-parents:: + By default, running a command such as `git fast-export + master~5..master` will not include the commit master{tilde}5 + and will make master{tilde}4 no longer have master{tilde}5 as + a parent (though both the old master{tilde}4 and new + master{tilde}4 will have all the same files). Use + --reference-excluded-parents to instead have the the stream + refer to commits in the excluded range of history by their + sha1sum. Note that the resulting stream can only be used by a + repository which already contains the necessary parent + commits. + --refspec:: Apply the specified refspec to each ref exported. Multiple of them can be specified. @@ -119,8 +131,9 @@ marks the same across runs. 'git rev-list', that specifies the specific objects and references to export. For example, `master~10..master` causes the current master reference to be exported along with all objects - added since its 10th ancestor commit and all files common to - master{tilde}9 and master{tilde}10. + added since its 10th ancestor commit and (unless the + --reference-excluded-parents option is specified) all files + common to master{tilde}9 and master{tilde}10. EXAMPLES diff --git a/builtin/fast-export.c b/builtin/fast-export.c index d71e0333d4..78fc67b03a 100644 --- a/builtin/fast-export.c +++ b/builtin/fast-export.c @@ -37,6 +37,7 @@ static int fake_missing_tagger; static int use_done_feature; static int no_data; static int full_tree; +static int reference_excluded_commits; static struct string_list extra_refs = STRING_LIST_INIT_NODUP; static struct string_list tag_refs = STRING_LIST_INIT_NODUP; static struct refspec refspecs = REFSPEC_INIT_FETCH; @@ -597,7 +598,8 @@ static void handle_commit(struct commit *commit, struct rev_info *rev, message += 2; if (commit->parents && - get_object_mark(>parents->item->object) != 0 && + (get_object_mark(>parents->item->object) != 0 || +reference_excluded_commits) && !full_tree) { parse_commit_or_die(commit->parents->item); diff_tree_oid(get_commit_tree_oid(commit->parents->item), @@ -645,13 +647,21 @@ static void handle_commit(struct commit *commit, struct rev_info *rev, unuse_commit_buffer(commit, commit_buffer); for (i = 0, p = commit->parents; p; p = p->next) { - int mark = get_object_mark(>item->object); - if (!mark) +
[PATCH v3 08/11] fast-export: ensure we export requested refs
If file paths are specified to fast-export and a ref points to a commit that does not touch any of the relevant paths, then that ref would sometimes fail to be exported. (This depends on whether any ancestors of the commit which do touch the relevant paths would be exported with that same ref name or a different ref name.) To avoid this problem, put *all* specified refs into extra_refs to start, and then as we export each commit, remove the refname used in the 'commit $REFNAME' directive from extra_refs. Then, in handle_tags_and_duplicates() we know which refs actually do need a manual reset directive in order to be included. This means that we do need some special handling for excluded refs; e.g. if someone runs git fast-export ^master master then they've asked for master to be exported, but they have also asked for the commit which master points to and all of its history to be excluded. That logically means ref deletion. Previously, such refs were just silently omitted from being exported despite having been explicitly requested for export. Signed-off-by: Elijah Newren --- builtin/fast-export.c | 54 -- t/t9350-fast-export.sh | 16 ++--- 2 files changed, 55 insertions(+), 15 deletions(-) diff --git a/builtin/fast-export.c b/builtin/fast-export.c index 227488ae84..d71e0333d4 100644 --- a/builtin/fast-export.c +++ b/builtin/fast-export.c @@ -38,6 +38,7 @@ static int use_done_feature; static int no_data; static int full_tree; static struct string_list extra_refs = STRING_LIST_INIT_NODUP; +static struct string_list tag_refs = STRING_LIST_INIT_NODUP; static struct refspec refspecs = REFSPEC_INIT_FETCH; static int anonymize; static struct revision_sources revision_sources; @@ -612,6 +613,13 @@ static void handle_commit(struct commit *commit, struct rev_info *rev, export_blob(_queued_diff.queue[i]->two->oid); refname = *revision_sources_at(_sources, commit); + /* +* FIXME: string_list_remove() below for each ref is overall +* O(N^2). Compared to a history walk and diffing trees, this is +* just lost in the noise in practice. However, theoretically a +* repo may have enough refs for this to become slow. +*/ + string_list_remove(_refs, refname, 0); if (anonymize) { refname = anonymize_refname(refname); anonymize_ident_line(, _end); @@ -815,7 +823,7 @@ static struct commit *get_commit(struct rev_cmdline_entry *e, char *full_name) /* handle nested tags */ while (tag && tag->object.type == OBJ_TAG) { parse_object(the_repository, >object.oid); - string_list_append(_refs, full_name)->util = tag; + string_list_append(_refs, full_name)->util = tag; tag = (struct tag *)tag->tagged; } if (!tag) @@ -874,25 +882,30 @@ static void get_tags_and_duplicates(struct rev_cmdline_info *info) } /* -* This ref will not be updated through a commit, lets make -* sure it gets properly updated eventually. +* Make sure this ref gets properly updated eventually, whether +* through a commit or manually at the end. */ - if (*revision_sources_at(_sources, commit) || - commit->object.flags & SHOWN) + if (e->item->type != OBJ_TAG) string_list_append(_refs, full_name)->util = commit; + if (!*revision_sources_at(_sources, commit)) *revision_sources_at(_sources, commit) = full_name; } + + string_list_sort(_refs); + string_list_remove_duplicates(_refs, 0); } -static void handle_tags_and_duplicates(void) +static void handle_tags_and_duplicates(struct string_list *extras) { struct commit *commit; int i; - for (i = extra_refs.nr - 1; i >= 0; i--) { - const char *name = extra_refs.items[i].string; - struct object *object = extra_refs.items[i].util; + for (i = extras->nr - 1; i >= 0; i--) { + const char *name = extras->items[i].string; + struct object *object = extras->items[i].util; + int mark; + switch (object->type) { case OBJ_TAG: handle_tag(name, (struct tag *)object); @@ -913,8 +926,24 @@ static void handle_tags_and_duplicates(void) name, oid_to_hex(_oid)); continue; } - printf("reset %s\nfrom :%d\n\n", name, - get_object_mark(>object)); + + mark = get_object_mark(>object); + if (!mark) { +
[PATCH v3 03/11] git-fast-export.txt: clarify misleading documentation about rev-list args
Signed-off-by: Elijah Newren --- Documentation/git-fast-export.txt | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/Documentation/git-fast-export.txt b/Documentation/git-fast-export.txt index ce954be532..fda55b3284 100644 --- a/Documentation/git-fast-export.txt +++ b/Documentation/git-fast-export.txt @@ -119,7 +119,8 @@ marks the same across runs. 'git rev-list', that specifies the specific objects and references to export. For example, `master~10..master` causes the current master reference to be exported along with all objects - added since its 10th ancestor commit. + added since its 10th ancestor commit and all files common to + master{tilde}9 and master{tilde}10. EXAMPLES -- 2.19.1.1063.g1796373474.dirty
[PATCH v3 11/11] fast-export: add a --show-original-ids option to show original names
Knowing the original names (hashes) of commits can sometimes enable post-filtering that would otherwise be difficult or impossible. In particular, the desire to rewrite commit messages which refer to other prior commits (on top of whatever other filtering is being done) is very difficult without knowing the original names of each commit. In addition, knowing the original names (hashes) of blobs can allow filtering by blob-id without requiring re-hashing the content of the blob, and is thus useful as a small optimization. Once we add original ids for both commits and blobs, we may as well add them for tags too for completeness. Perhaps someone will have a use for them. This commit teaches a new --show-original-ids option to fast-export which will make it add a 'original-oid ' line to blob, commits, and tags. It also teaches fast-import to parse (and ignore) such lines. Signed-off-by: Elijah Newren --- Documentation/git-fast-export.txt | 7 +++ Documentation/git-fast-import.txt | 16 builtin/fast-export.c | 20 +++- fast-import.c | 12 t/t9350-fast-export.sh| 17 + 5 files changed, 67 insertions(+), 5 deletions(-) diff --git a/Documentation/git-fast-export.txt b/Documentation/git-fast-export.txt index f65026662a..64c01ba918 100644 --- a/Documentation/git-fast-export.txt +++ b/Documentation/git-fast-export.txt @@ -122,6 +122,13 @@ marks the same across runs. repository which already contains the necessary parent commits. +--show-original-ids:: + Add an extra directive to the output for commits and blobs, + `original-oid `. While such directives will likely be + ignored by importers such as git-fast-import, it may be useful + for intermediary filters (e.g. for rewriting commit messages + which refer to older commits, or for stripping blobs by id). + --refspec:: Apply the specified refspec to each ref exported. Multiple of them can be specified. diff --git a/Documentation/git-fast-import.txt b/Documentation/git-fast-import.txt index 7ab97745a6..43ab3b1637 100644 --- a/Documentation/git-fast-import.txt +++ b/Documentation/git-fast-import.txt @@ -385,6 +385,7 @@ change to the project. 'commit' SP LF mark? + original-oid? ('author' (SP )? SP LT GT SP LF)? 'committer' (SP )? SP LT GT SP LF data @@ -741,6 +742,19 @@ New marks are created automatically. Existing marks can be moved to another object simply by reusing the same `` in another `mark` command. +`original-oid` +~~ +Provides the name of the object in the original source control system. +fast-import will simply ignore this directive, but filter processes +which operate on and modify the stream before feeding to fast-import +may have uses for this information + + + 'original-oid' SP LF + + +where `` is any string not containing LF. + `tag` ~ Creates an annotated tag referring to a specific commit. To create @@ -749,6 +763,7 @@ lightweight (non-annotated) tags see the `reset` command below. 'tag' SP LF 'from' SP LF + original-oid? 'tagger' (SP )? SP LT GT SP LF data @@ -823,6 +838,7 @@ assigned mark. 'blob' LF mark? + original-oid? data diff --git a/builtin/fast-export.c b/builtin/fast-export.c index 78fc67b03a..36c2575de5 100644 --- a/builtin/fast-export.c +++ b/builtin/fast-export.c @@ -38,6 +38,7 @@ static int use_done_feature; static int no_data; static int full_tree; static int reference_excluded_commits; +static int show_original_ids; static struct string_list extra_refs = STRING_LIST_INIT_NODUP; static struct string_list tag_refs = STRING_LIST_INIT_NODUP; static struct refspec refspecs = REFSPEC_INIT_FETCH; @@ -271,7 +272,10 @@ static void export_blob(const struct object_id *oid) mark_next_object(object); - printf("blob\nmark :%"PRIu32"\ndata %lu\n", last_idnum, size); + printf("blob\nmark :%"PRIu32"\n", last_idnum); + if (show_original_ids) + printf("original-oid %s\n", oid_to_hex(oid)); + printf("data %lu\n", size); if (size && fwrite(buf, size, 1, stdout) != 1) die_errno("could not write blob '%s'", oid_to_hex(oid)); printf("\n"); @@ -635,8 +639,10 @@ static void handle_commit(struct commit *commit, struct rev_info *rev, reencoded = reencode_string(message, "UTF-8", encoding); if (!commit->parents) printf("reset %s\n", refname); - printf("commit %s\nmark :%"PRIu32"\n%.*s\n%.*s\ndata %u\n%s", - refname, last_idnum, + printf("commit %s\nmark :%"PRIu32"\n", refname, last_idnum); + if (show_original_ids) + printf("original-oid %s\n", oid_to_hex(>object.oid)); + printf("%.*s\n%.*s\ndata %u\n%s",
[PATCH v3 02/11] git-fast-import.txt: fix documentation for --quiet option
Signed-off-by: Elijah Newren --- Documentation/git-fast-import.txt | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/Documentation/git-fast-import.txt b/Documentation/git-fast-import.txt index e81117d27f..7ab97745a6 100644 --- a/Documentation/git-fast-import.txt +++ b/Documentation/git-fast-import.txt @@ -40,9 +40,10 @@ OPTIONS not contain the old commit). --quiet:: - Disable all non-fatal output, making fast-import silent when it - is successful. This option disables the output shown by - --stats. + Disable the output shown by --stats, making fast-import usually + be silent when it is successful. However, if the import stream + has directives intended to show user output (e.g. `progress` + directives), the corresponding messages will still be shown. --stats:: Display some basic statistics about the objects fast-import has -- 2.19.1.1063.g1796373474.dirty
[PATCH v3 04/11] fast-export: use value from correct enum
ABORT and ERROR happen to have the same value, but come from differnt enums. Use the one from the correct enum, and while at it, rename the values to avoid such problems. Signed-off-by: Elijah Newren --- builtin/fast-export.c | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/builtin/fast-export.c b/builtin/fast-export.c index f5166ac71e..e2be35f41e 100644 --- a/builtin/fast-export.c +++ b/builtin/fast-export.c @@ -31,8 +31,8 @@ static const char *fast_export_usage[] = { }; static int progress; -static enum { ABORT, VERBATIM, WARN, WARN_STRIP, STRIP } signed_tag_mode = ABORT; -static enum { ERROR, DROP, REWRITE } tag_of_filtered_mode = ERROR; +static enum { SIGNED_TAG_ABORT, VERBATIM, WARN, WARN_STRIP, STRIP } signed_tag_mode = SIGNED_TAG_ABORT; +static enum { TAG_FILTERING_ABORT, DROP, REWRITE } tag_of_filtered_mode = TAG_FILTERING_ABORT; static int fake_missing_tagger; static int use_done_feature; static int no_data; @@ -46,7 +46,7 @@ static int parse_opt_signed_tag_mode(const struct option *opt, const char *arg, int unset) { if (unset || !strcmp(arg, "abort")) - signed_tag_mode = ABORT; + signed_tag_mode = SIGNED_TAG_ABORT; else if (!strcmp(arg, "verbatim") || !strcmp(arg, "ignore")) signed_tag_mode = VERBATIM; else if (!strcmp(arg, "warn")) @@ -64,7 +64,7 @@ static int parse_opt_tag_of_filtered_mode(const struct option *opt, const char *arg, int unset) { if (unset || !strcmp(arg, "abort")) - tag_of_filtered_mode = ERROR; + tag_of_filtered_mode = TAG_FILTERING_ABORT; else if (!strcmp(arg, "drop")) tag_of_filtered_mode = DROP; else if (!strcmp(arg, "rewrite")) @@ -728,7 +728,7 @@ static void handle_tag(const char *name, struct tag *tag) "\n-BEGIN PGP SIGNATURE-\n"); if (signature) switch(signed_tag_mode) { - case ABORT: + case SIGNED_TAG_ABORT: die("encountered signed tag %s; use " "--signed-tags= to handle it", oid_to_hex(>object.oid)); @@ -753,7 +753,7 @@ static void handle_tag(const char *name, struct tag *tag) tagged_mark = get_object_mark(tagged); if (!tagged_mark) { switch(tag_of_filtered_mode) { - case ABORT: + case TAG_FILTERING_ABORT: die("tag %s tags unexported object; use " "--tag-of-filtered-object= to handle it", oid_to_hex(>object.oid)); -- 2.19.1.1063.g1796373474.dirty
[PATCH v3 01/11] fast-export: convert sha1 to oid
Rename anonymize_sha1() to anonymize_oid(() and change its signature, and switch from sha1_to_hex() to oid_to_hex() and from GIT_SHA1_RAWSZ to the_hash_algo->rawsz. Also change a comment and a die string to mention oid instead of sha1. Signed-off-by: Elijah Newren --- builtin/fast-export.c | 25 + 1 file changed, 13 insertions(+), 12 deletions(-) diff --git a/builtin/fast-export.c b/builtin/fast-export.c index 456797c12a..f5166ac71e 100644 --- a/builtin/fast-export.c +++ b/builtin/fast-export.c @@ -243,7 +243,7 @@ static void export_blob(const struct object_id *oid) if (!buf) die("could not read blob %s", oid_to_hex(oid)); if (check_object_signature(oid, buf, size, type_name(type)) < 0) - die("sha1 mismatch in blob %s", oid_to_hex(oid)); + die("oid mismatch in blob %s", oid_to_hex(oid)); object = parse_object_buffer(the_repository, oid, type, size, buf, ); } @@ -330,17 +330,18 @@ static void print_path(const char *path) static void *generate_fake_oid(const void *old, size_t *len) { - static uint32_t counter = 1; /* avoid null sha1 */ - unsigned char *out = xcalloc(GIT_SHA1_RAWSZ, 1); - put_be32(out + GIT_SHA1_RAWSZ - 4, counter++); + static uint32_t counter = 1; /* avoid null oid */ + const unsigned hashsz = the_hash_algo->rawsz; + unsigned char *out = xcalloc(hashsz, 1); + put_be32(out + hashsz - 4, counter++); return out; } -static const unsigned char *anonymize_sha1(const struct object_id *oid) +static const struct object_id *anonymize_oid(const struct object_id *oid) { - static struct hashmap sha1s; - size_t len = GIT_SHA1_RAWSZ; - return anonymize_mem(, generate_fake_oid, oid, ); + static struct hashmap objs; + size_t len = the_hash_algo->rawsz; + return anonymize_mem(, generate_fake_oid, oid, ); } static void show_filemodify(struct diff_queue_struct *q, @@ -399,9 +400,9 @@ static void show_filemodify(struct diff_queue_struct *q, */ if (no_data || S_ISGITLINK(spec->mode)) printf("M %06o %s ", spec->mode, - sha1_to_hex(anonymize ? - anonymize_sha1(>oid) : - spec->oid.hash)); + oid_to_hex(anonymize ? + anonymize_oid(>oid) : + >oid)); else { struct object *object = lookup_object(the_repository, spec->oid.hash); @@ -988,7 +989,7 @@ static void handle_deletes(void) continue; printf("reset %s\nfrom %s\n\n", - refspec->dst, sha1_to_hex(null_sha1)); + refspec->dst, oid_to_hex(_oid)); } } -- 2.19.1.1063.g1796373474.dirty
[PATCH v3 05/11] fast-export: avoid dying when filtering by paths and old tags exist
If --tag-of-filtered-object=rewrite is specified along with a set of paths to limit what is exported, then any tags pointing to old commits that do not contain any of those specified paths cause problems. Since the old tagged commit is not exported, fast-export attempts to rewrite such tags to an ancestor commit which was exported. If no such commit exists, then fast-export currently die()s. Five years after the tag rewriting logic was added to fast-export (see commit 2d8ad4691921, "fast-export: Add a --tag-of-filtered-object option for newly dangling tags", 2009-06-25), fast-import gained the ability to delete refs (see commit 4ee1b225b99f, "fast-import: add support to delete refs", 2014-04-20), so now we do have a valid option to rewrite the tag to. Delete these tags instead of dying. Signed-off-by: Elijah Newren --- builtin/fast-export.c | 9 ++--- t/t9350-fast-export.sh | 16 2 files changed, 22 insertions(+), 3 deletions(-) diff --git a/builtin/fast-export.c b/builtin/fast-export.c index e2be35f41e..7d50f5414e 100644 --- a/builtin/fast-export.c +++ b/builtin/fast-export.c @@ -775,9 +775,12 @@ static void handle_tag(const char *name, struct tag *tag) break; if (!(p->object.flags & TREESAME)) break; - if (!p->parents) - die("can't find replacement commit for tag %s", -oid_to_hex(>object.oid)); + if (!p->parents) { + printf("reset %s\nfrom %s\n\n", + name, oid_to_hex(_oid)); + free(buf); + return; + } p = p->parents->item; } tagged_mark = get_object_mark(>object); diff --git a/t/t9350-fast-export.sh b/t/t9350-fast-export.sh index 6a392e87bc..3400ebeb51 100755 --- a/t/t9350-fast-export.sh +++ b/t/t9350-fast-export.sh @@ -325,6 +325,22 @@ test_expect_success 'rewriting tag of filtered out object' ' ) ' +test_expect_success 'rewrite tag predating pathspecs to nothing' ' + test_create_repo rewrite_tag_predating_pathspecs && + ( + cd rewrite_tag_predating_pathspecs && + + test_commit initial && + + git tag -a -m "Some old tag" v0.0.0.0.0.0.1 && + + test_commit bar && + + git fast-export --tag-of-filtered-object=rewrite --all -- bar.t >output && + grep from.$ZERO_OID output + ) +' + cat > limit-by-paths/expected << EOF blob mark :1 -- 2.19.1.1063.g1796373474.dirty
[PATCH v3 07/11] fast-export: when using paths, avoid corrupt stream with non-existent mark
If file paths are specified to fast-export and multiple refs point to a commit that does not touch any of the relevant file paths, then fast-export can hit problems. fast-export has a list of additional refs that it needs to explicitly set after exporting all blobs and commits, and when it tries to get_object_mark() on the relevant commit, it can get a mark of 0, i.e. "not found", because the commit in question did not touch the relevant paths and thus was not exported. Trying to import a stream with a mark corresponding to an unexported object will cause fast-import to crash. Avoid this problem by taking the commit the ref points to and finding an ancestor of it that was exported, and make the ref point to that commit instead. Signed-off-by: Elijah Newren --- builtin/fast-export.c | 13 - t/t9350-fast-export.sh | 20 2 files changed, 32 insertions(+), 1 deletion(-) diff --git a/builtin/fast-export.c b/builtin/fast-export.c index 43e98a38a8..227488ae84 100644 --- a/builtin/fast-export.c +++ b/builtin/fast-export.c @@ -901,7 +901,18 @@ static void handle_tags_and_duplicates(void) if (anonymize) name = anonymize_refname(name); /* create refs pointing to already seen commits */ - commit = (struct commit *)object; + commit = rewrite_commit((struct commit *)object); + if (!commit) { + /* +* Neither this object nor any of its +* ancestors touch any relevant paths, so +* it has been filtered to nothing. Delete +* it. +*/ + printf("reset %s\nfrom %s\n\n", + name, oid_to_hex(_oid)); + continue; + } printf("reset %s\nfrom :%d\n\n", name, get_object_mark(>object)); show_progress(); diff --git a/t/t9350-fast-export.sh b/t/t9350-fast-export.sh index 3400ebeb51..299120ba70 100755 --- a/t/t9350-fast-export.sh +++ b/t/t9350-fast-export.sh @@ -382,6 +382,26 @@ test_expect_success 'path limiting with import-marks does not lose unmodified fi grep file0 actual ' +test_expect_success 'avoid corrupt stream with non-existent mark' ' + test_create_repo avoid_non_existent_mark && + ( + cd avoid_non_existent_mark && + + test_commit important-path && + + test_commit ignored && + + git branch A && + git branch B && + + echo foo >>important-path.t && + git add important-path.t && + test_commit more changes && + + git fast-export --all -- important-path.t | git fast-import --force + ) +' + test_expect_success 'full-tree re-shows unmodified files'' git checkout -f simple && git fast-export --full-tree simple >actual && -- 2.19.1.1063.g1796373474.dirty