I am Mr. Hussein Harmuh from Syria. I got my information while I was searching a trusted person, I have a very profitable business offer for you and I can assure you that you will not regret being par
Col. Hussein Kharmusch
Re: [RFC PATCH 07/13] test-reach
On 6/29/2018 5:54 PM, Stefan Beller wrote: On Fri, Jun 29, 2018 at 9:13 AM Derrick Stolee wrote: +# Construct a grid-like commit graph with points (x,y) +# with 1 <= x <= 10, 1 <= y <= 10, where (x,y) has +# parents (x-1, y) and (x, y-1), keeping in mind that +# we drop a parent if a coordinate is nonpositive. +# +# (10,10) +#/ \ +# (10,9)(9,10) +#/ \ / \ +#(10,8)(9,9) (8,10) +# / \/ \ /\ +# ( continued...) +# \ /\ / \/ +#(3,1) (2,2) (1,3) +#\ /\ / +# (2,1) (2,1) +# \/ +# (1,1) +# +# We use branch 'comit-x-y' to refer to (x,y). +# This grid allows interesting reachability and +# non-reachability queries: (x,y) can reach (x',y') +# if and only if x' <= x and y' <= y. This is an interesting DAG, though not very common in reality (81 merges, 18 single parent commits, one root with a depth of at most 10). It reminds me of FELINE as that also has 2 independent numbers :) The design of this graph is exactly so you (the test writer) can know if two commits can reach exactly by whether the two-dimensional keys are comparable. It's wide enough that we can come up with interesting test cases for some of these walks, especially the have/wants negotiation. I guess it is easy to make up artificial test cases for that, but I wonder if we want more variety for a generic reachability test tool (e.g. long chains of parallel histories, octopus, disconnected parts of the graph) +test_expect_success 'setup' ' [...] + git commit-graph write --reachable Is this only to test the full commit-graph?, maybe we'd want to write out the commit graph at (5,10) or so, such that half the walking is tested without graph. What about author/commit dates? There are a lot of ways we can get strange edge cases for commit-graph walks, especially when we start being clever. This test case is only a start to feeling like we have good coverage, but having a non-trivial walk include both outputs (yes/no) for each method is a good start. I think after we get a basic coverage for these methods on this example, we can add examples as necessary when we find tricky code paths that need coverage. Thanks, -Stolee
Re: [RFC PATCH 06/13] commit-reach: move can_all_from_reach_with_flag()
On 6/29/2018 5:47 PM, Stefan Beller wrote: Hi Derrick, +static int reachable(struct commit *from, int with_flag, int assign_flag, time_t min_commit_date) [...] + if (commit->date < min_commit_date) + continue; [...] +int can_all_from_reach_with_flag(struct object_array from, +int with_flag, int assign_flag, +time_t min_commit_date) Can you expand on why we introduce the min_commit_date in the commit message? (It looks like a rebase error as I would have expected a move only, given the subject) This is a case where this should have been two steps: 1. Remove dependence on the global 'oldest_have' in upload-pack.c by creating the min_commit_date parameter. 2. Move the code from upload-pack.c to commit-reach.c. Thanks, -Stolee
Re: [RFC PATCH 03/13] commit-reach: move commit_contains from ref-filter
On 6/29/2018 5:38 PM, Stefan Beller wrote: Hi Derrick, +struct ref_filter_cbdata { + struct ref_array *array; + struct ref_filter *filter; + struct contains_cache contains_cache; + struct contains_cache no_contains_cache; These members also seem to be moved whitespace-inconsistently. Could you clarify in the commit message that you re-indent them or what happened? Sorry, there appears to be something strange that happens when I paste into vim that causes my whitespace to mess up, especially leading whitespace. Usually `git rebase --whitespace=fix` fixes it. I'll be more careful in my next replay of the series.
Re: [PATCH] xdiff: reduce indent heuristic overhead
Excerpts from Stefan Beller's message of 2018-06-29 16:37:41 -0700: > [...] > Jun, Junio > > By changing the authorship we'd want to have a sign off from the original > author, > before applying; in the previous attempt, I was merely taking the code from > mercurial as their copy of xdiff is also LGPLv2 so we are free to use that. I'm fine with signing off the patch. I didn't send this one here mainly because indent heuristic was default off at the time I made the changes, and I wasn't sure about how to test this properly according to the git community standard. If this change is fine without additional tests, I can send it myself, too. > Thanks, > Stefan > > [...]
[PATCH] xdiff: reduce indent heuristic overhead
From: Jun Wu This patch was written originally for mercurial at [1], adding a limit on how long we'd be looking for an optimal indent heuristic. Choose the limit high enough to only limit edge cases. Adds some threshold to avoid expensive cases, like: ``` #!python open('a', 'w').write(" \n" * 100) open('b', 'w').write(" \n" * 101) ``` The indent heuristic is O(N * 20) (N = 100) for the above case, and makes diff much slower. Before this patch (system git 2.14.2): ``` git diff --no-indent-heuristic a b 0.21s user 0.03s system 100% cpu 0.239 total git diff --indent-heuristic a b 0.77s user 0.02s system 99% cpu 0.785 total ``` After this patch (git 2fc74f41, with xdiffi.c patched): ``` # with the changed xdiffi.c git diff --indent-heuristic a b 0.16s user 0.01s system 90% cpu 0.188 total git diff --no-indent-heuristic a b 0.18s user 0.01s system 99% cpu 0.192 total ``` Now turning on indent-heuristic has no visible impact on performance. Differential Revision: https://phab.mercurial-scm.org/D2601 [1] https://phab.mercurial-scm.org/rHGc420792217c89622482005c99e959b9071c109c5 Signed-off-by: Stefan Beller --- Jun, Junio By changing the authorship we'd want to have a sign off from the original author, before applying; in the previous attempt, I was merely taking the code from mercurial as their copy of xdiff is also LGPLv2 so we are free to use that. Thanks, Stefan xdiff/xdiffi.c | 38 +++--- 1 file changed, 35 insertions(+), 3 deletions(-) diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c index 0de1ef463bf..c74ec77da58 100644 --- a/xdiff/xdiffi.c +++ b/xdiff/xdiffi.c @@ -807,6 +807,14 @@ static void xdl_bug(const char *msg) exit(1); } +/* + * For indentation heuristic, skip searching for better slide position after + * checking MAX_BORING lines without finding an improvement. This defends the + * indentation heuristic logic against pathological cases. The value is not + * picked scientifically but should be good enough. + */ +#define MAX_BORING 100 + /* * Move back and forward change groups for a consistent and pretty diff output. * This also helps in finding joinable change groups and reducing the diff @@ -903,19 +911,43 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) { long shift, best_shift = -1; struct split_score best_score; - for (shift = earliest_end; shift <= g.end; shift++) { + /* +* This is O(N * MAX_BLANKS) (N = shift-able lines). +* Even with MAX_BLANKS bounded to a small value, a +* large N could still make this loop take several +* times longer than the main diff algorithm. The +* "boring" value is to help cut down N to something +* like (MAX_BORING + groupsize). +* +* Scan from bottom to top. So we can exit the loop +* without compromising the assumption "for a same best +* score, pick the bottommost shift". +*/ + int boring = 0; + for (shift = g.end; shift >= earliest_end; shift--) { struct split_measurement m; struct split_score score = {0, 0}; + int cmp; measure_split(xdf, shift, &m); score_add_split(&m, &score); measure_split(xdf, shift - groupsize, &m); score_add_split(&m, &score); - if (best_shift == -1 || - score_cmp(&score, &best_score) <= 0) { + + if (best_shift == -1) { + cmp = -1; + } else { + cmp = score_cmp(&score, &best_score); + } + if (cmp < 0) { + boring = 0; best_score.effective_indent = score.effective_indent; best_score.penalty = score.penalty; best_shift = shift; + } else { + boring += 1; + if (boring >= MAX_BORING) + break; } } -- 2.18.0.399.gad0ab374a1-goog
Re: fast-import slowness when importing large files with small differences
On Sat, Jun 30, 2018 at 12:10:24AM +0200, Ævar Arnfjörð Bjarmason wrote: > > On Fri, Jun 29 2018, Mike Hommey wrote: > > > I noticed some slowness when fast-importing data from the Firefox mercurial > > repository, where fast-import spends more than 5 minutes importing ~2000 > > revisions of one particular file. I reduced a testcase while still > > using real data. One could synthesize data with kind of the same > > properties, but I figured real data could be useful. > > > > To reproduce: > > $ git clone https://gist.github.com/b6b8edcff2005cc482cf84972adfbba9.git foo > > $ git init bar > > $ cd bar > > $ python ../foo/import.py ../foo/data.gz | git fast-import --depth=2000 > > > > [...] > > So maybe it would make sense to consolidate the diff code (after all, > > diff-delta.c is an old specialized fork of xdiff). With manual trimming > > of common head and tail, this gets down to 3:33. > > > > I'll also note that Facebook has imported xdiff from the git code base > > into mercurial and improved performance on it, so it might also be worth > > looking at what's worth taking from there. > > It would be interesting to see how does this compares with a more naïve > approach of committing every version of this file one-at-a-time into a > new repository (with & without gc.auto=0). Perhaps deltaing as we go is > suboptimal compared to just writing out a lot of redundant data and > repacking it all at once later. "Just" writing 26GB? And that's only one file. If I were to do that for the whole repository, it would yield a > 100GB pack. Instead of < 2GB currently. Mike
Re: [RFC PATCH 13/13] commit-reach: use can_all_from_reach
Hi Derrick, On Fri, Jun 29, 2018 at 9:13 AM Derrick Stolee wrote: > > The is_descendant_of method previously used in_merge_bases() to check if > the commit can reach any of the commits in the provided list. This had > two performance problems: > > 1. The performance is quadratic in worst-case. > > 2. A single in_merge_bases() call requires walking beyond the target >commit in order to find the full set of boundary commits that may be >merge-bases. > > The can_all_from_reach method avoids this quadratic behavior and can > limit the search beyond the target commits using generation numbers. It > requires a small prototype adjustment to stop using commit-date as a > cutoff, as that optimization is no longer appropriate here. > > Performance was meausured on a copy of the Linux repository using the > 'test-tool reach is_descendant_of' command using this input: > > A:v4.9 > X:v4.10 > X:v4.11 > X:v4.12 > X:v4.13 > X:v4.14 > X:v4.15 > X:v4.16 > X:v4.17 > X.v3.0 > > Note that this input is tailored to demonstrate the quadratic nature of > the previous method, as it will compute merge-bases for v4.9 versus all > of the later versions before checking against v4.1. > > Before: 0.31 s > After: 0.27 s > > Since we previously used the is_descendant_of method in the ref_newer > method, we also measured performance there using > 'test-tool reach ref_newer': > > Before: 0.12 s > After: 0.11 s > > Signed-off-by: Derrick Stolee > --- > > One thing I know is missing from this commit is a special-case to use > the old logic when there is no commit-graph present. The > can_all_from_reach() algorithm can be worse when we do not have good > generation number cutoffs. In the previous case of > can_all_from_reach_with_flags(), we already had an established pattern > of using commit date as a cutoff, so the generation number is only a > second cutoff and the algorithm cannot walk more commits than before. I like this series, Thanks for writing it! Stefan
Re: [RFC PATCH 11/13] commit-reach: make can_all_from_reach... linear
On Fri, Jun 29, 2018 at 9:13 AM Derrick Stolee wrote: > > The can_all_from_reach_with_flags() algorithm is currently quadratic in > the worst case, because it calls the reachable() method for every 'from' > without tracking which commits have already been walked or which can > already reach a commit in 'to'. > > Rewrite the algorithm to walk each commit a constant number of times. > > We also add some optimizations that should work for the main consumer of > this method: fetch negotitation (haves/wants). > > The first step includes using a depth-first-search (DFS) from each from > commit, sorted by ascending generation number. We do not walk beyond the > minimum generation number or the minimum commit date. This DFS is likely > to be faster than the existing reachable() method because we expect > previous ref values to be along the first-parent history. > > If we find a target commit, then we mark everything in the DFS stack as > a RESULT. This expands the set of targets for the other from commits. We > also mark the visited commits using 'assign_flag' to prevent re-walking > the same code. > > We still need to clear our flags at the end, which is why we will have a > total of three visits to each commit. > > Performance was measured on the Linux repository using > 'test-tool reach can_all_from_reach'. The input included rows seeded by > tag values. The "small" case included X-rows as v4.[0-9]* and Y-rows as > v3.[0-9]*. This mimics a (very large) fetch that says "I have all major > v3 releases and want all major v4 releases." The "large" case included > X-rows as "v4.*" and Y-rows as "v3.*". This adds all release-candidate > tags to the set, which does not greatly increase the number of objects > that are considered, but does increase the number of 'from' commits, > demonstrating the quadratic nature of the previous code. > > Small Case > -- > > Before: 1.45 s > After: 0.34 s > > Large Case > -- > > Before: 5.83 s > After: 0.37 s > > Note how the time increases between the two cases in the two versions. > The new code increases relative to the number of commits that need to be > walked, but not directly relative to the number of 'from' commits. > > Signed-off-by: Derrick Stolee > --- > commit-reach.c | 122 ++--- > commit-reach.h | 3 +- > upload-pack.c | 5 +- > 3 files changed, 82 insertions(+), 48 deletions(-) > > diff --git a/commit-reach.c b/commit-reach.c > index 992ad5cdc7..8e24455d9f 100644 > --- a/commit-reach.c > +++ b/commit-reach.c > @@ -530,64 +530,88 @@ int commit_contains(struct ref_filter *filter, struct > commit *commit, > return is_descendant_of(commit, list); > } > > -static int reachable(struct commit *from, int with_flag, int assign_flag, > time_t min_commit_date) > +static int compare_commits_by_gen(const void *_a, const void *_b) > { > - struct prio_queue work = { compare_commits_by_commit_date }; > + const struct commit *a = (const struct commit *)_a; > + const struct commit *b = (const struct commit *)_b; > > - prio_queue_put(&work, from); > - while (work.nr) { > - struct commit_list *list; > - struct commit *commit = prio_queue_get(&work); > - > - if (commit->object.flags & with_flag) { > - from->object.flags |= assign_flag; > - break; > - } > - if (!commit->object.parsed) > - parse_object(&commit->object.oid); > - if (commit->object.flags & TMP_MARK) > - continue; > - commit->object.flags |= TMP_MARK; > - if (commit->date < min_commit_date) > - continue; > - for (list = commit->parents; list; list = list->next) { > - struct commit *parent = list->item; > - if (!(parent->object.flags & TMP_MARK)) > - prio_queue_put(&work, parent); > - } > - } > - from->object.flags |= TMP_MARK; > - clear_commit_marks(from, TMP_MARK); > - clear_prio_queue(&work); > - return (from->object.flags & assign_flag); > + if (a->generation < b->generation) > + return -1; > + if (a->generation > b->generation) > + return 1; > + return 0; > } > > int can_all_from_reach_with_flag(struct object_array *from, > int with_flag, int assign_flag, > -time_t min_commit_date) > +time_t min_commit_date, > +uint32_t min_generation) > { > + struct commit **list = NULL; > int i; > + int result = 1; > > + ALLOC_ARRAY(list, from->nr); > for (i = 0; i < from->nr; i++) { > - struct object *from_one = from->objects[i].item; > + list[i]
Re: [PATCH] fetch: when deepening, check connectivity fully
> > That is the way it should work, but after thinking about it once more, I > > realize that it isn't. > > > > opt->shallow_file is not set to anything. And fetch-pack updates the > > shallow file by itself (at least, that is my understanding of > > update_shallow() in fetch-pack.c) before fetch calls check_connected(), > > meaning that if check_connected() fails, there is still no rollback of > > the shallow file. > > Ouch. We need to fix that; otherwise, a broken server will keep > giving you a corrupt repository even with this fix, no? Yes, that is true - the repository will be left in a corrupt state. I did some more investigation, and usually things are OK because unpack-trees runs a fsck_walk on all the objects it unpacks (with --shallow-file appropriately set). Things are bad only if the packfile is empty (or, presumably, if the packfile only has unrelated objects). I managed to use the one-time-sed mechanism to craft a response that triggers this case. This both enables me to avoid the brittle check of "rev-list" appearing in the GIT_TRACE output (as discussed in [1]), and to verify that a rollback of the shallow file works, once such a patch is written. I'll look into writing such a patch, so feel free to hold off on this until that patch is done. [1] https://public-inbox.org/git/20180627225105.155996-1-jonathanta...@google.com/
Re: [PATCH v3 00/32] object-store: lookup_commit
On 06/29, Junio C Hamano wrote: > Stefan Beller writes: > > > On Fri, Jun 29, 2018 at 11:03 AM Junio C Hamano wrote: > >> > >> Junio C Hamano writes: > >> > >> > One technique these (not just this) recent efforts seem to be > >> > forgetting is to introduce "new" names that take a_repo and then > >> > make the existing one a thin wrapper that calls the new one with > >> > &the_repo as the argument. > > > > So you'd rather want to see it less invasive done similar to > > NO_THE_INDEX_COMPATIBILITY_MACROS ? Someone (jrnieder?) > > called that a failed experiment, as now we need to carry that baggage > > for quite some time and never cleaned up the started migration; > > only recently Duy started to kill off the_index, which would finish > > that migration? > > I do not think it was a failed experiment at all. In fact, most of > our code is still happy with the single in-core instance of > the_index, and I think it is a mistake to trying to use the variant > that take an istate instance as parameter just for the sake of it. > > IOW, if there is a good concrete reason why it helps to teach the > set of functions involved in a callchain to operate on an in-core > instance of the index_state, passing an istate through the callchain > is a very welcome change. If you do not do so, and then just > replace $foo_cache(...) with $foo_index(&the_index, ...), that is an > irritating and useless code churn that harms other topics in flight. I 100% think that we need to continue these refactorings with both the object store as well as with the_index (removing the index macros and removing the dependency on global state). The whole compat macros most definitely was a failed experiment as we still haven't rid the code base of them yet. Having two APIs which can be used interchangeably from different pieces of library code which have very different semantics is _very_ difficult for contributors to reason about and work around. Especially when one API relies on implied global state while the other does not. (If global state isn't involved, as in the char[20] -> OID work, then having two APIs is 'ok' so long as you're working towards converging to one API). Having clean APIs makes it incredible easy to reason about sections of code, reason about multi-threading, and implement newer features in a sane manner (talking about submodules here). I implemented submodule support for grep twice, once using a multi-process paradigm because we could not have two repositories open in a single process and then an in-process one once a minimum viable solution for opening multiple repositories in a single process was created. Being able to open up a submodule repository in-process made things infinitely simpler to reason about as well as made it much easier to handle errors. This work needs to continue if we want to improve the submodule experience in git. Yes this might mean there are a few more conflicts when merging a series (because of the scope of these refactorings) but it is well worth it given what the end-state will look like. -- Brandon Williams
Re: fast-import slowness when importing large files with small differences
On Fri, Jun 29 2018, Mike Hommey wrote: > I noticed some slowness when fast-importing data from the Firefox mercurial > repository, where fast-import spends more than 5 minutes importing ~2000 > revisions of one particular file. I reduced a testcase while still > using real data. One could synthesize data with kind of the same > properties, but I figured real data could be useful. > > To reproduce: > $ git clone https://gist.github.com/b6b8edcff2005cc482cf84972adfbba9.git foo > $ git init bar > $ cd bar > $ python ../foo/import.py ../foo/data.gz | git fast-import --depth=2000 > > [...] > So maybe it would make sense to consolidate the diff code (after all, > diff-delta.c is an old specialized fork of xdiff). With manual trimming > of common head and tail, this gets down to 3:33. > > I'll also note that Facebook has imported xdiff from the git code base > into mercurial and improved performance on it, so it might also be worth > looking at what's worth taking from there. It would be interesting to see how does this compares with a more naïve approach of committing every version of this file one-at-a-time into a new repository (with & without gc.auto=0). Perhaps deltaing as we go is suboptimal compared to just writing out a lot of redundant data and repacking it all at once later.
Re: [RFC PATCH 08/13] test-reach: test reduce_heads()
On Fri, Jun 29, 2018 at 9:13 AM Derrick Stolee wrote: > > Signed-off-by: Derrick Stolee > --- > t/t6600-test-reach.sh | 26 ++ > 1 file changed, 26 insertions(+) > > diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh > index c9337b6b46..0f60db9c60 100755 > --- a/t/t6600-test-reach.sh > +++ b/t/t6600-test-reach.sh > @@ -78,4 +78,30 @@ test_expect_success 'ref_newer:miss' ' > test_reach_two_modes "ref_newer" > ' > > +test_expect_success 'reduce_heads' ' > + cat >input <<- EOF && > + X:commit-1-10 > + X:commit-2-8 > + X:commit-3-6 > + X:commit-4-4 > + X:commit-1-7 > + X:commit-2-5 > + X:commit-3-3 > + X:commit-5-1 > + Y:commit-10-10 > + Y:commit-3-10 > + Y:commit-9-9 > + Y:commit-8-1 > + EOF > + printf "reduce_heads(X):\n" >expect && > + git rev-parse commit-5-1 >>expect && See 6ac767e5c00 (t6036, t6042: prefer test_cmp to sequences of test, 2018-05-24) on how to avoid some forks here: git rev-parse >expect \ many HEADs or Tips &&
Re: [RFC PATCH 04/13] upload-pack: make reachable() more generic
Derrick Stolee writes: > In anticipation of moving the reachable() method to commit-reach.c, > modify the prototype to be more generic to flags known outside of > upload-pack.c. Also rename 'want' to 'from' to make the statement > more clear outside of the context of haves/wants negotiation. FWIW, I find the order of things done quite sensible. Rename, extend, etc., to prepare before moving and then move, not the other way around. You would want to do the same for some symbols named overly broadly you moved in earlier steps, too. Also, if you will be making this function a global one in a later step, now is the time to give it a good name suitable in the new global context before moving in this preparatory step. If it will be file-scope static in its new home, then reachable() may still be a good enough name. Let's keep reading...
Re: [RFC PATCH 03/13] commit-reach: move commit_contains from ref-filter
Derrick Stolee writes: > +int commit_contains(struct ref_filter *filter, struct commit *commit, > + struct commit_list *list, struct contains_cache *cache) This is a symbol that is used to be file-local private. Is it named appropriately in the new context, which is "globally visible throughout the system"? The convention to call into it now must be documented a lot better (e.g. how should list/cache etc are to be prepared?). > +{ > + if (filter->with_commit_tag_algo) > + return contains_tag_algo(commit, list, cache) == CONTAINS_YES; > + return is_descendant_of(commit, list); > +} > diff --git a/commit-reach.h b/commit-reach.h > index 35ec9f0ddb..986fb388d5 100644 > --- a/commit-reach.h > +++ b/commit-reach.h > @@ -2,42 +2,24 @@ > #define __COMMIT_REACH_H__ > > #include "commit.h" > +#include "commit-slab.h" > +#include "ref-filter.h" > > -struct commit_list *get_merge_bases_many(struct commit *one, > - int n, > - struct commit **twos); > -struct commit_list *get_merge_bases_many_dirty(struct commit *one, > -int n, > -struct commit **twos); > -struct commit_list *get_merge_bases(struct commit *one, struct commit *two); > -struct commit_list *get_octopus_merge_bases(struct commit_list *in); > - > -/* To be used only when object flags after this call no longer matter */ > -struct commit_list *get_merge_bases_many_dirty(struct commit *one, int n, > struct commit **twos); > - > -int is_descendant_of(struct commit *commit, struct commit_list *with_commit); > -int in_merge_bases_many(struct commit *commit, int nr_reference, struct > commit **reference); > -int in_merge_bases(struct commit *commit, struct commit *reference); > - > +int ref_newer(const struct object_id *new_oid, const struct object_id > *old_oid); > > /* > - * Takes a list of commits and returns a new list where those > - * have been removed that can be reached from other commits in > - * the list. It is useful for, e.g., reducing the commits > - * randomly thrown at the git-merge command and removing > - * redundant commits that the user shouldn't have given to it. > - * > - * This function destroys the STALE bit of the commit objects' > - * flags. The above removal of lines is sloppy; they are mostly duplicates in commit.h that should never have been moved here in the first place, no? > + * Unknown has to be "0" here, because that's the default value for > + * contains_cache slab entries that have not yet been assigned. > */ > -struct commit_list *reduce_heads(struct commit_list *heads); > +enum contains_result { > + CONTAINS_UNKNOWN = 0, > + CONTAINS_NO, > + CONTAINS_YES > +}; Are these names specific enough, or were they OK in the limited context inside ref-filter but now are overly broad as globally visible names? I suspect it might be the latter.
Re: [RFC PATCH 07/13] test-reach
On Fri, Jun 29, 2018 at 9:13 AM Derrick Stolee wrote: > +# Construct a grid-like commit graph with points (x,y) > +# with 1 <= x <= 10, 1 <= y <= 10, where (x,y) has > +# parents (x-1, y) and (x, y-1), keeping in mind that > +# we drop a parent if a coordinate is nonpositive. > +# > +# (10,10) > +#/ \ > +# (10,9)(9,10) > +#/ \ / \ > +#(10,8)(9,9) (8,10) > +# / \/ \ /\ > +# ( continued...) > +# \ /\ / \/ > +#(3,1) (2,2) (1,3) > +#\ /\ / > +# (2,1) (2,1) > +# \/ > +# (1,1) > +# > +# We use branch 'comit-x-y' to refer to (x,y). > +# This grid allows interesting reachability and > +# non-reachability queries: (x,y) can reach (x',y') > +# if and only if x' <= x and y' <= y. This is an interesting DAG, though not very common in reality (81 merges, 18 single parent commits, one root with a depth of at most 10). It reminds me of FELINE as that also has 2 independent numbers :) I guess it is easy to make up artificial test cases for that, but I wonder if we want more variety for a generic reachability test tool (e.g. long chains of parallel histories, octopus, disconnected parts of the graph) > +test_expect_success 'setup' ' [...] > + git commit-graph write --reachable Is this only to test the full commit-graph?, maybe we'd want to write out the commit graph at (5,10) or so, such that half the walking is tested without graph. What about author/commit dates?
Re: [RFC PATCH 01/13] commit-reach: move walk methods from commit.c
Derrick Stolee writes: > Signed-off-by: Derrick Stolee > --- > Makefile | 1 + > commit-reach.c | 359 + > commit-reach.h | 41 ++ > commit.c | 358 > 4 files changed, 401 insertions(+), 358 deletions(-) > create mode 100644 commit-reach.c "blame -bs -C HEAD^.." tells us that all lines except for the initial #include of this file came from commit.c verbatim, which is good for a "move lines" patch like this one. > create mode 100644 commit-reach.h "blame -bs -C -C HEAD^.." tells us, but "blame -bs -C HEAD^.." does not tell us, that most of the lines are from commit.h, which is a very bad sign. reduce_heads(), for example, have two duplicated decl and depending on which header file is included, the source files will risk getting inconsistent definition once the headers start to deviate from each other.
Re: [RFC PATCH 06/13] commit-reach: move can_all_from_reach_with_flag()
Hi Derrick, > +static int reachable(struct commit *from, int with_flag, int assign_flag, > time_t min_commit_date) [...] > + if (commit->date < min_commit_date) > + continue; [...] > +int can_all_from_reach_with_flag(struct object_array from, > +int with_flag, int assign_flag, > +time_t min_commit_date) Can you expand on why we introduce the min_commit_date in the commit message? (It looks like a rebase error as I would have expected a move only, given the subject) Thanks, Stefan
Re: [RFC PATCH 05/13] upload-pack: refactor ok_to_give_up()
Hi Derrick, > -static int ok_to_give_up(void) > +static int can_all_from_reach_with_flag(struct object_array from, This method is hard to read; at first I thought you missed a word, but then I realized that it asks "can all 'from' members reach ['something'] and we pass in the 'flag', so maybe all_reachable_flags or something? > /* no way to tell if this is reachable by While at it, you may want to fix the comment here.
Re: [RFC PATCH 03/13] commit-reach: move commit_contains from ref-filter
Hi Derrick, > +struct ref_filter_cbdata { > + struct ref_array *array; > + struct ref_filter *filter; > + struct contains_cache contains_cache; > + struct contains_cache no_contains_cache; These members also seem to be moved whitespace-inconsistently. Could you clarify in the commit message that you re-indent them or what happened?
Re: [RFC PATCH 01/13] commit-reach: move walk methods from commit.c
Hi Derrick, > +/* Remember to update object flag allocation in object.h */ > +#define PARENT1 (1u<<16) > +#define PARENT2 (1u<<17) > +#define STALE (1u<<18) > +#define RESULT (1u<<19) Something is up with whitespaces here, apart from that this patch looks good.
Re: [PATCH] xdiff: reduce indent heuristic overhead
Stefan Beller writes: > This patch was written originally for mercurial at > https://phab.mercurial-scm.org/rHGc420792217c89622482005c99e959b9071c109c5 > > changeset: 36674:c420792217c8 > user:Jun Wu > date:Sat Mar 03 12:39:11 2018 -0800 > files: mercurial/thirdparty/xdiff/xdiffi.c > description: > xdiff: reduce indent heuristic overhead ... This should come with in-body header to credit the original author as the author, I think.
Re: What's cooking in git.git (Jun 2018, #07; Thu, 28)
On Thu, Jun 28 2018, Junio C Hamano wrote: > The tip of 'next' has been rewound and it currently has only 4 > topics. Quite a many topics are cooking in 'pu' and need to be > sifted into good bins (for 'next') and the remainder. Help is > appreciated in that area ;-) Per my https://public-inbox.org/git/cacbzzx4yg5h5kk4nfqz_nzawema+nh3h-39ohtch4xwsa6f...@mail.gmail.com/ (seems to not have been seen) I'd like to suggest that: [...] > * ab/fetch-tags-noclobber (2018-05-16) 9 commits This be ejected for now. [...] > * ab/checkout-default-remote (2018-06-11) 8 commits And this be merged down to "next".
Re: fast-import slowness when importing large files with small differences
+cc Jun Wu, original author of these patches On Fri, Jun 29, 2018 at 1:39 PM Jeff King wrote: > > Interesting pieces regarding performance: > > > > c420792217c8 xdiff: reduce indent heuristic overhead > > https://phab.mercurial-scm.org/rHGc420792217c89622482005c99e959b9071c109c5 Going by the mailing list, the first patch was not brought over yet, so sending it here was warranted. Jun, you may want to take ownership of https://public-inbox.org/git/20180629202811.131265-1-sbel...@google.com/ as I merely resend it to the git mailing list? If not, that is fine, too. > > > > f33a87cf60cc xdiff: add a preprocessing step that trims files > > https://phab.mercurial-scm.org/rHGf33a87cf60ccb8b46e06b85e60bc5031420707d6 > > > > I'll see if I can make that into patches. > > Apparently the second one is not so trivial as you might hope; see > https://public-inbox.org/git/1520337165-sup-4504@x1c/. Thanks so much, this saves me further effort to dig there. So I'll just stop porting this. Thanks, Stefan
Re: fast-import slowness when importing large files with small differences
On Fri, Jun 29, 2018 at 01:14:52PM -0700, Stefan Beller wrote: > Interesting pieces regarding performance: > > c420792217c8 xdiff: reduce indent heuristic overhead > https://phab.mercurial-scm.org/rHGc420792217c89622482005c99e959b9071c109c5 > > f33a87cf60cc xdiff: add a preprocessing step that trims files > https://phab.mercurial-scm.org/rHGf33a87cf60ccb8b46e06b85e60bc5031420707d6 > > I'll see if I can make that into patches. Apparently the second one is not so trivial as you might hope; see https://public-inbox.org/git/1520337165-sup-4504@x1c/. -Peff
[PATCH] xdiff: reduce indent heuristic overhead
This patch was written originally for mercurial at https://phab.mercurial-scm.org/rHGc420792217c89622482005c99e959b9071c109c5 changeset: 36674:c420792217c8 user:Jun Wu date:Sat Mar 03 12:39:11 2018 -0800 files: mercurial/thirdparty/xdiff/xdiffi.c description: xdiff: reduce indent heuristic overhead Adds some threshold to avoid expensive cases, like: ``` #!python open('a', 'w').write(" \n" * 100) open('b', 'w').write(" \n" * 101) ``` The indent heuristic is O(N * 20) (N = 100) for the above case, and makes diff much slower. Before this patch (system git 2.14.2): ``` git diff --no-indent-heuristic a b 0.21s user 0.03s system 100% cpu 0.239 total git diff --indent-heuristic a b 0.77s user 0.02s system 99% cpu 0.785 total ``` After this patch (git 2fc74f41, with xdiffi.c patched): ``` # with the changed xdiffi.c git diff --indent-heuristic a b 0.16s user 0.01s system 90% cpu 0.188 total git diff --no-indent-heuristic a b 0.18s user 0.01s system 99% cpu 0.192 total ``` Now turning on indent-heuristic has no visible impact on performance. Differential Revision: https://phab.mercurial-scm.org/D2601 Signed-off-by: Stefan Beller --- This applies on our master branch, I have not thought of a good commit message or if we need to test it. Thanks, Stefan xdiff/xdiffi.c | 38 +++--- 1 file changed, 35 insertions(+), 3 deletions(-) diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c index 0de1ef463bf..c74ec77da58 100644 --- a/xdiff/xdiffi.c +++ b/xdiff/xdiffi.c @@ -807,6 +807,14 @@ static void xdl_bug(const char *msg) exit(1); } +/* + * For indentation heuristic, skip searching for better slide position after + * checking MAX_BORING lines without finding an improvement. This defends the + * indentation heuristic logic against pathological cases. The value is not + * picked scientifically but should be good enough. + */ +#define MAX_BORING 100 + /* * Move back and forward change groups for a consistent and pretty diff output. * This also helps in finding joinable change groups and reducing the diff @@ -903,19 +911,43 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) { long shift, best_shift = -1; struct split_score best_score; - for (shift = earliest_end; shift <= g.end; shift++) { + /* +* This is O(N * MAX_BLANKS) (N = shift-able lines). +* Even with MAX_BLANKS bounded to a small value, a +* large N could still make this loop take several +* times longer than the main diff algorithm. The +* "boring" value is to help cut down N to something +* like (MAX_BORING + groupsize). +* +* Scan from bottom to top. So we can exit the loop +* without compromising the assumption "for a same best +* score, pick the bottommost shift". +*/ + int boring = 0; + for (shift = g.end; shift >= earliest_end; shift--) { struct split_measurement m; struct split_score score = {0, 0}; + int cmp; measure_split(xdf, shift, &m); score_add_split(&m, &score); measure_split(xdf, shift - groupsize, &m); score_add_split(&m, &score); - if (best_shift == -1 || - score_cmp(&score, &best_score) <= 0) { + + if (best_shift == -1) { + cmp = -1; + } else { + cmp = score_cmp(&score, &best_score); + } + if (cmp < 0) { + boring = 0; best_score.effective_indent = score.effective_indent; best_score.penalty = score.penalty; best_shift = shift; + } else { + boring += 1; + if (boring >= MAX_BORING) + break; } } -- 2.18.0.399.gad0ab374a1-goog
Re: fast-import slowness when importing large files with small differences
On Fri, Jun 29, 2018 at 3:18 AM Mike Hommey wrote: > > Hi, > > I noticed some slowness when fast-importing data from the Firefox mercurial > repository, where fast-import spends more than 5 minutes importing ~2000 > revisions of one particular file. I reduced a testcase while still > using real data. One could synthesize data with kind of the same > properties, but I figured real data could be useful. I cc'd Jameson, who refactored memory allocation in fast-import recently. (I am not aware of other refactorings in the area of fast-import) > To reproduce: [...] > Memory total: 2282 KiB >pools: 2048 KiB > objects: 234 KiB > [...] > Obviously, sha1'ing 26GB is not going to be free, but it's also not the > dominating cost, according to perf: > > 63.52% git-fast-import git-fast-import [.] create_delta_index So this doesn't sound like a memory issue, but a diffing/deltaing issue. > So maybe it would make sense to consolidate the diff code (after all, > diff-delta.c is an old specialized fork of xdiff). With manual trimming > of common head and tail, this gets down to 3:33. This sounds interesting. I'd love to see that code to be unified. > I'll also note that Facebook has imported xdiff from the git code base > into mercurial and improved performance on it, so it might also be worth > looking at what's worth taking from there. So starting with https://www.mercurial-scm.org/repo/hg/rev/34e2ff1f9cd8 ("xdiff: vendor xdiff library from git") they adapted it slightly: $ hg log --template '{node|short} {desc|firstline}\n' -- mercurial/thirdparty/xdiff/ a2baa61bbb14 xdiff: move stdint.h to xdiff.h d40b9e29c114 xdiff: fix a hard crash on Windows 651c80720eed xdiff: silence a 32-bit shift warning on Windows d255744de97a xdiff: backport int64_t and uint64_t types to Windows e5b14f5b8b94 xdiff: resolve signed unsigned comparison warning f1ef0e53e628 xdiff: use int64 for hash table size f0d9811dda8e xdiff: remove unused xpp and xecfg parameters 49fe6249937a xdiff: remove unused flags parameter 882657a9f768 xdiff: replace {unsigned ,}long with {u,}int64_t 0c7350656f93 xdiff: add comments for fields in xdfile_t f33a87cf60cc xdiff: add a preprocessing step that trims files 3cf40112efb7 xdiff: remove xmerge related logic 90f8fe72446c xdiff: remove xemit related logic b5bb0f99064d xdiff: remove unused structure, functions, and constants 09f320067591 xdiff: remove whitespace related feature 1f9bbd1d6b8a xdiff: fix builds on Windows c420792217c8 xdiff: reduce indent heuristic overhead b3c9c483cac9 xdiff: add a bdiff hunk mode 9e7b14caf67f xdiff: remove patience and histogram diff algorithms 34e2ff1f9cd8 xdiff: vendor xdiff library from git Interesting pieces regarding performance: c420792217c8 xdiff: reduce indent heuristic overhead https://phab.mercurial-scm.org/rHGc420792217c89622482005c99e959b9071c109c5 f33a87cf60cc xdiff: add a preprocessing step that trims files https://phab.mercurial-scm.org/rHGf33a87cf60ccb8b46e06b85e60bc5031420707d6 I'll see if I can make that into patches. Thanks, Stefan
Re: What's cooking in git.git (Jun 2018, #07; Thu, 28)
Stefan Beller writes: >> * jt/remove-pack-bitmap-global (2018-06-21) 2 commits >> - pack-bitmap: add free function >> - pack-bitmap: remove bitmap_git global variable >> >> The effort to move globals to per-repository in-core structure >> continues. > > This is mostly done, though Peff seems to expect a reroll with > clarification on how the series is structured? > https://public-inbox.org/git/20180611211033.gb26...@sigill.intra.peff.net/ That one has been resolved by squashing the updated log message in, I think, so we should be able to merge it down. >> * sb/submodule-move-head-error-msg (2018-06-25) 1 commit >> - submodule.c: report the submodule that an error occurs in >> >> Needs a reroll. >> cf. <20180622081713.5360-1-szeder@gmail.com> > > https://public-inbox.org/git/xmqqmuviq2n7@gitster-ct.c.googlers.com/ > > suggests that you applied that change and a reroll would not be needed. Yup, I forgot about that one. Thanks. > It is easy to quantify how often we are bitten by code churn > (that you call useless here); and very hard to quantify bugs By definition, churn is useless. Useful ones are refactoring ;-). And when you do want to operate on _the_ single in-core instance, not having to say &the_index in the argument and use $foo_cache() function does *not* become a source of "bugs". It just saves typing, and turning it to $foo_index(&the_index,...) does *not* make it less error prone. >> * sb/diff-color-move-more (2018-06-25) 11 commits >> - diff: fix a sparse 'dubious one-bit signed bitfield' error >> - SQUASH??? t/4015 GETTEXT_POISON emergency fix >> - SQUASH? Documentation breakage emergency fix > [...] >> >> "git diff --color-moved" feature has further been tweaked. >> >> Needs to be cleaned-up with various fix-up bits applied inline. > > I'll resend with those squashes and another (test-)fix SZEDER > mentioned soon. The interdiff of the topic looked alright. Thanks.
Re: [PATCH 3/3] .mailmap: map names with multiple emails to the same author identity
Stefan Beller writes: >> It may be even better if we can arraange the author of the patch to >> be the one who is involved, with "Helped-by: Stefan". > > ok, I'll think how I can help but not write the code. I do not think there is any code required. All it takes is a volunteer coordinater who sends the patch with in-body From: header pointing at the person whose name and address is getting unified, after getting an OK for doing so from the person involved.
Re: [PATCH v3 00/32] object-store: lookup_commit
Stefan Beller writes: > On Fri, Jun 29, 2018 at 11:03 AM Junio C Hamano wrote: >> >> Junio C Hamano writes: >> >> > One technique these (not just this) recent efforts seem to be >> > forgetting is to introduce "new" names that take a_repo and then >> > make the existing one a thin wrapper that calls the new one with >> > &the_repo as the argument. > > So you'd rather want to see it less invasive done similar to > NO_THE_INDEX_COMPATIBILITY_MACROS ? Someone (jrnieder?) > called that a failed experiment, as now we need to carry that baggage > for quite some time and never cleaned up the started migration; > only recently Duy started to kill off the_index, which would finish > that migration? I do not think it was a failed experiment at all. In fact, most of our code is still happy with the single in-core instance of the_index, and I think it is a mistake to trying to use the variant that take an istate instance as parameter just for the sake of it. IOW, if there is a good concrete reason why it helps to teach the set of functions involved in a callchain to operate on an in-core instance of the index_state, passing an istate through the callchain is a very welcome change. If you do not do so, and then just replace $foo_cache(...) with $foo_index(&the_index, ...), that is an irritating and useless code churn that harms other topics in flight.
send-email: change the default value of sendmail.validate
The purpose of this configuration option is to prevent your emails from blowing up on SMTP servers (rather than Extended SMTP servers). However, I find it often confuses people whose patches are otherwise correct, and they don't know how to solve the issue. I haven't seen an SMTP server in a very long time which doesn't support extended SMTP. The default behavior should probably change. If not, the error message should be more clear about action items to address the issue. I'll send a patch around to change this shortly. -- Drew DeVault
Re: [PATCH 3/3] .mailmap: map names with multiple emails to the same author identity
On Fri, Jun 29, 2018 at 10:42 AM Junio C Hamano wrote: > > Stefan Beller writes: > > > There are multiple author idents who have different email addresses, but > > the same name; assume they are the same person, as the world of open source > > is actually rather small. > > Thanks for an interesting experiment. As with 2/3, I suspect that > most of the contents in the patch may be correct, but I'd rather see > these confirmed by those whose names appear in the patch. > > IOW, I would not feel comfortable applying a patch, unless it looks > like this (just taking a single person as a random example), > and I do not mind many such individual patches: right, I did not quite expect this patch to be applied, though it looked like such a sweet shortcut. I might just email all of them and ask for them to update their .mailmap entries. Thanks, Stefan > Subject: .mailmap: unify the same Ben Peart cc'd Ben, so we could start here with a patch. :) > These multiple author identities in our history are actually the > same person. Map them to the latest address. > > Signed-off-by: Stefan Beller > Acked-by: Ben Peart > --- > > diff --git a/.mailmap b/.mailmap > index ff96ef7401f..2607846582a 100644 > --- a/.mailmap > +++ b/.mailmap > @@ -5,54 +5,86 @@ > Amos Waterland > Ben Walton > Benoit Sigoure > +Ben Peart Ben Peart > +Ben Peart Ben Peart > Bernt Hansen > Brandon Casey > > It may be even better if we can arraange the author of the patch to > be the one who is involved, with "Helped-by: Stefan". ok, I'll think how I can help but not write the code. Thanks, Stefan
Re: [PATCH v3 00/32] object-store: lookup_commit
On Fri, Jun 29, 2018 at 11:03 AM Junio C Hamano wrote: > > Junio C Hamano writes: > > > One technique these (not just this) recent efforts seem to be > > forgetting is to introduce "new" names that take a_repo and then > > make the existing one a thin wrapper that calls the new one with > > &the_repo as the argument. So you'd rather want to see it less invasive done similar to NO_THE_INDEX_COMPATIBILITY_MACROS ? Someone (jrnieder?) called that a failed experiment, as now we need to carry that baggage for quite some time and never cleaned up the started migration; only recently Duy started to kill off the_index, which would finish that migration? > FWIW, here is how I am resolving semantic conflicts that I found so > far while merging this topic to 'pu', which is stored in the > merge-fix/ mechanism so that I can reuse it while rebuilding 'pu'. > > -- >8 -- > Subject: [PATCH] merge-fix/sb/object-store-lookup > > --- > builtin/bisect--helper.c | 2 +- > builtin/branch-diff.c| 2 +- > negotiator/default.c | 3 ++- These look as if we can just use the_repository (as you did below). > commit-graph.c | 4 ++-- That is what I want to tackle next, and apparently you already did by using the repository *r in there. Thanks, Stefan
Re: [GSoC][PATCH v5 0/3] rebase -i: rewrite reflog operations in C
Junio C Hamano writes: > Let's aggregate these topics into a single topic, and perhaps call > it ag/rebase-i-in-c or something like that. Pretending as if they > are separately replaceable does not make much sense, as you are not > rerolling the earlier one and keep going forward with producing more > parts that depends on the parts that have been submitted earlier. So here is what I tentatively did. $ git log --oneline --reverse master..ag/rebase-i-in-c 4d303fb608 rebase--interactive: rewrite append_todo_help() in C b4ffe143a9 editor: add a function to launch the sequence editor 4ebe39cef9 rebase--interactive: rewrite the edit-todo functionality in C 0ff6bf7646 sequencer: add a new function to silence a command, except if it fails. 36784b351f rebase -i: rewrite setup_reflog_action() in C 415cac57ee rebase -i: rewrite checkout_onto() in C In several hours please fetch from me and look for "Merge branch 'ag/rebase-i-in-c' to pu" to see how they exactly look like; some of the patches might not be the latest ones, in which case you may need to prod me to get them replaced (resending them as a whole with incremented v$n header is probably the easiest if we need to do so). Thanks.
Re: [PATCH v3 00/32] object-store: lookup_commit
Junio C Hamano writes: > One technique these (not just this) recent efforts seem to be > forgetting is to introduce "new" names that take a_repo and then > make the existing one a thin wrapper that calls the new one with > &the_repo as the argument. FWIW, here is how I am resolving semantic conflicts that I found so far while merging this topic to 'pu', which is stored in the merge-fix/ mechanism so that I can reuse it while rebuilding 'pu'. -- >8 -- Subject: [PATCH] merge-fix/sb/object-store-lookup --- builtin/bisect--helper.c | 2 +- builtin/branch-diff.c| 2 +- commit-graph.c | 4 ++-- negotiator/default.c | 3 ++- 4 files changed, 6 insertions(+), 5 deletions(-) diff --git a/builtin/bisect--helper.c b/builtin/bisect--helper.c index fc02f889e6..b27f645744 100644 --- a/builtin/bisect--helper.c +++ b/builtin/bisect--helper.c @@ -231,7 +231,7 @@ static int bisect_write(const char *state, const char *rev, goto fail; } - commit = lookup_commit_reference(&oid); + commit = lookup_commit_reference(the_repository, &oid); log_commit(fp, "%s", state, commit); if (!nolog) diff --git a/builtin/branch-diff.c b/builtin/branch-diff.c index 8a16352e3a..b8604e3fec 100644 --- a/builtin/branch-diff.c +++ b/builtin/branch-diff.c @@ -329,7 +329,7 @@ static void output_pair_header(struct diff_options *diffopt, struct strbuf *buf, strbuf_addf(buf, " %d: %s", j + 1, find_unique_abbrev(&b_util->oid, DEFAULT_ABBREV)); - commit = lookup_commit_reference(oid); + commit = lookup_commit_reference(the_repository, oid); if (commit) { const char *commit_buffer = get_commit_buffer(commit, NULL); const char *subject; diff --git a/commit-graph.c b/commit-graph.c index e4dee03679..41a0133ff7 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -924,7 +924,7 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g) cur_fanout_pos++; } - graph_commit = lookup_commit(&cur_oid); + graph_commit = lookup_commit(r, &cur_oid); if (!parse_commit_in_graph_one(g, graph_commit)) graph_report("failed to parse %s from commit-graph", oid_to_hex(&cur_oid)); @@ -950,7 +950,7 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g) hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i); - graph_commit = lookup_commit(&cur_oid); + graph_commit = lookup_commit(r, &cur_oid); odb_commit = (struct commit *)create_object(r, cur_oid.hash, alloc_commit_node(r)); if (parse_commit_internal(odb_commit, 0, 0)) { graph_report("failed to parse %s from object database", diff --git a/negotiator/default.c b/negotiator/default.c index 382fc77722..d8c92281bb 100644 --- a/negotiator/default.c +++ b/negotiator/default.c @@ -38,7 +38,8 @@ static void rev_list_push(struct negotiation_state *ns, static int clear_marks(const char *refname, const struct object_id *oid, int flag, void *cb_data) { - struct object *o = deref_tag(parse_object(oid), refname, 0); + struct object *o = deref_tag(the_repository, +parse_object(the_repository, oid), refname, 0); if (o && o->type == OBJ_COMMIT) clear_commit_marks((struct commit *)o, -- 2.18.0-129-ge3331758f1
Re: [PATCH v3 00/32] object-store: lookup_commit
Stefan Beller writes: > This continues the elimination of global variables in the object store and > teaches lookup_commit[_reference] and alike to handle a_repository. > > This is also available as > https://github.com/stefanbeller/git/tree/object-store-lookup-commit > or applies on top of 02f70d63027 (Merge branch 'sb/object-store-grafts' > into next, 2018-06-28). > > Picking a base for this one is really hard, as nearly all series currently > cooking or in flight collide with it on one or two lines. (lookup_* is used > heavily, who would have thought?) One technique these (not just this) recent efforts seem to be forgetting is to introduce "new" names that take a_repo and then make the existing one a thin wrapper that calls the new one with &the_repo as the argument. IOW, a new call to lookup_commit_reference(oid) in a function that is happy to access the default repo (and notice taht 99% of our code is happy with using just the currrent repository the user started the git process for) does not need to be updated to call the function with the same name with a different function signature lookup_commit_reference(the_repo, oid). If we name the new one lookup_commit_reference_in_repo(r, oid) and make the callers that already take a repo pointer call it, while leaving the callers that do not even know or care about multiple in-core repo instances alone to keep calling lookup_commit_reference(oid), you do not have to worry too much about colliding. That way, you do not have to step on each others' toes with avoidable semantic merge conflicts and incrementally improve things over time, no? Thanks.
Re: [PATCH 3/3] .mailmap: map names with multiple emails to the same author identity
Stefan Beller writes: > There are multiple author idents who have different email addresses, but > the same name; assume they are the same person, as the world of open source > is actually rather small. Thanks for an interesting experiment. As with 2/3, I suspect that most of the contents in the patch may be correct, but I'd rather see these confirmed by those whose names appear in the patch. IOW, I would not feel comfortable applying a patch, unless it looks like this (just taking a single person as a random example), and I do not mind many such individual patches: Subject: .mailmap: unify the same Ben Peart These multiple author identities in our history are actually the same person. Map them to the latest address. Signed-off-by: Stefan Beller Acked-by: Ben Peart --- diff --git a/.mailmap b/.mailmap index ff96ef7401f..2607846582a 100644 --- a/.mailmap +++ b/.mailmap @@ -5,54 +5,86 @@ Amos Waterland Ben Walton Benoit Sigoure +Ben Peart Ben Peart +Ben Peart Ben Peart Bernt Hansen Brandon Casey It may be even better if we can arraange the author of the patch to be the one who is involved, with "Helped-by: Stefan". Thanks.
Re: [PATCH 1/3] .mailmap: merge different spellings of names
Stefan Beller writes: > This is a continuation of 94b410bba86 (.mailmap: Map email > addresses to names, 2013-07-12), merging names that are > spelled differently but have the same author email to the > same person. > > Most spellings differed in accents or the order of names. Thanks. This one is sensible.
Re: [RFC PATCH 00/13] Consolidate reachability logic
This RFC is available as a GitHub pull request [1]. Thanks, -Stolee [1] https://github.com/derrickstolee/git/pull/8
Re: [GSoC][PATCH v5 0/3] rebase -i: rewrite reflog operations in C
Alban Gruin writes: > This patch series rewrites the reflog operations from shell to C. This > is part of the effort to rewrite interactive rebase in C. > > The first commit is dedicated to creating a function to silence a > command, as the sequencer will do in several places with these patches. > > This branch is based on ag/rebase-i-rewrite-todo, and does not conflict > with pu (as of 2018-06-29). Let's aggregate these topics into a single topic, and perhaps call it ag/rebase-i-in-c or something like that. Pretending as if they are separately replaceable does not make much sense, as you are not rerolling the earlier one and keep going forward with producing more parts that depends on the parts that have been submitted earlier. These three patches looked more-or-less good; I may have spotted a few nits and suggested improvements, though. Thanks.
Re: [GSoC][PATCH v5 2/3] rebase -i: rewrite setup_reflog_action() in C
Alban Gruin writes: > + git rebase--helper --prepare-branch "$switch_to" ${verbose:+--verbose} > init_basic_state > > init_revisions_and_shortrevisions > diff --git a/sequencer.c b/sequencer.c > index d9545b366..dd0cf3cb5 100644 > --- a/sequencer.c > +++ b/sequencer.c > @@ -3134,6 +3134,37 @@ static const char *reflog_message(struct replay_opts > *opts, > return buf.buf; > } > ... > +int prepare_branch_to_be_rebased(struct replay_opts *opts, const char > *commit, > + int verbose) > +{ > + const char *action; > + > + if (commit && *commit) { The sloppyness of this callee is forced by the sloppy caller, which does not check if "$switch_to" has any value before calling "prepare branch to be rebased" function. A less sloppy caller would check to see if we have the optional "before doing anything, check out this branch, as it is the branch we are trying to rebase" argument, and refrain from calling this function, so there is no need for this "if commit is given and is not an empty string" check. And hopefully, when GSoC is over, the caller, that is still in shell at this step, would also be rewritten in C and by that time the caller will become less sloppy, removing the need for this check. Hence, I think the reason why the check is still here, and our desire to eventually remove the check, should be documented with an in-code comment around here (the usual "NEEDSWORK:" comment is fine).
Re: [PATCH 2/3] .mailmap: assume Jason McMullan to be the same person
Stefan Beller writes: > Over the years of contributing to open source, I realized the world of > open source is smaller than I originally thought and a name is still > a pretty unique thing. So let's assume these two author idents are the > same person. > > In 10813e0d3c7 (.mailmap: update long-lost friends with multiple defunct > addresses, 2013-08-12) we learned both their email bounce, so asking is > no option. True, but leaving them as-is is probably a better option in such a case. Until we get new contribution from the same person, at which time we have the chance we've been awaiting to ask the question, it is not harming anybody to keep these two names as if they are two different people. > Signed-off-by: Stefan Beller > --- > .mailmap | 5 + > 1 file changed, 1 insertion(+), 4 deletions(-) > > diff --git a/.mailmap b/.mailmap > index f165222a782..ff96ef7401f 100644 > --- a/.mailmap > +++ b/.mailmap > @@ -82,10 +82,7 @@ J. Bruce Fields > > J. Bruce Fields > Jakub Narębski > James Y Knight > -# The 2 following authors are probably the same person, > -# but both emails bounce. > -Jason McMullan > -Jason McMullan > +Jason McMullan Jason McMullan > > Jason Riedy > Jason Riedy > Jay Soffian
Re: [PATCH v3] fetch-pack: support negotiation tip whitelist
Jonathan Tan writes: >> fetch is a perfect example of supporting all three. I can do >> >> git fetch origin SHA1 >> git fetch origin master >> git fetch origin refs/heads/*:refs/heads/* > > OK, Brandon managed to convince me that this is fine. I've included glob > support, supporting the same globs that git notes supports. "git notes"??? As this is to be used in the context of "git fetch", using glob e.g. "refs/heads/*", is sensible and good enough. I was actually wondering if we want the head-match refs/heads/, but as "git fetch origin refs/heads/" does not work that way, I think we shouldn't. This is a tangent, but didn't ref-in-want wanted to use head-match refs/heads/ to match everything under refs/heads/? If the latest incarnation wants to do so, we may want to fix that. > diff --git a/Documentation/fetch-options.txt b/Documentation/fetch-options.txt > index 97d3217df..6e4db1738 100644 > --- a/Documentation/fetch-options.txt > +++ b/Documentation/fetch-options.txt > @@ -42,6 +42,22 @@ the current repository has the same history as the source > repository. > .git/shallow. This option updates .git/shallow and accept such > refs. > > +--negotiation-tip=:: > + By default, Git will report, to the server, commits reachable > + from all local refs to find common commits in an attempt to > + reduce the size of the to-be-received packfile. If specified, > + Git will only report commits reachable from the given tips. > + This is useful to speed up fetches when the user knows which > + local ref is likely to have commits in common with the > + upstream ref being fetched. > ++ > +This option may be specified more than once; if so, Git will report > +commits reachable from any of the given commits. > ++ > +The argument to this option may be a glob on ref names, a ref, or the > (possibly > +abbreviated SHA-1 of a commit. Specifying a glob is equivalent to specifying > +this option multiple times, one for each matching ref name. > + > +static int add_oid(const char *refname, const struct object_id *oid, int > flags, > +void *cb_data) > +{ > + struct oid_array *oids = cb_data; > + oid_array_append(oids, oid); > + return 0; > +} This by itself is not worth a reason to reroll, but please make it a habit to have a blank line after the run of decls before the first statement, at least while we still forbid decl-after-stmt. The result is easier to read that way. > +static void add_negotiation_tips(struct git_transport_options *smart_options) > +{ > + struct oid_array *oids = xcalloc(1, sizeof(*oids)); > + int i; > + for (i = 0; i < negotiation_tip.nr; i++) { > + const char *s = negotiation_tip.items[i].string; > + int old_nr; > + if (!has_glob_specials(s)) { > + struct object_id oid; > + if (get_oid(s, &oid)) > + die("%s is not a valid object", s); > + oid_array_append(oids, &oid); > + continue; > + } > + old_nr = oids->nr; > + for_each_glob_ref(add_oid, s, oids); > + if (old_nr == oids->nr) > + warning("Ignoring --negotiation-tip=%s because it does > not match any refs", > + s); > + } > + smart_options->negotiation_tips = oids; > +} This may insert duplicate object ids if two refs point at the same object, or nego globs match the same ref twice, but it is OK to have duplicate object ids in the resulting oids array is OK, because rev_list_insert_ref() at the end will dedup them anyway. > diff --git a/t/t5510-fetch.sh b/t/t5510-fetch.sh > index e402aee6a..8532a6faf 100755 > --- a/t/t5510-fetch.sh > +++ b/t/t5510-fetch.sh > @@ -865,4 +865,82 @@ test_expect_success C_LOCALE_OUTPUT 'fetch compact > output' ' > test_cmp expect actual > ' > > +setup_negotiation_tip () { > + SERVER="$1" > + URL="$2" > + USE_PROTOCOL_V2="$3" > + > + rm -rf "$SERVER" client trace && > + git init "$SERVER" && > + test_commit -C "$SERVER" alpha_1 && > + test_commit -C "$SERVER" alpha_2 && > + git -C "$SERVER" checkout --orphan beta && > + test_commit -C "$SERVER" beta_1 && > + test_commit -C "$SERVER" beta_2 && > + > + git clone "$URL" client && > + > + if [ "$USE_PROTOCOL_V2" -eq 1 ] Style: "if test ..." > + then > + git -C "$SERVER" config protocol.version 2 broken &&-chaining? > + git -C client config protocol.version 2 > + fi && > + > + test_commit -C "$SERVER" beta_s && > + git -C "$SERVER" checkout master && > + test_commit -C "$SERVER" alpha_s && > + git -C "$SERVER" tag -d alpha_1 alpha_2 beta_1 beta_2 > +}
[RFC PATCH 07/13] test-reach
--- Makefile | 1 + t/helper/test-reach.c | 123 ++ t/helper/test-tool.c | 1 + t/helper/test-tool.h | 1 + t/t6600-test-reach.sh | 81 5 files changed, 207 insertions(+) create mode 100644 t/helper/test-reach.c create mode 100755 t/t6600-test-reach.sh diff --git a/Makefile b/Makefile index 89ad873ce0..c2a2c6457e 100644 --- a/Makefile +++ b/Makefile @@ -716,6 +716,7 @@ TEST_BUILTINS_OBJS += test-mktemp.o TEST_BUILTINS_OBJS += test-online-cpus.o TEST_BUILTINS_OBJS += test-path-utils.o TEST_BUILTINS_OBJS += test-prio-queue.o +TEST_BUILTINS_OBJS += test-reach.o TEST_BUILTINS_OBJS += test-read-cache.o TEST_BUILTINS_OBJS += test-ref-store.o TEST_BUILTINS_OBJS += test-regex.o diff --git a/t/helper/test-reach.c b/t/helper/test-reach.c new file mode 100644 index 00..d57660af45 --- /dev/null +++ b/t/helper/test-reach.c @@ -0,0 +1,123 @@ +#include "test-tool.h" +#include "cache.h" +#include "commit-reach.h" +#include "config.h" +#include "parse-options.h" +#include "tag.h" + +int cmd__reach(int ac, const char **av) +{ + struct object_id oid_A, oid_B; + struct commit *A, *B; + struct commit **X, **Y; + int nr_X, alloc_X, nr_Y, alloc_Y; + struct commit_list *list_X, *list_Y; + struct strbuf buf = STRBUF_INIT; + + setup_git_directory(); + get_git_config(default_git_config, 0); + + if (argc < 2) + exit(1); + + /* load input data */ + A = B = NULL; + list_X = list_Y = NULL; + nr_X = nr_Y = 0; + alloc_X = alloc_Y = 16; + ALLOC_ARRAY(X, alloc_X); + ALLOC_ARRAY(Y, alloc_Y); + + while (strbuf_getline(&buf, stdin) != EOF) { + struct object_id oid; + struct object *o; + struct commit *c; + if (buf.len < 3) + continue; + + if (get_oid_committish(buf.buf + 2, &oid)) + die("failed to resolve %s", buf.buf + 2); + + o = parse_object(&oid); + o = deref_tag_noverify(o); + + if (!o) + die("failed to load commit for input %s resulting in oid %s\n", + buf.buf, oid_to_hex(&oid)); + + c = object_as_type(o, OBJ_COMMIT, 0); + + if (!c) + die("failed to load commit for input %s resulting in oid %s\n", + buf.buf, oid_to_hex(&oid)); + + switch (buf.buf[0]) { + case 'A': + oidcpy(&oid_A, &oid); + A = c; + break; + + case 'B': + oidcpy(&oid_B, &oid); + B = c; + break; + + case 'X': + ALLOC_GROW(X, nr_X + 1, alloc_X); + X[nr_X++] = c; + commit_list_insert(c, &list_X); + break; + + case 'Y': + ALLOC_GROW(Y, nr_Y + 1, alloc_Y); + Y[nr_Y++] = c; + commit_list_insert(c, &list_Y); + break; + + default: + die("unexpected start of line: %c", buf.buf[0]); + } + } + strbuf_release(&buf); + + if (ac > 2 && !strcmp(av[2], "graph:off")) + core_commit_graph = 0; + if (ac > 2 && !strcmp(av[2], "graph:on")) + core_commit_graph = 1; + + if (!strcmp(av[1], "ref_newer")) + printf("%s:%d\n", av[1], ref_newer(&oid_A, &oid_B)); + else if (!strcmp(av[1], "in_merge_base")) + printf("%s:%d\n", av[1], in_merge_bases(A, B)); + else if (!strcmp(av[1], "get_merge_bases_many")) { + struct commit_list *list = get_merge_bases_many(A, nr_X, X); + printf("%s(A,X):\n", av[1]); + while (list) { + printf("%s\n", oid_to_hex(&list->item->object.oid)); + list = list->next; + } + + list = get_merge_bases_many(B, nr_Y, Y); + printf("%s(B,Y):\n", av[1]); + while (list) { + printf("%s\n", oid_to_hex(&list->item->object.oid)); + list = list->next; + } + } else if (!strcmp(av[1], "reduce_heads")) { + struct commit_list *list = reduce_heads(list_X); + printf("%s(X):\n", av[1]); + while (list) { + printf("%s\n", oid_to_hex(&list->item->object.oid)); + list = list->next; + } + + list = reduce_heads(list_Y); +
[RFC PATCH 08/13] test-reach: test reduce_heads()
Signed-off-by: Derrick Stolee --- t/t6600-test-reach.sh | 26 ++ 1 file changed, 26 insertions(+) diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh index c9337b6b46..0f60db9c60 100755 --- a/t/t6600-test-reach.sh +++ b/t/t6600-test-reach.sh @@ -78,4 +78,30 @@ test_expect_success 'ref_newer:miss' ' test_reach_two_modes "ref_newer" ' +test_expect_success 'reduce_heads' ' + cat >input <<- EOF && + X:commit-1-10 + X:commit-2-8 + X:commit-3-6 + X:commit-4-4 + X:commit-1-7 + X:commit-2-5 + X:commit-3-3 + X:commit-5-1 + Y:commit-10-10 + Y:commit-3-10 + Y:commit-9-9 + Y:commit-8-1 + EOF + printf "reduce_heads(X):\n" >expect && + git rev-parse commit-5-1 >>expect && + git rev-parse commit-4-4 >>expect && + git rev-parse commit-3-6 >>expect && + git rev-parse commit-2-8 >>expect && + git rev-parse commit-1-10 >>expect && + printf "reduce_heads(Y):\n" >>expect && + git rev-parse commit-10-10 >>expect && + test_reach_two_modes "reduce_heads" +' + test_done -- 2.18.0.118.gd4f65b8d14
[RFC PATCH 05/13] upload-pack: refactor ok_to_give_up()
In anticipation of consolidating all commit reachability algorithms, refactor ok_to_give_up() in order to allow splitting its logic into an external method. Signed-off-by: Derrick Stolee --- upload-pack.c | 28 +--- 1 file changed, 17 insertions(+), 11 deletions(-) diff --git a/upload-pack.c b/upload-pack.c index 95c56dc027..2f09cadbc0 100644 --- a/upload-pack.c +++ b/upload-pack.c @@ -365,34 +365,40 @@ static int reachable(struct commit *from, int with_flag, int assign_flag) return (from->object.flags & assign_flag); } -static int ok_to_give_up(void) +static int can_all_from_reach_with_flag(struct object_array from, + int with_flag, int assign_flag) { int i; - if (!have_obj.nr) - return 0; - - for (i = 0; i < want_obj.nr; i++) { - struct object *want = want_obj.objects[i].item; + for (i = 0; i < from.nr; i++) { + struct object *from_one = from.objects[i].item; - if (want->flags & COMMON_KNOWN) + if (from_one->flags & assign_flag) continue; - want = deref_tag(want, "a want line", 0); - if (!want || want->type != OBJ_COMMIT) { + from_one = deref_tag(from_one, "a from object", 0); + if (!from_one || from_one->type != OBJ_COMMIT) { /* no way to tell if this is reachable by * looking at the ancestry chain alone, so * leave a note to ourselves not to worry about * this object anymore. */ - want_obj.objects[i].item->flags |= COMMON_KNOWN; + from.objects[i].item->flags |= assign_flag; continue; } - if (!reachable((struct commit *)want, THEY_HAVE, COMMON_KNOWN)) + if (!reachable((struct commit *)from_one, with_flag, assign_flag)) return 0; } return 1; } +static int ok_to_give_up(void) +{ + if (!have_obj.nr) + return 0; + + return can_all_from_reach_with_flag(want_obj, THEY_HAVE, COMMON_KNOWN); +} + static int get_common_commits(void) { struct object_id oid; -- 2.18.0.118.gd4f65b8d14
[RFC PATCH 11/13] commit-reach: make can_all_from_reach... linear
The can_all_from_reach_with_flags() algorithm is currently quadratic in the worst case, because it calls the reachable() method for every 'from' without tracking which commits have already been walked or which can already reach a commit in 'to'. Rewrite the algorithm to walk each commit a constant number of times. We also add some optimizations that should work for the main consumer of this method: fetch negotitation (haves/wants). The first step includes using a depth-first-search (DFS) from each from commit, sorted by ascending generation number. We do not walk beyond the minimum generation number or the minimum commit date. This DFS is likely to be faster than the existing reachable() method because we expect previous ref values to be along the first-parent history. If we find a target commit, then we mark everything in the DFS stack as a RESULT. This expands the set of targets for the other from commits. We also mark the visited commits using 'assign_flag' to prevent re-walking the same code. We still need to clear our flags at the end, which is why we will have a total of three visits to each commit. Performance was measured on the Linux repository using 'test-tool reach can_all_from_reach'. The input included rows seeded by tag values. The "small" case included X-rows as v4.[0-9]* and Y-rows as v3.[0-9]*. This mimics a (very large) fetch that says "I have all major v3 releases and want all major v4 releases." The "large" case included X-rows as "v4.*" and Y-rows as "v3.*". This adds all release-candidate tags to the set, which does not greatly increase the number of objects that are considered, but does increase the number of 'from' commits, demonstrating the quadratic nature of the previous code. Small Case -- Before: 1.45 s After: 0.34 s Large Case -- Before: 5.83 s After: 0.37 s Note how the time increases between the two cases in the two versions. The new code increases relative to the number of commits that need to be walked, but not directly relative to the number of 'from' commits. Signed-off-by: Derrick Stolee --- commit-reach.c | 122 ++--- commit-reach.h | 3 +- upload-pack.c | 5 +- 3 files changed, 82 insertions(+), 48 deletions(-) diff --git a/commit-reach.c b/commit-reach.c index 992ad5cdc7..8e24455d9f 100644 --- a/commit-reach.c +++ b/commit-reach.c @@ -530,64 +530,88 @@ int commit_contains(struct ref_filter *filter, struct commit *commit, return is_descendant_of(commit, list); } -static int reachable(struct commit *from, int with_flag, int assign_flag, time_t min_commit_date) +static int compare_commits_by_gen(const void *_a, const void *_b) { - struct prio_queue work = { compare_commits_by_commit_date }; + const struct commit *a = (const struct commit *)_a; + const struct commit *b = (const struct commit *)_b; - prio_queue_put(&work, from); - while (work.nr) { - struct commit_list *list; - struct commit *commit = prio_queue_get(&work); - - if (commit->object.flags & with_flag) { - from->object.flags |= assign_flag; - break; - } - if (!commit->object.parsed) - parse_object(&commit->object.oid); - if (commit->object.flags & TMP_MARK) - continue; - commit->object.flags |= TMP_MARK; - if (commit->date < min_commit_date) - continue; - for (list = commit->parents; list; list = list->next) { - struct commit *parent = list->item; - if (!(parent->object.flags & TMP_MARK)) - prio_queue_put(&work, parent); - } - } - from->object.flags |= TMP_MARK; - clear_commit_marks(from, TMP_MARK); - clear_prio_queue(&work); - return (from->object.flags & assign_flag); + if (a->generation < b->generation) + return -1; + if (a->generation > b->generation) + return 1; + return 0; } int can_all_from_reach_with_flag(struct object_array *from, int with_flag, int assign_flag, -time_t min_commit_date) +time_t min_commit_date, +uint32_t min_generation) { + struct commit **list = NULL; int i; + int result = 1; + ALLOC_ARRAY(list, from->nr); for (i = 0; i < from->nr; i++) { - struct object *from_one = from->objects[i].item; + list[i] = (struct commit *)from->objects[i].item; - if (from_one->flags & assign_flag) - continue; - from_one = deref_tag(from_one, "a from object", 0); - if (!from_one || from_one->type != OBJ_COMMIT) { -
[RFC PATCH 03/13] commit-reach: move commit_contains from ref-filter
Signed-off-by: Derrick Stolee --- commit-reach.c | 119 +++ commit-reach.h | 44 +-- fast-import.c | 1 + ref-filter.c | 147 +++-- 4 files changed, 141 insertions(+), 170 deletions(-) diff --git a/commit-reach.c b/commit-reach.c index 80cdb738f6..6cfd7379ce 100644 --- a/commit-reach.c +++ b/commit-reach.c @@ -409,3 +409,122 @@ int ref_newer(const struct object_id *new_oid, const struct object_id *old_oid) unmark_and_free(used, TMP_MARK); return found; } + +/* + * Mimicking the real stack, this stack lives on the heap, avoiding stack + * overflows. + * + * At each recursion step, the stack items points to the commits whose + * ancestors are to be inspected. + */ +struct contains_stack { + int nr, alloc; + struct contains_stack_entry { + struct commit *commit; + struct commit_list *parents; + } *contains_stack; +}; + +static int in_commit_list(const struct commit_list *want, struct commit *c) +{ + for (; want; want = want->next) + if (!oidcmp(&want->item->object.oid, &c->object.oid)) + return 1; + return 0; +} + +/* + * Test whether the candidate is contained in the list. + * Do not recurse to find out, though, but return -1 if inconclusive. + */ +static enum contains_result contains_test(struct commit *candidate, + const struct commit_list *want, + struct contains_cache *cache, + uint32_t cutoff) +{ + enum contains_result *cached = contains_cache_at(cache, candidate); + + /* If we already have the answer cached, return that. */ + if (*cached) + return *cached; + + /* or are we it? */ + if (in_commit_list(want, candidate)) { + *cached = CONTAINS_YES; + return CONTAINS_YES; + } + + /* Otherwise, we don't know; prepare to recurse */ + parse_commit_or_die(candidate); + + if (candidate->generation < cutoff) + return CONTAINS_NO; + + return CONTAINS_UNKNOWN; +} + +static void push_to_contains_stack(struct commit *candidate, struct contains_stack *contains_stack) +{ + ALLOC_GROW(contains_stack->contains_stack, contains_stack->nr + 1, contains_stack->alloc); + contains_stack->contains_stack[contains_stack->nr].commit = candidate; + contains_stack->contains_stack[contains_stack->nr++].parents = candidate->parents; +} + +static enum contains_result contains_tag_algo(struct commit *candidate, + const struct commit_list *want, + struct contains_cache *cache) +{ + struct contains_stack contains_stack = { 0, 0, NULL }; + enum contains_result result; + uint32_t cutoff = GENERATION_NUMBER_INFINITY; + const struct commit_list *p; + + for (p = want; p; p = p->next) { + struct commit *c = p->item; + load_commit_graph_info(c); + if (c->generation < cutoff) + cutoff = c->generation; + } + + result = contains_test(candidate, want, cache, cutoff); + if (result != CONTAINS_UNKNOWN) + return result; + + push_to_contains_stack(candidate, &contains_stack); + while (contains_stack.nr) { + struct contains_stack_entry *entry = &contains_stack.contains_stack[contains_stack.nr - 1]; + struct commit *commit = entry->commit; + struct commit_list *parents = entry->parents; + + if (!parents) { + *contains_cache_at(cache, commit) = CONTAINS_NO; + contains_stack.nr--; + } + /* +* If we just popped the stack, parents->item has been marked, +* therefore contains_test will return a meaningful yes/no. +*/ + else switch (contains_test(parents->item, want, cache, cutoff)) { + case CONTAINS_YES: + *contains_cache_at(cache, commit) = CONTAINS_YES; + contains_stack.nr--; + break; + case CONTAINS_NO: + entry->parents = parents->next; + break; + case CONTAINS_UNKNOWN: + push_to_contains_stack(parents->item, &contains_stack); + break; + } + } + free(contains_stack.contains_stack); + return contains_test(candidate, want, cache, cutoff); +} + +int commit_contains(struct ref_filter *filter, struct commit *commit, + struct commit_list *list, struct contains_cache *cache) +{ + if (filter->with_commit_tag_algo) + return contains_
[RFC PATCH 04/13] upload-pack: make reachable() more generic
In anticipation of moving the reachable() method to commit-reach.c, modify the prototype to be more generic to flags known outside of upload-pack.c. Also rename 'want' to 'from' to make the statement more clear outside of the context of haves/wants negotiation. Signed-off-by: Derrick Stolee --- upload-pack.c | 23 +++ 1 file changed, 11 insertions(+), 12 deletions(-) diff --git a/upload-pack.c b/upload-pack.c index 87c6722ea5..95c56dc027 100644 --- a/upload-pack.c +++ b/upload-pack.c @@ -28,7 +28,6 @@ #define OUR_REF(1u << 12) #define WANTED (1u << 13) #define COMMON_KNOWN (1u << 14) -#define REACHABLE (1u << 15) #define SHALLOW(1u << 16) #define NOT_SHALLOW(1u << 17) @@ -334,36 +333,36 @@ static int got_oid(const char *hex, struct object_id *oid) return 0; } -static int reachable(struct commit *want) +static int reachable(struct commit *from, int with_flag, int assign_flag) { struct prio_queue work = { compare_commits_by_commit_date }; - prio_queue_put(&work, want); + prio_queue_put(&work, from); while (work.nr) { struct commit_list *list; struct commit *commit = prio_queue_get(&work); - if (commit->object.flags & THEY_HAVE) { - want->object.flags |= COMMON_KNOWN; + if (commit->object.flags & with_flag) { + from->object.flags |= assign_flag; break; } if (!commit->object.parsed) parse_object(&commit->object.oid); - if (commit->object.flags & REACHABLE) + if (commit->object.flags & TMP_MARK) continue; - commit->object.flags |= REACHABLE; + commit->object.flags |= TMP_MARK; if (commit->date < oldest_have) continue; for (list = commit->parents; list; list = list->next) { struct commit *parent = list->item; - if (!(parent->object.flags & REACHABLE)) + if (!(parent->object.flags & TMP_MARK)) prio_queue_put(&work, parent); } } - want->object.flags |= REACHABLE; - clear_commit_marks(want, REACHABLE); + from->object.flags |= TMP_MARK; + clear_commit_marks(from, TMP_MARK); clear_prio_queue(&work); - return (want->object.flags & COMMON_KNOWN); + return (from->object.flags & assign_flag); } static int ok_to_give_up(void) @@ -388,7 +387,7 @@ static int ok_to_give_up(void) want_obj.objects[i].item->flags |= COMMON_KNOWN; continue; } - if (!reachable((struct commit *)want)) + if (!reachable((struct commit *)want, THEY_HAVE, COMMON_KNOWN)) return 0; } return 1; -- 2.18.0.118.gd4f65b8d14
[RFC PATCH 12/13] commit-reach: use is_descendant_of for ref_newer
The ref_newer method is used by 'git push' to detect if a force-push is requried. This method does not use any kind of cutoff when walking, so in the case of a force-push will walk all reachable commits. The is_descendant_of method already uses paint_down_to_common along with cutoffs. By translating the ref_newer arguments into the commit and commit_list required by is_descendant_of, we can have one fewer commit walk and also improve our performance! For a copy of the Linux repository, 'test-tool reach ref_newer' presents the following improvements with the specified input. Input - A:v4.9 B:v3.19 Before: 0.11 s After: 0.10 s To test the negative case, add a new commit with parent v3.19, regenerate the commit-graph, and then run with B pointing at that commit. Before: 0.52 s After: 0.12 s Signed-off-by: Derrick Stolee --- commit-reach.c | 17 - commit.c | 7 +-- commit.h | 6 +- fetch-pack.c | 3 ++- sha1-name.c| 3 ++- walker.c | 3 ++- 6 files changed, 20 insertions(+), 19 deletions(-) diff --git a/commit-reach.c b/commit-reach.c index 8e24455d9f..249e9a4fac 100644 --- a/commit-reach.c +++ b/commit-reach.c @@ -376,7 +376,7 @@ int ref_newer(const struct object_id *new_oid, const struct object_id *old_oid) { struct object *o; struct commit *old_commit, *new_commit; - struct commit_list *list, *used; + struct commit_list *list = NULL; int found = 0; /* @@ -396,18 +396,9 @@ int ref_newer(const struct object_id *new_oid, const struct object_id *old_oid) if (parse_commit(new_commit) < 0) return 0; - used = list = NULL; - commit_list_insert(new_commit, &list); - while (list) { - new_commit = pop_most_recent_commit(&list, TMP_MARK); - commit_list_insert(new_commit, &used); - if (new_commit == old_commit) { - found = 1; - break; - } - } - unmark_and_free(list, TMP_MARK); - unmark_and_free(used, TMP_MARK); + commit_list_insert(old_commit, &list); + found = is_descendant_of(new_commit, list); + free_commit_list(list); return found; } diff --git a/commit.c b/commit.c index d4ddaf4879..9870682673 100644 --- a/commit.c +++ b/commit.c @@ -556,7 +556,8 @@ void commit_list_sort_by_date(struct commit_list **list) } struct commit *pop_most_recent_commit(struct commit_list **list, - unsigned int mark) + unsigned int mark, + uint32_t min_generation) { struct commit *ret = pop_commit(list); struct commit_list *parents = ret->parents; @@ -565,7 +566,9 @@ struct commit *pop_most_recent_commit(struct commit_list **list, struct commit *commit = parents->item; if (!parse_commit(commit) && !(commit->object.flags & mark)) { commit->object.flags |= mark; - commit_list_insert_by_date(commit, list); + + if (commit->generation >= min_generation) + commit_list_insert_by_date(commit, list); } parents = parents->next; } diff --git a/commit.h b/commit.h index 7e0f273720..5eaeded5e2 100644 --- a/commit.h +++ b/commit.h @@ -159,9 +159,13 @@ extern const char *skip_blank_lines(const char *msg); /** Removes the first commit from a list sorted by date, and adds all * of its parents. + * + * The parents are not added if their generation number is strictly + * lower than min_generation. **/ struct commit *pop_most_recent_commit(struct commit_list **list, - unsigned int mark); + unsigned int mark, + uint32_t min_generation); struct commit *pop_commit(struct commit_list **stack); diff --git a/fetch-pack.c b/fetch-pack.c index a320ce9872..351e3d4bcd 100644 --- a/fetch-pack.c +++ b/fetch-pack.c @@ -600,7 +600,8 @@ static void mark_recent_complete_commits(struct fetch_pack_args *args, while (complete && cutoff <= complete->item->date) { print_verbose(args, _("Marking %s as complete"), oid_to_hex(&complete->item->object.oid)); - pop_most_recent_commit(&complete, COMPLETE); + pop_most_recent_commit(&complete, COMPLETE, + GENERATION_NUMBER_ZERO); } } diff --git a/sha1-name.c b/sha1-name.c index 60d9ef3c7e..471a54464d 100644 --- a/sha1-name.c +++ b/sha1-name.c @@ -1141,7 +1141,8 @@ static int get_oid_oneline(const char *prefix, struct object_id *oid, struct commit *commit; int matches; - commit = pop_most_recent_commit(&list, ONELINE_SEEN); + commit =
[RFC PATCH 10/13] commit-reach: test is_descendant_of
Signed-off-by: Derrick Stolee --- t/helper/test-reach.c | 2 ++ t/t6600-test-reach.sh | 25 + 2 files changed, 27 insertions(+) diff --git a/t/helper/test-reach.c b/t/helper/test-reach.c index 88639a2945..14aaef5bff 100644 --- a/t/helper/test-reach.c +++ b/t/helper/test-reach.c @@ -89,6 +89,8 @@ int cmd__reach(int ac, const char **av) printf("%s:%d\n", av[1], ref_newer(&oid_A, &oid_B)); else if (!strcmp(av[1], "in_merge_base")) printf("%s:%d\n", av[1], in_merge_bases(A, B)); + else if (!strcmp(av[1], "is_descendant_of")) + printf("%s(A,X):%d\n", av[1], is_descendant_of(A, list_X)); else if (!strcmp(av[1], "get_merge_bases_many")) { struct commit_list *list = get_merge_bases_many(A, nr_X, X); printf("%s(A,X):\n", av[1]); diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh index ef25e70174..62655ae650 100755 --- a/t/t6600-test-reach.sh +++ b/t/t6600-test-reach.sh @@ -78,6 +78,31 @@ test_expect_success 'ref_newer:miss' ' test_reach_two_modes ref_newer ' +test_expect_success 'is_descendant_of:hit' ' + cat >input <<- EOF && + A:commit-5-7 + X:commit-4-8 + X:commit-6-6 + X:commit-1-1 + EOF + cat >expect <<- EOF && + is_descendant_of(A,X):1 + EOF + test_reach_two_modes is_descendant_of +' + +test_expect_success 'is_descendant_of:miss' ' + cat >input <<- EOF && + A:commit-5-7 + X:commit-4-8 + X:commit-6-6 + EOF + cat >expect <<- EOF && + is_descendant_of(A,X):0 + EOF + test_reach_two_modes is_descendant_of +' + test_expect_success 'reduce_heads' ' cat >input <<- EOF && X:commit-1-10 -- 2.18.0.118.gd4f65b8d14
[RFC PATCH 09/13] commit-reach: test can_all_from_reach
Signed-off-by: Derrick Stolee --- commit-reach.c| 52 +++ commit-reach.h| 4 +++- t/helper/test-reach.c | 7 -- t/t6600-test-reach.sh | 51 +++--- upload-pack.c | 2 +- 5 files changed, 105 insertions(+), 11 deletions(-) diff --git a/commit-reach.c b/commit-reach.c index 44c09669ec..992ad5cdc7 100644 --- a/commit-reach.c +++ b/commit-reach.c @@ -562,14 +562,14 @@ static int reachable(struct commit *from, int with_flag, int assign_flag, time_t return (from->object.flags & assign_flag); } -int can_all_from_reach_with_flag(struct object_array from, +int can_all_from_reach_with_flag(struct object_array *from, int with_flag, int assign_flag, time_t min_commit_date) { int i; - for (i = 0; i < from.nr; i++) { - struct object *from_one = from.objects[i].item; + for (i = 0; i < from->nr; i++) { + struct object *from_one = from->objects[i].item; if (from_one->flags & assign_flag) continue; @@ -580,7 +580,7 @@ int can_all_from_reach_with_flag(struct object_array from, * leave a note to ourselves not to worry about * this object anymore. */ - from.objects[i].item->flags |= assign_flag; + from->objects[i].item->flags |= assign_flag; continue; } if (!reachable((struct commit *)from_one, with_flag, @@ -589,3 +589,47 @@ int can_all_from_reach_with_flag(struct object_array from, } return 1; } + +int can_all_from_reach(struct commit_list *from, struct commit_list *to) +{ + struct object_array from_objs = OBJECT_ARRAY_INIT; + time_t min_commit_date = from->item->date; + struct commit_list *from_iter = from; + struct commit_list *to_iter = to; + int result; + + while (from_iter) { + add_object_array(&from_iter->item->object, NULL, &from_objs); + + if (from_iter->item->date < min_commit_date) + min_commit_date = from_iter->item->date; + + from_iter = from_iter->next; + } + + while (to_iter) { + if (to_iter->item->date < min_commit_date) + min_commit_date = to_iter->item->date; + + to_iter->item->object.flags |= PARENT2; + + to_iter = to_iter->next; + } + + result = can_all_from_reach_with_flag(&from_objs, PARENT2, PARENT1, + min_commit_date); + + while (from) { + clear_commit_marks(from->item, PARENT1); + from = from->next; + } + + while (to) { + clear_commit_marks(to->item, PARENT2); + to = to->next; + } + + object_array_clear(&from_objs); + + return result; +} diff --git a/commit-reach.h b/commit-reach.h index c3da8488eb..8ab06af2eb 100644 --- a/commit-reach.h +++ b/commit-reach.h @@ -27,7 +27,9 @@ int commit_contains(struct ref_filter *filter, struct commit *commit, * the 'with_flag' bit on? Mark the commits in 'from' that can reach * such commits with 'assign_flag'. */ -int can_all_from_reach_with_flag(struct object_array from, int with_flag, +int can_all_from_reach_with_flag(struct object_array *from, int with_flag, int assign_flag, time_t min_commit_date); +int can_all_from_reach(struct commit_list *from, struct commit_list *to); + #endif diff --git a/t/helper/test-reach.c b/t/helper/test-reach.c index d57660af45..88639a2945 100644 --- a/t/helper/test-reach.c +++ b/t/helper/test-reach.c @@ -15,9 +15,9 @@ int cmd__reach(int ac, const char **av) struct strbuf buf = STRBUF_INIT; setup_git_directory(); - get_git_config(default_git_config, 0); + git_config(git_default_config, NULL); - if (argc < 2) + if (ac < 2) exit(1); /* load input data */ @@ -117,6 +117,9 @@ int cmd__reach(int ac, const char **av) printf("%s\n", oid_to_hex(&list->item->object.oid)); list = list->next; } + } else if (!strcmp(av[1], "can_all_from_reach")) { + int result = can_all_from_reach(list_X, list_Y); + printf("%s(X,Y):%d\n", av[1], result); } exit(0); diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh index 0f60db9c60..ef25e70174 100755 --- a/t/t6600-test-reach.sh +++ b/t/t6600-test-reach.sh @@ -64,7 +64,7 @@ test_expect_success 'ref_newer:miss' ' cat >expect <<- EOF && ref_newer:0 EOF - test_reach_two_modes "ref_newer" + test_reach_two_modes ref_newer ' test_expect_success 'ref_newer:miss' ' @@ -
[RFC PATCH 13/13] commit-reach: use can_all_from_reach
The is_descendant_of method previously used in_merge_bases() to check if the commit can reach any of the commits in the provided list. This had two performance problems: 1. The performance is quadratic in worst-case. 2. A single in_merge_bases() call requires walking beyond the target commit in order to find the full set of boundary commits that may be merge-bases. The can_all_from_reach method avoids this quadratic behavior and can limit the search beyond the target commits using generation numbers. It requires a small prototype adjustment to stop using commit-date as a cutoff, as that optimization is no longer appropriate here. Performance was meausured on a copy of the Linux repository using the 'test-tool reach is_descendant_of' command using this input: A:v4.9 X:v4.10 X:v4.11 X:v4.12 X:v4.13 X:v4.14 X:v4.15 X:v4.16 X:v4.17 X.v3.0 Note that this input is tailored to demonstrate the quadratic nature of the previous method, as it will compute merge-bases for v4.9 versus all of the later versions before checking against v4.1. Before: 0.31 s After: 0.27 s Since we previously used the is_descendant_of method in the ref_newer method, we also measured performance there using 'test-tool reach ref_newer': Before: 0.12 s After: 0.11 s Signed-off-by: Derrick Stolee --- One thing I know is missing from this commit is a special-case to use the old logic when there is no commit-graph present. The can_all_from_reach() algorithm can be worse when we do not have good generation number cutoffs. In the previous case of can_all_from_reach_with_flags(), we already had an established pattern of using commit date as a cutoff, so the generation number is only a second cutoff and the algorithm cannot walk more commits than before. commit-reach.c| 34 +- commit-reach.h| 3 ++- t/helper/test-reach.c | 2 +- 3 files changed, 20 insertions(+), 19 deletions(-) diff --git a/commit-reach.c b/commit-reach.c index 249e9a4fac..a823d6965c 100644 --- a/commit-reach.c +++ b/commit-reach.c @@ -273,17 +273,15 @@ struct commit_list *get_merge_bases(struct commit *one, struct commit *two) */ int is_descendant_of(struct commit *commit, struct commit_list *with_commit) { + struct commit_list *from_list = NULL; + int result; if (!with_commit) return 1; - while (with_commit) { - struct commit *other; - other = with_commit->item; - with_commit = with_commit->next; - if (in_merge_bases(other, commit)) - return 1; - } - return 0; + commit_list_insert(commit, &from_list); + result = can_all_from_reach(from_list, with_commit, 0); + free_commit_list(from_list); + return result; } /* @@ -605,10 +603,11 @@ int can_all_from_reach_with_flag(struct object_array *from, return result; } -int can_all_from_reach(struct commit_list *from, struct commit_list *to) +int can_all_from_reach(struct commit_list *from, struct commit_list *to, + int cutoff_by_min_date) { struct object_array from_objs = OBJECT_ARRAY_INIT; - time_t min_commit_date = from->item->date; + time_t min_commit_date = cutoff_by_min_date ? from->item->date : 0; struct commit_list *from_iter = from; struct commit_list *to_iter = to; int result; @@ -617,20 +616,21 @@ int can_all_from_reach(struct commit_list *from, struct commit_list *to) while (from_iter) { add_object_array(&from_iter->item->object, NULL, &from_objs); - if (from_iter->item->date < min_commit_date) + if (!parse_commit(from_iter->item) && + from_iter->item->date < min_commit_date) min_commit_date = from_iter->item->date; from_iter = from_iter->next; } while (to_iter) { - parse_commit(to_iter->item); - - if (to_iter->item->date < min_commit_date) - min_commit_date = to_iter->item->date; + if (!parse_commit(to_iter->item)) { + if (to_iter->item->date < min_commit_date) + min_commit_date = to_iter->item->date; - if (to_iter->item->generation < min_generation) - min_generation = to_iter->item->generation; + if (to_iter->item->generation < min_generation) + min_generation = to_iter->item->generation; + } to_iter->item->object.flags |= PARENT2; diff --git a/commit-reach.h b/commit-reach.h index 3eb4c057e6..180f865d7d 100644 --- a/commit-reach.h +++ b/commit-reach.h @@ -31,6 +31,7 @@ int can_all_from_reach_with_flag(struct object_array *from, int with_flag, int assign_flag, time_t min_commit_date, u
[RFC PATCH 06/13] commit-reach: move can_all_from_reach_with_flag()
Signed-off-by: Derrick Stolee --- commit-reach.c | 61 + commit-reach.h | 8 +++ upload-pack.c | 62 +++--- 3 files changed, 72 insertions(+), 59 deletions(-) diff --git a/commit-reach.c b/commit-reach.c index 6cfd7379ce..44c09669ec 100644 --- a/commit-reach.c +++ b/commit-reach.c @@ -1,5 +1,6 @@ #include "cache.h" #include "commit.h" +#include "commit-graph.h" #include "decorate.h" #include "prio-queue.h" #include "tree.h" @@ -528,3 +529,63 @@ int commit_contains(struct ref_filter *filter, struct commit *commit, return contains_tag_algo(commit, list, cache) == CONTAINS_YES; return is_descendant_of(commit, list); } + +static int reachable(struct commit *from, int with_flag, int assign_flag, time_t min_commit_date) +{ + struct prio_queue work = { compare_commits_by_commit_date }; + + prio_queue_put(&work, from); + while (work.nr) { + struct commit_list *list; + struct commit *commit = prio_queue_get(&work); + + if (commit->object.flags & with_flag) { + from->object.flags |= assign_flag; + break; + } + if (!commit->object.parsed) + parse_object(&commit->object.oid); + if (commit->object.flags & TMP_MARK) + continue; + commit->object.flags |= TMP_MARK; + if (commit->date < min_commit_date) + continue; + for (list = commit->parents; list; list = list->next) { + struct commit *parent = list->item; + if (!(parent->object.flags & TMP_MARK)) + prio_queue_put(&work, parent); + } + } + from->object.flags |= TMP_MARK; + clear_commit_marks(from, TMP_MARK); + clear_prio_queue(&work); + return (from->object.flags & assign_flag); +} + +int can_all_from_reach_with_flag(struct object_array from, +int with_flag, int assign_flag, +time_t min_commit_date) +{ + int i; + + for (i = 0; i < from.nr; i++) { + struct object *from_one = from.objects[i].item; + + if (from_one->flags & assign_flag) + continue; + from_one = deref_tag(from_one, "a from object", 0); + if (!from_one || from_one->type != OBJ_COMMIT) { + /* no way to tell if this is reachable by +* looking at the ancestry chain alone, so +* leave a note to ourselves not to worry about +* this object anymore. +*/ + from.objects[i].item->flags |= assign_flag; + continue; + } + if (!reachable((struct commit *)from_one, with_flag, + assign_flag, min_commit_date)) + return 0; + } + return 1; +} diff --git a/commit-reach.h b/commit-reach.h index 986fb388d5..c3da8488eb 100644 --- a/commit-reach.h +++ b/commit-reach.h @@ -22,4 +22,12 @@ define_commit_slab(contains_cache, enum contains_result); int commit_contains(struct ref_filter *filter, struct commit *commit, struct commit_list *list, struct contains_cache *cache); +/* + * Can every commit or tag in the 'from' array reach a commit that has + * the 'with_flag' bit on? Mark the commits in 'from' that can reach + * such commits with 'assign_flag'. + */ +int can_all_from_reach_with_flag(struct object_array from, int with_flag, +int assign_flag, time_t min_commit_date); + #endif diff --git a/upload-pack.c b/upload-pack.c index 2f09cadbc0..7c58cb8f5e 100644 --- a/upload-pack.c +++ b/upload-pack.c @@ -22,6 +22,7 @@ #include "quote.h" #include "upload-pack.h" #include "serve.h" +#include "commit-reach.h" /* Remember to update object flag allocation in object.h */ #define THEY_HAVE (1u << 11) @@ -333,70 +334,13 @@ static int got_oid(const char *hex, struct object_id *oid) return 0; } -static int reachable(struct commit *from, int with_flag, int assign_flag) -{ - struct prio_queue work = { compare_commits_by_commit_date }; - - prio_queue_put(&work, from); - while (work.nr) { - struct commit_list *list; - struct commit *commit = prio_queue_get(&work); - - if (commit->object.flags & with_flag) { - from->object.flags |= assign_flag; - break; - } - if (!commit->object.parsed) - parse_object(&commit->object.oid); - if (commit->object.flags & TMP_MARK) - continue; - commit
[RFC PATCH 01/13] commit-reach: move walk methods from commit.c
Signed-off-by: Derrick Stolee --- Makefile | 1 + commit-reach.c | 359 + commit-reach.h | 41 ++ commit.c | 358 4 files changed, 401 insertions(+), 358 deletions(-) create mode 100644 commit-reach.c create mode 100644 commit-reach.h diff --git a/Makefile b/Makefile index 0cb6590f24..89ad873ce0 100644 --- a/Makefile +++ b/Makefile @@ -828,6 +828,7 @@ LIB_OBJS += column.o LIB_OBJS += combine-diff.o LIB_OBJS += commit.o LIB_OBJS += commit-graph.o +LIB_OBJS += commit-reach.o LIB_OBJS += compat/obstack.o LIB_OBJS += compat/terminal.o LIB_OBJS += config.o diff --git a/commit-reach.c b/commit-reach.c new file mode 100644 index 00..1438393165 --- /dev/null +++ b/commit-reach.c @@ -0,0 +1,359 @@ +#include "cache.h" +#include "prio-queue.h" +#include "commit-reach.h" + +/* Remember to update object flag allocation in object.h */ +#define PARENT1 (1u<<16) +#define PARENT2 (1u<<17) +#define STALE (1u<<18) +#define RESULT (1u<<19) + +static const unsigned all_flags = (PARENT1 | PARENT2 | STALE | RESULT); + +static int queue_has_nonstale(struct prio_queue *queue) +{ + int i; + for (i = 0; i < queue->nr; i++) { + struct commit *commit = queue->array[i].data; + if (!(commit->object.flags & STALE)) + return 1; + } + return 0; +} + +/* all input commits in one and twos[] must have been parsed! */ +static struct commit_list *paint_down_to_common(struct commit *one, int n, + struct commit **twos, + int min_generation) +{ + struct prio_queue queue = { compare_commits_by_gen_then_commit_date }; + struct commit_list *result = NULL; + int i; + uint32_t last_gen = GENERATION_NUMBER_INFINITY; + + one->object.flags |= PARENT1; + if (!n) { + commit_list_append(one, &result); + return result; + } + prio_queue_put(&queue, one); + + for (i = 0; i < n; i++) { + twos[i]->object.flags |= PARENT2; + prio_queue_put(&queue, twos[i]); + } + + while (queue_has_nonstale(&queue)) { + struct commit *commit = prio_queue_get(&queue); + struct commit_list *parents; + int flags; + + if (commit->generation > last_gen) + BUG("bad generation skip %8x > %8x at %s", + commit->generation, last_gen, + oid_to_hex(&commit->object.oid)); + last_gen = commit->generation; + + if (commit->generation < min_generation) + break; + + flags = commit->object.flags & (PARENT1 | PARENT2 | STALE); + if (flags == (PARENT1 | PARENT2)) { + if (!(commit->object.flags & RESULT)) { + commit->object.flags |= RESULT; + commit_list_insert_by_date(commit, &result); + } + /* Mark parents of a found merge stale */ + flags |= STALE; + } + parents = commit->parents; + while (parents) { + struct commit *p = parents->item; + parents = parents->next; + if ((p->object.flags & flags) == flags) + continue; + if (parse_commit(p)) + return NULL; + p->object.flags |= flags; + prio_queue_put(&queue, p); + } + } + + clear_prio_queue(&queue); + return result; +} + +static struct commit_list *merge_bases_many(struct commit *one, int n, struct commit **twos) +{ + struct commit_list *list = NULL; + struct commit_list *result = NULL; + int i; + + for (i = 0; i < n; i++) { + if (one == twos[i]) + /* +* We do not mark this even with RESULT so we do not +* have to clean it up. +*/ + return commit_list_insert(one, &result); + } + + if (parse_commit(one)) + return NULL; + for (i = 0; i < n; i++) { + if (parse_commit(twos[i])) + return NULL; + } + + list = paint_down_to_common(one, n, twos, 0); + + while (list) { + struct commit *commit = pop_commit(&list); + if (!(commit->object.flags & STALE)) + commit_list_insert_by_date(commit, &result); + } + return result; +} + +struct commit_list *get_octopus_merge_bases(struct commit_list *in) +{ + stru
[RFC PATCH 00/13] Consolidate reachability logic
This RFC is a bit unpolished because I was mostly seeing where the idea could go. I wanted to achieve the following: 1. Consolidate several different commit walks into one file 2. Reduce duplicate reachability logic 3. Increase testability (correctness and performance) 4. Improve performance of reachability queries My approach is mostly in three parts: I. Move code to a new commit-reach.c file. II. Add a 'test-tool reach' command to test these methods directly. III. Modify the logic by improving performance and calling methods with similar logic but different prototypes. The 'test-tool reach' command is helpful to make sure I don't break anything as I change the logic, but also so I can test methods that are normally only exposed by other more complicated commands. For instance, ref_newer() is part of 'git push -f' and ok_to_give_up() is buried deep within fetch negotiation. Both of these methods have some problematic performacne issues that are corrected by this series. As I discovered them, it was clear that it would be better to consolidate walk logic instead of discovering a new walk in another file hidden somewhere. For the ok_to_give_up() method, I refactored the method so I could pull the logic out of the depths of fetch negotiation. In the commit "commit-reach: make can_all_from_reach... linear" I discuss how the existing algorithm is quadratic and how we can make it linear. Also, we can use heuristic knowledge about the shape of the commit graph and the usual haves/wants to get some extra performance bonus. (The heuristic is to do a DFS with first-parents first, and stop on first found result. We expect haves/wants to include ref tips, which typically have their previous values in their first-parent history.) The "test-reach" commit in particular is not split well or described well for mailing-list review. I figured I would send the RFC instead of tweaking it carefully, because I will need to re-do most of these changes after more of the object-store series is merged. I don't plan to send a v1 patch until the lookup_commit() and parse_commit() code is stable again. Thanks, -Stolee Derrick Stolee (13): commit-reach: move walk methods from commit.c commit-reach: move ref_newer from remote.c commit-reach: move commit_contains from ref-filter upload-pack: make reachable() more generic upload-pack: refactor ok_to_give_up() commit-reach: move can_all_from_reach_with_flag() test-reach test-reach: test reduce_heads() commit-reach: test can_all_from_reach commit-reach: test is_descendant_of commit-reach: make can_all_from_reach... linear commit-reach: use is_descendant_of for ref_newer commit-reach: use can_all_from_reach Makefile | 2 + builtin/remote.c | 1 + commit-reach.c| 656 ++ commit-reach.h| 37 +++ commit.c | 365 +-- commit.h | 6 +- fast-import.c | 1 + fetch-pack.c | 3 +- http-push.c | 1 + ref-filter.c | 147 +- remote.c | 48 +--- remote.h | 1 - sha1-name.c | 3 +- t/helper/test-reach.c | 128 + t/helper/test-tool.c | 1 + t/helper/test-tool.h | 1 + t/t6600-test-reach.sh | 177 upload-pack.c | 58 +--- walker.c | 3 +- 19 files changed, 1035 insertions(+), 604 deletions(-) create mode 100644 commit-reach.c create mode 100644 commit-reach.h create mode 100644 t/helper/test-reach.c create mode 100755 t/t6600-test-reach.sh base-commit: d4f65b8d141e041eb5e558cd9e763873e29863b9 -- 2.18.0.118.gd4f65b8d14
[RFC PATCH 02/13] commit-reach: move ref_newer from remote.c
Signed-off-by: Derrick Stolee --- builtin/remote.c | 1 + commit-reach.c | 52 commit-reach.h | 2 ++ http-push.c | 1 + remote.c | 48 +--- remote.h | 1 - 6 files changed, 57 insertions(+), 48 deletions(-) diff --git a/builtin/remote.c b/builtin/remote.c index 1a82d850a2..341f704ada 100644 --- a/builtin/remote.c +++ b/builtin/remote.c @@ -9,6 +9,7 @@ #include "refs.h" #include "refspec.h" #include "argv-array.h" +#include "commit-reach.h" static const char * const builtin_remote_usage[] = { N_("git remote [-v | --verbose]"), diff --git a/commit-reach.c b/commit-reach.c index 1438393165..80cdb738f6 100644 --- a/commit-reach.c +++ b/commit-reach.c @@ -1,5 +1,10 @@ #include "cache.h" +#include "commit.h" +#include "decorate.h" #include "prio-queue.h" +#include "tree.h" +#include "revision.h" +#include "tag.h" #include "commit-reach.h" /* Remember to update object flag allocation in object.h */ @@ -357,3 +362,50 @@ void reduce_heads_replace(struct commit_list **heads) free_commit_list(*heads); *heads = result; } + +static void unmark_and_free(struct commit_list *list, unsigned int mark) +{ + while (list) { + struct commit *commit = pop_commit(&list); + commit->object.flags &= ~mark; + } +} + +int ref_newer(const struct object_id *new_oid, const struct object_id *old_oid) +{ + struct object *o; + struct commit *old_commit, *new_commit; + struct commit_list *list, *used; + int found = 0; + + /* +* Both new_commit and old_commit must be commit-ish and new_commit is descendant of +* old_commit. Otherwise we require --force. +*/ + o = deref_tag(parse_object(old_oid), NULL, 0); + if (!o || o->type != OBJ_COMMIT) + return 0; + old_commit = (struct commit *) o; + + o = deref_tag(parse_object(new_oid), NULL, 0); + if (!o || o->type != OBJ_COMMIT) + return 0; + new_commit = (struct commit *) o; + + if (parse_commit(new_commit) < 0) + return 0; + + used = list = NULL; + commit_list_insert(new_commit, &list); + while (list) { + new_commit = pop_most_recent_commit(&list, TMP_MARK); + commit_list_insert(new_commit, &used); + if (new_commit == old_commit) { + found = 1; + break; + } + } + unmark_and_free(list, TMP_MARK); + unmark_and_free(used, TMP_MARK); + return found; +} diff --git a/commit-reach.h b/commit-reach.h index 244f48c5f2..35ec9f0ddb 100644 --- a/commit-reach.h +++ b/commit-reach.h @@ -38,4 +38,6 @@ struct commit_list *reduce_heads(struct commit_list *heads); */ void reduce_heads_replace(struct commit_list **heads); +int ref_newer(const struct object_id *new_oid, const struct object_id *old_oid); + #endif diff --git a/http-push.c b/http-push.c index 7e38522098..218315b00b 100644 --- a/http-push.c +++ b/http-push.c @@ -13,6 +13,7 @@ #include "argv-array.h" #include "packfile.h" #include "object-store.h" +#include "commit-reach.h" #ifdef EXPAT_NEEDS_XMLPARSE_H #include diff --git a/remote.c b/remote.c index abe80c1397..dcb5a33fac 100644 --- a/remote.c +++ b/remote.c @@ -11,6 +11,7 @@ #include "string-list.h" #include "mergesort.h" #include "argv-array.h" +#include "commit-reach.h" enum map_direction { FROM_SRC, FROM_DST }; @@ -1781,53 +1782,6 @@ int resolve_remote_symref(struct ref *ref, struct ref *list) return 1; } -static void unmark_and_free(struct commit_list *list, unsigned int mark) -{ - while (list) { - struct commit *commit = pop_commit(&list); - commit->object.flags &= ~mark; - } -} - -int ref_newer(const struct object_id *new_oid, const struct object_id *old_oid) -{ - struct object *o; - struct commit *old_commit, *new_commit; - struct commit_list *list, *used; - int found = 0; - - /* -* Both new_commit and old_commit must be commit-ish and new_commit is descendant of -* old_commit. Otherwise we require --force. -*/ - o = deref_tag(parse_object(old_oid), NULL, 0); - if (!o || o->type != OBJ_COMMIT) - return 0; - old_commit = (struct commit *) o; - - o = deref_tag(parse_object(new_oid), NULL, 0); - if (!o || o->type != OBJ_COMMIT) - return 0; - new_commit = (struct commit *) o; - - if (parse_commit(new_commit) < 0) - return 0; - - used = list = NULL; - commit_list_insert(new_commit, &list); - while (list) { - new_commit = pop_most_recent_commit(&list, TMP_MARK); - commit_list_insert(new_commit, &used); - if (new_commit == old_commit) { - fo
Re: [PATCH] sequencer: use configured comment character
Johannes Schindelin writes: > In short: the code is fine, but yes, I had to convince myself by looking > through the code. (Hinting at a possible improvement of the commit > message.) Yup, that exactly was what I was hoping readers to realize.
[GSoC][PATCH v5 2/3] rebase -i: rewrite setup_reflog_action() in C
This rewrites (the misnamed) setup_reflog_action() from shell to C. The new version is called prepare_branch_to_be_rebased(). A new command is added to rebase--helper.c, “checkout-base”, as well as a new flag, “verbose”, to avoid silencing the output of the checkout operation called by checkout_base_commit(). The function `run_git_checkout()` will also be used in the next commit, therefore its code is not part of `checkout_base_commit()`. The shell version is then stripped in favour of a call to the helper. As $GIT_REFLOG_ACTION is no longer set at the first call of checkout_onto(), a call to comment_for_reflog() is added at the beginning of this function. Signed-off-by: Alban Gruin --- builtin/rebase--helper.c | 9 +++-- git-rebase--interactive.sh | 16 ++-- sequencer.c| 31 +++ sequencer.h| 3 +++ 4 files changed, 43 insertions(+), 16 deletions(-) diff --git a/builtin/rebase--helper.c b/builtin/rebase--helper.c index 731a64971..6c789e78b 100644 --- a/builtin/rebase--helper.c +++ b/builtin/rebase--helper.c @@ -13,12 +13,12 @@ static const char * const builtin_rebase_helper_usage[] = { int cmd_rebase__helper(int argc, const char **argv, const char *prefix) { struct replay_opts opts = REPLAY_OPTS_INIT; - unsigned flags = 0, keep_empty = 0, rebase_merges = 0; + unsigned flags = 0, keep_empty = 0, rebase_merges = 0, verbose = 0; int abbreviate_commands = 0, rebase_cousins = -1; enum { CONTINUE = 1, ABORT, MAKE_SCRIPT, SHORTEN_OIDS, EXPAND_OIDS, CHECK_TODO_LIST, SKIP_UNNECESSARY_PICKS, REARRANGE_SQUASH, - ADD_EXEC, APPEND_TODO_HELP, EDIT_TODO + ADD_EXEC, APPEND_TODO_HELP, EDIT_TODO, PREPARE_BRANCH } command = 0; struct option options[] = { OPT_BOOL(0, "ff", &opts.allow_ff, N_("allow fast-forward")), @@ -28,6 +28,7 @@ int cmd_rebase__helper(int argc, const char **argv, const char *prefix) OPT_BOOL(0, "rebase-merges", &rebase_merges, N_("rebase merge commits")), OPT_BOOL(0, "rebase-cousins", &rebase_cousins, N_("keep original branch points of cousins")), + OPT__VERBOSE(&verbose, N_("be verbose")), OPT_CMDMODE(0, "continue", &command, N_("continue rebase"), CONTINUE), OPT_CMDMODE(0, "abort", &command, N_("abort rebase"), @@ -51,6 +52,8 @@ int cmd_rebase__helper(int argc, const char **argv, const char *prefix) OPT_CMDMODE(0, "edit-todo", &command, N_("edit the todo list during an interactive rebase"), EDIT_TODO), + OPT_CMDMODE(0, "prepare-branch", &command, + N_("prepare the branch to be rebased"), PREPARE_BRANCH), OPT_END() }; @@ -94,5 +97,7 @@ int cmd_rebase__helper(int argc, const char **argv, const char *prefix) return !!append_todo_help(0, keep_empty); if (command == EDIT_TODO && argc == 1) return !!edit_todo_list(flags); + if (command == PREPARE_BRANCH && argc == 2) + return !!prepare_branch_to_be_rebased(&opts, argv[1], verbose); usage_with_options(builtin_rebase_helper_usage, options); } diff --git a/git-rebase--interactive.sh b/git-rebase--interactive.sh index 2defe607f..77e972bb6 100644 --- a/git-rebase--interactive.sh +++ b/git-rebase--interactive.sh @@ -72,6 +72,7 @@ collapse_todo_ids() { # Switch to the branch in $into and notify it in the reflog checkout_onto () { + comment_for_reflog start GIT_REFLOG_ACTION="$GIT_REFLOG_ACTION: checkout $onto_name" output git checkout $onto || die_abort "$(gettext "could not detach HEAD")" git update-ref ORIG_HEAD $orig_head @@ -119,19 +120,6 @@ initiate_action () { esac } -setup_reflog_action () { - comment_for_reflog start - - if test ! -z "$switch_to" - then - GIT_REFLOG_ACTION="$GIT_REFLOG_ACTION: checkout $switch_to" - output git checkout "$switch_to" -- || - die "$(eval_gettext "Could not checkout \$switch_to")" - - comment_for_reflog start - fi -} - init_basic_state () { orig_head=$(git rev-parse --verify HEAD) || die "$(gettext "No HEAD?")" mkdir -p "$state_dir" || die "$(eval_gettext "Could not create temporary \$state_dir")" @@ -211,7 +199,7 @@ git_rebase__interactive () { return 0 fi - setup_reflog_action + git rebase--helper --prepare-branch "$switch_to" ${verbose:+--verbose} init_basic_state init_revisions_and_shortrevisions diff --git a/sequencer.c b/sequencer.c index d9545b366..dd0cf3cb5 100644 --- a/sequencer.c +++ b/sequencer.c @@ -3134,6 +3134,37 @@ static const char *reflog_message(struct
[GSoC][PATCH v5 0/3] rebase -i: rewrite reflog operations in C
This patch series rewrites the reflog operations from shell to C. This is part of the effort to rewrite interactive rebase in C. The first commit is dedicated to creating a function to silence a command, as the sequencer will do in several places with these patches. This branch is based on ag/rebase-i-rewrite-todo, and does not conflict with pu (as of 2018-06-29). Changes since v4: - Changing the order of setup_reflog_action() and checkout_onto() rewrites in the series - checkout_onto() is no longer renamed in C - setup_reflog_action() is renamed to prepare_branch_to_be_rebased(), and not to checkout_onto(). Alban Gruin (3): sequencer: add a new function to silence a command, except if it fails. rebase -i: rewrite setup_reflog_action() in C rebase -i: rewrite checkout_onto() in C builtin/rebase--helper.c | 14 - git-rebase--interactive.sh | 39 ++ sequencer.c| 101 +++-- sequencer.h| 6 +++ 4 files changed, 98 insertions(+), 62 deletions(-) -- 2.18.0
[GSoC][PATCH v5 1/3] sequencer: add a new function to silence a command, except if it fails.
This adds a new function, run_command_silent_on_success(), to redirect the stdout and stderr of a command to a strbuf, and then to run that command. This strbuf is printed only if the command fails. It is functionnaly similar to output() from git-rebase.sh. run_git_commit() is then refactored to use of run_command_silent_on_success(). Signed-off-by: Alban Gruin --- sequencer.c | 51 +-- 1 file changed, 25 insertions(+), 26 deletions(-) diff --git a/sequencer.c b/sequencer.c index cb7ec9807..d9545b366 100644 --- a/sequencer.c +++ b/sequencer.c @@ -769,6 +769,23 @@ N_("you have staged changes in your working tree\n" #define VERIFY_MSG (1<<4) #define CREATE_ROOT_COMMIT (1<<5) +static int run_command_silent_on_success(struct child_process *cmd) +{ + struct strbuf buf = STRBUF_INIT; + int rc; + + cmd->stdout_to_stderr = 1; + rc = pipe_command(cmd, + NULL, 0, + NULL, 0, + &buf, 0); + + if (rc) + fputs(buf.buf, stderr); + strbuf_release(&buf); + return rc; +} + /* * If we are cherry-pick, and if the merge did not result in * hand-editing, we will hit this commit and inherit the original @@ -823,18 +840,11 @@ static int run_git_commit(const char *defmsg, struct replay_opts *opts, cmd.git_cmd = 1; - if (is_rebase_i(opts)) { - if (!(flags & EDIT_MSG)) { - cmd.stdout_to_stderr = 1; - cmd.err = -1; - } - - if (read_env_script(&cmd.env_array)) { - const char *gpg_opt = gpg_sign_opt_quoted(opts); + if (is_rebase_i(opts) && read_env_script(&cmd.env_array)) { + const char *gpg_opt = gpg_sign_opt_quoted(opts); - return error(_(staged_changes_advice), -gpg_opt, gpg_opt); - } + return error(_(staged_changes_advice), +gpg_opt, gpg_opt); } argv_array_push(&cmd.args, "commit"); @@ -864,21 +874,10 @@ static int run_git_commit(const char *defmsg, struct replay_opts *opts, if (opts->allow_empty_message) argv_array_push(&cmd.args, "--allow-empty-message"); - if (cmd.err == -1) { - /* hide stderr on success */ - struct strbuf buf = STRBUF_INIT; - int rc = pipe_command(&cmd, - NULL, 0, - /* stdout is already redirected */ - NULL, 0, - &buf, 0); - if (rc) - fputs(buf.buf, stderr); - strbuf_release(&buf); - return rc; - } - - return run_command(&cmd); + if (is_rebase_i(opts) && !(flags & EDIT_MSG)) + return run_command_silent_on_success(&cmd); + else + return run_command(&cmd); } static int rest_is_empty(const struct strbuf *sb, int start) -- 2.18.0
[GSoC][PATCH v5 3/3] rebase -i: rewrite checkout_onto() in C
This rewrites checkout_onto() from shell to C. A new command (“checkout-onto”) is added to rebase--helper.c. The shell version is then stripped. Signed-off-by: Alban Gruin --- builtin/rebase--helper.c | 7 ++- git-rebase--interactive.sh | 25 - sequencer.c| 19 +++ sequencer.h| 3 +++ 4 files changed, 32 insertions(+), 22 deletions(-) diff --git a/builtin/rebase--helper.c b/builtin/rebase--helper.c index 6c789e78b..0091e094b 100644 --- a/builtin/rebase--helper.c +++ b/builtin/rebase--helper.c @@ -18,7 +18,8 @@ int cmd_rebase__helper(int argc, const char **argv, const char *prefix) enum { CONTINUE = 1, ABORT, MAKE_SCRIPT, SHORTEN_OIDS, EXPAND_OIDS, CHECK_TODO_LIST, SKIP_UNNECESSARY_PICKS, REARRANGE_SQUASH, - ADD_EXEC, APPEND_TODO_HELP, EDIT_TODO, PREPARE_BRANCH + ADD_EXEC, APPEND_TODO_HELP, EDIT_TODO, PREPARE_BRANCH, + CHECKOUT_ONTO } command = 0; struct option options[] = { OPT_BOOL(0, "ff", &opts.allow_ff, N_("allow fast-forward")), @@ -54,6 +55,8 @@ int cmd_rebase__helper(int argc, const char **argv, const char *prefix) EDIT_TODO), OPT_CMDMODE(0, "prepare-branch", &command, N_("prepare the branch to be rebased"), PREPARE_BRANCH), + OPT_CMDMODE(0, "checkout-onto", &command, + N_("checkout a commit"), CHECKOUT_ONTO), OPT_END() }; @@ -99,5 +102,7 @@ int cmd_rebase__helper(int argc, const char **argv, const char *prefix) return !!edit_todo_list(flags); if (command == PREPARE_BRANCH && argc == 2) return !!prepare_branch_to_be_rebased(&opts, argv[1], verbose); + if (command == CHECKOUT_ONTO && argc == 4) + return !!checkout_onto(&opts, argv[1], argv[2], argv[3], verbose); usage_with_options(builtin_rebase_helper_usage, options); } diff --git a/git-rebase--interactive.sh b/git-rebase--interactive.sh index 77e972bb6..b68f108f2 100644 --- a/git-rebase--interactive.sh +++ b/git-rebase--interactive.sh @@ -28,17 +28,6 @@ case "$comment_char" in ;; esac -orig_reflog_action="$GIT_REFLOG_ACTION" - -comment_for_reflog () { - case "$orig_reflog_action" in - ''|rebase*) - GIT_REFLOG_ACTION="rebase -i ($1)" - export GIT_REFLOG_ACTION - ;; - esac -} - die_abort () { apply_autostash rm -rf "$state_dir" @@ -70,14 +59,6 @@ collapse_todo_ids() { git rebase--helper --shorten-ids } -# Switch to the branch in $into and notify it in the reflog -checkout_onto () { - comment_for_reflog start - GIT_REFLOG_ACTION="$GIT_REFLOG_ACTION: checkout $onto_name" - output git checkout $onto || die_abort "$(gettext "could not detach HEAD")" - git update-ref ORIG_HEAD $orig_head -} - get_missing_commit_check_level () { check_level=$(git config --get rebase.missingCommitsCheck) check_level=${check_level:-ignore} @@ -176,7 +157,8 @@ EOF git rebase--helper --check-todo-list || { ret=$? - checkout_onto + git rebase--helper --checkout-onto "$onto_name" "$onto" \ + "$orig_head" ${verbose:+--verbose} exit $ret } @@ -186,7 +168,8 @@ EOF onto="$(git rebase--helper --skip-unnecessary-picks)" || die "Could not skip unnecessary pick commands" - checkout_onto + git rebase--helper --checkout-onto "$onto_name" "$onto" "$orig_head" \ + ${verbose:+--verbose} require_clean_work_tree "rebase" exec git rebase--helper ${force_rebase:+--no-ff} $allow_empty_message \ --continue diff --git a/sequencer.c b/sequencer.c index dd0cf3cb5..ec6de4eda 100644 --- a/sequencer.c +++ b/sequencer.c @@ -3165,6 +3165,25 @@ int prepare_branch_to_be_rebased(struct replay_opts *opts, const char *commit, return 0; } +int checkout_onto(struct replay_opts *opts, + const char *onto_name, const char *onto, + const char *orig_head, unsigned verbose) +{ + struct object_id oid; + const char *action = reflog_message(opts, "start", "checkout %s", onto_name); + + if (get_oid(orig_head, &oid)) + return error(_("%s: not a valid OID"), orig_head); + + if (run_git_checkout(opts, onto, verbose, action)) { + apply_autostash(opts); + sequencer_remove_state(opts); + return error(_("could not detach HEAD")); + } + + return update_ref(NULL, "ORIG_HEAD", &oid, NULL, 0, UPDATE_REFS_MSG_ON_ERR); +} + static const char rescheduled_advice[] = N_("Could not execute the todo command\n" "\n" diff --git a/sequencer.h b/sequencer.h index 3821420b0..ba5e59f02 100644 --- a/seq
Re: [PATCH v3 00/32] object-store: lookup_commit
On 6/28/2018 9:21 PM, Stefan Beller wrote: This continues the elimination of global variables in the object store and teaches lookup_commit[_reference] and alike to handle a_repository. This is also available as https://github.com/stefanbeller/git/tree/object-store-lookup-commit or applies on top of 02f70d63027 (Merge branch 'sb/object-store-grafts' into next, 2018-06-28). Picking a base for this one is really hard, as nearly all series currently cooking or in flight collide with it on one or two lines. (lookup_* is used heavily, who would have thought?); I really needed 'sb/object-store-grafts' to build on; and there is only one other series before that in current next, which this series would have conflicted with, if not based on top of it. In "The state of the object store series" [1], this was tentatively named "sb/object-store-lookup". As usual, this series was easy to review. It has a few places where it conflicts with ds/commit-graph-fsck, but they are not hard to resolve. I created a merge commit between that branch and your version of this series [1]. It passes all tests on my machine. Thanks, -Stolee [1] https://github.com/derrickstolee/git/commits/commit-graph-fsck-and-object-store
Re: [PATCH v3 10/32] commit: add repository argument to parse_commit_buffer
On 6/28/2018 9:22 PM, Stefan Beller wrote: Add a repository argument to allow the callers of parse_commit_buffer to be more specific about which repository to act on. This is a small mechanical change; it doesn't change the implementation to handle repositories other than the_repository yet. As with the previous commits, use a macro to catch callers passing a repository other than the_repository at compile time. Signed-off-by: Jonathan Nieder Signed-off-by: Stefan Beller --- commit.c| 4 ++-- commit.h| 3 ++- object.c| 2 +- sha1-file.c | 2 +- 4 files changed, 6 insertions(+), 5 deletions(-) diff --git a/commit.c b/commit.c index 4803c8be1df..75d0bdede84 100644 --- a/commit.c +++ b/commit.c @@ -363,7 +363,7 @@ const void *detach_commit_buffer(struct commit *commit, unsigned long *sizep) return ret; } -int parse_commit_buffer(struct commit *item, const void *buffer, unsigned long size, int check_graph) +int parse_commit_buffer_the_repository(struct commit *item, const void *buffer, unsigned long size, int check_graph) { const char *tail = buffer; const char *bufptr = buffer; @@ -448,7 +448,7 @@ int parse_commit_gently(struct commit *item, int quiet_on_missing) return error("Object %s not a commit", oid_to_hex(&item->object.oid)); } - ret = parse_commit_buffer(item, buffer, size, 0); + ret = parse_commit_buffer(the_repository, item, buffer, size, 0); if (save_commit_buffer && !ret) { set_commit_buffer(item, buffer, size); return 0; diff --git a/commit.h b/commit.h index cd80dab59c1..f326c13622b 100644 --- a/commit.h +++ b/commit.h @@ -82,7 +82,8 @@ struct commit *lookup_commit_reference_by_name(const char *name); */ struct commit *lookup_commit_or_die(const struct object_id *oid, const char *ref_name); -int parse_commit_buffer(struct commit *item, const void *buffer, unsigned long size, int check_graph); +#define parse_commit_buffer(r, i, b, s, g) parse_commit_buffer_##r(i, b, s, g) +int parse_commit_buffer_the_repository(struct commit *item, const void *buffer, unsigned long size, int check_graph); int parse_commit_gently(struct commit *item, int quiet_on_missing); static inline int parse_commit(struct commit *item) { This part also conflicts with ds/commit-graph-fsck because I added a 'parse_commit_internal' between parse_commit_buffer() and parse_commit_gently(). Likely, you'll want to add a repository struct to that method, too. That series also adds new callers to parse_commit_buffer(). diff --git a/object.c b/object.c index 530c55e41e4..5494c0cbaa1 100644 --- a/object.c +++ b/object.c @@ -214,7 +214,7 @@ struct object *parse_object_buffer_the_repository(const struct object_id *oid, e } else if (type == OBJ_COMMIT) { struct commit *commit = lookup_commit(the_repository, oid); if (commit) { - if (parse_commit_buffer(commit, buffer, size, 1)) + if (parse_commit_buffer(the_repository, commit, buffer, size, 1)) return NULL; if (!get_cached_commit_buffer(commit, NULL)) { set_commit_buffer(commit, buffer, size); diff --git a/sha1-file.c b/sha1-file.c index de4839e634c..75ba30b4ab1 100644 --- a/sha1-file.c +++ b/sha1-file.c @@ -1801,7 +1801,7 @@ static void check_commit(const void *buf, size_t size) { struct commit c; memset(&c, 0, sizeof(c)); - if (parse_commit_buffer(&c, buf, size, 0)) + if (parse_commit_buffer(the_repository, &c, buf, size, 0)) die("corrupt commit"); }
Re: [PATCH v3 09/32] commit: add repository argument to lookup_commit
On 6/28/2018 9:21 PM, Stefan Beller wrote: Add a repository argument to allow callers of lookup_commit to be more specific about which repository to handle. This is a small mechanical change; it doesn't change the implementation to handle repositories other than the_repository yet. As with the previous commits, use a macro to catch callers passing a repository other than the_repository at compile time. In ds/commit-graph-fsck I add a few new callers to lookup_commit(), which will affect this patch on a reroll. Thanks, -Stolee
Re: [PATCH] sequencer: use configured comment character
Hi Junio, On Thu, 28 Jun 2018, Junio C Hamano wrote: > Aaron Schrab writes: > > > Use configured comment character when generating comments about branches > > in an instruction sheet. Failure to honor this configuration causes a > > failure to parse the resulting instruction sheet. > > > > Signed-off-by: Aaron Schrab > > --- > > sequencer.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/sequencer.c b/sequencer.c > > index 4034c0461b..caf91af29d 100644 > > --- a/sequencer.c > > +++ b/sequencer.c > > @@ -3991,7 +3991,7 @@ static int make_script_with_merges(struct > > pretty_print_context *pp, > > entry = oidmap_get(&state.commit2label, &commit->object.oid); > > > > if (entry) > > - fprintf(out, "\n# Branch %s\n", entry->string); > > + fprintf(out, "\n%c Branch %s\n", comment_line_char, > > entry->string); > > else > > fprintf(out, "\n"); > > Would this interact OK with core.commentchar set to "auto"? The idea of "auto" is: If set to "auto", `git-commit` would select a character that is not the beginning character of any line in existing commit messages. As there are no pre-existing lines in that script (apart from the ones we are about to add with the todo_help), the setting "auto" is pretty moot and we will fall back to the default comment char (or, if there was a previous core.commentChar that was parsed, that one). In short: the code is fine, but yes, I had to convince myself by looking through the code. (Hinting at a possible improvement of the commit message.) Ciao, Dscho
Re: What's cooking in git.git (Jun 2018, #07; Thu, 28)
On 6/28/2018 6:42 PM, Stefan Beller wrote: On Thu, Jun 28, 2018 at 2:40 PM Junio C Hamano wrote: * ds/commit-graph-fsck (2018-06-27) 22 commits [...] "git fsck" learns to make sure the optional commit-graph file is in a sane state. Is this ready for 'next'? I hope so, as I plan to reroll the next object store series based on it. I'll also review that. I think the series is ready. The only new feedback in a while is your style comments on "commit-graph: add '--reachable' option", which can hopefully be squashed in. I'm also open to sending a new series on top for new feedback. Thanks, -Stolee
Re: [PATCH v5 0/9] Document/fix/warn about rebase incompatibilities and inconsistencies
Hi Elijah On 27/06/18 08:23, Elijah Newren wrote: > git-rebase has lots of options that are mutually incompatible. Even among > aspects of its behavior that is common to all rebase types, it has a number > of inconsistencies. This series tries to document, fix, and/or warn users > about many of these. > > Changes since v4 (short branch-diff at the end): > - Fixed --strategy vs. --strategy-option (in patch 3, carries over to > context region in later commit) > > v4 didn't get a lot of feedback (though it was picked up by Junio), so I'll > repeat the areas that would be of most note for reviewers since v3: > > * I have left patch 9 in RFC state; since v3 it has an expanded the > commit message with an in-depth usability rationale for the > change. > * It sounded like Junio was slightly unclear about the intent of the > wording in Patch 1. Not sure if my answer (in email) was sufficient or > if there are wording improvements others might suggest. > * I'm assuming the --keep-empty and/or --empty={drop,halt,keep} (see > comments on patch 5 of v3) can be resolved in a later series. I'm sorry I've not got round to looking at the last couple of versions. Unfortunately I'm about to go off-line for a couple of weeks so I just wanted to let you know I wasn't ignoring you!. If they haven't been merged when I get back on-line I'll have a look then Best Wishes Phillip > Elijah Newren (9): > git-rebase.txt: document incompatible options > git-rebase.sh: update help messages a bit > t3422: new testcases for checking when incompatible options passed > git-rebase: error out when incompatible options passed > git-rebase.txt: address confusion between --no-ff vs --force-rebase > directory-rename-detection.txt: technical docs on abilities and > limitations > git-rebase.txt: document behavioral differences between modes > t3401: add directory rename testcases for rebase and am > git-rebase: make --allow-empty-message the default > > Documentation/git-rebase.txt | 135 ++ > .../technical/directory-rename-detection.txt | 115 +++ > git-rebase.sh | 43 +- > t/t3401-rebase-and-am-rename.sh | 105 ++ > t/t3404-rebase-interactive.sh | 7 +- > t/t3405-rebase-malformed.sh | 11 +- > t/t3422-rebase-incompatible-options.sh| 88 > 7 files changed, 462 insertions(+), 42 deletions(-) > create mode 100644 Documentation/technical/directory-rename-detection.txt > create mode 100755 t/t3401-rebase-and-am-rename.sh > create mode 100755 t/t3422-rebase-incompatible-options.sh > > 1: 3f454ebc5e = 1: 3f454ebc5e git-rebase.txt: document incompatible options > 2: 31a5a071a6 = 2: 31a5a071a6 git-rebase.sh: update help messages a bit > 3: 5a2b5eec79 ! 3: bc3a5a3f95 t3422: new testcases for checking when > incompatible options passed > @@ -62,7 +62,7 @@ > + > +test_expect_failure "$opt incompatible with > --strategy-option=ours" " > +git checkout B^0 && > -+test_must_fail git rebase $opt --strategy=ours A > ++test_must_fail git rebase $opt --strategy-option=ours A > +" > + > +test_expect_failure "$opt incompatible with --interactive" " > 4: 1e1c83724a ! 4: ca3b8327f7 git-rebase: error out when incompatible > options passed > @@ -104,7 +104,7 @@ > -test_expect_failure "$opt incompatible with > --strategy-option=ours" " > +test_expect_success "$opt incompatible with > --strategy-option=ours" " > git checkout B^0 && > - test_must_fail git rebase $opt --strategy=ours A > + test_must_fail git rebase $opt --strategy-option=ours A > " > > -test_expect_failure "$opt incompatible with --interactive" " > 5: 51023269d3 = 5: 6ac359359e git-rebase.txt: address confusion between > --no-ff vs --force-rebase > 6: f017d45dd9 = 6: e5c5db9110 directory-rename-detection.txt: technical > docs on abilities and limitations > 7: 0a359df404 = 7: e330437305 git-rebase.txt: document behavioral > differences between modes > 8: beaadceaef = 8: f704f7eee8 t3401: add directory rename testcases for > rebase and am > 9: 431b2c36d5 = 9: 436f597487 git-rebase: make --allow-empty-message the > default >
Re: [PATCH v7 07/22] commit-graph: add 'verify' subcommand
On 6/27/2018 5:59 PM, Jonathan Tan wrote: +int verify_commit_graph(struct repository *r, struct commit_graph *g) I haven't had the time to review this patch set, but I did rebase my object store refactoring [1] on this and wrote a test: static void test_verify_commit_graph(const char *gitdir, const char *worktree) { struct repository r; char *graph_name; struct commit_graph *graph; repo_init(&r, gitdir, worktree); graph_name = get_commit_graph_filename(r.objects->objectdir); graph = load_commit_graph_one(graph_name); printf("verification returned %d\n", verify_commit_graph(&r, graph)); repo_clear(&r); } However, it doesn't work because verify_commit_graph() invokes parse_commit_internal(), which tries to look up replace refs in the_repository. I think that verify_commit_graph() should not take a repository argument for now. To minimize churn on the review of this patch set, and to minimize diffs when we migrate parse_commit_internal() (and likely other functions) to take in a repository argument, I would be OK with something like the following instead: int verify_commit_graph(struct commit_graph *g) { /* * NEEDSWORK: Make r into a parameter when all functions * invoked by this function are not hardcoded to operate on * the_repository. */ struct repository *r = the_repository; /* ... */ As for my rebased refactoring, I'll send the patches to the mailing list once Junio updates ds/commit-graph-fsck with these latest changes, so that I can rebase once again on that and ensure that everything still works. [1] https://public-inbox.org/git/cover.1529616356.git.jonathanta...@google.com/ Thanks for looking into this, Jonathan. At some point I took my series and moved it on top of Stefan's lookup_object() series, but then moved off of it since those commits were not in 'pu'. This 'struct repository *r' didn't get removed in that process. Thanks, -Stolee
[PATCH v2 1/1] Makefile: fix the "built from commit" code
From: Johannes Schindelin In ed32b788c06 (version --build-options: report commit, too, if possible, 2017-12-15), we introduced code to let `git version --build-options` report the current commit from which the binaries were built, if any. To prevent erroneous commits from being reported (e.g. when unpacking Git's source code from a .tar.gz file into a subdirectory of a different Git project, as e.g. git_osx_installer does), we painstakingly set GIT_CEILING_DIRECTORIES when trying to determine the current commit. Except that we got the quoting wrong, and that variable therefore does not have the desired effect. The issue is that the $(shell) is resolved before the output is stuffed into the command-line with -DGIT_BUILT_FROM_COMMIT, and therefore is *not* inside quotes. And thus backslashing the quotes is wrong, as the quote gets literally inserted into the CEILING_DIRECTORIES variable. Let's fix that quoting, and while at it, also suppress the unhelpful message fatal: not a git repository (or any of the parent directories): .git that gets printed to stderr if no current commit could be determined, and might scare the occasional developer who simply tries to build Git from scratch. Signed-off-by: Johannes Schindelin --- Makefile | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/Makefile b/Makefile index 0cb6590f2..c70f823a0 100644 --- a/Makefile +++ b/Makefile @@ -2021,8 +2021,9 @@ version.sp version.s version.o: GIT-VERSION-FILE GIT-USER-AGENT version.sp version.s version.o: EXTRA_CPPFLAGS = \ '-DGIT_VERSION="$(GIT_VERSION)"' \ '-DGIT_USER_AGENT=$(GIT_USER_AGENT_CQ_SQ)' \ - '-DGIT_BUILT_FROM_COMMIT="$(shell GIT_CEILING_DIRECTORIES=\"$(CURDIR)/..\" \ - git rev-parse -q --verify HEAD || :)"' + '-DGIT_BUILT_FROM_COMMIT="$(shell \ + GIT_CEILING_DIRECTORIES="$(CURDIR)/.." \ + git rev-parse -q --verify HEAD 2>/dev/null)"' $(BUILT_INS): git$X $(QUIET_BUILT_IN)$(RM) $@ && \ -- gitgitgadget
[PATCH v2 0/1] Fix "built from commit" logic
When I tried recently to build macOS installers via Tim Harper's wonderful project at https://github.com/timcharper/git_osx_installer, it worked (with a couple of quirks), but it reported to be built from a commit that I first could not place. Turns out that the git_osx_installer project insists on building Git from a .tar.gz file (even if I have the source code right here, in a perfectly fine worktree). And due to a bug in the logic I introduced, it did not stop looking for a Git repository where it should have stopped. The end effect is that `git version --build-options` reports being built from git_osx_installer's HEAD. This commit fixes that, and also suppresses the error when no repository could be found. Changes since v1: - the commit message now sports an explanatory paragraph, copy-edited from Peff's reply. Johannes Schindelin (1): Makefile: fix the "built from commit" code Makefile | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) base-commit: e3331758f12da22f4103eec7efe1b5304a9be5e9 Published-As: https://github.com/gitgitgadget/git/releases/tags/pr-7%2Fdscho%2Ffix-build-options-commit-info-v2 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-7/dscho/fix-build-options-commit-info-v2 Pull-Request: https://github.com/gitgitgadget/git/pull/7 Range-diff vs v1: 1: e0e41d0b8 ! 1: aca087479 Makefile: fix the "built from commit" code @@ -15,6 +15,11 @@ Except that we got the quoting wrong, and that variable therefore does not have the desired effect. +The issue is that the $(shell) is resolved before the output is stuffed +into the command-line with -DGIT_BUILT_FROM_COMMIT, and therefore is +*not* inside quotes. And thus backslashing the quotes is wrong, as the +quote gets literally inserted into the CEILING_DIRECTORIES variable. + Let's fix that quoting, and while at it, also suppress the unhelpful message -- gitgitgadget
Re: [PATCH 1/1] Makefile: fix the "built from commit" code
Hi Peff, On Thu, 28 Jun 2018, Jeff King wrote: > On Thu, Jun 28, 2018 at 10:27:32AM -0700, Junio C Hamano wrote: > > > Johannes Schindelin writes: > > > > >> I.e.: > > >> > > >> FOO='with spaces' > > >> BAR=$FOO sh -c 'echo $BAR' > > >> > > >> works just fine. > > > > > > $ x="two spaces" > > > > > > $ echo $x > > > two spaces > > > > > > Maybe we should quote a little bit more religiously. > > > > Both of you are wrong ;-) Technically, you did not contradict anything I said. > > Of course, the lack of dq around echo's argument makes shell split > > two and spaces into two args and feed them separately to echo, and > > causes echo to show them with a single SP in between. Peff's > > exampel should have been > > > > BAR=$FOO sh -c 'echo "$BAR"' > > Yes, that's a better example. I was primarily trying to show that the > outer shell did not barf with "spaces: command not found". > > > But that does not have much to do with the primary point Peff was > > talking about, which is that in this sequence: > > > > $ x="two spaces" > > $ y="$x" > > $ z=$x > > $ echo "x=<$x>" "y=<$y>" "z=<$z>" > > > > assignment to y and z behave identically, i.e. dq around "$x" when > > assigning to y is not needed. > > I actually had to test it to convince myself that one-shot assignments > behaved the same way, but they do. The mere fact that you had to test it out to convince yourself suggests to me that my suspicion "Maybe we should quote a little bit more religiously" was 100% spot on. After all, almost *none* of the reviews on this mailing list involve anything like "testing it out". It happens in the mailing program, and it stays in the mailing program. Ciao, Dscho
Re: [RFC PATCH v5] Implement --first-parent for git rev-list --bisect
Hi Junio, On Thu, 28 Jun 2018, Junio C Hamano wrote: > What I meant by "many separte grep calls" was to contrast these two > approaches: > > * Have one typical output spelled out as "expect", take an output >from an actual run into "actual", make them comparable and then >do a compare (which does not use grep; it uses sort in the >"making comparable" phase) > > * Not have any typical output, take an output from an actual run, >and have _code_ that inspects the output encode the rule over the >output (e.g. "these must exist" is inspected with many grep >invocations) > > Two things the "output must have 9 entries, and these 9 must be > mentioned" we see at the beginning of this message does not verify > are (1) exact dist value given to each of these entries and (2) the > order in which these entries appear in the output. The latter is > something we document, and the test should cover. The former does > not have to be cast in stone (i.e. I do not think it does not make a > difference to label the edge commits with dist=1 or dist=0 as long > as everything is consistent), but if there is no strong reason to > keep it possible for us to later change how the numbers are assigned, > I am OK if the test cast it in stone. > > Another reason (other than "many separate invocation of grep") to > favor the former approach (i.e. use real-looking expected output, > munge it and the real output into comparable forms and then compare) > is that it is easier to see what aspect of the output we care (and > we do not care) about. > > It is harder to see the omission of exact dist value and ordering of > entries in the outpu in the latter approach, and more importantly, > know if the omission was deliberate (e.g. it was merely an example) > or a mere mistake. > > With "using a real-looking expected output, make it and the actual > output comparable and then compare" approach, the aspects in the > output we choose not to care about will show in the "make them > comparable" munging. If we do not care the exact dist values, there > would be something like s/dist=[0-9]*/dist=X/ to mask the exact > value before making the two comparable to see that the expect and > the actual files have the same entries. If we still do care about > the entries are ordered by the dist values, there would be something > that sorts the entries with the actual dist values before doing that > masking to ensure if the order is correct. The problem here is of course that you *cannot* set the exact output in stone, because of sorting instabilities. So you have to play tricks to sort (twice, with different keys) the expected output and the actual output, to verify that all the expected commits are listed (and only those) and to verify that they are ordered by the distance, in descending order. And this trick, while it still makes the test correct and stable and yadda yadda, *also* makes this trick a lot less readable than my version. And therefore makes it more difficult to verify that your test actually does what it is supposed to do. Ciao, Dscho
ag/rebase-i-append-todo-help, was Re: What's cooking in git.git (Jun 2018, #07; Thu, 28)
Hi Junio, Le 28/06/2018 à 23:40, Junio C Hamano a écrit : > * ag/rebase-i-append-todo-help (2018-06-14) 2 commits > - rebase--interactive: rewrite append_todo_help() in C > - Merge branch 'ag/rebase-p' into ag/rebase-i-append-todo-help > (this branch is used by ag/rebase-i-rewrite-todo.) > > Stepwise rewriting of the machinery of "rebase -i" into C continues. > > Needs a reroll. > cf. > > As I moved append_todo_help() to `rebase-interactive.c`, it should be because I did not changed `msg = _(…)` to `msg = N_(…)`. It was pointed out on IRC that it was not necessary, after all[0]. Basically, N_() is needed for static variables, not for "dynamic" variables like `msg` in append_todo_help(). [0] http://colabti.org/irclogger/irclogger_log/git-devel?date=2018-06-26#l44 Cheers, Alban
fast-import slowness when importing large files with small differences
Hi, I noticed some slowness when fast-importing data from the Firefox mercurial repository, where fast-import spends more than 5 minutes importing ~2000 revisions of one particular file. I reduced a testcase while still using real data. One could synthesize data with kind of the same properties, but I figured real data could be useful. To reproduce: $ git clone https://gist.github.com/b6b8edcff2005cc482cf84972adfbba9.git foo $ git init bar $ cd bar $ python ../foo/import.py ../foo/data.gz | git fast-import --depth=2000 (--depth=2000 to minimize the pack size) The python script doesn't have much overhead: $ time python ../foo/import.py ../foo/data.gz > /dev/null real0m14.564s user0m9.813s sys 0m4.703s It generates about 26GB of data from that 4.2MB data.gz. $ python ../foo/import.py ../foo/data.gz | time git fast-import --depth=2000 git-fast-import statistics: - Alloc'd objects: 5000 Total objects: 1868 ( 133 duplicates ) blobs : 1868 ( 133 duplicates 1867 deltas of 1868 attempts) trees :0 ( 0 duplicates 0 deltas of 0 attempts) commits:0 ( 0 duplicates 0 deltas of 0 attempts) tags :0 ( 0 duplicates 0 deltas of 0 attempts) Total branches: 0 ( 0 loads ) marks: 1024 ( 0 unique) atoms: 0 Memory total: 2282 KiB pools: 2048 KiB objects: 234 KiB - pack_report: getpagesize()= 4096 pack_report: core.packedGitWindowSize = 1073741824 pack_report: core.packedGitLimit = 35184372088832 pack_report: pack_used_ctr= 0 pack_report: pack_mmap_calls = 0 pack_report: pack_open_windows= 0 / 0 pack_report: pack_mapped = 0 / 0 - 321.61user 6.60system 5:50.08elapsed 93%CPU (0avgtext+0avgdata 83192maxresident)k 0inputs+10568outputs (0major+38689minor)pagefaults 0swaps (The resulting pack is 5.3MB, fwiw) Obviously, sha1'ing 26GB is not going to be free, but it's also not the dominating cost, according to perf: 63.52% git-fast-import git-fast-import [.] create_delta_index 17.46% git-fast-import git-fast-import [.] sha1_compression_states 9.89% git-fast-import git-fast-import [.] ubc_check 6.23% git-fast-import git-fast-import [.] create_delta 2.49% git-fast-import git-fast-import [.] sha1_process That's a whole lot of time spent on create_delta_index. FWIW, if delta was 100% free (yes, I tested that), the fast-import would take 1:40 with the following profile: 58.74% git-fast-import git-fast-import [.] sha1_compression_states 32.45% git-fast-import git-fast-import [.] ubc_check 8.25% git-fast-import git-fast-import [.] sha1_process I toyed with the idea of eliminating common head and tail before creating the delta, and got some promising result: a fast-import taking 3:22 instead of 5:50, with the following profile: 34.67% git-fast-import git-fast-import [.] create_delta_index 30.88% git-fast-import git-fast-import [.] sha1_compression_states 17.15% git-fast-import git-fast-import [.] ubc_check 7.25% git-fast-import git-fast-import [.] store_object 4.47% git-fast-import git-fast-import [.] sha1_process 2.72% git-fast-import git-fast-import [.] create_delta2 The resulting pack is however much larger (for some reason, many objects are left non-deltaed), and the deltas are partly broken (they don't apply cleanly), but that just tells the code is not ready to be sent. I don't expect working code would be much slower than this. The remaining question is whether this is beneficial for more normal cases. I also seemed to remember when I tested a while ago, that somehow xdiff handles those files faster than diff-delta, and I'm wondering if it would make sense to to make the pack code use xdiff. So I tested replacing diff_delta with a call to xdi_diff_outf with a callback that does nothing and zeroed out xpparam_t and xdemitconf_t (not sure that's best, though, I haven't looked very deeply), and that finished in 5:15 with the following profile (without common head trimming, xdiff-interface apparently does common tail trimming): 32.99% git-fast-import git-fast-import [.] xdl_prepare_ctx.isra.0 20.42% git-fast-import git-fast-import [.] sha1_compression_states 15.26% git-fast-import git-fast-import [.] xdl_hash_record 11.65% git-fast-import git-fast-import [.] ubc_check 3.09% git-fast-import git-