Re: [PATCH 0/5] Add a new "sparse" tree walk algorithm
On 11/28/2018 5:18 PM, Ævar Arnfjörð Bjarmason wrote: This is really interesting. I tested this with: diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 124b1bafc4..5c7615f06c 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -3143 +3143 @@ static void get_object_list(int ac, const char **av) - mark_edges_uninteresting(, show_edge, sparse); + mark_edges_uninteresting(, show_edge, 1); To emulate having a GIT_TEST_* mode for this, which seems like a good idea since it turned up a lot of segfaults in pack-objects. I wasn't able to get a backtrace for that since it always happens indirectly, and I didn't dig enough to see how to manually invoke it the right way. Thanks for double-checking this. I had run a similar test in my prototype implementation, but over-simplified some code when rewriting it for submission (and then forgot to re-run that test). Specifically, these null checks are important: diff --git a/list-objects.c b/list-objects.c index 9bb93d1640..7e864b4db8 100644 --- a/list-objects.c +++ b/list-objects.c @@ -232,6 +232,10 @@ static void add_edge_parents(struct commit *commit, for (parents = commit->parents; parents; parents = parents->next) { struct commit *parent = parents->item; struct tree *tree = get_commit_tree(parent); + + if (!tree) + continue; + oidset_insert(set, >object.oid); if (!(parent->object.flags & UNINTERESTING)) @@ -261,6 +265,8 @@ void mark_edges_uninteresting(struct rev_info *revs, if (sparse) { struct tree *tree = get_commit_tree(commit); + if (!tree) + continue; if (commit->object.flags & UNINTERESTING) tree->object.flags |= UNINTERESTING; I will definitely include a GIT_TEST_* variable in v2. Thanks, -Stolee
Re: [PATCH 0/5] Add a new "sparse" tree walk algorithm
On Wed, Nov 28 2018, Derrick Stolee via GitGitGadget wrote: > One of the biggest remaining pain points for users of very large > repositories is the time it takes to run 'git push'. We inspected some slow > pushes by our developers and found that the "Enumerating Objects" phase of a > push was very slow. This is unsurprising, because this is why reachability > bitmaps exist. However, reachability bitmaps are not available to us because > of the single pack-file requirement. The bitmap approach is intended for > servers anyway, and clients have a much different behavior pattern. > > Specifically, clients are normally pushing a very small number of objects > compared to the entire working directory. A typical user changes only a > small cone of the working directory, so let's use that to our benefit. > > Create a new "sparse" mode for 'git pack-objects' that uses the paths that > introduce new objects to direct our search into the reachable trees. By > collecting trees at each path, we can then recurse into a path only when > there are uninteresting and interesting trees at that path. This gains a > significant performance boost for small topics while presenting a > possibility of packing extra objects. > > The main algorithm change is in patch 4, but is set up a little bit in > patches 1 and 2. > > As demonstrated in the included test script, we see that the existing > algorithm can send extra objects due to the way we specify the "frontier". > But we can send even more objects if a user copies objects from one folder > to another. I say "copy" because a rename would (usually) change the > original folder and trigger a walk into that path, discovering the objects. > > In order to benefit from this approach, the user can opt-in using the > pack.useSparse config setting. This setting can be overridden using the > '--no-sparse' option. This is really interesting. I tested this with: diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 124b1bafc4..5c7615f06c 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -3143 +3143 @@ static void get_object_list(int ac, const char **av) - mark_edges_uninteresting(, show_edge, sparse); + mark_edges_uninteresting(, show_edge, 1); To emulate having a GIT_TEST_* mode for this, which seems like a good idea since it turned up a lot of segfaults in pack-objects. I wasn't able to get a backtrace for that since it always happens indirectly, and I didn't dig enough to see how to manually invoke it the right way.
[PATCH 0/5] Add a new "sparse" tree walk algorithm
One of the biggest remaining pain points for users of very large repositories is the time it takes to run 'git push'. We inspected some slow pushes by our developers and found that the "Enumerating Objects" phase of a push was very slow. This is unsurprising, because this is why reachability bitmaps exist. However, reachability bitmaps are not available to us because of the single pack-file requirement. The bitmap approach is intended for servers anyway, and clients have a much different behavior pattern. Specifically, clients are normally pushing a very small number of objects compared to the entire working directory. A typical user changes only a small cone of the working directory, so let's use that to our benefit. Create a new "sparse" mode for 'git pack-objects' that uses the paths that introduce new objects to direct our search into the reachable trees. By collecting trees at each path, we can then recurse into a path only when there are uninteresting and interesting trees at that path. This gains a significant performance boost for small topics while presenting a possibility of packing extra objects. The main algorithm change is in patch 4, but is set up a little bit in patches 1 and 2. As demonstrated in the included test script, we see that the existing algorithm can send extra objects due to the way we specify the "frontier". But we can send even more objects if a user copies objects from one folder to another. I say "copy" because a rename would (usually) change the original folder and trigger a walk into that path, discovering the objects. In order to benefit from this approach, the user can opt-in using the pack.useSparse config setting. This setting can be overridden using the '--no-sparse' option. Derrick Stolee (5): revision: add mark_tree_uninteresting_sparse list-objects: consume sparse tree walk pack-objects: add --sparse option revision: implement sparse algorithm pack-objects: create pack.useSparse setting Documentation/git-pack-objects.txt | 9 +- bisect.c | 2 +- builtin/pack-objects.c | 9 +- builtin/rev-list.c | 2 +- http-push.c| 2 +- list-objects.c | 51 ++- list-objects.h | 4 +- revision.c | 113 +++ revision.h | 2 + t/t5322-pack-objects-sparse.sh | 139 + 10 files changed, 323 insertions(+), 10 deletions(-) create mode 100755 t/t5322-pack-objects-sparse.sh base-commit: a1598010f775d82b5adf12c29d0f5bc9b41434c6 Published-As: https://github.com/gitgitgadget/git/releases/tags/pr-89%2Fderrickstolee%2Fpush%2Fsparse-v1 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-89/derrickstolee/push/sparse-v1 Pull-Request: https://github.com/gitgitgadget/git/pull/89 -- gitgitgadget