In a partial clone, whenever a sparse checkout occurs, the existence of
all blobs in the index is verified, whether they are included or
excluded by the .git/info/sparse-checkout specification. This
significantly degrades performance because a lazy fetch occurs whenever
the existence of a missing blob is checked.

This is because cache_tree_update() checks the existence of all objects
in the index, whether or not CE_SKIP_WORKTREE is set on them. Teach
cache_tree_update() to skip checking CE_SKIP_WORKTREE objects when the
repository is a partial clone. This improves performance for sparse
checkout and also other operations that use cache_tree_update().

Instead of completely removing the check, an argument could be made that
the check should instead be replaced by a check that the blob is
promised, but for performance reasons, I decided not to do this.
If the user needs to verify the repository, it can be done using fsck
(which will notify if a tree points to a missing and non-promised blob,
whether the blob is included or excluded by the sparse-checkout
specification).

Signed-off-by: Jonathan Tan <jonathanta...@google.com>
---
Changes from v1:

After feedback, I restricted this to partial clone. Once restricted, I
agree with Ben that this can be done for all users of
cache_tree_update(), not just unpack-trees, so I have removed the
ability to control the behavior using a flag.

I also took the opportunity to simplify the missing check by using a
variable.
---
 cache-tree.c                     |  6 +++++-
 t/t1090-sparse-checkout-scope.sh | 33 ++++++++++++++++++++++++++++++++
 2 files changed, 38 insertions(+), 1 deletion(-)

diff --git a/cache-tree.c b/cache-tree.c
index 5ce51468f0..f210481f9b 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -326,6 +326,7 @@ static int update_one(struct cache_tree *it,
                unsigned mode;
                int expected_missing = 0;
                int contains_ita = 0;
+               int ce_missing_ok;
 
                path = ce->name;
                pathlen = ce_namelen(ce);
@@ -355,8 +356,11 @@ static int update_one(struct cache_tree *it,
                        i++;
                }
 
+               ce_missing_ok = mode == S_IFGITLINK || missing_ok ||
+                       (repository_format_partial_clone &&
+                        ce_skip_worktree(ce));
                if (is_null_oid(oid) ||
-                   (mode != S_IFGITLINK && !missing_ok && 
!has_object_file(oid))) {
+                   (!ce_missing_ok && !has_object_file(oid))) {
                        strbuf_release(&buffer);
                        if (expected_missing)
                                return -1;
diff --git a/t/t1090-sparse-checkout-scope.sh b/t/t1090-sparse-checkout-scope.sh
index 25d7c700f6..090b7fc3d3 100755
--- a/t/t1090-sparse-checkout-scope.sh
+++ b/t/t1090-sparse-checkout-scope.sh
@@ -63,4 +63,37 @@ test_expect_success 'return to full checkout of master' '
        test "$(cat b)" = "modified"
 '
 
+test_expect_success 'in partial clone, sparse checkout only fetches needed 
blobs' '
+       test_create_repo server &&
+       git clone "file://$(pwd)/server" client &&
+
+       test_config -C server uploadpack.allowfilter 1 &&
+       test_config -C server uploadpack.allowanysha1inwant 1 &&
+       echo a >server/a &&
+       echo bb >server/b &&
+       mkdir server/c &&
+       echo ccc >server/c/c &&
+       git -C server add a b c/c &&
+       git -C server commit -m message &&
+
+       test_config -C client core.sparsecheckout 1 &&
+       test_config -C client extensions.partialclone origin &&
+       echo "!/*" >client/.git/info/sparse-checkout &&
+       echo "/a" >>client/.git/info/sparse-checkout &&
+       git -C client fetch --filter=blob:none origin &&
+       git -C client checkout FETCH_HEAD &&
+
+       git -C client rev-list HEAD \
+               --quiet --objects --missing=print >unsorted_actual &&
+       (
+               printf "?" &&
+               git hash-object server/b &&
+               printf "?" &&
+               git hash-object server/c/c
+       ) >unsorted_expect &&
+       sort unsorted_actual >actual &&
+       sort unsorted_expect >expect &&
+       test_cmp expect actual
+'
+
 test_done
-- 
2.19.0.271.gfe8321ec05.dirty

Reply via email to