On 8/23/2019 7:17 PM, Elijah Newren wrote:
> On Tue, Aug 20, 2019 at 8:12 AM Derrick Stolee via GitGitGadget
> <[email protected]> wrote:
>>
>> From: Derrick Stolee <[email protected]>
>>
>> When someone wants to clone a large repository, but plans to work
>> using a sparse-checkout file, they either need to do a full
>> checkout first and then reduce the patterns they included, or
>> clone with --no-checkout, set up their patterns, and then run
>> a checkout manually. This requires knowing a lot about the repo
>> shape and how sparse-checkout works.
>>
>> Add a new '--sparse' option to 'git clone' that initializes the
>> sparse-checkout file to include the following patterns:
>>
>> /*
>> !/*/*
>>
>> These patterns include every file in the root directory, but
>> no directories. This allows a repo to include files like a
>> README or a bootstrapping script to grow enlistments from that
>> point.
>
> Nice.
>
>>
>> Signed-off-by: Derrick Stolee <[email protected]>
>> ---
>> Documentation/git-clone.txt | 8 +++++++-
>> builtin/clone.c | 27 +++++++++++++++++++++++++++
>> t/t1091-sparse-checkout-builtin.sh | 13 +++++++++++++
>> 3 files changed, 47 insertions(+), 1 deletion(-)
>>
>> diff --git a/Documentation/git-clone.txt b/Documentation/git-clone.txt
>> index 34011c2940..0fe91d2f04 100644
>> --- a/Documentation/git-clone.txt
>> +++ b/Documentation/git-clone.txt
>> @@ -15,7 +15,7 @@ SYNOPSIS
>> [--dissociate] [--separate-git-dir <git dir>]
>> [--depth <depth>] [--[no-]single-branch] [--no-tags]
>> [--recurse-submodules[=<pathspec>]] [--[no-]shallow-submodules]
>> - [--[no-]remote-submodules] [--jobs <n>] [--] <repository>
>> + [--[no-]remote-submodules] [--jobs <n>] [--sparse] [--] <repository>
>> [<directory>]
>>
>> DESCRIPTION
>> @@ -156,6 +156,12 @@ objects from the source repository into a pack in the cloned repository.
>> used, neither remote-tracking branches nor the related
>> configuration variables are created.
>>
>> +--sparse::
>> + Initialize the sparse-checkout file so the working
>> + directory starts with only the files in the root
>> + of the repository. The sparse-checkout file can be
>> + modified to grow the working directory as needed.
>> +
>> --mirror::
>> Set up a mirror of the source repository. This implies `--bare`.
>> Compared to `--bare`, `--mirror` not only maps local branches of the
>> diff --git a/builtin/clone.c b/builtin/clone.c
>> index f665b28ccc..d6d49a73ff 100644
>> --- a/builtin/clone.c
>> +++ b/builtin/clone.c
>> @@ -60,6 +60,7 @@ static const char *real_git_dir;
>> static char *option_upload_pack = "git-upload-pack";
>> static int option_verbosity;
>> static int option_progress = -1;
>> +static int option_sparse_checkout;
>> static enum transport_family family;
>> static struct string_list option_config = STRING_LIST_INIT_NODUP;
>> static struct string_list option_required_reference = STRING_LIST_INIT_NODUP;
>> @@ -147,6 +148,8 @@ static struct option builtin_clone_options[] = {
>> OPT_PARSE_LIST_OBJECTS_FILTER(&filter_options),
>> OPT_BOOL(0, "remote-submodules", &option_remote_submodules,
>> N_("any cloned submodules will use their remote-tracking branch")),
>> + OPT_BOOL(0, "sparse", &option_sparse_checkout,
>> + N_("initialize sparse-checkout file to include only files at root")),
>> OPT_END()
>> };
>>
>> @@ -734,6 +737,27 @@ static void update_head(const struct ref *our, const struct ref *remote,
>> }
>> }
>>
>> +static int git_sparse_checkout_init(const char *repo)
>> +{
>> + struct argv_array argv = ARGV_ARRAY_INIT;
>> + int result = 0;
>> + argv_array_pushl(&argv, "-C", repo, "sparse-checkout", "init", NULL);
>> +
>> + /*
>> + * We must apply the setting in the current process
>> + * for the later checkout to use the sparse-checkout file.
>> + */
>> + core_apply_sparse_checkout = 1;
>> +
>> + if (run_command_v_opt(argv.argv, RUN_GIT_CMD)) {
>> + error(_("failed to initialize sparse-checkout"));
>> + result = 1;
>> + }
>
> Sigh...so much forking of additional processes. I'd really rather
> that we were reducing how much of this we are doing in the codebase
> instead of adding more. Every fork makes following stuff in a
> debugger harder.

For now, running the builtin as a subprocess is the simplest way to do
this. The init subcommand does several things, and we can consider
moving that logic into a library method instead of builtin-specific
code later. The extra process is not a significant performance cost,
since "clone" runs only once per repo.

>> +
>> + argv_array_clear(&argv);
>> + return result;
>> +}
>> +
>> static int checkout(int submodule_progress)
>> {
>> struct object_id oid;
>> @@ -1107,6 +1131,9 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
>> if (option_required_reference.nr || option_optional_reference.nr)
>> setup_reference();
>>
>> + if (option_sparse_checkout && git_sparse_checkout_init(repo))
>> + return 1;
>> +
>> remote = remote_get(option_origin);
>>
>> strbuf_addf(&default_refspec, "+%s*:%s*", src_ref_prefix,
>> diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh
>> index 35ab84aabd..b7d5f15830 100755
>> --- a/t/t1091-sparse-checkout-builtin.sh
>> +++ b/t/t1091-sparse-checkout-builtin.sh
>> @@ -87,4 +87,17 @@ test_expect_success 'init with existing sparse-checkout' '
>> test_cmp expect dir
>> '
>>
>> +test_expect_success 'clone --sparse' '
>> + git clone --sparse repo clone &&
>> + git -C clone sparse-checkout list >actual &&
>> + cat >expect <<-EOF &&
>> + /*
>> + !/*/*
>> + EOF
>> + test_cmp expect actual &&
>> + ls clone >dir &&
>> + echo a >expect &&
>> + test_cmp expect dir
>
> Checking that a toplevel entry is present, but not checking that an
> entry from a subdir is missing as expected?

This test checks that the file "a" is the _only_ entry in the root of
the clone. Because the output of "ls clone" is compared against an
"expect" file that contains only "a", the test would fail if the
directories "folder1" or "folder2" were present.
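
For comparison, the manual workflow that the commit message describes
(clone with --no-checkout, write the patterns, then populate the
working directory) can be sketched as below. This is only an
illustration and not part of the patch: the file contents, the
throwaway identity passed to "git commit", and the use of
"git read-tree -mu HEAD" to populate the worktree are my assumptions,
chosen to mirror the "repo" fixture with "a", "folder1" and "folder2".

```shell
#!/bin/sh
# Illustrative sketch only; names mirror the test fixture, contents are made up.
set -e
work=$(mktemp -d)
cd "$work"

# Build a small source repository: one root file and two directories.
git init -q repo
mkdir repo/folder1 repo/folder2
echo a >repo/a
echo b >repo/folder1/a
echo c >repo/folder2/a
git -C repo add .
git -C repo -c user.name=Example -c user.email=example@example.com \
	commit -q -m "initial"

# Clone without populating the working directory.
git clone -q --no-checkout repo clone

# Enable sparse-checkout and write the two patterns by hand.
git -C clone config core.sparseCheckout true
mkdir -p clone/.git/info
printf '%s\n' '/*' '!/*/*' >clone/.git/info/sparse-checkout

# Populate the working directory according to the patterns.
git -C clone read-tree -mu HEAD

# Only root-level files should be listed here.
ls clone
```

With these patterns, the final "ls clone" should list only "a", which
is the same state the 'clone --sparse' test asserts.
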
Thanks,
-Stolee