Kevin Wern <[email protected]> writes:
> Use transport_download_primer and transport_prime_clone in git clone.
> This only supports a fully connected packfile.
>
> transport_prime_clone and transport_download_primer are executed
> completely independent of transport_(get|fetch)_remote_refs, et al.
> transport_download_primer is executed based on the existence of an
> alt_resource. The idea is that the "prime clone" execution should be
> able to attempt retrieving an alternate resource without dying, as
> opposed to depending on the result of upload pack's "capabilities" to
> indicate whether or not the client can attempt it.
>
> If a resumable resource is available, execute a codepath with the
> following modular components:
> - downloading resource to a specific directory
> - using the resource (for pack, indexing and generating the bundle
> file)
> - cleaning up the resource (if the download or use fails)
> - cleaning up the resource (if the download or use succeeds)
>
> If resume is interrupted on the client side, the alternate resource
> info is written to the RESUMABLE file in the git directory.
>
> On resume, the required info is extracted by parsing the created
> config file, and that info is used to determine the work and git
> directories. If these cannot be determined, the program exits.
> The writing of the refspec and determination of the initial git
> directories are skipped, along with transport_prime_clone.
>
> The main purpose of this series of patches is to flesh out a codepath
> for automatic resuming, manual resuming, and leaving a resumable
> directory on exit--the logic for when to do these still needs more
> work.
>
> Signed-off-by: Kevin Wern <[email protected]>
> ---
> Documentation/git-clone.txt | 16 ++
> builtin/clone.c | 590
> +++++++++++++++++++++++++++++++++++++-------
> t/t9904-git-prime-clone.sh | 181 ++++++++++++++
> 3 files changed, 698 insertions(+), 89 deletions(-)
> create mode 100755 t/t9904-git-prime-clone.sh
>
> diff --git a/Documentation/git-clone.txt b/Documentation/git-clone.txt
> index b7c467a..5934bb6 100644
> --- a/Documentation/git-clone.txt
> +++ b/Documentation/git-clone.txt
> @@ -16,6 +16,7 @@ SYNOPSIS
> [--depth <depth>] [--[no-]single-branch]
> [--recursive | --recurse-submodules] [--] <repository>
> [<directory>]
> +'git clone --resume <resumable_dir>'
>
> DESCRIPTION
> -----------
> @@ -172,6 +173,12 @@ objects from the source repository into a pack in the
> cloned repository.
> via ssh, this specifies a non-default path for the command
> run on the other end.
>
> +--prime-clone <prime-clone>::
> +-p <prime-clone>::
Not many other options have single letter shorthand. Is it expected
that it is worth to let this option squat on a short-and-sweet "-p",
perhaps because it is so frequently used?
> +--resume::
> + Resume a partially cloned repo in a "resumable" state. This
> + can only be specified with a single local directory (<resumable
> + dir>). This is incompatible with all other options.
> +
> +<resumable_dir>::
> + The directory of the partial clone. This could be either the
> + work directory or the git directory.
I think these should be described this way:
--resume <resumable_dir>::
description if what resume option does and how resumable_dir
is used by the option.
in a single bullet point.
> diff --git a/builtin/clone.c b/builtin/clone.c
> index 9ac6c01..d9a13dc 100644
> --- a/builtin/clone.c
> +++ b/builtin/clone.c
> @@ -8,7 +8,9 @@
> * Clone a repository into a different directory that does not yet exist.
> */
>
> +#include "cache.h"
> #include "builtin.h"
I do not think you need to include cache.h if you are including
builtin.h; Documentation/CodingGuidelines says:
- The first #include in C files, except in platform specific compat/
implementations, must be either "git-compat-util.h", "cache.h" or
"builtin.h". You do not have to include more than one of these.
> @@ -40,17 +42,20 @@ static const char * const builtin_clone_usage[] = {
>
> static int option_no_checkout, option_bare, option_mirror,
> option_single_branch = -1;
> static int option_local = -1, option_no_hardlinks, option_shared,
> option_recursive;
> +static int option_resume;
> static char *option_template, *option_depth;
> -static char *option_origin = NULL;
> +static const char *option_origin = NULL;
Is this change related to anything you are doing here?
If you are fixing things while at it, please don't ;-) If you really
want to, please also remove " = NULL", from this line and also from
the next line. Also do not add " = NULL" at the end of alt_res.
> static char *option_branch = NULL;
> ...
> +static const struct alt_resource *alt_res = NULL;
> +static char *get_filename(const char *dir)
> +{
> + char *dir_copy = xstrdup(dir);
> + strip_trailing_slashes(dir_copy);
> + char *filename, *final = NULL;
> +
> + filename = find_last_dir_sep(dir);
> +
> + if (filename && *(++filename))
> + final = xstrdup(filename);
> +
> + free(dir_copy);
> + return final;
> +}
Hmph, don't we have our own basename(3) lookalike that knows about
dir-sep already?
> @@ -451,6 +475,7 @@ static const char *junk_work_tree;
> static const char *junk_git_dir;
> static enum {
> JUNK_LEAVE_NONE,
> + JUNK_LEAVE_RESUMABLE,
> JUNK_LEAVE_REPO,
> JUNK_LEAVE_ALL
> } junk_mode = JUNK_LEAVE_NONE;
> @@ -460,6 +485,29 @@ N_("Clone succeeded, but checkout failed.\n"
> "You can inspect what was checked out with 'git status'\n"
> "and retry the checkout with 'git checkout -f HEAD'\n");
>
> +static const char junk_leave_resumable_msg[] =
> +N_("Clone interrupted while copying resumable resource.\n"
> + "Try using 'git clone --resume <new_directory>',\n"
> + "where <new_directory> is either the new working \n"
> + "directory or git directory.\n\n"
> + "If this does not succeed, it could be because the\n"
> + "resource has been moved, corrupted, or changed.\n"
> + "If this is the case, you should remove <new_directory>\n"
> + "and run the original command.\n");
> +
> +static void write_resumable_resource()
> +{
> + const char *filename = git_path_resumable();
> + struct strbuf content = STRBUF_INIT;
> + strbuf_addf(&content, "%s\n%s\n", alt_res->url, alt_res->filetype);
> + int fd = open(filename, O_WRONLY | O_CREAT, 0666);
> + if (fd < 0)
> + die_errno(_("Could not open '%s' for writing"), filename);
> + if (write_in_full(fd, content.buf, content.len) != content.len)
> + die_errno(_("Could not write to '%s'"), filename);
> + close(fd);
> +}
>
> static void remove_junk(void)
> {
> struct strbuf sb = STRBUF_INIT;
> @@ -467,7 +515,11 @@ static void remove_junk(void)
> switch (junk_mode) {
> case JUNK_LEAVE_REPO:
> warning("%s", _(junk_leave_repo_msg));
> - /* fall-through */
> + return;
> + case JUNK_LEAVE_RESUMABLE:
> + write_resumable_resource();
> + warning("%s", _(junk_leave_resumable_msg));
> + return;
Nice.
> @@ -562,7 +614,7 @@ static void write_remote_refs(const struct ref
> *local_refs)
> die("%s", err.buf);
>
> for (r = local_refs; r; r = r->next) {
> - if (!r->peer_ref)
> + if (!r->peer_ref || ref_exists(r->peer_ref->name))
> continue;
> if (ref_transaction_create(t, r->peer_ref->name,
> r->old_oid.hash,
> 0, NULL, &err))
What is this change about?
> @@ -820,11 +872,296 @@ static void dissociate_from_references(void)
> free(alternates);
> }
>
> +static int do_index_pack(const char *in_pack_file, const char *out_idx_file)
> +{
> + const char *argv[] = { "index-pack", "--clone-bundle", "-v",
> + "--check-self-contained-and-connected", "-o",
> + out_idx_file, in_pack_file, NULL };
> + return run_command_v_opt(argv, RUN_GIT_CMD|RUN_COMMAND_NO_STDOUT);
> +}
That looks vaguely familiar ;-)
> +static const char *setup_and_index_pack(const char *filename)
> +{
> + const char *primer_idx_path = NULL, *primer_bndl_path = NULL;
> + primer_idx_path = replace_extension(filename, ".pack", ".idx");
> + primer_bndl_path = replace_extension(filename, ".pack", ".bndl");
> +
> + if (!(primer_idx_path && primer_bndl_path)) {
> + warning("invalid pack filename '%s', falling back to full "
> + "clone", filename);
> + return NULL;
> + }
> +
> + if (!file_exists(primer_bndl_path)) {
> + if (do_index_pack(filename, primer_idx_path)) {
> + warning("could not index primer pack, falling back to "
> + "full clone");
> + return NULL;
> + }
> + }
Can it be another (undetected) failure mode that .bndl somehow
already existed, but not .idx, leaving the resulting object store in
an incosistent state? Can do_index_pack() fail and leave .bndl
behind to get you into such a state?
> +static int write_bundle_refs(const char *bundle_filename)
> +{
> + struct ref_transaction *t;
> + struct bundle_header history_tips;
> + const char *temp_ref_base = "resume";
> + struct strbuf err = STRBUF_INIT;
> + int i;
> +
> + init_bundle_header(&history_tips, bundle_filename);
> + read_bundle_header(&history_tips);
> +
> + t = ref_transaction_begin(&err);
> + for (i = 0; i < history_tips.references.nr; i++) {
> + struct strbuf ref_name = STRBUF_INIT;
> + strbuf_addf(&ref_name, "refs/temp/%s/%s/temp-%s",
> + option_origin, temp_ref_base,
> + sha1_to_hex(history_tips.references.list[i].sha1));
Can we do this without polluting refs/temp/ namespace?
I am imagining that you are first fetching the .pack file from
sideways when primer service is available, running index-pack on it
to produce the bundle, and the step after that is to run "git fetch"
against the original remote to fill the gap between the bit-stale
history you got in the bundle and the reality that has progressed
since the primer pack was made, and you need a way to tell to the
other end that you already have the history leading to these refs
when you run "git fetch". I think a bit better way to do so is to
send these has ".have" while you run the "fetch".
Wouldn't it do if you add the "--advertise-bundle-tips=<bndl>"
option to "git fetch", move the code to read the bundle header to
it, and point the bundle's filename with the option when you spawn
"git fetch"?