Nikhil Benesch <nikhil.bene...@gmail.com> writes:

> This commit is a revival of Lars Hjemli's 2009 patch to provide an
> option to include submodules in the output of `git archive`.
>
> The `--recurse-submodules` option (named consistently with fetch, clone,
> and ls-files) will recursively traverse submodules in the repository and
> consider their contents for inclusion in the output archive, subject to
> any pathspec filters. Like other commands that have learned
> `--recurse-submodules`, submodules that have not been checked out will
> not be traversed.
>
> Signed-off-by: Nikhil Benesch <nikhil.bene...@gmail.com>
> ---

I'll let others comment on the proposed log message.

> diff --git a/Documentation/git-archive.txt b/Documentation/git-archive.txt
> index cfa1e4ebe..f223f9e05 100644
> --- a/Documentation/git-archive.txt
> +++ b/Documentation/git-archive.txt
> @@ -11,8 +11,8 @@ SYNOPSIS
>  [verse]
>  'git archive' [--format=<fmt>] [--list] [--prefix=<prefix>/] [<extra>]
>             [-o <file> | --output=<file>] [--worktree-attributes]
> -           [--remote=<repo> [--exec=<git-upload-archive>]] <tree-ish>
> -           [<path>...]
> +           [--recurse-submodules] [--remote=<repo> 
> [--exec=<git-upload-archive>]]
> +           <tree-ish> [<path>...]

With the reflowing of lines it was not immediately obvious, but the
only change is to addd [--recurse-submodules] in front of the
existing [--remote=<repo>] option.  It would have probably made more
sense to add new things at the end, just because these options can
be given in any order.

> +--recurse-submodules::
> +     Recursively include the contents of any checked-out submodules in
> +     the archive.

OK.  

A question to the submodule folks: Is "checked-out" the right terminology
in this context?  I am wondering if we have reached more official set
of words to express submodule states discussed in [*1*].

> -     if (S_ISDIR(mode) || S_ISGITLINK(mode)) {
> -             if (args->verbose)
> -                     fprintf(stderr, "%.*s\n", (int)path.len, path.buf);
> -             err = write_entry(args, sha1, path.buf, path.len, mode);
> -             if (err)
> -                     return err;
> -             return (S_ISDIR(mode) ? READ_TREE_RECURSIVE : 0);
> -     }
> -
>       if (args->verbose)
>               fprintf(stderr, "%.*s\n", (int)path.len, path.buf);
> -     return write_entry(args, sha1, path.buf, path.len, mode);
> +     err = write_entry(args, sha1, path.buf, path.len, mode);
> +     if (err)
> +             return err;
> +     if (S_ISDIR(mode) || (S_ISGITLINK(mode) && args->recurse_submodules &&
> +                           !add_submodule_odb(path_without_prefix)))
> +             return READ_TREE_RECURSIVE;
> +     return 0;

So, we used to (and without the option we still) stop recursing when
we see a gitlink after asking write_entry() to write the path itself
out, but with the option, we keep going.

How does pathspec limiting work with this codepath?  If you have a
superproject with a regular file "sub/README" that belongs to the
superproject itself, and a submodule at "sub/module/", what makes
sure that this command

    $ git archive --recurse-submodules HEAD -- 'sub/R*' 'sub/module/lib/'

looks only in 'lib/' part of the submodule?

> diff --git a/t/t5005-archive-submodules.sh b/t/t5005-archive-submodules.sh
> new file mode 100755
> index 000000000..747e38627
> --- /dev/null
> +++ b/t/t5005-archive-submodules.sh
> @@ -0,0 +1,112 @@
> +#!/bin/sh
> +
> +test_description='git archive can include submodule content'
> +
> +. ./test-lib.sh
> +
> +add_file()
> +{

Style.  A shell function begins like so (see Documentation/CodingGuidelines):

        add_file () {

> +     git add $1 &&

Put dq around "$1" to make it easier to reuse, even if you know you
only happen to use pathnames without any $IFS whitespace characters.

I think all of these comments apply to the other shell function you
added to this file..

> +test_expect_success 'by default, submodules are not included' '
> +     echo "File 1" >1 &&
> +     add_file 1 &&
> +     add_submodule 2 3 &&
> +     add_submodule 4 5 &&
> +     cat <<EOF >expected &&
> +1
> +2/
> +4/
> +EOF

Use <<-EOF to allow you to indent your here-doc data.  Also when
your here-doc data does not need any $variable_interpolation, quote
the end marker to help reduce reviewer's mental burden, i.e.

        cat <<-\EOF >expected &&
        1
        2/
        4/
        EOF

The same comment applies to other tests (I won't even repeat this).

> +     git archive HEAD >normal.tar &&
> +     tar -tf normal.tar >actual &&

Lose "-" from "-tf"; that's the canonical and the most portable way
to spell the option to "tar".

> +test_expect_success 'packed submodules are supported' '
> +     msg=$(cd 2 && git repack -ad && git count-objects) &&
> +     test "$msg" = "0 objects, 0 kilobytes" &&
> +     git archive --recurse-submodules HEAD >packed.tar &&
> +     tar -tf packed.tar >actual &&
> +     test_cmp expected actual
> +'

I am not sure how packing would be expected to break anything, but
an extra test may not hurt.

I notice that you only test the case where the submodule's ".git" 
is embedded in its working tree (as opposed to ".git" being a
"gitdir:" file that points at the real location of the repository,
somewhere inside ".git/modules/" of the superproject).  You probably
want to make sure that also works, too.

I also notice that you do not use any pathspec in your "git archive"
invocation.  You'd want to make sure that works as expected.

Thanks.


[Reference]

*1* <20170209020855.23486-1-sbel...@google.com>

Reply via email to