[PATCH] submodule recursion in git-archive

2013-11-25 Thread Nick Townsend
All,
My first git patch - so shout out if I’ve got the etiquette wrong! Or of course 
if I’ve missed something.
I googled around looking for solutions to my problem but just came up with a 
few shell-scripts
that didn’t quite get the functionality I needed.
The first patch fixes some typos that crept in to existing doc and 
declarations. It is required
for the second which actually implements the changes.

All comments gratefully received!

Regards
Nick Townsend

Subject: [PATCH 1/2] submodule: add_submodule_odb() usability

Although add_submodule_odb() is documented as being
externally usable, it is declared static and also
has incorrect documentation.

This commit fixes those and makes no changes to
existing code using them. All tests still pass.
---
 Documentation/technical/api-ref-iteration.txt | 4 ++--
 submodule.c   | 2 +-
 submodule.h   | 1 +
 3 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/Documentation/technical/api-ref-iteration.txt 
b/Documentation/technical/api-ref-iteration.txt
index aa1c50f..cbee624 100644
--- a/Documentation/technical/api-ref-iteration.txt
+++ b/Documentation/technical/api-ref-iteration.txt
@@ -50,10 +50,10 @@ submodules object database. You can do this by a 
code-snippet like
 this:
 
const char *path = "path/to/submodule"
-   if (!add_submodule_odb(path))
+   if (add_submodule_odb(path))
die("Error submodule '%s' not populated.", path);
 
-`add_submodule_odb()` will return an non-zero value on success. If you
+`add_submodule_odb()` will return a zero value on success. If you
 do not do this you will get an error for each ref that it does not point
 to a valid object.
 
diff --git a/submodule.c b/submodule.c
index 1905d75..1ea46be 100644
--- a/submodule.c
+++ b/submodule.c
@@ -143,7 +143,7 @@ void stage_updated_gitmodules(void)
die(_("staging updated .gitmodules failed"));
 }
 
-static int add_submodule_odb(const char *path)
+int add_submodule_odb(const char *path)
 {
struct strbuf objects_directory = STRBUF_INIT;
struct alternate_object_database *alt_odb;
diff --git a/submodule.h b/submodule.h
index 7beec48..3e3cdca 100644
--- a/submodule.h
+++ b/submodule.h
@@ -41,5 +41,6 @@ int find_unpushed_submodules(unsigned char new_sha1[20], 
const char *remotes_nam
struct string_list *needs_pushing);
 int push_unpushed_submodules(unsigned char new_sha1[20], const char 
*remotes_name);
 void connect_work_tree_and_git_dir(const char *work_tree, const char *git_dir);
+int add_submodule_odb(const char *path);
 
 #endif
-- 
1.8.3.4 (Apple Git-47)

Subject: [PATCH 2/2] archive: allow submodule recursion on git-archive

When using git-archive to produce a dump of a
repository, the existing code does not recurse
into a submodule when it encounters it in the tree
traversal. These changes add a command line flag
that permits this.

Note that the submodules must be updated in the
repository, otherwise this cannot take place.

The feature is disabled for remote repositories as
the git_work_tree fails. This is a possible future
enhancement.

Two additional fields are added to archiver_args:
  * recurse  - a boolean indicator
  * treepath - the path part of the tree-ish
   eg. the 'www' in HEAD:www

The latter is used within the archive writer to
determin the correct path for the submodule .git
file.

Signed-off-by: Nick Townsend 
---
 Documentation/git-archive.txt |  9 +
 archive.c | 38 --
 archive.h |  2 ++
 3 files changed, 47 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-archive.txt b/Documentation/git-archive.txt
index b97aaab..b4df735 100644
--- a/Documentation/git-archive.txt
+++ b/Documentation/git-archive.txt
@@ -11,6 +11,7 @@ SYNOPSIS
 [verse]
 'git archive' [--format=] [--list] [--prefix=/] []
  [-o  | --output=] [--worktree-attributes]
+ [--recursive|--recurse-submodules]
  [--remote= [--exec=]] 
  [...]
 
@@ -51,6 +52,14 @@ OPTIONS
 --prefix=/::
Prepend / to each filename in the archive.
 
+--recursive::
+--recurse-submodules::
+   Archive entries in submodules. Errors occur if the submodules
+   have not been initialized and updated.
+   Run `git submodule update --init --recursive` immediately after
+   the clone is finished to avoid this.
+   This option is not available with remote repositories.
+
 -o ::
 --output=::
Write the archive to  instead of stdout.
diff --git a/archive.c b/archive.c
index 346f3b2..f6313c9 100644
--- a/archive.c
+++ b/archive.c
@@ -5,6 +5,7 @@
 #include "archive.h"
 #include "parse-options.h"
 #include "unpack-trees.h"
+#include "submodule.h"
 
 static char const * const archive_usage[] = {
N_("

Re: [PATCH] submodule recursion in git-archive

2013-11-26 Thread Nick Townsend

On 26 Nov 2013, at 16:28, René Scharfe  wrote:

> Am 26.11.2013 23:18, schrieb Junio C Hamano:
>> René Scharfe  writes:
>> 
>>> Thanks for the patches!  Please send only one per message (the second
>>> one as a reply to the first one, or both as replies to a cover letter),
>>> though -- that makes commenting on them much easier.
>>> 
>>> Side note: Documentation/SubmittingPatches doesn't mention that (yet),
>>> AFAICS.
>> 
>> OK, how about doing this then?
>> 
>> Documentation/SubmittingPatches | 7 ++-
>> 1 file changed, 6 insertions(+), 1 deletion(-)
>> 
>> diff --git a/Documentation/SubmittingPatches 
>> b/Documentation/SubmittingPatches
>> index 7055576..304b3c0 100644
>> --- a/Documentation/SubmittingPatches
>> +++ b/Documentation/SubmittingPatches
>> @@ -140,7 +140,12 @@ comment on the changes you are submitting.  It is 
>> important for
>> a developer to be able to "quote" your changes, using standard
>> e-mail tools, so that they may comment on specific portions of
>> your code.  For this reason, all patches should be submitted
>> -"inline".  If your log message (including your name on the
>> +"inline".  A patch series that consists of N commits is sent as N
>> +separate e-mail messages, or a cover letter message (see below) with
>> +N separate e-mail messages, each being a response to the cover
>> +letter.
>> +
>> +If your log message (including your name on the
>> Signed-off-by line) is not writable in ASCII, make sure that
>> you send off a message in the correct encoding.
> 
> OK, but the repetition of "cover letter" and "e-mail messages"
> irritates me slightly for some reason.  What about the following?
> 
> -- >8 --
> Subject: [PATCH] SubmittingPatches: document how to handle multiple patches
> 
> Signed-off-by: Rene Scharfe 
> ---
> Documentation/SubmittingPatches |   11 +--
> 1 files changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches
> index 7055576..e6d46ed 100644
> --- a/Documentation/SubmittingPatches
> +++ b/Documentation/SubmittingPatches
> @@ -139,8 +139,15 @@ People on the Git mailing list need to be able to read 
> and
> comment on the changes you are submitting.  It is important for
> a developer to be able to "quote" your changes, using standard
> e-mail tools, so that they may comment on specific portions of
> -your code.  For this reason, all patches should be submitted
> -"inline".  If your log message (including your name on the
> +your code.  For this reason, each patch should be submitted
> +"inline" in a separate message.
> +
> +Multiple related patches should be grouped into their own e-mail
> +thread to help readers find all parts of the series.  To that end,
> +send them as replies to either an additional "cover letter" message
> +(see below), the first patch, or the respective preceding patch.
> +
> +If your log message (including your name on the
> Signed-off-by line) is not writable in ASCII, make sure that
> you send off a message in the correct encoding.
> 
> -- 
> 1.7.8
> 
> 
That seems clear to me.
At any rate I’m going to rework this based on the collective input and will 
submit them again.
Please check my other replies as there are some discussion points!

Nick--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] submodule recursion in git-archive

2013-11-26 Thread Nick Townsend

On 26 Nov 2013, at 14:38, Heiko Voigt  wrote:

> Hi,
> 
> I like where this is going.
> 
> On Tue, Nov 26, 2013 at 04:17:43PM +0100, René Scharfe wrote:
>> Am 26.11.2013 01:04, schrieb Nick Townsend:
>>> +   strbuf_addstr(&dotgit, work_tree);
>>> +   strbuf_addch(&dotgit, '/');
>>> +   if (args->treepath) {
>>> + strbuf_addstr(&dotgit, args->treepath);
>>> + strbuf_addch(&dotgit, '/');
>>> +   }
>>> +   strbuf_add(&dotgit, 
>>> path_without_prefix,strlen(path_without_prefix)-1);
>>> +   if (add_submodule_odb(dotgit.buf))
>>> + die("Can't add submodule: %s", dotgit.buf);
>> 
>> Hmm, I wonder if we can traverse the tree and load all submodule object
>> databases before traversing it again to actually write file contents.
>> That would spare the user from getting half of an archive together with
>> that error message.
> 
> I am not sure whether we should die here. What about submodules that
> have not been initialized and or cloned? I think that is a quite regular
> use case for example for libraries that not everyone needs or big media
> submodules which only the design team uses. How about skipping them (maybe
> issuing a warning) by returning 0 here and proceeding?
> 
> Cheers Heiko

I agree that issuing a warning and continuing is best. If the submodule hasn’t 
been setup
then we should respect that and keep the current behaviour (just archive the 
directory entry).
There is some further debate to be had about the extent to which this should 
work with
un-initialized submodules which I’ll discuss in other replies.

Thanks
Nick--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] submodule recursion in git-archive

2013-11-26 Thread Nick Townsend

On 26 Nov 2013, at 14:18, Junio C Hamano  wrote:

> René Scharfe  writes:
> 
>> Thanks for the patches!  Please send only one per message (the second
>> one as a reply to the first one, or both as replies to a cover letter),
>> though -- that makes commenting on them much easier.
>> 
>> Side note: Documentation/SubmittingPatches doesn't mention that (yet),
>> AFAICS.
> 
> OK, how about doing this then?
> 
> Documentation/SubmittingPatches | 7 ++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches
> index 7055576..304b3c0 100644
> --- a/Documentation/SubmittingPatches
> +++ b/Documentation/SubmittingPatches
> @@ -140,7 +140,12 @@ comment on the changes you are submitting.  It is 
> important for
> a developer to be able to "quote" your changes, using standard
> e-mail tools, so that they may comment on specific portions of
> your code.  For this reason, all patches should be submitted
> -"inline".  If your log message (including your name on the
> +"inline".  A patch series that consists of N commits is sent as N
> +separate e-mail messages, or a cover letter message (see below) with
> +N separate e-mail messages, each being a response to the cover
> +letter.
> +
> +If your log message (including your name on the
> Signed-off-by line) is not writable in ASCII, make sure that
> you send off a message in the correct encoding.
> 
> 
>>> The feature is disabled for remote repositories as
>>> the git_work_tree fails. This is a possible future
>>> enhancement.
>> 
>> Hmm, curious.  Why does it fail?  I guess that happens with bare
>> repositories, only, right?  (Which are the most likely kind of remote
>> repos to encounter, of course.)
> 
> Yeah, I do not think of a reason why it should fail in a bare
> repository, either. "git archive" is about writing out the contents
> of an already recorded tree, so there shouldn't be a reason to even
> call get_git_work_tree() in the first place.
> 
See below for a discussion of why I use the .git file in the work tree to 
load the objects for the submodule. I also thought it should work in a
remote repository - but I ran it on a properly initialized remote repository and
it failed. Since I didn’t need it for my immediate use-case I just decided to 
disable 
it with an error. I can look into this further, but we must decide about the 
question 
below first…

> Even if the code is run inside a repository with a working tree,
> when producing a tarball out of an ancient commit that had a
> submodule not at its current location, --recurse-submodules option
> should do the right thing, so asking for working tree location of
> that submodule to find its repository is wrong, I think.  It may
> happen to find one if the archived revision is close enough to what
> is currently checked out, but that may not necessarily be the case.
> 
> At that point when the code discovers an S_ISGITLINK entry, it
> should have both a pathname to the submodule relative to the
> toplevel and the commit object name bound to that submodule
> location.  What it should do, when it does not find the repository
> at the given path (maybe because there is no working tree, or the
> sudmodule directory has moved over time) is roughly:
> 
> - Read from .gitmodules at the top-level from the tree it is
>   creating the tarball out of;
> 
> - Find "submodule.$name.path" entry that records that path to the
>   submodule; and then
> 
> - Using that $name, find the stashed-away location of the submodule
>   repository in $GIT_DIR/modules/$name.
> 
> or something like that.
> 
> This is a related tangent, but when used in a repository that people
> often use as their remote, the repository discovery may have to
> interact with the relative URL.  People often ship .gitmodules with
> 
>   [submodule "bar"]
>   URL = ../bar.git
>   path = barDir
> 
> for a top-level project "foo" that can be cloned thusly:
> 
>   git clone git://site.xz/foo.git
> 
> and host bar.git to be clonable with
> 
>   git clone git://site.xz/bar.git barDir/
> 
> inside the working tree of the foo project.  In such a case, when
> "archive --recurse-submodules" is running, it would find the
> repository for the "bar" submodule at "../bar.git", I would think.
> 
> So this part needs a bit more thought, I am afraid.

I see that there is a lot of potential complexity around setting up a submodule:
* The .gitmodules file can be dirty (easy to flag, but should we allow archive 
to proceed?)
* Users can mess with settings both prior to git submodule init and before git 
submodule update.
* What if it’s a raw clone and the user manually changes things between init 
and update?
* I’m not a git-internals expert but looking through the code I see that you 
can add additional object
directories and change paths as you show above.

For those reasons I deliberately decided not to reproduce the above logic all 
by myself.
On the other hand

Re: [PATCH] submodule recursion in git-archive

2013-11-26 Thread Nick Townsend
On 26 Nov 2013, at 07:17, René Scharfe  wrote:

> Am 26.11.2013 01:04, schrieb Nick Townsend:
>> My first git patch - so shout out if I’ve got the etiquette wrong! Or
>> of course if I’ve missed something.
> 
> Thanks for the patches!  Please send only one per message (the second
> one as a reply to the first one, or both as replies to a cover letter),
> though -- that makes commenting on them much easier.
> 
> Side note: Documentation/SubmittingPatches doesn't mention that (yet),
> AFAICS.
> 
>> Subject: [PATCH 1/2] submodule: add_submodule_odb() usability
>> 
>> Although add_submodule_odb() is documented as being
>> externally usable, it is declared static and also
>> has incorrect documentation.
>> 
>> This commit fixes those and makes no changes to
>> existing code using them. All tests still pass.
> 
> Sign-off missing (see Documentation/SubmittingPatches).
> 
>> ---
>> Documentation/technical/api-ref-iteration.txt | 4 ++--
>> submodule.c   | 2 +-
>> submodule.h   | 1 +
>> 3 files changed, 4 insertions(+), 3 deletions(-)
>> 
>> diff --git a/Documentation/technical/api-ref-iteration.txt 
>> b/Documentation/technical/api-ref-iteration.txt
>> index aa1c50f..cbee624 100644
>> --- a/Documentation/technical/api-ref-iteration.txt
>> +++ b/Documentation/technical/api-ref-iteration.txt
>> @@ -50,10 +50,10 @@ submodules object database. You can do this by a 
>> code-snippet like
>> this:
>> 
>>  const char *path = "path/to/submodule"
>> -if (!add_submodule_odb(path))
>> +if (add_submodule_odb(path))
>>  die("Error submodule '%s' not populated.", path);
>> 
>> -`add_submodule_odb()` will return an non-zero value on success. If you
>> +`add_submodule_odb()` will return a zero value on success. If you
> 
> "return zero on success" instead?

I like the brevity of your suggestion. Again, I just used what was there…

> 
>> do not do this you will get an error for each ref that it does not point
>> to a valid object.
>> 
>> diff --git a/submodule.c b/submodule.c
>> index 1905d75..1ea46be 100644
>> --- a/submodule.c
>> +++ b/submodule.c
>> @@ -143,7 +143,7 @@ void stage_updated_gitmodules(void)
>>  die(_("staging updated .gitmodules failed"));
>> }
>> 
>> -static int add_submodule_odb(const char *path)
>> +int add_submodule_odb(const char *path)
>> {
>>  struct strbuf objects_directory = STRBUF_INIT;
>>  struct alternate_object_database *alt_odb;
>> diff --git a/submodule.h b/submodule.h
>> index 7beec48..3e3cdca 100644
>> --- a/submodule.h
>> +++ b/submodule.h
>> @@ -41,5 +41,6 @@ int find_unpushed_submodules(unsigned char new_sha1[20], 
>> const char *remotes_nam
>>  struct string_list *needs_pushing);
>> int push_unpushed_submodules(unsigned char new_sha1[20], const char 
>> *remotes_name);
>> void connect_work_tree_and_git_dir(const char *work_tree, const char 
>> *git_dir);
>> +int add_submodule_odb(const char *path);
>> 
>> #endif
> 
>> Subject: [PATCH 2/2] archive: allow submodule recursion on git-archive
>> 
>> When using git-archive to produce a dump of a
>> repository, the existing code does not recurse
>> into a submodule when it encounters it in the tree
>> traversal. These changes add a command line flag
>> that permits this.
>> 
>> Note that the submodules must be updated in the
>> repository, otherwise this cannot take place.
>> 
>> The feature is disabled for remote repositories as
>> the git_work_tree fails. This is a possible future
>> enhancement.
> 
> Hmm, curious.  Why does it fail?  I guess that happens with bare
> repositories, only, right?  (Which are the most likely kind of remote
> repos to encounter, of course.)

I’m not sure why it failed - I didn’t think it should - but it did.
See discussion in other email.

> 
>> Two additional fields are added to archiver_args:
>>  * recurse  - a boolean indicator
>>  * treepath - the path part of the tree-ish
>>   eg. the 'www' in HEAD:www
>> 
>> The latter is used within the archive writer to
>> determin the correct path for the submodule .git
>> file.
>> 
>> Signed-off-by: Nick Townsend 
>> ---
>> Documentation/git-archive.txt |  9 +
>> archive.c | 38 --
>> archive.h |  2 ++
>&g

Re: [PATCH] submodule recursion in git-archive

2013-12-02 Thread Nick Townsend

On 27 Nov 2013, at 11:43, Junio C Hamano  wrote:

> Nick Townsend  writes:
> 
>> On 26 Nov 2013, at 14:18, Junio C Hamano  wrote:
>> 
>>> Even if the code is run inside a repository with a working tree,
>>> when producing a tarball out of an ancient commit that had a
>>> submodule not at its current location, --recurse-submodules option
>>> should do the right thing, so asking for working tree location of
>>> that submodule to find its repository is wrong, I think.  It may
>>> happen to find one if the archived revision is close enough to what
>>> is currently checked out, but that may not necessarily be the case.
>>> 
>>> At that point when the code discovers an S_ISGITLINK entry, it
>>> should have both a pathname to the submodule relative to the
>>> toplevel and the commit object name bound to that submodule
>>> location.  What it should do, when it does not find the repository
>>> at the given path (maybe because there is no working tree, or the
>>> sudmodule directory has moved over time) is roughly:
>>> 
>>> - Read from .gitmodules at the top-level from the tree it is
>>>  creating the tarball out of;
>>> 
>>> - Find "submodule.$name.path" entry that records that path to the
>>>  submodule; and then
>>> 
>>> - Using that $name, find the stashed-away location of the submodule
>>>  repository in $GIT_DIR/modules/$name.
>>> 
>>> or something like that.
>>> 
>>> This is a related tangent, but when used in a repository that people
>>> often use as their remote, the repository discovery may have to
>>> interact with the relative URL.  People often ship .gitmodules with
>>> 
>>> [submodule "bar"]
>>> URL = ../bar.git
>>> path = barDir
>>> 
>>> for a top-level project "foo" that can be cloned thusly:
>>> 
>>> git clone git://site.xz/foo.git
>>> 
>>> and host bar.git to be clonable with
>>> 
>>> git clone git://site.xz/bar.git barDir/
>>> 
>>> inside the working tree of the foo project.  In such a case, when
>>> "archive --recurse-submodules" is running, it would find the
>>> repository for the "bar" submodule at "../bar.git", I would think.
>>> 
>>> So this part needs a bit more thought, I am afraid.
>> 
>> I see that there is a lot of potential complexity around setting up a 
>> submodule:
> 
> No question about it.
> 
>> * The .gitmodules file can be dirty (easy to flag, but should we
>> allow archive to proceed?)
> 
> As we are discussing "archive", which takes a tree object from the
> top-level project that is recorded in the object database, the
> information _about_ the submodule in question should come from the
> given tree being archived.  There is no reason for the .gitmodules
> file that happens to be sitting in the working tree of the top-level
> project to be involved in the decision, so its dirtyness should not
> matter, I think.  If the tree being archived has a submodule whose
> name is "kernel" at path "linux/" (relative to the top-level
> project), its repository should be at .git/modules/kernel in the
> layout recent git-submodule prepares, and we should find that
> path-and-name mapping from .gitmodules recorded in that tree object
> we are archiving. The version that happens to be checked out to the
> working tree may have moved the submodule to a new path "linux-3.0/"
> and "linux-3.0/.git" may have "gitdir: .git/modules/kernel" in it,
> but when archiving a tree that has the submodule at "linux/", it
> would not help---we would not know to look at "linux-3.0/.git" to
> learn that information anyway because .gitmodules in the working
> tree would say that the submodule at path "linux-3.0/" is with name
> "kernel", and would not tell us anything about "linux/".
> 
>> * Users can mess with settings both prior to git submodule init
>> and before git submodule update.
> 
> I think this is irrelevant for exactly the same reason as above.
> 
> What makes this tricker, however, is how to deal with an old-style
> repository, where the submodule repositories are embedded in the
> working tree that happens to be checked out.  In that case, we may
> have to read .gitmodules from two places, i.e.
> 
> (1) We are archiving a tree with a submodule at "linux/";
> 
> (2) We read .gitmodules from that tree and l

Fwd: [PATCH] submodule recursion in git-archive

2013-12-02 Thread Nick Townsend


Begin forwarded message:

> From: Nick Townsend 
> Subject: Re: [PATCH] submodule recursion in git-archive
> Date: 2 December 2013 16:00:50 GMT-8
> To: Junio C Hamano 
> Cc: René Scharfe , Jens Lehmann , 
> git@vger.kernel.org, Jeff King 
> 
> 
> On 27 Nov 2013, at 11:43, Junio C Hamano  wrote:
> 
>> Nick Townsend  writes:
>> 
>>> On 26 Nov 2013, at 14:18, Junio C Hamano  wrote:
>>> 
>>>> Even if the code is run inside a repository with a working tree,
>>>> when producing a tarball out of an ancient commit that had a
>>>> submodule not at its current location, --recurse-submodules option
>>>> should do the right thing, so asking for working tree location of
>>>> that submodule to find its repository is wrong, I think.  It may
>>>> happen to find one if the archived revision is close enough to what
>>>> is currently checked out, but that may not necessarily be the case.
>>>> 
>>>> At that point when the code discovers an S_ISGITLINK entry, it
>>>> should have both a pathname to the submodule relative to the
>>>> toplevel and the commit object name bound to that submodule
>>>> location.  What it should do, when it does not find the repository
>>>> at the given path (maybe because there is no working tree, or the
>>>> sudmodule directory has moved over time) is roughly:
>>>> 
>>>> - Read from .gitmodules at the top-level from the tree it is
>>>> creating the tarball out of;
>>>> 
>>>> - Find "submodule.$name.path" entry that records that path to the
>>>> submodule; and then
>>>> 
>>>> - Using that $name, find the stashed-away location of the submodule
>>>> repository in $GIT_DIR/modules/$name.
>>>> 
>>>> or something like that.
>>>> 
>>>> This is a related tangent, but when used in a repository that people
>>>> often use as their remote, the repository discovery may have to
>>>> interact with the relative URL.  People often ship .gitmodules with
>>>> 
>>>>[submodule "bar"]
>>>>URL = ../bar.git
>>>>path = barDir
>>>> 
>>>> for a top-level project "foo" that can be cloned thusly:
>>>> 
>>>>git clone git://site.xz/foo.git
>>>> 
>>>> and host bar.git to be clonable with
>>>> 
>>>>git clone git://site.xz/bar.git barDir/
>>>> 
>>>> inside the working tree of the foo project.  In such a case, when
>>>> "archive --recurse-submodules" is running, it would find the
>>>> repository for the "bar" submodule at "../bar.git", I would think.
>>>> 
>>>> So this part needs a bit more thought, I am afraid.
>>> 
>>> I see that there is a lot of potential complexity around setting up a 
>>> submodule:
>> 
>> No question about it.
>> 
>>> * The .gitmodules file can be dirty (easy to flag, but should we
>>> allow archive to proceed?)
>> 
>> As we are discussing "archive", which takes a tree object from the
>> top-level project that is recorded in the object database, the
>> information _about_ the submodule in question should come from the
>> given tree being archived.  There is no reason for the .gitmodules
>> file that happens to be sitting in the working tree of the top-level
>> project to be involved in the decision, so its dirtyness should not
>> matter, I think.  If the tree being archived has a submodule whose
>> name is "kernel" at path "linux/" (relative to the top-level
>> project), its repository should be at .git/modules/kernel in the
>> layout recent git-submodule prepares, and we should find that
>> path-and-name mapping from .gitmodules recorded in that tree object
>> we are archiving. The version that happens to be checked out to the
>> working tree may have moved the submodule to a new path "linux-3.0/"
>> and "linux-3.0/.git" may have "gitdir: .git/modules/kernel" in it,
>> but when archiving a tree that has the submodule at "linux/", it
>> would not help---we would not know to look at "linux-3.0/.git" to
>> learn that information anyway because .gitmodules in the working
>> tree would say that the submodule at path "linux-3.0/" is with name
>> "kernel", and would not tell us anything about "linux/".
>> 
>

[PATCH] submodule recursion in git-archive

2013-12-02 Thread Nick Townsend

From: Nick Townsend 
Subject: Re: [PATCH] submodule recursion in git-archive
Date: 2 December 2013 15:55:36 GMT-8
To: Heiko Voigt 
Cc: Junio C Hamano , René Scharfe , Jens 
Lehmann , git@vger.kernel.org, Jeff King 


On 29 Nov 2013, at 14:38, Heiko Voigt  wrote:

> On Wed, Nov 27, 2013 at 11:43:44AM -0800, Junio C Hamano wrote:
>> Nick Townsend  writes:
>>> * The .gitmodules file can be dirty (easy to flag, but should we
>>> allow archive to proceed?)
>> 
>> As we are discussing "archive", which takes a tree object from the
>> top-level project that is recorded in the object database, the
>> information _about_ the submodule in question should come from the
>> given tree being archived.  There is no reason for the .gitmodules
>> file that happens to be sitting in the working tree of the top-level
>> project to be involved in the decision, so its dirtyness should not
>> matter, I think.  If the tree being archived has a submodule whose
>> name is "kernel" at path "linux/" (relative to the top-level
>> project), its repository should be at .git/modules/kernel in the
>> layout recent git-submodule prepares, and we should find that
>> path-and-name mapping from .gitmodules recorded in that tree object
>> we are archiving. The version that happens to be checked out to the
>> working tree may have moved the submodule to a new path "linux-3.0/"
>> and "linux-3.0/.git" may have "gitdir: .git/modules/kernel" in it,
>> but when archiving a tree that has the submodule at "linux/", it
>> would not help---we would not know to look at "linux-3.0/.git" to
>> learn that information anyway because .gitmodules in the working
>> tree would say that the submodule at path "linux-3.0/" is with name
>> "kernel", and would not tell us anything about "linux/".
>> 
>>> * Users can mess with settings both prior to git submodule init
>>> and before git submodule update.
>> 
>> I think this is irrelevant for exactly the same reason as above.
>> 
>> What makes this tricker, however, is how to deal with an old-style
>> repository, where the submodule repositories are embedded in the
>> working tree that happens to be checked out.  In that case, we may
>> have to read .gitmodules from two places, i.e.
>> 
>> (1) We are archiving a tree with a submodule at "linux/";
>> 
>> (2) We read .gitmodules from that tree and learn that the submodule
>> has name "kernel";
>> 
>> (3) There is no ".git/modules/kernel" because the repository uses
>> the old layout (if the user never was interested in this
>> submodule, .git/modules/kernel may also be missing, and we
>> should tell these two cases apart by checking .git/config to
>> see if a corresponding entry for the "kernel" submodule exists
>> there);
>> 
>> (4) In a repository that uses the old layout, there must be the
>> repository somewhere embedded in the current working tree (this
>> inability to remove is why we use the new layout these days).
>> We can learn where it is by looking at .gitmodules in the
>> working tree---map the name "kernel" we learned earlier, and
>> map it to the current path ("linux-3.0/" if you have been
>> following this example so far).
>> 
>> And in that fallback context, I would say that reading from a dirty
>> (or "messed with by the user") .gitmodules is the right thing to
>> do.  Perhaps the user may be in the process of moving the submodule
>> in his working tree with
>> 
>>$ mv linux-3.0 linux-3.2
>>$ git config -f .gitmodules submodule.kernel.path linux-3.2
>> 
>> but hasn't committed the change yet.
>> 
>>> For those reasons I deliberately decided not to reproduce the
>>> above logic all by myself.
>> 
>> As I already hinted, I agree that the "how to find the location of
>> submodule repository, given a particular tree in the top-level
>> project the submodule belongs to and the path to the submodule in
>> question" deserves a separate thread to discuss with area experts.
> 
> FYI, I already started to implement this lookup of submodule paths early
> this year[1] but have not found the time to proceed on that yet. I am
> planning to continue on that topic soonish. We need it to implement a
> correct recursive fetch with clone on-demand as a basis for the future
> recursive checkout.
> 
> During the work on this I hit too many open questions. Tha

[PATCH] Improvements to git-archive tests and add_submodule_odb()

2013-12-02 Thread Nick Townsend
As per the previous patch request, I’ve delayed the work on git-archive.
However the following two patches (attached as replies) should still
be considered.
Kind Regards
Nick--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Improvements to git-archive tests and add_submodule_odb()

2013-12-02 Thread Nick Townsend
From: Nick Townsend 
Date: Mon, 25 Nov 2013 15:31:09 -0800
Subject: [PATCH 1/2] submodule: add_submodule_odb() usability

Although add_submodule_odb() is documented as being
externally usable, it is declared static and also
has incorrect documentation.  This commit fixes those
and makes no changes to existing code using them.
All tests still pass.

Improved wording based on Rene Scharfe feedback

Signed-off-by: Nick Townsend 
---
 Documentation/technical/api-ref-iteration.txt | 4 ++--
 submodule.c   | 2 +-
 submodule.h   | 1 +
 3 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/Documentation/technical/api-ref-iteration.txt 
b/Documentation/technical/api-ref-iteration.txt
index aa1c50f..02adfd4 100644
--- a/Documentation/technical/api-ref-iteration.txt
+++ b/Documentation/technical/api-ref-iteration.txt
@@ -50,10 +50,10 @@ submodules object database. You can do this by a 
code-snippet like
 this:
 
const char *path = "path/to/submodule"
-   if (!add_submodule_odb(path))
+   if (add_submodule_odb(path))
die("Error submodule '%s' not populated.", path);
 
-`add_submodule_odb()` will return an non-zero value on success. If you
+`add_submodule_odb()` will return zero on success. If you
 do not do this you will get an error for each ref that it does not point
 to a valid object.
 
diff --git a/submodule.c b/submodule.c
index 1905d75..1ea46be 100644
--- a/submodule.c
+++ b/submodule.c
@@ -143,7 +143,7 @@ void stage_updated_gitmodules(void)
die(_("staging updated .gitmodules failed"));
 }
 
-static int add_submodule_odb(const char *path)
+int add_submodule_odb(const char *path)
 {
struct strbuf objects_directory = STRBUF_INIT;
struct alternate_object_database *alt_odb;
diff --git a/submodule.h b/submodule.h
index 7beec48..3e3cdca 100644
--- a/submodule.h
+++ b/submodule.h
@@ -41,5 +41,6 @@ int find_unpushed_submodules(unsigned char new_sha1[20], 
const char *remotes_nam
struct string_list *needs_pushing);
 int push_unpushed_submodules(unsigned char new_sha1[20], const char 
*remotes_name);
 void connect_work_tree_and_git_dir(const char *work_tree, const char *git_dir);
+int add_submodule_odb(const char *path);
 
 #endif
-- 
1.8.5.4.g9d8cd78.dirty

On 2 Dec 2013, at 16:10, Nick Townsend  wrote:

> As per the previous patch request, I’ve delayed the work on git-archive.
> However the following two patches (attached as replies) should still
> be considered.
> Kind Regards
> Nick

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Improvements to git-archive tests and add_submodule_odb()

2013-12-02 Thread Nick Townsend
From: Nick Townsend 
Date: Sat, 30 Nov 2013 16:54:20 -0800
Subject: [PATCH 2/2] Additional git-archive tests

Interplay between paths specified in three ways now tested:
* After a : in the tree-ish,
* As a pathspec in the command,
* By virtue of the current working directory

Note that these tests are based on the behaviours
as found in 1.8.5. They may not be intentional.
They were developed to regression test enhancements
made to parse_treeish_arg() in archive.c

Signed-off-by: Nick Townsend 
---
 t/t5004-archive-corner-cases.sh | 67 +
 1 file changed, 67 insertions(+)

diff --git a/t/t5004-archive-corner-cases.sh b/t/t5004-archive-corner-cases.sh
index 67f3b54..8b70e4a 100755
--- a/t/t5004-archive-corner-cases.sh
+++ b/t/t5004-archive-corner-cases.sh
@@ -113,4 +113,71 @@ test_expect_success 'archive empty subtree by direct 
pathspec' '
check_dir extract sub
 '
 
+test_expect_success 'setup - repository with subdirs' '
+   mkdir -p a/b/{c,d} &&
+   echo af >a/af &&
+   echo bf >a/b/bf &&
+   echo cf >a/b/c/cf &&
+   git add a &&
+   git commit -m "commit 1" &&
+   git tag -a -m "rev-1" rev-1
+'
+
+test_expect_success 'archive subtree from root by treeish' '
+   git archive --format=tar HEAD:a >atreeroot.tar &&
+   make_dir extract &&
+   "$TAR" xf atreeroot.tar -C extract &&
+   check_dir extract af b b/bf b/c b/c/cf
+'
+
+test_expect_success 'archive subtree from root with pathspec' '
+   git archive --format=tar HEAD a >atreepath.tar &&
+   make_dir extract &&
+   "$TAR" xf atreepath.tar -C extract &&
+   check_dir extract a a/af a/b a/b/bf a/b/c a/b/c/cf
+'
+
+test_expect_success 'archive subtree from root by 2-level treeish' '
+   git archive --format=tar HEAD:a/b >abtreeroot.tar &&
+   make_dir extract &&
+   "$TAR" xf abtreeroot.tar -C extract &&
+   check_dir extract bf c c/cf
+'
+
+test_expect_success 'archive subtree from subdir' '
+   cd a &&
+   git archive --format=tar HEAD >../asubtree.tar &&
+   cd .. &&
+   make_dir extract &&
+   "$TAR" xf asubtree.tar -C extract &&
+   check_dir extract af b b/bf b/c b/c/cf
+'
+
+test_expect_success 'archive subtree from subdir with treeish' '
+   cd a &&
+   git archive --format=tar HEAD:./b >../absubtree.tar &&
+   cd .. &&
+   make_dir extract &&
+   "$TAR" xf absubtree.tar -C extract &&
+   check_dir extract bf c c/cf
+'
+
+test_expect_success 'archive subtree from subdir with treeish and pathspec' '
+   cd a &&
+   git archive --format=tar HEAD:./b c >../absubtree.tar &&
+   cd .. &&
+   make_dir extract &&
+   "$TAR" xf absubtree.tar -C extract &&
+   check_dir extract c c/cf
+'
+
+test_expect_success 'archive subtree from subdir with alt treeish' '
+   cd a &&
+   git archive --format=tar HEAD:b >../abxsubtree.tar &&
+   cd .. &&
+   make_dir extract &&
+   "$TAR" xf abxsubtree.tar -C extract &&
+   check_dir extract bf c c/cf
+'
+
 test_done
-- 
1.8.5.4.g9d8cd78.dirty

On 2 Dec 2013, at 16:10, Nick Townsend  wrote:

> As per the previous patch request, I’ve delayed the work on git-archive.
> However the following two patches (attached as replies) should still
> be considered.
> Kind Regards
> Nick

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Additional git-archive tests

2013-12-04 Thread Nick Townsend

Interplay between paths specified in three ways now tested:
* After a : in the tree-ish,
* As a pathspec in the command,
* By virtue of the current working directory

Note that these tests are based on the behaviours
as found in 1.8.5. They may not be intentional.
They were developed to regression test enhancements
made to parse_treeish_arg() in archive.c

Helped-by: Eric Sunshine 
Signed-off-by: Nick Townsend 
---
 t/t5004-archive-corner-cases.sh | 71 +
 1 file changed, 71 insertions(+)

diff --git a/t/t5004-archive-corner-cases.sh b/t/t5004-archive-corner-cases.sh
index 67f3b54..a81a836 100755
--- a/t/t5004-archive-corner-cases.sh
+++ b/t/t5004-archive-corner-cases.sh
@@ -113,4 +113,75 @@ test_expect_success 'archive empty subtree by direct 
pathspec' '
check_dir extract sub
 '
 
+test_expect_success 'setup - repository with subdirs' '
+   mkdir -p a/b/c a/b/d &&
+   echo af >a/af &&
+   echo bf >a/b/bf &&
+   echo cf >a/b/c/cf &&
+   git add a &&
+   git commit -m "commit 1" &&
+   git tag -a -m "rev-1" rev-1
+'
+
+test_expect_success 'archive subtree from root by treeish' '
+   git archive --format=tar HEAD:a >atreeroot.tar &&
+   make_dir extract &&
+   "$TAR" xf atreeroot.tar -C extract &&
+   check_dir extract af b b/bf b/c b/c/cf
+'
+
+test_expect_success 'archive subtree from root with pathspec' '
+   git archive --format=tar HEAD a >atreepath.tar &&
+   make_dir extract &&
+   "$TAR" xf atreepath.tar -C extract &&
+   check_dir extract a a/af a/b a/b/bf a/b/c a/b/c/cf
+'
+
+test_expect_success 'archive subtree from root by 2-level treeish' '
+   git archive --format=tar HEAD:a/b >abtreeroot.tar &&
+   make_dir extract &&
+   "$TAR" xf abtreeroot.tar -C extract &&
+   check_dir extract bf c c/cf
+'
+
+test_expect_success 'archive subtree from subdir' '
+   (
+   cd a &&
+   git archive --format=tar HEAD >../asubtree.tar
+   ) &&
+   make_dir extract &&
+   "$TAR" xf asubtree.tar -C extract &&
+   check_dir extract af b b/bf b/c b/c/cf
+'
+
+test_expect_success 'archive subtree from subdir with treeish' '
+   (
+   cd a &&
+   git archive --format=tar HEAD:./b >../absubtree.tar
+   ) &&
+   make_dir extract &&
+   "$TAR" xf absubtree.tar -C extract &&
+   check_dir extract bf c c/cf
+'
+
+test_expect_success 'archive subtree from subdir with treeish and pathspec' '
+   (
+   cd a &&
+   git archive --format=tar HEAD:./b c >../absubtree.tar
+   ) &&
+   make_dir extract &&
+   "$TAR" xf absubtree.tar -C extract &&
+   check_dir extract c c/cf
+'
+
+test_expect_success 'archive subtree from subdir with alt treeish' '
+   (
+   cd a &&
+   git archive --format=tar HEAD:b >../abxsubtree.tar
+   ) &&
+   make_dir extract &&
+   "$TAR" xf abxsubtree.tar -C extract &&
+   check_dir extract bf c c/cf
+'
+
 test_done
-- 
1.8.5

On 3 Dec 2013, at 09:54, Junio C Hamano  wrote:

> Eric Sunshine  writes:
> 
>>> +test_expect_success 'archive subtree from subdir' '
>>> +   cd a &&
>>> +   git archive --format=tar HEAD >../asubtree.tar &&
>>> +   cd .. &&
>>> +   make_dir extract &&
>>> +   "$TAR" xf asubtree.tar -C extract &&
>>> +   check_dir extract af b b/bf b/c b/c/cf
>>> +'
>> 
>> If git-archive fails, the subsequent 'cd ..' will not be invoked,
>> hence all tests following this one will fail since the current
>> directory has not been restored. If you place the 'cd a' in a
>> subshell, then the current directory remains unchanged for commands
>> outside the subshell (and you can drop the 'cd ..'):
>> 
>>(
>>cd a &&
>>git archive ...
>>) &&
>>make_dir ...
>>...
> 
> Thanks, and please indent the commands run in the subshell for
> better readability.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Additional git-archive tests

2013-12-09 Thread Nick Townsend

On 5 Dec 2013, at 11:52, Junio C Hamano  wrote:

> Nick Townsend  writes:
> 
>> Interplay between paths specified in three ways now tested:
>> * After a : in the tree-ish,
>> * As a pathspec in the command,
>> * By virtue of the current working directory
>> 
>> Note that these tests are based on the behaviours
>> as found in 1.8.5. They may not be intentional.
>> They were developed to regression test enhancements
>> made to parse_treeish_arg() in archive.c
> 
> In other words, are all these new tests expected to pass?
> 
Sorry about that - the first four tests do pass with 1.8.5, the last
three tests do not currently pass. I believe they should pass and
with my reworked git-archive patch (similar to the previous one)they do.
(though I removed that update from this set pending the discussion).
Currently 5 and 6 fail with the message:
‘fatal: current working directory is untracked’
and  number 7 fails with: ‘fatal: Not a valid object name'

> My cursory read of parse_treeish_arg() in archive.c is:
> 
> - It reads the given object with get_sha1(), checking if it is a
>   commit-ish or tree-ish to decide if it wants to add the pax
>   header to record the commit object name;
> 
> - It parses the tree object;
> 
> - If run from a subdirectory, attempts to grab the "prefix"
>   (i.e. the path to the current subdirectory---in the tests you
>   added, they are all "a/") out of that tree object (it errors out
>   if it can't); and then
> 
> - It archives the tree object.
> 
> So I do not think it is expected to accept tree object names with
> the HEAD: style syntax, if the user expects a predictable and
> consistent result.  The third step above attempts to make sure that
> you name a tree-ish that corresponds to the top-level of the
> project, i.e. with no .
> 
That’s not quite right - paths do work from the root directory - see tests.
It was this very useful capability that I sought to add and generalized
the code to do. 
> What seems to be supported are:
> 
>cd a && git archive HEAD ;# archives HEAD:a tree
>cd a && git archive HEAD -- b ;# archives a/b/ part of HEAD:a as b/
> 
> Specifically, it appears that HEAD:./b, HEAD:b etc. are not designed
> to work, at least to me.
> 
Clearly when you specify them today they don’t work, but I believe they
should work.

> I am not saying that these should _not_ work, but it is unclear what
> it means to "work".  For example, what should this do?
> 
>cd a && git archive HEAD:./b $pathspec
> 
I think we can clear this up by documenting the cases and choosing
sensible behaviour _for git-archive_ ie. what people might expect.
Here is a suggestion:

a. The pathspec is used as a selector to include things in the archive.
it is applied after the cwd and treeish selection.

b. The current working directory (if present) prefixes a path in the object
if one is present.

c. The path in the object (if present) is prefixed by the cwd (if present) and
used to select items for archiving. However the composite path so created
*is not present* in the archive - ie. the leading components are stripped.
(This is very useful behaviour for creating archives without leading paths)

d. As currently the —prefix option (not the prefix from setup_git_directory)
 is prepended to all entries in the archive.

These correspond to the use cases that I have for git archive - extracting all
or part of a multiple submodule tree into a tgz file, with or without leading
directories.

> The extended SHA-1 expression HEAD:./b in the subdirectory a/ is
> interpreted by get_sha1_with_context_1() to be the name of the tree
> object at path "a/b" in the commit HEAD.  Further, you are giving a
> pathspec while in a subdirectory a/ to the command.  What should
> that pathspec be relative to?
> 
> In a normal Git command, the pathspec always is relative to the
> current subdirectory, so, the way to learn about the tree object
> a/b/c in the HEAD while in subdirectory a/ would be:
> 
>cd a && git ls-tree HEAD b/c
> 
> But what should the command line for archive to grab HEAD:a/b/c be?
> It feels wrong to say:
> 
>cd a && git archive HEAD:./b b/c
It’s clear to me that if you are in a subdirectory, that is an implicit prefix 
on the 
./b so you retrieve HEAD:a/b  which then archives everything in b without the
leading a/b. The pathspec is wrong (including the b) and this archive is empty. 
> 
> and it also feels wrong to say
> 
>cd a && git archive HEAD:./b c
> 
That looks fine to me given the rules suggested above. Also git-parse manages
to return the correct thing in this case, so I’d expect this to work.

> No matter what we would do, we should behave consisten