[PATCH 1/4] git-gui: provide question helper for retry fallback on Windows

2019-09-26 Thread Heiko Voigt via GitGitGadget
From: Heiko Voigt 

Make use of the new environment variable GIT_ASK_YESNO to support the
recently implemented fallback in case unlink, rename or rmdir fail for
files in use on Windows. The added dialog will present a yes/no question
to the the user which will currently be used by the windows compat layer
to let the user retry a failed file operation.

Signed-off-by: Heiko Voigt 
Signed-off-by: Johannes Schindelin 
---
 Makefile  |  2 ++
 git-gui--askyesno | 51 +++
 git-gui.sh|  3 +++
 3 files changed, 56 insertions(+)
 create mode 100755 git-gui--askyesno

diff --git a/Makefile b/Makefile
index fe30be38dc..85633b73df 100644
--- a/Makefile
+++ b/Makefile
@@ -291,6 +291,7 @@ install: all
$(QUIET)$(INSTALL_D0)'$(DESTDIR_SQ)$(gitexecdir_SQ)' $(INSTALL_D1)
$(QUIET)$(INSTALL_X0)git-gui $(INSTALL_X1) 
'$(DESTDIR_SQ)$(gitexecdir_SQ)'
$(QUIET)$(INSTALL_X0)git-gui--askpass $(INSTALL_X1) 
'$(DESTDIR_SQ)$(gitexecdir_SQ)'
+   $(QUIET)$(INSTALL_X0)git-gui--askyesno $(INSTALL_X1) 
'$(DESTDIR_SQ)$(gitexecdir_SQ)'
$(QUIET)$(foreach p,$(GITGUI_BUILT_INS), 
$(INSTALL_L0)'$(DESTDIR_SQ)$(gitexecdir_SQ)/$p' 
$(INSTALL_L1)'$(DESTDIR_SQ)$(gitexecdir_SQ)/git-gui' 
$(INSTALL_L2)'$(DESTDIR_SQ)$(gitexecdir_SQ)/$p' $(INSTALL_L3) &&) true
 ifdef GITGUI_WINDOWS_WRAPPER
$(QUIET)$(INSTALL_R0)git-gui.tcl $(INSTALL_R1) 
'$(DESTDIR_SQ)$(gitexecdir_SQ)'
@@ -309,6 +310,7 @@ uninstall:
$(QUIET)$(CLEAN_DST) '$(DESTDIR_SQ)$(gitexecdir_SQ)'
$(QUIET)$(REMOVE_F0)'$(DESTDIR_SQ)$(gitexecdir_SQ)'/git-gui $(REMOVE_F1)
$(QUIET)$(REMOVE_F0)'$(DESTDIR_SQ)$(gitexecdir_SQ)'/git-gui--askpass 
$(REMOVE_F1)
+   $(QUIET)$(REMOVE_F0)'$(DESTDIR_SQ)$(gitexecdir_SQ)'/git-gui--askyesno 
$(REMOVE_F1)
$(QUIET)$(foreach p,$(GITGUI_BUILT_INS), 
$(REMOVE_F0)'$(DESTDIR_SQ)$(gitexecdir_SQ)'/$p $(REMOVE_F1) &&) true
 ifdef GITGUI_WINDOWS_WRAPPER
$(QUIET)$(REMOVE_F0)'$(DESTDIR_SQ)$(gitexecdir_SQ)'/git-gui.tcl 
$(REMOVE_F1)
diff --git a/git-gui--askyesno b/git-gui--askyesno
new file mode 100755
index 00..cf9c990d09
--- /dev/null
+++ b/git-gui--askyesno
@@ -0,0 +1,51 @@
+#!/bin/sh
+# Tcl ignores the next line -*- tcl -*- \
+exec wish "$0" -- "$@"
+
+# This is an implementation of a simple yes no dialog
+# which is injected into the git commandline by git gui
+# in case a yesno question needs to be answered.
+
+set NS {}
+set use_ttk [package vsatisfies [package provide Tk] 8.5]
+if {$use_ttk} {
+   set NS ttk
+}
+
+if {$argc < 1} {
+   puts stderr "Usage: $argv0 "
+   exit 1
+} else {
+   set prompt [join $argv " "]
+}
+
+${NS}::frame .t
+${NS}::label .t.m -text $prompt -justify center -width 400px
+.t.m configure -wraplength 400px
+pack .t.m -side top -fill x -padx 20 -pady 20 -expand 1
+pack .t -side top -fill x -ipadx 20 -ipady 20 -expand 1
+
+${NS}::frame .b
+${NS}::frame .b.left -width 200
+${NS}::button .b.yes -text Yes -command yes
+${NS}::button .b.no  -text No  -command no
+
+
+pack .b.left -side left -expand 1 -fill x
+pack .b.yes -side left -expand 1
+pack .b.no -side right -expand 1 -ipadx 5
+pack .b -side bottom -fill x -ipadx 20 -ipady 15
+
+bind .  {exit 0}
+bind .  {exit 1}
+
+proc no {} {
+   exit 1
+}
+
+proc yes {} {
+   exit 0
+}
+
+wm title . "Question?"
+tk::PlaceWindow .
diff --git a/git-gui.sh b/git-gui.sh
index f9b323abff..76d8139b8d 100755
--- a/git-gui.sh
+++ b/git-gui.sh
@@ -1248,6 +1248,9 @@ set have_tk85 [expr {[package vcompare $tk_version "8.5"] 
>= 0}]
 if {![info exists env(SSH_ASKPASS)]} {
set env(SSH_ASKPASS) [gitexec git-gui--askpass]
 }
+if {![info exists env(GIT_ASK_YESNO)]} {
+   set env(GIT_ASK_YESNO) [gitexec git-gui--askyesno]
+}
 
 ##
 ##
-- 
gitgitgadget



Re: [PATCH] gitk: fix --all behavior combined with --not

2019-07-11 Thread Heiko Voigt
On Wed, Jul 10, 2019 at 11:40:50AM -0700, Junio C Hamano wrote:
> Heiko Voigt  writes:
> 
> > behavior. How about '--all-include-head'. Then e.g.
> >
> > git rev-parse --all-include-head --all --not origin/master
> >
> > would include the head ref like you proposed below?
> >
> > What do you think? Or would you rather go the route of changing
> > rev-parse behavior?
> 
> Depends on what you mean by the above.  Do you mean that now the end
> user needs to say
> 
>   gitk --all-include-head --not origin/master
> 
> to get a rough equivalent of
> 
>   git log --graph --oneline --all --not origin/master
> 
> due to the discrepancy between how "rev-parse" and "rev-list" treat
> their "--all" option?  Or do you mean that the end user still says
> "--all", and after (reliably by some means) making sure that "--all"
> given by the end-user is a request for "all refs and HEAD", we turn
> that into the above internal rev-parse call?

Sorry for being not specific enough. I would be aiming for the latter
and gitk would prepend --all-include-head to its rev-parse call. To have some
code to talk about something like this (based on your pointer and also not
compile tested):

diff --git a/builtin/rev-parse.c b/builtin/rev-parse.c
index f8bbe6d47e..03928ee566 100644
--- a/builtin/rev-parse.c
+++ b/builtin/rev-parse.c
@@ -585,6 +585,7 @@ static void handle_ref_opt(const char *pattern, const char 
*prefix)
 int cmd_rev_parse(int argc, const char **argv, const char *prefix)
 {
int i, as_is = 0, verify = 0, quiet = 0, revs_count = 0, type = 0;
+   int all_include_head = 0;
int did_repo_setup = 0;
int has_dashdash = 0;
int output_prefix = 0;
@@ -764,8 +765,14 @@ int cmd_rev_parse(int argc, const char **argv, const char 
*prefix)
}
continue;
}
+   if (!strcmp(arg, "--all-include-head")) {
+   all_include_head = 1;
+   continue;
+   }
if (!strcmp(arg, "--all")) {
for_each_ref(show_reference, NULL);
+   if (all_include_head)
+   head_ref(show_reference, NULL);
clear_ref_exclusion(&ref_excludes);
continue;
}
diff --git a/gitk-git/gitk b/gitk-git/gitk
index a14d7a16b2..ddd1de5377 100755
--- a/gitk-git/gitk
+++ b/gitk-git/gitk
@@ -294,8 +294,8 @@ proc parseviewrevs {view revs} {
 
 if {$revs eq {}} {
set revs HEAD
-} elseif {[lsearch -exact $revs --all] >= 0} {
-   lappend revs HEAD
+} else {
+   linsert revs 0 --all-include-head
 }
 if {[catch {set ids [eval exec git rev-parse $revs]} err]} {
# we get stdout followed by stderr in $err

> If the former, then quite honestly, we shouldn't doing anything,
> perhaps other than reverting 4d5e1b1319.  The users can type
> 
>   $ gitk --all HEAD --not origin/master
>   $ gitk $commit --not --all HEAD
> 
> themselves, instead of --all-include-head.

Yes the former would not make anything better.

> If the latter, I am not sure what the endgame should be.  

Please see the diff above.

> It certainly *is* safer not to unconditionallyl and unilaterally
> change the behaviour of "rev-parse --all", so I am all for starting
> with small and fully backward compatible change, but wouldn't
> scripts other than gitk want the same behaviour?  

Yes probably, but in my experience, if some behavior is around for a long time,
someone will rely on it and rev-parse seems like a candidate that might get
used in scripts for CIs or similar. E.g. in a bare repo someone might
explicitely want to omit HEAD.

> To put it the other way around, what use case would we have that we
> want to enumerate all refs but not HEAD, *and* exclude HEAD only
> when HEAD is detached?  I can see the use of "what are commits
> reachable from the current HEAD but not reachable from any of the
> refs/*?" and that would be useful whether HEAD is detached or is on
> a concrete branch, so "rev-parse --all" that does not include
> detached HEAD alone does not feel so useful at least to me.

What about my example. My use case is: Show me everything that is not merged
into a stable branch (i.e. origin/master). For a human viewer it does not
really matter if an extra detachted HEAD is shown, but for a CI script it
might. Ok this might be quite artificial, what do you think?

> I am reasonably sure that back when "rev-parse --all" was invented,
> the us

Re: [PATCH] gitk: fix --all behavior combined with --not

2019-07-10 Thread Heiko Voigt
On Mon, Jul 08, 2019 at 10:16:50PM -0700, Junio C Hamano wrote:
> Junio C Hamano  writes:
> 
> > The "--all" in rev-list family (including "git log") unconditionally
> > include HEAD.  The glitch here is that "--all" in rev-parse does
> > not.  And 4d5e1b1319 was an attempt to "fix" that, i.e. make "--all"
> > imply "HEAD".
> 
> And it becomes really tempting to get rid of that "let's tweak
> --all" hack and declare that "rev-parse --all" is simply buggy,
> proposing a simple "bugfix" that may look like this (not even
> compile tested, but you get the idea).

Thanks for this nice pointer.

Lets think about this a little more, because this would give us a proper
solution. There would be a need to be backwards compatible to not break
peoples scripts right? The documentation says --all "Show all refs found
in refs/" so IMO we need some extra option that changes the '--all'
behavior. How about '--all-include-head'. Then e.g.

git rev-parse --all-include-head --all --not origin/master

would include the head ref like you proposed below?

What do you think? Or would you rather go the route of changing
rev-parse behavior?

Cheers Heiko

> 
>  builtin/rev-parse.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/builtin/rev-parse.c b/builtin/rev-parse.c
> index f8bbe6d47e..94f9a6efba 100644
> --- a/builtin/rev-parse.c
> +++ b/builtin/rev-parse.c
> @@ -766,6 +766,7 @@ int cmd_rev_parse(int argc, const char **argv, const char 
> *prefix)
>   }
>   if (!strcmp(arg, "--all")) {
>   for_each_ref(show_reference, NULL);
> + head_ref(show_reference, NULL);
>   clear_ref_exclusion(&ref_excludes);
>   continue;
>   }


Re: [PATCH] gitk: fix --all behavior combined with --not

2019-07-10 Thread Heiko Voigt
On Mon, Jul 08, 2019 at 09:55:00PM -0700, Junio C Hamano wrote:
> Junio C Hamano  writes:
> 
> > Heiko Voigt  writes:
> >
> >> In commit 4d5e1b1319 ("gitk: Show detached HEAD if --all is specified",
> >> 2014-09-09) the intention was to have detached HEAD shown when the --all
> >> argument is given.
> >
> > The "do we have --all?" test added by that old commit is not quite
> > satisfying in the first place.  E.g. we do not check if there is a
> > double-dash before it.  This change also relies on an ancient design
> > mistake of allowing non-dashed options before a dashed one, adding
> > more to dissatisfaction by making a future change to correct the
> > design mistake harder.
> 
> Actually, I do not think this patch is a good idea.
> 
[...]
> 
> As the code is _already_ finding the _exact_ location on the command
> line where "--all" appears, I think you can go one step further and
> make sure you insert the "HEAD" immediately after "--all", as that
> exactly matches what you (and the ancient 4d5e1b1319) are trying to
> achieve: pretend as if "--all" always include "HEAD", even when it
> is detached.
> 
> This is orthogonal to the question I posed in my earlier reply
> (i.e. "we found --all; is it really a 'give me all refs' request
> given by the user, or something else (is it an argument to another
> option, like "--grep '--all'", or is it pathspec after '--'), but
> assuming that we have reliably found the "--all" on the command line
> the user meant as "give me all refs", I think inserting HEAD
> immediately after that location would be the right solution.  It is
> incorrect to unconditionally append as your original example shows,
> but it is equally incorrect to unconditionally prepend.

Yes I agree, there are too many other use cases that my change will
break. I tried to replace a hack with another quick hack, but that did
not make it better.

Will reply to the other mail with some more questions.

Cheers Heiko


Re: [PATCH] gitk: fix --all behavior combined with --not

2019-07-04 Thread Heiko Voigt
Hi Dscho,

On Thu, Jul 04, 2019 at 12:38:44PM +0200, Johannes Schindelin wrote:
> On Thu, 4 Jul 2019, Heiko Voigt wrote:
[...]
> > Signed-off-by: Heiko Voigt 
> 
> Good description.

Thanks. I am actually surprised that for almost 5 years nobody noticed
this. It seems either nobody is using --not this way or everyone took it
as a feature that HEAD would be removed and will complain once this get
released ;)

I usually use the caret notation, but I guess this time I was lazy and
the dash was easier to type...

> > ---
> >  gitk | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/gitk b/gitk
> > index a14d7a1..19d95cd 100755
> > --- a/gitk
> > +++ b/gitk
> > @@ -295,7 +295,7 @@ proc parseviewrevs {view revs} {
> >  if {$revs eq {}} {
> > set revs HEAD
> >  } elseif {[lsearch -exact $revs --all] >= 0} {
> > -   lappend revs HEAD
> > +   linsert revs 0 HEAD
> 
> For a moment, I wondered whether there is any case where `HEAD` might not
> be appropriate as first argument, but you're right, the revision parsing
> machinery allows mixing options and rev arguments.

Thanks for double checking.

> In short: this patch looks good to me.

Thanks for the quick review!

Cheers Heiko


[PATCH] gitk: fix --all behavior combined with --not

2019-07-04 Thread Heiko Voigt
In commit 4d5e1b1319 ("gitk: Show detached HEAD if --all is specified",
2014-09-09) the intention was to have detached HEAD shown when the --all
argument is given.

This was solved by appending HEAD to the revs list. By doing that the
behavior using the --not argument is now broken, since that inverts the
meaning of all following arguments passed to git rev-parse.

Lets fix this by prepending HEAD instead of appending, this way there
can not be any '--not' in front.

This was discovered because

gitk --all --not origin/master

does not display the same revs as

gitk --all ^origin/master

which it should.

Signed-off-by: Heiko Voigt 
---
 gitk | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gitk b/gitk
index a14d7a1..19d95cd 100755
--- a/gitk
+++ b/gitk
@@ -295,7 +295,7 @@ proc parseviewrevs {view revs} {
 if {$revs eq {}} {
set revs HEAD
 } elseif {[lsearch -exact $revs --all] >= 0} {
-   lappend revs HEAD
+   linsert revs 0 HEAD
 }
 if {[catch {set ids [eval exec git rev-parse $revs]} err]} {
# we get stdout followed by stderr in $err
-- 
2.21.0



Re: Adding nested repository with slash adds files instead of gitlink

2018-06-19 Thread Heiko Voigt
On Mon, Jun 18, 2018 at 11:12:15AM -0700, Brandon Williams wrote:
> On 06/18, Duy Nguyen wrote:
> > This sounds like the submodule specific code in pathspec.c, which has
> > been replaced with something else in bw/pathspec-sans-the-index. If
> > you have time, try a version without those changes (e.g. v2.13 or
> > before) to see if it's a possible culprit.
> 
> I just tested this with v2.13 and saw the same issue.  I don't actually
> think this ever worked in the way you want it to Heiko.  Maybe git add
> needs to be taught to be more intelligent when trying to add a submodule
> which doesn't exist in the index.

That was also my guess, since my feeling is that this is a quite rare
use case. Adding submodules alone is not a daily thing, let alone
selecting different changes after 'git submodule add'.

I also think git could be more intelligent here.

Cheers Heiko


Re: Adding nested repository with slash adds files instead of gitlink

2018-06-19 Thread Heiko Voigt
Hi,

On Mon, Jun 18, 2018 at 05:55:44PM +0200, Kevin Daudt wrote:
> On Mon, Jun 18, 2018 at 01:19:19PM +0200, Heiko Voigt wrote:
> > I just discovered that when you have a slash at the end of a nested
> > repository, the files contained in the repository get added instead of
> > the gitlink.
[...]
> > 
> > I just thought I put this out there. Will have a look if I find the time
> > to cook up a proper testcase and investigate.
> > 
> > Cheers Heiko
> 
> This has been the case as far as I can remember, and is basically lore
> in the #git irc channel).
> 
> This can also be reproduced by just cloning a repo inside another repo
> and running `git add path/`.

Interesting and nobody complained to the mailinglist? IMO, there is no
reason 'git add path' and 'git add path/' should behave differently or
is there?

So it seems it is a very seldom operation (in my daily work it is at
least) and people just accepted it as a quirk of git.

If someone wants to look into changing this feel free, otherwise I will
have a look, but I am not sure when yet.

Cheers Heiko


Adding nested repository with slash adds files instead of gitlink

2018-06-18 Thread Heiko Voigt
Hi,

I just discovered that when you have a slash at the end of a nested
repository, the files contained in the repository get added instead of
the gitlink.

I found this when I was adding a submodule and wanted to commit a small
change before that. You get the slash by using tab autocompletion.

Here is a recipe to reproduce:

mkdir test
cd test; git init
touch a; git add a; git commit -m a
mkdir ../test.git; (cd ../test.git; git init --bare)
git remote add origin ../test.git
git push origin master
git submodule add ../test.git submodule
git reset
git add submodule/

Now instead of just submodule gitlink there is an entry for submodule/a
in the index.

I just thought I put this out there. Will have a look if I find the time
to cook up a proper testcase and investigate.

Cheers Heiko


Re: [PATCH] t5526: test recursive submodules when fetching moved submodules

2018-06-14 Thread Heiko Voigt
On Thu, Jun 14, 2018 at 10:37:30AM -0700, Stefan Beller wrote:
> The topic merged in 0c7ecb7c311 (Merge branch 'sb/submodule-move-nested',
> 2018-05-08) provided support for moving nested submodules.
> 
> Remove the NEEDSWORK comment and implement the nested submodules test as
> the comment hinted at.
> 
> Signed-off-by: Stefan Beller 
> ---

Looks good to me.

Cheers Heiko


Re: [PATCH] submodule: fix NULL correctness in renamed broken submodules

2018-06-14 Thread Heiko Voigt
Hi,

On Thu, Jun 14, 2018 at 10:31:07AM -0700, Stefan Beller wrote:
> When fetching with recursing into submodules, the fetch logic inspects
> the superproject which submodules actually need to be fetched. This is
> tricky for submodules that were renamed in the fetched range of commits.
> This was implemented in c68f8375760 (implement fetching of moved
> submodules, 2017-10-16), and this patch fixes a mistake in the logic
> there.
> 
> When the warning is printed, the `name` might be NULL as
> default_name_or_path can return NULL, so fix the warning to use the path
> as obtained from the diff machinery, as that is not NULL.
> 
> While at it, make sure we only attempt to load the submodule if a git
> directory of the submodule is found as default_name_or_path will return
> NULL in case the git directory cannot be found. Note that passing NULL
> to submodule_from_name is just a semantic error, as submodule_from_name
> accepts NULL as a value, but then the return value is not the submodule
> that was asked for, but some arbitrary other submodule. (Cf. 'config_from'
> in submodule-config.c: "If any parameter except the cache is a NULL
> pointer just return the first submodule. Can be used to check whether
> there are any submodules parsed.")
> 
> Reported-by: Duy Nguyen 
> Helped-by: Duy Nguyen 
> Helped-by: Heiko Voigt 
> Signed-off-by: Stefan Beller 
> ---

Looks good to me.

Cheers Heiko


Re: BUG: submodule code prints '(null)'

2018-06-14 Thread Heiko Voigt
On Mon, Jun 11, 2018 at 03:56:16PM -0700, Stefan Beller wrote:
> On Sat, Jun 9, 2018 at 4:04 AM Duy Nguyen  wrote:
> >
> > On Tue, Jun 05, 2018 at 05:31:41PM +0200, Duy Nguyen wrote:
> > > I do not know how to reproduce this (and didn't bother to look deeply
> > > into it after I found it was not a trivial fix) but one of my "git
> > > fetch" showed
> > >
> > > warning: Submodule in commit be2db96a6c506464525f588da59cade0cedddb5e
> > > at path: '(null)' collides with a submodule named the same. Skipping
> > > it.
> >
> > The problem is default_name_or_path() can return NULL when a submodule
> > is not populated. The fix could simply be printing path instead of
> > name (because we are talking about path in the commit message), like
> > below.
> >
> > But I don't really understand c68f837576 (implement fetching of moved
> > submodules - 2017-10-16), the commit that made this change, and not
> > sure if we should be reporting name here or path. Heiko?
> >
> > diff --git a/submodule.c b/submodule.c
> > index 939d6870ec..61c2177755 100644
> > --- a/submodule.c
> > +++ b/submodule.c
> > @@ -745,7 +745,7 @@ static void collect_changed_submodules_cb(struct 
> > diff_queue_struct *q,
> 
> [Not in the context of this patch, but in the code right before the
> context starts:]
> 
> name = default_name_or_path(p->two->path);
> /* make sure name does not collide with existing one */
> submodule = submodule_from_name(the_repository, commit_oid, name);
> if (submodule) {
> 
> Currently I see 4 callers of default_name_or_path and the other 3 except this
> one have checks against NULL in place, which is good.
> However I think we have to guard this even more, and have to check
> for !name before we call submodule_from_name.
> 
> It is technically ok to call submodule_from_name with a NULL name,
> but it is semantically broken, see the comment in config_from that
> is called from submodule_from_name:
> 
> /*
>  * If any parameter except the cache is a NULL pointer just
>  * return the first submodule. Can be used to check whether
>  * there are any submodules parsed.
>  */
> if (!treeish_name || !key) {
> ...
> 
> 
> > warning("Submodule in commit %s at path: "
> > "'%s' collides with a submodule 
> > named "
> > "the same. Skipping it.",
> > -   oid_to_hex(commit_oid), name);
> > +   oid_to_hex(commit_oid), 
> > p->two->path);
> 
> This is correct for the error message, both in terms of not crashing as well
> as correctness, we really need to report a *path* here and no the 
> name_or_path,
> which  default_name_or_path gives.

Sorry for the late reply. I agree with Stefan, this change is correct,
since we are talking about the path in the warning message anyway. It
seems to me that this resulted from the name being the same as the path
in this location here.

I think we should report both here, if we can, the path and the name.

We are skipping the submodule if we can not get a name later:

if (!name) 
continue;

I also agree, that it does not make sense to call submodule_from_name
with a NULL name. I think we should simply skip calling it in case the
name is NULL and then let the code later handle it.

E.g.: 

...
/* make sure name does not collide with existing one */
if (name)   
submodule = submodule_from_name(the_repository, 
commit_oid, name);
if (submodule) {
warning("Submodule in commit %s at path: "
...

Would you want to update your patch? Or should I put one on top?

Cheers Heiko


Re: git merge banch w/ different submodule revision

2018-05-04 Thread Heiko Voigt
Hi,

On Fri, May 04, 2018 at 08:29:32AM +, Middelschulte, Leif wrote:
> Am Donnerstag, den 03.05.2018, 18:42 +0200 schrieb Heiko Voigt:
> > I still do not understand how the current behaviour is mismatching with
> > users expectations. Let's assume that you directly tracked the files of
> > L in your product repository P, without any submodule boundary. How
> > would the behavior be different? Would it be? If D started on an older
> > revision and gets merged into a newer revision, there can always be
> > regressions even without submodules.
> > 
> > Why would the core developer need to be informed about mismatching
> > revisions if he himself advanced the submodule?
> In that case you'd be right. I should have picked my example more wisely.
> Assume right here that not a core developer, but another developer advanced
> the submodule (also via feature branch + merge).
> > 
> > It seems to me that you do not want to mix integration testing and
> > testing of the feature itself. 
> That's on point. That's why it would be nice if git *at least* warned
> about the different revisions wrt submodules.
> 
> But, I guess, I learned something about submodules:
> I used to think of submodules as means to pin down a specific revision like: 
> `ver == x`.
> Now I'm learning that submodules are treated as `ver >= x` during a merge.

Well a submodule version is pinned down as long a you do not change it
and commit it. The same as files and the goal is to make submodules
behave as close to normal files as possible. And git "warns" about
changed submodules by displaying them in the diff.

Actually the use case you are describing is not even involving a real
merge for submodules. It is just changing the pointer to another
revision.

> > How about just testing/reviewing on the
> > branch then? You would still get the submodule revision D was working on
> > and then in a later stage check if integration with everything else
> > works.
> Sure. But if the behavior deviates after a merge the merging developer is 
> currently not
> aware that it *might* have to do with different submodule revisions used, not 
> the "actual" code merged.
> 
> Like not even "beware: the (feature) branch you've merged used an 'older' 
> revision of X"

The submodule is part of the "actual" code and should be reviewed the
same. Maybe you want to set the diff.submodule option to 'diff' ? Then
git shows the actual diff of the changed contents in the submodule and
it would be more obvious how the code changed.

At the moment it seems to me that you want submodules to behave
differently than we handle normal files/directories which is the
opposite direction we have been trying to get git into. My feeling
though is that this should be covered by the review process instead of a
failing merge. Another option would be that you could write a hook that
warns reviewers that they are merging a submodule update.

Cheers Heiko


Re: git merge banch w/ different submodule revision

2018-05-03 Thread Heiko Voigt
Hi,

On Wed, May 02, 2018 at 07:30:25AM +, Middelschulte, Leif wrote:
> Am Montag, den 30.04.2018, 19:02 +0200 schrieb Heiko Voigt:
> > On Thu, Apr 26, 2018 at 03:19:36PM -0700, Stefan Beller wrote:
> > > Stefan wrote:
> > > > See 
> > > > https://github.com/git/git/commit/68d03e4a6e448aa557f52adef92595ac4d6cd4bd
> > > > (68d03e4a6e (Implement automatic fast-forward merge for submodules, 
> > > > 2010-07-07)
> > > > to explain the situation you encounter. (specifically merge_submodule
> > > > at the end of the diff)
> > > 
> > > +cc Heiko, author of that commit.
> > 
> > In that commit we tried to be very careful about. I do not understand
> > the situation in which the current strategy would be wrong by default.
> > 
> > We only merge if the following applies:
> > 
> >  * The changes in the superproject on both sides point forward in the
> >submodule.
> > 
> >  * One side is contained in the other. Contained from the submodule
> >perspective. Sides from the superproject merge perspective.
> > 
> > So in case of the mentioned rewind of a submodule: Only one side of the
> > 3-way merge would point forward and the merge would fail.
> > 
> > I can imagine, that in case of a temporary revert of a commit in the
> > submodule that you would not want that merged into some other branch.
> > But that would be the same without submodules. If you merge a temporary
> > revert from another branch you will not get any conflict.
> > 
> > So maybe someone can explain the use case in which one would get the
> > results that seem wrong?
> In an ideal world, where there are no regressions between revisions, a
> fast-forward is appropriate. However, we might have regressions within
> submodules.
> 
> So the usecase is the following:
> 
> Environment:
> - We have a base library L that is developed by some team (Team B).
> - Another team (Team A) developes a product P based on those libraries using 
> git-flow.
> 
> Case:
> The problem occurs, when a developer (D) of Team A tries to have a feature
> that he developed on a branch accepted by a core developer of P:
> If a core developer of P advanced the reference of L within P (linear 
> history), he might
> deem the work D insufficient. Not because of the actual work by D, but 
> regressions
> that snuck into L. The core developer will not be informed about the 
> missmatching
> revisions of L.
> 
> So it would be nice if there was some kind of switch or at least some trigger.

I still do not understand how the current behaviour is mismatching with
users expectations. Let's assume that you directly tracked the files of
L in your product repository P, without any submodule boundary. How
would the behavior be different? Would it be? If D started on an older
revision and gets merged into a newer revision, there can always be
regressions even without submodules.

Why would the core developer need to be informed about mismatching
revisions if he himself advanced the submodule?

It seems to me that you do not want to mix integration testing and
testing of the feature itself. How about just testing/reviewing on the
branch then? You would still get the submodule revision D was working on
and then in a later stage check if integration with everything else
works.

Cheers Heiko


Re: git merge banch w/ different submodule revision

2018-04-30 Thread Heiko Voigt
On Thu, Apr 26, 2018 at 03:19:36PM -0700, Stefan Beller wrote:
> Stefan wrote:
> > See 
> > https://github.com/git/git/commit/68d03e4a6e448aa557f52adef92595ac4d6cd4bd
> > (68d03e4a6e (Implement automatic fast-forward merge for submodules, 
> > 2010-07-07)
> > to explain the situation you encounter. (specifically merge_submodule
> > at the end of the diff)
> 
> +cc Heiko, author of that commit.

In that commit we tried to be very careful about. I do not understand
the situation in which the current strategy would be wrong by default.

We only merge if the following applies:

 * The changes in the superproject on both sides point forward in the
   submodule.

 * One side is contained in the other. Contained from the submodule
   perspective. Sides from the superproject merge perspective.

So in case of the mentioned rewind of a submodule: Only one side of the
3-way merge would point forward and the merge would fail.

I can imagine, that in case of a temporary revert of a commit in the
submodule that you would not want that merged into some other branch.
But that would be the same without submodules. If you merge a temporary
revert from another branch you will not get any conflict.

So maybe someone can explain the use case in which one would get the
results that seem wrong?

Cheers Heiko


Re: submodule..ignore vs git add -u

2018-03-15 Thread Heiko Voigt
Hi,

On Mon, Mar 12, 2018 at 04:59:25PM +0100, Miklos Vajna wrote:
> Let's say I have a fairly simple submodule setup where I do 'git
> checkout' inside the submodule to check out a different commit, so the
> outer repo 'git diff' shows a submodule update.
> 
> In that case
> 
> git config submodule..ignore all
> 
> makes 'git diff' or 'git commit -a' ignore the change in the outer repo,
> but not 'git add -u'.
> 
> Reading the git-config documentation if this is intentional behavior,
> I'm a bit confused. It specifies that:
> 
> - "git status" and the diff family: handle this setting
> - git submodule commands: ignore this setting
> 
> So that about 'git add -u', is it expected that it ignores this setting
> as well?
> 
> I guess either the doc should say 'git add -u' doesn't handle this
> setting or 'git add -u' should handle it. Happy to try to make a patch
> that does the later, but I though better ask first. :-)

Have a look here for a previous discussion.

https://public-inbox.org/git/20131204221659.GA7326@sandbox-ub/

I think I never got around finishing those patches, because the
discussion died and there was no reply from the original poster asking
for this.

Maybe you could have a look at my original branch and whether that would
be the behavior you expect. I had a look into porting those patches to
the current master, but there are still some test failures.

You can see and test my current WIP branch here:

https://github.com/hvoigt/git/commits/hv/fix_ignore_all_submodules_update1

Cheers Heiko


Re: [PATCH V4] config: add --expiry-date

2017-11-30 Thread Heiko Voigt
On Mon, Nov 20, 2017 at 03:37:03PM -0500, Jeff King wrote:
> On Mon, Nov 20, 2017 at 12:28:11PM -0800, Stefan Beller wrote:
> 
> > > +cc Stefan, who added the die(). It may be that we don't care that much
> > > these days about recovering from broken .gitmodules files.
> > 
> > By that you mean commits like 37f52e9344 (submodule-config:
> > keep shallow recommendation around, 2016-05-26) for example?
> > That adds a git_config_bool to the submodule config machinery.
> 
> I actually meant ea2fa5a338 (submodule-config: keep update strategy
> around, 2016-02-29), which adds an actual die() into parse_config(). But
> yeah, I think the end result is the same.
> 
> > I agree that we'd want to be more careful, but for now I'd put it to the
> > #leftoverbits.
> 
> Fine by me. While I think the original intent was to be more lenient to
> malformed .gitmodules, it's not like we're seeing bug reports about it.

My original intent was not about being more lenient about malformed
.gitmodules but having a way to deal with repository history that might
have a malformed .gitmodules in its history. Since depending on the
branch it is on it might be quite carved in stone.
On an active project it would not be that easy to rewrite history to get
out of that situation.

When a .gitmodules file in the worktree is malformed it is easy to fix.
That is not the case when we are reading configurations from blobs.

My guess why there are no reports is that maybe not too many users are
using this infrastructure yet, plus it is probably seldom that someone
edits the .gitmodules file by hand which could lead to such a situation.
But if such an error occurs it will be very annoying if we die while
parsing submodule configurations. The only solution I see currently is
to turn submodule recursion off completely.

But maybe I am being overly cautious here.

Cheers Heiko


Re: [PATCH] status: do not get confused by submodules in excluded directories

2017-10-25 Thread Heiko Voigt
On Wed, Oct 25, 2017 at 10:28:25AM +0900, Junio C Hamano wrote:
> Heiko Voigt  writes:
> 
> > On Tue, Oct 24, 2017 at 02:18:49PM +0900, Junio C Hamano wrote:
> >> Johannes Schindelin  writes:
> >> 
> >> > We meticulously pass the `exclude` flag to the `treat_directory()`
> >> > function so that we can indicate that files in it are excluded rather
> >> > than untracked when recursing.
> >> >
> >> > But we did not yet treat submodules the same way.
> >> 
> >> ... "because of that, we ended up showing < >> what situation>>" would be a nice thing to have here, so that it can
> >> be copied to the release notes for the bugfix.  
> >
> > Yes I agree that would be nice here. It was not immediately obvious that
> > this only applies when using both flags: -u and --ignored.
> 
> Does any of you care to fill in the <> then? ;-)

How about:

Because of that, we ended up showing the submodule as untracked and its
content as ignored files when using the --ignored and -u flags with git
status.

? But maybe Dscho also has some more information to add about his
situation?

Cheers Heiko


Re: [PATCH] status: do not get confused by submodules in excluded directories

2017-10-24 Thread Heiko Voigt
On Tue, Oct 24, 2017 at 02:18:49PM +0900, Junio C Hamano wrote:
> Johannes Schindelin  writes:
> 
> > We meticulously pass the `exclude` flag to the `treat_directory()`
> > function so that we can indicate that files in it are excluded rather
> > than untracked when recursing.
> >
> > But we did not yet treat submodules the same way.
> 
> ... "because of that, we ended up showing < what situation>>" would be a nice thing to have here, so that it can
> be copied to the release notes for the bugfix.  

Yes I agree that would be nice here. It was not immediately obvious that
this only applies when using both flags: -u and --ignored.

Seems to be a corner that not many people are using. At first I thought
a plain 'git status' would show that behavior...

> How far back a release do we want to make this fix applicable?  It
> seems that it applies cleanly to maint-2.13 without breaking from my
> quick test, so that is probably where I'll queue this, even though
> we may no longer issue further maintenance releases on that track.
> 
> Any comment from submodule folks?
> 
> Sorry that I didn't notice this was left unattended by anybody til
> now.  Will queue while waiting for those who are into submodules to
> respond.

Looks good to me.

Cheers Heiko


Re: [PATCH 1/2] t5526: check for name/path collision in submodule fetch

2017-10-23 Thread Heiko Voigt
On Thu, Oct 19, 2017 at 11:11:08AM -0700, Stefan Beller wrote:
> Signed-off-by: Stefan Beller 
> ---
> 
> This is just to test the corner case we're discussing.
> Applies on top of origin/hv/fetch-moved-submodules-on-demand.
> 
> 
>  t/t5526-fetch-submodules.sh | 42 ++
>  1 file changed, 42 insertions(+)
> 
> diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
> index a552ad4ead..c82d519e06 100755
> --- a/t/t5526-fetch-submodules.sh
> +++ b/t/t5526-fetch-submodules.sh
> @@ -571,6 +571,7 @@ test_expect_success 'fetching submodule into a broken 
> repository' '
>  '
>  
>  test_expect_success "fetch new commits when submodule got renamed" '
> + test_when_finished "rm -rf downstream_rename" &&
>   git clone . downstream_rename &&
>   (
>   cd downstream_rename &&
> @@ -605,4 +606,45 @@ test_expect_success "fetch new commits when submodule 
> got renamed" '
>   test_cmp expect actual
>  '
>  
> +test_expect_success "warn on submodule name/path clash, but new commits 
> fetched in renamed" '
> + test_when_finished "rm -rf downstream_rename" &&
> + git clone . downstream_rename &&
> + (
> + cd downstream_rename &&
> + git submodule update --init &&
> +# NEEDSWORK: we omitted --recursive for the submodule update here since
> +# that does not work. See test 7001 for mv "moving nested submodules"
> +# for details. Once that is fixed we should add the --recursive option
> +# here.
> + git checkout -b rename &&
> + git mv submodule submodule_renamed &&
> + (
> + cd submodule_renamed &&
> + git checkout -b rename_sub &&
> + echo a >a &&
> + git add a &&
> + git commit -ma &&
> + git push origin rename_sub &&
> + git rev-parse HEAD >../../expect
> + ) &&
> + git add submodule_renamed &&
> + git commit -m "update renamed submodule" &&
> + # produce collision, note that we use no submodule command
> + git clone ../submodule submodule &&
> + git add submodule &&

A small note even though this is not meant for inclusion: This would
break when I start working on teaching 'git add' to set default values
in .gitmodules when available.

But I guess I will discover a few other places, when starting that, that
will break in the tests anyway.

Cheers Heiko


Re: [PATCH 2/2] fetch, push: keep separate lists of submodules and gitlinks

2017-10-23 Thread Heiko Voigt
On Thu, Oct 19, 2017 at 11:11:09AM -0700, Stefan Beller wrote:
> Currently when fetching we collect the names of submodules to be fetched
> in a list. As we also want to support fetching 'gitlinks, that happen to
> have a repo checked out at the right place', we'll just pretend that these
> are submodules. We do that by assuming their path is their name. This in
> turn can yield collisions between the name-namespace and the
> path-namespace. (See the previous test for a demonstration.)
> 
> This patch rewrites the code such that we treat the 'real submodule' case
> differently from the 'gitlink, but ok' case. This introduces a bit
> of code duplication, but gets rid of the confusing mapping between names
> and paths.
> 
> The test is incomplete as the long term vision is not achieved yet.
> (which would be fetching both the renamed submodule as well as
> the gitlink thing, putting them in place via e.g. git-pull)
> 
> Signed-off-by: Stefan Beller 
> ---
> 
>  Heiko,
>  Junio,
> 
>  I assumed the code would ease up a lot more, but now I am undecided if
>  I want to keep arguing as the code is not stopping to be ugly. :)

So we are basically coming to the same conclusion? :) My previous
fallback approach basically did the same but with the old architecture
(without parallel fetch, ...) and was already ugly.

With the fallback on submodule default names approach we can keep most
of the old functionality and keep the code that handles that minimal.

Since there is only a small (IMO quite unlikely) cornercase that could
break peoples expectations I would like to have a look whether anyone
even notices the behavioral change on next or master. If there are
complaints we can still extend and add the two lists.

>  The idea is to treat submodule and gitlinks separately, with submodules
>  supporting renames, and gitlinks as a historic artefact.
>  
>  Sorry for the noise about code ugliness.

Why sorry? For me it is actually interesting to see you basically coming
to the same conclusions.

Cheers Heiko


Re: [PATCH v4 3/3] submodule: simplify decision tree whether to or not to fetch

2017-10-19 Thread Heiko Voigt
On Thu, Oct 19, 2017 at 09:36:47AM +0900, Junio C Hamano wrote:
> Brandon Williams  writes:
> 
> > On 10/16, Heiko Voigt wrote:
> >> To make extending this logic later easier.
> >
> > This makes things so much clearer, thanks!
> 
> I agree that it is clear to see what the code after the patch does,
> but the code before the patch is so convoluted to follow that it is
> a bit hard to see if the code before and after are doing the same
> thing, though ;-)

That is why I would appreciate some extra pairs of eyes on this :) I
tried to be as careful as possible when refactoring this, but since it
is quite convoluted something might have slipped through. The testsuite
does not show anything, but there might be corner cases that are not
tested I guess.

Will hopefully have time to look into the comments to the main patch of
this series tomorrow. Did not get around to properly do that yet.

Cheers Heiko

> >> Signed-off-by: Heiko Voigt 
> >> ---
> >>  submodule.c | 74 
> >> ++---
> >>  1 file changed, 37 insertions(+), 37 deletions(-)
> >> 
> >> diff --git a/submodule.c b/submodule.c
> >> index 71d1773e2e..82d206eb65 100644
> >> --- a/submodule.c
> >> +++ b/submodule.c
> >> @@ -1187,6 +1187,31 @@ struct submodule_parallel_fetch {
> >>  };
> >>  #define SPF_INIT {0, ARGV_ARRAY_INIT, NULL, NULL, 0, 0, 0, 0}
> >>  
> >> +static int get_fetch_recurse_config(const struct submodule *submodule,
> >> +  struct submodule_parallel_fetch *spf)
> >> +{
> >> +  if (spf->command_line_option != RECURSE_SUBMODULES_DEFAULT)
> >> +  return spf->command_line_option;
> >> +
> >> +  if (submodule) {
> >> +  char *key;
> >> +  const char *value;
> >> +
> >> +  int fetch_recurse = submodule->fetch_recurse;
> >> +  key = xstrfmt("submodule.%s.fetchRecurseSubmodules", 
> >> submodule->name);
> >> +  if (!repo_config_get_string_const(the_repository, key, &value)) 
> >> {
> >> +  fetch_recurse = parse_fetch_recurse_submodules_arg(key, 
> >> value);
> >> +  }
> >> +  free(key);
> >> +
> >> +  if (fetch_recurse != RECURSE_SUBMODULES_NONE)
> >> +  /* local config overrules everything except commandline 
> >> */
> >> +  return fetch_recurse;
> >> +  }
> >> +
> >> +  return spf->default_option;
> >> +}
> >> +
> >>  static int get_next_submodule(struct child_process *cp,
> >>  struct strbuf *err, void *data, void **task_cb)
> >>  {
> >> @@ -1214,46 +1239,21 @@ static int get_next_submodule(struct child_process 
> >> *cp,
> >>}
> >>}
> >>  
> >> -  default_argv = "yes";
> >> -  if (spf->command_line_option == RECURSE_SUBMODULES_DEFAULT) {
> >> -  int fetch_recurse = RECURSE_SUBMODULES_NONE;
> >> -
> >> -  if (submodule) {
> >> -  char *key;
> >> -  const char *value;
> >> -
> >> -  fetch_recurse = submodule->fetch_recurse;
> >> -  key = 
> >> xstrfmt("submodule.%s.fetchRecurseSubmodules", submodule->name);
> >> -  if 
> >> (!repo_config_get_string_const(the_repository, key, &value)) {
> >> -  fetch_recurse = 
> >> parse_fetch_recurse_submodules_arg(key, value);
> >> -  }
> >> -  free(key);
> >> -  }
> >> -
> >> -  if (fetch_recurse != RECURSE_SUBMODULES_NONE) {
> >> -  if (fetch_recurse == RECURSE_SUBMODULES_OFF)
> >> -  continue;
> >> -  if (fetch_recurse == 
> >> RECURSE_SUBMODULES_ON_DEMAND) {
> >> -  if 
> >> (!unsorted_string_list_lookup(&changed_submodule_names,
> >> -   
> >> submodule->name))
> >> -  continue;
> >> -   

[PATCH v4 2/3] implement fetching of moved submodules

2017-10-16 Thread Heiko Voigt
We store the changed submodules paths to calculate which submodule needs
fetching. This does not work for moved submodules since their paths do
not stay the same in case of a moved submodules. In case of new
submodules we do not have a path in the current checkout, since they
just appeared in this fetch.

It is more general to collect the submodule names for changes instead of
their paths to include the above cases. If we do not have a
configuration for a gitlink we rely on constructing a default name from
the path if a git repository can be found at its path. We skip
non-configured gitlinks whose default name collides with a configured
one.

With the change described above we implement 'on-demand' fetching of
changes in moved submodules.

Signed-off-by: Heiko Voigt 
---
 submodule-config.h  |   3 +
 submodule.c | 138 
 t/t5526-fetch-submodules.sh |  35 +++
 3 files changed, 138 insertions(+), 38 deletions(-)

diff --git a/submodule-config.h b/submodule-config.h
index e3845831f6..a5503a5d17 100644
--- a/submodule-config.h
+++ b/submodule-config.h
@@ -22,6 +22,9 @@ struct submodule {
int recommend_shallow;
 };
 
+#define SUBMODULE_INIT { NULL, NULL, NULL, RECURSE_SUBMODULES_NONE, \
+   NULL, NULL, SUBMODULE_UPDATE_STRATEGY_INIT, {0}, -1 };
+
 struct submodule_cache;
 struct repository;
 
diff --git a/submodule.c b/submodule.c
index 63e7094e16..71d1773e2e 100644
--- a/submodule.c
+++ b/submodule.c
@@ -21,7 +21,7 @@
 #include "parse-options.h"
 
 static int config_update_recurse_submodules = RECURSE_SUBMODULES_OFF;
-static struct string_list changed_submodule_paths = STRING_LIST_INIT_DUP;
+static struct string_list changed_submodule_names = STRING_LIST_INIT_DUP;
 static int initialized_fetch_ref_tips;
 static struct oid_array ref_tips_before_fetch;
 static struct oid_array ref_tips_after_fetch;
@@ -674,11 +674,11 @@ const struct submodule *submodule_from_ce(const struct 
cache_entry *ce)
 }
 
 static struct oid_array *submodule_commits(struct string_list *submodules,
-  const char *path)
+  const char *name)
 {
struct string_list_item *item;
 
-   item = string_list_insert(submodules, path);
+   item = string_list_insert(submodules, name);
if (item->util)
return (struct oid_array *) item->util;
 
@@ -687,39 +687,67 @@ static struct oid_array *submodule_commits(struct 
string_list *submodules,
return (struct oid_array *) item->util;
 }
 
+struct collect_changed_submodules_cb_data {
+   struct string_list *changed;
+   const struct object_id *commit_oid;
+};
+
+/*
+ * this would normally be two functions: default_name_from_path() and
+ * path_from_default_name(). Since the default name is the same as
+ * the submodule path we can get away with just one function which only
+ * checks whether there is a submodule in the working directory at that
+ * location.
+ */
+static const char *default_name_or_path(const char *path_or_name)
+{
+   int error_code;
+
+   if (!is_submodule_populated_gently(path_or_name, &error_code))
+   return NULL;
+
+   return path_or_name;
+}
+
 static void collect_changed_submodules_cb(struct diff_queue_struct *q,
  struct diff_options *options,
  void *data)
 {
+   struct collect_changed_submodules_cb_data *me = data;
+   struct string_list *changed = me->changed;
+   const struct object_id *commit_oid = me->commit_oid;
int i;
-   struct string_list *changed = data;
 
for (i = 0; i < q->nr; i++) {
struct diff_filepair *p = q->queue[i];
struct oid_array *commits;
+   const struct submodule *submodule;
+   const char *name;
+
if (!S_ISGITLINK(p->two->mode))
continue;
 
-   if (S_ISGITLINK(p->one->mode)) {
-   /*
-* NEEDSWORK: We should honor the name configured in
-* the .gitmodules file of the commit we are examining
-* here to be able to correctly follow submodules
-* being moved around.
-*/
-   commits = submodule_commits(changed, p->two->path);
-   oid_array_append(commits, &p->two->oid);
-   } else {
-   /* Submodule is new or was moved here */
-   /*
-* NEEDSWORK: When the .git directories of submodules
-* live inside the superprojects .git directory some
-* day we should fetch new submodules directly into
-* that location to

[PATCH v4 3/3] submodule: simplify decision tree whether to or not to fetch

2017-10-16 Thread Heiko Voigt
To make extending this logic later easier.

Signed-off-by: Heiko Voigt 
---
 submodule.c | 74 ++---
 1 file changed, 37 insertions(+), 37 deletions(-)

diff --git a/submodule.c b/submodule.c
index 71d1773e2e..82d206eb65 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1187,6 +1187,31 @@ struct submodule_parallel_fetch {
 };
 #define SPF_INIT {0, ARGV_ARRAY_INIT, NULL, NULL, 0, 0, 0, 0}
 
+static int get_fetch_recurse_config(const struct submodule *submodule,
+   struct submodule_parallel_fetch *spf)
+{
+   if (spf->command_line_option != RECURSE_SUBMODULES_DEFAULT)
+   return spf->command_line_option;
+
+   if (submodule) {
+   char *key;
+   const char *value;
+
+   int fetch_recurse = submodule->fetch_recurse;
+   key = xstrfmt("submodule.%s.fetchRecurseSubmodules", 
submodule->name);
+   if (!repo_config_get_string_const(the_repository, key, &value)) 
{
+   fetch_recurse = parse_fetch_recurse_submodules_arg(key, 
value);
+   }
+   free(key);
+
+   if (fetch_recurse != RECURSE_SUBMODULES_NONE)
+   /* local config overrules everything except commandline 
*/
+   return fetch_recurse;
+   }
+
+   return spf->default_option;
+}
+
 static int get_next_submodule(struct child_process *cp,
  struct strbuf *err, void *data, void **task_cb)
 {
@@ -1214,46 +1239,21 @@ static int get_next_submodule(struct child_process *cp,
}
}
 
-   default_argv = "yes";
-   if (spf->command_line_option == RECURSE_SUBMODULES_DEFAULT) {
-   int fetch_recurse = RECURSE_SUBMODULES_NONE;
-
-   if (submodule) {
-   char *key;
-   const char *value;
-
-   fetch_recurse = submodule->fetch_recurse;
-   key = 
xstrfmt("submodule.%s.fetchRecurseSubmodules", submodule->name);
-   if 
(!repo_config_get_string_const(the_repository, key, &value)) {
-   fetch_recurse = 
parse_fetch_recurse_submodules_arg(key, value);
-   }
-   free(key);
-   }
-
-   if (fetch_recurse != RECURSE_SUBMODULES_NONE) {
-   if (fetch_recurse == RECURSE_SUBMODULES_OFF)
-   continue;
-   if (fetch_recurse == 
RECURSE_SUBMODULES_ON_DEMAND) {
-   if 
(!unsorted_string_list_lookup(&changed_submodule_names,
-
submodule->name))
-   continue;
-   default_argv = "on-demand";
-   }
-   } else {
-   if (spf->default_option == 
RECURSE_SUBMODULES_OFF)
-   continue;
-   if (spf->default_option == 
RECURSE_SUBMODULES_ON_DEMAND) {
-   if 
(!unsorted_string_list_lookup(&changed_submodule_names,
- 
submodule->name))
-   continue;
-   default_argv = "on-demand";
-   }
-   }
-   } else if (spf->command_line_option == 
RECURSE_SUBMODULES_ON_DEMAND) {
-   if 
(!unsorted_string_list_lookup(&changed_submodule_names,
+   switch (get_fetch_recurse_config(submodule, spf))
+   {
+   default:
+   case RECURSE_SUBMODULES_DEFAULT:
+   case RECURSE_SUBMODULES_ON_DEMAND:
+   if (!submodule || 
!unsorted_string_list_lookup(&changed_submodule_names,
 submodule->name))
continue;
default_argv = "on-demand";
+   break;
+   case RECURSE_SUBMODULES_ON:
+   default_argv = "yes";
+   break;
+   case RECURSE_SUBMODULES_OFF:
+   continue;
}
 
strbuf_addf(&submodule_path, "%s/%s", spf->work_tree, ce->name);
-- 
2.14.1.145.gb3622a4



[PATCH v4 1/3] fetch: add test to make sure we stay backwards compatible

2017-10-16 Thread Heiko Voigt
The current implementation of submodules supports on-demand fetch if
there is no .gitmodules entry for a submodule. Let's add a test to
document this behavior.

Signed-off-by: Heiko Voigt 
---
 t/t5526-fetch-submodules.sh | 42 +-
 1 file changed, 41 insertions(+), 1 deletion(-)

diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
index 42251f7f3a..43a22f680f 100755
--- a/t/t5526-fetch-submodules.sh
+++ b/t/t5526-fetch-submodules.sh
@@ -478,7 +478,47 @@ test_expect_success "don't fetch submodule when newly 
recorded commits are alrea
git fetch >../actual.out 2>../actual.err
) &&
! test -s actual.out &&
-   test_i18ncmp expect.err actual.err
+   test_i18ncmp expect.err actual.err &&
+   (
+   cd submodule &&
+   git checkout -q master
+   )
+'
+
+test_expect_success "'fetch.recurseSubmodules=on-demand' works also without 
.gitmodule entry" '
+   (
+   cd downstream &&
+   git fetch --recurse-submodules
+   ) &&
+   add_upstream_commit &&
+   head1=$(git rev-parse --short HEAD) &&
+   git add submodule &&
+   git rm .gitmodules &&
+   git commit -m "new submodule without .gitmodules" &&
+   printf "" >expect.out &&
+   head2=$(git rev-parse --short HEAD) &&
+   echo "From $pwd/." >expect.err.2 &&
+   echo "   $head1..$head2  master -> origin/master" >>expect.err.2 &&
+   head -3 expect.err >>expect.err.2 &&
+   (
+   cd downstream &&
+   rm .gitmodules &&
+   git config fetch.recurseSubmodules on-demand &&
+   # fake submodule configuration to avoid skipping submodule 
handling
+   git config -f .gitmodules submodule.fake.path fake &&
+   git config -f .gitmodules submodule.fake.url fakeurl &&
+   git add .gitmodules &&
+   git config --unset submodule.submodule.url &&
+   git fetch >../actual.out 2>../actual.err &&
+   # cleanup
+   git config --unset fetch.recurseSubmodules &&
+   git reset --hard
+   ) &&
+   test_i18ncmp expect.out actual.out &&
+   test_i18ncmp expect.err.2 actual.err &&
+   git checkout HEAD^ -- .gitmodules &&
+   git add .gitmodules &&
+   git commit -m "new submodule restored .gitmodules"
 '
 
 test_expect_success 'fetching submodules respects parallel settings' '
-- 
2.14.1.145.gb3622a4



[PATCH v4 0/3] implement fetching of moved submodules

2017-10-16 Thread Heiko Voigt
The previous RFC iteration can be found here:

https://public-inbox.org/git/20171006222544.GA26642@sandbox/

This should now be in a state ready for review for inclusion.

The main difference from last iteration is that we now also support
unconfigured gitlinks for push and fetch for backwards compatibility.

To implement this compatibility we construct a default name for gitlinks
if there is a repository found at their location in the worktree.

Cheers Heiko

Heiko Voigt (3):
  fetch: add test to make sure we stay backwards compatible
  implement fetching of moved submodules
  submodule: simplify decision tree whether to or not to fetch

 submodule-config.h  |   3 +
 submodule.c | 200 +---
 t/t5526-fetch-submodules.sh |  77 -
 3 files changed, 210 insertions(+), 70 deletions(-)

-- 
2.14.1.145.gb3622a4



Re: [RFC PATCH 2/4] change submodule push test to use proper repository setup

2017-10-11 Thread Heiko Voigt
On Wed, Oct 11, 2017 at 08:31:37AM +0900, Junio C Hamano wrote:
> Stefan Beller  writes:
> 
> > So you propose to make git-add behave like "git submodule add"
> > (i.e. also add the .gitmodules entry for name/path/URL), which I
> > like from a submodule perspective.
> >
> > However other users of gitlinks might be confused[1], which is why
> > I refrained from "making every gitlink into a submodule". Specifically
> > the more powerful a submodule operation is (the more fluff adds),
> > the harder it should be for people to mis-use it.
> 
> A few questions that come to mind are:
> 
>  - Does "git add sub/" have enough information to populate
>.gitmodules?  If we have reasonable "default" values for
>.gitmodules entries (e.g. missing URL means we won't fetch when
>asked to go recursively fetch), perhaps we can leave everything
>other than "submodule.$name.path" undefined.

My suggestion would be: If we do not have them we do not populate them.
We could even go further and say: If we do not have the set "git
submodule add" would populate then we do not add anything to .gitmodules
and warn the user.

>  - Can't we help those who have gitlinks without .gitmodules entries
>exactly the same way as above, i.e. when we see a gitlink and try
>to treat it as a submodule, we'd first try to look it up from
>.gitmodules (by going from path to name and then to
>submodule.$name.$var); the above "'git add sub/' would add an
>entry for .gitmodules" wish is based on the assumption that there
>are reasonable "default" values for each of these $var--so by
>basing on the same assumption, we can "pretend" as if these
>submodule.$name.$var were in .gitmodules file when we see
>gitlinks without .gitmodules entries.  IOW, if "git add sub/" can
>add .gitmodules to help people without having to type "git
>submodule add sub/", then we can give exactly the same degree of
>help without even modifying .gitmodules when "git add sub/" is
>run.

This "default" value thing got me thinking in a different direction. We
could use a scheme like that to get names (and values) for submodules
that are missing from the .gitmodules file. If we decide that we need to
handle them.

Cheers Heiko


Re: [RFC PATCH 2/4] change submodule push test to use proper repository setup

2017-10-11 Thread Heiko Voigt
On Tue, Oct 10, 2017 at 11:39:21AM -0700, Stefan Beller wrote:
> So you propose to make git-add behave like "git submodule add"
> (i.e. also add the .gitmodules entry for name/path/URL), which I
> like from a submodule perspective.

Well more like: clone and add will behave like "git submodule add" but
basically yes.

> However other users of gitlinks might be confused[1], which is why
> I refrained from "making every gitlink into a submodule". Specifically
> the more powerful a submodule operation is (the more fluff adds),
> the harder it should be for people to mis-use it.
> 
> [1] https://github.com/git-series/git-series/blob/master/INTERNALS.md
>  "git-series uses gitlinks to store pointer to commits in its own repo."

But would those users use

git add

to add a gitlink? From the description in that file I read that it
points to commits in its own repository. Will there also be files
checked out like submodules at that location?

Otherwise I would propose that 'git add' could detect whether a gitlink
is a submodule by trying to read its git configuration. If we do not
find that we simply do not do anything.

> > If everyone agrees that submodules are the default way of handling
> > repositories insided repositories, IMO, 'git add' should also alter
> > .gitmodules by default. We could provide a switch to avoid doing that.
> 
> I wonder if that switch should be default-on (i.e. not treat a gitlink as
> a submodule initially, behavior as-is, and then eventually we will
> die() on unconfigured repos, expecting the user to make the decision)
> 
> > An intermediate solution would be to warn
> 
> That is already implemented by Peff.

Ah ok, thanks I suspected so when I realized that this discussion was
older.

> > but in the long run my goal
> > for submodules is and always was: Make them behave as close to files as
> > possible. And why should a 'git add submodule' not magically do
> > everything it can to make submodules just work? I can look into a patch
> > for that if people agree here...
> 
> I'd love to see this implemented. I cc'd Josh (the author of git-series), who
> may disagree with this, or has some good input how to go forward without
> breaking git-series.

Yeah, lets see if, as described above, that actually would break
git-series.

> > Regarding handling of gitlinks with or without .gitmodules:
> >
> > Currently we are actually in some intermediate state:
> >
> >  * If there is no .gitmodules file: No submodule processing on any
> >gitlinks (AFAIK)
> 
> AFAIK this is true.
> 
> >  * If there is a .gitmodules files with some submodule configured: Do
> >recursive fetch and push as far as possible on gitlinks.
> 
> * If submodule.recurse is set, then we also treat submodules like files
>   for checkout, reset, read-tree.

To clarify: If submodule.recurse is set but there is no .gitmodules file
we do submodule processing for the above commands?

> > So I am not sure whether there are actually many users (knowingly)
> > using a mix of some submodules configured and some not and then relying
> > on the submodule infrastructure.
> >
> > I would rather expect two sorts of users:
> >
> >   1. Those that do use .gitmodules
> 
> Those want to reap all benefits of good submodules.
> 
> >
> >   2. Those that do *not* use .gitmodules
> 
> As said above, we don't know if those users are
> "holding submodules wrong" or are using gitlinks for
> magic tricks (unrelated to submodules).

I did not want to say that they are "holding submodules wrong" but
rather that if they do not use .gitmodules they do that knowingly and
thus consistently not use .gitmodules for any gitlink.

> > Users that do not use any .gitmodules file will currently (AFAIK) not
> > get any submodule handling. So the question is are there really many
> > "mixed users"? My guess would be no.
> 
> I hope that there are few (if any) users of these mixed setups.

That sounds promising.

> > Because without those using this mixed we could switch to saying: "You
> > need to have a .gitmodules file for submodule handling" without much
> > fallout from breaking users use cases.
> 
> That seems reasonable to me, actually.

Nice.

> > Maybe we can test this out somehow? My patch series would be ready in
> > that case, just had to drop the first patch and adjust the commit
> > message of this one.
> 
> I wonder how we would test this, though? Do you have any idea
> (even vague) how we'd accomplish such a measurement?
> I fear we'll have to go this way blindly.

One idea would be to expose this somewhere to a limited amount of users.
I remember Jonathan was suggesting, back when Jens was working on the
recursive checkout, that he could add the series to the debian package
and see what happens. Or we could use Junios next branch? Something like
that. If we get complaints we know the assumption was wrong and we need
a fallback.

Cheers Heiko


Re: [RFC PATCH 2/4] change submodule push test to use proper repository setup

2017-10-10 Thread Heiko Voigt
Hi,

On Mon, Oct 09, 2017 at 11:20:51AM -0700, Stefan Beller wrote:
> On Fri, Oct 6, 2017 at 3:32 PM, Heiko Voigt  wrote:
> > NOTE: The argument in this message is not correct, see description in
> > cover letter.
> >
> > The setup of the repositories in this test is using gitlinks without the
> > .gitmodules infrastructure. It is however testing convenience features
> > like --recurse-submodules=on-demand. These features are already not
> > supported by fetch without a .gitmodules file. This leads us to the
> > conclusion that it is not really used here as well.
> >
> > Let's use the usual submodule commands to setup the repository in a
> > typical way. This also has the advantage that we are testing with a
> > repository structure that is more similar to one we could expect on a
> > users setup.
> >
> > Signed-off-by: Heiko Voigt 
> > ---
> >
> > As mentioned in the cover letter. This seems to be the only test that
> > ensures that we stay compatible with setups without .gitmodules. Maybe
> > we should add/revive some?
> 
> An interesting discussion covering this topic is found at
> https://public-inbox.org/git/20170606035650.oykbz2uc4xkr3...@sigill.intra.peff.net/

Thanks for that pointer. So in that discussion Junio said that the
recursive operations should succeed if we have everything necessary at
hand. I kind of agree because why should we limit usage when not
necessary. On the other hand we want git to be easy to use. And that
example from Peff is perfect as a demonstration of a incosistency we
currently have:

git clone git://some.where.git/submodule.git
git add submodule

is an operation I remember, I did, when first getting in contact with
submodules (many years back), since that is one intuitive way. And the
thing is: It works, kind of... Only later I discovered that one actually
needs to us a special submodule command to get everything approriately
setup to work together with others.

If everyone agrees that submodules are the default way of handling
repositories insided repositories, IMO, 'git add' should also alter
.gitmodules by default. We could provide a switch to avoid doing that.

An intermediate solution would be to warn but in the long run my goal
for submodules is and always was: Make them behave as close to files as
possible. And why should a 'git add submodule' not magically do
everything it can to make submodules just work? I can look into a patch
for that if people agree here...

Regarding handling of gitlinks with or without .gitmodules:

Currently we are actually in some intermediate state:

 * If there is no .gitmodules file: No submodule processing on any
   gitlinks (AFAIK)
 * If there is a .gitmodules files with some submodule configured: Do
   recursive fetch and push as far as possible on gitlinks.

So I am not sure whether there are actually many users (knowingly)
using a mix of some submodules configured and some not and then relying
on the submodule infrastructure.

I would rather expect two sorts of users:

  1. Those that do use .gitmodules

  2. Those that do *not* use .gitmodules

Users that do not use any .gitmodules file will currently (AFAIK) not
get any submodule handling. So the question is are there really many
"mixed users"? My guess would be no.
Because without those using this mixed we could switch to saying: "You
need to have a .gitmodules file for submodule handling" without much
fallout from breaking users use cases.

Maybe we can test this out somehow? My patch series would be ready in
that case, just had to drop the first patch and adjust the commit
message of this one.

Cheers Heiko


Re: "git rm" seems to do recursive removal even without "-r"

2017-10-10 Thread Heiko Voigt
On Sun, Oct 08, 2017 at 07:56:20AM -0400, Robert P. J. Day wrote:
>   but as i asked in my earlier post, if i wanted to remove *all* files
> with names of "Makefile*", why can't i use:
> 
>   $ git rm 'Makefile*'
> 
> just as i used:
> 
>   $ git rm '*.c'
> 
> are those not both acceptable fileglobs? why does the former clearly
> only match the top-level Makefile, and refuse to cross directory
> boundaries?

Maybe think about it this way: The only difference between git's
globbing and the default shell globbing is that the '/' in a path has a
special meaning. The shells expansion stops at a '/' but git does not.

So with *.c the shell matches: blabla.c, blub.c, ...  but not
subdir/bla.c, subdir/blub.c, ... since it only considers files in the
current directory. A little different for Makefile* that will also match
Makefile.bla, Makefile/bla or Makefile_bla/blub in shell but not
subdir/Makefile or bla.Makefile. Basically anything directly in *this*
directory that *starts* with 'Makefile'.

Git on the other hand does not consider '/' to be special. So *.c
matches all of the path above: bla.c, blub.c, subdir/bla.c,
subdir/blub.c. Basically any file below the current directory with a
path that ends in '.c'. With Makefile* it is the opposite: Every file
below the current directory that *starts* with 'Makefile'. So
Makefile.bla, Makefile/bla, ... but also not subdir/Makefile or
bla.Makefile.

Maybe that helps...

Cheers Heiko



[RFC PATCH v3 3/4] implement fetching of moved submodules

2017-10-06 Thread Heiko Voigt
We store the changed submodules paths to calculate which submodule needs
fetching. This does not work for moved submodules since their paths do
not stay the same in case of a moved submodules. In case of new
submodules we do not have a path in the current checkout, since they
just appeared in this fetch.

It is more general to collect the submodule names for changes instead of
their paths to include the above cases.

With the change described above we implement 'on-demand' fetching of
changes in moved submodules.

Note: This does only work when repositories have a .gitmodules file. In
other words: We now require a name for a submodule repository.

Signed-off-by: Heiko Voigt 
---

No changes from the previous version here.

 submodule.c | 91 +
 t/t5526-fetch-submodules.sh | 35 +
 2 files changed, 85 insertions(+), 41 deletions(-)

diff --git a/submodule.c b/submodule.c
index 63e7094..0c586a0 100644
--- a/submodule.c
+++ b/submodule.c
@@ -21,7 +21,7 @@
 #include "parse-options.h"
 
 static int config_update_recurse_submodules = RECURSE_SUBMODULES_OFF;
-static struct string_list changed_submodule_paths = STRING_LIST_INIT_DUP;
+static struct string_list changed_submodule_names = STRING_LIST_INIT_DUP;
 static int initialized_fetch_ref_tips;
 static struct oid_array ref_tips_before_fetch;
 static struct oid_array ref_tips_after_fetch;
@@ -674,11 +674,11 @@ const struct submodule *submodule_from_ce(const struct 
cache_entry *ce)
 }
 
 static struct oid_array *submodule_commits(struct string_list *submodules,
-  const char *path)
+  const char *name)
 {
struct string_list_item *item;
 
-   item = string_list_insert(submodules, path);
+   item = string_list_insert(submodules, name);
if (item->util)
return (struct oid_array *) item->util;
 
@@ -687,39 +687,34 @@ static struct oid_array *submodule_commits(struct 
string_list *submodules,
return (struct oid_array *) item->util;
 }
 
+struct collect_changed_submodules_cb_data {
+   struct string_list *changed;
+   const struct object_id *commit_oid;
+};
+
 static void collect_changed_submodules_cb(struct diff_queue_struct *q,
  struct diff_options *options,
  void *data)
 {
+   struct collect_changed_submodules_cb_data *me = data;
+   struct string_list *changed = me->changed;
+   const struct object_id *commit_oid = me->commit_oid;
int i;
-   struct string_list *changed = data;
 
for (i = 0; i < q->nr; i++) {
struct diff_filepair *p = q->queue[i];
struct oid_array *commits;
+   const struct submodule *submodule;
+
if (!S_ISGITLINK(p->two->mode))
continue;
 
-   if (S_ISGITLINK(p->one->mode)) {
-   /*
-* NEEDSWORK: We should honor the name configured in
-* the .gitmodules file of the commit we are examining
-* here to be able to correctly follow submodules
-* being moved around.
-*/
-   commits = submodule_commits(changed, p->two->path);
-   oid_array_append(commits, &p->two->oid);
-   } else {
-   /* Submodule is new or was moved here */
-   /*
-* NEEDSWORK: When the .git directories of submodules
-* live inside the superprojects .git directory some
-* day we should fetch new submodules directly into
-* that location too when config or options request
-* that so they can be checked out from there.
-*/
+   submodule = submodule_from_path(commit_oid, p->two->path);
+   if (!submodule)
continue;
-   }
+
+   commits = submodule_commits(changed, submodule->name);
+   oid_array_append(commits, &p->two->oid);
}
 }
 
@@ -742,11 +737,14 @@ static void collect_changed_submodules(struct string_list 
*changed,
 
while ((commit = get_revision(&rev))) {
struct rev_info diff_rev;
+   struct collect_changed_submodules_cb_data data;
+   data.changed = changed;
+   data.commit_oid = &commit->object.oid;
 
init_revisions(&diff_rev, NULL);
diff_rev.diffopt.output_format |= DIFF_FORMAT_CALLBACK;
diff_rev.diffopt.format_callback = 
collect_changed_submodules_cb;
-   diff

[RFC PATCH v3 4/4] submodule: simplify decision tree whether to or not to fetch

2017-10-06 Thread Heiko Voigt
To make extending this logic later easier.

Signed-off-by: Heiko Voigt 
---

This should also be the same as in the previous version.

 submodule.c | 74 ++---
 1 file changed, 37 insertions(+), 37 deletions(-)

diff --git a/submodule.c b/submodule.c
index 0c586a0..c7b32c6 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1142,6 +1142,31 @@ struct submodule_parallel_fetch {
 };
 #define SPF_INIT {0, ARGV_ARRAY_INIT, NULL, NULL, 0, 0, 0, 0}
 
+static int get_fetch_recurse_config(const struct submodule *submodule,
+   struct submodule_parallel_fetch *spf)
+{
+   if (spf->command_line_option != RECURSE_SUBMODULES_DEFAULT)
+   return spf->command_line_option;
+
+   if (submodule) {
+   char *key;
+   const char *value;
+
+   int fetch_recurse = submodule->fetch_recurse;
+   key = xstrfmt("submodule.%s.fetchRecurseSubmodules", 
submodule->name);
+   if (!repo_config_get_string_const(the_repository, key, &value)) 
{
+   fetch_recurse = parse_fetch_recurse_submodules_arg(key, 
value);
+   }
+   free(key);
+
+   if (fetch_recurse != RECURSE_SUBMODULES_NONE)
+   /* local config overrules everything except commandline 
*/
+   return fetch_recurse;
+   }
+
+   return spf->default_option;
+}
+
 static int get_next_submodule(struct child_process *cp,
  struct strbuf *err, void *data, void **task_cb)
 {
@@ -1161,46 +1186,21 @@ static int get_next_submodule(struct child_process *cp,
 
submodule = submodule_from_path(&null_oid, ce->name);
 
-   default_argv = "yes";
-   if (spf->command_line_option == RECURSE_SUBMODULES_DEFAULT) {
-   int fetch_recurse = RECURSE_SUBMODULES_NONE;
-
-   if (submodule) {
-   char *key;
-   const char *value;
-
-   fetch_recurse = submodule->fetch_recurse;
-   key = 
xstrfmt("submodule.%s.fetchRecurseSubmodules", submodule->name);
-   if 
(!repo_config_get_string_const(the_repository, key, &value)) {
-   fetch_recurse = 
parse_fetch_recurse_submodules_arg(key, value);
-   }
-   free(key);
-   }
-
-   if (fetch_recurse != RECURSE_SUBMODULES_NONE) {
-   if (fetch_recurse == RECURSE_SUBMODULES_OFF)
-   continue;
-   if (fetch_recurse == 
RECURSE_SUBMODULES_ON_DEMAND) {
-   if 
(!unsorted_string_list_lookup(&changed_submodule_names,
-
submodule->name))
-   continue;
-   default_argv = "on-demand";
-   }
-   } else {
-   if (spf->default_option == 
RECURSE_SUBMODULES_OFF)
-   continue;
-   if (spf->default_option == 
RECURSE_SUBMODULES_ON_DEMAND) {
-   if 
(!unsorted_string_list_lookup(&changed_submodule_names,
- 
submodule->name))
-   continue;
-   default_argv = "on-demand";
-   }
-   }
-   } else if (spf->command_line_option == 
RECURSE_SUBMODULES_ON_DEMAND) {
-   if 
(!unsorted_string_list_lookup(&changed_submodule_names,
+   switch (get_fetch_recurse_config(submodule, spf))
+   {
+   default:
+   case RECURSE_SUBMODULES_DEFAULT:
+   case RECURSE_SUBMODULES_ON_DEMAND:
+   if (!submodule || 
!unsorted_string_list_lookup(&changed_submodule_names,
 submodule->name))
continue;
default_argv = "on-demand";
+   break;
+   case RECURSE_SUBMODULES_ON:
+   default_argv = "yes";
+   break;
+   case RECURSE_SUBMODULES_OFF:
+   continue;
}
 
strbuf_addf(&submodule_path, "%s/%s", spf->work_tree, ce->name);
-- 
2.10.0.129.g35f6318



[RFC PATCH 2/4] change submodule push test to use proper repository setup

2017-10-06 Thread Heiko Voigt
NOTE: The argument in this message is not correct, see description in
cover letter.

The setup of the repositories in this test is using gitlinks without the
.gitmodules infrastructure. It is however testing convenience features
like --recurse-submodules=on-demand. These features are already not
supported by fetch without a .gitmodules file. This leads us to the
conclusion that it is not really used here as well.

Let's use the usual submodule commands to setup the repository in a
typical way. This also has the advantage that we are testing with a
repository structure that is more similar to one we could expect on a
users setup.

Signed-off-by: Heiko Voigt 
---

As mentioned in the cover letter. This seems to be the only test that
ensures that we stay compatible with setups without .gitmodules. Maybe
we should add/revive some?

Cheers Heiko

 t/t5531-deep-submodule-push.sh | 29 -
 1 file changed, 16 insertions(+), 13 deletions(-)

diff --git a/t/t5531-deep-submodule-push.sh b/t/t5531-deep-submodule-push.sh
index 39cb2c1..a4a2c6a 100755
--- a/t/t5531-deep-submodule-push.sh
+++ b/t/t5531-deep-submodule-push.sh
@@ -8,22 +8,26 @@ test_expect_success setup '
mkdir pub.git &&
GIT_DIR=pub.git git init --bare &&
GIT_DIR=pub.git git config receive.fsckobjects true &&
+   mkdir submodule &&
+   (
+   cd submodule &&
+   git init &&
+   git config push.default matching &&
+   >junk &&
+   git add junk &&
+   git commit -m "Initial junk"
+   ) &&
+   git clone --bare submodule submodule.git &&
mkdir work &&
(
cd work &&
git init &&
git config push.default matching &&
-   mkdir -p gar/bage &&
-   (
-   cd gar/bage &&
-   git init &&
-   git config push.default matching &&
-   >junk &&
-   git add junk &&
-   git commit -m "Initial junk"
-   ) &&
-   git add gar/bage &&
+   mkdir gar &&
+   git submodule add ../submodule.git gar/bage &&
git commit -m "Initial superproject"
+   cd gar/bage &&
+   git remote rm origin
)
 '
 
@@ -51,11 +55,10 @@ test_expect_success 'push if submodule has no remote' '
 
 test_expect_success 'push fails if submodule commit not on remote' '
(
-   cd work/gar &&
-   git clone --bare bage ../../submodule.git &&
-   cd bage &&
+   cd work/gar/bage &&
git remote add origin ../../../submodule.git &&
git fetch &&
+   git push --set-upstream origin master &&
>junk3 &&
git add junk3 &&
git commit -m "Third junk"
-- 
2.10.0.129.g35f6318



[RFC PATCH 1/4] fetch: add test to make sure we stay backwards compatible

2017-10-06 Thread Heiko Voigt
The current implementation of submodules supports on-demand fetch if
there is no .gitmodules entry for a submodule. Let's add a test to
document this behavior.

Signed-off-by: Heiko Voigt 
---
 t/t5526-fetch-submodules.sh | 42 +-
 1 file changed, 41 insertions(+), 1 deletion(-)

diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
index 42251f7..43a22f6 100755
--- a/t/t5526-fetch-submodules.sh
+++ b/t/t5526-fetch-submodules.sh
@@ -478,7 +478,47 @@ test_expect_success "don't fetch submodule when newly 
recorded commits are alrea
git fetch >../actual.out 2>../actual.err
) &&
! test -s actual.out &&
-   test_i18ncmp expect.err actual.err
+   test_i18ncmp expect.err actual.err &&
+   (
+   cd submodule &&
+   git checkout -q master
+   )
+'
+
+test_expect_success "'fetch.recurseSubmodules=on-demand' works also without 
.gitmodule entry" '
+   (
+   cd downstream &&
+   git fetch --recurse-submodules
+   ) &&
+   add_upstream_commit &&
+   head1=$(git rev-parse --short HEAD) &&
+   git add submodule &&
+   git rm .gitmodules &&
+   git commit -m "new submodule without .gitmodules" &&
+   printf "" >expect.out &&
+   head2=$(git rev-parse --short HEAD) &&
+   echo "From $pwd/." >expect.err.2 &&
+   echo "   $head1..$head2  master -> origin/master" >>expect.err.2 &&
+   head -3 expect.err >>expect.err.2 &&
+   (
+   cd downstream &&
+   rm .gitmodules &&
+   git config fetch.recurseSubmodules on-demand &&
+   # fake submodule configuration to avoid skipping submodule 
handling
+   git config -f .gitmodules submodule.fake.path fake &&
+   git config -f .gitmodules submodule.fake.url fakeurl &&
+   git add .gitmodules &&
+   git config --unset submodule.submodule.url &&
+   git fetch >../actual.out 2>../actual.err &&
+   # cleanup
+   git config --unset fetch.recurseSubmodules &&
+   git reset --hard
+   ) &&
+   test_i18ncmp expect.out actual.out &&
+   test_i18ncmp expect.err.2 actual.err &&
+   git checkout HEAD^ -- .gitmodules &&
+   git add .gitmodules &&
+   git commit -m "new submodule restored .gitmodules"
 '
 
 test_expect_success 'fetching submodules respects parallel settings' '
-- 
2.10.0.129.g35f6318



[RFC PATCH v3 0/4] implement fetching of moved submodules

2017-10-06 Thread Heiko Voigt
The last iteration can be found here:

https://public-inbox.org/git/20170817105349.gc52...@book.hvoigt.net/

This is mainly a status update and to let people know that I am still
working on this.

I struggled quite a bit with reviving my original test for the path
based recursive fetch (first patch). The behavior seems to haved changed
and simply setting the submodule configuration in .git/config without
one in .gitmodules does not work anymore. I did not have time to
investigate whether this was a deliberate change or a maybe a bug?

So the solution for now is that I write my fake configuration (to avoid
skipping submodule handling altogether) into a .gitmodules file.

The second patch (cleanup of a submodule push testcase) was written
because that currently is the only test failing. It is not meant for
inclusion but rather as a demonstration of what might be happening when
we cleanup testcases: Because of the behavioral change above, on first
sight, it seemed like there was a shortcut in fetch and so on-demand
fetch without submodule configuration would not be supported anymore.

IIRC there were a lot more tests failing before when I implemented my
patch without the fallback on paths. So my guess is that some tests have
been cleaned up to use proper (.gitmodules) submodule setup.

So the thing here is: If we want to make sure that we stay backwards
compatible by supporting the setup with gitlinks without configuration.
Then we also should keep tests around that have the plain manual setup
without .gitmodules files. Just something, I think, we should keep in
mind.

Apart from the tests nothing has been added in this iteration. Since I
finally have a working test now I will continue with reviving the
fallback to paths.

Cheers Heiko

Heiko Voigt (4):
  fetch: add test to make sure we stay backwards compatible
  change submodule push test to use proper repository setup
  implement fetching of moved submodules
  submodule: simplify decision tree whether to or not to fetch

 submodule.c| 155 ++---
 t/t5526-fetch-submodules.sh|  77 +++-
 t/t5531-deep-submodule-push.sh |  29 
 3 files changed, 174 insertions(+), 87 deletions(-)

-- 
2.10.0.129.g35f6318



Re: [PATCH v2] add test for bug in git-mv for recursive submodules

2017-09-20 Thread Heiko Voigt
On Mon, Sep 18, 2017 at 01:03:32PM -0700, Stefan Beller wrote:
> >> Took a little while but here is a more clean patch creating individual
> >> submodules for the nesting.
> >>
> >> Cheers Heiko
> 
> Thanks for writing this test!

No worries. :)

> > Thanks.  Stefan, does this look good to you now?
> 
> Yes, though there are nits below.
> 
> > It is not quite clear which step is expected to fail with the
> > current code by reading the test or the proposed log message.  Does
> > "mv" refuse to work and we do not get to run "status", or does
> > "status" report a failure, or do we fail well before that?
> 
> git-mv failing seems like a new possibility without incurring
> another process spawn with the new repository object.
> (Though then we could also just fix the recursed submodule)

It is mv that fails to update everything necessary when using it with
recursively nested submodules. So the git-mv command does not report a
failure here. As an interim fix it could maybe report an error when
encountering nested submodules but the real fix would be to teach it to
recursively spawn the appropriate git-mv commands.

> > The log message that only says "This does not work when ..." is not
> > helpful in figuring it out, either.  Something like "This does not
> > work and fails to update the paths for its configurations" or
> > whatever that describes "what actually happens" (in contrast to
> > "what ought to happen", which you described clearly) should be
> > there.
> >
> > Description on how you happened to have discovered the issue feels a
> > lot less relevant compared to that, and it is totally useless if it
> > is unclear what the issue is in the first place.

Sorry about being a bit brief here. How about dropping that information
how I discovered the bug then and change the commit message to something
like this:

add test for bug in git-mv for recursive submodules

When using git-mv with a submodule it will detect that and update
the paths for its configurations (.gitmodules, worktree and
gitfile). This does not work in case it encounters nested
submodules. In that case it only updates the configurations for the
submodule directly underneath the superproject and fails to update
the paths for the submodules nested more deeply. This in turn leads
to the symptom that git status reports that it can not chdir to the
nested submodule in its old location.

Lets add a test to document.

?

> >>  t/t7001-mv.sh | 25 +
> >>  1 file changed, 25 insertions(+)
> >>
> >> diff --git a/t/t7001-mv.sh b/t/t7001-mv.sh
> >> index e365d1ff77..cbc5fb37fe 100755
> >> --- a/t/t7001-mv.sh
> >> +++ b/t/t7001-mv.sh
> >> @@ -491,4 +491,29 @@ test_expect_success 'moving a submodule in nested 
> >> directories' '
> >>   test_cmp actual expect
> >>  '
> >>
> >> +test_expect_failure 'moving nested submodules' '
> >> + git commit -am "cleanup commit" &&
> >> + mkdir sub_nested_nested &&
> >> + (cd sub_nested_nested &&
> 
> We seem to have different styles for nested shell. I prefer
> 
>   outside command &&
>   (
>   first nested command here &&
>   ...
> 
> as that aligns indentation to the nesting level. I have seen
> the style you use a lot in the  test suite, and we do not have
> a guideline in Documentation/CodingGuidelines, so I do not
> complain too loudly. ;)

Yeah we have some different styles it seems ;) So here some reasoning
behind my style:

I actually would agree on your style if 'first nested command' was any
arbitrary command but when I use my style it is always when I use a
nested shell for changing into some directory, doing something there and
then being able to return to the previous directory by closing the nested
shell. So for me the 'cd somewhere' belongs to the brackets similarly
like a condition definition belongs to the if it is used with.

> >> + touch nested_level2 &&
> >> + git init &&
> >> + git add . &&
> >> + git commit -m "nested level 2"
> >> + ) &&
> >> + mkdir sub_nested &&
> >> + (cd sub_nested &&
> >> + touch nested_level1 &&
> >> + git init &&
> >> + git add . &&
> >> + git commit -m "nested level 1"
> >> + git submodule add ../sub_nested_nested &&
> >> + git commit -m "add nested level 2"
> >> + ) &&
> >> + git submodule add ./sub_nested nested_move &&
> >> + git commit -m "add nested_move" &&
> >> + git submodule update --init --recursive &&
> 
> So far a nice setup!

Thanks.

> >> + git mv nested_move sub_nested_moved &&
> 
> This is the offending command that produces the bug,
> as it will break most subsequent commands, such as

Yes.

> >> + git status
> 
> git-status is one of the basic commands. Without
> status to function, I think it is hard to recover your repo without
> a lot of in-depth knowledge of Git (submodules).
> 
> I wonder if git-status should

[RFC PATCH v2 2/2] submodule: simplify decision tree whether to or not to fetch

2017-09-15 Thread Heiko Voigt
To make extending this logic later easier.

Signed-off-by: Heiko Voigt 
---
 submodule.c | 74 ++---
 1 file changed, 37 insertions(+), 37 deletions(-)

diff --git a/submodule.c b/submodule.c
index 38b9905e43..fa44fc59f2 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1118,6 +1118,31 @@ struct submodule_parallel_fetch {
 };
 #define SPF_INIT {0, ARGV_ARRAY_INIT, NULL, NULL, 0, 0, 0, 0}
 
+static int get_fetch_recurse_config(const struct submodule *submodule,
+   struct submodule_parallel_fetch *spf)
+{
+   if (spf->command_line_option != RECURSE_SUBMODULES_DEFAULT)
+   return spf->command_line_option;
+
+   if (submodule) {
+   char *key;
+   const char *value;
+
+   int fetch_recurse = submodule->fetch_recurse;
+   key = xstrfmt("submodule.%s.fetchRecurseSubmodules", 
submodule->name);
+   if (!repo_config_get_string_const(the_repository, key, &value)) 
{
+   fetch_recurse = parse_fetch_recurse_submodules_arg(key, 
value);
+   }
+   free(key);
+
+   if (fetch_recurse != RECURSE_SUBMODULES_NONE)
+   /* local config overrules everything except commandline 
*/
+   return fetch_recurse;
+   }
+
+   return spf->default_option;
+}
+
 static int get_next_submodule(struct child_process *cp,
  struct strbuf *err, void *data, void **task_cb)
 {
@@ -1137,46 +1162,21 @@ static int get_next_submodule(struct child_process *cp,
 
submodule = submodule_from_path(&null_oid, ce->name);
 
-   default_argv = "yes";
-   if (spf->command_line_option == RECURSE_SUBMODULES_DEFAULT) {
-   int fetch_recurse = RECURSE_SUBMODULES_NONE;
-
-   if (submodule) {
-   char *key;
-   const char *value;
-
-   fetch_recurse = submodule->fetch_recurse;
-   key = 
xstrfmt("submodule.%s.fetchRecurseSubmodules", submodule->name);
-   if 
(!repo_config_get_string_const(the_repository, key, &value)) {
-   fetch_recurse = 
parse_fetch_recurse_submodules_arg(key, value);
-   }
-   free(key);
-   }
-
-   if (fetch_recurse != RECURSE_SUBMODULES_NONE) {
-   if (fetch_recurse == RECURSE_SUBMODULES_OFF)
-   continue;
-   if (fetch_recurse == 
RECURSE_SUBMODULES_ON_DEMAND) {
-   if 
(!unsorted_string_list_lookup(&changed_submodule_names,
-
submodule->name))
-   continue;
-   default_argv = "on-demand";
-   }
-   } else {
-   if (spf->default_option == 
RECURSE_SUBMODULES_OFF)
-   continue;
-   if (spf->default_option == 
RECURSE_SUBMODULES_ON_DEMAND) {
-   if 
(!unsorted_string_list_lookup(&changed_submodule_names,
- 
submodule->name))
-   continue;
-   default_argv = "on-demand";
-   }
-   }
-   } else if (spf->command_line_option == 
RECURSE_SUBMODULES_ON_DEMAND) {
-   if 
(!unsorted_string_list_lookup(&changed_submodule_names,
+   switch (get_fetch_recurse_config(submodule, spf))
+   {
+   default:
+   case RECURSE_SUBMODULES_DEFAULT:
+   case RECURSE_SUBMODULES_ON_DEMAND:
+   if (!submodule || 
!unsorted_string_list_lookup(&changed_submodule_names,
 submodule->name))
continue;
default_argv = "on-demand";
+   break;
+   case RECURSE_SUBMODULES_ON:
+   default_argv = "yes";
+   break;
+   case RECURSE_SUBMODULES_OFF:
+   continue;
}
 
strbuf_addf(&submodule_path, "%s/%s", spf->work_tree, ce->name);
-- 
2.14.1.145.gb3622a4



[RFC PATCH v2 1/2] implement fetching of moved submodules

2017-09-15 Thread Heiko Voigt
We store the changed submodules paths to calculate which submodule needs
fetching. This does not work for moved submodules since their paths do
not stay the same in case of a moved submodules. In case of new
submodules we do not have a path in the current checkout, since they
just appeared in this fetch.

It is more general to collect the submodule names for changes instead of
their paths to include the above cases.

With the change described above we implement 'on-demand' fetching of
changes in moved submodules.

Note: This does only work when repositories have a .gitmodules file. In
other words: It breaks if we do not get a name for a repository.
IIRC, consensus was that this is a requirement to get nice submodule
handling these days?

NEEDSWORK: This breaks t5531 and t5545 because they do not use a
.gitmodules file. I will add a fallback to paths to help such users.

Signed-off-by: Heiko Voigt 
---
This an update of the previous series[1] to the current master. The
fallback is still missing but now it should not conflict with any topics
in flight anymore (hopefully).

Cheers Heiko

[1] https://public-inbox.org/git/20170817105349.gc52...@book.hvoigt.net/

 submodule.c | 91 +
 t/t5526-fetch-submodules.sh | 35 +
 2 files changed, 85 insertions(+), 41 deletions(-)

diff --git a/submodule.c b/submodule.c
index 3cea8221e0..38b9905e43 100644
--- a/submodule.c
+++ b/submodule.c
@@ -21,7 +21,7 @@
 #include "parse-options.h"
 
 static int config_update_recurse_submodules = RECURSE_SUBMODULES_OFF;
-static struct string_list changed_submodule_paths = STRING_LIST_INIT_DUP;
+static struct string_list changed_submodule_names = STRING_LIST_INIT_DUP;
 static int initialized_fetch_ref_tips;
 static struct oid_array ref_tips_before_fetch;
 static struct oid_array ref_tips_after_fetch;
@@ -667,11 +667,11 @@ const struct submodule *submodule_from_ce(const struct 
cache_entry *ce)
 }
 
 static struct oid_array *submodule_commits(struct string_list *submodules,
-  const char *path)
+  const char *name)
 {
struct string_list_item *item;
 
-   item = string_list_insert(submodules, path);
+   item = string_list_insert(submodules, name);
if (item->util)
return (struct oid_array *) item->util;
 
@@ -680,39 +680,34 @@ static struct oid_array *submodule_commits(struct 
string_list *submodules,
return (struct oid_array *) item->util;
 }
 
+struct collect_changed_submodules_cb_data {
+   struct string_list *changed;
+   const struct object_id *commit_oid;
+};
+
 static void collect_changed_submodules_cb(struct diff_queue_struct *q,
  struct diff_options *options,
  void *data)
 {
+   struct collect_changed_submodules_cb_data *me = data;
+   struct string_list *changed = me->changed;
+   const struct object_id *commit_oid = me->commit_oid;
int i;
-   struct string_list *changed = data;
 
for (i = 0; i < q->nr; i++) {
struct diff_filepair *p = q->queue[i];
struct oid_array *commits;
+   const struct submodule *submodule;
+
if (!S_ISGITLINK(p->two->mode))
continue;
 
-   if (S_ISGITLINK(p->one->mode)) {
-   /*
-* NEEDSWORK: We should honor the name configured in
-* the .gitmodules file of the commit we are examining
-* here to be able to correctly follow submodules
-* being moved around.
-*/
-   commits = submodule_commits(changed, p->two->path);
-   oid_array_append(commits, &p->two->oid);
-   } else {
-   /* Submodule is new or was moved here */
-   /*
-* NEEDSWORK: When the .git directories of submodules
-* live inside the superprojects .git directory some
-* day we should fetch new submodules directly into
-* that location too when config or options request
-* that so they can be checked out from there.
-*/
+   submodule = submodule_from_path(commit_oid, p->two->path);
+   if (!submodule)
continue;
-   }
+
+   commits = submodule_commits(changed, submodule->name);
+   oid_array_append(commits, &p->two->oid);
}
 }
 
@@ -735,11 +730,14 @@ static void collect_changed_submodules(struct string_list 
*changed,
 
while ((commit = get_revision(&rev))) {
 

[PATCH v2] add test for bug in git-mv for recursive submodules

2017-09-15 Thread Heiko Voigt
When using git-mv with a submodule it will detect that and update the
paths for its configurations (.gitmodules, worktree and gitfile). This
does not work for recursive submodules where a user renames the root
submodule.

We discovered this fact when working on on-demand fetch for renamed
submodules. Lets add a test to document.

Signed-off-by: Heiko Voigt 
---
On Fri, Aug 18, 2017 at 12:04:03PM -0700, Stefan Beller wrote:
> > I just copied the shortcut that they were adding themselfes as submodule
> > in 'setup submodule'. The whole setup of submodules in this test is like
> > this. This way we already had a nested submodule structure which I could
> > just add.
> >
> > I agree that this is unrealistic so I can change that in the test I am
> > adding. But from what I have seen, this shortcut is taken in quite some
> > places when dealing with submodules.
> 
> Please do not make it worse.
> Once upon a time (late '16 IIRC) I had a series floating on the
> list removing all occurrences, but there were issues with the
> series and it did not land.

Took a little while but here is a more clean patch creating individual
submodules for the nesting.

Cheers Heiko

 t/t7001-mv.sh | 25 +
 1 file changed, 25 insertions(+)

diff --git a/t/t7001-mv.sh b/t/t7001-mv.sh
index e365d1ff77..cbc5fb37fe 100755
--- a/t/t7001-mv.sh
+++ b/t/t7001-mv.sh
@@ -491,4 +491,29 @@ test_expect_success 'moving a submodule in nested 
directories' '
test_cmp actual expect
 '
 
+test_expect_failure 'moving nested submodules' '
+   git commit -am "cleanup commit" &&
+   mkdir sub_nested_nested &&
+   (cd sub_nested_nested &&
+   touch nested_level2 &&
+   git init &&
+   git add . &&
+   git commit -m "nested level 2"
+   ) &&
+   mkdir sub_nested &&
+   (cd sub_nested &&
+   touch nested_level1 &&
+   git init &&
+   git add . &&
+   git commit -m "nested level 1"
+   git submodule add ../sub_nested_nested &&
+   git commit -m "add nested level 2"
+   ) &&
+   git submodule add ./sub_nested nested_move &&
+   git commit -m "add nested_move" &&
+   git submodule update --init --recursive &&
+   git mv nested_move sub_nested_moved &&
+   git status
+'
+
 test_done
-- 
2.14.1.145.gb3622a4



Re: [PATCH 2/4] push, fetch: error out for submodule entries not pointing to commits

2017-09-14 Thread Heiko Voigt
On Tue, Sep 12, 2017 at 10:30:27AM -0700, Jonathan Nieder wrote:
> From: Stefan Beller 
> 
> The check_has_commit helper uses resolves a submodule entry to a
> commit, when validating its existence. As a side effect this means
> tolerates a submodule entry pointing to a tag, which is not a valid
> submodule entry that git commands would know how to cope with.
> 
> Tighten the check to require an actual commit, not a tag pointing to a
> commit.
> 
> Also improve the error handling when a submodule entry points to
> non-commit (e.g., a blob) to error out instead of warning and
> pretending the pointed to object doesn't exist.
> 
> Signed-off-by: Stefan Beller 
> Signed-off-by: Jonathan Nieder 

Looks good to me.

Cheers Heiko


Re: Submodule regression in 2.14?

2017-08-25 Thread Heiko Voigt
On Tue, Aug 22, 2017 at 11:10:52AM -0700, Stefan Beller wrote:
> On Tue, Aug 22, 2017 at 8:33 AM, Heiko Voigt  wrote:
> > On Mon, Aug 21, 2017 at 09:42:54AM -0700, Stefan Beller wrote:
> >> On Mon, Aug 21, 2017 at 9:05 AM, Heiko Voigt  wrote:
> >> > On Fri, Aug 18, 2017 at 11:51:13PM -0700, Junio C Hamano wrote:
> >> # You feel the superproject is in the way?
> >> git worktree add --for-submodule  ...
> >> # The new submodule worktree puts the
> >> # submodule only UX first. so it feels like its own
> >> # repository, no need for specific flags.
> >
> > I am not sure I understand this one. What would that do? Put a worktree
> > for submodule path/to/sub to ...?
> 
> Yes, and at "..." you would have the UX of the submodule being
> its own repository, no interaction with the superproject.

Are you sure that is what Junio meant? IMO that would be 'git worktree
clone' or 'git worktree checkout', no?  In todays git terminology an
'add' is something that puts data into the repository / the index. That
is why I was/am confused.

Cheers Heiko


Re: Submodule regression in 2.14?

2017-08-22 Thread Heiko Voigt
On Mon, Aug 21, 2017 at 09:48:51AM -0700, Junio C Hamano wrote:
> Heiko Voigt  writes:
> 
> > So I think it is important that there are commits in the submodule so
> > its history makes sense independently for others.
> 
> I was trying to push the "just like trees" to the logical conclusion
> in order to see see if it leads to an absurd conclusions, or some
> useful scenario.  I do not necessarily subscribed to Jonathan's
> "vision" (I do not object to it as one possible mode of operation,
> either).
> 
> > Or how would you push out the history in the submodule in your idea?
> > Maybe I am missing something? What would be your use case with gitlinks
> > pointing to trees?
> 
> Not my idea.  But Stefan's message I was responding to said that
> object database for the superproject is the same as and mixed with
> object databases for the submodules, so it is plausible that push
> always happens at the superproject, and a mechanism to be invented
> (I mentioned the need for it in the message you are responding to)
> to enumerate all the commits bound from submodules to a range of
> superproject's commits would identify these trees to be pushed out.
> 
> In essense, "just like trees" folks want to use the gitlinks in the
> superproject to only specify the tree from the submodule project
> that should sit there.  And my point is that such a world view would
> lead to no need for branches in the submodule repository, no need
> for commits in the submodule repository, and no need for history in
> the submodule repository.  Which may or may not be a bad thing.

The problem I see here is: How do we seperate the submodule from the
superproject? Without the history (commits and trees) of the
superproject the submodule trees do not make sense anymore. The reason
users have submodules is because the projects are seperate somehow. With
just trees in the submodule it becomes tied quite tightly to the
superproject, IMO.

One step further: Why use gitlinks to point to trees? Let's just use
normal tree links instead, we already have that implemented :)

Cheers Heiko


Re: Submodule regression in 2.14?

2017-08-22 Thread Heiko Voigt
On Mon, Aug 21, 2017 at 09:42:54AM -0700, Stefan Beller wrote:
> On Mon, Aug 21, 2017 at 9:05 AM, Heiko Voigt  wrote:
> > On Fri, Aug 18, 2017 at 11:51:13PM -0700, Junio C Hamano wrote:
> >> As long as we are talking about idealized future world (well, at
> >> least an idea of somebody's "ideal", not necessarily shared by
> >> everybody), I wonder if there is even any need to have commits in
> >> submodules in such a world.  To realize such a "monorepo" world, you
> >> might be better off allowing a gitlink in the superproject to
> >> directly point at a tree object in a submodule repository (making
> >> them physically a single repository is an optional implementation
> >> detail I choose to ignore in this discussion).
> >
> > IMO this is one step to far. One main use of submodules are shared
> > repositories that are used by many superprojects. The reason you want to
> > have commits in the submodule are so that you can push them
> > independently and all other users can pick up the changes. You could get
> > by by Using the superproject commits for the submodule once you push or
> > something but those do not necessarily make sense in the context of the
> > submodule.
> >
> > So I think it is important that there are commits in the submodule so
> > its history makes sense independently for others.
> >
> > Or how would you push out the history in the submodule in your idea?
> > Maybe I am missing something? What would be your use case with gitlinks
> > pointing to trees?
> 
> Well there are still commits, but in the superproject the UX feels more
> as if the submodules were special trees.

Ah ok then I misunderstood. So they only feel like trees.

> So if you want to interact with
> the submodule specifically, you could do things like
> 
> git add /path/inside/sub
> # works seamlessly from the superproject tree

Would that mean that we need to loosen/keep the requirement loose for a
name from .gitmodules? I am asking because of my series for on-demand
fetch of renamed submodules. For the full functionality I would require
a name.

Would that be in a scenario where the user would then e.g. push the
submodule into the superproject?

Ah wait I misunderstood again. You mean a file in an existing
submodule right? Not adding submodule from a repository a user moved
there?

> git commit --submodule-commit-only
> # When the flag is not give, you may get an editor
> # asking for two commit messages, (sub+super)

Or maybe something like

git commit --submodule path/to/submodule

so the user can specify which submodule she wants. I first wrote it
without the switch but but that collides with listing files which should
be added. IMO this shorter option is also more intuitive to understand
what it does (for this case).

> git fetch --submodule
> # When the flag is not given, we'd fetch superproject and
> # on-demand

Yes like above we should add the path to the submodule right?

> # You feel the superproject is in the way?
> git worktree add --for-submodule  ...
> # The new submodule worktree puts the
> # submodule only UX first. so it feels like its own
> # repository, no need for specific flags.

I am not sure I understand this one. What would that do? Put a worktree
for submodule path/to/sub to ...?

Overall I like the direction of these ideas.

Cheers Heiko


Re: [GSoC][PATCH 3/4] submodule: port set_name_rev() from shell to C

2017-08-21 Thread Heiko Voigt
On Mon, Aug 21, 2017 at 09:45:14PM +0530, Prathamesh Chavan wrote:
> Function set_name_rev() is ported from git-submodule to the
> submodule--helper builtin. The function get_name_rev() generates the
> value of the revision name as required, and the function
> print_name_rev() handles the formating and printing of the obtained
> revision name.
> 
> Mentored-by: Christian Couder 
> Mentored-by: Stefan Beller 
> Signed-off-by: Prathamesh Chavan 
> ---
>  builtin/submodule--helper.c | 63 
> +
>  git-submodule.sh| 16 ++--
>  2 files changed, 65 insertions(+), 14 deletions(-)
> 
> diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
> index 7803457ba..a4bff3f38 100644
> --- a/builtin/submodule--helper.c
> +++ b/builtin/submodule--helper.c

[...]

> @@ -1242,6 +1304,7 @@ static struct cmd_struct commands[] = {
>   {"relative-path", resolve_relative_path, 0},
>   {"resolve-relative-url", resolve_relative_url, 0},
>   {"resolve-relative-url-test", resolve_relative_url_test, 0},
> + {"print-name-rev", print_name_rev, 0},

I see the function in git-submodule.sh was named kind of reverse. How
about we do it more naturally here and call this 'rev-name' instead?
That makes is more clear to me and is also similar to the used variable
name 'revname'.

I would also prefix it differently like 'get' or 'calculate' instead of
'print' since it tries to find a name and is not just a simple lookup.

So in summary I would prefer something like 'get-rev-name' as a name for
the subcommand here.

>   {"init", module_init, SUPPORT_SUPER_PREFIX},
>   {"remote-branch", resolve_remote_submodule_branch, 0},
>   {"push-check", push_check, 0},
> diff --git a/git-submodule.sh b/git-submodule.sh
> index e131760ee..e988167e0 100755
> --- a/git-submodule.sh
> +++ b/git-submodule.sh
> @@ -759,18 +759,6 @@ cmd_update()
>   }
>  }
>  
> -set_name_rev () {
> - revname=$( (
> - sanitize_submodule_env
> - cd "$1" && {
> - git describe "$2" 2>/dev/null ||
> - git describe --tags "$2" 2>/dev/null ||
> - git describe --contains "$2" 2>/dev/null ||
> - git describe --all --always "$2"
> - }
> - ) )
> - test -z "$revname" || revname=" ($revname)"
> -}
>  #
>  # Show commit summary for submodules in index or working tree
>  #
> @@ -1042,14 +1030,14 @@ cmd_status()
>   fi
>   if git diff-files --ignore-submodules=dirty --quiet -- 
> "$sm_path"
>   then
> - set_name_rev "$sm_path" "$sha1"
> + revname=$(git submodule--helper print-name-rev 
> "$sm_path" "$sha1")
>   say " $sha1 $displaypath$revname"
>   else
>   if test -z "$cached"
>   then
>   sha1=$(sanitize_submodule_env; cd "$sm_path" && 
> git rev-parse --verify HEAD)
>   fi
> - set_name_rev "$sm_path" "$sha1"
> + revname=$(git submodule--helper print-name-rev 
> "$sm_path" "$sha1")
>   say "+$sha1 $displaypath$revname"
>   fi
>  
> -- 
> 2.13.0
> 


Re: [PATCH] pull: respect submodule update configuration

2017-08-21 Thread Heiko Voigt
On Fri, Aug 18, 2017 at 11:24:47PM -0700, Junio C Hamano wrote:
> Stefan Beller  writes:
> 
> > From: Lars Schneider 
> >
> > Do not override the submodule configuration in the call to update
> > the submodules, but give a weaker default.
> >
> > Reported-by: Lars Schneider 
> > Signed-off-by: Stefan Beller 
> > ---
> >   
> > Personally I dislike this patch, but I have no better idea for the time
> > being.
> 
> The patch text from a cursory look seems reasonable to me.
> 
> It's not like you have 47 different codepaths that need to pay
> attention to the .update config and they all have to pass the new
> --default-update option, this is merely to fix one of them that
> relates to the problem reported by Lars, and you need a similar fix
> to other 46, right?
> 
> If you want the "--recurse-submodules" thing to always do the
> "weaker default" thing in your project, you can choose not to set
> .update to custom values in any of your submodules, so I do not
> think the reason why you dislike this change is because it would
> affect your use of submodules.
> 
> So I am a bit curious to learn which part of this change you dislike
> and why.

I am also curious. Isn't this the same strategy we are using in other
places?

Cheers Heiko


Re: Submodule regression in 2.14?

2017-08-21 Thread Heiko Voigt
On Fri, Aug 18, 2017 at 11:51:13PM -0700, Junio C Hamano wrote:
> As long as we are talking about idealized future world (well, at
> least an idea of somebody's "ideal", not necessarily shared by
> everybody), I wonder if there is even any need to have commits in
> submodules in such a world.  To realize such a "monorepo" world, you
> might be better off allowing a gitlink in the superproject to
> directly point at a tree object in a submodule repository (making
> them physically a single repository is an optional implementation
> detail I choose to ignore in this discussion).

IMO this is one step to far. One main use of submodules are shared
repositories that are used by many superprojects. The reason you want to
have commits in the submodule are so that you can push them
independently and all other users can pick up the changes. You could get
by by Using the superproject commits for the submodule once you push or
something but those do not necessarily make sense in the context of the
submodule.

So I think it is important that there are commits in the submodule so
its history makes sense independently for others.

Or how would you push out the history in the submodule in your idea?
Maybe I am missing something? What would be your use case with gitlinks
pointing to trees?

Cheers Heiko


Re: [PATCH] add test for bug in git-mv with nested submodules

2017-08-18 Thread Heiko Voigt
On Thu, Aug 17, 2017 at 12:05:56PM -0700, Stefan Beller wrote:
> On Thu, Aug 17, 2017 at 3:34 AM, Heiko Voigt  wrote:
> > When using git-mv with a submodule it will detect that and update the
> > paths for its configurations (.gitmodules, worktree and gitfile). This
> > does not work for nested submodules where a user renames the root
> > submodule.
> >
> > We discovered this fact when working on on-demand fetch for renamed
> > submodules. Lets add a test to document.
> >
> > Signed-off-by: Heiko Voigt 
> > ---
> >  t/t7001-mv.sh | 9 +
> >  1 file changed, 9 insertions(+)
> >
> > diff --git a/t/t7001-mv.sh b/t/t7001-mv.sh
> > index e365d1f..39f8aed 100755
> > --- a/t/t7001-mv.sh
> > +++ b/t/t7001-mv.sh
> > @@ -491,4 +491,13 @@ test_expect_success 'moving a submodule in nested 
> > directories' '
> > test_cmp actual expect
> >  '
> >
> > +test_expect_failure 'moving nested submodules' '
> > +   git commit -am "cleanup commit" &&
> > +   git submodule add ./. sub_nested &&
> 
> If possible, I would avoid adding the repo itself
> as a submodule as it is unrealistic in the wild.
> 
> While it may be ok for the test here, later down the road
> other tests making use of it it may become an issue with
> the URL of the submodule.

I just copied the shortcut that they were adding themselfes as submodule
in 'setup submodule'. The whole setup of submodules in this test is like
this. This way we already had a nested submodule structure which I could
just add.

I agree that this is unrealistic so I can change that in the test I am
adding. But from what I have seen, this shortcut is taken in quite some
places when dealing with submodules.

Cheers Heiko


Re: [RFC PATCH 1/2] implement fetching of moved submodules

2017-08-18 Thread Heiko Voigt
On Thu, Aug 17, 2017 at 10:20:13AM -0700, Stefan Beller wrote:
> On Thu, Aug 17, 2017 at 3:53 AM, Heiko Voigt  wrote:
> > We store the changed submodules paths to calculate which submodule needs
> > fetching. This does not work for moved submodules since their paths do
> > not stay the same in case of a moved submodules. In case of new
> > submodules we do not have a path in the current checkout, since they
> > just appeared in this fetch.
> >
> > It is more general to collect the submodule names for changes instead of
> > their paths to include the above cases.
> >
> > With the change described above we implement 'on-demand' fetching of
> > changes in moved submodules.
> 
> This sounds as if this would also enable fetching new submodules
> eventually?

Yes that was the goal when starting with these changes back then. But it
took more time than I had back then. So instead of letting these changes
sit bitrot again lets see if we can get them integrated.

For new submodules we need to change the iteration somehow. Currently we
are iterating through the index. But new submodules obviously do not
have an index entry (otherwise they would not be new). So instead of the
index we will need to create another list that contains "all"
submodules. Maybe something like: all submodules from the index plus all
submodules that changed / are new? We could also go further and inspect
all submodules from all ref tips to handle submodules on other branches
configured to 'yes'. But I think we should leave that for later if need
arises.

Some merge of index and additional submodules is needed, because for
--recurse-submodules=yes or submodule..fetchRecurseSubmodules=yes
we always need to run fetch inside the submodule. That would break if we
only looked at submodules that are collected as changed.

> > Note: This does only work when repositories have a .gitmodules file. In
> > other words: It breaks if we do not get a name for a repository.
> > IIRC, consensus was that this is a requirement to get nice submodule
> > handling these days?
> 
> I think that should have been the consensus since ~1.7.8 (since the
> submodules git dir can live inside the superprojects
> /module/).

I agree but since we started without it, we kind of have a mixed state.

> A gitlink entry without corresponding .gitmodules entry is just a gitlink.
> If we happen to have a repository at that path of the gitlink, we can
> be nice and pretend like it is a functional submodule, but it really is
> not. It's just another repo inside the superproject that happen to live
> at the path of a gitlink.

Yeah but at the moment we are handling 'on-demand' fetches and stuff for
such just gitlink submodules. If we were firm on that requirement we
would just skip those but that is not the case with the current
implementation.

> > Signed-off-by: Heiko Voigt 
> > ---
> >
> > I updated the leftover code from my series implementing recursive fetch
> > for moved submodules[1] to the current master.
> >
> > This breaks t5531 and t5545 because they do not use a .gitmodules file.
> >
> > I also have some code leftover that does fallback on paths in case no
> > submodule names can be found. But I do not really like it. The question
> > here is how far do we support not using .gitmodules. Is it e.g.
> > reasonable to say: "For --recurse-submodules=on-demand you need a
> > .gitmodules file?"
> 
> I would not intentionally break users here, but any new functionality can
> safely assume (a) we have a proper .gitmodules entry or (b) it is not a
> submodule, so do nothing/be extra careful.
> 
> For example in recursive diff sort of makes sense to also handle
> non-submodule gitlinks, but fetch is harder to tell.

Well we have a few different cases for gitlinks without .gitmodule
entry:

 1. New gitlink: We can not handle since we do not know where to clone
 from.

 2. Removed gitlink: No need to do anything in fetch

 3. Changed (but same name) gitlink: We can / and currently do run fetch
 in it

 4. Renamed: We currently skip those. We could probably do something to
 track the rename and run fetch in case of gitlink changes.
 In my current approach only the ones with a name are
 handled.

So I guess I will add a fallback to paths for 3. so we do not
unnecessarily break users using the current implementation.

> (just last night I was rereading
> https://public-inbox.org/git/CAJo=hjvnapnaddcaawavu9c4rveqdos3ev9wtguhx4fd0v_...@mail.gmail.com/
> which I think is a super cute application of gitlinks. If you happen
> to checkout such
> a tree, you don't want t

Re: [RFC PATCH 2/2] submodule: simplify decision tree whether to or not to fetch

2017-08-18 Thread Heiko Voigt
On Thu, Aug 17, 2017 at 10:50:07AM -0700, Brandon Williams wrote:
> On 08/17, Stefan Beller wrote:
> > On Thu, Aug 17, 2017 at 4:00 AM, Heiko Voigt  wrote:
> > > To make extending this logic later easier.
> > >
> > > Signed-off-by: Heiko Voigt 
> > > ---
> > > I am quite sure I replicated the same logic but a few more eyes would be
> > > appreciated.
> > 
> > A code cleanup is appreciated!
> > 
> > I thought Brandon had a series in flight doing a very similar cleanup here,
> > but in master..pu there is nothing to be found.
> 
> Yeah there are 2 series in flight which will probably conflict here.
> bw/grep-recurse-submodules and bw/submodule-config-cleanup

Ok then I will wait until those are in and then see if I can base the
cleanup on top. I think it is only necessary as a preparation for the
fully fledged fetch configuration logic mess we will get into once we
get to the full recursive submodule fetch implementation. Not
necessarily needed for the moved submodules.

> > 
> > The code looks good to me.

Thanks.

Cheers Heiko


[RFC PATCH 2/2] submodule: simplify decision tree whether to or not to fetch

2017-08-17 Thread Heiko Voigt
To make extending this logic later easier.

Signed-off-by: Heiko Voigt 
---
I am quite sure I replicated the same logic but a few more eyes would be
appreciated.

Cheers Heiko

 submodule.c | 55 +++
 1 file changed, 27 insertions(+), 28 deletions(-)

diff --git a/submodule.c b/submodule.c
index 3ed78ac..a1011f4 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1171,6 +1171,21 @@ int submodule_touches_in_range(struct object_id 
*excl_oid,
return ret;
 }
 
+static int get_fetch_recurse_config(const struct submodule *submodule, int 
command_line_option)
+{
+   if (command_line_option != RECURSE_SUBMODULES_DEFAULT)
+   return command_line_option;
+
+   if (submodule && submodule->fetch_recurse != RECURSE_SUBMODULES_NONE)
+   /* local config overrules everything except commandline */
+   return submodule->fetch_recurse;
+
+   if (gitmodules_is_unmerged)
+   return RECURSE_SUBMODULES_OFF;
+
+   return config_fetch_recurse_submodules;
+}
+
 struct submodule_parallel_fetch {
int count;
struct argv_array args;
@@ -1203,37 +1218,21 @@ static int get_next_submodule(struct child_process *cp,
if (!submodule)
submodule = submodule_from_name(&null_oid, ce->name);
 
-   default_argv = "yes";
-   if (spf->command_line_option == RECURSE_SUBMODULES_DEFAULT) {
-   if (submodule &&
-   submodule->fetch_recurse !=
-   RECURSE_SUBMODULES_NONE) {
-   if (submodule->fetch_recurse ==
-   RECURSE_SUBMODULES_OFF)
-   continue;
-   if (submodule->fetch_recurse ==
-   RECURSE_SUBMODULES_ON_DEMAND) {
-   if 
(!unsorted_string_list_lookup(&changed_submodule_names,
-
submodule->name))
-   continue;
-   default_argv = "on-demand";
-   }
-   } else {
-   if ((config_fetch_recurse_submodules == 
RECURSE_SUBMODULES_OFF) ||
-   gitmodules_is_unmerged)
-   continue;
-   if (config_fetch_recurse_submodules == 
RECURSE_SUBMODULES_ON_DEMAND) {
-   if 
(!unsorted_string_list_lookup(&changed_submodule_names,
-
submodule->name))
-   continue;
-   default_argv = "on-demand";
-   }
-   }
-   } else if (spf->command_line_option == 
RECURSE_SUBMODULES_ON_DEMAND) {
-   if 
(!unsorted_string_list_lookup(&changed_submodule_names,
+   switch (get_fetch_recurse_config(submodule, 
spf->command_line_option))
+   {
+   default:
+   case RECURSE_SUBMODULES_DEFAULT:
+   case RECURSE_SUBMODULES_ON_DEMAND:
+   if (!submodule || 
!unsorted_string_list_lookup(&changed_submodule_names,
 submodule->name))
continue;
default_argv = "on-demand";
+   break;
+   case RECURSE_SUBMODULES_ON:
+   default_argv = "yes";
+   break;
+   case RECURSE_SUBMODULES_OFF:
+   continue;
}
 
strbuf_addf(&submodule_path, "%s/%s", spf->work_tree, ce->name);
-- 
2.0.0.274.g6b2cd91



[RFC PATCH 1/2] implement fetching of moved submodules

2017-08-17 Thread Heiko Voigt
We store the changed submodules paths to calculate which submodule needs
fetching. This does not work for moved submodules since their paths do
not stay the same in case of a moved submodules. In case of new
submodules we do not have a path in the current checkout, since they
just appeared in this fetch.

It is more general to collect the submodule names for changes instead of
their paths to include the above cases.

With the change described above we implement 'on-demand' fetching of
changes in moved submodules.

Note: This does only work when repositories have a .gitmodules file. In
other words: It breaks if we do not get a name for a repository.
IIRC, consensus was that this is a requirement to get nice submodule
handling these days?

Signed-off-by: Heiko Voigt 
---

I updated the leftover code from my series implementing recursive fetch
for moved submodules[1] to the current master.

This breaks t5531 and t5545 because they do not use a .gitmodules file.

I also have some code leftover that does fallback on paths in case no
submodule names can be found. But I do not really like it. The question
here is how far do we support not using .gitmodules. Is it e.g.
reasonable to say: "For --recurse-submodules=on-demand you need a
.gitmodules file?"

[1] 
https://public-inbox.org/git/f5baa2acc09531a16f4f693eebbe60706bb8ed1e.1361751905.git.hvo...@hvoigt.net/

 submodule.c | 92 +
 t/t5526-fetch-submodules.sh | 35 +
 2 files changed, 86 insertions(+), 41 deletions(-)

diff --git a/submodule.c b/submodule.c
index 27de65a..3ed78ac 100644
--- a/submodule.c
+++ b/submodule.c
@@ -23,7 +23,7 @@
 static int config_fetch_recurse_submodules = RECURSE_SUBMODULES_ON_DEMAND;
 static int config_update_recurse_submodules = RECURSE_SUBMODULES_OFF;
 static int parallel_jobs = 1;
-static struct string_list changed_submodule_paths = STRING_LIST_INIT_DUP;
+static struct string_list changed_submodule_names = STRING_LIST_INIT_DUP;
 static int initialized_fetch_ref_tips;
 static struct oid_array ref_tips_before_fetch;
 static struct oid_array ref_tips_after_fetch;
@@ -742,11 +742,11 @@ const struct submodule *submodule_from_ce(const struct 
cache_entry *ce)
 }
 
 static struct oid_array *submodule_commits(struct string_list *submodules,
-  const char *path)
+  const char *name)
 {
struct string_list_item *item;
 
-   item = string_list_insert(submodules, path);
+   item = string_list_insert(submodules, name);
if (item->util)
return (struct oid_array *) item->util;
 
@@ -755,39 +755,34 @@ static struct oid_array *submodule_commits(struct 
string_list *submodules,
return (struct oid_array *) item->util;
 }
 
+struct collect_changed_submodules_cb_data {
+   struct string_list *changed;
+   const struct object_id *commit_oid;
+};
+
 static void collect_changed_submodules_cb(struct diff_queue_struct *q,
  struct diff_options *options,
  void *data)
 {
+   struct collect_changed_submodules_cb_data *me = data;
+   struct string_list *changed = me->changed;
+   const struct object_id *commit_oid = me->commit_oid;
int i;
-   struct string_list *changed = data;
 
for (i = 0; i < q->nr; i++) {
struct diff_filepair *p = q->queue[i];
struct oid_array *commits;
+   const struct submodule *submodule;
+
if (!S_ISGITLINK(p->two->mode))
continue;
 
-   if (S_ISGITLINK(p->one->mode)) {
-   /*
-* NEEDSWORK: We should honor the name configured in
-* the .gitmodules file of the commit we are examining
-* here to be able to correctly follow submodules
-* being moved around.
-*/
-   commits = submodule_commits(changed, p->two->path);
-   oid_array_append(commits, &p->two->oid);
-   } else {
-   /* Submodule is new or was moved here */
-   /*
-* NEEDSWORK: When the .git directories of submodules
-* live inside the superprojects .git directory some
-* day we should fetch new submodules directly into
-* that location too when config or options request
-* that so they can be checked out from there.
-*/
+   submodule = submodule_from_path(commit_oid, p->two->path);
+   if (!submodule)
continue;
-   }
+
+   commits = submodule_commit

[PATCH] t5526: fix some broken && chains

2017-08-17 Thread Heiko Voigt
Signed-off-by: Heiko Voigt 
---
 t/t5526-fetch-submodules.sh | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
index ce788e9..22a7358 100755
--- a/t/t5526-fetch-submodules.sh
+++ b/t/t5526-fetch-submodules.sh
@@ -193,7 +193,7 @@ test_expect_success "recurseSubmodules=true propagates into 
submodules" '
add_upstream_commit &&
(
cd downstream &&
-   git config fetch.recurseSubmodules true
+   git config fetch.recurseSubmodules true &&
git fetch >../actual.out 2>../actual.err
) &&
test_must_be_empty actual.out &&
@@ -218,7 +218,7 @@ test_expect_success "--no-recurse-submodules overrides 
config setting" '
add_upstream_commit &&
(
cd downstream &&
-   git config fetch.recurseSubmodules true
+   git config fetch.recurseSubmodules true &&
git fetch --no-recurse-submodules >../actual.out 2>../actual.err
) &&
! test -s actual.out &&
@@ -232,7 +232,7 @@ test_expect_success "Recursion doesn't happen when no new 
commits are fetched in
cd submodule &&
git config --unset fetch.recurseSubmodules
) &&
-   git config --unset fetch.recurseSubmodules
+   git config --unset fetch.recurseSubmodules &&
git fetch >../actual.out 2>../actual.err
) &&
! test -s actual.out &&
@@ -312,7 +312,7 @@ test_expect_success "Recursion picks up all submodules when 
necessary" '
) &&
head1=$(git rev-parse --short HEAD^) &&
git add subdir/deepsubmodule &&
-   git commit -m "new deepsubmodule"
+   git commit -m "new deepsubmodule" &&
head2=$(git rev-parse --short HEAD) &&
echo "Fetching submodule submodule" > ../expect.err.sub &&
echo "From $pwd/submodule" >> ../expect.err.sub &&
-- 
2.0.0.274.g6b2cd91



[PATCH] add test for bug in git-mv with nested submodules

2017-08-17 Thread Heiko Voigt
When using git-mv with a submodule it will detect that and update the
paths for its configurations (.gitmodules, worktree and gitfile). This
does not work for nested submodules where a user renames the root
submodule.

We discovered this fact when working on on-demand fetch for renamed
submodules. Lets add a test to document.

Signed-off-by: Heiko Voigt 
---
 t/t7001-mv.sh | 9 +
 1 file changed, 9 insertions(+)

diff --git a/t/t7001-mv.sh b/t/t7001-mv.sh
index e365d1f..39f8aed 100755
--- a/t/t7001-mv.sh
+++ b/t/t7001-mv.sh
@@ -491,4 +491,13 @@ test_expect_success 'moving a submodule in nested 
directories' '
test_cmp actual expect
 '
 
+test_expect_failure 'moving nested submodules' '
+   git commit -am "cleanup commit" &&
+   git submodule add ./. sub_nested &&
+   git commit -m "add sub_nested" &&
+   git submodule update --init --recursive &&
+   git mv sub_nested sub_nested_moved &&
+   git status
+'
+
 test_done
-- 
2.0.0.274.g6b2cd91



Re: [PATCH] push: do not add submodule odb as an alternate when recursing on demand

2017-08-16 Thread Heiko Voigt
Hi,

was about to write that we are maybe overly cautious here. Because the
current way a submodule ends up in the list to be pushed is through:

find_unpushed_submodules()

that itself collects all changed submodules when submodule_needs_pushing() is
true. In there we have this:

if (!submodule_has_commits(path, commits))
/*
 * NOTE: We do consider it safe to return "no" here. The
 * correct answer would be "We do not know" instead of
 * "No push needed", but it is quite hard to change
 * the submodule pointer without having the submodule
 * around. If a user did however change the submodules
 * without having the submodule around, this indicates
 * an expert who knows what they are doing or a
 * maintainer integrating work from other people. In
 * both cases it should be safe to skip this check.
 */
return 0;

So if the check, whether a submodule has commits, fails for any reason it will
not end up in the list to be pushed.

As a side note: inside submodule_has_commits() there is an add_submodule_odb()
followed by a process to really make sure that the commits are in the
submodule.

So IMO at this point we can be sure that the *database* exists and this extra
check could be dropped if we said that a caller to push_submodule() should make
sure that the submodule exists. The current ones are doing it already (if I did
not miss anything).

On Tue, Aug 15, 2017 at 06:05:25PM -0700, Stefan Beller wrote:
> On Tue, Aug 15, 2017 at 5:11 PM, Junio C Hamano  wrote:
> > Stefan Beller  writes:
> >
> >>> Is "is it populated" a good thing to check here, though?  IIRC,
> >>> add-submodule-odb allows you to add the object database of an
> >>> inactivated submodule, so this seems to change the behaviour.  I do
> >>> not know if the behaviour change is a good thing (i.e. bugfix) or
> >>> not (i.e. regression) offhand, though.
> >>
> >> Good point, we should be able to push non-populated, even inactive(?)
> >> submodules. For that we strictly need add_submodule_odb here
> >> (or the repo object of the submodule, eventually).
> >>
> >> So let's retract this patch for now.
> >
> > Not so fast.
> 
> Ok, I took another look at the code.
> 
> While we may desire that un-populated submodules can be pushed
> (due to checking out another revision where the submodule
> doesn't exist, before pushing), this is not supported currently, because
> the call to run the push in the submodule assumes there is a
> "/.git" on which the child process can operate.
> So for now we HAVE to have the submodule populated.

That is a good point though. In the current form of push_submodule() we need to
have a populated submodule. So IMO to check whether the submodule is actually
*populated* instead of adding the odb is correct and a possible bug fix.

Cheers Heiko


Re: [PATCH v2 02/15] submodule: don't use submodule_from_name

2017-08-11 Thread Heiko Voigt
On Fri, Aug 04, 2017 at 02:53:11PM -0700, Brandon Williams wrote:
> On 08/03, Stefan Beller wrote:
> > On Thu, Aug 3, 2017 at 11:19 AM, Brandon Williams  wrote:
> > > The function 'submodule_from_name()' is being used incorrectly here as a
> > > submodule path is being used instead of a submodule name.  Since the
> > > correct function to use with a path to a submodule is already being used
> > > ('submodule_from_path()') let's remove the call to
> > > 'submodule_from_name()'.
> > >
> > > Signed-off-by: Brandon Williams 
> > 
> > In case a reroll is needed, you could incorperate Jens feedback
> > stating that 851e18c385 should have done it.
> 
> K I'll add that into the commit message.

Well, thats not 100% correct... IMO, it should have been a follow up patch
which I never got to implement. See my other reply to the v1 of this
patch I just sent out.

As stated there I will have a look into where it makes sense to pass a
commit id and behave more correctly.

Cheers Heiko


Re: [PATCH v2 14/15] unpack-trees: improve loading of .gitmodules

2017-08-11 Thread Heiko Voigt
On Thu, Aug 03, 2017 at 11:19:59AM -0700, Brandon Williams wrote:
> diff --git a/unpack-trees.c b/unpack-trees.c
> index 5dce7ff7d..3c7f464fa 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -1,5 +1,6 @@
>  #define NO_THE_INDEX_COMPATIBILITY_MACROS
>  #include "cache.h"
> +#include "repository.h"
>  #include "config.h"
>  #include "dir.h"
>  #include "tree.h"
> @@ -268,22 +269,28 @@ static int check_submodule_move_head(const struct 
> cache_entry *ce,
>   return 0;
>  }
>  
> -static void reload_gitmodules_file(struct index_state *index,
> -struct checkout *state)
> +/*
> + * Preform the loading of the repository's gitmodules file.  This function is

s/Preform/Perform/

and a nit: There is some extra space after the end of this sentence.

Cheers Heiko


Re: [PATCH 02/15] submodule: don't use submodule_from_name

2017-08-11 Thread Heiko Voigt
Hi,

sorry for the late reply, just stumpled upon this.

On Mon, Jul 31, 2017 at 01:43:04PM -0700, Stefan Beller wrote:
> On Sun, Jul 30, 2017 at 6:43 AM, Jens Lehmann  wrote:
> > Am 26.07.2017 um 23:06 schrieb Junio C Hamano:
> >>
> >> Stefan Beller  writes:
> >>
> >>> Rereading the archives, there was quite some discussion on the design
> >>> of these patches, but these lines of code did not get any attention
> >>>
> >>>  https://public-inbox.org/git/4cdb3063.5010...@web.de/
> >>>
> >>> I cc'd Jens in the hope of him having a good memory why he
> >>> wrote the code that way. :)
> >>
> >>
> >> Thanks for digging.  I wouldn't be surprised if this were a fallback
> >> to help a broken entry in .gitmodules that lack .path variable, but
> >> we shouldn't be sweeping the problem under the rug like that.
> >
> >
> > Sorry to disappoint you ;-) I added this in 7dce19d374 because
> > submodule by path lookup back then only parsed the checked out
> > .gitmodules file.
> 
> This is still the case AFAICT, as we never ask for a specific .gitmodules
> file identified by sha1 of the commit.

This was actually part of my original approach[1] but it seems I never got
around to implement that last part for which I originally started the
submodule config API: Proper recursive fetch.

I still have a patch for moved submodules lying around which pass a commit id
for a gitmodules file. That particular patch is quite simple and finished but
I was planning to include that in the finished fetch series. So I can have a
look if I can quickly update that to the current state, so we can at least have
one proper user of the submodule config API.

> > So looking for it by name was a good guess to
> > fetch a new submodule that wasn't present in the current HEAD's
> > .gitmodules, as the path is used as the default name in "git
> > submodule add".

I will have a look whether we can easily replace this hack with the proper
lookup now. Lets see how many low hanging fruits we have lying around
for recursive fetch. The full blown implementation including cloning of
new submodules might still take some time...

Cheers Heiko

[1] 
https://public-inbox.org/git/f5baa2acc09531a16f4f693eebbe60706bb8ed1e.1361751905.git.hvo...@hvoigt.net/


Re: [PATCH v4 4/4] submodule_needs_pushing() NEEDSWORK when we can not answer this question

2016-11-16 Thread Heiko Voigt
On Wed, Nov 16, 2016 at 11:18:07AM -0800, Junio C Hamano wrote:
> Heiko Voigt  writes:
> 
> > Signed-off-by: Heiko Voigt 
> > ---
> 
> Needs retitle ;-)  Here is what I tentatively queued.

Thanks ;-) Missed that one.

> submodule_needs_pushing(): explain the behaviour when we cannot answer
> 
> When we do not have commits that are involved in the update of the
> superproject in our copy of submodule, we cannot tell if the remote
> end needs to acquire these commits to be able to check out the
> superproject tree.  Explain why we answer "no there is no need/point
> in pushing from our submodule repository" in this case.
> 
> Signed-off-by: Heiko Voigt 
> Signed-off-by: Junio C Hamano 

Sound fine to me.

Cheers Heiko


[PATCH v4 3/4] batch check whether submodule needs pushing into one call

2016-11-16 Thread Heiko Voigt
We run a command for each sha1 change in a submodule. This is
unnecessary since we can simply batch all sha1's we want to check into
one command. Lets do it so we can speedup the check when many submodule
changes are in need of checking.

Signed-off-by: Heiko Voigt 
---
 submodule.c | 62 -
 1 file changed, 33 insertions(+), 29 deletions(-)

diff --git a/submodule.c b/submodule.c
index 12ac1ea..11391fa 100644
--- a/submodule.c
+++ b/submodule.c
@@ -507,27 +507,49 @@ static int append_sha1_to_argv(const unsigned char 
sha1[20], void *data)
return 0;
 }
 
-static int submodule_needs_pushing(const char *path, const unsigned char 
sha1[20])
+static int check_has_commit(const unsigned char sha1[20], void *data)
 {
-   if (add_submodule_odb(path) || !lookup_commit_reference(sha1))
+   int *has_commit = data;
+
+   if (!lookup_commit_reference(sha1))
+   *has_commit = 0;
+
+   return 0;
+}
+
+static int submodule_has_commits(const char *path, struct sha1_array *commits)
+{
+   int has_commit = 1;
+
+   if (add_submodule_odb(path))
+   return 0;
+
+   sha1_array_for_each_unique(commits, check_has_commit, &has_commit);
+   return has_commit;
+}
+
+static int submodule_needs_pushing(const char *path, struct sha1_array 
*commits)
+{
+   if (!submodule_has_commits(path, commits))
return 0;
 
if (for_each_remote_ref_submodule(path, has_remote, NULL) > 0) {
struct child_process cp = CHILD_PROCESS_INIT;
-   const char *argv[] = {"rev-list", NULL, "--not", "--remotes", 
"-n", "1" , NULL};
struct strbuf buf = STRBUF_INIT;
int needs_pushing = 0;
 
-   argv[1] = sha1_to_hex(sha1);
-   cp.argv = argv;
+   argv_array_push(&cp.args, "rev-list");
+   sha1_array_for_each_unique(commits, append_sha1_to_argv, 
&cp.args);
+   argv_array_pushl(&cp.args, "--not", "--remotes", "-n", "1" , 
NULL);
+
prepare_submodule_repo_env(&cp.env_array);
cp.git_cmd = 1;
cp.no_stdin = 1;
cp.out = -1;
cp.dir = path;
if (start_command(&cp))
-   die("Could not run 'git rev-list %s --not --remotes -n 
1' command in submodule %s",
-   sha1_to_hex(sha1), path);
+   die("Could not run 'git rev-list  --not 
--remotes -n 1' command in submodule %s",
+   path);
if (strbuf_read(&buf, cp.out, 41))
needs_pushing = 1;
finish_command(&cp);
@@ -582,22 +604,6 @@ static void find_unpushed_submodule_commits(struct commit 
*commit,
diff_tree_combined_merge(commit, 1, &rev);
 }
 
-struct collect_submodule_from_sha1s_data {
-   char *submodule_path;
-   struct string_list *needs_pushing;
-};
-
-static int collect_submodules_from_sha1s(const unsigned char sha1[20],
-   void *data)
-{
-   struct collect_submodule_from_sha1s_data *me = data;
-
-   if (submodule_needs_pushing(me->submodule_path, sha1))
-   string_list_insert(me->needs_pushing, me->submodule_path);
-
-   return 0;
-}
-
 static void free_submodules_sha1s(struct string_list *submodules)
 {
struct string_list_item *item;
@@ -634,12 +640,10 @@ int find_unpushed_submodules(struct sha1_array *commits,
argv_array_clear(&argv);
 
for_each_string_list_item(submodule, &submodules) {
-   struct collect_submodule_from_sha1s_data data;
-   data.submodule_path = submodule->string;
-   data.needs_pushing = needs_pushing;
-   sha1_array_for_each_unique((struct sha1_array *) 
submodule->util,
-   collect_submodules_from_sha1s,
-   &data);
+   struct sha1_array *commits = (struct sha1_array *) 
submodule->util;
+
+   if (submodule_needs_pushing(submodule->string, commits))
+   string_list_insert(needs_pushing, submodule->string);
}
free_submodules_sha1s(&submodules);
 
-- 
2.10.1.386.gc503e45



[PATCH v4 2/4] serialize collection of refs that contain submodule changes

2016-11-16 Thread Heiko Voigt
We are iterating over each pushed ref and want to check whether it
contains changes to submodules. Instead of immediately checking each ref
lets first collect them and then do the check for all of them in one
revision walk.

Signed-off-by: Heiko Voigt 
---
 submodule.c | 35 ---
 submodule.h |  5 +++--
 transport.c | 29 +
 3 files changed, 44 insertions(+), 25 deletions(-)

diff --git a/submodule.c b/submodule.c
index b2908fe..12ac1ea 100644
--- a/submodule.c
+++ b/submodule.c
@@ -500,6 +500,13 @@ static int has_remote(const char *refname, const struct 
object_id *oid,
return 1;
 }
 
+static int append_sha1_to_argv(const unsigned char sha1[20], void *data)
+{
+   struct argv_array *argv = data;
+   argv_array_push(argv, sha1_to_hex(sha1));
+   return 0;
+}
+
 static int submodule_needs_pushing(const char *path, const unsigned char 
sha1[20])
 {
if (add_submodule_odb(path) || !lookup_commit_reference(sha1))
@@ -599,25 +606,24 @@ static void free_submodules_sha1s(struct string_list 
*submodules)
string_list_clear(submodules, 1);
 }
 
-int find_unpushed_submodules(unsigned char new_sha1[20],
+int find_unpushed_submodules(struct sha1_array *commits,
const char *remotes_name, struct string_list *needs_pushing)
 {
struct rev_info rev;
struct commit *commit;
-   const char *argv[] = {NULL, NULL, "--not", "NULL", NULL};
-   int argc = ARRAY_SIZE(argv) - 1;
-   char *sha1_copy;
struct string_list submodules = STRING_LIST_INIT_DUP;
struct string_list_item *submodule;
+   struct argv_array argv = ARGV_ARRAY_INIT;
 
-   struct strbuf remotes_arg = STRBUF_INIT;
-
-   strbuf_addf(&remotes_arg, "--remotes=%s", remotes_name);
init_revisions(&rev, NULL);
-   sha1_copy = xstrdup(sha1_to_hex(new_sha1));
-   argv[1] = sha1_copy;
-   argv[3] = remotes_arg.buf;
-   setup_revisions(argc, argv, &rev, NULL);
+
+   /* argv.argv[0] will be ignored by setup_revisions */
+   argv_array_push(&argv, "find_unpushed_submodules");
+   sha1_array_for_each_unique(commits, append_sha1_to_argv, &argv);
+   argv_array_push(&argv, "--not");
+   argv_array_pushf(&argv, "--remotes=%s", remotes_name);
+
+   setup_revisions(argv.argc, argv.argv, &rev, NULL);
if (prepare_revision_walk(&rev))
die("revision walk setup failed");
 
@@ -625,8 +631,7 @@ int find_unpushed_submodules(unsigned char new_sha1[20],
find_unpushed_submodule_commits(commit, &submodules);
 
reset_revision_walk();
-   free(sha1_copy);
-   strbuf_release(&remotes_arg);
+   argv_array_clear(&argv);
 
for_each_string_list_item(submodule, &submodules) {
struct collect_submodule_from_sha1s_data data;
@@ -663,12 +668,12 @@ static int push_submodule(const char *path)
return 1;
 }
 
-int push_unpushed_submodules(unsigned char new_sha1[20], const char 
*remotes_name)
+int push_unpushed_submodules(struct sha1_array *commits, const char 
*remotes_name)
 {
int i, ret = 1;
struct string_list needs_pushing = STRING_LIST_INIT_DUP;
 
-   if (!find_unpushed_submodules(new_sha1, remotes_name, &needs_pushing))
+   if (!find_unpushed_submodules(commits, remotes_name, &needs_pushing))
return 1;
 
for (i = 0; i < needs_pushing.nr; i++) {
diff --git a/submodule.h b/submodule.h
index d9e197a..9454806 100644
--- a/submodule.h
+++ b/submodule.h
@@ -3,6 +3,7 @@
 
 struct diff_options;
 struct argv_array;
+struct sha1_array;
 
 enum {
RECURSE_SUBMODULES_CHECK = -4,
@@ -62,9 +63,9 @@ int submodule_uses_gitfile(const char *path);
 int ok_to_remove_submodule(const char *path);
 int merge_submodule(unsigned char result[20], const char *path, const unsigned 
char base[20],
const unsigned char a[20], const unsigned char b[20], int 
search);
-int find_unpushed_submodules(unsigned char new_sha1[20], const char 
*remotes_name,
+int find_unpushed_submodules(struct sha1_array *commits, const char 
*remotes_name,
struct string_list *needs_pushing);
-int push_unpushed_submodules(unsigned char new_sha1[20], const char 
*remotes_name);
+int push_unpushed_submodules(struct sha1_array *commits, const char 
*remotes_name);
 void connect_work_tree_and_git_dir(const char *work_tree, const char *git_dir);
 int parallel_submodules(void);
 
diff --git a/transport.c b/transport.c
index d57e8de..f482869 100644
--- a/transport.c
+++ b/transport.c
@@ -949,23 +949,36 @@ int transport_push(struct transport *transport,
 
if ((flags & TRANSPORT_RECURSE_SUBMODULES_ON_DEMAND) && 
!is_bare_repository()) {
struct ref *ref = remote_refs;
+   struct s

[PATCH v4 4/4] submodule_needs_pushing() NEEDSWORK when we can not answer this question

2016-11-16 Thread Heiko Voigt
Signed-off-by: Heiko Voigt 
---
 submodule.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/submodule.c b/submodule.c
index 11391fa..00dd655 100644
--- a/submodule.c
+++ b/submodule.c
@@ -531,6 +531,17 @@ static int submodule_has_commits(const char *path, struct 
sha1_array *commits)
 static int submodule_needs_pushing(const char *path, struct sha1_array 
*commits)
 {
if (!submodule_has_commits(path, commits))
+   /*
+* NOTE: We do consider it safe to return "no" here. The
+* correct answer would be "We do not know" instead of
+* "No push needed", but it is quite hard to change
+* the submodule pointer without having the submodule
+* around. If a user did however change the submodules
+* without having the submodule around, this indicates
+* an expert who knows what they are doing or a
+* maintainer integrating work from other people. In
+* both cases it should be safe to skip this check.
+*/
return 0;
 
if (for_each_remote_ref_submodule(path, has_remote, NULL) > 0) {
-- 
2.10.1.386.gc503e45



[PATCH v4 1/4] serialize collection of changed submodules

2016-11-16 Thread Heiko Voigt
To check whether a submodule needs to be pushed we need to collect all
changed submodules. Lets collect them first and then execute the
possibly expensive test whether certain revisions are already pushed
only once per submodule.

There is further potential for optimization since we can assemble one
command and only issued that instead of one call for each remote ref in
the submodule.

Signed-off-by: Heiko Voigt 
---
 submodule.c | 59 +++
 1 file changed, 55 insertions(+), 4 deletions(-)

diff --git a/submodule.c b/submodule.c
index 6f7d883..b2908fe 100644
--- a/submodule.c
+++ b/submodule.c
@@ -532,19 +532,34 @@ static int submodule_needs_pushing(const char *path, 
const unsigned char sha1[20
return 0;
 }
 
+static struct sha1_array *submodule_commits(struct string_list *submodules,
+   const char *path)
+{
+   struct string_list_item *item;
+
+   item = string_list_insert(submodules, path);
+   if (item->util)
+   return (struct sha1_array *) item->util;
+
+   /* NEEDSWORK: should we have sha1_array_init()? */
+   item->util = xcalloc(1, sizeof(struct sha1_array));
+   return (struct sha1_array *) item->util;
+}
+
 static void collect_submodules_from_diff(struct diff_queue_struct *q,
 struct diff_options *options,
 void *data)
 {
int i;
-   struct string_list *needs_pushing = data;
+   struct string_list *submodules = data;
 
for (i = 0; i < q->nr; i++) {
struct diff_filepair *p = q->queue[i];
+   struct sha1_array *commits;
if (!S_ISGITLINK(p->two->mode))
continue;
-   if (submodule_needs_pushing(p->two->path, p->two->oid.hash))
-   string_list_insert(needs_pushing, p->two->path);
+   commits = submodule_commits(submodules, p->two->path);
+   sha1_array_append(commits, p->two->oid.hash);
}
 }
 
@@ -560,6 +575,30 @@ static void find_unpushed_submodule_commits(struct commit 
*commit,
diff_tree_combined_merge(commit, 1, &rev);
 }
 
+struct collect_submodule_from_sha1s_data {
+   char *submodule_path;
+   struct string_list *needs_pushing;
+};
+
+static int collect_submodules_from_sha1s(const unsigned char sha1[20],
+   void *data)
+{
+   struct collect_submodule_from_sha1s_data *me = data;
+
+   if (submodule_needs_pushing(me->submodule_path, sha1))
+   string_list_insert(me->needs_pushing, me->submodule_path);
+
+   return 0;
+}
+
+static void free_submodules_sha1s(struct string_list *submodules)
+{
+   struct string_list_item *item;
+   for_each_string_list_item(item, submodules)
+   sha1_array_clear((struct sha1_array *) item->util);
+   string_list_clear(submodules, 1);
+}
+
 int find_unpushed_submodules(unsigned char new_sha1[20],
const char *remotes_name, struct string_list *needs_pushing)
 {
@@ -568,6 +607,8 @@ int find_unpushed_submodules(unsigned char new_sha1[20],
const char *argv[] = {NULL, NULL, "--not", "NULL", NULL};
int argc = ARRAY_SIZE(argv) - 1;
char *sha1_copy;
+   struct string_list submodules = STRING_LIST_INIT_DUP;
+   struct string_list_item *submodule;
 
struct strbuf remotes_arg = STRBUF_INIT;
 
@@ -581,12 +622,22 @@ int find_unpushed_submodules(unsigned char new_sha1[20],
die("revision walk setup failed");
 
while ((commit = get_revision(&rev)) != NULL)
-   find_unpushed_submodule_commits(commit, needs_pushing);
+   find_unpushed_submodule_commits(commit, &submodules);
 
reset_revision_walk();
free(sha1_copy);
strbuf_release(&remotes_arg);
 
+   for_each_string_list_item(submodule, &submodules) {
+   struct collect_submodule_from_sha1s_data data;
+   data.submodule_path = submodule->string;
+   data.needs_pushing = needs_pushing;
+   sha1_array_for_each_unique((struct sha1_array *) 
submodule->util,
+   collect_submodules_from_sha1s,
+   &data);
+   }
+   free_submodules_sha1s(&submodules);
+
return needs_pushing->nr;
 }
 
-- 
2.10.1.386.gc503e45



[PATCH v4 0/4] Speedup finding of unpushed submodules

2016-11-16 Thread Heiko Voigt
You can find the third iteration of this series here:

http://public-inbox.org/git/cover.1479221071.git.hvo...@hvoigt.net/

All comments from the last iteration should be addressed.

Cheers Heiko

Heiko Voigt (4):
  serialize collection of changed submodules
  serialize collection of refs that contain submodule changes
  batch check whether submodule needs pushing into one call
  submodule_needs_pushing() NEEDSWORK when we can not answer this
question

 submodule.c | 123 +++-
 submodule.h |   5 ++-
 transport.c |  29 ++
 3 files changed, 121 insertions(+), 36 deletions(-)

-- 
2.10.1.386.gc503e45



Re: [PATCH v1 2/2] travis-ci: disable GIT_TEST_HTTPD for macOS

2016-11-16 Thread Heiko Voigt
On Tue, Nov 15, 2016 at 10:31:59AM -0500, Jeff King wrote:
> On Tue, Nov 15, 2016 at 01:07:18PM +0100, Heiko Voigt wrote:
> 
> > On Fri, Nov 11, 2016 at 09:22:51AM +0100, Lars Schneider wrote:
> > > To all macOS users on the list:
> > > Does anyone execute the tests with GIT_TEST_HTTPD enabled successfully?
> > 
> > Nope. The following tests fail for me on master: 5539, 5540, 5541, 5542,
> > 5550, 5551, 5561, 5812.
> 
> Failing how? Does apache fail to start up? Do tests fails? What does
> "-v" say? Is there anything interesting in httpd/error.log in the trash
> directory?

This is what I see for 5539:

$ GIT_TEST_HTTPD=1 ./t5539-fetch-http-shallow.sh -v
Initialized empty Git repository in /Users/hvoigt/Repository/git4/t/trash 
directory.t5539-fetch-http-shallow/.git/
checking prerequisite: NOT_ROOT

mkdir -p "$TRASH_DIRECTORY/prereq-test-dir" &&
(
cd "$TRASH_DIRECTORY/prereq-test-dir" &&
uid=$(id -u) &&
test "$uid" != 0

)
prerequisite NOT_ROOT ok
httpd: Syntax error on line 65 of 
/Users/hvoigt/Repository/git4/t/lib-httpd/apache.conf: Cannot load 
modules/mod_mpm_prefork.so into server: 
dlopen(/Users/hvoigt/Repository/git4/t/trash 
directory.t5539-fetch-http-shallow/httpd/modules/mod_mpm_prefork.so, 10): image 
not found
error: web server setup failed


It seems the other failures have the same cause.

Cheers Heiko


Re: [PATCH v3 3/4] batch check whether submodule needs pushing into one call

2016-11-16 Thread Heiko Voigt
On Tue, Nov 15, 2016 at 02:28:31PM -0800, Stefan Beller wrote:
> On Tue, Nov 15, 2016 at 6:56 AM, Heiko Voigt  wrote:
> 
> > -static int submodule_needs_pushing(const char *path, const unsigned char 
> > sha1[20])
> > +static int check_has_commit(const unsigned char sha1[20], void *data)
> >  {
> > -   if (add_submodule_odb(path) || !lookup_commit_reference(sha1))
> > +   int *has_commit = (int *) data;
> 
> nit: just as prior patches ;) void* can be cast implicitly.

Even though its just a nit: Will remove all the void casts. :)

Cheers Heiko


Re: [PATCH v3 4/4] submodule_needs_pushing() NEEDSWORK when we can not answer this question

2016-11-16 Thread Heiko Voigt
On Tue, Nov 15, 2016 at 04:13:51PM -0800, Junio C Hamano wrote:
> Stefan Beller  writes:
> 
> >> "We do not know" ...
> >
> > ... because there is no way to check for us as we don't have the
> > submodule commits.
> >
> > " We do consider it safe as no one in their sane mind would
> > have changed the submodule pointers without having the
> > submodule around. If a user did however change the submodules
> > without having the submodule commits around, this indicates an
> > expert who knows what they were doing."
> 
> I didn't think it through myself to arrive at such a conclusion, but
> to me the above sounds like a sensible reasoning [*1*].

I think you have a point here. If I rephrase it like this: "We do
consider it safe as no one in their sane mind *could* have changed the
submodule pointers without having the submodule around..."

Since its actually hard to create such a situation without the submodule
commit around I agree here.

> *1* My version was more like "we do not know if they would get into
> a situation where they do not have enough submodule commits if
> we pushed our superproject, but more importantly, we DO KNOW
> that it would not help an iota if we pushed our submodule to
> them, so there is no point stopping the push of superproject
> saying 'no, no, no, you must push the submodule first'".

Yes saying that would be wrong. I was rather suggesting that we tell the
user that we could not find the submodule commits to and that if he
wants to proceed he should either pass --recurse-submodules=no or
initialize the submodule.

But I think the above reasoning obsoletes my suggestion. I would adjust
the comment accordingly but still keep the patch so we have
documentation that this behavior is on purpose.

Cheers Heiko


Re: Git status takes too long- How to improve the performance of git

2016-11-15 Thread Heiko Voigt
On Tue, Nov 15, 2016 at 02:33:12AM -0700, ravalika wrote:
> Number of files - 63883

Since you also posted this to the "Git for Windows" mailinglist I assume
that you are using Windows. Reduce the number of files. For example
split the repository into two one for documentation and one for source.
Thats what I did with a converted repository that had to many files.

Windows is unfortunately very slow when it comes to handling many files
and if I recall correctly ~3 files was in a nicely handleable range
for a Git repository on Windows, but that might have changed...

Cheers Heiko


[PATCH v3 3/4] batch check whether submodule needs pushing into one call

2016-11-15 Thread Heiko Voigt
We run a command for each sha1 change in a submodule. This is
unnecessary since we can simply batch all sha1's we want to check into
one command. Lets do it so we can speedup the check when many submodule
changes are in need of checking.

Signed-off-by: Heiko Voigt 
---
 submodule.c | 63 -
 1 file changed, 33 insertions(+), 30 deletions(-)

diff --git a/submodule.c b/submodule.c
index 769d666..e1196fd 100644
--- a/submodule.c
+++ b/submodule.c
@@ -507,27 +507,49 @@ static int append_sha1_to_argv(const unsigned char 
sha1[20], void *data)
return 0;
 }
 
-static int submodule_needs_pushing(const char *path, const unsigned char 
sha1[20])
+static int check_has_commit(const unsigned char sha1[20], void *data)
 {
-   if (add_submodule_odb(path) || !lookup_commit_reference(sha1))
+   int *has_commit = (int *) data;
+
+   if (!lookup_commit_reference(sha1))
+   *has_commit = 0;
+
+   return 0;
+}
+
+static int submodule_has_commits(const char *path, struct sha1_array *commits)
+{
+   int has_commit = 1;
+
+   if (add_submodule_odb(path))
+   return 0;
+
+   sha1_array_for_each_unique(commits, check_has_commit, &has_commit);
+   return has_commit;
+}
+
+static int submodule_needs_pushing(const char *path, struct sha1_array 
*commits)
+{
+   if (!submodule_has_commits(path, commits))
return 0;
 
if (for_each_remote_ref_submodule(path, has_remote, NULL) > 0) {
struct child_process cp = CHILD_PROCESS_INIT;
-   const char *argv[] = {"rev-list", NULL, "--not", "--remotes", 
"-n", "1" , NULL};
struct strbuf buf = STRBUF_INIT;
int needs_pushing = 0;
 
-   argv[1] = sha1_to_hex(sha1);
-   cp.argv = argv;
+   argv_array_push(&cp.args, "rev-list");
+   sha1_array_for_each_unique(commits, append_sha1_to_argv, 
&cp.args);
+   argv_array_pushl(&cp.args, "--not", "--remotes", "-n", "1" , 
NULL);
+
prepare_submodule_repo_env(&cp.env_array);
cp.git_cmd = 1;
cp.no_stdin = 1;
cp.out = -1;
cp.dir = path;
if (start_command(&cp))
-   die("Could not run 'git rev-list %s --not --remotes -n 
1' command in submodule %s",
-   sha1_to_hex(sha1), path);
+   die("Could not run 'git rev-list  --not 
--remotes -n 1' command in submodule %s",
+   path);
if (strbuf_read(&buf, cp.out, 41))
needs_pushing = 1;
finish_command(&cp);
@@ -582,23 +604,6 @@ static void find_unpushed_submodule_commits(struct commit 
*commit,
diff_tree_combined_merge(commit, 1, &rev);
 }
 
-struct collect_submodule_from_sha1s_data {
-   char *submodule_path;
-   struct string_list *needs_pushing;
-};
-
-static int collect_submodules_from_sha1s(const unsigned char sha1[20],
-   void *data)
-{
-   struct collect_submodule_from_sha1s_data *me =
-   (struct collect_submodule_from_sha1s_data *) data;
-
-   if (submodule_needs_pushing(me->submodule_path, sha1))
-   string_list_insert(me->needs_pushing, me->submodule_path);
-
-   return 0;
-}
-
 static void free_submodules_sha1s(struct string_list *submodules)
 {
struct string_list_item *item;
@@ -635,12 +640,10 @@ int find_unpushed_submodules(struct sha1_array *commits,
argv_array_clear(&argv);
 
for_each_string_list_item(submodule, &submodules) {
-   struct collect_submodule_from_sha1s_data data;
-   data.submodule_path = submodule->string;
-   data.needs_pushing = needs_pushing;
-   sha1_array_for_each_unique((struct sha1_array *) 
submodule->util,
-   collect_submodules_from_sha1s,
-   &data);
+   struct sha1_array *commits = (struct sha1_array *) 
submodule->util;
+
+   if (submodule_needs_pushing(submodule->string, commits))
+   string_list_insert(needs_pushing, submodule->string);
}
free_submodules_sha1s(&submodules);
 
-- 
2.10.1.386.gc503e45



[PATCH v3 2/4] serialize collection of refs that contain submodule changes

2016-11-15 Thread Heiko Voigt
We are iterating over each pushed ref and want to check whether it
contains changes to submodules. Instead of immediately checking each ref
lets first collect them and then do the check for all of them in one
revision walk.

Signed-off-by: Heiko Voigt 
---
 submodule.c | 35 ---
 submodule.h |  5 +++--
 transport.c | 29 +
 3 files changed, 44 insertions(+), 25 deletions(-)

diff --git a/submodule.c b/submodule.c
index b91585e..769d666 100644
--- a/submodule.c
+++ b/submodule.c
@@ -500,6 +500,13 @@ static int has_remote(const char *refname, const struct 
object_id *oid,
return 1;
 }
 
+static int append_sha1_to_argv(const unsigned char sha1[20], void *data)
+{
+   struct argv_array *argv = (struct argv_array *) data;
+   argv_array_push(argv, sha1_to_hex(sha1));
+   return 0;
+}
+
 static int submodule_needs_pushing(const char *path, const unsigned char 
sha1[20])
 {
if (add_submodule_odb(path) || !lookup_commit_reference(sha1))
@@ -600,25 +607,24 @@ static void free_submodules_sha1s(struct string_list 
*submodules)
string_list_clear(submodules, 1);
 }
 
-int find_unpushed_submodules(unsigned char new_sha1[20],
+int find_unpushed_submodules(struct sha1_array *commits,
const char *remotes_name, struct string_list *needs_pushing)
 {
struct rev_info rev;
struct commit *commit;
-   const char *argv[] = {NULL, NULL, "--not", "NULL", NULL};
-   int argc = ARRAY_SIZE(argv) - 1;
-   char *sha1_copy;
struct string_list submodules = STRING_LIST_INIT_DUP;
struct string_list_item *submodule;
+   struct argv_array argv = ARGV_ARRAY_INIT;
 
-   struct strbuf remotes_arg = STRBUF_INIT;
-
-   strbuf_addf(&remotes_arg, "--remotes=%s", remotes_name);
init_revisions(&rev, NULL);
-   sha1_copy = xstrdup(sha1_to_hex(new_sha1));
-   argv[1] = sha1_copy;
-   argv[3] = remotes_arg.buf;
-   setup_revisions(argc, argv, &rev, NULL);
+
+   /* argv.argv[0] will be ignored by setup_revisions */
+   argv_array_push(&argv, "find_unpushed_submodules");
+   sha1_array_for_each_unique(commits, append_sha1_to_argv, &argv);
+   argv_array_push(&argv, "--not");
+   argv_array_pushf(&argv, "--remotes=%s", remotes_name);
+
+   setup_revisions(argv.argc, argv.argv, &rev, NULL);
if (prepare_revision_walk(&rev))
die("revision walk setup failed");
 
@@ -626,8 +632,7 @@ int find_unpushed_submodules(unsigned char new_sha1[20],
find_unpushed_submodule_commits(commit, &submodules);
 
reset_revision_walk();
-   free(sha1_copy);
-   strbuf_release(&remotes_arg);
+   argv_array_clear(&argv);
 
for_each_string_list_item(submodule, &submodules) {
struct collect_submodule_from_sha1s_data data;
@@ -664,12 +669,12 @@ static int push_submodule(const char *path)
return 1;
 }
 
-int push_unpushed_submodules(unsigned char new_sha1[20], const char 
*remotes_name)
+int push_unpushed_submodules(struct sha1_array *commits, const char 
*remotes_name)
 {
int i, ret = 1;
struct string_list needs_pushing = STRING_LIST_INIT_DUP;
 
-   if (!find_unpushed_submodules(new_sha1, remotes_name, &needs_pushing))
+   if (!find_unpushed_submodules(commits, remotes_name, &needs_pushing))
return 1;
 
for (i = 0; i < needs_pushing.nr; i++) {
diff --git a/submodule.h b/submodule.h
index d9e197a..9454806 100644
--- a/submodule.h
+++ b/submodule.h
@@ -3,6 +3,7 @@
 
 struct diff_options;
 struct argv_array;
+struct sha1_array;
 
 enum {
RECURSE_SUBMODULES_CHECK = -4,
@@ -62,9 +63,9 @@ int submodule_uses_gitfile(const char *path);
 int ok_to_remove_submodule(const char *path);
 int merge_submodule(unsigned char result[20], const char *path, const unsigned 
char base[20],
const unsigned char a[20], const unsigned char b[20], int 
search);
-int find_unpushed_submodules(unsigned char new_sha1[20], const char 
*remotes_name,
+int find_unpushed_submodules(struct sha1_array *commits, const char 
*remotes_name,
struct string_list *needs_pushing);
-int push_unpushed_submodules(unsigned char new_sha1[20], const char 
*remotes_name);
+int push_unpushed_submodules(struct sha1_array *commits, const char 
*remotes_name);
 void connect_work_tree_and_git_dir(const char *work_tree, const char *git_dir);
 int parallel_submodules(void);
 
diff --git a/transport.c b/transport.c
index d57e8de..f482869 100644
--- a/transport.c
+++ b/transport.c
@@ -949,23 +949,36 @@ int transport_push(struct transport *transport,
 
if ((flags & TRANSPORT_RECURSE_SUBMODULES_ON_DEMAND) && 
!is_bare_repository()) {
struct ref *ref = remote_refs;
+

[PATCH v3 4/4] submodule_needs_pushing() NEEDSWORK when we can not answer this question

2016-11-15 Thread Heiko Voigt
Signed-off-by: Heiko Voigt 
---
 submodule.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/submodule.c b/submodule.c
index e1196fd..29efee9 100644
--- a/submodule.c
+++ b/submodule.c
@@ -531,6 +531,14 @@ static int submodule_has_commits(const char *path, struct 
sha1_array *commits)
 static int submodule_needs_pushing(const char *path, struct sha1_array 
*commits)
 {
if (!submodule_has_commits(path, commits))
+   /* NEEDSWORK: The correct answer here is "We do not
+* know" instead of "No push needed". We currently
+* proceed pushing here as if the submodules commits are
+* available on a remote. Since we can not check the
+* remote availability for this submodule we should
+* consider changing this behavior to: Stop here and
+* tell the user how to skip this check if wanted.
+*/
return 0;
 
if (for_each_remote_ref_submodule(path, has_remote, NULL) > 0) {
-- 
2.10.1.386.gc503e45



[PATCH v3 1/4] serialize collection of changed submodules

2016-11-15 Thread Heiko Voigt
To check whether a submodule needs to be pushed we need to collect all
changed submodules. Lets collect them first and then execute the
possibly expensive test whether certain revisions are already pushed
only once per submodule.

There is further potential for optimization since we can assemble one
command and only issued that instead of one call for each remote ref in
the submodule.

Signed-off-by: Heiko Voigt 
---
 submodule.c | 60 
 1 file changed, 56 insertions(+), 4 deletions(-)

diff --git a/submodule.c b/submodule.c
index 6f7d883..b91585e 100644
--- a/submodule.c
+++ b/submodule.c
@@ -532,19 +532,34 @@ static int submodule_needs_pushing(const char *path, 
const unsigned char sha1[20
return 0;
 }
 
+static struct sha1_array *submodule_commits(struct string_list *submodules,
+   const char *path)
+{
+   struct string_list_item *item;
+
+   item = string_list_insert(submodules, path);
+   if (item->util)
+   return (struct sha1_array *) item->util;
+
+   /* NEEDSWORK: should we have sha1_array_init()? */
+   item->util = xcalloc(1, sizeof(struct sha1_array));
+   return (struct sha1_array *) item->util;
+}
+
 static void collect_submodules_from_diff(struct diff_queue_struct *q,
 struct diff_options *options,
 void *data)
 {
int i;
-   struct string_list *needs_pushing = data;
+   struct string_list *submodules = data;
 
for (i = 0; i < q->nr; i++) {
struct diff_filepair *p = q->queue[i];
+   struct sha1_array *commits;
if (!S_ISGITLINK(p->two->mode))
continue;
-   if (submodule_needs_pushing(p->two->path, p->two->oid.hash))
-   string_list_insert(needs_pushing, p->two->path);
+   commits = submodule_commits(submodules, p->two->path);
+   sha1_array_append(commits, p->two->oid.hash);
}
 }
 
@@ -560,6 +575,31 @@ static void find_unpushed_submodule_commits(struct commit 
*commit,
diff_tree_combined_merge(commit, 1, &rev);
 }
 
+struct collect_submodule_from_sha1s_data {
+   char *submodule_path;
+   struct string_list *needs_pushing;
+};
+
+static int collect_submodules_from_sha1s(const unsigned char sha1[20],
+   void *data)
+{
+   struct collect_submodule_from_sha1s_data *me =
+   (struct collect_submodule_from_sha1s_data *) data;
+
+   if (submodule_needs_pushing(me->submodule_path, sha1))
+   string_list_insert(me->needs_pushing, me->submodule_path);
+
+   return 0;
+}
+
+static void free_submodules_sha1s(struct string_list *submodules)
+{
+   struct string_list_item *item;
+   for_each_string_list_item(item, submodules)
+   sha1_array_clear((struct sha1_array *) item->util);
+   string_list_clear(submodules, 1);
+}
+
 int find_unpushed_submodules(unsigned char new_sha1[20],
const char *remotes_name, struct string_list *needs_pushing)
 {
@@ -568,6 +608,8 @@ int find_unpushed_submodules(unsigned char new_sha1[20],
const char *argv[] = {NULL, NULL, "--not", "NULL", NULL};
int argc = ARRAY_SIZE(argv) - 1;
char *sha1_copy;
+   struct string_list submodules = STRING_LIST_INIT_DUP;
+   struct string_list_item *submodule;
 
struct strbuf remotes_arg = STRBUF_INIT;
 
@@ -581,12 +623,22 @@ int find_unpushed_submodules(unsigned char new_sha1[20],
die("revision walk setup failed");
 
while ((commit = get_revision(&rev)) != NULL)
-   find_unpushed_submodule_commits(commit, needs_pushing);
+   find_unpushed_submodule_commits(commit, &submodules);
 
reset_revision_walk();
free(sha1_copy);
strbuf_release(&remotes_arg);
 
+   for_each_string_list_item(submodule, &submodules) {
+   struct collect_submodule_from_sha1s_data data;
+   data.submodule_path = submodule->string;
+   data.needs_pushing = needs_pushing;
+   sha1_array_for_each_unique((struct sha1_array *) 
submodule->util,
+   collect_submodules_from_sha1s,
+   &data);
+   }
+   free_submodules_sha1s(&submodules);
+
return needs_pushing->nr;
 }
 
-- 
2.10.1.386.gc503e45



[PATCH v3 0/4] Speedup finding of unpushed submodules

2016-11-15 Thread Heiko Voigt
You can find the second iteration of this series here:

http://public-inbox.org/git/cover.1475851621.git.hvo...@hvoigt.net/

All mentioned issues should be fixed. I put the NEEDSWORK comment in a
seperate patch since it seemed to me as if we did not fully agree on
that. So in case we decide against it we can just drop that patch.

Cheers Heiko

Heiko Voigt (4):
  serialize collection of changed submodules
  serialize collection of refs that contain submodule changes
  batch check whether submodule needs pushing into one call
  submodule_needs_pushing() NEEDSWORK when we can not answer this
question

 submodule.c | 120 +++-
 submodule.h |   5 ++-
 transport.c |  29 +++
 3 files changed, 118 insertions(+), 36 deletions(-)

-- 
2.10.1.386.gc503e45



Re: [PATCH v1 2/2] travis-ci: disable GIT_TEST_HTTPD for macOS

2016-11-15 Thread Heiko Voigt
On Fri, Nov 11, 2016 at 09:22:51AM +0100, Lars Schneider wrote:
> To all macOS users on the list:
> Does anyone execute the tests with GIT_TEST_HTTPD enabled successfully?

Nope. The following tests fail for me on master: 5539, 5540, 5541, 5542,
5550, 5551, 5561, 5812.

Cheers Heiko


Re: Uninitialized submodules as symlinks

2016-10-17 Thread Heiko Voigt
On Fri, Oct 14, 2016 at 09:48:16AM -0700, Junio C Hamano wrote:
> Kevin Daudt  writes:
> 
> > On Thu, Oct 13, 2016 at 06:10:17PM +0200, Heiko Voigt wrote:
> >> On Fri, Oct 07, 2016 at 06:17:05PM +, David Turner wrote:
> >> > Presently, uninitialized submodules are materialized in the working
> >> > tree as empty directories.  We would like to consider having them be
> >> > symlinks.  Specifically, we'd like them to be symlinks into a FUSE
> >> > filesystem which retrieves files on demand.
> >> 
> >> How about portability? This feature would only work on Unix like
> >> operating systems. You have to be careful to not break Windows since
> >> they do not have symlinks.
> >
> > NTFS does have symlinks, but you need admin right to create them though
> > (unless you change the policy).
> 
> That sounds like saying "It has, but it practically is not usable by
> Git as a mechanism to achieve this goal" to me.

Yes and that is why Git for Windows does not use them and I simplified
to: "Windows does not have symlinks". For a normal user there is no such
thing as symlinks on Windows, unfortunately.

Cheers Heiko


Re: [PATCH v2 3/3] batch check whether submodule needs pushing into one call

2016-10-13 Thread Heiko Voigt
On Wed, Oct 12, 2016 at 10:37:33AM -0700, Junio C Hamano wrote:
> Heiko Voigt  writes:
> 
> >> If we do not even have these commits locally, then there is no point
> >> attempting to push, so returning 0 (i.e. it is not "needs pushing"
> >> situation) is correct but it is a but subtle.  It's not "we know
> >> they already have them", but it is "even if we tried to push, it
> >> won't do us or the other side any good."  A single-liner in-code
> >> comment may help.
> >
> > First the naming part. How about:
> >
> > submodule_has_commits()
> 
> Nice.

Ok will use that. And while I am at it: I will also rename all the
'hashes' variables to commits because that makes the code way clearer I
think.

> > Returning 0 here means: "No push needed" but the correct answer would
> > be: "We do not know". 
> 
> Is it?  Perhaps I am misreading the "submodule-has-commits"; I
> thought it was "the remote may or may not need updating, but we
> ourselves don't have what they may need to have commits in their
> submodule that are referenced by their superproject, so it would not
> help them even if we pushed our submodule to them".  It indeed is
> different from "No push needed" (rather, "our pushing would be
> pointless").

Yes you could also rephrase/see it that way. But the question is: If we
do not have what the remote needs would the user expect us to tell him
that fact and stop or does he usually not care?

> > So how about:
> >
> >
> > if (!submodule_has_hashes(path, hashes))
> > /* NEEDSWORK: The correct answer here is "We do not
> >  * know" instead of "No". We currently proceed pushing
> >  * here as if the submodules commits are available on a
> >  * remote, which is not always correct. */
> > return 0;
> 
> I am not sure.  
> 
> What should happen in this scenario?
> 
>  * We have two remotes, A and B, for our superproject.
> 
>  * We are not interested in one submodule at path X.  Our repository
>is primarily used to work on the superproject and possibly other
>submodules but not the one at path X.
> 
>  * We pulled from A to update ourselves.  They were actively working
>on the submodule we are not interested in, and path X in the
>superproject records a new commit that we do not have.
> 
>  * We are now trying to push to B.

I am not sure if this is a typical scenario? Well, if you are updating
your main branch from someone else and then push it to your own fork
maybe. You could specify --no-recurse-submodules for this case though.
The proper solution for this case would probably be something along the
lines of 'submodule..fetchRecurseSubmodules' but for push so we
can mark certain submodules as uninteresting by default.

I like to be more protective to the user here. Its usually more
annoying for possibly many others when you push out things that have
missing things compared to one person not being able to push because his
submodule is not up-to-date/initialized.

> Should different things happen in these two subcases?
> 
>  - We are not interested in submodule at path X, so we haven't even
>done "submodule init" on it.
> 
>  - We are not interested in submodule at path X, so even though we
>do have a rather stale clone of it, we do not usually bother
>updating what is checked out at path X and commit our changes
>outside that area.
> 
> I tend to think that in these two cases the same thing should
> happen.  I am not sure if that same thing should be rejection
> (i.e. "you do not know for sure that the commit at path X of the
> superproject you are pushing exists in the submodule repository at
> the receiving end, so I'd refuse to push the superproject"), as it
> makes the only remedy for the situation is for you to make a full
> clone of the submodule you are not interested in and you have never
> touched yourself in either of these two subcases.

I also think in both situations the same thing should happen. A decision
that something different should happen should be made explicitly instead
of implicitly just because some submodule is not initialized. That might
be by accident or because a certain submodule is new so here the choice
should be made deliberately by the user, IMO.

Cheers Heiko


Re: Uninitialized submodules as symlinks

2016-10-13 Thread Heiko Voigt
On Fri, Oct 07, 2016 at 06:17:05PM +, David Turner wrote:
> Presently, uninitialized submodules are materialized in the working
> tree as empty directories.  We would like to consider having them be
> symlinks.  Specifically, we'd like them to be symlinks into a FUSE
> filesystem which retrieves files on demand.

How about portability? This feature would only work on Unix like
operating systems. You have to be careful to not break Windows since
they do not have symlinks.

Cheers Heiko


Re: [PATCH v2 1/3] serialize collection of changed submodules

2016-10-13 Thread Heiko Voigt
On Wed, Oct 12, 2016 at 10:18:28AM -0700, Junio C Hamano wrote:
> Heiko Voigt  writes:
> 
> > Which seems quite extensively long for a static function so how about
> > we shorten it a bit and add a comment:
> >
> > /* lookup or create commit object list for submodule */
> > get_commit_objects_for_submodule_path(...
> 
> Or you can even lose "get_" and "path", I guess.  You are not even
> "getting" commits but the array that holds them, so the caller can
> use it to "get" one of them or it can even use it to "put" a new
> one, no?  "get-commit-objects" is a misnomer in that sense.  Either
> one of
> 
> get_submodule_commits_array()
> submodule_commits()
> 
> perhaps?  I dunno.

I like the last one. Will use 'submodule_commits()'.

Cheers Heiko


Re: [PATCH v2 3/3] batch check whether submodule needs pushing into one call

2016-10-12 Thread Heiko Voigt
On Mon, Oct 10, 2016 at 03:56:13PM -0700, Junio C Hamano wrote:
> Heiko Voigt  writes:
> 
> > -static int submodule_needs_pushing(const char *path, const unsigned char 
> > sha1[20])
> > +static int check_has_hash(const unsigned char sha1[20], void *data)
> >  {
> > -   if (add_submodule_odb(path) || !lookup_commit_reference(sha1))
> > +   int *has_hash = (int *) data;
> > +
> > +   if (!lookup_commit_reference(sha1))
> > +   *has_hash = 0;
> > +
> > +   return 0;
> > +}
> > +
> > +static int submodule_has_hashes(const char *path, struct sha1_array 
> > *hashes)
> > +{
> > +   int has_hash = 1;
> > +
> > +   if (add_submodule_odb(path))
> > +   return 0;
> > +
> > +   sha1_array_for_each_unique(hashes, check_has_hash, &has_hash);
> > +   return has_hash;
> > +}
> > +
> > +static int submodule_needs_pushing(const char *path, struct sha1_array 
> > *hashes)
> > +{
> > +   if (!submodule_has_hashes(path, hashes))
> > return 0;
> 
> Same comment about naming.  
> 
> What do check-has-hash and submodule-has-hashes exactly mean by
> "hash" in their names?  Because I think what is checked here is
> "does the local submodule repository have _all_ the commits
> referenced from the superproject commit we are pushing?", so I'd
> prefer to see "commit" in their names.
> 
> If we do not even have these commits locally, then there is no point
> attempting to push, so returning 0 (i.e. it is not "needs pushing"
> situation) is correct but it is a but subtle.  It's not "we know
> they already have them", but it is "even if we tried to push, it
> won't do us or the other side any good."  A single-liner in-code
> comment may help.

First the naming part. How about:

submodule_has_commits()

?

Second as mentioned a previous answer[1] to this part: I would actually
like to have a die() here instead of blindly proceeding. Since the user
either specified --recurse-submodules=... at the commandline or it was
implicitly enabled because we have submodules in the tree we should be
careful and not push revisions referencing submodules that are not
available at a remote. If we can not properly figure it out I would
suggest to stop and tell the user how to solve the situation. E.g.
either she clones the appropriate submodules or specifies
--no-recurse-submodules on the commandline to tell git that she does not
care.

Returning 0 here means: "No push needed" but the correct answer would
be: "We do not know". Question is what we should do here which I am
planning to address in a separate patch series since that will be
changing behavior.

So how about:


if (!submodule_has_hashes(path, hashes))
/* NEEDSWORK: The correct answer here is "We do not
 * know" instead of "No". We currently proceed pushing
 * here as if the submodules commits are available on a
 * remote, which is not always correct. */
return 0;

What do you think?

Cheers Heiko

[1] http://public-inbox.org/git/20160919195812.gc62...@book.hvoigt.net/


Re: [PATCH v2 1/3] serialize collection of changed submodules

2016-10-12 Thread Heiko Voigt
On Mon, Oct 10, 2016 at 03:43:13PM -0700, Junio C Hamano wrote:
> Stefan Beller  writes:
> 
> >> +static struct sha1_array *get_sha1s_from_list(struct string_list 
> >> *submodules,
> >> +   const char *path)
> >
> > So this will take the stringlist `submodules` and insert the path into it,
> > if it wasn't already in there. In case it is newly inserted, add a 
> > sha1_array
> > as util, so each inserted path has it's own empty array.
> >
> > So it is both init of the data structures as well as retrieving them. I was
> > initially confused by the name as I assumed it would give you sha1s out
> > of a string list (e.g. transform strings to internal sha1 things).
> > Maybe it's just
> > me having a hard time to understand that, but I feel like the name could be
> > improved.
> >
> > lookup_sha1_list_by_path,
> > insert_path_and_return_sha1_list ?
> 
> I do not think either the name or the "find if exists otherwise
> initialize one" behaviour is particularly confusing, but I do not
> think "maintain a set of sha1_arrays keyed with a string" is a so
> widely reusable general concept/construct.  As can be seen easily in
> the names of parameters, this function is about maintaining a set of
> sha1_arrays keyed by paths to submodules, and I also assume that the
> array indexed by path is not meant to be a general purpose "we can
> use it to store any 40-hex thing" but to store something specific.
> 
> What is that specific thing?  The names of commit objects in the
> submodule repository?
> 
> I'd prefer to see that exact thing used to construct the function
> name for a helper function with specific usage in mind, i.e.
> get_commit_object_names_for_submodule_path() or something along that
> line.

I did not name this function too precisely to keep it's name short since
everything specific was quite long, like the suggestion from Junio.

Since this is a static function local to the submodule file I was
assuming anyone interested would just look up the usage and immediately
see the purpose. If I look into submodule-cache.c where I have a similar
functionality we used 'lookup_or_create' for this create on demand
functionality. So a function name would be:

lookup_or_create_commit_objects_for_submodule_path(...

Which seems quite extensively long for a static function so how about
we shorten it a bit and add a comment:

/* lookup or create commit object list for submodule */
get_commit_objects_for_submodule_path(...

?

Cheers Heiko


Re: [PATCH v2 2/3] serialize collection of refs that contain submodule changes

2016-10-12 Thread Heiko Voigt
On Fri, Oct 07, 2016 at 11:16:31AM -0700, Stefan Beller wrote:
> > diff --git a/submodule.c b/submodule.c
> > index 59c9d15905..5044afc2f8 100644
> > --- a/submodule.c
> > +++ b/submodule.c
> > @@ -522,6 +522,13 @@ static int has_remote(const char *refname, const 
> > struct object_id *oid,
> > return 1;
> >  }
> >
> > +static int append_hash_to_argv(const unsigned char sha1[20], void *data)
> > +{
> > +   struct argv_array *argv = (struct argv_array *) data;
> > +   argv_array_push(argv, sha1_to_hex(sha1));
> 
> Nit of the day:
> When using the struct child-process, we have the oldstyle argv NULL
> terminated array as
> well as the new style args argv_array. So in that context we'd prefer
> `args` as a name for
> argv_array as that helps to distinguish from the old array type.
> Here however `argv` seems to be a reasonable name, in fact whenever we
> do not deal with
> child processes, we seem to not like the `args` name:
> 
> $ git grep argv_array |wc -l
> 577
> $ git grep argv_array |grep args |wc -l
> 293
> 
> The rest looks good to me. :)

Thanks. So I do not completely get what you are suggesting: args or kept
it the way it is? Since in the end you are saying it is ok here ;) I
mainly chose this name because I am substituting the argv variable which
is already called 'argv' with this array. That might also be the reason
why in so many locations with struct child_processe's we have the 'argv'
name: Because they initially started with the old-style NULL terminated
array.

I am fine with it either way. Just tell me what you like :)

Cheers Heiko


Re: [PATCH v2 1/3] serialize collection of changed submodules

2016-10-12 Thread Heiko Voigt
On Fri, Oct 07, 2016 at 10:59:29AM -0700, Stefan Beller wrote:
> On Fri, Oct 7, 2016 at 8:06 AM, Heiko Voigt  wrote:
> > +static void free_submodules_sha1s(struct string_list *submodules)
> > +{
> > +   int i;
> > +   for (i = 0; i < submodules->nr; i++) {
> > +   struct string_list_item *item = &submodules->items[i];
> 
> You do not seem to make use of `i` explicitely, so
> for_each_string_list_item might be more readable here?

Will change.

> > @@ -603,12 +645,23 @@ int find_unpushed_submodules(unsigned char 
> > new_sha1[20],
> > die("revision walk setup failed");
> >
> > while ((commit = get_revision(&rev)) != NULL)
> > -   find_unpushed_submodule_commits(commit, needs_pushing);
> > +   find_unpushed_submodule_commits(commit, &submodules);
> >
> > reset_revision_walk();
> > free(sha1_copy);
> > strbuf_release(&remotes_arg);
> >
> > +   for (i = 0; i < submodules.nr; i++) {
> > +   struct string_list_item *item = &submodules.items[i];
> 
> You do not seem to make use of `i` explicitely, so
> for_each_string_list_item might be more readable here?

As above.

Cheers Heiko


Re: [PATCH] clean up confusing suggestion for commit references

2016-10-11 Thread Heiko Voigt
On Mon, Oct 10, 2016 at 12:14:14PM -0700, Junio C Hamano wrote:
> Jeff King  writes:
> 
> > On Mon, Oct 10, 2016 at 11:24:01AM -0700, Junio C Hamano wrote:
> >
> >> I no longer have preference either way myself, even though I was in
> >> favor of no-quotes simply because I had an alias to produce that
> >> format and was used to it.
> >
> > I'll admit that I don't care _that_ much and am happy to leave it up to
> > individual authors, as long as nobody quotes SubmittingPatches at me as
> > some kind of gospel when I use the no-quotes form.
> 
> ;-).  
> 
> I just do not want to hear "gitk (or was it git-gui) produces quoted
> form, why are you recommending no-quoted form in SubmittingPatches?"
> 
> I'd say "use common sense; sometimes it is less confusing to read
> without quotes and it is perfectly OK to do so if that is the case".

I do not care about which format it should be either. I just wanted to
be clear about whatever should be used. Since it seems we will allow
both, I am also fine with leaving the description as it is ;-)

Cheers Heiko


Re: [PATCH 1/2] submodule add: extend force flag to add existing repos

2016-10-11 Thread Heiko Voigt
Hi,

On Fri, Oct 07, 2016 at 10:25:04AM -0700, Stefan Beller wrote:
> On Fri, Oct 7, 2016 at 5:52 AM, Heiko Voigt  wrote:
> > On Thu, Oct 06, 2016 at 01:11:20PM -0700, Junio C Hamano wrote:
> >> Stefan Beller  writes:
> >>
> >> > Currently the force flag in `git submodule add` takes care of possibly
> >> > ignored files or when a name collision occurs.
> >> >
> >> > However there is another situation where submodule add comes in handy:
> >> > When you already have a gitlink recorded, but no configuration was
> >> > done (i.e. no .gitmodules file nor any entry in .git/config) and you
> >> > want to generate these config entries. For this situation allow
> >> > `git submodule add` to proceed if there is already a submodule at the
> >> > given path in the index.
> >
> > Is it important that the submodule is in the index?
> 
> If it is not in the index, it already works.

Ah ok I was not aware of that, sorry.

> > How about worktree?
> > From the index entry alone we can not deduce the values anyway.
> 
> Right, but as of now this is the only show stopper, i.e.
> * you have an existing repo? -> fine, it works with --force
> * you even ignored that repo -> --force knows how to do it.
> * you already have a gitlink -> Sorry, you're out of luck.
> 
> So that is why I stressed index in this commit message, as it is only about 
> this
> case.

Forget what I wrote. As said above I was not aware that there is only an
error when it is already in the index.

> > [1] 
> > http://public-inbox.org/git/%3c20160916141143.ga47...@book.hvoigt.net%3E/
> 
> Current situation:
> 
> > clone the submodule into a directory
> > git submodule add --force
> > git commit everything
> 
> works fine, but:
> 
> > clone the submodule into a directory
> > git add 
> > git commit  -m "Add submodule"
> > # me: "Oh crap! I did forget the configuration."
> > git submodule add --force  
> > # Git: "It already exists in the index, I am not going to produce the 
> > config for you."
> 
> The last step is changed with this patch, as
> it will just work fine then.

Thanks.

Cheers Heiko


[PATCH v2 3/3] batch check whether submodule needs pushing into one call

2016-10-07 Thread Heiko Voigt
We run a command for each sha1 change in a submodule. This is
unnecessary since we can simply batch all sha1's we want to check into
one command. Lets do it so we can speedup the check when many submodule
changes are in need of checking.

Signed-off-by: Heiko Voigt 
---
 submodule.c | 63 +
 1 file changed, 34 insertions(+), 29 deletions(-)

diff --git a/submodule.c b/submodule.c
index 5044afc2f8..a05c2a34b1 100644
--- a/submodule.c
+++ b/submodule.c
@@ -529,27 +529,49 @@ static int append_hash_to_argv(const unsigned char 
sha1[20], void *data)
return 0;
 }
 
-static int submodule_needs_pushing(const char *path, const unsigned char 
sha1[20])
+static int check_has_hash(const unsigned char sha1[20], void *data)
 {
-   if (add_submodule_odb(path) || !lookup_commit_reference(sha1))
+   int *has_hash = (int *) data;
+
+   if (!lookup_commit_reference(sha1))
+   *has_hash = 0;
+
+   return 0;
+}
+
+static int submodule_has_hashes(const char *path, struct sha1_array *hashes)
+{
+   int has_hash = 1;
+
+   if (add_submodule_odb(path))
+   return 0;
+
+   sha1_array_for_each_unique(hashes, check_has_hash, &has_hash);
+   return has_hash;
+}
+
+static int submodule_needs_pushing(const char *path, struct sha1_array *hashes)
+{
+   if (!submodule_has_hashes(path, hashes))
return 0;
 
if (for_each_remote_ref_submodule(path, has_remote, NULL) > 0) {
struct child_process cp = CHILD_PROCESS_INIT;
-   const char *argv[] = {"rev-list", NULL, "--not", "--remotes", 
"-n", "1" , NULL};
struct strbuf buf = STRBUF_INIT;
int needs_pushing = 0;
 
-   argv[1] = sha1_to_hex(sha1);
-   cp.argv = argv;
+   argv_array_push(&cp.args, "rev-list");
+   sha1_array_for_each_unique(hashes, append_hash_to_argv, 
&cp.args);
+   argv_array_pushl(&cp.args, "--not", "--remotes", "-n", "1" , 
NULL);
+
prepare_submodule_repo_env(&cp.env_array);
cp.git_cmd = 1;
cp.no_stdin = 1;
cp.out = -1;
cp.dir = path;
if (start_command(&cp))
-   die("Could not run 'git rev-list %s --not --remotes -n 
1' command in submodule %s",
-   sha1_to_hex(sha1), path);
+   die("Could not run 'git rev-list  --not 
--remotes -n 1' command in submodule %s",
+   path);
if (strbuf_read(&buf, cp.out, 41))
needs_pushing = 1;
finish_command(&cp);
@@ -604,21 +626,6 @@ static void find_unpushed_submodule_commits(struct commit 
*commit,
diff_tree_combined_merge(commit, 1, &rev);
 }
 
-struct collect_submodule_from_sha1s_data {
-   char *submodule_path;
-   struct string_list *needs_pushing;
-};
-
-static void collect_submodules_from_sha1s(const unsigned char sha1[20],
-   void *data)
-{
-   struct collect_submodule_from_sha1s_data *me =
-   (struct collect_submodule_from_sha1s_data *) data;
-
-   if (submodule_needs_pushing(me->submodule_path, sha1))
-   string_list_insert(me->needs_pushing, me->submodule_path);
-}
-
 static void free_submodules_sha1s(struct string_list *submodules)
 {
int i;
@@ -658,13 +665,11 @@ int find_unpushed_submodules(struct sha1_array *hashes,
argv_array_clear(&argv);
 
for (i = 0; i < submodules.nr; i++) {
-   struct string_list_item *item = &submodules.items[i];
-   struct collect_submodule_from_sha1s_data data;
-   data.submodule_path = item->string;
-   data.needs_pushing = needs_pushing;
-   sha1_array_for_each_unique((struct sha1_array *) item->util,
-   collect_submodules_from_sha1s,
-   &data);
+   struct string_list_item *submodule = &submodules.items[i];
+   struct sha1_array *hashes = (struct sha1_array *) 
submodule->util;
+
+   if (submodule_needs_pushing(submodule->string, hashes))
+   string_list_insert(needs_pushing, submodule->string);
}
free_submodules_sha1s(&submodules);
 
-- 
2.10.1.637.g09b28c5



[PATCH v2 2/3] serialize collection of refs that contain submodule changes

2016-10-07 Thread Heiko Voigt
We are iterating over each pushed ref and want to check whether it
contains changes to submodules. Instead of immediately checking each ref
lets first collect them and then do the check for all of them in one
revision walk.

Signed-off-by: Heiko Voigt 
---
 submodule.c | 36 +---
 submodule.h |  5 +++--
 transport.c | 29 +
 3 files changed, 45 insertions(+), 25 deletions(-)

diff --git a/submodule.c b/submodule.c
index 59c9d15905..5044afc2f8 100644
--- a/submodule.c
+++ b/submodule.c
@@ -522,6 +522,13 @@ static int has_remote(const char *refname, const struct 
object_id *oid,
return 1;
 }
 
+static int append_hash_to_argv(const unsigned char sha1[20], void *data)
+{
+   struct argv_array *argv = (struct argv_array *) data;
+   argv_array_push(argv, sha1_to_hex(sha1));
+   return 0;
+}
+
 static int submodule_needs_pushing(const char *path, const unsigned char 
sha1[20])
 {
if (add_submodule_odb(path) || !lookup_commit_reference(sha1))
@@ -623,24 +630,24 @@ static void free_submodules_sha1s(struct string_list 
*submodules)
string_list_clear(submodules, 1);
 }
 
-int find_unpushed_submodules(unsigned char new_sha1[20],
+int find_unpushed_submodules(struct sha1_array *hashes,
const char *remotes_name, struct string_list *needs_pushing)
 {
struct rev_info rev;
struct commit *commit;
-   const char *argv[] = {NULL, NULL, "--not", "NULL", NULL};
-   int argc = ARRAY_SIZE(argv) - 1, i;
-   char *sha1_copy;
+   int i;
struct string_list submodules = STRING_LIST_INIT_DUP;
+   struct argv_array argv = ARGV_ARRAY_INIT;
 
-   struct strbuf remotes_arg = STRBUF_INIT;
-
-   strbuf_addf(&remotes_arg, "--remotes=%s", remotes_name);
init_revisions(&rev, NULL);
-   sha1_copy = xstrdup(sha1_to_hex(new_sha1));
-   argv[1] = sha1_copy;
-   argv[3] = remotes_arg.buf;
-   setup_revisions(argc, argv, &rev, NULL);
+
+   /* argv.argv[0] will be ignored by setup_revisions */
+   argv_array_push(&argv, "find_unpushed_submodules");
+   sha1_array_for_each_unique(hashes, append_hash_to_argv, &argv);
+   argv_array_push(&argv, "--not");
+   argv_array_pushf(&argv, "--remotes=%s", remotes_name);
+
+   setup_revisions(argv.argc, argv.argv, &rev, NULL);
if (prepare_revision_walk(&rev))
die("revision walk setup failed");
 
@@ -648,8 +655,7 @@ int find_unpushed_submodules(unsigned char new_sha1[20],
find_unpushed_submodule_commits(commit, &submodules);
 
reset_revision_walk();
-   free(sha1_copy);
-   strbuf_release(&remotes_arg);
+   argv_array_clear(&argv);
 
for (i = 0; i < submodules.nr; i++) {
struct string_list_item *item = &submodules.items[i];
@@ -687,12 +693,12 @@ static int push_submodule(const char *path)
return 1;
 }
 
-int push_unpushed_submodules(unsigned char new_sha1[20], const char 
*remotes_name)
+int push_unpushed_submodules(struct sha1_array *hashes, const char 
*remotes_name)
 {
int i, ret = 1;
struct string_list needs_pushing = STRING_LIST_INIT_DUP;
 
-   if (!find_unpushed_submodules(new_sha1, remotes_name, &needs_pushing))
+   if (!find_unpushed_submodules(hashes, remotes_name, &needs_pushing))
return 1;
 
for (i = 0; i < needs_pushing.nr; i++) {
diff --git a/submodule.h b/submodule.h
index d9e197a948..065b2f0a2a 100644
--- a/submodule.h
+++ b/submodule.h
@@ -3,6 +3,7 @@
 
 struct diff_options;
 struct argv_array;
+struct sha1_array;
 
 enum {
RECURSE_SUBMODULES_CHECK = -4,
@@ -62,9 +63,9 @@ int submodule_uses_gitfile(const char *path);
 int ok_to_remove_submodule(const char *path);
 int merge_submodule(unsigned char result[20], const char *path, const unsigned 
char base[20],
const unsigned char a[20], const unsigned char b[20], int 
search);
-int find_unpushed_submodules(unsigned char new_sha1[20], const char 
*remotes_name,
+int find_unpushed_submodules(struct sha1_array *hashes, const char 
*remotes_name,
struct string_list *needs_pushing);
-int push_unpushed_submodules(unsigned char new_sha1[20], const char 
*remotes_name);
+int push_unpushed_submodules(struct sha1_array *hashes, const char 
*remotes_name);
 void connect_work_tree_and_git_dir(const char *work_tree, const char *git_dir);
 int parallel_submodules(void);
 
diff --git a/transport.c b/transport.c
index 94d6dc3725..05f2ce83f1 100644
--- a/transport.c
+++ b/transport.c
@@ -903,23 +903,36 @@ int transport_push(struct transport *transport,
 
if ((flags & TRANSPORT_RECURSE_SUBMODULES_ON_DEMAND) && 
!is_bare_repository()) {
struct ref *ref = remote_refs;
+   stru

[PATCH v2 1/3] serialize collection of changed submodules

2016-10-07 Thread Heiko Voigt
To check whether a submodule needs to be pushed we need to collect all
changed submodules. Lets collect them first and then execute the
possibly expensive test whether certain revisions are already pushed
only once per submodule.

There is further potential for optimization since we can assemble one
command and only issued that instead of one call for each remote ref in
the submodule.

Signed-off-by: Heiko Voigt 
---
 submodule.c | 63 -
 1 file changed, 58 insertions(+), 5 deletions(-)

diff --git a/submodule.c b/submodule.c
index 2de06a3351..59c9d15905 100644
--- a/submodule.c
+++ b/submodule.c
@@ -554,19 +554,34 @@ static int submodule_needs_pushing(const char *path, 
const unsigned char sha1[20
return 0;
 }
 
+static struct sha1_array *get_sha1s_from_list(struct string_list *submodules,
+   const char *path)
+{
+   struct string_list_item *item;
+
+   item = string_list_insert(submodules, path);
+   if (item->util)
+   return (struct sha1_array *) item->util;
+
+   /* NEEDSWORK: should we have sha1_array_init()? */
+   item->util = xcalloc(1, sizeof(struct sha1_array));
+   return (struct sha1_array *) item->util;
+}
+
 static void collect_submodules_from_diff(struct diff_queue_struct *q,
 struct diff_options *options,
 void *data)
 {
int i;
-   struct string_list *needs_pushing = data;
+   struct string_list *submodules = data;
 
for (i = 0; i < q->nr; i++) {
struct diff_filepair *p = q->queue[i];
+   struct sha1_array *hashes;
if (!S_ISGITLINK(p->two->mode))
continue;
-   if (submodule_needs_pushing(p->two->path, p->two->oid.hash))
-   string_list_insert(needs_pushing, p->two->path);
+   hashes = get_sha1s_from_list(submodules, p->two->path);
+   sha1_array_append(hashes, p->two->oid.hash);
}
 }
 
@@ -582,14 +597,41 @@ static void find_unpushed_submodule_commits(struct commit 
*commit,
diff_tree_combined_merge(commit, 1, &rev);
 }
 
+struct collect_submodule_from_sha1s_data {
+   char *submodule_path;
+   struct string_list *needs_pushing;
+};
+
+static void collect_submodules_from_sha1s(const unsigned char sha1[20],
+   void *data)
+{
+   struct collect_submodule_from_sha1s_data *me =
+   (struct collect_submodule_from_sha1s_data *) data;
+
+   if (submodule_needs_pushing(me->submodule_path, sha1))
+   string_list_insert(me->needs_pushing, me->submodule_path);
+}
+
+static void free_submodules_sha1s(struct string_list *submodules)
+{
+   int i;
+   for (i = 0; i < submodules->nr; i++) {
+   struct string_list_item *item = &submodules->items[i];
+   struct sha1_array *hashes = (struct sha1_array *) item->util;
+   sha1_array_clear(hashes);
+   }
+   string_list_clear(submodules, 1);
+}
+
 int find_unpushed_submodules(unsigned char new_sha1[20],
const char *remotes_name, struct string_list *needs_pushing)
 {
struct rev_info rev;
struct commit *commit;
const char *argv[] = {NULL, NULL, "--not", "NULL", NULL};
-   int argc = ARRAY_SIZE(argv) - 1;
+   int argc = ARRAY_SIZE(argv) - 1, i;
char *sha1_copy;
+   struct string_list submodules = STRING_LIST_INIT_DUP;
 
struct strbuf remotes_arg = STRBUF_INIT;
 
@@ -603,12 +645,23 @@ int find_unpushed_submodules(unsigned char new_sha1[20],
die("revision walk setup failed");
 
while ((commit = get_revision(&rev)) != NULL)
-   find_unpushed_submodule_commits(commit, needs_pushing);
+   find_unpushed_submodule_commits(commit, &submodules);
 
reset_revision_walk();
free(sha1_copy);
strbuf_release(&remotes_arg);
 
+   for (i = 0; i < submodules.nr; i++) {
+   struct string_list_item *item = &submodules.items[i];
+   struct collect_submodule_from_sha1s_data data;
+   data.submodule_path = item->string;
+   data.needs_pushing = needs_pushing;
+   sha1_array_for_each_unique((struct sha1_array *) item->util,
+   collect_submodules_from_sha1s,
+   &data);
+   }
+   free_submodules_sha1s(&submodules);
+
return needs_pushing->nr;
 }
 
-- 
2.10.1.637.g09b28c5



[PATCH v2 0/3] Speedup finding of unpushed submodules

2016-10-07 Thread Heiko Voigt
You can find the first iteration of this series as part of this thread:

http://public-inbox.org/git/%3C20160914173124.GA7613@sandbox%3E/

All mentioned issues should be fixed. I dropped the last patch which was
the cause of the broken tests.

This should optimize every part of this test to a nice speed if you are
pushing to a remote. The only case that is still broken/slow as hell is
when calling push with a direct url.

I am thinking whether we should maybe error out with a "not implemented"
message or something and mention that --recurse-submoules does not work
with direct urls? But we might want to have another look at performance
with this patch included. Maybe it is actually useable with the last
patch included which was not yet on pu.

Cheers Heiko

Heiko Voigt (3):
  serialize collection of changed submodules
  serialize collection of refs that contain submodule changes
  batch check whether submodule needs pushing into one call

 submodule.c | 116 ++--
 submodule.h |   5 +--
 transport.c |  29 ++-
 3 files changed, 114 insertions(+), 36 deletions(-)

-- 
2.10.1.637.g09b28c5



Re: [PATCH 1/2] submodule add: extend force flag to add existing repos

2016-10-07 Thread Heiko Voigt
On Thu, Oct 06, 2016 at 01:11:20PM -0700, Junio C Hamano wrote:
> Stefan Beller  writes:
> 
> > Currently the force flag in `git submodule add` takes care of possibly
> > ignored files or when a name collision occurs.
> >
> > However there is another situation where submodule add comes in handy:
> > When you already have a gitlink recorded, but no configuration was
> > done (i.e. no .gitmodules file nor any entry in .git/config) and you
> > want to generate these config entries. For this situation allow
> > `git submodule add` to proceed if there is already a submodule at the
> > given path in the index.

Is it important that the submodule is in the index? How about worktree?
>From the index entry alone we can not deduce the values anyway. So I
would say the submodule has to be in the worktree, no matter what is in
the index. If its not in the index we can also add it.

BTW, that is the way I imagined submodules would work in the first
place: just clone and add them, like I described here[1].

> > Signed-off-by: Stefan Beller 
> > ---
> 
> Yup, the goal makes perfect sense.  
> 
> I vaguely recall discussing this exact issue of "git submodule add"
> that refuses to add a path that already is a gitlink (via "git add"
> that has previously been run) elsewhere on this list some time ago.

Yes there was a discussion, see the link.

Cheers Heiko

[1] http://public-inbox.org/git/%3c20160916141143.ga47...@book.hvoigt.net%3E/


Re: [PATCHv3 1/2] push: change submodule default to check when submodules exist

2016-10-07 Thread Heiko Voigt
On Thu, Oct 06, 2016 at 10:20:16AM -0700, Stefan Beller wrote:
> On Thu, Oct 6, 2016 at 2:23 AM, Heiko Voigt  wrote:
> > On Wed, Oct 05, 2016 at 03:53:25PM +0200, Heiko Voigt wrote:
> >> On Tue, Oct 04, 2016 at 02:03:58PM -0700, Stefan Beller wrote:
> >> > Jeff,
> >> > thanks for the suggestions, both git_path(..) as well as checking the 
> >> > config,
> >> > this seems quite readable to me:
> >>
> >> When reading the discussion I thought the same: What about the
> >> "old-style" repositories. I like this one. Checking both locations
> >> is nice.
> >
> > BTW, since it seems we all agree on the direction. Should we add some
> > tests?
> >
> 
> Good call. What do we want to test for?
> * Correctness in case of submodules? (just push and get rejected)
>   I think that is easy to do.
> * Performance with [no, lots of] submodules? That seems harder to me.
> 
> I'll add a test for the correctness part and resend.

Well I though about the following tests:

 * Without submodules: Make sure submodule processing is disabled by
   default
 * With new-style submodules: Make sure submodules are processed by
   default
 * With old-style submodules: Make sure submodules are processed by
   default

But I have to admit that I did not think about the "how can we do that".
But performance seems to be the only thing that is exposing the
processing when we have no submodules, so it seems we can only easily do
the tests with submodules.

Cheers Heiko


[PATCH] clean up confusing suggestion for commit references

2016-10-07 Thread Heiko Voigt
The description for referencing commits looks as if it is contradicting
the example, since it is itself enclosed in double quotes. Lets use
single quotes around the description and include the double quotes in
the description so it matches the example.
---
Sorry for opening this up again but I just looked up the format and was
like: "Umm, which one is now the correct one..."

For this makes more sense. What do others think?

Cheers Heiko

 Documentation/SubmittingPatches | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches
index 08352de..692f4ce 100644
--- a/Documentation/SubmittingPatches
+++ b/Documentation/SubmittingPatches
@@ -122,7 +122,7 @@ without external resources. Instead of giving a URL to a 
mailing list
 archive, summarize the relevant points of the discussion.
 
 If you want to reference a previous commit in the history of a stable
-branch, use the format "abbreviated sha1 (subject, date)",
+branch, use the format 'abbreviated sha1 ("subject", date)',
 with the subject enclosed in a pair of double-quotes, like this:
 
 Commit f86a374 ("pack-bitmap.c: fix a memleak", 2015-03-30)
-- 
2.10.0.645.g54f1e86



Re: [PATCHv3 1/2] push: change submodule default to check when submodules exist

2016-10-06 Thread Heiko Voigt
On Wed, Oct 05, 2016 at 03:53:25PM +0200, Heiko Voigt wrote:
> On Tue, Oct 04, 2016 at 02:03:58PM -0700, Stefan Beller wrote:
> > Jeff,
> > thanks for the suggestions, both git_path(..) as well as checking the 
> > config,
> > this seems quite readable to me:
> 
> When reading the discussion I thought the same: What about the
> "old-style" repositories. I like this one. Checking both locations
> is nice.

BTW, since it seems we all agree on the direction. Should we add some
tests?

Cheers Heiko


Re: Bug Report: "git submodule deinit" fails right after a clone

2016-10-06 Thread Heiko Voigt
Hi,

please also keep the mailinglist in the CC so everyone can read this.

On Thu, Oct 06, 2016 at 09:11:05AM +0200, Thomas Bétous wrote:
> On Wed, Oct 5, 2016 at 3:36 PM, Heiko Voigt  wrote:
> 
> >
> > My initial reaction is that this might be a problem with line endings. Did
> > you
> > check whether you get any diff when you do a 'git diff' after the clone?
> >
> > Maybe the variable 'core.autocrlf' is set to 'input' ? Have a look at 'git
> > help
> > config'
> 
> 
> When I do a 'git diff' right after the clone, nothing appears.
> Moreover my global setting for core.autocrlf is true. (This was configured
> on purpose as I work on Windows whereas the repositories are hosted on an
> UNIX server.)

So I guess the same applies to 'git status'?

> Nevertheless when I change core.autocrlf to 'input', the error disappears
> and I got the expected behavior for git submodule deinit...
> So I guess it is just a configuration problem but I do not understand why
> core.autocrlf should be set to 'input' to remove this error. Do you have
> any idea?

This is indeed strange. That's why I asked whether 'git diff' shows
anything. I was suspecting that the .gitmodules is somehow checked out
in UNIX format. And the fact that setting core.autocrlf to
'input' seems to fix it supports that.

I currently do not have access to a windows machine as the moment to
test this. Copying the windows mailing list maybe someone over there
can reproduce and help with the issue[1].

Cheers Heiko

[1] 
http://public-inbox.org/git/%3ccapoqyv+xsrlk7y1hjyhzfy8ofkxvrwpczbdqhdgrhthqdzy...@mail.gmail.com%3E/


Re: Reference a submodule branch instead of a commit

2016-10-05 Thread Heiko Voigt
On Wed, Oct 05, 2016 at 09:13:53AM -0700, Junio C Hamano wrote:
> Heiko Voigt  writes:
> 
> >> It IS a hack, but having this information in .git would
> >> mean that it can be forced to be in machine readable form, unlike a
> >> mention in README.  I do not know if the .gitmodules/.gitignore
> >> combination is a sensible thing to use, but it does smell like a
> >> potentially useful hack.
> >
> > IIRC the tree entries are the reference for submodules in the code. We
> > are iterating over the tree entries in many places so that change does
> > not seem so easy to me.
> >
> > But you are right maybe we should stop arguing against this workflow and
> > just let people use it until they find out whats wrong with it ;)
> 
> I didn't say that, though.  I am fairly firm on _not_ changing what
> the superproject records in its tree for the submodule, i.e. it must
> record the exact commit, not "a branch name", for reproducibility. 

I was not talking about changing what the superproject records in its
tree. I was just talking about changing where we look for submodules
(e.g. for updating and such). I.e. in .git* instead of just the tree as
it is at the moment. Thats what I understood from the discussion above.
Sorry that might have been ambiguous.

I agree that there should always be a commit as a reference for a
submodule. But as far as I understand for some projects its to much
overhead to record every change of a submodule but still they want to
use the latest code during development. Those projects might only want
to record the actual commit when they release something. At least thats
what I imagine.

> I am OK if people ignored the unmatch between the recorded commit
> from a submodule and what they had in the submodule directory while
> they developed and tested the superproject commit.  After all, it is
> not an error to make a commit while having a local uncommitted
> changes to tracked files, and it is equally valid to have a commit
> checked out in a submodule directory that is different from what
> goes in the superproject commit.  But we do show "modified but not
> committed" in the status output.  In that light, submodule.*.ignore
> may have been a mistake.

The original intend for submodule.*.ignore was to help people not
showing submodules as dirty when they had untracked files in them. That
was after status learned to look into submodules. 'untracked' to avoid the
performance overhead and 'dirty' for the people that accidentally worked
with dirty submodules. I agree 'all' might have been to much.

For the above workflow what user might actually want is something that
ignores all changes as long as they are part of the remote branch. But I
am just guessing here. My gut feeling is still that most people that
request this feature come from svn. Thats why I asked whether the
options I described provide the behavior that Jeremy wants.

Cheers Heiko


Re: Reference a submodule branch instead of a commit

2016-10-05 Thread Heiko Voigt
On Tue, Oct 04, 2016 at 12:01:13PM -0700, Junio C Hamano wrote:
> Junio C Hamano  writes:
> 
> > Stefan Beller  writes:
> >
> >> I wonder if we could make that convenient for users by not tracking
> >> the submodule,
> >> i.e.
> >> * we have the information in the .gitmodules file
> >> * the path itself is in the .gitignore
> >> * no tree entry
> >>
> >> Then you can update to the remote latest branch, without Git reporting
> >> a dirty submodule locally, in fact it reports nothing for the submodule.
> >>
> >> It sounds like a hack, but maybe it's worth looking into that when
> >> people want to see that workflow.
> >
> > It IS a hack.  
> >
> > But if you do not touch .git file and instead say "clone
> > this other project at that path yourself" in README, that would
> > probably be sufficient.
> 
> eh,... hit send too early.
> 
> It IS a hack, but having this information in .git would
> mean that it can be forced to be in machine readable form, unlike a
> mention in README.  I do not know if the .gitmodules/.gitignore
> combination is a sensible thing to use, but it does smell like a
> potentially useful hack.

IIRC the tree entries are the reference for submodules in the code. We
are iterating over the tree entries in many places so that change does
not seem so easy to me.

But you are right maybe we should stop arguing against this workflow and
just let people use it until they find out whats wrong with it ;)

I have another tip for Jeremy:

git config submodule..ignore all

and you will not see any changes to the submodule. Put that into your
.gitmodules and you do not see any changes to the submodules anymore.

So now the only thing missing for complete convenience is a config
option for the --remote option in 'git submodule update'.

Jeremy, does the ignore option combined with --remote what you want?

Cheers Heiko


Re: Bug Report: "git submodule deinit" fails right after a clone

2016-10-05 Thread Heiko Voigt
Hi,

please do not top-post the conversation will otherwise get hard to
follow. Thank you.

On Tue, Oct 04, 2016 at 05:46:45PM +0200, Thomas Bétous wrote:
> Thank you for your answer and sorry for the delay (I was on vacation...).
> 
> I am using git 2.9.0.windows.1 (run on Windows 7 via git bash).

My initial reaction is that this might be a problem with line endings. Did you
check whether you get any diff when you do a 'git diff' after the clone?

Maybe the variable 'core.autocrlf' is set to 'input' ? Have a look at 'git help
config'

> I tested it on this repo:
> https://github.com/githubtraining/example-dependency.git
> The same problem occurs.
> Here a small script to reproduce the error on my PC:
> #!/bin/bash
> git clone https://github.com/githubtraining/example-dependency.git
> cd example-dependency
> git submodule deinit js
> 
> It ends with this error:
> fatal: Please stage your changes to .gitmodules or stash them to proceed
> Submodule work tree 'js' contains local modifications; use '-f' to discard 
> them

Here I get

$ git submodule deinit js
Cleared directory 'js'

So all seems fine.

> Is the script working on your PC?

Yes. I am on Mac OS X though.

Cheers Heiko


Re: [PATCHv3 1/2] push: change submodule default to check when submodules exist

2016-10-05 Thread Heiko Voigt
On Tue, Oct 04, 2016 at 02:03:58PM -0700, Stefan Beller wrote:
> Jeff,
> thanks for the suggestions, both git_path(..) as well as checking the config,
> this seems quite readable to me:

When reading the discussion I thought the same: What about the
"old-style" repositories. I like this one. Checking both locations
is nice.

Cheers Heiko


Re: Re: Slow pushes on 'pu' - even when up-to-date..

2016-10-04 Thread Heiko Voigt
On Tue, Oct 04, 2016 at 07:44:28AM -0400, Jeff King wrote:
> > My idea of a solution goes like this:
> >   * collect all SHA1's of the remotes refs
> >   * check if we have them locally
> >   * if not we abort and tell the user to fetch them somehow into local
> > refs or disable the check
> >   * when we have them locally we proceed passing those SHA1's as bases
> > instead of --remotes=
> 
> As I argued in [1], I think it's not just "this must be cheaper" but
> "this must not be enabled if submodules are not in use at all".  Most
> repositories don't have submodules enabled at all, so anything that
> cause any extra traversal, even of a portion of the history, is going to
> be a net negative for a lot of people.
> 
> I think the only sane default is going to be some kind of heuristic that
> says "submodules are probably in use". Something like "is there a
> .gitmodules file" is not perfect (you can have gitlink entries without
> it), but it's a really cheap constant-time check.

I agree. We are adding convenience for submodules, so we can also say a
checked out ".gitmodules" file is a must to have convenience.

I am not sure if I agree on another layer of options for this as
suggested in your post. More options mean more implementation
complexity and more confusion on the users side.

How about we choose our defaults based on the existence of a checked out
.gitmodules file? So the default would only be --recurse-submodules=check
if there is a .gitmodules file in the worktree. All other users need to
either pass or explicitly configure it.

Cheers Heiko

> [1] Quoted in
> http://public-inbox.org/git/xmqqh9aaot49@gitster.mtv.corp.google.com/


Re: [RFC PATCH] clone: add clone.recursesubmodules config option

2016-10-04 Thread Heiko Voigt
On Mon, Oct 03, 2016 at 10:18:32AM -0700, Stefan Beller wrote:
> On Mon, Oct 3, 2016 at 8:36 AM, Jeremy Morton  wrote:
> > Did this ever get anywhere?  Can we recursively update submodules with "git
> > pull" in the supermodule now?
> 
> I think the idea is sound.

I am confused there is nothing handling *pull* here? This patch was
about clone. Handling 'pull' is a much bigger topic[1].

Cheers Heiko

[1] 
https://github.com/jlehmann/git-submod-enhancements/wiki/Recursive-submodule-checkout


Re: Reference a submodule branch instead of a commit

2016-10-04 Thread Heiko Voigt
On Mon, Oct 03, 2016 at 12:00:45PM -0700, Junio C Hamano wrote:
> Jeremy Morton  writes:
> 
> > At the moment, supermodules must reference a given commit in each of
> > its submodules.  If one is in control of a submodule and it changes on
> > a regular basis, this can cause a lot of overhead with "submodule
> > updated" commits in the supermodule.  It would be useful of git allows
> > the option of referencing a submodule's branch instead of a given
> > submodule commit.  How about adding this functionality?
> 
> When somebody downstream fetches from your superproject and grabs
> the set of submodules, how would s/he know what _exact_ state you
> meant to record?  When s/he says "I have your superproject commit X,
> which binds submodule's branch Y at path sub/, and it simply does
> not work.  Your project is broken", how do you go about reproducing
> the exact state s/he had trouble with to help her/him?
> 
> The only thing s/he knows is that the commit used from the submodule
> must be one of the commits that was on branch Y at some point in
> time, hopefully close to the timestamp recorded in the commit in the
> superproject.  And your record in the history of the superproject
> does not tell you more than that, so you wouldn't have any idea
> better than what s/he already has to help.
> 
> Hence, such a "functionality" will never happen, at least in the
> exact form you are describing.
> 
> It is conceivable to add some feature that allows you to squelch the
> report that the submodule recorded in your superproject is not up to
> date from "git status" etc. to help those who thinks it is OK to not
> bind the latest submodule commit to the superproject all the time,
> though.

We already have options to support these kinds of workflows. Look at the
option '--remote' for 'git submodule update'.

You then only have to commit the submodule if you do not want to see it
as dirty locally, but you will always get the tip of a remote tracking
branch when updating.

Cheers Heiko


Re: Slow pushes on 'pu' - even when up-to-date..

2016-10-04 Thread Heiko Voigt
Hi,

On Mon, Oct 03, 2016 at 02:11:36PM -0700, Linus Torvalds wrote:
> This seems to be because I'm now on 'pu' as of a day or two ago in
> order to test the abbrev logic, but lookie here:
> 
> time git ls-remote ra.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux
> .. shows all the branches and tags ..
> real 0m0.655s
> user 0m0.011s
> sys 0m0.004s
> 
> so the remote is fast to connect to, and with network connection
> overhead and everything, it's just over half a second. But then:
> 
> time git push ra.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux

The reason behind this is when pushing to an address we do not easily
have the remote refs to compare available. When pushing an existing ref
it would be easy and could get a shortcut but it gets more complicated
for new refs. Currently we fall back to walking the whole history since
that is "the most correct way" we have. But obviously it is not a
practical solution in any way.

I mentioned this fact when discussing the current state and my patches
to make this check less painful. So we still need to think about a
solution for this check when passing an address.

IMO: It's definitely not ready to be switched on as default, unless we
find something a lot cheaper for the above case.

My idea of a solution goes like this:
  * collect all SHA1's of the remotes refs
  * check if we have them locally
  * if not we abort and tell the user to fetch them somehow into local
refs or disable the check
  * when we have them locally we proceed passing those SHA1's as bases
instead of --remotes=

Cheers Heiko


  1   2   3   4   >