Re: [PATCH 10/15] fetch --tags: fetch tags *in addition to* other stuff

2013-10-29 Thread Michael Haggerty
On 10/28/2013 08:10 PM, Junio C Hamano wrote:
 Michael Haggerty mhag...@alum.mit.edu writes:
 
 True but when fetching other references, tags relevant to the
 history being fetched by default should automatically follow, so the
 above explains why fetch --tags is not a useful thing to do daily.

 Maybe not necessary in many scenarios, but is it harmful for the common
 case of cloning from and then periodically fetching from an upstream?
 
 There is no mention of harmful; I only said not useful. And it
 comes primarily from knowing why --tags was introduced in the
 first place; I should have said less useful than before, ever since
 we started to reliably auto-follow.
 
 The only harmful part is its interaction with --prune, which
 your series nicely addresses.

OK, then we are in agreement.

 ISTM that the result of the declarative --tags option

 we have all upstream tags

 is easier to reason about than the history-dependent tag-following behavior
 
 I'd say we have all the relevant tags and we have all the tags
 the other side has are equally valid things to wish for, depending
 who you are and how your project is organized, and one is not
 necessarily easy/useful than the other.  In case it was unclear, I
 was not seriously advocating to deprecate/remove --tags.

Yes, I agree that both are valid things to wish for.  I guess I was
mostly thinking about which would be a better default.

clone and remote add, by default, configure the repo to fetch all
branches on the remote.  For this default setup, what are the pros and
cons of the two tag-fetching options (assuming that this patch series
has fixed the problem with tag pruning)?

Pros of auto-following:

* Doesn't require a change to the status quo

* The the user can change the refspec to be more restrictive without
having to change the tag-following option and it continues to do the
right thing.

* If the project has branches outside of refs/heads (which would not, by
default, be fetched--think continuous integration artifacts) and also
has tags pointing at those branches, the user might get unwanted
contents with --tags, but not with auto-following.


Pros of --tags:

* Easier to understand (?) because result is not history-dependent.

* Avoids missing tags that are not directly on a branch:

  o---o---o---o- branch
   \
o  - tag

So it's not obvious that one is better than the other.  I think if I
were choosing the behavior for the first time, I would favor --tags.
But I don't think the advantage is strong enough to make it worth
changing the existing default.

 Regarding your first point: if it matters which of the duplicates is
 chosen, then it can only matter because they differ in some other way
 than their reference names, for example in their flags.  So a robust way
 of de-duping them (if it is possible) would be to compare the two
 records and decide which one should take precedence based on this other
 information rather than based on which one happened to come earlier in
 the list.  Then the list order would be immaterial and (for example) it
 wouldn't be a problem to reorder the items.
 
 But that changes the behaviour to those who has cared to rely on the
 ordering.  With the change to first collect and then sort unstably
 before deduping, the series already tells them not to rely on the
 order, and two of us tentatively agreed in this discussion that it
 is not likely to matter.  If later this agreement turns out to be a
 mistake and there are people who *do* rely on the ordering, the only
 acceptable fix for them is by making sure we restore the first one
 trumps semantics, not by defining an alternative, arguably better,
 behaviour---that is not a regression fix.

Please note that the current patch series does *not* change the
algorithm to use an unstable sort; that was only a suggestion for the
future should somebody discover that the O(N^2) algorithm in this
function is a performance bottleneck.

What the old patch series *did* do was change the ordering that
get_ref_map() adds references to the list in the case of (refspec_count
 tags == TAGS_SET) by moving the (tags == TAGS_SET) handling outside
of the true branch of the (refspec_count) conditional.  This had the
result that the references added by

   get_fetch_map(remote_refs, tag_refspec, tail, 0);

came after, rather than before, the references opportunistically being
fetched with FETCH_HEAD_IGNORE status.

But I dug even deeper and found that there was a (rather obscure)
situation where the ordering change could lead to incorrect behavior,
namely if all of the following are true:

* there is a configured refspec for tags, like refs/tags/*:refs/tags/*

* the user runs fetch with the --tags option and also with an explicit
refspec on the command line to fetch a remote tag (e.g.,
refs/tags/foo:refs/heads/foo).

In that case (after this version of this patch), an entry for
refs/tags/foo:refs/heads/foo would be added to the list, then the

Re: [PATCH 10/15] fetch --tags: fetch tags *in addition to* other stuff

2013-10-28 Thread Junio C Hamano
Michael Haggerty mhag...@alum.mit.edu writes:

 True but when fetching other references, tags relevant to the
 history being fetched by default should automatically follow, so the
 above explains why fetch --tags is not a useful thing to do daily.

 Maybe not necessary in many scenarios, but is it harmful for the common
 case of cloning from and then periodically fetching from an upstream?

There is no mention of harmful; I only said not useful. And it
comes primarily from knowing why --tags was introduced in the
first place; I should have said less useful than before, ever since
we started to reliably auto-follow.

The only harmful part is its interaction with --prune, which
your series nicely addresses.

 ISTM that the result of the declarative --tags option

 we have all upstream tags

 is easier to reason about than the history-dependent tag-following behavior

I'd say we have all the relevant tags and we have all the tags
the other side has are equally valid things to wish for, depending
who you are and how your project is organized, and one is not
necessarily easy/useful than the other.  In case it was unclear, I
was not seriously advocating to deprecate/remove --tags.

 Regarding your first point: if it matters which of the duplicates is
 chosen, then it can only matter because they differ in some other way
 than their reference names, for example in their flags.  So a robust way
 of de-duping them (if it is possible) would be to compare the two
 records and decide which one should take precedence based on this other
 information rather than based on which one happened to come earlier in
 the list.  Then the list order would be immaterial and (for example) it
 wouldn't be a problem to reorder the items.

But that changes the behaviour to those who has cared to rely on the
ordering.  With the change to first collect and then sort unstably
before deduping, the series already tells them not to rely on the
order, and two of us tentatively agreed in this discussion that it
is not likely to matter.  If later this agreement turns out to be a
mistake and there are people who *do* rely on the ordering, the only
acceptable fix for them is by making sure we restore the first one
trumps semantics, not by defining an alternative, arguably better,
behaviour---that is not a regression fix.

In any case, thanks for working on this topic.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 10/15] fetch --tags: fetch tags *in addition to* other stuff

2013-10-25 Thread Michael Haggerty
On 10/24/2013 10:51 PM, Junio C Hamano wrote:
 Michael Haggerty mhag...@alum.mit.edu writes:
 
 Previously, fetch's --tags option was considered equivalent to
 specifying the refspec refs/tags/*:refs/tags/* on the command line;
 in particular, it caused the remote.name.refspec configuration to be
 ignored.

 But it is not very useful to fetch tags without also fetching other
 references, whereas it *is* quite useful to be able to fetch tags *in
 addition to* other references.
 
 True but when fetching other references, tags relevant to the
 history being fetched by default should automatically follow, so the
 above explains why fetch --tags is not a useful thing to do daily.

Maybe not necessary in many scenarios, but is it harmful for the common
case of cloning from and then periodically fetching from an upstream?
ISTM that the result of the declarative --tags option

we have all upstream tags

is easier to reason about than the history-dependent tag-following behavior

we have all upstream tags that point to commits that were
reachable from an upstream branch at the time of this or one
of the previous fetches

I just noticed that this is not explained very clearly in the fetch man
page.  I will try to improve it.

 If a user wants to fetch *only* tags, then it is still possible to
 specifying an explicit refspec:

 git fetch remote 'refs/tags/*:refs/tags/*'

 Please note that the documentation prior to 1.8.0.3 was ambiguous
 about this aspect of fetch --tags behavior.  Commit

 f0cb2f137c 2012-12-14 fetch --tags: clarify documentation

 made the documentation match the old behavior.  This commit changes
 the documentation to match the new behavior.
 
 The old behaviour as designed.  The documentation change was not
 about refusing to fix a bug but the above makes it sound as if it
 were a such commit.

I didn't mean to imply this.  My point in mentioning the documentation
change was that even though the old behavior has been in effect for a
while, it was not clearly documented until quite recently.  But I guess
it is a moot point.

 The change to builtin/fetch.c:get_ref_map() has the side effect of
 changing the order that the (struct ref) objects are listed in
 ref_map.  It seems to me that this could probably only matter in the
 case of duplicates.  But later ref_remove_duplicates() is called, so
 duplicates are eliminated.  However, ref_remove_duplicates() always
 retains the first instance of duplicates and discards the rest.  It is
 conceivable that the boolean flags (which are not inspected by
 ref_remove_duplicates()) could differ among the duplicates, and that
 therefore changing the order of the items in the original list has the
 effect of changing the flags on the items that are retained.

 I don't know this code well enough to judge whether this might be a
 problem.
 
 If it is, then the correct approach is probably either to teach
 ref_remove_duplicates() to ensure that the flags are also consistent
 across duplicates, or to somehow combine the flags from all duplicates
 into the one that is retained.  Please let me know if this is needed.
 
 I do not think either of these two is sufficient if you want to fix
 it; ref_remove_duplicates() needs to be taught to retain the first
 one it encounters among the duplicates, not ensure the flags are
 consistent by erroring out when they are not among duplicates, nor
 somehow combine flags from later one to the first one.
 
 But I suspect that, if this behaviour change were a problem, such a
 configuration was a problem before this change to most people; the
 order of duplicated [remote foo] section would not be under
 control of those who only use git remote without using an editor
 to tweak .git/config file anyway. So I do not think this regression
 is too big a deal; it is something you can fix later on top.

I'll address your second point first: I agree that the order among
explicit refspecs couldn't really be a problem because it is already
arbitrary.  However, get_ref_map() adds references from other sources to
the result list.  For example, if there are command-line arguments, it adds

1. command-line arguments (with fetch_head_status=FETCH_HEAD_MERGE)

2. configured refspecs (with fetch_head_status=FETCH_HEAD_IGNORE)

3. if (--tags)
   then tags_refspec (with fetch_head_status=FETCH_HEAD_NOT_FOR_MERGE)
   else the result of find_non_local_tags() (with
fetch_head_status=FETCH_HEAD_NOT_FOR_MERGE)

It could very well be that the order matters across different classes of
reference.

Regarding your first point: if it matters which of the duplicates is
chosen, then it can only matter because they differ in some other way
than their reference names, for example in their flags.  So a robust way
of de-duping them (if it is possible) would be to compare the two
records and decide which one should take precedence based on this other
information rather than based on which one happened to come earlier in
the list.  Then 

Re: [PATCH 10/15] fetch --tags: fetch tags *in addition to* other stuff

2013-10-25 Thread Michael Haggerty
On 10/24/2013 10:51 PM, Junio C Hamano wrote:
 Michael Haggerty mhag...@alum.mit.edu writes:
 diff --git a/Documentation/fetch-options.txt 
 b/Documentation/fetch-options.txt
 index ba1fe49..0e6d2ac 100644
 --- a/Documentation/fetch-options.txt
 +++ b/Documentation/fetch-options.txt
 @@ -61,11 +61,9 @@ endif::git-pull[]
  ifndef::git-pull[]
  -t::
  --tags::
 -This is a short-hand for giving `refs/tags/*:refs/tags/*`
 -refspec from the command line, to ask all tags to be fetched
 -and stored locally.  Because this acts as an explicit
 -refspec, the default refspecs (configured with the
 -remote.$name.fetch variable) are overridden and not used.
 +This is a short-hand requesting that all tags be fetched from
 +the remote in addition to whatever else is being fetched.  It
 +is similar to using the refspec `refs/tags/*:refs/tags/*`.
 
 This is no longer a short-hand, is it?  There is no other way to ask
 fetch the usual stuff, and then refs/tags/*:refs/tags/* as well.
 
 It should be sufficient for me to locally do:
 
 s/This is a short-hand requesting/Request/;
 
 I think.

Yes, that's better.

 diff --git a/git-pull.sh b/git-pull.sh
 index b946fd9..dac7e1c 100755
 --- a/git-pull.sh
 +++ b/git-pull.sh
 @@ -172,7 +172,7 @@ error_on_no_merge_candidates () {
  do
  case $opt in
  -t|--t|--ta|--tag|--tags)
 -echo Fetching tags only, you probably meant:
 +echo It doesn't make sense to pull tags; you probably 
 meant:
 
 s/pull tags/pull all tags/; perhaps?

Yes, that's also an improvement.

Thanks,
Michael

-- 
Michael Haggerty
mhag...@alum.mit.edu
http://softwareswirl.blogspot.com/
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 10/15] fetch --tags: fetch tags *in addition to* other stuff

2013-10-24 Thread Junio C Hamano
Michael Haggerty mhag...@alum.mit.edu writes:

 Previously, fetch's --tags option was considered equivalent to
 specifying the refspec refs/tags/*:refs/tags/* on the command line;
 in particular, it caused the remote.name.refspec configuration to be
 ignored.

 But it is not very useful to fetch tags without also fetching other
 references, whereas it *is* quite useful to be able to fetch tags *in
 addition to* other references.

True but when fetching other references, tags relevant to the
history being fetched by default should automatically follow, so the
above explains why fetch --tags is not a useful thing to do daily.

 So change the semantics of this option
 to do the latter.

And I am not opposed to that change in the semantics; it makes an
operation that is not so useful less annoying.

 If a user wants to fetch *only* tags, then it is still possible to
 specifying an explicit refspec:

 git fetch remote 'refs/tags/*:refs/tags/*'

 Please note that the documentation prior to 1.8.0.3 was ambiguous
 about this aspect of fetch --tags behavior.  Commit

 f0cb2f137c 2012-12-14 fetch --tags: clarify documentation

 made the documentation match the old behavior.  This commit changes
 the documentation to match the new behavior.

The old behaviour as designed.  The documentation change was not
about refusing to fix a bug but the above makes it sound as if it
were a such commit.

 The change to builtin/fetch.c:get_ref_map() has the side effect of
 changing the order that the (struct ref) objects are listed in
 ref_map.  It seems to me that this could probably only matter in the
 case of duplicates.  But later ref_remove_duplicates() is called, so
 duplicates are eliminated.  However, ref_remove_duplicates() always
 retains the first instance of duplicates and discards the rest.  It is
 conceivable that the boolean flags (which are not inspected by
 ref_remove_duplicates()) could differ among the duplicates, and that
 therefore changing the order of the items in the original list has the
 effect of changing the flags on the items that are retained.

 I don't know this code well enough to judge whether this might be a
 problem.

 If it is, then the correct approach is probably either to teach
 ref_remove_duplicates() to ensure that the flags are also consistent
 across duplicates, or to somehow combine the flags from all duplicates
 into the one that is retained.  Please let me know if this is needed.

I do not think either of these two is sufficient if you want to fix
it; ref_remove_duplicates() needs to be taught to retain the first
one it encounters among the duplicates, not ensure the flags are
consistent by erroring out when they are not among duplicates, nor
somehow combine flags from later one to the first one.

But I suspect that, if this behaviour change were a problem, such a
configuration was a problem before this change to most people; the
order of duplicated [remote foo] section would not be under
control of those who only use git remote without using an editor
to tweak .git/config file anyway. So I do not think this regression
is too big a deal; it is something you can fix later on top.

 diff --git a/Documentation/fetch-options.txt b/Documentation/fetch-options.txt
 index ba1fe49..0e6d2ac 100644
 --- a/Documentation/fetch-options.txt
 +++ b/Documentation/fetch-options.txt
 @@ -61,11 +61,9 @@ endif::git-pull[]
  ifndef::git-pull[]
  -t::
  --tags::
 - This is a short-hand for giving `refs/tags/*:refs/tags/*`
 - refspec from the command line, to ask all tags to be fetched
 - and stored locally.  Because this acts as an explicit
 - refspec, the default refspecs (configured with the
 - remote.$name.fetch variable) are overridden and not used.
 + This is a short-hand requesting that all tags be fetched from
 + the remote in addition to whatever else is being fetched.  It
 + is similar to using the refspec `refs/tags/*:refs/tags/*`.

This is no longer a short-hand, is it?  There is no other way to ask
fetch the usual stuff, and then refs/tags/*:refs/tags/* as well.

It should be sufficient for me to locally do:

s/This is a short-hand requesting/Request/;

I think.

 diff --git a/git-pull.sh b/git-pull.sh
 index b946fd9..dac7e1c 100755
 --- a/git-pull.sh
 +++ b/git-pull.sh
 @@ -172,7 +172,7 @@ error_on_no_merge_candidates () {
   do
   case $opt in
   -t|--t|--ta|--tag|--tags)
 - echo Fetching tags only, you probably meant:
 + echo It doesn't make sense to pull tags; you probably 
 meant:

s/pull tags/pull all tags/; perhaps?

   echo   git fetch --tags
   exit 1
   esac
 diff --git a/t/t5510-fetch.sh b/t/t5510-fetch.sh
 index 8328be1..02e5901 100755
 --- a/t/t5510-fetch.sh
 +++ b/t/t5510-fetch.sh
 @@ -113,7 +113,7 @@ test_expect_success 'fetch --prune with a namespace keeps 
 other namespaces' '
   git rev-parse origin/master
  '
  

[PATCH 10/15] fetch --tags: fetch tags *in addition to* other stuff

2013-10-23 Thread Michael Haggerty
Previously, fetch's --tags option was considered equivalent to
specifying the refspec refs/tags/*:refs/tags/* on the command line;
in particular, it caused the remote.name.refspec configuration to be
ignored.

But it is not very useful to fetch tags without also fetching other
references, whereas it *is* quite useful to be able to fetch tags *in
addition to* other references.  So change the semantics of this option
to do the latter.

If a user wants to fetch *only* tags, then it is still possible to
specifying an explicit refspec:

git fetch remote 'refs/tags/*:refs/tags/*'

Please note that the documentation prior to 1.8.0.3 was ambiguous
about this aspect of fetch --tags behavior.  Commit

f0cb2f137c 2012-12-14 fetch --tags: clarify documentation

made the documentation match the old behavior.  This commit changes
the documentation to match the new behavior.

Signed-off-by: Michael Haggerty mhag...@alum.mit.edu
---

The change to builtin/fetch.c:get_ref_map() has the side effect of
changing the order that the (struct ref) objects are listed in
ref_map.  It seems to me that this could probably only matter in the
case of duplicates.  But later ref_remove_duplicates() is called, so
duplicates are eliminated.  However, ref_remove_duplicates() always
retains the first instance of duplicates and discards the rest.  It is
conceivable that the boolean flags (which are not inspected by
ref_remove_duplicates()) could differ among the duplicates, and that
therefore changing the order of the items in the original list has the
effect of changing the flags on the items that are retained.

I don't know this code well enough to judge whether this might be a
problem.

If it is, then the correct approach is probably either to teach
ref_remove_duplicates() to ensure that the flags are also consistent
across duplicates, or to somehow combine the flags from all duplicates
into the one that is retained.  Please let me know if this is needed.

 Documentation/fetch-options.txt  |  8 +++---
 builtin/fetch.c  | 46 +++-
 git-pull.sh  |  2 +-
 t/t5510-fetch.sh | 22 ---
 t/t5515/fetch.br-unconfig_--tags_.._.git |  1 +
 t/t5515/fetch.master_--tags_.._.git  |  1 +
 t/t5525-fetch-tagopt.sh  | 23 
 7 files changed, 71 insertions(+), 32 deletions(-)

diff --git a/Documentation/fetch-options.txt b/Documentation/fetch-options.txt
index ba1fe49..0e6d2ac 100644
--- a/Documentation/fetch-options.txt
+++ b/Documentation/fetch-options.txt
@@ -61,11 +61,9 @@ endif::git-pull[]
 ifndef::git-pull[]
 -t::
 --tags::
-   This is a short-hand for giving `refs/tags/*:refs/tags/*`
-   refspec from the command line, to ask all tags to be fetched
-   and stored locally.  Because this acts as an explicit
-   refspec, the default refspecs (configured with the
-   remote.$name.fetch variable) are overridden and not used.
+   This is a short-hand requesting that all tags be fetched from
+   the remote in addition to whatever else is being fetched.  It
+   is similar to using the refspec `refs/tags/*:refs/tags/*`.
 
 --recurse-submodules[=yes|on-demand|no]::
This option controls if and under what conditions new commits of
diff --git a/builtin/fetch.c b/builtin/fetch.c
index 61e8117..7edb1ea 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -271,7 +271,7 @@ static struct ref *get_ref_map(struct transport *transport,
 
const struct ref *remote_refs = transport_get_remote_refs(transport);
 
-   if (refspec_count || tags == TAGS_SET) {
+   if (refspec_count) {
struct ref **old_tail;
 
for (i = 0; i  refspec_count; i++) {
@@ -279,11 +279,9 @@ static struct ref *get_ref_map(struct transport *transport,
if (refspecs[i].dst  refspecs[i].dst[0])
*autotags = 1;
}
-   /* Merge everything on the command line, but not --tags */
+   /* Merge everything on the command line (but not --tags) */
for (rm = ref_map; rm; rm = rm-next)
rm-fetch_head_status = FETCH_HEAD_MERGE;
-   if (tags == TAGS_SET)
-   get_fetch_map(remote_refs, tag_refspec, tail, 0);
 
/*
 * For any refs that we happen to be fetching via command-line
@@ -334,8 +332,13 @@ static struct ref *get_ref_map(struct transport *transport,
tail = ref_map-next;
}
}
-   if (tags == TAGS_DEFAULT  *autotags)
+
+   if (tags == TAGS_SET)
+   /* also fetch all tags */
+   get_fetch_map(remote_refs, tag_refspec, tail, 0);
+   else if (tags == TAGS_DEFAULT  *autotags)
find_non_local_tags(transport, ref_map, tail);
+
ref_remove_duplicates(ref_map);
 
return ref_map;