Re: [PATCH] transport: report refs only if transport does

2018-08-02 Thread Jeff King
On Tue, Jul 31, 2018 at 04:23:43PM -0700, Jonathan Tan wrote:

> > > Because transport_fetch_refs() filters the refs sent to the transport,
> > > it cannot just report the transport's result directly, but first needs
> > > to readd the excluded refs, pretending that they are fetched. However,
> > > this results in a wrong result if the transport did not report the refs
> > > that they have fetched in "fetched_refs" - the excluded refs would be
> > > added and reported, presenting an incomplete picture to the caller.
> > 
> > This part leaves me confused. If we are not fetching them, then why do
> > we need to pretend that they are fetched?
> 
> The short answer is that we need:
>  (1) the complete list of refs that was passed to
>  transport_fetch_refs(),
>  (2) with shallow information (REF_STATUS_REJECT_SHALLOW set if
>  relevant), and
>  (3) with updated OIDs if ref-in-want was used.
> 
> The fetched_refs out param already fulfils (2) and (3), and this patch
> makes it fulfil (1). As for calling them fetched_refs, perhaps that is a
> misnomer, but they do appear in FETCH_HEAD even though they are not
> truly fetched.

Thanks for this explanation. It does make more sense to me now, and I
agree that a lot of my confusion was from calling it "fetched_refs" (and
the comment saying "reported as fetched, but not actually fetched").

> Which raises the question...if completeness is so important, why not
> reuse the input list of refs and document that transport_fetch_refs()
> can mutate the input list? You ask the same question below, so I'll put
> the answer after quoting your paragraph.
> [...]

Thanks, the answer here was enlightening as well.

I see you posted a patch to go back to mutating the list, and that seems
reasonable to me. I'm fine with a separate "out" list, too. Its purpose
and expectations just need to be reflected in the name (and possibly in
a comment).

-Peff


Re: [PATCH] transport: report refs only if transport does

2018-08-01 Thread Brandon Williams
On 07/31, Jonathan Tan wrote:
> > On Mon, Jul 30, 2018 at 03:56:01PM -0700, Jonathan Tan wrote:
> > 
> > > Commit 989b8c4452 ("fetch-pack: put shallow info in output parameter",
> > > 2018-06-28) allows transports to report the refs that they have fetched
> > > in a new out-parameter "fetched_refs". If they do so,
> > > transport_fetch_refs() makes this information available to its caller.
> > > 
> > > Because transport_fetch_refs() filters the refs sent to the transport,
> > > it cannot just report the transport's result directly, but first needs
> > > to readd the excluded refs, pretending that they are fetched. However,
> > > this results in a wrong result if the transport did not report the refs
> > > that they have fetched in "fetched_refs" - the excluded refs would be
> > > added and reported, presenting an incomplete picture to the caller.
> > 
> > This part leaves me confused. If we are not fetching them, then why do
> > we need to pretend that they are fetched?
> 
> The short answer is that we need:
>  (1) the complete list of refs that was passed to
>  transport_fetch_refs(),
>  (2) with shallow information (REF_STATUS_REJECT_SHALLOW set if
>  relevant), and
>  (3) with updated OIDs if ref-in-want was used.
> 
> The fetched_refs out param already fulfils (2) and (3), and this patch
> makes it fulfil (1). As for calling them fetched_refs, perhaps that is a
> misnomer, but they do appear in FETCH_HEAD even though they are not
> truly fetched.
> 
> Which raises the question...if completeness is so important, why not
> reuse the input list of refs and document that transport_fetch_refs()
> can mutate the input list? You ask the same question below, so I'll put
> the answer after quoting your paragraph.
> 
> > I think I am showing my lack of understanding about the reason for this
> > whole "return the fetched refs" scheme from 989b8c4452, and probably
> > reading the rest of that series would make it more clear. But from the
> > perspective of somebody digging into history and finding just this
> > commit, it probably needs to lay out a little more of the reasoning.
> 
> I think it's because 989b8c4452 is based on my earlier work [1] which
> also had a fetched_refs out param. Its main reason is to enable the
> invoker of transport_fetch_refs() to specify ref patterns (as you can
> see in a later commit in the same patch set [2]) - and if we specify
> patterns, the invoker of transport_fetch_refs() needs the resulting refs
> (which are provided through fetched_refs).
> 
> In the version that made it to master, however, there was some debate
> about whether ref patterns need to be allowed. In the end, ref patterns
> were not allowed [3], but the fetched_refs out param was still left in.
> 
> I think that reverting the API might work, but am on the fence about it.
> It would reduce the number of questions about the code (and would
> probably automatically fix the issue that I was fixing in the first

If you believe the API is difficult to work with (which given this bug
it is) then perhaps we go with your suggestion and revert the API back
to only providing a list of input refs and having the fetch operation
mutate that input list.

> place), but if we were to revert the API and then decide that we do want
> ref patterns in "want-ref" (or expand transport_fetch_refs in some
> similar way), we would need to revert our revert, causing code churn.

I haven't thought too much about what we would need to do in the event
we add patterns to ref-in-want, but couldn't we possible mutate the
input list again in this case and just simply add the resulting refs to
the input list?

> 
> [1] 
> https://public-inbox.org/git/86a128c5fb710a41791e7183207c4d64889f9307.1485381677.git.jonathanta...@google.com/
> [2] 
> https://public-inbox.org/git/eef2b77d88df0db08e4a1505b06e0af2d40143d5.1485381677.git.jonathanta...@google.com/
> [3] https://public-inbox.org/git/20180620213235.10952-1-bmw...@google.com/

-- 
Brandon Williams


Re: [PATCH] transport: report refs only if transport does

2018-07-31 Thread Jonathan Tan
> What leaves me even more confused is that the entire log message
> does not make it clear what the end-user observable problem the
> patch is trying to solve.
> 
> Is this "we sometimes follow and sometimes fail to follow refs while
> fetching"?  Does it affect all protocol versions and transports, or
> only just selected few (and if so which ones)?

Normally I would respond by creating a new patch with the answer in its
commit message, but I'm now not sure about whether it's better to revert
back to the non-"fetched_refs" API entirely (as I explained in the reply
to Peff I just sent [1]), so I'll answer your questions here for now:

 - Yes. We fail to follow when we fetch at least one ref that is
   up-to-date and one ref that is not, and when we're using the "fetch"
   command in a remote helper (for example, HTTP protocol v0).
 - I haven't checked exhaustively, but as far as I know, affects HTTP
   protocol v0, and does not affect anything using connect or
   stateless-connect (e.g. HTTP protocol v2, ssh).

When I create a new patch, I'll also include these answers in its commit
message.

[1] 
https://public-inbox.org/git/20180731232343.184463-1-jonathanta...@google.com/


Re: [PATCH] transport: report refs only if transport does

2018-07-31 Thread Jonathan Tan
> On Mon, Jul 30, 2018 at 03:56:01PM -0700, Jonathan Tan wrote:
> 
> > Commit 989b8c4452 ("fetch-pack: put shallow info in output parameter",
> > 2018-06-28) allows transports to report the refs that they have fetched
> > in a new out-parameter "fetched_refs". If they do so,
> > transport_fetch_refs() makes this information available to its caller.
> > 
> > Because transport_fetch_refs() filters the refs sent to the transport,
> > it cannot just report the transport's result directly, but first needs
> > to readd the excluded refs, pretending that they are fetched. However,
> > this results in a wrong result if the transport did not report the refs
> > that they have fetched in "fetched_refs" - the excluded refs would be
> > added and reported, presenting an incomplete picture to the caller.
> 
> This part leaves me confused. If we are not fetching them, then why do
> we need to pretend that they are fetched?

The short answer is that we need:
 (1) the complete list of refs that was passed to
 transport_fetch_refs(),
 (2) with shallow information (REF_STATUS_REJECT_SHALLOW set if
 relevant), and
 (3) with updated OIDs if ref-in-want was used.

The fetched_refs out param already fulfils (2) and (3), and this patch
makes it fulfil (1). As for calling them fetched_refs, perhaps that is a
misnomer, but they do appear in FETCH_HEAD even though they are not
truly fetched.

Which raises the question...if completeness is so important, why not
reuse the input list of refs and document that transport_fetch_refs()
can mutate the input list? You ask the same question below, so I'll put
the answer after quoting your paragraph.

> I think I am showing my lack of understanding about the reason for this
> whole "return the fetched refs" scheme from 989b8c4452, and probably
> reading the rest of that series would make it more clear. But from the
> perspective of somebody digging into history and finding just this
> commit, it probably needs to lay out a little more of the reasoning.

I think it's because 989b8c4452 is based on my earlier work [1] which
also had a fetched_refs out param. Its main reason is to enable the
invoker of transport_fetch_refs() to specify ref patterns (as you can
see in a later commit in the same patch set [2]) - and if we specify
patterns, the invoker of transport_fetch_refs() needs the resulting refs
(which are provided through fetched_refs).

In the version that made it to master, however, there was some debate
about whether ref patterns need to be allowed. In the end, ref patterns
were not allowed [3], but the fetched_refs out param was still left in.

I think that reverting the API might work, but am on the fence about it.
It would reduce the number of questions about the code (and would
probably automatically fix the issue that I was fixing in the first
place), but if we were to revert the API and then decide that we do want
ref patterns in "want-ref" (or expand transport_fetch_refs in some
similar way), we would need to revert our revert, causing code churn.

[1] 
https://public-inbox.org/git/86a128c5fb710a41791e7183207c4d64889f9307.1485381677.git.jonathanta...@google.com/
[2] 
https://public-inbox.org/git/eef2b77d88df0db08e4a1505b06e0af2d40143d5.1485381677.git.jonathanta...@google.com/
[3] https://public-inbox.org/git/20180620213235.10952-1-bmw...@google.com/


Re: [PATCH] transport: report refs only if transport does

2018-07-31 Thread Junio C Hamano
Jeff King  writes:

> On Mon, Jul 30, 2018 at 03:56:01PM -0700, Jonathan Tan wrote:
>
>> Commit 989b8c4452 ("fetch-pack: put shallow info in output parameter",
>> 2018-06-28) allows transports to report the refs that they have fetched
>> in a new out-parameter "fetched_refs". If they do so,
>> transport_fetch_refs() makes this information available to its caller.
>> 
>> Because transport_fetch_refs() filters the refs sent to the transport,
>> it cannot just report the transport's result directly, but first needs
>> to readd the excluded refs, pretending that they are fetched. However,
>> this results in a wrong result if the transport did not report the refs
>> that they have fetched in "fetched_refs" - the excluded refs would be
>> added and reported, presenting an incomplete picture to the caller.
>
> This part leaves me confused. If we are not fetching them, then why do
> we need to pretend that they are fetched?

What leaves me even more confused is that the entire log message
does not make it clear what the end-user observable problem the
patch is trying to solve.

Is this "we sometimes follow and sometimes fail to follow refs while
fetching"?  Does it affect all protocol versions and transports, or
only just selected few (and if so which ones)?

In minds of those who reported an issue and wrote the fix, the issue
may be fresh, but let's write the commit log message for ourselves 6
months down the road.

Thanks.


Re: [PATCH] transport: report refs only if transport does

2018-07-31 Thread Jeff King
On Mon, Jul 30, 2018 at 03:56:01PM -0700, Jonathan Tan wrote:

> Commit 989b8c4452 ("fetch-pack: put shallow info in output parameter",
> 2018-06-28) allows transports to report the refs that they have fetched
> in a new out-parameter "fetched_refs". If they do so,
> transport_fetch_refs() makes this information available to its caller.
> 
> Because transport_fetch_refs() filters the refs sent to the transport,
> it cannot just report the transport's result directly, but first needs
> to readd the excluded refs, pretending that they are fetched. However,
> this results in a wrong result if the transport did not report the refs
> that they have fetched in "fetched_refs" - the excluded refs would be
> added and reported, presenting an incomplete picture to the caller.

This part leaves me confused. If we are not fetching them, then why do
we need to pretend that they are fetched?

I think I am showing my lack of understanding about the reason for this
whole "return the fetched refs" scheme from 989b8c4452, and probably
reading the rest of that series would make it more clear. But from the
perspective of somebody digging into history and finding just this
commit, it probably needs to lay out a little more of the reasoning.

> Thanks for the reproduction recipe, Peff. Here's a fix. It can be
> reproduced with something using a remote helper's fetch command (and not
> using "connect" or "stateless-connect"), fetching at least one ref that
> requires a ref update and at least one that does not (as you can see
> from the included test).

Ah, that explains why I couldn't reproduce it with another repository; I
was using a direct git-upload-pack fetch, which wouldn't trigger the
remote helper code.

> diff --git a/transport.c b/transport.c
> index fdd813f684..2a2415d79c 100644
> --- a/transport.c
> +++ b/transport.c
> @@ -1230,17 +1230,18 @@ int transport_fetch_refs(struct transport *transport, 
> struct ref *refs,
>   struct ref **heads = NULL;
>   struct ref *nop_head = NULL, **nop_tail = _head;
>   struct ref *rm;
> + struct ref *fetched_by_transport = NULL;
>  
>   for (rm = refs; rm; rm = rm->next) {
>   nr_refs++;
>   if (rm->peer_ref &&
>   !is_null_oid(>old_oid) &&
>   !oidcmp(>peer_ref->old_oid, >old_oid)) {
> - /*
> -  * These need to be reported as fetched, but we don't
> -  * actually need to fetch them.
> -  */
>   if (fetched_refs) {
> + /*
> +  * These may need to be reported as fetched,
> +  * but we don't actually need to fetch them.
> +  */

So it's really this comment that leaves me the most puzzled.

> @@ -1264,10 +1265,25 @@ int transport_fetch_refs(struct transport *transport, 
> struct ref *refs,
>   heads[nr_heads++] = rm;
>   }
>  
> - rc = transport->vtable->fetch(transport, nr_heads, heads, fetched_refs);
> - if (fetched_refs && nop_head) {
> - *nop_tail = *fetched_refs;
> - *fetched_refs = nop_head;
> + rc = transport->vtable->fetch(transport, nr_heads, heads,
> +   fetched_refs ? _by_transport : 
> NULL);
> + if (fetched_refs) {
> + if (fetched_by_transport) {
> + /*
> +  * The transport reported its fetched refs. Pretend
> +  * that we also fetched the ones that we didn't need to
> +  * fetch.
> +  */
> + *nop_tail = fetched_by_transport;
> + *fetched_refs = nop_head;
> + } else if (!fetched_by_transport) {
> + /*
> +  * The transport didn't report its fetched refs, so
> +  * this function will not report them either. We have
> +  * no use for nop_head.
> +  */
> + free_refs(nop_head);
> + }

This part makes sense to me based on the description (and on the
assumption that reporting those nop refs is useful in the first place ;)
).

So I think your fix here is probably the right thing, but I'm just left
confused by the background a bit.

-Peff