Re: [PATCH/RFC] upload-pack: ignore 'shallow' lines with unknown obj-ids

2013-04-20 Thread Duy Nguyen
On Sun, Apr 21, 2013 at 6:51 AM, Junio C Hamano  wrote:
> Duy Nguyen  writes:
> But the shallow list is also used to compute the updated boundary
> (i.e. "this client does not have a valid history behind these
> commits")?  When we know their current shallow boundary, after
> sending history that crosses that boundary, we can tell them "before
> this fetch, you did not have any history behind this commit, but we
> know that you now have a bit more. Update your record with these new
> boundaries. You still do not have any history behind these commits."
> That is how deepening works.
>
> When you receive a shallow boundary unknown to you, it might not
> hurt if you keep it and parrot it at the end, so that the fetcher
> will still remember that it does not have any history behind the
> commit.  There may be reasons why doing so is not sufficient and we
> have to reject clients whose history is truncated at a place unknown
> to us.
>
> You would declare "now you have everything behind that unknown
> shallow boundary", if you ignore an unknown shallow boundary and do
> not send it back.
>
> That sounds ourright wrong to me. You simply do not know enough to
> make such a declaration.

It's a good point. But I think the receiver does not rely solely on
"shallow " lines from the sender to create new shallow file. If it
receives "shallow " line, it registers the received sha-1 as a cut
point. If it receives "unshallow " line, it unregisters. If it
receives neither, the current registered cutpoints in memory are
simply written back to disk. So not sending it back should not be a
big problem (at least for C Git).

> I do not seem to find the patch you are responding to, so I do not
> know how the patch handled the unshallowing part, but the impression
> I got from reading the log message quoted is that the patch was not
> even aware of the issue.

I can't find it on gmane.org either. Patch quoted below.

On Sat, Apr 20, 2013 at 8:05 PM, Michael Heemskerk
 wrote:
> diff --git a/Documentation/technical/pack-protocol.txt
> b/Documentation/technical/pack-protocol.txt
> index f1a51ed..b898e97 100644
> --- a/Documentation/technical/pack-protocol.txt
> +++ b/Documentation/technical/pack-protocol.txt
> @@ -228,8 +228,7 @@ obtained through ref discovery.
>  The client MUST write all obj-ids which it only has shallow copies
>  of (meaning that it does not have the parents of a commit) as
>  'shallow' lines so that the server is aware of the limitations of
> -the client's history. Clients MUST NOT mention an obj-id which
> -it does not know exists on the server.
> +the client's history.
>
>  The client now sends the maximum commit history depth it wants for
>  this transaction, which is the number of commits it wants from the
> diff --git a/upload-pack.c b/upload-pack.c
> index bfa6279..127e59a 100644
> --- a/upload-pack.c
> +++ b/upload-pack.c
> @@ -592,7 +592,7 @@ static void receive_needs(void)
> die("invalid shallow line: %s", line);
> object = parse_object(sha1);
> if (!object)
> -   die("did not find object for %s", line);
> +   continue;
> if (object->type != OBJ_COMMIT)
> die("invalid shallow object %s",
> sha1_to_hex(sha1));
> if (!(object->flags & CLIENT_SHALLOW)) {
> --
> 1.8.0.2
>
--
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH/RFC] upload-pack: ignore 'shallow' lines with unknown obj-ids

2013-04-20 Thread Junio C Hamano
Duy Nguyen  writes:

> On Sat, Apr 20, 2013 at 8:05 PM, Michael Heemskerk
>  wrote:
>> When the client sends a 'shallow' line for an object that the server does
>> not have, the server currently dies with the error: "did not find object
>> for shallow ". The client may have received the object from a
>> different server, or the object may have been garbage collected by the
>> server. In either case, the object is not relevant for calculating the pack
>> that is to be sent and can be safely ignored.

For the purpose of generating a pack, these shallow boundaries are
irrelevant.  The above describes only that part.

But the shallow list is also used to compute the updated boundary
(i.e. "this client does not have a valid history behind these
commits")?  When we know their current shallow boundary, after
sending history that crosses that boundary, we can tell them "before
this fetch, you did not have any history behind this commit, but we
know that you now have a bit more. Update your record with these new
boundaries. You still do not have any history behind these commits."
That is how deepening works.

When you receive a shallow boundary unknown to you, it might not
hurt if you keep it and parrot it at the end, so that the fetcher
will still remember that it does not have any history behind the
commit.  There may be reasons why doing so is not sufficient and we
have to reject clients whose history is truncated at a place unknown
to us.

You would declare "now you have everything behind that unknown
shallow boundary", if you ignore an unknown shallow boundary and do
not send it back.

That sounds ourright wrong to me. You simply do not know enough to
make such a declaration.

I do not seem to find the patch you are responding to, so I do not
know how the patch handled the unshallowing part, but the impression
I got from reading the log message quoted is that the patch was not
even aware of the issue.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH/RFC] upload-pack: ignore 'shallow' lines with unknown obj-ids

2013-04-20 Thread Duy Nguyen
On Sat, Apr 20, 2013 at 8:05 PM, Michael Heemskerk
 wrote:
> When the client sends a 'shallow' line for an object that the server does
> not have, the server currently dies with the error: "did not find object
> for shallow ". The client may have received the object from a
> different server, or the object may have been garbage collected by the
> server. In either case, the object is not relevant for calculating the pack
> that is to be sent and can be safely ignored.
>
> The documentation in technical/pack-protocol.txt has been updated to
> remove the restriction that "Clients MUST NOT mention an obj-id which it
> does not know exists on the server". This requirement is not realistic
> because clients cannot know whether an object has been garbage collected
> by the server.
>
> Signed-off-by: Michael Heemskerk 

I'm not an expert in git protocol (or shallow extension in particular)
but the reasoning sounds right and the implemention looks correct.

Acked-by: Duy Nguyen 
--
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html