[PATCH v2 7/9] connect: tell server that the client understands v1

2017-09-26 Thread Brandon Williams
Teach the connection logic to tell a serve that it understands protocol
v1.  This is done in 2 different ways for the built in protocols.

1. git://
   A normal request is structured as "command path/to/repo\0host=..\0"
   and due to a bug in an old version of git-daemon 73bb33a94 (daemon:
   Strictly parse the "extra arg" part of the command, 2009-06-04) we
   aren't able to place any extra args (separated by NULs) besides the
   host.  In order to get around this limitation put protocol version
   information after a second NUL byte so the request is structured
   like: "command path/to/repo\0host=..\0\0version=1\0".  git-daemon can
   then parse out the version number and set GIT_PROTOCOL.

2. ssh://, file://
   Set GIT_PROTOCOL envvar with the desired protocol version.  The
   envvar can be sent across ssh by using '-o SendEnv=GIT_PROTOCOL' and
   having the server whitelist this envvar.

Signed-off-by: Brandon Williams 
---
 connect.c  |  37 ++--
 t/t5700-protocol-v1.sh | 223 +
 2 files changed, 255 insertions(+), 5 deletions(-)
 create mode 100755 t/t5700-protocol-v1.sh

diff --git a/connect.c b/connect.c
index 1805debf3..12ebab724 100644
--- a/connect.c
+++ b/connect.c
@@ -871,6 +871,7 @@ struct child_process *git_connect(int fd[2], const char 
*url,
printf("Diag: path=%s\n", path ? path : "NULL");
conn = NULL;
} else if (protocol == PROTO_GIT) {
+   struct strbuf request = STRBUF_INIT;
/*
 * Set up virtual host information based on where we will
 * connect, unless the user has overridden us in
@@ -898,13 +899,25 @@ struct child_process *git_connect(int fd[2], const char 
*url,
 * Note: Do not add any other headers here!  Doing so
 * will cause older git-daemon servers to crash.
 */
-   packet_write_fmt(fd[1],
-"%s %s%chost=%s%c",
-prog, path, 0,
-target_host, 0);
+   strbuf_addf(&request,
+   "%s %s%chost=%s%c",
+   prog, path, 0,
+   target_host, 0);
+
+   /* If using a new version put that stuff here after a second 
null byte */
+   if (get_protocol_version_config() > 0) {
+   strbuf_addch(&request, '\0');
+   strbuf_addf(&request, "version=%d%c",
+   get_protocol_version_config(), '\0');
+   }
+
+   packet_write(fd[1], request.buf, request.len);
+
free(target_host);
+   strbuf_release(&request);
} else {
struct strbuf cmd = STRBUF_INIT;
+   const char *const *var;
 
conn = xmalloc(sizeof(*conn));
child_process_init(conn);
@@ -917,7 +930,9 @@ struct child_process *git_connect(int fd[2], const char 
*url,
sq_quote_buf(&cmd, path);
 
/* remove repo-local variables from the environment */
-   conn->env = local_repo_env;
+   for (var = local_repo_env; *var; var++)
+   argv_array_push(&conn->env_array, *var);
+
conn->use_shell = 1;
conn->in = conn->out = -1;
if (protocol == PROTO_SSH) {
@@ -971,6 +986,14 @@ struct child_process *git_connect(int fd[2], const char 
*url,
}
 
argv_array_push(&conn->args, ssh);
+
+   if (get_protocol_version_config() > 0) {
+   argv_array_push(&conn->args, "-o");
+   argv_array_push(&conn->args, "SendEnv=" 
GIT_PROTOCOL_ENVIRONMENT);
+   argv_array_pushf(&conn->env_array, 
GIT_PROTOCOL_ENVIRONMENT "=version=%d",
+get_protocol_version_config());
+   }
+
if (flags & CONNECT_IPV4)
argv_array_push(&conn->args, "-4");
else if (flags & CONNECT_IPV6)
@@ -985,6 +1008,10 @@ struct child_process *git_connect(int fd[2], const char 
*url,
argv_array_push(&conn->args, ssh_host);
} else {
transport_check_allowed("file");
+   if (get_protocol_version_config() > 0) {
+   argv_array_pushf(&conn->env_array, 
GIT_PROTOCOL_ENVIRONMENT "=version=%d",
+get_protocol_version_config());
+   }
}
argv_array_push(&conn->args, cmd.buf);
 
diff --git a/t/t5700-protocol-v1.sh b/t/t5700-protocol-v1.sh
new file mode 100755
index 0..1988bbce6
--- /dev/null
+++ b/t/t5700-protoco

Re: [PATCH v2 7/9] connect: tell server that the client understands v1

2017-09-29 Thread Brandon Williams
On 09/27, Junio C Hamano wrote:
> Junio C Hamano  writes:
> 
> >> +  # Client requested to use protocol v1
> >> +  grep "version=1" log &&
> >> +  # Server responded using protocol v1
> >> +  grep "clone< version 1" log
> >
> > This looked a bit strange to check "clone< version 1" for one
> > direction, but did not check "$something> version 1" for the other
> > direction.  Doesn't "version=1" end up producing 2 hits?
> >
> > Not a complaint, but wondering if we can write it in such a way that
> > does not have to make readers wonder.
> 
> Ah, the check for "version=1" is a short-hand for
> 
>   grep "clone> git-upload-pack ...\\0\\0version=1\\0$" log
> 
> and the symmetry I sought is already there.  So ignore the above; if
> we wanted to make the symmetry more explicit, it would not hurt to
> spell the first one as
> 
>   grep "clone> .*\\0\\0version=1\\0$" log

I think you need three '\' to get an escaped backslash, but I agree,
I'll spell this out more explicitly in the tests.

> 
> though.
> 

-- 
Brandon Williams


Re: [PATCH v2 7/9] connect: tell server that the client understands v1

2017-09-26 Thread Junio C Hamano
Brandon Williams  writes:

> Teach the connection logic to tell a serve that it understands protocol
> v1.  This is done in 2 different ways for the built in protocols.
>
> 1. git://
>A normal request is structured as "command path/to/repo\0host=..\0"
>and due to a bug in an old version of git-daemon 73bb33a94 (daemon:
>Strictly parse the "extra arg" part of the command, 2009-06-04) we
>aren't able to place any extra args (separated by NULs) besides the
>host.  In order to get around this limitation put protocol version
>information after a second NUL byte so the request is structured
>like: "command path/to/repo\0host=..\0\0version=1\0".  git-daemon can
>then parse out the version number and set GIT_PROTOCOL.

Same question as a previous step, wrt the cited commit.  It reads as
if we are saying that the commit introduced a bug and left it there,
that we cannot use \0host=..\0version=..\0other=..\0 until that bug
is fixed, and that in the meantime we use \0host=..\0\0version=.. as
a workaround, but that reading leaves readers wondering if we want
to eventually drop this double-NUL workaround.  I am guessing that
we want to declare that the current protocol has a glitch that
prevents us to use \0host=..\0version=..\0 but we accept that and
plan to keep it that way, and we'll use the double-NUL for anything
other than host from now on, as it is compatible with the current
version of Git before this patch (the extras are safely ignored),
but then it still leaves readers wonder if the mention of the
old commit from 2009 means that this double-NUL would not even work
if the other end is running a version of Git before that commit, or
we are safe to talk with versions of Git even older than that.

I do not think it is a showstopper if we did not work with v1.6.4,
but it still needs to be clarified.

> 2. ssh://, file://
>Set GIT_PROTOCOL envvar with the desired protocol version.  The
>envvar can be sent across ssh by using '-o SendEnv=GIT_PROTOCOL' and
>having the server whitelist this envvar.

OpenSSH lets us do this, but I do not know how well this works with
other implementations of SSH clients.  The log message perhaps needs
to ask for volunteers to check if it is OK with the implementations
they use, and offer conditional code (just like we have for putty
and plink customizations) otherwise.

Other than that, the code changes looked good.

> diff --git a/t/t5700-protocol-v1.sh b/t/t5700-protocol-v1.sh
> new file mode 100755
> index 0..1988bbce6
> --- /dev/null
> +++ b/t/t5700-protocol-v1.sh
> @@ -0,0 +1,223 @@
> +#!/bin/sh
> +
> +test_description='test git wire-protocol transition'
> +
> +TEST_NO_CREATE_REPO=1
> +
> +. ./test-lib.sh
> +
> +# Test protocol v1 with 'git://' transport
> +#
> +. "$TEST_DIRECTORY"/lib-git-daemon.sh
> +start_git_daemon --export-all --enable=receive-pack
> +daemon_parent=$GIT_DAEMON_DOCUMENT_ROOT_PATH/parent
> +
> +test_expect_success 'create repo to be served by git-daemon' '
> + git init "$daemon_parent" &&
> + test_commit -C "$daemon_parent" one
> +'
> +
> +test_expect_success 'clone with git:// using protocol v1' '
> + GIT_TRACE_PACKET=1 git -c protocol.version=1 \
> + clone "$GIT_DAEMON_URL/parent" daemon_child 2>log &&
> +
> + git -C daemon_child log -1 --format=%s >actual &&
> + git -C "$daemon_parent" log -1 --format=%s >expect &&
> + test_cmp expect actual &&
> +
> + # Client requested to use protocol v1
> + grep "version=1" log &&
> + # Server responded using protocol v1
> + grep "clone< version 1" log

This looked a bit strange to check "clone< version 1" for one
direction, but did not check "$something> version 1" for the other
direction.  Doesn't "version=1" end up producing 2 hits?

Not a complaint, but wondering if we can write it in such a way that
does not have to make readers wonder.

> +'
> +
> +test_expect_success 'fetch with git:// using protocol v1' '
> + test_commit -C "$daemon_parent" two &&
> +
> + GIT_TRACE_PACKET=1 git -C daemon_child -c protocol.version=1 \
> + fetch 2>log &&
> +
> + git -C daemon_child log -1 --format=%s FETCH_HEAD >actual &&
> + git -C "$daemon_parent" log -1 --format=%s >expect &&
> + test_cmp expect actual &&

OK.  So the origin repository gained one commit on the 'master'
branch (and a tag 'two').  By fetching, but not pulling, our
'master' would not advance, and that is where check on FETCH_HEAD
comes from.  I suspect that the tag 'two' is also auto-followed with
this operation and would be in FETCH_HEAD; is that something we want
to check?  Alternatively, the "actual" log may want to see what the
remote tracking branch for their 'master' has---then we do not have
to worry about "FETCH_HEAD has two refs---which one are we checking?"

> +
> + # Client requested to use protocol v1
> + grep "version=1" log &&
> + # Server responded using protocol v1
> + grep "fetch< version 1" log
> +

Re: [PATCH v2 7/9] connect: tell server that the client understands v1

2017-09-26 Thread Junio C Hamano
Junio C Hamano  writes:

>> +# Client requested to use protocol v1
>> +grep "version=1" log &&
>> +# Server responded using protocol v1
>> +grep "clone< version 1" log
>
> This looked a bit strange to check "clone< version 1" for one
> direction, but did not check "$something> version 1" for the other
> direction.  Doesn't "version=1" end up producing 2 hits?
>
> Not a complaint, but wondering if we can write it in such a way that
> does not have to make readers wonder.

Ah, the check for "version=1" is a short-hand for

grep "clone> git-upload-pack ...\\0\\0version=1\\0$" log

and the symmetry I sought is already there.  So ignore the above; if
we wanted to make the symmetry more explicit, it would not hurt to
spell the first one as

grep "clone> .*\\0\\0version=1\\0$" log

though.



Re: [PATCH v2 7/9] connect: tell server that the client understands v1

2017-09-28 Thread Brandon Williams
On 09/27, Junio C Hamano wrote:
> Brandon Williams  writes:
> 
> > Teach the connection logic to tell a serve that it understands protocol
> > v1.  This is done in 2 different ways for the built in protocols.
> >
> > 1. git://
> >A normal request is structured as "command path/to/repo\0host=..\0"
> >and due to a bug in an old version of git-daemon 73bb33a94 (daemon:
> >Strictly parse the "extra arg" part of the command, 2009-06-04) we
> >aren't able to place any extra args (separated by NULs) besides the
> >host.  In order to get around this limitation put protocol version
> >information after a second NUL byte so the request is structured
> >like: "command path/to/repo\0host=..\0\0version=1\0".  git-daemon can
> >then parse out the version number and set GIT_PROTOCOL.
> 
> Same question as a previous step, wrt the cited commit.  It reads as
> if we are saying that the commit introduced a bug and left it there,
> that we cannot use \0host=..\0version=..\0other=..\0 until that bug
> is fixed, and that in the meantime we use \0host=..\0\0version=.. as
> a workaround, but that reading leaves readers wondering if we want
> to eventually drop this double-NUL workaround.  I am guessing that
> we want to declare that the current protocol has a glitch that
> prevents us to use \0host=..\0version=..\0 but we accept that and
> plan to keep it that way, and we'll use the double-NUL for anything
> other than host from now on, as it is compatible with the current
> version of Git before this patch (the extras are safely ignored),
> but then it still leaves readers wonder if the mention of the
> old commit from 2009 means that this double-NUL would not even work
> if the other end is running a version of Git before that commit, or
> we are safe to talk with versions of Git even older than that.
> 
> I do not think it is a showstopper if we did not work with v1.6.4,
> but it still needs to be clarified.

I wrote an updated commit msg for the daemon change, I can make a
similar change here.  And this mechanism shouldn't cause any issues with
both the pre and post 73bb33a94 git-daemon servers.

> 
> > 2. ssh://, file://
> >Set GIT_PROTOCOL envvar with the desired protocol version.  The
> >envvar can be sent across ssh by using '-o SendEnv=GIT_PROTOCOL' and
> >having the server whitelist this envvar.
> 
> OpenSSH lets us do this, but I do not know how well this works with
> other implementations of SSH clients.  The log message perhaps needs
> to ask for volunteers to check if it is OK with the implementations
> they use, and offer conditional code (just like we have for putty
> and plink customizations) otherwise.

I'll make a comment indicating that

> 
> Other than that, the code changes looked good.
> 
> > diff --git a/t/t5700-protocol-v1.sh b/t/t5700-protocol-v1.sh
> > new file mode 100755
> > index 0..1988bbce6
> > --- /dev/null
> > +++ b/t/t5700-protocol-v1.sh
> > @@ -0,0 +1,223 @@
> > +#!/bin/sh
> > +
> > +test_description='test git wire-protocol transition'
> > +
> > +TEST_NO_CREATE_REPO=1
> > +
> > +. ./test-lib.sh
> > +
> > +# Test protocol v1 with 'git://' transport
> > +#
> > +. "$TEST_DIRECTORY"/lib-git-daemon.sh
> > +start_git_daemon --export-all --enable=receive-pack
> > +daemon_parent=$GIT_DAEMON_DOCUMENT_ROOT_PATH/parent
> > +
> > +test_expect_success 'create repo to be served by git-daemon' '
> > +   git init "$daemon_parent" &&
> > +   test_commit -C "$daemon_parent" one
> > +'
> > +
> > +test_expect_success 'clone with git:// using protocol v1' '
> > +   GIT_TRACE_PACKET=1 git -c protocol.version=1 \
> > +   clone "$GIT_DAEMON_URL/parent" daemon_child 2>log &&
> > +
> > +   git -C daemon_child log -1 --format=%s >actual &&
> > +   git -C "$daemon_parent" log -1 --format=%s >expect &&
> > +   test_cmp expect actual &&
> > +
> > +   # Client requested to use protocol v1
> > +   grep "version=1" log &&
> > +   # Server responded using protocol v1
> > +   grep "clone< version 1" log
> 
> This looked a bit strange to check "clone< version 1" for one
> direction, but did not check "$something> version 1" for the other
> direction.  Doesn't "version=1" end up producing 2 hits?

I think you discovered this in your next email but the "version=1" check
is to check for the request sent to git-daemon, the "command
path/to/repo\0host=blah\0\0version=1\0" one. While the "clone< version
1" check is to make sure that the server responded with the correct
version.

> 
> Not a complaint, but wondering if we can write it in such a way that
> does not have to make readers wonder.
> 
> > +'
> > +
> > +test_expect_success 'fetch with git:// using protocol v1' '
> > +   test_commit -C "$daemon_parent" two &&
> > +
> > +   GIT_TRACE_PACKET=1 git -C daemon_child -c protocol.version=1 \
> > +   fetch 2>log &&
> > +
> > +   git -C daemon_child log -1 --format=%s FETCH_HEAD >actual &&
> > +   git -C "$daemon_parent" log -1 --format=%s >expect &&
> > +   test_cmp e