Re: [PATCH 1/3] connect.c: Improve parsing of literal IPV6 addresses
On 2015-02-19 20.40, brian m. carlson wrote: On Thu, Feb 19, 2015 at 09:54:52AM -0800, Junio C Hamano wrote: I can see that you do not agree with the If we accept it part (where it refers to allowing [...] was a bug.)---past acceptance was not a bug for you. Do we talk about the same thing here ? The support for the ssh://host/path was introduced 2005, I think here: commit 2386d65822c912f0889ac600b1698b0659190133 Author: Linus Torvalds torva...@g5.osdl.org Date: Wed Jul 13 18:46:20 2005 -0700 Add first cut at git protocol connect logic. It happily accepted everything for host, including ssh://bmc@2001:470:1f05:79::1/git/bmc/homedir.git/ And this was the only way to connect to a server using a literal IPV6 address, the support for [] came in later. Today, in 2015, we can declare this syntax as deprecated, no problem. The parser we have in git.master does not handle URLs like ssh://bmc@[2001:470:1f05:79::1]/git/bmc/homedir.git/ correctly. Instead of this, ssh://[bmc@2001:470:1f05:79::1]/git/bmc/homedir.git/ needs to be used, and this is the main purpose of the series. (If we ignore updates of the test cases, which I think are good to prevent regressions) I could probably shorten the commit message of [1/3] to read like this: Improve the parsing to handle URLs which have a user name and a literal IPV6 like ssh://user@[2001:db8::1]/repo.git. (Thanks to Christian Taube li...@hcf.yourweb.de for reporting this long standing issue) Brian is for that If we accept it, and sees it as a bug. So let's see what he comes up with as a follow-up to the we should explicitly document it part. Here's what I propose: -- 8 -- Subject: [PATCH] Documentation: note deprecated syntax for IPv6 SSH URLs We have historically accepted some invalid syntax for SSH URLs containing IPv6 literals. Older versions of Git accepted URLs missing the brackets required by RFC 2732. Note that this behavior is deprecated and that other protocol handlers will not accept this syntax. Signed-off-by: brian m. carlson sand...@crustytoothpaste.net --- Documentation/urls.txt | 4 1 file changed, 4 insertions(+) diff --git a/Documentation/urls.txt b/Documentation/urls.txt index 9ccb246..2c1a84f 100644 --- a/Documentation/urls.txt +++ b/Documentation/urls.txt @@ -38,6 +38,10 @@ The ssh and git protocols additionally support ~username expansion: - git://host.xz{startsb}:port{endsb}/~{startsb}user{endsb}/path/to/repo.git/ - {startsb}user@{endsb}host.xz:/~{startsb}user{endsb}/path/to/repo.git/ +For backwards compatibility reasons, Git, when using ssh URLs, accepts +some URLs containing IPv6 literals that are missing the brackets. This +syntax is deprecated, and other protocol handlers do not permit this. + For local repositories, also supported by Git natively, the following syntaxes may be used: -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/3] connect.c: Improve parsing of literal IPV6 addresses
On Thu, Feb 19, 2015 at 09:54:52AM -0800, Junio C Hamano wrote: I can see that you do not agree with the If we accept it part (where it refers to allowing [...] was a bug.)---past acceptance was not a bug for you. Brian is for that If we accept it, and sees it as a bug. So let's see what he comes up with as a follow-up to the we should explicitly document it part. Here's what I propose: -- 8 -- Subject: [PATCH] Documentation: note deprecated syntax for IPv6 SSH URLs We have historically accepted some invalid syntax for SSH URLs containing IPv6 literals. Older versions of Git accepted URLs missing the brackets required by RFC 2732. Note that this behavior is deprecated and that other protocol handlers will not accept this syntax. Signed-off-by: brian m. carlson sand...@crustytoothpaste.net --- Documentation/urls.txt | 4 1 file changed, 4 insertions(+) diff --git a/Documentation/urls.txt b/Documentation/urls.txt index 9ccb246..2c1a84f 100644 --- a/Documentation/urls.txt +++ b/Documentation/urls.txt @@ -38,6 +38,10 @@ The ssh and git protocols additionally support ~username expansion: - git://host.xz{startsb}:port{endsb}/~{startsb}user{endsb}/path/to/repo.git/ - {startsb}user@{endsb}host.xz:/~{startsb}user{endsb}/path/to/repo.git/ +For backwards compatibility reasons, Git, when using ssh URLs, accepts +some URLs containing IPv6 literals that are missing the brackets. This +syntax is deprecated, and other protocol handlers do not permit this. + For local repositories, also supported by Git natively, the following syntaxes may be used: -- 2.2.1.209.g41e5f3a -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/3] connect.c: Improve parsing of literal IPV6 addresses
On 02/18/2015 07:40 PM, Junio C Hamano wrote: brian m. carlson sand...@crustytoothpaste.net writes: On Thu, Jan 22, 2015 at 11:05:29PM +0100, Torsten Bögershausen wrote: We want to support ssh://bmc@2001:470:1f05:79::1/git/bmc/homedir.git/ because e.g. the Git shipped with Debian (1.7.10.4) (and a lot of other installations) supports it. I understand that this used to work, but it probably shouldn't have ever been accepted. It's nonstandard, and if we accept it for ssh, people will want it to work for https, and due to libcurl, it simply won't. I prefer to see our past acceptance of this format as a bug. This is the first that I've heard of anyone noticing this (since 2013), so it can't be in common usage. If we accept it, we should explicitly document it as being deprecated and note that it's inconsistent with the way everything else works. I was reviewing my Undecided pile today, and I think your objection makes sense. Either of you care to update documentation, please, before I drop this series and forget about it? The URL RFC is much stricter regarding which characters that are allowed in which part of the URL, as least as I read it. The problem started when /usr/bin/ssh excepted things like /usr/bin/ssh fe80:x:y:z%eth0 and Git simply passed the hostname to ssh. And when the [] was there, it was stripped because ssh doesn't like them. URLs like ssh://bmc@2001:470:1f05:79::1/git/bmc/homedir.git/ simply worked, and nobody ever complained about this, (until now), Git never rejected IPV6 URLs without [], please correct me if I'm wrong. Git never cared about the exact URL, so that IPV6 URL's without [] where allowed from day one. On top of that, we support the short form, user@host:~ or other variants. But we never claimed to be compatible to RFC 1738, even if it makes sense to do so. What exactly should we write in the documentation ? Git supports RFC1738 (but is not as strict in parsing the URL, because we assume that the host name resolver will do some checking for us. Git currently does not support user@[fe80::x:y:z], even if RFC suggests it Git never claimed to be 100% compatible to RFC 1738, and will probably never be, (as we have old code that is as it is). We (at least I) don't want to break existing repos, rejecting URL's that had been working before and stopped working because the Git version is updated or so) This patch series is attempting to be backwards compatible to what old, older. and oldest versions of Git accepted. At the price that we accept URL's which do not conform to the RFC are accepted. It fixes the long standing issue that user@[fe80:] did not work. I'm somewhat unsure what to write in the documentation, I must admit. Unfortunately URL parsing is a tricky thing, this patch tries to do improvements. Especially it adds test cases, which are good to prevent further breakage. Updating the documentation was never part of the patch series, and if the documentation is updated, this is done in a separate commit anyway. How much does this series qualify for the we didn't update the docs, but fixed the code, let's drop it ? -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/3] connect.c: Improve parsing of literal IPV6 addresses
Torsten Bögershausen tbo...@web.de writes: On 02/18/2015 07:40 PM, Junio C Hamano wrote: brian m. carlson sand...@crustytoothpaste.net writes: I understand that this used to work, but it probably shouldn't have ever been accepted. It's nonstandard, and if we accept it for ssh, people will want it to work for https, and due to libcurl, it simply won't. I prefer to see our past acceptance of this format as a bug. This is the first that I've heard of anyone noticing this (since 2013), so it can't be in common usage. If we accept it, we should explicitly document it as being deprecated and note that it's inconsistent with the way everything else works. I was reviewing my Undecided pile today, and I think your objection makes sense. Either of you care to update documentation, please, before I drop this series and forget about it? The URL RFC is much stricter regarding which characters that are allowed in which part of the URL, as least as I read it. ... I'm somewhat unsure what to write in the documentation, I must admit. I can see that you do not agree with the If we accept it part (where it refers to allowing [...] was a bug.)---past acceptance was not a bug for you. Brian is for that If we accept it, and sees it as a bug. So let's see what he comes up with as a follow-up to the we should explicitly document it part. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/3] connect.c: Improve parsing of literal IPV6 addresses
brian m. carlson sand...@crustytoothpaste.net writes: On Thu, Jan 22, 2015 at 11:05:29PM +0100, Torsten Bögershausen wrote: We want to support ssh://bmc@2001:470:1f05:79::1/git/bmc/homedir.git/ because e.g. the Git shipped with Debian (1.7.10.4) (and a lot of other installations) supports it. I understand that this used to work, but it probably shouldn't have ever been accepted. It's nonstandard, and if we accept it for ssh, people will want it to work for https, and due to libcurl, it simply won't. I prefer to see our past acceptance of this format as a bug. This is the first that I've heard of anyone noticing this (since 2013), so it can't be in common usage. If we accept it, we should explicitly document it as being deprecated and note that it's inconsistent with the way everything else works. I was reviewing my Undecided pile today, and I think your objection makes sense. Either of you care to update documentation, please, before I drop this series and forget about it? Thanks. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/3] connect.c: Improve parsing of literal IPV6 addresses
On 2015-01-22 21.07, brian m. carlson wrote: On Mon, Jan 19, 2015 at 06:21:24PM +0100, Torsten Bögershausen wrote: When parsing an URL, older Git versions did handle URLs like ssh://2001:db8::1/repo.git the same way as ssh://[2001:db8::1]/repo.git Commit 83b058 broke the parsing of IPV6 adresses without []: It was written in mind that the fist ':' in a URL was the beginning of a port number, while the old code was more clever: Literal IPV6 addresses have always at least two ':'. When the hostandport had a ':' and the rest of the hostandport string could be parsed into an integer between 0 and 65536, then it was used as a port number, like host:22. Otherwise the first ':' was assumed to be part of a literal IPV6 address, like ssh://[2001:db8::1]/repo.git or ssh://[::1]/repo.git. Re-introduce the old parsing in get_host_and_port(). Improve the parsing to handle URLs which have a user name and a literal IPV6 like ssh://user@[2001:db8::1]/repo.git. (Thanks to Christian Taube li...@hcf.yourweb.de for reporting this long standing issue) Another regression was introduced in 83b058: A non-RFC conform URL like [localhost:222] could be used in older versions of Git to connect to run ssh -p 222 localhost. The new stricter parsing did not allow this any more. See even 8d3d28f5dba why [localhost:222] should be declared a feature. I'm not sure this is a very good idea. While this may have worked in the past, it's also completely inconsistent with the way all non-SSH URLs work in Git: vauxhall ok % git push https://bmc@2001:470:1f05:79::1/git/bmc/homedir.git master fatal: unable to access 'https://bmc@2001:470:1f05:79::1/git/bmc/homedir.git/': IPv6 numerical address used in URL without brackets vauxhall no % git push https://bmc@[castro.crustytoothpaste.net]/git/bmc/homedir.git master fatal: unable to access 'https://bmc@[castro.crustytoothpaste.net]/git/bmc/homedir.git/': Could not resolve host: [castro.crustytoothpaste.net] vauxhall no % git push https://bmc@[castro.crustytoothpaste.net:443]/git/bmc/homedir.git master fatal: unable to access 'https://bmc@[castro.crustytoothpaste.net:443]/git/bmc/homedir.git/': Could not resolve host: [castro.crustytoothpaste.net I would argue that people using IPv6 literals in URLs should already know how to do it correctly, and the consistency between SSH and HTTPS URLs is a feature, not a bug. In the non-URL SSH form, you still have to use the bracketed form to deal with the case in which somebody creates a directory called 1 in their home directory and writes: We want to support ssh://bmc@2001:470:1f05:79::1/git/bmc/homedir.git/ because e.g. the Git shipped with Debian (1.7.10.4) (and a lot of other installations) supports it. We want to support ssh://bmc@[2001:470:1f05:79::1]/git/bmc/homedir.git/ because that is what other people may expect to work as well: ssh://bmc@[2001:470:1f05:79::1]:/git/bmc/homedir.git/ git push 2001:470:1f05:79::1:1 master when they mean git push [2001:470:1f05:79::1]:1 master That I don't understand this, where is the path name in your example ? Everything after the first ':' is the path in the short form: bmc@hostxx:/git/bmc/homedir.git/ If you really want to use a literal IPV6 with the short form, you must use the brackets: bmc@[2001:470:1f05:79::1]:/git/bmc/homedir.git/ (And you can not have a port number here) Nobody forces somebody to use any specific form. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/3] connect.c: Improve parsing of literal IPV6 addresses
On Thu, Jan 22, 2015 at 11:05:29PM +0100, Torsten Bögershausen wrote: We want to support ssh://bmc@2001:470:1f05:79::1/git/bmc/homedir.git/ because e.g. the Git shipped with Debian (1.7.10.4) (and a lot of other installations) supports it. I understand that this used to work, but it probably shouldn't have ever been accepted. It's nonstandard, and if we accept it for ssh, people will want it to work for https, and due to libcurl, it simply won't. I prefer to see our past acceptance of this format as a bug. This is the first that I've heard of anyone noticing this (since 2013), so it can't be in common usage. If we accept it, we should explicitly document it as being deprecated and note that it's inconsistent with the way everything else works. We want to support ssh://bmc@[2001:470:1f05:79::1]/git/bmc/homedir.git/ because that is what other people may expect to work as well: ssh://bmc@[2001:470:1f05:79::1]:/git/bmc/homedir.git/ Everyone expects this to work properly, because it's the standard URL form (RFC 2732). I agree we should support it. git push 2001:470:1f05:79::1:1 master when they mean git push [2001:470:1f05:79::1]:1 master That I don't understand this, where is the path name in your example ? The path in question is $HOME/1. That's why the bracket notation is obligatory in the short form. I agree it's a bit bizarre. Everything after the first ':' is the path in the short form: bmc@hostxx:/git/bmc/homedir.git/ If you really want to use a literal IPV6 with the short form, you must use the brackets: bmc@[2001:470:1f05:79::1]:/git/bmc/homedir.git/ (And you can not have a port number here) Right. In my experience, nobody uses the ssh:// form unless they have to (i.e. they need to use a port number); it's extremely uncommon. So they've already become used to using the bracketed notation, because it's already required for the usual form and it's required in the IPv6 URL standard. -- brian m. carlson / brian with sandals: Houston, Texas, US +1 832 623 2791 | http://www.crustytoothpaste.net/~bmc | My opinion only OpenPGP: RSA v4 4096b: 88AC E9B2 9196 305B A994 7552 F1BA 225C 0223 B187 signature.asc Description: Digital signature
Re: [PATCH 1/3] connect.c: Improve parsing of literal IPV6 addresses
On Mon, Jan 19, 2015 at 06:21:24PM +0100, Torsten Bögershausen wrote: When parsing an URL, older Git versions did handle URLs like ssh://2001:db8::1/repo.git the same way as ssh://[2001:db8::1]/repo.git Commit 83b058 broke the parsing of IPV6 adresses without []: It was written in mind that the fist ':' in a URL was the beginning of a port number, while the old code was more clever: Literal IPV6 addresses have always at least two ':'. When the hostandport had a ':' and the rest of the hostandport string could be parsed into an integer between 0 and 65536, then it was used as a port number, like host:22. Otherwise the first ':' was assumed to be part of a literal IPV6 address, like ssh://[2001:db8::1]/repo.git or ssh://[::1]/repo.git. Re-introduce the old parsing in get_host_and_port(). Improve the parsing to handle URLs which have a user name and a literal IPV6 like ssh://user@[2001:db8::1]/repo.git. (Thanks to Christian Taube li...@hcf.yourweb.de for reporting this long standing issue) Another regression was introduced in 83b058: A non-RFC conform URL like [localhost:222] could be used in older versions of Git to connect to run ssh -p 222 localhost. The new stricter parsing did not allow this any more. See even 8d3d28f5dba why [localhost:222] should be declared a feature. I'm not sure this is a very good idea. While this may have worked in the past, it's also completely inconsistent with the way all non-SSH URLs work in Git: vauxhall ok % git push https://bmc@2001:470:1f05:79::1/git/bmc/homedir.git master fatal: unable to access 'https://bmc@2001:470:1f05:79::1/git/bmc/homedir.git/': IPv6 numerical address used in URL without brackets vauxhall no % git push https://bmc@[castro.crustytoothpaste.net]/git/bmc/homedir.git master fatal: unable to access 'https://bmc@[castro.crustytoothpaste.net]/git/bmc/homedir.git/': Could not resolve host: [castro.crustytoothpaste.net] vauxhall no % git push https://bmc@[castro.crustytoothpaste.net:443]/git/bmc/homedir.git master fatal: unable to access 'https://bmc@[castro.crustytoothpaste.net:443]/git/bmc/homedir.git/': Could not resolve host: [castro.crustytoothpaste.net I would argue that people using IPv6 literals in URLs should already know how to do it correctly, and the consistency between SSH and HTTPS URLs is a feature, not a bug. In the non-URL SSH form, you still have to use the bracketed form to deal with the case in which somebody creates a directory called 1 in their home directory and writes: git push 2001:470:1f05:79::1:1 master when they mean git push [2001:470:1f05:79::1]:1 master -- brian m. carlson / brian with sandals: Houston, Texas, US +1 832 623 2791 | http://www.crustytoothpaste.net/~bmc | My opinion only OpenPGP: RSA v4 4096b: 88AC E9B2 9196 305B A994 7552 F1BA 225C 0223 B187 signature.asc Description: Digital signature
[PATCH 1/3] connect.c: Improve parsing of literal IPV6 addresses
When parsing an URL, older Git versions did handle URLs like ssh://2001:db8::1/repo.git the same way as ssh://[2001:db8::1]/repo.git Commit 83b058 broke the parsing of IPV6 adresses without []: It was written in mind that the fist ':' in a URL was the beginning of a port number, while the old code was more clever: Literal IPV6 addresses have always at least two ':'. When the hostandport had a ':' and the rest of the hostandport string could be parsed into an integer between 0 and 65536, then it was used as a port number, like host:22. Otherwise the first ':' was assumed to be part of a literal IPV6 address, like ssh://[2001:db8::1]/repo.git or ssh://[::1]/repo.git. Re-introduce the old parsing in get_host_and_port(). Improve the parsing to handle URLs which have a user name and a literal IPV6 like ssh://user@[2001:db8::1]/repo.git. (Thanks to Christian Taube li...@hcf.yourweb.de for reporting this long standing issue) Another regression was introduced in 83b058: A non-RFC conform URL like [localhost:222] could be used in older versions of Git to connect to run ssh -p 222 localhost. The new stricter parsing did not allow this any more. See even 8d3d28f5dba why [localhost:222] should be declared a feature. Signed-off-by: Torsten Bögershausen tbo...@web.de --- Unfortunatly my attemps to improve connect.c introduced some regressions: - git clone ssh://::1/repo did not work anymore (for some reason I assumed that literall IPV6 addresses always should have brackets, but that was wrong) - ssh://host:/repo written in the unofficial short form [host:]:repo did not work anymore. I think that should be an unoffical feature, because it worked in older Git versions On top of that, the combination of ssh://username@[host] had never be handled correctly, so fix that as well. Thanks to Christian Taube for reporting it. Comments and an extra pair of critical eyes are welcome. connect.c| 63 ++-- t/t5601-clone.sh | 2 +- 2 files changed, 39 insertions(+), 26 deletions(-) diff --git a/connect.c b/connect.c index d47d0ec..b608976 100644 --- a/connect.c +++ b/connect.c @@ -274,28 +274,44 @@ static enum protocol get_protocol(const char *name) die(I don't handle protocol '%s', name); } +static char *host_end(char **hoststart, int removebrackets) +{ + char *host = *hoststart; + char *end; + char *start = strstr(host, @[); + if (start) + start++; /* Jump over '@' */ + else + start = host; + if (start[0] == '[') { + end = strchr(start + 1, ']'); + if (end) { + if (removebrackets) { + *end = 0; + memmove(start, start + 1, end - start); + end++; + } + } else + end = host; + } else + end = host; + return end; +} + #define STR_(s)# s #define STR(s) STR_(s) static void get_host_and_port(char **host, const char **port) { char *colon, *end; - - if (*host[0] == '[') { - end = strchr(*host + 1, ']'); - if (end) { - *end = 0; - end++; - (*host)++; - } else - end = *host; - } else - end = *host; + end = host_end(host, 1); colon = strchr(end, ':'); - if (colon) { - *colon = 0; - *port = colon + 1; + long portnr = strtol(colon + 1, end, 10); + if (end != colon + 1 *end == '\0' 0 = portnr portnr 65536) { + *colon = 0; + *port = colon + 1; + } } } @@ -547,13 +563,16 @@ static struct child_process *git_proxy_connect(int fd[2], char *host) return proxy; } -static const char *get_port_numeric(const char *p) +static char *get_port(char *host) { char *end; + char *p = strchr(host, ':'); + if (p) { long port = strtol(p + 1, end, 10); if (end != p + 1 *end == '\0' 0 = port port 65536) { - return p; + *p = '\0'; + return p+1; } } @@ -595,14 +614,7 @@ static enum protocol parse_connect_url(const char *url_orig, char **ret_host, * Don't do destructive transforms as protocol code does * '[]' unwrapping in get_host_and_port() */ - if (host[0] == '[') { - end = strchr(host + 1, ']'); - if (end) { - end++; - } else - end = host; - } else - end = host; + end = host_end(host, 0); if (protocol == PROTO_LOCAL) path = end; @@ -705,7