On 2015-08-05 23.19, Jeff King wrote:
> On Wed, Aug 05, 2015 at 10:34:34AM -0700, Junio C Hamano wrote:
> 
>>> As you can see, there is a lot of complexity in there and I'm not
>>> convinced this is better than just exposing
>>> 'parse_connect_url()', which already handles everything for us.
I try expose and use parse_connect_url():
It handles the scp-like syntax "host:/path,
literall IPV6 addresses, port numbers,
':' without a port number and all other Git specific parsing,
which is inside and outside the RFC 3986.
(I should know, because I managed to break the parser twice,
and fix it)

I added a diagnostics to connect.c, and if you run the a simply test,
we can see that the colon slash logic is often unsufficient:

tb@mypc:~/projects/git/tb.150731_connect> ./git fetch-pack --diag-url 
ssh://host/
Diag: url=ssh://host/
Diag: protocol=ssh
Diag: userandhost=host
Diag: port=NONE
Diag: path=/
Diag: guesseddir=host/
tb@macce:~/projects/git/tb.150731_connect> ./git fetch-pack --diag-url 
ssh://host:/
Diag: url=ssh://host:/
Diag: protocol=ssh
Diag: userandhost=host
Diag: port=NONE
Diag: path=/
Diag: guesseddir=/


On top of that, you can easily write test cases in t5601, as many as you want.
The (minor) drawback is that it doesn't handle http:// or https://,
but that is easy to add in the parser, and doesn't break existing code.

The major which remains is to search for '@' in userandhost,
and strip that off.
(Or when there is a '@', search for a ':' before the '@', and strip that off)
After that, all non-printable characters should be %-escaped.
If we replace ':' as non-printable as well, we can make Windows users 1% more 
happy.


>>
>> If the function "handles everything for us", that's fine, but the
>> primary reason I am hesitant is because parse_connect_url() was
>> designed specifically not to have to worry about some protocols
>> (e.g. I think feeding it a "http://"; would fail, and more
>> importantly, its current callers want such a call to fail).  Also it
>> is meant to handle some non-protocols (e.g. scp style host:path that
>> does not follow <scheme>://...).
> 
> True, but the transport code _is_ handling that at some point. It makes
> me wonder if it would be possible to push the call to transport_get
> further up inside cmd_clone(), and then provide some way to query the
> remote path and hostname from the transport code. Then guess_dir_name
> could just go away entirely, in favor of something like:
> 
>   dir_name = transport_get_path(transport);
>   if (!*dir_name)
>       dir_name = transport_get_host(transport);
> 
> That may be overly simplistic or unworkable, though. I haven't dug into
> the code.
> 
>> Also does it handle the "2222" case above?  I do not think
>> parse_connect_url() even calls get_host_and_port() to be able to
>> tell what "2222" means in these examples.
> 
> Speaking of which, has anyone tested whether the old or new code handles
> external remote helpers? Certainly:
> 
>   foo::https://host/repo.git
> 
> should still use repo.git. But technically the string handed to
> git-remote-foo does not have to look anything like a URL. In those cases
> neither guess_dir_name nor the transport code have any idea what anything
> to the right of the "::" means; we probably have to resort to blind
> guessing based on characters like colon and slash.
> 
It is easy to strip the foo:: part of the url, assume that
the remote helper uses a RFC 3986 similar url syntax, so that we
can feed the reminding https://host/repo.git into the parser (see above).

If the remote helper doesn't do this, we can't guess anything, can we ?
So error out and tell the user seems the right thing to do.

In the hope that this is useful, pushed my prototype branch to
https://github.com/tboegi/git/tree/150731_connect_diag_guess_name









--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to