[issue18828] urljoin behaves differently with custom and standard schemas

2016-09-12 Thread Martin Panter

Martin Panter added the comment:

Recording bugs reports for specific schemes as dependencies of this:

Issue 25895: ws(s)
Issue 16134: rtmp(e/s/t)
Issue 23759: coap(s)

--
dependencies: +Add support for RTMP schemes to urlparse, urllib.parse.urljoin 
does not handle WebSocket URLs, urllib.parse: make coap:// known

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18828] urljoin behaves differently with custom and standard schemas

2015-03-26 Thread Martin Panter

Martin Panter added the comment:

If necessary, we can add a new non_relative list, rather than changing 
non_hierarchical. The repository history shows that “non_hierarchical” was 
updated with new schemes once or twice, but has never been used since it was 
added to Python as “urlparse.py”.

IMAP, WAIS and Gopher URLs can have extra components added using slashes, which 
satisfies my idea of “hierarchical”. For IMAP, I think this is explicitly 
mentioned in the RFC: . For 
WAIS, the hierarchy is not arbitrary, but your resulting URL 
wais://f...@bar.com/newpath probably matches the 
wais://<host>:<port>/<database> URL form, and I am not proposing to change that 
behaviour.

--


[issue18828] urljoin behaves differently with custom and standard schemas

2015-03-26 Thread Demian Brecht

Demian Brecht added the comment:

> The current behaviour when no scheme is present is fairly sensible to me and 
> should not be changed to do string concatenation nor raise an exception

Agreed. Defaulting to relative behaviour makes sense as I imagine that'll be 
the general use case.

> I removed the gopher, wais, and imap schemes from the list

I'd be concerned about removing items, as non_hierarchical /is/ public-facing 
and it's reasonable to assume that there are libraries out there that depend on 
these entries. Additionally, at a glance through their respective RFCs, it seems that 
these three protocols /do/ belong in the non_hierarchical list. While WAIS and 
IMAP do use / as a delimiter, they're not hierarchical and therefore relative 
joining doesn't make sense. For example, with the following definition in mind 
(RFC4156):

wais://<host>:<port>/<database>/<wtype>/<wpath>

The following will result in an incorrect URL:

urljoin('wais://f...@bar.com/mydb/type/path', '/newpath')
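For illustration, here is what the stdlib currently returns for that kind of join, with a placeholder host `example.com` standing in for the obfuscated address (wais is already in uses_relative, so urljoin() does merge it):

```python
from urllib.parse import urljoin

# 'example.com' is a placeholder; the host in the original is obfuscated.
base = 'wais://example.com/mydb/type/path'

# An absolute-path reference replaces the whole path, dropping the
# <database> and <wtype> segments of the base.
print(urljoin(base, '/newpath'))   # wais://example.com/newpath

# A relative-path reference only replaces the last segment (<wpath>).
print(urljoin(base, 'newpath'))    # wais://example.com/mydb/type/newpath
```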


> However I am still not really convinced that my first urljoin-scheme.patch is 
> a bad idea. Do people actually use urljoin() with these schemes like mailto 
> in the first place?

I'd be inclined to agree that it's far from common practice. That said, I did 
find one instance of a project that seems to depend on current behaviour 
(although it's only in tests and I haven't looked any deeper): 
https://github.com/chfoo/wpull/blob/32837d7c5614d7f90b8242e1fbb41f8da9bc7ce7/wpull/url_test.py#L637.
I imagine that the current behaviour may be useful for utilities such 
as web crawlers. In light of that and the fact that the urllib.parse docs 
currently have a list of protocols that are intended to be supported across the 
entire module's API, I think that it's important to not break backwards 
compatibility in cases where the relative URL would have been returned. Your 
second patch seems to have this behaviour which I think is preferable.

--


[issue18828] urljoin behaves differently with custom and standard schemas

2015-03-26 Thread Martin Panter

Martin Panter added the comment:

The current behaviour when no scheme is present is fairly sensible to me and 
should not be changed to do string concatenation nor raise an exception:

>>> urljoin("//netloc/old/path", "new/path")
'//netloc/old/new/path'

I am posting urljoin-non-hier.patch as an alternative to my first patch. This 
one changes urljoin() to work on any URL scheme not in the existing 
“non_hierarchical” blacklist. I removed the gopher, wais, and imap schemes from 
the list, and added tel, so that urljoin() continues to treat these special 
cases as before. Out of the schemes mentioned in the module but missing from 
uses_relative, I think non_hierarchical now has all those without directory 
components: hdl, mailto, news, sip, sips, snews, tel, telnet.
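A rough sketch of the blacklist idea, written as a standalone wrapper rather than a change to urljoin() itself. The names `urljoin_non_relative` and `NON_RELATIVE` are hypothetical, and the set simply copies the schemes listed above. For an unknown scheme it strips "scheme:", relies on the scheme-less join behaviour shown earlier, and then re-attaches the scheme:

```python
from urllib.parse import urljoin, urlsplit, uses_relative

# Hypothetical blacklist copied from the list of schemes above.
NON_RELATIVE = frozenset(
    ['hdl', 'mailto', 'news', 'sip', 'sips', 'snews', 'tel', 'telnet'])


def urljoin_non_relative(base, url):
    """Sketch: join like urljoin(), but treat any scheme that is not
    blacklisted as hierarchical, as long as the base has an authority."""
    bscheme = urlsplit(base).scheme
    prefix = len(bscheme) + 1            # length of "scheme:"
    rest = base[prefix:]
    # Defer to the stdlib for scheme-less bases, schemes urljoin() already
    # joins, blacklisted schemes, and bases without a "//" authority.
    if (not bscheme or bscheme in uses_relative
            or bscheme in NON_RELATIVE or not rest.startswith('//')):
        return urljoin(base, url)
    scheme = urlsplit(url).scheme
    if scheme and scheme != bscheme:
        return url                       # different scheme: already absolute
    if scheme:
        url = url[prefix:]               # drop a repeated "scheme:" prefix
    # Resolve without a scheme (which urljoin() handles), then restore it.
    return base[:prefix] + urljoin(rest, url)


print(urljoin_non_relative('redis://localhost:6379/0', '/1'))
# redis://localhost:6379/1
print(urljoin_non_relative('rtmp://host/app', '?auth=token'))
# rtmp://host/app?auth=token
```

The "//" check is an extra restriction of this sketch, so that opaque URIs with unknown schemes are never rewritten.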

However I am still not really convinced that my first urljoin-scheme.patch is a 
bad idea. Do people actually use urljoin() with these schemes like mailto in 
the first place?

--
Added file: http://bugs.python.org/file38698/urljoin-non-hier.patch


[issue18828] urljoin behaves differently with custom and standard schemas

2015-03-24 Thread Berker Peksag

Berker Peksag added the comment:

> Yet another option, similar to my “any_scheme=True” flag, might be to change 
> from the “uses_relative” white-list to a “not_relative” black-list of URL 
> schemes, [...]

I think this looks like a good solution.

--
versions:  -Python 3.3


[issue18828] urljoin behaves differently with custom and standard schemas

2015-03-19 Thread Demian Brecht

Demian Brecht added the comment:

Also, I would suggest still including the doc changes proposed by Madison in 
all versions prior to 3.5.

--


[issue18828] urljoin behaves differently with custom and standard schemas

2015-03-19 Thread Demian Brecht

Demian Brecht added the comment:

> >>> urljoin('mailto:foo@', 'bar.com')
> 'mailto:bar.com'
> 
> which seems fairly sensible to me.

This is where joining arbitrary protocols gets tricky. Does it make sense to 
merge non-hierarchical protocols such as mailto? My initial reaction is "no" 
and what should actually happen here is one of two things:

1. The result is a simple concatenation: "mailto:f...@bar.com".
2. An exception is raised indicating that urljoin cannot determine how to 
handle merging base and url.

Either could apply when neither base nor url has a scheme, or when the scheme 
to be used is one of urllib.parse.non_hierarchical.
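Option 2, limited to the non_hierarchical case, might look like the sketch below. `strict_urljoin` is a hypothetical name; `urllib.parse.non_hierarchical` is the existing module-level list. As far as I can tell that list also contains gopher, wais and imap, which urljoin() currently does join, so this wrapper would start rejecting those too:

```python
from urllib.parse import non_hierarchical, urljoin, urlsplit


def strict_urljoin(base, url):
    """Hypothetical variant of urljoin() that refuses to merge
    non-hierarchical schemes instead of silently returning url."""
    bscheme = urlsplit(base).scheme
    scheme = urlsplit(url).scheme or bscheme
    # Only refuse when url would be resolved against base, i.e. it has
    # no scheme of its own or the same scheme as base.
    if scheme == bscheme and bscheme in non_hierarchical:
        raise ValueError(
            f'cannot join {url!r} to non-hierarchical base {base!r}')
    return urljoin(base, url)


print(strict_urljoin('http://example.com/a/b', 'c'))  # http://example.com/a/c
```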

> A more awkward question is if this behaviour of my patch is reasonable:
> 
> >>> urljoin('mailto:person-foo/b...@example.net', 'bar.com')
> 'mailto:person-foo/bar.com'

A couple of thoughts on this: if urllib.parse.non_hierarchical is used to 
determine merge vs. simple concatenation (or exception), this specific case won't 
be an issue. Also, according to RFC 6068, "mailto:person-foo/b...@example.net" is 
invalid (the "/" should be percent-encoded), but I don't think it should be the 
job of urljoin to understand the URI structure of each protocol beyond 
logically joining base and url.

> Yet another option, similar to my “any_scheme=True” flag, might be to change 
> from the “uses_relative” white-list to a “not_relative” black-list of URL 
> schemes, so that urljoin() works for arbitrary schemes except for ones like 
> “mailto:” that are in the hard-coded list.

This list may already be present in urllib.parse.non_hierarchical. I also think 
it's worthwhile to do some further research against 
http://www.iana.org/assignments/uri-schemes/uri-schemes.xhtml to ensure the 
list is up to date.

If this path is chosen, I would suggest getting sign off from a couple core 
devs prior to investing time in this as all changes discussed so far are 
backwards incompatible.

--


[issue18828] urljoin behaves differently with custom and standard schemas

2015-03-19 Thread Martin Panter

Martin Panter added the comment:

I opened Issue 23703 about the funny doubled bar.com result. After backing out 
revision 901e4e52b20a, but with my patch here applied:

>>> urljoin('mailto:foo@', 'bar.com')
'mailto:bar.com'

which seems fairly sensible to me. A more awkward question is if this behaviour 
of my patch is reasonable:

>>> urljoin('mailto:person-foo/b...@example.net', 'bar.com')
'mailto:person-foo/bar.com'

Yet another option, similar to my “any_scheme=True” flag, might be to change 
from the “uses_relative” white-list to a “not_relative” black-list of URL 
schemes, so that urljoin() works for arbitrary schemes except for ones like 
“mailto:” that are in the hard-coded list.

--


[issue18828] urljoin behaves differently with custom and standard schemas

2015-03-17 Thread Demian Brecht

Changes by Demian Brecht :


--
stage:  -> patch review
versions: +Python 3.5


[issue18828] urljoin behaves differently with custom and standard schemas

2015-03-17 Thread Demian Brecht

Demian Brecht added the comment:

> I haven’t heard any arguments against this option yet, and it didn’t break 
> any tests.

Pre patch:

>>> urljoin('mailto:foo@', 'bar.com')
'bar.com'

Post patch:

>>> urljoin('mailto:foo@', 'bar.com')
'mailto:bar.com/bar.com'

I'm taking an educated guess here based on a marginal amount of research (there 
are just a few registered schemes at 
http://www.iana.org/assignments/uri-schemes/uri-schemes.xhtml that should be 
understood), but it /seems/ like perhaps the current behaviour is intended to 
safeguard against joining non-hierarchical schemes in which case you'd get 
nonsensical values. It does seem a little odd to me, but I definitely prefer 
the former behaviour to the latter.

I think that short term, Madison's suggestion about documenting uses_relative 
would be an easy win and can be applied to all branches. Long term though, I 
think it would be nice to have a generalized urljoin() method that accounts for 
most (if not all) classifications of url schemes.

Thoughts?

--
nosy: +demian.brecht


[issue18828] urljoin behaves differently with custom and standard schemas

2014-12-16 Thread Martin Panter

Martin Panter added the comment:

I think a global registry seems like overkill. Here is a patch to make 
urljoin() treat schemes more equally and work with arbitrary schemes 
automatically. I haven’t heard any arguments against this option yet, and it 
didn’t break any tests.

Another option, still simpler than a registry, would be an extra parameter, say 
urljoin(a, b, any_scheme=True).

--
keywords: +patch
Added file: http://bugs.python.org/file37480/urljoin-scheme.patch


[issue18828] urljoin behaves differently with custom and standard schemas

2013-09-02 Thread Madison May

Madison May added the comment:

> How about adding a codecs.register like public API for 3.4+?

A codecs style register function seems like an excellent solution to me.

--


[issue18828] urljoin behaves differently with custom and standard schemas

2013-09-02 Thread Madison May

Madison May added the comment:

If nothing else, we should document the workaround for this issue.

>>> import urllib.parse
>>> urllib.parse.uses_relative.append('redis')
>>> urllib.parse.uses_netloc.append('redis')
>>> urllib.parse.urljoin('redis://localhost:6379/0', '/1')
'redis://localhost:6379/1'

--


[issue18828] urljoin behaves differently with custom and standard schemas

2013-09-02 Thread Berker Peksag

Berker Peksag added the comment:

How about adding a codecs.register like public API for 3.4+?

import urllib.parse

urllib.parse.schemes.register('redis', 'rtmp')

or:

urllib.parse.urljoin('redis://localhost:6379/0', '/1', scheme='redis')

or just:

urllib.parse.schemes.extend(['redis', 'rtmp'])
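Until something like that exists, Madison's workaround can be wrapped in a small helper. `register_scheme` is a hypothetical name, not part of urllib.parse. It mutates the module-level uses_relative and uses_netloc lists, so the change is global for the whole process:

```python
import urllib.parse


def register_scheme(*schemes):
    """Hypothetical helper: make urljoin() join these schemes like http.

    Mutates urllib.parse's module-level lists, so it affects every
    caller in the process.
    """
    for scheme in schemes:
        for registry in (urllib.parse.uses_relative, urllib.parse.uses_netloc):
            if scheme not in registry:  # avoid duplicate entries
                registry.append(scheme)


register_scheme('redis', 'rtmp')
print(urllib.parse.urljoin('redis://localhost:6379/0', '/1'))
# redis://localhost:6379/1
print(urllib.parse.urljoin('rtmp://host/app', '?auth=token'))
# rtmp://host/app?auth=token
```

Both lists are needed: uses_relative lets urljoin() merge at all, and uses_netloc lets the reference inherit the base's host and port.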

--
nosy: +berker.peksag


[issue18828] urljoin behaves differently with custom and standard schemas

2013-08-29 Thread Martin Panter

Martin Panter added the comment:

Similarly, I expected this to return "rtmp://host/app?auth=token":

urljoin("rtmp://host/app", "?auth=token")

I'm not sure adding everybody's custom scheme to a hard-coded whitelist is the 
best way to solve this.

Below I have identified some other schemes not in the "uses_relative" list. Is 
there any reason why one would use urljoin() with them, but want the base URL 
to be ignored (as is the current behaviour)? I looked at test_urlparse.py and 
there don't seem to be any test cases for these schemes.

>>> all = set().union(uses_relative, uses_netloc, uses_params, 
... non_hierarchical, uses_query, uses_fragment)
>>> sorted(all.difference(uses_relative))
['git', 'git+ssh', 'hdl', 'mailto', 'news', 'nfs', 'rsync', 'sip', 'sips', 
'snews', 'tel', 'telnet']

Even if the behaviour can't be changed, could the documentation for urljoin() 
say something like this:

Only the following [uses_relative] schemes are allowed in the base URL; any 
other schemes result in the relative URL being returned without being joined to 
the base.

--
nosy: +vadmium


[issue18828] urljoin behaves differently with custom and standard schemas

2013-08-29 Thread Madison May

Changes by Madison May :


--
components: +Library (Lib)


[issue18828] urljoin behaves differently with custom and standard schemas

2013-08-26 Thread Ned Deily

Changes by Ned Deily :


--
nosy: +orsenthil


[issue18828] urljoin behaves differently with custom and standard schemas

2013-08-26 Thread Madison May

Madison May added the comment:

From urllib.parse:

uses_relative = ['ftp', 'http', 'gopher', 'nntp', 'imap',
                 'wais', 'file', 'https', 'shttp', 'mms',
                 'prospero', 'rtsp', 'rtspu', '', 'sftp',
                 'svn', 'svn+ssh']

From urllib.parse.urljoin (scheme='redis' and url='/1' in your example):

    if scheme != bscheme or scheme not in uses_relative:
        return _coerce_result(url)

Should the 'redis' scheme be added to uses_relative, perhaps?
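That early return can be mirrored in a small predicate to check up front whether urljoin() will resolve a reference against its base or hand it back unchanged. `joins_against_base` is a hypothetical helper; it only reproduces the scheme check quoted above, not the rest of urljoin(). For the 'redis' base in the original report it returns False unless 'redis' has been added to uses_relative:

```python
from urllib.parse import urlsplit, uses_relative


def joins_against_base(base, url):
    """Hypothetical helper mirroring urljoin()'s early scheme check.

    Returns False when urljoin() would give back url unchanged because
    of the schemes involved.
    """
    bscheme = urlsplit(base).scheme
    # As in urljoin(), a scheme-less reference inherits the base's scheme.
    scheme = urlsplit(url, scheme=bscheme).scheme
    return scheme == bscheme and scheme in uses_relative


print(joins_against_base('http://localhost:6379/0', '/1'))   # True
print(joins_against_base('x-example://localhost/0', '/1'))   # False
```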

--
nosy: +madison.may


[issue18828] urljoin behaves differently with custom and standard schemas

2013-08-25 Thread Mher Movsisyan

New submission from Mher Movsisyan:

>>> urljoin('redis://localhost:6379/0', '/1')
'/1'
>>> urljoin('http://localhost:6379/0', '/1')
'http://localhost:6379/1'

--
messages: 196125
nosy: mher.movsisyan
priority: normal
severity: normal
status: open
title: urljoin behaves differently with custom and standard schemas
type: behavior
versions: Python 2.7, Python 3.3, Python 3.4
