Martin Panter added the comment:

It is true that 3.5 is meant to follow RFC 3986, which obsoletes RFC 1808 and 
specifies slightly different behaviour for abnormal cases. This change is 
documented under urljoin(), and also in “What’s New in 3.5”. Pavel’s first case 
is one of these differences in the RFCs, and I don’t think it is a bug. 
According to <https://tools.ietf.org/html/rfc3986.html#section-5.2.4>,

“The remove_dot_segments algorithm respects [the base’s] hierarchy by removing 
extra dot-segments rather than treating them as an error or leaving them to be 
misinterpreted by dereference implementations.”

For Pavel’s second and third cases, RFC 3986 doesn’t cover them directly 
because the base URL is relative. The RFC only covers absolute base URLs, which 
start with a scheme like “http:”. The documentation doesn’t really bless these 
cases either: ‘Construct a full (“absolute”) URL’. However, there is explicit
support for them in the source code ("" is in urllib.parse.uses_relative).
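To illustrate that support: a relative base has an empty scheme, and that empty string is what urljoin() looks up in uses_relative. This small check only inspects urlparse() and the uses_relative list, so it does not depend on how a given Python version handles the dot segments.

```python
from urllib.parse import urlparse, uses_relative

# A relative base such as "a/" has no scheme, so urlparse() reports "".
base_scheme = urlparse("a/").scheme
print(repr(base_scheme))               # ''

# urljoin() returns the reference unchanged if its scheme is not listed in
# uses_relative; because "" is listed, relative bases are resolved instead.
print(base_scheme in uses_relative)    # True
```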

It looks like 3.5 follows the RFC’s Remove Dot Segments algorithm strictly.
Step 2C says that for a “/../” or “/..” prefix, the parent segment is removed
from the output buffer, but the prefix in the input is always replaced with “/”:

“a/..” → “/”
“a/../..” → “/..” → “/”
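To make the strict behaviour concrete, here is a minimal Python transcription of the RFC 3986 section 5.2.4 steps as I read them. The name remove_dot_segments() is only a label for this sketch; it is not part of urllib.parse and it is not the code 3.5 actually runs.

```python
def remove_dot_segments(path: str) -> str:
    """Sketch of RFC 3986, section 5.2.4, following the steps literally."""
    inp, out = path, ""
    while inp:
        if inp.startswith("../"):        # 2A: drop a leading "../"
            inp = inp[3:]
        elif inp.startswith("./"):       # 2A: drop a leading "./"
            inp = inp[2:]
        elif inp.startswith("/./"):      # 2B: "/./" becomes "/"
            inp = "/" + inp[3:]
        elif inp == "/.":                # 2B: a final "/." becomes "/"
            inp = "/"
        elif inp.startswith("/../") or inp == "/..":
            # 2C: replace the prefix with "/" (unconditionally) and drop the
            # last output segment together with its preceding "/" (if any)
            inp = "/" + inp[4:]
            out = out[: max(out.rfind("/"), 0)]
        elif inp in (".", ".."):         # 2D: a lone "." or ".." is discarded
            inp = ""
        else:
            # 2E: move the first segment, including any leading "/",
            # up to (but not including) the next "/" into the output
            end = inp.find("/", 1)
            if end == -1:
                end = len(inp)
            out += inp[:end]
            inp = inp[end:]
    return out


print(remove_dot_segments("a/.."))     # /
print(remove_dot_segments("a/../.."))  # /
```

It reproduces both results above, as well as the RFC’s own examples such as “/a/b/c/./../../g” → “/a/g” and “mid/content=5/../6” → “mid/6”.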

I would prefer a less strict interpretation that keeps to the spirit of the
algorithm: do not introduce a slash into the input if you did not remove one
from the output buffer:

“a/..” → empty URL
“a/../..” → “..” → empty URL
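A sketch of that less strict variant, under one assumption of mine: the slash is only withheld when the segment removed from the output buffer has no preceding “/”, which can only be the first segment of a relative path, so absolute paths (including “/..” at the root) keep the RFC result. remove_dot_segments_lenient() is a hypothetical name; only step 2C differs from the RFC.

```python
def remove_dot_segments_lenient(path: str) -> str:
    """Hypothetical variant of RFC 3986, section 5.2.4; only step 2C differs."""
    inp, out = path, ""
    while inp:
        # Steps 2A and 2B, as in the RFC.
        if inp.startswith("../"):
            inp = inp[3:]
        elif inp.startswith("./"):
            inp = inp[2:]
        elif inp.startswith("/./"):
            inp = "/" + inp[3:]
        elif inp == "/.":
            inp = "/"
        elif inp.startswith("/../") or inp == "/..":
            rest = inp[4:]  # whatever follows "/../" ("" for a final "/..")
            cut = out.rfind("/")
            if cut >= 0:
                # A "/" leaves the output, so put one back, as the RFC does.
                out, inp = out[:cut], "/" + rest
            elif out:
                # A bare leading segment leaves the output without any "/",
                # so do not introduce one into the input either.
                out, inp = "", rest
            else:
                # Empty output (e.g. "/.." at the root): assumed to keep the RFC result.
                inp = "/" + rest
        # Steps 2D and 2E, as in the RFC.
        elif inp in (".", ".."):
            inp = ""
        else:
            end = inp.find("/", 1)
            if end == -1:
                end = len(inp)
            out += inp[:end]
            inp = inp[end:]
    return out


print(repr(remove_dot_segments_lenient("a/..")))     # ''
print(repr(remove_dot_segments_lenient("a/../..")))  # ''
```

With this rule “a/../b” also stays relative and gives “b”, where the strict algorithm produces “/b”.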

Python 3.4 and earlier did not behave sensibly if you keep extending the
relative URL with more “..” segments:

>>> urljoin("a/", "..")
''
>>> urljoin("a/", "../..")
'..'
>>> urljoin("a/", "../../..")
''
>>> urljoin("a/", "../../../..")
'../'

Pavel, what behaviour would you expect in these cases? My empty URL 
interpretation, or perhaps a more sensible version of the Python 3.4 behaviour? 
What is your use case?

I also noticed a related, and IMO more serious, regression compared to 3.4,
where the path becomes a host name:

>>> urljoin("file:///base", "/dummy/..//host/oops")
'file://host/oops'
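The RFC steps suggest where this comes from. The reference path “/dummy/..//host/oops” contains an empty segment, so dot-segment removal leaves “//host/oops”, and the base “file:///base” has a defined but empty authority. RFC 3986 section 5.3 writes “//” plus that empty authority before the path, giving “file:////host/oops”; if the empty authority is dropped instead, the path’s leading “//” gets read as an authority. The sketch below only models the RFC steps, not the urllib.parse code, and recompose() is a hypothetical helper that takes None for an undefined authority.

```python
from typing import Optional


def remove_dot_segments(path: str) -> str:
    """Sketch of RFC 3986, section 5.2.4 (same steps as the RFC text)."""
    inp, out = path, ""
    while inp:
        if inp.startswith("../"):
            inp = inp[3:]
        elif inp.startswith("./"):
            inp = inp[2:]
        elif inp.startswith("/./"):
            inp = "/" + inp[3:]
        elif inp == "/.":
            inp = "/"
        elif inp.startswith("/../") or inp == "/..":
            inp = "/" + inp[4:]
            out = out[: max(out.rfind("/"), 0)]
        elif inp in (".", ".."):
            inp = ""
        else:
            end = inp.find("/", 1)
            if end == -1:
                end = len(inp)
            out += inp[:end]
            inp = inp[end:]
    return out


def recompose(scheme: str, authority: Optional[str], path: str) -> str:
    """Hypothetical helper for RFC 3986, section 5.3 (no query or fragment)."""
    result = scheme + ":"
    if authority is not None:  # a defined authority is written even if empty
        result += "//" + authority
    return result + path


path = remove_dot_segments("/dummy/..//host/oops")
print(path)                             # //host/oops
print(recompose("file", "", path))      # file:////host/oops  ("host" stays in the path)
print(recompose("file", None, path))    # file://host/oops    ("host" reads as a host name)
```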

----------
components:  -Interpreter Core

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue25403>
_______________________________________
_______________________________________________
Python-bugs-list mailing list