[issue25403] urllib.parse.urljoin is broken in python 3.5

2021-12-02 Thread STINNER Victor


Change by STINNER Victor :


--
nosy:  -vstinner

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue25403] urllib.parse.urljoin is broken in python 3.5

2021-12-01 Thread Irit Katriel


Irit Katriel  added the comment:

See also 37235, 40594.

--
nosy: +iritkatriel




[issue25403] urllib.parse.urljoin is broken in python 3.5

2018-09-23 Thread Karthikeyan Singaravelan


Change by Karthikeyan Singaravelan :


--
nosy: +xtreak




[issue25403] urllib.parse.urljoin is broken in python 3.5

2016-01-03 Thread Ezio Melotti

Changes by Ezio Melotti :


--
nosy: +ezio.melotti




[issue25403] urllib.parse.urljoin is broken in python 3.5

2015-10-15 Thread Wei Wu

Wei Wu added the comment:

This is a deliberate change made in 3.5: resolution of relative URLs now 
conforms to RFC 3986. See https://bugs.python.org/issue22118 for details.

--
nosy: +kilowu




[issue25403] urllib.parse.urljoin is broken in python 3.5

2015-10-15 Thread STINNER Victor

Changes by STINNER Victor :


--
nosy: +berker.peksag, martin.panter, orsenthil




[issue25403] urllib.parse.urljoin is broken in python 3.5

2015-10-15 Thread STINNER Victor

STINNER Victor added the comment:

See also this change:

changeset:   95683:fc0e79387a3a
user:        Berker Peksag 
date:        Thu Apr 16 02:31:14 2015 +0300
files:   Lib/test/test_urlparse.py Lib/urllib/parse.py Misc/NEWS
description:
Issue #23703: Fix a regression in urljoin() introduced in 901e4e52b20a.

Patch by Demian Brecht.

--
nosy: +haypo




[issue25403] urllib.parse.urljoin is broken in python 3.5

2015-10-15 Thread Berker Peksag

Changes by Berker Peksag :


--
keywords: +3.5regression




[issue25403] urllib.parse.urljoin is broken in python 3.5

2015-10-15 Thread Martin Panter

Martin Panter added the comment:

It is true that 3.5 is meant to follow RFC 3986, which obsoletes RFC 1808 and 
specifies slightly different behaviour for abnormal cases. This change is 
documented under urljoin(), and also in “What’s New in 3.5”. Pavel’s first case 
is one of these differences in the RFCs, and I don’t think it is a bug. 
According to ,

“The remove_dot_segments algorithm respects [the base’s] hierarchy by removing 
extra dot-segments rather than treating them as an error or leaving them to be 
misinterpreted by dereference implementations.”
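
The abnormal cases can be checked directly. This is a minimal sketch, assuming Python 3.5 or later. The base URL is the one used in the RFC 3986 reference resolution examples (section 5.4), and the last call is Pavel's first case:

```python
from urllib.parse import urljoin

# Base URL used throughout the RFC 3986 section 5.4 examples.
BASE = "http://a/b/c/d;p?q"

# Normal case: ".." climbs one level above the "c" directory.
print(urljoin(BASE, "../g"))

# Abnormal cases: more ".." segments than the base path has levels.
# Under RFC 3986 the surplus segments are dropped rather than kept.
print(urljoin(BASE, "../../../g"))
print(urljoin(BASE, "../../../../g"))

# Pavel's first case: the base URL has no path at all.
print(urljoin("http://a.com", ".."))
```

On 3.5 and later the two abnormal calls should both give 'http://a/g', which is the result the RFC lists, and the last call should give 'http://a.com/'.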

For Pavel’s second and third cases, RFC 3986 doesn’t cover them directly 
because the base URL is relative. The RFC only covers absolute base URLs, which 
start with a scheme like “http:”. The documentation doesn’t really bless these 
cases either: ‘Construct a full (“absolute”) URL’. However there is explicit 
support in the source code ("" in urllib.parse.uses_relative).
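
That explicit support is easy to confirm. A small sketch that only inspects the module-level list and parses a relative base; it does not call urljoin(), so it should not depend on which joining behaviour the running Python has:

```python
import urllib.parse

# The empty string stands for "no scheme", i.e. a relative base URL.
print("" in urllib.parse.uses_relative)

# A base such as "a/" parses with an empty scheme and an empty netloc,
# so everything ends up in the path component.
parts = urllib.parse.urlsplit("a/")
print(repr(parts.scheme), repr(parts.netloc), repr(parts.path))
```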

It looks like 3.5 is strict in following the RFC’s Remove Dot Segments 
algorithm. Step 2C says that for “/../” or “/..”, the parent segment is 
removed, but the input is always replaced with “/”:

“a/..” → “/”
“a/../..” → “/..” → “/”
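
The two reductions above can be reproduced with a direct transcription of the algorithm. This is a minimal sketch of Remove Dot Segments as described in RFC 3986 section 5.2.4, not the code in Lib/urllib/parse.py; the function name and the list-based output buffer are choices made for this illustration:

```python
def remove_dot_segments(path: str) -> str:
    """Illustrative transcription of RFC 3986 section 5.2.4 (strict reading)."""
    output = []  # output buffer; each entry is one segment with its leading "/"
    while path:
        # Step 2A: drop a leading "../" or "./".
        if path.startswith("../"):
            path = path[3:]
        elif path.startswith("./"):
            path = path[2:]
        # Step 2B: replace a leading "/./" or a lone "/." with "/".
        elif path.startswith("/./"):
            path = "/" + path[3:]
        elif path == "/.":
            path = "/"
        # Step 2C: replace a leading "/../" or a lone "/.." with "/" and
        # remove the last segment (if any) from the output buffer.
        elif path.startswith("/../"):
            path = "/" + path[4:]
            if output:
                output.pop()
        elif path == "/..":
            path = "/"
            if output:
                output.pop()
        # Step 2D: a lone "." or ".." is removed.
        elif path in (".", ".."):
            path = ""
        # Step 2E: move the first segment, including any leading "/",
        # from the input to the output buffer.
        else:
            start = 1 if path.startswith("/") else 0
            end = path.find("/", start)
            if end == -1:
                output.append(path)
                path = ""
            else:
                output.append(path[:end])
                path = path[end:]
    return "".join(output)


print(repr(remove_dot_segments("a/..")))
print(repr(remove_dot_segments("a/../..")))
```

Both calls print '/': Step 2C always puts a "/" back into the input, even when the segment it removed from the output ("a") had no leading slash.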

I would prefer a less strict interpretation of the spirit of the algorithm. Do 
not introduce a slash in the input if you did not remove one from the output 
buffer:

“a/..” → empty URL
“a/../..” → “..” → empty URL
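
One possible reading of that less strict interpretation is sketched below. It is hypothetical: the function name and the exact rule (keep the "/" unless the segment removed from the output buffer had no leading slash) are assumptions made for this illustration, not something spelled out in this thread or implemented in CPython:

```python
def remove_dot_segments_lenient(path: str) -> str:
    """Hypothetical variant of RFC 3986 section 5.2.4 with a relaxed Step 2C."""
    output = []  # output buffer; each entry is one segment with its leading "/"
    while path:
        if path.startswith("../"):
            path = path[3:]
        elif path.startswith("./"):
            path = path[2:]
        elif path.startswith("/./"):
            path = "/" + path[3:]
        elif path == "/.":
            path = "/"
        elif path.startswith("/../") or path == "/..":
            rest = path[4:] if path.startswith("/../") else ""
            popped = output.pop() if output else None
            # Assumption: only put a "/" back into the input if nothing was
            # removed or the removed segment itself carried a leading "/".
            keep_slash = popped is None or popped.startswith("/")
            path = ("/" if keep_slash else "") + rest
        elif path in (".", ".."):
            path = ""
        else:
            # Move the first segment, including any leading "/", to the output.
            start = 1 if path.startswith("/") else 0
            end = path.find("/", start)
            if end == -1:
                output.append(path)
                path = ""
            else:
                output.append(path[:end])
                path = path[end:]
    return "".join(output)


print(repr(remove_dot_segments_lenient("a/..")))
print(repr(remove_dot_segments_lenient("a/../..")))
print(repr(remove_dot_segments_lenient("/a/..")))
```

Under this reading the two relative inputs reduce to the empty string, while an absolute input such as "/a/.." still reduces to "/".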

Python 3.4 and earlier did not behave sensibly if you extend the relative URL:

>>> urljoin("a/", "..")
''
>>> urljoin("a/", "../..")
'..'
>>> urljoin("a/", "../../..")
''
>>> urljoin("a/", "../../../..")
'../'

Pavel, what behaviour would you expect in these cases? My empty URL 
interpretation, or perhaps a more sensible version of the Python 3.4 behaviour? 
What is your use case?

I also noticed a related and (IMO) more serious regression compared to 3.4, 
where part of the path becomes a host name:

>>> urljoin("file:///base", "/dummy/..//host/oops")
'file://host/oops'
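
Splitting the base and the joined result shows why this is more than a cosmetic difference. Applying Step 2C to "/dummy/..//host/oops" appears to leave a path beginning with "//", which reads as an authority once the URL is reassembled. The sketch below uses only urlsplit(), so it does not depend on which urljoin() behaviour the running Python has:

```python
from urllib.parse import urlsplit

# The base URL has an empty host; everything is in the path.
base = urlsplit("file:///base")
print(repr(base.netloc), repr(base.path))

# The joined result from the session above has gained a host name,
# and the path has lost its first component.
joined = urlsplit("file://host/oops")
print(repr(joined.netloc), repr(joined.path))
```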

--
components:  -Interpreter Core




[issue25403] urllib.parse.urljoin is broken in python 3.5

2015-10-14 Thread Pavel Ivanov

New submission from Pavel Ivanov:

urllib.parse.urljoin no longer conforms to RFC 1808 when joining relative 
URLs containing ‘..’ path components.

Examples:

Python 3.4: 
>>> urllib.parse.urljoin('http://a.com', '..')
'http://a.com/..'
Python 3.5:
>>> urllib.parse.urljoin('http://a.com', '..')
'http://a.com/'

Python 3.4: 
>>> urllib.parse.urljoin('a/', '..')
''
Python 3.5:
>>> urllib.parse.urljoin('a/', '..')
'/'

Python 3.4: 
>>> urllib.parse.urljoin('a/', '../..')
'..'
Python 3.5:
>>> urllib.parse.urljoin('a/', '../..')
'/'

Python 3.4 conforms to RFC 1808 in these scenarios, but Python 3.5 does not.

--
components: Interpreter Core, Library (Lib)
messages: 252986
nosy: Pavel Ivanov
priority: normal
severity: normal
status: open
title: urllib.parse.urljoin is broken in python 3.5
type: behavior
versions: Python 3.5, Python 3.6
