New submission from Stian Soiland-Reyes:
urllib.parse can't handle URIs with empty #fragments: the "#" of an empty fragment
is dropped and not reconstituted when the URI is put back together.
http://tools.ietf.org/html/rfc3986#section-3.5 permits empty fragment strings:
URI-reference = [ absoluteURI | relativeURI ] [ "#" fragment ]
fragment      = *( pchar / "/" / "?" )
It even specifies that component recomposition must distinguish a component
that is not defined from one that is defined as an empty string:
http://tools.ietf.org/html/rfc3986#section-5.3
Note that we are careful to preserve the distinction between a
component that is undefined, meaning that its separator was not
present in the reference, and a component that is empty, meaning that
the separator was present and was immediately followed by the next
component separator or the end of the reference.
This seems to be caused by missing components being represented as '' instead
of None.
>>> import urllib.parse
>>> urllib.parse.urlparse("http://example.com/file#")
ParseResult(scheme='http', netloc='example.com', path='/file', params='',
query='', fragment='')
>>> urllib.parse.urlunparse(urllib.parse.urlparse("http://example.com/file#"))
'http://example.com/file'
>>> urllib.parse.urlparse("http://example.com/file#").geturl()
'http://example.com/file'
>>> urllib.parse.urlparse("http://example.com/file# ").geturl()
'http://example.com/file# '
>>> urllib.parse.urlparse("http://example.com/file#nonempty").geturl()
'http://example.com/file#nonempty'
>>> urllib.parse.urlparse("http://example.com/file#").fragment
''
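To illustrate the None-versus-'' representation, here is a minimal sketch
(not part of urllib.parse) that splits a URI with the regular expression
given in RFC 3986, Appendix B. The names split_uri() and URIParts are
hypothetical. Because Python's re module reports an optional group that did
not take part in the match as None, a missing "#" yields fragment=None while
a trailing "#" yields fragment=''.

```python
import re
from typing import NamedTuple, Optional

# Regular expression from RFC 3986, Appendix B. Every component except the
# path sits in an optional group; re returns None for a group that did not
# take part in the match, so "separator absent" and "empty" stay distinct.
_URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


class URIParts(NamedTuple):
    scheme: Optional[str]
    authority: Optional[str]
    path: str  # always defined, possibly empty
    query: Optional[str]
    fragment: Optional[str]


def split_uri(uri: str) -> URIParts:
    """Hypothetical splitter: None means the component's separator was absent."""
    m = _URI_RE.match(uri)  # the pattern can match the empty string, so m is never None
    return URIParts(m.group(2), m.group(4), m.group(5), m.group(7), m.group(9))


print(repr(split_uri("http://example.com/file#").fragment))  # '' (defined, empty)
print(repr(split_uri("http://example.com/file").fragment))   # None (undefined)
```

The same regex also keeps the query ("file?" vs "file") and the authority
("file:///file" vs "file:/file") apart.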
The suggested fix is to use None instead of '' to represent missing components,
and to check 'if fragment is not None' rather than relying on truthiness
('if not fragment'), which treats an empty fragment as a missing one.
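A minimal sketch of the recomposition algorithm from RFC 3986, section 5.3,
using the suggested None convention. recompose() is a hypothetical helper,
not a patch to urlunparse(); it assumes components arrive with None for
"undefined" and '' for "defined but empty".

```python
from typing import Optional


def recompose(scheme: Optional[str], authority: Optional[str], path: str,
              query: Optional[str], fragment: Optional[str]) -> str:
    """Component recomposition following RFC 3986, section 5.3.

    None means the component is undefined (its separator is omitted);
    '' means it is defined but empty (its separator is kept).
    """
    parts = []
    if scheme is not None:
        parts.append(scheme + ":")
    if authority is not None:
        parts.append("//" + authority)
    parts.append(path)
    if query is not None:  # an empty query still gets its "?"
        parts.append("?" + query)
    if fragment is not None:  # an empty fragment still gets its "#"
        parts.append("#" + fragment)
    return "".join(parts)


print(recompose("http", "example.com", "/file", None, ""))    # http://example.com/file#
print(recompose("http", "example.com", "/file", None, None))  # http://example.com/file
```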
The same issue applies to the query and authority components, e.g.:
http://example.com/file? != http://example.com/file
... but be careful about the implications for file URIs, where
file:///file (empty authority) != file:/file (no authority)
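A quick check of the query and authority cases, as a sketch relying only on
the default behaviour of urllib.parse.urlsplit() (the list of pairs is
illustrative): each pair consists of URI references that RFC 3986
distinguishes, yet urlsplit() returns equal results for both members,
because '' stands in for every missing component.

```python
from urllib.parse import urlsplit

# Pairs of URI references that differ only in an empty vs. missing component.
pairs = [
    ("http://example.com/file#", "http://example.com/file"),  # fragment
    ("http://example.com/file?", "http://example.com/file"),  # query
    ("file:///file", "file:/file"),                           # authority
]

for a, b in pairs:
    # Equal SplitResult values mean the difference is already lost at parse time.
    print(a, b, urlsplit(a) == urlsplit(b))
```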
--
components: Library (Lib)
messages: 231070
nosy: soilandreyes
priority: normal
severity: normal
status: open
title: urllib.parse wrongly strips empty #fragment
versions: Python 3.5
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue22852