New submission from brent s. <[email protected]>:
Currently, a parsed urlparse() object looks (roughly) like this:
urlparse('http://example.com/foo;key1=value1?key2=value2#key3=value3#key4=value4')
returns:
ParseResult(scheme='http', netloc='example.com', path='/foo',
params='key1=value1', query='key2=value2', fragment='key3=value3#key4=value4')
However, I recommend a few things:
0.) that ParseResult objects support dict emulation. e.g. one can run:
dict(parseresult_obj)
and get (using the example string above, with classification corrected for
RFC3986 compliance and common usage):
{'fragment': [('key4', 'value4')],
'netloc': 'example.com',
'params': [('key2', 'value2')],
'path': '/foo',
'query': [('key3', 'value3')],
'scheme': 'http'}
Obviously, fragment, params, and query could instead be serialized into a
nested dict. I'm not sure which is more Pythonic.
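
As a rough illustration of item 0, the dict form can be approximated today
outside the library. This is only a sketch: parse_result_to_dict is a
hypothetical helper name, not part of urllib.parse, and it keeps urlparse()'s
current classification of params, query, and fragment rather than the
corrected one shown above.

```python
from urllib.parse import urlparse, parse_qsl


def parse_result_to_dict(url):
    """Hypothetical helper: urlparse() components as a plain dict.

    params, query, and fragment are split into lists of (key, value) pairs.
    """
    result = urlparse(url)
    # ParseResult is a namedtuple, so its field names pair up with its values.
    components = dict(zip(result._fields, result))
    for name in ("params", "query", "fragment"):
        components[name] = parse_qsl(components[name])
    return components


d = parse_result_to_dict(
    "http://example.com/foo;key1=value1?key2=value2#key3=value3"
)
```

Swapping parse_qsl for parse_qs would give the nested-dict variant, with each
key mapped to a list of values.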
1.) Better RFC3986 compliance.
Per RFC3986 § 3 (https://tools.ietf.org/html/rfc3986#section-3), the URL
can be further split into separate components. For instance, while considered
deprecated, should "userinfo" (e.g. "http://user:password@...") be parsed? At
the very least, the port should be parsed out to a separate component from the
netloc (or userinfo parsed out separate from netloc) - this will assist in
parsing host:port combinations in netlocs that contain both userinfo and a
specified port (and allow the port to be given as an int type, thus more easily
used in e.g. the socket lib).
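
For comparison, the object returned by urlparse() already exposes username,
password, hostname, and port as attributes derived from netloc, with port
returned as an int (or None when absent). They are not tuple fields, so they
do not appear in the repr shown above. A short sketch, using a made-up URL:

```python
from urllib.parse import urlparse

result = urlparse("http://user:password@example.com:8080/foo")

# These are computed from netloc on access; they are not part of the tuple.
userinfo = (result.username, result.password)
host = result.hostname
port = result.port  # int, or None if the URL carries no port
```

The (host, port) pair can then be passed on to socket calls without further
string handling.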
2.) If a component is not present, I suggest it be a None object instead of an
empty string.
e.g.:
urlparse('http://example.com/foo')
Would return:
ParseResult(scheme='http', netloc='example.com', path='/foo',
params=None, query=None, fragment=None)
instead of
ParseResult(scheme='http', netloc='example.com', path='/foo',
params='', query='', fragment='')
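
A minimal sketch of the behaviour suggested in item 2, written as a wrapper
around the existing function. urlparse_none is a hypothetical name used only
for illustration; it is not an existing or proposed API.

```python
from urllib.parse import urlparse


def urlparse_none(url):
    """Hypothetical wrapper: like urlparse(), but empty components are None."""
    result = urlparse(url)
    # Build replacement values: any empty string becomes None.
    replaced = {
        name: (value or None) for name, value in zip(result._fields, result)
    }
    return result._replace(**replaced)


parsed = urlparse_none("http://example.com/foo")
```

Note that a wrapper like this cannot tell an absent query from an empty one
("http://example.com/foo?"), since urlparse() returns '' for both; making that
distinction would need a change in the parser itself.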
----------
components: Library (Lib)
messages: 316454
nosy: bsaner
priority: normal
severity: normal
status: open
title: Improvement suggestions for urllib.parse.urlparser
type: enhancement
_______________________________________
Python tracker <[email protected]>
<https://bugs.python.org/issue33480>
_______________________________________