New submission from brent s. <brent.sa...@gmail.com>:

Currently, a parsed urlparse() object looks (roughly) like this:

urlparse('http://example.com/foo;key1=value1?key2=value2#key3=value3#key4=value4')

returns:

ParseResult(scheme='http', netloc='example.com', path='/foo', 
params='key1=value1', query='key2=value2', fragment='key3=value3#key4=value4')

However, I recommend a couple things:

0.) that ParseResult objects support dict emulation. e.g. one can run:

        dict(parseresult_obj)

    and get (using the example string above, with the classification corrected
for RFC3986 compliance and common usage):

        {'fragment': [('key4', 'value4')],
         'netloc': 'example.com',
         'params': [('key2', 'value2')],
         'path': '/foo',
         'query': [('key3', 'value3')],
         'scheme': 'http'}

    Obviously, fragment, params, and query could instead be serialized into a 
nested dict. I'm not sure which is more preferred in the pythonic sense.
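
A minimal sketch of the dict form described above, assuming it is built on top of the existing urlparse() result rather than inside ParseResult itself. The helper name parse_result_to_dict is hypothetical and not part of urllib.parse; it relies on the namedtuple _asdict() method and on parse_qsl() to split the key=value components into lists of pairs. The classification shown is whatever urlparse() currently produces, not the corrected one proposed above.

```python
from urllib.parse import urlparse, parse_qsl


def parse_result_to_dict(url):
    """Hypothetical helper: return the parsed URL as a plain dict."""
    parsed = urlparse(url)
    # ParseResult is a namedtuple, so _asdict() yields its six fields.
    result = dict(parsed._asdict())
    # Split the key=value style components into lists of (key, value) pairs.
    for name in ("params", "query", "fragment"):
        result[name] = parse_qsl(result[name])
    return result


example = parse_result_to_dict(
    "http://example.com/foo;key1=value1?key2=value2#key3=value3"
)
```

Returning lists of pairs (rather than nested dicts) keeps repeated keys, which is one argument for that form.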

1.) Better RFC3986 compliance.
    Per RFC3986 § 3 (https://tools.ietf.org/html/rfc3986#section-3), the URL 
can be further split into separate components. For instance, while considered 
deprecated, should "userinfo" (e.g. "http://user:password@...") be parsed? At 
the very least, the port should be parsed out to a separate component from the 
netloc (or userinfo parsed out separate from netloc) - this will assist in 
parsing host:port combinations in netlocs that contain both userinfo and a 
specified port (and allow the port to be given as an int type, thus more easily 
used in e.g. the socket lib).
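
For comparison, a short sketch of what can already be pulled out of the netloc today: ParseResult exposes username, password, hostname and port as attributes (not as tuple fields), with port returned as an int. The URL and credentials below are made up for illustration.

```python
from urllib.parse import urlparse

# Illustrative URL; the user, password, host and port are assumptions.
parsed = urlparse("http://user:secret@example.com:8080/foo")

components = {
    "netloc": parsed.netloc,      # still the combined string
    "username": parsed.username,  # userinfo, part before the colon
    "password": parsed.password,  # userinfo, part after the colon
    "hostname": parsed.hostname,  # host without userinfo or port
    "port": parsed.port,          # int, or None when no port is given
}
```

These attributes are not part of the tuple itself, so they do not appear in the repr or when the result is unpacked, which is the gap this suggestion is about.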

2.) If a component is not present, I suggest it be a None object instead of an 
empty string.
    e.g.:

        urlparse('http://example.com/foo')

    Would return:

        ParseResult(scheme='http', netloc='example.com', path='/foo', 
params=None, query=None, fragment=None)

    instead of

        ParseResult(scheme='http', netloc='example.com', path='/foo', 
params='', query='', fragment='')
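
A minimal sketch of the None behaviour as a wrapper, assuming it is applied after the fact to the current urlparse() output. The helper name components_or_none is hypothetical; it returns a plain dict rather than a ParseResult so that it does not interfere with the existing attribute accessors.

```python
from urllib.parse import urlparse


def components_or_none(url):
    """Hypothetical helper: map empty components to None."""
    parsed = urlparse(url)
    # An empty string is falsy, so "value or None" turns '' into None.
    return {name: (value or None) for name, value in parsed._asdict().items()}


result = components_or_none("http://example.com/foo")
```

One limitation of this sketch: it cannot tell an absent component from a present but empty one (e.g. a trailing "?"), since urlparse() reports both as ''.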

----------
components: Library (Lib)
messages: 316454
nosy: bsaner
priority: normal
severity: normal
status: open
title: Improvement suggestions for urllib.parse.urlparser
type: enhancement

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue33480>
_______________________________________