[issue35891] urllib.parse.splituser has no suitable replacement

Jason R. Coombs Sun, 03 Feb 2019 07:11:41 -0800


New submission from Jason R. Coombs <[email protected]>:


The deprecation of splituser (issue27485) has the undesirable effect of leaving the 
programmer without a suitable alternative. The deprecation warning says to 
use `urlparse` instead, but `urlparse` doesn't provide access to the 
`credential` or `address` components of a URL.

Consider for example:

>>> import urllib.parse
>>> url = 'https://user:password@host:port/path'
>>> parsed = urllib.parse.urlparse(url)
>>> urllib.parse.splituser(parsed.netloc)
('user:password', 'host:port')

It's not readily obvious how one might get those two values, the credential and 
the address, from `parsed`. Sure, you can get `username` and `password`. You 
can get `hostname` and `port`. But if what you want is to remove the credential 
and keep the address, or to extract the credential and pass it unchanged as a 
single string to something like an `_encode_auth` handler, that's no longer 
possible without some careful handling: because either value may be None, 
reassembling a username and password into a colon-separated string is more 
complicated than simply doing a ':'.join.
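
To make that "careful handling" concrete, here is a minimal sketch of rebuilding the credential string from `parsed.username` and `parsed.password`. `credential_from` is a hypothetical helper, not part of urllib.parse; it assumes the goal is to reproduce the `user[:password]` text of the netloc. A naive `':'.join((parsed.username, parsed.password))` raises TypeError whenever either value is None.

```python
from urllib.parse import urlparse


def credential_from(parsed):
    """Hypothetical helper: rebuild 'user[:password]' from a urlparse() result.

    Returns None when the netloc has no credential at all.
    """
    username, password = parsed.username, parsed.password
    if username is None:
        # No '@' in the netloc; ':'.join((None, None)) would raise TypeError.
        return None
    if password is None:
        # 'user@host' has no colon, so don't add one.
        return username
    return f"{username}:{password}"


print(credential_from(urlparse('https://user:password@host:port/path')))  # user:password
print(credential_from(urlparse('https://user@host/path')))  # user
print(credential_from(urlparse('https://host/path')))  # None
```

Three branches just to get back a string that was already sitting in the netloc is the kind of boilerplate splituser used to hide.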

This recommendation and limitation led to issues in production code and 
ultimately the inline adoption of the deprecated function, [summarized 
here](https://github.com/pypa/setuptools/pull/1670).

I believe that if splituser is to be deprecated, the netloc should provide a 
suitable alternative: namely, a `urlparse` result should supply `address` 
and `userinfo`. Such functionality would make it easier to transition code that 
relies on splituser for more than parsing out the username and 
password.
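
For comparison, the old splituser behaviour can be approximated today with `str.rpartition`, splitting the netloc on its last '@'. This is only a sketch under that assumption; `split_userinfo` is a hypothetical name, and it is not the `address`/`userinfo` API proposed above.

```python
from urllib.parse import urlparse


def split_userinfo(netloc):
    """Hypothetical stand-in for splituser(): 'user[:passwd]@host[:port]'
    -> ('user[:passwd]', 'host[:port]').

    Splits on the last '@'; the credential is None when there is no '@'.
    """
    userinfo, at, address = netloc.rpartition('@')
    return (userinfo if at else None), address


parsed = urlparse('https://user:password@host:port/path')
print(split_userinfo(parsed.netloc))  # ('user:password', 'host:port')
```

It is short, but every project needing it has to rediscover and copy it, which is the point of asking urllib.parse to expose these components directly.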

Even better would be for the urlparse result to support `_replace` operations 
on these attributes, so that one wouldn't have to construct a netloc just to 
build a URL that replaces only part of it. One could then do 
something like:

>>> parsed = urllib.parse.urlparse(url)
>>> without_userinfo = parsed._replace(userinfo=None).geturl()
>>> alt_port = parsed._replace(port=443).geturl()

I realize that, because of the nesting of abstractions (a namedtuple for the main 
parts), this technique may not extend nicely, so perhaps the netloc 
itself should provide this extensibility, for usage something like this:

>>> parsed = urllib.parse.urlparse(url)
>>> without_userinfo = parsed._replace(
...     netloc=parsed.netloc._replace(userinfo=None)).geturl()
>>> alt_port = parsed._replace(netloc=parsed.netloc._replace(port=443)).geturl()


It's not as elegant, but it is likely simpler to implement: netloc would be 
extended with a _replace method for replacing segments of itself (while 
remaining immutable). It would also be dramatically less error-prone than the 
status quo without splituser.
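
Until something like that exists, replacing part of the netloc means rebuilding it by hand. The following is a rough sketch of what that currently involves; `replace_netloc` and its keyword arguments are hypothetical, values are inserted verbatim (no percent-encoding or validation), and the port handling assumes a bracketed IPv6 host such as `[::1]` carries a port only if one follows the closing bracket.

```python
from urllib.parse import urlparse

# Sentinel meaning "keep the existing value" (None already means "drop it").
_KEEP = object()


def replace_netloc(parsed, *, userinfo=_KEEP, port=_KEEP):
    """Hypothetical helper: return a copy of a urlparse() result with the
    userinfo and/or port of its netloc replaced.

    Pass None to drop a component; omitted arguments keep the current value.
    """
    # Split the credential off at the last '@', as splituser did.
    old_userinfo, at, address = parsed.netloc.rpartition('@')
    if not at:
        old_userinfo = None

    # Split the port off the address, leaving bracketed IPv6 hosts intact.
    if address.endswith(']') or ':' not in address:
        host, old_port = address, None
    else:
        host, _, old_port = address.rpartition(':')

    if userinfo is _KEEP:
        userinfo = old_userinfo
    if port is _KEEP:
        port = old_port

    # Reassemble 'userinfo@host:port', omitting the parts that are None.
    netloc = host if port is None else f"{host}:{port}"
    if userinfo is not None:
        netloc = f"{userinfo}@{netloc}"
    return parsed._replace(netloc=netloc)


url = 'https://user:password@host:port/path'
parsed = urlparse(url)
print(replace_netloc(parsed, userinfo=None).geturl())  # https://host:port/path
print(replace_netloc(parsed, port=443).geturl())  # https://user:password@host:443/path
```

Each branch here (the missing '@', the IPv6 brackets, None versus "unchanged") is a place where hand-rolled code can quietly go wrong.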

In any case, I don't think it's suitable to leave the programmer to muddle 
through with their own URL-parsing logic. urllib.parse should provide 
some help here.

----------
components: Library (Lib)
messages: 334793
nosy: jason.coombs
priority: normal
severity: normal
status: open
title: urllib.parse.splituser has no suitable replacement
type: behavior

_______________________________________
Python tracker <[email protected]>
<https://bugs.python.org/issue35891>
_______________________________________
_______________________________________________
Python-bugs-list mailing list