STINNER Victor <vstin...@python.org> added the comment:
(The first message is basically David's email rephrased. Here is my reply ;-))

> This could present issues if server-side checks are used by applications to
> validate a URLs authority.

Which kind of application would be affected by this vulnerability?

It's unclear to me if urllib should be modified to explicitly reject \ in netloc, or if only third-party code should pay attention to this corner case (potential vulnerability).

The urllib module has _parse_proxy() and HTTPPasswordMgr.reduce_uri() code which use an "authority" variable. Example:
---
from urllib.parse import urlsplit, _splitport, _splittype, _splituser, _splitpasswd

def _parse_proxy(proxy):
    """Return (scheme, user, password, host/port) given a URL or an authority.

    If a URL is supplied, it must have an authority (host:port) component.
    According to RFC 3986, having an authority component means the URL must
    have two slashes after the scheme.
    """
    scheme, r_scheme = _splittype(proxy)
    if not r_scheme.startswith("/"):
        # authority
        scheme = None
        authority = proxy
    else:
        # URL
        if not r_scheme.startswith("//"):
            raise ValueError("proxy URL with no authority: %r" % proxy)
        # We have an authority, so for RFC 3986-compliant URLs (by ss 3.
        # and 3.3.), path is empty or starts with '/'
        end = r_scheme.find("/", 2)
        if end == -1:
            end = None
        authority = r_scheme[2:end]
    userinfo, hostport = _splituser(authority)
    if userinfo is not None:
        user, password = _splitpasswd(userinfo)
    else:
        user = password = None
    return scheme, user, password, hostport

def reduce_uri(uri, default_port=True):
    """Accept authority or URI and extract only the authority and path."""
    # note HTTP URLs do not have a userinfo component
    parts = urlsplit(uri)
    if parts[1]:
        # URI
        scheme = parts[0]
        authority = parts[1]
        path = parts[2] or '/'
    else:
        # host or host:port
        scheme = None
        authority = uri
        path = '/'
    host, port = _splitport(authority)
    if default_port and port is None and scheme is not None:
        dport = {"http": 80,
                 "https": 443,
                 }.get(scheme)
        if dport is not None:
            authority = "%s:%d" % (host, dport)
    return authority, path

def test(uri):
    print(f"{uri} => reduce_uri: {reduce_uri(uri)}")
    print(f"{uri} => _parse_proxy: {_parse_proxy(uri)}")

test(r"https://www.example.com")
test(r"https://u...@www.example.com")
test(r"https://xdavidhu.me\test.corp.google.com")
test(r"https://user:passw...@xdavidhu.me\test.corp.google.com")
---

Output on Python 3.9:
---
https://www.example.com => reduce_uri: ('www.example.com:443', '/')
https://www.example.com => _parse_proxy: ('https', None, None, 'www.example.com')
https://u...@www.example.com => reduce_uri: ('u...@www.example.com:443', '/')
https://u...@www.example.com => _parse_proxy: ('https', 'user', None, 'www.example.com')
https://xdavidhu.me\test.corp.google.com => reduce_uri: ('xdavidhu.me\\test.corp.google.com:443', '/')
https://xdavidhu.me\test.corp.google.com => _parse_proxy: ('https', None, None, 'xdavidhu.me\\test.corp.google.com')
https://user:passw...@xdavidhu.me\test.corp.google.com => reduce_uri: ('user:passw...@xdavidhu.me\\test.corp.google.com:443', '/')
https://user:passw...@xdavidhu.me\test.corp.google.com => _parse_proxy: ('https', 'user', 'password', 'xdavidhu.me\\test.corp.google.com')
---
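For the "third-party code should pay attention" option, here is a rough sketch of what such a check could look like, using only the public urlsplit() API. The helper name checked_hostname() is hypothetical (it is not part of urllib), and rejecting every backslash in the authority is just one possible policy, not a statement of what urllib does or should do.

```python
from urllib.parse import urlsplit


def checked_hostname(url):
    """Hypothetical helper: return the hostname of url, or raise ValueError
    if the parsed authority (netloc) contains a backslash.

    urlsplit() only treats '/', '?' and '#' as the end of the authority,
    so a backslash stays inside netloc and can be detected there.
    """
    parts = urlsplit(url)
    if "\\" in parts.netloc:
        raise ValueError("backslash in authority: %r" % parts.netloc)
    return parts.hostname


if __name__ == "__main__":
    for url in (r"https://www.example.com",
                r"https://xdavidhu.me\test.corp.google.com"):
        try:
            print(f"{url} => {checked_hostname(url)!r}")
        except ValueError as exc:
            print(f"{url} => rejected: {exc}")
```

An application that validates the authority before fetching a URL could call such a helper first. It does not change the urllib results shown in the output above, where the backslash is simply kept in the authority.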
It seems to behave as expected, no?

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue40338>
_______________________________________