Hi,

I propose to deprecate the urllib module in Python 3.11. It would emit a DeprecationWarning to warn users, so that they consider better alternatives like urllib3 or httpx: well-known modules that are better maintained, more secure, support HTTP/2 (httpx), etc.
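For readers unfamiliar with the mechanism, here is a minimal, self-contained sketch of what "emitting a DeprecationWarning" on import typically looks like with the stdlib warnings module. It is not the proposed patch: the module name "legacy_http" and the helper import_and_collect_warnings() are hypothetical, invented only for this illustration. The module is written to a temporary directory so the example runs anywhere.

```python
import importlib
import sys
import tempfile
import textwrap
import warnings
from pathlib import Path

# Source of a hypothetical deprecated module ("legacy_http" is an invented name).
MODULE_SOURCE = textwrap.dedent("""
    import warnings

    # Module-level code runs once, the first time the module is imported.
    warnings.warn(
        "legacy_http is deprecated; consider urllib3 or httpx instead",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to a caller frame, not this file
    )
""")


def import_and_collect_warnings(name, source):
    """Hypothetical helper: write `source` as module `name`, import it,
    and return the list of warnings recorded during the import."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, f"{name}.py").write_text(source)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            with warnings.catch_warnings(record=True) as caught:
                # Record everything: by default, DeprecationWarning is only
                # displayed when triggered by code in __main__ (PEP 565).
                warnings.simplefilter("always")
                importlib.import_module(name)
        finally:
            # Undo the side effects so the import can be repeated.
            sys.path.remove(tmp)
            sys.modules.pop(name, None)
    return caught


if __name__ == "__main__":
    for w in import_and_collect_warnings("legacy_http", MODULE_SOURCE):
        print(f"{w.category.__name__}: {w.message}")
```

Because DeprecationWarning is ignored by default unless the triggering code is in __main__, users usually see such warnings when running their test suite, or explicitly with "python -W error::DeprecationWarning", rather than in production.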
I don't propose to schedule its removal. Let's discuss the removal in 1 or 2 years.

--

urllib has many abstractions to support a wide range of protocols with "handlers": HTTP, HTTPS, FTP, "local file", proxy, HTTP authentication, HTTP Cookie, etc. A simple HTTP request using Basic Authentication requires 10-20 lines of code, whereas it should be a single line.

Users (me included) don't like the urllib API, which is too complicated for common tasks.

--

Unhappy users created multiple better alternatives to the stdlib urllib module.

In 2008, the "urllib3" module was created to provide an API designed to be as simple as possible for the most common HTTP and HTTPS requests. Example:

    import urllib3
    http = urllib3.PoolManager()
    req = http.request('GET', 'http://httpbin.org/robots.txt')

In 2011, the "requests" module based on urllib3 was created.

In 2013, the "aiohttp" module based on asyncio was created.

In 2015, the new "httpx" module was created:

    import httpx
    req = httpx.get('https://www.example.org/')

Not only does httpx have a regular "synchronous" API (blocking function calls), it also has an asynchronous API!

Sadly, while HTTP/3 is already being developed, it seems like httpx is currently the only HTTP client library in this list supporting HTTP/2 :-(

For HTTP/2, I also found the "httplib2" module. For HTTP/3, I found the "http3" and "aioquic" modules.

--

Let's come back to urllib:

* Its API is too complicated
* It supports neither HTTP/2 nor HTTP/3
* It's barely maintained: there are 121 open issues, including 3 security issues!

The 3 open security issues:

* bpo-33661, opened in 2018;
* bpo-36338, opened in 2019;
* bpo-45795, opened in 2021.

Usually, it's bad when you refer to an open security issue by its creation year :-(

The urllib module has a long history of security vulnerabilities.
List of *fixed* vulnerabilities:

* 2011 (bpo-11662): https://python-security.readthedocs.io/vuln/urllib-redirect.html
* 2017 (bpo-30119): https://python-security.readthedocs.io/vuln/urllib-ftp-stream-injection.html
* 2017 (bpo-30500): https://python-security.readthedocs.io/vuln/urllib-connects-wrong-host.html
* 2019 (bpo-35907): https://python-security.readthedocs.io/vuln/urllib-local-file-scheme.html
* 2019 (bpo-38826): https://python-security.readthedocs.io/vuln/urllib-basic-auth-regex.html
* 2021 (bpo-42967): https://python-security.readthedocs.io/vuln/urllib-query-string-semicolon-separator.html
* 2021 (bpo-43075): https://python-security.readthedocs.io/vuln/urllib-basic-auth-regex2.html
* 2021 (bpo-44022): https://python-security.readthedocs.io/vuln/urllib-100-continue-loop.html

urllib is a package made of 4 sub-modules:

* urllib.request for opening and reading URLs
* urllib.error containing the exceptions raised by urllib.request
* urllib.parse for parsing URLs
* urllib.robotparser for parsing robots.txt files

I propose to deprecate all of them. Maybe the deprecation can be different for each sub-module?

Victor

--
Night gathers, and now my watch begins. It shall not end until my death.

_______________________________________________
Python-Dev mailing list -- python-dev@python.org
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/EZ6O7MOPZ4GA75MKTDO7LAELKXUHK2QS/