[Python-Dev] Re: It's now time to deprecate the stdlib urllib module
On Sun, Feb 6, 2022 at 9:13 AM Paul Moore wrote:
> [...]
> I've said it previously, but I'll reiterate - IMO this *must* have a
> PEP, and that PEP must be clear that the intention is to *remove*
> urllib, not simply to "deprecate and then think about it". That could
> be making it part of PEP 594, or a separate PEP, but one way or
> another it needs a PEP.

This would need to be its own PEP.

urllib et al. are used by virtually everybody. They're highly used batteries. I'm -1 on deprecating it for that reason alone.

Christian proposes that a rewrite with a simpler scope might be nice, but I think the disruption to the world and the loss of trust in Python would be similar either way.

-gps

___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/QMSFZBQJFWKFFE3LFQLQE2AT6WKMLPGL/
Code of Conduct: http://python.org/psf/codeofconduct/
[Python-Dev] Re: It's now time to deprecate the stdlib urllib module
On 2/6/22 6:08 AM, Victor Stinner wrote:
> I propose to deprecate the urllib module in Python 3.11. It would emit
> a DeprecationWarning which warns users, so users should consider better
> alternatives like urllib3 or httpx: well known modules, better
> maintained, more secure, support HTTP/2 (httpx), etc.

Besides the needs of pip, round-up, etc., I think we should keep whatever parts of urllib, cgi, cgitb, http, etc., are necessary for basic serving/consuming of web pages for the same reason we ended up keeping the wave module -- it's fun and engaging for a younger audience. Having one computer get information from another is pretty cool.

If we need to do some trimming and rearranging of the above modules, that's fine, but I think losing all the functionality would be a mistake.

--
~Ethan~

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/TGENXEKPFCIZUQD63ROCIK2WGAN3F7XL/
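To make the "one computer gets information from another" point concrete, here is a minimal sketch that serves a page and fetches it back using only http.server and urllib.request. The handler name, the greeting text and the helper function are made up for illustration; the server listens on localhost only, and the empty ProxyHandler is there so that a proxy configured in the environment does not intercept the request.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import ProxyHandler, build_opener


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: answer every GET with a short greeting."""

    def do_GET(self):
        body = b"hello from the stdlib\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_from_local_server():
    """Start a throwaway server, fetch its root page, return (status, text)."""
    # Port 0 asks the OS for any free port on the loopback interface.
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        # An empty ProxyHandler disables any auto-detected proxy settings.
        opener = build_opener(ProxyHandler({}))
        url = f"http://127.0.0.1:{server.server_port}/"
        with opener.open(url, timeout=5) as response:
            return response.status, response.read().decode("utf-8")
    finally:
        server.shutdown()
        server.server_close()
```

Calling `fetch_from_local_server()` returns the HTTP status and the decoded body; nothing outside the standard library is needed.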
[Python-Dev] Re: It's now time to deprecate the stdlib urllib module
Chiming in to say that whichever way this goes, urllib3 would be okay. We can always vendor the small amount of http.client logic we actually depend on for HTTP connections.

I do agree that the future of HTTP clearly lies outside the standard library; our team is already thinking about ways to integrate non-http.client HTTP implementations (like HTTP/2).

My feeling is that it will be difficult to remove urllib.parse; urllib.request, however, is much less depended on and more likely to be deprecated and removed.

Also clarifying that httplib2 doesn't support HTTP/2; the HTTP/2 package of interest is usually h2: https://pypi.org/project/h2. "http3" also doesn't implement HTTP/3 (bad name); this was one of the potential names for the HTTPX project.

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/AW3JP6DHEAKME5FTFNRHV3EJMPJQEDME/
[Python-Dev] Re: It's now time to deprecate the stdlib urllib module
On Sun, Feb 6, 2022 at 3:35 PM Paul Moore wrote:
> urllib.request may not be "best practice", but it's still extremely
> useful for simple situations, and urllib.parse is useful for basic
> handling of URLs. Yes, the more complex aspects of urllib are better
> handled by external packages, but that's not sufficient argument for
> removing the package altogether. There are many situations where
> external dependencies are unsuitable. Also, there's quite a lot of
> usage of urllib in the stdlib itself - how would you propose to
> replace that?
> (...)
> In addition, pip relies pretty heavily on urllib (parse and request),
> and pip has a bootstrapping issue, so using 3rd party libraries is
> non-trivial.

If a project like urllib3 uses it, urllib can be copied there and its maintenance will continue there. Or maybe the maintenance can be moved into a new project on PyPI like "legacy_urllib".

It's a situation similar to the distutils deprecation: setuptools decided to include a hidden copy of distutils in its source, and the distutils maintenance moved there. IMO it's a great move. setuptools is a better place than Python to maintain this code: the setuptools release cycle is faster and is related to pip. The Python release cycle is slow and the distutils API was too big. Since the distutils API is now hidden, setuptools can freely drop code and change APIs without affecting the public setuptools API.

I'm well aware that moving distutils into setuptools caused trouble. IMO it is worth it, and we have to go through these issues once for a lower maintenance burden in the long term.

> In any case, why is this being proposed as a simple posting on
> python-dev? There's already PEP 594 for removals from the stdlib.

urllib is bigger than the modules proposed for deprecation in PEP 594. Also, I expect that deprecating urllib is more controversial.

Victor
--
Night gathers, and now my watch begins. It shall not end until my death.

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/UR6BT5S2S4WGEI62MRWHCRAPZNTQXTVT/
[Python-Dev] Re: It's now time to deprecate the stdlib urllib module
On Sun, 6 Feb 2022 at 9:12, Paul Moore wrote:
> [...]
> I've said it previously, but I'll reiterate - IMO this *must* have a
> PEP, and that PEP must be clear that the intention is to *remove*
> urllib, not simply to "deprecate and then think about it". That could
> be making it part of PEP 594, or a separate PEP, but one way or
> another it needs a PEP.

PEP 594 is meant to be a set of uncontroversial removals of mostly unused modules. Removing urllib is obviously not going to be uncontroversial, so it should be discussed in a separate PEP.

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/HQ5J7BTB5WW77CQIQXX5FQKBOOIADBYR/
[Python-Dev] Re: It's now time to deprecate the stdlib urllib module
On Sun, 6 Feb 2022 at 16:51, Christian Heimes wrote:
> The urllib package -- and to some degree also the http package -- are
> a constant source of security bugs. The code is old and the parsers for
> HTTP and URLs don't handle edge cases well. Python core lacks a true
> maintainer of the code. To be honest, we have to admit defeat and be up
> front that urllib is not up to the task for this decade. It was designed
> and written during a more friendly, less scary time on the internet.
>
> If I had the power and time, then I would replace urllib with a simpler,
> reduced HTTP client that uses the platform's HTTP library under the hood
> (WinHTTP on Windows, NSURLSession (?) on macOS, Web API for Emscripten,
> maybe curl on Linux/BSD). For non-trivial HTTP requests, httpx or
> aiohttp are much better suited than urllib.
>
> The second best option is to reduce the feature set of urllib to core
> HTTP (no ftp, proxy, HTTP auth) and a partial rewrite with stricter,
> more standards-conformant parsers for urls, query strings, and RFC 2822
> instead of RFC 822 for headers.

I'd likely be fine with either of these two options. I'm not worried about supporting "advanced" uses. But having no way of getting a file from the internet without relying on 3rd party packages seems like a huge gap in functionality for a modern language. And having to use a 3rd party library to parse URLs will simply push more people to use home-grown regexes rather than something safe and correct. Remember that a lot of Python users are not professional software developers, but scientists, data analysts, and occasional users, for whom the existence of something in the stdlib is the *only* reason they have any idea that URLs need specialised parsing in the first place.

And while we all like to say 3rd party modules are great, the reality is that they pose a genuine problem for many of these non-specialist users - and I say that as a packaging specialist and pip maintainer. The packaging ecosystem is *not* newcomer-friendly in the way that core Python is, much as we're trying to improve that situation.

I've said it previously, but I'll reiterate - IMO this *must* have a PEP, and that PEP must be clear that the intention is to *remove* urllib, not simply to "deprecate and then think about it". That could be making it part of PEP 594, or a separate PEP, but one way or another it needs a PEP.

Paul

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/KT6TGUBTLLETHES2OVVGZWSGYC5JCEKC/
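As an illustration of the "specialised parsing" point, here is a short sketch of what urllib.parse handles that a home-grown regex tends to get wrong: userinfo and ports in the authority, repeated query keys, percent-encoding, and relative references. The example URL is invented for the purpose.

```python
from urllib.parse import parse_qs, urlencode, urljoin, urlsplit

# Invented URL with userinfo, a port, an escaped path and repeated query keys.
url = "https://user@example.org:8443/a%20b/index.html?tag=x&tag=y&q=caf%C3%A9#top"

parts = urlsplit(url)
# Components are exposed as attributes; hostname and port are split out
# of the authority without any hand-written pattern matching.
print(parts.scheme, parts.hostname, parts.port, parts.path)

# Repeated keys become lists and percent-encoded text is decoded.
query = parse_qs(parts.query)
print(query)

# Building a query string escapes reserved characters such as '&'.
encoded = urlencode({"q": "a b&c", "page": 2})
print(encoded)

# Resolving a relative reference against the base URL.
resolved = urljoin(url, "../img/logo.png")
print(resolved)
```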
[Python-Dev] Re: It's now time to deprecate the stdlib urllib module
On Sun, 6 Feb 2022 at 14:15, Victor Stinner wrote:
> I propose to deprecate the urllib module in Python 3.11. It would emit
> a DeprecationWarning which warns users, so users should consider better
> alternatives like urllib3 or httpx: well known modules, better
> maintained, more secure, support HTTP/2 (httpx), etc.

Also, I'm -1 on deprecating as a way of saying we *might* remove the module, but haven't decided yet. That isn't (IMO) what deprecation is for, and it doesn't give users a clear message, as maybe they'll be fine continuing to rely on urllib. The net result would likely be for people to simply become more inclined to ignore deprecation warnings.

Conversely, if the idea is to deprecate, and then in a couple of years say "well, it's been deprecated for a while now, so let's remove it", then that seems to me to be a rather cynical way of deflecting arguments, as we can say now "well, it's only deprecation", in spite of the fact that the real intention is to remove.

Paul

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/ICHMNBE7PMOHCGXLT4REP2HJZAGSOCHJ/
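For readers unfamiliar with the mechanics under discussion, this is a minimal sketch of how a function emits a DeprecationWarning and how easily a caller can silence the whole category. `legacy_fetch` and both helper functions are hypothetical stand-ins, not anything in urllib.

```python
import warnings


def legacy_fetch(url):
    """Hypothetical stand-in for a deprecated stdlib function."""
    warnings.warn(
        "legacy_fetch() is deprecated; no removal is scheduled",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller's line
    )
    return f"fetched {url}"


def collect_warnings(func, *args):
    """Call func and return (result, categories of the warnings it raised)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # make sure nothing is filtered out
        result = func(*args)
    return result, [w.category for w in caught]


def call_ignoring_deprecations(func, *args):
    """Same call with the DeprecationWarning category silenced."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("ignore", DeprecationWarning)
        result = func(*args)
    return result, [w.category for w in caught]
```

The second helper is one filter line away from the first, which is roughly the concern raised above: a deprecation with no stated removal plan gives users little reason not to reach for the "ignore" filter.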
[Python-Dev] Re: It's now time to deprecate the stdlib urllib module
On 06/02/2022 15.08, Victor Stinner wrote:
> Hi,
>
> I propose to deprecate the urllib module in Python 3.11. It would emit
> a DeprecationWarning which warns users, so users should consider better
> alternatives like urllib3 or httpx: well known modules, better
> maintained, more secure, support HTTP/2 (httpx), etc.
>
> I don't propose to schedule its removal. Let's discuss the removal in
> 1 or 2 years.
>
> --
>
> urllib has many abstractions to support a wide range of protocols with
> "handlers": HTTP, HTTPS, FTP, "local file", proxy, HTTP
> authentication, HTTP Cookie, etc. A simple HTTP request using Basic
> Authentication requires 10-20 lines of code, whereas it should be a
> single line.
>
> Users (me included) don't like the urllib API, which is too complicated
> for common tasks.
>
> --
> [...]
>
> urllib is a package made of 4 parts:
>
> * urllib.request for opening and reading URLs
> * urllib.error containing the exceptions raised by urllib.request
> * urllib.parse for parsing URLs
> * urllib.robotparser for parsing robots.txt files
>
> I propose to deprecate all of them. Maybe the deprecation can be
> different for each sub-module?

Thanks for bringing this topic forward, Victor!

Disclaimer: I proposed the removal of urllib today in Python core's internal chat.

The urllib package -- and to some degree also the http package -- are a constant source of security bugs. The code is old and the parsers for HTTP and URLs don't handle edge cases well. Python core lacks a true maintainer of the code. To be honest, we have to admit defeat and be up front that urllib is not up to the task for this decade. It was designed and written during a more friendly, less scary time on the internet.

If I had the power and time, then I would replace urllib with a simpler, reduced HTTP client that uses the platform's HTTP library under the hood (WinHTTP on Windows, NSURLSession (?) on macOS, Web API for Emscripten, maybe curl on Linux/BSD). For non-trivial HTTP requests, httpx or aiohttp are much better suited than urllib.

The second best option is to reduce the feature set of urllib to core HTTP (no ftp, proxy, HTTP auth) and a partial rewrite with stricter, more standards-conformant parsers for urls, query strings, and RFC 2822 instead of RFC 822 for headers.

Christian

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/WYVETVHMGRS4CI47GTFY6W7B43YLSJH2/
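To show where the "10-20 lines" for Basic Authentication come from, here is a sketch of the handler-based route through urllib.request, next to the shortcut of building the Authorization header by hand. The URL and credentials are placeholders, and nothing is sent over the network; the code only prepares the opener and the request.

```python
import base64
from urllib.request import (
    HTTPBasicAuthHandler,
    HTTPPasswordMgrWithDefaultRealm,
    Request,
    build_opener,
)

# Placeholder values, not taken from the thread.
URL = "https://api.example.org/private/report"
USER, PASSWORD = "alice", "s3cret"


def opener_with_basic_auth(url, user, password):
    """Handler-based route: password manager -> auth handler -> opener."""
    manager = HTTPPasswordMgrWithDefaultRealm()
    # realm=None means "use these credentials for any realm at this URL".
    manager.add_password(None, url, user, password)
    handler = HTTPBasicAuthHandler(manager)
    return build_opener(handler)


def request_with_basic_auth(url, user, password):
    """Manual route: encode user:password and set the header directly."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return Request(url, headers={"Authorization": f"Basic {token}"})


opener = opener_with_basic_auth(URL, USER, PASSWORD)
request = request_with_basic_auth(URL, USER, PASSWORD)
# Nothing has been sent yet; opener.open(URL) or urlopen(request) would do that.
```

With the handler-based route the credentials are only sent after the server answers with a 401 challenge, whereas the manual header is sent up front, so the two are not exactly equivalent on the wire.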
[Python-Dev] Re: It's now time to deprecate the stdlib urllib module
That was just one example. Here are other places in the pip code base where urllib.request is used for more than the pathname functions; they are all vendored code or tests, but would still be disruptive to remove:

https://github.com/pypa/pip/blob/main/tests/lib/local_repos.py
https://github.com/pypa/pip/blob/main/src/pip/_vendor/webencodings/mklabels.py
https://github.com/pypa/pip/blob/main/src/pip/_vendor/requests/compat.py
https://github.com/pypa/pip/blob/main/src/pip/_vendor/distlib/compat.py

In particular, the vendored library (and the replacement you suggest), "requests", is very dependent on proxy functions such as "getproxies" that currently live in urllib.request. More than once I've had to go down the rabbit hole of seeing where those functions get that info for each platform.

Damian (he/him)

On Sun, Feb 6, 2022 at 11:10 AM Victor Stinner wrote:
> On Sun, Feb 6, 2022 at 3:48 PM Damian Shaw wrote:
> >
> > Pip vendors requests for network calls:
> > https://github.com/pypa/pip/tree/main/src/pip/_vendor/requests
> >
> > But still does depend on functions from urllib.parse and urllib.request
> > in many places:
> > https://github.com/pypa/pip/blob/main/src/pip/_internal/utils/urls.py
>
> Aha, it doesn't use urllib.request to open an HTTP connection, it only
> uses the pathname2url() and url2pathname() functions of urllib.request.
> Maybe we can keep these functions. I'm not sure why they don't belong
> to urllib.parse.
>
> If urllib.parse is widely used, maybe we can keep this module.
>
> Victor

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/ACA7AU4W6XB35PA6O4IYBPQSQD3HFLFS/
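As a small illustration of the proxy lookup mentioned here, the sketch below calls urllib.request.getproxies() with a temporarily overridden environment. The helper names and the proxy URL are made up; on platforms with system-level proxy settings, getproxies() consults those only when no `*_proxy` environment variable is set, so the environment is the easiest part to demonstrate.

```python
import os
from contextlib import contextmanager
from urllib.request import getproxies


@contextmanager
def temporary_env(**overrides):
    """Set environment variables for the duration of a with-block."""
    saved = {name: os.environ.get(name) for name in overrides}
    os.environ.update(overrides)
    try:
        yield
    finally:
        # Restore the previous values, removing variables that did not exist.
        for name, value in saved.items():
            if value is None:
                os.environ.pop(name, None)
            else:
                os.environ[name] = value


def proxies_seen_by_urllib(**env):
    """Return the scheme -> proxy URL mapping urllib would use."""
    with temporary_env(**env):
        return getproxies()


proxies = proxies_seen_by_urllib(http_proxy="http://proxy.example:3128")
print(proxies.get("http"))
```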
[Python-Dev] Re: It's now time to deprecate the stdlib urllib module
On Sun, Feb 06, 2022 at 03:08:40PM +0100, Victor Stinner wrote:
> I propose to deprecate the urllib module in Python 3.11. It would emit
> a DeprecationWarning which warns users, so users should consider better
> alternatives like urllib3 or httpx: well known modules, better
> maintained, more secure, support HTTP/2 (httpx), etc.
>
> I don't propose to schedule its removal. Let's discuss the removal in
> 1 or 2 years.

I am not certain that we can deprecate/remove the whole 'urllib' module without a good plan for replacing its facilities within the stdlib. There is heavy usage of urllib.parse in multiple projects (including in urllib3), and parse is semi-maintained.

> Let's come back to urllib:
> * Its API is too complicated
> * It doesn't support HTTP/2 nor HTTP/3
> * It's barely maintained: there are 121 open issues including 3 security
> issues!

I agree with all of these. I think that removing the old cruft code might lead us to close a number of open issues.

> The 3 open security issues:

Just because something is marked 'security' doesn't make it actionable. For instance, the last one asks for urllib to maintain client state to be safe against a scenario, which it never did.

I don't think it is time to deprecate the urllib module. It will be too disruptive IMO. So, -1.

Right now, I don't have a solution. My suggestion is that we close old bugs and remove old code (aka maintain a bit, and it falls on me too). Then we can probably chart out a deprecation / replacement path in a non-disruptive manner.

--
Senthil

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/ORQEJXJTZDYYV53MHKXTJ3Q6W72AUSGA/
[Python-Dev] Re: It's now time to deprecate the stdlib urllib module
On Sun, Feb 6, 2022 at 3:48 PM Damian Shaw wrote:
>
> Pip vendors requests for network calls:
> https://github.com/pypa/pip/tree/main/src/pip/_vendor/requests
>
> But still does depend on functions from urllib.parse and urllib.request in
> many places:
> https://github.com/pypa/pip/blob/main/src/pip/_internal/utils/urls.py

Aha, it doesn't use urllib.request to open an HTTP connection, it only uses the pathname2url() and url2pathname() functions of urllib.request. Maybe we can keep these functions. I'm not sure why they don't belong to urllib.parse.

If urllib.parse is widely used, maybe we can keep this module.

Victor

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/PDFGPDGESBLSBHVLINCPAFEOHXQWFIRI/
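For reference, a minimal sketch of what the two functions named here do: pathname2url() turns a local path into the path part of a URL, and url2pathname() reverses it. The file name is invented; a relative path is used on purpose, because the exact output for absolute paths depends on the platform and the Python version.

```python
import os
from urllib.request import pathname2url, url2pathname

# A relative path built with the platform's own separator (invented name).
local_path = os.path.join("downloads", "my package-1.0.tar.gz")

url_path = pathname2url(local_path)  # percent-encodes the space
round_trip = url2pathname(url_path)  # decodes back to a native path

print(url_path)
print(round_trip == local_path)
```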
[Python-Dev] Re: I want to contribute to Python.
On Sun, Feb 6, 2022 at 3:33 PM Ezekiel Adetoro wrote:
> Hello,
> My name is Ezekiel, and it is my desire to start contributing to Python, be
> part of the core development of Python. I have forked the CPython and cloned
> it. What is the next step I need to do?

Welcome Ezekiel!

I suggest you start by reading https://devguide.python.org/ and join the core-mentorship mailing list, which is the best place for such questions!
https://mail.python.org/mailman3/lists/core-mentorship.python.org/

Victor
--
Night gathers, and now my watch begins. It shall not end until my death.

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/R63GLDZQ72USFPJDOZD73DNRUTWNUDHC/
[Python-Dev] Re: I want to contribute to Python.
On 2022-02-06 13:18, Ezekiel Adetoro wrote:
> Hello,
> My name is Ezekiel, and it is my desire to start contributing to Python, be
> part of the core development of Python. I have forked the CPython and cloned
> it. What is the next step I need to do?

Look on the issue tracker for a bug that you can fix.

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/HEA7LYZLM5Q6KSURG2PG7PBNKOR37RM7/
[Python-Dev] Re: It's now time to deprecate the stdlib urllib module
Pip vendors requests for network calls:
https://github.com/pypa/pip/tree/main/src/pip/_vendor/requests

But it still depends on functions from urllib.parse and urllib.request in many places:
https://github.com/pypa/pip/blob/main/src/pip/_internal/utils/urls.py

Damian (he/him)

On Sun, Feb 6, 2022 at 9:36 AM Dong-hee Na wrote:
> I am not an expert on pip, but won't installing the pip module become
> a problem once CPython removes the urllib module from the stdlib?
>
> Warm regards,
> Dong-hee
>
> On Sun, Feb 6, 2022 at 11:13 PM, Victor Stinner wrote:
>
>> Hi,
>>
>> I propose to deprecate the urllib module in Python 3.11. It would emit
>> a DeprecationWarning which warns users, so users should consider better
>> alternatives like urllib3 or httpx: well known modules, better
>> maintained, more secure, support HTTP/2 (httpx), etc.
>>
>> I don't propose to schedule its removal. Let's discuss the removal in
>> 1 or 2 years.
>>
>> --
>>
>> urllib has many abstractions to support a wide range of protocols with
>> "handlers": HTTP, HTTPS, FTP, "local file", proxy, HTTP
>> authentication, HTTP Cookie, etc. A simple HTTP request using Basic
>> Authentication requires 10-20 lines of code, whereas it should be a
>> single line.
>>
>> Users (me included) don't like the urllib API, which is too complicated
>> for common tasks.
>>
>> --
>>
>> Unhappy users created multiple better alternatives to the stdlib urllib
>> module.
>>
>> In 2008, the "urllib3" module was created to provide an API designed
>> to be as simple as possible for the most common HTTP and HTTPS
>> requests. Example:
>>
>>     req = http.request('GET', 'http://httpbin.org/robots.txt')
>>
>> In 2011, the "requests" module based on urllib3 was created.
>>
>> In 2013, the "aiohttp" module based on asyncio was created.
>>
>> In 2015, the new "httpx" module was created:
>>
>>     req = httpx.get('https://www.example.org/')
>>
>> Not only does httpx have a regular "synchronous" API (blocking function
>> calls), but it also has an asynchronous API!
>>
>> Sadly, while HTTP/3 is being developed, it seems like in this list,
>> httpx is the only HTTP client library supporting HTTP/2 currently :-(
>>
>> For HTTP/2, I also found the "httplib2" module.
>>
>> For HTTP/3, I found the "http3" and "aioquic" modules.
>>
>> --
>>
>> Let's come back to urllib:
>>
>> * Its API is too complicated
>> * It doesn't support HTTP/2 nor HTTP/3
>> * It's barely maintained: there are 121 open issues including 3 security
>>   issues!
>>
>> The 3 open security issues:
>>
>> * bpo-33661 open in 2018;
>> * bpo-36338 open in 2019;
>> * bpo-45795 open in 2021.
>>
>> Usually, it's bad when you refer to an open security issue by its
>> creation year :-(
>>
>> The urllib module has a long history of security vulnerabilities. List
>> of *fixed* vulnerabilities:
>>
>> * 2011 (bpo-11662):
>>   https://python-security.readthedocs.io/vuln/urllib-redirect.html
>> * 2017 (bpo-30119):
>>   https://python-security.readthedocs.io/vuln/urllib-ftp-stream-injection.html
>> * 2017 (bpo-30500):
>>   https://python-security.readthedocs.io/vuln/urllib-connects-wrong-host.html
>> * 2019 (bpo-35907):
>>   https://python-security.readthedocs.io/vuln/urllib-local-file-scheme.html
>> * 2019 (bpo-38826):
>>   https://python-security.readthedocs.io/vuln/urllib-basic-auth-regex.html
>> * 2021 (bpo-42967):
>>   https://python-security.readthedocs.io/vuln/urllib-query-string-semicolon-separator.html
>> * 2021 (bpo-43075):
>>   https://python-security.readthedocs.io/vuln/urllib-basic-auth-regex2.html
>> * 2021 (bpo-44022):
>>   https://python-security.readthedocs.io/vuln/urllib-100-continue-loop.html
>>
>> urllib is a package made of 4 parts:
>>
>> * urllib.request for opening and reading URLs
>> * urllib.error containing the exceptions raised by urllib.request
>> * urllib.parse for parsing URLs
>> * urllib.robotparser for parsing robots.txt files
>>
>> I propose to deprecate all of them. Maybe the deprecation can be
>> different for each sub-module?
>>
>> Victor
>> --
>> Night gathers, and now my watch begins. It shall not end until my death.
>>
>> Message archived at
>> https://mail.python.org/archives/list/python-dev@python.org/message/EZ6O7MOPZ4GA75MKTDO7LAELKXUHK2QS/
>
> Message archived at
> https://mail.python.org/archives/list/python-dev@python.org/message/E6GN2THYCNQ2Q3CGMSH7GRCDFOOFDDCQ/
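Of the four submodules listed in the quoted proposal, urllib.robotparser is probably the least familiar, so here is a minimal offline sketch of it. The robots.txt content, the bot name and the URLs are invented, and the rules are fed to the parser directly instead of being fetched.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt content; nothing is fetched from the network.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Rules are checked in order, so the Disallow line wins for /private/.
public_ok = parser.can_fetch("demo-bot", "https://example.org/index.html")
private_ok = parser.can_fetch("demo-bot", "https://example.org/private/data")
print(public_ok, private_ok)
```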
[Python-Dev] Re: It's now time to deprecate the stdlib urllib module
Speaking from anecdotal experience, "urllib.parse" is a very popular and heavily depended-upon module; I would be shocked if removing it were not very disruptive. In fact, a quick search of the replacement modules you mention shows that they all rely on it. Here is an example from each:

* requests: https://github.com/psf/requests/blob/99b3b492418d0751ca960178d274f89805095e4c/requests/sessions.py#L121
* aiohttp: https://github.com/aio-libs/aiohttp/blob/7d78fd01dbe983d119141d7f2775aefd42494f99/aiohttp/formdata.py#L129
* httpx: https://github.com/encode/httpx/blob/b7dc0c3df68279ce89f016a69a41b27a2346d54d/httpx/_content.py#L144

As for "urllib.request", I know that the philosophy of Python being a "batteries included" language is going away, but having no way to make any HTTP call without installing a third-party package would make Python more difficult to use in a lot of situations. Could it not always emit a warning that this library should not be used in a production environment? Much in the same way that Flask's default web server does.

Damian (he/him)
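As a rough illustration of the kind of everyday "urllib.parse" usage that third-party clients lean on, here is a minimal sketch. The helper names describe_url and build_query and the example URL are hypothetical and chosen only for this illustration; they are not taken from requests, aiohttp, or httpx.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def describe_url(url):
    """Split a URL into the pieces most HTTP clients need."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "path": parts.path,
        # parse_qs maps each key to a list, since keys may repeat.
        "query": parse_qs(parts.query),
    }


def build_query(params):
    """Encode a mapping as an application/x-www-form-urlencoded string."""
    return urlencode(params)


if __name__ == "__main__":
    print(describe_url("https://example.org/search?q=python&page=2"))
    print(build_query({"q": "a b", "n": 1}))
```

Hand-rolled regexes tend to get the percent-encoding and repeated-key cases wrong, which is the gap the stdlib parser fills.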
[Python-Dev] Re: It's now time to deprecate the stdlib urllib module
Strong -1 from me. urllib.request may not be "best practice", but it's still extremely useful for simple situations, and urllib.parse is useful for basic handling of URLs. Yes, the more complex aspects of urllib are better handled by external packages, but that's not a sufficient argument for removing the package altogether. There are many situations where external dependencies are unsuitable.

Also, there's quite a lot of usage of urllib in the stdlib itself - how would you propose to replace that?

In addition, pip relies pretty heavily on urllib (parse and request), and pip has a bootstrapping issue, so using 3rd party libraries is non-trivial. Also, of pip's existing vendored dependencies, webencodings, urllib3, requests, pkg_resources, packaging, html5lib, distlib and cachecontrol all import urllib. So this would be *hugely* disruptive to the whole packaging ecosystem (which is under-resourced at the best of times, so this would put a lot of strain on us).

In any case, why is this being proposed as a simple posting on python-dev? There's already PEP 594 for removals from the stdlib. If you have a case for removing urllib, I suggest you get it added to PEP 594, so it can be discussed and agreed properly, along with the other removals (none of which is remotely as controversial as urllib, so there's absolutely no doubt in my mind that this would need a PEP however it was proposed).

Paul
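For readers wondering what the "simple situations" look like in practice, here is a minimal sketch of fetching a resource with urllib.request alone. The function name fetch_bytes and its allowed_schemes parameter are hypothetical, introduced only for this example; the scheme check is one possible precaution, not something urllib does for you.

```python
import urllib.request
from urllib.parse import urlsplit


def fetch_bytes(url, timeout=10.0, allowed_schemes=("http", "https")):
    """Return the body of `url`, refusing schemes that were not allowed.

    urlopen() also understands file:, ftp: and data: URLs, so the
    scheme is checked explicitly before the request is made.
    """
    scheme = urlsplit(url).scheme.lower()
    if scheme not in allowed_schemes:
        raise ValueError(f"refusing to open URL with scheme {scheme!r}")
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


if __name__ == "__main__":
    # A data: URL keeps the demonstration offline.
    print(fetch_bytes("data:text/plain;charset=utf-8,hello",
                      allowed_schemes=("data",)))
```

With the default allowed_schemes, the same call works for ordinary http and https downloads without any third-party dependency.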
[Python-Dev] Re: It's now time to deprecate the stdlib urllib module
I am not an expert on pip, but won't installing the pip module become a problem once CPython removes the urllib module from the stdlib?

Warm regards,
Dong-hee

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/E6GN2THYCNQ2Q3CGMSH7GRCDFOOFDDCQ/
[Python-Dev] I want to contribute to Python.
Hello,

My name is Ezekiel, and I would like to start contributing to Python and become part of its core development. I have forked the CPython repository and cloned it. What is the next step?

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/NI7AVGSVM2ZMATCH5GFIHQS65D43YQ47/
[Python-Dev] It's now time to deprecate the stdlib urllib module
Hi,

I propose to deprecate the urllib module in Python 3.11. It would emit a DeprecationWarning which warns users, so users should consider better alternatives like urllib3 or httpx: well known modules, better maintained, more secure, support HTTP/2 (httpx), etc.

I don't propose to schedule its removal. Let's discuss the removal in 1 or 2 years.

--

urllib has many abstractions to support a wide range of protocols with "handlers": HTTP, HTTPS, FTP, "local file", proxy, HTTP authentication, HTTP Cookie, etc. A simple HTTP request using Basic Authentication requires 10-20 lines of code, whereas it should be a single line.

Users (me included) don't like the urllib API, which is too complicated for common tasks.

--

Unhappy users created multiple better alternatives to the stdlib urllib module.

In 2008, the "urllib3" module was created to provide an API designed to be as simple as possible for the most common HTTP and HTTPS requests. Example:

    req = http.request('GET', 'http://httpbin.org/robots.txt')

In 2011, the "requests" module based on urllib3 was created.

In 2013, the "aiohttp" module based on asyncio was created.

In 2015, the new "httpx" module was created:

    req = httpx.get('https://www.example.org/')

Not only does httpx have a regular "synchronous" API (blocking function calls), but it also has an asynchronous API!

Sadly, while HTTP/3 is being developed, it seems like in this list, httpx is the only HTTP client library supporting HTTP/2 currently :-(

For HTTP/2, I also found the "httplib2" module.

For HTTP/3, I found the "http3" and "aioquic" modules.

--

Let's come back to urllib:

* Its API is too complicated
* It supports neither HTTP/2 nor HTTP/3
* It's barely maintained: there are 121 open issues including 3 security issues!

The 3 open security issues:

* bpo-33661, opened in 2018;
* bpo-36338, opened in 2019;
* bpo-45795, opened in 2021.

Usually, it's bad when you refer to an open security issue by its creation year :-(

The urllib module has a long history of security vulnerabilities. List of *fixed* vulnerabilities:

* 2011 (bpo-11662): https://python-security.readthedocs.io/vuln/urllib-redirect.html
* 2017 (bpo-30119): https://python-security.readthedocs.io/vuln/urllib-ftp-stream-injection.html
* 2017 (bpo-30500): https://python-security.readthedocs.io/vuln/urllib-connects-wrong-host.html
* 2019 (bpo-35907): https://python-security.readthedocs.io/vuln/urllib-local-file-scheme.html
* 2019 (bpo-38826): https://python-security.readthedocs.io/vuln/urllib-basic-auth-regex.html
* 2021 (bpo-42967): https://python-security.readthedocs.io/vuln/urllib-query-string-semicolon-separator.html
* 2021 (bpo-43075): https://python-security.readthedocs.io/vuln/urllib-basic-auth-regex2.html
* 2021 (bpo-44022): https://python-security.readthedocs.io/vuln/urllib-100-continue-loop.html

urllib is a package made of 4 parts:

* urllib.request for opening and reading URLs
* urllib.error containing the exceptions raised by urllib.request
* urllib.parse for parsing URLs
* urllib.robotparser for parsing robots.txt files

I propose to deprecate all of them. Maybe the deprecation can be different for each sub-module?

Victor
--
Night gathers, and now my watch begins. It shall not end until my death.

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/EZ6O7MOPZ4GA75MKTDO7LAELKXUHK2QS/
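To make the remark about Basic Authentication concrete, here is a sketch of what that boilerplate can look like with urllib.request. The function names build_basic_auth_opener and fetch_with_basic_auth, the URL, and the credentials are hypothetical placeholders; this is one common arrangement of the handler classes, not the only one.

```python
import urllib.request


def build_basic_auth_opener(top_level_url, username, password):
    """Build an opener that answers HTTP Basic Authentication challenges."""
    # The password manager maps URL prefixes to credentials; a realm of
    # None means "use these credentials for any realm".
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    password_mgr.add_password(None, top_level_url, username, password)
    handler = urllib.request.HTTPBasicAuthHandler(password_mgr)
    return urllib.request.build_opener(handler)


def fetch_with_basic_auth(url, username, password, timeout=10.0):
    """Fetch `url`, sending credentials when the server asks for them."""
    opener = build_basic_auth_opener(url, username, password)
    with opener.open(url, timeout=timeout) as response:
        return response.read()


if __name__ == "__main__":
    # Building the opener needs no network; only opener.open() does.
    opener = build_basic_auth_opener("https://example.org/", "user", "secret")
    print([type(h).__name__ for h in opener.handlers])
```

The third-party clients mentioned above typically reduce this to a single call that takes the credentials as an argument, which is the contrast being drawn.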