[Python-Dev] Re: It's now time to deprecate the stdlib urllib module
> Why do you think the stdlib *must* provide an example implementation
> for this specific scenario? Is there something unique to HTTP request
> handling that you feel is important to demonstrate?

*must* is too strong, but I would use a very strong *should*.

I think the stdlib should provide simple source-included examples of most
things. I think the case is even stronger when it is:

(1) a fairly simple protocol (such as version 1 of http was) -- QUIC
wouldn't count for a simple demonstration.

(2) something new users are likely to find motivating. Short of "here is a
way to do IO", and maybe "write a simple game", "get something from the web"
is probably the most obvious case.

(3) something where bootstrapping might be an issue (network protocols,
particularly web downloads). Network access is not an always-available
resource. Even when it is available, there is sometimes a barrier between
"available in python" and "I could read it on my phone, but can't get it
open in python".

(4) something where a beginner is likely to be overwhelmed by choices if we
just say "use a 3rd party module".

(5) something with a backwards-compatibility story in the stdlib already.

As a side note, are there concerns about urllib.robotparser being broken or
obsolete, or was that part of the deprecation proposal just contagion from
urllib.request?

-jJ
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/HF5V6SFWV4BZUAOJTSEBD6DSZWSJONAM/
Code of Conduct: http://python.org/psf/codeofconduct/
[Python-Dev] Re: It's now time to deprecate the stdlib urllib module
On Mon, Feb 7, 2022 at 5:51 PM Jim J. Jewett wrote:

> There are problems with urllib. With hindsight, it would have been nice
> to do a few things differently. But that doesn't make migrating away from
> it any easier.
>
> This thread has mentioned several "better" alternatives -- but with the
> exception of 3rd party Requests, the docs don't even mention them.

And as soon as httpx hits 1.0 I plan to update the docs to point at it. But
until that occurs I personally do not want to have a debate about whether
httpx's 0.N version number means it shouldn't be recommended.

> Saying "You can do better, but we won't tell you how" is pretty rude to
> beginners, and we should not do it.
>
> Delegating to the operating system may be sensible for a production
> system, and there is nothing wrong with saying so in the docs, and it would
> be great if we made that easy. But it is absolutely not a reasonable
> replacement for a straightforward (possibly inefficient and non-scalable)
> implementation written in python that people can read and use for
> reference. urllib shouldn't be deprecated until we have a better solution
> to *that* use case that is also in the stdlib. (That might well be worth
> doing, but it should happen before the deprecation.)

Why do you think the stdlib *must* provide an example implementation for
this specific scenario? Is there something unique to HTTP request handling
that you feel is important to demonstrate?

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/XNYPBSXW7DIBQN5YLXCWUOBLIEBRMPEP/
[Python-Dev] Re: It's now time to deprecate the stdlib urllib module
On Wed, 9 Feb 2022 at 04:50, Christopher Barker wrote:

> So my thoughts:
>
> Rather than deprecate urllib, we refactor it a bit (and maybe deprecate parts
> of it), so that it:
>
> 1) contains the core building blocks: e.g. urllib.parse with which to build
> "better" libraries,
>
> 2) make the "easy stuff easy" -- e.g. a basic http: request.
>    - For instance, I'd like to see an API that's kind of "requests-lite"
>
> And much better docs explaining when you should use it, and when you might
> want to look for another library (even if it's the stdlib http.client)

This sounds like a decent plan. I'd like to add my voice to the appeal to
keep urllib.parse; in fact, of all the places where I've used anything from
urllib, only two of them are anything other than urllib.parse. (One is an
old script that I specifically wanted to be as shareable as possible, so I
restricted it to the stdlib; the other catches urllib.error.URLError thrown
by a third-party library.) If there are security issues with
urllib.request, I wouldn't shed many tears about its deprecation.

A "requests-lite" module would certainly be handy, but it's hard to judge
how much wants to be in the stdlib and how much can be pushed off to a
pip-installable module:

> the first thing I do for beginners is to point them to requests, as it's
> easier to use :-)

Exactly my thoughts :)

But a very very simple HTTP/HTTPS GET request endpoint would be a great
bootstrapping aid. Consider: with nothing but the stdlib, you could fetch a
file from some server, unzip it (zipfile module), and import it. For
building dirt-simple install scripts, this kind of thing is really REALLY
handy, and I'd rather not have to use plain TCP sockets to do it :)

ChrisA

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/VVGKL2TA3UXDD3RDASIYSGEOLMTKPOSH/
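A minimal sketch of the fetch / unzip / import flow described above, using only the stdlib. The function name `fetch_and_import`, the temporary-directory layout, and the file name `payload.zip` are illustrative assumptions, not an existing API. It runs whatever code it downloads, so it is only reasonable for sources you already trust.

```python
import importlib
import sys
import tempfile
import urllib.request
import zipfile
from pathlib import Path


def fetch_and_import(url, module_name):
    """Download a zip archive from `url`, unpack it, and import `module_name`.

    Hypothetical helper for illustration; it performs no integrity checks.
    """
    target = Path(tempfile.mkdtemp(prefix="bootstrap-"))
    archive = target / "payload.zip"

    # urlopen() accepts http:, https:, and file: URLs alike.
    with urllib.request.urlopen(url) as response:
        archive.write_bytes(response.read())

    # Unpack into a sibling directory, then make that directory importable.
    unpacked = target / "unpacked"
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(unpacked)
    sys.path.insert(0, str(unpacked))
    importlib.invalidate_caches()
    return importlib.import_module(module_name)
```

With a real server this would be called as, say, `fetch_and_import("https://example.org/tool.zip", "tool")`, where both the URL and the module name are placeholders.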
[Python-Dev] Re: It's now time to deprecate the stdlib urllib module
On Tue, Feb 8, 2022 at 1:31 AM Marc-Andre Lemburg wrote:

> FWIW: I find this discussion a bit strange. Python's stdlib is
> supposed to provide basic tooling for a breadth of use cases,
> with emphasis on "basic" and "breadth".
>
> urllib is such a basic library and covers one of the main
> use cases for Python we have.

Exactly. However, it is also a bit of an "attractive nuisance". For example,
there is a lot of code in some of my major projects that uses urllib for more
complex cases, where we'd be much better off with requests, or ...

Yes, that's mainly the result of our team's atrocious lack of code review,
but this code was written by smart productive people.

The fact is that the stdlib is the first place folks look for stuff, and if
what you are looking for is there, then many people won't think: "maybe
there's a better, and well supported, package on PyPI for this"

So my thoughts:

Rather than deprecate urllib, we refactor it a bit (and maybe deprecate
parts of it), so that it:

1) contains the core building blocks: e.g. urllib.parse with which to build
"better" libraries,

2) make the "easy stuff easy" -- e.g. a basic http: request.
   - For instance, I'd like to see an API that's kind of "requests-lite"

And much better docs explaining when you should use it, and when you might
want to look for another library (even if it's the stdlib http.client)

I note that I don't see any discussion of that in the urllib docs, whereas
http.client does have the suggestion front and center that you might want to
use requests.

Yes, the web moves fast, but it's also pretty backward compatible - folks
keep old browsers around for an astonishingly long time! So I'd think the
CPython release cycle should be able to keep up with all but the very latest.

> make Python less attractive and less useful for beginners.

On this point, I'm not so sure -- the first thing I do for beginners is to
point them to requests, as it's easier to use :-) -- but see my point above,
that's why it would be good to put an easy-to-use-for-the-basics API in the
stdlib

-CHB

--
Christopher Barker, PhD (Chris)
Python Language Consulting
- Teaching
- Scientific Software Development
- Desktop GUI and Web Development
- wxPython, numpy, scipy, Cython

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/FLU522SFBBSMMXHRDBXYYGXD3IQ2CD6K/
[Python-Dev] Re: It's now time to deprecate the stdlib urllib module
... and there are also plenty of examples out there of using http.server as
a quick HTTP server for trying out new things, testing and teaching.

FWIW: I find this discussion a bit strange. Python's stdlib is
supposed to provide basic tooling for a breadth of use cases,
with emphasis on "basic" and "breadth".

urllib is such a basic library and covers one of the main
use cases for Python we have.

It would be pretty much beside the point of the stdlib to remove such basic
functionality and make Python less attractive and less useful for beginners.

Anything more complex can be dealt with on PyPI, as is already happening.

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Experts (#1, Feb 08 2022)
>>> Python Projects, Coaching and Support ... https://www.egenix.com/
>>> Python Product Development ... https://consulting.egenix.com/

::: We implement business ideas - efficiently in both time and costs :::

eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
Registered at Amtsgericht Duesseldorf: HRB 46611
https://www.egenix.com/company/contact/
https://www.malemburg.com/

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/AKWL7QKOZJDODM3LIX4BBYZ4HMXZC3CZ/
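As a rough illustration of the "quick HTTP server for testing" use mentioned above, the sketch below starts an `http.server` instance on a loopback port and fetches from it with `urllib.request`. The handler class `HelloHandler`, the helper `fetch_from_local_server`, and the response text are made up for the example; proxies are disabled explicitly on the assumption that a loopback request should never go through one.

```python
import http.server
import threading
import urllib.request


class HelloHandler(http.server.BaseHTTPRequestHandler):
    """Answer every GET with a fixed plain-text body (illustrative only)."""

    def do_GET(self):
        body = b"hello from http.server\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet instead of logging to stderr


def fetch_from_local_server():
    # Port 0 asks the OS for any free port; the server listens on loopback only.
    server = http.server.HTTPServer(("127.0.0.1", 0), HelloHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        # An empty ProxyHandler mapping bypasses any proxy set in the environment.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        url = f"http://127.0.0.1:{server.server_port}/"
        with opener.open(url, timeout=5) as response:
            return response.status, response.read()
    finally:
        server.shutdown()
        server.server_close()
```

The same pattern works as a test fixture: start the server in setup, point the code under test at `server.server_port`, and shut it down in teardown.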
[Python-Dev] Re: It's now time to deprecate the stdlib urllib module
There are problems with urllib. With hindsight, it would have been nice to
do a few things differently. But that doesn't make migrating away from it
any easier.

This thread has mentioned several "better" alternatives -- but with the
exception of 3rd party Requests, the docs don't even mention them.

Saying "You can do better, but we won't tell you how" is pretty rude to
beginners, and we should not do it.

Delegating to the operating system may be sensible for a production system,
and there is nothing wrong with saying so in the docs, and it would be great
if we made that easy. But it is absolutely not a reasonable replacement for
a straightforward (possibly inefficient and non-scalable) implementation
written in python that people can read and use for reference. urllib
shouldn't be deprecated until we have a better solution to *that* use case
that is also in the stdlib. (That might well be worth doing, but it should
happen before the deprecation.)

-jJ

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/JI5CFS3WYXQEXKSEZH2ZTE3JJJ7AUAMW/
[Python-Dev] Re: It's now time to deprecate the stdlib urllib module
On Mon, Feb 7, 2022 at 4:56 AM Steve Dower wrote:

> On 2/6/2022 4:44 PM, Christian Heimes wrote:
> > If I had the power and time, then I would replace urllib with a simpler,
> > reduced HTTP client that uses the platform's HTTP library under the hood
> > (WinHTTP on Windows, NSURLSession (?) on macOS, Web API for Emscripten,
> > maybe curl on Linux/BSD). For non-trivial HTTP requests, httpx or
> > aiohttp are much better suited than urllib.
>
> I'm +1 on this, though I think it would have to be in place before the
> "two releases until removal" kicked in for urllib.request.

Yes, we definitely couldn't deprecate anything regarding downloading over
HTTP w/o having a replacement in place. I am not even considering
deprecating urllib.parse.

> The stdlib can't get by without at least the basic functionality of curl
> built in natively. But we can do this on most platforms without
> vendoring OpenSSL, which is a HUGE win. Then our default behaviour could
> correctly use proxies (including auto-config), CA certificate bundles,
> integrated authentication, and other OS features that are currently
> ignored by our core.

I also agree this is the best of the 2 options, although I would also accept
Christian's other option of a more targeted, tight, standards-compliant
solution if that would somehow lead to less maintenance overhead. And when I
say "less maintenance overhead," I really mean it: I would question whether
following redirects as an option is worth the overhead in this scenario. I'm
very much thinking of this from a bootstrap/script/learning scenario and
pushing people towards e.g. httpx for anything fancier.

> Chances are we could keep simple urlopen() calls in place, and use the
> deprecation as a "potential change of behaviour" without necessarily
> having to break the API. I'm yet to come across a case where making a
> trivial urlopen() call _better_ would break things (the cases I've seen
> that would break are things like "using an OpenSSL environment variable
> to configure something that I wish had been automatic").

We could try to get fancy and only raise DeprecationWarning in cases where
things won't work, to extend the time before we consider pushing people to
the better API.

> The nature of network/internet access is that we have to break things
> periodically anyway, because all the code that was written over the last
> 30+ years is eventually going to be found to be exploitable. I'd be
> quite happy to say "Python gives you what your OS gives you; update the
> OS for fixes".

Exactly. My guideline for this whole idea would be that if it doesn't make
sense in a beginner course that says to "download an HTML page and count all
the anchor tags," then it's too fancy for the stdlib. And that should be
enough to bootstrap installers which then get you httpx. Otherwise the
networking stack moves too fast (from a security POV) and requires unique
knowledge to get right, and we have simply not kept up as much as we would
like. I think it's okay to admit it might be time to trim this part of the
stdlib down to something that we can manage easily (but we *cannot* drop the
ability to download something over HTTPS).

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/VP2WXOBWPGAX7UIH25DWRSYWFEDNINNU/
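For a sense of scale, the beginner exercise used as the yardstick above ("download an HTML page and count all the anchor tags") fits in a few lines of today's stdlib. This is only a sketch: `AnchorCounter`, `count_anchor_tags`, and `count_anchors_at` are names invented here, and falling back to UTF-8 when the server declares no charset is an assumption.

```python
import urllib.request
from html.parser import HTMLParser


class AnchorCounter(HTMLParser):
    """Count opening <a> tags; HTMLParser lowercases tag names for us."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.count += 1


def count_anchor_tags(html_text):
    parser = AnchorCounter()
    parser.feed(html_text)
    parser.close()
    return parser.count


def count_anchors_at(url):
    """Fetch `url` and count its anchor tags (needs network access)."""
    with urllib.request.urlopen(url, timeout=10) as response:
        # Assumption: default to UTF-8 if the response declares no charset.
        charset = response.headers.get_content_charset() or "utf-8"
        text = response.read().decode(charset, errors="replace")
    return count_anchor_tags(text)
```

Keeping the parsing separate from the download means the counting logic can be exercised on a plain string, with no server involved.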
[Python-Dev] Re: It's now time to deprecate the stdlib urllib module
My two cents:

1. I’ve grepped ("ag-ed" actually) through all my code bases, and as noted
by others, urllib.parse is used in many places. urllib.request never is, as
we've been using requests or httpx instead. IMHO and for the context I'm
using it (YMMV), urllib.parse is useful and should be kept (could be
replaced by third-party libs like furl or yarl, but probably not a great
idea).

2. Obviously bootstrapping pip or similar package management tools should be
a concern.

3. Overall, I think the days when "battery included" was a positive argument
are over. I'd rather make the standard library leaner, and focussed on core
language constructs. The only advantage that I can see is that having stuff
in the standard lib can reduce fragmentation, at least initially, and ensure
a very high level of quality and support, but history has shown us that at
some point a better alternative usually ends up emerging.

4. When deprecating and removing stuff from the stdlib and if there are no
dependency issues, it should be possible to move the components to their own
dedicated packages, maybe under an "extra" or "legacy" organisation. The
question of support stays open, though.

Regards,

S.

On 6 Feb 2022 at 23:39:55, Gregory P. Smith wrote:

> [...]
>
> This would need to be its own PEP. urllib et al. are used by virtually
> everybody. They're highly used batteries.
>
> I'm -1 on deprecating it for that reason alone.
>
> Christian proposes that having a simpler scope rewrite of it might be
> nice, but I think disruption to the world and loss of trust in Python would
> be similar either way.
>
> -gps
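To make the urllib.parse usage from point 1 concrete, here is a small sketch of the kind of URL handling that would otherwise tend to be done with home-grown regexes. The helper names `describe_url` and `with_query` are invented for this example; the `urllib.parse` functions they call are the existing stdlib ones.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def describe_url(url):
    """Split a URL into its main components without any regexes."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,   # lowercased, without userinfo or port
        "port": parts.port,       # int, or None if absent
        "path": parts.path,       # left percent-encoded
        "query": parse_qs(parts.query),
    }


def with_query(url, **params):
    """Return `url` with its query string replaced by `params`, escaped safely."""
    parts = urlsplit(url)
    return urlunsplit(parts._replace(query=urlencode(params)))
```

For example, `with_query("https://example.org/search?old=1", q="a b&c")` escapes the space and the ampersand instead of producing a second, bogus parameter.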
[Python-Dev] Re: It's now time to deprecate the stdlib urllib module
On 2/6/2022 4:44 PM, Christian Heimes wrote:
> If I had the power and time, then I would replace urllib with a simpler,
> reduced HTTP client that uses the platform's HTTP library under the hood
> (WinHTTP on Windows, NSURLSession (?) on macOS, Web API for Emscripten,
> maybe curl on Linux/BSD). For non-trivial HTTP requests, httpx or
> aiohttp are much better suited than urllib.

I'm +1 on this, though I think it would have to be in place before the
"two releases until removal" kicked in for urllib.request.

The stdlib can't get by without at least the basic functionality of curl
built in natively. But we can do this on most platforms without
vendoring OpenSSL, which is a HUGE win. Then our default behaviour could
correctly use proxies (including auto-config), CA certificate bundles,
integrated authentication, and other OS features that are currently
ignored by our core.

Chances are we could keep simple urlopen() calls in place, and use the
deprecation as a "potential change of behaviour" without necessarily
having to break the API. I'm yet to come across a case where making a
trivial urlopen() call _better_ would break things (the cases I've seen
that would break are things like "using an OpenSSL environment variable
to configure something that I wish had been automatic").

The nature of network/internet access is that we have to break things
periodically anyway, because all the code that was written over the last
30+ years is eventually going to be found to be exploitable. I'd be
quite happy to say "Python gives you what your OS gives you; update the
OS for fixes".

Cheers,
Steve

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/2P3RL7PAOZZFZ7PRGO6FJRMKR6MM2VXH/
[Python-Dev] Re: It's now time to deprecate the stdlib urllib module
> Christian proposes that having a simpler scope rewrite of it might be nice,
> but I think disruption to the world and loss of trust in Python would be
> similar either way.

Please don't remove urllib. There are mountains of code that rely on it.

A much better idea, IMO, would be to add a new modern API to http.client,
where http functionality properly belongs. Maybe a function signature like
this:

http.client.get(url, user_agent=None, basic_auth=(None, None), custom_headers=None)

That would be one line covering many basic use cases, including user agent
and basic auth.

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/EV7R35OMQ7QWY7Y744FX7Y7VI7AO5CWX/
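No `http.client.get()` exists today; purely as a sketch of what the proposed one-liner might look like, here is a module-level `get()` built on the existing `urllib.request` machinery. The parameter names follow the signature suggested above, except that `basic_auth` defaults to `None` rather than `(None, None)`; that change, the `timeout` parameter, and the `build_request` helper are assumptions made for the example.

```python
import base64
import urllib.request


def build_request(url, user_agent=None, basic_auth=None, custom_headers=None):
    """Assemble a GET request; hypothetical helper, not a stdlib API."""
    request = urllib.request.Request(url, method="GET")
    for name, value in (custom_headers or {}).items():
        request.add_header(name, value)
    if user_agent is not None:
        request.add_header("User-Agent", user_agent)
    if basic_auth is not None:
        user, password = basic_auth
        # HTTP Basic auth is base64("user:password") in an Authorization header.
        token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
        request.add_header("Authorization", f"Basic {token}")
    return request


def get(url, user_agent=None, basic_auth=None, custom_headers=None, timeout=10):
    """Fetch `url` and return the response body as bytes."""
    request = build_request(url, user_agent, basic_auth, custom_headers)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

A call would then read `get("https://example.org/", user_agent="demo/0.1", basic_auth=("user", "pass"))`, with the URL and credentials as placeholders. Note that Basic auth only makes sense over HTTPS, since the credentials are merely encoded, not encrypted.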
[Python-Dev] Re: It's now time to deprecate the stdlib urllib module
On Sun, Feb 6, 2022 at 9:13 AM Paul Moore wrote:

> On Sun, 6 Feb 2022 at 16:51, Christian Heimes wrote:
>
> > The urllib package -- and to some degree also the http package -- are
> > a constant source of security bugs. The code is old and the parsers for
> > HTTP and URLs don't handle edge cases well. Python core lacks a true
> > maintainer of the code. To be honest, we have to admit defeat and be up
> > front that urllib is not up to the task for this decade. It was designed
> > and written during a more friendly, less scary time on the internet.
> >
> > If I had the power and time, then I would replace urllib with a simpler,
> > reduced HTTP client that uses the platform's HTTP library under the hood
> > (WinHTTP on Windows, NSURLSession (?) on macOS, Web API for Emscripten,
> > maybe curl on Linux/BSD). For non-trivial HTTP requests, httpx or
> > aiohttp are much better suited than urllib.
> >
> > The second best option is to reduce the feature set of urllib to core
> > HTTP (no ftp, proxy, HTTP auth) and a partial rewrite with stricter,
> > more standards-conformant parsers for urls, query strings, and RFC 2822
> > instead of RFC 822 for headers.
>
> I'd likely be fine with either of these two options. I'm not worried
> about supporting "advanced" uses. But having no way of getting a file
> from the internet without relying on 3rd party packages seems like a
> huge gap in functionality for a modern language. And having to use a
> 3rd party library to parse URLs will simply push more people to use
> home-grown regexes rather than something safe and correct. Remember
> that a lot of Python users are not professional software developers,
> but scientists, data analysts, and occasional users, for whom the
> existence of something in the stdlib is the *only* reason they have
> any idea that URLs need specialised parsing in the first place.
>
> And while we all like to say 3rd party modules are great, the reality
> is that they provide a genuine problem for many of these
> non-specialist users - and I say that as a packaging specialist and
> pip maintainer. The packaging ecosystem is *not* newcomer-friendly in
> the way that core Python is, much as we're trying to improve that
> situation.
>
> I've said it previously, but I'll reiterate - IMO this *must* have a
> PEP, and that PEP must be clear that the intention is to *remove*
> urllib, not simply to "deprecate and then think about it". That could
> be making it part of PEP 594, or a separate PEP, but one way or
> another it needs a PEP.

This would need to be its own PEP. urllib et al. are used by virtually
everybody. They're highly used batteries.

I'm -1 on deprecating it for that reason alone.

Christian proposes that having a simpler scope rewrite of it might be nice,
but I think disruption to the world and loss of trust in Python would be
similar either way.

-gps

> Paul

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/QMSFZBQJFWKFFE3LFQLQE2AT6WKMLPGL/
[Python-Dev] Re: It's now time to deprecate the stdlib urllib module
On 2/6/22 6:08 AM, Victor Stinner wrote:

> I propose to deprecate the urllib module in Python 3.11. It would emit
> a DeprecationWarning which warns users, so users should consider better
> alternatives like urllib3 or httpx: well known modules, better
> maintained, more secure, support HTTP/2 (httpx), etc.

Besides the needs of pip, round-up, etc., I think we should keep whatever
parts of urllib, cgi, cgitb, http, etc., are necessary for basic
serving/consuming of web pages for the same reason we ended up keeping the
wave module -- it's fun and engaging for a younger audience. Having one
computer get information from another is pretty cool.

If we need to do some trimming and rearranging of the above modules, that's
fine, but I think losing all the functionality would be a mistake.

--
~Ethan~

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/TGENXEKPFCIZUQD63ROCIK2WGAN3F7XL/
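For readers unfamiliar with the mechanism being proposed, this is roughly how an API emits a DeprecationWarning and how calling code can observe it. `legacy_fetch` and `call_and_collect_warnings` are hypothetical stand-ins written for this sketch; nothing here reflects how urllib itself would be changed.

```python
import warnings


def legacy_fetch(url):
    """Hypothetical stand-in for an API that is scheduled for removal."""
    warnings.warn(
        "legacy_fetch() is deprecated; consider a maintained HTTP client",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller, not this function
    )
    return f"fetched {url}"


def call_and_collect_warnings(url):
    """Call legacy_fetch() and return its result plus any deprecation messages."""
    with warnings.catch_warnings(record=True) as caught:
        # DeprecationWarning is often filtered out by default, so force it on.
        warnings.simplefilter("always")
        result = legacy_fetch(url)
    messages = [
        str(w.message) for w in caught if issubclass(w.category, DeprecationWarning)
    ]
    return result, messages
```

The function keeps working while deprecated; running a script with `python -W error::DeprecationWarning` turns such warnings into exceptions, which is one way to find affected call sites ahead of a removal.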
[Python-Dev] Re: It's now time to deprecate the stdlib urllib module
Chiming in to say that whichever way this goes urllib3 would be okay. We can
always vendor the small amount of http.client logic we actually depend on
for HTTP connections.

I do agree that the future of HTTP clearly lies outside the standard
library; our team is already thinking about ways to integrate
non-http.client HTTP implementations (like HTTP/2).

My feeling is that it will be difficult to remove urllib.parse; however,
urllib.request is much less depended on and more likely to be deprecated and
removed.

Also clarifying that httplib2 doesn't support HTTP/2; the HTTP/2 package of
interest is usually h2: https://pypi.org/project/h2. "http3" also doesn't
implement HTTP/3 (bad name); this was one of the potential names for the
HTTPX project.

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/AW3JP6DHEAKME5FTFNRHV3EJMPJQEDME/
[Python-Dev] Re: It's now time to deprecate the stdlib urllib module
On Sun, Feb 6, 2022 at 3:35 PM Paul Moore wrote:

> urllib.request may not be "best practice", but it's still extremely
> useful for simple situations, and urllib.parse is useful for basic
> handling of URLs. Yes, the more complex aspects of urllib are better
> handled by external packages, but that's not sufficient argument for
> removing the package altogether. There are many situations where
> external dependencies are unsuitable. Also, there's quite a lot of
> usage of urllib in the stdlib itself - how would you propose to
> replace that?
> (...)
> In addition, pip relies pretty heavily on urllib (parse and request),
> and pip has a bootstrapping issue, so using 3rd party libraries is
> non-trivial.

If a project like urllib3 uses it, urllib can be copied there and its
maintenance will continue there. Or maybe the maintenance can be moved into
a new project on PyPI like "legacy_urllib".

It's a situation similar to the distutils deprecation: setuptools decided to
include a hidden copy of distutils in its source, and the distutils
maintenance moved there. IMO it's a great move. setuptools is a better place
than Python to maintain this code: the setuptools release cycle is faster
and is related to pip. The Python release cycle is slow and the distutils
API was too big. Since the distutils API is now hidden, setuptools can
freely drop code and change APIs without affecting the public setuptools
API.

I'm well aware that moving distutils into setuptools caused troubles. IMO it
is worth it and we have to go through these issues once for a lower
maintenance burden in the long term.

> In any case, why is this being proposed as a simple posting on
> python-dev? There's already PEP 594 for removals from the stdlib.

urllib is bigger than the modules proposed for deprecation in PEP 594. Also,
I expect that deprecating urllib is more controversial.

Victor
--
Night gathers, and now my watch begins. It shall not end until my death.

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/UR6BT5S2S4WGEI62MRWHCRAPZNTQXTVT/
[Python-Dev] Re: It's now time to deprecate the stdlib urllib module
On Sun, 6 Feb 2022 at 9:12, Paul Moore wrote:

> [...]
>
> I've said it previously, but I'll reiterate - IMO this *must* have a
> PEP, and that PEP must be clear that the intention is to *remove*
> urllib, not simply to "deprecate and then think about it". That could
> be making it part of PEP 594, or a separate PEP, but one way or
> another it needs a PEP.

PEP 594 is meant to be a set of uncontroversial removals of mostly unused
modules. Removing urllib is obviously not going to be uncontroversial, so it
should be discussed in a separate PEP.

> Paul

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/HQ5J7BTB5WW77CQIQXX5FQKBOOIADBYR/
[Python-Dev] Re: It's now time to deprecate the stdlib urllib module
On Sun, 6 Feb 2022 at 16:51, Christian Heimes wrote: > The urllib package -- and to some degree also the http package -- are > constant source of security bugs. The code is old and the parsers for > HTTP and URLs don't handle edge cases well. Python core lacks a true > maintainer of the code. To be honest, we have to admit defeat and be up > front that urllib is not up to the task for this decade. It was designed > written during a more friendly, less scary time on the internet. > > If I had the power and time, then I would replace urllib with a simpler, > reduced HTTP client that uses platform's HTTP library under the hood > (WinHTTP on Windows, NSURLSession (?) on macOS, Web API for Emscripten, > maybe curl on Linux/BSD). For non-trivial HTTP requests, httpx or > aiohttp are much better suited than urllib. > > The second best option is to reduce the feature set of urllib to core > HTTP (no ftp, proxy, HTTP auth) and a partial rewrite with stricter, > more standard conform parsers for urls, query strings, and RFC 2822 > instead of RFC 822 for headers. I'd likely be fine with either of these two options. I'm not worried about supporting "advanced" uses. But having no way of getting a file from the internet without relying on 3rd party packages seems like a huge gap in functionality for a modern language. And having to use a 3rd party library to parse URLs will simply push more people to use home-grown regexes rather than something safe and correct. Remember that a lot of Python users are not professional software developers, but scientists, data analysts, and occasional users, for whom the existence of something in the stdlib is the *only* reason they have any idea that URLs need specialised parsing in the first place. And while we all like to say 3rd party modules are great, the reality is that they provide a genuine problem for many of these non-specialist users - and I say that as a packaging specialist and pip maintainer. 
The packaging ecosystem is *not* newcomer-friendly in the way that core Python is, much as we're trying to improve that situation. I've said it previously, but I'll reiterate - IMO this *must* have a PEP, and that PEP must be clear that the intention is to *remove* urllib, not simply to "deprecate and then think about it". That could be making it part of PEP 594, or a separate PEP, but one way or another it needs a PEP. Paul ___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/KT6TGUBTLLETHES2OVVGZWSGYC5JCEKC/ Code of Conduct: http://python.org/psf/codeofconduct/
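To make the point about home-grown regexes concrete, here is a minimal sketch of what urllib.parse already handles. The URL and the helper name describe() are made up for illustration; the user name, bracketed IPv6 host and explicit port are the kind of details a quick regex tends to get wrong.

```python
from urllib.parse import urlsplit


def describe(url):
    """Split a URL into the pieces people often try to grab with a regex.

    describe() is a hypothetical helper, not part of urllib.
    """
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "username": parts.username,
        "host": parts.hostname,  # lower-cased, IPv6 brackets removed
        "port": parts.port,      # int, or None when absent
        "path": parts.path,
        "query": parts.query,
    }


# Illustrative URL only; 2001:db8::/32 is the IPv6 documentation range.
info = describe("https://alice@[2001:db8::1]:8443/a/b?x=1#frag")
print(info)
```

Nothing here touches the network, which is part of why urllib.parse is usable in places where urllib.request is not needed at all.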
[Python-Dev] Re: It's now time to deprecate the stdlib urllib module
On Sun, 6 Feb 2022 at 14:15, Victor Stinner wrote:
> I propose to deprecate the urllib module in Python 3.11. It would emit
> a DeprecationWarning which warn users, so users should consider better
> alternatives like urllib3 or httpx: well known modules, better
> maintained, more secure, support HTTP/2 (httpx), etc.

Also, I'm -1 on deprecating as a way of saying we *might* remove the module, but haven't decided yet. That isn't (IMO) what deprecation is for, and it doesn't give users a clear message, as maybe they'll be fine continuing to rely on urllib. The net result would likely be for people to simply become more inclined to ignore deprecation warnings.

Conversely, if the idea is to deprecate, and then in a couple of years say "well, it's been deprecated for a while now, so let's remove it" then that seems to me to be a rather cynical way of deflecting arguments, as we can say now "well, it's only deprecation", in spite of the fact that the real intention is to remove.

Paul

___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/ICHMNBE7PMOHCGXLT4REP2HJZAGSOCHJ/ Code of Conduct: http://python.org/psf/codeofconduct/
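For readers unfamiliar with the mechanism under discussion, this is roughly how a function emits a DeprecationWarning and how a caller can observe it. legacy_fetch() and call_and_collect() are hypothetical names standing in for a deprecated entry point; nothing here is taken from urllib itself. Note that with the default filters Python only displays DeprecationWarning when it is triggered by code in __main__, so the example forces the "always" filter.

```python
import warnings


def legacy_fetch(url):
    """Hypothetical deprecated function (not a real urllib API)."""
    warnings.warn(
        "legacy_fetch() is deprecated; see the docs for alternatives",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller's line
    )
    return f"fetched {url}"


def call_and_collect(url):
    """Call legacy_fetch() and return (result, warning categories seen)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # override the default filters
        result = legacy_fetch(url)
    return result, [w.category for w in caught]


result, categories = call_and_collect("http://example.invalid/")
print(result, categories)
```

The warning itself carries no removal date, which is the ambiguity the message above objects to: the text can say "deprecated" without telling users whether or when the code will disappear.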
[Python-Dev] Re: It's now time to deprecate the stdlib urllib module
On 06/02/2022 15.08, Victor Stinner wrote:
> Hi,
>
> I propose to deprecate the urllib module in Python 3.11. It would emit
> a DeprecationWarning which warn users, so users should consider better
> alternatives like urllib3 or httpx: well known modules, better
> maintained, more secure, support HTTP/2 (httpx), etc.
>
> I don't propose to schedule its removal. Let's discuss the removal in
> 1 or 2 years.
>
> --
>
> urllib has many abstraction to support a wide range of protocols with
> "handlers": HTTP, HTTPS, FTP, "local file", proxy, HTTP
> authentication, HTTP Cookie, etc. A simple HTTP request using Basic
> Authentication requires 10-20 lines of code, whereas it should be a
> single line.
>
> Users (me included) don't like urllib API which was too complicated
> for common tasks.
>
> --
>
> [...]
>
> urllib is a package made of 4 parts:
>
> * urllib.request for opening and reading URLs
> * urllib.error containing the exceptions raised by urllib.request
> * urllib.parse for parsing URLs
> * urllib.robotparser for parsing robots.txt files
>
> I propose to deprecate all of them. Maybe the deprecation can be
> different for each sub-module?

Thanks for bringing this topic forward, Victor!

Disclaimer: I proposed the removal of urllib today in Python core's internal chat.

The urllib package -- and to some degree also the http package -- are a constant source of security bugs. The code is old and the parsers for HTTP and URLs don't handle edge cases well. Python core lacks a true maintainer of the code. To be honest, we have to admit defeat and be up front that urllib is not up to the task for this decade. It was designed and written during a more friendly, less scary time on the internet.

If I had the power and time, then I would replace urllib with a simpler, reduced HTTP client that uses the platform's HTTP library under the hood (WinHTTP on Windows, NSURLSession (?) on macOS, Web API for Emscripten, maybe curl on Linux/BSD). For non-trivial HTTP requests, httpx or aiohttp are much better suited than urllib.

The second best option is to reduce the feature set of urllib to core HTTP (no ftp, proxy, HTTP auth) and a partial rewrite with stricter, more standards-conformant parsers for urls, query strings, and RFC 2822 instead of RFC 822 for headers.

Christian

___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/WYVETVHMGRS4CI47GTFY6W7B43YLSJH2/ Code of Conduct: http://python.org/psf/codeofconduct/
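As an illustration of the "handlers" design mentioned in the quoted proposal, here is a sketch of the setup urllib.request needs before it can send a request with Basic Authentication. The host, user name and password are placeholders, and build_basic_auth_opener() is a hypothetical helper; only the urllib.request classes are real. The request itself is left as a comment so the example runs without network access.

```python
import urllib.request


def build_basic_auth_opener(base_url, user, password):
    """Return an opener that answers Basic Auth challenges for base_url.

    Hypothetical helper wrapping the standard handler classes.
    """
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    # realm=None means "use these credentials for any realm at base_url"
    password_mgr.add_password(None, base_url, user, password)
    auth_handler = urllib.request.HTTPBasicAuthHandler(password_mgr)
    return urllib.request.build_opener(auth_handler)


# Placeholder values, for illustration only.
opener = build_basic_auth_opener("https://example.invalid/", "alice", "s3cret")

# A real call would then look like this (not executed here):
# with opener.open("https://example.invalid/private") as response:
#     body = response.read()
```

Three objects have to be wired together before the first byte is sent, which is the contrast being drawn with single-call APIs such as httpx.get().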
[Python-Dev] Re: It's now time to deprecate the stdlib urllib module
That was just one example. Here are other places in the pip code base where urllib.request is used for more than the pathname functions; they are all vendored code or tests, but would still be disruptive to remove:

https://github.com/pypa/pip/blob/main/tests/lib/local_repos.py
https://github.com/pypa/pip/blob/main/src/pip/_vendor/webencodings/mklabels.py
https://github.com/pypa/pip/blob/main/src/pip/_vendor/requests/compat.py
https://github.com/pypa/pip/blob/main/src/pip/_vendor/distlib/compat.py

In particular, the vendored library (and the replacement you suggest) "requests" is very dependent on proxy functions such as "getproxies" that currently live in urllib.request. More than once I've had to go down the rabbit hole of seeing where those functions get that info for each platform.

Damian (he/him)

On Sun, Feb 6, 2022 at 11:10 AM Victor Stinner wrote:
> On Sun, Feb 6, 2022 at 3:48 PM Damian Shaw wrote:
> >
> > Pip vendors requests for network calls:
> > https://github.com/pypa/pip/tree/main/src/pip/_vendor/requests
> >
> > But still does depend on functions from urllib.parse and urllib.request
> > in many places:
> > https://github.com/pypa/pip/blob/main/src/pip/_internal/utils/urls.py
>
> Aha, it doesn't use urllib.request to open a HTTP connection, it only
> uses pathname2url() and url2pathname() functions of urllib.request.
> Maybe we can keep these functions. I'm not sure why they don't belong
> to urllib.parse.
>
> If urllib.parse is widely used, maybe we can keep this module.
>
> Victor

___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/ACA7AU4W6XB35PA6O4IYBPQSQD3HFLFS/ Code of Conduct: http://python.org/psf/codeofconduct/
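For context on the proxy functions mentioned above, this sketch shows urllib.request.getproxies() picking up a proxy from the environment. The proxy URL is a placeholder and proxies_with_env() is a hypothetical helper written for this example; on macOS and Windows, getproxies() may also consult system settings when no environment variables are set, which is the per-platform behaviour being alluded to.

```python
import os
import urllib.request


def proxies_with_env(name, value):
    """Return getproxies() with one environment variable temporarily set.

    Hypothetical helper; restores the previous environment afterwards.
    """
    old = os.environ.get(name)
    os.environ[name] = value
    try:
        return urllib.request.getproxies()
    finally:
        if old is None:
            del os.environ[name]
        else:
            os.environ[name] = old


# Placeholder proxy address, for illustration only.
proxies = proxies_with_env("http_proxy", "http://proxy.invalid:3128")
print(proxies.get("http"))
```

The result is a plain dict mapping scheme names to proxy URLs, so a third-party HTTP client can reuse this platform lookup without using urllib.request for the connection itself.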
[Python-Dev] Re: It's now time to deprecate the stdlib urllib module
On Sun, Feb 06, 2022 at 03:08:40PM +0100, Victor Stinner wrote:
> I propose to deprecate the urllib module in Python 3.11. It would emit
> a DeprecationWarning which warn users, so users should consider better
> alternatives like urllib3 or httpx: well known modules, better
> maintained, more secure, support HTTP/2 (httpx), etc.
>
> I don't propose to schedule its removal. Let's discuss the removal in
> 1 or 2 years.

I am not certain that we can deprecate/remove the whole 'urllib' module without a good plan for replacing its facilities within the stdlib. There is heavy usage of urllib.parse in multiple projects (including in urllib3), and parse is semi-maintained.

> Let's come back to urllib:
> * It's API is too complicated
> * It doesn't support HTTP/2 nor HTTP/3
> * It's barely maintained: there are 121 open issues including 3 security
> issues!

I agree with all of these. I think that removing the old cruft might lead to us closing a number of open issues.

> The 3 open security issues:

Just because something is marked 'security' doesn't make it actionable. For instance, the last one asks for urllib to maintain client state to be safe against a scenario, which it never did.

I don't think it is time to deprecate the urllib module. It will be too disruptive IMO. So, -1.

Right now, I don't have a solution. My suggestion is that we close old bugs and remove old code (i.e. maintain it a bit, and that falls on me too). Then we can probably chart out a deprecation / replacement path in a non-disruptive manner.

--
Senthil

___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/ORQEJXJTZDYYV53MHKXTJ3Q6W72AUSGA/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-Dev] Re: It's now time to deprecate the stdlib urllib module
On Sun, Feb 6, 2022 at 3:48 PM Damian Shaw wrote:
>
> Pip vendors requests for network calls:
> https://github.com/pypa/pip/tree/main/src/pip/_vendor/requests
>
> But still does depend on functions from urllib.parse and urllib.request in
> many places:
> https://github.com/pypa/pip/blob/main/src/pip/_internal/utils/urls.py

Aha, it doesn't use urllib.request to open an HTTP connection, it only uses the pathname2url() and url2pathname() functions of urllib.request. Maybe we can keep these functions. I'm not sure why they don't belong to urllib.parse.

If urllib.parse is widely used, maybe we can keep this module.

Victor

___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/PDFGPDGESBLSBHVLINCPAFEOHXQWFIRI/ Code of Conduct: http://python.org/psf/codeofconduct/
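A small sketch of the two functions named above, for readers who have not used them. The path is invented for the example; the exact URL form is platform-dependent (and has changed slightly between Python versions for absolute paths), so the example uses a relative path and only relies on the round trip.

```python
import os
from urllib.request import pathname2url, url2pathname

# Made-up relative path containing spaces, built with the local separator.
local_path = os.path.join("some dir", "file name.txt")

# Local path syntax -> path component of a URL (percent-encoded, "/" separators).
url_path = pathname2url(local_path)

# And back again to the local path syntax.
round_trip = url2pathname(url_path)

print(local_path, "->", url_path, "->", round_trip)
```

Neither call opens a connection, which is consistent with the observation that these helpers would sit just as naturally in urllib.parse.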
[Python-Dev] Re: It's now time to deprecate the stdlib urllib module
Pip vendors requests for network calls: https://github.com/pypa/pip/tree/main/src/pip/_vendor/requests

But it still depends on functions from urllib.parse and urllib.request in many places: https://github.com/pypa/pip/blob/main/src/pip/_internal/utils/urls.py

Damian (he/him)

On Sun, Feb 6, 2022 at 9:36 AM Dong-hee Na wrote:
> I am not an expert about pip,
> but it will be not a problem about installing the pip module once CPython
> removes urllib module from stdlib?
>
> Warm regards,
> Dong-hee
>
> On Sun, Feb 6, 2022 at 11:13 PM, Victor Stinner wrote:
>
>> Hi,
>>
>> I propose to deprecate the urllib module in Python 3.11. It would emit
>> a DeprecationWarning which warn users, so users should consider better
>> alternatives like urllib3 or httpx: well known modules, better
>> maintained, more secure, support HTTP/2 (httpx), etc.
>>
>> I don't propose to schedule its removal. Let's discuss the removal in
>> 1 or 2 years.
>>
>> --
>>
>> urllib has many abstraction to support a wide range of protocols with
>> "handlers": HTTP, HTTPS, FTP, "local file", proxy, HTTP
>> authentication, HTTP Cookie, etc. A simple HTTP request using Basic
>> Authentication requires 10-20 lines of code, whereas it should be a
>> single line.
>>
>> Users (me included) don't like urllib API which was too complicated
>> for common tasks.
>>
>> --
>>
>> Unhappy users created multiple better alternatives to the stdlib urllib
>> module.
>>
>> In 2008, the "urllib3" module was created to provide an API designed
>> to be as simple as possible for the most common HTTP and HTTPS
>> requests. Example:
>>
>>    req = http.request('GET', 'http://httpbin.org/robots.txt').
>>
>> In 2011, the "requests" module based on urllib3 was created.
>>
>> In 2013, the "aiohttp" module based on asyncio was created.
>>
>> In 2015, new "httpx" module was created:
>>
>>    req = httpx.get('https://www.example.org/')
>>
>> Not only httpx has a regular "synchronous" API (blocking function
>> calls), but it also has an asynchronous API!
>>
>> Sadly, while HTTP/3 is being developed, it seems like in this list,
>> httpx is the only HTTP client library supporting HTTP/2 currently :-(
>>
>> For HTTP/2, I also found the "httplib2" module.
>>
>> For HTTP/3, I found the "http3" and "aioquic" modules.
>>
>> --
>>
>> Let's come back to urllib:
>>
>> * It's API is too complicated
>> * It doesn't support HTTP/2 nor HTTP/3
>> * It's barely maintained: there are 121 open issues including 3 security
>> issues!
>>
>> The 3 open security issues:
>>
>> * bpo-33661 open 2018;
>> * bpo-36338 open in 2019;
>> * bpo-45795 open in 2021.
>>
>> Usually, it's bad when you refer to an open security issue by its
>> creation year :-(
>>
>> The urllib module has long history of security vulnerabilities. List
>> of *fixed* vulnerabilities:
>>
>> * 2011 (bpo-11662):
>> https://python-security.readthedocs.io/vuln/urllib-redirect.html
>> * 2017 (bpo-30119):
>> https://python-security.readthedocs.io/vuln/urllib-ftp-stream-injection.html
>> * 2017 (bpo-30500):
>> https://python-security.readthedocs.io/vuln/urllib-connects-wrong-host.html
>> * 2019 (bpo-35907):
>> https://python-security.readthedocs.io/vuln/urllib-local-file-scheme.html
>> * 2019 (bpo-38826):
>> https://python-security.readthedocs.io/vuln/urllib-basic-auth-regex.html
>> * 2021 (bpo-42967):
>> https://python-security.readthedocs.io/vuln/urllib-query-string-semicolon-separator.html
>> * 2021 (bpo-43075):
>> https://python-security.readthedocs.io/vuln/urllib-basic-auth-regex2.html
>> * 2021 (bpo-44022):
>> https://python-security.readthedocs.io/vuln/urllib-100-continue-loop.html
>>
>> urllib is a package made of 4 parts:
>>
>> * urllib.request for opening and reading URLs
>> * urllib.error containing the exceptions raised by urllib.request
>> * urllib.parse for parsing URLs
>> * urllib.robotparser for parsing robots.txt files
>>
>> I propose to deprecate all of them. Maybe the deprecation can be
>> different for each sub-module?
>>
>> Victor
>> --
>> Night gathers, and now my watch begins. It shall not end until my death.
>> ___
>> Python-Dev mailing list -- python-dev@python.org
>> To unsubscribe send an email to python-dev-le...@python.org
>> https://mail.python.org/mailman3/lists/python-dev.python.org/
>> Message archived at
>> https://mail.python.org/archives/list/python-dev@python.org/message/EZ6O7MOPZ4GA75MKTDO7LAELKXUHK2QS/
>> Code of Conduct: http://python.org/psf/codeofconduct/
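urllib.robotparser is the least discussed of the four sub-modules listed in the quoted proposal, so here is a minimal offline sketch of what it does. The robots.txt content, the bot name and the URLs are invented for the example; parse() and can_fetch() are the module's documented methods.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt content, supplied as text rather than fetched.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# "example-bot" is a hypothetical user agent string.
allowed_public = parser.can_fetch("example-bot", "http://example.invalid/index.html")
allowed_private = parser.can_fetch("example-bot", "http://example.invalid/private/data")

print(allowed_public, allowed_private)
```

Used this way the parser never opens a connection; network access only comes into play when set_url() and read() are used to download the file.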
[Python-Dev] Re: It's now time to deprecate the stdlib urllib module
Speaking from anecdotal experience, "urllib.parse" is a very popular and highly depended on module, and I would be shocked if removing it wouldn't be very disruptive. In fact, a quick search of the replacement modules you mention shows that they all rely on it; here is an example from each:

* requests: https://github.com/psf/requests/blob/99b3b492418d0751ca960178d274f89805095e4c/requests/sessions.py#L121
* aiohttp: https://github.com/aio-libs/aiohttp/blob/7d78fd01dbe983d119141d7f2775aefd42494f99/aiohttp/formdata.py#L129
* httpx: https://github.com/encode/httpx/blob/b7dc0c3df68279ce89f016a69a41b27a2346d54d/httpx/_content.py#L144

As for "urllib.request", I know that the philosophy of Python being a "batteries included" language is going away, but there are definitely a lot of situations where having no way to make any HTTP call without a third-party package makes Python more difficult to use. Could it not always emit a warning that this library should not be used in a production environment? Much in the same way that Flask's default web server does.

Damian (he/him)

On Sun, Feb 6, 2022 at 9:16 AM Victor Stinner wrote:
> Hi,
>
> I propose to deprecate the urllib module in Python 3.11. It would emit
> a DeprecationWarning which warn users, so users should consider better
> alternatives like urllib3 or httpx: well known modules, better
> maintained, more secure, support HTTP/2 (httpx), etc.
>
> I don't propose to schedule its removal. Let's discuss the removal in
> 1 or 2 years.
>
> [...]
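As a rough idea of the kind of work HTTP clients commonly delegate to urllib.parse, here is a sketch that builds and then decodes a query string. The URL, the parameters and the helper with_query() are invented for the example; this is not a reproduction of the code at the links above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def with_query(url, params):
    """Return url with its query string replaced by the encoded params.

    Hypothetical helper; doseq=True expands list values into repeated keys.
    """
    parts = urlsplit(url)
    query = urlencode(params, doseq=True)
    return urlunsplit(parts._replace(query=query))


# Placeholder URL and parameters, for illustration only.
url = with_query(
    "https://example.invalid/search",
    {"q": "stdlib urllib", "tag": ["a", "b"]},
)
decoded = parse_qs(urlsplit(url).query)

print(url)
print(decoded)
```

Encoding and decoding are symmetric here, including the space and the repeated key, which is the sort of detail that is easy to get subtly wrong by hand.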
[Python-Dev] Re: It's now time to deprecate the stdlib urllib module
Strong -1 from me. urllib.request may not be "best practice", but it's still extremely useful for simple situations, and urllib.parse is useful for basic handling of URLs. Yes, the more complex aspects of urllib are better handled by external packages, but that's not a sufficient argument for removing the package altogether. There are many situations where external dependencies are unsuitable.

Also, there's quite a lot of usage of urllib in the stdlib itself - how would you propose to replace that?

In addition, pip relies pretty heavily on urllib (parse and request), and pip has a bootstrapping issue, so using 3rd party libraries is non-trivial. Also, of pip's existing vendored dependencies, webencodings, urllib3, requests, pkg_resources, packaging, html5lib, distlib and cachecontrol all import urllib. So this would be *hugely* disruptive to the whole packaging ecosystem (which is under-resourced at the best of times, so this would put a lot of strain on us).

In any case, why is this being proposed as a simple posting on python-dev? There's already PEP 594 for removals from the stdlib. If you have a case for removing urllib, I suggest you get it added to PEP 594, so it can be discussed and agreed properly, along with the other removals (none of which is remotely as controversial as urllib, so there's absolutely no doubt in my mind that this would need a PEP however it was proposed).

Paul

On Sun, 6 Feb 2022 at 14:15, Victor Stinner wrote:
>
> Hi,
>
> I propose to deprecate the urllib module in Python 3.11. It would emit
> a DeprecationWarning which warn users, so users should consider better
> alternatives like urllib3 or httpx: well known modules, better
> maintained, more secure, support HTTP/2 (httpx), etc.
>
> I don't propose to schedule its removal. Let's discuss the removal in
> 1 or 2 years.
>
> [...]
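For reference, this is approximately what the "simple situations" use of urllib.request looks like. fetch() is a hypothetical helper and the 10 second timeout is an arbitrary choice; so that the sketch runs without network access it is demonstrated on a data: URL, which urllib.request also understands, and a real script would pass an https:// URL instead.

```python
from urllib.request import urlopen


def fetch(url, timeout=10.0):
    """Return the raw bytes behind url (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


# A data: URL keeps the example offline; the payload is made up.
body = fetch("data:text/plain;charset=utf-8,hello%20world")
print(body)
```

No third-party package is involved, which matters in the bootstrapping cases described above, where nothing can be installed until something has already been downloaded.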
[Python-Dev] Re: It's now time to deprecate the stdlib urllib module
I am not an expert on pip, but won't installing the pip module be a problem once CPython removes the urllib module from the stdlib?

Warm regards,
Dong-hee

On Sun, Feb 6, 2022 at 11:13 PM, Victor Stinner wrote:
> Hi,
>
> I propose to deprecate the urllib module in Python 3.11. It would emit
> a DeprecationWarning which warn users, so users should consider better
> alternatives like urllib3 or httpx: well known modules, better
> maintained, more secure, support HTTP/2 (httpx), etc.
>
> I don't propose to schedule its removal. Let's discuss the removal in
> 1 or 2 years.
>
> [...]

___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/E6GN2THYCNQ2Q3CGMSH7GRCDFOOFDDCQ/ Code of Conduct: http://python.org/psf/codeofconduct/