[Python-Dev] Re: It's now time to deprecate the stdlib urllib module

2022-02-08 Thread Jim J. Jewett
> Why do you think the stdlib *must* provide an example implementation 
> for this specific scenario? Is there something unique to HTTP request
> handling that you feel is important to demonstrate?

*must* is too strong, but I would use a very strong *should*.

I think the stdlib should provide simple source-included examples of most 
things.  I think the case is even stronger when it is:

(1) a fairly simple protocol (such as version 1 of HTTP was) -- QUIC wouldn't 
count for a simple demonstration.
(2) something new users are likely to find motivating.  Short of "here is a way 
to do IO", and maybe "write a simple game",  "get something from the web" is 
probably the most obvious case.
(3) something where bootstrapping might be an issue (network protocols, 
particularly web downloads).  Network access is not an always-available 
resource.  Even when it is available, there is sometimes a barrier between 
"available in python" and "I could read it on my phone, but can't get it open 
in python".
(4) something where a beginner is likely to be overwhelmed by choices if we 
just say "use a 3rd party module".
(5) something with a backwards-compatibility story in the stdlib already. 
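
To illustrate point (1), here is a minimal sketch of how little is needed to speak HTTP/1.0 by hand. The helper names (build_get_request, parse_response, simple_get) are made up for this example; it skips TLS, redirects, and chunked encoding, and is meant for reading rather than for real use.

```python
import socket


def build_get_request(host, path="/"):
    """Return the bytes of a bare HTTP/1.0 GET request."""
    lines = [
        f"GET {path} HTTP/1.0",
        f"Host: {host}",
        "Connection: close",
        "",  # blank line terminates the header block
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_response(raw):
    """Split a raw HTTP response into (status_code, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    # The status line looks like: HTTP/1.0 200 OK
    status_code = int(status_line.split(" ", 2)[1])
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_code, headers, body


def simple_get(host, path="/", port=80, timeout=10):
    """Fetch one resource over plain HTTP (no TLS) and parse the reply."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(build_get_request(host, path))
        chunks = []
        while True:
            chunk = conn.recv(4096)
            if not chunk:  # server closed the connection: response complete
                break
            chunks.append(chunk)
    return parse_response(b"".join(chunks))
```

Only simple_get touches the network; the other two helpers are pure functions, so they can be read and tried out offline.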

As a side note, are there concerns about urllib.robotparser being broken or 
obsolete, or was that part of the deprecation proposal just contagion from 
urllib.request?

-jJ
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/HF5V6SFWV4BZUAOJTSEBD6DSZWSJONAM/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: It's now time to deprecate the stdlib urllib module

2022-02-08 Thread Brett Cannon
On Mon, Feb 7, 2022 at 5:51 PM Jim J. Jewett  wrote:

> There are problems with urllib.  With hindsight, it would have been nice
> to do a few things differently.  But that doesn't make migrating away from
> it any easier.
>
> This thread has mentioned several "better" alternatives -- but with the
> exception of 3rd party Requests, the docs don't even mention them.
>

And as soon as httpx hits 1.0 I plan to update the docs to point at it. But
until that occurs I personally do not want to have a debate about whether
httpx's 0.N version number means it shouldn't be recommended.


>
> Saying "You can do better, but we won't tell you how" is pretty rude to
> beginners, and we should not do it.
>
> Delegating to the operating system may be sensible for a production
> system, and there is nothing wrong with saying so in the docs, and it would
> be great if we made that easy.  But it is absolutely not a reasonable
> replacement for a straightforward (possibly inefficient and non-scalable)
> implementation written in python that people can read and use for
> reference.  urllib shouldn't be deprecated until we have a better solution
> to *that* use case that is also in the stdlib.  (That might well be worth
> doing, but it should happen before the deprecation.)
>

Why do you think the stdlib *must* provide an example implementation for
this specific scenario? Is there something unique to HTTP request handling
that you feel is important to demonstrate?
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/XNYPBSXW7DIBQN5YLXCWUOBLIEBRMPEP/


[Python-Dev] Re: It's now time to deprecate the stdlib urllib module

2022-02-08 Thread Chris Angelico
On Wed, 9 Feb 2022 at 04:50, Christopher Barker  wrote:
> So my thoughts:
>
> Rather than deprecate urllib, we refactor it a bit (and maybe deprecate parts 
> of it), so that it:
>
> 1) contains the core building blocks: e.g. urllib.parse with which to build 
> "better" libraries,
>
> 2) makes the "easy stuff easy" -- e.g. a basic http: request.
> - For instance, I'd like to see an API that's kind of "requests-lite"
>
> And much better docs that explain when you should use it, and when you might want 
> to look for another library (even if it's the stdlib http.client)

This sounds like a decent plan. I'd like to add my voice to the appeal
to keep urllib.parse; in fact, of all the places where I've used
anything from urllib, only two of them are anything other than
urllib.parse. (One is an old script that I specifically wanted to be
as shareable as possible, so I restricted it to the stdlib; the other
catches urllib.error.URLError thrown by a third-party library.) If
there are security issues with urllib.request, I wouldn't shed many
tears about its deprecation.

A "requests-lite" module would certainly be handy, but it's hard to
judge how much wants to be in the stdlib and how much can be pushed
off to a pip-installable module:

> the first thing I do for beginners is to point them to requests, as it's 
> easier to use :-)

Exactly my thoughts :) But a very very simple HTTP/HTTPS GET request
endpoint would be a great bootstrapping aid. Consider: with nothing
but the stdlib, you could fetch a file from some server, unzip it
(zipfile module), and import it. For building dirt-simple install
scripts, this kind of thing is really REALLY handy, and I'd rather not
have to use plain TCP sockets to do it :)
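
A minimal sketch of that bootstrap flow, using only the stdlib. The function name fetch_unzip_import and the temporary-directory layout are assumptions made for this example, and it does no integrity or signature checking, so it should only ever be pointed at archives you trust.

```python
import importlib
import io
import sys
import tempfile
import urllib.request
import zipfile


def fetch_unzip_import(url, module_name, timeout=30):
    """Download a zip archive, extract it, and import one module from it."""
    # urlopen handles http://, https:// and file:// URLs alike.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        payload = response.read()

    target_dir = tempfile.mkdtemp(prefix="bootstrap_")
    with zipfile.ZipFile(io.BytesIO(payload)) as archive:
        # No verification is done here: only extract archives you trust.
        archive.extractall(target_dir)

    # Make the extracted files importable, then import the requested module.
    sys.path.insert(0, target_dir)
    importlib.invalidate_caches()
    return importlib.import_module(module_name)
```

Because urlopen also accepts file:// URLs, the same function can be tried against a local zip file without any network access.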

ChrisA
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/VVGKL2TA3UXDD3RDASIYSGEOLMTKPOSH/


[Python-Dev] Re: It's now time to deprecate the stdlib urllib module

2022-02-08 Thread Christopher Barker
On Tue, Feb 8, 2022 at 1:31 AM Marc-Andre Lemburg  wrote:

> FWIW: I find this discussion a bit strange. Python's stdlib is
> supposed to provide basic tooling for a breadth of use cases,
> with emphasis on "basic" and "breadth".
>
> urllib is such a basic library and covers one of the main
> use cases for Python we have.


Exactly. However, it is also a bit of an "attractive nuisance". For
example, there is a lot of code in some of my major projects that uses
urllib for more complex cases, where we'd be much better off with requests,
or ...

Yes, that's mainly the result of our team's atrocious lack of code review,
but this code was written by smart productive people.

The fact is that the stdlib is the first place folks look for stuff, and if
what you are looking for is there, then many people won't think: "maybe
there's a better, and well supported, package on PyPI for this".

So my thoughts:

Rather than deprecate urllib, we refactor it a bit (and maybe deprecate
parts of it), so that it:

1) contains the core building blocks: e.g. urllib.parse with which to build
"better" libraries,

2) makes the "easy stuff easy" -- e.g. a basic http: request.
- For instance, I'd like to see an API that's kind of "requests-lite"

And much better docs that explain when you should use it, and when you might
want to look for another library (even if it's the stdlib http.client)

I note that I don't see any discussion of that in the urllib docs, whereas
http.client does have the suggestion front and center that you might want
to use requests.

Yes, the web moves fast, but it's also pretty backward compatible - folks
keep old browsers around for an astonishingly long time! So I'd think
the CPython release cycle should be able to keep up with all but the very latest.

> make Python less attractive and less useful for beginners.

On this point, I'm not so sure -- the first thing I do for beginners is to
point them to requests, as it's easier to use :-) -- but see my point
above, that's why it would be good to put an easy-to-use-for-the-basics API
in the stdlib

-CHB


-- 
Christopher Barker, PhD (Chris)

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/FLU522SFBBSMMXHRDBXYYGXD3IQ2CD6K/


[Python-Dev] Re: It's now time to deprecate the stdlib urllib module

2022-02-08 Thread Marc-Andre Lemburg
... and there are also plenty of examples out there of using http.server as
a quick HTTP server for trying out new things, testing and teaching.
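
For readers who have not seen that pattern, a rough sketch of it follows: a throwaway server on localhost plus a client fetch, all from the stdlib. The helper names serve_directory and fetch_local are invented for this example, and http.server is only suitable for local experiments, not production use.

```python
import functools
import http.server
import threading
import urllib.request


def serve_directory(directory, host="127.0.0.1", port=0):
    """Serve `directory` over HTTP in a background thread.

    Port 0 asks the OS for any free port; the chosen port is available
    as server.server_address[1].
    """
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=str(directory)
    )
    server = http.server.ThreadingHTTPServer((host, port), handler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server


def fetch_local(url, timeout=10):
    """Return the body of a simple GET request as bytes."""
    # An empty ProxyHandler bypasses any configured proxy, since the
    # target in this sketch is always a server on localhost.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url, timeout=timeout) as response:
        return response.read()
```

Call server.shutdown() and server.server_close() when done to stop the background thread and release the port.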

FWIW: I find this discussion a bit strange. Python's stdlib is
supposed to provide basic tooling for a breadth of use cases,
with emphasis on "basic" and "breadth".

urllib is such a basic library and covers one of the main
use cases for Python we have. It would be pretty much beside
the point of the stdlib to remove such basic functionality
and make Python less attractive and less useful for beginners.

Anything more complex can be dealt with on PyPI, as it is already
happening.

-- 
Marc-Andre Lemburg
eGenix.com

___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/AKWL7QKOZJDODM3LIX4BBYZ4HMXZC3CZ/


[Python-Dev] Re: It's now time to deprecate the stdlib urllib module

2022-02-07 Thread Jim J. Jewett
There are problems with urllib.  With hindsight, it would have been nice to do 
a few things differently.  But that doesn't make migrating away from it any 
easier.

This thread has mentioned several "better" alternatives -- but with the 
exception of 3rd party Requests, the docs don't even mention them.

Saying "You can do better, but we won't tell you how" is pretty rude to 
beginners, and we should not do it.

Delegating to the operating system may be sensible for a production system, and 
there is nothing wrong with saying so in the docs, and it would be great if we 
made that easy.  But it is absolutely not a reasonable replacement for a 
straightforward (possibly inefficient and non-scalable) implementation written 
in python that people can read and use for reference.  urllib shouldn't be 
deprecated until we have a better solution to *that* use case that is also in 
the stdlib.  (That might well be worth doing, but it should happen before the 
deprecation.)

-jJ
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/JI5CFS3WYXQEXKSEZH2ZTE3JJJ7AUAMW/


[Python-Dev] Re: It's now time to deprecate the stdlib urllib module

2022-02-07 Thread Brett Cannon
On Mon, Feb 7, 2022 at 4:56 AM Steve Dower  wrote:

> On 2/6/2022 4:44 PM, Christian Heimes wrote:
> > If I had the power and time, then I would replace urllib with a simpler,
> > reduced HTTP client that uses platform's HTTP library under the hood
> > (WinHTTP on Windows, NSURLSession (?) on macOS, Web API for Emscripten,
> > maybe curl on Linux/BSD). For non-trivial HTTP requests, httpx or
> > aiohttp are much better suited than urllib.
>
> I'm +1 on this, though I think it would have to be in place before the
> "two releases until removal" kicked in for urllib.request.
>

Yes, we definitely couldn't deprecate anything regarding downloading over
HTTP w/o having a replacement in place.

I am not even considering deprecating urllib.parse.


>
> The stdlib can't get by without at least the basic functionality of curl
> built in natively. But we can do this on most platforms without
> vendoring OpenSSL, which is a HUGE win. Then our default behaviour could
> correctly use proxies (including auto-config), CA certificate bundles,
> integrated authentication, and other OS features that are currently
> ignored by our core.
>

I also agree this is the best of the 2 options, although I would also
accept Christian's other option of a more targeted, tight,
standards-compliant solution if that would somehow lead to less maintenance
overhead. And when I say "less maintenance overhead," I really mean it: I
would question whether following redirects as an option is worth the
overhead in this scenario. I'm very much thinking of this from a
bootstrap/script/learning scenario and pushing people towards e.g. httpx
for anything fancier.


>
> Chances are we could keep simple urlopen() calls in place, and use the
> deprecation as a "potential change of behaviour" without necessarily
> having to break the API. I'm yet to come across a case where making a
> trivial urlopen() call _better_ would break things (the cases I've seen
> that would break are things like "using an OpenSSL environment variable
> to configure something that I wish had been automatic").
>

We could try to get fancy and only raise DeprecationWarning in cases where
things won't work, to extend the period before we push people to the better
API.
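
A minimal sketch of what such a selective warning could look like. Which cases would actually "not work" is not decided anywhere in this thread, so the LEGACY_SCHEMES set and the warn_if_legacy helper are purely hypothetical placeholders.

```python
import warnings
from urllib.parse import urlsplit

# Assumption for illustration only: which schemes would count as "legacy".
LEGACY_SCHEMES = frozenset({"ftp"})


def warn_if_legacy(url):
    """Emit a DeprecationWarning only for URLs that use a legacy scheme."""
    scheme = urlsplit(url).scheme.lower()
    if scheme in LEGACY_SCHEMES:
        warnings.warn(
            f"{scheme}:// URLs may stop working; consider a third-party client",
            DeprecationWarning,
            stacklevel=2,  # point at the caller, not at this helper
        )
        return True
    return False
```

Plain https:// calls stay silent under this scheme, so only code that relies on the trimmed features would see the warning.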


>
> The nature of network/internet access is that we have to break things
> periodically anyway, because all the code that was written over the last
> 30+ years is eventually going to be found to be exploitable. I'd be
> quite happy to say "Python gives you what your OS gives you; update the
> OS for fixes".
>

Exactly. My guideline for this whole idea would be that if it doesn't make
sense in a beginner course that says to "download an HTML page and count
all the anchor tags," then it's too fancy for the stdlib. And that should
be enough to bootstrap installers which then get you httpx. Otherwise the
networking stack moves too fast (from a security POV) and requires unique
knowledge to get right that we have simply not kept up as much as we would
like. I think it's okay to admit it might be time to trim this part of the
stdlib down to something that we can manage easily (but we *cannot* drop
the ability to download something over HTTPS).
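
For reference, the beginner exercise mentioned above fits comfortably in the stdlib today. This is a rough sketch; the names AnchorCounter, count_anchor_tags, and count_anchor_tags_at are invented for the example.

```python
from html.parser import HTMLParser
import urllib.request


class AnchorCounter(HTMLParser):
    """Count <a> start tags while parsing an HTML document."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        if tag == "a":
            self.count += 1


def count_anchor_tags(html_text):
    """Return the number of anchor tags in a string of HTML."""
    parser = AnchorCounter()
    parser.feed(html_text)
    parser.close()
    return parser.count


def count_anchor_tags_at(url, timeout=10):
    """Download a page and count its anchor tags."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        text = response.read().decode(charset, errors="replace")
    return count_anchor_tags(text)
```

Only count_anchor_tags_at needs network access; count_anchor_tags works on any string, which keeps the parsing half of the exercise usable offline.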
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/VP2WXOBWPGAX7UIH25DWRSYWFEDNINNU/


[Python-Dev] Re: It's now time to deprecate the stdlib urllib module

2022-02-07 Thread Stéfane Fermigier
My two cents:

1. I’ve grepped ("ag-ed" actually) through all my code bases, and as noted
by others, urllib.parse is used in many places. urllib.request never is, as
we've been using requests or httpx instead.

IMHO and for the context I'm using it (YMMV), urllib.parse is useful and
should be kept (could be replaced by third-party libs like furl or yarl,
but probably not a great idea).

2. Obviously bootstrapping pip or similar package management tools should
be a concern.

3. Overall, I think the days when "batteries included" was a positive
argument are over. I'd rather make the standard library leaner, and
focussed on core language constructs. The only advantage that I can see is
that having stuff in the standard lib can reduce fragmentation, at least
initially, and ensure a very high level of quality and support, but at some
point in the future history has shown us that usually a better alternative
ends up emerging.

4. When deprecating and removing stuff from the stdlib and if there are no
dependency issues, it should be possible to move the components to their
own dedicated packages, maybe under an "extra" or "legacy" organisation.
The question of support stays open, though.
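
To make point 1 concrete, here is a small sketch of the kind of URL handling urllib.parse covers. The example URL and query values are invented for illustration.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit

parts = urlsplit("https://user@example.com:8443/a/b?x=1&y=2&x=3#frag")
print(parts.scheme)    # https
print(parts.hostname)  # example.com
print(parts.port)      # 8443
print(parts.path)      # /a/b
print(parse_qs(parts.query))  # {'x': ['1', '3'], 'y': ['2']}

# Rebuild the URL with a different, properly escaped query string.
new_query = urlencode({"q": "a b", "n": 1})
print(new_query)  # q=a+b&n=1
rebuilt = urlunsplit(parts._replace(query=new_query, fragment=""))
print(rebuilt)  # https://user@example.com:8443/a/b?q=a+b&n=1
```

Repeated keys, ports, userinfo, and escaping are exactly the cases a hand-written regex tends to get wrong.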

Regards,

  S.

On 6 Feb 2022 at 23:39:55, Gregory P. Smith  wrote:

>
>
> On Sun, Feb 6, 2022 at 9:13 AM Paul Moore  wrote:
>
>> On Sun, 6 Feb 2022 at 16:51, Christian Heimes 
>> wrote:
>>
>> > The urllib package -- and to some degree also the http package -- are
>> > a constant source of security bugs. The code is old and the parsers for
>> > HTTP and URLs don't handle edge cases well. Python core lacks a true
>> > maintainer of the code. To be honest, we have to admit defeat and be up
>> > front that urllib is not up to the task for this decade. It was designed
>> > and written during a more friendly, less scary time on the internet.
>> >
>> > If I had the power and time, then I would replace urllib with a simpler,
>> > reduced HTTP client that uses platform's HTTP library under the hood
>> > (WinHTTP on Windows, NSURLSession (?) on macOS, Web API for Emscripten,
>> > maybe curl on Linux/BSD). For non-trivial HTTP requests, httpx or
>> > aiohttp are much better suited than urllib.
>> >
>> > The second best option is to reduce the feature set of urllib to core
>> > HTTP (no ftp, proxy, HTTP auth) and a partial rewrite with stricter,
>> > more standards-conformant parsers for urls, query strings, and RFC 2822
>> > instead of RFC 822 for headers.
>>
>> I'd likely be fine with either of these two options. I'm not worried
>> about supporting "advanced" uses. But having no way of getting a file
>> from the internet without relying on 3rd party packages seems like a
>> huge gap in functionality for a modern language. And having to use a
>> 3rd party library to parse URLs will simply push more people to use
>> home-grown regexes rather than something safe and correct. Remember
>> that a lot of Python users are not professional software developers,
>> but scientists, data analysts, and occasional users, for whom the
>> existence of something in the stdlib is the *only* reason they have
>> any idea that URLs need specialised parsing in the first place.
>>
>> And while we all like to say 3rd party modules are great, the reality
>> is that they provide a genuine problem for many of these
>> non-specialist users - and I say that as a packaging specialist and
>> pip maintainer. The packaging ecosystem is *not* newcomer-friendly in
>> the way that core Python is, much as we're trying to improve that
>> situation.
>>
>> I've said it previously, but I'll reiterate - IMO this *must* have a
>> PEP, and that PEP must be clear that the intention is to *remove*
>> urllib, not simply to "deprecate and then think about it". That could
>> be making it part of PEP 594, or a separate PEP, but one way or
>> another it needs a PEP.
>>
>
> This would need to be its own PEP.  urllib et al. are used by virtually
> everybody.  They're highly used batteries.
>
> I'm -1 on deprecating it for that reason alone.
>
> Christian proposes that having a simpler scope rewrite of it might be
> nice, but I think disruption to the world and loss of trust in Python would
> be similar either way.
>
> -gps
>
>
>>
>> Paul
>> Message archived at
>> https://mail.python.org/archives/list/python-dev@python.org/message/KT6TGUBTLLETHES2OVVGZWSGYC5JCEKC/

[Python-Dev] Re: It's now time to deprecate the stdlib urllib module

2022-02-07 Thread Steve Dower

On 2/6/2022 4:44 PM, Christian Heimes wrote:
If I had the power and time, then I would replace urllib with a simpler, 
reduced HTTP client that uses platform's HTTP library under the hood 
(WinHTTP on Windows, NSURLSession (?) on macOS, Web API for Emscripten, 
maybe curl on Linux/BSD). For non-trivial HTTP requests, httpx or 
aiohttp are much better suited than urllib.


I'm +1 on this, though I think it would have to be in place before the 
"two releases until removal" kicked in for urllib.request.


The stdlib can't get by without at least the basic functionality of curl 
built in natively. But we can do this on most platforms without 
vendoring OpenSSL, which is a HUGE win. Then our default behaviour could 
correctly use proxies (including auto-config), CA certificate bundles, 
integrated authentication, and other OS features that are currently 
ignored by our core.
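
As a point of comparison, this is a quick way to inspect what the current stdlib picks up by default on a given machine. It is a rough sketch: the describe_defaults helper is invented for this example, and it only reports settings, it does not show the OS-native behaviour described above.

```python
import ssl
import urllib.request


def describe_defaults():
    """Report the proxy and TLS settings the stdlib picks up by default."""
    context = ssl.create_default_context()
    return {
        # Proxies from environment variables or, on some platforms,
        # from the system configuration.
        "proxies": urllib.request.getproxies(),
        # Certificate verification settings of the default client context.
        "verify_mode": context.verify_mode,
        "check_hostname": context.check_hostname,
        # Locations OpenSSL was built to search for CA certificates.
        "ca_paths": ssl.get_default_verify_paths(),
    }
```

The returned values differ between platforms and Python builds, which is part of the maintenance problem being discussed.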


Chances are we could keep simple urlopen() calls in place, and use the 
deprecation as a "potential change of behaviour" without necessarily 
having to break the API. I'm yet to come across a case where making a 
trivial urlopen() call _better_ would break things (the cases I've seen 
that would break are things like "using an OpenSSL environment variable 
to configure something that I wish had been automatic").


The nature of network/internet access is that we have to break things 
periodically anyway, because all the code that was written over the last 
30+ years is eventually going to be found to be exploitable. I'd be 
quite happy to say "Python gives you what your OS gives you; update the 
OS for fixes".


Cheers,
Steve
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/2P3RL7PAOZZFZ7PRGO6FJRMKR6MM2VXH/


[Python-Dev] Re: It's now time to deprecate the stdlib urllib module

2022-02-07 Thread Edwin Zimmerman

> Christian proposes that having a simpler scope rewrite of it might be nice, 
> but I think disruption to the world and loss of trust in Python would be 
> similar either way.

Please don't remove urllib.  There are mountains of code that rely on it.  A 
much better idea, IMO, would be to add a new modern API to http.client, where 
http functionality properly belongs.  Maybe a function signature like this: 
http.client.get(url, user_agent=None, basic_auth=(None, None), 
custom_headers=None).  That would be one line to cover many basic use cases, 
including user agent and basic auth.
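
A rough sketch of how that proposed signature could behave, written as standalone functions on top of urllib.request. Note that http.client.get does not exist in the stdlib; the names build_get_request and get, the timeout parameter, and the (status, body) return value are assumptions made for this example.

```python
import base64
import urllib.request


def build_get_request(url, user_agent=None, basic_auth=(None, None),
                      custom_headers=None):
    """Build a GET request following the signature proposed above."""
    headers = dict(custom_headers or {})
    if user_agent is not None:
        headers["User-Agent"] = user_agent
    username, password = basic_auth
    if username is not None:
        # HTTP Basic auth: base64 of "username:password".
        credentials = f"{username}:{password or ''}".encode("utf-8")
        token = base64.b64encode(credentials).decode("ascii")
        headers["Authorization"] = f"Basic {token}"
    return urllib.request.Request(url, headers=headers, method="GET")


def get(url, user_agent=None, basic_auth=(None, None),
        custom_headers=None, timeout=10):
    """Perform the request and return (status, body)."""
    request = build_get_request(url, user_agent, basic_auth, custom_headers)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.read()
```

Splitting request construction from the network call keeps the header logic easy to inspect without sending anything.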
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/EV7R35OMQ7QWY7Y744FX7Y7VI7AO5CWX/


[Python-Dev] Re: It's now time to deprecate the stdlib urllib module

2022-02-06 Thread Gregory P. Smith
On Sun, Feb 6, 2022 at 9:13 AM Paul Moore  wrote:

> On Sun, 6 Feb 2022 at 16:51, Christian Heimes 
> wrote:
>
> > The urllib package -- and to some degree also the http package -- are
> > a constant source of security bugs. The code is old and the parsers for
> > HTTP and URLs don't handle edge cases well. Python core lacks a true
> > maintainer of the code. To be honest, we have to admit defeat and be up
> > front that urllib is not up to the task for this decade. It was designed
> > and written during a more friendly, less scary time on the internet.
> >
> > If I had the power and time, then I would replace urllib with a simpler,
> > reduced HTTP client that uses platform's HTTP library under the hood
> > (WinHTTP on Windows, NSURLSession (?) on macOS, Web API for Emscripten,
> > maybe curl on Linux/BSD). For non-trivial HTTP requests, httpx or
> > aiohttp are much better suited than urllib.
> >
> > The second best option is to reduce the feature set of urllib to core
> > HTTP (no ftp, proxy, HTTP auth) and a partial rewrite with stricter,
> > more standards-conformant parsers for urls, query strings, and RFC 2822
> > instead of RFC 822 for headers.
>
> I'd likely be fine with either of these two options. I'm not worried
> about supporting "advanced" uses. But having no way of getting a file
> from the internet without relying on 3rd party packages seems like a
> huge gap in functionality for a modern language. And having to use a
> 3rd party library to parse URLs will simply push more people to use
> home-grown regexes rather than something safe and correct. Remember
> that a lot of Python users are not professional software developers,
> but scientists, data analysts, and occasional users, for whom the
> existence of something in the stdlib is the *only* reason they have
> any idea that URLs need specialised parsing in the first place.
>
> And while we all like to say 3rd party modules are great, the reality
> is that they provide a genuine problem for many of these
> non-specialist users - and I say that as a packaging specialist and
> pip maintainer. The packaging ecosystem is *not* newcomer-friendly in
> the way that core Python is, much as we're trying to improve that
> situation.
>
> I've said it previously, but I'll reiterate - IMO this *must* have a
> PEP, and that PEP must be clear that the intention is to *remove*
> urllib, not simply to "deprecate and then think about it". That could
> be making it part of PEP 594, or a separate PEP, but one way or
> another it needs a PEP.
>

This would need to be its own PEP.  urllib et al. are used by virtually
everybody.  They're highly used batteries.

I'm -1 on deprecating it for that reason alone.

Christian proposes that having a simpler scope rewrite of it might be nice,
but I think disruption to the world and loss of trust in Python would be
similar either way.

-gps


>
> Paul
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/QMSFZBQJFWKFFE3LFQLQE2AT6WKMLPGL/


[Python-Dev] Re: It's now time to deprecate the stdlib urllib module

2022-02-06 Thread Ethan Furman

On 2/6/22 6:08 AM, Victor Stinner wrote:

> I propose to deprecate the urllib module in Python 3.11. It would emit
> a DeprecationWarning which warns users, so users should consider better
> alternatives like urllib3 or httpx: well known modules, better
> maintained, more secure, support HTTP/2 (httpx), etc.

Besides the needs of pip, round-up, etc., I think we should keep whatever parts of urllib, cgi, cgitb, http, etc., are 
necessary for basic serving/consuming of web pages for the same reason we ended up keeping the wave module -- it's fun 
and engaging for a younger audience.  Having one computer get information from another is pretty cool.


If we need to do some trimming and rearranging of the above modules, that's fine, but I think losing all the 
functionality would be a mistake.


--
~Ethan~
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/TGENXEKPFCIZUQD63ROCIK2WGAN3F7XL/


[Python-Dev] Re: It's now time to deprecate the stdlib urllib module

2022-02-06 Thread sethmichaellarson
Chiming in to say that whichever way this goes urllib3 would be okay. We can 
always vendor the small amount of http.client logic we actually depend on for 
HTTP connections. I do agree that the future of HTTP clearly lies outside the 
standard library; our team is already thinking about ways to integrate 
non-http.client HTTP implementations (like HTTP/2).

My feeling is that it will be difficult to remove urllib.parse; however, 
urllib.request is much less depended on and more likely to be deprecated and 
removed.

Also clarifying that httplib2 doesn't support HTTP/2; the HTTP/2 package of 
interest is usually h2: https://pypi.org/project/h2. "http3" also doesn't 
implement HTTP/3 (bad name), this was one of the potential names for the HTTPX 
project.
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/AW3JP6DHEAKME5FTFNRHV3EJMPJQEDME/


[Python-Dev] Re: It's now time to deprecate the stdlib urllib module

2022-02-06 Thread Victor Stinner
On Sun, Feb 6, 2022 at 3:35 PM Paul Moore  wrote:
> urllib.request may not be "best practice", but it's still extremely
> useful for simple situations, and urllib.parse is useful for basic
> handling of URLs. Yes, the more complex aspects of urllib are better
> handled by external packages, but that's not sufficient argument for
> removing the package altogether. There are many situations where
> external dependencies are unsuitable. Also, there's quite a lot of
> usage of urllib in the stdlib itself - how would you propose to
> replace that?
> (...)
> In addition, pip relies pretty heavily on urllib (parse and request),
> and pip has a bootstrapping issue, so using 3rd party libraries is
> non-trivial.

If a project like urllib3 uses it, urllib can be copied there and its
maintenance will continue there. Or maybe the maintenance can be moved
into a new project on PyPI like "legacy_urllib".

It's a situation similar to the distutils deprecation: setuptools
decided to include a hidden copy of distutils in its source, and
the distutils maintenance moved there. IMO it's a great move.
setuptools is a better place than Python to maintain this code:
the setuptools release cycle is faster and is related to pip. Python's
release cycle is slow and the distutils API was too big. Since the
distutils API is now hidden, setuptools can freely drop code and
change APIs without affecting the public setuptools API.

I'm well aware that moving distutils into setuptools caused trouble.
IMO it is worth it and we have to go through these issues once for a
lighter maintenance burden in the long term.


> In any case, why is this being proposed as a simple posting on
> python-dev? There's already PEP 594 for removals from the stdlib.

urllib is bigger than modules proposed for deprecation in PEP 594.
Also, I expect that deprecating urllib is more controversial.

Victor
-- 
Night gathers, and now my watch begins. It shall not end until my death.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/UR6BT5S2S4WGEI62MRWHCRAPZNTQXTVT/


[Python-Dev] Re: It's now time to deprecate the stdlib urllib module

2022-02-06 Thread Jelle Zijlstra
On Sun, 6 Feb 2022 at 9:12, Paul Moore () wrote:

> On Sun, 6 Feb 2022 at 16:51, Christian Heimes 
> wrote:
>
> > The urllib package -- and to some degree also the http package -- are
> > constant source of security bugs. The code is old and the parsers for
> > HTTP and URLs don't handle edge cases well. Python core lacks a true
> > maintainer of the code. To be honest, we have to admit defeat and be up
> > front that urllib is not up to the task for this decade. It was designed
> > written during a more friendly, less scary time on the internet.
> >
> > If I had the power and time, then I would replace urllib with a simpler,
> > reduced HTTP client that uses platform's HTTP library under the hood
> > (WinHTTP on Windows, NSURLSession (?) on macOS, Web API for Emscripten,
> > maybe curl on Linux/BSD). For non-trivial HTTP requests, httpx or
> > aiohttp are much better suited than urllib.
> >
> > The second best option is to reduce the feature set of urllib to core
> > HTTP (no ftp, proxy, HTTP auth) and a partial rewrite with stricter,
> > more standard conform parsers for urls, query strings, and RFC 2822
> > instead of RFC 822 for headers.
>
> I'd likely be fine with either of these two options. I'm not worried
> about supporting "advanced" uses. But having no way of getting a file
> from the internet without relying on 3rd party packages seems like a
> huge gap in functionality for a modern language. And having to use a
> 3rd party library to parse URLs will simply push more people to use
> home-grown regexes rather than something safe and correct. Remember
> that a lot of Python users are not professional software developers,
> but scientists, data analysts, and occasional users, for whom the
> existence of something in the stdlib is the *only* reason they have
> any idea that URLs need specialised parsing in the first place.
>
> And while we all like to say 3rd party modules are great, the reality
> is that they provide a genuine problem for many of these
> non-specialist users - and I say that as a packaging specialist and
> pip maintainer. The packaging ecosystem is *not* newcomer-friendly in
> the way that core Python is, much as we're trying to improve that
> situation.
>
> I've said it previously, but I'll reiterate - IMO this *must* have a
> PEP, and that PEP must be clear that the intention is to *remove*
> urllib, not simply to "deprecate and then think about it". That could
> be making it part of PEP 594, or a separate PEP, but one way or
> another it needs a PEP.
>
PEP 594 is meant to be a set of uncontroversial removals of mostly unused
modules. Removing urllib is obviously not going to be uncontroversial, so
it should be discussed in a separate PEP.


>
> Paul
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/HQ5J7BTB5WW77CQIQXX5FQKBOOIADBYR/


[Python-Dev] Re: It's now time to deprecate the stdlib urllib module

2022-02-06 Thread Paul Moore
On Sun, 6 Feb 2022 at 16:51, Christian Heimes  wrote:

> The urllib package -- and to some degree also the http package -- are
> constant source of security bugs. The code is old and the parsers for
> HTTP and URLs don't handle edge cases well. Python core lacks a true
> maintainer of the code. To be honest, we have to admit defeat and be up
> front that urllib is not up to the task for this decade. It was designed
> written during a more friendly, less scary time on the internet.
>
> If I had the power and time, then I would replace urllib with a simpler,
> reduced HTTP client that uses platform's HTTP library under the hood
> (WinHTTP on Windows, NSURLSession (?) on macOS, Web API for Emscripten,
> maybe curl on Linux/BSD). For non-trivial HTTP requests, httpx or
> aiohttp are much better suited than urllib.
>
> The second best option is to reduce the feature set of urllib to core
> HTTP (no ftp, proxy, HTTP auth) and a partial rewrite with stricter,
> more standard conform parsers for urls, query strings, and RFC 2822
> instead of RFC 822 for headers.

I'd likely be fine with either of these two options. I'm not worried
about supporting "advanced" uses. But having no way of getting a file
from the internet without relying on 3rd party packages seems like a
huge gap in functionality for a modern language. And having to use a
3rd party library to parse URLs will simply push more people to use
home-grown regexes rather than something safe and correct. Remember
that a lot of Python users are not professional software developers,
but scientists, data analysts, and occasional users, for whom the
existence of something in the stdlib is the *only* reason they have
any idea that URLs need specialised parsing in the first place.
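
To make the point about home-grown regexes concrete, here is a minimal sketch of what the stdlib parser gives you today. The helper name describe_url and the sample URL are made up for illustration; only urlsplit and parse_qs come from urllib.parse.

```python
from urllib.parse import urlsplit, parse_qs


def describe_url(url):
    """Split a URL into the pieces a hand-written regex tends to get wrong."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,  # lower-cased, without userinfo or port
        "port": parts.port,      # an int, or None when no port is given
        "path": parts.path,
        "query": parse_qs(parts.query),  # repeated keys are collected in lists
    }


# Hypothetical URL, chosen to exercise userinfo, port, repeated keys, fragment.
info = describe_url("https://user@Example.org:8443/a/b?x=1&x=2&y=3#frag")
print(info)
```

A regex written in a hurry would typically miss at least one of the userinfo, the port, or the repeated query key.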

And while we all like to say 3rd party modules are great, the reality
is that they provide a genuine problem for many of these
non-specialist users - and I say that as a packaging specialist and
pip maintainer. The packaging ecosystem is *not* newcomer-friendly in
the way that core Python is, much as we're trying to improve that
situation.

I've said it previously, but I'll reiterate - IMO this *must* have a
PEP, and that PEP must be clear that the intention is to *remove*
urllib, not simply to "deprecate and then think about it". That could
be making it part of PEP 594, or a separate PEP, but one way or
another it needs a PEP.

Paul
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/KT6TGUBTLLETHES2OVVGZWSGYC5JCEKC/


[Python-Dev] Re: It's now time to deprecate the stdlib urllib module

2022-02-06 Thread Paul Moore
On Sun, 6 Feb 2022 at 14:15, Victor Stinner  wrote:
> I propose to deprecate the urllib module in Python 3.11. It would emit
> a DeprecationWarning which warn users, so users should consider better
> alternatives like urllib3 or httpx: well known modules, better
> maintained, more secure, support HTTP/2 (httpx), etc.

Also, I'm -1 on deprecating as a way of saying we *might* remove the
module, but haven't decided yet. That isn't (IMO) what deprecation is
for, and it doesn't give users a clear message, as maybe they'll be
fine continuing to rely on urllib. The net result would likely be
for people to simply become more inclined to ignore deprecation
warnings.

Conversely, if the idea is to deprecate, and then in a couple of years
say "well, it's been deprecated for a while now, so let's remove it"
then that seems to me to be a rather cynical way of deflecting
arguments, as we can say now "well, it's only deprecation", in spite
of the fact that the real intention is to remove.

Paul
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/ICHMNBE7PMOHCGXLT4REP2HJZAGSOCHJ/


[Python-Dev] Re: It's now time to deprecate the stdlib urllib module

2022-02-06 Thread Christian Heimes

On 06/02/2022 15.08, Victor Stinner wrote:

> Hi,
>
> I propose to deprecate the urllib module in Python 3.11. It would emit
> a DeprecationWarning which warn users, so users should consider better
> alternatives like urllib3 or httpx: well known modules, better
> maintained, more secure, support HTTP/2 (httpx), etc.
>
> I don't propose to schedule its removal. Let's discuss the removal in
> 1 or 2 years.
>
> --
>
> urllib has many abstraction to support a wide range of protocols with
> "handlers": HTTP, HTTPS, FTP, "local file", proxy, HTTP
> authentication, HTTP Cookie, etc. A simple HTTP request using Basic
> Authentication requires 10-20 lines of code, whereas it should be a
> single line.
>
> Users (me included) don't like urllib API which was too complicated
> for common tasks.
>
> --
>
> [...]
>
> urllib is a package made of 4 parts:
>
> * urllib.request for opening and reading URLs
> * urllib.error containing the exceptions raised by urllib.request
> * urllib.parse for parsing URLs
> * urllib.robotparser for parsing robots.txt files
>
> I propose to deprecate all of them. Maybe the deprecation can be
> different for each sub-module?

Thanks for bringing this topic forward, Victor!

Disclaimer: I proposed the removal of urllib today in Python core's 
internal chat.


The urllib package -- and to some degree also the http package -- are
a constant source of security bugs. The code is old and the parsers for
HTTP and URLs don't handle edge cases well. Python core lacks a true
maintainer of the code. To be honest, we have to admit defeat and be up
front that urllib is not up to the task for this decade. It was designed
and written during a more friendly, less scary time on the internet.

If I had the power and time, then I would replace urllib with a simpler,
reduced HTTP client that uses the platform's HTTP library under the hood
(WinHTTP on Windows, NSURLSession (?) on macOS, Web API for Emscripten,
maybe curl on Linux/BSD). For non-trivial HTTP requests, httpx or
aiohttp are much better suited than urllib.

The second best option is to reduce the feature set of urllib to core
HTTP (no ftp, proxy, HTTP auth) and a partial rewrite with stricter,
more standards-conformant parsers for URLs, query strings, and RFC 2822
instead of RFC 822 for headers.
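
As a small illustration of what "stricter" can mean for query strings, the existing parser already has an opt-in strict mode. This is only a sketch of current behaviour, not a proposal for the rewrite; the wrapper name parse_query_strictly is made up, while parse_qsl and its strict_parsing flag are the real urllib.parse API.

```python
from urllib.parse import parse_qsl


def parse_query_strictly(query):
    """Return (pairs, None) on success, or (None, message) on malformed input."""
    try:
        # strict_parsing=True makes parse_qsl raise ValueError on fields
        # without an "=" instead of silently skipping them.
        return parse_qsl(query, strict_parsing=True), None
    except ValueError as exc:
        return None, str(exc)


ok, err = parse_query_strictly("a=1&b=2")
bad, bad_err = parse_query_strictly("a=1&b")  # "b" has no "=" sign
print(ok, err)
print(bad, bad_err)
```

By default (strict_parsing=False) the malformed field is dropped without any error, which is the lenient behaviour a stricter parser would move away from.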


Christian


Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/WYVETVHMGRS4CI47GTFY6W7B43YLSJH2/


[Python-Dev] Re: It's now time to deprecate the stdlib urllib module

2022-02-06 Thread Damian Shaw
That was just one example. Here are other places in the pip code base where
urllib.request is used for more than the pathname functions. They are all
vendored code or tests, but removing urllib.request would still be disruptive:

https://github.com/pypa/pip/blob/main/tests/lib/local_repos.py
https://github.com/pypa/pip/blob/main/src/pip/_vendor/webencodings/mklabels.py
https://github.com/pypa/pip/blob/main/src/pip/_vendor/requests/compat.py
https://github.com/pypa/pip/blob/main/src/pip/_vendor/distlib/compat.py

In particular, the vendored library that you suggest as a replacement,
"requests", is very dependent on proxy functions such as "getproxies" that
currently live in urllib.request. More than once I've had to go down the rabbit
hole of seeing where those functions get that info for each platform.
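
For anyone who has not looked at these functions, here is a rough sketch of how to inspect what they return. The proxy address is made up, and the helper name show_proxy_sources is mine; getproxies() and getproxies_environment() are the real urllib.request functions.

```python
import os
from urllib.request import getproxies, getproxies_environment


def show_proxy_sources():
    """Compare proxies read from the environment with the effective settings."""
    return {
        # Only looks at *_proxy environment variables.
        "environment": getproxies_environment(),
        # May also consult platform settings (e.g. registry on Windows,
        # system configuration on macOS) when no environment proxy is set.
        "effective": getproxies(),
    }


# Made-up proxy address, set only so the example has something to show.
os.environ["http_proxy"] = "http://proxy.example:3128"
sources = show_proxy_sources()
print(sources)
```

The per-platform lookup behind "effective" is exactly the part that is hard to reproduce outside the stdlib.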

Damian (he/him)


On Sun, Feb 6, 2022 at 11:10 AM Victor Stinner  wrote:

> On Sun, Feb 6, 2022 at 3:48 PM Damian Shaw 
> wrote:
> >
> > Pip vendors requests for network calls:
> https://github.com/pypa/pip/tree/main/src/pip/_vendor/requests
> >
> > But still does depend on functions from urllib.parse and urllib.request
> in many places:
> https://github.com/pypa/pip/blob/main/src/pip/_internal/utils/urls.py
>
> Aha, it doesn't use urllib.request to open a HTTP connection, it only
> uses pathname2url() and url2pathname() functions of urllib.request.
> Maybe we can keep these functions. I'm not sure why they don't belong
> to urllib.parse.
>
> If urllib.parse is widely used, maybe we can keep this module.
>
> Victor
>
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/ACA7AU4W6XB35PA6O4IYBPQSQD3HFLFS/


[Python-Dev] Re: It's now time to deprecate the stdlib urllib module

2022-02-06 Thread Senthil Kumaran
On Sun, Feb 06, 2022 at 03:08:40PM +0100, Victor Stinner wrote:

> I propose to deprecate the urllib module in Python 3.11. It would emit
> a DeprecationWarning which warn users, so users should consider better
> alternatives like urllib3 or httpx: well known modules, better
> maintained, more secure, support HTTP/2 (httpx), etc.
> 
> I don't propose to schedule its removal. Let's discuss the removal in
> 1 or 2 years.

I am not certain that we can deprecate/remove the whole 'urllib' module
without a good plan for replacing its facilities within the stdlib. There is
heavy usage of urllib.parse in multiple projects (including urllib3), and
parse is semi-maintained.

> Let's come back to urllib:

> * It's API is too complicated
> * It doesn't support HTTP/2 nor HTTP/3
> * It's barely maintained: there are 121 open issues including 3 security 
> issues!

I agree with all of these.
I think that removing the old cruft code might lead us to close a number
of open issues.

>  The 3 open security issues:

Just because something is marked 'security' doesn't make it actionable.
For instance, the last one asks urllib to maintain client state to be safe
against a scenario, which it never did.

I don't think it is time to deprecate the urllib module. It would be too
disruptive IMO. So, -1.

Right now, I don't have a solution.
My suggestion is that we close old bugs and remove old code (aka maintain it
a bit, and that falls on me too).
Then we can probably chart out a deprecation / replacement path in a
non-disruptive manner.


-- 
Senthil
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/ORQEJXJTZDYYV53MHKXTJ3Q6W72AUSGA/


[Python-Dev] Re: It's now time to deprecate the stdlib urllib module

2022-02-06 Thread Victor Stinner
On Sun, Feb 6, 2022 at 3:48 PM Damian Shaw  wrote:
>
> Pip vendors requests for network calls: 
> https://github.com/pypa/pip/tree/main/src/pip/_vendor/requests
>
> But still does depend on functions from urllib.parse and urllib.request in 
> many places: 
> https://github.com/pypa/pip/blob/main/src/pip/_internal/utils/urls.py

Aha, it doesn't use urllib.request to open an HTTP connection, it only
uses the pathname2url() and url2pathname() functions of urllib.request.
Maybe we can keep these functions. I'm not sure why they don't belong
to urllib.parse.
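
For reference, a minimal sketch of what these two functions do. The path is made up, and the exact URL form depends on the platform and the Python version, so the example only relies on the round trip.

```python
import os
from urllib.request import pathname2url, url2pathname

# Hypothetical local path containing a space, built with the native separator.
local_path = os.path.join(os.sep, "tmp", "my file.txt")

# pathname2url() percent-encodes the path part of a file URL (no "file:" prefix).
url_path = pathname2url(local_path)

# url2pathname() reverses the conversion back to the platform's path syntax.
round_tripped = url2pathname(url_path)

print(url_path)
print(round_tripped)
```

Neither call opens a connection, so they are path/URL conversion helpers rather than request machinery.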

If urllib.parse is widely used, maybe we can keep this module.

Victor
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/PDFGPDGESBLSBHVLINCPAFEOHXQWFIRI/


[Python-Dev] Re: It's now time to deprecate the stdlib urllib module

2022-02-06 Thread Damian Shaw
Pip vendors requests for network calls:
https://github.com/pypa/pip/tree/main/src/pip/_vendor/requests

But still does depend on functions from urllib.parse and urllib.request in
many places:
https://github.com/pypa/pip/blob/main/src/pip/_internal/utils/urls.py

Damian (he/him)

On Sun, Feb 6, 2022 at 9:36 AM Dong-hee Na  wrote:

> I am not an expert about pip,
> but it will be not a problem about installing the pip module once CPython
> removes urllib module from stdlib?
>
> Warm regards,
> Dong-hee
>
> On Sun, Feb 6, 2022 at 11:13 PM, Victor Stinner wrote:
>
>> Hi,
>>
>> I propose to deprecate the urllib module in Python 3.11. It would emit
>> a DeprecationWarning which warn users, so users should consider better
>> alternatives like urllib3 or httpx: well known modules, better
>> maintained, more secure, support HTTP/2 (httpx), etc.
>>
>> I don't propose to schedule its removal. Let's discuss the removal in
>> 1 or 2 years.
>>
>> --
>>
>> urllib has many abstraction to support a wide range of protocols with
>> "handlers": HTTP, HTTPS, FTP, "local file", proxy, HTTP
>> authentication, HTTP Cookie, etc. A simple HTTP request using Basic
>> Authentication requires 10-20 lines of code, whereas it should be a
>> single line.
>>
>> Users (me included) don't like urllib API which was too complicated
>> for common tasks.
>>
>> --
>>
>> Unhappy users created multiple better alternatives to the stdlib urllib
>> module.
>>
>> In 2008, the "urllib3" module was created to provide an API designed
>> to be as simple as possible for the most common HTTP and HTTPS
>> requests. Example:
>>
>>req = http.request('GET', 'http://httpbin.org/robots.txt').
>>
>> In 2011, the "requests" module based on urllib3 was created.
>>
>> In 2013, the "aiohttp" module based on asyncio was created.
>>
>> In 2015, new "httpx" module was created:
>>
>> req = httpx.get('https://www.example.org/')
>>
>> Not only httpx has a regular "synchronous" API (blocking function
>> calls), but it also has an asynchronous API!
>>
>> Sadly, while HTTP/3 is being developed, it seems like in this list,
>> httpx is the only HTTP client library supporting HTTP/2 currently :-(
>>
>> For HTTP/2, I also found the "httplib2" module.
>>
>> For HTTP/3, I found the "http3" and "aioquic" modules.
>>
>> --
>>
>> Let's come back to urllib:
>>
>> * It's API is too complicated
>> * It doesn't support HTTP/2 nor HTTP/3
>> * It's barely maintained: there are 121 open issues including 3 security
>> issues!
>>
>> The 3 open security issues:
>>
>> * bpo-33661 open 2018;
>> * bpo-36338 open in 2019;
>> * bpo-45795 open in 2021.
>>
>> Usually, it's bad when you refer to an open security issue by its
>> creation year :-(
>>
>> The urllib module has long history of security vulnerabilities. List
>> of *fixed* vulnerabilities:
>>
>> * 2011 (bpo-11662):
>> https://python-security.readthedocs.io/vuln/urllib-redirect.html
>> * 2017 (bpo-30119):
>>
>> https://python-security.readthedocs.io/vuln/urllib-ftp-stream-injection.html
>> * 2017 (bpo-30500):
>>
>> https://python-security.readthedocs.io/vuln/urllib-connects-wrong-host.html
>> * 2019 (bpo-35907):
>> https://python-security.readthedocs.io/vuln/urllib-local-file-scheme.html
>> * 2019 (bpo-38826):
>> https://python-security.readthedocs.io/vuln/urllib-basic-auth-regex.html
>> * 2021 (bpo-42967):
>>
>> https://python-security.readthedocs.io/vuln/urllib-query-string-semicolon-separator.html
>> * 2021 (bpo-43075):
>> https://python-security.readthedocs.io/vuln/urllib-basic-auth-regex2.html
>> * 2021 (bpo-44022):
>> https://python-security.readthedocs.io/vuln/urllib-100-continue-loop.html
>>
>> urllib is a package made of 4 parts:
>>
>> * urllib.request for opening and reading URLs
>> * urllib.error containing the exceptions raised by urllib.request
>> * urllib.parse for parsing URLs
>> * urllib.robotparser for parsing robots.txt files
>>
>> I propose to deprecate all of them. Maybe the deprecation can be
>> different for each sub-module?
>>
>> Victor
>> --
>> Night gathers, and now my watch begins. It shall not end until my death.
>> ___
>> Python-Dev mailing list -- python-dev@python.org
>> To unsubscribe send an email to python-dev-le...@python.org
>> https://mail.python.org/mailman3/lists/python-dev.python.org/
>> Message archived at
>> https://mail.python.org/archives/list/python-dev@python.org/message/EZ6O7MOPZ4GA75MKTDO7LAELKXUHK2QS/
>> Code of Conduct: http://python.org/psf/codeofconduct/
>>
> ___
> Python-Dev mailing list -- python-dev@python.org
> To unsubscribe send an email to python-dev-le...@python.org
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-dev@python.org/message/E6GN2THYCNQ2Q3CGMSH7GRCDFOOFDDCQ/
> Code of Conduct: http://python.org/psf/codeofconduct/
>

[Python-Dev] Re: It's now time to deprecate the stdlib urllib module

2022-02-06 Thread Damian Shaw
Speaking from anecdotal experience, "urllib.parse" is a very popular and
heavily depended-upon module. I would be shocked if removing it weren't
very disruptive.

In fact, a quick search of the replacement modules you mention shows that
they all rely on it. Here is an example from each:
* requests:
https://github.com/psf/requests/blob/99b3b492418d0751ca960178d274f89805095e4c/requests/sessions.py#L121
* aiohttp:
https://github.com/aio-libs/aiohttp/blob/7d78fd01dbe983d119141d7f2775aefd42494f99/aiohttp/formdata.py#L129
* httpx:
https://github.com/encode/httpx/blob/b7dc0c3df68279ce89f016a69a41b27a2346d54d/httpx/_content.py#L144

As for "urllib.request", I know that the philosophy of Python being a
"batteries included" language is going away, but having no way to make any
HTTP call without installing a third-party package would make Python more
difficult to use in a lot of situations. Could it not always emit a warning
that this library should not be used in a production environment? Much in
the same way that Flask's default web server does.
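
A rough sketch of what such a warning could look like, purely as an assumption about one possible shape. The category NotForProductionWarning and the function simple_fetch_notice are hypothetical names; nothing like them exists in the stdlib today.

```python
import warnings


class NotForProductionWarning(UserWarning):
    """Hypothetical category; nothing by this name exists in the stdlib."""


def simple_fetch_notice():
    """Emit the notice, attributed to the caller's line via stacklevel=2."""
    warnings.warn(
        "this HTTP client is meant for simple scripts, not production use",
        NotForProductionWarning,
        stacklevel=2,
    )


# Record the warning instead of printing it, to show it is really emitted.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    simple_fetch_notice()

print(caught[0].category.__name__, caught[0].message)
```

Subclassing UserWarning matters here: unlike DeprecationWarning, which the default filters hide unless it is triggered by code in __main__, a UserWarning is shown by default.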

Damian (he/him)


On Sun, Feb 6, 2022 at 9:16 AM Victor Stinner  wrote:

> Hi,
>
> I propose to deprecate the urllib module in Python 3.11. It would emit
> a DeprecationWarning which warn users, so users should consider better
> alternatives like urllib3 or httpx: well known modules, better
> maintained, more secure, support HTTP/2 (httpx), etc.
>
> I don't propose to schedule its removal. Let's discuss the removal in
> 1 or 2 years.
>
> --
>
> urllib has many abstraction to support a wide range of protocols with
> "handlers": HTTP, HTTPS, FTP, "local file", proxy, HTTP
> authentication, HTTP Cookie, etc. A simple HTTP request using Basic
> Authentication requires 10-20 lines of code, whereas it should be a
> single line.
>
> Users (me included) don't like urllib API which was too complicated
> for common tasks.
>
> --
>
> Unhappy users created multiple better alternatives to the stdlib urllib
> module.
>
> In 2008, the "urllib3" module was created to provide an API designed
> to be as simple as possible for the most common HTTP and HTTPS
> requests. Example:
>
>req = http.request('GET', 'http://httpbin.org/robots.txt').
>
> In 2011, the "requests" module based on urllib3 was created.
>
> In 2013, the "aiohttp" module based on asyncio was created.
>
> In 2015, new "httpx" module was created:
>
> req = httpx.get('https://www.example.org/')
>
> Not only httpx has a regular "synchronous" API (blocking function
> calls), but it also has an asynchronous API!
>
> Sadly, while HTTP/3 is being developed, it seems like in this list,
> httpx is the only HTTP client library supporting HTTP/2 currently :-(
>
> For HTTP/2, I also found the "httplib2" module.
>
> For HTTP/3, I found the "http3" and "aioquic" modules.
>
> --
>
> Let's come back to urllib:
>
> * It's API is too complicated
> * It doesn't support HTTP/2 nor HTTP/3
> * It's barely maintained: there are 121 open issues including 3 security
> issues!
>
> The 3 open security issues:
>
> * bpo-33661 open 2018;
> * bpo-36338 open in 2019;
> * bpo-45795 open in 2021.
>
> Usually, it's bad when you refer to an open security issue by its
> creation year :-(
>
> The urllib module has long history of security vulnerabilities. List
> of *fixed* vulnerabilities:
>
> * 2011 (bpo-11662):
> https://python-security.readthedocs.io/vuln/urllib-redirect.html
> * 2017 (bpo-30119):
>
> https://python-security.readthedocs.io/vuln/urllib-ftp-stream-injection.html
> * 2017 (bpo-30500):
> https://python-security.readthedocs.io/vuln/urllib-connects-wrong-host.html
> * 2019 (bpo-35907):
> https://python-security.readthedocs.io/vuln/urllib-local-file-scheme.html
> * 2019 (bpo-38826):
> https://python-security.readthedocs.io/vuln/urllib-basic-auth-regex.html
> * 2021 (bpo-42967):
>
> https://python-security.readthedocs.io/vuln/urllib-query-string-semicolon-separator.html
> * 2021 (bpo-43075):
> https://python-security.readthedocs.io/vuln/urllib-basic-auth-regex2.html
> * 2021 (bpo-44022):
> https://python-security.readthedocs.io/vuln/urllib-100-continue-loop.html
>
> urllib is a package made of 4 parts:
>
> * urllib.request for opening and reading URLs
> * urllib.error containing the exceptions raised by urllib.request
> * urllib.parse for parsing URLs
> * urllib.robotparser for parsing robots.txt files
>
> I propose to deprecate all of them. Maybe the deprecation can be
> different for each sub-module?
>
> Victor
> --
> Night gathers, and now my watch begins. It shall not end until my death.
> ___
> Python-Dev mailing list -- python-dev@python.org
> To unsubscribe send an email to python-dev-le...@python.org
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-dev@python.org/message/EZ6O7MOPZ4GA75MKTDO7LAELKXUHK2QS/
> Code of Conduct: http://python.org/psf/codeofconduct/
>

[Python-Dev] Re: It's now time to deprecate the stdlib urllib module

2022-02-06 Thread Paul Moore
Strong -1 from me.

urllib.request may not be "best practice", but it's still extremely
useful for simple situations, and urllib.parse is useful for basic
handling of URLs. Yes, the more complex aspects of urllib are better
handled by external packages, but that's not a sufficient argument for
removing the package altogether. There are many situations where
external dependencies are unsuitable. Also, there's quite a lot of
usage of urllib in the stdlib itself - how would you propose to
replace that?
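
As an example of the "simple situations" I mean, this is roughly all it takes with the stdlib alone. The helper name fetch_text is made up, and the example uses a data: URL only so that it runs without network access; for real use you would pass an http(s) URL.

```python
from urllib.request import urlopen


def fetch_text(url, timeout=10):
    """Fetch a URL and decode the body using the declared charset, if any."""
    with urlopen(url, timeout=timeout) as response:
        # Fall back to UTF-8 when the response declares no charset (an assumption).
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)


# data: URLs are handled by urllib.request itself, so no network is involved.
text = fetch_text("data:text/plain;charset=utf-8,hello")
print(text)
```

Note that urlopen() accepts several URL schemes, so a script that takes URLs from untrusted input should check the scheme first.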

In addition, pip relies pretty heavily on urllib (parse and request),
and pip has a bootstrapping issue, so using 3rd party libraries is
non-trivial. Also, of pip's existing vendored dependencies,
webencodings, urllib3, requests, pkg_resources, packaging, html5lib,
distlib and cachecontrol all import urllib. So this would be *hugely*
disruptive to the whole packaging ecosystem (which is under-resourced
at the best of times, so this would put a lot of strain on us).

In any case, why is this being proposed as a simple posting on
python-dev? There's already PEP 594 for removals from the stdlib. If
you have a case for removing urllib, I suggest you get it added to PEP
594, so it can be discussed and agreed properly, along with the other
removals (none of which is remotely as controversial as urllib, so
there's absolutely no doubt in my mind that this would need a PEP
however it was proposed).

Paul

On Sun, 6 Feb 2022 at 14:15, Victor Stinner  wrote:
>
> Hi,
>
> I propose to deprecate the urllib module in Python 3.11. It would emit
> a DeprecationWarning which warn users, so users should consider better
> alternatives like urllib3 or httpx: well known modules, better
> maintained, more secure, support HTTP/2 (httpx), etc.
>
> I don't propose to schedule its removal. Let's discuss the removal in
> 1 or 2 years.
>
> --
>
> urllib has many abstraction to support a wide range of protocols with
> "handlers": HTTP, HTTPS, FTP, "local file", proxy, HTTP
> authentication, HTTP Cookie, etc. A simple HTTP request using Basic
> Authentication requires 10-20 lines of code, whereas it should be a
> single line.
>
> Users (me included) don't like urllib API which was too complicated
> for common tasks.
>
> --
>
> Unhappy users created multiple better alternatives to the stdlib urllib 
> module.
>
> In 2008, the "urllib3" module was created to provide an API designed
> to be as simple as possible for the most common HTTP and HTTPS
> requests. Example:
>
>req = http.request('GET', 'http://httpbin.org/robots.txt').
>
> In 2011, the "requests" module based on urllib3 was created.
>
> In 2013, the "aiohttp" module based on asyncio was created.
>
> In 2015, new "httpx" module was created:
>
> req = httpx.get('https://www.example.org/')
>
> Not only httpx has a regular "synchronous" API (blocking function
> calls), but it also has an asynchronous API!
>
> Sadly, while HTTP/3 is being developed, it seems like in this list,
> httpx is the only HTTP client library supporting HTTP/2 currently :-(
>
> For HTTP/2, I also found the "httplib2" module.
>
> For HTTP/3, I found the "http3" and "aioquic" modules.
>
> --
>
> Let's come back to urllib:
>
> * It's API is too complicated
> * It doesn't support HTTP/2 nor HTTP/3
> * It's barely maintained: there are 121 open issues including 3 security 
> issues!
>
> The 3 open security issues:
>
> * bpo-33661 open 2018;
> * bpo-36338 open in 2019;
> * bpo-45795 open in 2021.
>
> Usually, it's bad when you refer to an open security issue by its
> creation year :-(
>
> The urllib module has long history of security vulnerabilities. List
> of *fixed* vulnerabilities:
>
> * 2011 (bpo-11662):
> https://python-security.readthedocs.io/vuln/urllib-redirect.html
> * 2017 (bpo-30119):
> https://python-security.readthedocs.io/vuln/urllib-ftp-stream-injection.html
> * 2017 (bpo-30500):
> https://python-security.readthedocs.io/vuln/urllib-connects-wrong-host.html
> * 2019 (bpo-35907):
> https://python-security.readthedocs.io/vuln/urllib-local-file-scheme.html
> * 2019 (bpo-38826):
> https://python-security.readthedocs.io/vuln/urllib-basic-auth-regex.html
> * 2021 (bpo-42967):
> https://python-security.readthedocs.io/vuln/urllib-query-string-semicolon-separator.html
> * 2021 (bpo-43075):
> https://python-security.readthedocs.io/vuln/urllib-basic-auth-regex2.html
> * 2021 (bpo-44022):
> https://python-security.readthedocs.io/vuln/urllib-100-continue-loop.html
>
> urllib is a package made of 4 parts:
>
> * urllib.request for opening and reading URLs
> * urllib.error containing the exceptions raised by urllib.request
> * urllib.parse for parsing URLs
> * urllib.robotparser for parsing robots.txt files
>
> I propose to deprecate all of them. Maybe the deprecation can be
> different for each sub-module?
>
> Victor
> --
> Night gathers, and now my watch begins. It shall not end until my death.

[Python-Dev] Re: It's now time to deprecate the stdlib urllib module

2022-02-06 Thread Dong-hee Na
I am not an expert on pip,
but won't installing the pip module become a problem once CPython
removes the urllib module from the stdlib?

Warm regards,
Dong-hee

On Sun, Feb 6, 2022 at 11:13 PM, Victor Stinner wrote:

> Hi,
>
> I propose to deprecate the urllib module in Python 3.11. It would emit
> a DeprecationWarning which warn users, so users should consider better
> alternatives like urllib3 or httpx: well known modules, better
> maintained, more secure, support HTTP/2 (httpx), etc.
>
> I don't propose to schedule its removal. Let's discuss the removal in
> 1 or 2 years.
>
> --
>
> urllib has many abstraction to support a wide range of protocols with
> "handlers": HTTP, HTTPS, FTP, "local file", proxy, HTTP
> authentication, HTTP Cookie, etc. A simple HTTP request using Basic
> Authentication requires 10-20 lines of code, whereas it should be a
> single line.
>
> Users (me included) don't like urllib API which was too complicated
> for common tasks.
>
> --
>
> Unhappy users created multiple better alternatives to the stdlib urllib
> module.
>
> In 2008, the "urllib3" module was created to provide an API designed
> to be as simple as possible for the most common HTTP and HTTPS
> requests. Example:
>
>     req = http.request('GET', 'http://httpbin.org/robots.txt')
>
> In 2011, the "requests" module based on urllib3 was created.
>
> In 2013, the "aiohttp" module based on asyncio was created.
>
> In 2015, the new "httpx" module was created:
>
>     req = httpx.get('https://www.example.org/')
>
> Not only does httpx have a regular "synchronous" API (blocking function
> calls), it also has an asynchronous API!
>
> Sadly, while HTTP/3 is being developed, it seems like in this list,
> httpx is the only HTTP client library supporting HTTP/2 currently :-(
>
> For HTTP/2, I also found the "httplib2" module.
>
> For HTTP/3, I found the "http3" and "aioquic" modules.
>
> --
>
> Let's come back to urllib:
>
> * Its API is too complicated
> * It supports neither HTTP/2 nor HTTP/3
> * It's barely maintained: there are 121 open issues including 3 security
> issues!
>
> The 3 open security issues:
>
> * bpo-33661, open since 2018;
> * bpo-36338, open since 2019;
> * bpo-45795, open since 2021.
>
> Usually, it's bad when you refer to an open security issue by its
> creation year :-(
>
> The urllib module has a long history of security vulnerabilities. List
> of *fixed* vulnerabilities:
>
> * 2011 (bpo-11662):
> https://python-security.readthedocs.io/vuln/urllib-redirect.html
> * 2017 (bpo-30119):
> https://python-security.readthedocs.io/vuln/urllib-ftp-stream-injection.html
> * 2017 (bpo-30500):
> https://python-security.readthedocs.io/vuln/urllib-connects-wrong-host.html
> * 2019 (bpo-35907):
> https://python-security.readthedocs.io/vuln/urllib-local-file-scheme.html
> * 2019 (bpo-38826):
> https://python-security.readthedocs.io/vuln/urllib-basic-auth-regex.html
> * 2021 (bpo-42967):
> https://python-security.readthedocs.io/vuln/urllib-query-string-semicolon-separator.html
> * 2021 (bpo-43075):
> https://python-security.readthedocs.io/vuln/urllib-basic-auth-regex2.html
> * 2021 (bpo-44022):
> https://python-security.readthedocs.io/vuln/urllib-100-continue-loop.html
>
> urllib is a package made of 4 parts:
>
> * urllib.request for opening and reading URLs
> * urllib.error containing the exceptions raised by urllib.request
> * urllib.parse for parsing URLs
> * urllib.robotparser for parsing robots.txt files
>
> I propose to deprecate all of them. Maybe the deprecation can be
> different for each sub-module?
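
Since the four sub-modules do quite different jobs, a small offline sketch of
what each one provides may help when weighing them separately. The robots.txt
rules and the "demo" User-Agent string below are made up for the example, and
nothing here touches the network.

```python
import urllib.error
import urllib.parse
import urllib.request
import urllib.robotparser

# urllib.parse: split a URL into components and decode its query string.
parts = urllib.parse.urlsplit("http://httpbin.org/robots.txt?x=1")
query = urllib.parse.parse_qs(parts.query)
print(parts.netloc, parts.path, query)

# urllib.robotparser: feed rules directly instead of fetching them.
# These rules are invented for this example.
robots = urllib.robotparser.RobotFileParser()
robots.parse(["User-agent: *", "Disallow: /deny"])
allowed = robots.can_fetch("*", "http://httpbin.org/index.html")
denied = robots.can_fetch("*", "http://httpbin.org/deny")
print(allowed, denied)

# urllib.request: build a request object without sending it.
request = urllib.request.Request(
    "http://httpbin.org/robots.txt", headers={"User-Agent": "demo"}
)
print(request.get_method(), request.full_url)

# urllib.error: HTTPError is a subclass of URLError, so catching URLError
# around urlopen() also covers HTTP status failures.
http_error_is_url_error = issubclass(urllib.error.HTTPError, urllib.error.URLError)
```

Note that urllib.parse and urllib.robotparser work on strings only, which is
part of why a per-sub-module decision could make sense.
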
>
> Victor
> --
> Night gathers, and now my watch begins. It shall not end until my death.
> Message archived at
> https://mail.python.org/archives/list/python-dev@python.org/message/EZ6O7MOPZ4GA75MKTDO7LAELKXUHK2QS/
>
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/E6GN2THYCNQ2Q3CGMSH7GRCDFOOFDDCQ/