[issue36050] Why does http.client.HTTPResponse._safe_read use MAXAMOUNT

2021-07-29 Thread Bruce Merry


Bruce Merry  added the comment:

> Will you accept patches to fix this for 3.9? I'm not clear whether the "bug 
> fixes only" status of 3.9 allows for fixing performance regressions.

Never mind, I see your already answered this on bpo-42853 (as a no). Thanks for 
taking the time to answer my questions; I'll just have to skip Python 3.9 for 
this particular application and go straight to 3.10.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36050] Why does http.client.HTTPResponse._safe_read use MAXAMOUNT

2021-07-29 Thread Bruce Merry


Bruce Merry  added the comment:

> There is nothing to do here.

Will you accept patches to fix this for 3.9? I'm not clear whether the "bug 
fixes only" status of 3.9 allows for fixing performance regressions.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36050] Why does http.client.HTTPResponse._safe_read use MAXAMOUNT

2021-07-29 Thread Łukasz Langa

Łukasz Langa  added the comment:

Inadasan is right, this is still fixed in 3.10 and 3.11. The 3.9 revert is due 
to 3.9 supporting OpenSSL older than 1.1.1. 3.10+ requires OpenSSL 1.1.1+ per 
PEP 644.

There is nothing to do here.

--
resolution:  -> fixed
stage: needs patch -> resolved
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36050] Why does http.client.HTTPResponse._safe_read use MAXAMOUNT

2021-07-28 Thread Gregory P. Smith


Change by Gregory P. Smith :


--
resolution: fixed -> 
stage: resolved -> needs patch

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36050] Why does http.client.HTTPResponse._safe_read use MAXAMOUNT

2021-07-28 Thread Inada Naoki


Inada Naoki  added the comment:

Note that it was reverted only in 3.9 branch.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36050] Why does http.client.HTTPResponse._safe_read use MAXAMOUNT

2021-07-28 Thread Bruce Merry


Bruce Merry  added the comment:

Re-opening because the patch to fix this has just been reverted due to 
bpo-42853.

--
status: closed -> open

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36050] Why does http.client.HTTPResponse._safe_read use MAXAMOUNT

2021-07-28 Thread Łukasz Langa

Łukasz Langa  added the comment:


New changeset 153365d864c411f6fb523efa752ccb3497d815ca by Inada Naoki in branch 
'3.9':
[3.9] bpo-42853: Fix http.client fails to download >2GiB data over TLS 
(GH-27405)
https://github.com/python/cpython/commit/153365d864c411f6fb523efa752ccb3497d815ca


--
nosy: +lukasz.langa

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36050] Why does http.client.HTTPResponse._safe_read use MAXAMOUNT

2019-04-06 Thread Inada Naoki


Inada Naoki  added the comment:


New changeset d6bf6f2d0c83f0c64ce86e7b9340278627798090 by Inada Naoki in branch 
'master':
bpo-36050: optimize HTTPResponse.read() (GH-12698)
https://github.com/python/cpython/commit/d6bf6f2d0c83f0c64ce86e7b9340278627798090


--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36050] Why does http.client.HTTPResponse._safe_read use MAXAMOUNT

2019-04-06 Thread Inada Naoki


Change by Inada Naoki :


--
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36050] Why does http.client.HTTPResponse._safe_read use MAXAMOUNT

2019-04-05 Thread Inada Naoki


Change by Inada Naoki :


--
keywords: +patch
pull_requests: +12623
stage:  -> patch review

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36050] Why does http.client.HTTPResponse._safe_read use MAXAMOUNT

2019-04-05 Thread Inada Naoki


Inada Naoki  added the comment:

Additionally, _safe_read calls fp.read() multiple times to handle EINTR.
But EINTR is handled by socket module now (PEP 475).

Now the function can be very simple.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36050] Why does http.client.HTTPResponse._safe_read use MAXAMOUNT

2019-04-05 Thread Inada Naoki


Inada Naoki  added the comment:

issue1296004 is too old (512MB RAM machine!) and I can not confirm it.

But I think it was caused by inefficient realloc() in CRT.

See 
https://github.com/python/cpython/blob/6c52d76db8b1cb836c136bd6a1044e85bfe8241e/Lib/socket.py#L298-L303

_fileobject called socket.recv with remaining size.
Typically, socket can't return MBs at once.  So it cause:

1. Large (at most `amt`, some MBs) string (bytes) are allocated. (malloc)
2. recv is called.
3. _PyString_Resize() (realloc) is called with smaller bytes (typically ~128KiB)
4. amt -= received
5. if amt == 0: exit; goto 1.

This might stress malloc and realloc in CRT.  It caused fragmentation and 
MemoryError.

---

For now, almost everything is rewritten.

In case of _pyio, BufferedIOReader calls RawIOBase.read().  It copies from 
bytearray to bytes.  So call only malloc and free.  Stress for realloc will be 
reduced.

https://github.com/python/cpython/blob/50866e9ed3e4e0ebb60c20c3483a8df424c02722/Lib/_pyio.py#L586-L591

In case of C _io module, it is more efficient.  It allocate PyBytes once and 
calls SocketIO.read_into directly.  No temporary bytes objects are created.

https://github.com/python/cpython/blob/50866e9ed3e4e0ebb60c20c3483a8df424c02722/Modules/_io/bufferedio.c#L1632
https://github.com/python/cpython/blob/50866e9ed3e4e0ebb60c20c3483a8df424c02722/Modules/_io/bufferedio.c#L1658
https://github.com/python/cpython/blob/50866e9ed3e4e0ebb60c20c3483a8df424c02722/Modules/_io/bufferedio.c#L1470

--
type:  -> performance
versions: +Python 3.8 -Python 3.7

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36050] Why does http.client.HTTPResponse._safe_read use MAXAMOUNT

2019-04-05 Thread Inada Naoki


Change by Inada Naoki :


--
nosy: +inada.naoki

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36050] Why does http.client.HTTPResponse._safe_read use MAXAMOUNT

2019-02-24 Thread Martin Panter


Martin Panter  added the comment:

The 1 MiB limit was added for Issue 1296004; apparently some platforms were 
overallocating multiple buffers and running out of memory. I suspect the loop 
in "_safe_read" was inherited from Python 2, which has different kinds of file 
objects. The comments suggest it does partial reads.

But the Python 3 code calls "socket.makefile" with "buffering" mode enabled by 
default. I agree it should be safe to read the total length in one go.

--
nosy: +martin.panter

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36050] Why does http.client.HTTPResponse._safe_read use MAXAMOUNT

2019-02-20 Thread Bruce Merry


New submission from Bruce Merry :

While investigating poor HTTP read performance I discovered that reading all 
the data from a response with a content-length goes via _safe_read, which in 
turn reads in chunks of at most MAXAMOUNT (1MB) before stitching them together 
with b"".join. This can really hurt performance for responses larger than 
MAXAMOUNT, because
(a) the data has to be copied an additional time; and
(b) the join operation doesn't drop the GIL, so this limits multi-threaded 
scaling.

I'm struggling to see any advantage in doing this chunking - it's not saving 
memory either (in fact it is wasting it).

To give an idea of the performance impact, changing MAXAMOUNT to a very large 
value made a multithreaded test of mine go from 800MB/s to 2.5GB/s (which is 
limited by the network speed).

--
components: Library (Lib)
messages: 336081
nosy: bmerry
priority: normal
severity: normal
status: open
title: Why does http.client.HTTPResponse._safe_read use MAXAMOUNT
versions: Python 3.7

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com