[issue19009] Enhance HTTPResponse.readline() performance

2014-03-19 Thread Roundup Robot

Roundup Robot added the comment:

New changeset 49017c391564 by Kristján Valur Jónsson in branch 'default':
Issue #19009
http://hg.python.org/cpython/rev/49017c391564

--
nosy: +python-dev

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19009
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19009] Enhance HTTPResponse.readline() performance

2014-03-19 Thread Kristján Valur Jónsson

Kristján Valur Jónsson added the comment:

Sure.  If there are issues we'll just reopen.  Closing.

--
resolution:  - fixed
status: open - closed

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19009
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19009] Enhance HTTPResponse.readline() performance

2014-03-19 Thread Antoine Pitrou

Antoine Pitrou added the comment:

If this is committed, should the issue be closed?

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19009
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19009] Enhance HTTPResponse.readline() performance

2014-03-18 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

LGTM.

--
stage: patch review - commit review
versions: +Python 3.5 -Python 3.4

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19009
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19009] Enhance HTTPResponse.readline() performance

2013-09-27 Thread Kristján Valur Jónsson

Kristján Valur Jónsson added the comment:

A new patch set, in response to Serhiy's comments on Rietveld.

--
Added file: http://bugs.python.org/file31884/httpresponse.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19009
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19009] Enhance HTTPResponse.readline() performance

2013-09-26 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Kristján, did you notice my comments on Rietveld?

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19009
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19009] Enhance HTTPResponse.readline() performance

2013-09-26 Thread Kristján Valur Jónsson

Kristján Valur Jónsson added the comment:

Ah no actually.  Thanks . I'll respond soon.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19009
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19009] Enhance HTTPResponse.readline() performance

2013-09-21 Thread Kristján Valur Jónsson

Kristján Valur Jónsson added the comment:

Ok, Here is a new patch.
We now inherit from BufferedIOBase.  We must implement read(amt) ourselves, 
since the base class does not do it.

I was wrong about the final chunk, it is in fact 0\r\n.  A final \r\n is then 
added to signal the end of the optional trailers.

Added a bunch of tests for chunked encoding, plus tests to verify http 
synchronization for keepalive connections.

--
Added file: http://bugs.python.org/file31836/httpresponse.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19009
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19009] Enhance HTTPResponse.readline() performance

2013-09-20 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

After adding read1() and peek() what stop us from inheriting HTTPResponse from 
BufferedIOBase?

I suggest split _read1_or_peek_chunked() by two parts. First part calculates n 
bounded by chunk_left (it can read the next chunk size and close a connection 
if needed). Perhaps it can be reused in _readall_chunked() and 
_readinto_chunked(() too. Second part which is  controlled by boolean parameter 
should be moved out in read1() and peek().

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19009
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19009] Enhance HTTPResponse.readline() performance

2013-09-20 Thread Kristján Valur Jónsson

Kristján Valur Jónsson added the comment:

Ok, I have refactored this a bit.
A separate new function now takes care of the reading of chunk-header and tail. 
 This simplifies the other functions.

I'm not sure what you mean by inheriting from the buffered class.  Do we gain 
anything by doing that, would it change the code? or would it merely be for the 
benefit of ABC?

Note that the chunk protocol was wrong and I fixed the unittests:  The final 
chunk is a _valid_ zero length chunk, i.e. 0\r\n\r\n.  It contains two eol 
tokens like other chunks, one at the end of the length, the other at the end of 
the(null) payload.

--
Added file: http://bugs.python.org/file31826/httpresponse.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19009
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19009] Enhance HTTPResponse.readline() performance

2013-09-20 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

 A separate new function now takes care of the reading of chunk-header and 
 tail.  This simplifies the other functions.

Good. It is even better than I expected.

 Do we gain anything by doing that, would it change the code? or would it 
 merely be for the benefit of ABC?

Yes, it for the benefit of ABC. Some code tests isinstance(file, 
io.BufferedIOBase) and wraps stream in BufferedReader if it is false.

 Note that the chunk protocol was wrong and I fixed the unittests:  The final 
 chunk is a _valid_ zero length chunk, i.e. 0\r\n\r\n.  It contains two eol 
 tokens like other chunks, one at the end of the length, the other at the end 
 of the(null) payload.

How new code processes invalid final chunk 0\r\n? We need test to check that 
there is no degradation.

Perhaps it relates to behavior change which I mention in the comment on 
Rietveld.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19009
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19009] Enhance HTTPResponse.readline() performance

2013-09-20 Thread Antoine Pitrou

Antoine Pitrou added the comment:

 Note for the different interpretation of the final chunk:
 Chunked encoding allows for adding headers after the final chunk.
 This is what _read_and_discard_trailer() does, but discarding the
 trailing headers.  So, if support is ever added for reading those
 trailing headers, then we need make sure that we consume the last
 chunk correctly.

I think what Serhiy is proposing is that we also succeed when the
server sends a truncated last chunk (and I agree with him).

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19009
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19009] Enhance HTTPResponse.readline() performance

2013-09-20 Thread Kristján Valur Jónsson

Kristján Valur Jónsson added the comment:

Ok, I can change the base class inheritance and see if it changes things.

Note for the different interpretation of the final chunk:
Chunked encoding allows for adding headers after the final chunk.  This is what 
_read_and_discard_trailer() does, but discarding the trailing headers.  So, if 
support is ever added for reading those trailing headers, then we need make 
sure that we consume the last chunk correctly.
It is, of course, a matter of resilience, v.s. correctness, but the lesson of 
Internet Explorer should tell us to prefer correctness over flexibility, I 
think.  
We can, of course, change this back. if you think.  It is ieasier now, that the 
chunk consumer is in a single method.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19009
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19009] Enhance HTTPResponse.readline() performance

2013-09-20 Thread Kristján Valur Jónsson

Kristján Valur Jónsson added the comment:

Ok, I can make it resilient.  I was just pointing out that resilience in the 
face of RFC violations can be a bad thing.  E.g. Internet explorer and how they 
allowed internet servers of the world to be lax in how they served their data.
No matter, I can allow truncated tails when reading chunks.  But I don't think 
we ought to make that a requirement or make special unit tests for that.  
Either you read chunked encoding correctly or you don't.  There are so many 
different ways you can break protocol, that this particular one is really only 
a special case.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19009
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19009] Enhance HTTPResponse.readline() performance

2013-09-20 Thread Antoine Pitrou

Antoine Pitrou added the comment:

 Ok, I can make it resilient.  I was just pointing out that resilience
 in the face of RFC violations can be a bad thing.  E.g. Internet
 explorer and how they allowed internet servers of the world to be lax
 in how they served their data.

I'm afraid the ship has sailed a long time ago on that. Poor HTTP
implementations are everywhere.

 No matter, I can allow truncated tails when reading chunks.  But I
 don't think we ought to make that a requirement or make special unit
 tests for that.

Well, the unit tests already exist, which is why I think we shouldn't
stop supporting that.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19009
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19009] Enhance HTTPResponse.readline() performance

2013-09-20 Thread Kristján Valur Jónsson

Kristján Valur Jónsson added the comment:

Very well, let's support both then :)

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19009
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19009] Enhance HTTPResponse.readline() performance

2013-09-16 Thread Kristján Valur Jónsson

Kristján Valur Jónsson added the comment:

Ok, I'm adding a more comprehensive patch.  It adds support for
peek, readline, and read1 for both regular and chunked responses.
readline falls back to IOBase.readline for chunked, which again makes use of 
peek() for performance.
read1() is available for other kinds of client buffering, e.g. when reading the 
stream incrementally using other delimiters than newline.

The real work is done by the _read1_or_peek_chunked() method.  the code for 
both cases is almost identical so a dual purpose method is warranted.

Added test cases to test this.  The test cases don't attempt to test that we 
don't block on partial resopnses, though.  I'm not sure how I would write such 
a test.  I'd probably have to extend FakeSocket to work on some stream that can 
be appended to...

--
Added file: http://bugs.python.org/file31804/httpresponse.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19009
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19009] Enhance HTTPResponse.readline() performance

2013-09-13 Thread Kristján Valur Jónsson

New submission from Kristján Valur Jónsson:

Some applications require reading http response data in long polls as it 
becomes available.  This is used, e.g. to receive notifications over a HTTP 
stream.
Using response.read(large_buffer) is not possible because this will attempt to 
fullfill the entire request (using multiple underlying recv() calls), negating 
the attempt to get data as it becomes available.
Using readline, and using \n boundaries in the data, this problem can be 
overcome.
Currently, readline is slow because it will attempt to read one byte at a time. 
 Even if it is doing so from a buffered stream, it is still doing it one 
character at a time.
This patch adds a peek method to HttpResponse, which works both for chunked 
and non-chunked transfer encoding, thus improving the performance of readline.  
IOBase.readline will use peek() to determine the readahead it can use, instead 
of one byte which it must use by default.

--
components: Library (Lib)
files: peek.patch
keywords: patch
messages: 197574
nosy: kristjan.jonsson
priority: normal
severity: normal
status: open
title: Enhance HTTPResponse.readline() performance
type: performance
versions: Python 3.5
Added file: http://bugs.python.org/file31741/peek.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19009
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19009] Enhance HTTPResponse.readline() performance

2013-09-13 Thread Antoine Pitrou

Antoine Pitrou added the comment:

This sounds ok on the principle. I suppose one can't simply wrap the fp 
inside a BufferedReader?
I think it would be good to add tests for the peek() implementation, though.

--
nosy: +orsenthil, pitrou, serhiy.storchaka
stage:  - patch review
versions: +Python 3.4 -Python 3.5

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19009
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19009] Enhance HTTPResponse.readline() performance

2013-09-13 Thread Kristján Valur Jónsson

Kristján Valur Jónsson added the comment:

The problem is that self.fp is already a Buffered stream, and such streams are 
documented to have their read() and readinto() calls make multiple system calls 
to fullfill the request.

My original goal was actually to make response.read(amt) not try to make 
multiple read() calls, so that one could have other delimiters than newline.  
It is simple for the chunked case, but I don't know how to bypass it for 
response.fp, since it is already a buffered file that has no guaranteed such 
behaviour wait, one can simply call peek() on it and know how much one can 
get :)

But anyway, the use case I have actually uses newlines as record separators in 
the http stream, so this enhancement seems less intrusive.

I'll add unit tests.  There ought to be ready-made HTTPResponse tests already 
that I can use as templates.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19009
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19009] Enhance HTTPResponse.readline() performance

2013-09-13 Thread Antoine Pitrou

Antoine Pitrou added the comment:

 My original goal was actually to make response.read(amt) not try to
 make multiple read() calls, so that one could have other delimiters
 than newline.  It is simple for the chunked case, but I don't know
 how to bypass it for response.fp, since it is already a buffered
 file that has no guaranteed such behaviour wait, one can simply
 call peek() on it and know how much one can get :)

Or you can use read1()?

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19009
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19009] Enhance HTTPResponse.readline() performance

2013-09-13 Thread R. David Murray

R. David Murray added the comment:

See also issue 13541?

--
nosy: +r.david.murray

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19009
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19009] Enhance HTTPResponse.readline() performance

2013-09-13 Thread Kristján Valur Jónsson

Kristján Valur Jónsson added the comment:

Intersting. I didn't know about that.  My excuse is that I never use 3.x except 
when I'm porting some CCP enhancements for cpython.

Here's a thought:  HTTPResponse inherits from RawIOBase.  Only the BufferedIO 
classes have read1() and are documented to have more than one system calls on 
their read() methods.  Yet, HTTPResponse will happily behave in that way.
So, either HTTPResponse, being a non-buffered IO object, should not work in 
that way, or we could add a read1() method to it.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19009
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19009] Enhance HTTPResponse.readline() performance

2013-09-13 Thread Kristján Valur Jónsson

Kristján Valur Jónsson added the comment:

I should add, I fully support the use case that response.read(amt=None) needs 
to read to the end of the response.  It is only the read(amt=bufsize) use case 
I'm thinking of, and that could be handled with a read1() method.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19009
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com