Rishi added the comment:

I have recreated the patch (issue1610654_1.patch), and it performs more or less 
like the earlier patch.

Serhiy,
I agree that we cannot use handmade buffering here without seeking ahead.
I believe we can still make optimizations for streams that are buffered and 
non-seekable.
The cgi module's default file object is the BufferedReader of sys.stdin, 
so the solution is fairly generic.

I have removed the handmade buffering, and I do not create a Buffered* object 
either.
We rely on the user to create the buffered object. The sys.stdin that the cgi 
module uses has a decent buffer underneath that works well on Apache.
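
For a stream that is not already buffered, the caller-side wrapping could look 
roughly like the sketch below. The helper name with_large_buffer and the 1 MiB 
default are my own assumptions for illustration and are not part of the patch.

```python
import io

def with_large_buffer(raw, buffer_size=1 << 20):
    """Hypothetical helper: wrap a readable binary stream in a BufferedReader.

    The 1 MiB default is an arbitrary choice for illustration only.
    """
    return io.BufferedReader(raw, buffer_size=buffer_size)

# Stand-in for a request body; a real CGI script would start from the
# binary stream behind sys.stdin instead of a BytesIO.
body = io.BytesIO(b"x" * 100)
fp = with_large_buffer(body, buffer_size=64)
```

The resulting object would then be passed as the fp argument of 
cgi.FieldStorage, so that the buffering stays under the caller's control.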

The attached patch does not seek, nor does it read ahead. It only looks ahead.
As Antoine suggests, it peeks at the buffer and determines through a fast 
lookup whether the buffer contains a boundary.
It moves forward only if it is convinced that the current buffer lies 
completely before the next boundary.
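
To make the look-ahead idea concrete, here is a minimal sketch of it. This is 
not the code from the patch: the function names are hypothetical, and it 
assumes the stream is an io.BufferedReader, whose peek() returns buffered bytes 
without advancing the position.

```python
import io

def find_boundary_in_buffer(stream, boundary):
    """Return the offset of boundary in the buffered bytes, or -1.

    Only peek() is used, so the stream position does not move.
    """
    return stream.peek().find(boundary)

def take_safe_prefix(stream, boundary):
    """Consume only bytes that certainly lie before the next boundary.

    Returns (data, found). When the boundary is not visible, the last
    len(boundary) - 1 buffered bytes are held back, because a boundary
    could straddle the end of the buffer.
    """
    peeked = stream.peek()
    idx = peeked.find(boundary)
    if idx >= 0:
        # Boundary is in the buffer: everything before it is payload.
        return stream.read(idx), True
    safe = len(peeked) - (len(boundary) - 1)
    if safe <= 0:
        # Too little buffered data to decide; the caller has to fall
        # back to another strategy (for example line-based reading).
        return b"", False
    return stream.read(safe), False

stream = io.BufferedReader(io.BytesIO(b"part one body\r\n--XYZ\r\nrest"))
chunk, found = take_safe_prefix(stream, b"--XYZ")
```

Note that peek() does not refill a buffer that still holds some bytes, which is 
why the sketch returns an empty result instead of looping when too little data 
is buffered.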


The issue is that the current implementation deals with lines and not chunks.
Even when a savvy user wraps sys.stdin in a large BufferedReader, I have 
observed little to no performance improvement in the current implementation 
for large files. It does not solve the reported bug either.
In extreme cases like Chui's, the difference is 53s against 0.7s, and even 
otherwise, for larger files the patch is 3 times faster than the current 
implementation.
I have tested this on an Apache2 server, where sys.stdin is buffered.

----------
Added file: http://bugs.python.org/file36927/issue1610654_1.patch

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue1610654>
_______________________________________