On Wed, 18 Feb 2009 at 20:31, Guido van Rossum wrote:
On Wed, Feb 18, 2009 at 6:38 PM,  <rdmur...@bitdance.com> wrote:
On Wed, 18 Feb 2009 at 21:25, Antoine Pitrou wrote:

Nick Coghlan <ncoghlan <at> gmail.com> writes:

I *think* the 2.x system had an internal buffer that was used by the
file iterator, but not by the file methods. With the new IO stack for
3.0, there is now a common buffer shared by all the file operations
(including iteration).

However, given that the lifting of the restriction is currently
undocumented, I wouldn't want to see a commitment to keeping it lifted
until we know that it won't cause any problems for the io-in-c rewrite
for 3.1 (hopefully someone with more direct involvement with that
rewrite will chime in, since they'll know a lot more about it than I do).

As you said, there is no special buffering for the file iterator in 3.x, which
means the restriction could be lifted (actually there is nothing relying on this
restriction in the current code, except perhaps the "telling" flag in
TextIOWrapper).
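
A minimal sketch of what the shared buffer means in practice, assuming a
CPython 3.x io stack: iteration and readline() can be interleaved on the same
TextIOWrapper without losing data, while tell() is refused once next() has
been used (the "telling" flag). The function name is hypothetical and exists
only for this illustration.

```python
import io


def mix_iteration_and_readline():
    """Interleave next() and readline() on one text stream."""
    f = io.TextIOWrapper(io.BytesIO(b"one\ntwo\nthree\n"), encoding="ascii")
    first = next(f)          # iterator protocol
    second = f.readline()    # file method, same underlying buffer
    third = next(f)          # back to iteration, nothing skipped
    try:
        f.tell()
        tell_blocked = False
    except OSError:
        # Raised while the "telling" flag is cleared by next().
        tell_blocked = True
    return [first, second, third], tell_blocked


if __name__ == "__main__":
    print(mix_iteration_and_readline())
```

In 2.x the same interleaving was the case the restriction guarded against,
since the iterator kept its own read-ahead buffer.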

Currently I have python (2.x) code that uses 'readline' instead of 'for
x in myfile' in order to avoid the 'for' buffering (i.e., to be presented
with the next line as soon as it is written to the other end of a pipe,
instead of waiting for the buffer to fill).  Does "no special buffering"
mean that 'for' doesn't use a read-ahead buffer in p3k, or that readline
does use such a buffer?  I ask because the latter could make my code break
unexpectedly when porting to p3k.

Have a look at the code in io.py (class TextIOWrapper):

http://svn.python.org/view/python/branches/py3k/Lib/io.py?view=log

I believe it doesn't force reading ahead more than necessary. If a
single low-level read() returns enough data to satisfy the __next__()
or readline() (or it can be satisfied from data already buffered) then
it won't force reading more.
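
A rough way to check this from Python, assuming a 3.x interpreter and an
ordinary OS pipe: write a single line, keep the write end open, and see
whether readline() and next() each return without waiting for more data. The
function name and payload are made up for the illustration; this is a sketch,
not a statement about every platform or stream type.

```python
import os


def read_line_from_open_pipe(payload=b"first line\n"):
    """Read complete lines from a pipe whose write end is still open."""
    read_fd, write_fd = os.pipe()
    reader = open(read_fd, "r", encoding="ascii")
    try:
        os.write(write_fd, payload)
        # If readline() insisted on filling its buffer, this would block,
        # because no further data is written before the call.
        via_readline = reader.readline()

        os.write(write_fd, payload)
        # Same question for the iterator protocol.
        via_iteration = next(reader)
    finally:
        os.close(write_fd)
        reader.close()
    return via_readline, via_iteration


if __name__ == "__main__":
    print(read_line_from_open_pipe())
```

If either call forced read-ahead, the script would hang rather than print,
which is the failure mode the question above is worried about.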

Hmm.  I'm not sure I'm reading the code right, but it looks from the
docstrings like TextIOWrapper expects to read from a BufferedIOBase
object, whose docstring contains this comment:

        If the argument is positive, and the underlying raw stream is
        not 'interactive', multiple raw reads may be issued to satisfy
        the byte count (unless EOF is reached first).  But for
        interactive raw streams (XXX and for pipes?), at most one raw
        read will be issued, and a short result does not imply that
        EOF is imminent.

Since the 'pipe' comment is an XXX, it is not clear that my use case
is covered.  However, the actual implementation of readinto seems to
only call 'read' once, so as long as the 'read' of the subclass returns
whatever bytes are available, then it looks good to me :)

Since TextIOWrapper is careful to call 'read1' on the wrapped buffer
object, and the one place that 'read1' has a docstring clearly indicates
that it does at most one read and returns whatever data is ready, it
seems that the _intent_ of the code is as you expressed.
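
The difference between 'read' and 'read1' can be observed with a counting
raw stream. This is only a sketch: ChunkedRaw and the two helper functions
are hypothetical names invented here, and the raw stream simply hands out one
prepared chunk per raw read, which is roughly how a pipe with a slow writer
behaves.

```python
import io


class ChunkedRaw(io.RawIOBase):
    """Hypothetical raw stream: one prepared chunk per raw read."""

    def __init__(self, chunks):
        super().__init__()
        self._chunks = list(chunks)
        self.raw_reads = 0

    def readable(self):
        return True

    def readinto(self, b):
        self.raw_reads += 1
        if not self._chunks:
            return 0  # EOF
        chunk = self._chunks.pop(0)
        n = min(len(b), len(chunk))
        b[:n] = chunk[:n]
        if n < len(chunk):
            # Keep whatever did not fit for the next raw read.
            self._chunks.insert(0, chunk[n:])
        return n


def count_raw_reads(method_name):
    """Call read(100) or read1(100) and report how many raw reads it took."""
    raw = ChunkedRaw([b"ab\n", b"cd\n"])
    buffered = io.BufferedReader(raw)
    data = getattr(buffered, method_name)(100)
    return data, raw.raw_reads


def readline_raw_reads():
    """Report how many raw reads one TextIOWrapper.readline() needs."""
    raw = ChunkedRaw([b"ab\n", b"cd\n"])
    text = io.TextIOWrapper(io.BufferedReader(raw), encoding="ascii")
    line = text.readline()
    return line, raw.raw_reads


if __name__ == "__main__":
    print(count_raw_reads("read1"))
    print(count_raw_reads("read"))
    print(readline_raw_reads())
```

With 'read1' only the first chunk should come back, after a single raw read,
whereas 'read' keeps issuing raw reads until the byte count is met or EOF is
reached. The readline() helper checks that the text layer inherits the
single-read behavior.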

I'm a python programmer first, and my C is pretty rusty, so I'm not
sure if I'm up to looking through the new C code to see how this got
translated.  I think that both my use case (where 'for' should now work
for me) and the OP's match the way the code is intended to work, but
documenting this seems like it would be a good idea.

Since the OP doesn't seem to have opened a ticket, I did so:
http://bugs.python.org/issue5323.  As I said there, I'm willing to work
on doc and test patches if this is the behavior the io library is required
to have in 3.x.

--RDM