Mark Russell added the comment:

Here's an updated version of the patch.  Changes:

    - Updated to work against current py3k branch (r59441)
    - Added support for universal newlines
    - Added unit tests
    - Added docs

The patch includes documentation for reversed() and __reversed__() (in the 
library and reference manuals respectively) which are independent of the 
reverse lines iterator - I can split those out to separate patch if needed.

I also updated the expected output from test_profile and test_cProfile, 
although I think a better fix would be to filter out the stdlib-related stuff 
from the expected output, as currently these tests break whenever io.py is 
changed.

Added file: http://bugs.python.org/file8902/reverse-file-iterator-20071209.diff

_____________________________________
Tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue1677872>
_____________________________________
Index: Doc/reference/datamodel.rst
===================================================================
--- Doc/reference/datamodel.rst (revision 59439)
+++ Doc/reference/datamodel.rst (working copy)
@@ -1662,12 +1662,27 @@
    Iterator objects also need to implement this method; they are required to 
return
    themselves.  For more information on iterator objects, see :ref:`typeiter`.
 
+.. method:: object.__reversed__(self)
+
+   Called (if present) by the :func:`reversed()` builtin to implement
+   reverse iteration.  It should return a new iterator object that iterates
+   over all the objects in the container in reverse order.
+
+   If the :meth:`__reversed__()` method is not provided, the
+   :func:`reversed()` builtin will fall back to using the sequence protocol
+   (:meth:`__len__()` and :meth:`__getitem__()`). Objects should normally
+   only provide :meth:`__reversed__()` if they do not support the sequence
+   protocol and an efficient implementation of reverse iteration is possible.
+   
 The membership test operators (:keyword:`in` and :keyword:`not in`) are 
normally
 implemented as an iteration through a sequence.  However, container objects can
 supply the following special method with a more efficient implementation, which
 also does not require the object be a sequence.
 
 
+
+
+
 .. method:: object.__contains__(self, item)
 
    Called to implement membership test operators.  Should return true if 
*item* is
Index: Doc/library/stdtypes.rst
===================================================================
--- Doc/library/stdtypes.rst    (revision 59439)
+++ Doc/library/stdtypes.rst    (working copy)
@@ -1937,7 +1937,16 @@
    right.  However, using :meth:`seek` to reposition the file to an absolute
    position will flush the read-ahead buffer.
 
+.. method:: file.__reversed__()
 
+   Return a new iterator that returns lines in reverse order (but without
+   reading the entire file into memory first).  Normally called via the
+   :func:`reversed()` builtin, as in ``for line in reversed(f): print(line)``.
+   Useful for scanning backwards through large files without reading the
+   entire file first.  Note that this changes the current position of the
+   underlying file object, so you should not interleave use of reverse and
+   forward iteration over the same file object.
+
 .. method:: file.read([size])
 
    Read at most *size* bytes from the file (less if the read hits EOF before
Index: Doc/library/functions.rst
===================================================================
--- Doc/library/functions.rst   (revision 59439)
+++ Doc/library/functions.rst   (working copy)
@@ -869,8 +869,9 @@
 
 .. function:: reversed(seq)
 
-   Return a reverse :term:`iterator`.  *seq* must be an object which supports
-   the sequence protocol (the :meth:`__len__` method and the 
:meth:`__getitem__`
+   Return a reverse :term:`iterator`.  *seq* must be an object which has
+   a :meth:`__reversed__` method [#]_ or supports the sequence protocol
+   (the :meth:`__len__` method and the :meth:`__getitem__`
    method with integer arguments starting at ``0``).
 
 
@@ -1099,6 +1100,8 @@
    any I/O has been performed, and there's no reliable way to determine whether
    this is the case.
 
+.. [#] See :ref:`sequence-types`
+
 .. [#] In the current implementation, local variable bindings cannot normally 
be
    affected this way, but variables retrieved from other scopes (such as 
modules)
    can be.  This may change.
Index: Lib/io.py
===================================================================
--- Lib/io.py   (revision 59439)
+++ Lib/io.py   (working copy)
@@ -1136,6 +1136,125 @@
                )[self.seennl]
 
 
+class TextIOReverseIterator:
+    """Line-based reverse iterator wrapper for IOBase objects.
+
+    This class is used to implement TextIOWrapper.__reversed__().
+    It searches backwards for encoded line terminator, which
+    works for UTF-8 but not for encodings where one character encoding
+    can be a substring of another longer one.
+    """
+
+    # XXX Should we check for encodings that are known to work?  Currently
+    # we would return incorrect results for a codec where, say, the encoding
+    # of newline could appear as a substring of the encoding for some other
+    # character or where the codec can have a non-default state at the start
+    # of a line (do such encodings exist?).
+    
+    def __init__(self, buffer, encoding, newline=None,
+                 buffer_size=DEFAULT_BUFFER_SIZE):
+        if not isinstance(encoding, str):
+            raise ValueError("invalid encoding: %r" % encoding)
+        buffer.seek(0, 2)
+        self.buffer = buffer
+        self._bufsize = buffer_size
+        self._encoding = encoding
+        self._translate_newlines = newline is None
+        if newline:
+            self._enc_cr = self._enc_lf = None
+        else:
+            self._enc_cr = '\r'.encode(encoding)
+            self._enc_lf = '\n'.encode(encoding)
+            if self._enc_cr + self._enc_lf != '\r\n'.encode(encoding):
+                raise ValueError('unsupported encoding: %r' % encoding)
+        self._newline = newline.encode(encoding) if newline else None
+        self._limpos = buffer.tell()
+        self._bufpos = self._limpos
+        self._pending = b''
+
+    def _extend_buffer_backwards(self):
+        (bufpos, limpos, bufsize) = (self._bufpos, self._limpos, self._bufsize)
+        
+        newpos = (bufpos // bufsize) * bufsize
+        if newpos == bufpos:
+            newpos -= bufsize
+            assert newpos >= 0
+        nbytes = bufpos - newpos
+        assert nbytes != 0
+        
+        self.buffer.seek(newpos, 0)
+        assert self.buffer.tell() == newpos, \
+               'seek() arrived at %r (expected %r)' % (seekpos, newpos)
+        newbuf = self.buffer.read(nbytes)
+        assert len(newbuf) == nbytes, 'Unexpected EOF'
+
+        if limpos > bufpos:
+            newbuf += self._pending[:limpos - bufpos]
+        (self._pending, self._bufpos) = (newbuf, newpos)
+
+    __iter__ = lambda self: self
+
+    # Look backwards for the first occurrence of \r, \n or \r\n.
+    # Return (offset, terminator) or (-1, None) if we need to read more.
+    def _find_universal_endline(self, limpos):
+        enc_cr, enc_lf = self._enc_cr, self._enc_lf
+        cr_pos = self._pending.rfind(enc_cr, 0, limpos)
+        lf_pos = self._pending.rfind(enc_lf, 0, limpos)
+        res = -1, None
+        if lf_pos != -1 and lf_pos > cr_pos:
+            if lf_pos > len(enc_cr) or self._bufpos == 0:
+                if cr_pos != -1 and cr_pos == lf_pos - len(enc_lf):
+                    res = cr_pos, enc_cr + enc_lf
+                else:
+                    res = lf_pos, enc_lf
+        elif cr_pos != -1:
+            res = cr_pos, enc_cr
+        return res
+
+    def _getbytes(self):
+        is_firstline = self._pending == b''
+        limpos, newline = self._limpos, self._newline
+        
+        if limpos is None:
+            raise StopIteration
+        assert limpos >= 0
+        
+        # limpos points one character past the end of the line we're about to
+        # return - e.g "abc\ndef"
+        #                    ^
+        while True:
+            lim_offset = limpos - self._bufpos      # file offset -> buf offset
+            if newline is None:
+                offset, ending = self._find_universal_endline(lim_offset)
+            else:
+                offset = self._pending.rfind(newline, 0, lim_offset)
+                ending = newline
+                
+            if offset != -1:
+                self._limpos = self._bufpos + offset
+                line_offset = offset + len(ending)
+                break
+            
+            if self._bufpos > 0:
+                self._extend_buffer_backwards()
+            else:
+                self._limpos = None
+                line_offset = 0
+                break
+        
+        # We treat the first returned line specially, as it may be missing
+        # the endline terminator.  Also we avoid returning an initial empty
+        # line for files with a normal terminating endline.
+        #
+        if is_firstline:
+            return self._pending[line_offset:] or self._getbytes()
+        else:
+            ending_to_add = self._enc_lf if self._translate_newlines else 
ending
+            return self._pending[line_offset:lim_offset] + ending_to_add
+        
+    def __next__(self):
+        return self._getbytes().decode(self._encoding)
+
 class TextIOWrapper(TextIOBase):
 
     """Buffered text stream.
@@ -1382,6 +1501,9 @@
             self._pending = res[n:]
             return res[:n]
 
+    def __reversed__(self):
+        return TextIOReverseIterator(self.buffer, self._encoding, self._readnl)
+    
     def __next__(self):
         self._telling = False
         line = self.readline()
Index: Lib/test/test_io.py
===================================================================
--- Lib/test/test_io.py (revision 59439)
+++ Lib/test/test_io.py (working copy)
@@ -621,6 +621,76 @@
                             self.assertEquals(got_line, exp_line)
                         self.assertEquals(len(got_lines), len(exp_lines))
 
+    def testReversedLines(self):
+        texts = [
+            "a\nbb\nccc\n\neee\n"
+            "AAA\nBBB\nCCC\rDDD\rEEE\r\nFFF\r\nGGG",
+            "",
+            "foo",
+            "\nfoo",
+            "\rbar",
+            "\r\nbaz",
+            "foo\n",
+            "\n\n",
+            ("\0\x0f\xff\u0fff\uffff\U000fffff\U0010ffff"*3 + "\n") * 3 + "\n"
+        ]
+        encodings = [ "utf-8", "latin-1" ]
+        newlines = [ None, "\n", "\r", "\r\n" ]
+        for text in texts:
+            for encoding in encodings:
+                for newline in newlines:
+                    for bufsize in None, 1, 2, 3, 5, 10:
+                        def make_textio():
+                            bufio = io.BytesIO(text.encode(encoding))
+                            return io.TextIOWrapper(bufio, encoding=encoding,
+                                                    newline=newline)
+                        try:
+                            textio = make_textio()
+                        except UnicodeEncodeError:
+                            # Skip non-ascii tests for latin-1
+                            continue
+                        if bufsize is None:
+                            revio = reversed(textio)
+                        else:
+                            revio = io.TextIOReverseIterator(
+                                textio.buffer, encoding, newline, bufsize)
+                        params = dict(text=text, enc=encoding,
+                                      nl=newline, bufsize=bufsize)
+                        got = list(revio)
+                        exp = list(reversed(list(make_textio())))
+                        self.assertEquals((got, params), (exp, params))
+
+    def testReversedLinesOpcount(self):
+        import math
+        
+        class LoggingRaw (io.RawIOBase):
+            def __init__(self, data):
+                self._bytes = io.BytesIO(data)
+                self._nseeks = self._nreads = 0
+                
+            def readinto(self, b):
+                res = self._bytes.readinto(b)
+                #print("readinto => %r" % (res,))
+                self._nreads += 1
+                return res
+
+            def seek(self, pos, whence):
+                res = self._bytes.seek(pos, whence)
+                #print("seek(%r, %r) => %r" % (pos, whence, res))
+                self._nseeks += 1
+                return res
+
+            readable = lambda self: True
+
+        lines = [ "x" * 80 + "\n" ] * 1000 + [ "l" * 1000 ]
+        encoding = "ascii"
+        raw = LoggingRaw(b"".join(line.encode(encoding) for line in lines))
+        textio = io.TextIOWrapper(io.BufferedReader(raw), encoding)
+        self.assertEqual(list(reversed(textio)), list(reversed(lines)))
+        exp_nreads = math.ceil(sum(map(len, lines)) / io.DEFAULT_BUFFER_SIZE)
+        self.assertEqual(raw._nreads, exp_nreads)
+        #print("nseeks=%d nreads=%d" % (raw._nseeks, raw._nreads))
+
     def testNewlinesInput(self):
         testdata = b"AAA\nBBB\nCCC\rDDD\rEEE\r\nFFF\r\nGGG"
         normalized = testdata.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
@@ -792,7 +862,11 @@
             while f.readline():
                 f.tell()
             t4 = timer()
+            for line in reversed(f):
+                pass
+            t5 = timer()
             f.close()
+            
             if test_support.verbose:
                 print("\nTiming test: %d lines of %d characters (%d bytes)" %
                       (nlines, nchars, nbytes))
@@ -801,6 +875,7 @@
                 print("Reading using iteration:  %6.3f seconds" % (t2-t1))
                 print("Reading using readline(): %6.3f seconds" % (t3-t2))
                 print("Using readline()+tell():  %6.3f seconds" % (t4-t3))
+                print("Using reversed():         %6.3f seconds" % (t5-t4))
 
     def testReadOneByOne(self):
         txt = io.TextIOWrapper(io.BytesIO(b"AA\r\nBB"))
Index: Lib/test/output/test_profile
===================================================================
--- Lib/test/output/test_profile        (revision 59439)
+++ Lib/test/output/test_profile        (working copy)
@@ -10,10 +10,10 @@
        12    0.000    0.000    0.012    0.001 :0(hasattr)
         1    0.000    0.000    0.000    0.000 :0(setprofile)
         1    0.000    0.000    1.000    1.000 <string>:1(<module>)
-        2    0.000    0.000    0.000    0.000 io.py:1201(flush)
-        1    0.000    0.000    0.000    0.000 io.py:259(flush)
-        1    0.000    0.000    0.000    0.000 io.py:646(closed)
-        1    0.000    0.000    0.000    0.000 io.py:864(flush)
+        2    0.000    0.000    0.000    0.000 io.py:1330(flush)
+        1    0.000    0.000    0.000    0.000 io.py:269(flush)
+        1    0.000    0.000    0.000    0.000 io.py:656(closed)
+        1    0.000    0.000    0.000    0.000 io.py:874(flush)
         0    0.000             0.000          profile:0(profiler)
         1    0.000    0.000    1.000    1.000 profile:0(testfunc())
         8    0.064    0.008    0.080    0.010 test_profile.py:103(subhelper)
@@ -33,15 +33,15 @@
 :0(append)                            -> 
 :0(exc_info)                          -> 
 :0(exec)                              -> <string>:1(<module>)(1)    1.000
-                                         io.py:1201(flush)(2)    0.000
+                                         io.py:1330(flush)(2)    0.000
 :0(hasattr)                           -> test_profile.py:115(__getattr__)(12)  
  0.028
 :0(setprofile)                        -> 
 <string>:1(<module>)                  -> test_profile.py:30(testfunc)(1)    
1.000
-io.py:1201(flush)                     -> io.py:259(flush)(1)    0.000
-                                         io.py:864(flush)(1)    0.000
-io.py:259(flush)                      -> 
-io.py:646(closed)                     -> 
-io.py:864(flush)                      -> io.py:646(closed)(1)    0.000
+io.py:1330(flush)                     -> io.py:269(flush)(1)    0.000
+                                         io.py:874(flush)(1)    0.000
+io.py:269(flush)                      -> 
+io.py:656(closed)                     -> 
+io.py:874(flush)                      -> io.py:656(closed)(1)    0.000
 profile:0(profiler)                   -> profile:0(testfunc())(1)    1.000
 profile:0(testfunc())                 -> :0(exec)(1)    1.000
                                          :0(setprofile)(1)    0.000
@@ -74,10 +74,10 @@
                                          test_profile.py:93(helper2)(8)    
0.400
 :0(setprofile)                        <- profile:0(testfunc())(1)    1.000
 <string>:1(<module>)                  <- :0(exec)(1)    1.000
-io.py:1201(flush)                     <- :0(exec)(2)    1.000
-io.py:259(flush)                      <- io.py:1201(flush)(1)    0.000
-io.py:646(closed)                     <- io.py:864(flush)(1)    0.000
-io.py:864(flush)                      <- io.py:1201(flush)(1)    0.000
+io.py:1330(flush)                     <- :0(exec)(2)    1.000
+io.py:269(flush)                      <- io.py:1330(flush)(1)    0.000
+io.py:656(closed)                     <- io.py:874(flush)(1)    0.000
+io.py:874(flush)                      <- io.py:1330(flush)(1)    0.000
 profile:0(profiler)                   <- 
 profile:0(testfunc())                 <- profile:0(profiler)(1)    0.000
 test_profile.py:103(subhelper)        <- test_profile.py:93(helper2)(8)    
0.400
Index: Lib/test/output/test_cProfile
===================================================================
--- Lib/test/output/test_cProfile       (revision 59439)
+++ Lib/test/output/test_cProfile       (working copy)
@@ -5,10 +5,10 @@
 
    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
         1    0.000    0.000    1.000    1.000 <string>:1(<module>)
-        2    0.000    0.000    0.000    0.000 io.py:1201(flush)
-        1    0.000    0.000    0.000    0.000 io.py:259(flush)
-        1    0.000    0.000    0.000    0.000 io.py:646(closed)
-        1    0.000    0.000    0.000    0.000 io.py:864(flush)
+        2    0.000    0.000    0.000    0.000 io.py:1330(flush)
+        1    0.000    0.000    0.000    0.000 io.py:269(flush)
+        1    0.000    0.000    0.000    0.000 io.py:656(closed)
+        1    0.000    0.000    0.000    0.000 io.py:874(flush)
         8    0.064    0.008    0.080    0.010 test_cProfile.py:103(subhelper)
        28    0.028    0.001    0.028    0.001 test_cProfile.py:115(__getattr__)
         1    0.270    0.270    1.000    1.000 test_cProfile.py:30(testfunc)
@@ -30,11 +30,11 @@
 Function                                          called...
                                                       ncalls  tottime  cumtime
 <string>:1(<module>)                              ->       1    0.270    1.000 
 test_cProfile.py:30(testfunc)
-io.py:1201(flush)                                 ->       1    0.000    0.000 
 io.py:259(flush)
-                                                           1    0.000    0.000 
 io.py:864(flush)
-io.py:259(flush)                                  -> 
-io.py:646(closed)                                 -> 
-io.py:864(flush)                                  ->       1    0.000    0.000 
 io.py:646(closed)
+io.py:1330(flush)                                 ->       1    0.000    0.000 
 io.py:269(flush)
+                                                           1    0.000    0.000 
 io.py:874(flush)
+io.py:269(flush)                                  -> 
+io.py:656(closed)                                 -> 
+io.py:874(flush)                                  ->       1    0.000    0.000 
 io.py:656(closed)
 test_cProfile.py:103(subhelper)                   ->      16    0.016    0.016 
 test_cProfile.py:115(__getattr__)
 test_cProfile.py:115(__getattr__)                 -> 
 test_cProfile.py:30(testfunc)                     ->       1    0.014    0.130 
 test_cProfile.py:40(factorial)
@@ -53,7 +53,7 @@
 test_cProfile.py:93(helper2)                      ->       8    0.064    0.080 
 test_cProfile.py:103(subhelper)
                                                            8    0.000    0.008 
 {hasattr}
 {exec}                                            ->       1    0.000    1.000 
 <string>:1(<module>)
-                                                           2    0.000    0.000 
 io.py:1201(flush)
+                                                           2    0.000    0.000 
 io.py:1330(flush)
 {hasattr}                                         ->      12    0.012    0.012 
 test_cProfile.py:115(__getattr__)
 {method 'append' of 'list' objects}               -> 
 {method 'disable' of '_lsprof.Profiler' objects}  -> 
@@ -65,10 +65,10 @@
 Function                                          was called by...
                                                       ncalls  tottime  cumtime
 <string>:1(<module>)                              <-       1    0.000    1.000 
 {exec}
-io.py:1201(flush)                                 <-       2    0.000    0.000 
 {exec}
-io.py:259(flush)                                  <-       1    0.000    0.000 
 io.py:1201(flush)
-io.py:646(closed)                                 <-       1    0.000    0.000 
 io.py:864(flush)
-io.py:864(flush)                                  <-       1    0.000    0.000 
 io.py:1201(flush)
+io.py:1330(flush)                                 <-       2    0.000    0.000 
 {exec}
+io.py:269(flush)                                  <-       1    0.000    0.000 
 io.py:1330(flush)
+io.py:656(closed)                                 <-       1    0.000    0.000 
 io.py:874(flush)
+io.py:874(flush)                                  <-       1    0.000    0.000 
 io.py:1330(flush)
 test_cProfile.py:103(subhelper)                   <-       8    0.064    0.080 
 test_cProfile.py:93(helper2)
 test_cProfile.py:115(__getattr__)                 <-      16    0.016    0.016 
 test_cProfile.py:103(subhelper)
                                                           12    0.012    0.012 
 {hasattr}
_______________________________________________
Python-bugs-list mailing list 
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to