On 2022-05-06 20:21, Marco Sulla wrote:
I have a little problem.

I tried to extend the tail function, so it can read lines from the bottom
of a file object opened in text mode.

The problem is that it does not work. It computes a starting position that is
3 characters lower than expected, so only 2 characters of the first line are
read, and the last line is missing.

import os

_lf = "\n"
_cr = "\r"
_lf_ord = ord(_lf)

def tail(f, n=10, chunk_size=100):
     n_chunk_size = n * chunk_size
     pos = os.stat(f.fileno()).st_size
     chunk_line_pos = -1
     lines_not_found = n
     binary_mode = "b" in f.mode
     lf = _lf_ord if binary_mode else _lf

     while pos != 0:
         pos -= n_chunk_size

         if pos < 0:
             pos = 0

         f.seek(pos)
         chars = f.read(n_chunk_size)

         for i, char in enumerate(reversed(chars)):
             if char == lf:
                 lines_not_found -= 1

                 if lines_not_found == 0:
                     chunk_line_pos = len(chars) - i - 1
                     print(chunk_line_pos, i)
                     break

         if lines_not_found == 0:
             break

     line_pos = pos + chunk_line_pos + 1

     f.seek(line_pos)

     res = b"" if binary_mode else ""

     for i in range(n):
         res += f.readline()

     return res

Maybe the problem is 1 char != 1 byte?

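Is the file UTF-8? That's a variable-width encoding, so are any of the characters above U+007F (i.e. non-ASCII)? os.stat() reports the size in bytes, whereas read() on a file opened in text mode counts characters, so the two only agree when every character encodes to a single byte.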
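
A quick way to see the mismatch, as a minimal sketch (the sample string is made up for illustration and is not taken from your file):

```python
# Illustrative sample only: 'é' is U+00E9, which is above U+007F.
text = "caf\u00e9\n"
data = text.encode("utf-8")

# len() of a str counts characters (code points);
# len() of the encoded bytes counts bytes.
n_chars = len(text)
n_bytes = len(data)

# In UTF-8, 'é' takes 2 bytes, so the byte count exceeds the character count.
print(n_chars, n_bytes)
```

Any position arithmetic that mixes a byte-based size with character-based reads will drift by that difference.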

Which OS? On Windows, it's common for UTF-8 files to start with a BOM/signature, which is 3 bytes but only 1 code point. That would be consistent with an offset of 3.
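
If that turns out to be the cause, one possible workaround is to do all the seeking in binary mode, where positions really are bytes, and decode only the final result. The following is a rough sketch under that assumption, not a drop-in replacement for your function; the name tail_bytes and its chunk_size default are hypothetical, and the "utf-8-sig" codec is used so that a leading BOM is dropped if the tail happens to reach the start of the file.

```python
import codecs
import io

def tail_bytes(f, n=10, chunk_size=4096):
    """Return the last n lines of a file opened in binary mode, as bytes."""
    if n <= 0:
        return b""
    f.seek(0, io.SEEK_END)
    pos = f.tell()  # in binary mode this is a byte offset
    data = b""
    # Read backwards until the buffer holds more than n newlines,
    # or until the start of the file is reached.
    while pos > 0 and data.count(b"\n") <= n:
        step = min(chunk_size, pos)
        pos -= step
        f.seek(pos)
        data = f.read(step) + data
    lines = data.splitlines(keepends=True)
    return b"".join(lines[-n:])

# The UTF-8 signature is 3 bytes long.
bom_len = len(codecs.BOM_UTF8)

# Demo on an in-memory "file" that starts with a BOM (made-up content).
buf = io.BytesIO(codecs.BOM_UTF8 + "uno\ndue\ntr\u00e9\n".encode("utf-8"))
last_two = tail_bytes(buf, 2).decode("utf-8-sig")
print(bom_len, repr(last_two))
```

Note that bytes.splitlines() also treats a lone "\r" as a line break, which may or may not be what you want.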
--
https://mail.python.org/mailman/listinfo/python-list
