> On 8 May 2022, at 20:48, Marco Sulla <marco.sulla.pyt...@gmail.com> wrote:
>
> On Sun, 8 May 2022 at 20:31, Barry Scott <ba...@barrys-emacs.org> wrote:
>>
>>>> On 8 May 2022, at 17:05, Marco Sulla <marco.sulla.pyt...@gmail.com> wrote:
>>>
>>> def tail(filepath, n=10, newline=None, encoding=None, chunk_size=100):
>>>     n_chunk_size = n * chunk_size
>>
>> Why use tiny chunks? You can read 4 KiB as fast as 100 bytes, as it's typically
>> the smallest size the file system will allocate.
>> I tend to read in multiples of MiB, as it's near instant.
>
> Well, I tested on a little file, a list of my preferred pizzas, so....

Try it on a very big file.

>
>>>     pos = os.stat(filepath).st_size
>>
>> You cannot mix POSIX API with text mode.
>> pos is in bytes from the start of the file.
>> Text mode will be in code points. bytes != code points.
>>
>>>     chunk_line_pos = -1
>>>     lines_not_found = n
>>>
>>>     with open(filepath, newline=newline, encoding=encoding) as f:
>>>         text = ""
>>>
>>>         hard_mode = False
>>>
>>>         if newline == None:
>>>             newline = _lf
>>>         elif newline == "":
>>>             hard_mode = True
>>>
>>>         if hard_mode:
>>>             while pos != 0:
>>>                 pos -= n_chunk_size
>>>
>>>                 if pos < 0:
>>>                     pos = 0
>>>
>>>                 f.seek(pos)
>>
>> In text mode you can only seek to a value returned from f.tell(), otherwise the
>> behaviour is undefined.
>
> Why? I don't see any recommendation about it in the docs:
> https://docs.python.org/3/library/io.html#io.IOBase.seek

What does adding 1 to a pos mean?
If it’s binary it means 1 byte further down the file, but in text mode it may need to
move the point 1, 2 or 3 bytes down the file.

>
>>>                 text = f.read()
>>
>> You have on limit on the amount of data read.
>
> I explained that previously. Anyway, chunk_size is small, so it's not
> a great problem.

Typo, I meant you have no limit.
You read all the data till the end of the file, which might be megabytes of data.

>
>>>                 lf_after = False
>>>
>>>                 for i, char in enumerate(reversed(text)):
>>
>> Simply use text.rindex('\n') or text.rfind('\n') for speed.
>
> I can't use them when I have to find both \n or \r. So I preferred to
> simplify the code and use the for cycle every time. Keep in mind
> anyway that this is a prototype for a Python C API implementation
> (builtin I hope, or a C extension if not)
>
>>> Shortly, the file is always opened in text mode. File is read at the end in
>>> bigger and bigger chunks, until the file is finished or all the lines are
>>> found.
>>
>> It will fail if the contents is not ASCII.
>
> Why?
>
>>> Why?
>>> Because in encodings that have more than 1 byte per character, reading
>>> a chunk of n bytes, then reading the previous chunk, can eventually split
>>> the character between the chunks in two distinct bytes.
>>
>> No it cannot. Text mode only knows how to return code points. Now if you are
>> in binary it could be split, but you are not in binary mode so it cannot.
>
> From the docs:
>
> seek(offset, whence=SEEK_SET)
> Change the stream position to the given byte offset.
>
>>> Do you think there are chances to get this function as a method of the file
>>> object in CPython? The method for a file object opened in bytes mode is
>>> simpler, since there's no encoding and newline is only \n in that case.
>>
>> State your requirements. Then see if your implementation meets them.
>
> The method should return the last n lines from a file object.
> If the file object is in text mode, the newline parameter must be honored.
> If the file object is in binary mode, a newline is always b"\n", to be
> consistent with readline.
>
> I suppose the current implementation of tail satisfies the
> requirements for text mode. The previous one satisfied binary mode.
>
> Anyway, apart from my implementation, I'm curious if you think a tail
> method is worth it to be a method of the builtin file objects in
> CPython.
> --
https://mail.python.org/mailman/listinfo/python-list
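
For reference, here is a minimal sketch of the binary-mode approach argued
for in this thread. It is not the code from either poster. It opens the file
in binary mode so that seek() offsets really are bytes, reads bounded chunks
backwards from the end, and decodes only complete lines. The names
tail_bytes and tail_text and the 4096-byte default are made up for this
illustration. It assumes an encoding in which a line ends with the single
byte b"\n" and that byte never occurs inside a multi-byte character (true
for UTF-8 and Latin-1, not for UTF-16 or UTF-32), and it does not honor the
newline parameter of open().

```python
import os


def tail_bytes(filepath, n=10, chunk_size=4096):
    """Return the last n lines of a file as bytes (hypothetical helper)."""
    if n <= 0:
        return []
    with open(filepath, "rb") as f:
        # In binary mode seek() works in plain byte offsets.
        pos = f.seek(0, os.SEEK_END)
        data = b""
        # Prepend bounded chunks until we hold more than n newlines
        # (the extra one marks where the first wanted line starts)
        # or until the start of the file is reached.
        while pos > 0 and data.count(b"\n") <= n:
            step = min(chunk_size, pos)
            pos -= step
            f.seek(pos)
            data = f.read(step) + data
    parts = data.split(b"\n")
    lines = [part + b"\n" for part in parts[:-1]]
    if parts[-1]:
        # The file does not end with a newline: keep the last partial line.
        lines.append(parts[-1])
    return lines[-n:]


def tail_text(filepath, n=10, encoding="utf-8", chunk_size=4096):
    """Decode only whole lines, so no character is split across chunks."""
    return [line.decode(encoding) for line in tail_bytes(filepath, n, chunk_size)]
```

A chunk boundary may fall in the middle of a multi-byte character, but the
chunks are joined as bytes before anything is decoded, and a partial first
line is dropped by the final slice, so decode() only ever sees whole lines.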