On 24Apr2022 07:15, Chris Angelico <ros...@gmail.com> wrote:
>On Sun, 24 Apr 2022 at 07:13, Marco Sulla <marco.sulla.pyt...@gmail.com> wrote:
>> Emh, why chunks? My function simply reads byte per byte and compares
>> it to b"\n". When it finds it, it stops and does a readline(): [...]
>> This is only for one line and in utf8, but it can be generalised.
For some encodings that generalisation might be hard. But mostly, yes.

>Ah. Well, then, THAT is why it's inefficient: you're seeking back one
>single byte at a time, then reading forwards. That is NOT going to
>play nicely with file systems or buffers.

An approach I think you both may have missed: mmap the file and use
mmap.rfind(b'\n') to locate line delimiters.

https://docs.python.org/3/library/mmap.html#mmap.mmap.rfind

Avoids sucking the whole file into memory in the usual sense; instead
the file is paged in as needed. Far more efficient than a
seek/read-single-byte approach.

If the file's growing you can do this to start with, then do a normal
file open from your end point to follow the accruing text. (Or reuse
the descriptor you used for the mmap, but using os.read().)
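An untested sketch of the sort of thing I mean (my own illustration;
it assumes a nonempty UTF-8 text file, and "last_lines" is just a name
I picked):

    import mmap

    def last_lines(path, count=1):
        ''' Return the last `count` lines of the file at `path`
            as a list of strings.
        '''
        with open(path, 'rb') as f:
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
                end = len(mm)
                pos = end
                # One rfind per line delimiter, walking backwards; only
                # the tail of the file gets paged in, not the whole file.
                for _ in range(count):
                    pos = mm.rfind(b'\n', 0, max(pos - 1, 0))
                    if pos < 0:
                        # Fewer than `count` lines: take from the start.
                        break
                return mm[pos + 1:end].decode('utf8').splitlines()

Each rfind scans backwards from the previous delimiter, so for the tail
of a large file you only ever touch the final pages.

Cheers,
Cameron Simpson <c...@cskk.id.au>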