On Sun, 24 Apr 2022 at 08:18, Cameron Simpson <c...@cskk.id.au> wrote:
>
> On 24Apr2022 07:15, Chris Angelico <ros...@gmail.com> wrote:
> >On Sun, 24 Apr 2022 at 07:13, Marco Sulla <marco.sulla.pyt...@gmail.com> wrote:
> >> Emh, why chunks? My function simply reads byte by byte and compares
> >> each byte to b"\n". When it finds it, it stops and does a readline():
> [...]
> >> This is only for one line and in utf8, but it can be generalised.
>
> For some encodings that generalisation might be hard. But mostly, yes.
>
> >Ah. Well, then, THAT is why it's inefficient: you're seeking back one
> >single byte at a time, then reading forwards. That is NOT going to
> >play nicely with file systems or buffers.
>
> An approach I think you both may have missed: mmap the file and use
> mmap.rfind(b'\n') to locate line delimiters.
> https://docs.python.org/3/library/mmap.html#mmap.mmap.rfind
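
A minimal sketch of the mmap suggestion, assuming the goal (as in Marco's function) is to fetch the file's last line, and that the encoding is UTF-8 or similar, where b"\n" never appears inside a multi-byte character. The function name last_line_mmap is hypothetical; mmap.mmap, mmap.ACCESS_READ and mmap.rfind(sub, start, end) are the standard-library APIs from the page linked above. No performance claim is implied.

```python
import mmap
import os
import tempfile


def last_line_mmap(path: str) -> bytes:
    """Return the last line of `path` without its trailing newline.

    Hypothetical helper: maps the file read-only and lets mmap.rfind()
    search backwards for the b"\n" that precedes the last line.
    Assumes an encoding where b"\n" is never part of a multi-byte character.
    """
    if os.path.getsize(path) == 0:
        return b""  # mmap raises ValueError when asked to map an empty file
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        end = len(mm)
        # Ignore one trailing newline so b"a\nb\n" yields b"b", not b"".
        if mm[end - 1:end] == b"\n":
            end -= 1
        # rfind searches mm[0:end] from the right; -1 (not found) gives start 0.
        start = mm.rfind(b"\n", 0, end) + 1
        return mm[start:end]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.txt")
        with open(path, "wb") as f:
            f.write(b"first\nsecond\nthird\n")
        print(last_line_mmap(path))  # b'third'
```

The OS pages the mapped file in on demand, so only the tail that rfind actually scans needs to be read from disk.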

Yeah, I made a vague allusion to using mmap, but didn't elaborate
because I actually have zero idea how efficient it would be.
Would it be functionally equivalent to chunking, but with the
chunk size chosen by the system as whatever's optimal? It would
need to be tested.
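
For comparison, a minimal sketch of the chunked alternative to seeking back one byte at a time: read fixed-size blocks from the end of the file and search each accumulated block for b"\n". The function name last_line_chunked and the 8 KiB default chunk size are assumptions for illustration, not anything proposed in the thread. Same encoding caveat as above.

```python
import os
import tempfile


def last_line_chunked(path: str, chunk_size: int = 8192) -> bytes:
    """Return the last line of `path` without its trailing newline,
    reading backwards in chunk_size blocks instead of single bytes.

    Hypothetical helper; chunk_size is an arbitrary choice.
    """
    with open(path, "rb") as f:
        pos = f.seek(0, os.SEEK_END)  # seek() returns the new absolute offset
        buf = b""
        while pos > 0:
            step = min(chunk_size, pos)
            pos -= step
            f.seek(pos)
            buf = f.read(step) + buf  # buf always ends at end-of-file
            # Ignore one trailing newline so b"a\nb\n" yields b"b", not b"".
            text = buf[:-1] if buf.endswith(b"\n") else buf
            idx = text.rfind(b"\n")
            if idx != -1:
                return text[idx + 1:]
        # No delimiter anywhere: the whole file is a single line (or empty).
        return buf[:-1] if buf.endswith(b"\n") else buf


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.txt")
        with open(path, "wb") as f:
            f.write(b"first\nsecond\nthird\n")
        print(last_line_chunked(path, chunk_size=4))  # b'third'
```

Re-scanning the growing buffer is quadratic for a very long last line, which is fine for a sketch. To answer the performance question, one would time this against an mmap-based version (e.g. with timeit) on a large file; no measurements are claimed here.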

I've never used mmap for this kind of job, so it's not something I'm
comfortable predicting the performance of.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list
