On Wed, 18 May 2022 at 23:32, Cameron Simpson <c...@cskk.id.au> wrote:
>
> On 17May2022 22:45, Marco Sulla <marco.sulla.pyt...@gmail.com> wrote:
> >Well, I've done a benchmark.
> >>>> timeit.timeit("tail('/home/marco/small.txt')", globals={"tail":tail}, number=100000)
> >1.5963431186974049
> >>>> timeit.timeit("tail('/home/marco/lorem.txt')", globals={"tail":tail}, number=100000)
> >2.5240604374557734
> >>>> timeit.timeit("tail('/home/marco/lorem.txt', chunk_size=1000)", globals={"tail":tail}, number=100000)
> >1.8944984432309866
>
> This suggests that the file size does not dominate your runtime.

Yes, this is what I wanted to test and it seems good.

> Ah.
> _Or_ that there are similar numbers of newlines vs text in the files so
> reading similar amounts of data from the end. If the "line density" of
> the files were similar you would hope that the runtimes would be
> similar.

No, well, small.txt has very short lines. lorem.txt is a lorem ipsum,
so really long lines. Indeed I get better results tuning chunk_size.
Anyway, even with the default value the performance is not bad at all.

> >But the time of Linux tail surprises me:
> >
> >marco@buzz:~$ time tail lorem.txt
> >[text]
> >
> >real 0m0.004s
> >user 0m0.003s
> >sys 0m0.001s
> >
> >It's strange that it's so slow. I thought it was because it decodes
> >and prints the result, but I timed
>
> You're measuring different things. timeit() tries hard to measure just
> the code snippet you provide. It doesn't measure the startup cost of the
> whole python interpreter. Try:
>
>     time python3 your-tail-prog.py /home/marco/lorem.txt

Well, I'll try it, but isn't it a bit unfair to compare Python startup
with C?

> BTW, does your `tail()` print output? If not, again not measuring the
> same thing.
> [...]
> Also: does tail(1) do character set / encoding stuff? Does your Python
> code do that? Might be apples and oranges.

Well, as I wrote I also timed

timeit.timeit("print(tail('/home/marco/lorem.txt').decode('utf-8'))", globals={"tail":tail}, number=100000)

and I got ~36 seconds.

> If you have the source of tail(1) to hand, consider getting to the core
> and measuring `time()` immediately before and immediately after the
> central tail operation and printing the result.

IMHO this is a very good idea, but I have to find the time(). Ahah. Emh.
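
For reference, here is a minimal sketch of the two kinds of measurement being compared in this thread. The tail() below is only a hypothetical stand-in (the implementation that was actually benchmarked is not shown in this message), and time_in_process() and time_whole_process() are names made up for illustration. The first times just the core operation, roughly what timeit does; the second times a whole process from launch to exit, which is what the shell's `time` reports.

```python
import os
import subprocess
import sys
import time


def tail(path, n=10, chunk_size=4096):
    """Return the last n lines of a file as bytes, reading backwards in chunks.

    Hypothetical stand-in, not the function benchmarked in this thread.
    """
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        data = b""
        # Read chunks from the end until we hold more than n newlines
        # (so the first of the n lines is complete) or hit the file start.
        while pos > 0 and data.count(b"\n") <= n:
            step = min(chunk_size, pos)
            pos -= step
            f.seek(pos)
            data = f.read(step) + data
    return b"".join(data.splitlines(keepends=True)[-n:])


def time_in_process(path, repeat=1000):
    """Average seconds per tail() call, excluding interpreter startup."""
    start = time.perf_counter()
    for _ in range(repeat):
        tail(path)
    return (time.perf_counter() - start) / repeat


def time_whole_process(argv):
    """Seconds to launch a command and wait for it, startup included."""
    start = time.perf_counter()
    subprocess.run(argv, stdout=subprocess.DEVNULL, check=True)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Cost of starting a Python interpreter that does nothing at all.
    startup = time_whole_process([sys.executable, "-c", "pass"])
    print(f"bare interpreter startup: {startup:.4f}s")
```

With long lines, more chunks have to be read before enough newlines are found, which is one plausible reason why tuning chunk_size changes the numbers. Comparing time_whole_process(["tail", "lorem.txt"]) against time_in_process("lorem.txt") would still be apples and oranges, for the reasons given above: one includes process startup and output, the other does not.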