Ben Hoyt added the comment:

To continue the actual "which implementation" discussion: as I mentioned last 
week in http://bugs.python.org/msg235458, I think the benchmarks above show 
pretty clearly we should use the all-C version.

For background: PEP 471 doesn't add any new functionality, and especially with 
the new pathlib module, it doesn't make directory iteration syntax nicer 
either: os.scandir() is all about letting the OS give you whatever info it can 
*for performance*. Most of the Rationale given in PEP 471 for adding scandir 
comes down to the fact that it can be so much faster than listdir + stat.
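To make the listdir + stat comparison concrete, here is a minimal sketch of the two approaches. It assumes the os.scandir() API as it later shipped in the standard library (Python 3.5, with context-manager support from 3.6), not any particular patch from this issue. The function names and the demo directory layout are made up for illustration.

```python
import os
import tempfile


def subdirs_listdir(path):
    # os.listdir() returns names only, so os.path.isdir() has to make
    # an extra stat() system call for every entry.
    return sorted(name for name in os.listdir(path)
                  if os.path.isdir(os.path.join(path, name)))


def subdirs_scandir(path):
    # DirEntry.is_dir() can usually be answered from information the OS
    # already returned while reading the directory, so in most cases no
    # extra system call is needed per entry.
    with os.scandir(path) as entries:
        return sorted(entry.name for entry in entries if entry.is_dir())


def demo():
    # Hypothetical layout: two subdirectories and one regular file.
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, "a"))
        os.mkdir(os.path.join(root, "b"))
        with open(os.path.join(root, "f.txt"), "w") as f:
            f.write("x")
        return subdirs_listdir(root), subdirs_scandir(root)


if __name__ == "__main__":
    print(demo())
```

Both functions return the same names; the difference is only in how many system calls are needed to get there, which is the cost os.walk pays on every directory it visits.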

My original all-C implementation is definitely more code to review (roughly 800 
lines of C vs scandir-6.patch's 400), but it's also more than twice as fast. On 
my Windows 7 SSD just now, running benchmark.py:

    Original scandir-2.patch version:
    os.walk took 0.509s, scandir.walk took 0.020s -- 25.4x as fast

    New scandir-6.patch version:
    os.walk took 0.455s, scandir.walk took 0.046s -- 10.0x as fast

So the all-C implementation is about 2.5x as fast as the scandir-6 version on 
Windows (25.4x vs 10.0x). (After both tests, just as a sanity check, I ran the 
ctypes version as well, and it reported about 8x as fast for both runs.)

Then on Linux, it's not a perfect comparison (different benchmarks), but it 
shows the same kind of trend:

    Original scandir-2.patch benchmark (http://bugs.python.org/msg228857):
    os.walk took 0.860s, scandir.walk took 0.268s -- 3.2x as fast

    New scandir-6.patch benchmark (http://bugs.python.org/msg235865) -- note
    that "1.3x faster" should actually read "1.3x as fast" here:
    bench: 1.3x faster (scandir: 164.9 ms, listdir: 216.3 ms)

So again, the all-C implementation is about 2.5x as fast on Linux too (3.2x vs 
1.3x).
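For anyone checking the arithmetic, the "2.5x" figures come from dividing the two reported speedup factors on each platform. A small sketch using only the numbers quoted above (the helper name is made up for illustration):

```python
def relative_speed(speedup_all_c, speedup_hybrid):
    # Both arguments are "N times as fast as the baseline" figures, so
    # their ratio says how much faster the all-C version is than the
    # part-Python one, with the baseline timing normalized out.
    return speedup_all_c / speedup_hybrid


# Speedup factors reported in the benchmark runs quoted in this message.
windows = relative_speed(25.4, 10.0)
linux = relative_speed(3.2, 1.3)

# Ratio of the raw scandir.walk timings on Windows, for comparison.
windows_raw = 0.046 / 0.020

if __name__ == "__main__":
    print(f"Windows: {windows:.1f}x, Linux: {linux:.1f}x")
    print(f"Windows (raw scandir.walk times): {windows_raw:.1f}x")
```

Comparing the raw Windows scandir.walk times directly gives a slightly lower ratio, because the os.walk baseline was a bit slower in the first run (0.509s vs 0.455s); dividing the speedup factors adjusts for that.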

And on Linux, the incremental improvement that scandir-6 provides over listdir 
is hardly worth it -- I'd switch to a new directory listing API if it were 
3.2x as fast, but not if it were only 1.3x as fast.

Admittedly a 10x speed gain (!) on Windows is still very much worth going for, 
so I'm positive about scandir even with a half-Python implementation, but 
hopefully the above shows fairly clearly why the all-C implementation is 
important, especially on Linux.

Also, if the consensus is in favour of a slower implementation with less C 
code, I think there are further tweaks we can make to the Python part of the 
code to improve things a bit more.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue22524>
_______________________________________