Ben Hoyt added the comment:

To continue the actual "which implementation" discussion: as I mentioned last week in http://bugs.python.org/msg235458, I think the benchmarks above show pretty clearly that we should use the all-C version.

For background: PEP 471 doesn't add any new functionality, and especially with the new pathlib module, it doesn't make directory iteration syntax nicer either: os.scandir() is all about letting the OS give you whatever info it can *for performance*. Most of the rationale for adding scandir given in PEP 471 is that it can be so much faster than listdir + stat.

My original all-C implementation is definitely more code to review (roughly 800 lines of C vs scandir-6.patch's 400), but it's also more than twice as fast. On my Windows 7 SSD just now, running benchmark.py:

Original scandir-2.patch version:
os.walk took 0.509s, scandir.walk took 0.020s -- 25.4x as fast

New scandir-6.patch version:
os.walk took 0.455s, scandir.walk took 0.046s -- 10.0x as fast

So the all-C implementation is about 2.5x as fast on Windows (25.4 / 10.0). (After both tests, just for a sanity check, I ran the ctypes version as well, and it said about 8x as fast for both runs.)

On Linux the comparison isn't perfect (different benchmarks), but it shows the same kind of trend:

Original scandir-2.patch benchmark (http://bugs.python.org/msg228857):
os.walk took 0.860s, scandir.walk took 0.268s -- 3.2x as fast

New scandir-6.patch benchmark (http://bugs.python.org/msg235865) -- note that "1.3x faster" should actually read "1.3x as fast" here:
bench: 1.3x faster (scandir: 164.9 ms, listdir: 216.3 ms)

So again, the all-C implementation is roughly 2.5x as fast on Linux too (3.2 / 1.3). And on Linux, the incremental improvement provided by scandir-6 over listdir is hardly worth it -- I'd use a new directory listing API for 3.2x as fast, but not for 1.3x as fast.

Admittedly a 10x speed gain (!) on Windows is still very much worth going for, so I'm positive about scandir even with a half-Python implementation, but hopefully the above shows fairly clearly why the all-C implementation is important, especially on Linux.
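
For anyone skimming, the "listdir + stat" versus scandir difference can be sketched roughly as below. This is only an illustration, not code from any of the patches: the two function names are made up here, and it assumes os.scandir() as described in PEP 471 (the `with` form needs a Python version where the scandir iterator supports the context manager protocol, 3.6 or later).

```python
import os
import stat


def tree_size_listdir(path):
    """Hypothetical helper: total size via os.listdir() plus one lstat per entry."""
    total = 0
    for name in os.listdir(path):
        full = os.path.join(path, name)
        st = os.lstat(full)  # always an extra system call for every entry
        if stat.S_ISDIR(st.st_mode):
            total += tree_size_listdir(full)
        else:
            total += st.st_size
    return total


def tree_size_scandir(path):
    """Hypothetical helper: same result using os.scandir() DirEntry objects."""
    total = 0
    with os.scandir(path) as entries:
        for entry in entries:
            # is_dir() can usually be answered from the directory listing
            # itself, so no separate stat call is needed just to recurse.
            if entry.is_dir(follow_symlinks=False):
                total += tree_size_scandir(entry.path)
            else:
                # On Windows the size comes back with the directory listing;
                # on POSIX this still costs one stat call per file.
                total += entry.stat(follow_symlinks=False).st_size
    return total
```

Both helpers should return the same number for the same tree; the only intended difference is how many system calls it takes to get there, which is where the speedups in the benchmarks above come from.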

Also, if the consensus is in favour of slower code with less C, I think there are further tweaks we can make to the Python part of the code to improve things a bit more.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue22524>
_______________________________________