STINNER Victor added the comment: Hi Ben,
2015-02-13 4:51 GMT+01:00 Ben Hoyt <rep...@bugs.python.org>: > Hi Victor, I thank you for your efforts here, especially your addition of > DirEntry.inode() and your work on the tests. The addition of inode() should still be discussed on python-dev. The remaining question is if the inode number (st_ino) is useful even if st_dev is missing. I didn't added the inode() method yet to the PEP because of that. > However, I'm a bit frustrated that you just re-implemented the whole thing > without discussion: (...) > So it was a bit of a let down for a first-time Python contributor. Even a > simple question would have been helpful here: "Ben, I really think this much > C code for a first version is bug-prone; what do you think about me > re-implementing it and comparing?" I'm really sorry, I didn't want to frustrate you :-( Sorry for not being available, I also expected someone else to review your code, but it didn't happen. And I *want* your opinon on design choices. Sorry if I wasn't explicit about that in my last emails. Sorry, I was very busy last months on other projects (like asyncio). I want scandir() to be included in Python 3.5! The3.5 alpha 1 release was a reminder for me. I didn't reimplement everything. Almost all code comes from your work. I just adapted it to Python 3.5. > I've been behind scandir and written the first couple of implementations, and > I asked if you could review my code, but instead of reviewing it or > interacting with my fairly clear desire for the all-C version, you simply > turn up and say "I've rewritten it, can you please review?" I'm not happy of the benchmark results of the C+Python implementation (scandir-6.patch is supposed to be the fasted implementation of C+Python). I tried to produce as much benchmark results as possible to be able to take a decision. I used: * hardware: SSD, HDD and tmpfs (RAM) * file system: ext4, tmpfs, NFS/ext4 * operating systems: Linux (Fedora 21), Windows 7 I will try to summarize benchmark results to write an email to python-dev, to make a choice between the full C implementation vs C+Python implementation. (ctypes is not an option, it's not reliable, portability matters.) > To continue the actual "which implementation" discussion: as I mentioned last > week in http://bugs.python.org/msg235458, I think the benchmarks above show > pretty clearly we should use the all-C version. It's quite clear that the C implementation is always faster than C+Python. My point is that we have to make a choice, C+Python is a nice compromise because it uses less C code, and C code is more expensive to *maintain*. > For background: PEP 471 doesn't add any new functionality, and especially > with the new pathlib module, it doesn't make directory iteration syntax nicer > either: os.scandir() is all about letting the OS give you whatever info it > can *for performance*. Most of the Rationale for adding scandir given in PEP > 471 is because it can be so so much faster than listdir + stat. Good point :-) In many setup (hardware, operating system, file system), I see a low speedup (less than 2x faster). The whole purpose of the PEP 471 becomes unclear if the speedup is not at least 2x. Quicky summary of benchmarks: * using C+Python (scandir-6.patch), the benchmark ("bench") is only faster than 2x on Windows, in the other cases it's between 1.3x and 1.4x faster * using C (scandir-2.patch), the benchmark ("bench") is at least 3.5x faster, it's between 3.5x and 44.6x faster, the most interesting speedup is on Windows (44.6x faster!) ---------- _______________________________________ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue22524> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com