STINNER Victor added the comment:

Hi Ben,

2015-02-13 4:51 GMT+01:00 Ben Hoyt <rep...@bugs.python.org>:
> Hi Victor, I thank you for your efforts here, especially your addition of 
> DirEntry.inode() and your work on the tests.

The addition of inode() should still be discussed on python-dev. The
remaining question is if the inode number (st_ino) is useful even if
st_dev is missing. I didn't added the inode() method yet to the PEP
because of that.

> However, I'm a bit frustrated that you just re-implemented the whole thing 
> without discussion:
(...)
> So it was a bit of a let down for a first-time Python contributor. Even a 
> simple question would have been helpful here: "Ben, I really think this much 
> C code for a first version is bug-prone; what do you think about me 
> re-implementing it and comparing?"

I'm really sorry, I didn't want to frustrate you :-( Sorry for not
being available, I also expected someone else to review your code, but
it didn't happen. And I *want* your opinon on design choices. Sorry if
I wasn't explicit about that in my last emails.

Sorry, I was very busy last months on other projects (like asyncio). I
want scandir() to be included in Python 3.5! The3.5 alpha 1 release
was a reminder for me.

I didn't reimplement everything. Almost all code comes from your work.
I just adapted it to Python 3.5.

> I've been behind scandir and written the first couple of implementations, and 
> I asked if you could review my code, but instead of reviewing it or 
> interacting with my fairly clear desire for the all-C version, you simply 
> turn up and say "I've rewritten it, can you please review?"

I'm not happy of the benchmark results of the C+Python implementation
(scandir-6.patch is supposed to be the fasted implementation of
C+Python).

I tried to produce as much benchmark results as possible to be able to
take a decision. I used:

* hardware: SSD, HDD and tmpfs (RAM)
* file system: ext4, tmpfs, NFS/ext4
* operating systems: Linux (Fedora 21), Windows 7

I will try to summarize benchmark results to write an email to
python-dev, to make a choice between the full C implementation vs
C+Python implementation. (ctypes is not an option, it's not reliable,
portability matters.)

> To continue the actual "which implementation" discussion: as I mentioned last 
> week in http://bugs.python.org/msg235458, I think the benchmarks above show 
> pretty clearly we should use the all-C version.

It's quite clear that the C implementation is always faster than C+Python.

My point is that we have to make a choice, C+Python is a nice
compromise because it uses less C code, and C code is more expensive
to *maintain*.

> For background: PEP 471 doesn't add any new functionality, and especially 
> with the new pathlib module, it doesn't make directory iteration syntax nicer 
> either: os.scandir() is all about letting the OS give you whatever info it 
> can *for performance*. Most of the Rationale for adding scandir given in PEP 
> 471 is because it can be so so much faster than listdir + stat.

Good point :-)

In many setup (hardware, operating system, file system), I see a low
speedup (less than 2x faster). The whole purpose of the PEP 471
becomes unclear if the speedup is not at least 2x.

Quicky summary of benchmarks:

* using C+Python (scandir-6.patch), the benchmark ("bench") is only
faster than 2x on Windows, in the other cases it's between 1.3x and
1.4x faster
* using C (scandir-2.patch), the benchmark ("bench") is at least 3.5x
faster, it's between 3.5x and 44.6x faster, the most interesting
speedup is on Windows (44.6x faster!)

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue22524>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to