STINNER Victor added the comment:

> I'm somewhat surprised at the 2-3x numbers you're seeing, as I was 
> consistently getting 4-5x in the Linux tests I did. But it does depend quite 
> a bit on what file system you're running, what hardware, whether you're 
> running in a VM, etc. Still, 2-3x faster is a good speedup!

I don't think that hardware matters. As I wrote, I expect the whole /usr/share 
tree to fit in memory. It sounds more like optimizations in the Linux kernel. 
I ran the benchmarks on Fedora 20 with Linux kernel 3.14.

> Anyway, where to from here? Are we agreed given the numbers that -- 
> especially on Linux -- it makes good performance sense to use an all-C 
> approach?

We haven't yet tried calling readdir() multiple times in the C iterator and 
keeping a small cache (e.g. between 10 and 1000 items; I don't know yet which 
size is best) to also limit the number of readdir() calls. The cache would be 
an array of dirent structures on Linux.

For example, scandir_helper() could return an array of items instead of a 
single item.
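
To make the idea concrete, here is a rough Python sketch of the interface shape 
only. It assumes a hypothetical helper, scandir_batches(), that hands back a 
list of entries per call, and a wrapper, scandir_cached(), that iterates over 
them one at a time. Both names and the batch_size default are made up for 
illustration. The sketch is built on os.scandir() and does not by itself reduce 
the number of readdir() calls; the real saving would have to come from filling 
the array in C.

```python
import os
from itertools import islice


def scandir_batches(path, batch_size=64):
    """Yield lists of up to batch_size os.DirEntry objects.

    Hypothetical stand-in for a helper that returns an array of items
    per call instead of a single item.
    """
    with os.scandir(path) as it:
        while True:
            # Pull up to batch_size entries from the underlying iterator.
            batch = list(islice(it, batch_size))
            if not batch:
                return
            yield batch


def scandir_cached(path, batch_size=64):
    """Iterate over entries one by one, refilling from one batch at a time."""
    for batch in scandir_batches(path, batch_size):
        yield from batch
```

With this shape, the number of helper calls drops from one per entry to roughly 
one per batch_size entries, which is what a benchmark of the cache size would 
need to measure.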

I can try to implement it if you want.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue22524>
_______________________________________