Charles-François Natali <neolo...@free.fr> added the comment:

> On the other hand, fwalk also uses a lot of file descriptors.  Users 
> with processes which were already borderline on max file descriptors 
> might not appreciate upgrading to find their os.walk calls suddenly 
> failing.

It doesn't have to.
Right now it uses O(depth of the directory tree) FDs, but it can be changed 
to require only O(1) FDs; see http://bugs.python.org/issue13734.
For example, GNU coreutils' "rm -rf" uses the *at() syscalls and needs only a 
constant number of FDs.
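To make the idea concrete, here's a minimal, hypothetical sketch of an
fd-based walk built on the dir_fd ("*at()") support added in 3.3
(Unix-only - check os.supports_dir_fd; walk_fd and the sample paths are
made up for illustration). As written it still holds O(depth) FDs open
while recursing; a true O(1) version, like GNU rm's, would close the
child FD and reopen the parent via ".." when leaving a subtree:

    import os
    import stat

    def walk_fd(dirfd, path="."):
        """Yield (path, dirs, files), resolving names relative to dirfd."""
        dirs, files = [], []
        for name in os.listdir(dirfd):  # reads entries via the fd itself
            # fstatat()-style call: only 'name' is resolved, not the full path
            st = os.stat(name, dir_fd=dirfd, follow_symlinks=False)
            (dirs if stat.S_ISDIR(st.st_mode) else files).append(name)
        yield path, dirs, files
        for name in dirs:
            # openat()-style call: one fd per open level, closed before the
            # next sibling, hence O(depth) fds at worst; an O(1) variant
            # would instead reopen the parent with os.open("..",
            # os.O_RDONLY, dir_fd=childfd) when backing out of the subtree.
            fd = os.open(name, os.O_RDONLY, dir_fd=dirfd)
            try:
                yield from walk_fd(fd, os.path.join(path, name))
            finally:
                os.close(fd)

    topfd = os.open("/tmp", os.O_RDONLY)  # arbitrary sample tree
    try:
        for dirpath, subdirs, filenames in walk_fd(topfd, "/tmp"):
            pass
    finally:
        os.close(topfd)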

> Can you figure out why fwalk is faster, and apply that advantage to 
> walk *without* consuming so many file descriptors?

I didn't run any benchmark or test, but one reason why fwalk() is faster could 
simply be that it does much less path resolution - a somewhat expensive 
operation - thanks to the relative FD being passed.
I guess your mileage will vary with the FS in use and the kernel version 
(there's been a lot of work by Nick Piggin to speed up path resolution over 
the last few years).
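If you want to measure it on your own FS/kernel combination, here's a
rough, hypothetical micro-benchmark sketch (os.fwalk() is Unix-only and
needs 3.3+; bench() and "/usr" are made-up names for illustration):

    import os
    import time

    def bench(walker, top):
        """Time one full traversal; return (seconds, files seen)."""
        t0 = time.perf_counter()
        nfiles = 0
        for dirpath, dirnames, filenames in walker(top):
            nfiles += len(filenames)
        return time.perf_counter() - t0, nfiles

    TOP = "/usr"  # arbitrary sample tree
    t_walk, n = bench(os.walk, TOP)
    # os.fwalk() yields an extra dirfd element; drop it to reuse the harness.
    t_fwalk, _ = bench(lambda top: ((d, s, f)
                                    for d, s, f, _fd in os.fwalk(top)), TOP)
    print("walk: %.2fs  fwalk: %.2fs  (%d files)" % (t_walk, t_fwalk, n))

(Run it twice and keep the second set of numbers, so both walkers hit a
warm dentry cache.)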

Anyway, I think such an optimization is useless, because this micro-benchmark 
doesn't make much sense: when you walk a directory tree, it's usually to do 
something with the files/directories encountered, and as soon as you do 
something with them - stat(), unlink(), etc. - the gain on the walking time 
becomes negligible.
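A rough way to sanity-check that claim, reusing the hypothetical bench()
helper sketched above: add a single stat() per file and compare how much
of the total time the traversal itself still accounts for:

    def walk_and_stat(top):
        # os.walk() + one stat() per file; each stat() resolves the full path.
        for dirpath, dirnames, filenames in os.walk(top):
            for name in filenames:
                os.stat(os.path.join(dirpath, name))
            yield dirpath, dirnames, filenames

    def fwalk_and_stat(top):
        # os.fwalk() + one fstatat()-style stat() per file via the dirfd.
        for dirpath, dirnames, filenames, dirfd in os.fwalk(top):
            for name in filenames:
                os.stat(name, dir_fd=dirfd)
            yield dirpath, dirnames, filenames

    t_walk2, _ = bench(walk_and_stat, TOP)
    t_fwalk2, _ = bench(fwalk_and_stat, TOP)
    print("with stat(): walk %.2fs  fwalk %.2fs" % (t_walk2, t_fwalk2))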

----------
nosy: +neologix

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue15200>
_______________________________________