Very interesting. Although os.walk may not be widely used in cluster
applications, anything that lowers the number of stat() calls in an
application is worthwhile on parallel filesystems, because stat() is
handled by the only non-parallel node, the MDS (metadata server).
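
To make the stat() point concrete, here is a minimal sketch that contrasts the two traversal styles. It assumes a modern Python (3.6 or later), where the scandir idea is available in the standard library as os.scandir. The helper names (walk_with_stat, walk_with_scandir, make_tree) are made up for illustration and are not part of the benchmark quoted in this thread.

```python
import os
import tempfile


def walk_with_stat(top):
    """Count files using listdir(); os.path.isdir() costs a stat() per entry."""
    count = 0
    for name in os.listdir(top):
        path = os.path.join(top, name)
        if os.path.isdir(path):  # one stat() call for every directory entry
            count += walk_with_stat(path)
        else:
            count += 1
    return count


def walk_with_scandir(top):
    """Count files using os.scandir(); the entry type usually comes with the
    directory listing, so no per-entry stat() is needed on most systems."""
    count = 0
    with os.scandir(top) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                count += walk_with_scandir(entry.path)
            else:
                count += 1
    return count


def make_tree(root, depth=2, num_dirs=2, num_files=3):
    """Build a small tree of empty files (hypothetical helper for the demo)."""
    for i in range(num_files):
        with open(os.path.join(root, "file%d.txt" % i), "w"):
            pass
    if depth > 0:
        for d in range(num_dirs):
            sub = os.path.join(root, "dir%d" % d)
            os.mkdir(sub)
            make_tree(sub, depth - 1, num_dirs, num_files)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        make_tree(root)
        # Both traversals should report the same number of files.
        print(walk_with_stat(root), walk_with_scandir(root))
```

Both functions return the same count on a tree without symlinks. The difference is only in how many metadata requests reach the filesystem, which is what matters when every stat() ends up on the MDS.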

Small test on another NFS drive:
Creating tree at benchtree: depth=4, num_dirs=5, num_files=50
Priming the system's cache...
Benchmarking walks on benchtree, repeat 1/3...
Benchmarking walks on benchtree, repeat 2/3...
Benchmarking walks on benchtree, repeat 3/3...
os.walk took 0.117s, scandir.walk took 0.041s -- 2.8x as fast

I may try it on a Lustre FS if I have some time and if I don't forget about
this.

Cheers,

Matthieu


2013/5/14 Charles-François Natali <cf.nat...@gmail.com>

> > I wonder how sshfs compared to nfs.
>
> (I've modified your benchmark to also test the case where data isn't
> in the page cache).
>
> Local ext3:
> cached:
> os.walk took 0.096s, scandir.walk took 0.030s -- 3.2x as fast
> uncached:
> os.walk took 0.320s, scandir.walk took 0.130s -- 2.5x as fast
>
> NFSv3, 1Gb/s network:
> cached:
> os.walk took 0.220s, scandir.walk took 0.078s -- 2.8x as fast
> uncached:
> os.walk took 0.269s, scandir.walk took 0.139s -- 1.9x as fast
> _______________________________________________
> Python-Dev mailing list
> Python-Dev@python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> http://mail.python.org/mailman/options/python-dev/matthieu.brucher%40gmail.com
>



-- 
Information System Engineer, Ph.D.
Blog: http://matt.eifelle.com
LinkedIn: http://www.linkedin.com/in/matthieubrucher
Music band: http://liliejay.com/