Giampaolo Rodola' <g.rod...@gmail.com> added the comment:

Yes, file copy (open() + read() + write()) is of course more expensive than 
just "reading" a tree (os.walk(), glob()) or deleting it (rmtree()), and the 
"pure file copy" time adds to the benchmark. Indeed, it's no coincidence 
that #33671 (which replaced read() + write() with sendfile()) yielded a 5% 
gain in the benchmark I posted initially for Linux.

Still, in an 8k small-files-tree scenario we're seeing a ~9% gain on Linux, 20% 
on Windows and 30% on an SMB share on localhost vs. VirtualBox. I do not consider 
this a "hardly noticeable gain" as you imply: it is noticeable, exponential and 
measurable, even with the cache being involved (as it is).

Note that the number of stat() syscalls per file is being reduced from 6 to 1 
(or more if follow_symlinks=False), and that is the real gist here. That *does* 
make a difference on a regular Windows fs and makes a huge difference with 
network filesystems in general, as a simple stat() call implies access to the 
network, not the disk.
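
To illustrate the point about stat() calls, here is a minimal sketch (not the 
actual patch) that assumes the saving comes from reusing the information 
os.scandir() already returns with each directory entry. The helper names 
sizes_with_listdir() and sizes_with_scandir() are made up for this example.

```python
import os
import tempfile


def sizes_with_listdir(path):
    """Collect file sizes with one explicit os.stat() call per entry."""
    result = {}
    for name in os.listdir(path):
        full = os.path.join(path, name)
        # Every entry costs a separate stat() round trip here.
        result[name] = os.stat(full, follow_symlinks=False).st_size
    return result


def sizes_with_scandir(path):
    """Collect file sizes from os.DirEntry objects.

    DirEntry.stat() caches its result on the entry, and on Windows it
    normally needs no extra system call for non-symlink entries.
    """
    result = {}
    with os.scandir(path) as entries:
        for entry in entries:
            result[entry.name] = entry.stat(follow_symlinks=False).st_size
    return result


if __name__ == "__main__":
    # Build a small throwaway tree and check both approaches agree.
    with tempfile.TemporaryDirectory() as tmp:
        for i in range(5):
            with open(os.path.join(tmp, "f%d.txt" % i), "w") as f:
                f.write("x" * i)
        print(sizes_with_listdir(tmp) == sizes_with_scandir(tmp))
```

Both helpers return the same mapping; the difference is only in how many 
times the filesystem (or the network, on a share) has to be queried.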

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue33695>
_______________________________________