[issue23605] Use the new os.scandir() function in os.walk()

2015-03-30 Thread Ben Hoyt

Ben Hoyt added the comment:

Yep, I'm good -- go ahead and close!

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23605] Use the new os.scandir() function in os.walk()

2015-03-30 Thread STINNER Victor

STINNER Victor added the comment:

Thanks for your great work Ben! I close the issue. The PEP 471 has already the 
Final status.

--
resolution:  -> fixed
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23605] Use the new os.scandir() function in os.walk()

2015-03-30 Thread STINNER Victor

STINNER Victor added the comment:

Ben: do you think that we are done with this issue? can I close it?

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23605] Use the new os.scandir() function in os.walk()

2015-03-30 Thread Ben Hoyt

Ben Hoyt added the comment:

Thanks for the explanation (and the comment fix).

> What's your point about complexity? Would you like to drop os.scandir
> changes in os.walk(), or do you have a simpler version to propose?

No, not at all! I was just noting it and trying to brainstorm any ways to 
simplify it (while keeping the current behavior). But I'm not sure there is a 
good simplification, so that's fine.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23605] Use the new os.scandir() function in os.walk()

2015-03-30 Thread Roundup Robot

Roundup Robot added the comment:

New changeset 1ad3d8d82b18 by Victor Stinner in branch 'default':
Issue #23605: Fix typo in an os.walk() comment
https://hg.python.org/cpython/rev/1ad3d8d82b18

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23605] Use the new os.scandir() function in os.walk()

2015-03-27 Thread STINNER Victor

STINNER Victor added the comment:

Ben Hoyt added the comment:
> 1) The new implementation is more complex. Of course, most of this is
necessary due to the topdown directory issue.

Sure,  correctness matters more than performances.

> However, one thing I'm not sure about is why you create scandir_it
manually and use a while True loop,

The idea is to control lines where OSError is catched to call onerrror(),
without breaking backward compatibility. Especially, onerror() should only
be called once, even for recursive calls.

> 2) In this commit http://bugs.python.org/review/23605/diff/14181/Lib/os.py
-- which is not the final one, I don't quite understand the catch_oserror
thing.
It's just a try to write differently the same thing. It didn't convince
myself that it's more readable, so I picked the first version.

What's your point about complexity? Would you like to drop os.scandir
changes in os.walk(), or do you have a simpler version to propose?

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23605] Use the new os.scandir() function in os.walk()

2015-03-26 Thread Ben Hoyt

Ben Hoyt added the comment:

Victor, great work on pushing this out, especially with the "modifying the 
directories" fix. (And thanks Serhiy for pushing on correctness here.)

Couple of comments/questions about your new os.walk() implementation.

1) The new implementation is more complex. Of course, most of this is necessary 
due to the topdown directory issue. However, one thing I'm not sure about is 
why you create scandir_it manually and use a while True loop, adding complexity 
and making it require two versions of the error handling. Why not a simple "for 
entry in scandir(top): ..." with a try/except OSError around the whole loop? I 
could well be missing something here though.

2) In this commit http://bugs.python.org/review/23605/diff/14181/Lib/os.py -- 
which is not the final one, I don't quite understand the catch_oserror thing. 
Presumably this turned out to be unnecessary, as it's not in the final version?

3) Really minor thing: in one of the comments, you misspell "symbolik". Should 
be "symbolic".

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23605] Use the new os.scandir() function in os.walk()

2015-03-18 Thread STINNER Victor

STINNER Victor added the comment:

I commited fast_bottom-up.patch to fix the regression of os.walk().

> Could you please add a test based on my example (i.e. converting symlinks to 
> a directory during walking) and may be other (creating new directory and 
> adding it to the dirs list)?

Sorry, I'm not interested to write new tests.

> Is it worth to mention in the os.walk() doc that topdown=False can be faster?

I didn't touch the doc yet.

I keep the issue open a few weeks to see if someone is interested to optimize 
os.walk() further, write new tests, enhance the doc, etc.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23605] Use the new os.scandir() function in os.walk()

2015-03-18 Thread Roundup Robot

Roundup Robot added the comment:

New changeset 4fb829f8c04d by Victor Stinner in branch 'default':
Issue #23605: Fix os.walk(topdown=True), don't cache entry.is_symlink() because
https://hg.python.org/cpython/rev/4fb829f8c04d

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23605] Use the new os.scandir() function in os.walk()

2015-03-13 Thread STINNER Victor

STINNER Victor added the comment:

>> Are you testing the top-bottom or bottom-up?
> My benchmark.py calls os.walk() with topdown=True, which is the default.

Is it worth to mention in the os.walk() doc that topdown=False can be faster?

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23605] Use the new os.scandir() function in os.walk()

2015-03-13 Thread Ben Hoyt

Ben Hoyt added the comment:

> I don't understand your benchmark. Do you mean that os.walk() is slower
> with fast_bottom-up.patch because islink() is called or because I replaced
> "for entry in scandir(top):" with "entry = next(scandir_it)"?

No, sorry, I was making two separate comments: 1) the code's gotten quite a bit 
more complex (and if it needs to be that way for correctness, I'm okay with 
that), and 2) I'm surprised at how fast it still is.

> Are you testing the top-bottom or bottom-up?

My benchmark.py calls os.walk() with topdown=True, which is the default. I was 
testing the Python 3.4 version of os.walk() via listdir against your 
fast_bottom-up.patch.

I'm keen to look into this a bit further, but it won't be today.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23605] Use the new os.scandir() function in os.walk()

2015-03-13 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

You fast_bottom-up looks awesome, but much more correct. This is what I meant 
when warned that correct implementations with scandir() will be complex.

Could you please add a test based on my example (i.e. converting symlinks to a 
directory during walking) and may be other (creating new directory and adding 
it to the dirs list)?

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23605] Use the new os.scandir() function in os.walk()

2015-03-13 Thread STINNER Victor

STINNER Victor added the comment:

I don't understand your benchmark. Do you mean that os.walk() is slower with 
fast_bottom-up.patch because islink() is called or because I replaced "for 
entry in scandir(top):" with "entry = next(scandir_it)"?

Are you testing the top-bottom or bottom-up?

Here is a variant of my patch with "for entry in scandir(top):". I would prefer 
to avoid this variant with a boolean to not catch OSError on the recursive call 
to walk() if next() has similar performances.

--
Added file: http://bugs.python.org/file38471/fast_bottom-up-2.patch

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23605] Use the new os.scandir() function in os.walk()

2015-03-12 Thread Ben Hoyt

Ben Hoyt added the comment:

Thanks, Victor.

I haven't quite grokked all the changes here -- it's gotten somewhat more 
complicated with the scandir_it and manual next() call -- but I ran some 
benchmarks (via a hacked version of my scandir project's benchmark.py). The 
results were surprising, and in a good way:

Dev version in hg (no extra islink syscall):

Windows: 13.1x as fast (68.8x as fast in funky caching mode)
Linux: 7.8x as fast

With Victor's fast_bottom_up patch (100% correct behaviour):

Windows: 9.4x as fast (50.2x as fast in funky caching mode)
Linux: 6.5x as fast

So os.walk() will still be 10x as fast on Windows if you apply this patch, and 
6x as fast on my Linux VM. I haven't dug too deeply to know quite why the 
numbers are this good, especially on Linux, but that's what I'm seeing, which 
is great!

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23605] Use the new os.scandir() function in os.walk()

2015-03-12 Thread STINNER Victor

STINNER Victor added the comment:

And now something completly differenet: fast_bottom-up.patch, fast bottom-up, 
but "slow" top-down.

In bottom-up mode (topdown=False), use entry.path and entry.is_symlink() => 
optimization compared to Python 3.4 (avoid one call to os.stat()).

In top-down mode (topdown=True, default), use os.path.join() and 
os.path.islink() => no change compared to Python 3.4 on this part. More 
globally, os.walk() should still be much faster in Python 3.5 than in Python 
3.5 thanks to entry.is_dir() which avoids a call to os.path.isdir() (and so 
os.stat()).

Correctness matters more than performances!

> 2. Accept and document behavior change. This will break someone's code. :-(
> 3. Left os.walk as was (or only partially optimized as in 1), and add a new 
> function with new behavior.
> 4. Add new boolean parameter that control behavior to os.walk().

Sorry but all these options sucks :-p

> 5. Try to detect when dir list or filesystem were changed. Victor's patch 
> does the first. For the second we can check if top directory inode and mtime 
> were changed. But I afraid this wouldn't decrease a number of system calls.

By experience, it's very hard to write portable code. Even if we find a way to 
detect FS change on some platforms, it will not work on some other platforms.

--
Added file: http://bugs.python.org/file38452/fast_bottom-up.patch

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23605] Use the new os.scandir() function in os.walk()

2015-03-12 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

> Happy? :-)

No, it doesn't fix my example. :-(

I see following possibilities:

1. Partially revert the patch and always call path.islink(). I will not kill 
all the benefit of using os.scandir(), because there is a benefit from avoiding 
path.isdir() and in any case in large directories most content is usually 
files, not dirs.

2. Accept and document behavior change. This will break someone's code. :-(

3. Left os.walk as was (or only partially optimized as in 1), and add a new 
function with new behavior.

4. Add new boolean parameter that control behavior to os.walk().

5. Try to detect when dir list or filesystem were changed. Victor's patch does 
the first. For the second we can check if top directory inode and mtime were 
changed. But I afraid this wouldn't decrease a number of system calls.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23605] Use the new os.scandir() function in os.walk()

2015-03-12 Thread STINNER Victor

STINNER Victor added the comment:

walk_added_symlink_to_dir-2.patch: Ok, here is a new patch, now with tests.

Without the fix on os.walk(), WalkTests.test_add_dir_symlink() fails, whereas 
FwalkTests.test_add_dir_symlink() pass.

--
Added file: http://bugs.python.org/file38451/walk_added_symlink_to_dir-2.patch

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23605] Use the new os.scandir() function in os.walk()

2015-03-12 Thread Roundup Robot

Roundup Robot added the comment:

New changeset c06ebb57d4ed by Victor Stinner in branch '3.4':
Issue #23605: Refactor os.walk() tests to also run them on os.fwalk()
https://hg.python.org/cpython/rev/c06ebb57d4ed

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23605] Use the new os.scandir() function in os.walk()

2015-03-12 Thread STINNER Victor

STINNER Victor added the comment:

> 1) Serhiy's point about not needing to build the symlinks set when 
> followlinks is True is a good one, because it'll never get used. Just a "if 
> not followlinks: ..." around that try/except would do it. Though this is a 
> small optimization, as I expect 95% of people use os.walk() with 
> followlinks=False, which is the default.

No, it's not a "small optimization". I/O are expensive, especially disk I/O. 
Try os.walk() on a NFS share with the cache disabled ;-)

> My first note was about efficiency of the implementation. When followlinks is 
> true, you can avoid testing entry.is_link() and creating the symlinks set.

Oh, I misunderstood your comment. The changeset 1a972140ab62 avoids calling 
entry.is_symlink() when it's not needed. Thanks for the report.

--

walk_added_symlink_to_dir.patch: Small modification to os.walk() to keep the 
performances of os.scandir() *and* still supported adding symlinks to dirs (no 
backward compatibility changes).

Happy? :-)

I started to work on an unit test, but I don't understand how WalkTests and 
FwalkTests are written.

The setUp() method contais unit tests?! For example, setUp() has a unit test 
removing a directory from the dirs list to check os.walk() behaviour.

I will try harder to understand how these tests are written and post a patch 
for tests later.

--
Added file: http://bugs.python.org/file38450/walk_added_symlink_to_dir.patch

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23605] Use the new os.scandir() function in os.walk()

2015-03-12 Thread Roundup Robot

Roundup Robot added the comment:

New changeset 1a972140ab62 by Victor Stinner in branch 'default':
Issue #23605: os.walk() doesn't need to call entry.is_symlink() if followlinks
https://hg.python.org/cpython/rev/1a972140ab62

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23605] Use the new os.scandir() function in os.walk()

2015-03-11 Thread Ben Hoyt

Ben Hoyt added the comment:

Note specifically in the unsymlink() example Serhiy gave, you probably don't 
*want* os.walk() to recurse into a 
symlink-to-a-directory-that's-changed-to-a-directory, and you probably haven't 
thought about it doing so, so maybe the new behaviour is fine?

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23605] Use the new os.scandir() function in os.walk()

2015-03-11 Thread Ben Hoyt

Ben Hoyt added the comment:

To Victor and Serhiy:

1) Serhiy's point about not needing to build the symlinks set when followlinks 
is True is a good one, because it'll never get used. Just a "if not 
followlinks: ..." around that try/except would do it. Though this is a small 
optimization, as I expect 95% of people use os.walk() with followlinks=False, 
which is the default.

2) Much as I don't want to admit it :-), I think Serhiy has a point regarding 
the change in behaviour. Can we accept this tiny change? This happens because 
the previous implementation of os.walk() calls path.islink(), which does a real 
syscall, after the yield returns.

So if the caller modifies "dirnames" and adds a symlink to a directory it won't 
be in the symlinks set. Or if (as Serhiy's example shows) they change a 
symlink-to-a-directory to a directory the new implementation doesn't do another 
syscall to check, so differs from the old one -- in Serhiy's example, the old 
os.walk() will recurse into the changed 
symlink-to-a-directory-that's-now-a-directory, whereas the new os.walk() won't 
recurse.

Arguably the new behaviour is just as good here, but it is different. And 
Serhiy's function unsymlink() is a reasonable example of how this might play 
out.

The os.walk() docs explicitly allow modifying "dirnames" -- not just removing 
and reordering, but also adding entries, which could surface the difference in 
behaviour: "the caller can modify the dirnames list in-place ... even to inform 
walk() about directories the caller creates or renames before it resumes walk() 
again". See here:

https://docs.python.org/3.4/library/os.html#os.walk

So I do think it's a real issue. Will this really be an issue in practice? I 
don't know; I'm tempted to think not.

Obviously if we have to call stat/islink again, it defeats half of the purpose 
of scandir. But I don't see a way around it if we want to keep the exact old 
os.walk() behaviour. We could fix the behaviour if a directory/symlink was 
added to "dirnames" fairly easily by testing whether it changed, but I don't 
see a way to fix the unsymlinks() example without a syscall.

Thoughts?

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23605] Use the new os.scandir() function in os.walk()

2015-03-11 Thread Ben Hoyt

Ben Hoyt added the comment:

@Scott Dial: just a response about this benchmark: note that this benchmark 
isn't really valid, as it says "Using slower ctypes version of scandir", which 
is the slow all-Python version. You want it to be saying "Using Python 3.5's 
builtin os.scandir()".

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23605] Use the new os.scandir() function in os.walk()

2015-03-10 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

My first note was about efficiency of the implementation. When followlinks is 
true, you can avoid testing entry.is_link() and creating the symlinks set.

The new implementation of os.walk() changes a behavior. The problem is that a 
symlink to a directory can become a directory. It is documented that the 
caller can remove or add a directory to directory list and for sure this 
feature is used in third-party code. In my example a directory list is even 
not changed, but file system is changed.

I think that majority of uses of os.walk() will be not affected by this change, 
but for sure there is a code that will be broken and we can receive bug 
reports after releasing 3.6.

New behavior also is not consistent with os.fwalk().

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23605] Use the new os.scandir() function in os.walk()

2015-03-10 Thread STINNER Victor

STINNER Victor added the comment:

Serhiy Storchaka added the comment:
> When followlinks is true, symlinks is not needed.

Hum, it's not easy to understand your code. I guess that the problem
is that a symlink to a directory can become something else (not a
directory or a symlink to a directory).

I noticed this race condition in the new implementation of os.walk(),
but I don't think that the issue is really new. The old implementation
of os.walk() already uses a list of directories. The caller can remove
a directory or replace a directory with something else.

Since the bug is not documented in os.walk(), I chose to not document
it neither. But it would be better to warn users to not do that :-)

@Serhiy: I agree that the new implementation changes the behaviour,
but I don't consider it as as bug. Do you think that it's a bug?

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23605] Use the new os.scandir() function in os.walk()

2015-03-10 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

When followlinks is true, symlinks is not needed.

But I this commit breaks a code like following:

def unsymlink(top):
for root, dirs, files in os.walk(top):
for name in dirs:
path = os.path.join(root, name)
if os.path.islink(path):
target = os.path.join(root, os.readlink(path))
os.unlink(path)
shutil.copytree(target, path)
for name in files:
path = os.path.join(root, name)
if os.path.islink(path):
target = os.path.join(root, os.readlink(path))
os.unlink(path)
shutil.copy2(target, path)

--
nosy: +serhiy.storchaka
resolution: fixed -> 
status: closed -> open

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23605] Use the new os.scandir() function in os.walk()

2015-03-10 Thread Roundup Robot

Roundup Robot added the comment:

New changeset 50e32059a404 by Victor Stinner in branch '3.4':
Issue #23605: os.walk() doc now mentions shutil.rmtree() in the last example
https://hg.python.org/cpython/rev/50e32059a404

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23605] Use the new os.scandir() function in os.walk()

2015-03-10 Thread STINNER Victor

STINNER Victor added the comment:

My suggestion to add a new walk_dirs list is wrong: os.walk() documentation 
explicitly says that the dirs list can be modified to delete some directories:
https://docs.python.org/dev/library/os.html#os.walk
"""
When topdown is True, the caller can modify the dirnames list in-place (perhaps 
using del or slice assignment), and walk() will only recurse into the 
subdirectories whose names remain in dirnames; this can be used to prune the 
search, impose a specific order of visiting, or even to inform walk() about 
directories the caller creates or renames before it resumes walk() again.
"""

os_walk_1.patch is inefficient: it also calls entry.is_symlink() for file 
entries.

I reworked your patch to only call is_symlink() for directories.

Thanks for the patch Ben. I think that we are now done with the PEP 471 no? 
Maybe some doc changes (I'm now reviewing your doc change in issue #22524).

--
resolution:  -> fixed
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23605] Use the new os.scandir() function in os.walk()

2015-03-10 Thread Roundup Robot

Roundup Robot added the comment:

New changeset f805fdacdfe0 by Victor Stinner in branch 'default':
Issue #23605: os.walk() now calls os.scandir() instead of os.listdir().
https://hg.python.org/cpython/rev/f805fdacdfe0

--
nosy: +python-dev

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23605] Use the new os.scandir() function in os.walk()

2015-03-09 Thread Scott Dial

Scott Dial added the comment:

I cloned https://github.com/benhoyt/scandir @ 494f34d784 and ran benchmark.py 
on a couple systems that are Linux backed by a couple different NFS servers of 
various quality.

First, a Linux VM backed by a Mac OS X NFS server backed by a SSD:

$ python benchmark.py
Using slower ctypes version of scandir
Comparing against builtin version of os.walk()
Priming the system's cache...
Benchmarking walks on benchtree, repeat 1/3...
Benchmarking walks on benchtree, repeat 2/3...
Benchmarking walks on benchtree, repeat 3/3...
os.walk took 0.088s, scandir.walk took 0.084s -- 1.1x as fast
$ python benchmark.py -s
Using slower ctypes version of scandir
Comparing against builtin version of os.walk()
Priming the system's cache...
Benchmarking walks on benchtree, repeat 1/3...
Benchmarking walks on benchtree, repeat 2/3...
Benchmarking walks on benchtree, repeat 3/3...
os.walk size 23400, scandir.walk size 23400 -- equal
os.walk took 0.142s, scandir.walk took 0.145s -- 1.0x as fast

Second, a Linux VM backed by a Linux NFS server backed by a NAS with big, slow 
drives:

$ python benchmark.py
Using slower ctypes version of scandir
Comparing against builtin version of os.walk()
Priming the system's cache...
Benchmarking walks on benchtree, repeat 1/3...
Benchmarking walks on benchtree, repeat 2/3...
Benchmarking walks on benchtree, repeat 3/3...
os.walk took 0.071s, scandir.walk took 0.063s -- 1.1x as fast
$ python benchmark.py -s
Using slower ctypes version of scandir
Comparing against builtin version of os.walk()
Priming the system's cache...
Benchmarking walks on benchtree, repeat 1/3...
Benchmarking walks on benchtree, repeat 2/3...
Benchmarking walks on benchtree, repeat 3/3...
os.walk size 23400, scandir.walk size 23400 -- equal
os.walk took 0.118s, scandir.walk took 0.141s -- 0.8x as fast

Finally, a linux VM backed by a Linux NFS server backed by a NAS with small, 
fast SAS drives:

$ python benchmark.py
Using slower ctypes version of scandir
Comparing against builtin version of os.walk()
Priming the system's cache...
Benchmarking walks on benchtree, repeat 1/3...
Benchmarking walks on benchtree, repeat 2/3...
Benchmarking walks on benchtree, repeat 3/3...
os.walk took 0.159s, scandir.walk took 0.119s -- 1.3x as fast
$ python benchmark.py -s
Using slower ctypes version of scandir
Comparing against builtin version of os.walk()
Priming the system's cache...
Benchmarking walks on benchtree, repeat 1/3...
Benchmarking walks on benchtree, repeat 2/3...
Benchmarking walks on benchtree, repeat 3/3...
os.walk size 23400, scandir.walk size 23400 -- equal
os.walk took 0.229s, scandir.walk took 0.232s -- 1.0x as fast

A major factor that is not addressed above is that the performance is 
dramatically different if the metadata cache for the NFS mount is disabled, 
which is not the default. In the above data, the first system is normally 
configured in such a manner in order to ensure that the filesystem is coherent. 
The results of that test is much more dramatic:

$ python benchmark.py
Using slower ctypes version of scandir
Comparing against builtin version of os.walk()
Priming the system's cache...
Benchmarking walks on benchtree, repeat 1/3...
Benchmarking walks on benchtree, repeat 2/3...
Benchmarking walks on benchtree, repeat 3/3...
os.walk took 4.835s, scandir.walk took 0.097s -- 49.9x as fast
$ python benchmark.py -s
Using slower ctypes version of scandir
Comparing against builtin version of os.walk()
Priming the system's cache...
Benchmarking walks on benchtree, repeat 1/3...
Benchmarking walks on benchtree, repeat 2/3...
Benchmarking walks on benchtree, repeat 3/3...
os.walk size 23400, scandir.walk size 23400 -- equal
os.walk took 9.945s, scandir.walk took 5.373s -- 1.9x as fast

--
nosy: +scott.dial

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23605] Use the new os.scandir() function in os.walk()

2015-03-09 Thread STINNER Victor

STINNER Victor added the comment:

Note: glob.glob() might be faster with os.scandir() on very large directories. 
But on my benchmarks, listdir() was always faster than scandir() when only the 
name of directory entries i used. Maybe we need an option glob.glob(pattern, 
scandir=True) :-p

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23605] Use the new os.scandir() function in os.walk()

2015-03-09 Thread STINNER Victor

STINNER Victor added the comment:

I reviewed your patch.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23605] Use the new os.scandir() function in os.walk()

2015-03-07 Thread Ben Hoyt

Ben Hoyt added the comment:

Attaching a first cut at this -- basically the implementation I use for walk() 
in scandir.py on GitHub.

One thing that's really weird to me: are the os.walk() unit tests actually 
being run? In test_os.py, I notice everything's in WalkTest.setUp, which is 
kinda weird -- and it's not actually running. At least when I add a "1/0" 
inside setUp() to force an error and run "python -m test test_os" I don't get 
any errors...

--
keywords: +patch
nosy: +benhoyt
Added file: http://bugs.python.org/file38385/os_walk_1.patch

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23605] Use the new os.scandir() function in os.walk()

2015-03-07 Thread STINNER Victor

New submission from STINNER Victor:

The PEP 471 announces a huge speed up of the os.walk() function, but os.walk() 
was not modified yet. I just merged the implementation of os.scandir() (issue 
#22524), it's now time to work on os.walk().

We need a patch and benchmarks on Linux and Windows.

On NFS shares (on Linux), d_type is not filled (set to DT_UNKNOWN, so a syscall 
is required to get the file type). Benchmarks of os.walk() on a NFS share is 
also required.

--
messages: 237504
nosy: haypo
priority: normal
severity: normal
status: open
title: Use the new os.scandir() function in os.walk()
type: performance
versions: Python 3.5

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com