[issue28564] shutil.rmtree is inefficient due to listdir() instead of scandir()

2017-12-29 Thread Niklas Hambüchen

Niklas Hambüchen  added the comment:

I've filed https://bugs.python.org/issue32453, which is about O(n^2) deletion 
behaviour for large directories.

--
nosy: +nh2

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue28564] shutil.rmtree is inefficient due to listdir() instead of scandir()

2017-11-04 Thread Serhiy Storchaka

Change by Serhiy Storchaka :


--
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed




[issue28564] shutil.rmtree is inefficient due to listdir() instead of scandir()

2017-11-04 Thread Serhiy Storchaka

Serhiy Storchaka  added the comment:


New changeset d4d79bc1ff91b04625c312f0219c89aabcd19ce4 by Serhiy Storchaka in 
branch 'master':
bpo-28564: Use os.scandir() in shutil.rmtree(). (#4085)
https://github.com/python/cpython/commit/d4d79bc1ff91b04625c312f0219c89aabcd19ce4


--




[issue28564] shutil.rmtree is inefficient due to listdir() instead of scandir()

2017-10-25 Thread Serhiy Storchaka

Serhiy Storchaka  added the comment:

Following Antoine's suggestion, the patch now makes shutil.rmtree() use 
os.scandir() on all platforms.

I have doubts about one thing. This patch changes the os.listdir passed to the 
onerror handler to os.scandir. This can break user code that checks whether the 
first argument in onerror is os.listdir. If we keep this change, it should be 
documented in the "Porting to 3.7" section. Alternatively, we can continue passing 
os.listdir if os.scandir() fails, even though os.listdir is no longer 
used.
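An onerror handler that has to work on both sides of this change can simply accept either function. The following is a minimal sketch that assumes the behaviour described above, where a failed directory listing is reported as os.scandir instead of os.listdir. The names `is_listing_failure` and `make_recording_handler` are hypothetical. In Python 3.12 and later, `onerror` is deprecated in favour of `onexc`.

```python
import os

# Functions rmtree may report when listing a directory fails:
# os.listdir before this change, os.scandir after it.
LISTING_FUNCS = (os.listdir, os.scandir)

def is_listing_failure(func):
    """Return True if an onerror callback was invoked for a failed directory listing."""
    return func in LISTING_FUNCS

def make_recording_handler(log):
    """Build a hypothetical onerror callback that records failures instead of raising."""
    def handler(func, path, exc_info):
        kind = "list" if is_listing_failure(func) else func.__name__
        log.append((kind, path, exc_info[1]))
    return handler
```

It would be passed as `shutil.rmtree(path, onerror=make_recording_handler(errors))`. Because the check compares against a tuple and not against os.listdir alone, the handler does not care which interpreter version produced the callback.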

--
nosy: +pitrou




[issue28564] shutil.rmtree is inefficient due to listdir() instead of scandir()

2017-10-25 Thread Antoine Pitrou

Antoine Pitrou  added the comment:

I think we should change to os.scandir.  No need to accumulate compatibility 
baggage like that.

--




[issue28564] shutil.rmtree is inefficient due to listdir() instead of scandir()

2017-10-23 Thread Serhiy Storchaka

Change by Serhiy Storchaka :


--
pull_requests: +4055




[issue28564] shutil.rmtree is inefficient due to listdir() instead of scandir()

2016-11-24 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Benchmarks show about a 20% speedup.
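The attached bench_rmtree.py is not reproduced here. A rough harness for timing shutil.rmtree on a freshly generated tree could look like the sketch below. The tree shape (`width`, `depth`, `files_per_dir`) and the helper names are hypothetical, and absolute numbers depend heavily on the filesystem, the storage device and OS caches.

```python
import os
import shutil
import tempfile
import time

def make_tree(root, width=5, depth=3, files_per_dir=20):
    """Create a hypothetical test tree: `width` subdirectories per level, `depth` levels below root."""
    for i in range(files_per_dir):
        with open(os.path.join(root, f"f{i}"), "wb"):
            pass
    if depth > 0:
        for j in range(width):
            sub = os.path.join(root, f"d{j}")
            os.mkdir(sub)
            make_tree(sub, width, depth - 1, files_per_dir)

def time_rmtree(repeat=3, **tree_kwargs):
    """Return the best wall-clock time in seconds of shutil.rmtree over `repeat` runs."""
    best = float("inf")
    for _ in range(repeat):
        base = tempfile.mkdtemp()
        target = os.path.join(base, "tree")
        os.mkdir(target)
        make_tree(target, **tree_kwargs)
        start = time.perf_counter()
        shutil.rmtree(target)            # only the deletion is timed
        best = min(best, time.perf_counter() - start)
        os.rmdir(base)                   # base is empty again after rmtree
    return best
```

Comparing before and after the change means running the same harness under two interpreter builds, since both variants cannot coexist in one shutil module.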

--
Added file: http://bugs.python.org/file45619/bench_rmtree.py




[issue28564] shutil.rmtree is inefficient due to listdir() instead of scandir()

2016-11-07 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Proposed patch implements shutil.rmtree using os.scandir. It needs file 
descriptor support in os.scandir (issue25996). I did not test how this affects 
the performance of shutil.rmtree.
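The file descriptor dependency matters because shutil's symlink-attack-resistant code path works relative to open directory descriptors. The sketch below illustrates that idea; it is not the patch. It assumes a POSIX system where os.scandir accepts a descriptor (Python 3.7+) and os.open, os.unlink and os.rmdir accept `dir_fd`. It omits the inode/device checks and error callbacks a production version needs. `rmtree_fd` and `_rmtree_contents` are hypothetical names.

```python
import os

# Flags for opening directories; O_DIRECTORY/O_NOFOLLOW may be missing on some platforms.
_DIR_FLAGS = os.O_RDONLY | getattr(os, "O_DIRECTORY", 0) | getattr(os, "O_NOFOLLOW", 0)

def _rmtree_contents(dirfd):
    """Delete everything inside the directory open as `dirfd` (hypothetical helper)."""
    # os.scandir(fd) relies on the descriptor support from issue25996.
    with os.scandir(dirfd) as it:
        entries = list(it)               # snapshot before mutating the directory
    for entry in entries:
        # When scanning a descriptor, entry.name is relative to dirfd.
        if entry.is_dir(follow_symlinks=False):
            # O_NOFOLLOW makes this open fail if the entry was swapped for a symlink.
            childfd = os.open(entry.name, _DIR_FLAGS, dir_fd=dirfd)
            try:
                _rmtree_contents(childfd)
            finally:
                os.close(childfd)
            os.rmdir(entry.name, dir_fd=dirfd)    # unlinkat() with AT_REMOVEDIR
        else:
            os.unlink(entry.name, dir_fd=dirfd)   # unlinkat() for files and symlinks

def rmtree_fd(path):
    """Simplified fd-based rmtree; a symlink passed as `path` raises OSError."""
    topfd = os.open(path, _DIR_FLAGS)
    try:
        _rmtree_contents(topfd)
    finally:
        os.close(topfd)
    os.rmdir(path)
```

Symlinks inside the tree are unlinked, never followed, so anything they point to survives.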

--
keywords: +patch
stage:  -> patch review
Added file: http://bugs.python.org/file45382/shutil-rmtree-scandir.patch




[issue28564] shutil.rmtree is inefficient due to listdir() instead of scandir()

2016-10-31 Thread Serhiy Storchaka

Changes by Serhiy Storchaka :


--
dependencies: +Add support of file descriptor in os.scandir()




[issue28564] shutil.rmtree is inefficient due to listdir() instead of scandir()

2016-10-31 Thread Marian Beermann

Marian Beermann added the comment:

The main issue on *nix is more likely that by using listdir you get directory 
order, while what you really need is inode ordering. scandir allows for that, 
since you get the inode from the DirEntry with no extra syscalls - especially 
without an open() or stat().

Other optimizations are also possible. For example, opening the directory and 
using unlinkat() would likely shave off a bit of CPU. But the dominating factor 
here is likely the bad access pattern.
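The inode-ordering idea can be expressed with DirEntry.inode(). On POSIX this value is read from the directory entry itself, so no extra stat() is needed, although on Windows it may require one. This is a sketch under that assumption and not what shutil does. Whether it helps depends on the filesystem and device. `unlink_files_in_inode_order` is a hypothetical name, and subdirectories are left alone to keep the example short.

```python
import os

def unlink_files_in_inode_order(path):
    """Remove the non-directory entries of `path`, visiting them in inode order.

    Returns the names removed, in the order they were unlinked.
    """
    with os.scandir(path) as it:
        # Sorting drains the iterator, so the listing is complete before any unlink.
        entries = sorted(it, key=lambda e: e.inode())
    removed = []
    for entry in entries:
        if not entry.is_dir(follow_symlinks=False):
            os.unlink(entry.path)
            removed.append(entry.name)
    return removed
```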

--




[issue28564] shutil.rmtree is inefficient due to listdir() instead of scandir()

2016-10-31 Thread Josh Rosenberg

Josh Rosenberg added the comment:

You need to cache the names up front because the loop is unlinking entries, and 
readdir isn't consistent when the directory entries are mutated between calls. 
https://github.com/kripken/emscripten/issues/2528

FindFirstFile/FindNextFile likely has similar issues, even if they're not 
consistently seen (due to DeleteFile itself not guaranteeing deletion until the 
last handle to the file is closed).

scandir might save some stat calls, but you'd need to convert it from generator 
to list before the loop begins, which would limit the savings a bit.
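Here is a path-based sketch of the pattern described above. The scandir() iterator is drained into a list before anything is unlinked, and DirEntry.is_dir(follow_symlinks=False) stands in for a separate lstat() per entry wherever the listing already reports the entry type. The name `remove_tree` is hypothetical, and error handling and symlink-race protection are omitted.

```python
import os

def remove_tree(path):
    """Hypothetical path-based recursive delete using a snapshot of os.scandir()."""
    # Drain the lazy iterator first: POSIX leaves readdir() results unspecified
    # for entries added or removed after the directory was opened.
    with os.scandir(path) as it:
        entries = list(it)
    for entry in entries:
        # Uses the type reported by the listing (e.g. d_type) when available,
        # so usually no lstat() call is made.
        if entry.is_dir(follow_symlinks=False):
            remove_tree(entry.path)
        else:
            os.unlink(entry.path)        # files and symlinks, including links to dirs
    os.rmdir(path)
```

The list costs memory proportional to the directory size, which is the trade-off mentioned above.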

--
nosy: +josh.r




[issue28564] shutil.rmtree is inefficient due to listdir() instead of scandir()

2016-10-31 Thread Xiang Zhang

Changes by Xiang Zhang :


--
nosy: +xiang.zhang




[issue28564] shutil.rmtree is inefficient due to listdir() instead of scandir()

2016-10-30 Thread Serhiy Storchaka

Changes by Serhiy Storchaka :


--
nosy: +serhiy.storchaka
type:  -> enhancement
versions: +Python 3.7 -Python 3.5




[issue28564] shutil.rmtree is inefficient due to listdir() instead of scandir()

2016-10-30 Thread Marian Beermann

New submission from Marian Beermann:

The use of os.listdir severely limits the speed of this function on
anything except solid-state drives.

Using os.scandir, new in Python 3.5, should eliminate this
bottleneck.
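The cost the report points at is the per-entry stat call. The sketch below contrasts the two listing styles. The helper names are hypothetical, and both functions classify entries without following symlinks.

```python
import os
import stat

def classify_with_listdir(path):
    """listdir() style: names only, so each entry needs its own lstat() syscall."""
    result = {}
    for name in os.listdir(path):
        st = os.lstat(os.path.join(path, name))
        result[name] = stat.S_ISDIR(st.st_mode)
    return result

def classify_with_scandir(path):
    """scandir() style: the entry type usually comes back with the listing itself."""
    with os.scandir(path) as it:
        return {entry.name: entry.is_dir(follow_symlinks=False) for entry in it}
```

Both return the same mapping. The scandir() version only falls back to a stat call when the platform does not report the entry type.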

--
components: Library (Lib)
messages: 279745
nosy: enkore
priority: normal
severity: normal
status: open
title: shutil.rmtree is inefficient due to listdir() instead of scandir()
versions: Python 3.5
