Ben Hoyt added the comment:
Hi Victor. I'm attaching a patch to the What's New doc. I know it's a minor
thing, but I really read the What's New in Python x.y page whenever it comes
out, so making this as specific as possible is good. The changes are:
1. Be more specific and flesh it out
Roundup Robot added the comment:
New changeset 8d76dabd40f6 by Victor Stinner in branch 'default':
Issue #22524: Rephrase scandir addition in What's New in Python 3.5
https://hg.python.org/cpython/rev/8d76dabd40f6
--
___
Python tracker
Ben Hoyt added the comment:
Thanks. Will do!
--
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue22524
___
___
Python-bugs-list mailing list
STINNER Victor added the comment:
2015-03-10 12:36 GMT+01:00 Ben Hoyt rep...@bugs.python.org:
I intend to do some review of the scandir/DirEntry docs as well. I'll send
those in the next few days.
Open maybe a new issue if you want to enhance the doc.
--
STINNER Victor added the comment:
It looks like test_os now pass on all buildbots, including OpenIndiana. I close
the issue.
--
resolution: - fixed
status: open - closed
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue22524
STINNER Victor added the comment:
FYI os.scandir() is part of Python 3.5 alpha 2 which is now available
(including installers for Windows):
https://www.python.org/downloads/release/python-350a2/
--
___
Python tracker rep...@bugs.python.org
Ben Hoyt added the comment:
Oops, I'm sorry re previous comment -- looks like I forgot to attach
scandir-8.patch. Now attached. Please re-read my previous comment and review.
:-)
--
Added file: http://bugs.python.org/file38374/scandir-8.patch
___
Roundup Robot added the comment:
New changeset d04d5b9c29f6 by Victor Stinner in branch 'default':
Issue #22524: New os.scandir() function, part of the PEP 471: os.scandir()
https://hg.python.org/cpython/rev/d04d5b9c29f6
--
nosy: +python-dev
___
STINNER Victor added the comment:
KNOWN ISSUE: There's a reference leak in the POSIX version (found with
python -m test -R 3:2 test_os).
Don't worry, it was a simple refleak, I fixed it:
diff -r 392d3214fc23 Modules/posixmodule.c
--- a/Modules/posixmodule.c Sun Mar 08 01:58:04 2015
STINNER Victor added the comment:
I opened the issue #23605: Use the new os.scandir() function in os.walk().
--
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue22524
___
STINNER Victor added the comment:
Oh oh, OpenIndiana doesn't support d_type: the dirent structure has no d_type
field and DT_xxx constants like DT_UNKNOWN are not defined.
gcc -Wsign-compare -g -O0 -Wall -Wstrict-prototypes
-I/usr/local/include/ncursesw -m64
Roundup Robot added the comment:
New changeset 60e5c34ec53a by Victor Stinner in branch 'default':
Issue #22524: Fix os.scandir() for platforms which don't have a d_type field in
https://hg.python.org/cpython/rev/60e5c34ec53a
--
___
Python tracker
STINNER Victor added the comment:
I commited scandir-8.patch with the documentation of scandir-5.patch. I wanted
os.scandir() to be part of Python 3.5 alpha 2.
Thanks for your patience Ben ;-)
We should now watch buildbots to check how they appreciate os.scandir().
Ben Hoyt added the comment:
Updated version of the patch after Victor's code review of scandir-7 and a
couple of his comments offline. Full commit log is available on my GitHub
project in posixmodule_scandir_main.c if anyone's interested. Again, I haven't
looked closely at the docs or tests
Ben Hoyt added the comment:
Okay, here's the next version of the patch. I've updated scandir-2.patch with a
number of modifications and several fixes to the C code.
This includes Victor's scandir-6.patch test suite (with minor mods) and my
original documentation. Note that I haven't looked at
Ben Hoyt added the comment:
BTW, I replied to Victor privately, but for the thread: no worries, and apology
accepted! :-) I'm working on updating my C patch with his feedback, and will
update this issue hopefully in the next few days.
--
___
Python
STINNER Victor added the comment:
Note: bench_scandir2.py is a micro-benchmark. Ben's benchmark using walk() is
more realistic, but I'm interested by micro-benchmark results.
scandir-2.patch is faster than scandir-6.patch, much fast on Windows.
Result of bench (cached): scandir-6.patch =
STINNER Victor added the comment:
Hi Ben,
2015-02-13 4:51 GMT+01:00 Ben Hoyt rep...@bugs.python.org:
Hi Victor, I thank you for your efforts here, especially your addition of
DirEntry.inode() and your work on the tests.
The addition of inode() should still be discussed on python-dev. The
Changes by Alexei Romanov drednout...@gmail.com:
--
nosy: +alexei.romanov
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue22524
___
___
STINNER Victor added the comment:
Result of bench (cached): scandir-6.patch = scandir-2.patch
(...)
laptop using SSD and ext4: 1.3x faster = 2.8x faster
desktop PC using HDD and ext4: 1.4x faster = 1.4x faster
Oops, I copied the wrong numbers. scandir-2.patch is faster than that!
* laptop
Akira Li added the comment:
As I've mentioned in http://bugs.python.org/issue22524#msg231703
os.walk size 7925376343, scandir.walk size 5534939617 -- NOT EQUAL!
os.walk and scandir.walk do a different work here.
I don't see that it is acknowledged so I assume the benchmark is not fixed.
If
Ben Hoyt added the comment:
Akari: yes, I'm aware of this, and it's definitely a concern to me -- it may be
that scandir has a bug with counting the size of certain non-file directory
entries, or my implementation of os.walk() via scandir is not quite correct.
I'll look at it before too long!
STINNER Victor added the comment:
bench_scandir.py: dummy benchmark to compare listdir+stat vs scandir+is_dir.
os.scandir() is always slower than os.listdir() on tmpfs and ext4 partitions of
a local hard driver.
I will try with NFS.
Results with scandir-5.patch on Fedora 21 (Linux).
---
STINNER Victor added the comment:
scandir-3.patch: New implementation based on scandir-2.patch on Ben's github
repository.
Main changes with scandir-2.patch:
* new DirEntry.inode() method
* os.scandir() doesn't support bytes on Windows anymore: it's deprecated since
python 3.3 and using
STINNER Victor added the comment:
Similar benchmark result on my laptop which has a SSD (ext4 filesystem tool,
but I guess that the directory is small and fits into the memory).
Note: I'm not sure that the between ...x and ...x faster are revelant, I'm
not sure that my computation is correct.
STINNER Victor added the comment:
Benchmark on NFS. Client: my laptop, connected to the LAN by wifi. Server:
desktop, connected to the LAN by PLC. For an unknown reason, the creation of
files, symlinks and directories is very slow (more than 30 seconds while I
reduced the number of files
STINNER Victor added the comment:
I enhanced bench_scandir2.py to have one command to create a directory or a
different command to run the benchmark.
All commands:
- create: create the directory for tests (you don't need this command, you can
also use an existing directory)
- bench: compare
Changes by STINNER Victor victor.stin...@gmail.com:
Added file: http://bugs.python.org/file38120/bench_scandir2.py
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue22524
___
Ben Hoyt added the comment:
Hi Victor, I thank you for your efforts here, especially your addition of
DirEntry.inode() and your work on the tests.
However, I'm a bit frustrated that you just re-implemented the whole thing
without discussion: I've been behind scandir and written the first
Ben Hoyt added the comment:
To continue the actual which implementation discussion: as I mentioned last
week in http://bugs.python.org/msg235458, I think the benchmarks above show
pretty clearly we should use the all-C version.
For background: PEP 471 doesn't add any new functionality, and
STINNER Victor added the comment:
Updated patch. Thanks for the review Serhiy.
--
Added file: http://bugs.python.org/file38110/scandir-4.patch
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue22524
Changes by STINNER Victor victor.stin...@gmail.com:
Removed file: http://bugs.python.org/file38110/scandir-4.patch
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue22524
___
STINNER Victor added the comment:
(I regenerated scandir-4.patch, I had a local private changeset. I removed it.)
--
Added file: http://bugs.python.org/file38111/scandir-4.patch
___
Python tracker rep...@bugs.python.org
STINNER Victor added the comment:
Patch version 5:
- Use None value for the d_type instead of DT_UNKNOW to *prepare* support for
platforms where the dirent structure has no d_type field. Serhiy guess that
such platform have no DT_xxx constant neither. DirEntry doesn't DT_xxx
constants if
Ben Hoyt added the comment:
Victor, I'd love to push forward with this (I doubt it's going to make alpha 1
on February 8, but hopefully alpha 2 on March 8).
It seems pretty settled that we need to use the all-C version I've created,
especially for Linux.
Can you please review
Akira Li added the comment:
STINNER Victor added the comment:
scandir is slower on my machine:
Please share more information about your config: OS, disk type (hard
drive, SSD, something else), filesystem, etc.
Ubuntu 14.04, SSD, ext4 filesystem. Results for different python
versions are
Akira Li added the comment:
To see what happens at syscall level, I've run various implementations
of get_tree_size() functions (see get_tree_size_listdir.diff) with
strace:
get_tree_size_listdir_fd -- os.listdir(fd) + os.lstat
get_tree_size -- os.scandir(path) + entry.lstat
Changes by Akira Li 4kir4...@gmail.com:
Removed file: http://bugs.python.org/file37284/get_tree_size_listdir.diff
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue22524
___
Miki Tebeka added the comment:
Can we also update iglob [1] as well?
[1] https://docs.python.org/2/library/glob.html#glob.iglob
--
nosy: +tebeka
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue22524
Akira Li added the comment:
scandir is slower on my machine:
$ git clone https://github.com/benhoyt/scandir
$ cd scandir/
$ ../cpython/python benchmark.py /usr/
Using slower ctypes version of scandir
Comparing against builtin version of os.walk()
Priming the system's cache...
STINNER Victor added the comment:
scandir is slower on my machine:
Please share more information about your config: OS, disk type (hard
drive, SSD, something else), filesystem, etc.
--
___
Python tracker rep...@bugs.python.org
Ben Hoyt added the comment:
Akira, note the Using slower ctypes version of scandir -- this is the older,
ctypes implementation of scandir. On Linux, depending on file system etc, that
can often be slower. You need to at least be using the fast C version in
_scandir.c, which is then half C,
Ben Hoyt added the comment:
Attaching updated patch (scandir-2.patch) per Victor's code review here:
http://bugs.python.org/review/22524/diff/13005/Modules/posixmodule.c
Note that I haven't included test_scandir.py this time, as I haven't made any
changes there, and it's not really ready for
STINNER Victor added the comment:
I cloned https://github.com/benhoyt/scandir. I understand that the --scandir
command line option of benchmark.py are these choices:
- generic = call listdir() and then use yield GenericDirEntry which caches
os.stat() and os.lstat() results
- python = ctypes
STINNER Victor added the comment:
I also ran the benchmark without cache on disk. It looks like my hard drive is
too slow to see a real speedup of scandir(). The maximum speedup is 2.7 seconds
lesser when testing the c scandir (24.089 = 21.374 seconds): 1.1x as fast.
I modified the benchmark
STINNER Victor added the comment:
My previous benchmark runs were with the system Python 3.3.
New benchmark with a Python 3.5 patched with scandir-1.patch (compiled in
release mode: ./configure make).
os (os.scandir) is 2 times faster than c (_scandir.scandir_helper): c=0.533
sec, os=0.268
STINNER Victor added the comment:
On Windows, I guess that benchmark.py --size is faster with scandir() than
with os.walk(), because os.stat() is never called.
benchmark.py has a bug in do_os_walk() when the --size option is used: attached
do_os_walk_getsize.patch is needed.
Sizes returned
Ben Hoyt added the comment:
Thanks, Victor and Antone. I'm somewhat surprised at the 2-3x numbers you're
seeing, as I was consistently getting 4-5x in the Linux tests I did. But it
does depend quite a bit on what file system you're running, what hardware,
whether you're running in a VM, etc.
Antoine Pitrou added the comment:
Le 09/10/2014 14:35, Ben Hoyt a écrit :
Anyway, where to from here? Are we agreed given the numbers that --
especially on Linux -- it makes good performance sense to use an all-C
approach?
I think so, yes.
--
___
STINNER Victor added the comment:
I'm somewhat surprised at the 2-3x numbers you're seeing, as I was
consistently getting 4-5x in the Linux tests I did. But it does depend quite
a bit on what file system you're running, what hardware, whether you're
running in a VM, etc. Still, 2-3x
Ben Hoyt added the comment:
I dunno, I'd be happy if you implement this, but it does mean *more* C code,
not less. :-) I feel this would be a nice-to-have, but we should get the thing
working first, and then do the multiple-OS-calls-in-one optimization.
In any case, if you implement this, I
Changes by Josh Rosenberg shadowranger+pyt...@gmail.com:
--
nosy: +josh.rosenberg
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue22524
___
___
Antoine Pitrou added the comment:
I agree that having a high-level wrapper around a low-level C primitive would
be cool, but someone has to experiment on that to find out how much performance
it would cost.
You may want to have the C primitive return results in batches (of e.g. 10 or
100
STINNER Victor added the comment:
You may want to have the C primitive return results in batches (of e.g. 10 or
100 entries) to limit the overhead, btw.
Ah yes, good idea. I read that internally readdir() also fetchs many
entries in a single syscall (prefetch / readahead, I don't know how
to
Ben Hoyt added the comment:
Thanks for the initial response and code review, Victor. I'll take a look and
respond further in the next few days.
In the meantime, however, I'm definitely open to splitting scandir out into its
own C file. This will mean a little refactoring (making some
Ben Hoyt added the comment:
Here are the actual numbers (instead of just from memory) running on my Windows
laptop, which happens to have an SSD drive, so os.walk() with scandir is
particularly good here.
python 3.4: benchmark.py -c python (all implemented in Python using ctypes):
os.walk
Antoine Pitrou added the comment:
Ben, how can I run the benchmark on my own machine?
Thanks!
--
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue22524
___
Antoine Pitrou added the comment:
Ok, I've found it. The numbers don't look that good here (Linux, SSD):
$ ~/cpython/opt/python benchmark.py -c python /usr/share/
[...]
os.walk took 1.266s, scandir.walk took 1.852s -- 0.7x as fast
$ ~/cpython/opt/python benchmark.py -c c /usr/share/
[...]
STINNER Victor added the comment:
Hi Ben,
I started to review your patch, sorry for the delay :-( It's convinient to have
the patch on Rietveld to comment it inline.
I didn't expect so much C code. The posixmodule.c is really huge. Well, it's
just the longest C file of Python 3.5. Your patch
Ben Hoyt added the comment:
Attaching my first patch here. It includes docs, the implementation of scandir
and DirEntry in posixmodule.c, and tests in test_scandir.py. It does not yet
include the changes for os.walk() to use scandir; I'm hoping to do that in a
separate patch for easier code
Changes by Akira Li 4kir4...@gmail.com:
--
nosy: +akira
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue22524
___
___
Python-bugs-list mailing list
New submission from Ben Hoyt:
Opening this to track the implementation of PEP 471: os.scandir() [1]. This
supercedes Issue #11406 (and possibly others)?
The implementation is most of the way there, but not yet done as a CPythono 3.5
patch. Before I have a proper patch, it's available on
Changes by Giampaolo Rodola' g.rod...@gmail.com:
--
nosy: +giampaolo.rodola
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue22524
___
___
Changes by Evgeny Kapun abacabadabac...@gmail.com:
--
nosy: +abacabadabacaba
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue22524
___
___
64 matches
Mail list logo