[issue22524] PEP 471 implementation: os.scandir() directory scanning function

2015-03-10 Thread Ben Hoyt
Ben Hoyt added the comment: Hi Victor. I'm attaching a patch to the What's New doc. I know it's a minor thing, but I really read the What's New in Python x.y page whenever it comes out, so making this as specific as possible is good. The changes are: 1. Be more specific and flesh it out

[issue22524] PEP 471 implementation: os.scandir() directory scanning function

2015-03-10 Thread Roundup Robot
Roundup Robot added the comment: New changeset 8d76dabd40f6 by Victor Stinner in branch 'default': Issue #22524: Rephrase scandir addition in What's New in Python 3.5 https://hg.python.org/cpython/rev/8d76dabd40f6 -- ___ Python tracker

[issue22524] PEP 471 implementation: os.scandir() directory scanning function

2015-03-10 Thread Ben Hoyt
Ben Hoyt added the comment: Thanks. Will do! -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue22524 ___ ___ Python-bugs-list mailing list

[issue22524] PEP 471 implementation: os.scandir() directory scanning function

2015-03-10 Thread STINNER Victor
STINNER Victor added the comment: 2015-03-10 12:36 GMT+01:00 Ben Hoyt rep...@bugs.python.org: I intend to do some review of the scandir/DirEntry docs as well. I'll send those in the next few days. Open maybe a new issue if you want to enhance the doc. --

[issue22524] PEP 471 implementation: os.scandir() directory scanning function

2015-03-09 Thread STINNER Victor
STINNER Victor added the comment: It looks like test_os now pass on all buildbots, including OpenIndiana. I close the issue. -- resolution: - fixed status: open - closed ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue22524

[issue22524] PEP 471 implementation: os.scandir() directory scanning function

2015-03-09 Thread STINNER Victor
STINNER Victor added the comment: FYI os.scandir() is part of Python 3.5 alpha 2 which is now available (including installers for Windows): https://www.python.org/downloads/release/python-350a2/ -- ___ Python tracker rep...@bugs.python.org

[issue22524] PEP 471 implementation: os.scandir() directory scanning function

2015-03-07 Thread Ben Hoyt
Ben Hoyt added the comment: Oops, I'm sorry re previous comment -- looks like I forgot to attach scandir-8.patch. Now attached. Please re-read my previous comment and review. :-) -- Added file: http://bugs.python.org/file38374/scandir-8.patch ___

[issue22524] PEP 471 implementation: os.scandir() directory scanning function

2015-03-07 Thread Roundup Robot
Roundup Robot added the comment: New changeset d04d5b9c29f6 by Victor Stinner in branch 'default': Issue #22524: New os.scandir() function, part of the PEP 471: os.scandir() https://hg.python.org/cpython/rev/d04d5b9c29f6 -- nosy: +python-dev ___

[issue22524] PEP 471 implementation: os.scandir() directory scanning function

2015-03-07 Thread STINNER Victor
STINNER Victor added the comment: KNOWN ISSUE: There's a reference leak in the POSIX version (found with python -m test -R 3:2 test_os). Don't worry, it was a simple refleak, I fixed it: diff -r 392d3214fc23 Modules/posixmodule.c --- a/Modules/posixmodule.c Sun Mar 08 01:58:04 2015

[issue22524] PEP 471 implementation: os.scandir() directory scanning function

2015-03-07 Thread STINNER Victor
STINNER Victor added the comment: I opened the issue #23605: Use the new os.scandir() function in os.walk(). -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue22524 ___

[issue22524] PEP 471 implementation: os.scandir() directory scanning function

2015-03-07 Thread STINNER Victor
STINNER Victor added the comment: Oh oh, OpenIndiana doesn't support d_type: the dirent structure has no d_type field and DT_xxx constants like DT_UNKNOWN are not defined. gcc -Wsign-compare -g -O0 -Wall -Wstrict-prototypes -I/usr/local/include/ncursesw -m64

[issue22524] PEP 471 implementation: os.scandir() directory scanning function

2015-03-07 Thread Roundup Robot
Roundup Robot added the comment: New changeset 60e5c34ec53a by Victor Stinner in branch 'default': Issue #22524: Fix os.scandir() for platforms which don't have a d_type field in https://hg.python.org/cpython/rev/60e5c34ec53a -- ___ Python tracker

[issue22524] PEP 471 implementation: os.scandir() directory scanning function

2015-03-07 Thread STINNER Victor
STINNER Victor added the comment: I commited scandir-8.patch with the documentation of scandir-5.patch. I wanted os.scandir() to be part of Python 3.5 alpha 2. Thanks for your patience Ben ;-) We should now watch buildbots to check how they appreciate os.scandir().

[issue22524] PEP 471 implementation: os.scandir() directory scanning function

2015-03-05 Thread Ben Hoyt
Ben Hoyt added the comment: Updated version of the patch after Victor's code review of scandir-7 and a couple of his comments offline. Full commit log is available on my GitHub project in posixmodule_scandir_main.c if anyone's interested. Again, I haven't looked closely at the docs or tests

[issue22524] PEP 471 implementation: os.scandir() directory scanning function

2015-02-27 Thread Ben Hoyt
Ben Hoyt added the comment: Okay, here's the next version of the patch. I've updated scandir-2.patch with a number of modifications and several fixes to the C code. This includes Victor's scandir-6.patch test suite (with minor mods) and my original documentation. Note that I haven't looked at

[issue22524] PEP 471 implementation: os.scandir() directory scanning function

2015-02-18 Thread Ben Hoyt
Ben Hoyt added the comment: BTW, I replied to Victor privately, but for the thread: no worries, and apology accepted! :-) I'm working on updating my C patch with his feedback, and will update this issue hopefully in the next few days. -- ___ Python

[issue22524] PEP 471 implementation: os.scandir() directory scanning function

2015-02-13 Thread STINNER Victor
STINNER Victor added the comment: Note: bench_scandir2.py is a micro-benchmark. Ben's benchmark using walk() is more realistic, but I'm interested by micro-benchmark results. scandir-2.patch is faster than scandir-6.patch, much fast on Windows. Result of bench (cached): scandir-6.patch =

[issue22524] PEP 471 implementation: os.scandir() directory scanning function

2015-02-13 Thread STINNER Victor
STINNER Victor added the comment: Hi Ben, 2015-02-13 4:51 GMT+01:00 Ben Hoyt rep...@bugs.python.org: Hi Victor, I thank you for your efforts here, especially your addition of DirEntry.inode() and your work on the tests. The addition of inode() should still be discussed on python-dev. The

[issue22524] PEP 471 implementation: os.scandir() directory scanning function

2015-02-13 Thread Alexei Romanov
Changes by Alexei Romanov drednout...@gmail.com: -- nosy: +alexei.romanov ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue22524 ___ ___

[issue22524] PEP 471 implementation: os.scandir() directory scanning function

2015-02-13 Thread STINNER Victor
STINNER Victor added the comment: Result of bench (cached): scandir-6.patch = scandir-2.patch (...) laptop using SSD and ext4: 1.3x faster = 2.8x faster desktop PC using HDD and ext4: 1.4x faster = 1.4x faster Oops, I copied the wrong numbers. scandir-2.patch is faster than that! * laptop

[issue22524] PEP 471 implementation: os.scandir() directory scanning function

2015-02-13 Thread Akira Li
Akira Li added the comment: As I've mentioned in http://bugs.python.org/issue22524#msg231703 os.walk size 7925376343, scandir.walk size 5534939617 -- NOT EQUAL! os.walk and scandir.walk do a different work here. I don't see that it is acknowledged so I assume the benchmark is not fixed. If

[issue22524] PEP 471 implementation: os.scandir() directory scanning function

2015-02-13 Thread Ben Hoyt
Ben Hoyt added the comment: Akari: yes, I'm aware of this, and it's definitely a concern to me -- it may be that scandir has a bug with counting the size of certain non-file directory entries, or my implementation of os.walk() via scandir is not quite correct. I'll look at it before too long!

[issue22524] PEP 471 implementation: os.scandir() directory scanning function

2015-02-12 Thread STINNER Victor
STINNER Victor added the comment: bench_scandir.py: dummy benchmark to compare listdir+stat vs scandir+is_dir. os.scandir() is always slower than os.listdir() on tmpfs and ext4 partitions of a local hard driver. I will try with NFS. Results with scandir-5.patch on Fedora 21 (Linux). ---

[issue22524] PEP 471 implementation: os.scandir() directory scanning function

2015-02-12 Thread STINNER Victor
STINNER Victor added the comment: scandir-3.patch: New implementation based on scandir-2.patch on Ben's github repository. Main changes with scandir-2.patch: * new DirEntry.inode() method * os.scandir() doesn't support bytes on Windows anymore: it's deprecated since python 3.3 and using

[issue22524] PEP 471 implementation: os.scandir() directory scanning function

2015-02-12 Thread STINNER Victor
STINNER Victor added the comment: Similar benchmark result on my laptop which has a SSD (ext4 filesystem tool, but I guess that the directory is small and fits into the memory). Note: I'm not sure that the between ...x and ...x faster are revelant, I'm not sure that my computation is correct.

[issue22524] PEP 471 implementation: os.scandir() directory scanning function

2015-02-12 Thread STINNER Victor
STINNER Victor added the comment: Benchmark on NFS. Client: my laptop, connected to the LAN by wifi. Server: desktop, connected to the LAN by PLC. For an unknown reason, the creation of files, symlinks and directories is very slow (more than 30 seconds while I reduced the number of files

[issue22524] PEP 471 implementation: os.scandir() directory scanning function

2015-02-12 Thread STINNER Victor
STINNER Victor added the comment: I enhanced bench_scandir2.py to have one command to create a directory or a different command to run the benchmark. All commands: - create: create the directory for tests (you don't need this command, you can also use an existing directory) - bench: compare

[issue22524] PEP 471 implementation: os.scandir() directory scanning function

2015-02-12 Thread STINNER Victor
Changes by STINNER Victor victor.stin...@gmail.com: Added file: http://bugs.python.org/file38120/bench_scandir2.py ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue22524 ___

[issue22524] PEP 471 implementation: os.scandir() directory scanning function

2015-02-12 Thread Ben Hoyt
Ben Hoyt added the comment: Hi Victor, I thank you for your efforts here, especially your addition of DirEntry.inode() and your work on the tests. However, I'm a bit frustrated that you just re-implemented the whole thing without discussion: I've been behind scandir and written the first

[issue22524] PEP 471 implementation: os.scandir() directory scanning function

2015-02-12 Thread Ben Hoyt
Ben Hoyt added the comment: To continue the actual which implementation discussion: as I mentioned last week in http://bugs.python.org/msg235458, I think the benchmarks above show pretty clearly we should use the all-C version. For background: PEP 471 doesn't add any new functionality, and

[issue22524] PEP 471 implementation: os.scandir() directory scanning function

2015-02-12 Thread STINNER Victor
STINNER Victor added the comment: Updated patch. Thanks for the review Serhiy. -- Added file: http://bugs.python.org/file38110/scandir-4.patch ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue22524

[issue22524] PEP 471 implementation: os.scandir() directory scanning function

2015-02-12 Thread STINNER Victor
Changes by STINNER Victor victor.stin...@gmail.com: Removed file: http://bugs.python.org/file38110/scandir-4.patch ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue22524 ___

[issue22524] PEP 471 implementation: os.scandir() directory scanning function

2015-02-12 Thread STINNER Victor
STINNER Victor added the comment: (I regenerated scandir-4.patch, I had a local private changeset. I removed it.) -- Added file: http://bugs.python.org/file38111/scandir-4.patch ___ Python tracker rep...@bugs.python.org

[issue22524] PEP 471 implementation: os.scandir() directory scanning function

2015-02-12 Thread STINNER Victor
STINNER Victor added the comment: Patch version 5: - Use None value for the d_type instead of DT_UNKNOW to *prepare* support for platforms where the dirent structure has no d_type field. Serhiy guess that such platform have no DT_xxx constant neither. DirEntry doesn't DT_xxx constants if

[issue22524] PEP 471 implementation: os.scandir() directory scanning function

2015-02-05 Thread Ben Hoyt
Ben Hoyt added the comment: Victor, I'd love to push forward with this (I doubt it's going to make alpha 1 on February 8, but hopefully alpha 2 on March 8). It seems pretty settled that we need to use the all-C version I've created, especially for Linux. Can you please review

[issue22524] PEP 471 implementation: os.scandir() directory scanning function

2014-11-26 Thread Akira Li
Akira Li added the comment: STINNER Victor added the comment: scandir is slower on my machine: Please share more information about your config: OS, disk type (hard drive, SSD, something else), filesystem, etc. Ubuntu 14.04, SSD, ext4 filesystem. Results for different python versions are

[issue22524] PEP 471 implementation: os.scandir() directory scanning function

2014-11-26 Thread Akira Li
Akira Li added the comment: To see what happens at syscall level, I've run various implementations of get_tree_size() functions (see get_tree_size_listdir.diff) with strace: get_tree_size_listdir_fd -- os.listdir(fd) + os.lstat get_tree_size -- os.scandir(path) + entry.lstat

[issue22524] PEP 471 implementation: os.scandir() directory scanning function

2014-11-26 Thread Akira Li
Changes by Akira Li 4kir4...@gmail.com: Removed file: http://bugs.python.org/file37284/get_tree_size_listdir.diff ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue22524 ___

[issue22524] PEP 471 implementation: os.scandir() directory scanning function

2014-11-25 Thread Miki Tebeka
Miki Tebeka added the comment: Can we also update iglob [1] as well? [1] https://docs.python.org/2/library/glob.html#glob.iglob -- nosy: +tebeka ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue22524

[issue22524] PEP 471 implementation: os.scandir() directory scanning function

2014-11-25 Thread Akira Li
Akira Li added the comment: scandir is slower on my machine: $ git clone https://github.com/benhoyt/scandir $ cd scandir/ $ ../cpython/python benchmark.py /usr/ Using slower ctypes version of scandir Comparing against builtin version of os.walk() Priming the system's cache...

[issue22524] PEP 471 implementation: os.scandir() directory scanning function

2014-11-25 Thread STINNER Victor
STINNER Victor added the comment: scandir is slower on my machine: Please share more information about your config: OS, disk type (hard drive, SSD, something else), filesystem, etc. -- ___ Python tracker rep...@bugs.python.org

[issue22524] PEP 471 implementation: os.scandir() directory scanning function

2014-11-25 Thread Ben Hoyt
Ben Hoyt added the comment: Akira, note the Using slower ctypes version of scandir -- this is the older, ctypes implementation of scandir. On Linux, depending on file system etc, that can often be slower. You need to at least be using the fast C version in _scandir.c, which is then half C,

[issue22524] PEP 471 implementation: os.scandir() directory scanning function

2014-10-18 Thread Ben Hoyt
Ben Hoyt added the comment: Attaching updated patch (scandir-2.patch) per Victor's code review here: http://bugs.python.org/review/22524/diff/13005/Modules/posixmodule.c Note that I haven't included test_scandir.py this time, as I haven't made any changes there, and it's not really ready for

[issue22524] PEP 471 implementation: os.scandir() directory scanning function

2014-10-09 Thread STINNER Victor
STINNER Victor added the comment: I cloned https://github.com/benhoyt/scandir. I understand that the --scandir command line option of benchmark.py are these choices: - generic = call listdir() and then use yield GenericDirEntry which caches os.stat() and os.lstat() results - python = ctypes

[issue22524] PEP 471 implementation: os.scandir() directory scanning function

2014-10-09 Thread STINNER Victor
STINNER Victor added the comment: I also ran the benchmark without cache on disk. It looks like my hard drive is too slow to see a real speedup of scandir(). The maximum speedup is 2.7 seconds lesser when testing the c scandir (24.089 = 21.374 seconds): 1.1x as fast. I modified the benchmark

[issue22524] PEP 471 implementation: os.scandir() directory scanning function

2014-10-09 Thread STINNER Victor
STINNER Victor added the comment: My previous benchmark runs were with the system Python 3.3. New benchmark with a Python 3.5 patched with scandir-1.patch (compiled in release mode: ./configure make). os (os.scandir) is 2 times faster than c (_scandir.scandir_helper): c=0.533 sec, os=0.268

[issue22524] PEP 471 implementation: os.scandir() directory scanning function

2014-10-09 Thread STINNER Victor
STINNER Victor added the comment: On Windows, I guess that benchmark.py --size is faster with scandir() than with os.walk(), because os.stat() is never called. benchmark.py has a bug in do_os_walk() when the --size option is used: attached do_os_walk_getsize.patch is needed. Sizes returned

[issue22524] PEP 471 implementation: os.scandir() directory scanning function

2014-10-09 Thread Ben Hoyt
Ben Hoyt added the comment: Thanks, Victor and Antone. I'm somewhat surprised at the 2-3x numbers you're seeing, as I was consistently getting 4-5x in the Linux tests I did. But it does depend quite a bit on what file system you're running, what hardware, whether you're running in a VM, etc.

[issue22524] PEP 471 implementation: os.scandir() directory scanning function

2014-10-09 Thread Antoine Pitrou
Antoine Pitrou added the comment: Le 09/10/2014 14:35, Ben Hoyt a écrit : Anyway, where to from here? Are we agreed given the numbers that -- especially on Linux -- it makes good performance sense to use an all-C approach? I think so, yes. -- ___

[issue22524] PEP 471 implementation: os.scandir() directory scanning function

2014-10-09 Thread STINNER Victor
STINNER Victor added the comment: I'm somewhat surprised at the 2-3x numbers you're seeing, as I was consistently getting 4-5x in the Linux tests I did. But it does depend quite a bit on what file system you're running, what hardware, whether you're running in a VM, etc. Still, 2-3x

[issue22524] PEP 471 implementation: os.scandir() directory scanning function

2014-10-09 Thread Ben Hoyt
Ben Hoyt added the comment: I dunno, I'd be happy if you implement this, but it does mean *more* C code, not less. :-) I feel this would be a nice-to-have, but we should get the thing working first, and then do the multiple-OS-calls-in-one optimization. In any case, if you implement this, I

[issue22524] PEP 471 implementation: os.scandir() directory scanning function

2014-10-09 Thread Josh Rosenberg
Changes by Josh Rosenberg shadowranger+pyt...@gmail.com: -- nosy: +josh.rosenberg ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue22524 ___ ___

[issue22524] PEP 471 implementation: os.scandir() directory scanning function

2014-10-08 Thread Antoine Pitrou
Antoine Pitrou added the comment: I agree that having a high-level wrapper around a low-level C primitive would be cool, but someone has to experiment on that to find out how much performance it would cost. You may want to have the C primitive return results in batches (of e.g. 10 or 100

[issue22524] PEP 471 implementation: os.scandir() directory scanning function

2014-10-08 Thread STINNER Victor
STINNER Victor added the comment: You may want to have the C primitive return results in batches (of e.g. 10 or 100 entries) to limit the overhead, btw. Ah yes, good idea. I read that internally readdir() also fetchs many entries in a single syscall (prefetch / readahead, I don't know how to

[issue22524] PEP 471 implementation: os.scandir() directory scanning function

2014-10-08 Thread Ben Hoyt
Ben Hoyt added the comment: Thanks for the initial response and code review, Victor. I'll take a look and respond further in the next few days. In the meantime, however, I'm definitely open to splitting scandir out into its own C file. This will mean a little refactoring (making some

[issue22524] PEP 471 implementation: os.scandir() directory scanning function

2014-10-08 Thread Ben Hoyt
Ben Hoyt added the comment: Here are the actual numbers (instead of just from memory) running on my Windows laptop, which happens to have an SSD drive, so os.walk() with scandir is particularly good here. python 3.4: benchmark.py -c python (all implemented in Python using ctypes): os.walk

[issue22524] PEP 471 implementation: os.scandir() directory scanning function

2014-10-08 Thread Antoine Pitrou
Antoine Pitrou added the comment: Ben, how can I run the benchmark on my own machine? Thanks! -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue22524 ___

[issue22524] PEP 471 implementation: os.scandir() directory scanning function

2014-10-08 Thread Antoine Pitrou
Antoine Pitrou added the comment: Ok, I've found it. The numbers don't look that good here (Linux, SSD): $ ~/cpython/opt/python benchmark.py -c python /usr/share/ [...] os.walk took 1.266s, scandir.walk took 1.852s -- 0.7x as fast $ ~/cpython/opt/python benchmark.py -c c /usr/share/ [...]

[issue22524] PEP 471 implementation: os.scandir() directory scanning function

2014-10-07 Thread STINNER Victor
STINNER Victor added the comment: Hi Ben, I started to review your patch, sorry for the delay :-( It's convinient to have the patch on Rietveld to comment it inline. I didn't expect so much C code. The posixmodule.c is really huge. Well, it's just the longest C file of Python 3.5. Your patch

[issue22524] PEP 471 implementation: os.scandir() directory scanning function

2014-10-06 Thread Ben Hoyt
Ben Hoyt added the comment: Attaching my first patch here. It includes docs, the implementation of scandir and DirEntry in posixmodule.c, and tests in test_scandir.py. It does not yet include the changes for os.walk() to use scandir; I'm hoping to do that in a separate patch for easier code

[issue22524] PEP 471 implementation: os.scandir() directory scanning function

2014-10-06 Thread Akira Li
Changes by Akira Li 4kir4...@gmail.com: -- nosy: +akira ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue22524 ___ ___ Python-bugs-list mailing list

[issue22524] PEP 471 implementation: os.scandir() directory scanning function

2014-09-30 Thread Ben Hoyt
New submission from Ben Hoyt: Opening this to track the implementation of PEP 471: os.scandir() [1]. This supercedes Issue #11406 (and possibly others)? The implementation is most of the way there, but not yet done as a CPythono 3.5 patch. Before I have a proper patch, it's available on

[issue22524] PEP 471 implementation: os.scandir() directory scanning function

2014-09-30 Thread Giampaolo Rodola'
Changes by Giampaolo Rodola' g.rod...@gmail.com: -- nosy: +giampaolo.rodola ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue22524 ___ ___

[issue22524] PEP 471 implementation: os.scandir() directory scanning function

2014-09-30 Thread Evgeny Kapun
Changes by Evgeny Kapun abacabadabac...@gmail.com: -- nosy: +abacabadabacaba ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue22524 ___ ___