[issue22167] iglob() has misleading documentation (does indeed store names internally)

2021-06-21 Thread Andrei Kulakov
Andrei Kulakov added the comment: I have put up a PR here: https://github.com/python/cpython/pull/25767

2021-04-30 Thread Andrei Kulakov
Change by Andrei Kulakov: keywords: +patch; nosy: +andrei.avk; nosy_count: 8.0 -> 9.0; pull_requests: +24460; stage: -> patch review; pull_request: https://github.com/python/cpython/pull/25767

2020-06-16 Thread Guido van Rossum
Guido van Rossum added the comment: Sounds good. FWIW, and totally off-topic, I find it annoying that pathlib's .glob() method supports ** in patterns, but its cousin .match() does not:
    >>> p = pathlib.Path("Lib/test/support/os_helper.py")
    >>> p.match("Lib/**/*.py")
    False
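The behaviour described in that comment can be checked with a short sketch. It uses `PurePosixPath` instead of `pathlib.Path` (an assumption made here so the result does not depend on the host OS or on the files existing), and the second pattern is an illustration that is not part of the original comment.

```python
from pathlib import PurePosixPath

p = PurePosixPath("Lib/test/support/os_helper.py")

# match() does not treat "**" as a recursive wildcard, so this three-part
# pattern cannot cover the four-part path.
recursive = p.match("Lib/**/*.py")

# Spelling out one "*" per directory level does match.
explicit = p.match("Lib/*/*/*.py")

print(recursive, explicit)
```

Newer Python versions may offer other matching methods with different rules; this sketch only exercises `match()`.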

2020-06-16 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: I am going to add the issue38144 feature first. Then maybe implement a dir_fd based optimization.

2020-06-15 Thread Guido van Rossum
Guido van Rossum added the comment: How's this going?

2020-06-01 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Brilliant idea! I played with it yesterday, and it is easy to generalize it to work with a*/b*/c*/d/e/f and to "use not more than N simultaneously opened file descriptors per glob iterator". The only problem is if we want to use this idea with recursive

2020-05-31 Thread Guido van Rossum
Guido van Rossum added the comment: I hope some volunteer will submit a doc PR. In the meantime, throwing out one more idea: perhaps my first idea, to make _glob1() an iterator, could work if we only do it for leaves of the pattern, so for the a*/b*/c* example, only for the c* part.
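A minimal sketch of the "lazy only at the leaves" idea, assuming a simplified pattern given as a list of components. The names `matching_dirs` and `glob_lazy_leaf` are hypothetical and are not the `glob` module's internals (the real `_glob1()` is not reproduced here). Intermediate components are listed eagerly, so their directory handles are closed before recursing; only the final component is streamed.

```python
import os
import tempfile
from fnmatch import fnmatch


def matching_dirs(parent, pattern):
    """Eagerly list matching subdirectories; the handle is closed on return."""
    with os.scandir(parent) as it:
        return [e.path for e in it if e.is_dir() and fnmatch(e.name, pattern)]


def glob_lazy_leaf(root, patterns):
    """Yield paths matching the pattern components, lazily only at the leaf."""
    head, *rest = patterns
    if not rest:
        # Leaf component: stream entries, so one handle stays open while
        # the caller consumes them.
        with os.scandir(root) as it:
            for entry in it:
                if fnmatch(entry.name, head):
                    yield entry.path
        return
    # Non-leaf component: the list is complete (and the handle closed)
    # before any recursion happens.
    for directory in matching_dirs(root, head):
        yield from glob_lazy_leaf(directory, rest)


with tempfile.TemporaryDirectory() as tmp:
    for rel in ("a1/b1/c1", "a1/b1/x", "a2/b2/c2", "z/b1/c1"):
        path = os.path.join(tmp, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()
    found = sorted(
        os.path.relpath(p, tmp).replace(os.sep, "/")
        for p in glob_lazy_leaf(tmp, ["a*", "b*", "c*"])
    )
print(found)
```

With this shape, at most one directory handle is open at any moment, at the cost of still holding each intermediate directory's matches in a list.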

2020-05-31 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: This is an interesting idea, but I am afraid it will complicate the code too much, destroying the remnants of the initial elegant design. I am going to try to experiment with this idea after implementing other features, so it will not block them. For now we

2020-05-31 Thread Guido van Rossum
Guido van Rossum added the comment: Hm, yeah. Perhaps we can add some buffering so that for directories with a small number of files (say < 1000) the FD is closed before recursing into it, while for large directories it keeps the FD open until the last block of 1000 DirEntries is read? It's
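The buffering idea might look roughly like the sketch below. The function name `scan_buffered` and the `block` parameter are hypothetical; this illustrates the proposal and is not code from the `glob` module. Entries are read in blocks, and the directory handle is closed as soon as the last (short) block has been read, before those entries are handed to the caller.

```python
import os
import tempfile
from itertools import islice


def scan_buffered(path, block=1000):
    """Yield DirEntry objects, closing the handle once the last block is read."""
    it = os.scandir(path)
    try:
        while True:
            chunk = list(islice(it, block))
            if len(chunk) < block:
                # Last block: release the handle before yielding, so the
                # caller can recurse without this directory staying open.
                it.close()
                yield from chunk
                return
            # Full block: more entries may follow, so the handle stays open.
            yield from chunk
    finally:
        it.close()  # closing twice is harmless


with tempfile.TemporaryDirectory() as tmp:
    for i in range(5):
        open(os.path.join(tmp, f"f{i}"), "w").close()
    names = sorted(entry.name for entry in scan_buffered(tmp, block=2))
print(names)
```

For a directory smaller than `block`, the handle is closed before the first entry is yielded; for a larger one, it stays open only until the final block is read.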

2020-05-31 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Yes, for the pattern 'a*/b*/c*' you will have an open file descriptor for every component with metacharacters:
    for a in scandir('.'):
        if fnmatch(a.name, 'a*'):
            for b in scandir(a.path):
                if fnmatch(b.name, 'b*'):
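A runnable sketch of the fully lazy, nested form for a three-component pattern. The third level is an assumed continuation of the truncated snippet, and `lazy_glob3`, the `is_dir()` checks, and the explicit `with` blocks are additions made here for illustration. Each `with os.scandir(...)` keeps its directory handle open while the inner loops run, which is the descriptor cost under discussion.

```python
import os
import tempfile
from fnmatch import fnmatch


def lazy_glob3(root, pat_a, pat_b, pat_c):
    """Yield paths matching pat_a/pat_b/pat_c under root, fully lazily."""
    with os.scandir(root) as level_a:                      # handle 1 open
        for a in level_a:
            if a.is_dir() and fnmatch(a.name, pat_a):
                with os.scandir(a.path) as level_b:        # handle 2 open
                    for b in level_b:
                        if b.is_dir() and fnmatch(b.name, pat_b):
                            with os.scandir(b.path) as level_c:  # handle 3 open
                                for c in level_c:
                                    if fnmatch(c.name, pat_c):
                                        yield c.path


with tempfile.TemporaryDirectory() as tmp:
    for rel in ("a1/b1/c1", "a1/b1/x", "a2/b2/c2", "z/b1/c1"):
        path = os.path.join(tmp, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()
    found = sorted(
        os.path.relpath(p, tmp).replace(os.sep, "/")
        for p in lazy_glob3(tmp, "a*", "b*", "c*")
    )
print(found)
```

While a match from the innermost loop is being consumed, three handles are open at once: one per pattern component with metacharacters.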

2020-05-31 Thread Guido van Rossum
Guido van Rossum added the comment: Serhiy, what do you mean by "otherwise we could run out of file descriptors"? I looked a bit at the code and there are different kinds of algorithms involved for different forms of patterns, and the code also takes vastly different paths for recursive

2020-05-31 Thread Serhiy Storchaka
Change by Serhiy Storchaka: versions: +Python 3.10, Python 3.6, Python 3.7, Python 3.8, Python 3.9 -Python 2.7

2018-02-17 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Unfortunately issue25596 didn't change anything about this issue. iglob() still stores names (actually DirEntry objects) of all files in a directory before it starts yielding the first of them. Otherwise we could exceed the limit

2018-02-16 Thread Roger Erens
Roger Erens added the comment: http://bugs.python.org/issue25596 has been closed... (nosy: +Roger Erens)

2016-01-11 Thread Guido van Rossum
Guido van Rossum added the comment: Once http://bugs.python.org/issue25596 (switching glob to use scandir) is solved this issue can be closed IMO. (nosy: +gvanrossum)

2014-08-08 Thread Roy Smith
Roy Smith added the comment: How about something like this: Note: The current iglob() implementation is optimized for the case of many files distributed in a large directory tree. Internally, it iterates over the directory tree, and stores all the names from each directory at once. This

2014-08-08 Thread R. David Murray
R. David Murray added the comment: IMO the documentation isn't *wrong*, just misleading :) What it is saying is that *your program* doesn't have to store the full list returned by iglob before being able to use it (i.e., iglob doesn't return a list). It says nothing about what resources are

2014-08-07 Thread Roy Smith
New submission from Roy Smith: For background, see: https://mail.python.org/pipermail/python-list/2014-August/676291.html In a nutshell, the iglob() docs say, "Return an iterator which yields the same values as glob() without actually storing them all simultaneously." The problem is,

2014-08-07 Thread Tim Chase
Changes by Tim Chase python.b...@tim.thechases.com: nosy: +Gumnos (http://bugs.python.org/issue22167)

2014-08-07 Thread Steven D'Aprano
Steven D'Aprano added the comment: I agree that the documentation could be improved, but it's not really *wrong*. Consider a glob like spam/[abc]/*.txt. What iglob does is conceptually closer to: (1) generate the list of files matching spam/a/*.txt and yield them; (2) generate the list of
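The per-directory behaviour described there can be sketched as follows. `iglob_like` is a hypothetical stand-in, not the real `glob.iglob()` implementation, and it takes the already-expanded directories as an argument to keep the example short. The caller never holds the full result, but each directory's names are collected in a list before the first one is yielded.

```python
import fnmatch
import os
import tempfile


def iglob_like(dirs, pattern):
    """Yield matches one directory at a time."""
    for directory in dirs:
        # The whole directory listing is in memory at this point.
        names = fnmatch.filter(os.listdir(directory), pattern)
        for name in names:
            yield os.path.join(directory, name)


with tempfile.TemporaryDirectory() as tmp:
    dirs = []
    for sub in "abc":
        directory = os.path.join(tmp, "spam", sub)
        os.makedirs(directory)
        open(os.path.join(directory, sub + ".txt"), "w").close()
        open(os.path.join(directory, "skip.log"), "w").close()
        dirs.append(directory)
    found = [
        os.path.relpath(p, tmp).replace(os.sep, "/")
        for p in iglob_like(dirs, "*.txt")
    ]
print(found)
```

Peak memory in this sketch is bounded by the largest single directory rather than by the total number of matches, which is why one directory with very many files is the problematic case.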

2014-08-07 Thread Roy Smith
Roy Smith added the comment: The thread that led to this started out with the use case of a directory that had 200k files in it. If I ran iglob() on that and discovered that it had internally generated a list of all 200k names in memory at the same time, I would be pretty darn surprised,