[issue11406] There is no os.listdir() equivalent returning generator instead of list

Ben Hoyt Sun, 05 May 2013 01:53:23 -0700

Ben Hoyt added the comment:

> I find iterdir_stat() ugly :-) I like the scandir name, which has some 
> precedent with POSIX.


Fair enough. I'm cool with scandir().

> scandir() cannot return (name, stat), because on POSIX, readdir() only 
> returns d_name and d_type (the type of the entry): to return a stat, we would 
> have to call stat() on each entry, which would defeat the performance gain.

Yes, you're right. I "solved" this in BetterWalk with the solution you propose 
of returning a stat_result object with the fields it could get "for free" set, 
and the others set to None.

So on Linux, you'd get a stat_result with only st_mode set (or None for 
DT_UNKNOWN), and all the other fields None. However -- st_mode is the one 
you're most likely to use, usually looking just for whether it's a file or 
directory. So calling code would look something like this:

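import os
import stat

files = []
dirs = []
for name, st in scandir(path):  # proposed API: yields (name, stat_result)
    if st.st_mode is None:
        # d_type was DT_UNKNOWN, so fall back to a real stat() call
        st = os.stat(os.path.join(path, name))
    if stat.S_ISDIR(st.st_mode):
        dirs.append(name)
    else:
        files.append(name)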

Meaning you'd get the speed improvements 99% of the time (when st_mode is 
set), but if st_mode is None, you can call stat and handle errors and whatnot 
yourself.
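
To make the "handle errors yourself" part concrete, here is a minimal sketch 
of the fallback wrapped in a helper. The names resolve_mode() and is_dir() are 
made up for illustration and are not part of any proposal; the st_mode argument 
stands for whatever the proposed scandir() handed back for that entry.

```python
import os
import stat


def resolve_mode(path, name, st_mode):
    """Return a usable st_mode for an entry, or None if it cannot be found.

    Hypothetical helper: st_mode is the value obtained "for free" from the
    directory scan, or None when the platform could not supply it.
    """
    if st_mode is not None:
        return st_mode  # fast path: no extra system call
    try:
        return os.stat(os.path.join(path, name)).st_mode
    except OSError:
        # The entry may have been removed, or be unreadable, since the scan.
        return None


def is_dir(path, name, st_mode):
    """True only if the entry is known to be a directory."""
    mode = resolve_mode(path, name, st_mode)
    return mode is not None and stat.S_ISDIR(mode)
```

Returning None instead of raising is just one possible policy; a caller that 
wants to see the OSError can call os.stat() directly.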

> That's why scandir would be a rather low-level call, whose main user would be 
> walkdir, which only needs to know the entry type and not the whole stat 
> result.

Agreed. This is in the OS module after all, and there's tons of stuff that's 
OS-dependent in there. However, I think that doing something like the above, we 
can make it usable and performant on both Linux and Windows for use cases like 
walking directory trees.
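
As a rough sketch of that use case, a tree walker on top of such a call might 
look like the following. Everything here is an assumption for illustration: 
PartialStat and fake_scandir() are stand-ins that only mimic the proposed 
(name, stat_result) return shape. The stand-in calls os.lstat() on every 
entry, so it has none of the speed benefit a real implementation would have.

```python
import os
import stat
from collections import namedtuple

# Hypothetical stand-in for a stat_result with only st_mode filled in.
PartialStat = namedtuple("PartialStat", ["st_mode"])


def fake_scandir(path):
    """Mimic the proposed scandir(): yield (name, PartialStat) pairs."""
    for name in os.listdir(path):
        try:
            mode = os.lstat(os.path.join(path, name)).st_mode
        except OSError:
            mode = None  # plays the role of DT_UNKNOWN
        yield name, PartialStat(mode)


def walk(top, scandir=fake_scandir):
    """Yield (dirpath, dirnames, filenames), top-down, like os.walk()."""
    dirs, files = [], []
    for name, st in scandir(top):
        mode = st.st_mode
        if mode is None:
            # Type not available from the scan: fall back to a stat call.
            try:
                mode = os.lstat(os.path.join(top, name)).st_mode
            except OSError:
                continue  # entry vanished or is unreadable; skip it
        # lstat semantics: a symlink to a directory is listed as a file.
        (dirs if stat.S_ISDIR(mode) else files).append(name)
    yield top, dirs, files
    for name in dirs:
        yield from walk(os.path.join(top, name), scandir)
```

The walker never needs more than the file type, which is why a scan that 
supplies only st_mode would be enough for it.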

> Also, I don't know which information is returned by the readdir equivalent on 
> Windows, but if we want a consistent API, we have to somehow map d_type and 
> Windows's returned type to a common type, like DT_FILE, DT_DIRECTORY, etc 
> (which could be an enum).

The Windows scan directory functions (FindFirstFile/FindNextFile) return a 
*full* stat (or at least, as much info as you get from a stat in Windows). We 
*could* map them to a common type -- but I'm suggesting that common type might 
as well be "stat_result with None meaning not present". That way users don't 
have to learn a completely new type.
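
For illustration only, the "stat_result with None meaning not present" idea 
could be modelled as below. PartialStat, from_d_type() and from_find_data() 
are hypothetical names, not a proposed API. The DT_* numbers are the Linux 
<dirent.h> values and 0x10 is the Windows FILE_ATTRIBUTE_DIRECTORY flag, 
hard-coded here as assumptions so the sketch runs anywhere.

```python
import stat
from collections import namedtuple

_FIELDS = ["st_mode", "st_ino", "st_dev", "st_nlink", "st_uid",
           "st_gid", "st_size", "st_atime", "st_mtime", "st_ctime"]

# Every field defaults to None, meaning "not available without a stat call".
PartialStat = namedtuple("PartialStat", _FIELDS,
                         defaults=(None,) * len(_FIELDS))

# Linux d_type values (assumed; other systems may differ).
DT_UNKNOWN, DT_DIR, DT_REG, DT_LNK = 0, 4, 8, 10
_DTYPE_TO_MODE = {DT_DIR: stat.S_IFDIR, DT_REG: stat.S_IFREG,
                  DT_LNK: stat.S_IFLNK}

FILE_ATTRIBUTE_DIRECTORY = 0x10  # Windows attribute flag (assumed value)


def from_d_type(d_type):
    """POSIX-style: only the file type is known, or nothing at all."""
    return PartialStat(st_mode=_DTYPE_TO_MODE.get(d_type))


def from_find_data(attributes, size, mtime):
    """Windows-style: the directory scan already carries size and times."""
    is_dir = attributes & FILE_ATTRIBUTE_DIRECTORY
    mode = stat.S_IFDIR if is_dir else stat.S_IFREG
    return PartialStat(st_mode=mode, st_size=size, st_mtime=mtime)
```

Code that only looks at st_mode works the same on both, and code that needs 
st_size has to check for None before using it.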

> The other approach would be to return a dummy stat object with only st_mode 
> set, but that would be kind of a hack to return a dummy stat result with only 
> part of the attributes set (some people will get bitten by this).

We could document any platform-specific stuff, and the places where users 
could get bitten. But can you give me an example of where the 
stat_result-with-st_mode-or-None approach falls over completely?

----------

_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue11406>
_______________________________________