Ben Hoyt added the comment:
> I find iterdir_stat() ugly :-) I like the scandir name, which has some
> precedent with POSIX.
Fair enough. I'm cool with scandir().
> scandir() cannot return (name, stat), because on POSIX, readdir() only
> returns d_name and d_type (the type of the entry): to return a stat, we would
> have to call stat() on each entry, which would defeat the performance gain.
Yes, you're right. I "solved" this in BetterWalk with the approach you propose:
returning a stat_result object with the fields it could get "for free" set,
and the others set to None.
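To make the Linux case concrete: readdir() only hands back a d_type code, but glibc's <dirent.h> defines DTTOIF(dirtype) as ((dirtype) << 12), so the file-type bits of st_mode can be recovered without a stat() call. What follows is a minimal sketch. The DT_* values are assumed from Linux/glibc, since Python's os module doesn't export them, and d_type_to_mode() is a hypothetical helper name.

```python
import stat

# Assumed d_type values, copied from Linux/glibc <dirent.h>; Python's os
# module does not export these names.
DT_UNKNOWN = 0
DT_FIFO = 1
DT_CHR = 2
DT_DIR = 4
DT_BLK = 6
DT_REG = 8
DT_LNK = 10
DT_SOCK = 12


def d_type_to_mode(d_type):
    """Return the S_IFMT file-type bits for a d_type value, or None.

    glibc defines DTTOIF(dirtype) as (dirtype << 12), so the type bits
    are recovered by shifting. DT_UNKNOWN (which some filesystems always
    report) maps to None, meaning "caller must fall back to stat()".
    """
    if d_type == DT_UNKNOWN:
        return None
    return d_type << 12


# Sanity check against the stat module's own constants.
assert d_type_to_mode(DT_DIR) == stat.S_IFDIR
assert d_type_to_mode(DT_REG) == stat.S_IFREG
assert d_type_to_mode(DT_LNK) == stat.S_IFLNK
```

Only the type bits are known this way; permission bits and every other field would still be None in the proposed result.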
So on Linux, you'd get a stat_result with only st_mode set (or st_mode None for
DT_UNKNOWN), and all the other fields None. However, st_mode is the field
you're most likely to use, usually just to check whether an entry is a file or
a directory. So calling code would look something like this:
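    import os
    import stat

    files = []
    dirs = []
    # scandir() is the proposed call: it yields (name, stat_result) pairs
    for name, st in scandir(path):
        if st.st_mode is None:
            # type unknown (e.g. DT_UNKNOWN): fall back to a full stat()
            st = os.stat(os.path.join(path, name))
        if stat.S_ISDIR(st.st_mode):
            dirs.append(name)
        else:
            files.append(name)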
This means you'd get the speed improvement 99% of the time (whenever st_mode
is set), but if st_mode is None, you can call stat() and handle errors and
whatnot yourself.
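A runnable variant of that loop, as a sketch. Because scandir() doesn't exist yet, the (name, stat) pairs are passed in directly. PartialStat is a hypothetical namedtuple standing in for "a stat_result whose missing fields are None", and split_entries() is an illustrative name. The stat function is a parameter so the number of fallback stat() calls can be counted.

```python
import os
import stat
from collections import namedtuple

# Hypothetical stand-in for a stat_result whose missing fields are None.
# Only a few fields are modelled; a real result would mirror os.stat_result.
PartialStat = namedtuple("PartialStat", "st_mode st_size st_mtime",
                         defaults=(None, None, None))


def split_entries(path, entries, stat_func=os.stat):
    """Split (name, partial_stat) pairs into directories and files.

    stat_func() is called only for entries whose st_mode is None.
    Returns (dirs, files, number_of_fallback_stat_calls).
    """
    dirs, files = [], []
    fallbacks = 0
    for name, st in entries:
        if st.st_mode is None:
            # The listing couldn't tell us the type: pay for a real stat().
            fallbacks += 1
            st = stat_func(os.path.join(path, name))
        if stat.S_ISDIR(st.st_mode):
            dirs.append(name)
        else:
            files.append(name)
    return dirs, files, fallbacks
```

When nearly every entry arrives with st_mode already set, the fallback count stays close to zero, which is where the saving over calling stat() on every entry comes from.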
> That's why scandir would be a rather low-level call, whose main user would be
> walkdir, which only needs to know the entry type and not the whole stat
> result.
Agreed. This is in the os module after all, and there's tons of stuff that's
OS-dependent in there. However, I think that by doing something like the above,
we can make it usable and performant on both Linux and Windows for use cases
like walking directory trees.
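As a sketch of the directory-walking use case: a top-down walk only needs each entry's type, so it can run off the (name, st) pairs and call lstat() only when st_mode is None. walk() and lstat_scandir() are hypothetical names. Since the real scandir() doesn't exist, lstat_scandir() fakes it with os.listdir() plus os.lstat(), which pays exactly the per-entry stat cost the real call would avoid.

```python
import os
import stat
from collections import namedtuple

# Hypothetical partial stat: only st_mode is modelled here.
PartialStat = namedtuple("PartialStat", "st_mode")


def lstat_scandir(path):
    """Stand-in for the proposed scandir(): yields (name, PartialStat).

    This emulation calls os.lstat() per entry; the real call would take
    st_mode from d_type (or the Windows find data) instead.
    """
    for name in os.listdir(path):
        yield name, PartialStat(os.lstat(os.path.join(path, name)).st_mode)


def walk(top, scandir_func):
    """Yield (dirpath, dirnames, filenames) top-down, like os.walk().

    Only st_mode is consulted; lstat() is called only for entries whose
    st_mode is None. Symlinks are listed as files and never followed.
    """
    dirs, files = [], []
    for name, st in scandir_func(top):
        mode = st.st_mode
        if mode is None:
            # Type unknown from the listing (e.g. DT_UNKNOWN): ask the OS.
            mode = os.lstat(os.path.join(top, name)).st_mode
        if stat.S_ISDIR(mode):
            dirs.append(name)
        else:
            files.append(name)
    yield top, dirs, files
    for name in dirs:
        yield from walk(os.path.join(top, name), scandir_func)
```

Unlike os.walk(), this sketch lists symlinks to directories under filenames; that choice just keeps the type check to a single lstat()-style mode.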
> Also, I don't know which information is returned by the readdir equivalent on
> Windows, but if we want a consistent API, we have to somehow map d_type and
> Windows's returned type to a common type, like DT_FILE, DT_DIRECTORY, etc
> (which could be an enum).
The Windows scan directory functions (FindFirstFile/FindNextFile) return a
*full* stat (or at least, as much info as you get from a stat in Windows). We
*could* map them to a common type -- but I'm suggesting that common type might
as well be "stat_result with None meaning not present". That way users don't
have to learn a completely new type.
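On Windows, each FindFirstFile/FindNextFile result (a WIN32_FIND_DATA record) already carries attributes, size and timestamps, so a partial stat can be filled without extra calls. What follows is a minimal sketch, assuming the last-write FILETIME has already been combined into one 64-bit integer. The st_mode rule is modelled on the attributes_to_mode() helper in CPython's posixmodule.c, which looks only at the directory and read-only bits. find_data_to_stat() and PartialStat are hypothetical names.

```python
import stat
from collections import namedtuple

# Win32 attribute bits (values from winnt.h).
FILE_ATTRIBUTE_READONLY = 0x1
FILE_ATTRIBUTE_DIRECTORY = 0x10

# 100-nanosecond intervals between 1601-01-01 (FILETIME epoch) and
# 1970-01-01 (Unix epoch).
EPOCH_DIFF_100NS = 116444736000000000

# Hypothetical partial stat with just the fields filled in below.
PartialStat = namedtuple("PartialStat", "st_mode st_size st_mtime")


def attributes_to_mode(attrs):
    """Derive st_mode from dwFileAttributes (directory and read-only bits)."""
    if attrs & FILE_ATTRIBUTE_DIRECTORY:
        mode = stat.S_IFDIR | 0o111  # directories get the search/exec bits
    else:
        mode = stat.S_IFREG
    mode |= 0o444 if attrs & FILE_ATTRIBUTE_READONLY else 0o666
    return mode


def filetime_to_unix(filetime):
    """Convert a 64-bit FILETIME value to seconds since the Unix epoch."""
    return (filetime - EPOCH_DIFF_100NS) / 10_000_000


def find_data_to_stat(attrs, size_high, size_low, last_write_filetime):
    """Build a partial stat from WIN32_FIND_DATA fields.

    size_high/size_low correspond to nFileSizeHigh/nFileSizeLow. Fields
    the find data doesn't supply (st_ino, st_dev, st_nlink, ...) are left
    out of this sketch; the proposal would set them to None.
    """
    return PartialStat(
        st_mode=attributes_to_mode(attrs),
        st_size=(size_high << 32) + size_low,
        st_mtime=filetime_to_unix(last_write_filetime),
    )
```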
> The other approach would be to return a dummy stat object with only st_mode
> set, but that would be kind of a hack to return a dummy stat result with only
> part of the attributes set (some people will get bitten by this).
We could document any platform-specific stuff and the places where users could
get bitten. But can you give me an example of where the
stat_result-with-st_mode-or-None approach falls over completely?
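One concrete way a partial result can bite, sketched with a hypothetical PartialStat where st_mode came for free but st_size did not. Code that assumes every field is populated fails with TypeError, while a caller that checks for None and falls back to os.stat() keeps working. total_size_naive() and total_size_safe() are illustrative names only.

```python
import os
from collections import namedtuple

# Hypothetical partial result: st_mode came "for free", st_size may not.
PartialStat = namedtuple("PartialStat", "st_mode st_size")


def total_size_naive(entries):
    # Assumes st_size is always set: raises TypeError on the first None.
    return sum(st.st_size for _, st in entries)


def total_size_safe(path, entries, stat_func=os.stat):
    # Fetches a full stat only for entries whose st_size is missing.
    total = 0
    for name, st in entries:
        size = st.st_size
        if size is None:
            size = stat_func(os.path.join(path, name)).st_size
        total += size
    return total
```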
----------
_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue11406>
_______________________________________
_______________________________________________
Python-bugs-list mailing list