> In this case because the names are exactly the same as the os versions which
> /do/ make a system call.
Fair enough.

> So if I'm finally understanding the root problem here:
>
> - listdir returns a list of strings, one for each filename and one for
> each directory, and keeps no other O/S supplied info.
>
> - os.walk, which uses listdir, then needs to go back to the O/S and
> refetch the thrown-away information
>
> - so it's slow.
>
> ...
> and the new problem:
>
> - not all O/Ses provide the same (or any) extra info about the
> directory entries
>
> Have I got that right?

Yes, that's exactly right.

> If so, I still like the attribute idea better (surprise!), we just need to
> revisit the 'ensure_lstat' (or whatever it's called) parameter: instead of
> a true/false value, it could have a scale:
>
> - 0 = whatever the O/S gives us
>
> - 1 = at least the is_dir/is_file (whatever the other normal one is),
> and if the O/S doesn't give it to us for free than call lstat
>
> - 2 = we want it all -- call lstat if necessary on this platform
>
> After all, the programmer should know up front how much of the extra info
> will be needed for the work that is trying to be done.

Yeah, I think this is a good idea to make option #2 a bit nicer. I don't
like the magic constants, and using constants like os.SCANDIR_LSTAT is
annoying, so how about using strings? I also suggest calling the parameter
"info" (because it determines what info is returned), so you'd call
scandir(path, info='type') if you need just the is_X type information.

I also think it's nice to have a way for power users to "just return what
the OS gives us". However, I think making this the default is a bad idea,
as it's just asking for cross-platform bugs (and it's easy to prevent).

Paul Moore basically agrees with this in his reply yesterday, though I
disagree with him that it would be unfriendly to fail hard unless you asked
for the info. Quite the opposite: Linux users would think it very
unfriendly if your code broke because you didn't ask for the info. :-)
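For illustration, here is a rough pure-Python sketch of the string-valued "info" levels, built only on os.listdir() and os.lstat(). The function name scandir_sketch, the entry class, the attribute names name/full_name/lstat and the level names 'type' and 'lstat' are all assumptions for this sketch, not an existing API. It also shows the root problem: because listdir() throws the type away, every level above None costs one extra lstat() call per entry. A real implementation would use whatever the OS returns with the listing for free, and this sketch does not attempt an "OS gives us" level.

```python
import os
import stat


class DirEntrySketch:
    """Hypothetical entry: an attribute exists only if it was fetched,
    so reading an unfetched one raises AttributeError instead of
    silently returning None."""

    def __init__(self, name, full_name):
        self.name = name
        self.full_name = full_name


def scandir_sketch(path='.', info=None):
    """Hypothetical scandir() with an 'info' level (assumed names):

    info=None    -> only .name and .full_name
    info='type'  -> also .is_dir, .is_file, .is_symlink (True/False)
    info='lstat' -> additionally .lstat, a full os.stat_result
    """
    # Assumption: unknown levels are rejected (on first iteration).
    if info not in (None, 'type', 'lstat'):
        raise ValueError('unsupported info level: %r' % (info,))
    for name in os.listdir(path):
        entry = DirEntrySketch(name, os.path.join(path, name))
        if info is not None:
            # listdir() discarded the type, so go back to the OS:
            # one extra lstat() system call per entry. Errors propagate.
            st = os.lstat(entry.full_name)
            entry.is_dir = stat.S_ISDIR(st.st_mode)
            entry.is_file = stat.S_ISREG(st.st_mode)
            entry.is_symlink = stat.S_ISLNK(st.st_mode)
            if info == 'lstat':
                entry.lstat = st
        yield entry


if __name__ == '__main__':
    # Usage: list subdirectories, asking only for the type information.
    for entry in scandir_sketch('.', info='type'):
        if entry.is_dir:
            print(entry.full_name)
```

With info=None, any check like "if entry.is_dir:" raises AttributeError rather than quietly treating every entry as a non-directory.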
So how about tweaking option #2 a tiny bit more to this:

    def scandir(path='.', info=None, onerror=None):
        ...

* if info is None (the default), only the .name and .full_name attributes
  are present
* if info is 'type', scandir ensures the is_dir/is_file/is_symlink
  attributes are present and either True or False
* if info is 'lstat', scandir additionally ensures a .lstat is present and
  is a full stat_result object
* if info is 'os', scandir returns the attributes the OS provides
  (everything on Windows; on POSIX, usually just the is_X attributes)
* if onerror is not None and errors occur during any internal lstat() call,
  onerror(exc) is called with the OSError exception object

Further point: because the is_dir/is_file/is_symlink attributes are
booleans, it would be very bad for them to be present but None if you
didn't ask for (or the OS didn't return) the type information. Then
entry.is_dir would be None, which is falsy, so "if entry.is_dir:" would
make your code think it wasn't a directory, when actually you don't know.
For this reason, all attributes should fail with AttributeError if they
weren't fetched.

> Thank you for writing scandir, and this PEP. Excellent work.

Thanks!

> Oh, and +1 for option 2, slightly modified. :)

With the above tweaks, I'm getting closer to being 50/50. It's probably
60% #1 and 40% #2 for me now. :-)

Okay folks, please respond: option #1 as per the current PEP 471, or
option #2 with Ethan's multi-level tweaks as per the above?

-Ben

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev