Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
In this case because the names are exactly the same as the os versions which /do/ make a system call. Fair enough. So if I'm finally understanding the root problem here: - listdir returns a list of strings, one for each filename and one for each directory, and keeps no other O/S supplied info. - os.walk, which uses listdir, then needs to go back to the O/S and refetch the thrown-away information - so it's slow. ... and the new problem: - not all O/Ses provide the same (or any) extra info about the directory entries Have I got that right? Yes, that's exactly right. If so, I still like the attribute idea better (surprise!), we just need to revisit the 'ensure_lstat' (or whatever it's called) parameter: instead of a true/false value, it could have a scale: - 0 = whatever the O/S gives us - 1 = at least the is_dir/is_file (whatever the other normal one is), and if the O/S doesn't give it to us for free than call lstat - 2 = we want it all -- call lstat if necessary on this platform After all, the programmer should know up front how much of the extra info will be needed for the work that is trying to be done. Yeah, I think this is a good idea to make option #2 a bit nicer. I don't like the magic constants, and using constants like os.SCANDIR_LSTAT is annoying, so how about using strings? I also suggest calling the parameter info (because it determines what info is returned), so you'd do scandir(path, info='type') if you need just the is_X type information. I also think it's nice to have a way for power users to just return what the OS gives us. However, I think making this the default is a bad idea, as it's just asking for cross-platform bugs (and it's easy to prevent). Paul Moore basically agrees with this in his reply yesterday, though I disagree with him it would be unfriendly to fail hard unless you asked for the info -- quite the opposite, Linux users would think it very unfriendly when your code broke because you didn't ask for the info. :-) So how about tweaking option #2 a tiny bit more to this: def scandir(path='.', info=None, onerror=None): ... * if info is None (the default), only the .name and .full_name attributes are present * if info is 'type', scandir ensures the is_dir/is_file/is_symlink attributes are present and either True or False * if info is 'lstat', scandir additionally ensures a .lstat is present and is a full stat_result object * if info is 'os', scandir returns the attributes the OS provides (everything on Windows, only is_X -- most of the time -- on POSIX) * if onerror is not None and errors occur during any internal lstat() call, onerror(exc) is called with the OSError exception object Further point -- because the is_dir/is_file/is_symlink attributes are booleans, it would be very bad for them to be present but None if you didn't ask for (or the OS didn't return) the type information. Because then if entry.is_dir: would be None and your code would think it wasn't a directory, when actually you don't know. For this reason, all attributes should fail with AttributeError if not fetched. Thank you for writing scandir, and this PEP. Excellent work. Thanks! Oh, and +1 for option 2, slightly modified. :) With the above tweaks, I'm getting closer to being 50/50. It's probably 60% #1 and 40% #2 for me now. :-) Okay folks -- please respond: option #1 as per the current PEP 471, or option #2 with Ethan's multi-level thing tweaks as per the above? -Ben ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
2014-07-08 22:09 GMT+02:00 Ben Hoyt benh...@gmail.com: I think you're misunderstanding is_dir() and is_file(), as these don't actually call os.stat(). All DirEntry methods either call nothing or os.lstat() to get the stat info on the entry itself (not the destination of the symlink). Oh. Extract of your PEP: is_dir(): like os.path.isdir(), but much cheaper. genericpath.isdir() and genericpath.isfile() use os.stat(), whereas posixpath.islink() uses os.lstat(). Is it a mistake in the PEP? Ah, you're dead right -- this is basically a bug in the PEP, as DirEntry.is_dir() is not like os.path.isdir() in that it is based on the entry itself (like lstat), not following the link. I'll improve the wording here and update the PEP. Ok, so it means that your example grouping files per type, files and directories, is also wrong. Or at least, it behaves differently than os.walk(). You should put symbolic links to directories in the dirs list too. if entry.is_dir(): # is_dir() checks os.lstat() dirs.append(entry) elif entry.is_symlink() and os.path.isdir(entry): # isdir() checks os.stat() dirs.append(entry) else: non_dirs.append(entry) Victor ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
Ok, so it means that your example grouping files per type, files and directories, is also wrong. Or at least, it behaves differently than os.walk(). You should put symbolic links to directories in the dirs list too. if entry.is_dir(): # is_dir() checks os.lstat() dirs.append(entry) elif entry.is_symlink() and os.path.isdir(entry): # isdir() checks os.stat() dirs.append(entry) else: non_dirs.append(entry) Yes, good call. I believe I'm doing this wrong in the scandir.py os.walk() implementation too -- hence this open issue: https://github.com/benhoyt/scandir/issues/4 -Ben ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
On 9 July 2014 13:48, Ben Hoyt benh...@gmail.com wrote: Okay folks -- please respond: option #1 as per the current PEP 471, or option #2 with Ethan's multi-level thing tweaks as per the above? I'm probably about 50/50 at the moment. What will swing it for me is likely error handling, so let's try both approaches with some error handling: Rules are that we calculate the total size of all files in a tree (as returned from lstat), with files that fail to stat being logged and their size assumed to be 0. Option 1: def get_tree_size(path): total = 0 for entry in os.scandir(path): try: isdir = entry.is_dir() except OSError: logger.warn(Cannot stat {}.format(entry.full_name)) continue if entry.is_dir(): total += get_tree_size(entry.full_name) else: try: total += entry.lstat().st_size except OSError: logger.warn(Cannot stat {}.format(entry.full_name)) return total Option 2: def log_err(exc): logger.warn(Cannot stat {}.format(exc.filename)) def get_tree_size(path): total = 0 for entry in os.scandir(path, info='lstat', onerror=log_err): if entry.is_dir: total += get_tree_size(entry.full_name) else: total += entry.lstat.st_size return total On this basis, #2 wins. However, I'm slightly uncomfortable using the filename attribute of the exception in the logging, as there is nothing in the docs saying that this will give a full pathname. I'd hate to see Unable to stat __init__.py!!! So maybe the onerror function should also receive the DirEntry object - which will only have the name and full_name attributes, but that's all that is needed. OK, looks like option #2 is now my preferred option. My gut instinct still rebels over an API that deliberately throws information away in the default case, even though there is now an option to ask it to keep that information, but I see the logic and can learn to live with it. Paul ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] == on object tests identity in 3.x
Le 09/07/2014 00:21, Stephen J. Turnbull a écrit : Steven D'Aprano writes: I don't think so. Floating point == represents *numeric* equality, There is no such thing as floating point == in Python. You can apply == to two floating point numbers, but == (at the language level) handles any two numbers, as well as pairs of things that aren't numbers in the Python language. This is becoming pointless hair-splitting. float.__eq__(1.0, 2.0) False float.__eq__(1.0, 2) False float.__eq__(1.0, 1.0+0J) NotImplemented float.__eq__(1, 2) Traceback (most recent call last): File stdin, line 1, in module TypeError: descriptor '__eq__' requires a 'float' object but received a 'int' Please direct any further discussion of this to python-ideas. ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
Option 2: def log_err(exc): logger.warn(Cannot stat {}.format(exc.filename)) def get_tree_size(path): total = 0 for entry in os.scandir(path, info='lstat', onerror=log_err): if entry.is_dir: total += get_tree_size(entry.full_name) else: total += entry.lstat.st_size return total On this basis, #2 wins. That's a pretty nice comparison, and you're right, onerror handling is nicer here. However, I'm slightly uncomfortable using the filename attribute of the exception in the logging, as there is nothing in the docs saying that this will give a full pathname. I'd hate to see Unable to stat __init__.py!!! Huh, you're right. I think this should be documented in os.walk() too. I think it should be the full filename (is it currently?). So maybe the onerror function should also receive the DirEntry object - which will only have the name and full_name attributes, but that's all that is needed. That's an interesting idea -- though enough of a deviation from os.walk()'s onerror that I'm uncomfortable with it -- I'd rather just document that the onerror exception .filename is the full path name. One issue with option #2 that I just realized -- does scandir yield the entry at all if there's a stat error? It can't really, because the caller will except the .lstat attribute to be set (assuming he asked for type='lstat') but it won't be. Is effectively removing these entries just because the stat failed a problem? I kind of think it is. If so, is there a way to solve it with option #2? OK, looks like option #2 is now my preferred option. My gut instinct still rebels over an API that deliberately throws information away in the default case, even though there is now an option to ask it to keep that information, but I see the logic and can learn to live with it. In terms of throwing away info in the default case -- it's simply a case of getting what you ask for. :-) Worst case, you'll write your code and test it, it'll fail hard on any system, you'll fix it immediately, and then it'll work on any system. -Ben ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
On 9 July 2014 14:22, Ben Hoyt benh...@gmail.com wrote: So maybe the onerror function should also receive the DirEntry object - which will only have the name and full_name attributes, but that's all that is needed. That's an interesting idea -- though enough of a deviation from os.walk()'s onerror that I'm uncomfortable with it -- I'd rather just document that the onerror exception .filename is the full path name. But the onerror exception will come from the lstat call, so it'll be a raw OSError (unless scandir modifies it, which may be what you're thinking of). And if so, aren't we at the mercy of what the OS module gives us? That's why I said we can't guarantee it. I looked at the documentation of OSError (in Built In Exceptions), and all it says is the filename (unqualified). I'd expect that to be whatever got passed to the underlying OS API - which may well be an absolute pathname if we're lucky, but who knows? (I'd actually prefer it if OSError guaranteed a full pathname, but that's a bigger issue...) Paul ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
On 07/09/2014 05:48 AM, Ben Hoyt wrote: So how about tweaking option #2 a tiny bit more to this: def scandir(path='.', info=None, onerror=None): ... * if info is None (the default), only the .name and .full_name attributes are present * if info is 'type', scandir ensures the is_dir/is_file/is_symlink attributes are present and either True or False * if info is 'lstat', scandir additionally ensures a .lstat is present and is a full stat_result object * if info is 'os', scandir returns the attributes the OS provides (everything on Windows, only is_X -- most of the time -- on POSIX) I would rather have the default for info be 'os': cross-platform is good, but there is no reason to force it on some poor script that is meant to run on a local machine and will never leave it. * if onerror is not None and errors occur during any internal lstat() call, onerror(exc) is called with the OSError exception object As Paul mentioned, 'onerror(exc, DirEntry)' would be better. Further point -- because the is_dir/is_file/is_symlink attributes are booleans, it would be very bad for them to be present but None if you didn't ask for (or the OS didn't return) the type information. Because then if entry.is_dir: would be None and your code would think it wasn't a directory, when actually you don't know. For this reason, all attributes should fail with AttributeError if not fetched. Fair point, and agreed. -- ~Ethan~ ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
On 07/09/2014 06:22 AM, Ben Hoyt wrote: One issue with option #2 that I just realized -- does scandir yield the entry at all if there's a stat error? It can't really, because the caller will expect the .lstat attribute to be set (assuming he asked for type='lstat') but it won't be. Is effectively removing these entries just because the stat failed a problem? I kind of think it is. If so, is there a way to solve it with option #2? Leave it up to the onerror handler. If it returns None, skip yielding the entry, otherwise yield whatever it returned -- which also means the error handler should be able to set fields on the DirEntry: def log_err(exc, entry): logger.warn(Cannot stat {}.format(exc.filename)) entry.lstat.st_size = 0 return True def get_tree_size(path): total = 0 for entry in os.scandir(path, info='lstat', onerror=log_err): if entry.is_dir: total += get_tree_size(entry.full_name) else: total += entry.lstat.st_size return total This particular example doesn't benefit much from the addition, but this way we don't have to guess what the programmer wants or needs to do in the case of failure. -- ~Ethan~ ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
On 8 Jul 2014, at 15:52, Ben Hoyt wrote: Hi folks, After some very good python-dev feedback on my first version of PEP 471, I've updated the PEP to clarify a few things and added various Rejected ideas subsections. Here's a link to the new version (I've also copied the full text below): http://legacy.python.org/dev/peps/pep-0471/ -- new PEP as HTML http://hg.python.org/peps/rev/0da4736c27e8 -- changes [...] Rejected ideas == [...] Return values being pathlib.Path objects With Antoine Pitrou's new standard library ``pathlib`` module, it at first seems like a great idea for ``scandir()`` to return instances of ``pathlib.Path``. However, ``pathlib.Path``'s ``is_X()`` and ``lstat()`` functions are explicitly not cached, whereas ``scandir`` has to cache them by design, because it's (often) returning values from the original directory iteration system call. And if the ``pathlib.Path`` instances returned by ``scandir`` cached lstat values, but the ordinary ``pathlib.Path`` objects explicitly don't, that would be more than a little confusing. Guido van Rossum explicitly rejected ``pathlib.Path`` caching lstat in the context of scandir `here https://mail.python.org/pipermail/python-dev/2013-November/130583.html`_, making ``pathlib.Path`` objects a bad choice for scandir return values. Can we at least make sure that attributes of DirEntry that have the same meaning as attributes of pathlib.Path have the same name? [...] Servus, Walter ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
On 07/09/2014 06:41 AM, Ethan Furman wrote: Leave it up to the onerror handler. If it returns None, skip yielding the entry, otherwise yield whatever it returned -- which also means the error handler should be able to set fields on the DirEntry: def log_err(exc, entry): logger.warn(Cannot stat {}.format(exc.filename)) entry.lstat.st_size = 0 return True Blah. Okay, either return the DirEntry (possibly modified), or have the log_err return entry instead of True. (Now where is that caffeine??) -- ~Ethan~ ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
2014-07-09 15:12 GMT+02:00 Ben Hoyt benh...@gmail.com: Ok, so it means that your example grouping files per type, files and directories, is also wrong. Or at least, it behaves differently than os.walk(). You should put symbolic links to directories in the dirs list too. if entry.is_dir(): # is_dir() checks os.lstat() dirs.append(entry) elif entry.is_symlink() and os.path.isdir(entry): # isdir() checks os.stat() dirs.append(entry) else: non_dirs.append(entry) Yes, good call. I believe I'm doing this wrong in the scandir.py os.walk() implementation too -- hence this open issue: https://github.com/benhoyt/scandir/issues/4 The PEP says that DirEntry should mimic pathlib.Path, so I think that DirEntry.is_dir() should work as os.path.isir(): if the entry is a symbolic link, you should follow the symlink to get the status of the linked file with os.stat(). entry.is_dir() or (entry.is_symlink() and os.path.isdir(entry)) looks wrong: why would you have to check is_dir() and isdir()? Duplicating this check is error prone and not convinient. Pseudo-code: --- class DirEntry: def __init__(self, lstat=None, d_type=None): self._stat = None self._lstat = lstat self._d_type = d_type def stat(self): if self._stat is None: self._stat = os.stat(self.full_name) return self._stat def lstat(self): if self._lstat is None: self._lstat = os.lstat(self.full_name) return self._lstat def is_dir(self): if self._d_type is not None: if self._d_type == DT_DIR: return True if self._d_type != DT_LNK: return False else: lstat = self.lstat() if stat.S_ISDIR(lstat.st_mode): return True if not stat.S_ISLNK(lstat.st_mode): return False stat = self.stat() return stat.S_ISDIR(stat.st_mode) --- DirEntry would be created with lstat (Windows) or d_type (Linux) prefilled. is_dir() would only need to call os.stat() once for symbolic links. With this code, it becomes even more obvious why is_dir() is a method and not a property ;-) Victor ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
On 9 July 2014 16:05, Victor Stinner victor.stin...@gmail.com wrote: The PEP says that DirEntry should mimic pathlib.Path, so I think that DirEntry.is_dir() should work as os.path.isir(): if the entry is a symbolic link, you should follow the symlink to get the status of the linked file with os.stat(). Would this not break the tree size script being discussed in the other thread, as it would follow links and include linked directories in the size of the tree? As a Windows user with only a superficial understanding of how symlinks should behave, I don't have any intuition as to what the right answer should be. But I would say that the tree size code we've been debating over there (which recurses if is_dir is true and adds in st_size otherwise) should do whatever people would expect of a function with that name, as it's a perfect example of something a Windows user might write and expect it to work cross-platform. If it doesn't much of the worrying over making sure scandir's API is cross-platform by default is probably being wasted :-) (Obviously the walk_tree function could be modified if needed, but that's missing the point I'm trying to make :-)) Paul ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
The PEP says that DirEntry should mimic pathlib.Path, so I think that DirEntry.is_dir() should work as os.path.isir(): if the entry is a symbolic link, you should follow the symlink to get the status of the linked file with os.stat(). Would this not break the tree size script being discussed in the other thread, as it would follow links and include linked directories in the size of the tree? Yeah, I agree. Victor -- I don't think the DirEntry is_X() methods (or attributes) should mimic the link-following os.path.isdir() at all. You want the type of the entry, not the type of the source. Otherwise, as Paul says, you are essentially forced to follow links, and os.walk(followlinks=False), which is the default, can't do the right thing. -Ben ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
One issue with option #2 that I just realized -- does scandir yield the entry at all if there's a stat error? It can't really, because the caller will expect the .lstat attribute to be set (assuming he asked for type='lstat') but it won't be. Is effectively removing these entries just because the stat failed a problem? I kind of think it is. If so, is there a way to solve it with option #2? Leave it up to the onerror handler. If it returns None, skip yielding the entry, otherwise yield whatever it returned -- which also means the error handler should be able to set fields on the DirEntry: def log_err(exc, entry): logger.warn(Cannot stat {}.format(exc.filename)) entry.lstat.st_size = 0 return True This is an interesting idea, but it's just getting more and more complex, and I'm guessing that being able to change the attributes of DirEntry will make the C implementation more complex. Also, I'm not sure it's very workable. For log_err above, you'd actually have to do something like this, right? def log_err(exc, entry): logger.warn(Cannot stat {}.format(exc.filename)) entry.lstat = os.stat_result((0, 0, 0, 0, 0, 0, 0, 0, 0, 0)) return entry Unless there's another simple way around this issue, I'm back to loving the simplicity of option #1, which avoids this whole question. -Ben ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
On 9 July 2014 14:22, Ben Hoyt benh...@gmail.com wrote: One issue with option #2 that I just realized -- does scandir yield the entry at all if there's a stat error? It can't really, because the caller will except the .lstat attribute to be set (assuming he asked for type='lstat') but it won't be. Is effectively removing these entries just because the stat failed a problem? I kind of think it is. If so, is there a way to solve it with option #2? So the issue is that you need to do a stat but it failed. You have whatever the OS gave you, but can't get anything more. This is only an issue on POSIX, where the original OS call doesn't give you everything, so it's fine, those POSIX people can just learn to cope with their broken OS, right? :-) More seriously, why not just return a DirEntry that says it's a file with a stat entry that's all zeroes? That seems pretty harmless. And the onerror function will be called, so if it is inappropriate the application can do something. Maybe it's worth letting onerror return a boolean that says whether to skip the entry, but that's as far as I'd bother going. It's a close call, but I think option #2 still wins (just) for me. Paul ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
On 07/09/2014 08:35 AM, Ben Hoyt wrote: One issue with option #2 that I just realized -- does scandir yield the entry at all if there's a stat error? It can't really, because the caller will expect the .lstat attribute to be set (assuming he asked for type='lstat') but it won't be. Is effectively removing these entries just because the stat failed a problem? I kind of think it is. If so, is there a way to solve it with option #2? Leave it up to the onerror handler. If it returns None, skip yielding the entry, otherwise yield whatever it returned -- which also means the error handler should be able to set fields on the DirEntry: def log_err(exc, entry): logger.warn(Cannot stat {}.format(exc.filename)) entry.lstat.st_size = 0 return True This is an interesting idea, but it's just getting more and more complex, and I'm guessing that being able to change the attributes of DirEntry will make the C implementation more complex. Also, I'm not sure it's very workable. For log_err above, you'd actually have to do something like this, right? def log_err(exc, entry): logger.warn(Cannot stat {}.format(exc.filename)) entry.lstat = os.stat_result((0, 0, 0, 0, 0, 0, 0, 0, 0, 0)) return entry I would imagine we would provide a helper function: def stat_result(st_size=0, st_atime=0, st_mtime=0, ...): return os.stat_result((st_size, st_atime, st_mtime, ...)) and then in onerror entry.lstat = stat_result() Unless there's another simple way around this issue, I'm back to loving the simplicity of option #1, which avoids this whole question. Too simple is just as bad as too complex, and properly handling errors is rarely a simple task. Either we provide a clean way to deal with errors in the API, or we force every user everywhere to come up with their own system. Also, just because we provide it doesn't force people to use it, but if we don't provide it then people cannot use it. To summarize the choice I think we are looking at: 1) We provide a very basic tool that many will have to write wrappers around to get the desired behavior (choice 1) 2) We provide a more advanced tool that, in many cases, can be used as-is, and is also fairly easy to extend to handle odd situations (choice 2) More specifically, if we go with choice 1 (no built-in error handling, no mutable DirEntry), how would I implement choice 2? Would I have to write my own CustomDirEntry object? -- ~Ethan~ ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
On 9 July 2014 17:35, Ethan Furman et...@stoneleaf.us wrote: More specifically, if we go with choice 1 (no built-in error handling, no mutable DirEntry), how would I implement choice 2? Would I have to write my own CustomDirEntry object? Having built-in error handling is, I think, a key point. That's where #1 really falls down. But a mutable DirEntry and/or letting onerror manipulate the result is a lot more than just having a hook for being notified of errors. That seems to me to be a step too far, in the current context. Specifically, the tree size example doesn't need it. Do you have a compelling use case that needs a mutable DirEntry? It feels like YAGNI to me. Paul ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
On 07/09/2014 11:04 AM, Paul Moore wrote: On 9 July 2014 17:35, Ethan Furman et...@stoneleaf.us wrote: More specifically, if we go with choice 1 (no built-in error handling, no mutable DirEntry), how would I implement choice 2? Would I have to write my own CustomDirEntry object? Having built-in error handling is, I think, a key point. That's where #1 really falls down. But a mutable DirEntry and/or letting onerror manipulate the result is a lot more than just having a hook for being notified of errors. That seems to me to be a step too far, in the current context. Specifically, the tree size example doesn't need it. Do you have a compelling use case that needs a mutable DirEntry? It feels like YAGNI to me. Not at this point. As I indicated in my reply to your response, as long as we have the onerror machinery now we can tweak it later if real-world use shows it would be beneficial. -- ~Ethan~ ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
This is just getting way too complex ... further thoughts below. This is an interesting idea, but it's just getting more and more complex, and I'm guessing that being able to change the attributes of DirEntry will make the C implementation more complex. Also, I'm not sure it's very workable. For log_err above, you'd actually have to do something like this, right? def log_err(exc, entry): logger.warn(Cannot stat {}.format(exc.filename)) entry.lstat = os.stat_result((0, 0, 0, 0, 0, 0, 0, 0, 0, 0)) return entry I would imagine we would provide a helper function: def stat_result(st_size=0, st_atime=0, st_mtime=0, ...): return os.stat_result((st_size, st_atime, st_mtime, ...)) and then in onerror entry.lstat = stat_result() Unless there's another simple way around this issue, I'm back to loving the simplicity of option #1, which avoids this whole question. Too simple is just as bad as too complex, and properly handling errors is rarely a simple task. Either we provide a clean way to deal with errors in the API, or we force every user everywhere to come up with their own system. Also, just because we provide it doesn't force people to use it, but if we don't provide it then people cannot use it. So here's the ways in which option #2 is now more complicated than option #1: 1) it has an additional info argument, the values of which have to be documented ('os', 'type', 'lstat', and what each one means) 2) it has an additional onerror argument, the signature of which and fairly complicated return value is non-obvious and has to be documented 3) it requires user modification of the DirEntry object, which needs documentation, and is potentially hard to implement 4) because the DirEntry object now allows modification, you need a stat_result() helper function to help you build your own stat values I'm afraid points 3 and 4 here add way too much complexity. Remind me why all this is better than the PEP 471 approach again? It handles all of these problems, is very direct, and uses built-in Python constructs (method calls and try/except error handling). And it's also simple to document -- much simpler than the above 4 things, which could be a couple of pages in the docs. Here's the doc required for the PEP 471 approach: Note about caching and error handling: The is_X() and lstat() functions may perform an lstat() on first call if the OS didn't already fetch this data when reading the directory. So if you need fine-grained error handling, catch OSError exceptions around these method calls. After the first call, the is_X() and lstat() functions cache the value on the DirEntry. -Ben ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
On 07/09/2014 12:03 PM, Ben Hoyt wrote: So here's the ways in which option #2 is now more complicated than option #1: 1) it has an additional info argument, the values of which have to be documented ('os', 'type', 'lstat', and what each one means) 2) it has an additional onerror argument, the signature of which and fairly complicated return value is non-obvious and has to be documented 3) it requires user modification of the DirEntry object, which needs documentation, and is potentially hard to implement 4) because the DirEntry object now allows modification, you need a stat_result() helper function to help you build your own stat values I'm afraid points 3 and 4 here add way too much complexity. I'm okay with dropping 3 and 4, and making the return from onerror being simply True to yield the entry, and False/None to skip it. That should make implementation much easier, and documentation not too strenuous either. -- ~Ethan~ ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
1) it has an additional info argument, the values of which have to be documented ('os', 'type', 'lstat', and what each one means) 2) it has an additional onerror argument, the signature of which and fairly complicated return value is non-obvious and has to be documented 3) it requires user modification of the DirEntry object, which needs documentation, and is potentially hard to implement 4) because the DirEntry object now allows modification, you need a stat_result() helper function to help you build your own stat values I'm afraid points 3 and 4 here add way too much complexity. I'm okay with dropping 3 and 4, and making the return from onerror being simply True to yield the entry, and False/None to skip it. That should make implementation much easier, and documentation not too strenuous either. That's definitely better in terms of complexity. Other python-devers, please chime in with your thoughts or votes. -Ben ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
2014-07-09 21:59 GMT+02:00 Ben Hoyt benh...@gmail.com: Other python-devers, please chime in with your thoughts or votes. Sorry, I didn't follow the whole discussion. IMO DirEntry must use methods and you should not expose nor document which infos are already provided by the OS or not. DirEntry should be a best-effort black-box object providing an API similar to pathlib.Path. is_dir() may be fast? fine, but don't say it in the documentation because Python must remain portable and you should not write code specific to one specific platform. is_dir(), is_file(), is_symlink() and lstat() can fail as any other Python function, no need to specialize them with custom error handler. If you care, just use a very standard try/except. I'm also against ensure_lstat=True or ideas like that fetching all datas transparently in the generator. The behaviour would be too different depending on the OS, and usually you don't need all informations. And it raises errors in the generator, which is something unusual, and difficult to handle (I don't like the onerror callback). Example where you may sometimes need is_dir(), but not always --- for entry in os.scandir(path): if ignore_entry(entry.name): # this entry is not interesting, lstat_result is useless here continue if entry.is_dir(): # fetch required data if needed continue ... --- I don't understand why you are all focused on handling os.stat() and os.lstat() errors. See for example the os.walk() function which is an old function (python 2.6!): it doesn't catch erros on isdir(), even if it has an onerror parameter... It only handles errors on listdir(). IMO errors on os.stat() and os.lstat() are very rare under very specific conditions. The most common case is that you can get the status if you can list files. Victor ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
On 9 July 2014 21:24, Victor Stinner victor.stin...@gmail.com wrote: Example where you may sometimes need is_dir(), but not always --- for entry in os.scandir(path): if ignore_entry(entry.name): # this entry is not interesting, lstat_result is useless here continue if entry.is_dir(): # fetch required data if needed continue ... --- That is an extremely good point, and articulates why I've always been a bit uncomfortable with the whole ensure_stat idea. I don't understand why you are all focused on handling os.stat() and os.lstat() errors. See for example the os.walk() function which is an old function (python 2.6!): it doesn't catch erros on isdir(), even if it has an onerror parameter... It only handles errors on listdir(). IMO errors on os.stat() and os.lstat() are very rare under very specific conditions. The most common case is that you can get the status if you can list files. Personally, I'm only focused on it as a response to others feeling it's important. I'm on Windows, where there are no extra stat calls, so all *I* care about is having an API that deals with the use cases others are concerned about without making it too hard for me to use it on Windows where I don't have to worry about all this. If POSIX users come to a consensus that error handling doesn't need special treatment, I'm more than happy to go back to the PEP version. (Much as previously happened with the race condition debate). Paul ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
On 07/09/2014 01:57 PM, Paul Moore wrote: On 9 July 2014 21:24, Victor Stinner wrote: Example where you may sometimes need is_dir(), but not always --- for entry in os.scandir(path): if ignore_entry(entry.name): # this entry is not interesting, lstat_result is useless here continue if entry.is_dir(): # fetch required data if needed continue ... That is an extremely good point, and articulates why I've always been a bit uncomfortable with the whole ensure_stat idea. On a system which did not supply is_dir automatically I would write that as: for entry in os.scandir(path): # info defaults to 'os', which is basically None in this case if ignore_entry(entry.name): continue if os.path.isdir(entry.full_name): # do something interesting Not hard to read or understand, no time wasted in unnecessary lstat calls. -- ~Ethan~ ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
On 07/09/2014 01:24 PM, Victor Stinner wrote: Sorry, I didn't follow the whole discussion. IMO DirEntry must use methods and you should not expose nor document which infos are already provided by the OS or not. DirEntry should be a best-effort black-box object providing an API similar to pathlib.Path. is_dir() may be fast? fine, but don't say it in the documentation because Python must remain portable and you should not write code specific to one specific platform. Okay, so using that logic we should head over to the os module and remove: ctermid, getenv, getegid, geteuid, getgid, getgrouplist, getgroups, getpgid, getpgrp, getpriority, PRIO_PROCESS, PRIO_PGRP, PRIO_USER, getresuid, getresgid, getuid, initgroups, putenv, setegid, seteuid, setgid, setgroups, setpriority, setregid, setrusgid, setresuid, setreuid, getsid, setsid, setuid, unsetenv, fchmod, fchown, fdatasync, fpathconf, fstatvfs, ftruncate, lockf, F_LOCK, F_TLOCK, F_ULOCK, F_TEST, O_DSYNC, O_RSYNC, O_SYNC, O_NDELAY, O_NONBLOCK, O_NOCTTY, O_SHLOCK, O_EXLOCK, O_CLOEXEC, O_BINARY, O_NOINHERIT, O_SHORT_LIVED, O_TEMPORARY, O_RANDOM, O_SEQUENTIAL, O_TEXT, ... Okay, I'm tired of typing, but that list is not even half-way through the os page, and those are all methods or attributes that are not available on either Windows or Unix or some flavors of Unix. Oh, and all those upper-case attributes? Yup, documented. And when we don't document it ourselves we often refer readers to their system documentation because Python does not, in fact, return exactly the same results on all platforms -- particularly when calling into the OS. -- ~Ethan~ ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
On a system which did not supply is_dir automatically I would write that as: for entry in os.scandir(path): # info defaults to 'os', which is basically None in this case if ignore_entry(entry.name): continue if os.path.isdir(entry.full_name): # do something interesting Not hard to read or understand, no time wasted in unnecessary lstat calls. No, but how do you know whether you're on a system which did not supply is_dir automatically? The above is not cross-platform, or at least, not efficient cross-platform, which defeats the whole point of scandir -- the above is no better than listdir(). -Ben ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
I really don't understand why you *want* a worse, much less cross-platform API? Okay, so using that logic we should head over to the os module and remove: ctermid, getenv, getegid... Okay, I'm tired of typing, but that list is not even half-way through the os page, and those are all methods or attributes that are not available on either Windows or Unix or some flavors of Unix. True, is this really the precedent we want to *aim for*. listdir() is cross-platform, and it's relatively easy to make scandir() cross-platform, so why not? Oh, and all those upper-case attributes? Yup, documented. And when we don't document it ourselves we often refer readers to their system documentation because Python does not, in fact, return exactly the same results on all platforms -- particularly when calling into the OS. But again, why a worse, less cross-platform API when a simple, cross-platform one is a method call away? -Ben ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
2014-07-09 22:44 GMT+02:00 Ethan Furman et...@stoneleaf.us: On 07/09/2014 01:24 PM, Victor Stinner wrote: Sorry, I didn't follow the whole discussion. IMO DirEntry must use methods and you should not expose nor document which infos are already provided by the OS or not. DirEntry should be a best-effort black-box object providing an API similar to pathlib.Path. is_dir() may be fast? fine, but don't say it in the documentation because Python must remain portable and you should not write code specific to one specific platform. Okay, so using that logic we should head over to the os module and remove: (...) My comment was specific to the PEP 471, design of the DirEntry class. Victor ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
On 07/09/2014 02:42 PM, Ben Hoyt wrote: Okay, so using that [no platform specific] logic we should head over to the os module and remove: ctermid, getenv, getegid... Okay, I'm tired of typing, but that list is not even half-way through the os page, and those are all methods or attributes that are not available on either Windows or Unix or some flavors of Unix. True, is this really the precedent we want to *aim for*. listdir() is cross-platform, and listdir has serious performance issues, which is why you developed scandir. Oh, and all those [snipped] upper-case attributes? Yup, documented. And when we don't document it ourselves we often refer readers to their system documentation because Python does not, in fact, return exactly the same results on all platforms -- particularly when calling into the OS. But again, why a worse, less cross-platform API when a simple, cross-platform one is a method call away? For the same reason we don't use code that makes threaded behavior better, but kills the single thread application. If the programmer would rather have consistency on all platforms rather than performance on the one being used, `info='lstat'` is the option to use. I like the 'onerror' API better primarily because it gives a single point to deal with the errors. This has at least a couple advantages: - less duplication of code: in the tree_size example, the error handling is duplicated twice - readablity: with the error handling in a separate routine, one does not have to jump around the try/except blocks looking for what happens if there are no errors -- ~Ethan~ ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
On 07/09/2014 02:38 PM, Victor Stinner wrote: 2014-07-09 22:44 GMT+02:00 Ethan Furman: On 07/09/2014 01:24 PM, Victor Stinner wrote: [...] Python must remain portable and you should not write code specific to one specific platform. Okay, so using that logic we should head over to the os module and remove: (...) My comment was specific to the PEP 471, design of the DirEntry class. And my comment was to the point of there being methods/attributes/return values that /do/ vary by platform, and /are/ documented as such. Even stat itself is not the same on Windows as posix. -- ~Ethan~ ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
On 07/09/2014 02:33 PM, Ben Hoyt wrote: On a system which did not supply is_dir automatically I would write that as: for entry in os.scandir(path): if ignore_entry(entry.name): continue if os.path.isdir(entry.full_name): # do something interesting Not hard to read or understand, no time wasted in unnecessary lstat calls. No, but how do you know whether you're on a system which did not supply is_dir automatically? The above is not cross-platform, or at least, not efficient cross-platform, which defeats the whole point of scandir -- the above is no better than listdir(). Hit a directory with 100,000 entries and you'll change your mind. ;) Okay, so the issue is you /want/ to write an efficient, cross-platform routine... hrmmm. thinking Okay, marry the two ideas together: scandir(path, info=None, onerror=None) Return a generator that returns one directory entry at a time in a DirEntry object info: None -- DirEntries will have whatever attributes the O/S provides 'type' -- DirEntries will already have at least the file/dir distinction 'stat' -- DirEntries will also already have stat information DirEntry.is_dir() Return True if this is a directory-type entry; may call os.lstat if the cache is empty. DirEntry.is_file() Return True if this is a file-type entry; may call os.lstat if the cache is empty. DirEntry.is_symlink() Return True if this is a symbolic link; may call os.lstat if the cache is empty. DirEntry.stat Return the stat info for this link; may call os.lstat if the cache is empty. This way both paradigms are supported. -- ~Ethan~ ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
On 2014-07-09 23:50, Ethan Furman wrote: On 07/09/2014 02:33 PM, Ben Hoyt wrote: On a system which did not supply is_dir automatically I would write that as: for entry in os.scandir(path): if ignore_entry(entry.name): continue if os.path.isdir(entry.full_name): # do something interesting Not hard to read or understand, no time wasted in unnecessary lstat calls. No, but how do you know whether you're on a system which did not supply is_dir automatically? The above is not cross-platform, or at least, not efficient cross-platform, which defeats the whole point of scandir -- the above is no better than listdir(). Hit a directory with 100,000 entries and you'll change your mind. ;) Okay, so the issue is you /want/ to write an efficient, cross-platform routine... hrmmm. thinking Okay, marry the two ideas together: scandir(path, info=None, onerror=None) Return a generator that returns one directory entry at a time in a DirEntry object Should that be that yields one directory entry at a time? info: None -- DirEntries will have whatever attributes the O/S provides 'type' -- DirEntries will already have at least the file/dir distinction 'stat' -- DirEntries will also already have stat information DirEntry.is_dir() Return True if this is a directory-type entry; may call os.lstat if the cache is empty. DirEntry.is_file() Return True if this is a file-type entry; may call os.lstat if the cache is empty. DirEntry.is_symlink() Return True if this is a symbolic link; may call os.lstat if the cache is empty. DirEntry.stat Return the stat info for this link; may call os.lstat if the cache is empty. Why is is_dir, et al, functions, but stat not a function? This way both paradigms are supported. ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
On 07/09/2014 04:22 PM, MRAB wrote: On 2014-07-09 23:50, Ethan Furman wrote: Okay, marry the two ideas together: scandir(path, info=None, onerror=None) Return a generator that returns one directory entry at a time in a DirEntry object Should that be that yields one directory entry at a time? Yes, thanks. info: None -- DirEntries will have whatever attributes the O/S provides 'type' -- DirEntries will already have at least the file/dir distinction 'stat' -- DirEntries will also already have stat information DirEntry.is_dir() Return True if this is a directory-type entry; may call os.lstat if the cache is empty. DirEntry.is_file() Return True if this is a file-type entry; may call os.lstat if the cache is empty. DirEntry.is_symlink() Return True if this is a symbolic link; may call os.lstat if the cache is empty. DirEntry.stat Return the stat info for this link; may call os.lstat if the cache is empty. Why is is_dir, et al, functions, but stat not a function? Good point. Make stat a function as well. -- ~Ethan~ ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
2014-07-09 17:29 GMT+02:00 Ben Hoyt benh...@gmail.com: Would this not break the tree size script being discussed in the other thread, as it would follow links and include linked directories in the size of the tree? The get_tree_size() function in the PEP would use: if not entry.is_symlink() and entry.is_dir():. Note: First I wrote if entry.is_dir() and not entry.is_symlink():, but this syntax is slower on Linux because is_dir() has to call lstat(). Adding an optional keyword to DirEntry.is_dir() would allow to write if entry.is_dir(follow_symlink=False), but it looks like a micro optimization and as I said, I prefer to stick to pathlib.Path API (which was already heavily discussed in its PEP). Anyway, this case is rare (I explain that below), we should not worry too much about it. Yeah, I agree. Victor -- I don't think the DirEntry is_X() methods (or attributes) should mimic the link-following os.path.isdir() at all. You want the type of the entry, not the type of the source. On UNIX, a symlink to a directory is expected to behave like a directory. For example, in a file browser, you should enter in the linked directory when you click on a symlink to a directory. There are only a few cases where you want to handle symlinks differently: archive (ex: tar), compute the size of a directory (ex: du does not follow symlinks by default, du -L follows them), remove a directory. You should do a short poll in the Python stdlib and on the Internet to check what is the most common check. Examples of the Python stdlib: - zipfile: listdir + os.path.isdir - pkgutil: listdir + os.path.isdir - unittest.loader: listdir + os.path.isdir and os.path.isfile - http.server: listdir + os.path.isdir, it also uses os.path.islink: Append / for directories or @ for symbolic links - idlelib.GrepDialog: listdir + os.path.isdir - compileall: listdir + os.path.isdir and os.path.isdir(fullname) and not os.path.islink(fullname) = don't follow symlinks to directories - shutil (copytree): listdir + os.path.isdir + os.path.islink - shutil (rmtree): listdir + os.lstat() + stat.S_ISDIR(mode) = don't follow symlinks to directories - mailbox: listdir + os.path.isdir - tabnanny: listdir + os.path.isdir - os.walk: listdir + os.path.isdir + os.path.islink = don't follow symlinks to directories by default, but the behaviour is configurable ... but symlinks to directories are added to the dirs list (not all symlinks, only symlinks to directories) - setup.py: listdir + os.path.isfile In this list of 12 examples, only compileall, shutil.rmtree and os.walk check if entries are symlinks. compileall starts by checking if not os.path.isdir(fullname): which follows symlinks. os.walk() starts by checking if os.path.isdir(name): which follows symlinks. I consider that only one case on 12 (8.3%) doesn't follow symlinks. If entry.is_dir() doesn't follow symlinks, the other 91.7% will need to be modified to use if entry.is_dir() or (entry.is_link() and os.path.is_dir(entry.full_name)): to keep the same behaviour :-( Otherwise, as Paul says, you are essentially forced to follow links, and os.walk(followlinks=False), which is the default, can't do the right thing. os.walk() and get_tree_size() are good users of scandir(), but they are recursive functions. It means that you may handle symlinks differently, os.walk() gives the choice to follow or not symlinks for example. Recursive functions are rare. The most common case is to list files of a single directory and then filter files depending on various filters (is a file? is a directory? match the file name? ...). In such use case, you don't care of symlinks (you want to follow them). Victor ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
2014-07-09 17:26 GMT+02:00 Paul Moore p.f.mo...@gmail.com: On 9 July 2014 16:05, Victor Stinner victor.stin...@gmail.com wrote: The PEP says that DirEntry should mimic pathlib.Path, so I think that DirEntry.is_dir() should work as os.path.isir(): if the entry is a symbolic link, you should follow the symlink to get the status of the linked file with os.stat(). (...) As a Windows user with only a superficial understanding of how symlinks should behave, (...) FYI Windows also supports symbolic links since Windows Vista. The feature is unknown because it is restricted to the administrator account. Try the mklink command in a terminal (cmd.exe) ;-) http://en.wikipedia.org/wiki/NTFS_symbolic_link ... To be honest, I never created a symlink on Windows. But since it is supported, you need to know it to write correctly your Windows code. (It's unrelated to LNK files.) Victor ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
Ben Hoyt benh...@gmail.com writes: So here's the ways in which option #2 is now more complicated than option #1: 1) it has an additional info argument, the values of which have to be documented ('os', 'type', 'lstat', and what each one means) 2) it has an additional onerror argument, the signature of which and fairly complicated return value is non-obvious and has to be documented 3) it requires user modification of the DirEntry object, which needs documentation, and is potentially hard to implement 4) because the DirEntry object now allows modification, you need a stat_result() helper function to help you build your own stat values I'm afraid points 3 and 4 here add way too much complexity. Points 3 and 4 are not required to go with option #2, option #2 merely allows to implement points 3 and 4 at some point in the future if it turns out to be desirable. Best, -Nikolaus -- GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F »Time flies like an arrow, fruit flies like a Banana.« ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
On 07/09/2014 05:15 PM, Victor Stinner wrote: 2014-07-09 17:29 GMT+02:00 Ben Hoyt benh...@gmail.com: Would this not break the tree size script being discussed in the other thread, as it would follow links and include linked directories in the size of the tree? The get_tree_size() function in the PEP would use: if not entry.is_symlink() and entry.is_dir():. Note: First I wrote if entry.is_dir() and not entry.is_symlink():, but this syntax is slower on Linux because is_dir() has to call lstat(). Wouldn't it only have to call lstat if the entry was, in fact, a link? There are only a few cases where you want to handle symlinks differently: archive (ex: tar), compute the size of a directory (ex: du does not follow symlinks by default, du -L follows them), remove a directory. I agree with Victor here. If the entry is a link I would want to know if it was a link to a directory or a link to a file. If I care about not following sym links I can check is_symlink() (or whatever it's called). -- ~Ethan~ ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
Oh, since I'm proposing to add a new stat() method to DirEntry, we can optimize it. stat() can reuse lstat() result if the file is not a symlink. It simplifies is_dir(). New pseudo-code: --- class DirEntry: def __init__(self, path, name, lstat=None, d_type=None): self.name = name self.full_name = os.path.join(path, name) # lstat is known on Windows self._lstat = lstat if lstat is not None and not stat.S_ISLNK(lstat.st_mode): # On Windows, stat() only calls os.stat() for symlinks self._stat = lstat else: self._stat = None # d_type is known on UNIX if d_type != DT_UNKNOWN: self._d_type = d_type else: # DT_UNKNOWN is not a very useful information :-p self._d_type = None def stat(self): if self._stat is None: self._stat = os.stat(self.full_name) return self._stat def lstat(self): if self._lstat is None: self._lstat = os.lstat(self.full_name) if self._stat is None and not stat.S_ISLNK(self._lstat.st_mode): self._stat = lstat return self._lstat def is_dir(self): if self._d_type is not None: if self._d_type == DT_DIR: return True if self._d_type != DT_LNK: return False else: lstat = self.lstat() if stat.S_ISDIR(lstat.st_mode): return True stat = self.stat() # if lstat() was already called, stat() will only call os.stat() for symlink return stat.S_ISDIR(stat.st_mode) --- The extra caching rules are complex, that's why I suggest to not document them. In short: is_dir() only needs an extra syscall for symlinks, for other file types it does not need any syscall. Victor ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
On 10 July 2014 10:23, Victor Stinner victor.stin...@gmail.com wrote: 2014-07-09 17:26 GMT+02:00 Paul Moore p.f.mo...@gmail.com: On 9 July 2014 16:05, Victor Stinner victor.stin...@gmail.com wrote: The PEP says that DirEntry should mimic pathlib.Path, so I think that DirEntry.is_dir() should work as os.path.isir(): if the entry is a symbolic link, you should follow the symlink to get the status of the linked file with os.stat(). (...) As a Windows user with only a superficial understanding of how symlinks should behave, (...) FYI Windows also supports symbolic links since Windows Vista. The feature is unknown because it is restricted to the administrator account. Try the mklink command in a terminal (cmd.exe) ;-) http://en.wikipedia.org/wiki/NTFS_symbolic_link ... To be honest, I never created a symlink on Windows. But since it is supported, you need to know it to write correctly your Windows code. Personally, I create them all the time on Windows - mainly via the Link Shell Extension http://schinagl.priv.at/nt/hardlinkshellext/linkshellextension.html. It's the easiest way to ensure that my directory structures are as I want them whilst not worrying about where the files really are e.g. code on SSD, GB+-sized data files on rusty metal, symlinks makes it look like it's the same directory structure. Same thing can be done with junctions if you're only dealing with directories, but symlinks work with files as well. I work cross-platform, and have a mild preference for option #2 with similar semantics on all platforms. Tim Delaney ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
Ben Hoyt benh...@gmail.com writes: ... ``scandir()`` yields a ``DirEntry`` object for each file and directory in ``path``. Just like ``listdir``, the ``'.'`` and ``'..'`` pseudo-directories are skipped, and the entries are yielded in system-dependent order. Each ``DirEntry`` object has the following attributes and methods: * ``name``: the entry's filename, relative to the ``path`` argument (corresponds to the return values of ``os.listdir``) * ``full_name``: the entry's full path name -- the equivalent of ``os.path.join(path, entry.name)`` I suggest renaming .full_name - .path .full_name might be misleading e.g., it implies that .full_name == abspath(.full_name) that might be false. The .path name has no such associations. The semantics of the the .path attribute is defined by these assertions:: for entry in os.scandir(topdir): #NOTE: assume os.path.normpath(topdir) is not called to create .path assert entry.path == os.path.join(topdir, entry.name) assert entry.name == os.path.basename(entry.path) assert entry.name == os.path.relpath(entry.path, start=topdir) assert os.path.dirname(entry.path) == topdir assert (entry.path != os.path.abspath(entry.path) or os.path.isabs(topdir)) # it is absolute only if topdir is assert (entry.path != os.path.realpath(entry.path) or topdir == os.path.realpath(topdir)) # symlinks are not resolved assert (entry.path != os.path.normcase(entry.path) or topdir == os.path.normcase(topdir)) # no case-folding, # unlike PureWindowsPath ... * ``is_dir()``: like ``os.path.isdir()``, but much cheaper -- it never requires a system call on Windows, and usually doesn't on POSIX systems I suggest documenting the implicit follow_symlinks parameter for .is_X methods. Note: lstat == partial(stat, follow_symlinks=False). In particular, .is_dir() should probably use follow_symlinks=True by default as suggested by Victor Stinner *if .is_dir() does it on Windows* MSDN says: GetFileAttributes() does not follow symlinks. os.path.isdir docs imply follow_symlinks=True: both islink() and isdir() can be true for the same path. ... Like the other functions in the ``os`` module, ``scandir()`` accepts either a bytes or str object for the ``path`` parameter, and returns the ``DirEntry.name`` and ``DirEntry.full_name`` attributes with the same type as ``path``. However, it is *strongly recommended* to use the str type, as this ensures cross-platform support for Unicode filenames. Document when {e.name for e in os.scandir(path)} != set(os.listdir(path)) + e.g., path can be an open file descriptor in os.listdir(path) since Python 3.3 but the PEP doesn't mention it explicitly. It has been discussed already e.g., https://mail.python.org/pipermail/python-dev/2014-July/135296.html PEP 471 should explicitly reject the support for specifying a file descriptor so that a code that uses os.scandir may assume that entry.path (.full_name) attribute is always present (no exceptions due to a failure to read /proc/self/fd/NNN or an error while calling fcntl(F_GETPATH) or GetFileInformationByHandleEx() -- see http://stackoverflow.com/q/1188757 ). Reject explicitly in PEP 471 the support for dir_fd parameter + aka the support for paths relative to directory descriptors. Note: it is a *different* (but related) issue. ... Notes on exception handling --- ``DirEntry.is_X()`` and ``DirEntry.lstat()`` are explicitly methods rather than attributes or properties, to make it clear that they may not be cheap operations, and they may do a system call. As a result, these methods may raise ``OSError``. For example, ``DirEntry.lstat()`` will always make a system call on POSIX-based systems, and the ``DirEntry.is_X()`` methods will make a ``stat()`` system call on such systems if ``readdir()`` returns a ``d_type`` with a value of ``DT_UNKNOWN``, which can occur under certain conditions or on certain file systems. For this reason, when a user requires fine-grained error handling, it's good to catch ``OSError`` around these method calls and then handle as appropriate. I suggest documenting that next(os.scandir()) may raise OSError e.g., on POSIX it may happen due to an OS error in opendir/readdir/closedir Also, document whether os.scandir() itself may raise OSError (whether opendir or other OS functions may be called before the first yield). ... os.scandir() should allow the explicit cleanup ++ :: with closing(os.scandir()) as entries: for _ in entries: break entries.close() is called that frees the resources if necessary, to *avoid relying on garbage-collection for managing file
Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
On 9 Jul 2014 17:14, Ethan Furman et...@stoneleaf.us wrote: On 07/09/2014 02:42 PM, Ben Hoyt wrote: Okay, so using that [no platform specific] logic we should head over to the os module and remove: ctermid, getenv, getegid... Okay, I'm tired of typing, but that list is not even half-way through the os page, and those are all methods or attributes that are not available on either Windows or Unix or some flavors of Unix. True, is this really the precedent we want to *aim for*. listdir() is cross-platform, and listdir has serious performance issues, which is why you developed scandir. Oh, and all those [snipped] upper-case attributes? Yup, documented. And when we don't document it ourselves we often refer readers to their system documentation because Python does not, in fact, return exactly the same results on all platforms -- particularly when calling into the OS. But again, why a worse, less cross-platform API when a simple, cross-platform one is a method call away? For the same reason we don't use code that makes threaded behavior better, but kills the single thread application. If the programmer would rather have consistency on all platforms rather than performance on the one being used, `info='lstat'` is the option to use. I like the 'onerror' API better primarily because it gives a single point to deal with the errors. This has at least a couple advantages: - less duplication of code: in the tree_size example, the error handling is duplicated twice - readablity: with the error handling in a separate routine, one does not have to jump around the try/except blocks looking for what happens if there are no errors The onerror approach can also deal with readdir failing, which the PEP currently glosses over. I'm somewhat inclined towards the current approach in the PEP, but I'd like to see an explanation of two aspects: 1. How a scandir variant with an 'onerror' option could be implemented given the version in the PEP 2. How the existing scandir module handles the 'onerror' parameter to its directory walking function Regards, Nick. -- ~Ethan~ ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/ncoghlan%40gmail.com ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com