Re: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir()
> Even though there is tangible performance improvement from scandir(), it
> would be useful to find out if the API fits well.

Got it -- I see where you're coming from now. I'll take a quick look (hopefully later this week).

-Ben
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir()
Hi,

2014-07-20 18:50 GMT+02:00 Antoine Pitrou:
> Have you tried modifying importlib's _bootstrap.py to use scandir() instead
> of listdir() + stat()?

IMO the current os.scandir() API does not fit importlib's requirements. importlib usually wants fresh data, whereas the DirEntry cache cannot be invalidated. It's probably possible to cache some os.stat() results in importlib, but it looks like that would require a nontrivial refactoring of the code. I don't know importlib well enough to suggest how to change it.

There are many open issues related to stat() in importlib. I found these:

http://bugs.python.org/issue14604
http://bugs.python.org/issue14067
http://bugs.python.org/issue19216

Closed issues:

http://bugs.python.org/issue17330
http://bugs.python.org/issue18810

By the way, the DirEntry constructor is not documented in the PEP. Should we document it? It might be a way to "invalidate the cache":

    entry = DirEntry(os.path.dirname(entry.path), entry.name)

Maybe it is an abuse of the API. A clear_cache() method would be less ugly :-) But maybe Ben Hoyt does not want to promote keeping DirEntry objects for a long time?

Another question: should we expose the DirEntry type directly in the os namespace? (os.DirEntry)

Victor
Re: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir()
Victor Stinner wrote:
> 2014-07-20 18:50 GMT+02:00 Antoine Pitrou:
>> Have you tried modifying importlib's _bootstrap.py to use scandir()
>> instead of listdir() + stat()?
>
> IMO the current os.scandir() API does not fit importlib requirements.
> importlib usually wants fresh data, whereas DirEntry cache cannot be
> invalidated. It's probably possible to cache some os.stat() result in
> importlib, but it looks like it requires a non trivial refactoring of
> the code. I don't know importlib enough to suggest how to change it.

The data is completely fresh at the time it is obtained, which is identical to using stat(). There will always be a race condition between looking and doing, which is why we still use exception handling on actions.

> By the way, DirEntry constructor is not documented in the PEP. Should
> we document it? It might be a way to "invalidate the cache":
>
> entry = DirEntry(os.path.dirname(entry.path), entry.name)
>
> Maybe it is an abuse of the API. A clear_cache() method would be less
> ugly :-) But maybe Ben Hoyt does not want to promote keeping DirEntry
> for a long time?

DirEntry is a convenient way to return a tuple without returning a tuple, that's all. If you want up-to-date info, call os.stat() and pass in the path. This should just be a better (and ideally transparent) substitute for os.listdir() in every single context.

Personally I'd make it a string subclass and put one-shot properties on it (i.e. call/cache stat() on first access where we don't already know the answer), which I think is close enough to where it's landed that I'm happy. (As far as bikeshedding goes, I prefer "_DirEntry" and no docs :) )

Cheers,
Steve
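For illustration only, here is a minimal sketch of the "string subclass with one-shot properties" idea Steve describes. The names `LazyEntry` and `listdir_lazy` are hypothetical and appear nowhere in the PEP or in scandir.py. The sketch uses a `stat()` method rather than a property to mirror DirEntry, and it builds on os.listdir(), so it shows only the call-once-then-cache behaviour, not the speedup scandir() gets from the directory listing itself.

```python
import os


class LazyEntry(str):
    """Hypothetical str subclass: compares equal to the bare file name,
    and calls os.stat() at most once, on first access."""

    def __new__(cls, directory, name):
        self = super().__new__(cls, name)
        self._path = os.path.join(directory, name)
        self._stat = None  # filled in by the first stat() call
        return self

    @property
    def path(self):
        return self._path

    def stat(self):
        # One-shot: the first call hits the OS, later calls reuse the result.
        if self._stat is None:
            self._stat = os.stat(self._path)
        return self._stat


def listdir_lazy(directory):
    """Drop-in style replacement for os.listdir() returning LazyEntry objects."""
    return [LazyEntry(directory, name) for name in os.listdir(directory)]
```

Because each entry is a real str, code that only needs names keeps working unchanged, which is the "transparent substitute for os.listdir()" property mentioned above.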
Re: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir()
Thanks for an initial look into this, Victor.

> IMO the current os.scandir() API does not fit importlib requirements.
> importlib usually wants fresh data, whereas DirEntry cache cannot be
> invalidated. It's probably possible to cache some os.stat() result in
> importlib, but it looks like it requires a non trivial refactoring of
> the code. I don't know importlib enough to suggest how to change it.

Yes, with importlib already doing its own caching (somewhat complicated, as the open and closed issues show), I get the feeling it wouldn't be a good fit. Note that I'm not saying we wouldn't use it if we were implementing importlib from scratch.

> By the way, DirEntry constructor is not documented in the PEP. Should
> we document it? It might be a way to "invalidate the cache":

I would prefer not to, just to keep things simple. Similar to creating os.stat_result() objects ... you can kind of do it (see scandir.py), but it's not recommended or even documented. The entire purpose of DirEntry objects is so scandir can produce them, not for general use.

> entry = DirEntry(os.path.dirname(entry.path), entry.name)
>
> Maybe it is an abuse of the API. A clear_cache() method would be less
> ugly :-) But maybe Ben Hoyt does not want to promote keeping DirEntry
> for a long time?
>
> Another question: should we expose DirEntry type directly in the os
> namespace? (os.DirEntry)

Again, I'd rather not expose this. It's quite system-specific (see the different system versions in scandir.py), and trying to combine this, make it consistent, and document it would be a bit of a pain, and also possibly prevent future modifications (because then the parts of the implementation would be set in stone).

I'm not really opposed to a clear_cache() method -- basically it'd set _lstat and _stat and _d_type to None internally. However, I'd prefer to keep it as is, and as the PEP says:

    If developers want "refresh" behaviour (for example, for watching a file's size change), they can simply use pathlib.Path objects, or call the regular os.stat() or os.path.getsize() functions which get fresh data from the operating system every call.

-Ben
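The cached-versus-fresh distinction discussed in this message can be sketched roughly as follows. This assumes the API as it later shipped as os.scandir() in the standard library (Python 3.5+, with context-manager support from 3.6); the helper name `cached_and_fresh_sizes` is made up for the example.

```python
import os


def cached_and_fresh_sizes(directory, name, extra=b"more data"):
    """Append `extra` to directory/name and report three sizes:
    (entry.stat() before, entry.stat() after, os.stat() after)."""
    with os.scandir(directory) as entries:
        entry = next(e for e in entries if e.name == name)

    before = entry.stat().st_size      # result is now cached on the entry
    with open(entry.path, "ab") as f:  # change the file behind the entry's back
        f.write(extra)

    cached = entry.stat().st_size      # still the cached value
    fresh = os.stat(entry.path).st_size  # asks the operating system again
    return before, cached, fresh
```

With a 5-byte file and 3 bytes appended, the entry keeps reporting 5 while os.stat() reports 8, so code that needs refresh behaviour would call os.stat(entry.path) rather than hold on to the DirEntry.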
Re: [Python-Dev] Reviving restricted mode?
Sorry about being a bit late on this front (just 5 years...), but I've extended tav's jail to module level, and added the niceties. Its goal is similar to that of rexec: stopping IO, but not crashes. It is currently at https://github.com/matsjoyce/sandypython, and it has instructions as to its use. I've bashed it with all the exploits I've found online, and it's still holding, so I thought the public might like a go.
Re: [Python-Dev] Reviving restricted mode?
Hi,

2014-07-21 21:26 GMT+02:00 matsjoyce:
> Sorry about being a bit late on this front (just 5 years...), but I've
> extended tav's jail to module level, and added the niceties. Its goal is
> similar to that of rexec, stopping IO, but not crashes. It is currently at
> https://github.com/matsjoyce/sandypython, and it has instructions as to its
> use. I've bashed it with all the exploits I've found online, and it's still
> holding, so I thought the public might like a go.

I wrote this project, starting from tav's jail:
https://github.com/haypo/pysandbox/

I gave up because I now consider pysandbox to be broken by design.

Please read the LWN article:
https://lwn.net/Articles/574215/

Don't hesitate to ask more specific questions.

Victor
Re: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir()
On 22 Jul 2014 02:46, "Steve Dower" wrote:
>
> Personally I'd make it a string subclass and put one-shot properties on it (i.e. call/cache stat() on first access where we don't already know the answer), which I think is close enough to where it's landed that I'm happy. (As far as bikeshedding goes, I prefer "_DirEntry" and no docs :) )

+1 for "_DirEntry" as the name in the implementation, and documenting its behaviour under "scandir" rather than as a standalone object.

Only -0 for full documentation as a standalone class, though.

Cheers,
Nick.

>
> Cheers,
> Steve
[Python-Dev] PEP 471 "scandir" accepted
Hi,

I privately asked Guido van Rossum if I could be the BDFL-delegate for PEP 471, and he agreed. I accept the latest version of the PEP:

http://legacy.python.org/dev/peps/pep-0471/

I consider that PEP 471 "scandir" has been discussed enough to collect all possible options (variations of the API) and that the main flaws have been detected. Ben Hoyt modified his PEP to list all these options, and for each option gives advantages and drawbacks. Great job Ben :-) Thanks to all the developers who contributed to the threads on the python-dev mailing list!

The new version of the PEP has an optional "follow_symlinks" parameter which is True by default. IMO this API better fits the common case, listing the content of a single directory, and it's now simple to not follow symlinks when implementing a recursive function like os.walk().

The PEP also explicitly mentions that os.walk() will be modified to benefit from the new os.scandir() function.

I'm happy because the final API is very close to os.path functions and pathlib.Path methods. Python stays consistent, which is a great strength of this language!

The PEP is accepted.

It's time to review the implementation ;-) The current code can be found at:

https://github.com/benhoyt/scandir

(I don't think that Ben has already updated his implementation for the latest version of the PEP.)

Victor
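As a rough sketch of the recursive case mentioned in this message, the function below walks a tree without following symlinks by passing follow_symlinks=False to the DirEntry methods. It assumes the API as it later shipped as os.scandir() in Python 3.5+ (context-manager form from 3.6); `tree_size` is just an illustrative name, not the planned os.walk() implementation.

```python
import os


def tree_size(path):
    """Total size in bytes of the regular files under `path`.

    Symlinks are neither followed nor counted, so a link that points
    back up the tree cannot cause infinite recursion.
    """
    total = 0
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                total += tree_size(entry.path)
            elif entry.is_file(follow_symlinks=False):
                total += entry.stat(follow_symlinks=False).st_size
    return total
```

Leaving the parameter at its default of True gives the simple single-directory behaviour, matching os.path.isdir() and friends; only the recursive caller needs to opt out.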
Re: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir()
2014-07-21 18:48 GMT+02:00 Ben Hoyt:
>> By the way, DirEntry constructor is not documented in the PEP. Should
>> we document it? It might be a way to "invalidate the cache":
>
> I would prefer not to, just to keep things simple. Similar to creating
> os.stat_result() objects ... you can kind of do it (see scandir.py),
> but it's not recommended or even documented. The entire purpose of
> DirEntry objects is so scandir can produce them, not for general use.
>
>> entry = DirEntry(os.path.dirname(entry.path), entry.name)
>>
>> Maybe it is an abuse of the API. A clear_cache() method would be less
>> ugly :-) But maybe Ben Hoyt does not want to promote keeping DirEntry
>> for a long time?
>>
>> Another question: should we expose DirEntry type directly in the os
>> namespace? (os.DirEntry)
>
> Again, I'd rather not expose this. It's quite system-specific (see the
> different system versions in scandir.py), and trying to combine this,
> make it consistent, and document it would be a bit of a pain, and also
> possibly prevent future modifications (because then the parts of the
> implementation would be set in stone).

We should mimic os.stat() and os.stat_result: the os.stat_result symbol exists in the os namespace, but the type constructor is not documented. There is no need for extra protection such as keeping the type out of the os module or adding a "_" prefix to the name.

By the way, it's possible to serialize a stat_result with pickle.

See also my issue "Enhance doc of os.stat_result":
http://bugs.python.org/issue21813

> I'm not really opposed to a clear_cache() method -- basically it'd set
> _lstat and _stat and _d_type to None internally. However, I'd prefer
> to keep it as is, and as the PEP says: (...)

Ok, agreed.

Victor
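The remark about pickling os.stat_result can be checked with a short round trip. This is only a sketch; `roundtrip_stat` is a name invented for the example.

```python
import os
import pickle


def roundtrip_stat(path):
    """Pickle and unpickle the os.stat() result for `path`.

    Returns (original, restored) so the caller can compare them.
    """
    original = os.stat(path)
    restored = pickle.loads(pickle.dumps(original))
    return original, restored
```

The restored object is again an os.stat_result and compares equal to the original, since stat_result behaves like a tuple with extra named fields.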
Re: [Python-Dev] PEP 471 "scandir" accepted
> I asked privately Guido van Rossum if I can be the BDFL-delegate for
> the PEP 471 and he agreed. I accept the latest version of the PEP:
>
> http://legacy.python.org/dev/peps/pep-0471/

Thank you!

> The PEP also explicitly mentions that os.walk() will be modified to
> benefit of the new os.scandir() function.

Yes, this was a good suggestion to include that explicitly -- in actual fact, speeding up os.walk() was my main goal initially.

> The PEP is accepted.

Superb. Could you please update the PEP with the Resolution and BDFL-Delegate fields?

> It's time to review the implementation ;-) The current code can be found at:
>
> https://github.com/benhoyt/scandir
>
> (I don't think that Ben already updated his implementation for the
> latest version of the PEP.)

I have actually updated my GitHub repo for the current PEP (did this last Saturday). However, there are still a few open issues; the main one is that my scandir.py module doesn't handle the bytes/str thing properly.

I intend to work on the CPython implementation over the next few weeks. However, a couple of thoughts up-front:

I think if I were doing this from scratch I'd reimplement listdir() in Python as "return [e.name for e in scandir(path)]". However, I'm not sure this is a good idea, as I don't really want listdir() to suddenly use more memory and perform slightly *worse* due to the extra DirEntry object allocations.

So my basic plan is to have an internal helper function in posixmodule.c that either yields DirEntry objects or strings. And then listdir() would simply be defined something like "return list(_scandir(path, yield_strings=True))" in C or in Python.

My reasoning is that then there'll be much less (if any) code duplication between scandir() and listdir().

Does this sound like a reasonable approach?

-Ben
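A pure-Python sketch of the shared-helper shape described in this message might look like the following. The `_scandir(path, yield_strings=True)` signature is taken from the message; everything else is an assumption. In particular, this version is layered on the os.scandir() that later shipped in Python 3.5+, so it still allocates a DirEntry per name, whereas the planned helper in posixmodule.c would avoid that by producing strings directly.

```python
import os


def _scandir(path, yield_strings=False):
    """Hypothetical shared helper: yield names when yield_strings is true,
    otherwise yield the DirEntry objects themselves."""
    with os.scandir(path) as entries:
        for entry in entries:
            yield entry.name if yield_strings else entry


def listdir(path):
    """listdir() expressed in terms of the shared helper."""
    return list(_scandir(path, yield_strings=True))


def scandir(path):
    """scandir() expressed in terms of the same helper."""
    return _scandir(path, yield_strings=False)
```

The point of the design is that both public functions share one directory-iteration loop and differ only in what they yield.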
Re: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir()
> We should mimic os.stat() and os.stat_result: os.stat_result symbol
> exists in the os namespace, but the type constructor is not
> documented. No need for extra protection like not adding the type in
> the os module, or adding a "_" prefix to the name.

Yeah, that works for me.

> By the way, it's possible to serialize a stat_result with pickle.

That makes sense, as stat_result is basically just a tuple and a bit extra. I wonder if it should be possible to pickle DirEntry objects? I'm thinking possibly not. If so, would it cache the stat or file type info?

-Ben