Re: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir()

2014-07-21 Thread Ben Hoyt
> Even though there is tangible performance improvement from scandir(), it
> would be useful to find out if the API fits well.

Got it -- I see where you're coming from now. I'll take a quick look
(hopefully later this week).

-Ben
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir()

2014-07-21 Thread Victor Stinner
Hi,

2014-07-20 18:50 GMT+02:00 Antoine Pitrou:
> Have you tried modifying importlib's _bootstrap.py to use scandir() instead
> of listdir() + stat()?

IMO the current os.scandir() API does not fit importlib requirements.
importlib usually wants fresh data, whereas DirEntry cache cannot be
invalidated. It's probably possible to cache some os.stat() result in
importlib, but it looks like it requires a non trivial refactoring of
the code. I don't know importlib enough to suggest how to change it.

There are many open issues related to stat() in importlib. I found these:

http://bugs.python.org/issue14604
http://bugs.python.org/issue14067
http://bugs.python.org/issue19216

Closed issues:

http://bugs.python.org/issue17330
http://bugs.python.org/issue18810


By the way, DirEntry constructor is not documented in the PEP. Should
we document it? It might be a way to "invalidate the cache":

entry = DirEntry(os.path.dirname(entry.path), entry.name)

Maybe it is an abuse of the API. A clear_cache() method would be less
ugly :-) But maybe Ben Hoyt does not want to promote keeping DirEntry
for a long time?

Another question: should we expose DirEntry type directly in the os
namespace? (os.DirEntry)

Victor


Re: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir()

2014-07-21 Thread Steve Dower
Victor Stinner wrote:
> 2014-07-20 18:50 GMT+02:00 Antoine Pitrou:
>> Have you tried modifying importlib's _bootstrap.py to use scandir() 
>> instead of listdir() + stat()?
>
> IMO the current os.scandir() API does not fit importlib requirements.
> importlib usually wants fresh data, whereas DirEntry cache cannot be
> invalidated. It's probably possible to cache some os.stat() result in
> importlib, but it looks like it requires a non trivial refactoring of
> the code. I don't know importlib enough to suggest how to change it.

The data is completely fresh at the time it is obtained, which is identical to 
using stat(). There will always be a race-condition between looking and doing, 
which is why we still use exception handling on actions.

> By the way, DirEntry constructor is not documented in the PEP. Should
> we document it? It might be a way to "invalidate the cache":
>
> entry = DirEntry(os.path.dirname(entry.path), entry.name)
>
> Maybe it is an abuse of the API. A clear_cache() method would be less
> ugly :-) But maybe Ben Hoyt does not want to promote keeping DirEntry
> for a long time?

DirEntry is a convenient way to return a tuple without returning a tuple, 
that's all. If you want up to date info, call os.stat() and pass in the path. 
This should just be a better (and ideally transparent) substitute for 
os.listdir() in every single context.
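
As an illustration of the point above, here is a minimal sketch. It assumes the os.scandir() that later shipped in Python 3.5+ (with context-manager support from 3.6), where DirEntry.stat() keeps returning the result of its first call. The helper name cached_and_fresh_size is made up for this example.

```python
import os
import tempfile


def cached_and_fresh_size(directory, name, extra=b"more"):
    """Append `extra` to directory/name and return (cached_size, fresh_size)."""
    with os.scandir(directory) as entries:
        entry = next(e for e in entries if e.name == name)
        entry.stat()  # the first call fills the DirEntry cache
    with open(entry.path, "ab") as f:
        f.write(extra)
    # entry.stat() still reports what was seen before the append;
    # os.stat() asks the operating system again.
    return entry.stat().st_size, os.stat(entry.path).st_size


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "a.txt"), "wb") as f:
            f.write(b"12345")
        print(cached_and_fresh_size(tmp, "a.txt"))
```

The cached value is a snapshot, so code that needs current data should call os.stat() on entry.path at the moment it needs it.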

Personally I'd make it a string subclass and put one-shot properties on it 
(i.e. call/cache stat() on first access where we don't already know the 
answer), which I think is close enough to where it's landed that I'm happy. (As 
far as bikeshedding goes, I prefer "_DirEntry" and no docs :) )
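
The "string subclass with one-shot properties" idea could look roughly like the sketch below. It is purely hypothetical: the class name EntryStr, its dirpath argument, and the stat property are invented for illustration and are not what the PEP specifies.

```python
import os
import tempfile


class EntryStr(str):
    """Hypothetical: a file name that is a str and lazily caches os.stat()."""

    def __new__(cls, name, dirpath="."):
        self = super().__new__(cls, name)
        self.path = os.path.join(dirpath, name)
        self._stat = None  # filled in on first access to .stat
        return self

    @property
    def stat(self):
        # One-shot: call os.stat() only the first time it is needed.
        if self._stat is None:
            self._stat = os.stat(self.path)
        return self._stat


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "a.txt"), "wb") as f:
            f.write(b"12345")
        entry = EntryStr("a.txt", tmp)
        print(entry.upper(), entry.stat.st_size)
```

Because the object is still a str, it could be passed anywhere an os.listdir() result is used today, which is the "transparent substitute" property described above.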

Cheers,
Steve


Re: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir()

2014-07-21 Thread Ben Hoyt
Thanks for an initial look into this, Victor.

> IMO the current os.scandir() API does not fit importlib requirements.
> importlib usually wants fresh data, whereas DirEntry cache cannot be
> invalidated. It's probably possible to cache some os.stat() result in
> importlib, but it looks like it requires a non trivial refactoring of
> the code. I don't know importlib enough to suggest how to change it.

Yes, with importlib already doing its own caching (somewhat
complicated, as the open and closed issues show), I get the feeling it
wouldn't be a good fit. Note that I'm not saying we wouldn't use it if
we were implementing importlib from scratch.

> By the way, DirEntry constructor is not documented in the PEP. Should
> we document it? It might be a way to "invalidate the cache":

I would prefer not to, just to keep things simple. Similar to creating
os.stat_result() objects ... you can kind of do it (see scandir.py),
but it's not recommended or even documented. The entire purpose of
DirEntry objects is so scandir can produce them, not for general use.

> entry = DirEntry(os.path.dirname(entry.path), entry.name)
>
> Maybe it is an abuse of the API. A clear_cache() method would be less
> ugly :-) But maybe Ben Hoyt does not want to promote keeping DirEntry
> for a long time?
>
> Another question: should we expose DirEntry type directly in the os
> namespace? (os.DirEntry)

Again, I'd rather not expose this. It's quite system-specific (see the
different system versions in scandir.py), and trying to combine this,
make it consistent, and document it would be a bit of a pain, and also
possibly prevent future modifications (because then the parts of the
implementation would be set in stone).

I'm not really opposed to a clear_cache() method -- basically it'd set
_lstat and _stat and _d_type to None internally. However, I'd prefer
to keep it as is, and as the PEP says:

If developers want "refresh" behaviour (for example, for watching a
file's size change), they can simply use pathlib.Path objects, or call
the regular os.stat() or os.path.getsize() functions which get fresh
data from the operating system every call.
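
For readers wondering what the clear_cache() idea discussed above would amount to, here is a minimal pure-Python sketch. CachedEntry is a hypothetical stand-in, not the real DirEntry; only the _stat and _lstat attribute names come from the message, and the _d_type field is left out for brevity.

```python
import os
import tempfile


class CachedEntry:
    """Hypothetical stand-in for DirEntry, showing cached stat results."""

    def __init__(self, dirpath, name):
        self.name = name
        self.path = os.path.join(dirpath, name)
        self._stat = None
        self._lstat = None

    def stat(self, follow_symlinks=True):
        # Each variant is fetched at most once until the cache is cleared.
        if follow_symlinks:
            if self._stat is None:
                self._stat = os.stat(self.path)
            return self._stat
        if self._lstat is None:
            self._lstat = os.lstat(self.path)
        return self._lstat

    def clear_cache(self):
        # Forget cached results so the next stat() call hits the OS again.
        self._stat = None
        self._lstat = None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "a.txt"), "wb") as f:
            f.write(b"12345")
        entry = CachedEntry(tmp, "a.txt")
        before = entry.stat().st_size
        with open(entry.path, "ab") as f:
            f.write(b"more")
        stale = entry.stat().st_size
        entry.clear_cache()
        print(before, stale, entry.stat().st_size)
```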

-Ben


Re: [Python-Dev] Reviving restricted mode?

2014-07-21 Thread matsjoyce
Sorry about being a bit late on this front (just 5 years...), but I've 
extended tav's jail to module level, and added the niceties. Its goal is 
similar to that of rexec: stopping IO, but not crashes. It is currently at 
https://github.com/matsjoyce/sandypython, and it has instructions as to its 
use. I've bashed it with all the exploits I've found online, and it's still 
holding, so I thought the public might like a go.



Re: [Python-Dev] Reviving restricted mode?

2014-07-21 Thread Victor Stinner
Hi,

2014-07-21 21:26 GMT+02:00 matsjoyce:
> Sorry about being a bit late on this front (just 5 years...), but I've
> extended tav's jail to module level, and added the niceties. Its goal is
> similar to that of rexec: stopping IO, but not crashes. It is currently at
> https://github.com/matsjoyce/sandypython, and it has instructions as to its
> use. I've bashed it with all the exploits I've found online, and it's still
> holding, so I thought the public might like a go.

I wrote this project, started from tav's jail:
https://github.com/haypo/pysandbox/

I gave up because I now consider that pysandbox is broken by design.
Please read the LWN article:
https://lwn.net/Articles/574215/

Don't hesitate to ask more specific questions.

Victor


Re: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir()

2014-07-21 Thread Nick Coghlan
On 22 Jul 2014 02:46, "Steve Dower" wrote:
>
> Personally I'd make it a string subclass and put one-shot properties on
> it (i.e. call/cache stat() on first access where we don't already know the
> answer), which I think is close enough to where it's landed that I'm happy.
> (As far as bikeshedding goes, I prefer "_DirEntry" and no docs :) )

+1 for "_DirEntry" as the name in the implementation, and documenting its
behaviour under "scandir" rather than as a standalone object.

Only -0 for full documentation as a standalone class, though.

Cheers,
Nick.



[Python-Dev] PEP 471 "scandir" accepted

2014-07-21 Thread Victor Stinner
Hi,

I privately asked Guido van Rossum whether I could be the BDFL-delegate for
PEP 471, and he agreed. I accept the latest version of the PEP:

http://legacy.python.org/dev/peps/pep-0471/

I consider that PEP 471 "scandir" has been discussed enough to collect
all the possible options (variations of the API) and that the main flaws have
been detected. Ben Hoyt modified his PEP to list all these options,
giving the advantages and drawbacks of each. Great job, Ben :-)
Thanks to all the developers who contributed to the threads on the python-dev
mailing list!

The new version of the PEP has an optional "follow_symlinks" parameter,
which is True by default. IMO this API is a better fit for the common case,
listing the contents of a single directory, and it's now simple to avoid
following symlinks when implementing a recursive function like os.walk().
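
A recursive traversal of the kind mentioned above might be sketched as follows. This assumes the API as it later shipped in Python 3.5+, where follow_symlinks is a keyword argument of DirEntry.is_dir(); the function name iter_files is made up for this example and is not part of the PEP.

```python
import os
import tempfile


def iter_files(top):
    """Yield the path of every non-directory entry below `top`.

    Symlinks to directories are reported as entries, not descended into,
    which avoids infinite recursion on symlink loops.
    """
    with os.scandir(top) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                yield from iter_files(entry.path)
            else:
                yield entry.path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        os.mkdir(os.path.join(tmp, "sub"))
        for rel in ("a.txt", os.path.join("sub", "b.txt")):
            with open(os.path.join(tmp, rel), "w") as f:
                f.write("x")
        for path in sorted(iter_files(tmp)):
            print(os.path.relpath(path, tmp))
```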

The PEP also explicitly mentions that os.walk() will be modified to
benefit from the new os.scandir() function.

I'm happy because the final API is very close to the os.path functions and
pathlib.Path methods. Python stays consistent, which is a great strength
of this language!

The PEP is accepted. It's time to review the implementation ;-) The
current code can be found at:

   https://github.com/benhoyt/scandir

(I don't think Ben has updated his implementation for the
latest version of the PEP yet.)

Victor


Re: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir()

2014-07-21 Thread Victor Stinner
2014-07-21 18:48 GMT+02:00 Ben Hoyt:
>> By the way, DirEntry constructor is not documented in the PEP. Should
>> we document it? It might be a way to "invalidate the cache":
>
> I would prefer not to, just to keep things simple. Similar to creating
> os.stat_result() objects ... you can kind of do it (see scandir.py),
> but it's not recommended or even documented. The entire purpose of
> DirEntry objects is so scandir can produce them, not for general use.
>
>> entry = DirEntry(os.path.dirname(entry.path), entry.name)
>>
>> Maybe it is an abuse of the API. A clear_cache() method would be less
>> ugly :-) But maybe Ben Hoyt does not want to promote keeping DirEntry
>> for a long time?
>>
>> Another question: should we expose DirEntry type directly in the os
>> namespace? (os.DirEntry)
>
> Again, I'd rather not expose this. It's quite system-specific (see the
> different system versions in scandir.py), and trying to combine this,
> make it consistent, and document it would be a bit of a pain, and also
> possibly prevent future modifications (because then the parts of the
> implementation would be set in stone).

We should mimic os.stat() and os.stat_result: os.stat_result symbol
exists in the os namespace, but the type constructor is not
documented. No need for extra protection like not adding the type in
the os module, or adding a "_" prefix to the name.

By the way, it's possible to serialize a stat_result with pickle.
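
A quick way to check this is the round trip below. It is only a sketch: the helper name roundtrip_stat is made up, and it uses nothing beyond os.stat() and the standard pickle module.

```python
import os
import pickle


def roundtrip_stat(path):
    """Pickle and unpickle the os.stat() result for `path`."""
    original = os.stat(path)
    return pickle.loads(pickle.dumps(original))


if __name__ == "__main__":
    restored = roundtrip_stat(os.curdir)
    # The restored object is a real os.stat_result with the same fields.
    print(type(restored).__name__, restored.st_mode, restored.st_size)
```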

See also my issue "Enhance doc of os.stat_result":
http://bugs.python.org/issue21813

> I'm not really opposed to a clear_cache() method -- basically it'd set
> _lstat and _stat and _d_type to None internally. However, I'd prefer
> to keep it as is, and as the PEP says: (...)

Ok, agreed.

Victor


Re: [Python-Dev] PEP 471 "scandir" accepted

2014-07-21 Thread Ben Hoyt
> I privately asked Guido van Rossum whether I could be the BDFL-delegate for
> PEP 471, and he agreed. I accept the latest version of the PEP:
>
> http://legacy.python.org/dev/peps/pep-0471/

Thank you!

> The PEP also explicitly mentions that os.walk() will be modified to
> benefit from the new os.scandir() function.

Yes, it was a good suggestion to include that explicitly -- in
actual fact, speeding up os.walk() was my main goal initially.

> The PEP is accepted.

Superb. Could you please update the PEP with the Resolution and
BDFL-Delegate fields?

> It's time to review the implementation ;-) The current code can be found at:
>
>    https://github.com/benhoyt/scandir
>
> (I don't think Ben has updated his implementation for the
> latest version of the PEP yet.)

I have actually updated my GitHub repo for the current PEP (did this
last Saturday). However, there are still a few open issues, the main
one being that my scandir.py module doesn't handle the bytes/str
distinction properly.

I intend to work on the CPython implementation over the next few
weeks. However, a couple of thoughts up-front:

I think if I were doing this from scratch I'd reimplement listdir() in
Python as "return [e.name for e in scandir(path)]". However, I'm not
sure this is a good idea, as I don't really want listdir() to suddenly
use more memory and perform slightly *worse* due to the extra DirEntry
object allocations.

So my basic plan is to have an internal helper function in
posixmodule.c that either yields DirEntry objects or strings. And then
listdir() would simply be defined something like "return
list(_scandir(path, yield_strings=True))" in C or in Python.

My reasoning is that then there'll be much less (if any) code
duplication between scandir() and listdir().

Does this sound like a reasonable approach?

-Ben


Re: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir()

2014-07-21 Thread Ben Hoyt
> We should mimic os.stat() and os.stat_result: os.stat_result symbol
> exists in the os namespace, but the type constructor is not
> documented. No need for extra protection like not adding the type in
> the os module, or adding a "_" prefix to the name.

Yeah, that works for me.

> By the way, it's possible to serialize a stat_result with pickle.

That makes sense, as stat_result is basically just a tuple and a bit
extra. I wonder if it should be possible to pickle DirEntry objects?
I'm thinking possibly not. If so, would it cache the stat or file type
info?

-Ben