Re: [Python-ideas] Allow manual creation of DirEntry objects

2016-08-18 Thread Brendan Moloney
Thanks, opened an issue here: http://bugs.python.org/issue27796

-Brendan

From: gvanros...@gmail.com [gvanros...@gmail.com] on behalf of Guido van Rossum 
[gu...@python.org]
Sent: Wednesday, August 17, 2016 7:20 AM
To: Nick Coghlan; Brendan Moloney
Cc: Victor Stinner; python-ideas@python.org
Subject: Re: [Python-ideas] Allow manual creation of DirEntry objects

Brendan,

The conclusion is that you should just file a bug asking for a working 
constructor -- or upload a patch if you want to.

--Guido

On Wed, Aug 17, 2016 at 12:18 AM, Nick Coghlan <ncogh...@gmail.com> wrote:
On 17 August 2016 at 09:56, Victor Stinner <victor.stin...@gmail.com> wrote:
> 2016-08-17 1:50 GMT+02:00 Guido van Rossum <gu...@python.org>:
>> We could expose the class with a
>> constructor that always fails (the C code could construct instances through
>> a backdoor).
>
> Oh, in fact you cannot create an instance of os.DirEntry, it has no
> (Python) constructor:
>
> $ ./python
> Python 3.6.0a4+ (default:e615718a6455+, Aug 17 2016, 00:12:17)
> >>> import os
> >>> os.DirEntry(1)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: cannot create 'posix.DirEntry' instances
>
> Only os.scandir() can produce such objects.
>
> The question remains whether it makes sense to allow creating DirEntry
> objects in Python :-)

I think it does, as it isn't really any different from someone calling
the stat() method on a DirEntry instance created by os.scandir(). It
also prevents folks attempting things like:

def slow_constructor(dirname, entryname):
    for entry in os.scandir(dirname):
        if entry.name == entryname:
            entry.stat()
            return entry

Allowing DirEntry construction from Python further gives us a
straightforward answer to the "stat caching" question: "just use
os.DirEntry instances and call stat() to make the snapshot"

If folks ask why os.DirEntry caches results when pathlib.Path doesn't,
we have the answer that cache invalidation is a hard problem, and
hence we consider it useful in the lower level interface that is
optimised for speed, but problematic in the higher level one that is
more focused on cross-platform correctness of filesystem interactions.

I don't know whether it would make sense to allow a pre-existing stat
result to be passed to DirEntry, but it does seem like it might be
useful for adapting existing stat-based backend APIs to a more
user-friendly DirEntry-based front end API.

Cheers,
Nick.

--
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia

--
--Guido van Rossum (python.org/~guido)
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/

Re: [Python-ideas] Allow manual creation of DirEntry objects

2016-08-17 Thread Serhiy Storchaka

On 16.08.16 22:35, Brendan Moloney wrote:

> I have a bunch of functions that operate on DirEntry objects, typically
> doing some sort of filtering to select the paths I actually want to
> process. The overwhelming majority of the time these functions are
> going to be operating on DirEntry objects produced by the scandir
> function, but there are some cases where the user will be supplying the
> path themselves (for example, the root of a directory tree to process).
> In my current code base that uses the scandir package I just wrap these
> paths in a 'GenericDirEntry' object and then pass them through the
> filter functions the same as any results coming from the scandir
> function.


You can just create an object that duck-types DirEntry. See for example 
_DummyDirEntry in the os module.
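
As a rough illustration of the duck-typing route, here is a minimal sketch. PathEntry, wanted and demo are hypothetical names made up for this example; this is not the code of _DummyDirEntry or of the scandir package's GenericDirEntry, and it only assumes the documented DirEntry attributes and methods (name, path, stat(), is_dir(), is_file(), is_symlink(), inode()). It needs Python 3.6 or later for os.fspath() and the scandir context manager.

```python
import os
import stat as statmod
import tempfile


class PathEntry:
    """Hypothetical DirEntry look-alike built from a plain path."""

    def __init__(self, path):
        self.path = os.fspath(path)
        self.name = os.path.basename(os.path.normpath(self.path))
        self._stat = {}  # cached os.stat results, keyed by follow_symlinks

    def stat(self, *, follow_symlinks=True):
        # Call os.stat() at most once per follow_symlinks value.
        if follow_symlinks not in self._stat:
            self._stat[follow_symlinks] = os.stat(
                self.path, follow_symlinks=follow_symlinks)
        return self._stat[follow_symlinks]

    def _mode_matches(self, predicate, follow_symlinks):
        try:
            mode = self.stat(follow_symlinks=follow_symlinks).st_mode
        except FileNotFoundError:
            return False  # a missing entry is neither a file nor a directory
        return predicate(mode)

    def is_dir(self, *, follow_symlinks=True):
        return self._mode_matches(statmod.S_ISDIR, follow_symlinks)

    def is_file(self, *, follow_symlinks=True):
        return self._mode_matches(statmod.S_ISREG, follow_symlinks)

    def is_symlink(self):
        return self._mode_matches(statmod.S_ISLNK, False)

    def inode(self):
        return self.stat(follow_symlinks=False).st_ino

    def __fspath__(self):
        return self.path


def wanted(entry):
    """Example filter: accepts os.DirEntry and PathEntry objects alike."""
    return entry.is_file() and entry.name.endswith(".txt")


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "notes.txt")
        open(target, "w").close()
        with os.scandir(tmp) as entries:
            scanned = [e.name for e in entries if wanted(e)]
        return scanned, wanted(PathEntry(target)), PathEntry(tmp).is_dir()
```

The same wanted() filter runs unchanged on real scandir results and on a user-supplied path wrapped in PathEntry. The cost is that the wrapper is not an os.DirEntry instance, so isinstance() checks and type annotations that name os.DirEntry will not accept it.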





Re: [Python-ideas] Allow manual creation of DirEntry objects

2016-08-17 Thread Guido van Rossum
Brendan,

The conclusion is that you should just file a bug asking for a working
constructor -- or upload a patch if you want to.

--Guido

On Wed, Aug 17, 2016 at 12:18 AM, Nick Coghlan wrote:

> On 17 August 2016 at 09:56, Victor Stinner wrote:
> > 2016-08-17 1:50 GMT+02:00 Guido van Rossum:
> >> We could expose the class with a
> >> constructor that always fails (the C code could construct instances
> >> through a backdoor).
> >
> > Oh, in fact you cannot create an instance of os.DirEntry, it has no
> > (Python) constructor:
> >
> > $ ./python
> > Python 3.6.0a4+ (default:e615718a6455+, Aug 17 2016, 00:12:17)
> > >>> import os
> > >>> os.DirEntry(1)
> > Traceback (most recent call last):
> >   File "<stdin>", line 1, in <module>
> > TypeError: cannot create 'posix.DirEntry' instances
> >
> > Only os.scandir() can produce such objects.
> >
> > The question remains whether it makes sense to allow creating DirEntry
> > objects in Python :-)
>
> I think it does, as it isn't really any different from someone calling
> the stat() method on a DirEntry instance created by os.scandir(). It
> also prevents folks attempting things like:
>
> def slow_constructor(dirname, entryname):
>     for entry in os.scandir(dirname):
>         if entry.name == entryname:
>             entry.stat()
>             return entry
>
> Allowing DirEntry construction from Python further gives us a
> straightforward answer to the "stat caching" question: "just use
> os.DirEntry instances and call stat() to make the snapshot"
>
> If folks ask why os.DirEntry caches results when pathlib.Path doesn't,
> we have the answer that cache invalidation is a hard problem, and
> hence we consider it useful in the lower level interface that is
> optimised for speed, but problematic in the higher level one that is
> more focused on cross-platform correctness of filesystem interactions.
>
> I don't know whether it would make sense to allow a pre-existing stat
> result to be passed to DirEntry, but it does seem like it might be
> useful for adapting existing stat-based backend APIs to a more
> user-friendly DirEntry-based front end API.
>
> Cheers,
> Nick.
>
> --
> Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
>



-- 
--Guido van Rossum (python.org/~guido)

Re: [Python-ideas] Allow manual creation of DirEntry objects

2016-08-17 Thread Nick Coghlan
On 17 August 2016 at 09:56, Victor Stinner wrote:
> 2016-08-17 1:50 GMT+02:00 Guido van Rossum:
>> We could expose the class with a
>> constructor that always fails (the C code could construct instances through
>> a backdoor).
>
> Oh, in fact you cannot create an instance of os.DirEntry, it has no
> (Python) constructor:
>
> $ ./python
> Python 3.6.0a4+ (default:e615718a6455+, Aug 17 2016, 00:12:17)
> >>> import os
> >>> os.DirEntry(1)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: cannot create 'posix.DirEntry' instances
>
> Only os.scandir() can produce such objects.
>
> The question remains whether it makes sense to allow creating DirEntry
> objects in Python :-)

I think it does, as it isn't really any different from someone calling
the stat() method on a DirEntry instance created by os.scandir(). It
also prevents folks attempting things like:

def slow_constructor(dirname, entryname):
    for entry in os.scandir(dirname):
        if entry.name == entryname:
            entry.stat()
            return entry

Allowing DirEntry construction from Python further gives us a
straightforward answer to the "stat caching" question: "just use
os.DirEntry instances and call stat() to make the snapshot"
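
To make the "snapshot" behaviour concrete, here is a small sketch. snapshot_demo is a hypothetical helper written for this example; it relies only on os.scandir() and on the documented fact that DirEntry.stat() caches its result on the entry.

```python
import os
import tempfile


def snapshot_demo():
    """Return (cached_size, fresh_size) after a file grows behind a DirEntry."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.bin")
        with open(path, "wb") as f:
            f.write(b"abc")  # 3 bytes on disk

        with os.scandir(tmp) as entries:
            entry = next(e for e in entries if e.name == "data.bin")
        entry.stat()  # first call: the result is cached on the entry

        with open(path, "ab") as f:
            f.write(b"defgh")  # the file is now 8 bytes

        # The DirEntry still reports the snapshot; os.stat() asks the OS again.
        return entry.stat().st_size, os.stat(path).st_size
```

The entry keeps reporting the size recorded at snapshot time, while a fresh os.stat() call sees the current size.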

If folks ask why os.DirEntry caches results when pathlib.Path doesn't,
we have the answer that cache invalidation is a hard problem, and
hence we consider it useful in the lower level interface that is
optimised for speed, but problematic in the higher level one that is
more focused on cross-platform correctness of filesystem interactions.

I don't know whether it would make sense to allow a pre-existing stat
result to be passed to DirEntry, but it does seem like it might be
useful for adapting existing stat-based backend APIs to a more
user-friendly DirEntry-based front end API.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-ideas] Allow manual creation of DirEntry objects

2016-08-16 Thread Brett Cannon
On Tue, 16 Aug 2016 at 16:15 Victor Stinner wrote:

> By the way, for all these reasons, I'm not really excited by the Python
> 3.6 change exposing os.DirEntry (https://bugs.python.org/issue27038).
>

It was exposed at Guido's request for type hinting in typeshed.

-Brett


>
> Victor
>
> 2016-08-17 1:11 GMT+02:00 Victor Stinner:
> > 2016-08-16 23:13 GMT+02:00 Guido van Rossum:
> >> It sounds fine to just submit a patch to add and document the DirEntry
> >> constructor. I don't think anyone intended to disallow your use case,
> >> it's more likely that nobody thought of it.
> >
> > Currently, the DirEntry constructor expects data which comes from
> > the opendir/readdir functions on UNIX/BSD or the
> > FindFirstFile/FindNextFile functions on Windows. These functions are
> > not exposed in Python, so it's unlikely that you can get the expected
> > values. The DirEntry object was created to avoid syscalls in the
> > common case thanks to data provided by these functions.
> >
> > But I guess that Brendan wants to create a DirEntry object which would
> > call os.stat() the first time that an attribute is read and then
> > benefit from the code. You lose the "no syscall" optimization, since at
> > least one syscall is needed.
> >
> > In this case, I guess that the constructor should be
> > DirEntry(directory, entry_name) where os.path.join(directory,
> > entry_name) is the full path.
> >
> > An issue is how to document the behaviour of DirEntry. Objects created
> > by os.scandir() would be "optimized", whereas objects created manually
> > would be "less optimized".
> >
> > DirEntry is designed for os.scandir(), it's very limited compared to
> > pathlib. IMO pathlib would be a better candidate for "cached os.stat
> > results" with a full API to access the file system.
> >
> > Victor

Re: [Python-ideas] Allow manual creation of DirEntry objects

2016-08-16 Thread Victor Stinner
2016-08-17 1:50 GMT+02:00 Guido van Rossum:
> We could expose the class with a
> constructor that always fails (the C code could construct instances through
> a backdoor).

Oh, in fact you cannot create an instance of os.DirEntry, it has no
(Python) constructor:

$ ./python
Python 3.6.0a4+ (default:e615718a6455+, Aug 17 2016, 00:12:17)
>>> import os
>>> os.DirEntry(1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: cannot create 'posix.DirEntry' instances

Only os.scandir() can produce such objects.

The question remains whether it makes sense to allow creating DirEntry
objects in Python :-)


> Also, what does the scandir package mentioned by the OP use as the
> constructor signature?

The implementation of os.scandir() comes from the scandir package. It
contains the same code, and so has the same behaviour (DirEntry has no
constructor).

Victor


Re: [Python-ideas] Allow manual creation of DirEntry objects

2016-08-16 Thread Guido van Rossum
On Tue, Aug 16, 2016 at 4:14 PM, Victor Stinner wrote:

> By the way, for all these reasons, I'm not really excited by the Python
> 3.6 change exposing os.DirEntry (https://bugs.python.org/issue27038).
>

But that's separate from the constructor. We could expose the class with a
constructor that always fails (the C code could construct instances through
a backdoor). Exposing the type is useful for type annotations, e.g.

def is_foobar(de: os.DirEntry) -> bool: ...

and for the occasional isinstance() check.
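
A slightly fuller sketch of both uses, assuming Python 3.6 or later where os.DirEntry is exposed. The names is_python_source, as_path and demo are hypothetical, chosen only for this example.

```python
import os
import tempfile


def is_python_source(de: os.DirEntry) -> bool:
    """Annotated filter: the exposed type makes the signature checkable."""
    return de.is_file() and de.name.endswith(".py")


def as_path(item) -> str:
    """Accept either a DirEntry or a plain path and return a path string."""
    if isinstance(item, os.DirEntry):
        return item.path
    return os.fspath(item)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("mod.py", "README.txt"):
            open(os.path.join(tmp, name), "w").close()
        with os.scandir(tmp) as entries:
            found = [e for e in entries if is_python_source(e)]
        relative = [os.path.relpath(as_path(e), tmp) for e in found]
        return relative, as_path("plain/path")
```

Neither use needs a working constructor: the annotation and the isinstance() check only require that the type be importable by name.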

Also, what does the scandir package mentioned by the OP use as the
constructor signature?

-- 
--Guido van Rossum (python.org/~guido)

Re: [Python-ideas] Allow manual creation of DirEntry objects

2016-08-16 Thread Victor Stinner
By the way, for all these reasons, I'm not really excited by the Python
3.6 change exposing os.DirEntry (https://bugs.python.org/issue27038).

Victor

2016-08-17 1:11 GMT+02:00 Victor Stinner:
> 2016-08-16 23:13 GMT+02:00 Guido van Rossum:
>> It sounds fine to just submit a patch to add and document the DirEntry
>> constructor. I don't think anyone intended to disallow your use case, it's
>> more likely that nobody thought of it.
>
> Currently, the DirEntry constructor expects data which comes from
> the opendir/readdir functions on UNIX/BSD or the
> FindFirstFile/FindNextFile functions on Windows. These functions are
> not exposed in Python, so it's unlikely that you can get the expected
> values. The DirEntry object was created to avoid syscalls in the
> common case thanks to data provided by these functions.
>
> But I guess that Brendan wants to create a DirEntry object which would
> call os.stat() the first time that an attribute is read and then
> benefit from the code. You lose the "no syscall" optimization, since at
> least one syscall is needed.
>
> In this case, I guess that the constructor should be
> DirEntry(directory, entry_name) where os.path.join(directory,
> entry_name) is the full path.
>
> An issue is how to document the behaviour of DirEntry. Objects created
> by os.scandir() would be "optimized", whereas objects created manually
> would be "less optimized".
>
> DirEntry is designed for os.scandir(), it's very limited compared to
> pathlib. IMO pathlib would be a better candidate for "cached os.stat
> results" with a full API to access the file system.
>
> Victor


Re: [Python-ideas] Allow manual creation of DirEntry objects

2016-08-16 Thread Victor Stinner
2016-08-16 23:13 GMT+02:00 Guido van Rossum:
> It sounds fine to just submit a patch to add and document the DirEntry
> constructor. I don't think anyone intended to disallow your use case, it's
> more likely that nobody thought of it.

Currently, the DirEntry constructor expects data which comes from
the opendir/readdir functions on UNIX/BSD or the
FindFirstFile/FindNextFile functions on Windows. These functions are
not exposed in Python, so it's unlikely that you can get the expected
values. The DirEntry object was created to avoid syscalls in the
common case thanks to data provided by these functions.

But I guess that Brendan wants to create a DirEntry object which would
call os.stat() the first time that an attribute is read and then
benefit from the code. You lose the "no syscall" optimization, since at
least one syscall is needed.

In this case, I guess that the constructor should be
DirEntry(directory, entry_name) where os.path.join(directory,
entry_name) is the full path.
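
A pure-Python sketch of what such a (directory, entry_name) signature could look like. LazyEntry and its stat_calls counter are hypothetical and exist only to make the single syscall visible; this is not how the C implementation of DirEntry is written.

```python
import os
import tempfile


class LazyEntry:
    """Hypothetical entry built from (directory, entry_name); stats on first use."""

    def __init__(self, directory, entry_name):
        self.name = entry_name
        self.path = os.path.join(directory, entry_name)
        self._stat = None
        self.stat_calls = 0  # instrumentation for the demo only

    def stat(self):
        # No data from readdir/FindNextFile is available here, so the first
        # call has to pay for one os.stat(); later calls reuse the result.
        if self._stat is None:
            self.stat_calls += 1
            self._stat = os.stat(self.path)
        return self._stat


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        open(os.path.join(tmp, "f.dat"), "w").close()
        entry = LazyEntry(tmp, "f.dat")
        before = entry.stat_calls
        entry.stat()
        entry.stat()
        same_path = entry.path == os.path.join(tmp, "f.dat")
        return before, entry.stat_calls, same_path
```

Constructing the object costs nothing, and repeated stat() calls trigger exactly one os.stat(). That is the "less optimized" behaviour compared with entries produced by os.scandir().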

An issue is how to document the behaviour of DirEntry. Objects created
by os.scandir() would be "optimized", whereas objects created manually
would be "less optimized".

DirEntry is designed for os.scandir(), it's very limited compared to
pathlib. IMO pathlib would be a better candidate for "cached os.stat
results" with a full API to access the file system.

Victor


Re: [Python-ideas] Allow manual creation of DirEntry objects

2016-08-16 Thread Guido van Rossum
It sounds fine to just submit a patch to add and document the DirEntry
constructor. I don't think anyone intended to disallow your use case, it's
more likely that nobody thought of it.

On Tue, Aug 16, 2016 at 12:35 PM, Brendan Moloney wrote:

> Hi,
>
> I have been using the 'scandir' package
> (https://github.com/benhoyt/scandir) for a while now to speed up some
> directory tree processing code. Since Python 3.5 now includes
> 'os.scandir' in the stdlib (https://www.python.org/dev/peps/pep-0471/)
> I decided to try to make my code work with the built-in version if
> available.
>
> The first issue I hit was that the 'DirEntry' class was not actually
> being exposed (http://bugs.python.org/issue27038). However in the
> discussion of that bug I noticed that the constructor for the
> 'DirEntry' class was deliberately being left undocumented and that
> there was no clear way to manually create a DirEntry object from a
> path. I brought up my objections to this decision in the bug tracker
> and was asked to have the discussion over here on python-ideas.
>
> I have a bunch of functions that operate on DirEntry objects, typically
> doing some sort of filtering to select the paths I actually want to
> process. The overwhelming majority of the time these functions are
> going to be operating on DirEntry objects produced by the scandir
> function, but there are some cases where the user will be supplying the
> path themselves (for example, the root of a directory tree to process).
> In my current code base that uses the scandir package I just wrap these
> paths in a 'GenericDirEntry' object and then pass them through the
> filter functions the same as any results coming from the scandir
> function.
>
> With the decision to not expose any method in the stdlib to manually
> create a DirEntry object, I am stuck with no good options. The least
> bad option I guess would be to copy the GenericDirEntry code out of the
> scandir package into my own code base. This seems rather silly. I
> really don't understand the rationale for not giving users a way to
> create these objects themselves, and I haven't actually seen that
> explained anywhere. I guess people are unhappy with the overlap between
> pathlib.Path objects and DirEntry objects and this is a misguided
> attempt to prod people into using pathlib. I think a better approach is
> to document the differences between DirEntry and pathlib.Path objects
> and encourage users to default to using pathlib.Path unless they have
> good reasons for using DirEntry.
>
> Thanks,
> Brendan
>



-- 
--Guido van Rossum (python.org/~guido)

[Python-ideas] Allow manual creation of DirEntry objects

2016-08-16 Thread Brendan Moloney
Hi,

I have been using the 'scandir' package
(https://github.com/benhoyt/scandir) for a while now to speed up some
directory tree processing code. Since Python 3.5 now includes
'os.scandir' in the stdlib (https://www.python.org/dev/peps/pep-0471/)
I decided to try to make my code work with the built-in version if
available.

The first issue I hit was that the 'DirEntry' class was not actually
being exposed (http://bugs.python.org/issue27038). However in the
discussion of that bug I noticed that the constructor for the
'DirEntry' class was deliberately being left undocumented and that
there was no clear way to manually create a DirEntry object from a
path. I brought up my objections to this decision in the bug tracker
and was asked to have the discussion over here on python-ideas.

I have a bunch of functions that operate on DirEntry objects, typically
doing some sort of filtering to select the paths I actually want to
process. The overwhelming majority of the time these functions are
going to be operating on DirEntry objects produced by the scandir
function, but there are some cases where the user will be supplying the
path themselves (for example, the root of a directory tree to process).
In my current code base that uses the scandir package I just wrap these
paths in a 'GenericDirEntry' object and then pass them through the
filter functions the same as any results coming from the scandir
function.
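
For readers unfamiliar with the pattern, a simplified sketch of this kind of filtering using the stdlib os.scandir(). The names is_hidden, walk_files and demo are hypothetical and this is not the actual code base described above.

```python
import os
import tempfile


def is_hidden(entry):
    """Example filter operating on a DirEntry-like object."""
    return entry.name.startswith(".")


def walk_files(root, skip=is_hidden):
    """Yield paths of files under root, pruning entries for which skip() is true."""
    with os.scandir(root) as entries:
        for entry in entries:
            if skip(entry):
                continue
            if entry.is_dir(follow_symlinks=False):
                yield from walk_files(entry.path, skip)
            elif entry.is_file(follow_symlinks=False):
                yield entry.path


def demo():
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, ".git"))
        os.makedirs(os.path.join(root, "sub"))
        for rel in ("a.txt",
                    os.path.join(".git", "config"),
                    os.path.join("sub", "b.txt")):
            open(os.path.join(root, rel), "w").close()
        return sorted(os.path.relpath(p, root) for p in walk_files(root))
```

Every entry below the root passes through skip(), but the root itself arrives as a plain string, so the filter never sees it unless it is first wrapped in some DirEntry-like object.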

With the decision to not expose any method in the stdlib to manually
create a DirEntry object, I am stuck with no good options. The least
bad option I guess would be to copy the GenericDirEntry code out of the
scandir package into my own code base. This seems rather silly. I
really don't understand the rationale for not giving users a way to
create these objects themselves, and I haven't actually seen that
explained anywhere. I guess people are unhappy with the overlap between
pathlib.Path objects and DirEntry objects and this is a misguided
attempt to prod people into using pathlib. I think a better approach is
to document the differences between DirEntry and pathlib.Path objects
and encourage users to default to using pathlib.Path unless they have
good reasons for using DirEntry.

Thanks,
Brendan