Re: [Python-ideas] Allow manual creation of DirEntry objects
Thanks, opened an issue here: http://bugs.python.org/issue27796

-Brendan

From: gvanros...@gmail.com [gvanros...@gmail.com] on behalf of Guido van Rossum [gu...@python.org]
Sent: Wednesday, August 17, 2016 7:20 AM
To: Nick Coghlan; Brendan Moloney
Cc: Victor Stinner; python-ideas@python.org
Subject: Re: [Python-ideas] Allow manual creation of DirEntry objects

Brendan,

The conclusion is that you should just file a bug asking for a working constructor -- or upload a patch if you want to.

--Guido

--
--Guido van Rossum (python.org/~guido)

___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/
Re: [Python-ideas] Allow manual creation of DirEntry objects
On 16.08.16 22:35, Brendan Moloney wrote:
> I have a bunch of functions that operate on DirEntry objects, typically
> doing some sort of filtering to select the paths I actually want to process.
> The overwhelming majority of the time these functions are going to be
> operating on DirEntry objects produced by the scandir function, but there
> are some cases where the user will be supplying the path themselves (for
> example, the root of a directory tree to process). In my current code base
> that uses the scandir package I just wrap these paths in a 'GenericDirEntry'
> object and then pass them through the filter functions the same as any
> results coming from the scandir function.

You can just create an object that duck-types DirEntry. See for example _DummyDirEntry in the os module.
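A minimal sketch of the duck-typing approach suggested above. `PathEntry` and `wanted` are hypothetical names invented for this illustration; this is not the code of `_DummyDirEntry` or of the scandir package's `GenericDirEntry`, only an object that offers the documented DirEntry attributes and methods for a path supplied by the user.

```python
import os
import tempfile
from stat import S_ISDIR, S_ISLNK, S_ISREG


class PathEntry:
    """Hypothetical stand-in exposing the os.DirEntry interface for any path."""

    def __init__(self, path):
        self.path = os.fspath(path)
        # Like DirEntry.name: the final component of the path.
        self.name = os.path.basename(os.path.normpath(self.path))
        self._stat_cache = {}  # keyed by the follow_symlinks flag

    def stat(self, *, follow_symlinks=True):
        # Cache the result per flag, as DirEntry.stat() is documented to do.
        if follow_symlinks not in self._stat_cache:
            self._stat_cache[follow_symlinks] = os.stat(
                self.path, follow_symlinks=follow_symlinks
            )
        return self._stat_cache[follow_symlinks]

    def _mode_matches(self, predicate, follow_symlinks):
        try:
            return predicate(self.stat(follow_symlinks=follow_symlinks).st_mode)
        except FileNotFoundError:
            return False  # DirEntry also answers False for a vanished entry

    def is_dir(self, *, follow_symlinks=True):
        return self._mode_matches(S_ISDIR, follow_symlinks)

    def is_file(self, *, follow_symlinks=True):
        return self._mode_matches(S_ISREG, follow_symlinks)

    def is_symlink(self):
        return self._mode_matches(S_ISLNK, False)

    def inode(self):
        return self.stat(follow_symlinks=False).st_ino

    def __fspath__(self):
        return self.path


def wanted(entry):
    """Example filter: accepts real DirEntry objects and PathEntry alike."""
    return entry.is_dir() and not entry.name.startswith(".")


# Usage: one filter for scandir() results and for a user-supplied root.
with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "data"))
    os.mkdir(os.path.join(root, ".cache"))
    root_entry = PathEntry(root)
    root_is_dir = root_entry.is_dir()
    root_wanted = wanted(root_entry)
    missing_is_file = PathEntry(os.path.join(root, "missing")).is_file()
    with os.scandir(root) as it:
        kept = sorted(e.name for e in it if wanted(e))
```

The trade-off is that every answer costs at least one os.stat() call, whereas entries produced by os.scandir() can often answer is_dir() from the directory listing itself.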
Re: [Python-ideas] Allow manual creation of DirEntry objects
Brendan,

The conclusion is that you should just file a bug asking for a working constructor -- or upload a patch if you want to.

--Guido

--
--Guido van Rossum (python.org/~guido)
Re: [Python-ideas] Allow manual creation of DirEntry objects
On 17 August 2016 at 09:56, Victor Stinner wrote:
> 2016-08-17 1:50 GMT+02:00 Guido van Rossum:
>> We could expose the class with a
>> constructor that always fails (the C code could construct instances through
>> a backdoor).
>
> Oh, in fact you cannot create an instance of os.DirEntry, it has no
> (Python) constructor:
>
> $ ./python
> Python 3.6.0a4+ (default:e615718a6455+, Aug 17 2016, 00:12:17)
> >>> import os
> >>> os.DirEntry(1)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: cannot create 'posix.DirEntry' instances
>
> Only os.scandir() can produce such objects.
>
> The question is still whether it makes sense to allow creating DirEntry
> objects in Python :-)

I think it does, as it isn't really any different from someone calling the stat() method on a DirEntry instance created by os.scandir(). It also prevents folks attempting things like:

    import os

    def slow_constructor(dirname, entryname):
        # Scans the whole directory just to obtain a single DirEntry
        for entry in os.scandir(dirname):
            if entry.name == entryname:
                entry.stat()  # populate the entry's stat cache
                return entry

Allowing DirEntry construction from Python further gives us a straightforward answer to the "stat caching" question: "just use os.DirEntry instances and call stat() to make the snapshot".

If folks ask why os.DirEntry caches results when pathlib.Path doesn't, we have the answer that cache invalidation is a hard problem, and hence we consider it useful in the lower level interface that is optimised for speed, but problematic in the higher level one that is more focused on cross-platform correctness of filesystem interactions.

I don't know whether it would make sense to allow a pre-existing stat result to be passed to DirEntry, but it does seem like it might be useful for adapting existing stat-based backend APIs to a more user-friendly DirEntry-based front-end API.

Cheers,
Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
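The "snapshot" behaviour described above can be observed directly. The following is a small illustration rather than anything posted in the thread; the file name and contents are arbitrary. It relies only on the documented behaviour that DirEntry.stat() caches its result on the entry, while os.stat() always asks the filesystem again.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    target = os.path.join(root, "data.txt")
    with open(target, "w") as f:
        f.write("12345")

    # The directory holds exactly one file, so the first entry is data.txt.
    with os.scandir(root) as it:
        entry = next(it)

    size_before = entry.stat().st_size        # result is cached on the entry

    with open(target, "a") as f:              # grow the file behind the entry's back
        f.write("67890")

    size_cached = entry.stat().st_size        # still the earlier snapshot
    size_fresh = os.stat(entry.path).st_size  # a new call sees the new size
```

Here `size_cached` stays at the original size while `size_fresh` reflects the appended data, which is the cache invalidation trade-off mentioned for the lower level interface.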
Re: [Python-ideas] Allow manual creation of DirEntry objects
On Tue, 16 Aug 2016 at 16:15 Victor Stinner wrote:
> By the way, for all these reasons, I'm not really excited by Python
> 3.6 change exposing os.DirEntry ( https://bugs.python.org/issue27038 ).

It was exposed at Guido's request for type hinting in typeshed.

-Brett
Re: [Python-ideas] Allow manual creation of DirEntry objects
2016-08-17 1:50 GMT+02:00 Guido van Rossum:
> We could expose the class with a
> constructor that always fails (the C code could construct instances through
> a backdoor).

Oh, in fact you cannot create an instance of os.DirEntry, it has no (Python) constructor:

    $ ./python
    Python 3.6.0a4+ (default:e615718a6455+, Aug 17 2016, 00:12:17)
    >>> import os
    >>> os.DirEntry(1)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: cannot create 'posix.DirEntry' instances

Only os.scandir() can produce such objects.

The question is still whether it makes sense to allow creating DirEntry objects in Python :-)

> Also, what does the scandir package mentioned by the OP use as the
> constructor signature?

The implementation of os.scandir() comes from the scandir package. It contains the same code, and so has the same behaviour (DirEntry has no constructor).

Victor
Re: [Python-ideas] Allow manual creation of DirEntry objects
On Tue, Aug 16, 2016 at 4:14 PM, Victor Stinner wrote:
> By the way, for all these reasons, I'm not really excited by Python
> 3.6 change exposing os.DirEntry ( https://bugs.python.org/issue27038 ).

But that's separate from the constructor. We could expose the class with a constructor that always fails (the C code could construct instances through a backdoor).

Exposing the type is useful for type annotations, e.g.

    def is_foobar(de: os.DirEntry) -> bool:
        ...

and for the occasional isinstance() check.

Also, what does the scandir package mentioned by the OP use as the constructor signature?

--
--Guido van Rossum (python.org/~guido)
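As an illustration of the two uses named above (annotations and isinstance() checks), here is a short sketch that assumes Python 3.6 or later, where os.DirEntry is exposed. The filter `is_python_source` and the file names are made up for the example.

```python
import os
import tempfile


def is_python_source(de: os.DirEntry) -> bool:
    """Hypothetical filter annotated with the exposed os.DirEntry type."""
    return de.is_file() and de.name.endswith(".py")


with tempfile.TemporaryDirectory() as root:
    # Create two empty files to scan.
    for filename in ("a.py", "b.txt"):
        open(os.path.join(root, filename), "w").close()

    with os.scandir(root) as it:
        entries = list(it)

    # Every object produced by os.scandir() is an instance of the exposed type.
    all_are_dir_entries = all(isinstance(e, os.DirEntry) for e in entries)
    python_files = sorted(e.name for e in entries if is_python_source(e))
```

None of this needs a public constructor, which is the point being made: exposing the type and allowing construction are separate decisions.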
Re: [Python-ideas] Allow manual creation of DirEntry objects
By the way, for all these reasons, I'm not really excited by the Python 3.6 change exposing os.DirEntry ( https://bugs.python.org/issue27038 ).

Victor
Re: [Python-ideas] Allow manual creation of DirEntry objects
2016-08-16 23:13 GMT+02:00 Guido van Rossum:
> It sounds fine to just submit a patch to add and document the DirEntry
> constructor. I don't think anyone intended to disallow your use case, it's
> more likely that nobody thought of it.

Currently, the DirEntry constructor expects data which comes from the opendir/readdir functions on UNIX/BSD or the FindFirstFile/FindNextFile functions on Windows. These functions are not exposed in Python, so it's unlikely that you can get the expected values. The DirEntry object was created to avoid syscalls in the common case thanks to data provided by these functions.

But I guess that Brendan wants to create a DirEntry object which would call os.stat() the first time that an attribute is read and then benefit from the code. You lose the "no syscall" optimization, since at least one syscall is needed.

In this case, I guess that the constructor should be DirEntry(directory, entry_name) where os.path.join(directory, entry_name) is the full path.

An issue is how to document the behaviour of DirEntry. Objects created by os.scandir() would be "optimized", whereas objects created manually would be "less optimized".

DirEntry is designed for os.scandir(); it's very limited compared to pathlib. IMO pathlib would be a better candidate for "cached os.stat results" with a full API to access the file system.

Victor
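The constructor shape proposed above, DirEntry(directory, entry_name) with a stat call deferred until first use, might look roughly like the following in pure Python. `LazyDirEntry` is a hypothetical class written for this illustration; it is not an API of the os module, it ignores the follow_symlinks option, and the `stat_calls` counter exists only to make the "at least one syscall" cost visible.

```python
import os
import tempfile
from stat import S_ISDIR, S_ISREG


class LazyDirEntry:
    """Hypothetical sketch of the DirEntry(directory, entry_name) idea."""

    def __init__(self, directory, entry_name):
        self.name = entry_name
        self.path = os.path.join(directory, entry_name)
        self._stat = None
        self.stat_calls = 0  # instrumentation for the demo only

    def stat(self):
        # Nothing touches the filesystem until stat data is first needed.
        if self._stat is None:
            self.stat_calls += 1
            self._stat = os.stat(self.path)
        return self._stat

    def is_dir(self):
        return S_ISDIR(self.stat().st_mode)

    def is_file(self):
        return S_ISREG(self.stat().st_mode)


with tempfile.TemporaryDirectory() as root:
    open(os.path.join(root, "notes.txt"), "w").close()

    entry = LazyDirEntry(root, "notes.txt")
    calls_after_init = entry.stat_calls   # construction alone costs nothing

    results = (entry.is_file(), entry.is_dir(), entry.stat().st_size)
    calls_after_use = entry.stat_calls    # one os.stat() served all three
```

This is the "less optimized" case from the message: an entry coming from os.scandir() can often answer is_file() with no extra call at all, while a manually built one always pays for one os.stat().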
Re: [Python-ideas] Allow manual creation of DirEntry objects
It sounds fine to just submit a patch to add and document the DirEntry constructor. I don't think anyone intended to disallow your use case, it's more likely that nobody thought of it.

--
--Guido van Rossum (python.org/~guido)
[Python-ideas] Allow manual creation of DirEntry objects
Hi,

I have been using the 'scandir' package (https://github.com/benhoyt/scandir) for a while now to speed up some directory tree processing code. Since Python 3.5 now includes 'os.scandir' in the stdlib (https://www.python.org/dev/peps/pep-0471/) I decided to try to make my code work with the built-in version if available.

The first issue I hit was that the 'DirEntry' class was not actually being exposed (http://bugs.python.org/issue27038). However in the discussion of that bug I noticed that the constructor for the 'DirEntry' class was deliberately being left undocumented and that there was no clear way to manually create a DirEntry object from a path. I brought up my objections to this decision in the bug tracker and was asked to have the discussion over here on python-ideas.

I have a bunch of functions that operate on DirEntry objects, typically doing some sort of filtering to select the paths I actually want to process. The overwhelming majority of the time these functions are going to be operating on DirEntry objects produced by the scandir function, but there are some cases where the user will be supplying the path themselves (for example, the root of a directory tree to process). In my current code base that uses the scandir package I just wrap these paths in a 'GenericDirEntry' object and then pass them through the filter functions the same as any results coming from the scandir function.

With the decision to not expose any method in the stdlib to manually create a DirEntry object, I am stuck with no good options. The least bad option I guess would be to copy the GenericDirEntry code out of the scandir package into my own code base. This seems rather silly. I really don't understand the rationale for not giving users a way to create these objects themselves, and I haven't actually seen that explained anywhere. I guess people are unhappy with the overlap between pathlib.Path objects and DirEntry objects and this is a misguided attempt to prod people into using pathlib. I think a better approach is to document the differences between DirEntry and pathlib.Path objects and encourage users to default to using pathlib.Path unless they have good reasons for using DirEntry.

Thanks,
Brendan