Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator

2014-06-30 Thread Victor Stinner
2014-07-01 4:04 GMT+02:00 Glenn Linderman:
>> +0 for stat fields to be None on all platforms unless ensure_lstat=True.
>
> This won't work well if lstat info is only needed for some entries. Is
> that a common use-case? It was mentioned earlier in the thread.
>
> If it is, use ensure_lstat=False, and use the proposed (by me) .refresh()
> API to update the data for those that need it.

We should make DirEntry as simple as possible. In Python, the classic
behaviour is to not define an attribute if it's not available on a
platform. For example, stat().st_file_attributes is only available on
Windows.

I don't like the idea of the ensure_lstat parameter because os.scandir
would then have to make two system calls, which makes it harder to tell
which syscall failed (readdir or lstat). If you need lstat on UNIX, write:

    if hasattr(entry, 'lstat_result'):
        size = entry.lstat_result.st_size
    else:
        size = os.lstat(entry.fullname()).st_size

Victor
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator

2014-06-30 Thread Terry Reedy

On 6/30/2014 9:44 PM, Ethan Furman wrote:
> On 06/30/2014 06:28 PM, Ben Hoyt wrote:
>>> I suppose the exact behavior is still under discussion, as there are
>>> only two or three fields one gets "for free" on Windows (I think...),
>>> whereas an os.stat call would get everything available for the platform.
>>
>> No, Windows is nice enough to give you all the same stat_result fields
>> during scandir (via FindFirstFile/FindNextFile) as a regular
>> os.stat().
>
> Very nice.  Even less reason then to throw it away.  :)


I agree.

--
Terry Jan Reedy



Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator

2014-06-30 Thread Eric V. Smith
On 6/30/2014 10:17 PM, Nick Coghlan wrote:
> 
> On 30 Jun 2014 19:13, "Glenn Linderman" wrote:
>>
>>
>> If it is, use ensure_lstat=False, and use the proposed (by me)
>> .refresh() API to update the data for those that need it.
> 
> I'm -1 on a refresh API for DirEntry - just use pathlib in that case.

I'm not sure refresh() is the best name, but I think a
"get_stat_info_from_direntry_or_call_stat()" (hah!) makes sense. If you
really need the stat info, then you can write simple code like:

    for entry in os.scandir(path):
        mtime = entry.get_stat_info_from_direntry_or_call_stat().st_mtime

And it won't call stat() any more times than needed. Once per file on
Posix, zero times per file on Windows.

Without an API like this, you'll need a check in the application code on
whether or not to call stat().
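
As a rough sketch of what such a helper might do (the function name is
made up, and the lstat_result and fullname attributes are only the ones
floated in this thread, not an existing API):

```python
import os


def stat_from_entry_or_call(entry):
    """Return the entry's cached lstat result, or call os.lstat() if absent."""
    # 'lstat_result' is a hypothetical attribute from this thread; it may be
    # missing or None on platforms where the directory scan gives no stat info.
    cached = getattr(entry, 'lstat_result', None)
    if cached is not None:
        return cached
    # 'fullname' is likewise hypothetical: the entry's full path as an attribute.
    return os.lstat(entry.fullname)
```

With something like this the platform check lives in one place, and the
calling code stays the same on Windows and POSIX.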

Eric.


Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator

2014-06-30 Thread Ethan Furman

On 06/30/2014 06:28 PM, Ben Hoyt wrote:
>> I suppose the exact behavior is still under discussion, as there are only
>> two or three fields one gets "for free" on Windows (I think...), whereas an
>> os.stat call would get everything available for the platform.
>
> No, Windows is nice enough to give you all the same stat_result fields
> during scandir (via FindFirstFile/FindNextFile) as a regular
> os.stat().

Very nice.  Even less reason then to throw it away.  :)

--
~Ethan~


Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator

2014-06-30 Thread Nick Coghlan
On 30 Jun 2014 19:13, "Glenn Linderman"  wrote:
>
>
> If it is, use ensure_lstat=False, and use the proposed (by me) .refresh()
> API to update the data for those that need it.

I'm -1 on a refresh API for DirEntry - just use pathlib in that case.

Cheers,
Nick.



Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator

2014-06-30 Thread Devin Jeanpierre
The proposal I was replying to was that:

- There is no .refresh()
- ensure_lstat=False means no OS has populated attributes
- ensure_lstat=True means every OS has populated attributes

Even if we add a .refresh(), the latter two items mean that you can't
avoid doing extra work (either too much on windows, or too much on
linux), if you want only a subset of the files' lstat info.
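
To make the "subset" case concrete, here is a sketch that uses
os.scandir() as it eventually shipped in Python 3.5, not the
attribute-only design discussed above. There, DirEntry.stat() is a method
that fetches on demand and caches its result. The function name and the
".log" filter are made up for illustration:

```python
import os


def total_size_of_logs(path='.'):
    """Sum the sizes of *.log entries, stat-ing only the entries that match."""
    total = 0
    for entry in os.scandir(path):
        # The name comes from the directory scan itself; no stat call is needed.
        if entry.name.endswith('.log'):
            # Only matching entries ask for stat info here.
            total += entry.stat(follow_symlinks=False).st_size
    return total
```

Entries that fail the filter never ask for stat info, which is the
behaviour an all-or-nothing ensure_lstat flag cannot give you.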

-- Devin

P.S. your mail client's quoting breaks my mail client (gmail)'s quoting.

On Mon, Jun 30, 2014 at 7:04 PM, Glenn Linderman  wrote:
> On 6/30/2014 4:25 PM, Devin Jeanpierre wrote:
>
> On Mon, Jun 30, 2014 at 3:07 PM, Tim Delaney
>  wrote:
>
> On 1 July 2014 03:05, Ben Hoyt  wrote:
>
> So, here's my alternative proposal: add an "ensure_lstat" flag to
> scandir() itself, and don't have *any* methods on DirEntry, only
> attributes.
>
> ...
>
> Most importantly, *regardless of platform*, the cached stat result (if
> not None) would reflect the state of the entry at the time the
> directory was scanned, rather than at some arbitrary later point in
> time when lstat() was first called on the DirEntry object.
>
> I'm torn between whether I'd prefer the stat fields to be populated on
> Windows if ensure_lstat=False or not. There are good arguments each way, but
> overall I'm inclining towards having it consistent with POSIX - don't
> populate them unless ensure_lstat=True.
>
> +0 for stat fields to be None on all platforms unless ensure_lstat=True.
>
> This won't work well if lstat info is only needed for some entries. Is
> that a common use-case? It was mentioned earlier in the thread.
>
>
> If it is, use ensure_lstat=False, and use the proposed (by me) .refresh()
> API to update the data for those that need it.


Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator

2014-06-30 Thread Glenn Linderman

On 6/30/2014 4:25 PM, Devin Jeanpierre wrote:
> On Mon, Jun 30, 2014 at 3:07 PM, Tim Delaney wrote:
>> On 1 July 2014 03:05, Ben Hoyt wrote:
>>>> So, here's my alternative proposal: add an "ensure_lstat" flag to
>>>> scandir() itself, and don't have *any* methods on DirEntry, only
>>>> attributes.
>>> ...
>>>> Most importantly, *regardless of platform*, the cached stat result (if
>>>> not None) would reflect the state of the entry at the time the
>>>> directory was scanned, rather than at some arbitrary later point in
>>>> time when lstat() was first called on the DirEntry object.
>>
>> I'm torn between whether I'd prefer the stat fields to be populated on
>> Windows if ensure_lstat=False or not. There are good arguments each way, but
>> overall I'm inclining towards having it consistent with POSIX - don't
>> populate them unless ensure_lstat=True.
>>
>> +0 for stat fields to be None on all platforms unless ensure_lstat=True.
>
> This won't work well if lstat info is only needed for some entries. Is
> that a common use-case? It was mentioned earlier in the thread.

If it is, use ensure_lstat=False, and use the proposed (by me)
.refresh() API to update the data for those that need it.


Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator

2014-06-30 Thread Ben Hoyt
> I suppose the exact behavior is still under discussion, as there are only
> two or three fields one gets "for free" on Windows (I think...), whereas an
> os.stat call would get everything available for the platform.

No, Windows is nice enough to give you all the same stat_result fields
during scandir (via FindFirstFile/FindNextFile) as a regular
os.stat().

-Ben


Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator

2014-06-30 Thread Ethan Furman

On 06/30/2014 04:15 PM, Tim Delaney wrote:
> On 1 July 2014 08:38, Ethan Furman wrote:
>> On 06/30/2014 03:07 PM, Tim Delaney wrote:
>>> I'm torn between whether I'd prefer the stat fields to be populated
>>> on Windows if ensure_lstat=False or not. There are good arguments each
>>> way, but overall I'm inclining towards having it consistent with POSIX
>>> - don't populate them unless ensure_lstat=True.
>>>
>>> +0 for stat fields to be None on all platforms unless ensure_lstat=True.
>>
>> If a Windows user just needs the free info, why should s/he have to pay
>> the price of a full stat call?  I see no reason to hold the Windows side
>> back and not take advantage of what it has available.  There are plenty
>> of posix calls that Windows is not able to use, after all.
>
> On Windows ensure_lstat would be either a NOP (if the fields are
> always populated), or it simply determines if the fields get populated.
> No extra stat call.

I suppose the exact behavior is still under discussion, as there are only
two or three fields one gets "for free" on Windows (I think...), whereas an
os.stat call would get everything available for the platform.

> On POSIX it's the difference between an extra stat call or not.

Agreed on this part.

Still, no reason to slow down the Windows side by throwing away info
unnecessarily -- that's why this PEP exists, after all.

--
~Ethan~


Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator

2014-06-30 Thread Devin Jeanpierre
On Mon, Jun 30, 2014 at 3:07 PM, Tim Delaney
 wrote:
> On 1 July 2014 03:05, Ben Hoyt  wrote:
>>
>> > So, here's my alternative proposal: add an "ensure_lstat" flag to
>> > scandir() itself, and don't have *any* methods on DirEntry, only
>> > attributes.
>> ...
>>
>> > Most importantly, *regardless of platform*, the cached stat result (if
>> > not None) would reflect the state of the entry at the time the
>> > directory was scanned, rather than at some arbitrary later point in
>> > time when lstat() was first called on the DirEntry object.
>
>
> I'm torn between whether I'd prefer the stat fields to be populated on
> Windows if ensure_lstat=False or not. There are good arguments each way, but
> overall I'm inclining towards having it consistent with POSIX - don't
> populate them unless ensure_lstat=True.
>
> +0 for stat fields to be None on all platforms unless ensure_lstat=True.

This won't work well if lstat info is only needed for some entries. Is
that a common use-case? It was mentioned earlier in the thread.

-- Devin


Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator

2014-06-30 Thread Tim Delaney
On 1 July 2014 08:38, Ethan Furman  wrote:

> On 06/30/2014 03:07 PM, Tim Delaney wrote:
>
>> I'm torn between whether I'd prefer the stat fields to be populated
>> on Windows if ensure_lstat=False or not. There are good arguments each
>>  way, but overall I'm inclining towards having it consistent with POSIX
>> - don't populate them unless ensure_lstat=True.
>>
>> +0 for stat fields to be None on all platforms unless ensure_lstat=True.
>>
>
> If a Windows user just needs the free info, why should s/he have to pay
> the price of a full stat call?  I see no reason to hold the Windows side
> back and not take advantage of what it has available.  There are plenty of
> posix calls that Windows is not able to use, after all.
>

On Windows ensure_lstat would be either a NOP (if the fields are
always populated), or it simply determines if the fields get populated. No
extra stat call.

On POSIX it's the difference between an extra stat call or not.

Tim Delaney


Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator

2014-06-30 Thread Ethan Furman

On 06/30/2014 03:07 PM, Tim Delaney wrote:
> On 1 July 2014 03:05, Ben Hoyt wrote:
>>> So, here's my alternative proposal: add an "ensure_lstat" flag to
>>> scandir() itself, and don't have *any* methods on DirEntry, only
>>> attributes.
>> ...
>>> Most importantly, *regardless of platform*, the cached stat result (if
>>> not None) would reflect the state of the entry at the time the
>>> directory was scanned, rather than at some arbitrary later point in
>>> time when lstat() was first called on the DirEntry object.
>
> I'm torn between whether I'd prefer the stat fields to be populated
> on Windows if ensure_lstat=False or not. There are good arguments each
> way, but overall I'm inclining towards having it consistent with POSIX
> - don't populate them unless ensure_lstat=True.
>
> +0 for stat fields to be None on all platforms unless ensure_lstat=True.

If a Windows user just needs the free info, why should s/he have to pay
the price of a full stat call?  I see no reason to hold the Windows side
back and not take advantage of what it has available.  There are plenty
of posix calls that Windows is not able to use, after all.


--
~Ethan~


Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator

2014-06-30 Thread Tim Delaney
On 1 July 2014 03:05, Ben Hoyt  wrote:

> > So, here's my alternative proposal: add an "ensure_lstat" flag to
> > scandir() itself, and don't have *any* methods on DirEntry, only
> > attributes.
> ...
> > Most importantly, *regardless of platform*, the cached stat result (if
> > not None) would reflect the state of the entry at the time the
> > directory was scanned, rather than at some arbitrary later point in
> > time when lstat() was first called on the DirEntry object.
>

I'm torn between whether I'd prefer the stat fields to be populated on
Windows if ensure_lstat=False or not. There are good arguments each way,
but overall I'm inclining towards having it consistent with POSIX - don't
populate them unless ensure_lstat=True.

+0 for stat fields to be None on all platforms unless ensure_lstat=True.


> Yeah, I quite like this. It does make the caching more explicit and
> consistent. It's slightly annoying that it's less like pathlib.Path
> now, but DirEntry was never pathlib.Path anyway, so maybe it doesn't
> matter. The differences in naming may highlight the difference in
> caching, so maybe it's a good thing.
>

See my comments below on .fullname.


> Two further questions from me:
>
> 1) How does error handling work? Now os.stat() will/may be called
> during iteration, so in __next__. But it's hard to catch errors because
> you don't call __next__ explicitly. Is this a problem? How do other
> iterators that make system calls or raise errors handle this?
>

I think it just needs to be documented that iterating may throw the same
exceptions as os.lstat(). It's a little trickier if you don't want the
scope of your exception to be too broad, but you can always wrap the
iteration in a generator to catch and handle the exceptions you care about,
and allow the rest to propagate.

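    import os

    def scandir_accessible(path='.'):
        gen = os.scandir(path)

        while True:
            try:
                yield next(gen)
            except StopIteration:
                # Directory exhausted: stop cleanly instead of leaking StopIteration.
                return
            except PermissionError:
                # Skip entries we are not allowed to lstat.
                pass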

> 2) There's still the open question in the PEP of whether to include a
> way to access the full path. This is cheap to build, it has to be
> built anyway on POSIX systems, and it's quite useful for further
> operations on the file. I think the best way to handle this is a
> .fullname or .full_name attribute as suggested elsewhere. Thoughts?
>

+1 for .fullname. The earlier suggestion to have __str__ return the name is
killed I think by the fact that .fullname could be bytes.

It would be nice if pathlib.Path objects were enhanced to take a DirEntry
and use the .fullname automatically, but you could always call
Path(direntry.fullname).

Tim Delaney


Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator

2014-06-30 Thread Ben Hoyt
> So, here's my alternative proposal: add an "ensure_lstat" flag to
> scandir() itself, and don't have *any* methods on DirEntry, only
> attributes.
>
> That would make the DirEntry attributes:
>
> is_dir: boolean, always populated
> is_file: boolean, always populated
> is_symlink: boolean, always populated
> lstat_result: stat result, may be None on POSIX systems if
> ensure_lstat is False
>
> (I'm not particularly sold on "lstat_result" as the name, but "lstat"
> reads as a verb to me, so doesn't sound right as an attribute name)
>
> What this would allow:
>
> - by default, scanning is efficient everywhere, but lstat_result may
> be None on POSIX systems
> - if you always need the lstat result, setting "ensure_lstat" will
> trigger the extra system call implicitly
> - if you only sometimes need the stat result, you can call os.lstat()
> explicitly when the DirEntry lstat attribute is None
>
> Most importantly, *regardless of platform*, the cached stat result (if
> not None) would reflect the state of the entry at the time the
> directory was scanned, rather than at some arbitrary later point in
> time when lstat() was first called on the DirEntry object.
>
> There'd still be a slight window of discrepancy (since the filesystem
> state may change between reading the directory entry and making the
> lstat() call), but this could be effectively eliminated from the
> perspective of the Python code by making the result of the lstat()
> call authoritative for the whole DirEntry object.
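
To pin down the shape of the proposal quoted above, here is a rough
pure-Python emulation. It is only a sketch: scandir_sketch and the
namedtuple are made-up stand-ins, and because it is built on os.listdir()
plus os.lstat() it shows the attribute semantics, not the performance win:

```python
import os
import stat
from collections import namedtuple

# Hypothetical stand-in for the proposed DirEntry: attributes only, no methods.
DirEntry = namedtuple('DirEntry', 'name is_dir is_file is_symlink lstat_result')


def scandir_sketch(path='.', ensure_lstat=False):
    """Yield DirEntry tuples; lstat_result is None unless ensure_lstat is true."""
    for name in os.listdir(path):
        # A real implementation would take the type flags from the directory
        # scan itself; this emulation always pays for an lstat() call.
        st = os.lstat(os.path.join(path, name))
        yield DirEntry(
            name=name,
            is_dir=stat.S_ISDIR(st.st_mode),
            is_file=stat.S_ISREG(st.st_mode),
            is_symlink=stat.S_ISLNK(st.st_mode),
            lstat_result=st if ensure_lstat else None,
        )
```

Code that only sometimes needs stat info would then call os.lstat()
itself whenever lstat_result is None.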

Yeah, I quite like this. It does make the caching more explicit and
consistent. It's slightly annoying that it's less like pathlib.Path
now, but DirEntry was never pathlib.Path anyway, so maybe it doesn't
matter. The differences in naming may highlight the difference in
caching, so maybe it's a good thing.

Two further questions from me:

1) How does error handling work? Now os.stat() will/may be called
during iteration, so in __next__. But it's hard to catch errors because
you don't call __next__ explicitly. Is this a problem? How do other
iterators that make system calls or raise errors handle this?

2) There's still the open question in the PEP of whether to include a
way to access the full path. This is cheap to build, it has to be
built anyway on POSIX systems, and it's quite useful for further
operations on the file. I think the best way to handle this is a
.fullname or .full_name attribute as suggested elsewhere. Thoughts?

-Ben