date:20140701

[Python-Dev] PEP 471: scandir(fd) and pathlib.Path(name, dir_fd=None)

2014-07-01 Thread Victor Stinner

Hi,

IMO we must decide whether or not scandir() should support file descriptors.
It's an important decision with a significant impact on the API.


To support scandir(fd), the minimum is to store dir_fd in DirEntry:
dir_fd would be None for scandir(str).


scandir(fd) must not close the file descriptor; that should be done by
the caller. Handling the lifetime of the file descriptor is a
difficult problem, so it's better to let the user decide how to handle
it.

There is also the limit on open file descriptors, usually
1024, but it can be lower. It *can* be an issue for very deep file
hierarchies.

If we choose to support scandir(fd), it's probably safer not to use
scandir(fd) by default in os.walk() (use scandir(str) instead), and to wait
until the feature is well tested, its corner cases are well known, etc.


The second step is to enhance pathlib.Path to support an optional
directory file descriptor. Path already has methods that operate on
filenames, like chmod(), exists(), rename(), etc.


Example:

fd = os.open(path, os.O_DIRECTORY)
try:
    for entry in os.scandir(fd):
        # ... use entry to benefit from the entry cache: is_dir(), lstat_result ...
        path = pathlib.Path(entry.name, dir_fd=entry.dir_fd)
        # ... use path, which uses dir_fd ...
finally:
    os.close(fd)

Problem: if the path object is stored somewhere and used after the
loop, Path methods will fail because dir_fd was closed. It's even
worse if a new directory reuses the same file descriptor number :-/ (a
security issue, or at least tricky bugs!)
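For comparison with later Python versions: since 3.7, os.scandir() accepts a directory file descriptor on platforms where os.scandir is in os.supports_fd, but os.DirEntry never gained a dir_fd attribute and pathlib.Path takes no dir_fd argument. A minimal sketch of one way to avoid the dangling-descriptor problem described above, assuming a POSIX system: keep the fd inside a context manager and let only plain data (names and stat results) escape. open_dir_fd() and lstat_all() are hypothetical helper names.

```python
import contextlib
import os


@contextlib.contextmanager
def open_dir_fd(path):
    """Open a directory descriptor and guarantee it is closed on exit."""
    # O_DIRECTORY is POSIX-only; fall back to 0 where it does not exist.
    fd = os.open(path, os.O_RDONLY | getattr(os, "O_DIRECTORY", 0))
    try:
        yield fd
    finally:
        os.close(fd)


def lstat_all(path):
    """Return {name: stat_result}; no descriptor outlives this call."""
    results = {}
    with open_dir_fd(path) as fd:
        # os.scandir(fd) does not close fd: the caller still owns it.
        with os.scandir(fd) as it:
            for entry in it:
                # Resolve the name relative to the descriptor, not to a path
                # string that could be swapped out underneath us.
                results[entry.name] = os.stat(entry.name, dir_fd=fd,
                                              follow_symlinks=False)
    return results
```

Because the returned dict holds stat_result objects rather than fd-relative names, nothing that outlives lstat_all() can refer to a closed (or reused) descriptor.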

Victor
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

[Python-Dev] My summary of the scandir (PEP 471)

2014-07-01 Thread Victor Stinner

Hi,

@Ben: it's time to update your PEP to complete it with this
discussion! IMO DirEntry should be as simple and portable as possible:

- os.scandir(str)
- DirEntry.lstat_result object, only available on Windows, with the same
result as os.lstat()
- DirEntry.fullname(): os.path.join(directory, DirEntry.name), where
directory would be a hidden attribute of DirEntry


Notes:

- DirEntry.lstat_result is better than DirEntry.lstat() because it
makes it explicit that lstat_result is computed only once. When I call
DirEntry.lstat(), I expect to get the current status of the file, not
the cached one. It's also hard to explain (document) that
DirEntry.lstat() may or may not make a system call. Don't do that, use
DirEntry.lstat_result.

- I don't think that we should support scandir(bytes). If you really
want to support os.scandir(bytes), it must raise an error on Windows,
since bytes filenames are already deprecated there. It wouldn't make sense
to add a new function with a deprecated feature. Since we have PEP 383
(surrogateescape), it's better to advise using Unicode on all
platforms. Almost all Python functions are able to encode Unicode
filenames back automatically. Use os.fsencode() to encode manually if needed.

- We could choose not to define a DirEntry.fullname() method, since the
directory name is usually well known. OK, but every time I use os.listdir(),
I write os.path.join(directory, name) because in some cases I want the
full path. Example:

interesting = []
for name in os.listdir(path):
    fullpath = os.path.join(path, name)
    if os.path.isdir(fullpath):
        continue
    if ... test on the file ...:
        # I need the full path here, not the relative path
        # (ex: my own recursive "scandir"/"walk" function)
        interesting.append(fullpath)

- It must not be possible to "refresh" a DirEntry object. Call
os.stat(entry.fullname()) or pathlib.Path(entry.fullname()) to get
fresh data. DirEntry is only computed once, that's all. It's well
defined.

- No Windows wildcards: as you wrote, the feature has many corner
cases, and it's only available on Windows. It's easy to combine
scandir with fnmatch.
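For comparison, the os.DirEntry that eventually shipped in Python 3.5 exposes the joined path as an entry.path attribute rather than a fullname() method. A minimal sketch of the loop above rewritten with os.scandir(), using fnmatch for the filtering that Windows wildcards would otherwise do; interesting_files() and the "*.txt" default pattern are illustrative only.

```python
import fnmatch
import os


def interesting_files(path, pattern="*.txt"):
    """Return full paths of non-directory entries whose name matches pattern."""
    interesting = []
    with os.scandir(path) as it:
        for entry in it:
            # is_dir() can usually be answered from the readdir()/FindNextFile()
            # data, so this test normally needs no extra stat() call.
            if entry.is_dir():
                continue
            # Portable filtering in Python instead of OS-specific wildcards.
            if fnmatch.fnmatch(entry.name, pattern):
                # entry.path is os.path.join(path, entry.name).
                interesting.append(entry.path)
    return interesting
```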

Victor

Re: [Python-Dev] PEP 471: scandir(fd) and pathlib.Path(name, dir_fd=None)

2014-07-01 Thread Ben Hoyt

Thanks, Victor.

I don't have any experience with dir_fd handling, so unfortunately I
can't really comment here.

What advantages does it bring? I notice that even os.listdir() on
Python 3.4 doesn't have anything related to file descriptors, so I'd
be in favour of not including support. We can always add it later.

-Ben

On Tue, Jul 1, 2014 at 3:44 AM, Victor Stinner  wrote:
> Hi,
>
> IMO we must decide if scandir() must support or not file descriptor.
> It's an important decision which has an important impact on the API.
> [...]

Re: [Python-Dev] PEP 471: scandir(fd) and pathlib.Path(name, dir_fd=None)

2014-07-01 Thread Victor Stinner

2014-07-01 14:26 GMT+02:00 Ben Hoyt :
> Thanks, Victor.
>
> I don't have any experience with dir_fd handling, so unfortunately
> can't really comment here.
>
> What advantages does it bring? I notice that even os.listdir() on
> Python 3.4 doesn't have anything related to file descriptors, so I'd
> be in favour of not including support.

See https://docs.python.org/dev/library/os.html#dir-fd

The idea is to make sure that you get files from the same directory.
Problems occur when a directory is moved or a symlink is modified.
Example:

- you're browsing /tmp/copy as root (!); /tmp/copy/passwd is owned
by the www user (website)
- you would like to remove the file "passwd": call unlink("/tmp/copy/passwd")
- ... but just before that, an attacker replaces the /tmp/copy
directory with a symlink to /etc
- you will remove /etc/passwd instead of /tmp/copy/passwd, oh oh

Using unlink("passwd", dir_fd=tmp_copy_fd), you don't have this issue:
you are sure that you are working in the /tmp/copy directory.

You can imagine many other scenarios for overwriting files or reading
sensitive files.
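A minimal sketch of the dir_fd pattern described above, assuming a POSIX system where os.unlink is in os.supports_dir_fd. remove_in_dir() is a hypothetical helper: it pins the directory with an fd and unlinks relative to it. Opening with O_NOFOLLOW also refuses a directory path whose last component is already a symlink; it does not protect the parent components.

```python
import os


def remove_in_dir(dirpath, name):
    """Unlink dirpath/name, resolving name relative to a pinned directory fd."""
    if "/" in name or name in (".", ".."):
        raise ValueError("name must be a single path component: %r" % (name,))
    # O_NOFOLLOW makes the open fail if the final component of dirpath is a
    # symlink (e.g. /tmp/copy swapped for a link to /etc); parent components
    # are still resolved normally.
    flags = os.O_RDONLY | os.O_DIRECTORY | os.O_NOFOLLOW
    dir_fd = os.open(dirpath, flags)
    try:
        # Like unlinkat(dir_fd, name, 0): once dir_fd is open, swapping
        # dirpath no longer changes which directory "name" is looked up in.
        os.unlink(name, dir_fd=dir_fd)
    finally:
        os.close(dir_fd)
```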

Fortunately, the Linux rm command knows the unlinkat() syscall ;-)

haypo@selma$ mkdir -p a/b/c
haypo@selma$ strace -e unlinkat rm -rf a
unlinkat(5, "c", AT_REMOVEDIR)  = 0
unlinkat(4, "b", AT_REMOVEDIR)  = 0
unlinkat(AT_FDCWD, "a", AT_REMOVEDIR)   = 0
+++ exited with 0 +++

We should implement a similar thing in shutil.rmtree().
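For reference, shutil.rmtree has used an fd-based implementation since Python 3.3 on platforms that support it, and shutil.rmtree.avoids_symlink_attacks reports whether that is the case. A simplified sketch of the same idea, assuming POSIX and Python 3.7+ (for os.scandir(fd)): it follows the pattern in the strace output above (unlinkat() relative to each directory's fd) but omits the error handling a real implementation needs. rmtree_fd() is a hypothetical name. It keeps one descriptor open per nesting level, which is the open-file limit concern raised earlier in the thread.

```python
import os

_DIR_FLAGS = os.O_RDONLY | os.O_DIRECTORY | os.O_NOFOLLOW


def _remove_contents(dir_fd):
    """Delete everything inside the directory referred to by dir_fd."""
    # Snapshot the listing first so we are not deleting while iterating.
    with os.scandir(dir_fd) as it:
        entries = list(it)
    for entry in entries:
        if entry.is_dir(follow_symlinks=False):
            # Open the child relative to dir_fd; O_NOFOLLOW refuses a symlink
            # that was swapped in after the listing was taken.
            child_fd = os.open(entry.name, _DIR_FLAGS, dir_fd=dir_fd)
            try:
                _remove_contents(child_fd)
            finally:
                os.close(child_fd)
            os.rmdir(entry.name, dir_fd=dir_fd)   # unlinkat(..., AT_REMOVEDIR)
        else:
            # Files and symlinks (even links to directories) are unlinked,
            # never followed.
            os.unlink(entry.name, dir_fd=dir_fd)  # unlinkat(..., 0)


def rmtree_fd(path):
    """Recursively delete path without following symlinks inside it."""
    top_fd = os.open(path, _DIR_FLAGS)
    try:
        _remove_contents(top_fd)
    finally:
        os.close(top_fd)
    # The top-level directory itself is still removed by path.
    os.rmdir(path)
```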

See also os.fwalk(), which is a version of os.walk() providing dir_fd.
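os.fwalk() (Python 3.3+, POSIX only) yields a fourth item, dirfd, next to the usual (dirpath, dirnames, filenames). A minimal sketch in the spirit of the example in the os.fwalk() documentation, summing file sizes by resolving each name relative to dirfd; total_size() is an illustrative name. The documentation notes that dirfd is only valid until the next iteration step, the same lifetime issue raised for scandir(fd).

```python
import os


def total_size(top):
    """Sum st_size of the non-directory entries under top (symlinks not followed)."""
    total = 0
    for dirpath, dirnames, filenames, dirfd in os.fwalk(top):
        for name in filenames:
            # Look the name up relative to dirfd, not via dirpath.
            st = os.stat(name, dir_fd=dirfd, follow_symlinks=False)
            total += st.st_size
        # dirfd must not be kept after this step; dup() it if needed later.
    return total
```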

> We can always add it later.

I would prefer to discuss that right now. My proposal is to accept
an int for scandir() and copy the int into DirEntry.dir_fd. It's not
that complex :-)

The enhancement of the pathlib module can be done later. By the way, I
know that Antoine Pitrou wanted to implement file descriptor support in
pathlib, but the feature was rejected or at least delayed.

Victor

Re: [Python-Dev] My summary of the scandir (PEP 471)

2014-07-01 Thread Ben Hoyt

Thanks for spinning this off to (hopefully) finish the discussion. I
agree it's nearly time to update the PEP.

> @Ben: it's time to update your PEP to complete it with this
> discussion! IMO DirEntry must be as simple as possible and portable:
>
> - os.scandir(str)
> - DirEntry.lstat_result object only available on Windows, same result
> than os.lstat()
> - DirEntry.fullname(): os.path.join(directory, DirEntry.name), where
> directory would be an hidden attribute of DirEntry

I'm quite strongly against this, and I think it's actually the worst
of both worlds. It is not as good an API because:

(a) it doesn't call stat for you (on POSIX), so you have to check an
attribute and call stat manually if you need it, turning what
should be one line of code into four. Your proposal above was kind of
how I had it originally, where you had to do extra tests and call
stat manually if you needed it (see
https://mail.python.org/pipermail/python-dev/2013-May/126119.html)
(b) the .lstat_result attribute is available on Windows but not on
POSIX, meaning it's very easy for Windows developers to write code
that will run and work fine on Windows, but then break horribly on
POSIX; I think it'd be better if it broke hard on Windows to make
writing cross-platform code easy

The two alternatives are:

1) the original proposal in the current version of PEP 471, where
DirEntry has an .lstat() method which calls stat() on POSIX but is
free on Windows
2) Nick Coghlan's proposal on the previous thread
(https://mail.python.org/pipermail/python-dev/2014-June/135261.html)
suggesting an ensure_lstat keyword param to scandir if you need the
lstat_result value

I would make one small tweak to Nick Coghlan's proposal to make
writing cross-platform code easier. Instead of .lstat_result being
None sometimes (on POSIX), have it None always unless you specify
ensure_lstat=True. (Actually, call it get_lstat=True to kind of make
this more obvious.) Per (b) above, this means Windows developers
wouldn't accidentally write code which failed on POSIX systems -- it'd
fail fast on Windows too if you accessed .lstat_result without
specifying get_lstat=True.
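To make the tweak concrete, a sketch that emulates the proposed get_lstat semantics on top of the os.scandir() that eventually shipped. scandir_lstat(), Entry, get_lstat and lstat_result are hypothetical names taken from (or invented for) this discussion, not a real os API: lstat_result is always present, but it is None unless get_lstat=True is passed, on every platform.

```python
import os
from collections import namedtuple

# Hypothetical entry type modelling the proposal; this is not os.DirEntry.
Entry = namedtuple("Entry", ["name", "path", "is_dir", "lstat_result"])


def scandir_lstat(path, get_lstat=False):
    """Yield Entry objects; lstat_result is None unless get_lstat is True."""
    with os.scandir(path) as it:
        for entry in it:
            # Fetched eagerly during iteration, so any stat() error is
            # raised from the iterator itself.
            st = entry.stat(follow_symlinks=False) if get_lstat else None
            yield Entry(entry.name, entry.path,
                        entry.is_dir(follow_symlinks=False), st)
```

Code that forgets get_lstat=True gets None everywhere, so the mistake shows up on Windows too.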

I'm still unsure which of these I like better. I think #1's API is
slightly nicer without the ensure_lstat parameter, and error handling
of the stat() is more explicit. But #2 always fetches the stat info at
the same time as the dir entry info, so eliminates the problem of
having the file info change between scandir iteration and the .lstat()
call.

I'm leaning towards preferring #2 (Nick's proposal) because it solves
or gets around the caching issue. My one concern is error handling. Is
it an issue if scandir's __next__ can raise an OSError either from the
readdir() call or the call to stat()? My thinking is probably not. In
practice, would it ever really happen that readdir() would succeed but
an os.stat() immediately after would fail? I guess it could if the
file is deleted, but then if it were deleted a microsecond earlier the
readdir() would fail anyway, or not? Or does readdir give you a
consistent, "snap-shotted" view on things?

The one other thing I'm not quite sure about with Nick's proposal is
the name .lstat_result, as it's long. I can see why he suggested that,
as .lstat sounds like a verb, but maybe that's okay? If we can have
.is_dir and .is_file as attributes, my thinking is an .lstat attribute
is fine too. I don't feel too strongly though.

> - I don't think that we should support scandir(bytes). If you really
> want to support os.scandir(bytes), it must raise an error on Windows
> since bytes filename are already deprecated. It wouldn't make sense to
> add new function with a deprecated feature. Since we have the PEP 383
> (surrogateescape), it's better to advice to use Unicode on all
> platforms. Almost all Python functions are able to encode back Unicode
> filename automatically. Use os.fsencode() to encode manually if needd.

Really, are bytes filenames deprecated? I think maybe they should be,
as they don't work on Windows :-), but the latest Python "os" docs
(https://docs.python.org/3.5/library/os.html) still say that all
functions that accept path names accept either str or bytes, and
return a value of the same type where necessary. So I think scandir()
should do the same thing.
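For what it's worth, the os.scandir() that eventually shipped does accept a bytes path on POSIX and then returns bytes names, and os.fsencode()/os.fsdecode() round-trip undecodable bytes through surrogateescape (PEP 383). A small sketch, assuming a POSIX system; names_both_ways() and roundtrips() are illustrative names.

```python
import os


def names_both_ways(path):
    """List path once with a str argument and once with a bytes argument."""
    with os.scandir(path) as it:
        str_names = sorted(entry.name for entry in it)    # str path -> str names
    with os.scandir(os.fsencode(path)) as it:
        bytes_names = sorted(entry.name for entry in it)  # bytes path -> bytes names
    return str_names, bytes_names


def roundtrips(raw):
    """True if raw bytes survive fsdecode() -> fsencode() unchanged."""
    # On POSIX, undecodable bytes become lone surrogates (PEP 383)
    # and fsencode() maps them back to the original bytes.
    return os.fsencode(os.fsdecode(raw)) == raw
```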

> - We may not define a DirEntry.fullname() method: the directory name
> is usually well known. Ok, but every time that I use os.listdir(), I
> write os.path.join(directory, name) because in some cases I want the
> full path.

Agreed. I use this a lot too. However, I'd prefer a .fullname
attribute rather than a method, as it's free/cheap to compute and
doesn't require OS calls.

Out of interest, why do we have .is_dir and .stat_result but .fullname
rather than .full_name? .fullname seems reasonable to me, but maybe
consistency is a good thing here?

> - It must not be possible to "refresh" a DirEntry object. Call
> os.stat(entry.fullname()) or pathlib.Path(entry.fullname()) to get
> fresh data.

Re: [Python-Dev] My summary of the scandir (PEP 471)

2014-07-01 Thread Victor Stinner

2014-07-01 15:00 GMT+02:00 Ben Hoyt :
> (a) it doesn't call stat for you (on POSIX), so you have to check an
> attribute and call scandir manually if you need it,

Yes, and that's common when you use the os module. For
example, don't try to call os.fork(), os.getgid() or os.fchmod() on
Windows :-) Closer to your PEP, the following stat attributes are only
available on UNIX: st_blocks, st_blksize, st_rdev, st_flags; and
st_file_attributes is only available on Windows.
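A small sketch of the hasattr() style of portability described here: fields a platform lacks are simply absent from os.stat_result, so portable code probes for them. describe() is an illustrative name; the field list is the one from the message above.

```python
import os

# Fields named in the message above; which ones exist varies by platform.
OPTIONAL_FIELDS = ("st_blocks", "st_blksize", "st_rdev", "st_flags",
                   "st_file_attributes")


def describe(path):
    """Return st_size plus whichever optional stat fields exist here."""
    st = os.lstat(path)
    info = {"st_size": st.st_size}
    for field in OPTIONAL_FIELDS:
        # Missing fields are absent (AttributeError), not None,
        # so probe with hasattr() before reading them.
        if hasattr(st, field):
            info[field] = getattr(st, field)
    return info
```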

I don't think that using lstat_result is a common need when browsing a
directory. In most cases, you only need is_dir() and the name
attribute.

> 1) the original proposal in the current version of PEP 471, where
> DirEntry has an .lstat() method which calls stat() on POSIX but is
> free on Windows

On UNIX, does it mean that .lstat() calls os.lstat() on the first
call, and then always returns the same result? That would be different
from os.lstat() and pathlib.Path.stat() :-( I would prefer to have the
same behaviour as pathlib and os (you know, the well-known
consistency of the Python stdlib). As I wrote, I expect a function call to
always retrieve the current status.
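For reference, the API that eventually shipped in Python 3.5 uses a method, DirEntry.stat(), whose result is documented as cached on the entry, which is exactly the difference from os.lstat() discussed here. A small sketch demonstrating it; cached_vs_fresh() is an illustrative name.

```python
import os


def cached_vs_fresh(dirpath, name):
    """Return sizes from DirEntry.stat() before/after a write, and from os.lstat()."""
    with os.scandir(dirpath) as it:
        entry = next(e for e in it if e.name == name)
    before = entry.stat(follow_symlinks=False).st_size  # first call fills the cache
    with open(entry.path, "a") as f:
        f.write("more data")                            # file grows by 9 bytes
    cached = entry.stat(follow_symlinks=False).st_size  # served from the cache
    fresh = os.lstat(entry.path).st_size                # always asks the OS
    return before, cached, fresh
```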

> 2) Nick Coghlan's proposal on the previous thread
> (https://mail.python.org/pipermail/python-dev/2014-June/135261.html)
> suggesting an ensure_lstat keyword param to scandir if you need the
> lstat_result value

I don't like this idea because it makes error handling more complex.
The syntax to catch exceptions on an iterator is verbose (while: try:
next() except ...).

Whereas calling os.lstat(entry.fullname()) is explicit and it's easy
to surround it with try/except.
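To illustrate the verbose pattern: errors raised by the iterator itself (the directory read failing) can only be caught around next(), while an explicit os.lstat() per entry can be wrapped on its own. A minimal sketch on top of os.scandir(); lstat_entries() and its collect-the-errors policy are illustrative choices, not an existing API.

```python
import os


def lstat_entries(path):
    """Return ({name: lstat_result}, [errors]) for the entries of path."""
    results, errors = {}, []
    it = os.scandir(path)
    try:
        while True:
            try:
                entry = next(it)
            except StopIteration:
                break
            except OSError as exc:
                # The directory read itself failed: treat it as the end of
                # the listing rather than guessing whether it can resume.
                errors.append(exc)
                break
            try:
                # Per-entry call: an error here is isolated to this one name.
                results[entry.name] = os.lstat(entry.path)
            except OSError as exc:
                errors.append(exc)
    finally:
        it.close()
    return results, errors
```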


> .lstat_result being None sometimes (on POSIX),

Don't do that, it's not how Python handles portability. We use hasattr().


> would it ever really happen that readdir() would succeed but an os.stat() 
> immediately after would fail?

Yes, it can happen. The filesystem is system-wide and shared by all
users. The file can be deleted.


> Really, are bytes filenames deprecated?

Yes, in all functions of the os module since Python 3.3. I'm sure
because I implemented the deprecation :-)

Try open(b'test.txt', 'w') on Windows with python -Werror.


> I think maybe they should be, as they don't work on Windows :-)

Windows has an API dedicated to bytes filenames, the ANSI API. But
this API has annoying bugs: it replaces unencodable characters with
question marks, and there is no option to be notified of the encoding
error.

Several users complained about that. It was decided not to change
Python, since Python is a thin wrapper over the kernel system calls.
But bytes filenames are now deprecated to advise users to use the
native type for filenames on Windows: Unicode!


> but the latest Python "os" docs
> (https://docs.python.org/3.5/library/os.html) still say that all
> functions that accept path names accept either str or bytes,

Maybe I forgot to update the documentation :-(


> So I think scandir() should do the same thing.

You may support scandir(bytes) on Windows, but you will need to emit a
deprecation warning too (deprecation warnings are silent by default).

Victor

[Python-Dev] Excess help() output

2014-07-01 Thread anatoly techtonik

Hi,

The help() output is confusing for beginners:

  >>> class B(object):
  ...   pass
  ...
  >>> help(B)
  Help on class B in module __main__:

  class B(__builtin__.object)
   |  Data descriptors defined here:
   |
   |  __dict__
   |  dictionary for instance variables (if defined)
   |
   |  __weakref__
   |  list of weak references to the object (if defined)

Is it possible to remove this section from help output?
Why is it here at all?

  >>> dir(B)
  ['__class__', '__delattr__', '__dict__', '__doc__', '__format__',
'__getattribute__', '__hash__', '__init__', '__module__', '__new__',
'__reduce__', '__reduce_ex__', '__repr__', '__setattr__',
'__sizeof__', '__str__', '__subclasshook__', '__weakref__']
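For context: the two entries come from descriptors that are added to a class defined without __slots__ (they appear in vars(B)), while a class that declares __slots__ = () gets neither. A small sketch in Python 3 syntax checking that; instance_descriptors() and Slotted are illustrative names.

```python
class B(object):
    pass


class Slotted(object):
    __slots__ = ()  # no per-instance __dict__ and no __weakref__ slot


def instance_descriptors(cls):
    """Which of __dict__/__weakref__ are defined directly on cls."""
    return [name for name in ("__dict__", "__weakref__") if name in vars(cls)]
```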

-- 
anatoly t.

Re: [Python-Dev] My summary of the scandir (PEP 471)

2014-07-01 Thread Jonas Wielicki

On 01.07.2014 15:00, Ben Hoyt wrote:
> I'm leaning towards preferring #2 (Nick's proposal) because it solves
> or gets around the caching issue. My one concern is error handling. Is
> it an issue if scandir's __next__ can raise an OSError either from the
> readdir() call or the call to stat()? My thinking is probably not. In
> practice, would it ever really happen that readdir() would succeed but
> an os.stat() immediately after would fail? I guess it could if the
> file is deleted, but then if it were deleted a microsecond earlier the
> readdir() would fail anyway, or not? Or does readdir give you a
> consistent, "snap-shotted" view on things?

No need for a microsecond-timed deletion -- a directory with +r but
without +x will allow you to list the entries, but stat calls on the
files will fail with EPERM:

$ ls -l
drwxr--r--.   2 root root 60  1. Jul 16:52 test

$ sudo ls -l test
total 0
-rw-r--r--. 1 root root 0  1. Jul 16:52 foo

$ ls -l test
ls: cannot access test/foo: Permission denied
total 0
-? ? ? ? ? ? foo

$ stat test/foo
stat: cannot stat ‘test/foo’: Permission denied

I had the idea to treat a failing lstat() inside scandir() as if the
entry wasn’t found at all, but in this context, this seems wrong too.

regards,
jwi



Re: [Python-Dev] My summary of the scandir (PEP 471)

2014-07-01 Thread Ben Hoyt

> No need for a microsecond-timed deletion -- a directory with +r but
> without +x will allow you to list the entries, but stat calls on the
> files will fail with EPERM:

Ah -- very good to know, thanks. This definitely points me in the
direction of wanting better control over error handling.

Speaking of errors, and thinking of handling errors during iteration
-- in what cases (if any) would an individual readdir fail if the
opendir succeeded?

-Ben

Re: [Python-Dev] My summary of the scandir (PEP 471)

2014-07-01 Thread Nick Coghlan

On 1 Jul 2014 07:31, "Victor Stinner"  wrote:
>
> 2014-07-01 15:00 GMT+02:00 Ben Hoyt :

> > 2) Nick Coghlan's proposal on the previous thread
> > (https://mail.python.org/pipermail/python-dev/2014-June/135261.html)
> > suggesting an ensure_lstat keyword param to scandir if you need the
> > lstat_result value
>
> I don't like this idea because it makes error handling more complex.
> The syntax to catch exceptions on an iterator is verbose (while: try:
> next() except ...).

Actually, we may need to copy the os.walk API and accept an "onerror"
callback as a scandir argument. Regardless of whether or not we have
"ensure_lstat", the iteration step could fail, so I don't believe we can
just transfer the existing approach of catching exceptions from the listdir
call.

> Whereas calling os.lstat(entry.fullname()) is explicit and it's easy
> to surround it with try/except.
>
>
> > .lstat_result being None sometimes (on POSIX),
>
> Don't do that, it's not how Python handles portability. We use hasattr().

That's not true in general - we do either, depending on context.

With the addition of an os.walk-style onerror callback, I'm still in favour
of a "get_lstat" flag (tweaked as Ben suggests to always be None unless
requested, so Windows code is less likely to be inadvertently non-portable).

> > would it ever really happen that readdir() would succeed but an
> > os.stat() immediately after would fail?
>
> Yes, it can happen. The filesystem is system-wide and shared by all
> users. The file can be deleted.

We need per-iteration error handling for the readdir call anyway, so I
think an onerror callback is a better option than dropping the ability to
easily obtain full stat information as part of the iteration.
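A minimal sketch of what an os.walk-style onerror callback could look like, built on the os.scandir() that eventually shipped. scandir_stat() and its parameters are hypothetical; as with os.walk(), onerror receives the OSError instance, and the failing entry is skipped. A failure of the directory read itself simply ends the iteration in this sketch.

```python
import os


def scandir_stat(path, follow_symlinks=False, onerror=None):
    """Yield (entry, stat_result) pairs; report failures to onerror."""
    try:
        it = os.scandir(path)
    except OSError as exc:              # opendir() failed
        if onerror is not None:
            onerror(exc)
        return
    with it:
        while True:
            try:
                entry = next(it)
            except StopIteration:
                return
            except OSError as exc:      # readdir() failed mid-listing
                if onerror is not None:
                    onerror(exc)
                return
            try:
                st = entry.stat(follow_symlinks=follow_symlinks)
            except OSError as exc:      # e.g. permission denied, file vanished
                if onerror is not None:
                    onerror(exc)
                continue
            yield entry, st
```

For example, with follow_symlinks=True a dangling symlink makes entry.stat() raise FileNotFoundError, which is reported to onerror instead of aborting the whole listing.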

Cheers,
Nick.

Re: [Python-Dev] My summary of the scandir (PEP 471)

2014-07-01 Thread Ben Hoyt

> We need per-iteration error handling for the readdir call anyway, so I think
> an onerror callback is a better option than dropping the ability to easily
> obtain full stat information as part of the iteration.

I don't mind the idea of an "onerror" callback, but it's adding
complexity. Putting aside the question of caching/timing for a second
and assuming .lstat() as per the current PEP 471, do we really need
per-iteration error handling for readdir()? When would that actually
fail in practice?

-Ben

Re: [Python-Dev] My summary of the scandir (PEP 471)

2014-07-01 Thread Ethan Furman


On 07/01/2014 07:59 AM, Jonas Wielicki wrote:
> I had the idea to treat a failing lstat() inside scandir() as if the
> entry wasn’t found at all, but in this context, this seems wrong too.

Well, os.walk supports passing in an error handler -- perhaps scandir
should as well.

--
~Ethan~

Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator

2014-07-01 Thread Janzert


On 6/26/2014 6:59 PM, Ben Hoyt wrote:
> Rationale
> =========
>
> Python's built-in ``os.walk()`` is significantly slower than it needs
> to be, because -- in addition to calling ``os.listdir()`` on each
> directory -- it executes the system call ``os.stat()`` or
> ``GetFileAttributes()`` on each file to determine whether the entry is
> a directory or not.
>
> But the underlying system calls -- ``FindFirstFile`` /
> ``FindNextFile`` on Windows and ``readdir`` on Linux and OS X --
> already tell you whether the files returned are directories or not, so
> no further system calls are needed. In short, you can reduce the
> number of system calls from approximately 2N to N, where N is the
> total number of files and directories in the tree. (And because
> directory trees are usually much wider than they are deep, it's often
> much better than this.)

One of the major reasons for this seems to be efficiently using
information that is already available from the OS "for free".
Unfortunately, it seems that the current API and most of the leading
alternate proposals hide from the user what information is actually
there "for free" and what is going to incur an extra cost.

I would prefer an API that simply gives whatever came for free from the
OS and then lets the user decide if the extra expense is worth the extra
information. Maybe that stat information was only going to be used for
an informational log that can be skipped if it's going to incur extra
expense?
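To make the "free" part concrete: a minimal walk built on the os.scandir() that eventually shipped, where entry.is_dir(follow_symlinks=False) can normally be answered from the directory listing itself (for example d_type from readdir() on Linux, or the find data on Windows) instead of one stat() per entry; on POSIX it is mainly entry.stat() that costs an extra system call. walk_files() is an illustrative name, not os.walk().

```python
import os


def walk_files(top):
    """Yield the path of every non-directory entry under top, depth-first."""
    stack = [top]
    while stack:
        current = stack.pop()
        with os.scandir(current) as it:
            for entry in it:
                # Usually answered from readdir()/FindNextFile() data,
                # i.e. without an extra system call per entry.
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
                else:
                    yield entry.path
```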


Janzert


Re: [Python-Dev] PEP 471: scandir(fd) and pathlib.Path(name, dir_fd=None)

2014-07-01 Thread Akira Li

Ben Hoyt  writes:

> Thanks, Victor.
>
> I don't have any experience with dir_fd handling, so unfortunately
> can't really comment here.
>
> What advantages does it bring? I notice that even os.listdir() on
> Python 3.4 doesn't have anything related to file descriptors, so I'd
> be in favour of not including support. We can always add it later.
>
> -Ben

FYI, os.listdir does support file descriptors in Python 3.3+; try:

  >>> import os
  >>> os.listdir(os.open('.', os.O_RDONLY))

NOTE: os.supports_fd and os.supports_dir_fd are different sets.
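A small sketch of checking those sets before passing a descriptor, and closing the descriptor afterwards (the one-liner above never closes it). listdir_via_fd() is an illustrative name.

```python
import os

# os.supports_fd: functions that accept an open fd in place of a path.
# os.supports_dir_fd: functions that accept a dir_fd= keyword argument.
# os.listdir can be in the first set, but it takes no dir_fd argument.


def listdir_via_fd(path):
    """List path through a directory fd when the platform supports that."""
    if os.listdir not in os.supports_fd:
        return os.listdir(path)   # fall back to the path-based call
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.listdir(fd)
    finally:
        os.close(fd)              # listdir(fd) leaves fd open for us to close
```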

See also,
https://mail.python.org/pipermail/python-dev/2014-June/135265.html


--
Akira


P.S. Please, don't put your answer on top of the message you are
replying to.

>
> On Tue, Jul 1, 2014 at 3:44 AM, Victor Stinner  wrote:
>> Hi,
>>
>> IMO we must decide if scandir() must support or not file descriptor.
>> It's an important decision which has an important impact on the API.
>> [...]

Re: [Python-Dev] My summary of the scandir (PEP 471)

2014-07-01 Thread Nick Coghlan

On 1 July 2014 08:42, Ben Hoyt  wrote:
>> We need per-iteration error handling for the readdir call anyway, so I think
>> an onerror callback is a better option than dropping the ability to easily
>> obtain full stat information as part of the iteration.
>
> I don't mind the idea of an "onerror" callback, but it's adding
> complexity. Putting aside the question of caching/timing for a second
> and assuming .lstat() as per the current PEP 471, do we really need
> per-iteration error handling for readdir()? When would that actually
> fail in practice?

An NFS mount dropping the connection or a USB key being removed are
the first that come to mind, but I expect there are others. I find
it's generally better to just assume that any system call may fail for
obscure reasons and put the infrastructure in place to deal with it
rather than getting ugly, hard to track down bugs later.

Cheers,
Nick.



-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia

[Python-Dev] Network Security Backport Status

2014-07-01 Thread Alex Gaynor

Hi all,

I wanted to bring everyone up to speed on the status of PEP 466, what's been
completed, and what's left to do.

First the completed stuff:

* hmac.compare_digest
* hashlib.pbkdf2_hmac

Both are backported, and I've added support for using them in Django, so
users should start seeing these benefits as soon as we get a Python release
into their hands.
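For readers who haven't used them, a minimal sketch of the two functions in Python 3 (hmac.compare_digest is available since 3.3, hashlib.pbkdf2_hmac since 3.4). hash_password() and check_password() are illustrative names, and the salt size and iteration count are placeholders, not recommendations.

```python
import hashlib
import hmac
import os


def hash_password(password, salt=None, iterations=100000):
    """Derive a key with PBKDF2-HMAC-SHA256; returns (salt, key)."""
    if salt is None:
        salt = os.urandom(16)
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, key


def check_password(password, salt, expected_key, iterations=100000):
    """Recompute the key and compare it without an early-exit comparison."""
    _, key = hash_password(password, salt, iterations)
    # compare_digest is designed to resist timing attacks that a plain ==
    # could enable by returning as soon as a byte differs.
    return hmac.compare_digest(key, expected_key)
```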

Now the uncompleted stuff:

* Persistent file descriptor for ``os.urandom``
* SSL module

It's the SSL module that I'll spend the rest of this email talking about.


Backporting the features from the Python3 version of this module has proven
more difficult than I had expected. This is primarily because the stdlib took a
maintenance strategy that was different from what most Python projects have
done for their 2/3 support: multiple independent codebases.

I've tried a few different strategies for the backport, none of which has
worked:

* Copying the ``ssl.py``, ``test_ssl.py``, and ``_ssl.c`` files from Python3
  and trying to port all the code.
* Copying just ``test_ssl.py`` and then copying individual chunks/functions as
  necessary to get stuff to pass.
* Manually doing stuff.

All of these proved to be a massive undertaking, and made it too easy to
accidentally introduce breaking changes.

I've come up with a new approach, which I believe is most likely to be
successful, but I'll need help to implement it.

The idea is to find the most recent commit which is a parent of both the
``2.7`` and ``default`` branches. Then take every single change to an ``ssl``
related file on the ``default`` branch, and attempt to replay it on the ``2.7``
branch. Manually review each commit to make sure it compiles and doesn't
introduce any backwards-incompatible changes.

I think this provides the most iterative and guided approach to getting this
done.

I can do all the work of reviewing each commit, but I need some help from a
mercurial expert to automate the cherry-picking/rebasing of every single
commit.


What do folks think? Does this approach make sense? Anyone willing to help with
the mercurial scripting?

Cheers,
Alex


Re: [Python-Dev] Network Security Backport Status

2014-07-01 Thread Nick Coghlan

On 1 Jul 2014 11:28, "Alex Gaynor"  wrote:
>
> I've come up with a new approach, which I believe is most likely to be
> successful, but I'll need help to implement it.
>
> The idea is to find the most recent commit which is a parent of both the
> ``2.7`` and ``default`` branches. Then take every single change to an ``ssl``
> related file on the ``default`` branch, and attempt to replay it on the ``2.7``
> branch. Require manual review on each commit to make sure it compiles, and to
> ensure it doesn't make any backwards incompatible changes.
>
> I think this provides the most iterative and guided approach to getting this
> done.

Sounds promising, although it may still have some challenges if the SSL
code depends on earlier changes to other code.

> I can do all the work of reviewing each commit, but I need some help from a
> mercurial expert to automate the cherry-picking/rebasing of every single
> commit.
>
> What do folks think? Does this approach make sense? Anyone willing to help with
> the mercurial scripting?

For the Mercurial part, it's probably worth posing that as a Stack Overflow
question:

Given two named branches in http://hg.python.org (default and 2.7) and 4
files (Python module, C module, tests, docs):
- find the common ancestor
- find all the commits affecting those files on default & graft them to 2.7
(with a chance to test and edit each one first)

It's just a better environment for asking & answering that kind of question
:)
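Nick's two steps map fairly directly onto Mercurial revsets and `hg graft`. Below is a rough Python sketch of the scripting Alex asks for. It assumes a local clone with both branches and the CPython paths listed in `SSL_FILES` for the four files, and it has not been tried against the real repository; `build_revset`, `candidate_nodes` and `replay` are hypothetical helper names. Anything already backported by hand will still show up as a candidate and has to be skipped during review.

```python
import subprocess

# Assumption: the four files Nick lists, at their CPython paths.
SSL_FILES = [
    "Lib/ssl.py",
    "Modules/_ssl.c",
    "Lib/test/test_ssl.py",
    "Doc/library/ssl.rst",
]


def build_revset(files, source="default", target="2.7"):
    """Non-merge changesets reachable from `source` but not from `target`
    (i.e. everything after the common ancestor) that touch any of `files`,
    sorted oldest first."""
    touched = " or ".join("file('{0}')".format(f) for f in files)
    return "sort((::'{0}' - ::'{1}') and not merge() and ({2}), rev)".format(
        source, target, touched)


def candidate_nodes(repo, files):
    """Full changeset hashes selected by build_revset(), oldest first."""
    out = subprocess.check_output(
        ["hg", "log", "-r", build_revset(files), "--template", "{node}\n"],
        cwd=repo, universal_newlines=True)
    return out.split()


def replay(repo, files=SSL_FILES):
    """Graft each candidate onto 2.7, pausing for a build/test after each.

    Returns the hash that needs manual attention, or None when done."""
    subprocess.check_call(["hg", "update", "2.7"], cwd=repo)
    for node in candidate_nodes(repo, files):
        # --edit opens an editor on the commit message. On a conflict,
        # graft exits non-zero; resolve, then run "hg graft --continue".
        if subprocess.call(["hg", "graft", "--edit", node], cwd=repo) != 0:
            return node
        input("Grafted {0}: build and run test_ssl, then press Enter ".format(
            node[:12]))
    return None
```

The set difference `::'default' - ::'2.7'` avoids computing the common ancestor explicitly: it is exactly the history on default after the point where the branches diverged.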

Cheers,
Nick.


Re: [Python-Dev] My summary of the scandir (PEP 471)

2014-07-01 Thread Jonas Wielicki

On 01.07.2014 17:30, Ben Hoyt wrote:
>> No need for a microsecond-timed deletion -- a directory with +r but
>> without +x will allow you to list the entries, but stat calls on the
>> files will fail with EPERM:
> 
> Ah -- very good to know, thanks. This definitely points me in the
> direction of wanting better control over error handling.
> 
> Speaking of errors, and thinking of handling errors during iteration
> -- in what cases (if any) would an individual readdir fail if the
> opendir succeeded?

readdir(3) manpage suggests that readdir can only fail if an invalid
directory fd was passed.
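The +r/-x case quoted above is easy to reproduce. Here is a POSIX-only sketch using `os.scandir()` as it eventually shipped in Python 3.5+. Its `DirEntry` API differs from the one debated in this thread, and the `with` form needs 3.6+. `list_then_lstat` is a hypothetical helper. Opening and reading the directory only needs read permission, so the listing succeeds and the failure shows up at the per-entry stat. There, Linux reports `EACCES` (`PermissionError`). Running as root bypasses the check, so the stat succeeds in that case.

```python
import errno
import os
import stat
import tempfile


def list_then_lstat(mode=stat.S_IRUSR):
    """Create a directory holding one file, set the directory's mode to
    `mode` (read without search, i.e. +r -x, by default), list it with
    os.scandir() and lstat each entry.

    Returns (names, errors): the listed names and, per entry, the errno
    name of a failed lstat, or None if it succeeded."""
    with tempfile.TemporaryDirectory() as top:
        open(os.path.join(top, "f"), "w").close()
        os.chmod(top, mode)
        try:
            names, errors = [], []
            with os.scandir(top) as it:  # opening/reading needs only +r
                for entry in it:
                    names.append(entry.name)
                    try:
                        # Needs search (+x) permission on `top`.
                        entry.stat(follow_symlinks=False)
                        errors.append(None)
                    except OSError as exc:
                        errors.append(errno.errorcode.get(exc.errno))
            return names, errors
        finally:
            # Restore permissions so the temporary directory can be removed.
            os.chmod(top, stat.S_IRWXU)
```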

regards,
jwi

> 
> -Ben
> 


Re: [Python-Dev] Network Security Backport Status

2014-07-01 Thread Antoine Pitrou


On 01/07/2014 14:26, Alex Gaynor wrote:

> I can do all the work of reviewing each commit, but I need some help from a
> mercurial expert to automate the cherry-picking/rebasing of every single
> commit.
>
> What do folks think? Does this approach make sense? Anyone willing to help with
> the mercurial scripting?


I don't think this makes much sense; Mercurial won't be smarter than you 
are. I think you'd have a better chance of succeeding by backporting one 
feature at a time. IMO, you'd first want to backport the _SSLContext 
base class and SSLContext.wrap_socket(). The latter *will* require some 
manual coding to adapt to 2.7's different SSLSocket implementation, not 
just applying patch hunks around.
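For orientation, this is roughly what client code using the Python 3 `SSLContext.wrap_socket()` API looks like. It is written for a current Python 3, where `ssl.create_default_context()` exists, and the helper names are hypothetical. Code in this shape only runs on 2.7 once `SSLContext` and its `wrap_socket()` are available there.

```python
import socket
import ssl


def make_client_context(cafile=None):
    """Client-side context; create_default_context() turns on certificate
    verification (CERT_REQUIRED) and hostname checking."""
    return ssl.create_default_context(cafile=cafile)


def open_tls(host, port=443, ctx=None):
    """Connect, then do the TLS handshake via SSLContext.wrap_socket()."""
    if ctx is None:
        ctx = make_client_context()
    sock = socket.create_connection((host, port))
    try:
        # server_hostname enables SNI and the hostname check.
        return ctx.wrap_socket(sock, server_hostname=host)
    except Exception:
        sock.close()
        raise
```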


Regards

Antoine.



Re: [Python-Dev] Network Security Backport Status

2014-07-01 Thread Guido van Rossum

I have to agree with Antoine -- I don't think there's a shortcut that
avoids *someone* actually having to understand the code to the point of
being able to recreate the same behavior in the different context (pun not
intended) of Python 2.





-- 
--Guido van Rossum (python.org/~guido)

Re: [Python-Dev] My summary of the scandir (PEP 471)

2014-07-01 Thread Paul Moore

On 1 July 2014 14:00, Ben Hoyt  wrote:
> 2) Nick Coghlan's proposal on the previous thread
> (https://mail.python.org/pipermail/python-dev/2014-June/135261.html)
> suggesting an ensure_lstat keyword param to scandir if you need the
> lstat_result value
>
> I would make one small tweak to Nick Coghlan's proposal to make
> writing cross-platform code easier. Instead of .lstat_result being
> None sometimes (on POSIX), have it None always unless you specify
> ensure_lstat=True. (Actually, call it get_lstat=True to kind of make
> this more obvious.) Per (b) above, this means Windows developers
> wouldn't accidentally write code which failed on POSIX systems -- it'd
> fail fast on Windows too if you accessed .lstat_result without
> specifying get_lstat=True.

This is getting very complicated (at least to me, as a Windows user,
where the basic idea seems straightforward).

It seems to me that the right model is the standard "thin wrapper
round the OS feature" that acts as a building block - it's typical of
the rest of the os module. I think that thin wrapper is needed - even
if the various bells and whistles are useful, they can be built on top
of a low-level version (whereas the converse is not the case).
Typically, such thin wrappers expose POSIX semantics by default, and
Windows behaviour follows as closely as possible (see for example
stat, where st_ino makes no sense on Windows, but is present). In this
case, we're exposing Windows semantics, and POSIX is the one needing
to fit the model, but the principle is the same.

On that basis, optional attributes (as used in stat results) seem
entirely sensible.
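The stat-result precedent Paul refers to already has attributes that exist only on some platforms. Callers typically deal with them via `getattr()` and a default. A minimal sketch, where `describe` is a hypothetical helper:

```python
import os


def describe(path):
    """Collect stat fields, treating platform-specific ones as optional."""
    st = os.lstat(path)
    info = {"st_size": st.st_size, "st_mtime": st.st_mtime}
    # st_blocks exists only on some POSIX systems (e.g. Linux), and
    # st_file_attributes only on Windows (Python 3.5+).
    for name in ("st_blocks", "st_file_attributes"):
        value = getattr(st, name, None)
        if value is not None:
            info[name] = value
    return info
```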

The documentation for DirEntry could easily be written to parallel
that of a stat result:

"""
The return value is an object whose attributes correspond to the data
the OS returns about a directory entry:

  * name - the object's name
  * full_name - the object's full name (including path)
  * is_dir - whether the object is a directory
  * is_file - whether the object is a plain file
  * is_symlink - whether the object is a symbolic link

On Windows, the following attributes are also available:

  * st_size - the size, in bytes, of the object (only meaningful for files)
  * st_atime - time of last access
  * st_mtime - time of last write
  * st_ctime - time of creation
  * st_file_attributes - Windows file attribute bits (see the
    FILE_ATTRIBUTE_* constants in the stat module)
"""

That's no harder to understand (or to work with) than the equivalent
stat result. The only difference is that the unavailable attributes
can still be queried on POSIX; it just takes a separate system call
(with implications in terms of performance, error handling and
potential race conditions).
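The race comes from the gap between reading the directory and the separate stat call. The sketch below makes that window deterministic: it uses `os.listdir()` and `os.lstat()` and deletes the file in between. `stat_after_listing` is a hypothetical helper.

```python
import os
import tempfile


def stat_after_listing():
    """List a directory, delete an entry, then lstat it: the separate
    system call sees the deletion and raises FileNotFoundError."""
    with tempfile.TemporaryDirectory() as top:
        victim = os.path.join(top, "gone")
        open(victim, "w").close()
        names = os.listdir(top)  # the listing still includes "gone"
        os.remove(victim)        # ...which disappears before the stat
        try:
            os.lstat(os.path.join(top, names[0]))
        except FileNotFoundError:
            return names, "FileNotFoundError"
        return names, None
```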

The version of scandir with the ensure_lstat argument is easy to write
based on one with optional attributes (I'm playing fast and loose with
adding attributes to DirEntry values here, just for the sake of an
example - the details are left as an exercise):

import os

def scandir_ensure(path='.', ensure_lstat=False):
    # Uses the DirEntry attributes proposed above (full_name, st_*),
    # not an existing API.
    for entry in os.scandir(path):
        if ensure_lstat and not hasattr(entry, 'st_size'):
            # POSIX: fetch the missing data with a separate lstat() call
            stat_data = os.lstat(entry.full_name)
            entry.st_size = stat_data.st_size
            entry.st_atime = stat_data.st_atime
            entry.st_mtime = stat_data.st_mtime
            entry.st_ctime = stat_data.st_ctime
            # Ignore file_attributes, as we'll never get here on Windows
        yield entry

Variations on how you handle errors in the lstat call, etc, can be
added to taste.
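For comparison, here is a sketch of the same recipe against the `os.scandir()` API that eventually shipped in Python 3.5 (the `with` form needs 3.6+). In that API the path is `DirEntry.path`, and the stat data comes from the `DirEntry.stat()` method rather than from `full_name` and `st_*` attributes. `scandir_with_lstat` and its `ignore_missing` flag are hypothetical; they show one possible error-handling variation.

```python
import os


def scandir_with_lstat(path=".", ignore_missing=True):
    """Yield (entry, lstat_result) pairs for each entry in `path`.

    With ignore_missing, entries that vanish between the directory read
    and the stat are skipped; other OSErrors (e.g. PermissionError)
    propagate to the caller."""
    with os.scandir(path) as it:
        for entry in it:
            try:
                # No extra system call on Windows; one lstat() on POSIX.
                st = entry.stat(follow_symlinks=False)
            except FileNotFoundError:
                if ignore_missing:
                    continue
                raise
            yield entry, st
```

According to the Python docs, `entry.stat(follow_symlinks=False)` needs no extra system call on Windows but always makes one on Unix. That is the platform asymmetry discussed in this thread.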

Please, let's stick to a low-level wrapper round the OS API for the
first iteration of this feature. Enhancements can be added later, when
real-world usage has proved their value.

Paul

Re: [Python-Dev] My summary of the scandir (PEP 471)

2014-07-01 Thread Glenn Linderman


On 7/1/2014 2:20 PM, Paul Moore wrote:

> Please, let's stick to a low-level wrapper round the OS API for the
> first iteration of this feature. Enhancements can be added later, when
> real-world usage has proved their value.


I almost wrote this whole message this morning, but didn't have time.  
Thanks, Paul, for digging through the details.


+1

Re: [Python-Dev] My summary of the scandir (PEP 471)

2014-07-01 Thread Ethan Furman


On 07/01/2014 02:20 PM, Paul Moore wrote:


> Please, let's stick to a low-level wrapper round the OS API for the
> first iteration of this feature. Enhancements can be added later, when
> real-world usage has proved their value.


+1

Re: [Python-Dev] My summary of the scandir (PEP 471)

2014-07-01 Thread Chris Angelico

On Wed, Jul 2, 2014 at 7:20 AM, Paul Moore  wrote:
> I think that thin wrapper is needed - even
> if the various bells and whistles are useful, they can be built on top
> of a low-level version (whereas the converse is not the case).

+1. Make everything as simple as possible (but no simpler).

ChrisA

[Python-Dev] [RELEASE] Python 2.7.8

2014-07-01 Thread Benjamin Peterson

Greetings,
I have the distinct privilege of informing you that the latest release
of the Python 2.7 series, 2.7.8, has been released and is available for
download. 2.7.8 contains several important regression fixes and security
changes:
  - The openssl version bundled in the Windows installer has been
  updated.
  - A regression in the mimetypes module on Windows has been fixed. [1]
  - A possible overflow in the buffer type has been fixed. [2]
  - A bug in the CGIHTTPServer module which allows arbitrary execution
  of code in the server root has been patched. [3]
  - A regression in the handling of UNC paths in os.path.join has been
  fixed. [4]

Downloads of 2.7.8 are at

https://www.python.org/download/releases/2.7.8/

The full changelog is located at

http://hg.python.org/cpython/raw-file/v2.7.8/Misc/NEWS

This is a production release. As always, please report bugs to

http://bugs.python.org/

Till next time,
Benjamin Peterson
2.7 Release Manager
(on behalf of all of Python's contributors)

[1] http://bugs.python.org/issue21652
[2] http://bugs.python.org/issue21831
[3] http://bugs.python.org/issue21766
[4] http://bugs.python.org/issue21672

Re: [Python-Dev] My summary of the scandir (PEP 471)

2014-07-01 Thread Nick Coghlan

On 1 July 2014 14:20, Paul Moore  wrote:
> [...]
>
> Please, let's stick to a low-level wrapper round the OS API for the
> first iteration of this feature. Enhancements can be added later, when
> real-world usage has proved their value.

+1 from me - especially if this recipe goes in at least the PEP, and
potentially even the docs.

I'm also OK with postponing onerror support for the time being - that
should be straightforward to add later if we decide we need it.
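One way `onerror` could be layered on later without touching the low-level function is a wrapper loosely modelled on `os.walk()`'s `onerror` argument, which receives the `OSError` instance. Unlike `os.walk()`, which silently ignores errors when `onerror` is None, this hypothetical `scandir_onerror` re-raises them.

```python
import os


def scandir_onerror(path=".", onerror=None):
    """Yield the entries of `path`. If opening the directory fails, pass
    the OSError to `onerror` and yield nothing; without a callback,
    re-raise it."""
    try:
        it = os.scandir(path)
    except OSError as exc:
        if onerror is None:
            raise
        onerror(exc)
        return
    with it:
        yield from it
```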

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
