Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator

2014-07-01 Thread Victor Stinner
2014-07-01 4:04 GMT+02:00 Glenn Linderman v+pyt...@g.nevcal.com:
 +0 for stat fields to be None on all platforms unless ensure_lstat=True.

 This won't work well if lstat info is only needed for some entries. Is
 that a common use-case? It was mentioned earlier in the thread.

 If it is, use ensure_lstat=False, and use the proposed (by me) .refresh()
 API to update the data for those that need it.

We should make DirEntry as simple as possible. In Python, the classic
behaviour is to not define an attribute if it's not available on a
platform. For example, stat().st_file_attributes is only available on
Windows.

I don't like the idea of the ensure_lstat parameter because os.scandir()
would have to make two system calls, which makes it harder to tell which
syscall failed (readdir or lstat). If you need lstat on UNIX, write:

if hasattr(entry, 'lstat_result'):
    size = entry.lstat_result.st_size
else:
    size = os.lstat(entry.fullname()).st_size
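The hasattr() pattern above can be tried against today's os.scandir(), whose DirEntry objects have no `lstat_result` attribute, so the fallback branch is the one that runs. A minimal sketch; `lstat_result` is the attribute proposed in this thread, and `entry_size()` / `sizes()` are hypothetical helper names. It uses os.path.join(directory, entry.name) in place of the proposed fullname().

```python
import os

def entry_size(directory, entry):
    """Return the lstat() size of a directory entry.

    Uses the proposed (hypothetical) `lstat_result` attribute when the
    platform provides it; otherwise falls back to an explicit os.lstat().
    """
    if hasattr(entry, 'lstat_result'):
        return entry.lstat_result.st_size
    # Fallback: one extra system call for this entry.
    return os.lstat(os.path.join(directory, entry.name)).st_size

def sizes(directory):
    """Map each entry name in `directory` to its lstat() size."""
    with os.scandir(directory) as it:
        return {entry.name: entry_size(directory, entry) for entry in it}
```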

Victor
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] PEP 471: scandir(fd) and pathlib.Path(name, dir_fd=None)

2014-07-01 Thread Victor Stinner
Hi,

IMO we must decide whether or not scandir() should support file descriptors.
It's an important decision with a significant impact on the API.


To support scandir(fd), the minimum is to store dir_fd in DirEntry:
dir_fd would be None for scandir(str).


scandir(fd) must not close the file descriptor; that should be done by
the caller. Handling the lifetime of the file descriptor is a
difficult problem, so it's better to let the user decide how to handle
it.

There is also the limit on the number of open file descriptors, usually
1024, but it can be lower. It *can* be an issue for very deep file
hierarchies.

If we choose to support scandir(fd), it's probably safer not to use
scandir(fd) by default in os.walk() (use scandir(str) instead) and to wait
until the feature is well tested, the corner cases are well known, etc.


The second step is to enhance pathlib.Path to support an optional file
descriptor. Path already has methods on filenames like chmod(),
exists(), rename(), etc.


Example:

fd = os.open(path, os.O_DIRECTORY)
try:
    for entry in os.scandir(fd):
        # ... use entry to benefit from the entry cache: is_dir(), lstat_result ...
        path = pathlib.Path(entry.name, dir_fd=entry.dir_fd)
        # ... use path which uses dir_fd ...
finally:
    os.close(fd)

Problem: if the path object is stored somewhere and used after the
loop, Path methods will fail because dir_fd was closed. It's even
worse if a newly opened directory gets the same file descriptor number :-/
(security issue, or at least tricky bugs!)
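Part of the scandir(fd) idea can be sketched with APIs that exist in later Python releases: os.scandir() accepts a directory file descriptor on POSIX platforms (Python 3.7+), and per-entry calls can be resolved relative to that descriptor with dir_fd. There is no DirEntry.dir_fd attribute or pathlib dir_fd parameter here; `lstat_sizes_via_fd()` is a hypothetical helper that keeps the descriptor's lifetime entirely in the caller's hands, as proposed above.

```python
import os

def lstat_sizes_via_fd(path):
    """List `path` through a directory fd and lstat each entry relative to it."""
    if os.scandir not in os.supports_fd:
        raise NotImplementedError("os.scandir(fd) is not supported here")
    fd = os.open(path, os.O_RDONLY | os.O_DIRECTORY)
    try:
        sizes = {}
        with os.scandir(fd) as it:
            for entry in it:
                # Resolve the name relative to the directory we already hold
                # open, not by re-walking the (possibly replaced) path string.
                st = os.stat(entry.name, dir_fd=fd, follow_symlinks=False)
                sizes[entry.name] = st.st_size
        return sizes
    finally:
        # The caller (this function) owns the descriptor and closes it.
        os.close(fd)
```

Any Path-like object built from `fd` would become invalid after the `finally` block, which is exactly the lifetime problem described above.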

Victor
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 471: scandir(fd) and pathlib.Path(name, dir_fd=None)

2014-07-01 Thread Ben Hoyt
Thanks, Victor.

I don't have any experience with dir_fd handling, so unfortunately
can't really comment here.

What advantages does it bring? I notice that even os.listdir() on
Python 3.4 doesn't have anything related to file descriptors, so I'd
be in favour of not including support. We can always add it later.

-Ben



Re: [Python-Dev] PEP 471: scandir(fd) and pathlib.Path(name, dir_fd=None)

2014-07-01 Thread Victor Stinner
2014-07-01 14:26 GMT+02:00 Ben Hoyt benh...@gmail.com:
 Thanks, Victor.

 I don't have any experience with dir_fd handling, so unfortunately
 can't really comment here.

 What advantages does it bring? I notice that even os.listdir() on
 Python 3.4 doesn't have anything related to file descriptors, so I'd
 be in favour of not including support.

See https://docs.python.org/dev/library/os.html#dir-fd

The idea is to make sure that you get files from the same directory.
Problems occur when a directory is moved or a symlink is modified.
Example:

- you're browsing /tmp/copy as root (!); /tmp/copy/passwd is owned
by the www user (website)
- you would like to remove the file passwd: call unlink("/tmp/copy/passwd")
- ... but just before that, an attacker replaces the /tmp/copy
directory with a symlink to /etc
- you will remove /etc/passwd instead of /tmp/copy/passwd, oh oh

Using unlink("passwd", dir_fd=tmp_copy_fd), you don't have this issue.
You are sure that you are working in the /tmp/copy directory.
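The unlink(..., dir_fd=...) call described above maps directly onto os.unlink(). A minimal POSIX-only sketch; `remove_in_dir()` is a hypothetical helper. Opening the directory with O_NOFOLLOW makes the open fail if the final path component has already been swapped for a symlink (intermediate components are not protected by this flag).

```python
import os

def remove_in_dir(directory, name):
    """Unlink `name` inside `directory`, resolved via a directory fd."""
    # O_NOFOLLOW: refuse to open `directory` if its last component is a symlink.
    flags = os.O_RDONLY | os.O_DIRECTORY | os.O_NOFOLLOW
    dir_fd = os.open(directory, flags)
    try:
        # unlinkat(dir_fd, name, 0): `name` is looked up in the directory we
        # already hold open, even if `directory` is renamed or replaced later.
        os.unlink(name, dir_fd=dir_fd)
    finally:
        os.close(dir_fd)
```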

You can imagine a lot of other scenarios to override files and read
sensitive files.

Fortunately, the Linux rm command knows the unlinkat() syscall ;-)

haypo@selma$ mkdir -p a/b/c
haypo@selma$ strace -e unlinkat rm -rf a
unlinkat(5, "c", AT_REMOVEDIR)          = 0
unlinkat(4, "b", AT_REMOVEDIR)          = 0
unlinkat(AT_FDCWD, "a", AT_REMOVEDIR)   = 0
+++ exited with 0 +++

We should implement a similar thing in shutil.rmtree().
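An `rm -rf`-style removal in the spirit of the strace output above might look like this: every lookup goes through a directory descriptor (openat/fstatat/unlinkat via dir_fd), and symlinks are unlinked rather than followed. A POSIX-only sketch; `rmtree_fd()` is a hypothetical name, not shutil.rmtree()'s actual implementation, and it does no error recovery.

```python
import os
import stat

def rmtree_fd(name, dir_fd=None):
    """Recursively remove directory `name` (relative to dir_fd), like rm -rf."""
    # O_NOFOLLOW: never descend into a directory that was swapped for a symlink.
    flags = os.O_RDONLY | os.O_DIRECTORY | os.O_NOFOLLOW
    fd = os.open(name, flags, dir_fd=dir_fd)
    try:
        for child in os.listdir(fd):
            # fstatat(fd, child, AT_SYMLINK_NOFOLLOW)
            st = os.stat(child, dir_fd=fd, follow_symlinks=False)
            if stat.S_ISDIR(st.st_mode):
                rmtree_fd(child, dir_fd=fd)
            else:
                os.unlink(child, dir_fd=fd)   # files and symlinks alike
    finally:
        os.close(fd)
    os.rmdir(name, dir_fd=dir_fd)   # unlinkat(dir_fd, name, AT_REMOVEDIR)
```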

See also os.fwalk() which is a version of os.walk() providing dir_fd.
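For comparison, os.fwalk() (Python 3.3+, Unix) already hands out a directory descriptor for each step of the walk. A minimal sketch; `tree_size()` is an illustrative helper name.

```python
import os

def tree_size(top):
    """Sum lstat() sizes of all non-directory entries under `top`."""
    total = 0
    # os.fwalk() yields (dirpath, dirnames, filenames, dirfd); dirfd is only
    # valid until the next iteration step, so it is used immediately.
    for dirpath, dirnames, filenames, dirfd in os.fwalk(top):
        for name in filenames:
            st = os.stat(name, dir_fd=dirfd, follow_symlinks=False)
            total += st.st_size
    return total
```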

 We can always add it later.

I would prefer to discuss that right now. My proposal is to accept
an int for scandir() and copy the int into DirEntry.dir_fd. It's not
that complex :-)

The enhancement of the pathlib module can be done later. By the way, I
know that Antoine Pitrou wanted to implement file descriptors in
pathlib, but the feature was rejected or at least delayed.

Victor


Re: [Python-Dev] My summary of the scandir (PEP 471)

2014-07-01 Thread Ben Hoyt
Thanks for spinning this off to (hopefully) finish the discussion. I
agree it's nearly time to update the PEP.

 @Ben: it's time to update your PEP to complete it with this
 discussion! IMO DirEntry must be as simple as possible and portable:

 - os.scandir(str)
 - DirEntry.lstat_result object only available on Windows, same result
 as os.lstat()
 - DirEntry.fullname(): os.path.join(directory, DirEntry.name), where
 directory would be a hidden attribute of DirEntry

I'm quite strongly against this, and I think it's actually the worst
of both worlds. It is not as good an API because:

(a) it doesn't call stat for you (on POSIX), so you have to check an
attribute and call stat manually if you need it, turning what
should be one line of code into four. Your proposal above was kind of
how I had it originally, where you had to do extra tests and call
stat manually if you needed it (see
https://mail.python.org/pipermail/python-dev/2013-May/126119.html)
(b) the .lstat_result attribute is available on Windows but not on
POSIX, meaning it's very easy for Windows developers to write code
that will run and work fine on Windows, but then break horribly on
POSIX; I think it'd be better if it broke hard on Windows to make
writing cross-platform code easy

The two alternatives are:

1) the original proposal in the current version of PEP 471, where
DirEntry has an .lstat() method which calls stat() on POSIX but is
free on Windows
2) Nick Coghlan's proposal on the previous thread
(https://mail.python.org/pipermail/python-dev/2014-June/135261.html)
suggesting an ensure_lstat keyword param to scandir if you need the
lstat_result value
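Option 1's caching behaviour can be sketched with a small wrapper: lstat() does the system call on first use and then returns the cached result. `LazyEntry` and `lazy_scandir()` are hypothetical names; the sketch always takes the POSIX path, whereas the PEP's Windows version would fill the cache for free from FindNextFile data.

```python
import os

class LazyEntry:
    """Hypothetical DirEntry-like object for option 1.

    lstat() is computed on the first call and then cached, so later calls
    do NOT see changes made to the file afterwards.
    """

    def __init__(self, directory, name):
        self.name = name
        self._path = os.path.join(directory, name)
        self._lstat = None  # filled in lazily

    def lstat(self):
        if self._lstat is None:
            self._lstat = os.lstat(self._path)  # the only system call
        return self._lstat

def lazy_scandir(directory):
    for name in os.listdir(directory):
        yield LazyEntry(directory, name)
```

The cached value diverging from a fresh os.lstat() is the consistency trade-off discussed in this thread.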

I would make one small tweak to Nick Coghlan's proposal to make
writing cross-platform code easier. Instead of .lstat_result being
None sometimes (on POSIX), have it None always unless you specify
ensure_lstat=True. (Actually, call it get_lstat=True to kind of make
this more obvious.) Per (b) above, this means Windows developers
wouldn't accidentally write code which failed on POSIX systems -- it'd
fail fast on Windows too if you accessed .lstat_result without
specifying get_lstat=True.
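The tweaked option 2 can be sketched on top of today's os.scandir(): `lstat_result` always exists but is None unless get_lstat=True was passed, so the same code fails the same way on every platform. `scandir_get_lstat()` and `Entry` are hypothetical names; DirEntry.stat(follow_symlinks=False) gives lstat() semantics.

```python
import os
from collections import namedtuple

# Hypothetical entry type: lstat_result is always present as an attribute,
# but it is None unless the caller asked for it with get_lstat=True.
Entry = namedtuple('Entry', ['name', 'lstat_result'])

def scandir_get_lstat(directory, get_lstat=False):
    with os.scandir(directory) as it:
        for entry in it:
            # On POSIX this costs one system call, paid only when requested.
            # An OSError raised here surfaces from the iterator's __next__.
            st = entry.stat(follow_symlinks=False) if get_lstat else None
            yield Entry(entry.name, st)
```

Note that a failing stat() propagates out of the iteration itself, which is the error-handling concern raised next.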

I'm still unsure which of these I like better. I think #1's API is
slightly nicer without the ensure_lstat parameter, and error handling
of the stat() is more explicit. But #2 always fetches the stat info at
the same time as the dir entry info, so eliminates the problem of
having the file info change between scandir iteration and the .lstat()
call.

I'm leaning towards preferring #2 (Nick's proposal) because it solves
or gets around the caching issue. My one concern is error handling. Is
it an issue if scandir's __next__ can raise an OSError either from the
readdir() call or the call to stat()? My thinking is probably not. In
practice, would it ever really happen that readdir() would succeed but
an os.stat() immediately after would fail? I guess it could if the
file is deleted, but then if it were deleted a microsecond earlier the
readdir() would fail anyway, or not? Or does readdir give you a
consistent, snap-shotted view on things?

The one other thing I'm not quite sure about with Nick's proposal is
the name .lstat_result, as it's long. I can see why he suggested that,
as .lstat sounds like a verb, but maybe that's okay? If we can have
.is_dir and .is_file as attributes, my thinking is an .lstat attribute
is fine too. I don't feel too strongly though.

 - I don't think that we should support scandir(bytes). If you really
 want to support os.scandir(bytes), it must raise an error on Windows
 since bytes filenames are already deprecated. It wouldn't make sense to
 add a new function with a deprecated feature. Since we have PEP 383
 (surrogateescape), it's better to advise using Unicode on all
 platforms. Almost all Python functions are able to encode Unicode
 filenames back automatically. Use os.fsencode() to encode manually if needed.

Really, are bytes filenames deprecated? I think maybe they should be,
as they don't work on Windows :-), but the latest Python os docs
(https://docs.python.org/3.5/library/os.html) still say that all
functions that accept path names accept either str or bytes, and
return a value of the same type where necessary. So I think scandir()
should do the same thing.
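The "same type in, same type out" convention, and the PEP 383 round trip suggested as the alternative, can both be checked with existing os functions. A minimal sketch; the helper names are hypothetical, and the undecodable-byte case assumes a POSIX system, where the filesystem encoding uses the surrogateescape error handler. On Windows, the bytes call is the form described in this thread as deprecated.

```python
import os

def list_both_ways(directory):
    """Return the entries of `directory` once as str and once as bytes."""
    # str path -> str names; bytes path -> bytes names
    as_str = sorted(os.listdir(directory))
    as_bytes = sorted(os.listdir(os.fsencode(directory)))
    return as_str, as_bytes

def roundtrip(raw_name):
    """bytes -> str -> bytes through the filesystem encoding (PEP 383)."""
    # On POSIX, undecodable bytes become lone surrogates (surrogateescape),
    # so fsencode() can restore the original bytes exactly.
    return os.fsencode(os.fsdecode(raw_name))
```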

 - We may not define a DirEntry.fullname() method: the directory name
 is usually well known. Ok, but every time that I use os.listdir(), I
 write os.path.join(directory, name) because in some cases I want the
 full path.

Agreed. I use this a lot too. However, I'd prefer a .fullname
attribute rather than a method, as it's free/cheap to compute and
doesn't require OS calls.
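A `fullname` attribute of this kind is pure string work. A minimal sketch with a hypothetical `NamedEntry` type standing in for DirEntry; for comparison, the os.DirEntry that eventually shipped in Python 3.5 exposes the same value as its `.path` attribute.

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class NamedEntry:
    """Hypothetical entry type: `fullname` is derived from the scanned
    directory and the entry name, with no system call involved."""
    directory: str
    name: str

    @property
    def fullname(self):
        return os.path.join(self.directory, self.name)

def scan_names(directory):
    return [NamedEntry(directory, name) for name in sorted(os.listdir(directory))]
```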

Out of interest, why do we have .is_dir and .lstat_result but .fullname
rather than .full_name? .fullname seems reasonable to me, but maybe
consistency is a good thing here?

 - It must not be possible to refresh a DirEntry object. Call
 os.stat(entry.fullname()) or pathlib.Path(entry.fullname()) to get
 fresh data. DirEntry is only computed 

Re: [Python-Dev] My summary of the scandir (PEP 471)

2014-07-01 Thread Victor Stinner
2014-07-01 15:00 GMT+02:00 Ben Hoyt benh...@gmail.com:
 (a) it doesn't call stat for you (on POSIX), so you have to check an
 attribute and call stat manually if you need it,

Yes, and that's something common when you use the os module. For
example, don't try to call os.fork(), os.getgid() or os.fchmod() on
Windows :-) Closer to your PEP, the following stat attributes are only
available on UNIX: st_blocks, st_blksize, st_rdev, st_flags; and
st_file_attributes is only available on Windows.

I don't think that using lstat_result is a common need when browsing a
directory. In most cases, you only need is_dir() and the name
attribute.

 1) the original proposal in the current version of PEP 471, where
 DirEntry has an .lstat() method which calls stat() on POSIX but is
 free on Windows

On UNIX, does it mean that .lstat() calls os.lstat() on the first
call, and then always returns the same result? It would be different
from os.lstat() and pathlib.Path.stat() :-( I would prefer to have the
same behaviour as pathlib and os (you know, the well-known
consistency of the Python stdlib). As I wrote, I expect a function call to
always retrieve the new status.

 2) Nick Coghlan's proposal on the previous thread
 (https://mail.python.org/pipermail/python-dev/2014-June/135261.html)
 suggesting an ensure_lstat keyword param to scandir if you need the
 lstat_result value

I don't like this idea because it makes error handling more complex.
The syntax to catch exceptions on an iterator is verbose (while: try:
next() except ...).

Whereas calling os.lstat(entry.fullname()) is explicit and it's easy
to surround it with try/except.
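The verbose "while: try: next() except ..." shape looks roughly like this when written out against today's os.scandir(); it is needed because a for loop cannot wrap only the iterator's __next__ in try/except. `names_and_error()` is a hypothetical helper, and stopping at the first iteration error is an assumption of this sketch (a directory stream may not be resumable after a failed readdir()).

```python
import os

def names_and_error(directory):
    """Collect entry names, separating iteration errors from lstat errors."""
    names, lstat_errors = [], []
    it = os.scandir(directory)
    try:
        while True:
            try:
                entry = next(it)       # may raise OSError from readdir()
            except StopIteration:
                break
            except OSError as exc:
                return names, lstat_errors, exc   # iteration itself failed
            try:
                # Explicit per-entry call, easy to wrap in try/except.
                os.lstat(os.path.join(directory, entry.name))
            except OSError as exc:
                lstat_errors.append((entry.name, exc))
                continue
            names.append(entry.name)
    finally:
        it.close()
    return names, lstat_errors, None
```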


 .lstat_result being None sometimes (on POSIX),

Don't do that, it's not how Python handles portability. We use hasattr().


 would it ever really happen that readdir() would succeed but an os.stat() 
 immediately after would fail?

Yes, it can happen. The filesystem is system-wide and shared by all
users. The file can be deleted.


 Really, are bytes filenames deprecated?

Yes, in all functions of the os module since Python 3.3. I'm sure
because I implemented the deprecation :-)

Try open(b'test.txt', 'w') on Windows with python -Werror.


 I think maybe they should be, as they don't work on Windows :-)

Windows has an API dedicated to bytes filenames, the ANSI API. But
this API has annoying bugs: it replaces unencodable characters with
question marks, and there is no option to be notified of the encoding
error.

Several users complained about that. It was decided not to change
Python since Python is a thin wrapper over the kernel system calls.
But bytes filenames are now deprecated to advise users to use the
native type for filenames on Windows: Unicode!


 but the latest Python os docs
 (https://docs.python.org/3.5/library/os.html) still say that all
 functions that accept path names accept either str or bytes,

Maybe I forgot to update the documentation :-(


 So I think scandir() should do the same thing.

You may support scandir(bytes) on Windows, but you will need to emit a
deprecation warning too (deprecation warnings are silent by default).

Victor


[Python-Dev] Excess help() output

2014-07-01 Thread anatoly techtonik
Hi,

The help() output is confusing for beginners:

  >>> class B(object):
  ...   pass
  ...
  >>> help(B)
  Help on class B in module __main__:

  class B(__builtin__.object)
   |  Data descriptors defined here:
   |
   |  __dict__
   |      dictionary for instance variables (if defined)
   |
   |  __weakref__
   |      list of weak references to the object (if defined)

Is it possible to remove this section from help output?
Why is it here at all?

  >>> dir(B)
  ['__class__', '__delattr__', '__dict__', '__doc__', '__format__',
'__getattribute__', '__hash__', '__init__', '__module__', '__new__',
'__reduce__', '__reduce_ex__', '__repr__', '__setattr__',
'__sizeof__', '__str__', '__subclasshook__', '__weakref__']
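The two entries are the data descriptors that `type` creates for `__dict__` and `__weakref__` on any class that does not define `__slots__`; pydoc lists them because they live in the class's own namespace. A small sketch (Python 3 syntax, whereas the transcript above is Python 2) showing where they live, and that a class with empty `__slots__` has neither, so pydoc has no class-level data descriptors to list for it. `instance_descriptors()` is an illustrative helper.

```python
import inspect

class B(object):
    pass

class C(object):
    __slots__ = ()   # no per-instance __dict__ and no __weakref__ slot

def instance_descriptors(cls):
    """Names of data descriptors defined directly in cls.__dict__."""
    return sorted(name for name, value in vars(cls).items()
                  if inspect.isdatadescriptor(value))
```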

-- 
anatoly t.


Re: [Python-Dev] My summary of the scandir (PEP 471)

2014-07-01 Thread Jonas Wielicki
On 01.07.2014 15:00, Ben Hoyt wrote:
 I'm leaning towards preferring #2 (Nick's proposal) because it solves
 or gets around the caching issue. My one concern is error handling. Is
 it an issue if scandir's __next__ can raise an OSError either from the
 readdir() call or the call to stat()? My thinking is probably not. In
 practice, would it ever really happen that readdir() would succeed but
 an os.stat() immediately after would fail? I guess it could if the
 file is deleted, but then if it were deleted a microsecond earlier the
 readdir() would fail anyway, or not? Or does readdir give you a
 consistent, snap-shotted view on things?

No need for a microsecond-timed deletion -- a directory with +r but
without +x will allow you to list the entries, but stat calls on the
files will fail with EACCES ("Permission denied"):

$ ls -l
drwxr--r--.   2 root root 60  1. Jul 16:52 test

$ sudo ls -l test
total 0
-rw-r--r--. 1 root root 0  1. Jul 16:52 foo

$ ls -l test
ls: cannot access test/foo: Permission denied
total 0
-? ? ? ? ? ? foo

$ stat test/foo
stat: cannot stat ‘test/foo’: Permission denied

I had the idea to treat a failing lstat() inside scandir() as if the
entry wasn’t found at all, but in this context, this seems wrong too.
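The same permission scenario can be reproduced from Python: listing succeeds while lstat() on each entry fails. A minimal POSIX sketch; `probe_unsearchable()` is a hypothetical helper that temporarily removes search permission and restores the original mode afterwards. When run as root, permission checks are bypassed and no error appears.

```python
import os
import stat

def probe_unsearchable(directory):
    """Make `directory` readable but not searchable, then report what
    listing and lstat-ing its entries does. Restores permissions after."""
    old_mode = stat.S_IMODE(os.stat(directory).st_mode)
    os.chmod(directory, stat.S_IRUSR)          # r-------- : list, no lookup
    try:
        names = sorted(os.listdir(directory))  # readdir() succeeds
        errors = {}
        for name in names:
            try:
                os.lstat(os.path.join(directory, name))
            except PermissionError as exc:     # EACCES, "Permission denied"
                errors[name] = exc
        return names, errors
    finally:
        os.chmod(directory, old_mode)
```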

regards,
jwi




Re: [Python-Dev] My summary of the scandir (PEP 471)

2014-07-01 Thread Ben Hoyt
 No need for a microsecond-timed deletion -- a directory with +r but
 without +x will allow you to list the entries, but stat calls on the
 files will fail with EACCES ("Permission denied"):

Ah -- very good to know, thanks. This definitely points me in the
direction of wanting better control over error handling.

Speaking of errors, and thinking of handling errors during iteration
-- in what cases (if any) would an individual readdir fail if the
opendir succeeded?

-Ben


Re: [Python-Dev] My summary of the scandir (PEP 471)

2014-07-01 Thread Nick Coghlan
On 1 Jul 2014 07:31, Victor Stinner victor.stin...@gmail.com wrote:

 2014-07-01 15:00 GMT+02:00 Ben Hoyt benh...@gmail.com:

  2) Nick Coghlan's proposal on the previous thread
  (https://mail.python.org/pipermail/python-dev/2014-June/135261.html)
  suggesting an ensure_lstat keyword param to scandir if you need the
  lstat_result value

 I don't like this idea because it makes error handling more complex.
 The syntax to catch exceptions on an iterator is verbose (while: try:
 next() except ...).

Actually, we may need to copy the os.walk API and accept an onerror
callback as a scandir argument. Regardless of whether or not we have
ensure_lstat, the iteration step could fail, so I don't believe we can
just transfer the existing approach of catching exceptions from the listdir
call.

 Whereas calling os.lstat(entry.fullname()) is explicit and it's easy
 to surround it with try/except.


  .lstat_result being None sometimes (on POSIX),

 Don't do that, it's not how Python handles portability. We use hasattr().

That's not true in general - we do either, depending on context.

With the addition of an os.walk style onerror callback, I'm still in favour
of a get_lstat flag (tweaked as Ben suggests to always be None unless
requested, so Windows code is less likely to be inadvertently non-portable)

  would it ever really happen that readdir() would succeed but an
os.stat() immediately after would fail?

 Yes, it can happen. The filesystem is system-wide and shared by all
 users. The file can be deleted.

We need per-iteration error handling for the readdir call anyway, so I
think an onerror callback is a better option than dropping the ability to
easily obtain full stat information as part of the iteration.
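An onerror callback for scandir could look roughly like the following: a hypothetical generator `scandir_onerror()` (not a real os function) wrapping today's os.scandir(). It mirrors os.walk()'s convention of passing the OSError instance to the callback; stopping after the first error is an assumption of this sketch.

```python
import os

def scandir_onerror(directory, onerror=None):
    """Hypothetical scandir variant with an os.walk-style `onerror` callback.

    Any OSError from opening the directory or from advancing the iterator
    is passed to `onerror` (if given) and iteration stops; otherwise it is
    re-raised.
    """
    try:
        it = os.scandir(directory)
    except OSError as exc:
        if onerror is None:
            raise
        onerror(exc)
        return
    with it:
        while True:
            try:
                entry = next(it)
            except StopIteration:
                return
            except OSError as exc:
                if onerror is None:
                    raise
                onerror(exc)
                return
            yield entry
```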

Cheers,
Nick.


Re: [Python-Dev] My summary of the scandir (PEP 471)

2014-07-01 Thread Ben Hoyt
 We need per-iteration error handling for the readdir call anyway, so I think
 an onerror callback is a better option than dropping the ability to easily
 obtain full stat information as part of the iteration.

I don't mind the idea of an onerror callback, but it's adding
complexity. Putting aside the question of caching/timing for a second
and assuming .lstat() as per the current PEP 471, do we really need
per-iteration error handling for readdir()? When would that actually
fail in practice?

-Ben


Re: [Python-Dev] My summary of the scandir (PEP 471)

2014-07-01 Thread Ethan Furman

On 07/01/2014 07:59 AM, Jonas Wielicki wrote:


I had the idea to treat a failing lstat() inside scandir() as if the
entry wasn’t found at all, but in this context, this seems wrong too.


Well, os.walk supports passing in an error handler -- perhaps scandir should as 
well.
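os.walk()'s existing error handler works like this: by default errors from listing a directory are silently ignored, and an `onerror` callable instead receives each OSError instance. `walk_collecting_errors()` is just an illustrative wrapper.

```python
import os

def walk_collecting_errors(top):
    """Walk `top` with os.walk(), collecting errors instead of ignoring them."""
    errors = []
    dirs = [dirpath for dirpath, dirnames, filenames
            in os.walk(top, onerror=errors.append)]
    return dirs, errors
```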

--
~Ethan~


Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator

2014-07-01 Thread Janzert

On 6/26/2014 6:59 PM, Ben Hoyt wrote:

Rationale
=========

Python's built-in ``os.walk()`` is significantly slower than it needs
to be, because -- in addition to calling ``os.listdir()`` on each
directory -- it executes the system call ``os.stat()`` or
``GetFileAttributes()`` on each file to determine whether the entry is
a directory or not.

But the underlying system calls -- ``FindFirstFile`` /
``FindNextFile`` on Windows and ``readdir`` on Linux and OS X --
already tell you whether the files returned are directories or not, so
no further system calls are needed. In short, you can reduce the
number of system calls from approximately 2N to N, where N is the
total number of files and directories in the tree. (And because
directory trees are usually much wider than they are deep, it's often
much better than this.)



One of the major reasons for this seems to be efficiently using 
information that is already available from the OS for free. 
Unfortunately it seems that the current API and most of the leading 
alternate proposals hide from the user what information is actually 
there free and what is going to incur an extra cost.


I would prefer an API that simply gives whatever came for free from the 
OS and then let the user decide if the extra expense is worth the extra 
information. Maybe that stat information was only going to be used for 
an informational log that can be skipped if it's going to incur extra 
expense?
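The 2N-to-N reduction described in the quoted rationale can be sketched with the os.scandir() that later shipped: DirEntry.is_dir(follow_symlinks=False) normally answers from type information that readdir()/FindNextFile already returned, while the older approach needs one lstat() per entry just to learn its type. Both function names are illustrative; on some filesystems is_dir() may still need a system call.

```python
import os
import stat

def count_tree(top):
    """Count (files, dirs) under `top` using scandir() type information."""
    files = dirs = 0
    stack = [top]
    while stack:
        with os.scandir(stack.pop()) as it:
            for entry in it:
                if entry.is_dir(follow_symlinks=False):  # usually no syscall
                    dirs += 1
                    stack.append(entry.path)
                else:
                    files += 1
    return files, dirs

def count_tree_listdir(top):
    """Same count the older way: listdir() plus one lstat() per entry."""
    files = dirs = 0
    stack = [top]
    while stack:
        current = stack.pop()
        for name in os.listdir(current):
            path = os.path.join(current, name)
            # One extra system call per entry, just to learn its type.
            if stat.S_ISDIR(os.lstat(path).st_mode):
                dirs += 1
                stack.append(path)
            else:
                files += 1
    return files, dirs
```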


Janzert



Re: [Python-Dev] PEP 471: scandir(fd) and pathlib.Path(name, dir_fd=None)

2014-07-01 Thread Akira Li
Ben Hoyt benh...@gmail.com writes:

 Thanks, Victor.

 I don't have any experience with dir_fd handling, so unfortunately
 can't really comment here.

 What advantages does it bring? I notice that even os.listdir() on
 Python 3.4 doesn't have anything related to file descriptors, so I'd
 be in favour of not including support. We can always add it later.

 -Ben

FYI, os.listdir does support file descriptors in Python 3.3+; try:

   >>> import os
   >>> os.listdir(os.open('.', os.O_RDONLY))

NOTE: os.supports_fd and os.supports_dir_fd are different sets.
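A slightly more careful version of the one-liner checks os.supports_fd first and closes the descriptor afterwards. `listdir_via_fd()` is a hypothetical helper. os.listdir() appears in os.supports_fd (it accepts an fd in place of a path) but not in os.supports_dir_fd (it has no dir_fd parameter), which is the difference between the two sets.

```python
import os

def listdir_via_fd(path):
    """List `path` through a directory file descriptor."""
    if os.listdir not in os.supports_fd:
        raise NotImplementedError("os.listdir(fd) not supported on this platform")
    fd = os.open(path, os.O_RDONLY)
    try:
        return sorted(os.listdir(fd))
    finally:
        os.close(fd)  # don't leak the descriptor
```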

See also,
https://mail.python.org/pipermail/python-dev/2014-June/135265.html


--
Akira


P.S. Please, don't put your answer on top of the message you are
replying to.




Re: [Python-Dev] My summary of the scandir (PEP 471)

2014-07-01 Thread Nick Coghlan
On 1 July 2014 08:42, Ben Hoyt benh...@gmail.com wrote:
 We need per-iteration error handling for the readdir call anyway, so I think
 an onerror callback is a better option than dropping the ability to easily
 obtain full stat information as part of the iteration.

 I don't mind the idea of an onerror callback, but it's adding
 complexity. Putting aside the question of caching/timing for a second
 and assuming .lstat() as per the current PEP 471, do we really need
 per-iteration error handling for readdir()? When would that actually
 fail in practice?

An NFS mount dropping the connection or a USB key being removed are
the first that come to mind, but I expect there are others. I find
it's generally better to just assume that any system call may fail for
obscure reasons and put the infrastructure in place to deal with it
rather than getting ugly, hard to track down bugs later.

Cheers,
Nick.



-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


[Python-Dev] Network Security Backport Status

2014-07-01 Thread Alex Gaynor
Hi all,

I wanted to bring everyone up to speed on the status of PEP 466, what's been
completed, and what's left to do.

First the completed stuff:

* hmac.compare_digest
* hashlib.pbkdf2_hmac

These are both backported, and I've added support to use them in Django, so users
should start seeing these benefits as soon as we get a Python release into
their hands.
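The two backported functions are typically used together for password storage: derive a key with PBKDF2 and compare digests in constant time. A minimal sketch for any Python that has both functions; the helper names, the 16-byte salt and the default iteration count are illustrative choices, not recommendations from this thread.

```python
import hashlib
import hmac
import os

def hash_password(password, salt=None, iterations=100000):
    """Derive a key from `password` (bytes) with PBKDF2-HMAC-SHA256."""
    if salt is None:
        salt = os.urandom(16)
    key = hashlib.pbkdf2_hmac('sha256', password, salt, iterations)
    return salt, key

def check_password(password, salt, expected_key, iterations=100000):
    """Recompute the key and compare it in constant time."""
    _, key = hash_password(password, salt, iterations)
    return hmac.compare_digest(key, expected_key)
```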

Now the uncompleted stuff:

* Persistent file descriptor for ``os.urandom``
* SSL module

It's the SSL module that I'll spend the rest of this email talking about.


Backporting the features from the Python3 version of this module has proven
more difficult than I had expected. This is primarily because the stdlib took a
maintenance strategy that was different from what most Python projects have
done for their 2/3 support: multiple independent codebases.

I've tried a few different strategies for the backport, none of which has
worked:

* Copying the ``ssl.py``, ``test_ssl.py``, and ``_ssl.c`` files from Python3
  and trying to port all the code.
* Copying just ``test_ssl.py`` and then copying individual chunks/functions as
  necessary to get stuff to pass.
* Manually doing stuff.

All of these proved to be a massive undertaking, and made it too easy to
accidentally introduce breaking changes.

I've come up with a new approach, which I believe is most likely to be
successful, but I'll need help to implement it.

The idea is to find the most recent commit which is a parent of both the
``2.7`` and ``default`` branches. Then take every single change to an ``ssl``
related file on the ``default`` branch, and attempt to replay it on the ``2.7``
branch. Require manual review on each commit to make sure it compiles, and to
ensure it doesn't make any backwards incompatible changes.

I think this provides the most iterative and guided approach to getting this
done.

I can do all the work of reviewing each commit, but I need some help from a
mercurial expert to automate the cherry-picking/rebasing of every single
commit.


What do folks think? Does this approach make sense? Anyone willing to help with
the mercurial scripting?

Cheers,
Alex



Re: [Python-Dev] Network Security Backport Status

2014-07-01 Thread Nick Coghlan
On 1 Jul 2014 11:28, Alex Gaynor alex.gay...@gmail.com wrote:

 I've come up with a new approach, which I believe is most likely to be
 successful, but I'll need help to implement it.

 The idea is to find the most recent commit which is a parent of both the
 ``2.7`` and ``default`` branches. Then take every single change to an
 ``ssl`` related file on the ``default`` branch, and attempt to replay it
 on the ``2.7`` branch. Require manual review on each commit to make sure
 it compiles, and to ensure it doesn't make any backwards incompatible
 changes.

 I think this provides the most iterative and guided approach to getting
 this done.

Sounds promising, although it may still have some challenges if the SSL
code depends on earlier changes to other code.

 I can do all the work of reviewing each commit, but I need some help from
 a mercurial expert to automate the cherry-picking/rebasing of every single
 commit.

 What do folks think? Does this approach make sense? Anyone willing to
 help with the mercurial scripting?

For the Mercurial part, it's probably worth posing that as a Stack Overflow
question:

Given two named branches in http://hg.python.org (default and 2.7) and 4
files (Python module, C module, tests, docs):
- find the common ancestor
- find all the commits affecting those files on default & graft them to 2.7
  (with a chance to test and edit each one first)

It's just a better environment for asking & answering that kind of question :)
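Nick's two steps map fairly directly onto Mercurial revsets. The sketch below only builds and prints the commands (a dry run); the file paths, the helper names and the exact revset strings are assumptions on my part and have not been run against the CPython repository, so treat them as a starting point rather than a recipe.

```python
import shlex

# Assumption: Nick's "4 files (Python module, C module, tests, docs)"
# correspond to these paths in the CPython repository.
SSL_FILES = (
    "Lib/ssl.py",
    "Modules/_ssl.c",
    "Lib/test/test_ssl.py",
    "Doc/library/ssl.rst",
)


def common_ancestor_revset(source="default", target="2.7"):
    """Step 1: the greatest common ancestor of the two branch heads."""
    return "ancestor('%s', '%s')" % (source, target)


def candidate_revset(files, source="default", target="2.7"):
    """Step 2: changesets reachable from `source` but not from `target`
    (i.e. everything on the source side since the common ancestor) that
    touch any of `files`. Merges are filtered out because `hg graft`
    skips them anyway."""
    touched = " or ".join("file('path:%s')" % f for f in files)
    return "(::'%s' - ::'%s') and not merge() and (%s)" % (
        source, target, touched)


def log_command(revset):
    """argv listing the matching changeset hashes, oldest first."""
    return ["hg", "log", "-r", "sort(%s, rev)" % revset,
            "--template", "{node}\\n"]


def graft_command(node):
    """argv grafting one changeset onto the working directory's branch
    (run `hg update 2.7` first); --edit opens the commit message so each
    backport can be annotated."""
    return ["hg", "graft", "--edit", "-r", node]


def as_shell(argv):
    """Render an argv list as a copy-pasteable shell command."""
    return " ".join(shlex.quote(arg) for arg in argv)


if __name__ == "__main__":
    # Dry run: print the commands rather than executing them.
    print(as_shell(["hg", "log", "-r", common_ancestor_revset()]))
    print(as_shell(log_command(candidate_revset(SSL_FILES))))
    print(as_shell(graft_command("NODE")))
```

The intended workflow would be to graft one hash at a time, rebuilding and running test_ssl after each graft before moving on; if a graft hits conflicts, resolve them and run `hg graft --continue`. This automates only the mechanics, not the review.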

Cheers,
Nick.


 Cheers,
 Alex



Re: [Python-Dev] Network Security Backport Status

2014-07-01 Thread Antoine Pitrou

On 01/07/2014 14:26, Alex Gaynor wrote:


I can do all the work of reviewing each commit, but I need some help from a
mercurial expert to automate the cherry-picking/rebasing of every single
commit.

What do folks think? Does this approach make sense? Anyone willing to help with
the mercurial scripting?


I don't think this makes much sense; Mercurial won't be smarter than you 
are. I think you'd have a better chance of succeeding by backporting one 
feature at a time. IMO, you'd first want to backport the _SSLContext 
base class and SSLContext.wrap_socket(). The latter *will* require some 
manual coding to adapt to 2.7's different SSLSocket implementation, not 
just applying patch hunks around.
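For readers who haven't used it, this is roughly what the 3.x API Antoine mentions looks like from the caller's side: one SSLContext carries the TLS configuration, and wrap_socket() applies it to a plain socket. A minimal sketch assuming Python 3.4+ (for ssl.create_default_context()); make_client() is a hypothetical helper and example.org a placeholder host. It never connects, so no network access is needed.

```python
import socket
import ssl


def make_client(hostname):
    """Return (context, tls_socket) for a client connection to `hostname`."""
    # One context holds the TLS settings; create_default_context() turns on
    # certificate verification (CERT_REQUIRED) and hostname checking.
    ctx = ssl.create_default_context()
    raw = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # wrap_socket() returns an SSLSocket bound to this context. The socket
    # is not connected yet, so no handshake happens here; with the default
    # do_handshake_on_connect=True it runs when connect() is called.
    tls = ctx.wrap_socket(raw, server_hostname=hostname)
    return ctx, tls
```

Usage would be `ctx, tls = make_client("example.org")` followed by `tls.connect(("example.org", 443))`; the same context can wrap any number of sockets.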


Regards

Antoine.




Re: [Python-Dev] Network Security Backport Status

2014-07-01 Thread Guido van Rossum
I have to agree with Antoine -- I don't think there's a shortcut that
avoids *someone* actually having to understand the code to the point of
being able to recreate the same behavior in the different context (pun not
intended) of Python 2.


On Tue, Jul 1, 2014 at 1:54 PM, Antoine Pitrou anto...@python.org wrote:

 On 01/07/2014 14:26, Alex Gaynor wrote:


 I can do all the work of reviewing each commit, but I need some help
 from a mercurial expert to automate the cherry-picking/rebasing of every
 single commit.

 What do folks think? Does this approach make sense? Anyone willing to
 help with the mercurial scripting?


 I don't think this makes much sense; Mercurial won't be smarter than you
 are. I think you'd have a better chance of succeeding by backporting one
 feature at a time. IMO, you'd first want to backport the _SSLContext base
 class and SSLContext.wrap_socket(). The latter *will* require some manual
 coding to adapt to 2.7's different SSLSocket implementation, not just
 applying patch hunks around.

 Regards

 Antoine.




-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] My summary of the scandir (PEP 471)

2014-07-01 Thread Paul Moore
On 1 July 2014 14:00, Ben Hoyt benh...@gmail.com wrote:
 2) Nick Coghlan's proposal on the previous thread
 (https://mail.python.org/pipermail/python-dev/2014-June/135261.html)
 suggesting an ensure_lstat keyword param to scandir if you need the
 lstat_result value

 I would make one small tweak to Nick Coghlan's proposal to make
 writing cross-platform code easier. Instead of .lstat_result being
 None sometimes (on POSIX), have it None always unless you specify
 ensure_lstat=True. (Actually, call it get_lstat=True to kind of make
 this more obvious.) Per (b) above, this means Windows developers
 wouldn't accidentally write code which failed on POSIX systems -- it'd
 fail fast on Windows too if you accessed .lstat_result without
 specifying get_lstat=True.

This is getting very complicated (at least to me, as a Windows user,
where the basic idea seems straightforward).

It seems to me that the right model is the standard thin wrapper
round the OS feature that acts as a building block - it's typical of
the rest of the os module. I think that thin wrapper is needed - even
if the various bells and whistles are useful, they can be built on top
of a low-level version (whereas the converse is not the case).
Typically, such thin wrappers expose POSIX semantics by default, and
Windows behaviour follows as closely as possible (see for example
stat, where st_ino makes no sense on Windows, but is present). In this
case, we're exposing Windows semantics, and POSIX is the one needing
to fit the model, but the principle is the same.

On that basis, optional attributes (as used in stat results) seem
entirely sensible.
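To make the optional-attribute model concrete, here is a minimal sketch using the real os.lstat() result, where st_file_attributes (Windows only) and st_blocks (some Unix systems) already behave this way: the attribute is either there or absent, never None. describe() is a hypothetical helper, and stat.FILE_ATTRIBUTE_HIDDEN assumes Python 3.5 or later.

```python
import os
import stat


def describe(path):
    """Gather portable lstat() fields, plus platform-specific ones when the
    OS provides them."""
    st = os.lstat(path)
    info = {"size": st.st_size, "mtime": st.st_mtime}
    # Only present on Windows; on other platforms the attribute is simply
    # absent, so probe for it rather than expecting None.
    attrs = getattr(st, "st_file_attributes", None)
    if attrs is not None:
        info["hidden"] = bool(attrs & stat.FILE_ATTRIBUTE_HIDDEN)
    # Present on most Unix systems, absent on Windows.
    if hasattr(st, "st_blocks"):
        info["blocks"] = st.st_blocks
    return info


print(describe(os.curdir))
```

Code written this way runs unchanged on every platform, which is the property Paul is arguing a DirEntry with optional attributes would share.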

The documentation for DirEntry could easily be written to parallel
that of a stat result:


The return value is an object whose attributes correspond to the data
the OS returns about a directory entry:

  * name - the object's name
  * full_name - the object's full name (including path)
  * is_dir - whether the object is a directory
  * is_file - whether the object is a plain file
  * is_symlink - whether the object is a symbolic link

On Windows, the following attributes are also available:

  * st_size - the size, in bytes, of the object (only meaningful for files)
  * st_atime - time of last access
  * st_mtime - time of last write
  * st_ctime - time of creation
  * st_file_attributes - Windows file attribute bits (see the
    FILE_ATTRIBUTE_* constants in the stat module)


That's no harder to understand (or to work with) than the equivalent
stat result. The only difference is that on POSIX the unavailable
attributes have to be fetched with a separate system call (with
implications in terms of performance, error handling and potential race
conditions).

The version of scandir with the ensure_lstat argument is easy to write
on top of one with optional attributes (I'm playing fast and loose with
adding attributes to DirEntry values here, just for the sake of an
example - the details are left as an exercise):

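import os

def scandir_ensure(path='.', ensure_lstat=False):
    # Written against the *proposed* API: DirEntry.full_name, with the
    # st_* attributes present only when the OS supplied them.
    for entry in os.scandir(path):
        if ensure_lstat and not hasattr(entry, 'st_size'):
            # POSIX case: fetch the missing data with a separate lstat().
            stat_data = os.lstat(entry.full_name)
            entry.st_size = stat_data.st_size
            entry.st_atime = stat_data.st_atime
            entry.st_mtime = stat_data.st_mtime
            entry.st_ctime = stat_data.st_ctime
            # Ignore file_attributes, as we'll never get here on Windows
        yield entry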

Variations on how you handle errors in the lstat call, etc, can be
added to taste.
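One possible variation, sketched against the os.scandir() that eventually shipped in Python 3.5 rather than the API proposed here: its DirEntry exposes the full path as .path and does not allow new attributes to be attached, so this hypothetical scandir_with_lstat() helper returns the lstat data alongside each entry, and treats a file that disappears between the directory listing and the lstat() call as missing data rather than an error.

```python
import os


def scandir_with_lstat(path="."):
    """Yield (entry, lstat_result) pairs, with None in place of the
    lstat_result when the entry vanished before it could be stat'ed."""
    for entry in os.scandir(path):
        try:
            # Separate system call; entry.path is the shipped (3.5+) name
            # for what the proposal above calls full_name.
            st = os.lstat(entry.path)
        except FileNotFoundError:
            # Race: the file was removed between the directory listing
            # and this lstat() call.
            st = None
        # Other OSErrors (e.g. PermissionError) propagate to the caller.
        yield entry, st
```

Catching only FileNotFoundError is one choice among several; a caller that would rather skip such entries, or log permission errors, can adjust the except clause. In the shipped API, entry.stat(follow_symlinks=False) is an alternative that on Windows reuses the data the directory listing already returned.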

Please, let's stick to a low-level wrapper round the OS API for the
first iteration of this feature. Enhancements can be added later, when
real-world usage has proved their value.

Paul


Re: [Python-Dev] My summary of the scandir (PEP 471)

2014-07-01 Thread Glenn Linderman

On 7/1/2014 2:20 PM, Paul Moore wrote:

Please, let's stick to a low-level wrapper round the OS API for the
first iteration of this feature. Enhancements can be added later, when
real-world usage has proved their value.


I almost wrote this whole message this morning, but didn't have time.  
Thanks, Paul, for digging through the details.


+1


[Python-Dev] [RELEASE] Python 2.7.8

2014-07-01 Thread Benjamin Peterson
Greetings,
I have the distinct privilege of informing you that the latest release
of the Python 2.7 series, 2.7.8, has been released and is available for
download. 2.7.8 contains several important regression fixes and security
changes:
  - The OpenSSL version bundled in the Windows installer has been
    updated.
  - A regression in the mimetypes module on Windows has been fixed. [1]
  - A possible overflow in the buffer type has been fixed. [2]
  - A bug in the CGIHTTPServer module which allowed arbitrary execution
    of code in the server root has been patched. [3]
  - A regression in the handling of UNC paths in os.path.join has been
    fixed. [4]

Downloads of 2.7.8 are at

https://www.python.org/download/releases/2.7.8/

The full changelog is located at

http://hg.python.org/cpython/raw-file/v2.7.8/Misc/NEWS

This is a production release. As always, please report bugs to

http://bugs.python.org/

Till next time,
Benjamin Peterson
2.7 Release Manager
(on behalf of all of Python's contributors)

[1] http://bugs.python.org/issue21652
[2] http://bugs.python.org/issue21831
[3] http://bugs.python.org/issue21766
[4] http://bugs.python.org/issue21672