Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator
2014-07-01 4:04 GMT+02:00 Glenn Linderman v+pyt...@g.nevcal.com:
> +0 for stat fields to be None on all platforms unless ensure_lstat=True.
>
> This won't work well if lstat info is only needed for some entries. Is that a common use-case?
>
> It was mentioned earlier in the thread. If it is, use ensure_lstat=False, and use the proposed (by me) .refresh() API to update the data for those that need it.

We should make DirEntry as simple as possible.

In Python, the classic behaviour is to not define an attribute if it's not available on a platform. For example, stat().st_file_attributes is only available on Windows.

I don't like the idea of the ensure_lstat parameter because os.scandir would have to call two system calls, and it makes it harder to guess which syscall failed (readdir or lstat).

If you need lstat on UNIX, write:

    if hasattr(entry, 'lstat_result'):
        size = entry.lstat_result.st_size
    else:
        size = os.lstat(entry.fullname()).st_size

Victor
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
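A runnable version of the hasattr() fallback above, as a minimal sketch. The entry objects are stand-ins built with types.SimpleNamespace: `lstat_result` and `fullname()` are the attribute and method proposed in this thread, not the API os.scandir() eventually shipped with (which exposes `DirEntry.path` and `DirEntry.stat()`). `entry_size` and `make_entry` are hypothetical helper names. The os.lstat() call is wrapped in try/except because the file may disappear between the directory read and the stat.

```python
import os
import types


def entry_size(entry):
    """Return the size of a directory entry, or None if it vanished.

    `entry` is assumed to follow the API proposed in this thread: an
    optional `lstat_result` attribute (present only where the OS provides
    it for free) and a `fullname()` method returning the joined path.
    """
    if hasattr(entry, "lstat_result"):
        # Data already returned by the directory listing (e.g. on Windows).
        return entry.lstat_result.st_size
    try:
        # One extra system call, paid only when the data was not provided.
        return os.lstat(entry.fullname()).st_size
    except FileNotFoundError:
        # The file was deleted between the directory read and lstat().
        return None


def make_entry(directory, name, with_lstat):
    """Build a hypothetical DirEntry-like object for demonstration."""
    full = os.path.join(directory, name)
    entry = types.SimpleNamespace(name=name, fullname=lambda: full)
    if with_lstat:
        entry.lstat_result = os.lstat(full)
    return entry
```

The design point is that portable code has to handle both branches either way; the hasattr() check only decides whether the extra system call is needed.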
[Python-Dev] PEP 471: scandir(fd) and pathlib.Path(name, dir_fd=None)
Hi,

IMO we must decide whether or not scandir() should support file descriptors. It's an important decision which has a significant impact on the API.

To support scandir(fd), the minimum is to store dir_fd in DirEntry: dir_fd would be None for scandir(str).

scandir(fd) must not close the file descriptor; that should be done by the caller. Handling the lifetime of the file descriptor is a difficult problem, and it's better to let the user decide how to handle it.

There is also the limit on open file descriptors, usually 1024 but it can be lower. It *can* be an issue for a very deep file hierarchy.

If we choose to support scandir(fd), it's probably safer not to use scandir(fd) by default in os.walk() (use scandir(str) instead) until the feature is well tested, the corner cases are well known, etc.

The second step is to enhance pathlib.Path to support an optional file descriptor. Path already has methods on filenames like chmod(), exists(), rename(), etc.

Example:

    fd = os.open(path, os.O_DIRECTORY)
    try:
        for entry in os.scandir(fd):
            # ... use entry to benefit from the entry cache: is_dir(), lstat_result ...
            path = pathlib.Path(entry.name, dir_fd=entry.dir_fd)
            # ... use path which uses dir_fd ...
    finally:
        os.close(fd)

Problem: if the path object is stored somewhere and used after the loop, Path methods will fail because dir_fd was closed. It's even worse if a new directory gets the same file descriptor number :-/ (a security issue, or at least tricky bugs!)

Victor
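A sketch of the caller-owned descriptor pattern using only calls that exist today: os.listdir() accepts a directory file descriptor since Python 3.3, and os.stat() accepts dir_fd on platforms listed in os.supports_dir_fd (POSIX). `sizes_relative_to_fd` is a hypothetical helper name. Only plain names and sizes escape the try/finally, so nothing can later use a closed or reused descriptor, which is the hazard described above.

```python
import os


def sizes_relative_to_fd(path):
    """Map each entry name in `path` to its lstat() size, resolved via a dir fd.

    The function that opens the descriptor is the one that closes it; the
    listing and stat calls never close it, mirroring the rule that
    scandir(fd) must leave the descriptor's lifetime to the caller.
    """
    flags = os.O_RDONLY | getattr(os, "O_DIRECTORY", 0)
    fd = os.open(path, flags)
    try:
        sizes = {}
        for name in os.listdir(fd):  # listing by fd: Python 3.3+, POSIX
            # Resolve `name` relative to the open directory, not to a path
            # string that could be swapped for a symlink in the meantime.
            st = os.stat(name, dir_fd=fd, follow_symlinks=False)
            sizes[name] = st.st_size
        return sizes
    finally:
        os.close(fd)  # only plain data escapes; no dangling fd references
```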
Re: [Python-Dev] PEP 471: scandir(fd) and pathlib.Path(name, dir_fd=None)
Thanks, Victor.

I don't have any experience with dir_fd handling, so unfortunately can't really comment here.

What advantages does it bring? I notice that even os.listdir() on Python 3.4 doesn't have anything related to file descriptors, so I'd be in favour of not including support. We can always add it later.

-Ben
Re: [Python-Dev] PEP 471: scandir(fd) and pathlib.Path(name, dir_fd=None)
2014-07-01 14:26 GMT+02:00 Ben Hoyt benh...@gmail.com:
> Thanks, Victor.
>
> I don't have any experience with dir_fd handling, so unfortunately can't really comment here.
>
> What advantages does it bring? I notice that even os.listdir() on Python 3.4 doesn't have anything related to file descriptors, so I'd be in favour of not including support.

See https://docs.python.org/dev/library/os.html#dir-fd

The idea is to make sure that you get files from the same directory. Problems occur when a directory is moved or a symlink is modified. Example:

- you're browsing /tmp/test/x as root (!), /tmp/copy/passwd is owned by www user (website)
- you would like to remove the file x: call unlink("/tmp/copy/passwd")
- ... but just before that, an attacker replaces the /tmp/copy directory with a symlink to /etc
- you will remove /etc/passwd instead of /tmp/copy/passwd, oh oh

Using unlink("passwd", dir_fd=tmp_copy_fd), you don't have this issue. You are sure that you are working in the /tmp/copy directory. You can imagine a lot of other scenarios to overwrite files and read sensitive files.

Fortunately, the Linux rm command knows the unlinkat() syscall ;-)

    haypo@selma$ mkdir -p a/b/c
    haypo@selma$ strace -e unlinkat rm -rf a
    unlinkat(5, "c", AT_REMOVEDIR)          = 0
    unlinkat(4, "b", AT_REMOVEDIR)          = 0
    unlinkat(AT_FDCWD, "a", AT_REMOVEDIR)   = 0
    +++ exited with 0 +++

We should implement a similar thing in shutil.rmtree(). See also os.fwalk(), which is a version of os.walk() providing dir_fd.

> We can always add it later.

I would prefer to discuss that right now. My proposal is to accept an int for scandir() and copy the int into DirEntry.dir_fd. It's not that complex :-)

The enhancement of the pathlib module can be done later. By the way, I know that Antoine Pitrou wanted to implement file descriptors in pathlib, but the feature was rejected or at least delayed.

Victor
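The unlinkat()-style removal shown in the strace output can be sketched in Python with os.fwalk() (POSIX only, Python 3.3+), which yields an open descriptor for each directory. This is an illustrative sketch, not shutil.rmtree()'s implementation; `remove_tree_with_dir_fds` is a hypothetical name. Every unlink/rmdir is resolved relative to `rootfd`, so replacing an already-opened directory with a symlink cannot redirect the deletion.

```python
import os
import stat


def remove_tree_with_dir_fds(top):
    """Delete a directory tree, resolving every name relative to an open dir fd."""
    # topdown=False: each directory is yielded after all of its children,
    # so by the time we see it, its subdirectories are already empty.
    for root, dirs, files, rootfd in os.fwalk(top, topdown=False):
        for name in files:
            os.unlink(name, dir_fd=rootfd)
        for name in dirs:
            st = os.stat(name, dir_fd=rootfd, follow_symlinks=False)
            if stat.S_ISLNK(st.st_mode):
                # A symlink to a directory: remove the link, never its target.
                os.unlink(name, dir_fd=rootfd)
            else:
                os.rmdir(name, dir_fd=rootfd)
    # Finally remove the (now empty) top-level directory by path.
    os.rmdir(top)
```

The symlink check matters because os.fwalk() lists a symlink to a directory in `dirs` without descending into it.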
Re: [Python-Dev] My summary of the scandir (PEP 471)
Thanks for spinning this off to (hopefully) finish the discussion. I agree it's nearly time to update the PEP.

> @Ben: it's time to update your PEP to complete it with this discussion! IMO DirEntry must be as simple as possible and portable:
>
> - os.scandir(str)
> - DirEntry.lstat_result object only available on Windows, same result as os.lstat()
> - DirEntry.fullname(): os.path.join(directory, DirEntry.name), where directory would be a hidden attribute of DirEntry

I'm quite strongly against this, and I think it's actually the worst of both worlds. It is not as good an API because:

(a) it doesn't call stat for you (on POSIX), so you have to check an attribute and call stat manually if you need it, turning what should be one line of code into four. Your proposal above was kind of how I had it originally, where you had to do extra tests and call stat manually if you needed it (see https://mail.python.org/pipermail/python-dev/2013-May/126119.html)

(b) the .lstat_result attribute is available on Windows but not on POSIX, meaning it's very easy for Windows developers to write code that will run and work fine on Windows, but then break horribly on POSIX; I think it'd be better if it broke hard on Windows to make writing cross-platform code easy

The two alternatives are:

1) the original proposal in the current version of PEP 471, where DirEntry has an .lstat() method which calls stat() on POSIX but is free on Windows

2) Nick Coghlan's proposal on the previous thread (https://mail.python.org/pipermail/python-dev/2014-June/135261.html) suggesting an ensure_lstat keyword param to scandir if you need the lstat_result value

I would make one small tweak to Nick Coghlan's proposal to make writing cross-platform code easier. Instead of .lstat_result being None sometimes (on POSIX), have it None always unless you specify ensure_lstat=True. (Actually, call it get_lstat=True to kind of make this more obvious.) Per (b) above, this means Windows developers wouldn't accidentally write code which failed on POSIX systems -- it'd fail fast on Windows too if you accessed .lstat_result without specifying get_lstat=True.

I'm still unsure which of these I like better. I think #1's API is slightly nicer without the ensure_lstat parameter, and error handling of the stat() is more explicit. But #2 always fetches the stat info at the same time as the dir entry info, so it eliminates the problem of having the file info change between scandir iteration and the .lstat() call.

I'm leaning towards preferring #2 (Nick's proposal) because it solves or gets around the caching issue. My one concern is error handling. Is it an issue if scandir's __next__ can raise an OSError either from the readdir() call or the call to stat()? My thinking is probably not. In practice, would it ever really happen that readdir() would succeed but an os.stat() immediately after would fail? I guess it could if the file is deleted, but then if it were deleted a microsecond earlier the readdir() would fail anyway, or not? Or does readdir give you a consistent, snap-shotted view on things?

The one other thing I'm not quite sure about with Nick's proposal is the name .lstat_result, as it's long. I can see why he suggested that, as .lstat sounds like a verb, but maybe that's okay? If we can have .is_dir and .is_file as attributes, my thinking is an .lstat attribute is fine too. I don't feel too strongly though.

> - I don't think that we should support scandir(bytes). If you really want to support os.scandir(bytes), it must raise an error on Windows since bytes filenames are already deprecated. It wouldn't make sense to add a new function with a deprecated feature. Since we have PEP 383 (surrogateescape), it's better to advise using Unicode on all platforms. Almost all Python functions are able to encode Unicode filenames back automatically. Use os.fsencode() to encode manually if needed.

Really, are bytes filenames deprecated? I think maybe they should be, as they don't work on Windows :-), but the latest Python os docs (https://docs.python.org/3.5/library/os.html) still say that all functions that accept path names accept either str or bytes, and return a value of the same type where necessary. So I think scandir() should do the same thing.

> - We may not define a DirEntry.fullname() method: the directory name is usually well known. Ok, but every time that I use os.listdir(), I write os.path.join(directory, name) because in some cases I want the full path.

Agreed. I use this a lot too. However, I'd prefer a .fullname attribute rather than a method, as it's free/cheap to compute and doesn't require OS calls.

Out of interest, why do we have .is_dir and .stat_result but .fullname rather than .full_name? .fullname seems reasonable to me, but maybe consistency is a good thing here?

> - It must not be possible to refresh a DirEntry object. Call os.stat(entry.fullname()) or pathlib.Path(entry.fullname()) to get fresh data. DirEntry is only computed
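A minimal sketch of Ben's variant of Nick's proposal, layered on the os.scandir() that later shipped in Python 3.5 (the with-statement form needs 3.6+, dataclasses need 3.7+). `lstat_result` is always present but stays None unless `get_lstat=True` is passed, so code that forgets the flag fails the same way on every platform, and `fullname` is a plain attribute computed once. The names `scandir_proposed`, `ProposedEntry`, `get_lstat`, `lstat_result`, and `fullname` are hypothetical, taken from the thread rather than from any released API; the real DirEntry exposes `.path` and `.stat(follow_symlinks=False)` instead.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProposedEntry:
    name: str
    fullname: str                           # os.path.join(directory, name), computed once
    is_dir: bool                            # attribute, as discussed in the thread
    lstat_result: Optional[os.stat_result]  # None unless get_lstat=True was requested


def scandir_proposed(directory, get_lstat=False):
    """Yield hypothetical entries following the get_lstat proposal."""
    with os.scandir(directory) as it:
        for entry in it:
            yield ProposedEntry(
                name=entry.name,
                fullname=os.path.join(directory, entry.name),
                is_dir=entry.is_dir(follow_symlinks=False),
                # stat(follow_symlinks=False) is the lstat() equivalent; on POSIX
                # it costs one system call per entry, so only pay when asked.
                lstat_result=entry.stat(follow_symlinks=False) if get_lstat else None,
            )
```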
Re: [Python-Dev] My summary of the scandir (PEP 471)
2014-07-01 15:00 GMT+02:00 Ben Hoyt benh...@gmail.com:
> (a) it doesn't call stat for you (on POSIX), so you have to check an attribute and call scandir manually if you need it,

Yes, and that's something common when you use the os module. For example, don't try to call os.fork(), os.getgid() or os.fchmod() on Windows :-) Closer to your PEP, the following stat attributes are only available on UNIX: st_blocks, st_blksize, st_rdev, st_flags; and st_file_attributes is only available on Windows.

I don't think that using lstat_result is a common need when browsing a directory. In most cases, you only need is_dir() and the name attribute.

> 1) the original proposal in the current version of PEP 471, where DirEntry has an .lstat() method which calls stat() on POSIX but is free on Windows

On UNIX, does it mean that .lstat() calls os.lstat() on the first call, and then always returns the same result? It would be different from os.lstat() and pathlib.Path.stat() :-( I would prefer to have the same behaviour as pathlib and os (you know, the well-known consistency of the Python stdlib). As I wrote, I expect a function call to always retrieve the new status.

> 2) Nick Coghlan's proposal on the previous thread (https://mail.python.org/pipermail/python-dev/2014-June/135261.html) suggesting an ensure_lstat keyword param to scandir if you need the lstat_result value

I don't like this idea because it makes error handling more complex. The syntax to catch exceptions on an iterator is verbose (while: try: next() except ...). Whereas calling os.lstat(entry.fullname()) is explicit and it's easy to surround it with try/except.

> .lstat_result being None sometimes (on POSIX),

Don't do that, it's not how Python handles portability. We use hasattr().

> would it ever really happen that readdir() would succeed but an os.stat() immediately after would fail?

Yes, it can happen. The filesystem is system-wide and shared by all users. The file can be deleted.

> Really, are bytes filenames deprecated?

Yes, in all functions of the os module since Python 3.3. I'm sure because I implemented the deprecation :-) Try open(b'test.txt', 'w') on Windows with python -Werror.

> I think maybe they should be, as they don't work on Windows :-)

Windows has an API dedicated to bytes filenames, the ANSI API. But this API has annoying bugs: it replaces unencodable characters with question marks, and there is no option to be notified of the encoding error. Different users complained about that. It was decided not to change Python, since Python is a light wrapper over the kernel system calls. But bytes filenames are now deprecated to advise users to use the native type for filenames on Windows: Unicode!

> but the latest Python os docs (https://docs.python.org/3.5/library/os.html) still say that all functions that accept path names accept either str or bytes,

Maybe I forgot to update the documentation :-(

> So I think scandir() should do the same thing.

You may support scandir(bytes) on Windows, but you will need to emit a deprecation warning too (which is silent by default).

Victor
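The PEP 383 point can be checked directly. This sketch assumes a POSIX system, where os.fsdecode() uses the surrogateescape error handler, so a byte that is invalid in the filesystem encoding survives a round trip through str; on Windows the filesystem encoding and error handler differ and this particular byte string may fail to decode. `roundtrip_name` is a hypothetical helper.

```python
import os


def roundtrip_name(raw: bytes) -> bytes:
    """Decode a bytes filename to str and encode it back (POSIX semantics).

    With surrogateescape, bytes that are invalid in the filesystem
    encoding become lone surrogates (U+DC80..U+DCFF) in the str, and
    os.fsencode() turns them back into the original bytes.
    """
    text = os.fsdecode(raw)   # a str usable with any str-accepting os function
    return os.fsencode(text)  # the exact original bytes
```

This is why a str-only scandir() loses nothing on POSIX: an undecodable name can still be passed back to the OS unchanged.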
[Python-Dev] Excess help() output
Hi,

The help() output is confusing for beginners:

    >>> class B(object):
    ...     pass
    ...
    >>> help(B)
    Help on class B in module __main__:

    class B(__builtin__.object)
     |  Data descriptors defined here:
     |
     |  __dict__
     |      dictionary for instance variables (if defined)
     |
     |  __weakref__
     |      list of weak references to the object (if defined)

Is it possible to remove this section from the help output? Why is it here at all?

    >>> dir(B)
    ['__class__', '__delattr__', '__dict__', '__doc__', '__format__', '__getattribute__', '__hash__', '__init__', '__module__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__']

--
anatoly t.
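One way to see where the section comes from: it lists the `__dict__` and `__weakref__` descriptors that an ordinary class gains when it does not define `__slots__`. A sketch comparing pydoc's plain-text rendering (what help() shows, minus terminal bold codes) for a regular class and for a class with empty `__slots__`. `Regular`, `Slotted`, and `help_text` are illustrative names, and the sketch targets Python 3 rather than the Python 2 session above.

```python
import pydoc


class Regular:
    pass


class Slotted:
    __slots__ = ()  # no per-instance __dict__ and no weak-reference slot


def help_text(cls):
    """Return the plain-text help() output for `cls` (no backspace bolding)."""
    return pydoc.render_doc(cls, renderer=pydoc.plaintext)
```

Empty `__slots__` removes the section only by removing the features: instances of `Slotted` can neither take arbitrary attributes nor be weakly referenced. So this explains the output rather than offering a general fix for it.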
Re: [Python-Dev] My summary of the scandir (PEP 471)
On 01.07.2014 15:00, Ben Hoyt wrote:
> I'm leaning towards preferring #2 (Nick's proposal) because it solves or gets around the caching issue. My one concern is error handling. Is it an issue if scandir's __next__ can raise an OSError either from the readdir() call or the call to stat()? My thinking is probably not. In practice, would it ever really happen that readdir() would succeed but an os.stat() immediately after would fail? I guess it could if the file is deleted, but then if it were deleted a microsecond earlier the readdir() would fail anyway, or not? Or does readdir give you a consistent, snap-shotted view on things?

No need for a microsecond-timed deletion -- a directory with +r but without +x will allow you to list the entries, but stat calls on the files will fail with EACCES ("Permission denied"):

    $ ls -l
    drwxr--r--. 2 root root 60  1. Jul 16:52 test
    $ sudo ls -l test
    total 0
    -rw-r--r--. 1 root root 0  1. Jul 16:52 foo
    $ ls -l test
    ls: cannot access test/foo: Permission denied
    total 0
    -? ? ? ? ? ? foo
    $ stat test/foo
    stat: cannot stat ‘test/foo’: Permission denied

I had the idea to treat a failing lstat() inside scandir() as if the entry wasn’t found at all, but in this context, this seems wrong too.

regards,
jwi
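The same situation can be reproduced from Python on a POSIX system. Listing a directory needs only read (r) permission, while resolving a name inside it needs search (x) permission, so os.listdir() succeeds and os.lstat() raises PermissionError. The helper names are hypothetical. Running as root bypasses the permission check, in which case both calls succeed.

```python
import os
import stat
import tempfile


def lstat_after_listdir(dirpath):
    """List `dirpath`, then lstat() each entry, recording failures per entry."""
    results = {}
    for name in os.listdir(dirpath):  # needs only read (r) permission on dirpath
        try:
            # Needs search (x) permission on dirpath to resolve the name.
            results[name] = os.lstat(os.path.join(dirpath, name))
        except PermissionError as exc:  # EACCES: "Permission denied"
            results[name] = exc
    return results


def demo_read_only_dir():
    """Recreate the r--r--r-- directory from the shell session above."""
    with tempfile.TemporaryDirectory() as top:
        test_dir = os.path.join(top, "test")
        os.mkdir(test_dir)
        open(os.path.join(test_dir, "foo"), "w").close()
        os.chmod(test_dir, stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
        try:
            return lstat_after_listdir(test_dir)
        finally:
            # Restore permissions so TemporaryDirectory can clean up.
            os.chmod(test_dir, stat.S_IRWXU)
```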
Re: [Python-Dev] My summary of the scandir (PEP 471)
> No need for a microsecond-timed deletion -- a directory with +r but without +x will allow you to list the entries, but stat calls on the files will fail with EPERM:

Ah -- very good to know, thanks. This definitely points me in the direction of wanting better control over error handling.

Speaking of errors, and thinking of handling errors during iteration -- in what cases (if any) would an individual readdir fail if the opendir succeeded?

-Ben
Re: [Python-Dev] My summary of the scandir (PEP 471)
On 1 Jul 2014 07:31, Victor Stinner victor.stin...@gmail.com wrote:
> 2014-07-01 15:00 GMT+02:00 Ben Hoyt benh...@gmail.com:
>> 2) Nick Coghlan's proposal on the previous thread (https://mail.python.org/pipermail/python-dev/2014-June/135261.html) suggesting an ensure_lstat keyword param to scandir if you need the lstat_result value
>
> I don't like this idea because it makes error handling more complex. The syntax to catch exceptions on an iterator is verbose (while: try: next() except ...).

Actually, we may need to copy the os.walk API and accept an onerror callback as a scandir argument. Regardless of whether or not we have ensure_lstat, the iteration step could fail, so I don't believe we can just transfer the existing approach of catching exceptions from the listdir call.

> Whereas calling os.lstat(entry.fullname()) is explicit and it's easy to surround it with try/except.
>
>> .lstat_result being None sometimes (on POSIX),
>
> Don't do that, it's not how Python handles portability. We use hasattr().

That's not true in general - we do either, depending on context.

With the addition of an os.walk-style onerror callback, I'm still in favour of a get_lstat flag (tweaked as Ben suggests to always be None unless requested, so Windows code is less likely to be inadvertently non-portable).

>> would it ever really happen that readdir() would succeed but an os.stat() immediately after would fail?
>
> Yes, it can happen. The filesystem is system-wide and shared by all users. The file can be deleted.

We need per-iteration error handling for the readdir call anyway, so I think an onerror callback is a better option than dropping the ability to easily obtain full stat information as part of the iteration.

Cheers,
Nick.
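A sketch of what an onerror-style scandir could look like, built on the os.scandir() that later shipped (Python 3.6+ for the with-statement). `scandir_with_lstat` and its `onerror` parameter are hypothetical. Errors from opening the directory, from each iteration step, and from each per-entry lstat are passed to `onerror(exc)` instead of propagating. The body also shows the verbose while/try/next() pattern quoted above, which is needed to tell iteration errors apart from per-entry stat errors.

```python
import os


def scandir_with_lstat(path, onerror=None):
    """Yield (entry, lstat_result) pairs, reporting OSErrors via `onerror`.

    An entry whose lstat fails is skipped; a failure in the iteration step
    itself (readdir/FindNextFile) ends the listing.
    """
    def report(exc):
        if onerror is not None:
            onerror(exc)

    try:
        it = os.scandir(path)
    except OSError as exc:  # e.g. the directory does not exist
        report(exc)
        return
    with it:
        while True:
            # Calling next() explicitly separates iteration errors
            # from the per-entry lstat errors handled below.
            try:
                entry = next(it)
            except StopIteration:
                return
            except OSError as exc:
                report(exc)
                return
            try:
                st = entry.stat(follow_symlinks=False)  # lstat() equivalent
            except OSError as exc:
                report(exc)
                continue
            yield entry, st
```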
Re: [Python-Dev] My summary of the scandir (PEP 471)
> We need per-iteration error handling for the readdir call anyway, so I think an onerror callback is a better option than dropping the ability to easily obtain full stat information as part of the iteration.

I don't mind the idea of an onerror callback, but it's adding complexity. Putting aside the question of caching/timing for a second and assuming .lstat() as per the current PEP 471, do we really need per-iteration error handling for readdir()? When would that actually fail in practice?

-Ben
Re: [Python-Dev] My summary of the scandir (PEP 471)
On 07/01/2014 07:59 AM, Jonas Wielicki wrote:
> I had the idea to treat a failing lstat() inside scandir() as if the entry wasn’t found at all, but in this context, this seems wrong too.

Well, os.walk supports passing in an error handler -- perhaps scandir should as well.

--
~Ethan~
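For reference, this is the existing os.walk() hook: by default errors from listing a directory are ignored, but an `onerror` callable receives each OSError instance. A scandir() onerror argument would presumably mirror this signature; that is an assumption, not something settled in the thread. `walk_collecting_errors` is a hypothetical helper.

```python
import os


def walk_collecting_errors(top):
    """Walk `top` with os.walk(), collecting OSErrors instead of ignoring them."""
    errors = []
    paths = []
    # onerror is called with the OSError raised while listing a directory.
    for dirpath, dirnames, filenames in os.walk(top, onerror=errors.append):
        paths.extend(os.path.join(dirpath, name) for name in filenames)
    return paths, errors
```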
Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator
On 6/26/2014 6:59 PM, Ben Hoyt wrote:
> Rationale
> =========
>
> Python's built-in ``os.walk()`` is significantly slower than it needs to be, because -- in addition to calling ``os.listdir()`` on each directory -- it executes the system call ``os.stat()`` or ``GetFileAttributes()`` on each file to determine whether the entry is a directory or not.
>
> But the underlying system calls -- ``FindFirstFile`` / ``FindNextFile`` on Windows and ``readdir`` on Linux and OS X -- already tell you whether the files returned are directories or not, so no further system calls are needed. In short, you can reduce the number of system calls from approximately 2N to N, where N is the total number of files and directories in the tree. (And because directory trees are usually much wider than they are deep, it's often much better than this.)

One of the major reasons for this seems to be efficiently using information that is already available from the OS for free. Unfortunately it seems that the current API and most of the leading alternate proposals hide from the user what information is actually there for free and what is going to incur an extra cost.

I would prefer an API that simply gives whatever came for free from the OS and then lets the user decide if the extra expense is worth the extra information. Maybe that stat information was only going to be used for an informational log that can be skipped if it's going to incur extra expense?

Janzert
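To make the 2N-versus-N point concrete, here is a sketch comparing a listdir()-based traversal, which needs at least one extra stat-family call per entry to classify it, with one using the os.scandir() that shipped in Python 3.5 (with-statement form: 3.6+). There, DirEntry.is_dir(follow_symlinks=False) can usually answer from the directory listing itself (d_type on most POSIX filesystems, FindNextFile data on Windows) and falls back to a system call only when the type is unknown. The function names are illustrative; both should classify a tree identically, with symlinks counted as non-directories.

```python
import os


def classify_with_listdir(top):
    """Return (dirs, files) under `top`, using stat-family calls per entry."""
    dirs, files = [], []
    stack = [top]
    while stack:
        current = stack.pop()
        for name in os.listdir(current):
            full = os.path.join(current, name)
            # os.path.isdir() and os.path.islink() each cost a system call.
            if os.path.isdir(full) and not os.path.islink(full):
                dirs.append(full)
                stack.append(full)
            else:
                files.append(full)
    return sorted(dirs), sorted(files)


def classify_with_scandir(top):
    """Same result, but the entry type usually comes free with the listing."""
    dirs, files = [], []
    stack = [top]
    while stack:
        current = stack.pop()
        with os.scandir(current) as it:
            for entry in it:
                if entry.is_dir(follow_symlinks=False):  # typically no extra syscall
                    dirs.append(entry.path)
                    stack.append(entry.path)
                else:
                    files.append(entry.path)
    return sorted(dirs), sorted(files)
```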
Re: [Python-Dev] PEP 471: scandir(fd) and pathlib.Path(name, dir_fd=None)
Ben Hoyt benh...@gmail.com writes:
> Thanks, Victor.
>
> I don't have any experience with dir_fd handling, so unfortunately can't really comment here.
>
> What advantages does it bring? I notice that even os.listdir() on Python 3.4 doesn't have anything related to file descriptors, so I'd be in favour of not including support. We can always add it later.
>
> -Ben

FYI, os.listdir does support file descriptors in Python 3.3+. Try:

    >>> import os
    >>> os.listdir(os.open('.', os.O_RDONLY))

NOTE: os.supports_fd and os.supports_dir_fd are different sets.

See also https://mail.python.org/pipermail/python-dev/2014-June/135265.html

--
Akira

P.S. Please don't put your answer on top of the message you are replying to.
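Because os.supports_fd and os.supports_dir_fd are separate sets, portable code can test membership before passing a descriptor. A sketch (hypothetical helper `list_directory`) that lists by fd where the platform allows it and falls back to the path otherwise. Unlike the interactive one-liner above, it also closes the descriptor it opened.

```python
import os


def list_directory(path):
    """Return sorted entry names, listing via a directory fd when supported."""
    if os.listdir in os.supports_fd:
        fd = os.open(path, os.O_RDONLY)
        try:
            return sorted(os.listdir(fd))
        finally:
            os.close(fd)  # whoever opened the fd is responsible for closing it
    # e.g. Windows: os.listdir() does not accept a file descriptor there.
    return sorted(os.listdir(path))
```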
Re: [Python-Dev] My summary of the scandir (PEP 471)
On 1 July 2014 08:42, Ben Hoyt benh...@gmail.com wrote:
>> We need per-iteration error handling for the readdir call anyway, so I think an onerror callback is a better option than dropping the ability to easily obtain full stat information as part of the iteration.
>
> I don't mind the idea of an onerror callback, but it's adding complexity. Putting aside the question of caching/timing for a second and assuming .lstat() as per the current PEP 471, do we really need per-iteration error handling for readdir()? When would that actually fail in practice?

An NFS mount dropping the connection or a USB key being removed are the first that come to mind, but I expect there are others. I find it's generally better to just assume that any system call may fail for obscure reasons and put the infrastructure in place to deal with it, rather than getting ugly, hard-to-track-down bugs later.

Cheers,
Nick.

--
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
[Python-Dev] Network Security Backport Status
Hi all,

I wanted to bring everyone up to speed on the status of PEP 466, what's been completed, and what's left to do.

First the completed stuff:

* hmac.compare_digest
* hashlib.pbkdf2_hmac

These are both backported, and I've added support to use them in Django, so users should start seeing these benefits just as soon as we get a Python release into their hands.

Now the uncompleted stuff:

* Persistent file descriptor for ``os.urandom``
* SSL module

It's the SSL module that I'll spend the rest of this email talking about.

Backporting the features from the Python 3 version of this module has proven more difficult than I had expected. This is primarily because the stdlib took a maintenance strategy that was different from what most Python projects have done for their 2/3 support: multiple independent codebases.

I've tried a few different strategies for the backport, none of which has worked:

* Copying the ``ssl.py``, ``test_ssl.py``, and ``_ssl.c`` files from Python 3 and trying to port all the code.
* Copying just ``test_ssl.py`` and then copying individual chunks/functions as necessary to get stuff to pass.
* Manually doing stuff.

All of these proved to be a massive undertaking, and made it too easy to accidentally introduce breaking changes.

I've come up with a new approach, which I believe is most likely to be successful, but I'll need help to implement it.

The idea is to find the most recent commit which is a parent of both the ``2.7`` and ``default`` branches. Then take every single change to an ``ssl``-related file on the ``default`` branch, and attempt to replay it on the ``2.7`` branch. Require manual review on each commit to make sure it compiles, and to ensure it doesn't make any backwards-incompatible changes. I think this provides the most iterative and guided approach to getting this done.

I can do all the work of reviewing each commit, but I need some help from a Mercurial expert to automate the cherry-picking/rebasing of every single commit.

What do folks think? Does this approach make sense? Anyone willing to help with the Mercurial scripting?

Cheers,
Alex
Re: [Python-Dev] Network Security Backport Status
On 1 Jul 2014 11:28, Alex Gaynor alex.gay...@gmail.com wrote:
> I've come up with a new approach, which I believe is most likely to be successful, but I'll need help to implement it. The idea is to find the most recent commit which is a parent of both the ``2.7`` and ``default`` branches. Then take every single change to an ``ssl`` related file on the ``default`` branch, and attempt to replay it on the ``2.7`` branch. Require manual review on each commit to make sure it compiles, and to ensure it doesn't make any backwards incompatible changes. I think this provides the most iterative and guided approach to getting this done.

Sounds promising, although it may still have some challenges if the SSL code depends on earlier changes to other code.

> I can do all the work of reviewing each commit, but I need some help from a mercurial expert to automate the cherry-picking/rebasing of every single commit.
>
> What do folks think? Does this approach make sense? Anyone willing to help with the mercurial scripting?

For the Mercurial part, it's probably worth posing that as a Stack Overflow question:

Given two named branches in http://hg.python.org (default and 2.7) and 4 files (Python module, C module, tests, docs):

- find the common ancestor
- find all the commits affecting those files on default
- graft them to 2.7 (with a chance to test and edit each one first)

It's just a better environment for asking and answering that kind of question :)

Cheers,
Nick.
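The three steps Nick lists map onto standard Mercurial revsets and `hg graft`. A minimal sketch under stated assumptions: the four file paths are guesses from the CPython layout of the time, the helper names are hypothetical, and the script must run from the repository root after `hg update 2.7`. The revset `::default - ::"2.7"` selects everything reachable from default but not from 2.7, which implicitly starts after the common ancestor (`ancestor(default, "2.7")`). Merges are excluded because they cannot be grafted.

```python
import subprocess

# Assumed paths for the module, C extension, tests and docs.
SSL_FILES = [
    "Lib/ssl.py",
    "Modules/_ssl.c",
    "Lib/test/test_ssl.py",
    "Doc/library/ssl.rst",
]


def ssl_commits_revset(files=SSL_FILES, source="default", target="2.7"):
    """Revset for non-merge changesets on `source` but not on `target`
    that touch any of `files`, sorted oldest first by revision number."""
    touched = " or ".join('file("%s")' % f for f in files)
    return 'sort((::%s - ::"%s") and not merge() and (%s))' % (source, target, touched)


def graft_commands(revisions):
    """One `hg graft --edit` per changeset, so the operator can run the
    test suite between grafts; --edit opens the commit message for editing."""
    return [["hg", "graft", "--edit", "-r", rev] for rev in revisions]


def find_revisions(revset):
    """List matching changeset hashes (requires hg and a CPython checkout)."""
    out = subprocess.run(
        ["hg", "log", "-r", revset, "--template", "{node}\n"],
        check=True, capture_output=True, text=True,
    ).stdout
    return out.split()
```

Grafting one changeset per command keeps each step reviewable. It does not remove the manual porting work Antoine and Guido describe in the following messages.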
Re: [Python-Dev] Network Security Backport Status
On 01/07/2014 14:26, Alex Gaynor wrote:
> I can do all the work of reviewing each commit, but I need some help from a mercurial expert to automate the cherry-picking/rebasing of every single commit.
>
> What do folks think? Does this approach make sense? Anyone willing to help with the mercurial scripting?

I don't think this makes much sense; Mercurial won't be smarter than you are.

I think you'd have a better chance of succeeding by backporting one feature at a time. IMO, you'd first want to backport the _SSLContext base class and SSLContext.wrap_socket(). The latter *will* require some manual coding to adapt to 2.7's different SSLSocket implementation, not just applying patch hunks around.

Regards

Antoine.
Re: [Python-Dev] Network Security Backport Status
I have to agree with Antoine -- I don't think there's a shortcut that avoids *someone* actually having to understand the code to the point of being able to recreate the same behavior in the different context (pun not intended) of Python 2. --Guido van Rossum (python.org/~guido)
Re: [Python-Dev] My summary of the scandir (PEP 471)
On 1 July 2014 14:00, Ben Hoyt benh...@gmail.com wrote: 2) Nick Coghlan's proposal on the previous thread (https://mail.python.org/pipermail/python-dev/2014-June/135261.html) suggesting an ensure_lstat keyword param to scandir if you need the lstat_result value I would make one small tweak to Nick Coghlan's proposal to make writing cross-platform code easier. Instead of .lstat_result being None sometimes (on POSIX), have it None always unless you specify ensure_lstat=True. (Actually, call it get_lstat=True to kind of make this more obvious.) Per (b) above, this means Windows developers wouldn't accidentally write code which failed on POSIX systems -- it'd fail fast on Windows too if you accessed .lstat_result without specifying get_lstat=True. This is getting very complicated (at least to me, as a Windows user, where the basic idea seems straightforward). It seems to me that the right model is the standard thin wrapper round the OS feature that acts as a building block - it's typical of the rest of the os module. I think that thin wrapper is needed - even if the various bells and whistles are useful, they can be built on top of a low-level version (whereas the converse is not the case). Typically, such thin wrappers expose POSIX semantics by default, and Windows behaviour follows as closely as possible (see for example stat, where st_ino makes no sense on Windows, but is present). In this case, we're exposing Windows semantics, and POSIX is the one needing to fit the model, but the principle is the same. On that basis, optional attributes (as used in stat results) seem entirely sensible. 
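The stat precedent for optional attributes is easy to see in practice: `os.stat_result` only has `st_file_attributes` on Windows, and portable code probes for the attribute rather than branching on the platform. A small sketch (the helper name is illustrative):

```python
import os


def file_attributes(path):
    """Return the Windows FILE_ATTRIBUTE_* bits for path, or None elsewhere."""
    st = os.lstat(path)
    # st_file_attributes is only defined on Windows; on POSIX the attribute
    # is simply absent, so getattr() with a default keeps the code portable.
    return getattr(st, "st_file_attributes", None)
```

The same `getattr()`/`hasattr()` idiom would apply unchanged to a DirEntry whose stat fields are only present on some platforms.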
The documentation for DirEntry could easily be written to parallel that of a stat result:

The return value is an object whose attributes correspond to the data the OS returns about a directory entry:

* name - the object's name
* full_name - the object's full name (including path)
* is_dir - whether the object is a directory
* is_file - whether the object is a plain file
* is_symlink - whether the object is a symbolic link

On Windows, the following attributes are also available:

* st_size - the size, in bytes, of the object (only meaningful for files)
* st_atime - time of last access
* st_mtime - time of last write
* st_ctime - time of creation
* st_file_attributes - Windows file attribute bits (see the FILE_ATTRIBUTE_* constants in the stat module)

That's no harder to understand (or to work with) than the equivalent stat result. The only difference is that, on POSIX, the unavailable attributes can still be queried; there's just a separate system call involved (with implications in terms of performance, error handling and potential race conditions).

The version of scandir with the ensure_lstat argument is easy to write based on one with optional attributes (I'm playing fast and loose with adding attributes to DirEntry values here, just for the sake of an example - the details are left as an exercise):

    import os

    def scandir_ensure(path='.', ensure_lstat=False):
        for entry in os.scandir(path):
            # Only entries without stat data (i.e. on POSIX) pay for an extra lstat()
            if ensure_lstat and not hasattr(entry, 'st_size'):
                stat_data = os.lstat(entry.full_name)
                entry.st_size = stat_data.st_size
                entry.st_atime = stat_data.st_atime
                entry.st_mtime = stat_data.st_mtime
                entry.st_ctime = stat_data.st_ctime
                # Ignore file_attributes, as we'll never get here on Windows
            yield entry

Variations on how you handle errors in the lstat call, etc., can be added to taste.

Please, let's stick to a low-level wrapper round the OS API for the first iteration of this feature. Enhancements can be added later, when real-world usage has proved their value.
Paul
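Paul's wrapper targets the API as proposed in this thread (`full_name`, settable attributes). A runnable variant against `os.scandir()` as it eventually shipped in Python 3.5 has to copy the data out, because the real `os.DirEntry` exposes `path` instead of `full_name` and does not accept new attributes. The sketch below uses `types.SimpleNamespace` for the copies and shows one of the error-handling variations mentioned above: skipping entries that vanish between the directory read and the `lstat()`. It illustrates the optional-attribute model only; it is not the API that was adopted.

```python
import os
from types import SimpleNamespace


def scandir_ensure(path=".", ensure_lstat=False):
    """Yield plain records; st_* fields exist only when free or requested."""
    with os.scandir(path) as it:
        for entry in it:
            record = SimpleNamespace(
                name=entry.name,
                full_name=entry.path,  # the shipped DirEntry calls this .path
                is_dir=entry.is_dir(follow_symlinks=False),
                is_file=entry.is_file(follow_symlinks=False),
                is_symlink=entry.is_symlink(),
            )
            # On Windows the directory listing already carries stat data, so
            # it is free; on POSIX, entry.stat(follow_symlinks=False) is an lstat().
            if ensure_lstat or os.name == "nt":
                try:
                    st = entry.stat(follow_symlinks=False)
                except FileNotFoundError:
                    continue  # removed after the directory read: skip it (one policy)
                for field in ("st_size", "st_atime", "st_mtime", "st_ctime"):
                    setattr(record, field, getattr(st, field))
            yield record
```

Code consuming these records can use the same `hasattr()` check Victor suggested earlier in the thread to decide whether it still needs its own `os.lstat()` call.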
Re: [Python-Dev] My summary of the scandir (PEP 471)
On 7/1/2014 2:20 PM, Paul Moore wrote: Please, let's stick to a low-level wrapper round the OS API for the first iteration of this feature. Enhancements can be added later, when real-world usage has proved their value. I almost wrote this whole message this morning, but didn't have time. Thanks, Paul, for digging through the details. +1
[Python-Dev] [RELEASE] Python 2.7.8
Greetings, I have the distinct privilege of informing you that the latest release of the Python 2.7 series, 2.7.8, has been released and is available for download. 2.7.8 contains several important regression fixes and security changes:
- The OpenSSL version bundled in the Windows installer has been updated.
- A regression in the mimetypes module on Windows has been fixed. [1]
- A possible overflow in the buffer type has been fixed. [2]
- A bug in the CGIHTTPServer module which allows arbitrary execution of code in the server root has been patched. [3]
- A regression in the handling of UNC paths in os.path.join has been fixed. [4]
Downloads of 2.7.8 are at https://www.python.org/download/releases/2.7.8/ The full changelog is located at http://hg.python.org/cpython/raw-file/v2.7.8/Misc/NEWS This is a production release. As always, please report bugs to http://bugs.python.org/ Till next time, Benjamin Peterson 2.7 Release Manager (on behalf of all of Python's contributors)
[1] http://bugs.python.org/issue21652
[2] http://bugs.python.org/issue21831
[3] http://bugs.python.org/issue21766
[4] http://bugs.python.org/issue21672