[Python-Dev] PEP 471: scandir(fd) and pathlib.Path(name, dir_fd=None)
Hi,

IMO we must decide whether or not scandir() must support file descriptors. It's an important decision which has an important impact on the API.

To support scandir(fd), the minimum is to store dir_fd in DirEntry: dir_fd would be None for scandir(str).

scandir(fd) must not close the file descriptor; that should be done by the caller. Handling the lifetime of the file descriptor is a difficult problem, so it's better to let the user decide how to handle it.

There is the problem of the limit on open file descriptors, usually 1024 but it can be lower. It *can* be an issue for very deep file hierarchies.

If we choose to support scandir(fd), it's probably safer not to use scandir(fd) by default in os.walk() (use scandir(str) instead) and to wait until the feature is well tested, corner cases are well known, etc.

The second step is to enhance pathlib.Path to support an optional file descriptor. Path already has methods on filenames like chmod(), exists(), rename(), etc.

Example:

    fd = os.open(path, os.O_DIRECTORY)
    try:
        for entry in os.scandir(fd):
            # ... use entry to benefit from the entry cache: is_dir(), lstat_result ...
            path = pathlib.Path(entry.name, dir_fd=entry.dir_fd)
            # ... use path which uses dir_fd ...
    finally:
        os.close(fd)

Problem: if the path object is stored somewhere and used after the loop, Path methods will fail because dir_fd was closed. It's even worse if a new directory uses the same file descriptor :-/ (security issue, or at least tricky bugs!)

Victor
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
[Python-Dev] My summary of the scandir (PEP 471)
Hi,

@Ben: it's time to update your PEP to complete it with this discussion! IMO DirEntry must be as simple as possible and portable:

- os.scandir(str)
- DirEntry.lstat_result object only available on Windows, same result as os.lstat()
- DirEntry.fullname(): os.path.join(directory, DirEntry.name), where directory would be a hidden attribute of DirEntry

Notes:

- DirEntry.lstat_result is better than DirEntry.lstat() because it makes explicit that lstat_result is only computed once. When I call DirEntry.lstat(), I expect to get the current status of the file, not the cached one. It's also hard to explain (document) that DirEntry.lstat() may or may not make a system call. Don't do that, use DirEntry.lstat_result.

- I don't think that we should support scandir(bytes). If you really want to support os.scandir(bytes), it must raise an error on Windows since bytes filenames are already deprecated. It wouldn't make sense to add a new function with a deprecated feature. Since we have PEP 383 (surrogateescape), it's better to advise using Unicode on all platforms. Almost all Python functions are able to encode Unicode filenames back automatically. Use os.fsencode() to encode manually if needed.

- We could leave out the DirEntry.fullname() method, since the directory name is usually well known. OK, but every time that I use os.listdir(), I write os.path.join(directory, name) because in some cases I want the full path. Example:

      interesting = []
      for name in os.listdir(path):
          fullpath = os.path.join(path, name)
          if os.path.isdir(fullpath):
              continue
          if ... test on the file ...:
              # I need the full path here, not the relative path
              # (ex: my own recursive "scandir"/"walk" function)
              interesting.append(fullpath)

- It must not be possible to "refresh" a DirEntry object. Call os.stat(entry.fullname()) or pathlib.Path(entry.fullname()) to get fresh data. DirEntry is only computed once, that's all. It's well defined.
- No Windows wildcard: you wrote that the feature has many corner cases, and it's only available on Windows. It's easy to combine scandir with fnmatch.

Victor
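As a rough illustration of the last point, here is a minimal sketch of combining scandir with fnmatch. It assumes the os.scandir() that later shipped in Python 3.5 (and its context-manager form from 3.6), not any of the API variants debated in this thread; `scan_matching` is a hypothetical helper name.

```python
import fnmatch
import os


def scan_matching(directory, pattern):
    """Return the sorted names in `directory` that match a shell-style pattern."""
    # Assumption: os.scandir() as released in Python 3.5+; the `with` form needs 3.6+.
    with os.scandir(directory) as entries:
        return sorted(
            entry.name for entry in entries if fnmatch.fnmatch(entry.name, pattern)
        )
```

The filtering happens in Python on the names that scandir already returns, so no Windows-specific wildcard support is needed for this use case.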
Re: [Python-Dev] PEP 471: scandir(fd) and pathlib.Path(name, dir_fd=None)
Thanks, Victor.

I don't have any experience with dir_fd handling, so unfortunately can't really comment here.

What advantages does it bring? I notice that even os.listdir() on Python 3.4 doesn't have anything related to file descriptors, so I'd be in favour of not including support. We can always add it later.

-Ben

On Tue, Jul 1, 2014 at 3:44 AM, Victor Stinner wrote:
> Hi,
>
> IMO we must decide if scandir() must support or not file descriptor.
> It's an important decision which has an important impact on the API.
> [...]
Re: [Python-Dev] PEP 471: scandir(fd) and pathlib.Path(name, dir_fd=None)
2014-07-01 14:26 GMT+02:00 Ben Hoyt:
> Thanks, Victor.
>
> I don't have any experience with dir_fd handling, so unfortunately
> can't really comment here.
>
> What advantages does it bring? I notice that even os.listdir() on
> Python 3.4 doesn't have anything related to file descriptors, so I'd
> be in favour of not including support.

See https://docs.python.org/dev/library/os.html#dir-fd

The idea is to make sure that you get files from the same directory. Problems occur when a directory is moved or a symlink is modified. Example:

- you're browsing /tmp/copy as root (!); /tmp/copy/passwd is owned by the www user (website)
- you would like to remove the file "passwd": you call unlink("/tmp/copy/passwd")
- ... but just before that, an attacker replaces the /tmp/copy directory with a symlink to /etc
- you will remove /etc/passwd instead of /tmp/copy/passwd, oh oh

Using unlink("passwd", dir_fd=tmp_copy_fd), you don't have this issue. You are sure that you are working in the /tmp/copy directory.

You can imagine a lot of other scenarios to overwrite files and read sensitive files.

Fortunately, the Linux rm command knows the unlinkat() syscall ;-)

    haypo@selma$ mkdir -p a/b/c
    haypo@selma$ strace -e unlinkat rm -rf a
    unlinkat(5, "c", AT_REMOVEDIR) = 0
    unlinkat(4, "b", AT_REMOVEDIR) = 0
    unlinkat(AT_FDCWD, "a", AT_REMOVEDIR) = 0
    +++ exited with 0 +++

We should implement a similar thing in shutil.rmtree().

See also os.fwalk() which is a version of os.walk() providing dir_fd.

> We can always add it later.

I would prefer to discuss that right now.

My proposition is to accept an int for scandir() and copy the int into DirEntry.dir_fd. It's not that complex :-)

The enhancement of the pathlib module can be done later. By the way, I know that Antoine Pitrou wanted to implement file descriptors in pathlib, but the feature was rejected or at least delayed.

Victor
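The dir_fd argument described above can be sketched with calls that already exist in the os module. This is only an illustration under stated assumptions: a POSIX system where os.unlink supports dir_fd and os.O_DIRECTORY is defined, and a hypothetical helper name `demo_unlink_with_dir_fd`; the "attack" is simulated inside a temporary directory.

```python
import os
import tempfile


def demo_unlink_with_dir_fd():
    """Show that unlink(name, dir_fd=fd) keeps acting on the directory that was opened."""
    base = tempfile.mkdtemp()
    copy = os.path.join(base, "copy")    # directory we intend to clean up
    other = os.path.join(base, "other")  # stands in for a sensitive directory
    for directory in (copy, other):
        os.mkdir(directory)
        with open(os.path.join(directory, "passwd"), "w") as handle:
            handle.write("data")

    dir_fd = os.open(copy, os.O_RDONLY | os.O_DIRECTORY)
    try:
        # Simulated attack: the path "copy" is swapped for a symlink to "other".
        moved = os.path.join(base, "moved")
        os.rename(copy, moved)
        os.symlink(other, copy)
        # The descriptor still refers to the original directory (now at "moved"),
        # so the file inside "other" is left alone.
        os.unlink("passwd", dir_fd=dir_fd)
    finally:
        os.close(dir_fd)

    return (
        os.path.exists(os.path.join(other, "passwd")),
        os.path.exists(os.path.join(moved, "passwd")),
    )
```

A path-based os.unlink(os.path.join(copy, "passwd")) at the same point would have followed the symlink and removed the file in "other" instead.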
Re: [Python-Dev] My summary of the scandir (PEP 471)
Thanks for spinning this off to (hopefully) finish the discussion. I agree it's nearly time to update the PEP.

> @Ben: it's time to update your PEP to complete it with this
> discussion! IMO DirEntry must be as simple as possible and portable:
>
> - os.scandir(str)
> - DirEntry.lstat_result object only available on Windows, same result
> than os.lstat()
> - DirEntry.fullname(): os.path.join(directory, DirEntry.name), where
> directory would be an hidden attribute of DirEntry

I'm quite strongly against this, and I think it's actually the worst of both worlds. It is not as good an API because:

(a) it doesn't call stat for you (on POSIX), so you have to check an attribute and call stat manually if you need it, turning what should be one line of code into four. Your proposal above was kind of how I had it originally, where you had to do extra tests and call stat manually if you needed it (see https://mail.python.org/pipermail/python-dev/2013-May/126119.html)

(b) the .lstat_result attribute is available on Windows but not on POSIX, meaning it's very easy for Windows developers to write code that will run and work fine on Windows, but then break horribly on POSIX; I think it'd be better if it broke hard on Windows to make writing cross-platform code easy

The two alternatives are:

1) the original proposal in the current version of PEP 471, where DirEntry has an .lstat() method which calls stat() on POSIX but is free on Windows

2) Nick Coghlan's proposal on the previous thread (https://mail.python.org/pipermail/python-dev/2014-June/135261.html) suggesting an ensure_lstat keyword param to scandir if you need the lstat_result value

I would make one small tweak to Nick Coghlan's proposal to make writing cross-platform code easier. Instead of .lstat_result being None sometimes (on POSIX), have it None always unless you specify ensure_lstat=True. (Actually, call it get_lstat=True to kind of make this more obvious.) Per (b) above, this means Windows developers wouldn't accidentally write code which failed on POSIX systems -- it'd fail fast on Windows too if you accessed .lstat_result without specifying get_lstat=True.

I'm still unsure which of these I like better. I think #1's API is slightly nicer without the ensure_lstat parameter, and error handling of the stat() is more explicit. But #2 always fetches the stat info at the same time as the dir entry info, so eliminates the problem of having the file info change between scandir iteration and the .lstat() call.

I'm leaning towards preferring #2 (Nick's proposal) because it solves or gets around the caching issue. My one concern is error handling. Is it an issue if scandir's __next__ can raise an OSError either from the readdir() call or the call to stat()? My thinking is probably not. In practice, would it ever really happen that readdir() would succeed but an os.stat() immediately after would fail? I guess it could if the file is deleted, but then if it were deleted a microsecond earlier the readdir() would fail anyway, or not? Or does readdir give you a consistent, "snap-shotted" view on things?

The one other thing I'm not quite sure about with Nick's proposal is the name .lstat_result, as it's long. I can see why he suggested that, as .lstat sounds like a verb, but maybe that's okay? If we can have .is_dir and .is_file as attributes, my thinking is an .lstat attribute is fine too. I don't feel too strongly though.

> - I don't think that we should support scandir(bytes). If you really
> want to support os.scandir(bytes), it must raise an error on Windows
> since bytes filename are already deprecated. It wouldn't make sense to
> add new function with a deprecated feature. Since we have the PEP 383
> (surrogateescape), it's better to advice to use Unicode on all
> platforms. Almost all Python functions are able to encode back Unicode
> filename automatically. Use os.fsencode() to encode manually if needd.

Really, are bytes filenames deprecated? I think maybe they should be, as they don't work on Windows :-), but the latest Python "os" docs (https://docs.python.org/3.5/library/os.html) still say that all functions that accept path names accept either str or bytes, and return a value of the same type where necessary. So I think scandir() should do the same thing.

> - We may not define a DirEntry.fullname() method: the directory name
> is usually well known. Ok, but every time that I use os.listdir(), I
> write os.path.join(directory, name) because in some cases I want the
> full path.

Agreed. I use this a lot too. However, I'd prefer a .fullname attribute rather than a method, as it's free/cheap to compute and doesn't require OS calls.

Out of interest, why do we have .is_dir and .stat_result but .fullname rather than .full_name? .fullname seems reasonable to me, but maybe consistency is a good thing here?

> - It must not be possible to "refresh" a DirEntry object. Call
> os.stat(entry.fullname()) or pathlib.Path(entry.fullname()) to get
> fresh data.
Re: [Python-Dev] My summary of the scandir (PEP 471)
2014-07-01 15:00 GMT+02:00 Ben Hoyt:
> (a) it doesn't call stat for you (on POSIX), so you have to check an
> attribute and call stat manually if you need it,

Yes, and that's something common when you use the os module. For example, don't try to call os.fork(), os.getgid() or os.fchmod() on Windows :-) Closer to your PEP, the following stat result attributes are only available on UNIX: st_blocks, st_blksize, st_rdev, st_flags; and st_file_attributes is only available on Windows.

I don't think that using lstat_result is a common need when browsing a directory. In most cases, you only need is_dir() and the name attribute.

> 1) the original proposal in the current version of PEP 471, where
> DirEntry has an .lstat() method which calls stat() on POSIX but is
> free on Windows

On UNIX, does it mean that .lstat() calls os.lstat() at the first call, and then always returns the same result? It would be different from os.lstat() and pathlib.Path.stat() :-( I would prefer to have the same behaviour as pathlib and os (you know, the well known consistency of the Python stdlib). As I wrote, I expect a function call to always retrieve the new status.

> 2) Nick Coghlan's proposal on the previous thread
> (https://mail.python.org/pipermail/python-dev/2014-June/135261.html)
> suggesting an ensure_lstat keyword param to scandir if you need the
> lstat_result value

I don't like this idea because it makes error handling more complex. The syntax to catch exceptions on an iterator is verbose (while: try: next() except ...).

Whereas calling os.lstat(entry.fullname()) is explicit and it's easy to surround it with try/except.

> .lstat_result being None sometimes (on POSIX),

Don't do that, it's not how Python handles portability. We use hasattr().

> would it ever really happen that readdir() would succeed but an os.stat()
> immediately after would fail?

Yes, it can happen. The filesystem is system-wide and shared by all users. The file can be deleted.

> Really, are bytes filenames deprecated?

Yes, in all functions of the os module since Python 3.3. I'm sure because I implemented the deprecation :-)

Try open(b'test.txt', 'w') on Windows with python -Werror.

> I think maybe they should be, as they don't work on Windows :-)

Windows has an API dedicated to bytes filenames, the ANSI API. But this API has annoying bugs: it replaces unencodable characters with question marks, and there is no option to be notified of the encoding error.

Several users complained about that. It was decided not to change Python since Python is a light wrapper over the kernel system calls. But bytes filenames are now deprecated to advise users to use the native type for filenames on Windows: Unicode!

> but the latest Python "os" docs
> (https://docs.python.org/3.5/library/os.html) still say that all
> functions that accept path names accept either str or bytes,

Maybe I forgot to update the documentation :-(

> So I think scandir() should do the same thing.

You may support scandir(bytes) on Windows but you will need to emit a deprecation warning too. (Deprecation warnings are silent by default.)

Victor
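The "explicit os.lstat() in try/except" style mentioned above might look like the following minimal sketch. It uses only os.listdir() and os.lstat(), which exist today; `lstat_entries` is a hypothetical helper name, and collecting failures in a dict is just one possible policy.

```python
import os


def lstat_entries(directory):
    """Return ({name: lstat_result}, {name: OSError}) for the entries of `directory`."""
    results, errors = {}, {}
    for name in os.listdir(directory):
        fullname = os.path.join(directory, name)
        try:
            # The system call happens right here, so its failure is easy to
            # catch and to attribute to one specific entry.
            results[name] = os.lstat(fullname)
        except OSError as exc:
            errors[name] = exc
    return results, errors
```

Because lstat() does not follow symlinks, a dangling symlink ends up in `results`, not in `errors`.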
[Python-Dev] Excess help() output
Hi,

The help() output is confusing for beginners:

>>> class B(object):
...     pass
...
>>> help(B)
Help on class B in module __main__:

class B(__builtin__.object)
 |  Data descriptors defined here:
 |
 |  __dict__
 |      dictionary for instance variables (if defined)
 |
 |  __weakref__
 |      list of weak references to the object (if defined)

Is it possible to remove this section from help output? Why is it here at all?

>>> dir(B)
['__class__', '__delattr__', '__dict__', '__doc__', '__format__', '__getattribute__', '__hash__', '__init__', '__module__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__']

--
anatoly t.
Re: [Python-Dev] My summary of the scandir (PEP 471)
On 01.07.2014 15:00, Ben Hoyt wrote:
> I'm leaning towards preferring #2 (Nick's proposal) because it solves
> or gets around the caching issue. My one concern is error handling. Is
> it an issue if scandir's __next__ can raise an OSError either from the
> readdir() call or the call to stat()? My thinking is probably not. In
> practice, would it ever really happen that readdir() would succeed but
> an os.stat() immediately after would fail? I guess it could if the
> file is deleted, but then if it were deleted a microsecond earlier the
> readdir() would fail anyway, or not? Or does readdir give you a
> consistent, "snap-shotted" view on things?

No need for a microsecond-timed deletion -- a directory with +r but without +x will allow you to list the entries, but stat calls on the files will fail with EACCES:

$ ls -l
drwxr--r--. 2 root root 60  1. Jul 16:52 test
$ sudo ls -l test
total 0
-rw-r--r--. 1 root root 0  1. Jul 16:52 foo
$ ls test
ls: cannot access test/foo: Permission denied
total 0
-? ? ? ? ? ? foo
$ stat test/foo
stat: cannot stat ‘test/foo’: Permission denied

I had the idea to treat a failing lstat() inside scandir() as if the entry wasn’t found at all, but in this context, this seems wrong too.

regards,
jwi
Re: [Python-Dev] My summary of the scandir (PEP 471)
> No need for a microsecond-timed deletion -- a directory with +r but
> without +x will allow you to list the entries, but stat calls on the
> files will fail with EACCES:

Ah -- very good to know, thanks. This definitely points me in the direction of wanting better control over error handling.

Speaking of errors, and thinking of handling errors during iteration -- in what cases (if any) would an individual readdir fail if the opendir succeeded?

-Ben
Re: [Python-Dev] My summary of the scandir (PEP 471)
On 1 Jul 2014 07:31, "Victor Stinner" wrote:
>
> 2014-07-01 15:00 GMT+02:00 Ben Hoyt:
> > 2) Nick Coghlan's proposal on the previous thread
> > (https://mail.python.org/pipermail/python-dev/2014-June/135261.html)
> > suggesting an ensure_lstat keyword param to scandir if you need the
> > lstat_result value
>
> I don't like this idea because it makes error handling more complex.
> The syntax to catch exceptions on an iterator is verbose (while: try:
> next() except ...).

Actually, we may need to copy the os.walk API and accept an "onerror" callback as a scandir argument. Regardless of whether or not we have "ensure_lstat", the iteration step could fail, so I don't believe we can just transfer the existing approach of catching exceptions from the listdir call.

> Whereas calling os.lstat(entry.fullname()) is explicit and it's easy
> to surround it with try/except.
>
> > .lstat_result being None sometimes (on POSIX),
>
> Don't do that, it's not how Python handles portability. We use hasattr().

That's not true in general - we do either, depending on context.

With the addition of an os.walk style onerror callback, I'm still in favour of a "get_lstat" flag (tweaked as Ben suggests to always be None unless requested, so Windows code is less likely to be inadvertently non-portable).

> > would it ever really happen that readdir() would succeed but an
> > os.stat() immediately after would fail?
>
> Yes, it can happen. The filesystem is system-wide and shared by all
> users. The file can be deleted.

We need per-iteration error handling for the readdir call anyway, so I think an onerror callback is a better option than dropping the ability to easily obtain full stat information as part of the iteration.

Cheers,
Nick.
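To make the onerror idea concrete, here is a minimal sketch of an os.walk-style callback wrapped around directory iteration. It is not the API proposed in the PEP: `scandir_with_lstat` and its `onerror` parameter are hypothetical names, and the sketch is built on the os.scandir() that later shipped in Python 3.5 (with the context-manager form from 3.6).

```python
import os


def scandir_with_lstat(path, onerror=None):
    """Yield (name, lstat_result) pairs, reporting any OSError through `onerror`."""
    try:
        entries = os.scandir(path)  # the opendir() step can fail
    except OSError as exc:
        if onerror is not None:
            onerror(exc)
        return
    with entries:
        while True:
            try:
                entry = next(entries)  # the readdir() step can fail too
            except StopIteration:
                return
            except OSError as exc:
                if onerror is not None:
                    onerror(exc)
                return  # assumption: the directory stream is unusable after an error
            try:
                info = entry.stat(follow_symlinks=False)  # lstat() equivalent
            except OSError as exc:
                if onerror is not None:
                    onerror(exc)
                continue  # skip this entry but keep iterating
            yield entry.name, info
```

The explicit while/next() loop is the verbose pattern Victor refers to; hiding it inside the iterator is what lets callers keep a plain for loop.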
Re: [Python-Dev] My summary of the scandir (PEP 471)
> We need per-iteration error handling for the readdir call anyway, so I think
> an onerror callback is a better option than dropping the ability to easily
> obtain full stat information as part of the iteration.

I don't mind the idea of an "onerror" callback, but it's adding complexity. Putting aside the question of caching/timing for a second and assuming .lstat() as per the current PEP 471, do we really need per-iteration error handling for readdir()? When would that actually fail in practice?

-Ben
Re: [Python-Dev] My summary of the scandir (PEP 471)
On 07/01/2014 07:59 AM, Jonas Wielicki wrote:
> I had the idea to treat a failing lstat() inside scandir() as if the
> entry wasn’t found at all, but in this context, this seems wrong too.

Well, os.walk supports passing in an error handler -- perhaps scandir should as well.

--
~Ethan~
Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator
On 6/26/2014 6:59 PM, Ben Hoyt wrote:
> Rationale
> =========
>
> Python's built-in ``os.walk()`` is significantly slower than it needs
> to be, because -- in addition to calling ``os.listdir()`` on each
> directory -- it executes the system call ``os.stat()`` or
> ``GetFileAttributes()`` on each file to determine whether the entry
> is a directory or not.
>
> But the underlying system calls -- ``FindFirstFile`` /
> ``FindNextFile`` on Windows and ``readdir`` on Linux and OS X --
> already tell you whether the files returned are directories or not,
> so no further system calls are needed. In short, you can reduce the
> number of system calls from approximately 2N to N, where N is the
> total number of files and directories in the tree. (And because
> directory trees are usually much wider than they are deep, it's often
> much better than this.)

One of the major reasons for this seems to be efficiently using information that is already available from the OS "for free". Unfortunately it seems that the current API and most of the leading alternate proposals hide from the user what information is actually there "free" and what is going to incur an extra cost.

I would prefer an API that simply gives whatever came for free from the OS and then lets the user decide if the extra expense is worth the extra information. Maybe that stat information was only going to be used for an informational log that can be skipped if it's going to incur extra expense?

Janzert
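For readers unfamiliar with the "free" information under discussion, the sketch below classifies directory entries using only what the directory listing itself usually provides. It assumes the os.scandir() that later shipped in Python 3.5 (context-manager form from 3.6); `count_entries` is a hypothetical helper name.

```python
import os


def count_entries(directory):
    """Count subdirectories and other entries without a separate stat() per name."""
    dirs = others = 0
    with os.scandir(directory) as entries:
        for entry in entries:
            # is_dir() normally relies on the type information that came back
            # with the entry (d_type from readdir(), or the FindNextFile data);
            # a stat call is only needed when the OS did not supply it.
            if entry.is_dir(follow_symlinks=False):
                dirs += 1
            else:
                others += 1
    return dirs, others
```

The os.listdir() equivalent would need an os.path.isdir() call, and therefore a stat, for every name, which is the 2N versus N difference the rationale describes.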
Re: [Python-Dev] PEP 471: scandir(fd) and pathlib.Path(name, dir_fd=None)
Ben Hoyt writes:

> Thanks, Victor.
>
> I don't have any experience with dir_fd handling, so unfortunately
> can't really comment here.
>
> What advantages does it bring? I notice that even os.listdir() on
> Python 3.4 doesn't have anything related to file descriptors, so I'd
> be in favour of not including support. We can always add it later.
>
> -Ben

FYI, os.listdir does support file descriptors in Python 3.3+. Try:

    >>> import os
    >>> os.listdir(os.open('.', os.O_RDONLY))

NOTE: os.supports_fd and os.supports_dir_fd are different sets.

See also https://mail.python.org/pipermail/python-dev/2014-June/135265.html

--
Akira

P.S. Please, don't put your answer on top of the message you are replying to.
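The distinction between the two sets can be checked at run time. The following minimal sketch uses only documented os attributes (os.supports_fd, os.supports_dir_fd); the helper names `listdir_via_fd` and `fd_capabilities` are hypothetical.

```python
import os


def listdir_via_fd(path):
    """List `path` through a directory file descriptor when the platform allows it."""
    if os.listdir not in os.supports_fd:
        return sorted(os.listdir(path))  # fall back to the path-based call
    fd = os.open(path, os.O_RDONLY)
    try:
        return sorted(os.listdir(fd))
    finally:
        os.close(fd)  # the caller owns the descriptor, listdir() does not close it


def fd_capabilities():
    """Report which of the two fd-related features this platform offers."""
    return {
        # Functions in supports_fd accept an open descriptor *instead of* a path.
        "listdir_accepts_fd": os.listdir in os.supports_fd,
        # Functions in supports_dir_fd accept a dir_fd= argument *alongside* a path.
        "stat_accepts_dir_fd": os.stat in os.supports_dir_fd,
    }
```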
Re: [Python-Dev] My summary of the scandir (PEP 471)
On 1 July 2014 08:42, Ben Hoyt wrote:
>> We need per-iteration error handling for the readdir call anyway, so I think
>> an onerror callback is a better option than dropping the ability to easily
>> obtain full stat information as part of the iteration.
>
> I don't mind the idea of an "onerror" callback, but it's adding
> complexity. Putting aside the question of caching/timing for a second
> and assuming .lstat() as per the current PEP 471, do we really need
> per-iteration error handling for readdir()? When would that actually
> fail in practice?

An NFS mount dropping the connection or a USB key being removed are the first that come to mind, but I expect there are others. I find it's generally better to just assume that any system call may fail for obscure reasons and put the infrastructure in place to deal with it rather than getting ugly, hard to track down bugs later.

Cheers,
Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
[Python-Dev] Network Security Backport Status
Hi all,

I wanted to bring everyone up to speed on the status of PEP 466, what's been completed, and what's left to do.

First the completed stuff:

* hmac.compare_digest
* hashlib.pbkdf2_hmac

Both are backported, and I've added support to use them in Django, so users should start seeing these benefits just as soon as we get a Python release into their hands.

Now the uncompleted stuff:

* Persistent file descriptor for ``os.urandom``
* SSL module

It's the SSL module that I'll spend the rest of this email talking about.

Backporting the features from the Python3 version of this module has proven more difficult than I had expected. This is primarily because the stdlib took a maintenance strategy that was different from what most Python projects have done for their 2/3 support: multiple independent codebases.

I've tried a few different strategies for the backport, none of which has worked:

* Copying the ``ssl.py``, ``test_ssl.py``, and ``_ssl.c`` files from Python3 and trying to port all the code.
* Copying just ``test_ssl.py`` and then copying individual chunks/functions as necessary to get stuff to pass.
* Manually doing stuff.

All of these proved to be a massive undertaking, and made it too easy to accidentally introduce breaking changes.

I've come up with a new approach, which I believe is most likely to be successful, but I'll need help to implement it.

The idea is to find the most recent commit which is a parent of both the ``2.7`` and ``default`` branches. Then take every single change to an ``ssl`` related file on the ``default`` branch, and attempt to replay it on the ``2.7`` branch. Require manual review on each commit to make sure it compiles, and to ensure it doesn't make any backwards incompatible changes.

I think this provides the most iterative and guided approach to getting this done.

I can do all the work of reviewing each commit, but I need some help from a mercurial expert to automate the cherry-picking/rebasing of every single commit.

What do folks think? Does this approach make sense? Anyone willing to help with the mercurial scripting?

Cheers,
Alex
Re: [Python-Dev] Network Security Backport Status
On 1 Jul 2014 11:28, "Alex Gaynor" wrote:
>
> I've come up with a new approach, which I believe is most likely to be
> successful, but I'll need help to implement it.
>
> The idea is to find the most recent commit which is a parent of both the
> ``2.7`` and ``default`` branches. Then take every single change to an ``ssl``
> related file on the ``default`` branch, and attempt to replay it on the ``2.7``
> branch. Require manual review on each commit to make sure it compiles, and to
> ensure it doesn't make any backwards incompatible changes.
>
> I think this provides the most iterative and guided approach to getting this
> done.

Sounds promising, although it may still have some challenges if the SSL code depends on earlier changes to other code.

> I can do all the work of reviewing each commit, but I need some help from a
> mercurial expert to automate the cherry-picking/rebasing of every single
> commit.
>
> What do folks think? Does this approach make sense? Anyone willing to help with
> the mercurial scripting?

For the Mercurial part, it's probably worth posing that as a Stack Overflow question:

Given two named branches in http://hg.python.org (default and 2.7) and 4 files (Python module, C module, tests, docs):

- find the common ancestor
- find all the commits affecting those files on default & graft them to 2.7 (with a chance to test and edit each one first)

It's just a better environment for asking & answering that kind of question :)

Cheers,
Nick.
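One possible shape for that scripting, as an untested sketch only: the helpers below merely build hg command lines and do not run them. The revset predicates used (ancestor(), file(), merge(), the :: range) and hg graft --edit are standard Mercurial features, but the exact query, the helper names, and the file list are assumptions that should be checked against `hg help revsets` and the actual repository layout.

```python
# Assumed locations of the four ssl-related files in the CPython repository.
SSL_FILES = (
    "Lib/ssl.py",
    "Modules/_ssl.c",
    "Lib/test/test_ssl.py",
    "Doc/library/ssl.rst",
)


def candidate_revset(files=SSL_FILES, source="default", target="2.7"):
    """Build a revset for non-merge changes to `files` between the fork point and `source`."""
    touched = " or ".join('file("%s")' % name for name in files)
    return '(ancestor(%s, "%s")::%s) and (%s) and not merge()' % (
        source, target, source, touched,
    )


def log_command(revset):
    """Command line that lists the candidate changesets, oldest first."""
    return ["hg", "log", "-r", revset, "--template", "{node|short} {desc|firstline}\n"]


def graft_command(node):
    """Command line that replays one changeset onto the checked-out branch."""
    # --edit opens the commit message so each graft can be reviewed individually.
    return ["hg", "graft", "--edit", "-r", node]
```

With the ``2.7`` branch checked out, the output of the log command could be fed to the graft command one changeset at a time, stopping to build and test after each; conflicts and dependencies on non-ssl changes would still need manual work.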
Re: [Python-Dev] My summary of the scandir (PEP 471)
On 01.07.2014 17:30, Ben Hoyt wrote:
>> No need for a microsecond-timed deletion -- a directory with +r but
>> without +x will allow you to list the entries, but stat calls on the
>> files will fail with EPERM:
>
> Ah -- very good to know, thanks. This definitely points me in the
> direction of wanting better control over error handling.
>
> Speaking of errors, and thinking of handling errors during iteration
> -- in what cases (if any) would an individual readdir fail if the
> opendir succeeded?

readdir(3) manpage suggests that readdir can only fail if an invalid directory fd was passed.

regards, jwi
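The case discussed here (listing succeeds, but a stat call on an individual entry fails) is the one that per-entry error handling has to cover. A minimal sketch, assuming the os.scandir() API as it later shipped in Python 3.5 and 3.6 (`DirEntry.stat(follow_symlinks=False)` and context-manager support), which postdates this thread; the helper name `sizes_with_errors` and its `on_error` callback are hypothetical:

```python
import os


def sizes_with_errors(path=".", on_error=None):
    """Yield (name, size) pairs; size is None when lstat fails for an entry."""
    with os.scandir(path) as entries:
        for entry in entries:
            try:
                # May need a real lstat() system call, which can fail even
                # though the directory listing itself succeeded.
                size = entry.stat(follow_symlinks=False).st_size
            except OSError as exc:
                if on_error is not None:
                    on_error(exc)
                size = None
            yield entry.name, size
```

Catching OSError per entry keeps one unreadable file from aborting the whole iteration, which is the kind of control over error handling asked for above.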
Re: [Python-Dev] Network Security Backport Status
On 01/07/2014 14:26, Alex Gaynor wrote:
> I can do all the work of reviewing each commit, but I need some help from a
> mercurial expert to automate the cherry-picking/rebasing of every single
> commit.
>
> What do folks think? Does this approach make sense? Anyone willing to help with
> the mercurial scripting?

I don't think this makes much sense; Mercurial won't be smarter than you are. I think you'd have a better chance of succeeding by backporting one feature at a time. IMO, you'd first want to backport the _SSLContext base class and SSLContext.wrap_socket(). The latter *will* require some manual coding to adapt to 2.7's different SSLSocket implementation, not just applying patch hunks around.

Regards

Antoine.
Re: [Python-Dev] Network Security Backport Status
I have to agree with Antoine -- I don't think there's a shortcut that avoids *someone* actually having to understand the code to the point of being able to recreate the same behavior in the different context (pun not intended) of Python 2.

On Tue, Jul 1, 2014 at 1:54 PM, Antoine Pitrou wrote:
> I don't think this makes much sense; Mercurial won't be smarter than you
> are. I think you'd have a better chance of succeeding by backporting one
> feature at a time. IMO, you'd first want to backport the _SSLContext base
> class and SSLContext.wrap_socket(). The latter *will* require some manual
> coding to adapt to 2.7's different SSLSocket implementation, not just
> applying patch hunks around.

--
--Guido van Rossum (python.org/~guido)
Re: [Python-Dev] My summary of the scandir (PEP 471)
On 1 July 2014 14:00, Ben Hoyt wrote:
> 2) Nick Coghlan's proposal on the previous thread
> (https://mail.python.org/pipermail/python-dev/2014-June/135261.html)
> suggesting an ensure_lstat keyword param to scandir if you need the
> lstat_result value
>
> I would make one small tweak to Nick Coghlan's proposal to make
> writing cross-platform code easier. Instead of .lstat_result being
> None sometimes (on POSIX), have it None always unless you specify
> ensure_lstat=True. (Actually, call it get_lstat=True to kind of make
> this more obvious.) Per (b) above, this means Windows developers
> wouldn't accidentally write code which failed on POSIX systems -- it'd
> fail fast on Windows too if you accessed .lstat_result without
> specifying get_lstat=True.

This is getting very complicated (at least to me, as a Windows user, where the basic idea seems straightforward).

It seems to me that the right model is the standard "thin wrapper round the OS feature" that acts as a building block - it's typical of the rest of the os module. I think that thin wrapper is needed - even if the various bells and whistles are useful, they can be built on top of a low-level version (whereas the converse is not the case). Typically, such thin wrappers expose POSIX semantics by default, and Windows behaviour follows as closely as possible (see for example stat, where st_ino makes no sense on Windows, but is present). In this case, we're exposing Windows semantics, and POSIX is the one needing to fit the model, but the principle is the same.

On that basis, optional attributes (as used in stat results) seem entirely sensible.
The documentation for DirEntry could easily be written to parallel that of a stat result:

"""
The return value is an object whose attributes correspond to the data the OS returns about a directory entry:

* name - the object's name
* full_name - the object's full name (including path)
* is_dir - whether the object is a directory
* is_file - whether the object is a plain file
* is_symlink - whether the object is a symbolic link

On Windows, the following attributes are also available

* st_size - the size, in bytes, of the object (only meaningful for files)
* st_atime - time of last access
* st_mtime - time of last write
* st_ctime - time of creation
* st_file_attributes - Windows file attribute bits (see the FILE_ATTRIBUTE_* constants in the stat module)
"""

That's no harder to understand (or to work with) than the equivalent stat result. The only difference is that the unavailable attributes can be queried on POSIX, there's just a separate system call involved (with implications in terms of performance, error handling and potential race conditions).

The version of scandir with the ensure_lstat argument is easy to write based on one with optional arguments (I'm playing fast and loose with adding attributes to DirEntry values here, just for the sake of an example - the details are left as an exercise)

    import os

    def scandir_ensure(path='.', ensure_lstat=False):
        for entry in os.scandir(path):
            if ensure_lstat and not hasattr(entry, 'st_size'):
                stat_data = os.lstat(entry.full_name)
                entry.st_size = stat_data.st_size
                entry.st_atime = stat_data.st_atime
                entry.st_mtime = stat_data.st_mtime
                entry.st_ctime = stat_data.st_ctime
                # Ignore file_attributes, as we'll never get here on Windows
            yield entry

Variations on how you handle errors in the lstat call, etc, can be added to taste.

Please, let's stick to a low-level wrapper round the OS API for the first iteration of this feature. Enhancements can be added later, when real-world usage has proved their value.
Paul
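For comparison, the scandir_ensure recipe can be approximated against the os.scandir() that later shipped in Python 3.5, which is an assumption on top of this thread rather than the API being proposed in it. Instead of optional st_* attributes, the shipped DirEntry has a `stat(follow_symlinks=False)` method, and it does not accept new attributes, so this sketch yields pairs. The name `scandir_with_lstat` is hypothetical:

```python
import os


def scandir_with_lstat(path=".", ensure_lstat=False):
    """Yield (entry, lstat_result) pairs for the entries of `path`.

    lstat_result is None unless ensure_lstat is true.
    """
    with os.scandir(path) as entries:
        for entry in entries:
            # stat(follow_symlinks=False) gives lstat() semantics; it can
            # raise OSError, which this sketch lets propagate to the caller.
            st = entry.stat(follow_symlinks=False) if ensure_lstat else None
            yield entry, st
```

Callers that only need names and types can leave ensure_lstat off and avoid any extra system call, which keeps the low-level behaviour as the default.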
Re: [Python-Dev] My summary of the scandir (PEP 471)
On 7/1/2014 2:20 PM, Paul Moore wrote:
> Please, let's stick to a low-level wrapper round the OS API for the
> first iteration of this feature. Enhancements can be added later, when
> real-world usage has proved their value.

I almost wrote this whole message this morning, but didn't have time. Thanks, Paul, for digging through the details.

+1
Re: [Python-Dev] My summary of the scandir (PEP 471)
On 07/01/2014 02:20 PM, Paul Moore wrote:
> Please, let's stick to a low-level wrapper round the OS API for the
> first iteration of this feature. Enhancements can be added later, when
> real-world usage has proved their value.

+1
Re: [Python-Dev] My summary of the scandir (PEP 471)
On Wed, Jul 2, 2014 at 7:20 AM, Paul Moore wrote:
> I think that thin wrapper is needed - even
> if the various bells and whistles are useful, they can be built on top
> of a low-level version (whereas the converse is not the case).

+1. Make everything as simple as possible (but no simpler).

ChrisA
[Python-Dev] [RELEASE] Python 2.7.8
Greetings,

I have the distinct privilege of informing you that the latest release of the Python 2.7 series, 2.7.8, has been released and is available for download. 2.7.8 contains several important regression fixes and security changes:

- The openssl version bundled in the Windows installer has been updated.
- A regression in the mimetypes module on Windows has been fixed. [1]
- A possible overflow in the buffer type has been fixed. [2]
- A bug in the CGIHTTPServer module which allows arbitrary execution of code in the server root has been patched. [3]
- A regression in the handling of UNC paths in os.path.join has been fixed. [4]

Downloads of 2.7.8 are at

    https://www.python.org/download/releases/2.7.8/

The full changelog is located at

    http://hg.python.org/cpython/raw-file/v2.7.8/Misc/NEWS

This is a production release. As always, please report bugs to

    http://bugs.python.org/

Till next time,
Benjamin Peterson
2.7 Release Manager
(on behalf of all of Python's contributors)

[1] http://bugs.python.org/issue21652
[2] http://bugs.python.org/issue21831
[3] http://bugs.python.org/issue21766
[4] http://bugs.python.org/issue21672
Re: [Python-Dev] My summary of the scandir (PEP 471)
On 1 July 2014 14:20, Paul Moore wrote:
> The version of scandir with the ensure_lstat argument is easy to write
> based on one with optional arguments
> [...]
> Please, let's stick to a low-level wrapper round the OS API for the
> first iteration of this feature. Enhancements can be added later, when
> real-world usage has proved their value.

+1 from me - especially if this recipe goes in at least the PEP, and potentially even the docs.

I'm also OK with postponing onerror support for the time being - that should be straightforward to add later if we decide we need it.

Cheers,
Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia