Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator
On Fri, Jun 27, 2014 at 2:58 PM, Nick Coghlan wrote:

>
> * -1 on including Windows specific globbing support in the API
> * -0 on including cross platform globbing support in the initial iteration
> of the API (that could be done later as a separate RFE instead)
>

Agreed. Globbing or filtering support should not hold this up. If that part isn't settled, just don't include it and work out what it should be as a future enhancement.

> * +1 on a new section in the PEP covering rejected design options (calling
> it iterdir, returning a 2-tuple instead of a dedicated DirEntry type)
>

+1. IMNSHO, this is one of the most important parts of PEPs: capturing the entire decision process to document the "why nots".

> * regarding "why not a 2-tuple", we know from experience that operating
> systems evolve and we end up wanting to add additional info to this kind of
> API. A dedicated DirEntry type lets us adjust the information returned over
> time, without breaking backwards compatibility and without resorting to
> ugly hacks like those in some of the time and stat APIs (or even our own
> codec info APIs)
> * it would be nice to see some relative performance numbers for NFS and
> CIFS network shares - the additional network round trips can make excessive
> stat calls absolutely brutal from a speed perspective when using a network
> drive (that's why the stat caching added to the import system in 3.3
> dramatically sped up the case of having network drives on sys.path, and why
> I thought AJ had a point when he was complaining about the fact we didn't
> expose the dirent data from os.listdir)
>

fwiw, I wouldn't wait for benchmark numbers. A needless stat call when you've already got the information from an earlier API call is already brutal. It is easy to compute from existing ballparks:

remote file server / cloud access: ~100ms
local spinning disk seek+read: ~10ms
fetch of stat info cached in memory on a file server on the local network: ~500us

You can go down further to local system call overhead, which can vary wildly but should probably be assumed to be at least 10us. You don't need a benchmark to tell you that adding needless >= 500us-100ms blocking operations to your program is bad. :)

-gps

___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
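Gregory's ballpark figures can be turned into a quick back-of-the-envelope estimate. The sketch below is illustrative arithmetic, not a benchmark: it assumes a hypothetical directory of 10,000 entries and one redundant stat call per entry, issued strictly one after another, at each latency quoted above. The names `LATENCY_US` and `needless_stat_cost_us` are made up for this example.

```python
# Ballpark per-call latencies from the message above, in microseconds.
# These are rough orders of magnitude, not measurements.
LATENCY_US = {
    "remote file server / cloud access": 100_000,  # ~100 ms
    "local spinning disk seek+read": 10_000,       # ~10 ms
    "stat info cached on a LAN file server": 500,  # ~500 us
    "local system call (lower bound)": 10,         # ~10 us
}


def needless_stat_cost_us(n_entries: int, latency_us: int) -> int:
    """Blocking time, in microseconds, of one redundant stat per entry.

    Assumes the calls run one after another with no overlap or caching.
    """
    return n_entries * latency_us


if __name__ == "__main__":
    n = 10_000  # hypothetical directory size
    for label, us in LATENCY_US.items():
        seconds = needless_stat_cost_us(n, us) / 1_000_000
        print(f"{label:40} {seconds:10.2f} s")
```

Under those assumptions the redundant calls cost about 1000 s, 100 s, 5 s and 0.1 s respectively: the waste grows linearly with both the entry count and the per-call latency, which is why a network share hurts so much more than a local disk.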
Re: [Python-Dev] Fix Unicode-disabled build of Python 2.7
2014-06-26 13:04 GMT+02:00 Antoine Pitrou :
> For the same reason, I agree with Victor that we should ditch the
> threading-disabled builds. It's too much of a hassle for no actual,
> practical benefit. People who want a threadless unicodeless Python can
> install Python 1.5.2 for all I care.

By the way, adding a buildbot for testing Python without thread support is not enough. The buildbot has been broken for more than a month and nobody noticed :-p
http://buildbot.python.org/all/builders/AMD64%20Fedora%20without%20threads%203.x/

OK, I noticed, but I consider that I have already spent too much time on this minor use case. I prefer to leave such tasks to someone else :-)

Victor
Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator
On 28 Jun 2014 01:27, "Jonas Wielicki" wrote:
>
> On 27.06.2014 00:59, Ben Hoyt wrote:
> > Specifics of proposal
> > =====================
> > [snip] Each ``DirEntry`` object has the following
> > attributes and methods:
> > [snip]
> > Notes on caching
> > ----------------
> >
> > The ``DirEntry`` objects are relatively dumb -- the ``name`` attribute
> > is obviously always cached, and the ``is_X`` and ``lstat`` methods
> > cache their values (immediately on Windows via ``FindNextFile``, and
> > on first use on Linux / OS X via a ``stat`` call) and never refetch
> > from the system.
>
> I find this behaviour a bit misleading: using methods and have them
> return cached results. How much (implementation and/or performance
> and/or memory) overhead would incur by using property-like access here?
> I think this would underline the static nature of the data.
>
> This would break the semantics with respect to pathlib, but they’re only
> marginally equal anyways -- and as far as I understand it, pathlib won’t
> cache, so I think this has a fair point here.

Indeed - using properties rather than methods may help emphasise the deliberate *difference* from pathlib in this case (i.e. the value when the result was retrieved from the OS, rather than the value right now). The main benefit is that switching from using the DirEntry object to a pathlib Path will require touching all the places where the performance characteristics switch from "memory access" to "system call". This benefit is also the main downside, so I'd actually be OK with either decision on this one.

Other comments:

* +1 on the general idea
* +1 on scandir() over iterdir, since it *isn't* just an iterator version of listdir
* -1 on including Windows specific globbing support in the API
* -0 on including cross platform globbing support in the initial iteration of the API (that could be done later as a separate RFE instead)
* +1 on a new section in the PEP covering rejected design options (calling it iterdir, returning a 2-tuple instead of a dedicated DirEntry type)
* regarding "why not a 2-tuple", we know from experience that operating systems evolve and we end up wanting to add additional info to this kind of API. A dedicated DirEntry type lets us adjust the information returned over time, without breaking backwards compatibility and without resorting to ugly hacks like those in some of the time and stat APIs (or even our own codec info APIs)
* it would be nice to see some relative performance numbers for NFS and CIFS network shares - the additional network round trips can make excessive stat calls absolutely brutal from a speed perspective when using a network drive (that's why the stat caching added to the import system in 3.3 dramatically sped up the case of having network drives on sys.path, and why I thought AJ had a point when he was complaining about the fact we didn't expose the dirent data from os.listdir)

Regards,
Nick.

>
> regards,
> jwi
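Nick's "why not a 2-tuple" point can be illustrated with a small sketch. Everything here is hypothetical (`EntryV1`, `EntryV2` and the two caller functions are not part of `os` or the PEP): code written against a 2-tuple API breaks as soon as a later release adds a third field, whereas code that reads attributes from a dedicated type keeps working.

```python
class EntryV1:
    """Hypothetical first-release entry type with two fields."""

    __slots__ = ("name", "size")

    def __init__(self, name, size):
        self.name = name
        self.size = size


class EntryV2(EntryV1):
    """Hypothetical later release that adds an extra field."""

    __slots__ = ("inode",)

    def __init__(self, name, size, inode):
        super().__init__(name, size)
        self.inode = inode


def names_from_tuples(entries):
    # Written against a 2-tuple API: unpacking assumes exactly two items.
    return [name for name, _size in entries]


def names_from_objects(entries):
    # Written against an object API: only touches the attribute it needs.
    return [entry.name for entry in entries]


if __name__ == "__main__":
    print(names_from_objects([EntryV1("a.txt", 3), EntryV2("b.txt", 5, 42)]))
    try:
        names_from_tuples([("a.txt", 3, 42)])  # a third field was added
    except ValueError as exc:
        print("tuple caller broke:", exc)
```

Running it prints the two names from the object-based caller and then the `ValueError` raised by the tuple-based one.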
Re: [Python-Dev] LZO bug
On Jun 27, 2014, at 9:56 AM, MRAB wrote:
> Is this something that we need to worry about?
>
> Raising Lazarus - The 20 Year Old Bug that Went to Mars
> http://blog.securitymouse.com/2014/06/raising-lazarus-20-year-old-bug-that.html

Debunking the LZ4 "20 years old bug" myth
http://fastcompression.blogspot.com/2014/06/debunking-lz4-20-years-old-bug-myth.html

Raymond
[Python-Dev] LZO bug
Is this something that we need to worry about?

Raising Lazarus - The 20 Year Old Bug that Went to Mars
http://blog.securitymouse.com/2014/06/raising-lazarus-20-year-old-bug-that.html
Re: [Python-Dev] buildbot.python.org down?
On Fri, Jun 27, 2014, at 02:14, Ned Deily wrote:
> The buildbot web site seems to have been down for some hours and still
> is as of 0915 UTC. I'm not sure who is watching over it but I'll ping
> the infrastructure team as well.

Fixed. The VM crashed, and Ernest rebooted it.
[Python-Dev] Summary of Python tracker Issues
ACTIVITY SUMMARY (2014-06-20 - 2014-06-27)
Python tracker at http://bugs.python.org/

To view or respond to any of the issues listed below, click on the issue.
Do NOT respond to this message.

Issues counts and deltas:
  open    4643 (-12)
  closed 29004 (+72)
  total  33647 (+60)

Open issues with patches: 2162

Issues opened (50)
==================

#6916: Remove deprecated items from asynchat
http://bugs.python.org/issue6916  reopened by ezio.melotti

#10312: intcatcher() can deadlock
http://bugs.python.org/issue10312  reopened by Claudiu.Popa

#21817: `concurrent.futures.ProcessPoolExecutor` swallows tracebacks
http://bugs.python.org/issue21817  opened by cool-RR

#21818: cookielib documentation references Cookie module, not cookieli
http://bugs.python.org/issue21818  opened by Ajtag

#21820: unittest: unhelpful truncating of long strings.
http://bugs.python.org/issue21820  opened by cjw296

#21821: The function cygwinccompiler.is_cygwingcc leads to FileNotFoun
http://bugs.python.org/issue21821  opened by paugier

#21822: KeyboardInterrupt during Thread.join hangs that Thread
http://bugs.python.org/issue21822  opened by tupl

#21825: Embedding-Python example code from documentation crashes
http://bugs.python.org/issue21825  opened by Pat.Le.Cat

#21826: Performance issue (+fix) AIX ctypes.util with no /sbin/ldconfi
http://bugs.python.org/issue21826  opened by tw.bert

#21827: textwrap.dedent() fails when largest common whitespace is a su
http://bugs.python.org/issue21827  opened by robertjli

#21830: ssl.wrap_socket fails on Windows 7 when specifying ca_certs
http://bugs.python.org/issue21830  opened by David.M.Noriega

#21833: Fix unicodeless build of Python
http://bugs.python.org/issue21833  opened by serhiy.storchaka

#21834: Fix a number of tests in unicodeless build
http://bugs.python.org/issue21834  opened by serhiy.storchaka

#21835: Fix Tkinter in unicodeless build
http://bugs.python.org/issue21835  opened by serhiy.storchaka

#21836: Fix sqlite3 in unicodeless build
http://bugs.python.org/issue21836  opened by serhiy.storchaka

#21837: Fix tarfile in unicodeless build
http://bugs.python.org/issue21837  opened by serhiy.storchaka

#21838: Fix ctypes in unicodeless build
http://bugs.python.org/issue21838  opened by serhiy.storchaka

#21839: Fix distutils in unicodeless build
http://bugs.python.org/issue21839  opened by serhiy.storchaka

#21840: Fix os.path in unicodeless build
http://bugs.python.org/issue21840  opened by serhiy.storchaka

#21841: Fix xml.sax in unicodeless build
http://bugs.python.org/issue21841  opened by serhiy.storchaka

#21842: Fix IDLE in unicodeless build
http://bugs.python.org/issue21842  opened by serhiy.storchaka

#21843: Fix doctest in unicodeless build
http://bugs.python.org/issue21843  opened by serhiy.storchaka

#21844: Fix HTMLParser in unicodeless build
http://bugs.python.org/issue21844  opened by serhiy.storchaka

#21845: Fix plistlib in unicodeless build
http://bugs.python.org/issue21845  opened by serhiy.storchaka

#21846: Fix zipfile in unicodeless build
http://bugs.python.org/issue21846  opened by serhiy.storchaka

#21847: Fix xmlrpc in unicodeless build
http://bugs.python.org/issue21847  opened by serhiy.storchaka

#21848: Fix logging in unicodeless build
http://bugs.python.org/issue21848  opened by serhiy.storchaka

#21849: Fix multiprocessing for non-ascii data
http://bugs.python.org/issue21849  opened by serhiy.storchaka

#21850: Fix httplib and SimpleHTTPServer in unicodeless build
http://bugs.python.org/issue21850  opened by serhiy.storchaka

#21851: Fix gettext in unicodeless build
http://bugs.python.org/issue21851  opened by serhiy.storchaka

#21852: Fix optparse in unicodeless build
http://bugs.python.org/issue21852  opened by serhiy.storchaka

#21853: Fix inspect in unicodeless build
http://bugs.python.org/issue21853  opened by serhiy.storchaka

#21854: Fix cookielib in unicodeless build
http://bugs.python.org/issue21854  opened by serhiy.storchaka

#21855: Fix decimal in unicodeless build
http://bugs.python.org/issue21855  opened by serhiy.storchaka

#21856: memoryview: no overflow on large slice values (start, stop, st
http://bugs.python.org/issue21856  opened by haypo

#21857: assert that functions clearing the current exception are not c
http://bugs.python.org/issue21857  opened by haypo

#21859: Add Python implementation of FileIO
http://bugs.python.org/issue21859  opened by serhiy.storchaka

#21860: Correct FileIO docstrings
http://bugs.python.org/issue21860  opened by serhiy.storchaka

#21861: io class name are hardcoded in reprs
http://bugs.python.org/issue21861  opened by serhiy.storchaka

#21862: cProfile command-line should accept "-m module_name" as an alt
http://bugs.python.org/issue21862  opened by pitrou

#21863: Display module names of C functions in cProfile
http://bugs.python.org/issue21863  opened by pitrou

#21864: Error in documentation of point 9.8 'Exceptions are classes to
http://bugs.python.org/issue21864  opened by Peib
Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator
On 27.06.2014 00:59, Ben Hoyt wrote:
> Specifics of proposal
> =====================
> [snip] Each ``DirEntry`` object has the following
> attributes and methods:
> [snip]
> Notes on caching
> ----------------
>
> The ``DirEntry`` objects are relatively dumb -- the ``name`` attribute
> is obviously always cached, and the ``is_X`` and ``lstat`` methods
> cache their values (immediately on Windows via ``FindNextFile``, and
> on first use on Linux / OS X via a ``stat`` call) and never refetch
> from the system.

I find this behaviour a bit misleading: using methods that return cached results. How much overhead (in implementation, performance and/or memory) would be incurred by using property-like access here? I think this would underline the static nature of the data.

This would break the semantics with respect to pathlib, but they’re only marginally similar anyway -- and as far as I understand it, pathlib won’t cache, so I think this is a fair point here.

regards,
jwi
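A minimal sketch of the two access styles Jonas contrasts, using hypothetical entry classes rather than the real `DirEntry`: `MethodEntry.lstat()` caches behind a method call, as the PEP draft describes, while `PropertyEntry.lstat` uses `functools.cached_property` (Python 3.8+) so that attribute syntax itself hints that the value is a snapshot. In both cases the cached result does not follow later changes to the file; only a fresh `os.lstat()` does.

```python
import os
from functools import cached_property


class MethodEntry:
    """Hypothetical entry that caches lstat() behind a method call."""

    def __init__(self, path):
        self.path = path
        self._lstat = None

    def lstat(self):
        # First call asks the OS; every later call returns the snapshot.
        if self._lstat is None:
            self._lstat = os.lstat(self.path)
        return self._lstat


class PropertyEntry:
    """Hypothetical entry exposing the same snapshot as an attribute."""

    def __init__(self, path):
        self.path = path

    @cached_property
    def lstat(self):
        # Computed on first access, then stored on the instance.
        return os.lstat(self.path)
```

For example, if a file grows after `entry.lstat` was first read, `entry.lstat.st_size` still reports the old size, while `os.lstat(entry.path).st_size` reports the new one.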
Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator
On 27.06.2014 03:50, MRAB wrote:
> On 2014-06-27 02:37, Ben Hoyt wrote:
>> I don't mind iterdir() and would take it :-), but I'll just say why I
>> chose the name scandir() -- though it wasn't my suggestion originally:
>>
>> iterdir() sounds like just an iterator version of listdir(), kinda
>> like keys() and iterkeys() in Python 2. Whereas in actual fact the
>> return values are quite different (DirEntry objects vs strings), and
>> so the name change reflects that difference a little.
>>
> [snip]
>
> The re module has 'findall', which returns a list of strings, and
> 'finditer', which returns an iterator that yields match objects, so
> there's a precedent. :-)

A bad precedent in my opinion, though -- I was bitten by that just recently, and I find it very atypical for Python.

regards,
Jonas
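The `re` precedent MRAB mentions, and the surprise Jonas alludes to, fit in a few lines: `re.findall()` returns a list of strings (or, when the pattern has several groups, a list of tuples of group strings), while `re.finditer()` lazily yields match objects. So the "iter" variant is not simply a lazy `findall()`. The sample text and helper names below are illustrative only.

```python
import re

TEXT = "a1 b22 c333"


def findall_demo():
    # No groups: a list of the matched substrings.
    plain = re.findall(r"\d+", TEXT)
    # Two groups: a list of tuples, one string per group.
    grouped = re.findall(r"([a-z])(\d+)", TEXT)
    return plain, grouped


def finditer_demo():
    # Match objects carry more than the text, e.g. their span in TEXT.
    return [(m.group(), m.span()) for m in re.finditer(r"\d+", TEXT)]


if __name__ == "__main__":
    print(findall_demo())   # (['1', '22', '333'], [('a', '1'), ('b', '22'), ('c', '333')])
    print(finditer_demo())  # [('1', (1, 2)), ('22', (4, 6)), ('333', (8, 11))]
```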
Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator
Hello,

On Fri, 27 Jun 2014 12:08:41 +1000
Steven D'Aprano wrote:

> On Fri, Jun 27, 2014 at 03:07:46AM +0300, Paul Sokolovsky wrote:
>
> > With my MicroPython hat on, os.scandir() would make things only
> > worse. With current interface, one can either have inefficient
> > implementation (like CPython chose) or efficient implementation
> > (like MicroPython chose) - all transparently. os.scandir()
> > supposedly opens up efficient implementation for everyone, but at
> > the price of bloating API and introducing heavy-weight objects to
> > wrap info.
>
> os.scandir is not part of the Python API, it is not a built-in
> function. It is part of the CPython standard library.

OK, so the standard library also has an API, and that's the API being discussed.

> That means (in
> my opinion) that there is an expectation that other Pythons should
> provide it, but not an absolute requirement. Especially for the os
> module, which by definition is platform-specific.

Yes, that's intuitive, but it isn't strict and formal, so it is open to interpretation. As a developer working on an alternative Python implementation, I'd like a better understanding of what needs to be done to be a compliant implementation (in particular, because I need to pass that info on to users). So, I was told that https://docs.python.org/3/reference/index.html describes Python, not CPython. The next step is figuring out whether https://docs.python.org/3/library/index.html describes Python or CPython, and if the latter, how to separate the essence of Python's stdlib from the extended library CPython provides.

> In my opinion that
> means you have four options:
>
> 1. provide os.scandir, with exactly the same semantics as on CPython;
>
> 2. provide os.scandir, but change its semantics to be more
> lightweight (e.g. return an ordinary tuple, as you already suggest);
>
> 3. don't provide os.scandir at all; or
>
> 4. do something different depending on whether the platform is Linux
>    or an embedded system.
>
> I would consider any of those acceptable for a library feature, but
> not for a language feature.

Good, thanks. If that represents the shared opinion of (C)Python developers (so there won't be claims like "MicroPython is not Python because it doesn't provide os.scandir()" (or hundreds of other missing stdlib functions ;-) )), that's good enough already. With that in mind, I wish that every Python implementation were as complete and as efficient as possible, and one way to achieve that is to not add stdlib entities without real need (be it more API calls or more data types). So, I'm glad to know that os.scandir() passed through Occam's Razor in this respect and was specified the way it is really for the common good.

[]

--
Best regards,
 Paul                          mailto:pmis...@gmail.com
Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator
Hello,

On Thu, 26 Jun 2014 21:52:43 -0400
Ben Hoyt wrote:

[]

> It's a fair point that os.walk() can be implemented efficiently
> without adding a new function and API. However, often you'll want more
> info, like the file size, which scandir() can give you via
> DirEntry.lstat(), which is free on Windows. So opening up this
> efficient API is beneficial.
>
> In CPython, I think the DirEntry objects are as lightweight as
> stat_result objects.
>
> I'm an embedded developer by background, so I know the constraints
> here, but I really don't think Python's development should be tailored
> to fit MicroPython. If os.scandir() is not very efficient on
> MicroPython, so be it -- 99% of all desktop/server users will gain
> from it.

Certainly, tailoring Python to MicroPython's needs is not at all what I'm suggesting. It was an example of an alternative implementation that optimized os.walk() without needing any additional public module APIs. Conversely, the high-level nature of an API call like os.walk(), and the underspecification of low-level details (such as which function is implemented in terms of which others), allow MicroPython to provide an optimized implementation even with its resource constraints. So, the power of high-level interfaces and underspecification should not be underestimated ;-).

But I don't want to argue that os.scandir() is "not needed", because that's hardly productive. Something I'd like to prototype in uPy, and ideally take further up to PEP status, is adding iterator-based string methods, and I can pretty much expect a "we lived without it" response, so I don't want to take the same line regarding the addition of other iterator-based APIs - it's clear that more iterator/generator-based APIs are a good direction for Python to evolve in.

> > It would be better if os.scandir() was specified to return a struct
> > (named tuple) compatible with return value of os.stat() (with only
> > fields relevant to underlying readdir()-like system call). The
> > grounds for that are obvious: it's already existing data interface
> > in module "os", which is also based on open standard for operating
> > systems - POSIX, so if one is to expect something about file
> > attributes, it's what one can reasonably base expectations on.
>
> Yes, we considered this early on (see the python-ideas and python-dev
> threads referenced in the PEP), but decided it wasn't a great API to
> overload stat_result further, and have most of the attributes None or
> not present on Linux.
> []
>
> However, for scandir() to be useful, you also need the name. My
> original version of this directory iterator returned two-tuples of
> (name, stat_result). But most people didn't like the API, and I don't
> really either. You could overload stat_result with a .name attribute
> in this case, but it still isn't a nice API to have most of the
> attributes None, and then you have to test for that, etc.

Yes, returning (name, stat_result) would be my first choice too; I don't see why someone wouldn't like a pair of 2 values, each of obvious type and semantics within the "os" module. Regarding the stat result, os.stat() provides full information about a file, and intuitively, one may expect that os.scandir() would provide a subset of that info, asymptotically approaching what os.stat() provides, depending on OS capabilities. So, if a truly OS-independent interface is wanted to salvage more data from a directory scan, using the os.stat struct as the data interface is hard to ignore.

But well, if it was already rejected, what can be said? Perhaps the PEP could at least be extended to explicitly mention the other approaches that were discussed and rejected, not just link to a discussion archive (from experience reading other PEPs, they often contain such subsections, so I hope this suggestion is not unfounded).

>
> So basically we tweaked the API to do what was best, and ended up with
> it returning DirEntry objects with is_file() and similar methods.
>
> Hope that helps give a bit more context. If you haven't read the
> relevant python-ideas and python-dev threads, those are interesting
> too.
>
> -Ben

--
Best regards,
 Paul                          mailto:pmis...@gmail.com
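To make the os.walk() discussion concrete, here is a minimal, hypothetical walk built on `os.scandir()` (Python 3.5+). It classifies each entry with `DirEntry.is_dir()`, which on many platforms can answer from data the directory listing already returned, instead of issuing a stat call per name as a listdir()-based walk must. CPython 3.5 did in fact reimplement `os.walk()` on top of scandir(). This sketch is simplified: it is top-down only, has no error handling, and passes `follow_symlinks=False`, so unlike `os.walk()` it reports a symlink to a directory as a file.

```python
import os


def walk_names(top):
    """Yield (dirpath, dirnames, filenames) top-down, like os.walk().

    Simplified sketch: no onerror handling, no bottom-up mode, and
    symlinks to directories are listed as files and never followed.
    """
    dirnames, filenames = [], []
    for entry in os.scandir(top):
        # Usually answered from the directory listing, with no extra stat.
        if entry.is_dir(follow_symlinks=False):
            dirnames.append(entry.name)
        else:
            filenames.append(entry.name)
    yield top, dirnames, filenames
    for name in dirnames:
        yield from walk_names(os.path.join(top, name))
```

On a tree without symlinks it produces the same (dirpath, dirnames, filenames) triples as `os.walk()`, up to the system-dependent order of names.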
Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator
On Jun 26, 2014, at 4:38 PM, Tim Delaney wrote: On 27 June 2014 09:28, MRAB wrote: > > -1 for windows_wildcard (it would be an attractive nuisance to write windows-only code) Could you emulate it on other platforms? +1 on the rest of it. -Chris
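On the question of emulating the wildcard on other platforms: a cross-platform approximation can be layered on top of `os.scandir()` with the standard `fnmatch` module. This is a hedged sketch with a hypothetical helper name. fnmatch's shell-style rules are not identical to the Windows `FindFirstFile` wildcard semantics, and filtering in Python does not spare the OS from returning every entry first.

```python
import fnmatch
import os


def scandir_matching(path, pattern):
    """Yield os.scandir() entries whose names match a shell-style pattern.

    fnmatch.fnmatch() normalises case via os.path.normcase(), so matching
    is case-insensitive on Windows and case-sensitive elsewhere.
    """
    for entry in os.scandir(path):
        if fnmatch.fnmatch(entry.name, pattern):
            yield entry
```

Because the filter sits outside the scanning call, it could be added later as a separate enhancement without changing the scandir() API itself.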
Re: [Python-Dev] Binary CPython distribution for Linux
On 27 Jun 2014 17:33, "Bohuslav Kabrda" wrote:
>
> It's not true that 2.7 wasn't released until few weeks ago. It was released few weeks ago as part of RHEL 7, but Red Hat has been shipping Red Hat Software Collections (RHSCL) 1.0, that contain Python 2.7 and Python 3.3, for almost a year now [1] - RHSCL is installable on RHEL 6; RHSCL 1.1 (also with 2.7 and 3.3) has been released few weeks ago and is supported on RHEL 6 and 7. Also, these collections now have their community rebuilds at [2], so you can just download them without needing to talk to Red Hat at all. But yeah, these are all RPMs, so you have to be root to install them.

Indeed, while there are still some rough edges, software collections look like the best approach to doing maintainable system installs of Python runtimes other than the system Python into Fedora/RHEL/CentOS et al (and I say that while wearing both my upstream and downstream hats). Collections solve this problem in a general (rather than CPython-specific) way, since they can be used to get upgraded versions of language runtimes, databases, web servers, etc., all without risking the stability of the OS itself. I hope to see someone put together collections for PyPy and PyPy3 as well.

The approaches used for runtime isolation of software collections should also be applicable to Debian systems, but (as far as I am aware) the tooling to build them as debs rather than RPMs doesn't exist yet.

> Please don't take this as a criticism of your ideas, I see what you're trying to solve. I just think the way you're trying to solve it is unachievable or would consume so much community resources, that it would end up unmaintained and buggy most of the time.

For prebuilt userland installs on Linux, I think "miniconda" is currently the best available approach. It has its challenges (especially around its handling of security concerns), but it's designed to offer a full cross-platform package management system, which makes it well suited to managing prebuilt language runtimes in user space.

Cheers,
Nick.

>
> --
> Regards,
> Bohuslav "Slavek" Kabrda.
>
> [1] http://developerblog.redhat.com/2013/09/12/rhscl1-ga/
> [2] https://www.softwarecollections.org/en/scls/
[Python-Dev] buildbot.python.org down?
The buildbot web site seems to have been down for some hours and still is as of 0915 UTC. I'm not sure who is watching over it but I'll ping the infrastructure team as well.

-- Ned Deily, n...@acm.org
Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator
Hi,

You wrote a great PEP Ben, thanks :-) But it's now time for comments!

> But the underlying system calls -- ``FindFirstFile`` /
> ``FindNextFile`` on Windows and ``readdir`` on Linux and OS X --

What about FreeBSD, OpenBSD, NetBSD, Solaris, etc.? Don't they provide readdir?

You should add a link to the FindFirstFile doc:
http://msdn.microsoft.com/en-us/library/windows/desktop/aa364418%28v=vs.85%29.aspx

It looks like the WIN32_FIND_DATA structure has a dwFileAttributes field. So we should mimic the recent addition to stat_result, the new stat_result.st_file_attributes field: add DirEntry.file_attributes, which would only be available on Windows.

The Windows structure also contains:

    FILETIME ftCreationTime;
    FILETIME ftLastAccessTime;
    FILETIME ftLastWriteTime;
    DWORD    nFileSizeHigh;
    DWORD    nFileSizeLow;

It would be nice to expose them as well. I'm no longer surprised when the exact API of functions in the os module differs depending on the OS.

> * Instead of bare filename strings, it returns lightweight
> ``DirEntry`` objects that hold the filename string and provide
> simple methods that allow access to the stat-like data the operating
> system returned.

Does your implementation use a free list to avoid the cost of memory allocation? A short free list of 10, or maybe just 1, may help. The free list may be stored directly in the generator object.

> ``scandir()`` yields a ``DirEntry`` object for each file and directory
> in ``path``. Just like ``listdir``, the ``'.'`` and ``'..'``
> pseudo-directories are skipped, and the entries are yielded in
> system-dependent order. Each ``DirEntry`` object has the following
> attributes and methods:

Does it also support bytes filenames on UNIX? Python now supports undecodable filenames thanks to PEP 383 (surrogateescape). I prefer to use the same type for filenames on Linux and Windows, so Unicode is better. But some users might prefer bytes for other reasons.
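Victor's PEP 383 point in miniature: with the `surrogateescape` error handler, a filename whose bytes are not valid in the filesystem encoding still decodes to a `str`, and encodes back to exactly the original bytes. The sketch assumes a UTF-8 filesystem encoding (typical on Linux) and hard-codes it rather than calling `os.fsdecode()`/`os.fsencode()`, so that it behaves the same on any platform; `decode_name` and `encode_name` are illustrative names.

```python
# Assumption: the filesystem encoding is UTF-8 (typical on Linux).
FS_ENCODING = "utf-8"


def decode_name(raw: bytes) -> str:
    # Undecodable bytes become lone surrogates U+DC80..U+DCFF (PEP 383).
    return raw.decode(FS_ENCODING, "surrogateescape")


def encode_name(name: str) -> bytes:
    # The same error handler maps those surrogates back to the raw bytes.
    return name.encode(FS_ENCODING, "surrogateescape")


if __name__ == "__main__":
    raw = b"caf\xe9.txt"  # Latin-1 "é": not valid UTF-8
    name = decode_name(raw)
    print(ascii(name))               # 'caf\udce9.txt'
    print(encode_name(name) == raw)  # True
```

Such a `str` round-trips losslessly, which is what lets an API return Unicode names on UNIX without losing undecodable files; bytes remain an option for users who want the raw names.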
> The ``DirEntry`` attribute and method names were chosen to be the same
> as those in the new ``pathlib`` module for consistency.

Great! That's exactly what I expected :-) Consistency with other modules.

> Notes on caching
> ----------------
>
> The ``DirEntry`` objects are relatively dumb -- the ``name`` attribute
> is obviously always cached, and the ``is_X`` and ``lstat`` methods
> cache their values (immediately on Windows via ``FindNextFile``, and
> on first use on Linux / OS X via a ``stat`` call) and never refetch
> from the system.
>
> For this reason, ``DirEntry`` objects are intended to be used and
> thrown away after iteration, not stored in long-lived data structured
> and the methods called again and again.
>
> If a user wants to do that (for example, for watching a file's size
> change), they'll need to call the regular ``os.lstat()`` or
> ``os.path.getsize()`` functions which force a new system call each
> time.

Crazy idea: would it be possible to "convert" a DirEntry object to a pathlib.Path object without losing the cache? I guess that pathlib.Path expects a full stat_result object.

> Or, for getting the total size of files in a directory tree -- showing
> use of the ``DirEntry.lstat()`` method::
>
>     def get_tree_size(path):
>         """Return total size of files in path and subdirs."""
>         size = 0
>         for entry in scandir(path):
>             if entry.is_dir():
>                 sub_path = os.path.join(path, entry.name)
>                 size += get_tree_size(sub_path)
>             else:
>                 size += entry.lstat().st_size
>         return size
>
> Note that ``get_tree_size()`` will get a huge speed boost on Windows,
> because no extra stat call are needed, but on Linux and OS X the size
> information is not returned by the directory iteration functions, so
> this function won't gain anything there.

I don't understand how you can build a full lstat() result without really calling stat. I see that WIN32_FIND_DATA contains the size, but here you call lstat(). If you know that it's not a symlink, you already know the size, but don't you still have to call stat() to retrieve all the fields required to build a stat_result?

> Support
> =======
>
> The scandir module on GitHub has been forked and used quite a bit (see
> "Use in the wild" in this PEP),

Do you plan to continue maintaining your module for Python < 3.5, and to upgrade it to match the final PEP?

> Should scandir be in its own module?
> ------------------------------------
>
> Should the function be included in the standard library in a new
> module, ``scandir.scandir()``, or just as ``os.scandir()`` as
> discussed? The preference of this PEP's author (Ben Hoyt) would be
> ``os.scandir()``, as it's just a single function.

Yes, put it in the os module which is already bloated :-)

> Should there be a way to access the full path?
> ----------------------------------------------
>
> Should ``DirEntry``'s have a way to get the full path without using
> ``os.path.join(path, entry.name)``? This is
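For comparison, here is the draft's `get_tree_size()` example adapted to the API as it eventually shipped in Python 3.5, where the draft's `entry.lstat()` corresponds to `entry.stat(follow_symlinks=False)` and `entry.path` supplies the joined path that the last quoted question asks about. This is an illustrative sketch, not code from the PEP. Like the draft, it counts every non-directory entry (symlinks by their own lstat size); unlike the draft, it passes `follow_symlinks=False` to `is_dir()` so it never recurses through symlinked directories.

```python
import os


def get_tree_size(path):
    """Return the total size in bytes of non-directory entries under path."""
    total = 0
    for entry in os.scandir(path):
        # follow_symlinks=False: don't recurse through symlinked directories.
        if entry.is_dir(follow_symlinks=False):
            total += get_tree_size(entry.path)
        else:
            # On Windows this can come straight from the directory listing;
            # on POSIX it typically costs one lstat() call per entry.
            total += entry.stat(follow_symlinks=False).st_size
    return total
```

The per-platform comments reflect the speed-up the quoted PEP text describes: the size is free on Windows, while on Linux and OS X each file still needs a stat call.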
Re: [Python-Dev] Binary CPython distribution for Linux
- Original Message -
> While much of the opposition to dropping Python <2.7 stems from the RHEL
> community (they still have 2.4 in extended support and 2.7 wasn't in a
> release until a few weeks ago), a common objection from the users is "I
> can't install a different Python" or "it's too difficult to install a
> different Python." The former is a legit complaint - if you are on
> shared hosting and don't have root, as easy as it is to add an alternate
> package repository that provides 2.7 (or newer), you don't have the
> permissions so you can't do it.

It's not true that 2.7 wasn't released until a few weeks ago. It was released a few weeks ago as part of RHEL 7, but Red Hat has been shipping Red Hat Software Collections (RHSCL) 1.0, which contains Python 2.7 and Python 3.3, for almost a year now [1]. RHSCL is installable on RHEL 6; RHSCL 1.1 (also with 2.7 and 3.3) was released a few weeks ago and is supported on RHEL 6 and 7. Also, these collections now have community rebuilds at [2], so you can just download them without needing to talk to Red Hat at all. But yeah, these are all RPMs, so you have to be root to install them.

> I'd like to propose a solution to this problem: a pre-built distribution
> of CPython for Linux available via www.python.org in the list of
> downloads for a particular release [5]. This distribution could be
> downloaded and unarchived into the user's home directory and users could
> start running it immediately by setting an environment variable or two,
> creating a symlink, or even running a basic installer script. This would
> hopefully remove the hurdles of obtaining a (sane) Python distribution
> on Linux. This would allow projects to more easily drop end-of-life
> Python versions and would speed adoption of modern Python, including
> Python 3 (because porting is much easier if you only have to target 2.7).
>
> I understand there may be technical challenges with doing this for some
> distributions and with producing a universal binary distribution. I
> would settle for a binary distribution that was targeted towards RHEL
> users and variant distros, as that is the user population that I
> perceive to be the most conservative and responsible for holding modern
> Python adoption back.

Speaking with my Fedora/RHEL/RHSCL Python maintainer's hat on, prebuilding Python is not as easy a task as it may seem :) Someone has to write the build scripts (e.g. some sort of specfile, but rpm/specfile wouldn't really work for you, since you want to install into users' home dirs). Someone has to update them when a new Python comes out, so in the worst case you end up with slightly different build scripts for different versions of Python. Someone has to do rebuilds when there is a CVE. Or a bug. Or a user requests a feature that makes sense. Someone has to do that for *each packaged version* - and each packaged version needs to be maintained for some amount of time so that it all actually makes sense.

Maintaining a prebuilt distribution of Python is a time-consuming task even if you do it just for one Linux distro. If you want to maintain a *universal* prebuilt Python distribution, then you'll find that it's either a) undoable, or b) consumes so many resources and is so fragile that it's probably not worth it. You could just bundle all Python dependencies into your distribution to make it "easier", but that would just make the result grow in size (perhaps significantly), and you would then also need to update/bugfix/securityfix the bundled dependencies (which would consume even more time).

Please don't take this as a criticism of your ideas, I see what you're trying to solve. I just think the way you're trying to solve it is unachievable or would consume so many community resources that it would end up unmaintained and buggy most of the time.

--
Regards,
Bohuslav "Slavek" Kabrda.

[1] http://developerblog.redhat.com/2013/09/12/rhscl1-ga/
[2] https://www.softwarecollections.org/en/scls/