Re: [Python-Dev] Add a new locale codec?
I think there's a general expectation that if you encode something with one codec you will be able to decode it with the same codec. That's not necessarily true for the locale encoding. There is the same problem with the filesystem encoding (sys.getfilesystemencoding()), which is the user locale encoding (LC_ALL, LANG or LC_CTYPE) or the Windows ANSI code page. If you wrote a file using this encoding, you may not be able to read it if the filesystem encoding changes between two runs, or on another computer. I agree that it is more surprising because the current locale encoding can change at any time, not only between two runs or when you use another computer. Don't you think that this special behaviour can be documented? Victor ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
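The round-trip hazard Victor describes can be sketched in a few lines; the encoding pair here (Latin-1 at write time, UTF-8 at read time) is only illustrative, any two incompatible encodings show the same failure:

```python
import locale

# Data written while the effective locale encoding was Latin-1...
data = "café".encode("latin-1")

# ...may fail to decode after the locale encoding has changed to UTF-8:
try:
    data.decode("utf-8")
except UnicodeDecodeError as exc:
    print("round-trip failed:", exc)

# A program that wants a stable round-trip has to capture the encoding
# that was actually in effect at write time, e.g.:
write_encoding = locale.getpreferredencoding(False)
```

A codec named "locale" would hide that capture step, which is exactly the surprise being debated.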
Re: [Python-Dev] [Python-checkins] cpython: PEP 410
changeset: 74832:f8409b3d6449 user: Victor Stinner victor.stin...@haypocalc.com date: Wed Feb 08 14:31:50 2012 +0100 summary: PEP 410 Ah, even when written by a core dev, a PEP should still be at Accepted before we check anything in. PEP 410 is still at Draft. Never mind, I just saw the checkin that reverted the change. Yeah, I should use a clone of the repository instead of always working in the same repository. I pushed the commit by mistake. It is difficult to manipulate such a huge patch. I just created a clone on my computer to avoid similar mistakes :-) Victor
Re: [Python-Dev] A new dictionary implementation
francis wrote: Hi Mark, I've just cloned: Repository: https://bitbucket.org/markshannon/cpython_new_dict Do please try it on your machine(s). that's a: Linux random 3.1.0-1-amd64 #1 SMP Tue Jan 10 05:01:58 UTC 2012 x86_64 GNU/Linux and I'm getting:

gcc -pthread -c -Wno-unused-result -g -O0 -Wall -Wstrict-prototypes -I. -I./Include -DPy_BUILD_CORE -o Objects/dictobject.o Objects/dictobject.c
gcc -pthread -c -Wno-unused-result -g -O0 -Wall -Wstrict-prototypes -I. -I./Include -DPy_BUILD_CORE -o Objects/memoryobject.o Objects/memoryobject.c
Objects/dictobject.c: In function ‘dict_popitem’:
Objects/dictobject.c:2208:5: error: ‘PyDictKeyEntry’ has no member named ‘me_value’
make: *** [Objects/dictobject.o] Error 1
make: *** Waiting for unfinished jobs

Bah... typo in assert statement. My fault for not testing the debug build (release build worked fine). Both builds working now. Cheers, Mark.
Re: [Python-Dev] A new dictionary implementation
On 08/02/2012 15:16, Mark Shannon wrote: Hi, Version 2 is now available. Version 2 makes as few changes to tunable constants as possible, and generally does not change iteration order (so repr() is unchanged). All tests pass (the only changes to tests are for sys.getsizeof()). Repository: https://bitbucket.org/markshannon/cpython_new_dict Issue http://bugs.python.org/issue13903 Performance changes are basically zero for non-OO code. Average -0.5% speed change on 2n3 benchmarks, a few benchmarks show a small reduction in memory use. (see notes below) GCbench uses 47% less memory and is 12% faster. 2to3, which seems to be the only realistic benchmark that runs on Py3, shows no change in speed and uses 10% less memory. In your first version 2to3 used 28% less memory. Do you know why it's worse in this version? Michael All benchmarks and tests performed on an old, slow 32-bit machine with Linux. Do please try it on your machine(s). If accepted, the new dict implementation will allow a useful optimisation of the LOAD_GLOBAL (and possibly LOAD_ATTR) bytecode: by testing whether the (immutable) keys-table is the expected table, the value can be accessed directly by index, rather than by name. Cheers, Mark. Notes: All benchmarks from http://hg.python.org/benchmarks/ using the -m flag to get memory usage data. I've ignored the json benchmarks, which show unstable behaviour on my machine. Tiny changes to the dict being serialized or to the random seed can change the relative speed of my implementation vs CPython from -25% to +10%. -- http://www.voidspace.org.uk/ May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give.
-- the sqlite blessing http://www.sqlite.org/different.html
Re: [Python-Dev] [Python-checkins] cpython: PEP 410
On Thu, Feb 9, 2012 at 7:32 PM, Victor Stinner victor.stin...@haypocalc.com wrote: changeset: 74832:f8409b3d6449 user: Victor Stinner victor.stin...@haypocalc.com date: Wed Feb 08 14:31:50 2012 +0100 summary: PEP 410 Ah, even when written by a core dev, a PEP should still be at Accepted before we check anything in. PEP 410 is still at Draft. Never mind, I just saw the checkin that reverted the change. Yeah, I should use a clone of the repository instead of always working in the same repository. I pushed the commit by mistake. It is difficult to manipulate such a huge patch. I just created a clone on my computer to avoid similar mistakes :-) I maintain a separate sandbox clone for the same reason. I think I'm finally starting to get the hang of the mq extension for working with smaller, not-yet-ready changes, too. Cheers, Nick. -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
Re: [Python-Dev] Add a new locale codec?
On Thu, 9 Feb 2012 08:43:02 +0200 Simon Cross hodgestar+python...@gmail.com wrote: On Thu, Feb 9, 2012 at 2:35 AM, Steven D'Aprano st...@pearwood.info wrote: Simon Cross wrote: I think I'm -1 on a locale encoding because it refers to different actual encodings depending on where and when it's run, which seems surprising Why is it surprising? Surely that's the whole point of a locale encoding: to use the locale encoding, whatever that happens to be at the time. I think there's a general expectation that if you encode something with one codec you will be able to decode it with the same codec. That's not necessarily true for the locale encoding. As And pointed out, this is already the behaviour of the mbcs codec under Windows. locale would be the moral (*) equivalent of that under Unix. (*) or perhaps immoral :-) Regards Antoine.
Re: [Python-Dev] Add a new locale codec?
2012/2/9 Antoine Pitrou solip...@pitrou.net I think there's a general expectation that if you encode something with one codec you will be able to decode it with the same codec. That's not necessarily true for the locale encoding. As And pointed out, this is already the behaviour of the mbcs codec under Windows. locale would be the moral (*) equivalent of that under Unix. With the difference that mbcs cannot change during execution. I don't even know if it is possible to change it at all, except by reinstalling Windows. -- Amaury Forgeot d'Arc
Re: [Python-Dev] Add a new locale codec?
With the difference that mbcs cannot change during execution. It is possible to change the thread ANSI code page (CP_THREAD_ACP) at runtime, but setting the system ANSI code page (CP_ACP) requires restarting Windows. I don't even know if it is possible to change it at all, except by reinstalling Windows. The system ANSI code page can be set in the regional dialog of the control panel. If I remember correctly, it is misleadingly labelled the language. Victor
Re: [Python-Dev] Add a new locale codec?
As And pointed out, this is already the behaviour of the mbcs codec under Windows. locale would be the moral (*) equivalent of that under Unix. On Windows, the ANSI code page codec will be accessible using 3 different names: locale, mbcs and the real encoding name (sys.getfilesystemencoding())! Victor
[Python-Dev] patch
patch Description: Binary data
Re: [Python-Dev] requirements for moving __import__ over to importlib?
On Wed, Feb 8, 2012 at 20:28, PJ Eby p...@telecommunity.com wrote: On Wed, Feb 8, 2012 at 4:08 PM, Brett Cannon br...@python.org wrote: On Wed, Feb 8, 2012 at 15:31, Terry Reedy tjre...@udel.edu wrote: For top-level imports, unless *all* are made lazy, then there *must* be some indication in the code of whether to make it lazy or not. Not true; importlib would make it dead-simple to whitelist what modules to make lazy (e.g. your app code lazy but all stdlib stuff not, etc.). There are actually only a few things stopping all imports from being lazy. from x import y immediately de-lazies them, after all. ;-) The main two reasons you wouldn't want imports to *always* be lazy are: 1. Changing sys.path or other parameters between the import statement and the actual import 2. ImportErrors are likewise deferred until point-of-use, so conditional importing with try/except would break. This actually depends on the type of ImportError. My current solution would trigger an ImportError at the import statement if no finder could locate the module. But if some ImportError was raised because of some other issue during load then that would come up at first use.
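The behaviour Brett describes — the finder runs eagerly at the import statement, while module *execution* is deferred to first attribute access — can be illustrated with today's importlib. Note that `importlib.util.LazyLoader` post-dates this thread (it arrived in Python 3.5), so this is a modern sketch of the idea under discussion, not the code being developed at the time:

```python
import importlib.util
import sys

def lazy_import(name):
    """Return module `name`, executing its body only on first attribute access."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        raise ImportError(name)              # missing module fails here, eagerly
    spec.loader = importlib.util.LazyLoader(spec.loader)
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    spec.loader.exec_module(module)          # deferred: no module code runs yet
    return module

json = lazy_import("json")                   # cheap: nothing executed so far
print(json.dumps({"x": 1}))                  # first access triggers the real load
```

This matches Brett's design: a `try/except ImportError` around the import statement still catches the "no finder could locate it" case, but an error raised *during* the module body only surfaces at first use.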
Re: [Python-Dev] Add a new locale codec?
Victor Stinner writes: There is the same problem [that encode-decode with the 'locale' codec doesn't roundtrip reliably] with the filesystem encoding (sys.getfilesystemencoding()), -1 on a query to the OS that pretends to be a constant. You see, it's not the same problem. The difference is that 'locale' is a constant and should correspond to a constant encoding, while 'sys.getfilesystemencoding()' is a library function that queries the system, and it's obvious from the syntax that this is expected to change in various circumstances, so if you want roundtripping you need to save the result. Having a nondeterministic locale codec is just begging application (and maybe a few middleware) programmers to use it everywhere they don't feel like thinking about I18N. Experience shows that that is everywhere! If this is needed, it should be spelled os.getlocaleencoding() (or sys.getlocaleencoding()?) Possibly there should be corresponding getlocalelanguage(), getlocaleregion(), and getlocalemodifier() functions, and they should take an optional string argument whose appropriate component is returned. Or maybe there should be a parselocalestring() function that returns a named tuple. Or maybe this three-line function doesn't need to be a builtin?
Re: [Python-Dev] requirements for moving __import__ over to importlib?
On Wed, Feb 8, 2012 at 20:26, Nick Coghlan ncogh...@gmail.com wrote: On Thu, Feb 9, 2012 at 2:09 AM, Antoine Pitrou solip...@pitrou.net wrote: I guess my point was: why is there a function call in that case? The import statement could look up sys.modules directly. Or the built-in __import__ could still be written in C, and only defer to importlib when the module isn't found in sys.modules. Practicality beats purity. I quite like the idea of having the builtin __import__ be a *very* thin veneer around importlib that just does the "is this in sys.modules already so we can just return it from there?" check and delegates other more complex cases to Python code in importlib. Poking around in importlib.__import__ [1] (as well as importlib._gcd_import), I'm thinking what we may want to do is break up the logic a bit so that there are multiple helper functions that a C version can call back into, so that we can optimise certain simple code paths to not call back into Python at all, and others to only do so selectively.

Step 1: separate out the fromlist processing from __import__ into a separate helper function

def _process_fromlist(module, fromlist):
    # Perform any required imports as per existing code:
    # http://hg.python.org/cpython/file/aba513307f78/Lib/importlib/_bootstrap.py#l987

Fine by me.

Step 2: separate out the relative import resolution from _gcd_import into a separate helper function.

def _resolve_relative_name(name, package, level):
    assert hasattr(name, 'rpartition')
    assert hasattr(package, 'rpartition')
    assert level > 0
    name = ...  # Recalculate as per the existing code:
    # http://hg.python.org/cpython/file/aba513307f78/Lib/importlib/_bootstrap.py#l889
    return name

I was actually already thinking of exposing this as importlib.resolve_name(), so breaking it out makes sense. I also think it might be possible to expose a sort of importlib.find_module() that does nothing more than find the loader for a module (if available).
Step 3: Implement builtin __import__ in C (pseudo-code below):

def __import__(name, globals={}, locals={}, fromlist=[], level=0):
    if level > 0:
        name = importlib._resolve_relative_import(name)
    try:
        module = sys.modules[name]
    except KeyError:
        # Not cached yet, need to invoke the full import machinery.
        # We already resolved any relative imports though, so
        # treat it as an absolute import.
        return importlib.__import__(name, globals, locals, fromlist, 0)
    # Got a hit in the cache, see if there's any more work to do
    if not fromlist:
        # Duplicate relevant importlib.__import__ logic as C code
        # to find the right module to return from sys.modules
        pass
    elif hasattr(module, "__path__"):
        importlib._process_fromlist(module, fromlist)
    return module

This would then be similar to the way main.c already works when it interacts with runpy - simple cases are handled directly in C, more complex cases get handed over to the Python module. I suspect that if people want the case where you load from bytecode to be fast then this will have to expand beyond this to include C functions and/or classes which can be used as accelerators; while this accelerates the common case of sys.modules, this (probably) won't make Antoine happy enough for importing a small module from bytecode (importing large modules like decimal is already fast enough).
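The sys.modules fast path that the pseudo-code above leans on is observable from Python today — a repeated import returns the cached module object without re-running any finders or loaders (a small demonstration, not Nick's proposed C code):

```python
import sys

import json                      # first import: full machinery runs
cached = __import__("json")      # subsequent call: pure sys.modules lookup
print("cache hit:", cached is sys.modules["json"] and cached is json)
```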
Re: [Python-Dev] requirements for moving __import__ over to importlib?
On Feb 9, 2012 9:58 AM, Brett Cannon br...@python.org wrote: This actually depends on the type of ImportError. My current solution actually would trigger an ImportError at the import statement if no finder could locate the module. But if some ImportError was raised because of some other issue during load then that would come up at first use. That's not really a lazy import then, or at least not as lazy as what Mercurial or PEAK use for general lazy importing. If you have a lot of them, that module-finding time really adds up. Again, the goal is fast startup of command-line tools that only use a small subset of the overall framework; doing disk access for lazy imports goes against that goal.
Re: [Python-Dev] folding cElementTree behind ElementTree in 3.3
2012/2/8 Nick Coghlan ncogh...@gmail.com On Wed, Feb 8, 2012 at 10:04 PM, Antoine Pitrou solip...@pitrou.net wrote: It's not frozen, it's actually maintained. Indeed, it sounds like the most appropriate course (if we don't hear otherwise from Fredrik) may be to just update PEP 360 to acknowledge current reality (i.e. the most current release of ElementTree is actually the one maintained by Florent in the stdlib). Actually, it was part of my learning curve in Python development, as you can see on the thread of the issue http://bugs.python.org/issue6472 . I spent some time between December 2009 and March 2010 merging the experimental 1.3 into the standard library, both for 2.7 and 3.2. Upstream, there were 2 different test suites for the Python and the C implementation, but I merged them into a single test suite, and I've patched the C accelerator to conform to the same behaviour as the Python reference module. With the knowledge I acquired, I chased some other bugs related to ElementTree at the same time. With the feedback and some support coming from Antoine, Fredrik and Stefan we shaped a decent ElementTree 1.3 for the standard library. I am not aware of any effort to maintain the ElementTree package outside of the standard library since I did this merge. So, in the current state, we could consider the standard library package as the most up-to-date and stable version of ElementTree. I concur with Eli's proposal to make the C accelerator the default: the test suite ensures that both implementations behave the same. I cannot commit to the long-term maintenance of ElementTree in the standard library, both because I don't have a strong interest in XML parsing, and because I have many other projects which keep me away from core Python development for long periods of time. However, I think it is a good thing if all the packages which are part of the standard library follow the same rules.
We should try to find an agreement with Fredrik, explicit or implicit, which delegates the evolution and the maintenance of ElementTree to the Python community. IIRC, we have other examples in the standard library where community support helped a lot to refresh a package whose original maintainer did not have enough time to pursue the work. I'll note that this change isn't *quite* as simple as Eli's description earlier in the thread may suggest, though - the test suite also needs to be updated to ensure that the Python version is still fully exercised without the C acceleration applied. And such an alteration would definitely be an explicit fork, even though the user-facing API doesn't change - we're changing the structure of the code in a way that means some upstream deltas (if they happen to occur) may not apply cleanly. The test suite is a de facto fork of the upstream test suites, since the upstream test suites do not guarantee the same behaviour between cElementTree and ElementTree. -- Florent Xicluna
Re: [Python-Dev] requirements for moving __import__ over to importlib?
On Thu, Feb 9, 2012 at 13:43, PJ Eby p...@telecommunity.com wrote: On Feb 9, 2012 9:58 AM, Brett Cannon br...@python.org wrote: This actually depends on the type of ImportError. My current solution actually would trigger an ImportError at the import statement if no finder could locate the module. But if some ImportError was raised because of some other issue during load then that would come up at first use. That's not really a lazy import then, or at least not as lazy as what Mercurial or PEAK use for general lazy importing. If you have a lot of them, that module-finding time really adds up. Again, the goal is fast startup of command-line tools that only use a small subset of the overall framework; doing disk access for lazy imports goes against that goal. Depends if you consider stat calls the overhead vs. the actual disk read/write to load the data. Anyway, this is going to lead down to a discussion/argument over design parameters which I'm not up to having since I'm not actively working on a lazy loader for the stdlib right now.
Re: [Python-Dev] A new dictionary implementation
Hi Mark, Bah... typo in assert statement. My fault for not testing the debug build (release build worked fine). Both builds working now. Yeah, now it's working and passes all tests also on my machine. I've tried to run the benchmark suite but I'm getting a SyntaxError (maybe you know; it's just the first time I've tried the tool):

=
ci@random:~/prog/cpython/benchmarks$ python perf.py -r -b apps python ../cpython_new_dict/python
Running 2to3...
INFO:root:Running ../cpython_new_dict/python lib/2to3/2to3 -f all lib/2to3_data
Traceback (most recent call last):
  File "perf.py", line 2236, in <module>
    main(sys.argv[1:])
  File "perf.py", line 2192, in main
    options)))
  File "perf.py", line 1279, in BM_2to3
    return SimpleBenchmark(Measure2to3, *args, **kwargs)
  File "perf.py", line 706, in SimpleBenchmark
    *args, **kwargs)
  File "perf.py", line 1275, in Measure2to3
    return MeasureCommand(command, trials, env, options.track_memory)
  File "perf.py", line 1223, in MeasureCommand
    CallAndCaptureOutput(command, env=env)
  File "perf.py", line 1053, in CallAndCaptureOutput
    raise RuntimeError(u"Benchmark died: " + unicode(stderr, 'ascii'))
RuntimeError: Benchmark died: Traceback (most recent call last):
  File "lib/2to3/2to3", line 3, in <module>
    from lib2to3.main import main
  File "/home/ci/prog/cpython/benchmarks/lib/2to3/lib2to3/main.py", line 47
    except os.error, err:
                   ^
SyntaxError: invalid syntax
=

And the baseline is: Python 2.7.2+ (but it also gives me a SyntaxError running on python3 default (e50db1b7ad7b)). What am I doing wrong? (from its doc: "This project is intended to be an authoritative source of benchmarks for all Python implementations.") Thanks in advance! francis
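For context on the SyntaxError itself (independent of the benchmark harness): `except os.error, err:` is Python 2's except syntax, which Python 3 rejects at compile time; Python 3 spells it `except OSError as err:`. A minimal reproduction:

```python
# Python 2 spelling: rejected by the Python 3 compiler.
py2_src = "try:\n    pass\nexcept OSError, err:\n    pass\n"
try:
    compile(py2_src, "<sample>", "exec")
except SyntaxError as exc:
    print("Python 3 rejects the old form:", exc.msg)

# Python 3 spelling: compiles fine.
py3_src = "try:\n    pass\nexcept OSError as err:\n    pass\n"
compile(py3_src, "<sample>", "exec")
```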
Re: [Python-Dev] peps: Update with bugfix releases.
In article 4f32df1e.40...@v.loewis.de, Martin v. Löwis mar...@v.loewis.de wrote: On 05.02.2012 21:34, Ned Deily wrote: In article 20120205204551.horde.ncdeyvnncxdpltxvnkzi...@webmail.df.eu, mar...@v.loewis.de wrote: I understand that but, to me, it makes no sense to send out truly broken releases. Besides, the hash collision attack is not exactly new either. Another few weeks can't make that much of a difference. Why would the release be truly broken? It surely can't be worse than the current releases (which apparently aren't truly broken, else there would have been no point in releasing them back then). They were broken by the release of OS X 10.7 and Xcode 4.2, which were subsequent to the previous releases. None of the currently available python.org installers provide a fully working system on OS X 10.7, or on OS X 10.6 if the user has installed Xcode 4.2 for 10.6. In what way are the current releases not fully working? Are you referring to issues with building extension modules? One problem I've run into is that the 64-bit Mac python 2.7 does not work properly with ActiveState Tcl/Tk. One symptom appears when building matplotlib: the resulting build fails -- both versions of Tcl/Tk somehow get linked in. We have had similar problems with the 32-bit python.org python in the past, but recent builds have been fine. I believe the solution that worked for the 32-bit versions was to install ActiveState Tcl/Tk before making the distribution build. The results would work fine with Apple's Tcl/Tk or with ActiveState Tcl/Tk. I don't know if the same solution would work for 64-bit python. I don't know of any issues with the 32-bit build of Python 2.7. I've not tried the Python 3 builds. -- Russell
Re: [Python-Dev] requirements for moving __import__ over to importlib?
On Thu, 9 Feb 2012 14:19:59 -0500 Brett Cannon br...@python.org wrote: On Thu, Feb 9, 2012 at 13:43, PJ Eby p...@telecommunity.com wrote: Again, the goal is fast startup of command-line tools that only use a small subset of the overall framework; doing disk access for lazy imports goes against that goal. Depends if you consider stat calls the overhead vs. the actual disk read/write to load the data. Anyway, this is going to lead down to a discussion/argument over design parameters which I'm not up to having since I'm not actively working on a lazy loader for the stdlib right now. For those of you not watching -ideas, or ignoring the Python TIOBE -3% discussion, this would seem to be relevant to any discussion of reworking the import mechanism: http://mail.scipy.org/pipermail/numpy-discussion/2012-January/059801.html mike -- Mike Meyer m...@mired.org http://www.mired.org/ Independent Software developer/SCM consultant, email for more information. O ascii ribbon campaign - stop html mail - www.asciiribbon.org
Re: [Python-Dev] requirements for moving __import__ over to importlib?
On 2/9/2012 11:53 AM, Mike Meyer wrote: On Thu, 9 Feb 2012 14:19:59 -0500 Brett Cannonbr...@python.org wrote: On Thu, Feb 9, 2012 at 13:43, PJ Ebyp...@telecommunity.com wrote: Again, the goal is fast startup of command-line tools that only use a small subset of the overall framework; doing disk access for lazy imports goes against that goal. Depends if you consider stat calls the overhead vs. the actual disk read/write to load the data. Anyway, this is going to lead down to a discussion/argument over design parameters which I'm not up to having since I'm not actively working on a lazy loader for the stdlib right now. For those of you not watching -ideas, or ignoring the Python TIOBE -3% discussion, this would seem to be relevant to any discussion of reworking the import mechanism: http://mail.scipy.org/pipermail/numpy-discussion/2012-January/059801.html mike So what is the implication here? That building a cache of module locations (cleared when a new module is installed) would be more effective than optimizing the search for modules on every invocation of Python?
Re: [Python-Dev] Add a new locale codec?
On Fri, Feb 10, 2012 at 12:59 AM, Stephen J. Turnbull step...@xemacs.org wrote: If this is needed, it should be spelled os.getlocaleencoding() (or sys.getlocaleencoding()?) Or locale.getpreferredencoding(), even ;) FWIW, I agree with Stephen on this one, but take that with the grain of salt that I could probably decode most of the strings I work with as ASCII without breaking anything. Cheers, Nick. -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
Re: [Python-Dev] requirements for moving __import__ over to importlib?
On 2/9/2012 3:27 PM, Glenn Linderman wrote: On 2/9/2012 11:53 AM, Mike Meyer wrote: On Thu, 9 Feb 2012 14:19:59 -0500 Brett Cannonbr...@python.org wrote: On Thu, Feb 9, 2012 at 13:43, PJ Ebyp...@telecommunity.com wrote: Again, the goal is fast startup of command-line tools that only use a small subset of the overall framework; doing disk access for lazy imports goes against that goal. Depends if you consider stat calls the overhead vs. the actual disk read/write to load the data. Anyway, this is going to lead down to a discussion/argument over design parameters which I'm not up to having since I'm not actively working on a lazy loader for the stdlib right now. For those of you not watching -ideas, or ignoring the Python TIOBE -3% discussion, this would seem to be relevant to any discussion of reworking the import mechanism: http://mail.scipy.org/pipermail/numpy-discussion/2012-January/059801.html For 32k processes on BlueGene/P, importing 100 trivial C-extension modules takes 5.5 hours, compared to 35 minutes for all other interpreter loading and initialization. We developed a simple pure-Python module (based on knee.py, a hierarchical import example) that cuts the import time from 5.5 hours to 6 minutes. So what is the implication here? That building a cache of module locations (cleared when a new module is installed) would be more effective than optimizing the search for modules on every invocation of Python? -- Terry Jan Reedy
Re: [Python-Dev] peps: Update with bugfix releases.
In article rowen-ba4fcf.11522909022...@news.gmane.org, Russell E. Owen ro...@uw.edu wrote: One problem I've run into is that the 64-bit Mac python 2.7 does not work properly with ActiveState Tcl/Tk. One symptom is to build matplotlib. The results fail -- both versions of Tcl/Tk somehow get linked in. The 64-bit OS X installer is built on and tested on systems with A/S Tcl/Tk 8.5.x and we explicitly recommend its use when possible. http://www.python.org/download/mac/tcltk/ Please open a python bug for this and any other issues you know of regarding the use with current A/S Tcl/Tk 8.5.x with current 2.7.x or 3.2.x installers on OS X 10.6 or 10.7. -- Ned Deily, n...@acm.org
Re: [Python-Dev] Add a new locale codec?
If this is needed, it should be spelled os.getlocaleencoding() (or sys.getlocaleencoding()?) There is already a locale.getpreferredencoding(False) function which gives you the current locale encoding. The problem is that the current locale encoding may change, and so you have to fetch the new value each time you want to encode or decode data. Victor
Re: [Python-Dev] requirements for moving __import__ over to importlib?
On Thu, Feb 9, 2012 at 2:53 PM, Mike Meyer m...@mired.org wrote: For those of you not watching -ideas, or ignoring the Python TIOBE -3% discussion, this would seem to be relevant to any discussion of reworking the import mechanism: http://mail.scipy.org/pipermail/numpy-discussion/2012-January/059801.html Interesting. This gives me an idea for a way to cut stat calls per sys.path entry per import by roughly 4x, at the cost of a one-time directory read per sys.path entry. That is, an importer created for a particular directory could, upon first use, cache a frozenset(listdir()), and the stat().st_mtime of the directory. All the filename checks could then be performed against the frozenset, and the st_mtime of the directory only checked once per import, to verify whether the frozenset() needed refreshing. Since a failed module lookup takes at least 5 stat checks (pyc, pyo, py, directory, and compiled extension (pyd/so)), this cuts it down to only 1, at the price of a listdir(). The big question is how long does a listdir() take, compared to a stat() or failed open()? That would tell us whether the tradeoff is worth making. I did some crude timeit tests on frozenset(listdir()) and trapping failed stat calls. It looks like, for a Windows directory the size of the 2.7 stdlib, you need about four *failed* import attempts to overcome the initial caching cost, or about 8 successful bytecode imports. (For Linux, you might need to double these numbers; my tests showed a different ratio there, perhaps due to the Linux stdlib I tested having nearly twice as many directory entries as the directory I tested on Windows!) However, the numbers are much better for application directories than for the stdlib, since they are located earlier on sys.path.
Every successful stdlib import in an application is equal to one failed import attempt for every preceding directory on sys.path, so as long as the average directory on sys.path isn't vastly larger than the stdlib, and the average application imports at least four modules from the stdlib (on Windows, or 8 on Linux), there would be a net performance gain for the application as a whole. (That is, there'd be an improved per-sys.path entry import time for stdlib modules, even if not for any application modules.) For smaller directories, the tradeoff actually gets better. A directory one seventh the size of the 2.7 Windows stdlib has a listdir() that's proportionately faster, but failed stats() in that directory are *not* proportionately faster; they're only somewhat faster. This means that it takes fewer failed module lookups to make caching a win - about 2 in this case, vs. 4 for the stdlib. Now, these numbers are with actual disk or network access abstracted away, because the data's in the operating system cache when I run the tests. It's possible that this strategy could backfire if you used, say, an NFS directory with ten thousand files in it as your first sys.path entry. Without knowing the timings for listdir/stat/failed stat in that setup, it's hard to say how many stdlib imports you need before you come out ahead. When I tried a directory about 7 times larger than the stdlib, creating the frozenset took 10 times as long, but the cost of a failed stat didn't go up by very much. This suggests that there's probably an optimal directory size cutoff for this trick; if only there were some way to check the size of a directory without reading it, we could turn off the caching for oversize directories, and get a major speed boost for everything else. On most platforms, the stat().st_size of the directory itself will give you some idea, but on Windows that's always zero. 
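The crude timeit comparison described above can be approximated with something like the following sketch (the directory path is a placeholder; the absolute numbers depend heavily on platform, directory size, and OS cache state):

```python
import os
import timeit

directory = "."  # substitute a sys.path entry of interest

# One-time cost of the proposed cache: snapshot the directory listing.
listdir_cost = timeit.timeit(
    lambda: frozenset(os.listdir(directory)), number=1000)

# Per-probe cost today: a stat() that fails because the candidate
# file (e.g. a missing .pyc/.pyo/.py) does not exist.
missing = os.path.join(directory, "no_such_module.py")

def failed_stat():
    try:
        os.stat(missing)
    except OSError:
        pass

stat_cost = timeit.timeit(failed_stat, number=1000)

# Rough break-even point: how many failed stat probes one
# directory snapshot pays for.
break_even = listdir_cost / stat_cost
```

Since a failed module lookup costs several such probes per sys.path entry, a break-even ratio in the single digits is what makes the caching worthwhile.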
On Windows, we could work around that by using a lower-level API than listdir() and simply stop reading the directory if we hit the maximum number of entries we're willing to build a cache for, and then call it off. (Another possibility would be to explicitly enable caching by putting a flag file in the directory, or perhaps by putting a special prefix on the sys.path entry, setting the cutoff in an environment variable, etc.) In any case, this seems really worth a closer look: in non-pathological cases, it could make directory-based importing as fast as zip imports are. I'd be especially interested in knowing how the listdir/stat/failed stat ratios work on NFS - ISTM that they might be even *more* conducive to this approach, if setup latency dominates the cost of individual system calls. If this works out, it'd be a good example of why importlib is a good idea; i.e., allowing us to play with ideas like this. Brett, wouldn't you love to be able to say importlib is *faster* than the old C-based importing? ;-)
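For illustration, the caching strategy sketched in this post might look roughly like the following (the class and method names are invented; a real importer would implement the PEP 302 finder protocol on top of this):

```python
import os

class CachedDirectoryFinder:
    """Sketch of the strategy above: snapshot a directory once and
    answer existence checks from the snapshot, refreshing it only
    when the directory's mtime changes."""

    def __init__(self, path):
        self.path = path
        self._mtime = None
        self._entries = frozenset()

    def _refresh(self):
        # One stat() of the directory per import, instead of one
        # failed stat() per candidate filename.
        mtime = os.stat(self.path).st_mtime
        if mtime != self._mtime:
            # A single listdir() replaces many individual probes.
            self._entries = frozenset(os.listdir(self.path))
            self._mtime = mtime

    def contains(self, name):
        self._refresh()
        # Check every candidate filename against the cached set;
        # no per-candidate system call is needed.
        return any(name + suffix in self._entries
                   for suffix in (".py", ".pyc", ".pyo", ".so", ""))
```

This is only the lookup half of the idea; turning it into a working importer would also require hooking it into sys.path_hooks and handling packages, which is omitted here.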
Re: [Python-Dev] requirements for moving __import__ over to importlib?
On Thu, 9 Feb 2012 17:00:04 -0500 PJ Eby p...@telecommunity.com wrote: On Thu, Feb 9, 2012 at 2:53 PM, Mike Meyer m...@mired.org wrote: For those of you not watching -ideas, or ignoring the Python TIOBE -3% discussion, this would seem to be relevant to any discussion of reworking the import mechanism: http://mail.scipy.org/pipermail/numpy-discussion/2012-January/059801.html Interesting. This gives me an idea for a way to cut stat calls per sys.path entry per import by roughly 4x, at the cost of a one-time directory read per sys.path entry. Why do you even think this is a problem with stat calls?
Re: [Python-Dev] requirements for moving __import__ over to importlib?
On 2/9/12 10:15 PM, Antoine Pitrou wrote: On Thu, 9 Feb 2012 17:00:04 -0500 PJ Eby p...@telecommunity.com wrote: On Thu, Feb 9, 2012 at 2:53 PM, Mike Meyer m...@mired.org wrote: For those of you not watching -ideas, or ignoring the Python TIOBE -3% discussion, this would seem to be relevant to any discussion of reworking the import mechanism: http://mail.scipy.org/pipermail/numpy-discussion/2012-January/059801.html Interesting. This gives me an idea for a way to cut stat calls per sys.path entry per import by roughly 4x, at the cost of a one-time directory read per sys.path entry. Why do you even think this is a problem with stat calls? All he said is that reading about that problem and its solution gave him an idea about dealing with stat call overhead. The cost of stat calls has demonstrated itself to be a significant problem in other, more typical contexts. -- Robert Kern I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth. -- Umberto Eco
Re: [Python-Dev] A new dictionary implementation
francis wrote: Hi Mark, Bah... typo in assert statement. My fault for not testing the debug build (release build worked fine). Both builds working now. Yeah, now it is working and passes all tests on my machine too. I've tried to run the benchmark suite but I'm getting a SyntaxError (maybe you know; it's just the first time that I've tried the tool): = ci@random:~/prog/cpython/benchmarks$ python perf.py -r -b apps python ../cpython_new_dict/python Running 2to3... INFO:root:Running ../cpython_new_dict/python lib/2to3/2to3 -f all lib/2to3_data Traceback (most recent call last): File "perf.py", line 2236, in <module> main(sys.argv[1:]) File "perf.py", line 2192, in main options))) File "perf.py", line 1279, in BM_2to3 return SimpleBenchmark(Measure2to3, *args, **kwargs) File "perf.py", line 706, in SimpleBenchmark *args, **kwargs) File "perf.py", line 1275, in Measure2to3 return MeasureCommand(command, trials, env, options.track_memory) File "perf.py", line 1223, in MeasureCommand CallAndCaptureOutput(command, env=env) File "perf.py", line 1053, in CallAndCaptureOutput raise RuntimeError(u"Benchmark died: " + unicode(stderr, 'ascii')) RuntimeError: Benchmark died: Traceback (most recent call last): File "lib/2to3/2to3", line 3, in <module> from lib2to3.main import main File "/home/ci/prog/cpython/benchmarks/lib/2to3/lib2to3/main.py", line 47 except os.error, err: ^ SyntaxError: invalid syntax = And the baseline is: Python 2.7.2+ (but it also gives me a SyntaxError running on python3 default (e50db1b7ad7b)). What am I doing wrong? (from its doc: “This project is intended to be an authoritative source of benchmarks for all Python implementations.”) You need to convert the benchmarks to Python 3 using 2to3. Instructions are in the make_perf3.sh file. You may need to manually fix up the output as well :( Cheers, Mark.
Re: [Python-Dev] requirements for moving __import__ over to importlib?
On Thu, Feb 9, 2012 at 5:34 PM, Robert Kern robert.k...@gmail.com wrote: On 2/9/12 10:15 PM, Antoine Pitrou wrote: On Thu, 9 Feb 2012 17:00:04 -0500 PJ Eby p...@telecommunity.com wrote: On Thu, Feb 9, 2012 at 2:53 PM, Mike Meyer m...@mired.org wrote: For those of you not watching -ideas, or ignoring the Python TIOBE -3% discussion, this would seem to be relevant to any discussion of reworking the import mechanism: http://mail.scipy.org/pipermail/numpy-discussion/2012-January/059801.html Interesting. This gives me an idea for a way to cut stat calls per sys.path entry per import by roughly 4x, at the cost of a one-time directory read per sys.path entry. Why do you even think this is a problem with stat calls? All he said is that reading about that problem and its solution gave him an idea about dealing with stat call overhead. The cost of stat calls has demonstrated itself to be a significant problem in other, more typical contexts. Right. It was the part of the post that mentioned that all they sped up was knowing which directory the files were in, not the actual loading of bytecode. The thought then occurred to me that this could perhaps be applied to normal importing, as a zipimport-style speedup. (The zipimport module caches each zipfile directory it finds on sys.path, so failed import lookups are extremely fast.) It occurs to me, too, that applying the caching trick to *only* the stdlib directories would still be a win as soon as you have between four and eight site-packages (or user specific site-packages) imports in an application, so it might be worth applying unconditionally to system-defined stdlib (non-site) directories.
Re: [Python-Dev] requirements for moving __import__ over to importlib?
On Fri, Feb 10, 2012 at 1:05 AM, Brett Cannon br...@python.org wrote: This would then be similar to the way main.c already works when it interacts with runpy - simple cases are handled directly in C, more complex cases get handed over to the Python module. I suspect that if people want the case where you load from bytecode is fast then this will have to expand beyond this to include C functions and/or classes which can be used as accelerators; while this accelerates the common case of sys.modules, this (probably) won't make Antoine happy enough for importing a small module from bytecode (importing large modules like decimal are already fast enough). No, my suggestion of keeping a de minimis C implementation for the builtin __import__ is purely about ensuring the case of repeated imports (especially those nested inside functions) remains as fast as it is today. To speed up *first time* imports (regardless of their origin), I think it makes a lot more sense to use better algorithms at the importlib level, and that's much easier in Python than it is in C. It's not like we've ever been philosophically *opposed* to smarter approaches, it's just that import.c was already hairy enough and we had grave doubts about messing with it too much (I still have immense respect for the effort that Victor put in to sorting out most of its problems with Unicode handling). Not having that millstone hanging around our necks should open up *lots* of avenues for improvement without breaking backwards compatibility (since we can really do what we like, so long as the PEP 302 APIs are still invoked in the right order and the various public APIs remain backwards compatible). Cheers, Nick. -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
Re: [Python-Dev] requirements for moving __import__ over to importlib?
On 2/9/2012 7:19 PM, PJ Eby wrote: Right. It was the part of the post that mentioned that all they sped up was knowing which directory the files were in, not the actual loading of bytecode. The thought then occurred to me that this could perhaps be applied to normal importing, as a zipimport-style speedup. (The zipimport module caches each zipfile directory it finds on sys.path, so failed import lookups are extremely fast.) It occurs to me, too, that applying the caching trick to *only* the stdlib directories would still be a win as soon as you have between four and eight site-packages (or user specific site-packages) imports in an application, so it might be worth applying unconditionally to system-defined stdlib (non-site) directories. It might be worthwhile to store a single file in the directory that contains /Lib with the info import needs to get files in /Lib and its subdirs, and check that it is not outdated relative to /Lib. Since in Python 3, .pyc files go in __pycache__, if /Lib included an empty __pycache__ on installation, /Lib would never be touched on most installations. Ditto for the non-__pycache__ subdirs. -- Terry Jan Reedy
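A hypothetical sketch of that single-cache-file idea (the cache file name and the JSON format are invented for illustration; a real implementation would need to handle subdirectories and concurrent writers):

```python
import json
import os

CACHE_NAME = "lib-listing.cache"  # hypothetical file beside Lib/

def load_listing(lib_dir):
    """Return the set of entries in lib_dir, rebuilding the cache
    file whenever lib_dir itself is newer than the cached snapshot."""
    cache_path = os.path.join(os.path.dirname(lib_dir), CACHE_NAME)
    lib_mtime = os.stat(lib_dir).st_mtime
    try:
        with open(cache_path) as f:
            cached = json.load(f)
        if cached["mtime"] == lib_mtime:
            # Cache is current: no listdir() of lib_dir needed.
            return set(cached["entries"])
    except (OSError, ValueError, KeyError):
        pass  # missing or corrupt cache: rebuild below
    # Stale or absent cache: re-read the directory once and persist.
    entries = os.listdir(lib_dir)
    with open(cache_path, "w") as f:
        json.dump({"mtime": lib_mtime, "entries": entries}, f)
    return set(entries)
```

Because bytecode writes go into __pycache__ in Python 3, the mtime of Lib/ itself would rarely change after installation, so the cache would almost always be hit.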
Re: [Python-Dev] folding cElementTree behind ElementTree in 3.3
On Wed, Feb 8, 2012 at 10:04 PM, Antoine Pitrou solip...@pitrou.net wrote: It's not frozen, it's actually maintained. Indeed, it sounds like the most appropriate course (if we don't hear otherwise from Fredrik) may be to just update PEP 360 to acknowledge current reality (i.e. the most current release of ElementTree is actually the one maintained by Florent in the stdlib). Actually, it was part of my learning curve in Python development, as you can see on the thread of the issue http://bugs.python.org/issue6472 . I spent some time between December 2009 and March 2010 merging the experimental 1.3 into the standard library, both for 2.7 and 3.2. Upstream, there were 2 different test suites for the Python and the C implementation, but I merged them into a single test suite, and I've patched the C accelerator to conform to the same behaviour as the Python reference module. With the knowledge I acquired, I chased some other bugs related to ElementTree at the same time. With the feedback and some support coming from Antoine, Fredrik and Stefan we shaped a decent ElementTree 1.3 for the standard library. I am not aware of any effort to maintain the ElementTree package outside of the standard library since I did this merge. So, in the current state, we could consider the standard library package as the most up to date and stable version of ElementTree. I concur with Eli's proposal to set the C accelerator as the default: the test suite ensures that both implementations behave the same. I cannot commit myself to the long-term maintenance of ElementTree in the standard library, both because I don't have a strong interest in XML parsing, and because I have many other projects which keep me away from core Python development for long periods of time. However, I think it is a good thing if all the packages which are part of the standard library follow the same rules.
We should try to find an agreement with Fredrik, explicit or implicit, which delegates the evolution and the maintenance of ElementTree to the Python community. IIRC, we have other examples in the standard library where the community support helped a lot to refresh a package where the original maintainer did not have enough time to pursue its work. Thanks for the input, Florent. So, to paraphrase, there already are code changes in the stdlib version of ET/cET which are not upstream. You made it explicit about the tests, so the question is only left for the modules themselves. Is that right? Eli
Re: [Python-Dev] folding cElementTree behind ElementTree in 3.3
That said, I think that the particular change discussed in this thread can be made anyway, since it doesn't really modify ET's APIs or functionality, just the way it gets imported from stdlib. I would suggest that, assuming python-dev want to take ownership of the module, one last-ditch attempt be made to contact Fredrik. We should email him, and copy python-dev (and maybe even python-list) asking for his view, and ideally his blessing on the stdlib version being forked and maintained independently going forward. Put a time limit on responses (if we don't hear by XXX, we'll assume Fredrik is either uncontactable or not interested, and therefore we can go ahead with maintaining the stdlib version independently). It's important to respect Fredrik's wishes and ownership, but we can't leave part of the stdlib frozen and abandoned just because he's not available any longer. IMHO it's no longer a question of wanting to take ownership. According to Florent, this has already happened to some extent. Also, given the support history of ET outside stdlib, we can't in the same breath not take ownership and keep recommending this module. Lack of maintenance makes it a dead end, which is a shame given the choice of alternative modules for XML parsing in the stdlib. I don't mind sending Fredrik an email as you detailed. Any suggested things to include in it? Also, the most recent email (from 2009) of him I can find is fredrik at pythonware.com. If anyone knows of anything more up-to-date, please let me know. Eli