Re: [Python-Dev] PEP 402: Simplified Package Layout and Partitioning
Éric Araujo merwok at netwok.org writes:

> Besides, putting data files in a Python package is held very poorly by some (mostly people following the File Hierarchy Standard), and in distutils2/packaging, we (will) have a resources system that’s as convenient for users and more flexible for OS packagers. Using __file__ for more than information on the module is frowned upon for other reasons anyway (I talked about a Debian developer about this one day but forgot), so I think the limitation is okay.

The FHS does not apply in all scenarios - not all Python code is deployed/packaged at system level. For example, plug-ins (such as Django apps) are often not meant to be installed by a system-level packager. This might also be true in scenarios where Python is embedded into some other application. It's really useful to be able to co-locate packages with their data (e.g. in a zip file) and I don't think all instances of putting data files in a package are to be frowned upon.

Regards,

Vinay Sajip
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
[Python-Dev] PEP 3154 - pickle protocol 4
Hello,

This PEP is an attempt to foster a number of small incremental improvements in a future pickle protocol version. The PEP process is used in order to gather as many improvements as possible, because the introduction of a new protocol version should be a rare occurrence.

Feel free to suggest any additions.

Regards

Antoine.

http://www.python.org/dev/peps/pep-3154/

PEP: 3154
Title: Pickle protocol version 4
Version: $Revision$
Last-Modified: $Date$
Author: Antoine Pitrou solip...@pitrou.net
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 2011-08-11
Python-Version: 3.3
Post-History:
Resolution: TBD

Abstract
========

Data serialized using the pickle module must be portable across Python versions. It should also support the latest language features as well as implementation-specific features. For this reason, the pickle module knows about several protocols (currently numbered from 0 to 3), each of which appeared in a different Python version. Using a low-numbered protocol version allows exchanging data with old Python versions, while using a high-numbered protocol allows access to newer features and sometimes more efficient resource use (both CPU time required for (de)serializing, and disk size / network bandwidth required for data transfer).

Rationale
=========

The latest current protocol, coincidentally named protocol 3, appeared with Python 3.0 and supports the new incompatible features in the language (mainly, unicode strings by default and the new bytes object). The opportunity was not taken at the time to improve the protocol in other ways.

This PEP is an attempt to foster a number of small incremental improvements in a future new protocol version. The PEP process is used in order to gather as many improvements as possible, because the introduction of a new protocol version should be a rare occurrence.
Improvements in discussion
==========================

64-bit compatibility for large objects
--------------------------------------

Current protocol versions export object sizes for various built-in types (str, bytes) as 32-bit ints. This forbids serialization of large data [1]_. New opcodes are required to support very large bytes and str objects.

Native opcodes for sets and frozensets
--------------------------------------

Many common built-in types (such as str, bytes, dict, list, tuple) have dedicated opcodes to improve resource consumption when serializing and deserializing them; however, sets and frozensets don't. Adding such opcodes would be an obvious improvement. Also, dedicated set support could help remove the current impossibility of pickling self-referential sets [2]_.

Binary encoding for all opcodes
-------------------------------

The GLOBAL opcode, which is still used in protocol 3, uses the so-called text mode of the pickle protocol, which involves looking for newlines in the pickle stream. Looking for newlines is difficult to optimize on a non-seekable stream, and therefore a new version of GLOBAL (BINGLOBAL?) could use a binary encoding instead. It seems that all other opcodes emitted when using protocol 3 already use binary encoding.

Acknowledgments
===============

(...)

References
==========

.. [1] pickle not 64-bit ready: http://bugs.python.org/issue11564

.. [2] Cannot pickle self-referencing sets: http://bugs.python.org/issue9269

Copyright
=========

This document has been placed in the public domain.

..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End:
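The claims in the draft about protocol 3 (no dedicated set opcodes, GLOBAL still in use) can be checked by listing the opcodes a pickle contains. The following is a minimal sketch using the standard `pickle` and `pickletools` modules; the helper name `opcode_names` is made up for this illustration and is not part of the PEP.

```python
import pickle
import pickletools


def opcode_names(obj, protocol=3):
    """Return the names of the opcodes pickle emits for obj (hypothetical helper)."""
    data = pickle.dumps(obj, protocol=protocol)
    # genops yields (opcode_info, argument, position) triples.
    return [opcode.name for opcode, arg, pos in pickletools.genops(data)]


if __name__ == "__main__":
    # Under protocol 3 a set has no dedicated opcode: the set class is
    # looked up with GLOBAL and then called with REDUCE.
    print(opcode_names({1, 2, 3}))
    # A list, by contrast, is built with dedicated opcodes such as EMPTY_LIST.
    print(opcode_names([1, 2, 3]))
```

Comparing the two printed lists shows which built-in types already have their own opcodes in protocol 3 and which fall back to the generic GLOBAL/REDUCE path.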
Re: [Python-Dev] PEP 3154 - pickle protocol 4
On 2011-08-12, at 12:58 , Antoine Pitrou wrote:

> Current protocol versions export object sizes for various built-in types (str, bytes) as 32-bit ints. This forbids serialization of large data [1]_. New opcodes are required to support very large bytes and str objects.

How about changing object sizes to always be 64 bits? Would that be too much overhead for the common case (which might be smaller pickled objects)? Or a slightly more devious scheme (e.g. a tag bit: an untagged size is 31 bits, a tagged one is 63 bits), which would not require adding opcodes for that?

> Also, dedicated set support could help remove the current impossibility of pickling self-referential sets [2]_.

Is there really no possibility of fixing recursive pickling once and for all? Dedicated opcodes for resource consumption purposes (and to match those of other built-in types) are still a good idea, but being able to pickle arbitrary recursive structures would be even better, would it not? And if specific (new) opcodes are required to handle recursive pickling correctly, this is the occasion.
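To make the tag-bit idea concrete, here is a rough sketch of what such a size field could look like. It is only an illustration of the scheme suggested in this message, not anything pickle actually does: the function names, the choice of the top bit as the tag, and the big-endian layout (so the tag is in the first byte read) are all assumptions.

```python
import struct

TAG_BIT = 1 << 63          # assumed: the top bit of the 8-byte form marks "tagged"
MAX_SMALL = (1 << 31) - 1  # largest size that fits the untagged 31-bit form
MAX_LARGE = (1 << 63) - 1  # largest size that fits the tagged 63-bit form


def encode_size(n):
    """Encode n as 4 bytes (untagged) or 8 bytes (tagged), big-endian."""
    if not 0 <= n <= MAX_LARGE:
        raise ValueError("size out of range")
    if n <= MAX_SMALL:
        return struct.pack(">I", n)        # top bit is 0: untagged
    return struct.pack(">Q", n | TAG_BIT)  # top bit is 1: tagged


def decode_size(buf):
    """Return (size, number_of_bytes_consumed) from the start of buf."""
    if buf[0] & 0x80:                      # tag bit set: 63-bit size follows
        (value,) = struct.unpack(">Q", buf[:8])
        return value & (TAG_BIT - 1), 8
    (value,) = struct.unpack(">I", buf[:4])
    return value, 4
```

Small sizes keep their current 4-byte cost, while large ones take 8 bytes; the price is that every reader has to branch on the first byte.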
Re: [Python-Dev] PEP 3154 - pickle protocol 4
Hello,

On Friday 12 August 2011 at 14:32 +0200, Xavier Morel wrote:

> On 2011-08-12, at 12:58 , Antoine Pitrou wrote:
> > Current protocol versions export object sizes for various built-in types (str, bytes) as 32-bit ints. This forbids serialization of large data [1]_. New opcodes are required to support very large bytes and str objects.
>
> How about changing object sizes to be 64b always? Too much overhead for the common case (which might be smaller pickled objects)?

Yes, and also the old opcodes must still be supported, so there's no maintenance gain in not exploiting them.

> Or a slightly more devious scheme (e.g. tag-bit, untagged is 31b size, tagged is 63), which would not require adding opcodes for that?

The opcode space is not full enough to justify this kind of complication, IMO.

> > Also, dedicated set support could help remove the current impossibility of pickling self-referential sets [2]_.
>
> Is there really no possibility of fix recursive pickling once and for all? Dedicated optcodes for resource consumption purposes (and to match those of other build-in types) is still a good idea, but being able to pickle arbitrary recursive structures would be even better would it not?

That's true. Actually, it seems pickling recursive sets could have worked from the start, if a different __reduce__ had been chosen and a __setstate__ had been defined:

    >>> import pickle
    >>> class X: pass
    ...
    >>> class myset(set):
    ...     def __reduce__(self):
    ...         return (self.__class__, (), list(self))
    ...     def __setstate__(self, state):
    ...         self.update(state)
    ...
    >>> m = myset((1,2,3))
    >>> x = X()
    >>> x.m = m
    >>> m.add(x)
    >>> mm = pickle.loads(pickle.dumps(m))
    >>> m
    myset({1, 2, 3, <__main__.X object at 0x7fe3635c6990>})
    >>> mm
    myset({1, 2, 3, <__main__.X object at 0x7fe3635c6c30>})
    >>> # m has a reference loop
    >>> [x for x in m if getattr(x, 'm', None) is m]
    [<__main__.X object at 0x7fe3635c6990>]
    >>> # mm retains a similar reference loop
    >>> [x for x in mm if getattr(x, 'm', None) is mm]
    [<__main__.X object at 0x7fe3635c6c30>]
    >>> # the representation is roughly as efficient as the original one
    >>> len(pickle.dumps(set([1,2,3])))
    36
    >>> len(pickle.dumps(myset([1,2,3])))
    37

We can't change set.__reduce__ (or __reduce_ex__) without a protocol bump, though, since past Pythons would fail loading the pickles.

Regards

Antoine.
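The approach described in this message can also be written as a standalone script. This is a sketch of the same idea rather than a proposed patch; the names `Node`, `LoopSafeSet` and `roundtrip` are invented here. The point is that `__reduce__` asks pickle to create an empty instance first and to deliver the members afterwards as state, so the set is already known to the pickler when a member refers back to it.

```python
import pickle


class Node:
    """Plain object that will hold a reference back to the set containing it."""


class LoopSafeSet(set):
    """Set subclass that is pickled as 'empty instance + state' (illustrative name)."""

    def __reduce__(self):
        # (callable, args, state): the instance is built empty, then the
        # members are handed to __setstate__ once it exists.
        return (self.__class__, (), list(self))

    def __setstate__(self, state):
        self.update(state)


def roundtrip(obj):
    """Pickle and unpickle obj with the default protocol."""
    return pickle.loads(pickle.dumps(obj))


if __name__ == "__main__":
    original = LoopSafeSet((1, 2, 3))
    node = Node()
    node.m = original
    original.add(node)          # the set now contains an object pointing at it

    copy = roundtrip(original)
    # The copy should contain exactly one member whose .m is the copy itself.
    print([item for item in copy if getattr(item, "m", None) is copy])
```

The classes must be importable under the module name they were defined in, as with any pickled class instance.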
Re: [Python-Dev] PEP 402: Simplified Package Layout and Partitioning
At 02:02 PM 8/11/2011 -0400, Glyph Lefkowitz wrote:

> Rather than a one-by-one ad-hoc consideration of which attribute should be set to None or empty strings or string or what have you, I'd really like to see a discussion in the PEP saying what a package really is vs. what a module is, and what one can reasonably expect from it from an API and tooling perspective.

The assumption I've been working from is the only guarantee I've ever seen the Python docs give: i.e., that a package is a module object with a __path__ attribute. Modules aren't even required to have a __file__ attribute -- builtin modules don't, for example. (And the contents of __file__ are not required to have any particular semantics: PEP 302 notes that it can be a dummy value like "<frozen>", for example.)

Technically, btw, PEP 302 requires __file__ to be a string, so making __file__ = None will be a backwards-incompatible change. But any code that walks modules in sys.modules is going to break today if it expects a __file__ attribute to exist, because 'sys' itself doesn't have one!

So, my leaning is towards leaving off __file__, since today's code already has to deal with it being nonexistent, if it's working with arbitrary modules, and that'll produce breakage sooner rather than later -- the twisted.python.modules code, for example, would fail with a loud AttributeError, rather than going on to silently assume that a module with a dummy __file__ isn't a package. (Which is NOT a valid assumption *now*, btw, as I'll explain below.)

Anyway, if you have any suggestions for verbiage that should be added to the PEP to clarify these assumptions, I'd be happy to add them. However, I think that the real problem you're encountering at the moment has more to do with making assumptions about the Python import ecosystem that aren't valid today, and haven't been valid since at least the introduction of PEP 302, if not earlier import hook systems as well.
> But the whole pure virtual mechanism here seems to pile even more inconsistency on top of an already irritatingly inconsistent import mechanism. I was reasonably happy with my attempt to paper over PEP 302's weirdnesses from a user perspective:
>
> http://twistedmatrix.com/documents/11.0.0/api/twisted.python.modules.html
>
> (or https://launchpad.net/modules if you are not a Twisted user)
>
> Users of this API can traverse the module hierarchy with certain expectations; each module or package would have .pathEntry and .filePath attributes, each of which would refer to the appropriate place. Of course __path__ complicates things a bit, but so it goes.

I don't mean to be critical, and no doubt what you've written works fine for your current requirements, but on my quick attempt to skim through the code I found many things which appear to me to be incompatible with PEP 302.

That is, the above code hardcodes a variety of assumptions about the import system that haven't been true since Python 2.3. (For example, it assumes that the contents of sys.path strings have inspectable semantics, that the contents of __file__ can tell you things about the module-ness or package-ness of a module object, etc.)

If you want to fully support PEP 302, you might want to consider making this a wrapper over the corresponding pkgutil APIs (available since Python 2.5) that do roughly the same things, but which delegate all path string inspection to importer objects and allow extensible delegation for importers that don't support the optional methods involved.

(Of course, if the pkgutil APIs are missing something you need, perhaps you could propose additions.)

> Now it seems like pure virtual packages are going to introduce a new type of special case into the hierarchy which have neither .pathEntry nor .filePath objects.
The problem is that your API's notion that these things exist as coherent concepts was never really a valid assumption in the first place. .pth files and namespace packages already meant that the idea of a package coming from a single path entry made no sense. And namespace packages installed by setuptools' system packaging mode *don't have a __file__ attribute* today... heck they don't have __init__ modules, either.

So, adding virtual packages isn't actually going to change anything, except perhaps by making these scenarios more common.
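The two guarantees this message relies on (a package is a module object with a __path__ attribute, and __file__ may be missing altogether) can be checked directly. A minimal sketch follows; the helper names `is_package` and `file_of` are invented for the example, and `json` is used only as a convenient standard-library package.

```python
import json
import json.decoder
import sys


def is_package(module):
    """A package is a module object that has a __path__ attribute."""
    return hasattr(module, "__path__")


def file_of(module):
    """Return __file__ if the module has one, else None (never assume it exists)."""
    return getattr(module, "__file__", None)


if __name__ == "__main__":
    for mod in (json, json.decoder, sys):
        print(mod.__name__, is_package(mod), file_of(mod))
```

Code that walks sys.modules and reads `module.__file__` unconditionally raises AttributeError on `sys`, which is why `getattr` with a default is used here.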
Re: [Python-Dev] GIL removal question
On Fri, 12 Aug 2011 09:32:23 -0500 VanL van.lindb...@gmail.com wrote:

> On 8/11/2011 2:11 PM, Sturla Molden wrote:
> > (b) another threading model (e.g. one interpreter per thread, as in Tcl, Erlang, or .NET app domains).
>
> We are close to this, in that we already have baked-in support for subinterpreters. Out of curiosity, why isn't this being pursued?

Because it is half-baked, breaks with some features in some extension modules, and still requires the GIL for shared data structures.

Regards

Antoine.
[Python-Dev] Summary of Python tracker Issues
ACTIVITY SUMMARY (2011-08-05 - 2011-08-12) Python tracker at http://bugs.python.org/ To view or respond to any of the issues listed below, click on the issue. Do NOT respond to this message. Issues counts and deltas: open2923 (+24) closed 21602 (+23) total 24525 (+47) Open issues with patches: 1264 Issues opened (35) == #12032: Tools/Scripts/crlf.py needs updating for python 3+ http://bugs.python.org/issue12032 reopened by eric.araujo #12701: Apple's clang 2.1 (xcode 4.1, OSX 10.7) optimizer miscompiles http://bugs.python.org/issue12701 opened by deadshort #12702: shutil.copytree() should use os.lutimes() to copy the metadata http://bugs.python.org/issue12702 opened by petri.lehtinen #12703: Improve error reporting for packaging.util.resolve_name http://bugs.python.org/issue12703 opened by Natim #12704: Language References does not specify exception raised by final http://bugs.python.org/issue12704 opened by Nikratio #12705: Make compile('1\n2\n', '', 'single') raise an exception instea http://bugs.python.org/issue12705 opened by Devin Jeanpierre #12706: timeout sentinel in ftplib and poplib documentation http://bugs.python.org/issue12706 opened by orsenthil #12707: Deprecate addinfourl getters http://bugs.python.org/issue12707 opened by ezio.melotti #12708: multiprocessing.Pool is missing a starmap[_async]() method. 
http://bugs.python.org/issue12708 opened by hynek #12711: Explain tracker components in devguide http://bugs.python.org/issue12711 opened by eric.araujo #12712: weave build_tools library identification http://bugs.python.org/issue12712 opened by Tim.Holme #12713: argparse: allow abbreviation of sub commands by users http://bugs.python.org/issue12713 opened by pwil3058 #12716: Reorganize os docs for files/dirs/fds http://bugs.python.org/issue12716 opened by benjamin.peterson #12720: Expose linux extended filesystem attributes http://bugs.python.org/issue12720 opened by benjamin.peterson #12721: Chaotic use of helper functions in test_shutil for reading and http://bugs.python.org/issue12721 opened by hynek #12723: Provide an API in tkSimpleDialog for defining custom validatio http://bugs.python.org/issue12723 opened by rabbidous #12725: Docs: Odd phrase floating seconds in socket.html http://bugs.python.org/issue12725 opened by Cris.Simpson #12726: explain why locale.getlocale() does not read system's locales http://bugs.python.org/issue12726 opened by alexis #12728: Python re lib fails case insensitive matches on Unicode data http://bugs.python.org/issue12728 opened by tchrist #12729: Python lib re cannot handle Unicode properly due to narrow/wid http://bugs.python.org/issue12729 opened by tchrist #12730: Python's casemapping functions are untrustworthy due to narrow http://bugs.python.org/issue12730 opened by tchrist #12731: python lib re uses obsolete sense of \w in full violation of U http://bugs.python.org/issue12731 opened by tchrist #12732: Can't portably use Unicode in Python identifiers http://bugs.python.org/issue12732 opened by tchrist #12733: Request for grapheme support in Python re lib http://bugs.python.org/issue12733 opened by tchrist #12734: Request for property support in Python re lib http://bugs.python.org/issue12734 opened by tchrist #12735: request full Unicode collation support in std python library http://bugs.python.org/issue12735 opened by 
tchrist #12737: string.title() is overzealous by upcasing combining marks ina http://bugs.python.org/issue12737 opened by tchrist #12738: Bug in multiprocessing.JoinableQueue() implementation on Ubunt http://bugs.python.org/issue12738 opened by Michael.Hall #12739: read stuck with multithreading and simultaneous subprocess.Pop http://bugs.python.org/issue12739 opened by SAPikachu #12740: Add struct.Struct.nmemb http://bugs.python.org/issue12740 opened by skrah #12741: Implementation of shutil.move http://bugs.python.org/issue12741 opened by David.Townshend #12742: Add support for CESU-8 encoding http://bugs.python.org/issue12742 opened by adalx #12743: C API marshalling doc contains XXX http://bugs.python.org/issue12743 opened by JJeffries #12715: Add symlink support to shutil functions http://bugs.python.org/issue12715 opened by petri.lehtinen #12736: Request for python casemapping functions to use full not simpl http://bugs.python.org/issue12736 opened by tchrist Most recent 15 issues with no replies (15) == #12743: C API marshalling doc contains XXX http://bugs.python.org/issue12743 #12742: Add support for CESU-8 encoding http://bugs.python.org/issue12742 #12741: Implementation of shutil.move http://bugs.python.org/issue12741 #12740: Add struct.Struct.nmemb http://bugs.python.org/issue12740 #12739: read stuck with multithreading and simultaneous subprocess.Pop http://bugs.python.org/issue12739 #12737: string.title() is overzealous by upcasing combining marks ina
Re: [Python-Dev] [PEPs] Rebooting PEP 394 (aka Support the /usr/bin/python2 symlink upstream)
On Aug 12, 2011, at 01:10 PM, Nick Coghlan wrote:

> 1. Accept the reality of that situation, and propose a mechanism that minimises the impact of the resulting ambiguity on end users of Python by allowing developers to be explicit about their target language. This is the approach advocated in PEP 394.
>
> 2. Tell the Arch developers (and anyone else inclined to point the python name at python3) that they're wrong, and the python symlink should, now and forever, always refer to a version of Python 2.x.

FWIW, although I generally support the PEP, I also think that distros themselves have a responsibility to ensure their #! lines are correct, for scripts they install. Meaning, if it requires rewriting the #! line on OS package install, so be it.

-Barry
Re: [Python-Dev] GIL removal question
On 2011-08-11, at 21:11 , Sturla Molden wrote:

> (b) another threading model (e.g. one interpreter per thread, as in Tcl, Erlang, or .NET app domains).

Nitpick: this is not correct re. Erlang. While it is correct that it uses another threading model (one could even say no threading model), it's not a one-interpreter-per-thread model at all:

* Erlang uses Erlang processes, which are very cheap preempted *processes* (no shared memory). There have always been tens to thousands to millions of Erlang processes per interpreter.

* A long time ago (before 2006 and the SMP VM, that was R11B) the Erlang VM was single-threaded, so all those Erlang processes ran in a single OS thread. To use multiple OS threads one had to create an Erlang cluster (start multiple VMs and distribute spawned processes over those). However, this was already an m:n model: there were multiple Erlang processes for each VM.

* Since the introduction of the SMP VM, the Erlang interpreter can create multiple *schedulers* (one per physical core by default), with each scheduler running in its own OS thread. In this model, there's a single interpreter and an m:n mapping of Erlang processes to OS threads within that single interpreter.

(Interestingly, because -smp generates resource contention within the interpreter, going back to pre-SMP by setting the number of schedulers per node to 1 can yield increased overall performance.)
[Python-Dev] Backporting howto/pyporting to 2.7
Hi everyone,

I think it would be useful to have the “Porting Python 2 Code to Python 3” HOWTO in the 2.7 docs, as I think that a lot of users consult the 2.7 docs. Is there any reason not to do it?

Regards
Re: [Python-Dev] GIL removal question
My two Danish kroner on GIL issues.

I think I understand the background and need for the GIL. Without it, Python programs would have been cluttered with lock/synchronized statements and C extensions would be harder to write. Thanks to Sturla Molden for his explanation earlier in this thread.

However, the GIL is also from a time when single-threaded programs running on single-core CPUs were the common case. On a new MacBook Pro I have 8 cores and would expect my multithreaded Python program to run significantly faster than on a one-core CPU. Instead the program slows down to a much worse performance than on a one-core CPU. (Have a look at David Beazley's excellent talk at PyCon 2010 and his paper: http://www.dabeaz.com/GIL/ and http://blip.tv/carlfk/mindblowing-python-gil-2243379)

From my viewpoint the multicore performance problem is the primary problem with the GIL, even though the other issues pointed out are valid.

I still believe that an "every object is a thread/coroutine" solution à la

- ABCL (http://en.wikipedia.org/wiki/Actor-Based_Concurrent_Language) and
- COOC (Concurrent Object Oriented C, ftp://tsbgw.isl.rdc.toshiba.co.jp/pub/toshiba/cooc-beta.1.1.tar.Z)

should at least be looked into as an alternative to an STM solution. But my head is not big enough to fully understand this :-)

kind regards
/rene
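The effect described in this message can be observed with a small CPU-bound experiment in the spirit of the countdown loop from the Beazley talk. This is only a sketch: the function names and the iteration count are arbitrary choices for this illustration, and the timings depend entirely on the interpreter build, the machine and the load, so no particular numbers should be expected.

```python
import threading
import time


def countdown(n):
    """Pure-Python CPU-bound loop; it holds the GIL while it runs."""
    while n > 0:
        n -= 1


def time_sequential(n, workers=2):
    """Run the countdown `workers` times, one after another."""
    start = time.perf_counter()
    for _ in range(workers):
        countdown(n)
    return time.perf_counter() - start


def time_threaded(n, workers=2):
    """Run the same amount of work in `workers` threads at once."""
    threads = [threading.Thread(target=countdown, args=(n,)) for _ in range(workers)]
    start = time.perf_counter()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 2_000_000
    print("sequential: %.3f s" % time_sequential(n))
    print("threaded:   %.3f s" % time_threaded(n))
```

On an interpreter with a GIL, the threaded run is not expected to be faster than the sequential one for this kind of workload, regardless of the number of cores.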
Re: [Python-Dev] PEP 402: Simplified Package Layout and Partitioning
On Aug 12, 2011, at 11:24 AM, P.J. Eby wrote:

> That is, the above code hardcodes a variety of assumptions about the import system that haven't been true since Python 2.3.

Thanks for this feedback. I honestly did not realize how old and creaky this code had gotten. It was originally developed for Python 2.4 and it certainly shows its age. Practically speaking, the code is correct for the bundled importers, and paths and zipfiles are all we've cared about thus far.

> (For example, it assumes that the contents of sys.path strings have inspectable semantics, that the contents of __file__ can tell you things about the module-ness or package-ness of a module object, etc.)

Unfortunately, the primary goal of this code is to do something impossible - walk the module hierarchy without importing any code. So some heuristics are necessary. Upon further reflection, PEP 402 _will_ make dealing with namespace packages from this code considerably easier: we won't need to do AST analysis to look for a __path__ attribute or anything gross like that to improve correctness; we can just look in various directories on sys.path and accurately predict what __path__ will be synthesized to be.

However, the isPackage() method can and should be looking at the module if it's already loaded, and not always guessing based on paths. The whole reason there's an 'importPackages' flag to walk() is that some applications of this code care more about accuracy than others, so it tries to be as correct as it can be.

(Of course this is still wrong for the case where a __path__ is dynamically constructed by user code, but there's only so well one can do at that.)

> If you want to fully support PEP 302, you might want to consider making this a wrapper over the corresponding pkgutil APIs (available since Python 2.5) that do roughly the same things, but which delegate all path string inspection to importer objects and allow extensible delegation for importers that don't support the optional methods involved.

This code still needs to support Python 2.4, but I will make a note of this for future reference.

> (Of course, if the pkgutil APIs are missing something you need, perhaps you could propose additions.)
>
> > Now it seems like pure virtual packages are going to introduce a new type of special case into the hierarchy which have neither .pathEntry nor .filePath objects.
>
> The problem is that your API's notion that these things exist as coherent concepts was never really a valid assumption in the first place. .pth files and namespace packages already meant that the idea of a package coming from a single path entry made no sense. And namespace packages installed by setuptools' system packaging mode *don't have a __file__ attribute* today... heck they don't have __init__ modules, either.

The fact that getModule('sys') breaks is reason enough to re-visit some of these design decisions.

> So, adding virtual packages isn't actually going to change anything, except perhaps by making these scenarios more common.

In that case, I guess it's a good thing; these bugs should be dealt with. Thanks for pointing them out. My opinion of PEP 402 has been completely reversed - although I'd still like to see a section about the module system from a library/tools author point of view rather than a time-traveling perl user's narrative :).
Re: [Python-Dev] [Python-checkins] cpython (3.2): Use real word in English text (i.e. not code)
Hi,

> > summary: Use real word in English text (i.e. not code)
>
> I agree that 'arg' for 'argument' is email/twitter-speak, not proper document prose.
>
> > -   :synopsis: Command-line option and argument-parsing library.
> > +   :synopsis: Command-line option and argument parsing library.
>
> However, 'argument-parsing' could/should be left hyphenated as a compound adjective for the same reason 'command-line' is.

With all due respect to the fact that you’re a native speaker and I’m not, here I disagree because I parse the sentence in this way (using parens to group things by precedence, if you want):

    (((command-line (option and argument)) parsing) library)

To paraphrase, it’s a library to parse options and arguments from the command line, not a library to parse arguments and (missing verb-ing) options from the command line. (I’m not sure I’m clear.)

> An arg you missed

Yes, I looked for all instances of args but not arg. Will do.

Regards
Re: [Python-Dev] [PEPs] Rebooting PEP 394 (aka Support the /usr/bin/python2 symlink upstream)
On Fri, 12 Aug 2011 12:19:23 -0400, Barry Warsaw ba...@python.org wrote:

> On Aug 12, 2011, at 01:10 PM, Nick Coghlan wrote:
>
> > 1. Accept the reality of that situation, and propose a mechanism that minimises the impact of the resulting ambiguity on end users of Python by allowing developers to be explicit about their target language. This is the approach advocated in PEP 394.
> >
> > 2. Tell the Arch developers (and anyone else inclined to point the python name at python3) that they're wrong, and the python symlink should, now and forever, always refer to a version of Python 2.x.
>
> FWIW, although I generally support the PEP, I also think that distros themselves have a responsibility to ensure their #! lines are correct, for scripts they install. Meaning, if it requires rewriting the #! line on OS package install, so be it.

True, but I think that is orthogonal to the purposes of the PEP, which is about supporting writing of system independent scripts that are *not* provided by the distribution (or installed via packaging). And PEP 397 aims to extend that to Windows, as well.

--
R. David Murray http://www.bitdance.com
Re: [Python-Dev] [PEPs] Rebooting PEP 394 (aka Support the /usr/bin/python2 symlink upstream)
On Aug 12, 2011, at 01:34 PM, R. David Murray wrote:

> True, but I think that is orthogonal to the purposes of the PEP, which is about supporting writing of system independent scripts that are *not* provided by the distribution (or installed via packaging). And PEP 397 aims to extend that to Windows, as well.

Yep, agreed. It probably should also inform #! transformations that pysetup could do.

-Barry
Re: [Python-Dev] PEP 402: Simplified Package Layout and Partitioning
At 01:09 PM 8/12/2011 -0400, Glyph Lefkowitz wrote: Upon further reflection, PEP 402 _will_ make dealing with namespace packages from this code considerably easier: we won't need to do AST analysis to look for a __path__ attribute or anything gross like that improve correctness; we can just look in various directories on sys.path and accurately predict what __path__ will be synthesized to be. The flip side of that is that you can't always know whether a directory is a virtual package without deep inspection: one consequence of PEP 402 is that any directory that contains a Python module (of whatever type), however deeply nested, will be a valid package name. So, you can't rule out that a given directory *might* be a package, without walking its entire reachable subtree. (Within the subset of directory names that are valid Python identifiers, of course.) However, you *can* quickly tell that a directory *might* be a package or is *probably* one: if it contains modules, or is the same name as an already-discovered module, it's a pretty safe bet that you can flag it as such. In any case, you probably should *not* do the building of a virtual path yourself; the protocols and APIs added by PEP 402 should allow you to simply ask for the path to be constructed on your behalf. Otherwise, you are going to be back in the same business of second-guessing arbitrary importer backends again! (E.g. note that PEP 402 does not say virtual package subpaths must be filesystem or zipfile subdirectories of their parents - an importer could just as easily allow you to treat subdirectories named 'twisted.python' as part of a virtual package with that name!) Anyway, pkgutil defines some extra methods that importers can implement to support module-walking, and part of the PEP 402 implementation should be to make this support virtual packages as well. This code still needs to support Python 2.4, but I will make a note of this for future reference. 
A suggestion: just take the pkgutil code and bundle it for Python 2.4 as something._pkgutil. There's very little about it that's 2.5+ specific, at least when I wrote the bits that do the module walking. Of course, the main disadvantage of pkgutil for your purposes is that it currently requires packages to be imported in order to walk their child modules. (IIRC, it does *not*, however, require them to be imported in order to discover their existence.) In that case, I guess it's a good thing; these bugs should be dealt with. Thanks for pointing them out. My opinion of PEP 402 has been completely reversed - although I'd still like to see a section about the module system from a library/tools author point of view rather than a time-traveling perl user's narrative :). LOL. If you will propose the wording you'd like to see, I'll be happy to check it for any current-and-or-future incorrect assumptions. ;-) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
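The module walking described above can be sketched with the standard-library pkgutil module. This is only an illustration on a modern Python, not the code under discussion; the helper name `list_child_modules` and the use of the `json` package as a target are assumptions made for the example. Note that the parent package is imported to obtain its `__path__`, which is the limitation mentioned in the message.

```python
import importlib
import pkgutil


def list_child_modules(package_name):
    """Return (full_name, is_package) pairs for the direct children of a package.

    Hypothetical helper for illustration. The parent package is imported so
    that its __path__ can be read; the children themselves are only
    discovered, not imported.
    """
    package = importlib.import_module(package_name)
    prefix = package.__name__ + "."
    # Each item is (finder, name, ispkg); index access works on old and new Pythons.
    return [(info[1], info[2]) for info in pkgutil.iter_modules(package.__path__, prefix)]


if __name__ == "__main__":
    for name, is_package in list_child_modules("json"):
        print(name, "(package)" if is_package else "(module)")
```

Discovery here does not import the child modules, so walking one level of a package has no side effects beyond importing the parent.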
Re: [Python-Dev] [Python-checkins] cpython (3.2): Use real word in English text (i.e. not code)
I think either "Command-line option- and argument-parsing library." or "Command-line option and argument parsing library." would be acceptable.
-Fred
-- Fred L. Drake, Jr. fdrake at acm.org "A person who won't read has no advantage over one who can't read." --Samuel Langhorne Clemens
[Python-Dev] Review request issue 12178
Could a core developer please review the patch I proposed for issue 12178, "csv writer doesn't escape escapechar"? Thanks!
Re: [Python-Dev] GIL removal question
On 12.08.2011 18:51, Xavier Morel wrote:
> * Erlang uses erlang processes, which are very cheap preempted *processes* (no shared memory). There have always been tens to thousands to millions of erlang processes per interpreter source contention within the interpreter going back to pre-SMP by setting the number of schedulers per node to 1 can yield increased overall performances)
Technically, one can make threads behave like processes if they don't share memory pages (though they will still share address space). Erlang's use of 'process' instead of 'thread' does not mean an Erlang process has to be implemented as an OS process. With one interpreter per thread, and a malloc that does not let threads share memory pages (one heap per thread), Python could do the same.
On Windows, there is an API function called HeapAlloc, which lets us allocate memory from a dedicated heap. The common use case is to prevent threads from sharing memory, so that they behave like light-weight processes (except that the address space is shared). On Unix, it is more common to use fork() to create new processes instead, as processes are more light-weight than on Windows.
Sturla
Re: [Python-Dev] GIL removal question
On 12.08.2011 18:57, Rene Nejsum wrote:
> My two Danish kroner on GIL issues... I think I understand the background and need for GIL. Without it Python programs would have been cluttered with lock/synchronized statements and C-extensions would be harder to write. Thanks to Sturla Molden for his explanation earlier in this thread.
It doesn't seem I managed to explain it :( Yes, C extensions would be cluttered with synchronization statements, and that is annoying. But that was not my point at all! Even with fine-grained locking in place, a system using reference counting will not scale on a multi-processor computer. Cache lines containing reference counts will become incoherent between the processors, causing a traffic jam on the memory bus. The technical term in the parallel computing literature is false sharing.
> However, the GIL is also from a time where single-threaded programs running on single-core CPUs were the common case. On a new MacBook Pro I have 8 cores and would expect my multithreaded Python program to run significantly faster than on a one-core CPU. Instead the program slows down to a much worse performance than on a one-core CPU.
A multi-threaded program can be slower on a multi-processor computer as well, if it suffers from extensive false sharing (which Python programs nearly always will do). That is, instead of doing useful work, the processors are stepping on each other's toes. So they spend the bulk of the time synchronizing cache lines with RAM instead of computing. On a computer with a single processor, there cannot be any false sharing. So even without a GIL, a multi-threaded program can often run faster on a single-processor computer. That might seem counter-intuitive at first. I have seen this inverse scaling blamed on the GIL many times, but it's dead wrong. Multi-threading is hard to get right, because the programmer must ensure that processors don't access the same cache lines.
This is one of the reasons why numerical programs based on MPI (multiple processes and IPC) are likely to perform better than numerical programs based on OpenMP (multiple threads and shared memory). As for Python, it means that it is easier to make a program based on multiprocessing scale well on a multi-processor computer than a program based on threading and releasing the GIL. And that has nothing to do with the GIL! Still, I'd estimate 99% of Python programmers would blame it on the GIL. It has to do with what shared memory does if cache lines are shared. Intuition about what affects the performance of a multi-threaded program is very often wrong. If one needs parallel computing, multiple processes are much more likely to scale correctly. Threads are better reserved for things like non-blocking I/O. The problem with the GIL is merely what people think it does -- not what it actually does. It is so easy to blame a performance issue on the GIL, when it is actually the use of threads and shared memory per se that is the problem.
Sturla
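A minimal sketch of the process-based approach recommended above, using the standard-library multiprocessing module. The workload (`count_primes_below`) and the helper name `count_in_processes` are made up for illustration; each task runs in its own process with its own memory, so the workers share no Python objects while computing. No timing claims are made here, since results depend on the machine.

```python
import multiprocessing


def count_primes_below(limit):
    """CPU-bound toy workload: count primes below `limit` by trial division."""
    count = 0
    for candidate in range(2, limit):
        if all(candidate % d for d in range(2, int(candidate ** 0.5) + 1)):
            count += 1
    return count


def count_in_processes(limits, workers=2):
    """Run one independent task per limit in separate worker processes."""
    with multiprocessing.Pool(processes=workers) as pool:
        # Only the arguments and the integer results cross process boundaries.
        return pool.map(count_primes_below, limits)


if __name__ == "__main__":
    print(count_in_processes([10000, 20000, 30000]))
```

The `if __name__ == "__main__"` guard matters on platforms that start workers by spawning a fresh interpreter, because the main module is imported again in each worker.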
Re: [Python-Dev] GIL removal question
Even in the Erlang model, the aforementioned issues of bus contention put a cap on the number of threads you can run in any given application, assuming there's any amount of cross-thread synchronization. I wrote a blog post on this subject with respect to my experience in tuning RabbitMQ on NUMA architectures: http://blog.agoragames.com/blog/2011/06/24/of-penguins-rabbits-and-buses/
It should be noted that Erlang processes are not the same as OS processes. They are more akin to green threads, scheduled on N legitimate OS threads, which are in turn run on C cores. The end effect is the same though, as the data is effectively shared across NUMA nodes, which runs into basic physical constraints.
I used to think the GIL was a major bottleneck, and though I'm not fond of it, my recent experience has highlighted that *any* application which uses shared memory will have significant bus contention when scaling across all cores. The best course of action is shared-nothing MPI style, but in 64bit land, that can mean significant wasted address space.
-Aaron
Re: [Python-Dev] [PEPs] Rebooting PEP 394 (aka Support the /usr/bin/python2 symlink upstream)
On Fri, Aug 12, 2011 at 12:19:23PM -0400, Barry Warsaw wrote:
> On Aug 12, 2011, at 01:10 PM, Nick Coghlan wrote:
> > 1. Accept the reality of that situation, and propose a mechanism that minimises the impact of the resulting ambiguity on end users of Python by allowing developers to be explicit about their target language. This is the approach advocated in PEP 394.
> > 2. Tell the Arch developers (and anyone else inclined to point the python name at python3) that they're wrong, and the python symlink should, now and forever, always refer to a version of Python 2.x.
> FWIW, although I generally support the PEP, I also think that distros themselves have a responsibility to ensure their #! lines are correct, for scripts they install. Meaning, if it requires rewriting the #! line on OS package install, so be it.
+1 with the one caveat... it's nice to upstream fixes. If there's a simple thing like python == python-2 and python3 == python-3 everywhere, this is possible. If there's something like python2 == python-2 and python3 == python-3 everywhere, this is also possible. The problem is that the latter is not the case (python from python.org itself doesn't produce a python2 symlink on install), and historically the former was the case, but since python-dev rejected the notion that python == python-2, that is no longer true.
As long as it's just Arch, there's still time to go with #2. #1 is not a complete solution (especially because /usr/bin/python2 will never exist on some historical systems [not ones I run though, so someone else will need to beat that horse :-)]) but is better than where we are now, where there is no guidance on what's right and wrong at all.
-Toshio
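The "rewrite the #! line on package install" idea mentioned above could look roughly like the following. This is a sketch under stated assumptions, not any distribution's actual tooling: the function name `pin_shebang` and the regular expression are hypothetical, and only unversioned `python` interpreters on the first line are touched.

```python
import re

# Matches "#!<dir>/python" or "#!/usr/bin/env python" when "python" carries
# no version suffix (the lookahead rejects "python3", "python2.7", etc.).
_UNVERSIONED = re.compile(r"^(#!\s*(?:/usr/bin/env\s+|\S*/)python)(?=\s|$)")


def pin_shebang(script_text, version="2"):
    """Return script_text with an unversioned python shebang pinned to `version`.

    Hypothetical helper: only the first line is inspected, and scripts whose
    shebang already names a version (or another interpreter) are unchanged.
    """
    first, sep, rest = script_text.partition("\n")
    pinned = _UNVERSIONED.sub(lambda m: m.group(1) + version, first, count=1)
    return pinned + sep + rest


if __name__ == "__main__":
    print(pin_shebang("#!/usr/bin/env python -u\nprint('hi')\n"))
```

Whether the target should be `python2` at all is exactly what the thread is debating; the sketch only shows that the rewrite itself is mechanical once a convention exists.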
Re: [Python-Dev] GIL removal question
On 2011-08-12, at 20:59, Sturla Molden wrote:
> On 12.08.2011 18:51, Xavier Morel wrote:
> > * Erlang uses erlang processes, which are very cheap preempted *processes* (no shared memory). There have always been tens to thousands to millions of erlang processes per interpreter source contention within the interpreter going back to pre-SMP by setting the number of schedulers per node to 1 can yield increased overall performances)
> Technically, one can make threads behave like processes if they don't share memory pages (though they will still share address space). Erlang's use of 'process' instead of 'thread' does not mean an Erlang process has to be implemented as an OS process.
Of course not. I did not write anything implying that.
> With one interpreter per thread, and a malloc that does not let threads share memory pages (one heap per thread), Python could do the same.
Again, my point is that Erlang does not work with one interpreter per thread. Which was your claim.
Re: [Python-Dev] PEP 402: Simplified Package Layout and Partitioning
On Aug 12, 2011, at 2:33 PM, P.J. Eby wrote: At 01:09 PM 8/12/2011 -0400, Glyph Lefkowitz wrote: Upon further reflection, PEP 402 _will_ make dealing with namespace packages from this code considerably easier: we won't need to do AST analysis to look for a __path__ attribute or anything gross like that improve correctness; we can just look in various directories on sys.path and accurately predict what __path__ will be synthesized to be. The flip side of that is that you can't always know whether a directory is a virtual package without deep inspection: one consequence of PEP 402 is that any directory that contains a Python module (of whatever type), however deeply nested, will be a valid package name. So, you can't rule out that a given directory *might* be a package, without walking its entire reachable subtree. (Within the subset of directory names that are valid Python identifiers, of course.) Are there any rules about passing invalid identifiers to __import__ though, or is that just less likely? :) However, you *can* quickly tell that a directory *might* be a package or is *probably* one: if it contains modules, or is the same name as an already-discovered module, it's a pretty safe bet that you can flag it as such. I still like the idea of a 'marker' file. It would be great if there were a new marker like __package__.py. I say this more for the benefit of users looking at a directory on their filesystem and trying to understand whether this is a package or not than I do for my own programmatic tools though; it's already hard enough to understand the package-ness of a part of your filesystem and its interactions with PYTHONPATH; making directories mysteriously and automatically become packages depending on context will worsen that situation, I think. 
I also have this not-terribly-well-defined idea that it would be handy for different providers of the _contents_ of namespace packages to provide their own instrumentation to be made aware that they've been added to the __path__ of a particular package. This may be a solution in search of a problem, but I imagine that each __package__.py would be executed in the same module namespace. This would allow namespace packages to do things like set up compatibility aliases, lazy imports, plugin registrations, etc, as they currently do with __init__.py. Perhaps it would be better to define its relationship to the package-module namespace in a more sensible way than execute all over each other in no particular order. Also, if I had my druthers, Python would raise an exception if someone added a directory marked as a package to sys.path, to refuse to import things from it, and when a submodule was run as a script, add the nearest directory not marked as a package to sys.path, rather than the script's directory itself. The whole __name__ is wrong because your current directory was wrong when you ran that command thing is so confusing to explain that I hope we can eventually consign it to the dustbin of history. But if you can't even reasonably guess whether a directory is supposed to be an entry on sys.path or a package, that's going to be really hard to do. In any case, you probably should *not* do the building of a virtual path yourself; the protocols and APIs added by PEP 402 should allow you to simply ask for the path to be constructed on your behalf. Otherwise, you are going to be back in the same business of second-guessing arbitrary importer backends again! What do you mean building of a virtual path? (E.g. note that PEP 402 does not say virtual package subpaths must be filesystem or zipfile subdirectories of their parents - an importer could just as easily allow you to treat subdirectories named 'twisted.python' as part of a virtual package with that name!) 
Anyway, pkgutil defines some extra methods that importers can implement to support module-walking, and part of the PEP 402 implementation should be to make this support virtual packages as well. The more that this can focus on module-walking without executing code, the happier I'll be :). This code still needs to support Python 2.4, but I will make a note of this for future reference. A suggestion: just take the pkgutil code and bundle it for Python 2.4 as something._pkgutil. There's very little about it that's 2.5+ specific, at least when I wrote the bits that do the module walking. Of course, the main disadvantage of pkgutil for your purposes is that it currently requires packages to be imported in order to walk their child modules. (IIRC, it does *not*, however, require them to be imported in order to discover their existence.) One of the stipulations of this code is that it might give different results when the modules are loaded and not. So it's fine to inspect that first and then invoke pkgutil only in the 'loaded' case, with the knowledge that the not-loaded case may be
Re: [Python-Dev] GIL removal question
On Fri, Aug 12, 2011 at 12:57 PM, Rene Nejsum r...@stranden.com wrote:
> I think I understand the background and need for GIL. Without it Python programs would have been cluttered with lock/synchronized statements and C-extensions would be harder to write.
No, sorry, the first half of this is incorrect: with or without the GIL, *Python* code would need the same amount of fine-grained locking. (The part about C extensions is correct.) I am butting in because this is a common misunderstanding that really needs to be squashed whenever it is aired -- the GIL does *not* help Python code to synchronize. A thread switch can occur between any two bytecode opcodes. Without the GIL, atomic operations (e.g. dict lookups that don't require evaluation of __eq__ or __hash__ implemented in Python) are still supposed to be atomic.
-- --Guido van Rossum (python.org/~guido)
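To make the point about Python-level locking concrete, here is a small sketch in which a shared counter is protected by an explicit lock. The class and function names are invented for the example. An augmented assignment such as `self.value += 1` compiles to several bytecode operations (load, add, store), so a thread switch between them could lose an update if the lock were omitted, GIL or no GIL.

```python
import threading


class Counter:
    """Shared counter whose increments are serialized by an explicit lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Read, add and write happen as one unit while the lock is held.
        with self._lock:
            self.value += 1


def hammer(counter, times):
    """Increment the counter `times` times from the calling thread."""
    for _ in range(times):
        counter.increment()


def run(threads=4, times=10000):
    """Start several threads that all increment the same counter."""
    counter = Counter()
    workers = [
        threading.Thread(target=hammer, args=(counter, times))
        for _ in range(threads)
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return counter.value


if __name__ == "__main__":
    print(run())
```

With the lock in place the final value always equals `threads * times`; the lock is application-level synchronization that the interpreter does not provide on the program's behalf.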
Re: [Python-Dev] [PEPs] Rebooting PEP 394 (aka Support the /usr/bin/python2 symlink upstream)
On Thu, Aug 11, 2011 at 6:05 PM, Terry Reedy tjre...@udel.edu wrote:
> There was no comparable transition. Python 2.0 was basically 1.6 renamed for a different distributor.
No, that's not true. If you compare the "what's new" sections, there is quite a large difference between 1.6 and 2.0, despite their being released simultaneously.
> I regard Python 2.2, which introduced new-style, as the beginning of Python 2 as something significantly different from Python 1.
Just compare: http://www.python.org/download/releases/2.0/ http://www.python.org/download/releases/1.6/ No argument that 2.2 was a big jump for the type system -- but not for Unicode.
> I suppose one could also point to the earlier intro of unicode.
In 1.6. (But internally we called it the "contractual obligation" release, a Monty Python reference.)
> The new iterator protocol was also a major change. In any case, back compatibility was kept in all three respects (and others) until Python 3.
(I gotta go, but I don't think it was such a big deal -- it was very carefully made backwards compatible.)
-- --Guido van Rossum (python.org/~guido)
Re: [Python-Dev] PEP 402: Simplified Package Layout and Partitioning
At 05:03 PM 8/12/2011 -0400, Glyph Lefkowitz wrote:
> Are there any rules about passing invalid identifiers to __import__ though, or is that just less likely? :)
I suppose you have a point there. ;-)
> I still like the idea of a 'marker' file. It would be great if there were a new marker like __package__.py.
Having any required marker file makes separately-installable portions of a package impossible, since it would then be in conflict at installation time. The (semi-)competing proposal, PEP 382, is based on allowing each portion to have a differently-named marker; we came up with PEP 402 as a way to get rid of the need for any marker files (not to mention the bikeshedding involved.)
> What do you mean "building of a virtual path"?
Constructing the __path__-to-be of a not-yet-imported virtual package. The PEP defines a protocol for constructing this, by asking the importer objects to provide __path__ entries, and it does not require anything to be imported. So there's no reason to re-implement the algorithm yourself.
> The more that this can focus on module-walking without executing code, the happier I'll be :).
Virtual packages actually improve on this situation, in that a virtual path can be computed without the need to import the package. (Assuming a submodule or subpackage doesn't munge the __path__, of course.)
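For readers unfamiliar with the idea of a "virtual path", the following is a rough filesystem-only approximation of what computing one without importing anything might look like. It is not the protocol the PEP defines (which asks importer objects for the entries), and the function name `approximate_virtual_path` is hypothetical. It deliberately ignores zip files and custom importers, which is why the message advises against re-implementing the algorithm in real tools.

```python
import os


def approximate_virtual_path(package_name, search_path):
    """Collect <entry>/<package_name> for each search-path entry that has it.

    Hypothetical, filesystem-only illustration: nothing is imported, and
    importers that map names to locations differently are not consulted.
    """
    parts = package_name.split(".")
    candidates = []
    for entry in search_path:
        candidate = os.path.join(entry, *parts)
        if os.path.isdir(candidate):
            candidates.append(candidate)
    return candidates


if __name__ == "__main__":
    import sys
    print(approximate_virtual_path("json", sys.path))
```

The result preserves the order of the search path, mirroring how entries earlier on sys.path take precedence during import.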
Re: [Python-Dev] GIL removal question
Thank you for the clarification, I should have been more precise...
Re: [Python-Dev] [Python-checkins] cpython (3.2): Use real word in English text (i.e. not code)
On 8/12/2011 1:17 PM, Éric Araujo wrote:
> With all due respect to the fact that you’re a native speaker and I’m not, here I disagree because I parse the sentence in this way (using parens to group things by precedence, if you want):
You are right, I misparsed without considering the full context. You actually mean "Command-line option-and-argument-parsing library." But multiple compound-noun adjectives are awkward and the above is ugly. Would "Command-line library for parsing options and arguments" fit?
-- Terry Jan Reedy
Re: [Python-Dev] [Python-checkins] cpython (3.2): Use real word in English text (i.e. not code)
Terry Reedy tjre...@udel.edu writes:
> But multiple compound-noun adjectives are awkward and the above is ugly. Would "Command-line library for parsing options and arguments" fit?
Better, but the binding is still wrong. The “command-line” should instead be a modifier for “options and arguments”. So: "Library for parsing command-line options and arguments".
-- Ben Finney
Re: [Python-Dev] GIL removal question
Sturla Molden wrote:
> With one interpreter per thread, and a malloc that does not let threads share memory pages (one heap per thread), Python could do the same.
Wouldn't that be more or less equivalent to running each thread in a separate process?
-- Greg