+1 overall. I'm certainly not concerned with replacing pyc clutter
with pyr clutter. I do like that you haven't _increased_ the number of
extraneous siblings of .py files.

I have a couple of bikesheddy or "why didn't you do this" comments. I'll
be perfectly satisfied with an answer or a line in the PEP.

1. Why the -R flag? It seems like this is a uniform improvement, so it
should be the default. Have faith in your design! ;-)

2. Vitor's suggestion to make one "pyr" directory per directory and
stick all the .pyc's there would solve the "pyc clutter" problem. Any
reason not to do that? Trying to make it one pyr per directory
hierarchy, as Ben suggested, seems unworkable.  The one problem with
the per-directory approach would seem to be filename length limits; do
we care about those anymore?

3. It seems like .pyr directories are nicely forward-compatible with
other uses like version-specific .so's or JIT caches. I don't think
this PEP needs to flesh out any of those other possibilities though.

4. -1 to a moratorium on bytecode changes. No moratorium can last
forever, and then packagers will be back to the same problem. The
rationale for PEP 3003 doesn't seem to apply here.

On Sat, Jan 30, 2010 at 4:00 PM, Barry Warsaw <ba...@python.org> wrote:
> PEP: 3147
> Title: PYC Repository Directories
> Version: $Revision$
> Last-Modified: $Date$
> Author: Barry Warsaw <ba...@python.org>
> Status: Draft
> Type: Standards Track
> Content-Type: text/x-rst
> Created: 2009-12-16
> Python-Version: 3.2
> Post-History:
>
>
> Abstract
> ========
>
> This PEP describes an extension to Python's import mechanism which
> improves sharing of Python source code files among multiple installed
> versions of the Python interpreter.  It does this by
> allowing many different byte compilation files (.pyc files) to be
> co-located with the Python source file (.py file).  The extension
> described here can also be used to support different Python
> compilation caches, such as JIT output that may be produced by an
> Unladen Swallow [1]_ enabled C Python.
>
>
> Rationale
> =========
>
> Linux distributions such as Ubuntu [2]_ and Debian [3]_ provide more
> than one Python version at the same time to their users.  For example,
> Ubuntu 9.10 Karmic Koala can install Python 2.5, 2.6, and 3.1, with
> Python 2.6 being the default.
>
> In order to ease the burden on operating system packagers for these
> distributions, the distribution packages do not contain Python version
> numbers [4]_; they are shared across all Python versions installed on
> the system.  Putting Python version numbers in the packages would be a
> maintenance nightmare, since all the packages - *and their
> dependencies* - would have to be updated every time a Python
> release was added to or removed from the distribution.  Because of the
> sheer number of packages available, this amount of work is infeasible.
>
> For pure Python modules, sharing is possible because upstream
> maintainers typically support multiple versions of Python in a source
> compatible way.  In practice though, it is well known that pyc files
> are not compatible across Python major releases.  A reading of
> import.c [5]_ in the Python source code proves that within recent
> memory, every new CPython major release has bumped the pyc magic
> number.
>
> Even C extensions can be source compatible across multiple versions of
> Python.  Compiled extension modules are usually not compatible though,
> and PEP 384 [6]_ has been proposed to address this by defining a
> stable ABI for extension modules.
>
> Because the distributions cannot share pyc files, elaborate mechanisms
> have been developed to put the resulting pyc files in non-shared
> locations while the source code is still shared.  Examples include the
> symlink-based Debian regimes python-support [7]_ and python-central
> [8]_.  These approaches make for much more complicated, fragile,
> inscrutable, and fragmented policies for delivering Python
> applications to a wide range of users.  Arguably more users get Python
> from their operating system vendor than from upstream tarballs.  Thus,
> solving this pyc sharing problem for CPython is a high priority for
> such vendors.
>
> This PEP proposes a solution to this problem.
>
>
> Proposal
> ========
>
> Python's import machinery is extended to search for byte code cache
> files in a directory co-located with the source file, but with an
> extension 'pyr'.  The pyr directory contains individual files with the
> cached byte compilation of the source code, identical to current pyc
> and pyo files.  The files inside the pyr directory retain their file
> extensions, but the base name is replaced by the hexlified [10]_ magic
> number of the Python version the byte code is compatible with.
>
> The file extension pyr was chosen because 'r' is a mnemonic for
> 'repository', and there appear to be no prior uses of the extension
> [9]_.
>
> For example, a module `foo` with source code in `foo.py` and byte
> compiled with Python 2.5, Python 2.6, Python 2.6 `-O`, Python 2.6
> `-U`, and Python 3.1 would have the following file system layout::
>
>    foo.py
>    foo.pyr/
>        f2b30a0d.pyc # Python 2.5
>        f2d10a0d.pyc # Python 2.6
>        f2d10a0d.pyo # Python 2.6 -O
>        f2d20a0d.pyc # Python 2.6 -U
>        0c4f0a0d.pyc # Python 3.1
>
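The naming scheme can be sketched in a few lines. This is only an illustration, not the PEP's reference implementation (which is still TBD): the helper name `pyr_cache_path` is made up, and the byte order that yields names like `f2b30a0d` in the layout above is an assumption, since the draft only says the magic number is hexlified.

```python
import binascii
import importlib.util
import os.path


def pyr_cache_path(source_path, magic=None, optimized=False):
    """Return the cache path for source_path under the proposed pyr layout.

    Hypothetical helper: `magic` is the interpreter's magic number as bytes,
    defaulting to the running interpreter's own value.
    """
    if magic is None:
        magic = importlib.util.MAGIC_NUMBER
    base, _ext = os.path.splitext(source_path)
    # The base name of the cache file is the hexlified magic number.
    name = binascii.hexlify(magic).decode("ascii")
    # The file keeps its usual extension: pyo for -O, pyc otherwise.
    ext = ".pyo" if optimized else ".pyc"
    return os.path.join(base + ".pyr", name + ext)
```

Passing the bytes explicitly, as in `pyr_cache_path("foo.py", b"\xf2\xd1\x0a\x0d", optimized=True)`, gives the `foo.pyr/f2d10a0d.pyo` entry from the example layout.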
>
> Python behavior
> ===============
>
> When Python searches for a module to import (say `foo`), it may find
> one of several situations.  As per current Python rules, the term
> "matching pyc" means that the magic number matches the current
> interpreter's magic number, and the source file is not newer than the
> `pyc` file.
>
> When Python finds a `foo.py` file for which no `foo.pyc` file or
> `foo.pyr` directory exists, Python will by default load the `foo.py`
> file and write a `foo.pyc` file next to the source file.  This is
> unchanged from current behavior.
>
> When the Python executable is given a `-R` flag, or the environment
> variable `$PYTHONPYR` is set, then Python will create a `foo.pyr`
> directory and write a `pyc` file to that directory with the hexlified
> magic number as the base name.
>
> If during import, Python finds an existing `pyc` file but no `pyr`
> directory, and the `$PYTHONPYR` environment variable is not set, then
> the `pyc` file is loaded as normal and no `pyr` directory is created.
>
> If during import, Python finds a `pyr` directory with a matching `pyc`
> file, *regardless of whether `$PYTHONPYR` is set or not*, then
> `foo.pyr/<magic>.pyc` is loaded and import completes successfully.
> Thus a matching `pyc` file inside a `pyr` directory always takes
> precedence over a sibling `pyc` file.
>
> If during import, Python finds a `pyr` directory that does not contain
> a matching `pyc` file, and no sibling `foo.pyc` file exists, Python
> will load the source file and write a sibling `foo.pyc` file, unless
> the `-R` flag is given in which case a `foo.pyr/<magic>.pyc` file will
> be written.
>
> Here is a flowchart illustrating the rules.
>
> .. image:: pep-3147-1.png
>   :scale: 75
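Since the flowchart image does not survive in mail, here is a rough sketch of the rules as I read them. The function name and return values are made up for illustration. One case is not spelled out in the draft: what happens when `-R` is given and a *matching* sibling `foo.pyc` exists. The sketch assumes the sibling is not consulted in that case, by analogy with the non-matching case.

```python
def plan_import(pyr_has_match, sibling_pyc_matches, use_pyr):
    """Return (action, target) for importing module foo under the draft rules.

    pyr_has_match:       foo.pyr/<magic>.pyc exists and matches.
    sibling_pyc_matches: a sibling foo.pyc exists and matches.
    use_pyr:             -R was given or $PYTHONPYR is set.
    """
    if pyr_has_match:
        # A matching pyc inside the pyr directory always takes precedence.
        return ("load", "foo.pyr/<magic>.pyc")
    if sibling_pyc_matches and not use_pyr:
        # Unchanged legacy behavior: load the sibling pyc.
        return ("load", "foo.pyc")
    # ASSUMPTION: with -R, any sibling pyc (matching or not) is left alone.
    # Without -R, a missing or non-matching sibling pyc is (over)written.
    target = "foo.pyr/<magic>.pyc" if use_pyr else "foo.pyc"
    return ("compile_and_write", target)
```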
>
>
> Effects on non-conforming Python versions
> =========================================
>
> Python implementations which don't know anything about `pyr`
> directories will ignore them.  This means that they will read and
> write `pyc` files as usual.  A conforming implementation will still
> prefer any existing `foo.pyr/<magic>.pyc` file over an existing
> sibling `pyc` file.
>
> The one possible conflicting state is where a sibling `pyc` file
> exists, but its magic number does not match.
>
> In the default case, when Python finds a `pyc` file with a
> non-matching magic number, it simply overwrites the `pyc` file with
> the new byte code and magic number.  In the absence of the `-R` flag,
> this remains unchanged.  When the `-R` flag is given, the
> non-matching sibling `pyc` file is ignored - it is neither removed nor
> overwritten - and a `foo.pyr/<magic>.pyc` file is written instead.
>
>
> Implementation strategy
> =======================
>
> This feature is targeted for Python 3.2, solving the problem for that
> and all future versions.  It may be back-ported to Python 2.7.
> Vendors are free to backport the changes to earlier distributions as
> they see fit.
>
>
> Alternatives
> ============
>
> PEP 304
> -------
>
> There is some overlap between the goals of this PEP and PEP 304 [12]_,
> which has been withdrawn.  However PEP 304 would allow a user to
> create a shadow file system hierarchy in which to store `pyc` files.
> This concept of a shadow hierarchy for `pyc` files could be used to
> satisfy the aims of this PEP.  Although PEP 304 does not indicate
> why it was withdrawn, shadow directories have a number of problems.
> The location of the shadow `pyc` files would not be easily discovered
> and would depend on the proper and consistent use of the
> `$PYTHONBYTECODE` environment variable both by the system and by end
> users.  There are also global implications, meaning that while the
> system might want to shadow `pyc` files, users might not want to, but
> the PEP defines only an all-or-nothing approach.
>
> As an example of the problem, a common (though fragile) Python idiom
> for locating data files is to do something like this::
>
>    from os.path import dirname, join
>    import foo.bar
>    data_file = join(dirname(foo.bar.__file__), 'my.dat')
>
> This would be problematic since `foo.bar.__file__` will give the
> location of the `pyc` file in the shadow directory, and it may not be
> possible to find the `my.dat` file relative to the source directory
> from there.
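To make the failure concrete, here is a small sketch using plain path strings. Both paths are invented for illustration, and `posixpath` is used only so the result does not depend on the platform running it.

```python
from posixpath import dirname, join

# Hypothetical locations: the real source file, and where a PEP 304 style
# shadow hierarchy might have put its byte code.
source_file = "/usr/lib/python/foo/bar.py"
shadow_file = "/var/cache/pyc/usr/lib/python/foo/bar.pyc"


def data_path(module_file):
    """The common idiom: look for my.dat next to the module's __file__."""
    return join(dirname(module_file), "my.dat")


# With __file__ next to the source, the data file is found where it lives.
beside_source = data_path(source_file)
# With __file__ in the shadow tree, the idiom looks in the wrong directory.
beside_shadow = data_path(shadow_file)
```

Here `beside_source` is `/usr/lib/python/foo/my.dat`, while `beside_shadow` points into `/var/cache/pyc/...`, where no `my.dat` was ever installed.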
>
> On the other hand, this PEP keeps all byte code artifacts co-located
> with the source file.  Some adjustment will have to be made for the
> fact that the `pyc` file lives in a subdirectory.  For example, in
> current Python, when you import a module, its `__file__` attribute
> points to its `pyc` file.  A package's `__file__` points to the `pyc`
> file for its `__init__.py`.  E.g.::
>
>    >>> import foo
>    >>> foo.__file__
>    'foo.pyc'
>    # baz is a package
>    >>> import baz
>    >>> baz.__file__
>    'baz/__init__.pyc'
>
> The implementation of this PEP would have to ensure that `__file__`
> points to the same directory level as it does without the `pyr`
> directory, so that the common idiom above continues to work::
>
>    >>> import foo
>    >>> foo.__file__
>    'foo.pyr'
>    # baz is a package
>    >>> import baz
>    >>> baz.__file__
>    'baz/__init__.pyr'
>
> Note that some existing Python code only checks for `.py` and `.pyc`
> file extensions (and possibly `.pyo`).  These would have to be
> extended to also check for `.pyr` extensions.
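For what it's worth, the kind of change such code would need is small. A minimal sketch, assuming `__file__` ends in `.pyr` as in the example above; the helper name `source_for` is hypothetical and not an existing API:

```python
import os.path

# Extensions that stand for byte code rather than source.
# ".pyr" is the new entry that existing tools would have to add.
BYTECODE_EXTENSIONS = (".pyc", ".pyo", ".pyr")


def source_for(module_file):
    """Map a module's __file__ to the path of its .py source file."""
    base, ext = os.path.splitext(module_file)
    if ext in BYTECODE_EXTENSIONS:
        return base + ".py"
    # Already a source path (or something unrecognized): leave it alone.
    return module_file
```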
>
>
> Fat byte compilation files
> --------------------------
>
> An earlier version of this PEP described "fat" Python byte code files.
> These files would contain the equivalent of multiple `pyc` files in a
> single `pyf` file, with a lookup table keyed off the appropriate magic
> number.  This was an extensible file format so that the first 5
> parallel Python implementations could be supported fairly efficiently,
> but with extension lookup tables available to scale `pyf` byte code
> objects as large as necessary.
>
> The fat byte compilation files were fairly complex, so the current
> simplification of using directories was suggested.
>
>
> Multiple file extensions
> ------------------------
>
> The PEP author also considered an approach where multiple thin byte
> compiled files lived in the same place, but used different file
> extensions to designate the Python version.  E.g. foo.pyc25,
> foo.pyc26, foo.pyc31 etc.  This was rejected because of the clutter
> involved in writing so many different files.  The multiple extension
> approach makes it more difficult (and an ongoing task) to update any
> tools that are dependent on the file extension.
>
>
> Open questions
> ==============
>
> * Are there any concurrency issues added by this PEP, above those that
>  already exist?  For example, what if two Python processes attempt to
>  write the same `<magic>.pyc` file?  Is that any different than two
>  Python processes trying to write to the same `foo.pyc` file?
>  Current thinking is that there aren't, since the exclusive open
>  mechanism currently used will still be used to open `pyc` files
>  inside a `pyr` directory.
>
> * How do the imp [13]_ and importlib [14]_ modules need to be updated
>  to conform to the `pyr` directories?
>
> * What about `py` source files that are compatible with most but not
>  all installed Python versions?  We might need a way to say "this py
>  file should be hidden from Python versions X.Y or earlier".  There
>  are three options:
>
>  - Use file system tricks to only share py files that are actually
>    sharable in all installed Python versions (e.g. different search
>    directories for Python X.Y and Python X.Z).
>  - Introduce Python syntax that is legal before __future__ imports
>    and is evaluated to determine if the py file is compatible,
>    raising an `ImportError('no module named foo')` if not.
>  - Add an optional metadata file co-located with the py file that
>    declares which Python versions it is compatible with.
>
>  How does this requirement interact with PEP 382 namespace packages [15]_?
>
> * Are there any opportunities for also sharing extension modules
>  (.so/.dll files) in a `pyr` directory?
>
> * Would a moratorium on byte code changes, similar to the language
>  moratorium described in PEP 3003 [16]_, be a better approach to
>  pursue, and would that solve the problem for vendors?  At the time
>  of this writing, PEP 3003 is silent on the issue.
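On the concurrency question in the list above: a minimal sketch of what an exclusive create looks like from Python, assuming "exclusive open" means `O_CREAT | O_EXCL`. This is not a copy of what import.c does, and the helper name `write_exclusive` is made up. The point is only that the directory a file lives in does not change the semantics.

```python
import os
import tempfile


def write_exclusive(path, data):
    """Create path and write data; return False if it already exists.

    O_CREAT | O_EXCL makes creation atomic, so of two processes racing to
    write the same <magic>.pyc, exactly one succeeds in creating it.
    """
    flags = os.O_WRONLY | os.O_CREAT | os.O_EXCL
    flags |= getattr(os, "O_BINARY", 0)  # only present on Windows
    try:
        fd = os.open(path, flags, 0o644)
    except FileExistsError:
        return False
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    return True


# Usage: the second writer loses, and the first writer's bytes are kept.
pyr_dir = tempfile.mkdtemp(suffix=".pyr")
target = os.path.join(pyr_dir, "f2d10a0d.pyc")
first = write_exclusive(target, b"first")
second = write_exclusive(target, b"second")
```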
>
>
> Reference implementation
> ========================
>
> TBD
>
>
> References
> ==========
>
> .. [1] PEP 3146
>
> .. [2] Ubuntu: <http://www.ubuntu.com>
>
> .. [3] Debian: <http://www.debian.org>
>
> .. [4] Debian Python Policy:
>   http://www.debian.org/doc/packaging-manuals/python-policy/
>
> .. [5] import.c:
>   http://svn.python.org/view/python/branches/py3k/Python/import.c?view=markup
>
> .. [6] PEP 384
>
> .. [7] python-support:
>   http://wiki.debian.org/DebianPythonFAQ#Whatispython-support.3F
>
> .. [8] python-central:
>   http://wiki.debian.org/DebianPythonFAQ#Whatispython-central.3F
>
> .. [9] http://www.filesuffix.com/?m=search&e=pyr&submit=Search
>
> .. [10] binascii.hexlify():
>   http://www.python.org/doc/current/library/binascii.html#binascii.hexlify
>
> .. [11] The marshal module:
>   http://www.python.org/doc/current/library/marshal.html
>
> .. [12] PEP 304:
>
> .. [13] imp: http://www.python.org/doc/current/library/imp.html
>
> .. [14] importlib: http://docs.python.org/3.1/library/importlib.html
>
> .. [15] PEP 382
>
> .. [16] PEP 3003
>
>
> Acknowledgments
> ===============
>
> Barry Warsaw's original idea was for fat Python byte code files.
> Martin von Loewis reviewed an early draft of the PEP and suggested the
> simplification to store traditional `pyc` and `pyo` files in a
> directory.  Many other people reviewed early versions of this PEP and
> provided useful feedback including:
>
> * David Malcolm
> * Josselin Mouette
> * Matthias Klose
> * Michael Hudson
> * Michael Vogt
> * Piotr Ożarowski
> * Scott Kitterman
> * Toshio Kuratomi
>
>
> Copyright
> =========
>
> This document has been placed in the public domain.
>
>
>
> ..
>   Local Variables:
>   mode: indented-text
>   indent-tabs-mode: nil
>   sentence-end-double-space: t
>   fill-column: 70
>   coding: utf-8
>   End:
>