[Python-Dev] PEP 489: Redesigning extension module loading

2015-03-16 Thread Petr Viktorin

Hello,
On import-sig, I've agreed to continue Nick Coghlan's work on making 
extension modules act more like Python ones, work well with PEP 451 
(ModuleSpec), and encourage proper subinterpreter and reloading support. 
Here is the resulting PEP.


I don't have a patch yet, but I'm working on it.

There's a remaining open issue: providing a tool that can be run in test 
suites to check if a module behaves well with subinterpreters/reloading. 
I believe it's out of scope for this PEP but speak out if you disagree.


Please discuss on import-sig.

===

PEP: 489
Title: Redesigning extension module loading
Version: $Revision$
Last-Modified: $Date$
Author: Petr Viktorin encu...@gmail.com,
Stefan Behnel stefan...@behnel.de,
Nick Coghlan ncogh...@gmail.com
Discussions-To: import-...@python.org
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 11-Aug-2013
Python-Version: 3.5
Post-History: 23-Aug-2013, 20-Feb-2015
Resolution:


Abstract
========

This PEP proposes a redesign of the way in which extension modules interact
with the import machinery. This mechanism was last revised for Python 3.0 in
PEP 3121, but that revision did not solve all problems at the time. The goal is
to solve them
by bringing extension modules closer to the way Python modules behave;
specifically to hook into the ModuleSpec-based loading mechanism
introduced in PEP 451.

Extensions that do not require custom memory layout for their module objects
may be executed in arbitrary pre-defined namespaces, paving the way for
extension modules being runnable with Python's ``-m`` switch.
Other extensions can use custom types for their module implementation.
Module types are no longer restricted to types.ModuleType.

This proposal makes it easy to support properties at the module
level and to safely store arbitrary global state in the module that is
covered by normal garbage collection and supports reloading and
sub-interpreters.
Extension authors are encouraged to take these issues into account
when using the new API.



Motivation
==========

Python modules and extension modules are not set up in the same way.
For Python modules, the module object is created and set up first, then the
module code is executed (PEP 302).
A ModuleSpec object (PEP 451) is used to hold information about the module,
and passed to the relevant hooks.
For extensions, i.e. shared libraries, the module
init function is executed straight away and does both the creation and
initialisation. The initialisation function is not passed ModuleSpec
information about the loaded module, such as the __file__ or fully-qualified
name. This hinders relative imports and resource loading.

This is specifically a problem for Cython generated modules, for which it's
not uncommon that the module init code has the same level of complexity as
that of any 'regular' Python module. Also, the lack of __file__ and __name__
information hinders the compilation of __init__.py modules, i.e. packages,
especially when relative imports are being used at module init time.

The other disadvantage of the discrepancy is that existing Python programmers
learning C cannot effectively map concepts between the two domains.
As long as extension modules are fundamentally different from pure Python ones
in the way they're initialised, they are harder for people to pick up without
relying on something like cffi, SWIG or Cython to handle the actual extension
module creation.

Currently, extension modules are also not added to sys.modules until they are
fully initialized, which means that a (potentially transitive)
re-import of the module will really try to reimport it and thus run into an
infinite loop when it executes the module init function again.
Without the fully qualified module name, it is not trivial to correctly add
the module to sys.modules either.

Furthermore, the majority of currently existing extension modules has
problems with sub-interpreter support and/or reloading, and, while it is
possible with the current infrastructure to support these
features, it is neither easy nor efficient.
Addressing these issues was the goal of PEP 3121, but many extensions,
including some in the standard library, took the least-effort approach
to porting to Python 3, leaving these issues unresolved.
This PEP keeps the backwards-compatible behavior, which should reduce pressure
and give extension authors adequate time to consider these issues when porting.



The current process
===================

Currently, extension modules export an initialisation function named
PyInit_modulename, named after the file name of the shared library. This
function is executed by the import machinery and must return either NULL in
the case of an exception, or a fully initialised module object. The
function receives no arguments, so it has no way of knowing about its
import context.
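
For illustration, a minimal extension following this process might look like
the following (the module name ``spam`` and its contents are hypothetical)::

    #include <Python.h>

    static struct PyModuleDef spammodule = {
        PyModuleDef_HEAD_INIT,
        "spam",     /* m_name */
        NULL,       /* m_doc */
        -1,         /* m_size: no per-module state */
        NULL,       /* m_methods */
    };

    PyMODINIT_FUNC
    PyInit_spam(void)
    {
        /* Creation and initialisation happen in one step; no ModuleSpec
           information such as __file__ or the fully-qualified name is
           available here. */
        PyObject *module = PyModule_Create(&spammodule);
        if (module == NULL)
            return NULL;
        if (PyModule_AddStringConstant(module, "greeting", "hello") < 0) {
            Py_DECREF(module);
            return NULL;
        }
        return module;
    }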

During its execution, the module init function creates a module object
based on a PyModuleDef struct. It then 

Re: [Python-Dev] PEP 489: Redesigning extension module loading

2015-03-16 Thread Jim J. Jewett

On 16 March 2015 Petr Viktorin wrote:

 If PyModuleCreate is not defined, PyModuleExec is expected to operate
 on any Python object for which attributes can be added by PyObject_GetAttr*
 and retrieved by PyObject_SetAttr*.

I assume it is the other way around (add with Set and retrieve with Get),
rather than a description of the required form of magic.


 PyObject *PyModule_AddCapsule(
 PyObject *module,
 const char *module_name,
 const char *attribute_name,
 void *pointer,
 PyCapsule_Destructor destructor)

What happens if module_name doesn't match the module's __name__?
Does it become a hidden attribute?  A dotted attribute?  Is the
result undefined?

Later, there is

 void *PyModule_GetCapsulePointer(
 PyObject *module,
 const char *module_name,
 const char *attribute_name)

with the same apparently redundant arguments, but not a
PyModule_SetCapsulePointer.  Are capsule pointers read-only, or can
they be replaced with another call to PyModule_AddCapsule, or by a
simple PyObject_SetAttr?

 Subinterpreters and Interpreter Reloading
...
 No user-defined functions, methods, or instances may leak to different
 interpreters.

By user-defined do you mean defined in python, as opposed to in
the extension itself?

If so, what is the recommendation for modules that do want to support,
say, callbacks?  A dual-layer mapping that uses the interpreter as the
first key?  Naming it _module and only using it indirectly through
module.py, which is not shared across interpreters?  Not using this
API at all?

 To achieve this, all module-level state should be kept in either the module
 dict, or in the module object.

I don't see how that is related to leakage.

 A simple rule of thumb is: Do not define any static data, except built-in
 types with no mutable or user-settable class attributes.

What about singleton instances?  Should they be per-interpreter?
What about constants, such as PI?
Where should configuration variables (e.g., MAX_SEARCH_DEPTH) be
kept?


What happens if this no-leakage rule is violated?  Does the module
not load, or does it just maybe lead to a crash down the road?

-jJ

--

If there are still threading problems with my replies, please
email me with details, so that I can try to resolve them.  -jJ


Re: [Python-Dev] PEP 489: Redesigning extension module loading

2015-03-16 Thread Petr Viktorin
On Mon, Mar 16, 2015 at 4:42 PM, Jim J. Jewett jimjjew...@gmail.com wrote:

 On 16 March 2015 Petr Viktorin wrote:

 If PyModuleCreate is not defined, PyModuleExec is expected to operate
 on any Python object for which attributes can be added by PyObject_GetAttr*
 and retrieved by PyObject_SetAttr*.

 I assume it is the other way around (add with Set and retrieve with Get),
 rather than a description of the required form of magic.

Right you are, I mixed that up.

 PyObject *PyModule_AddCapsule(
 PyObject *module,
 const char *module_name,
 const char *attribute_name,
 void *pointer,
 PyCapsule_Destructor destructor)

 What happens if module_name doesn't match the module's __name__?
 Does it become a hidden attribute?  A dotted attribute?  Is the
 result undefined?

The module_name is used to name the capsule, following the convention
from PyCapsule_Import. The module.__name__ is not used or checked.
The function would do this:
capsule_name = module_name + '.' + attribute_name
capsule = PyCapsule_New(pointer, capsule_name, destructor)
PyModule_AddObject(module, attribute_name, capsule)
just with error handling, and suitable C code for the +.
I will add the pseudocode to the PEP.
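Spelled out in C, the helper would be roughly this (an untested sketch, not
the final implementation; note that PyCapsule_New does not copy the name, so
a real implementation must keep the joined string alive as long as the
capsule lives, while here it is simply left allocated):

PyObject *
PyModule_AddCapsule(PyObject *module, const char *module_name,
                    const char *attribute_name, void *pointer,
                    PyCapsule_Destructor destructor)
{
    size_t size = strlen(module_name) + 1 + strlen(attribute_name) + 1;
    char *capsule_name = PyMem_Malloc(size);
    PyObject *capsule;

    if (capsule_name == NULL)
        return PyErr_NoMemory();
    PyOS_snprintf(capsule_name, size, "%s.%s", module_name, attribute_name);
    capsule = PyCapsule_New(pointer, capsule_name, destructor);
    if (capsule == NULL) {
        PyMem_Free(capsule_name);
        return NULL;
    }
    /* PyModule_AddObject steals the reference on success only. */
    if (PyModule_AddObject(module, attribute_name, capsule) < 0) {
        Py_DECREF(capsule);
        return NULL;
    }
    /* The module now owns the capsule; the returned reference is borrowed. */
    return capsule;
}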

 Later, there is

 void *PyModule_GetCapsulePointer(
 PyObject *module,
 const char *module_name,
 const char *attribute_name)

 with the same apparently redundant arguments,

Here the behavior would be:
capsule_name = module_name + '.' + attribute_name
capsule = PyObject_GetAttr(module, attribute_name)
return PyCapsule_GetPointer(capsule, capsule_name)
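
Again roughly, in C (an untested sketch; PyCapsule_GetPointer only compares
the name, so a temporary buffer is enough on this side):

void *
PyModule_GetCapsulePointer(PyObject *module, const char *module_name,
                           const char *attribute_name)
{
    char capsule_name[256];   /* assumes the joined name fits */
    PyObject *capsule;
    void *pointer;

    PyOS_snprintf(capsule_name, sizeof(capsule_name), "%s.%s",
                  module_name, attribute_name);
    capsule = PyObject_GetAttrString(module, attribute_name);
    if (capsule == NULL)
        return NULL;
    /* Verifies that the capsule's stored name matches capsule_name. */
    pointer = PyCapsule_GetPointer(capsule, capsule_name);
    Py_DECREF(capsule);
    return pointer;
}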

 but not a
 PyModule_SetCapsulePointer.  Are capsule pointers read-only, or can
 they be replaced with another call to PyModule_AddCapsule, or by a
 simple PyObject_SetAttr?

You can replace the capsule using either of those two, or set the pointer
using PyCapsule_SetPointer, or (most likely) change the data the
pointer points to.
The added functions are just simple helpers for common operations,
meant to encourage keeping per-module state.

 Subinterpreters and Interpreter Reloading
 ...
 No user-defined functions, methods, or instances may leak to different
 interpreters.

 By user-defined do you mean defined in python, as opposed to in
 the extension itself?

Yes.

 If so, what is the recommendation for modules that do want to support,
 say, callbacks?  A dual-layer mapping that uses the interpreter as the
 first key?  Naming it _module and only using it indirectly through
 module.py, which is not shared across interpreters?  Not using this
 API at all?

There is a separate module object, with its own dict, for each
subinterpreter (as when creating the module with PyModuleDef.m_size
== 0 today).
Callbacks should be stored on the appropriate module instance.
Does that answer your question? I'm not sure what kind of callbacks you meant.
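
For example (a hypothetical sketch; the function and attribute names are made
up): module-level functions receive their own module object as "self", so a
callback can be stored on that per-interpreter module instead of in C static
storage:

static PyObject *
set_callback(PyObject *module, PyObject *callback)
{
    /* The module object and its dict are per-interpreter, so nothing
       leaks across interpreters. */
    if (PyObject_SetAttrString(module, "_callback", callback) < 0)
        return NULL;
    Py_RETURN_NONE;
}

static PyObject *
fire_callback(PyObject *module, PyObject *arg)
{
    PyObject *callback = PyObject_GetAttrString(module, "_callback");
    PyObject *result;

    if (callback == NULL)
        return NULL;
    result = PyObject_CallFunctionObjArgs(callback, arg, NULL);
    Py_DECREF(callback);
    return result;
}

(Both would be registered with METH_O in the module's method table.)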

 To achieve this, all module-level state should be kept in either the module
 dict, or in the module object.

 I don't see how that is related to leakage.

 A simple rule of thumb is: Do not define any static data, except built-in
 types with no mutable or user-settable class attributes.

 What about singleton instances?  Should they be per-interpreter?

Yes, definitely.

 What about constants, such as PI?

In PyModuleExec, create the constant using PyFloat_FromDouble, and add
it using PyModule_AddObject. That will do the right thing.
(Float constants can be shared, since they cannot refer to
user-defined code. But this PEP shields you from needing to know this
for every type.)
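
Roughly (a sketch only; this assumes the exec hook takes the module object
and returns 0 on success and -1 on error, and the module name "spam" is
hypothetical):

static int
spam_exec(PyObject *module)
{
    PyObject *pi = PyFloat_FromDouble(3.141592653589793);

    if (pi == NULL)
        return -1;
    /* PyModule_AddObject steals the reference on success. */
    if (PyModule_AddObject(module, "PI", pi) < 0) {
        Py_DECREF(pi);
        return -1;
    }
    return 0;
}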

 Where should configuration variables (e.g., MAX_SEARCH_DEPTH) be
 kept?

On the module object.

 What happens if this no-leakage rule is violated?  Does the module
 not load, or does it just maybe lead to a crash down the road?

It may, as today, lead to unexpected behavior down the road. This is
explained here:
https://docs.python.org/3/c-api/init.html#sub-interpreter-support
Unfortunately, there's no good way to detect such leakage. This PEP
adds the tools, documentation, and guidelines to make it easy to do
the right thing, but won't prevent you from shooting yourself in the
foot in C code.


Thank you for sharing your concerns! I will keep them in mind when
writing the docs for this.