[Python-Dev] PEP 489: Redesigning extension module loading
Hello,

On import-sig, I've agreed to continue Nick Coghlan's work on making extension modules act more like Python ones, work well with PEP 451 (ModuleSpec), and encourage proper subinterpreter and reloading support. Here is the resulting PEP.

I don't have a patch yet, but I'm working on it.

There's a remaining open issue: providing a tool that can be run in test suites to check if a module behaves well with subinterpreters/reloading. I believe it's out of scope for this PEP but speak out if you disagree.

Please discuss on import-sig.

===

PEP: 489
Title: Redesigning extension module loading
Version: $Revision$
Last-Modified: $Date$
Author: Petr Viktorin encu...@gmail.com,
        Stefan Behnel stefan...@behnel.de,
        Nick Coghlan ncogh...@gmail.com
Discussions-To: import-...@python.org
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 11-Aug-2013
Python-Version: 3.5
Post-History: 23-Aug-2013, 20-Feb-2015
Resolution:


Abstract
========

This PEP proposes a redesign of the way in which extension modules interact with the import machinery. This was last revised for Python 3.0 in PEP 3121, but did not solve all problems at the time. The goal is to solve them by bringing extension modules closer to the way Python modules behave; specifically to hook into the ModuleSpec-based loading mechanism introduced in PEP 451.

Extensions that do not require custom memory layout for their module objects may be executed in arbitrary pre-defined namespaces, paving the way for extension modules being runnable with Python's ``-m`` switch. Other extensions can use custom types for their module implementation. Module types are no longer restricted to types.ModuleType.

This proposal makes it easy to support properties at the module level and to safely store arbitrary global state in the module that is covered by normal garbage collection and supports reloading and sub-interpreters. Extension authors are encouraged to take these issues into account when using the new API.

Motivation
==========

Python modules and extension modules are not being set up in the same way. For Python modules, the module is created and set up first, then the module code is being executed (PEP 302). A ModuleSpec object (PEP 451) is used to hold information about the module, and passed to the relevant hooks.

For extensions, i.e. shared libraries, the module init function is executed straight away and does both the creation and initialisation. The initialisation function is not passed ModuleSpec information about the loaded module, such as the __file__ or fully-qualified name. This hinders relative imports and resource loading.

This is specifically a problem for Cython generated modules, for which it's not uncommon that the module init code has the same level of complexity as that of any 'regular' Python module. Also, the lack of __file__ and __name__ information hinders the compilation of __init__.py modules, i.e. packages, especially when relative imports are being used at module init time.

The other disadvantage of the discrepancy is that existing Python programmers learning C cannot effectively map concepts between the two domains. As long as extension modules are fundamentally different from pure Python ones in the way they're initialised, they are harder for people to pick up without relying on something like cffi, SWIG or Cython to handle the actual extension module creation.

Currently, extension modules are also not added to sys.modules until they are fully initialized, which means that a (potentially transitive) re-import of the module will really try to reimport it and thus run into an infinite loop when it executes the module init function again. Without the fully qualified module name, it is not trivial to correctly add the module to sys.modules either.
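
The two-step setup of Python modules described above can be observed from Python itself. The following is a minimal sketch, not part of the PEP: it uses the existing importlib API to create a module from its ModuleSpec and only afterwards execute its code. The helper name load_in_two_steps is made up for illustration, and colorsys is just a convenient pure-Python standard library module.

```python
import importlib.util


def load_in_two_steps(name, probe_attr):
    """Create a module from its spec, then execute it (PEP 451 style).

    Hypothetical helper for illustration. Returns the module and a flag
    telling whether probe_attr already existed before execution.
    """
    spec = importlib.util.find_spec(name)

    # Step 1: creation. The module object exists and already carries
    # __name__ and __spec__, but none of its code has run yet.
    module = importlib.util.module_from_spec(spec)
    existed_before_exec = hasattr(module, probe_attr)

    # Step 2: execution. The loader runs the module code in the
    # namespace of the object created above.
    spec.loader.exec_module(module)
    return module, existed_before_exec


module, existed_before_exec = load_in_two_steps("colorsys", "rgb_to_hsv")
print(module.__name__, existed_before_exec, hasattr(module, "rgb_to_hsv"))
```

The module created this way is a fresh object that is never placed in sys.modules, so the sketch does not disturb a normal import of the same name.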

Furthermore, the majority of currently existing extension modules has problems with sub-interpreter support and/or reloading, and, while it is possible with the current infrastructure to support these features, it is neither easy nor efficient. Addressing these issues was the goal of PEP 3121, but many extensions, including some in the standard library, took the least-effort approach to porting to Python 3, leaving these issues unresolved. This PEP keeps the backwards-compatible behavior, which should reduce pressure and give extension authors adequate time to consider these issues when porting.


The current process
===================

Currently, extension modules export an initialisation function named PyInit_modulename, named after the file name of the shared library. This function is executed by the import machinery and must return either NULL in the case of an exception, or a fully initialised module object. The function receives no arguments, so it has no way of knowing about its import context.

During its execution, the module init function creates a module object based on a PyModuleDef struct. It then
Re: [Python-Dev] PEP 489: Redesigning extension module loading
On 16 March 2015 Petr Viktorin wrote:

> If PyModuleCreate is not defined, PyModuleExec is expected to operate on any Python object for which attributes can be added by PyObject_GetAttr* and retrieved by PyObject_SetAttr*.

I assume it is the other way around (add with Set and retrieve with Get), rather than a description of the required form of magic.

> PyObject *PyModule_AddCapsule(
>     PyObject *module,
>     const char *module_name,
>     const char *attribute_name,
>     void *pointer,
>     PyCapsule_Destructor destructor)

What happens if module_name doesn't match the module's __name__? Does it become a hidden attribute? A dotted attribute? Is the result undefined?

Later, there is

> void *PyModule_GetCapsulePointer(
>     PyObject *module,
>     const char *module_name,
>     const char *attribute_name)

with the same apparently redundant arguments, but not a PyModule_SetCapsulePointer. Are capsule pointers read-only, or can they be replaced with another call to PyModule_AddCapsule, or by a simple PyObject_SetAttr?

> Subinterpreters and Interpreter Reloading
> ...
> No user-defined functions, methods, or instances may leak to different interpreters.

By user-defined do you mean defined in python, as opposed to in the extension itself?

If so, what is the recommendation for modules that do want to support, say, callbacks? A dual-layer mapping that uses the interpreter as the first key? Naming it _module and only using it indirectly through module.py, which is not shared across interpreters? Not using this API at all?

> To achieve this, all module-level state should be kept in either the module dict, or in the module object.

I don't see how that is related to leakage.

> A simple rule of thumb is: Do not define any static data, except built-in types with no mutable or user-settable class attributes.

What about singleton instances? Should they be per-interpreter?

What about constants, such as PI?

Where should configuration variables (e.g., MAX_SEARCH_DEPTH) be kept?

What happens if this no-leakage rule is violated?
Does the module not load, or does it just maybe lead to a crash down the road?

-jJ

--
If there are still threading problems with my replies, please email me with details, so that I can try to resolve them. -jJ
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP 489: Redesigning extension module loading
On Mon, Mar 16, 2015 at 4:42 PM, Jim J. Jewett jimjjew...@gmail.com wrote:
> On 16 March 2015 Petr Viktorin wrote:
> > If PyModuleCreate is not defined, PyModuleExec is expected to operate on any Python object for which attributes can be added by PyObject_GetAttr* and retrieved by PyObject_SetAttr*.
>
> I assume it is the other way around (add with Set and retrieve with Get), rather than a description of the required form of magic.

Right you are, I mixed that up.

> > PyObject *PyModule_AddCapsule(
> >     PyObject *module,
> >     const char *module_name,
> >     const char *attribute_name,
> >     void *pointer,
> >     PyCapsule_Destructor destructor)
>
> What happens if module_name doesn't match the module's __name__? Does it become a hidden attribute? A dotted attribute? Is the result undefined?

The module_name is used to name the capsule, following the convention from PyCapsule_Import. The module.__name__ is not used or checked. The function would do this:

    capsule_name = module_name + '.' + attribute_name
    capsule = PyCapsule_New(pointer, capsule_name, destructor)
    PyModule_AddObject(module, attribute_name, capsule)

just with error handling, and suitable C code for the +. I will add the pseudocode to the PEP.

> Later, there is
>
> > void *PyModule_GetCapsulePointer(
> >     PyObject *module,
> >     const char *module_name,
> >     const char *attribute_name)
>
> with the same apparently redundant arguments,

Here the behavior would be:

    capsule_name = module_name + '.' + attribute_name
    capsule = PyObject_GetAttr(module, attribute_name)
    return PyCapsule_GetPointer(capsule, capsule_name)

> but not a PyModule_SetCapsulePointer. Are capsule pointers read-only, or can they be replaced with another call to PyModule_AddCapsule, or by a simple PyObject_SetAttr?

You can replace the capsule using any of those two, or set the pointer using PyCapsule_SetPointer, or (most likely) change the data the pointer points to. The added functions are just simple helpers for common operations, meant to encourage keeping per-module state.
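
The capsule pseudocode above can be tried out from Python through ctypes, which exposes the existing PyCapsule_New and PyCapsule_GetPointer functions. This is only a rough sketch of the described behavior, assuming CPython: add_capsule and get_capsule_pointer are hypothetical stand-ins for the proposed PyModule_AddCapsule and PyModule_GetCapsulePointer, setattr/getattr stand in for PyModule_AddObject and PyObject_GetAttr, and no destructor is passed.

```python
import ctypes
import types

# Signatures of the existing CPython capsule functions.
_api = ctypes.pythonapi
_api.PyCapsule_New.restype = ctypes.py_object
_api.PyCapsule_New.argtypes = [ctypes.c_void_p, ctypes.c_char_p, ctypes.c_void_p]
_api.PyCapsule_GetPointer.restype = ctypes.c_void_p
_api.PyCapsule_GetPointer.argtypes = [ctypes.py_object, ctypes.c_char_p]

# PyCapsule_New stores the name pointer without copying the string,
# so the name buffers must stay alive as long as the capsules do.
_capsule_names = []


def add_capsule(module, module_name, attribute_name, pointer):
    """Hypothetical stand-in for the proposed PyModule_AddCapsule."""
    capsule_name = (module_name + "." + attribute_name).encode()
    _capsule_names.append(capsule_name)
    capsule = _api.PyCapsule_New(pointer, capsule_name, None)
    setattr(module, attribute_name, capsule)


def get_capsule_pointer(module, module_name, attribute_name):
    """Hypothetical stand-in for the proposed PyModule_GetCapsulePointer."""
    capsule_name = (module_name + "." + attribute_name).encode()
    capsule = getattr(module, attribute_name)
    return _api.PyCapsule_GetPointer(capsule, capsule_name)


# module.__name__ ("demo") deliberately differs from module_name ("pkg.demo"):
# only the capsule name is checked, not the module's own name.
demo = types.ModuleType("demo")
state = ctypes.create_string_buffer(b"per-module state")
add_capsule(demo, "pkg.demo", "_state", ctypes.addressof(state))
pointer = get_capsule_pointer(demo, "pkg.demo", "_state")
```

Looking the capsule up with a different module_name makes PyCapsule_GetPointer fail with a ValueError, because the stored capsule name no longer matches.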

> > Subinterpreters and Interpreter Reloading
> > ...
> > No user-defined functions, methods, or instances may leak to different interpreters.
>
> By user-defined do you mean defined in python, as opposed to in the extension itself?

Yes.

> If so, what is the recommendation for modules that do want to support, say, callbacks? A dual-layer mapping that uses the interpreter as the first key? Naming it _module and only using it indirectly through module.py, which is not shared across interpreters? Not using this API at all?

There is a separate module object, with its own dict, for each subinterpreter (as when creating the module with PyModuleDef.m_size == 0 today). Callbacks should be stored on the appropriate module instance.

Does that answer your question? I'm not sure how you meant callbacks.

> > To achieve this, all module-level state should be kept in either the module dict, or in the module object.
>
> I don't see how that is related to leakage.
>
> > A simple rule of thumb is: Do not define any static data, except built-in types with no mutable or user-settable class attributes.
>
> What about singleton instances? Should they be per-interpreter?

Yes, definitely.

> What about constants, such as PI?

In PyModuleExec, create the constant using PyFloat_FromDouble, and add it using PyModule_AddObject. That will do the right thing. (Float constants can be shared, since they cannot refer to user-defined code. But this PEP shields you from needing to know this for every type.)

> Where should configuration variables (e.g., MAX_SEARCH_DEPTH) be kept?

On the module object.

> What happens if this no-leakage rule is violated? Does the module not load, or does it just maybe lead to a crash down the road?

It may, as today, lead to unexpected behavior down the road. This is explained here: https://docs.python.org/3/c-api/init.html#sub-interpreter-support

Unfortunately, there's no good way to detect such leakage.
This PEP adds the tools, documentation, and guidelines to make it easy to do the right thing, but won't prevent you from shooting yourself in the foot in C code.

Thank you for sharing your concerns! I will keep them in mind when writing the docs for this.
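
The advice in this reply (constants, configuration, and callbacks live on the module object, not in static data) can be modelled in pure Python. This is a loose analogy rather than the C API: exec_module is a hypothetical stand-in for a PyModuleExec-style function, and two types.ModuleType objects play the role of the separate module instances that two subinterpreters would each get.

```python
import types


def exec_module(module):
    """Hypothetical stand-in for a PyModuleExec-style function.

    All state is created here and attached to the module object,
    so every module instance gets its own copy.
    """
    module.PI = 3.141592653589793   # constant
    module.MAX_SEARCH_DEPTH = 10    # configuration variable
    module.callbacks = []           # user-defined callables stay per-module


def make_instance(name):
    """Create and initialise a fresh module instance."""
    module = types.ModuleType(name)
    exec_module(module)
    return module


# Two instances, as if the same extension were loaded in two subinterpreters.
first = make_instance("demo")
second = make_instance("demo")

first.callbacks.append(len)      # register a callback in one instance
first.MAX_SEARCH_DEPTH = 99      # reconfigure one instance
```

Because nothing is shared through globals, the callback and the changed setting are visible only in the first instance; the second keeps its empty callback list and the default depth.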