Hi, I suspect that this will be put into a proper PEP at some point, but I'd like to bring this up for discussion first. This came out of issues 13429 and 16392.
http://bugs.python.org/issue13429
http://bugs.python.org/issue16392

Stefan


The problem
===========

Python modules and extension modules are not set up in the same way. For Python modules, the module is created and set up first, and then the module code is executed. For extensions, i.e. shared libraries, the module init function is executed straight away and does both the creation and the initialisation. This means that it knows neither the __file__ it is being loaded from nor its package, i.e. its fully qualified module name (FQMN). This hinders relative imports and resource loading. In Py3, the module is also not added to sys.modules while its init function runs. As a result, a (potentially transitive) re-import of the module will really try to import it again, and it runs into an infinite loop when it executes the module init function a second time. And without the FQMN, it is not trivial to add the module to sys.modules correctly either.

We run into this specifically with Cython-generated modules, where the module init code is often as complex as that of any 'regular' Python module. Also, the lack of an FQMN and of a correct file path hinders the compilation of __init__.py modules, i.e. packages, especially when relative imports are used at module init time.

The proposal
============

I propose to split the extension module initialisation into two steps in Python 3.4, in a backwards compatible way.

Step 1: The current module init function can be reduced to just creating the module instance and returning it (and possibly doing some simple C level setup). Optionally, after creating the module, the module init code can register a C callback function that will be called after the module has been set up. This registration is the new part.

Step 2: The shared library importer receives the module instance from the module init function, adds __file__, __path__, __package__ and friends to the module dict, and then checks for the callback.
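The following minimal sketch in plain C makes the two steps concrete. It deliberately does not use the real CPython API. MockModule, MockModuleInitContext, mock_set_init_function() (standing in for the proposed PyModule_SetInitFunction()), mock_PyInit_spam() and mock_load_extension() are all hypothetical names, and the sys.modules table is a toy array. The sketch also assumes that the importer registers the module in sys.modules before running the callback. The problem description implies this, but the steps do not spell it out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mock types: they mimic the proposal, not the real CPython API. */
typedef struct MockModuleInitContext {
    const char* module_name;           /* e.g. "spam" */
    const char* qualified_module_name; /* e.g. "pkg.spam" */
} MockModuleInitContext;

typedef struct MockModule MockModule;
typedef int (*mock_init_callback)(MockModule* module, MockModuleInitContext* context);

struct MockModule {
    const char* name;
    const char* file;                 /* stands in for __file__, set in step 2 */
    const char* package;              /* stands in for __package__, set in step 2 */
    mock_init_callback init_callback; /* the proposed new field in the module struct */
    int initialised;
};

/* Toy stand-in for sys.modules. */
#define MAX_MODULES 8
static struct { const char* name; MockModule* module; } sys_modules[MAX_MODULES];
static size_t sys_modules_count;

static MockModule* sys_modules_get(const char* name) {
    for (size_t i = 0; i < sys_modules_count; i++)
        if (strcmp(sys_modules[i].name, name) == 0)
            return sys_modules[i].module;
    return NULL;
}

/* Stand-in for the proposed PyModule_SetInitFunction(). */
static int mock_set_init_function(MockModule* module, mock_init_callback callback) {
    if (module == NULL)
        return -1;
    module->init_callback = callback; /* "Set": at most one callback per module */
    return 0;
}

/* ---- extension module side ---- */
static MockModule spam_module;

/* Second step: runs after the importer has filled in file and package. */
static int spam_exec(MockModule* module, MockModuleInitContext* context) {
    assert(module->file != NULL && module->package != NULL);
    /* A (transitive) re-import would now find the module instead of recursing. */
    if (sys_modules_get(context->qualified_module_name) != module)
        return -1;
    module->initialised = 1;
    return 0;
}

/* First step: create the module, register the callback, return it. */
static MockModule* mock_PyInit_spam(void) {
    spam_module.name = "spam";
    if (mock_set_init_function(&spam_module, spam_exec) < 0)
        return NULL;
    return &spam_module;
}

/* ---- importer side ---- */
static MockModule* mock_load_extension(MockModule* (*init)(void), const char* qualified_name,
                                       const char* package, const char* path) {
    MockModule* module = sys_modules_get(qualified_name);
    if (module != NULL)
        return module;                /* already imported: no second init call */
    module = init();                  /* step 1 */
    if (module == NULL || sys_modules_count == MAX_MODULES)
        return NULL;
    module->file = path;              /* step 2: set up the module attributes */
    module->package = package;
    sys_modules[sys_modules_count].name = qualified_name;
    sys_modules[sys_modules_count].module = module;
    sys_modules_count++;
    if (module->init_callback != NULL) {
        MockModuleInitContext context = { module->name, qualified_name };
        if (module->init_callback(module, &context) < 0)
            return NULL;
    }
    return module;
}
```

Calling mock_load_extension(mock_PyInit_spam, "pkg.spam", "pkg", "/tmp/pkg/spam.so") runs both steps. A second call returns the cached module without executing the init function again.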
If a callback has been registered (i.e. it is non-NULL), the importer calls it so that user code can continue the module initialisation.

The callback
============

The callback is defined as follows::

    typedef int (*PyModule_init_callback)(PyObject* the_module, PyModuleInitContext* context);

"PyModuleInitContext" is a struct that is meant mostly to make the callback more future-proof by allowing additional parameters to be passed in. For now, I can see a use case for the following fields::

    typedef struct PyModuleInitContext {
        char* module_name;
        char* qualified_module_name;
    } PyModuleInitContext;

Both names are encoded in UTF-8. As for the file path, I consider it best to retrieve it from the module's __file__ attribute as a Python string object, to reduce filename encoding problems.

Note that this struct argument is not strictly required. However, this proposal would have been much simpler if the module init function had accepted such an argument in the first place, so I consider it a good idea not to let this chance pass by again.

The registration of the callback uses a new C-API function::

    int PyModule_SetInitFunction(PyObject* module, PyModule_init_callback callback);

The function name uses "Set" instead of "Register" to make it clear that there is only one such function per module.

An alternative would be a new module creation function "PyModule_Create3()" that takes the callback as a third argument, in addition to what "PyModule_Create2()" accepts. This would require users to pass in the (second) module API version argument explicitly, which is probably only a minor issue.

Implementation
==============

The implementation requires local changes to the extension module importer and a new C-API function. To store the callback, it should use a new field in the module object struct.

Open questions
==============

It is not clear how to handle extensions that register more than one module in their module init function, e.g. compiled packages.
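A minimal plain-C mock shows why this is unclear. All names are hypothetical, and the callback signature is simplified to a single argument. Suppose a compiled package's init function creates a second module and sets a callback on it. An importer that only sees the returned module never runs that second callback.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mock, not the CPython API: one callback slot per module. */
typedef struct MockModule MockModule;
typedef int (*mock_init_callback)(MockModule* module);

struct MockModule {
    const char* name;
    mock_init_callback init_callback;
    int initialised;
};

static int mark_initialised(MockModule* module) {
    module->initialised = 1;
    return 0;
}

static MockModule pkg_module = { "pkg", NULL, 0 };
static MockModule sub_module = { "pkg.sub", NULL, 0 };

/* A compiled package: one init function creates two modules. */
static MockModule* mock_PyInit_pkg(void) {
    pkg_module.init_callback = mark_initialised;
    sub_module.init_callback = mark_initialised; /* registered, but never returned */
    return &pkg_module;                          /* the importer only sees this one */
}

/* Importer following the two-step scheme: it only knows the returned module. */
static MockModule* mock_load_extension(MockModule* (*init)(void)) {
    assert(init != NULL);
    MockModule* module = init();
    if (module != NULL && module->init_callback != NULL && module->init_callback(module) < 0)
        return NULL;
    return module;
}
```

After mock_load_extension(mock_PyInit_pkg), pkg_module.initialised is 1, while sub_module.initialised stays 0.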
One possibility would be to leave the setup to the user, who would have to know all FQMNs in this case anyway, although not the import file path. Alternatively, the import machinery could use a stack to remember the modules for which a callback was registered during the last init function call, set up all of them and then call their callbacks. It is not clear whether this meets the user's intention.

Alternatives
============

1) Extension modules could optionally export another symbol, e.g. "PyInit2_modulename", that the shared library loader would call in addition to the required function "PyInit_modulename". This would remove the need for a new API that registers the above callback. The drawback is that it also makes it easier to write broken code. A Python version or implementation that does not support this second symbol would simply not call it, without raising an error. With the new C-API function, the build would fail instead if the feature is not supported.

2) The callback could be made available as a Python function in the module dict, which would also remove the need for an explicit registration API. However, this approach would add overhead on both sides, in the importer code and in the user-provided module init code. It would require additional dictionary handling and the implementation of a one-time Python function in user code. It would also suffer from the problem that missing support in the runtime would pass silently.

3) The callback could be registered statically in the PyModuleDef struct by adding a new field. This is not trivial to do in a backwards compatible way, because the struct would grow longer without existing user code explicitly initialising the new field. Extending PyModuleDef_HEAD_INIT might be possible, but would still break at least binary compatibility.

4) Pass a new context argument into the module init function that contains all information necessary to properly and completely set up the module at creation time.
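For comparison, here is a minimal plain-C mock of this alternative. MockInitContext, MockModule and mock_PyInit_spam() are hypothetical names, not part of the CPython API. As an assumption, the context carries only the FQMN and the file path. The init function derives the package from the FQMN and can finish the whole setup in one step, with no callback.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical context argument for alternative 4 (not the CPython API). */
typedef struct MockInitContext {
    const char* qualified_module_name; /* e.g. "pkg.spam" */
    const char* file;                  /* e.g. the path of the shared library */
} MockInitContext;

typedef struct MockModule {
    const char* name;
    const char* file;
    const char* package;
    int initialised;
} MockModule;

static MockModule spam_module;
static char spam_package[64];

/* Single-step init: everything is known at creation time, no callback needed. */
static MockModule* mock_PyInit_spam(const MockInitContext* context) {
    assert(context != NULL && context->qualified_module_name != NULL);
    const char* fqmn = context->qualified_module_name;
    const char* dot = strrchr(fqmn, '.');
    size_t len = dot != NULL ? (size_t)(dot - fqmn) : 0; /* top-level module: "" */
    if (len >= sizeof spam_package)
        return NULL;
    memcpy(spam_package, fqmn, len);
    spam_package[len] = '\0';
    spam_module.name = fqmn;
    spam_module.file = context->file;
    spam_module.package = spam_package;
    /* Module body code (relative imports, resource loading) could run here. */
    spam_module.initialised = 1;
    return &spam_module;
}
```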
Passing such a context argument would be a much simpler and cleaner solution than the one proposed here. However, it will not be possible before Python 4, because it breaks backwards compatibility with all existing extension modules at both the source and the binary level.

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org