2005/6/9, Graham Dumpleton <[EMAIL PROTECTED]>:
> I admit I haven't necessarily fully digested some of what has already been
> proposed but, here is my take on the issue put together on main train ride
> this morning to work .....
>
> I feel that there are a lot of issues which need to be covered to solve the
> problems with module loading. The apache.import_module() method is
> currently used in a number of different contexts and each has differing
> requirements. We need to look at each of these in turn and make sure we
> clearly record and understand what is required for each.
>
> The first point at which apache.import_module() is used is to load the top
> level handler. Ie., the module associated with a PythonHandler directive or
> the directive associated with a phase other than the content handler. The
> other type of top level import is that done by the PythonImport directive.
>
> If apache.import_module() were to be replaced with a mechanism which avoids
> use of the "imp" module and storage of modules in sys.modules, these
> particular cases of top level imports wouldn't be able to use it
> exclusively. This is because the top level handler for PythonHandler will
> often be a module which is stored in the Python site-packages directory.
> Ie., modules such as mod_python.publisher, mod_python.psp, mpservlets and
> vampire.
Are you saying that the top-level handlers should always reside on sys.path?
I'm OK with that, but it may be a big restriction in a shared hosting
environment. Then again, it could also be a security measure, as badly
conceived top-level handlers could be a source of security holes (we know
this all too well, hence the 3.1.4 release ;). So the official justification
for this restriction would be "we only allow top-level handlers to come from
sys.path because being able to use any kind of top-level handler would be
dangerous in a shared hosting environment".

> There are already problems in situations where in one part of the
> documentation tree someone defines PythonHandler to be mod_python.psp and
> in a handler in a different part of the tree a handler does an explicit
> import of mod_python.psp. From memory, if PythonHandler case is triggered
> first, then when the explicit import of mod_python.psp occurs it will fail
> as the apache.import_module() function doesn't quite set up the sys.modules
> environment in a way that is compatible with the "import" statement.
>
> As well as top level imports from site-packages, PythonHandler has to
> deal with the case where the module to be imported is loaded from the
> document tree itself, specifically where the Directory directive is
> specified or where the .htaccess file resides.

Erm, so, no, handlers could also be imported from the document tree. No
problem, we can do that, but the security issues pop up once again.

> In this case, it currently works by virtue of sys.path being amended by
> mod_python to include that directory before doing the import using
> apache.import_module(). The problem here is that you can't then easily use
> the same named module in different directories as the PythonHandler.

Agreed. This is an ugly hack that we should get rid of.
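
To make the clash concrete, here is a minimal sketch. It is not mod_python's
code; it only reproduces the plain sys.path / sys.modules behaviour that the
hack relies on, written for present-day Python. The directory names, the
module name handler_demo and the helper import_with_path_hack() are all made
up for the illustration.

```python
import importlib
import os
import sys
import tempfile


def import_with_path_hack(directory, name):
    """Hypothetical stand-in for the hack: extend sys.path, then import."""
    if directory not in sys.path:
        sys.path.insert(0, directory)
    return importlib.import_module(name)


# Two "document tree" directories, each with its own handler_demo.py.
root = tempfile.mkdtemp()
for sub in ("site_a", "site_b"):
    os.mkdir(os.path.join(root, sub))
    with open(os.path.join(root, sub, "handler_demo.py"), "w") as f:
        f.write("WHERE = %r\n" % sub)

first = import_with_path_hack(os.path.join(root, "site_a"), "handler_demo")
second = import_with_path_hack(os.path.join(root, "site_b"), "handler_demo")

# The second import is answered from sys.modules, so site_b never loads.
print(first.WHERE, second.WHERE, second is first)
```

Because modules are cached in sys.modules under their bare name, the second
directory gets the first directory's module, whatever sys.path says.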
> What I think needs to happen for these top level imports is that mod_python
> has to determine if the module to be loaded is to come from the document
> tree or from somewhere else on sys.path. If the module is not from the
> document tree then the standard Python import mechanisms would be used to
> import it. Consequently, such modules would not be candidates for any form
> of automatic module reloading. Ie., no module reloading is done on anything
> in sys.modules as it is now.
>
> This would ensure for example that mod_python.psp is imported in a standard
> way and that an explicit import of mod_python.psp from a users handler code
> is going to work, thus avoiding the hack at the moment that mod_python.psp
> must be loaded in a users handler using apache.import_module().
>
> If mod_python finds that the module is not a standard module but one which
> is defined within the document tree, then it would use the new and improved
> apache.import_module() which doesn't rely on sys.modules.
>
> Note that the direction I am looking at here is that apache.import_module()
> is made to function properly in the contexts it needs to and not perform
> double duty in satisfying extra requirements of top level mod_python imports
> where it has to import stuff from site-packages. The top level imports
> should be treated specially and it should only defer to
> apache.import_module() for imports from the document tree.
>
> If this separation is done, I think that the distinction that has been
> introduced with a separate module loader in mod_python.publisher can be
> eliminated. The apache.import_module() can simply be replaced with that in
> mod_python.publisher or a modification of it to satisfy other requirements
> I will talk about later in future emails.
>
> As far as imports from any of the above imported modules goes, the general
> rule should be that if it is a standard module in sys.path, then "import"
> is used.
> If it is within the document tree then apache.import_module().
>
> As far as utility modules which exist outside of the document tree which
> are specifically related to the web application but which aren't on sys.path
> and for which you want module reloading to work, apache.import_module()
> would still be used, but you have to specify the actual directory to the
> function.
>
> In some respects the ability not to specify a path to apache.import_module()
> should be disallowed with a path always required. Further, sys.path should
> no longer be automatically amended to include the directory where the
> PythonHandler is defined. And apache.import_module() should never
> search in sys.path.
>
> As far as I can tell at the moment, the only real reason that sys.path is
> searched at the moment is to satisfy the requirements of top level imports
> as far as being able to find stuff in site-packages or elsewhere on sys.path.
> As such, if mod_python does special checking and knows when standard Python
> imports should be used, this ability can be discarded.
>
> The implication of not extending sys.path automatically is that "import"
> will not work to load a file in the same directory as the handler when in
> the document tree. This was always dangerous anyway as that module could
> also have been loaded by apache.import_module() and a problem could thus
> arise. If "import" is used in this way it would need to be changed to
> apache.import_module(), or a simple import hook introduced which when
> used in a module imported using apache.import_module() will use
> apache.import_module() underneath for an "import" of a file in the same
> directory.
>
> How does this seem to people? There is still more detail just in this bit
> which will need clarification and there are other issues as well which
> I haven't even mentioned.
>
> Anyway, time to do some work.
>
> Graham
>

I've understood your point, but it is difficult to judge from a
PythonHandler directive alone whether the handler should be loaded as a
standard Python module, from sys.path, or as a dynamic Python module, from
the document tree.

Maybe the context of the directive could be used for that: if the directive
is defined at the server or virtual host level, then it's a top-level
handler; otherwise, if it is defined in a Location or Directory section (or
an .htaccess file), then it's a handler that should be loaded from the
document tree (with a possible fallback to sys.path if it is not found?).

Anyway, saying that "import" should be used to import from sys.path and
apache.import_module() should be used to import from the document tree looks
like a clean rule, easy to understand and to implement. The suggestion I
made in my former (way too long) mail was simply that when a module is not
found in the document tree, we could fall back to a careful standard import
from sys.path, but this would blur the otherwise clean separation between
standard and dynamic modules.

Regards,
Nicolas
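
For illustration only, the rule with the fallback could look roughly like
the sketch below. It uses present-day importlib rather than anything that
exists in mod_python, and every name in it (load_from_directory,
load_handler, _cache, index_demo, the site_a/site_b directories) is
hypothetical. The directory argument stands for the Directory/.htaccess
context and is None for a server or virtual host level directive.

```python
import importlib
import importlib.util
import os
import sys
import tempfile

# Absolute file path -> (mtime, module). Deliberately not sys.modules.
_cache = {}


def load_from_directory(name, directory):
    """Load <directory>/<name>.py by path, reloading if the file changed."""
    path = os.path.abspath(os.path.join(directory, name + ".py"))
    mtime = os.path.getmtime(path)
    cached = _cache.get(path)
    if cached is not None and cached[0] == mtime:
        return cached[1]
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the file, sys.modules untouched
    _cache[path] = (mtime, module)
    return module


def load_handler(name, directory=None):
    """Document tree first (if a context is given), else a standard import."""
    if directory is not None:
        if os.path.isfile(os.path.join(directory, name + ".py")):
            return load_from_directory(name, directory)
    # Fallback: standard module from sys.path, never reloaded.
    return importlib.import_module(name)


# Two directories with a same-named handler, plus one standard module.
root = tempfile.mkdtemp()
for sub in ("site_a", "site_b"):
    os.mkdir(os.path.join(root, sub))
    with open(os.path.join(root, sub, "index_demo.py"), "w") as f:
        f.write("WHERE = %r\n" % sub)

a = load_handler("index_demo", os.path.join(root, "site_a"))
b = load_handler("index_demo", os.path.join(root, "site_b"))
std = load_handler("json")  # no directory context: plain import

print(a.WHERE, b.WHERE, "index_demo" in sys.modules)
```

Keying the cache on the file path rather than the module name is what lets
same-named handlers coexist in different directories; the fallback branch is
exactly the part that blurs the standard/dynamic separation.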