New committer: Jim Gallacher

2005-06-07 Thread Gregory (Grisha) Trubetskoy


Hi folks -

We have a new committer - Jim Gallacher. Thanks Jim for all your 
contributions to mod_python and for accepting the invitation to become a 
committer!


Cheers,

Grisha


Managing and updating the web site

2005-06-07 Thread Nicolas Lehuen
Grisha, can you tell us what is the process to release a new version,
including the update of the web site ?

Maybe include the web site source files in the subversion repository,
so that we can edit it ?

As for building the HTML documentation out of the LaTeX files, I'm a
bit at a loss...  Has anyone managed to do it ?

Regards,
Nicolas


The 3.2.0 release

2005-06-07 Thread Nicolas Lehuen
As you can see on
http://issues.apache.org/jira/secure/BrowseProject.jspa , there are
only two bugs still scheduled for 3.2.0 ; the rest is not scheduled.

Should we schedule all bugs for 3.2.0 and only release it once all the
bugs are fixed, or could we make an intermediary release ? I'm in
favor of an intermediary release with the bunch of bug fixes that
we've already done.

In this case, can everybody review the remaining bugs on JIRA and
select those they want to be fixed for 3.2.0 ?

Regards,
Nicolas


The new mod_python.publisher module loading system

2005-06-07 Thread Nicolas Lehuen
I'd especially like to have your advice (do not hesitate to be harsh
:) on the new publisher module loading system.

It does not use apache.import_module, and fixes a lot of nasty bugs.
However, the new system means that standard Python modules and
published pages are not the same things anymore.

Well, published pages *are* Python modules, but they are not
registered into sys.modules. This breaks some of the import keyword
semantics : importing a published page from another published page
cannot be done using "import", but through the new
publisher.get_page(req,path) function. "import" can of course still be
used to import anything that can be found on the PYTHONPATH.
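
A minimal sketch of what "a Python module that is not registered into
sys.modules" can look like, using only the standard library. The helper
name load_unregistered and the hello_page.py file are made up for this
illustration; this is not the publisher's actual loader, just the general
idea of executing a source file into a fresh module object.

```python
import os
import sys
import tempfile
import types


def load_unregistered(path):
    """Execute a source file as a module object without touching sys.modules."""
    name = os.path.splitext(os.path.basename(path))[0]
    module = types.ModuleType(name)
    module.__file__ = path
    with open(path) as source:
        code = compile(source.read(), path, "exec")
    exec(code, module.__dict__)
    return module


# Hypothetical published page, written to a temporary directory for the demo.
page_dir = tempfile.mkdtemp()
page_path = os.path.join(page_dir, "hello_page.py")
with open(page_path, "w") as f:
    f.write(
        "import os.path  # plain imports still work inside the page\n"
        "def index():\n"
        "    return 'hello from ' + os.path.basename(__file__)\n"
    )

page = load_unregistered(page_path)
print(page.index())
print("hello_page" in sys.modules)
```

The page behaves like any other module object, but a plain "import
hello_page" elsewhere would not find it, which is why a separate accessor
is needed to reach one page from another.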

I really think this is the way to go, and both mpservlets and Vampire
are also separating standard Python modules from dynamically loaded
modules. But if someone feels this is wrong, please tell me !

Regards,
Nicolas


Re: Managing and updating the web site

2005-06-07 Thread Jim Gallacher

Nicolas Lehuen wrote:

> Grisha, can you tell us what is the process to release a new version,
> including the update of the web site ?
>
> Maybe include the web site source files in the subversion repository,
> so that we can edit it ?
>
> As for building the HTML documentation out of the LaTeX files, I'm a
> bit at a loss...  Has anyone managed to do it ?


Yes.

Jim



Solving the import problem

2005-06-07 Thread Nicolas Lehuen
One last thing that we should prepare is a clear and definite answer
to the zillion users who need to import a custom utility module.
Today, we have 4 ways of importing code :

a) the standard "import" keyword. Today, it works unchanged
(mod_python doesn't install any import hook). The consequence is that
the only modules that can be imported this way are those found on the
PYTHONPATH. Importing custom code is easy if you can manipulate this
variable (either directly or through the PythonPath configuration
directive), but not everybody has this luxury (think shared hosting,
although not being able to change the PythonPath through an .htaccess
file seems pretty restrictive to me.).

b) the PythonImport directive, which ensures that a module is imported
(hence its initialization code is run), but doesn't really import it
into the handler's or published module's namespace.

c) the apache.import_module() function, which is frankly a strange
beast. It knows how to approximately reload some modules, but has many
tricks that make it frankly dangerous, namely the one that causes
modules with the same name but residing in different directories to
collide. I really think that mixing dynamic (re)loading of code with
the usual import mechanisms is a recipe for disaster. Anyway, today,
it's the only way our users can import some shared code, using a
strange idiom like
apache.import_module('mymodule',[dirname(__file__)]).

d) the new publisher.get_page(req,path), which is not really an answer
since it is designed to allow a published object to call another
published object from another page (not to call some shared code).

This mess should be sorted out. As a baseline, I'd say that we have 4
kinds of code in mod_python :

1) the standard Python code that should be imported using the "import" keyword

2) handlers, which are dynamically loaded through apache.import_module
(so they are declared in sys.modules, with all the problems that can
cause when sharing a single setup with multiple handlers that have the
same name, "publisher" for example) - this should be fixed.

3) published modules, which are dynamically loaded by the
mod_python.publisher handler (so now they don't have any problems that
were previously caused by apache.import_module). An important thing to
notice is that published modules are usually stored in a directory
which is visible by Apache (handlers don't need to reside in a public
directory), amongst .html and image files. Hence, people can
legitimately be reluctant to put their core application code
(including DB passwords etc.) in published modules, for security and
code/presentation separation issues.

4) custom library code, AKA core application code. This code should
reside somewhere, preferably in a private directory (at least direct
access to this code from the web should be denied) and be easily
imported and reloaded into published modules, without having to tinker
too much with the PYTHONPATH variable or the PythonPath directive.

What would be nice is a clear and definite way to handle those 4 kinds
of code. To me, layers 2, 3 and 4 could be handled by the same dynamic
code cache, except that a careful directory structure or naming scheme
would prevent the layer 4 to be visible from the web.

I know Vampire solves a lot of these problems, so we have two alternatives : 

A) We decide that we won't solve the whole problem into mod_python. We
take apache.import_module out and shoot it. Handlers are loaded in a
real dynamic code cache (maybe the same as the one now used by
mod_python.publisher), which solves a lot of problems.

Custom library code is not handled : if you want to import some code,
you put it wherever you like and make sure PYTHONPATH or the
PythonPath directive point to it, so you can import it like a standard
module. You'll never use apache.import_module anymore, it will
blissfully dissolve into oblivion (and be removed from the module,
anyway).

If you need to reload your core application code without restarting
Apache, then too bad, mod_python doesn't know how to do this. Check
out Vampire.

B) We decide to solve the whole problem into mod_python.
apache.import_module is not much luckier this time, it is still taken
out and shot in the head. We solve the handlers loading problem. But
now, with a little help from Graham, custom application code can be
dynamically loaded and reloaded from any place without having to
tinker with the PYTHONPATH variable and/or the PythonPath directive.
Everything can be done from the source code with a little help from an
.htaccess file.

So, sorry for this long mail, but I had to get this out. The current
situation is pretty bad, zillions of people need to do this simple
thing, and when they notice it's not that simple (or it's buggy), they
decide to build the nth application framework on mod_python. So,
either we reckon it's None of our business, that users should turn to
higher level frameworks like Vampire, and we remove
apache.import_module, or we 

Re: Managing and updating the web site

2005-06-07 Thread Nicolas Lehuen
2005/6/7, Jim Gallacher <[EMAIL PROTECTED]>:
> Nicolas Lehuen wrote:
> > Grisha, can you tell us what is the process to release a new version,
> > including the update of the web site ?
> >
> > Maybe include the web site source files in the subversion repository,
> > so that we can edit it ?
> >
> > As for building the HTML documentation out of the LaTeX files, I'm a
> > bit at a loss...  Has anyone managed to do it ?
> 
> Yes.
> 
> Jim

Great, so Jim, from now on, you're our official HTML documentation builder :).

Regards,
Nicolas


Re: Managing and updating the web site

2005-06-07 Thread Gregory (Grisha) Trubetskoy


On Tue, 7 Jun 2005, Nicolas Lehuen wrote:


> Grisha, can you tell us what is the process to release a new version,
> including the update of the web site ?


Updating the site is trivial, I can do it, that's the last thing to worry 
about. Releasing a new version is a bit more complicated.


The instructions for making the actual tgz file are at the bottom of 
Doc/Makefile file. I have no idea why I decided to stick them there, 
perhaps it's better to create a separate file called 
Doc/release-instructions.txt or something.


The instructions were written with CVS in mind so we'll need to figure out 
the SVN counterparts of the required commands.


You'll definitely need to be on a unix machine of some sort that has TeTeX 
installed to produce the documentation, it's actually pretty easy.


This being a somewhat major release, I think we should perhaps tag it 
3.2.0-rc1 (to be followed by rc2, etc).


Once the tgz file is ready we will need to collect votes on the list to 
make sure it actually works. Once it's been tested enough by the
folks on the list (we need 4 or more I think) and enough votes are collected,
it can be released on the apache site and formally announced.


Grisha





Re: Solving the import problem

2005-06-07 Thread Barry Pearce
Indeed I'm for fixing it...it's on my list of things to do...right after
'do everything the company wants RSN'


I do believe it should be mod_python that is fixed. I have a VERY big
need for reload of modules *without* taking down my server - end users
are using it and credit card transactions are taking place...I cannot
afford to take it down...


As for vampire - why would I want vampire? mod_python is great except for
this. I personally have no interest in adding yet more software to my
system just to solve the mod_python import issue - I'd rather it was
fixed in the right place...not everyone uses vampire...


Nick wrote:
> I'm all for fixing the importing of handler modules, because that can
> bite you if they are being loaded and initialized under heavy load in
> the same process in different threads.


? Surely not in different interpreters - and as mod_python uses a number 
of interpreters surely this is not an issue...or are we saying python 
itself is not thread safe in this regard?


> As for user imports in general, I would prefer that everything work in a
> standard Python way out of the box.  mod_python should be what it is --
> an interface to apache.  That's most useful for application and
> framework developers I think.  Otherwise you could end up breaking other
> people's applications and frameworks who have resolved the problem in a
> different way.


agreed...but...optional functionality to stop us having to restart
apache would be good...and I consider that that is part of
'interfacing'. It solves a problem that is inherent in 'interfacing'.


> I'm not opposed to providing some sort of module or function for psp or
> publisher or whatever "example" handlers that come with mod_python, just
> so long as it's not in the core of mod_python itself.  To me this is an
> application/framework issue, not a mod_python issue.  That said, the
> behaviour should probably be documented well so people can avoid the
> pitfalls.


quite. From my perspective it's like this...either it can be solved as a
group here and in mod_python or I'll solve it in my way outside of
mod_python...but whatever happens the business driver states I cannot
take down a web server - it's like antivirus software asking you to
reboot your machine every time it updates (no tales of woe please...I've
heard them already!)


Barry


session handling - the next generation

2005-06-07 Thread Jim Gallacher

Hi All,

Sorry for the long post. I think I have a nice contribution to the
session handling mechanism, but I've got a bug I just can't track down.
Make yourself a cup of coffee, sit down and have a read.


In an effort to make session handling more transparent and less error 
prone, I've added a new method called get_session to the request object.


(As an aside, I've also created a new apache config directive 
PythonSessionOptions, a req.get_session_options() to read the directive, 
and a small internal change to req.internal_redirect() to pass a session 
to a redirected request).


req.get_session() returns the session instance if it exists or creates a 
new one. Internally it calls some python code, 
mod_python.Session.create_session(), which does the actual work. The 
user will only use get_session() to access their session, and so all our 
deadlocking issues are gone forever.
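
The "return the session if it exists, otherwise create one" behaviour
described above can be sketched in pure Python. FakeRequest and FakeSession
are stand-ins invented for this illustration; the real req.get_session()
lives in src/requestobject.c and calls into mod_python.Session, so treat
this only as a picture of the intended semantics.

```python
class FakeSession(object):
    """Hypothetical stand-in for a real session class."""

    def __init__(self, req, sid=None):
        # Deliberately keep no reference back to the request in this demo,
        # so no request -> session -> request cycle is created.
        self.sid = sid or "sid-%d" % id(self)


def create_session(req, sid=None):
    """Module-level factory, mirroring the role of create_session() above."""
    return FakeSession(req, sid)


class FakeRequest(object):
    """Hypothetical request object holding at most one session."""

    def __init__(self):
        self.session = None

    def get_session(self):
        # Create the session lazily on first access, then always hand back
        # the same instance for the lifetime of this request.
        if self.session is None:
            self.session = create_session(self)
        return self.session


req = FakeRequest()
first = req.get_session()
second = req.get_session()
print(first is second)
```

Because every caller goes through get_session(), a second session for the
same request is never constructed, which is the property that is meant to
remove the double-locking problem.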


It mostly works, but I've found that approx 1 in 1 requests will 
segfault. Calling mod_python.Session.create_session directly does not 
segfault (tested for 200 requests) so that is not the issue.


I really hope someone can spot where the problem is. I've reduced 
everything to the simplest possible case. The unlock method in
TestSession will very rarely segfault, or raise an exception and then
segfault. It only fails 0.004% of the time based on
50 requests. I suspect some kind of garbage collection issue, maybe 
a circular reference caused by req->session->_req, but I don't 
understand why it works 99.996 % of the time.


I've included the full error.log for 500 requests at the end of this 
message, but the ones that caught my eye as unusual are:


[Tue Jun 07 17:52:02 2005] [notice] child pid 15066 exit signal 
Segmentation fault (11)

Fatal Python error: deletion of interned string failed

[Tue Jun 07 17:58:57 2005] [error] [client 192.168.1.12] PythonHandler 
mptest: Traceback (most recent call last):
[Tue Jun 07 17:58:57 2005] [error] [client 192.168.1.12] PythonHandler 
mptest:   File "/usr/lib/python2.3/site-packages/mod_python/apache.py", 
line 299, in HandlerDispatch\nresult = object(req)
[Tue Jun 07 17:58:57 2005] [error] [client 192.168.1.12] PythonHandler 
mptest:   File "/var/www/mp/mptest.py", line 19, in handler\nsess = 
req.get_session()
[Tue Jun 07 17:58:57 2005] [error] [client 192.168.1.12] PythonHandler 
mptest:   File 
"/usr/lib/python2.3/site-packages/mod_python/TestSession.py", line 46, 
in create_session\nreturn TestSession(req,sid)
[Tue Jun 07 17:58:57 2005] [error] [client 192.168.1.12] PythonHandler 
mptest:   File 
"/usr/lib/python2.3/site-packages/mod_python/TestSession.py", line 40, 
in unlock\nif self._lock and self._locked:
[Tue Jun 07 17:58:57 2005] [error] [client 192.168.1.12] PythonHandler 
mptest: AttributeError: 'TestSession' object has no attribute '_lock'


[Tue Jun 07 18:04:03 2005] [notice] child pid 16170 exit signal 
Segmentation fault (11)

Fatal Python error: GC object already tracked


I hope someone can give me some insight to the problem. Note that I'm 
using mpm-worker.


Regards,
Jim


mptest.py
# the test handler

from mod_python import apache
from mod_python import TestSession

def handler(req):
    req.content_type = 'text/plain'
    req.write('mptest.py')

    if req.path_info:
        test_case = req.path_info[1:]
    else:
        test_case = '1'

    if test_case == '1':
        # the good case - always works
        sess = TestSession.create_session(req,None)
    elif test_case == '2':
        # the bad case - sometimes segfaults
        sess = req.get_session()
    else:
        req.write('no test specified')

    return apache.OK

---
mod_python/TestSession.py
# emulate a simple test session
import _apache
from Session import _new_sid

class TestSession(object):

    def __init__(self, req, sid=0, secret=None, timeout=0, lock=1):
        req.log_error("TestSession.__init__")
        self._req = req
        self._lock = 1
        self._locked = 0
        self._sid = _new_sid(req)
        self.lock()
        self.unlock()

    def lock(self):
        if self._lock:
            _apache._global_lock(self._req.server, self._sid)
            self._locked = 1

    def unlock(self):
        # unlock will occasionally segfault
        if self._lock and self._locked:
            _apache._global_unlock(self._req.server, self._sid)
            self._locked = 0

def create_session(req,sid):
    return TestSession(req,sid)

-
src/requestobject.c
In the following code session is a PyObject* defined in the
requestobject struct.


static PyObject *req_get_session(requestobject *self, PyObject *args)
{
PyObject *m;
PyObject *sid;

if (!self->session) {
  

Re: Solving the import problem

2005-06-07 Thread Graham Dumpleton


On 08/06/2005, at 8:33 AM, Barry Pearce wrote:

> Indeed I'm for fixing it...it's on my list of things to do...right after
> 'do everything the company wants RSN'
>
> I do believe it should be mod_python that is fixed. I have a VERY big
> need for reload of modules *without* taking down my server - end users
> are using it and credit card transactions are taking place...I cannot
> afford to take it down...
>
> As for vampire - why would I want vampire? mod_python is great except for
> this. I personally have no interest in adding yet more software to my
> system just to solve the mod_python import issue - I'd rather it was
> fixed in the right place...not everyone uses vampire...


From what I can see, hardly anyone actually uses Vampire and a big reason is
probably the same attitude you are expressing. :-(

To echo a comment I just made in a separate posting to the main mailing list,
I believe that mod_python and Apache in combination have huge potential as
being a base for quite powerful and complex systems. I feel though that most
people don't really appreciate the fullness of what mod_python has to offer
and just scratch the surface. Things aren't helped by mod_python having some
rough edges and gaps in its basic functionality which if present would make
it so much easier for people new to mod_python to make use of it. As a result
of these gaps I keep seeing people trying to harness what is provided in
mod_python in ways that they probably shouldn't, resulting in code which over
time will just become messy and hard to manage. This may be okay for simple
systems, but in a complicated system it's asking for trouble.

One can liken mod_python to providing a good foundation and some basic bits
and pieces for building a house. Some of these bits are currently broken or
don't function in an ideal way. The point of Vampire is to provide fixed
versions of some of these bits and to provide some better bits to help you
in building your house. What Vampire isn't is a preconstructed house which you
are forced to adopt. I get the impression from various people that they think
Vampire is a house and as such it will be inflexible because it can only be
used in a certain way, consequently the often repeated thought I see expressed
is "why would I want to use it?".

In some respects, some of the bits and ideas embodied in Vampire are things
that should be in the core mod_python package. At the moment though, I see
enough bugs and other issues in mod_python that need fixing that one is better
concentrating on them first, rather than trying to push more stuff in there.
At least to my mind, Vampire is serving at the moment as a test bed for stuff
that could be later incorporated into mod_python when a clear idea develops
of where the best way to take mod_python would be. Unfortunately, a lot of
people seem to feel that since it isn't in mod_python now, that there can't be
much point to it and it isn't worth investigating. :-(

Graham



Re: Solving the import problem

2005-06-07 Thread Graham Dumpleton

An update on a few things that I have managed to get working in
Vampire in respect of some of the issues below, plus a few other
comments.

On 08/06/2005, at 6:33 AM, Nicolas Lehuen wrote:


> One last thing that we should prepare is a clear and definite answer
> to the zillion users who need to import a custom utility module.
> Today, we have 4 ways of importing code :
>
> a) the standard "import" keyword. Today, it works unchanged
> (mod_python doesn't install any import hook). The consequence is that
> the only modules that can be imported this way are those found on the
> PYTHONPATH. Importing custom code is easy if you can manipulate this
> variable (either directly or through the PythonPath configuration
> directive), but not everybody has this luxury (think shared hosting,
> although not being able to change the PythonPath through an .htaccess
> file seems pretty restrictive to me.).


I finally worked out the proper way in Python that one is meant to
install import hooks so that you don't screw up other packages also
trying to use import hooks, although it relies on the other packages
doing it the correct way as well.

The result is that in Vampire, when the feature is enabled, you can
use the "import" keyword to import modules local to the document tree
where the handler is and it will use the Vampire module importing
system instead for those imports. Where the context is traceable back
to a top level import of a handler from Vampire, the automatic module
reloading mechanism, including changes in children causing parents
to be reloaded, is all working okay.

When this feature kicks in, it will only search in the same directory
as the handler file is located and optionally along a module search path
which is distinct from the normal sys.path. This search path has to
be separate and can't overlap with sys.path because you will end up
with duplicate modules loaded in different ways if one isn't careful.
The preferred approach is that sys.path should simply not include any
directory which is a part of the document tree.
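
For readers on a current Python, the "separate search path behind the import
keyword" idea can be sketched with a sys.meta_path finder. This is only an
illustration under stated assumptions: LocalDirFinder, private_dir and
site_helpers_demo are invented names, the importlib API used here postdates
this thread, and unlike the system described above the modules it loads DO
end up in sys.modules and are not reloaded automatically.

```python
import importlib.abc
import importlib.util
import os
import sys
import tempfile


class LocalDirFinder(importlib.abc.MetaPathFinder):
    """Resolve top-level imports from a private directory list, not sys.path."""

    def __init__(self, search_dirs):
        self.search_dirs = list(search_dirs)

    def find_spec(self, fullname, path=None, target=None):
        if "." in fullname:
            return None  # packages and submodules are deliberately not handled
        for directory in self.search_dirs:
            candidate = os.path.join(directory, fullname + ".py")
            if os.path.isfile(candidate):
                return importlib.util.spec_from_file_location(fullname, candidate)
        return None  # let any other finder have a go


# Demo: a hypothetical helper module living in a directory outside sys.path.
private_dir = tempfile.mkdtemp()
with open(os.path.join(private_dir, "site_helpers_demo.py"), "w") as f:
    f.write("GREETING = 'hi'\n")

# Appending (rather than inserting at the front) keeps the standard finders
# first, so this hook only sees names that nothing else could resolve.
sys.meta_path.append(LocalDirFinder([private_dir]))

import site_helpers_demo

print(site_helpers_demo.GREETING)
```

Keeping private_dir out of sys.path is what avoids the same file being
loaded twice by two different mechanisms.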

The only part of what "import" provides that isn't working completely
yet is importation of packages. The bits of this that do work are the
importing of the root of the package, importing of a sub module/package
of the package which was already imported by the parent, and using the
from/import syntax to import only bits of any of these.

The one bit that I haven't been able to get working yet is where you
have "import package.module" and where "module" wasn't explicitly
imported by "package/__init__.py".

The reason it doesn't work is that the part of the Python import system
that deals with packages assumes that any module imports are always stored
in sys.modules. It relies on this and will search sys.modules for the
parent module to determine which directory it is in and thus from where
it should import the sub module/package.

At the moment to me this makes it look like any system that tries to use
import hooks in Python cannot support packages where the modules/packages
are not stored in sys.modules.

Because of this, even though packages partly work, at the moment I throw
an import error with a message saying that packages aren't supported in
the context of the Vampire module importing system if such an import is
attempted. This shouldn't be an issue for individual handler files stored
in the document tree as you wouldn't write them as packages normally anyway.
It might be an issue if someone had a set of utility modules living outside
the document tree that they wanted automatic reloading to work on. The
only choice there at the moment is not to use a traditional package in
that context. You could get more flexibility by accessing the module
loading API in Vampire directly, but that means the utility modules, that
perhaps shouldn't strictly know about Vampire/mod_python, will.


> b) the PythonImport directive, which ensures that a module is imported
> (hence its initialization code is run), but doesn't really import it
> into the handler's or published module's namespace.
>
> c) the apache.import_module() function, which is frankly a strange
> beast. It knows how to approximately reload some modules, but has many
> tricks that make it frankly dangerous, namely the one that causes
> modules with the same name but residing in different directories to
> collide. I really think that mixing dynamic (re)loading of code with
> the usual import mechanisms is a recipe for disaster. Anyway, today,
> it's the only way our users can import some shared code, using a
> strange idiom like
> apache.import_module('mymodule',[dirname(__file__)]).


I know you have marked:

  http://issues.apache.org/jira/browse/MODPYTHON-9

as resolved by virtue of including a new module importing system in
publisher, but there is still the underlying problem in import_module()
function that once you access an "index.py" in a subdirectory, the one
in the parent is effectively lost. I realise that even if this is fixed,
each still gets reloaded on cycli

Re: The 3.2.0 release

2005-06-07 Thread Graham Dumpleton


On 08/06/2005, at 7:19 AM, Jim Gallacher wrote:

> MODPYTHON-37
> Add apache.register_cleanup()
>
> According to the FAQ entry linked by Graham, this probably can't
> be done, so change its status to closed?


Actually, I still believe it can be done, but from memory Grisha expressed
reservations at the time about my thinking on how it could be achieved so
I didn't pursue it any further at the time.

Not sure my current thoughts on it now are the same as what I was pushing
earlier, but at the time mod_python.c creates the interpreters and
loads the _apache module into each, it has access to the server_rec struct
which is effectively what the Python serverobject is in part a wrapper for.

As such, with perhaps a bit of initialisation at the time _apache is set up
for each interpreter, one could provide a method in _apache which gives access
to an instance of a serverobject. Having access to that, one could then call
register_cleanup() on it to register the callback.

Graham



Re: Managing and updating the web site

2005-06-07 Thread Gregory (Grisha) Trubetskoy


On Tue, 7 Jun 2005, Jim Gallacher wrote:

> What's the mechanism for this, since the generated docs are not in
> subversion?


The docs _are_ in subversion, but in .tex format. The html is included in 
the distribution tar file for convenience, but I don't think it'd make 
sense to have both tex and html in svn.


> Presumably, when the release is ready you'll create a branch, and then
> the docs will get committed to that branch?


I think that generating the docs is the trickiest part of making the 
release tarball, so you might as well just do the whole thing :-) Or I 
could do it, but I'd really sleep better knowing that I am not the only
one able to create a tarball.


> Same question for the generated psp.c (using flex) and configure, generated
> from configure.in with autoconf.


The generated psp.c is there just in case you don't have flex (especially 
because a specific version of flex is required). I guess it's similar to 
./configure - in theory there is no need to keep it since it is 
autogenerated from configure.in, but most projects do it anyway...


Grisha



Re: Solving the import problem

2005-06-07 Thread Gregory (Grisha) Trubetskoy


On Tue, 7 Jun 2005, Barry Pearce wrote:

> > I'm not opposed to providing some sort of module or function for psp or
> > publisher or whatever "example" handlers that come with mod_python, just
> > so long as it's not in the core of mod_python itself.  To me this is an
> > application/framework issue, not a mod_python issue.  That said, the
> > behaviour should probably be documented well so people can avoid the
> > pitfalls.
>
> quite. From my perspective it's like this...either it can be solved as a group
> here and in mod_python or I'll solve it in my way outside of mod_python...but
> whatever happens the business driver states I cannot take down a web server -
> it's like antivirus software asking you to reboot your machine every time it
> updates (no tales of woe please...I've heard them already!)


Perhaps if we agree on the general direction but think that a change may 
be a bit radical, doing something like what Python has done with "future" 
is the way to go, i.e. new functionality has to be explicitly requested,
and if people like it, then it becomes default?


Grisha


Re: Managing and updating the web site

2005-06-07 Thread Jim Gallacher

Gregory (Grisha) Trubetskoy wrote:


> On Tue, 7 Jun 2005, Jim Gallacher wrote:
>
> > What's the mechanism for this, since the generated docs are not in
> > subversion?
>
> The docs _are_ in subversion, but in .tex format. The html is included
> in the distribution tar file for convenience, but I don't think it'd
> make sense to have both tex and html in svn.


Yes, I know the docs are in subversion. I was thinking of the html + pdf 
stuff. But you are right, no need for a copy of these in the repository, 
since they are in effect compiled from the source.


> > Presumably, when the release is ready you'll create a branch, and then
> > the docs will get committed to that branch?
>
> I think that generating the docs is the trickiest part of making the
> release tarball, so you might as well just do the whole thing :-) Or I
> could do it, but I'd really sleep better knowing that I am not the only
> one able to create a tarball.


Can do. Your earlier suggestion of putting the release instructions in a 
separate file makes sense. I had looked in Doc/Makefile when trying to 
generate the docs, but had not read all the way to the bottom!


> > Same question for the generated psp.c (using flex) and configure,
> > generated from configure.in with autoconf.
>
> The generated psp.c is there just in case you don't have flex
> (especially because a specific version of flex is required). I guess
> it's similar to ./configure - in theory there is no need to keep it
> since it is autogenerated from configure.in, but most projects do it
> anyway...


I think it's nice to include both ./configure and psp_parser.c so that 
people don't need to track down the proper version of flex and autoconf.


Regards,
Jim


Re: Solving the import problem

2005-06-07 Thread Nicolas Lehuen
2005/6/8, Graham Dumpleton <[EMAIL PROTECTED]>:
> > c) the apache.import_module() function, which is frankly a strange
> > beast. It knows how to approximately reload some modules, but has many
> > tricks that makes it frankly dangerous, namely the one that causes
> > modules with the same name but residing in different directories to
> > collide. I really think that mixing dynamic (re)loading of code with
> > the usual import mechanisms is a recipe for disaster. Anyway, today,
> > it's the only way our users can import some shared code, using a
> > strange idiom like
> > apache.import_module('mymodule',[dirname(__file__)]).
> 
> I know you have marked:
> 
>   http://issues.apache.org/jira/browse/MODPYTHON-9
> 
> as resolved by virtue of including a new module importing system in
> publisher, but there is still the underlying problem in import_module()
> function that once you access an "index.py" in a subdirectory, the one
> in the parent is effectively lost. I realise that even if this is fixed,
> each still gets reloaded on cyclic requests, but at least the parent
> doesn't become completely useless.

Well, the publisher problem is really solved, and you *can* publish
two index.py modules in different directories, even subdirectories,
without any problem or cyclic reload.

However, if you are still using apache.import_module, for example to
dynamically import a support module into a published module, then
you'll have some trouble, because *apache.import_module is hopelessly
broken*. I can't stress that enough.

I'm sorry to be harsh, but this function cannot be fixed so that it
works correctly. Well, it can, but the result would be a kludge.
Right now, if two applications or two users (in shared hosting) have a
util.py module that they want to import, the first one will do
something like :

util = apache.import_module('util',['/path/to/my/application/code'])

and the second one, being very clever and having read the mailing
list, will do :

util = apache.import_module('util',[os.path.dirname(__file__)])

Bang, you've got a module collision. The problem is, there is no way
to solve this while retaining the current semantics of
apache.import_module.
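
The collision can be reproduced without Apache. The sketch below is NOT the
real apache.import_module; import_by_name is a deliberately simplified
stand-in that, like anything built on sys.modules, remembers modules by
name only, and import_by_path is a hypothetical alternative that keys on
the file actually loaded. All names and the two temporary "application"
directories are invented for the demo.

```python
import os
import tempfile
import types

_by_name = {}  # stand-in for a sys.modules-style registry (keyed by name)
_by_path = {}  # alternative registry keyed by absolute filename


def _exec_file(name, filename):
    """Run a source file into a fresh module object."""
    module = types.ModuleType(name)
    module.__file__ = filename
    with open(filename) as source:
        exec(compile(source.read(), filename, "exec"), module.__dict__)
    return module


def import_by_name(name, search_dirs):
    """Name-keyed lookup: whoever loads 'util' first wins for everybody."""
    if name not in _by_name:
        for directory in search_dirs:
            filename = os.path.join(directory, name + ".py")
            if os.path.isfile(filename):
                _by_name[name] = _exec_file(name, filename)
                break
        else:
            raise ImportError(name)
    return _by_name[name]


def import_by_path(name, search_dirs):
    """Path-keyed lookup: same name in another directory is another module."""
    for directory in search_dirs:
        filename = os.path.abspath(os.path.join(directory, name + ".py"))
        if os.path.isfile(filename):
            if filename not in _by_path:
                _by_path[filename] = _exec_file(name, filename)
            return _by_path[filename]
    raise ImportError(name)


# Two unrelated applications, each shipping its own util.py.
app_a = tempfile.mkdtemp()
app_b = tempfile.mkdtemp()
for directory, owner in ((app_a, "A"), (app_b, "B")):
    with open(os.path.join(directory, "util.py"), "w") as f:
        f.write("OWNER = %r\n" % owner)

print(import_by_name("util", [app_a]).OWNER)  # A
print(import_by_name("util", [app_b]).OWNER)  # A again: application B got A's module
print(import_by_path("util", [app_a]).OWNER)  # A
print(import_by_path("util", [app_b]).OWNER)  # B
```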

apache.import_module tries to emulate an environment in which modules
are imported in the standard way, with a kind of
on-the-fly-reconfigurable PYTHONPATH. This simply does not work.

For starters, I really think that providing a search path as an
argument is very, very bad practice. It forces you to hardcode a path
into some application code (or to make complicated convolutions using
os.path.dirname or req.get_option). I think the way you did it in Vampire
is pretty clever, Graham. You look into the directory of the module
which handles the current request (and you provide a special __req__
attribute during import time to be able to refer to the current
request), then, if the module is not found there, in a special,
application-specific search path which can be redefined through a
configuration directive, and is distinct from the sys.path.

You wrote it better than me, we should not mix sys.path, the
application-specific search path and the published document tree.

Another problem with apache.import_module is that it pollutes the
sys.modules list, hence causing a wealth of collision opportunities.
sys.modules should be for nice, standard Python modules and modules
imported from the sys.path. Not for dynamically loaded modules.

The nice thing is that I think I understand the usage patterns of
apache.import_module much better now. So I think I see a way to
reimplement and fix it with the following behaviour :

A) Finding the module
1) If the path parameter is passed, use it
2) If the path parameter is absent, look for the module in the "local"
directory (local being based on the __file__ attribute of the current
module)
3) If there is no matching "local" module, look for it in the
application-specific path which is defined in a configuration file. I
think this requires access to a request object, so this one could be
tricky.
4) Look for the module into sys.path
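
The search order in steps 1 to 4 could be sketched as below.
find_module_file and its parameters are hypothetical names, the sketch
only looks for plain "name.py" files, and it assumes that an explicit path
(step 1) is searched exclusively rather than falling through to the other
steps, which the list above does not spell out.

```python
import os
import sys
import tempfile


def find_module_file(name, path=None, current_file=None, app_path=()):
    """Return (filename, origin) for the first match, or (None, None)."""
    stages = []
    if path is not None:
        # Step 1: an explicit path was passed, so use it (and nothing else).
        stages.append(("explicit", path))
    else:
        if current_file is not None:
            # Step 2: the directory of the module doing the import.
            local_dir = os.path.dirname(os.path.abspath(current_file))
            stages.append(("local", [local_dir]))
        # Step 3: application-specific path (would come from configuration).
        stages.append(("application", app_path))
        # Step 4: finally the regular sys.path.
        stages.append(("sys.path", sys.path))

    for origin, directories in stages:
        for directory in directories:
            candidate = os.path.join(directory, name + ".py")
            if os.path.isfile(candidate):
                return candidate, origin
    return None, None


# Demo layout: shared_demo.py exists both locally and in the application
# path, applib_demo.py only in the application path.
local_dir = tempfile.mkdtemp()
app_dir = tempfile.mkdtemp()
handler_file = os.path.join(local_dir, "handler.py")  # need not exist
for directory, names in ((local_dir, ["shared_demo"]),
                         (app_dir, ["shared_demo", "applib_demo"])):
    for module_name in names:
        with open(os.path.join(directory, module_name + ".py"), "w") as f:
            f.write("WHERE = %r\n" % directory)

print(find_module_file("shared_demo", current_file=handler_file,
                       app_path=[app_dir])[1])
print(find_module_file("applib_demo", current_file=handler_file,
                       app_path=[app_dir])[1])
```

Returning the origin alongside the filename lets the caller decide how to
load the result: through a private cache for the first three origins, or
through the normal import machinery for "sys.path".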

B) Loading or reloading the module
In case 1 to 3, we use the cache.ModuleCache class as in the publisher
to load or reload the module. We DO NOT store the resulting module in
sys.modules. If two modules have the same name but reside in different
directories, there won't be any collision, thanks to
cache.ModuleCache. There also won't be any multithreading issues since
ModuleCache uses a two-level locking scheme.
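
To make "a cache keyed by file, with a two-level locking scheme" concrete,
here is a rough sketch. It is not the actual cache.ModuleCache code:
FileModuleCache, touch_later and the settings_demo.py file are invented for
the illustration, and reloading on a changed modification time is an
assumption about how such a cache would decide to reload.

```python
import os
import tempfile
import threading
import types


class FileModuleCache(object):
    """Hypothetical path-keyed module cache with reload-on-change."""

    def __init__(self):
        self._table_lock = threading.Lock()  # level 1: guards the entry table
        self._entries = {}                   # absolute filename -> entry dict

    def get(self, filename):
        filename = os.path.abspath(filename)
        with self._table_lock:
            entry = self._entries.get(filename)
            if entry is None:
                entry = {"lock": threading.Lock(), "mtime": None, "module": None}
                self._entries[filename] = entry
        with entry["lock"]:                  # level 2: one lock per file
            mtime = os.stat(filename).st_mtime
            if entry["mtime"] != mtime:
                name = os.path.splitext(os.path.basename(filename))[0]
                module = types.ModuleType(name)
                module.__file__ = filename
                with open(filename) as source:
                    exec(compile(source.read(), filename, "exec"),
                         module.__dict__)
                entry["mtime"], entry["module"] = mtime, module
            return entry["module"]


def touch_later(filename, seconds=10):
    """Push mtime forward so the demo does not depend on clock resolution."""
    st = os.stat(filename)
    os.utime(filename, (st.st_atime, st.st_mtime + seconds))


demo_file = os.path.join(tempfile.mkdtemp(), "settings_demo.py")


def write_version(number):
    with open(demo_file, "w") as f:
        f.write("VERSION = %d\n" % number)


cache = FileModuleCache()
write_version(1)
first = cache.get(demo_file)
write_version(2)
touch_later(demo_file)
second = cache.get(demo_file)
print(first.VERSION, second.VERSION)
```

The table lock is held only long enough to find or create an entry, so a
slow load of one file does not block requests for other files; nothing is
ever written to sys.modules.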

In case 4, we lock the import lock. We check for the existence of the
module in sys.modules.

- If it is absent, we load the module.
- If it is present, we check that its __file__ attribute is the same.
If not, something really weird is happening (the same module has moved
within the sys.path, or there is a possible collision) and we throw an
ImportError. We then check for a __timestamp__ attribute : if absent,
then too bad, this module is a core Python module which was not
imported using import_module, and we don't reload i