On 18 May 2010 10:09, Graham Dumpleton <[email protected]> wrote:
> On 18 May 2010 09:44, Graham Dumpleton <[email protected]> wrote:
>> On 18 May 2010 09:31, Alec Flett <[email protected]> wrote:
>>> Hey, me again - back with dying daemons, the bane of my existence at the 
>>> moment.
>>>
>>> Again, the problem is that daemons are mysteriously disappearing for some 
>>> reason, without restarting, in a single-threaded prefork mpm, with 
>>> single-threaded daemons. I cranked up logging on apache, and even added 
>>> some log messages of my own. It seems that wsgi_manage_process is not 
>>> always called, and it seems to be specifically during shutdown - the python 
>>> interpreter is shutting down, but the new daemon never starts up.
>>>
>>> So I have done some extensive log analysis and noticed a slightly odd 
>>> pattern... it seems that the failure is that 
>>> apr_proc_other_child_register() is somehow not registering, or at least the 
>>> cleanup is not getting called. I looked at the source for 
>>> apr_proc_other_child_register() and it is DEFINITELY not threadsafe, (uses 
>>> a linked list anchored in a global!) though it seems like this code would 
>>> never be called from multiple threads.
>>
>> No, would not ever be called from multithreaded context as only
>> invoked in Apache parent process which is single threaded.
>>
>>> Below I've extracted the mod_wsgi log messages for each failure - 
>>> essentially I took all the 'mod_wsgi (pid=xxx)' messages and extracted the 
>>> unique sequences, grouping the occurrence by PID. Imagine that each of the 
>>> messages below begins with 'mod_wsgi (pid=xxx'
>>>
>>> Successful daemon behavior looks like this:
>>>
>>> 586 pids with this sequence:
>>> ['9075', '10295', '9676', '10109', ...
>>>    ): Starting process 'apiserver-freebase.com' with threads=1.
>>>    ): Initializing Python.
>>>    ): Attach interpreter ''.
>>>    ): Adding '/mw/app/apiserver_94819/_install/lib/python2.6/site-packages' 
>>> to path.
>>>    , process='apiserver-freebase.com', application=''): Loading WSGI script 
>>> '/mw/app/apiserver_94819/_install/bin/apiserver.wsgi'.
>>>    ): Maximum requests reached 'apiserver-freebase.com'.
>>>    ): Shutdown requested 'apiserver-freebase.com'.
>>>    ): Stopping process 'apiserver-freebase.com'.
>>>    ): Destroying interpreters.
>>>    ): Cleanup interpreter ''.
>>>    ): Terminating Python.
>>>    ): Python has shutdown.
>>>    ): wsgi_manage_process(0, 'apiserver-freebase.com', 255)
>>>    ): wsgi_manage_process(3, 'apiserver-freebase.com', -1)
>>>    ): Process 'apiserver-freebase.com' unregistered, 
>>> (APR_OC_REASON_UNREGISTER) not doing anything
>>>    ): Process 'apiserver-freebase.com' has died, (APR_OC_REASON_DEATH) 
>>> restarting.
>>>    ): Successfully replaced with pid=xxx
>>>
>>> Failure to restart looks like this:
>>> 4 pids with this sequence:
>>> ['10901', '9335', '10910', '10952']
>>>    ): Starting process 'apiserver-freebase.com' with threads=1.
>>>    ): Initializing Python.
>>>    ): Attach interpreter ''.
>>>    ): Adding '/mw/app/apiserver_94819/_install/lib/python2.6/site-packages' 
>>> to path.
>>>    , process='apiserver-freebase.com', application=''): Loading WSGI script 
>>> '/mw/app/apiserver_94819/_install/bin/apiserver.wsgi'.
>>>    ): Maximum requests reached 'apiserver-freebase.com'.
>>>    ): Shutdown requested 'apiserver-freebase.com'.
>>>    ): Stopping process 'apiserver-freebase.com'.
>>>    ): Destroying interpreters.
>>>    ): Cleanup interpreter ''.
>>>    ): Terminating Python.
>>>    ): Python has shutdown.
>>>
>>> Obviously these last 5 log messages are all my own. I'm attaching the patch 
>>> that produced the above message. (disclaimer: I discovered that signal 
>>> handler is crashing with this patch because I'm passing NULL to 
>>> ap_log_error, but that is not the cause of this as the crash only occurs 
>>> during apache shutdown)
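
For anyone wanting to repeat the grouping described above, here is a minimal sketch of one way to do it. It is not the script that produced the numbers quoted. The line shape in LINE_RE and the helper names group_sequences and pids_missing are assumptions for illustration, and it assumes the wsgi_manage_process lines carry the daemon's pid as in the listing above.

```python
import re
from collections import defaultdict

# Assumed line shape (not confirmed against a real error log):
#   "... mod_wsgi (pid=1234): message"
#   "... mod_wsgi (pid=1234, process='x', application=''): message"
LINE_RE = re.compile(r"mod_wsgi \(pid=(\d+)[^)]*\): (.*)")


def group_sequences(lines):
    """Map each distinct per-PID message sequence to the PIDs that produced it."""
    per_pid = defaultdict(list)
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            # Mask pids inside the message so sequences compare equal across processes.
            message = re.sub(r"pid=\d+", "pid=xxx", match.group(2).strip())
            per_pid[match.group(1)].append(message)
    by_sequence = defaultdict(list)
    for pid, messages in per_pid.items():
        by_sequence[tuple(messages)].append(pid)
    return dict(by_sequence)


def pids_missing(groups, marker="wsgi_manage_process"):
    """Return PIDs whose sequence has no message starting with marker."""
    missing = []
    for messages, pids in groups.items():
        if not any(m.startswith(marker) for m in messages):
            missing.extend(pids)
    return sorted(missing)
```

Feeding the error log lines to group_sequences() and then calling pids_missing() on the result should list the daemons that shut down without the parent callback being logged.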
>>
>> Do know I haven't totally forgotten about this, it is still all in my
>> inbox and I have been noting your followups.
>>
>> My last thought about it, but which I haven't been able to investigate
>> and so hadn't posted about it as yet, is what the implications would
>> be if a WSGI application's code did something like call setsid() or
>> some other operation which disassociated itself from the process group
>> of the parent process. I can't remember if this would result in death
>> of a child process not being signalled back to the original parent process.
>>
>> This of course is dependent on code running in WSGI application that
>> did something odd like this, but then I have seen all sorts of strange
>> things done before. There is various stuff out there that assumes that
>> it is running from a command line Python for example, rather than an
>> embedded system, and does various stuff it shouldn't with controlling
>> tty as an example. Am not sure if use of the subprocess module could
>> also do strange things with the relationships between processes and
>> signalling on death.
>
> Neither setsid() nor setpgrp() triggers the sort of thing I had in mind. :-(

Doing some reading, as I originally believed but couldn't remember for
sure, there shouldn't really be any way that a child process can
prevent the death signal getting to the parent. The only grey area I am
not sure about is where a process can increase its priority and do certain
kernel operations which would leave it stuck in a zombie state
forever, with only a reboot getting rid of it. If it were in such a
permanent zombie state, that may prevent the parent process reaping the
child. You would have to be doing some weird stuff to cause that. It would also
perhaps be obvious as you would have zombies lying around that will
not vanish and I recollect you claiming you don't have zombies lying
around.
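
A quick way to check the setsid() case on your own box is something like the following. It is only a minimal sketch using the standard os module; the function name child_death_reaches_parent and the exit code 7 are made up for illustration. It shows that a child which has moved to its own session is still reaped by its parent, because setsid() changes the session and process group but not the parent.

```python
import os


def child_death_reaches_parent():
    """Fork a child that calls setsid(), then check the parent can still reap it."""
    pid = os.fork()
    if pid == 0:
        # Child: leave the parent's session and process group, then exit.
        try:
            os.setsid()
            os._exit(7)  # arbitrary exit code for the parent to check
        except OSError:
            os._exit(1)  # setsid() failed; report a different code
    # Parent: waitpid() blocks until the child exits and returns its status.
    reaped_pid, status = os.waitpid(pid, 0)
    return (
        reaped_pid == pid
        and os.WIFEXITED(status)
        and os.WEXITSTATUS(status) == 7
    )
```

If the function returns True, the child's exit was delivered to the parent despite the setsid() call, which matches what I saw.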

If the child can't be the cause then we go back to the parent process.
One would have to trust the core Apache code though. Thus, the only issue
could be a mismatch between Apache core and the APR/APR-UTIL shared
libraries, or a third party Apache module that is doing weird crap in
the context of the Apache parent process to screw things up. What third
party Apache modules are you using besides mod_wsgi?

Graham

>> Are you able to list any non mainstream third party Python
>> packages you use?
>>
>> FWIW, you are still the only report of this sort of issue that I
>> recollect seeing, which still makes me suspect it is something in the
>> hosted web application itself that is causing the issue.
>>
>> Graham
>>
>
