Ryan, I like this approach. Do you want us to prepare a patch against latest version? Also, we never committed to apache project. Would it be enough if we post a patch here. Or what would be the process?
On Sun, Apr 17, 2011 at 9:58 PM, pqf <p...@mailtech.cn> wrote: > Hi, all > Another question, does proc_wait_process() should update procnode->proc_id > to 0 too? or else mod_fcgid may send a signal to another irrelevant process > while apache is shutting down? I don't follow up mod_fcgid for a while, I > just took a glance, maybe it's updated somewhere else? > By the way, procnode->proc_id is set to 0 while apache startup, so why not > update procnode->proc_id to 0 while fcgid_create_privileged_process() is > fail? And then check magic number 0 rather than both -1 and 0, in both > proc_kill_gracefully() and proc_kill_force(). > > Cheers. > Ryan > > > > Hello, > > There is a very interesting, and quite a rare bug in mod_fcgid. It is easy > to reproduce if you can cause fork to fail (which can be done with > CloudLinux -- if anyone wants to replicate it). > > *Here is how it works: * > mod_fcgid tries to spawn a new process (proc_spawn_process in > fcgid_proc_unix.c), but fork returns -1. > More exactly fcgid_create_privileged_process function call returns error, > and fills in tmpproc.pid with -1 & tmpproc is assiged to procnode->proc_id). > > Now, if at the same time service httpd restart is executed, function > kill_all_subprocess in fcgid_pm_main.c will execute, and it will try to go > through all procnodes, sending SIGTERM via proc_kill_gracefully, (and then > SIGKILL via proc_kill_force) to procnode->proc_id.pid > Yet, one procnode will be pointing to procnode->proc_id.pid, causing kill > -15 -1 (kill all). > The end results all services on the server failing, including SSH, apache, > syslogd, etc.. > > I guess the problem is really rare for most people. Also it is quite hard > to diagnose, as it is completely not clear where the signal came from, and > it took us some time to correlate them with apache restarts.. Yet due to our > OS being used by shared hosts (where httpd restart is common thing), and our > ability to limit memory available to processes on per virtual host bases > (which causes fork to fail once that virtual host reaches memory limit), we > see the issue quite often. > > The solution is quite simple (not sure if it is the best / right solution), > in file: fcgid_proc_unix.c, in methods proc_kill_gracefully, line: > > rv = apr_proc_kill(&(procnode->proc_id), SIGTERM); > > should be changed to: > if (procnode->proc_id.pid != -1) { > rv = apr_proc_kill(&(procnode->proc_id), SIGTERM); > } else { > rv = APR_SUCCESS; > } > > Similarly in proc_kill_force > rv = apr_proc_kill(&(procnode->proc_id), SIGKILL); > should be changed to: > if (procnode->proc_id.pid != -1) { > rv = apr_proc_kill(&(procnode->proc_id), SIGKILL); > } else { > rv = APR_SUCCESS; > } > > Regards, > Igor Seletskiy > CEO @ Cloud Linux Inc > > >