On 31 Jan 2018, at 22:41, Yann Ylavic <ylavic....@gmail.com> wrote:
> 
> Hi Mark,
> 
> let's continue this debugging on dev@ if you don't mind..
> 
>> On Wed, Jan 31, 2018 at 10:15 PM,  <bugzi...@apache.org> wrote:
>> https://bz.apache.org/bugzilla/show_bug.cgi?id=62044
>> 
>> --- Comment #32 from m...@blackmans.org ---
>> so sig_coredump is being triggered by an unknown signal, multiple times a day.
>> It's not a segfault, nothing in /var/log/messages. That results in a bunch of
>> undeleted shared memory segments and probably some that will no longer be in
>> the global list, but still present in the kernel.
> 
> In 2.4.29, i.e. without patch [1], sig_coredump might be triggered by
> any signal received by httpd during a restart, and the signal handler
> crashes itself (double fault) so the process is forcibly SIGKILLed
> (presumably, no trace in /var/log/messages...).
> This was reported and discussed in [2], and seems to correspond quite
> well to what you observe in your tests.
> 
> Moreover, if the parent process crashes nothing will delete the
> IPC-SysV SHMs (hence the leak in the system), while child processes
> may continue to be attached, which prevents a new parent process from
> starting (until the children stop or are forcibly killed)...
> 
> When this happens, you should see non-root processes attached to PPID
> 1 (e.g. with "ps -ef"). The "-f /path/to/httpd.conf" in the command
> line might help distinguish the different httpd instances when
> monitoring processes.
> 
> If this is the case, you probably should try patch [1].
> If not, I can't explain why a process with a different PID appears in
> the httpd logs after the SIGHUP; it must have been started
> (automatically?) after the previous one crashed.
> Here the generation number can't help: a new process always starts at
> generation #0.
> 
> Regards,
> Yann.
> 
> [1] https://svn.apache.org/repos/asf/httpd/httpd/patches/2.4.x/stop_signals-PR61558.patch
> [2] https://bz.apache.org/bugzilla/show_bug.cgi?id=61558
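
The PPID 1 check suggested above could be scripted roughly as follows. This
is only a sketch: the function names are made up for illustration, it assumes
a procps-style "ps -eo user,pid,ppid,args" with a header row, and it assumes
orphaned children are reparented to PID 1 as described in the quoted text.

```python
import subprocess


def find_orphaned_children(ps_output, name="httpd"):
    """Return (user, pid, args) for non-root `name` processes whose PPID is 1.

    `ps_output` is expected to look like `ps -eo user,pid,ppid,args` output,
    i.e. a header row followed by one process per line (an assumption).
    """
    orphans = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.strip().split(None, 3)
        if len(fields) < 4:
            continue
        user, pid, ppid, args = fields
        executable = args.split()[0]
        if ppid == "1" and user != "root" and name in executable:
            orphans.append((user, int(pid), args))
    return orphans


def ps_snapshot():
    """Capture the current process table (hypothetical helper, Linux procps)."""
    result = subprocess.run(
        ["ps", "-eo", "user,pid,ppid,args"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Calling find_orphaned_children(ps_snapshot()) on the affected host would list
the candidates; the args column keeps the "-f /path/to/httpd.conf" part, so
the different httpd instances can be told apart.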

Thanks. For now, we will treat the “nasty error” as a separate question to 
resolve and hope that the clean-up patch deals with the immediate issue.

I had originally treated that “nasty error” as a reference to the “file exists” 
error.  However, based on your feedback and reviewing the logs, I would 
conclude that the “nasty error” is the trigger, as you suggest, and the lack of 
SHM clean-up and consequent collisions are collateral damage.
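
To track the leftover segments between restarts, something like the sketch
below might help. It assumes the Linux /proc/sysvipc/shm layout (a header row
naming columns such as shmid, cpid and nattch, followed by integer fields),
and the function names are invented for illustration. It only reports
segments whose creating process is gone; it does not remove anything.

```python
import os


def parse_sysv_shm(text):
    """Parse /proc/sysvipc/shm-style text into one dict per segment.

    Columns are looked up by the names in the header row, so the exact
    column order does not matter (the layout itself is an assumption).
    """
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0].split()
    return [dict(zip(header, map(int, row.split()))) for row in lines[1:]]


def segments_without_creator(segments, pid_exists):
    """Return the segments whose creator PID (cpid) no longer exists."""
    return [seg for seg in segments if not pid_exists(seg["cpid"])]


def pid_exists(pid):
    """Check for a live process via /proc (Linux only)."""
    return os.path.exists(f"/proc/{pid}")


def read_sysv_shm(path="/proc/sysvipc/shm"):
    """Read the kernel's current segment table (hypothetical helper)."""
    with open(path) as handle:
        return handle.read()
```

Running segments_without_creator(parse_sysv_shm(read_sysv_shm()), pid_exists)
after a crash should show segments left behind by a dead parent; a non-zero
nattch on one of them would suggest children are still attached.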

Just to confirm, you expect that patch to handle SHM clean-up even in the 
“nasty error” case?  I suspect that the nasty error is triggered by the 
WebLogic plugin, based on the adjacency in the logs, but the tracing doesn’t 
reveal any details, so an strace will probably be required to get more detail.

Bugzilla was slightly easier to get log data into as I cannot use work email 
for these conversations.

Cheers,
Mark


