https://bz.apache.org/bugzilla/show_bug.cgi?id=60487
Bug ID: 60487
Summary: Core dumps in mpm_event during graceful restart
Product: Apache httpd-2
Version: 2.4-HEAD
Hardware: PC
OS: FreeBSD
Status: NEW
Severity: normal
Priority: P2
Component: mpm_event
Assignee: bugs@httpd.apache.org
Reporter: apa...@wheelhouse.org
Target Milestone: ---

Created attachment 34528
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=34528&action=edit
Don't dereference retained if it isn't set yet.

It seems that if signals are sent to an httpd process in a particular order or at a particular speed, Apache will segfault in mpm_event's ap_start_restart() function on the line:

    retained->is_graceful = graceful;

This behavior has been observed on a number of different systems at different times. When explored from gdb:

    #0  0x00000008019dd2cc in ap_start_restart (graceful=1) at event.c:696
    696         retained->is_graceful = graceful;
    [New Thread 802006400 (LWP 101541/<unknown>)]
    Current language:  auto; currently minimal
    (gdb) print retained
    $1 = (event_retained_data *) 0x0

The "retained" static variable is never explicitly set to NULL in the code. It is only directly assigned in two places, both in event_pre_config():

    retained = ap_retained_data_get(userdata_key);
    if (!retained) {
        retained = ap_retained_data_create(userdata_key, sizeof(*retained));

The processes this happens to often crash after running for a couple of days, and they are the top-level run-as-root parent processes, so this is not a case of the signal arriving before the pointer has ever been allocated.
This doesn't appear to be related to receiving a restart signal while shutting down; gdb reports shutdown_pending is not set:

    (gdb) print shutdown_pending
    $1 = 0

According to the call stack, this is happening from inside config parsing on the LoadModule directive for mpm_event:

    #0  0x00000008019dd2cc in ap_start_restart (graceful=1) at event.c:696
    #1  0x00000008019dd27f in restart (sig=30) at event.c:706
    #2  0x0000000801408b4a in pthread_sigmask () from /lib/libthr.so.3
    #3  0x0000000801407c08 in pthread_getspecific () from /lib/libthr.so.3
    #4  0x0000000801407abd in pthread_getspecific () from /lib/libthr.so.3
    #5  0x000000080140cbd7 in pthread_timedjoin_np () from /lib/libthr.so.3
    #6  0x00000008006c79fb in r_debug_state () from /libexec/ld-elf.so.1
    #7  0x00000008006cc437 in _rtld_is_dlopened () from /libexec/ld-elf.so.1
    #8  0x00000008006c8ea0 in dlopen () from /libexec/ld-elf.so.1
    #9  0x0000000800fc0b00 in apr_dso_load () from /usr/local/lib/libapr-1.so.0
    #10 0x0000000000493c40 in dso_load (cmd=0x7fffffffcfd0,
        modhandlep=0x7fffffffca88, filename=0x802095138 "libexec/mod_mpm_event.so",
        used_filename=0x7fffffffca70) at mod_so.c:162
    #11 0x0000000000493705 in load_module (cmd=0x7fffffffcfd0,
        dummy=0x7fffffffce80, modname=0x802095120 "mpm_event_module",
        filename=0x802095138 "libexec/mod_mpm_event.so") at mod_so.c:263
    #12 0x0000000000478ef3 in invoke_cmd (cmd=0x4b5130, parms=0x7fffffffcfd0,
        mconfig=0x7fffffffce80, args=0x80207b445 "") at config.c:923
    #13 0x00000000004799c0 in execute_now (cmd_line=0x80207b4e0 "LoadModule",
        args=0x80207b41b "mpm_event_module libexec/mod_mpm_event.so",
        parms=0x7fffffffcfd0, p=0x802021028, ptemp=0x80207b028,
        sub_tree=0x7fffffffce80, parent=0x0) at config.c:1688

This makes me think that what is happening is that two restart signals are arriving in rapid succession.
The first initiates a restart, and then the second requests a restart after the previous restart has begun but before event_pre_config() has initialized the retained variable in the newly-loaded mod_mpm_event.so.

If that's the case, then it may be sufficient simply to check retained before writing to it and just return if it is NULL (similar to what's done if restart_pending is already set). If so, the (trivial) attached patch accomplishes that. However, if a NULL value for retained indicates that the server hasn't finished a previous restart, perhaps the check should be one line higher (above "restart_pending = 1;") to short-circuit the second restart completely.

It's also entirely possible that there's much more going on here and the NULL value for retained is indicative of a deeper problem. If the simple solution is not the correct one, this is something I'm happy to look into further and work to fix if someone would be willing to point me in the right direction.
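For illustration only, here is a minimal standalone sketch of the two guard placements discussed above. It is not the attached patch and not the real event.c: the struct is cut down to a single field, and sketch_reset(), sketch_pre_config(), sketch_demo() and both start_restart_* functions are hypothetical names made up for this example. The only pieces taken from the report are the retained pointer, restart_pending, and the is_graceful assignment.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for event.c's event_retained_data (one field only). */
typedef struct {
    int is_graceful;
} event_retained_data;

/* Stand-in for the memory ap_retained_data_create() would hand back. */
static event_retained_data retained_storage;

/* NULL in a freshly loaded module until pre-config has run. */
static event_retained_data *retained = NULL;
static int restart_pending = 0;

/* Put the sketch back into the "module just loaded" state. */
void sketch_reset(void)
{
    retained = NULL;
    restart_pending = 0;
    retained_storage.is_graceful = 0;
}

/* Stand-in for the part of event_pre_config() that assigns retained. */
void sketch_pre_config(void)
{
    retained = &retained_storage;
}

/* Variant A: guard only the write. restart_pending is still set. */
void start_restart_guard_write(int graceful)
{
    if (restart_pending == 1) {
        return;
    }
    restart_pending = 1;
    if (!retained) {
        return; /* nothing to record the flag in yet */
    }
    retained->is_graceful = graceful;
}

/* Variant B: guard one line higher. The early request is dropped entirely. */
void start_restart_guard_early(int graceful)
{
    if (restart_pending == 1) {
        return;
    }
    if (!retained) {
        return; /* pre-config has not run; ignore this request */
    }
    restart_pending = 1;
    retained->is_graceful = graceful;
}

/* Walk through the suspected sequence: a signal arrives before pre-config. */
void sketch_demo(void)
{
    sketch_reset();
    start_restart_guard_write(1);      /* early signal, variant A */
    assert(restart_pending == 1);      /* restart is flagged ... */
    assert(retained == NULL);          /* ... but no NULL write happened */

    sketch_reset();
    start_restart_guard_early(1);      /* early signal, variant B */
    assert(restart_pending == 0);      /* request was short-circuited */

    sketch_pre_config();
    start_restart_guard_early(1);      /* normal case after pre-config */
    assert(restart_pending == 1);
    assert(retained->is_graceful == 1);
}
```

Calling sketch_demo() from a main() exercises both variants. The visible difference is whether restart_pending ends up set when the signal lands before pre-config; which of the two outcomes the server actually wants is the open question raised above.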