I've never had much luck getting coredumps on Linux after a seg fault or similar crash when I start the server as root. My usual procedure was to recreate the problem with httpd started as non-root, get a coredump, and then debug from there.

Recently I found out that some of my httpds were seg faulting after an internal IBM security scan. I couldn't debug the problem with my usual technique of running as non-root because, AFAIK, the security scan was only looking for a web server on port 80. So I decided to dig into what the deal was with root coredumps.

It turns out that if a program calls setuid(), the kernel clears current->mm->dumpable for the process, and you won't get a coredump unless you do something like this:

Index: os/unix/unixd.c
===================================================================
RCS file: /home/cvs/httpd-2.0/os/unix/unixd.c,v
retrieving revision 1.56
diff -u -d -b -r1.56 unixd.c
--- os/unix/unixd.c     3 Feb 2003 17:53:17 -0000       1.56
+++ os/unix/unixd.c     3 Mar 2003 16:17:14 -0000
@@ -90,6 +90,8 @@
 #include <sys/sem.h>
 #endif

+#include <sys/prctl.h>
+
 unixd_config_rec unixd_config;

 /* Set group privileges.
@@ -180,6 +182,11 @@
                    "setuid: unable to change to uid: %ld",
                     (long) unixd_config.user_id);
        return -1;
+    }
+    if (prctl(PR_SET_DUMPABLE, 1)) {
+        ap_log_error(APLOG_MARK, APLOG_ALERT, errno, NULL,
+                     "set dumpable failed - this child will not produce"
+                     " coredumps after software failures");
     }
 #endif
     return 0;

I mentioned this to other Apache folks around here and got some interesting reactions:

* If we don't control this with a directive, we might violate the rule of least astonishment. For example, admins might be living with some buggy modules, and all of a sudden their disks fill up with unwanted coredump files. (Of course ServerRoot or CoreDumpDirectory would have to be writable by the httpd User for that to happen.)

* If we do control this with a new directive and we don't make it very clear that this is just a Linux 2.4+ thing, users might think that turning on the new directive will magically solve all lack-of-coredump issues. It won't: there are Linux kernels < 2.4, ulimit -c, permission issues, and the CoreDumpDirectory directive, to name a few, not to mention sysadmin stuff you have to do on FreeBSD and Solaris. (The ulimit -c piece is sketched below.)
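
For the ulimit -c part, something like the following has to happen somewhere, whether the admin bumps the limit before starting httpd or the parent does it itself. This is just an illustrative sketch, not part of the patch above:

#include <sys/resource.h>

/* Illustrative only: the moral equivalent of "ulimit -c unlimited".
 * Raise the soft core file size limit as far as the hard limit allows.
 * If the hard limit is 0, no amount of prctl() fiddling will get you
 * a coredump. */
static void raise_core_limit(void)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_CORE, &lim) == 0) {
        lim.rlim_cur = lim.rlim_max;
        (void) setrlimit(RLIMIT_CORE, &lim);
    }
}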

One solution might be to control it with CoreDumpDirectory. If that's in the config file, one would assume the admin wants coredumps on failures.
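
If we went that way, the prctl() call in the patch could simply be made conditional. Rough sketch only; "coredumpdir_configured" is a hypothetical flag standing in for however the core would record that the directive appeared in the config file:

    /* coredumpdir_configured would be set wherever CoreDumpDirectory is
     * parsed; it does not exist today. */
    if (coredumpdir_configured) {
        if (prctl(PR_SET_DUMPABLE, 1)) {
            ap_log_error(APLOG_MARK, APLOG_ALERT, errno, NULL,
                         "set dumpable failed - this child will not produce"
                         " coredumps after software failures");
        }
    }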

Opinions?

I'm also curious how admins running production sites on Linux are dealing with seg faults and the like today if they aren't getting coredumps.

Thanks,
Greg


