> what happens to the newly spawned processes?
The -f tells truss to follow forks.
For completeness... The -l (that's an el) adds the thread (LWP) id to each
line of output and, combined with -f, the pid as well (the pid is what we
want). The -t specifies the syscalls to trace, and the !all turns them all
off. The -s specifies signals to trace, and the !SIGALRM turns off the
numerous alarms Apache creates. The -S specifies signals that stop the
process. Obviously, -p is used to specify the pid.
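As a rough sketch, the same flags can be collected in a small shell helper so that each one carries its own comment. The name truss_segv_cmd is made up for illustration, and the helper only prints the command rather than running it:

```shell
# Hypothetical helper: prints (does not run) the truss command for one pid.
truss_segv_cmd() {
    pid=$1
    set -- -f                 # follow forks
    set -- "$@" -l            # include the thread (LWP) id on each line
    set -- "$@" -t '!all'     # trace no syscalls
    set -- "$@" -s '!SIGALRM' # report every signal except SIGALRM
    set -- "$@" -S SIGSEGV    # stop the process when it gets SIGSEGV
    set -- "$@" -p "$pid"     # attach to this running pid
    echo "truss $*"
}
```

Calling truss_segv_cmd 662 prints the arguments as truss would receive them. If you paste the result into an interactive bash, escape each ! as in the commands in this message.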
> what happens if the process segfaults immediately after it starts? You
> don't have enough time to get its PID.
I suppose you could be less lazy than I and edit apachectl to call truss
instead of using the -p option. :-)
truss -f -l -t \!all -s \!SIGALRM -S SIGSEGV /usr/local/bin/httpd \
    -f httpd.conf 2>&1 &
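One way that apachectl edit might look, as a sketch only: a function that takes the place of the line that starts httpd. The variable names HTTPD, CONF and TRUSS and the function name are my own, the path is just the one from this example, and the fallback branch is an assumption about what you'd want when truss isn't installed:

```shell
# Hypothetical replacement for the httpd start line in apachectl.
HTTPD=${HTTPD:-/usr/local/bin/httpd}   # path used in this example
CONF=${CONF:-httpd.conf}               # config file used in this example
TRUSS=${TRUSS:-truss}

start_httpd_traced() {
    if command -v "$TRUSS" >/dev/null 2>&1; then
        # Start httpd under truss so the parent and every forked child
        # are watched from the very first instruction.
        "$TRUSS" -f -l -t '!all' -s '!SIGALRM' -S SIGSEGV \
            "$HTTPD" -f "$CONF" 2>&1 &
    else
        # Assumed fallback: start httpd untraced if truss is missing.
        echo "truss not found; starting $HTTPD untraced" >&2
        "$HTTPD" -f "$CONF"
    fi
}
```

Because truss launches httpd itself, there is no window between startup and attaching with -p.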
> Since I don't have access to a Solaris system, is it possible for you
> to take the example code I've supplied below and apply these steps to it?
> So we can get a fully working example? Thanks a lot! (for example I'm not
> familiar with gcore... is it a Solaris-specific thing?)
gcore(1) gets a core image of a running process, so you can put the core
wherever you want, as long as you have permission to write there. I assume
it's a Solaris goodie.
I don't have the pointer to the Bad::Segv module, but here's an example run.
To get the messages from truss, you need to keep your tty open. Otherwise,
redirect stdout/stderr somewhere else.
$ apachectl start
$ for pid in `ps -ef -o pid,comm | fgrep httpd | cut -d'/' -f1`;
do truss -f -l -t \!all -s \!SIGALRM -S SIGSEGV -p $pid 2>&1 &
done
[1] 23353
[2] 23354 <-- I'm only running one child for this example
$ kill -SEGV 662 <-- faking Bad::Segv (662 is the child pid)
662/1: Received signal #11, SIGSEGV, in accept() [caught]
662/1: siginfo: SIGSEGV pid=23306 uid=0
$ gcore 662
gcore: core.662 dumped
$ kill -9 662 <-- clean up the stopped process
(at this point, Apache forked a new child and truss is hooked on that one
too)
$ pkill truss <-- clean up the other truss processes that are still
running
$ gdb /usr/local/bin/httpd
(gdb) core-file core.662
...
#0 0xdfae4d2c in _so_accept () from /usr/lib/libc.so.1
(gdb)
Obviously, this isn't great to be doing on a production system, since truss
leaves the process stopped (even after you dump its core), which prevents
Apache from reaping it. So, you could use up a bunch of scoreboard slots and
perhaps force httpd to hit MaxClients if you segfault a lot.
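If you do try this anyway, something along these lines might keep stopped children from piling up. It's only a sketch: the function names are made up, and it assumes your ps accepts -o pid,s,comm and reports stopped processes with state T.

```shell
# Print pids of stopped (state T) httpd processes, reading
# "pid state comm" lines on stdin.
stopped_httpd_pids() {
    awk '$2 == "T" && $3 ~ /(^|\/)httpd$/ { print $1 }'
}

# Hypothetical cleanup: dump a core for each stopped child, then kill it
# so Apache can reap it and free the scoreboard slot.
reap_stopped_httpd() {
    ps -e -o pid,s,comm | stopped_httpd_pids | while read -r pid; do
        gcore "$pid" && kill -9 "$pid"
    done
}
```

Running reap_stopped_httpd by hand, or from a loop with a sleep, leaves a core.<pid> file in the current directory for each child it cleans up.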
--
Kyle Oppenheim
Tellme Networks, Inc.
http://www.tellme.com
-----Original Message-----
From: Stas Bekman [mailto:[EMAIL PROTECTED]]
Sent: Monday, August 06, 2001 8:33 PM
To: Kyle Oppenheim
Cc: mod_perl list
Subject: RE: Segfaults
[CC'ing back to the list for archival and possibly interesting followup
discussion]
On Mon, 6 Aug 2001, Kyle Oppenheim wrote:
> Here's another method to generate a core on Solaris that you may want to
> add to the guide. (I hope I'm not repeating something already in the guide!)
>
> 1. Use truss(1) as root to stop a process on a segfault:
>
> truss -f -l -t \!all -s \!SIGALRM -S SIGSEGV -p <pid>
>
> or, to monitor all httpd processes (from bash):
>
> for pid in `ps -eaf -o pid,comm | fgrep httpd | cut -d'/' -f1`;
> do truss -f -l -t \!all -s \!SIGALRM -S SIGSEGV -p $pid 2>&1 &
> done
what happens to the newly spawned processes?
what happens if the process segfaults immediately after it starts? You
don't have enough time to get its PID.
> 2. Watch the server error_log for reaped processes
>
> 3. Use gcore to get a core of the stopped process or attach gdb.
>
> 4. kill -9 the stopped process.
Since I don't have access to a Solaris system, is it possible for you
to take the example code I've supplied below and apply these steps to it?
So we can get a fully working example? Thanks a lot! (for example I'm not
familiar with gcore... is it a Solaris-specific thing?)