> what happens to the newly spawned processes?

The -f tells truss to follow forks.

For completeness... The -l (that's an el) includes the thread-id and the pid
(the pid is what we want).  The -t specifies the syscalls to trace, and the
!all turns them all off.  The -s specifies signals to trace and the !SIGALRM
turns off the numerous alarms Apache creates.  The -S specifies signals that
stop the process.  Obviously, -p is used to specify the pid.
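
As a rough sketch, those options can be wrapped in a small shell function for the attach case. The function name watch_segv and the TRUSS override variable are made up for illustration (they are not part of truss); only the options themselves come from the command discussed here.

```shell
# Hypothetical wrapper around the truss invocation described above.
# TRUSS may be overridden (e.g. TRUSS=echo) to see the arguments without
# running truss; that override is an assumption of this sketch.
watch_segv() {
    # -f             follow forked children
    # -l             prefix each output line with the pid and thread-id
    # -t '!all'      trace no system calls
    # -s '!SIGALRM'  report every signal except SIGALRM
    # -S SIGSEGV     stop the process when it receives SIGSEGV
    # -p PID         attach to an already running process
    "${TRUSS:-truss}" -f -l -t '!all' -s '!SIGALRM' -S SIGSEGV -p "$1"
}

# Usage (as root):  watch_segv 662 2>&1 &
```

Single quotes around !all and !SIGALRM do the same job as the backslashes: they keep an interactive bash from treating the ! as history expansion.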

> what happens if the process segfaults immediately after it starts? You
> don't have enough time to get its PID.

I suppose you could be less lazy than I and edit apachectl to call truss
instead of using the -p option.  :-)

truss -f -l -t \!all -s \!SIGALRM -S SIGSEGV /usr/local/bin/httpd -f httpd.conf 2>&1 &
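
If you do launch httpd under truss this way, you probably want the truss messages in a file rather than on a tty. A minimal sketch, assuming a helper called start_traced and a log path of your choosing (both are hypothetical names, as is the TRUSS override variable):

```shell
# Hypothetical helper: run a command under truss from the very first
# instruction and send all truss output to a log file.
# TRUSS may be overridden (e.g. TRUSS=echo) for a dry run.
start_traced() {
    log=$1          # first argument: where truss output goes
    shift           # remaining arguments: the command to run
    "${TRUSS:-truss}" -f -l -t '!all' -s '!SIGALRM' -S SIGSEGV "$@" \
        >"$log" 2>&1 &
}

# Usage (as root):
#   start_traced /tmp/truss.log /usr/local/bin/httpd -f httpd.conf
```

Because truss starts the server itself, even a child that segfaults right after startup is already being followed.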

> Since I don't have access to a Solaris system, is it possible for you
> to take the example code I've supplied below and apply these steps to it?
> So we can get a fully working example? Thanks a lot! (for example I'm not
> familiar with gcore... is it a Solaris-specific thing?)

gcore(1) writes a core image of a running process without killing it, so you
can put the core wherever you like, as long as you have permission to write
there.  I assume it's a Solaris goodie.

I don't have the pointer to the Bad::Segv module, but here's an example run.
To get the messages from truss, you need to keep your tty open.  Otherwise,
redirect stdout/stderr somewhere else.

$ apachectl start

$ for pid in `ps -ef -o pid,comm | fgrep httpd | cut -d'/' -f1`;
do truss -f -l -t \!all -s \!SIGALRM -S SIGSEGV -p $pid 2>&1 &
done
[1] 23353
[2] 23354             <-- I'm only running one child for this example

$ kill -SEGV 662      <-- faking Bad::Segv (662 is the child pid)
662/1:              Received signal #11, SIGSEGV, in accept() [caught]
662/1:                siginfo: SIGSEGV pid=23306 uid=0

$ gcore 662
gcore: core.662 dumped

$ kill -9 662         <-- clean up the stopped process

(at this point, Apache forked a new child and truss is hooked on that one
too)

$ pkill truss         <-- clean up the other truss processes that are still running

$ gdb /usr/local/bin/httpd
(gdb) core-file core.662
...
#0  0xdfae4d2c in _so_accept () from /usr/lib/libc.so.1
(gdb)
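
If you only want the backtrace and not an interactive session, something like the following sketch should work. It assumes a gdb new enough to understand the -batch and -ex options; the function name core_backtrace and the GDB override variable are hypothetical.

```shell
# Hypothetical helper: print a backtrace from a core file and exit.
# GDB may be overridden (e.g. GDB=echo) for a dry run.
core_backtrace() {
    # $1 = path to the executable, $2 = path to the core file
    "${GDB:-gdb}" -batch -ex bt "$1" "$2"
}

# Usage:  core_backtrace /usr/local/bin/httpd core.662
```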

Obviously, this isn't something you want to do on a production system:
truss leaves the segfaulted process stopped, which prevents Apache from
reaping it until you kill it.  If you segfault a lot, you could use up a
bunch of scoreboard slots and perhaps force httpd to hit MaxClients.
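
To keep an eye on how many children are sitting in that stopped state, a small filter over ps output may help. This is only a sketch: the function name stopped_httpd is made up, and it assumes ps prints "T" in the state column for a process that truss has stopped.

```shell
# Hypothetical filter: read "pid state command" lines on stdin and print
# the pids of httpd processes whose state is T (stopped).
stopped_httpd() {
    awk '$2 == "T" && $3 ~ /httpd$/ { print $1 }'
}

# Usage:
#   ps -e -o pid,s,comm | stopped_httpd
# Each pid printed is a candidate for gcore followed by kill -9.
```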

--
Kyle Oppenheim
Tellme Networks, Inc.
http://www.tellme.com

-----Original Message-----
From: Stas Bekman [mailto:[EMAIL PROTECTED]]
Sent: Monday, August 06, 2001 8:33 PM
To: Kyle Oppenheim
Cc: mod_perl list
Subject: RE: Segfaults


[CC'ing back to the list for archival and possibly interesting followup
discussion]

On Mon, 6 Aug 2001, Kyle Oppenheim wrote:

> Here's another method to generate a core on Solaris that you may want to
> add to the guide.  (I hope I'm not repeating something already in the
> guide!)
>
> 1. Use truss(1) as root to stop a process on a segfault:
>
> truss -f -l -t \!all -s \!SIGALRM -S SIGSEGV -p <pid>
>
> or, to monitor all httpd processes (from bash):
>
> for pid in `ps -eaf -o pid,comm | fgrep httpd | cut -d'/' -f1`;
> do truss -f -l -t \!all -s \!SIGALRM -S SIGSEGV -p $pid 2>&1 &
> done

what happens to the newly spawned processes?

what happens if the process segfaults immediately after it starts? You
don't have enough time to get its PID.

> 2. Watch the server error_log for reaped processes
>
> 3. Use gcore to get a core of stopped process or attach gdb.
>
> 4. kill -9 the stopped process.

Since I don't have access to a Solaris system, is it possible for you
to take the example code I've supplied below and apply these steps to it?
So we can get a fully working example? Thanks a lot! (for example I'm not
familiar with gcore... is it a Solaris-specific thing?)
