I should also add that in scanning the list archives, I see that Chris Conn reported something that looks somewhat similar on Dec 31 '04:
--------------------snip----------------------
Hello,

For some reason, since I upgraded to a newer RAID firmware on my SCSI
controller, some mrtg processes hang indefinitely and need to be killed
manually. The rest of the server seems fine. Apart from a complaint about
a stale lock file, the next polls continue without problem, and when I
kill the stale process I get an email with:

ERROR: Bailout after SIG TERM

While I investigate this phenomenon, is there a way to set the maximum
execution time of either the mrtg process or the perl execution?

Thanks in advance,
Chris
--------------------snip----------------------

On Tue, Jan 25, 2005 at 10:44:22AM -0600, Larry Fahnoe wrote:
> Hello,
>
> From time to time I'm finding that mrtg will hang on snmp requests.
> The processes never die (until I manually kill them), and if the
> processes are not killed, they eventually accumulate to the point
> that virtual memory is exhausted. This is happening on Red Hat
> Enterprise Linux release 3, which is kept current with patches from
> Red Hat. mrtg is 2.11.0, rrdtool is 1.0.49, and perl is 5.8.0.
>
> I've been seeing this problem almost exclusively with a bunch of
> Nortel and Cisco switches; the routers do not cause the problem. I
> have not (yet) isolated it to a particular switch, but I don't think
> just one switch is causing the problem. What I typically see is
> three mrtg processes in a group that are hung.
> Here is a recent example, using strace to see what the parent, child,
> and grandchild processes are doing:
>
> # strace -v -p 14914     [parent process]
> wait4(-1, <unfinished ...>
>
> # strace -v -p 14916     [child process]
> select(16, [4], NULL, [4], NULL <unfinished ...>
>
> # strace -v -p 15051     [grandchild process]
> recvfrom(4, <unfinished ...>
>
> # netstat -anp | grep 15051
> udp        0      0 0.0.0.0:40692        0.0.0.0:*        15051/perl
>
> Upon killing the grandchild, I get the following in the log:
>
> ERROR: Bailout after SIG TERM
> ERROR: fork 0 has died ahead of time ...
> Command exited with non-zero status 29
> 16.80user 0.65system 21:55:23elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (719major+35652minor)pagefaults 0swaps
>
> I have been seeing this off and on for several months with different
> versions of mrtg and perl. My thought is that the snmp request
> timeouts are not being honored, but beyond that I'm stumped. Any
> insight into what might be happening here?
>
> --
> Larry Fahnoe, Fahnoe Technology Consulting, [EMAIL PROTECTED]
> 952/925-0744 Minneapolis, Minnesota www.FahnoeTech.com

--
Archive http://www.ee.ethz.ch/~slist/mrtg-developers
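In the strace output, the grandchild is blocked in recvfrom() on what netstat shows is a UDP socket, and the child sits in select() whose last argument (the timeout) is NULL, meaning it waits forever. That is what a read with no deadline looks like when a UDP reply never arrives. For comparison, here is a minimal sketch, assuming Python 3 and only its standard socket/select modules (this is not mrtg's or its SNMP library's code, and `recv_with_timeout` is a hypothetical helper), of a UDP read bounded by a finite select() timeout, so a lost reply yields "no data" instead of an indefinite hang:

```python
import select
import socket


def recv_with_timeout(sock: socket.socket, timeout_s: float, bufsize: int = 65535):
    """Wait at most timeout_s seconds for one datagram on sock.

    Returns the payload bytes, or None if nothing arrived in time.
    Unlike select() with a NULL timeout (as in the strace output),
    this call always returns after at most timeout_s seconds.
    """
    # select() with a finite timeout: an empty readable list means timeout.
    readable, _, _ = select.select([sock], [], [], timeout_s)
    if not readable:
        return None  # timed out; the caller may resend or give up
    data, _addr = sock.recvfrom(bufsize)
    return data


if __name__ == "__main__":
    # Hypothetical demo: a local UDP socket that nobody sends to,
    # standing in for an SNMP agent that never answers.
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("127.0.0.1", 0))
    result = recv_with_timeout(sock, timeout_s=0.5)
    print("timed out" if result is None else result)
    sock.close()
```

A real poller would typically resend the request and retry a bounded number of times before giving up; the point of the sketch is only that every wait carries a finite deadline, which the NULL timeout seen above does not.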
