I'm definitely no expert, but...

* What does it say when you run 'ldd' on the nagios binary? Can all the
  libraries the binary is linked against be found? Are those libraries
  up to date?
* Where did you get nagios from? Did you compile it or is it pre-built?
  If pre-built, are there any updates?
* I don't know Solaris well enough to know how to trace your running
  nagios with a very simple configuration, but that might be the next
  step. strace?

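The first check above could look something like the sketch below. Treat it as a rough illustration only: the path /usr/nagios/bin/nagios is a guess based on the /usr/nagios/libexec path in the quoted message, and `missing_libs` is a made-up helper name, not anything that ships with nagios.

```shell
#!/bin/sh
# Keep only the lines of ldd output that report an unresolved library.
# Solaris prints "(file not found)" and Linux prints "not found", so
# matching on "not found" covers both.
missing_libs() {
    # grep exits non-zero when nothing matches, which is the good case here
    grep 'not found' || true
}

# ASSUMPTION: guessed install path; override with NAGIOS_BIN=/path/to/nagios
NAGIOS_BIN="${NAGIOS_BIN:-/usr/nagios/bin/nagios}"

# Only run ldd if the binary is actually there
if [ -x "$NAGIOS_BIN" ]; then
    ldd "$NAGIOS_BIN" | missing_libs
fi
```

If that prints nothing, every library resolved. As for tracing, the nearest Solaris equivalent of strace is truss; something like `truss -f -p <pid>` should attach to the running daemon and follow its children, though I have not tried it against nagios.
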
On 19 May 2010 10:49, nagios <nag...@chadmail.com> wrote:
> Anybody?
>
> If you need extra information, just let me know what you need to see and
> I'll upload it.
>
> Thanks.
>
> -----Original Message-----
> From: "nagios" <nag...@chadmail.com>
> To: nagios-users@lists.sourceforge.net
> Date: Wed, 19 May 2010 01:42:15 +1000
> Subject: [Nagios-users] SIGSEGV when trying to use eventhandler
>
> Hi guys,
> I am new to nagios but so far it's working well for me and is
> monitoring a number of real and virtual hosts. Nagios 3.0.6 is installed on
> an OpenSolaris 2009.06 host and monitoring routers, other devices and VMs in
> VirtualBox.
>
> My issue is when I try to add an event handler, I get a SIGSEGV and nagios
> restarts.
>
>
> I have posted the details of the code I am using and the error here...
> http://pastebin.com/vBb7xTND and also below (but it reads better @
> pastebin).
>
> I have tried several different scripts and code combinations (even empty
> scripts and commands like ls) and all give the same error.
>
> Can anyone help me work out why it's happening?
>
> Thanks.
>
> hosts.cfg
> <snip>
> define host{
>     use windows-server ; Inherit default values from a template
>     host_name Server6 ; The name we're giving to this host
>     max_check_attempts 4
>     event_handler vboxmanage-restart ; Restart the vm
>     alias Server 6 - Win2008 Server ; A longer name associated with the host
>     address 192.168.0.6 ; IP address of the host
>     }
> <snip>
>
> commands.cfg - note I have tried various scripts here incl. ones from the
> nagios guides/books and all give the same error.
> <snip>
> # 'vboxmanage_restart' command definition
> define command{
>     command_name vboxmanage-restart
>     # command_line ls
>     command_line sudo -u nas $USER1$/eventhandler/event_vboxmanage_restart -S $SERVICESTATE$ -T $SERVICESTATETYPE$ -A $SERVICEATTEMPT$ -H Server6
>     }
> <snip>
>
> nagios.log
> [1274193005] HOST ALERT: Server6;DOWN;SOFT;1;PING CRITICAL - Packet loss = 100%
> [1274193005] Caught SIGSEGV, shutting down...
> [1274193005] Nagios 3.0.6 starting... (PID=5231)
> [1274193005] Local time is Wed May 19 00:30:05 EST 2010
> [1274193005] LOG VERSION: 2.0
> [1274193005] Finished daemonizing... (New PID=5232)
>
> the scripts... (yes I know it should not be 777's but just to show it's not
> a permissions thing)
> -rwxrwxrwx 1 nagios nagios 1580 2010-05-18 00:52 event_vboxmanage_restart
> -rwxrwxrwx 1 nagios nagios 3815 2010-05-18 23:07 filename.out
> -rwxrwxrwx 1 nagios nagios 2211 2010-05-19 00:23 restart-httpd
> n...@nas:/usr/nagios/libexec/eventhandler#
>
> The script works fine when run as the user nagios using sudo (added nagios to
> /etc/sudoers)
> n...@nas:…sr/nagios/libexec/eventhandler$ whoami
> nagios
> n...@nas:…sr/nagios/libexec/eventhandler$ sudo -u nas ./event_vboxmanage_restart -S CRITICAL -T HARD -A 1 -H Server6
> CRITICAL(C) 2005-2010 Sun Microsystems, Inc.
>
> The event_vboxmanage_restart script... not that this is likely to be at fault
> (I do not think so anyway, as I get the error with other very simple scripts
> too).
> #!/usr/bin/perl
>
> use Getopt::Long;
> use Net::Telnet ();
> use Switch;
> my ($state,$type,$attempt,$cmd,$hostname);
> open(MYOUTFILE, ">>/usr/nagios/libexec/eventhandler/filename.out");
>
> &processargs;
> print "$state";
> switch ($state) {
>     case "OK" { &state_OK }
>     case "WARNING" { &state_WARNING }
>     case "UNKNOWN" { &state_UNKNOWN }
>     case "CRITICAL" { &state_CRITICAL }
>     else { print "unrecognised state>$state" }
> }
> print MYOUTFILE">$state<";
> print MYOUTFILE">$hostname<";
> close(MYOUTFILE);
> exit 0;
>
> sub processargs {
>
>     GetOptions (
>         "S|state=s" => \$state,
>         "T|type=s" => \$type,
>         "A|attempt=i" => \$attempt,
>         "H|hostname=s" => \$hostname,
>         "C|command=s" => \$cmd,
>     );
> }
>
> ### FUNC: print $state
> sub print_state {
> }
> ### FUNC: print $state
> sub state_OK {
> }
> ### FUNC: print $state
> sub state_WARNING {
> }
> ### FUNC: print $state
> sub state_UNKNOWN {
> }
> ### FUNC: print $state
> sub state_CRITICAL {
> if ("$type" eq "HARD" or ("$type" eq "SOFT" and $attempt == 3))
> {...@result=`vboxmanage controlvm $hostname acpipowerbutton`; foreach
> (@result)
> {
> print MYOUTFILE"$_\n";
> };sleep(60);@result=`VBoxManage controlvm $hostname poweroff`;foreach
> (@result) {
> print MYOUTFILE"$_\n";
> }; @result=`VBoxManage startvm $hostname`; print "$result[1]";
> }
> else { }
> }
>
> As you can see from the below, it all works fine (ie. no SIGSEGV's) if I
> comment out the eventhandler line from the hosts.cfg file.
> [05-19-2010 01:33:50] SERVICE ALERT: Server6;Explorer;OK;HARD;1;Explorer.EXE: Running
> [05-19-2010 01:32:50] SERVICE ALERT: Server6;Uptime;OK;HARD;1;System Uptime - 0 day(s) 0 hour(s) 9 minute(s)
> [05-19-2010 01:32:40] SERVICE ALERT: Server6;C:\ Drive Space;OK;HARD;1;c:\ - total: 39.90 Gb - used: 9.19 Gb (23%) - free 30.71 Gb (77%)
> [05-19-2010 01:32:10] SERVICE ALERT: Server6;CPU Load;OK;HARD;1;CPU Load 3% (5 min average)
> [05-19-2010 01:25:00] HOST ALERT: Server6;UP;SOFT;4;PING OK - Packet loss = 0%, RTA = 0.44 ms
> [05-19-2010 01:23:50] SERVICE ALERT: Server6;Explorer;CRITICAL;HARD;1;Connection refused
> [05-19-2010 01:23:50] HOST ALERT: Server6;DOWN;SOFT;3;PING CRITICAL - Packet loss = 100%
> [05-19-2010 01:23:00] SERVICE ALERT: Server6;Uptime;CRITICAL;HARD;1;CRITICAL - Socket timeout after 10 seconds
> [05-19-2010 01:22:50] SERVICE ALERT: Server6;C:\ Drive Space;CRITICAL;HARD;1;CRITICAL - Socket timeout after 10 seconds
> [05-19-2010 01:22:30] HOST ALERT: Server6;DOWN;SOFT;2;PING CRITICAL - Packet loss = 100%
> [05-19-2010 01:22:20] SERVICE ALERT: Server6;CPU Load;CRITICAL;HARD;1;CRITICAL - Socket timeout after 10 seconds
> [05-19-2010 01:21:10] HOST ALERT: Server6;DOWN;SOFT;1;PING CRITICAL - Packet loss = 100%
> [05-19-2010 01:21:00] SERVICE ALERT: Server6;Uptime;CRITICAL;SOFT;1;CRITICAL - Socket timeout after 10 seconds
> [05-19-2010 01:20:50] SERVICE ALERT: Server6;C:\ Drive Space;CRITICAL;SOFT;1;CRITICAL - Socket timeout after 10 seconds
> [05-19-2010 01:02:10] SERVICE ALERT: Server6;CPU Load;OK;SOFT;1;CPU Load 0% (5 min average)
> [05-19-2010 01:00:50] SERVICE ALERT: Server6;Uptime;OK;SOFT;1;System Uptime - 0 day(s) 0 hour(s) 57 minute(s)
> [05-19-2010 01:00:40] SERVICE ALERT: Server6;C:\ Drive Space;OK;SOFT;1;c:\ - total: 39.90 Gb - used: 9.19 Gb (23%) - free 30.71 Gb (77%)
>
>
> ------------------------------------------------------------------------------
>
> _______________________________________________
> Nagios-users mailing list
> Nagios-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-users
> ::: Please include Nagios version, plugin version (-v) and OS when
> reporting any issue.
> ::: Messages without supporting info will risk being sent to /dev/null