Hi guys, I am new to nagios, but so far it's working well for me and is monitoring a number of real and virtual hosts. Nagios 3.0.6 is installed on an OpenSolaris 2009.06 host and is monitoring routers, other devices, and VMs in VirtualBox.
My issue is that when I try to add an event handler, I get a SIGSEGV and nagios restarts. I have posted the details of the code I am using and the error at http://pastebin.com/vBb7xTND and also below (but it reads better at pastebin). I have tried several different scripts and code combinations (even empty scripts and commands like ls) and all give the same error. Can anyone help me work out why it's happening? Thanks.

hosts.cfg

<snip>
define host{
        use                     windows-server          ; Inherit default values from a template
        host_name               Server6                 ; The name we're giving to this host
        max_check_attempts      4
        event_handler           vboxmanage-restart      ; Restart the vm
        alias                   Server 6 - Win2008 Server ; A longer name associated with the host
        address                 192.168.0.6             ; IP address of the host
        }
<snip>

commands.cfg - note I have tried various scripts here, incl. ones from the nagios guides/books, and all give the same error.

<snip>
# 'vboxmanage_restart' command definition
define command{
        command_name    vboxmanage-restart
#       command_line    ls
        command_line    sudo -u nas $USER1$/eventhandler/event_vboxmanage_restart -S $SERVICESTATE$ -T $SERVICESTATETYPE$ -A $SERVICEATTEMPT$ -H Server6
        }
<snip>

nagios.log

[1274193005] HOST ALERT: Server6;DOWN;SOFT;1;PING CRITICAL - Packet loss = 100%
[1274193005] Caught SIGSEGV, shutting down...
[1274193005] Nagios 3.0.6 starting... (PID=5231)
[1274193005] Local time is Wed May 19 00:30:05 EST 2010
[1274193005] LOG VERSION: 2.0
[1274193005] Finished daemonizing... (New PID=5232)

The scripts...
(yes I know it should not be 777s, but just to show it's not a permissions thing)

-rwxrwxrwx 1 nagios nagios 1580 2010-05-18 00:52 event_vboxmanage_restart
-rwxrwxrwx 1 nagios nagios 3815 2010-05-18 23:07 filename.out
-rwxrwxrwx 1 nagios nagios 2211 2010-05-19 00:23 restart-httpd
n...@nas:/usr/nagios/libexec/eventhandler#

The script works fine when run as the user nagios using sudo (I added nagios to /etc/sudoers):

n...@nas:…sr/nagios/libexec/eventhandler$ whoami
nagios
n...@nas:…sr/nagios/libexec/eventhandler$ sudo -u nas ./event_vboxmanage_restart -S CRITICAL -T HARD -A 1 -H Server6
CRITICAL(C) 2005-2010 Sun Microsystems, Inc.

The event_vboxmanage_restart script... not that this is likely to be at fault (I do not think so anyway, as I get the error with other very simple scripts too).

#!/usr/bin/perl
use Getopt::Long;
use Net::Telnet ();
use Switch;

my ($state,$type,$attempt,$cmd,$hostname);

open(MYOUTFILE, ">>/usr/nagios/libexec/eventhandler/filename.out");
&processargs;
print "$state";

switch ($state) {
    case "OK"       { &state_OK }
    case "WARNING"  { &state_WARNING }
    case "UNKNOWN"  { &state_UNKNOWN }
    case "CRITICAL" { &state_CRITICAL }
    else            { print "unrecognised state>$state" }
}

print MYOUTFILE">$state<";
print MYOUTFILE">$hostname<";
close(MYOUTFILE);
exit 0;

sub processargs {
    GetOptions (
        "S|state=s"    => \$state,
        "T|type=s"     => \$type,
        "A|attempt=i"  => \$attempt,
        "H|hostname=s" => \$hostname,
        "C|command=s"  => \$cmd,
    );
}

### FUNC: print $state
sub print_state {
}

### FUNC: print $state
sub state_OK {
}

### FUNC: print $state
sub state_WARNING {
}

### FUNC: print $state
sub state_UNKNOWN {
}

### FUNC: print $state
sub state_CRITICAL {
    if ("$type" eq "HARD" or ("$type" eq "SOFT" and $attempt == 3)) {
        @result=`vboxmanage controlvm $hostname acpipowerbutton`;
        foreach (@result) { print MYOUTFILE"$_\n"; };
        sleep(60);
        @result=`VBoxManage controlvm $hostname poweroff`;
        foreach (@result) { print MYOUTFILE"$_\n"; };
        @result=`VBoxManage startvm $hostname`;
        print "$result[1]";
    } else {
    }
}

As you
can see from the below, it all works fine (ie. no SIGSEGV's) if I comment out the eventhandler line from the hosts.cfg file.

[05-19-2010 01:33:50] SERVICE ALERT: Server6;Explorer;OK;HARD;1;Explorer.EXE: Running
[05-19-2010 01:32:50] SERVICE ALERT: Server6;Uptime;OK;HARD;1;System Uptime - 0 day(s) 0 hour(s) 9 minute(s)
[05-19-2010 01:32:40] SERVICE ALERT: Server6;C:\ Drive Space;OK;HARD;1;c:\ - total: 39.90 Gb - used: 9.19 Gb (23%) - free 30.71 Gb (77%)
[05-19-2010 01:32:10] SERVICE ALERT: Server6;CPU Load;OK;HARD;1;CPU Load 3% (5 min average)
[05-19-2010 01:25:00] HOST ALERT: Server6;UP;SOFT;4;PING OK - Packet loss = 0%, RTA = 0.44 ms
[05-19-2010 01:23:50] SERVICE ALERT: Server6;Explorer;CRITICAL;HARD;1;Connection refused
[05-19-2010 01:23:50] HOST ALERT: Server6;DOWN;SOFT;3;PING CRITICAL - Packet loss = 100%
[05-19-2010 01:23:00] SERVICE ALERT: Server6;Uptime;CRITICAL;HARD;1;CRITICAL - Socket timeout after 10 seconds
[05-19-2010 01:22:50] SERVICE ALERT: Server6;C:\ Drive Space;CRITICAL;HARD;1;CRITICAL - Socket timeout after 10 seconds
[05-19-2010 01:22:30] HOST ALERT: Server6;DOWN;SOFT;2;PING CRITICAL - Packet loss = 100%
[05-19-2010 01:22:20] SERVICE ALERT: Server6;CPU Load;CRITICAL;HARD;1;CRITICAL - Socket timeout after 10 seconds
[05-19-2010 01:21:10] HOST ALERT: Server6;DOWN;SOFT;1;PING CRITICAL - Packet loss = 100%
[05-19-2010 01:21:00] SERVICE ALERT: Server6;Uptime;CRITICAL;SOFT;1;CRITICAL - Socket timeout after 10 seconds
[05-19-2010 01:20:50] SERVICE ALERT: Server6;C:\ Drive Space;CRITICAL;SOFT;1;CRITICAL - Socket timeout after 10 seconds
[05-19-2010 01:02:10] SERVICE ALERT: Server6;CPU Load;OK;SOFT;1;CPU Load 0% (5 min average)
[05-19-2010 01:00:50] SERVICE ALERT: Server6;Uptime;OK;SOFT;1;System Uptime - 0 day(s) 0 hour(s) 57 minute(s)
[05-19-2010 01:00:40] SERVICE ALERT: Server6;C:\ Drive Space;OK;SOFT;1;c:\ - total: 39.90 Gb - used: 9.19 Gb (23%) - free 30.71 Gb (77%)
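
One detail in the command definition above may be worth a separate look: event_handler is set on a host, but the command line passes the service macros ($SERVICESTATE$, $SERVICESTATETYPE$, $SERVICEATTEMPT$). Nagios also provides host equivalents ($HOSTSTATE$, $HOSTSTATETYPE$, $HOSTATTEMPT$, $HOSTNAME$). Below is a rough sketch of the same definition using those, assuming everything else stays as it is. It is untested on 3.0.6 and may well be unrelated to the SIGSEGV, since a bare ls as the command line reportedly crashes too.

```
# 'vboxmanage_restart' command definition, host-macro variant (sketch only)
define command{
        command_name    vboxmanage-restart
        # Assumption: the handler is attached to a host, so host macros are passed
        command_line    sudo -u nas $USER1$/eventhandler/event_vboxmanage_restart -S $HOSTSTATE$ -T $HOSTSTATETYPE$ -A $HOSTATTEMPT$ -H $HOSTNAME$
        }
```

If this variant were used, the script would receive host states (UP, DOWN, UNREACHABLE) rather than OK/WARNING/UNKNOWN/CRITICAL, so its switch statement would need a DOWN case before it would do anything.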