Hi all,
I'm having a problem with our Opsview installation. We have a master (active/standby cluster), some
slaves, the runtime/opsview databases on an external DB server, and the ODW database on another external DB server. Versions:
opsview-reports-2.2.4.258-1.ct5
opsview-web-3.13.0.6479-1.ct5
opsview-perl-3.13.0.464-1.ct5
opsview-base-3.13.0.6479-1.ct5
opsview-3.13.0.6479-1.ct5
opsview-core-3.13.0.6479-1.ct5
I'm now getting this error:
NDO CRITICAL - oldest ndo file is 1038 seconds old, 208 ndo files backlogged
After restarting Opsview on the master, /var/log/opsview/opsviewd.log shows:
[2011/09/09 01:36:40] [import_ndologsd] [FATAL] Received kill signal - forced death
[2011/09/09 01:36:40] [import_ndologsd] [INFO] Stopping ndo2db
[2011/09/09 01:36:40] [import_ndologsd] [INFO] Checking if ndo2db has stopped (20 seconds to go)
[2011/09/09 01:36:41] [import_ndologsd] [INFO] Stopping import_ndologsd
[2011/09/09 01:36:41] [import_perfdatarrd] [WARN] Received kill signal - gracefully shutting down
[2011/09/09 01:36:41] [import_perfdatarrd] [INFO] Stopping import_perfdatarrd
[2011/09/09 01:36:42] [import_ndoconfigend] [WARN] Received kill signal - gracefully shutting down
[2011/09/09 01:36:42] [import_ndoconfigend] [INFO] Stopping import_ndoconfigend
[2011/09/09 01:36:43] [import_ndologsd] [INFO] Starting
[2011/09/09 01:36:43] [import_ndologsd] [INFO] ndo2db started
[2011/09/09 01:36:43] [import_ndologsd] [INFO] Daemonised
[2011/09/09 01:36:44] [import_perfdatarrd] [INFO] Starting
[2011/09/09 01:36:44] [import_perfdatarrd] [INFO] Daemonised
[2011/09/09 01:36:44] [import_ndoconfigend] [INFO] Starting
[2011/09/09 01:36:44] [import_ndoconfigend] [INFO] Daemonised
[2011/09/09 01:36:45] [opsviewd] [INFO] detected ssh version: 4.3
[2011/09/09 01:36:45] [opsviewd] [INFO] ssh tunnel options for slaves: -o TCPKeepAlive=yes -o ServerAliveCountMax=3 -o ServerAliveInterval=10
[2011/09/09 01:36:45] [opsviewd] [INFO] Starting opsviewd
[2011/09/09 01:36:46] [nrd] [WARN] Process Backgrounded
[2011/09/09 01:36:46] [nrd] [ERROR] Using serializer: crypt
[2011/09/09 01:36:46] [nrd] [ERROR] Using writer: resultdir
[2011/09/09 01:36:46] [nrd] [WARN] 2011/09/09-01:36:46 NRD::Daemon (type Net::Server::MultiType -> MultiType -> Net::Server::PreFork) starting! pid(15381)
[2011/09/09 01:36:46] [nrd] [WARN] Using default listen value of 128
[2011/09/09 01:36:46] [nrd] [WARN] Binding to TCP port 5669 on host 127.0.0.1
[2011/09/09 01:36:46] [nrd] [WARN] Setting gid to "3002 3002"
[2011/09/09 01:36:46] [opsviewd] [INFO] Starting tunnel for opsview-probe01
(snip) here all ssh tunnels are started...
[2011/09/09 01:36:51] [exec_and_log] [WARN]
[2011/09/09 01:37:19] [nrd] [WARN] Couldn't process packet: No data received at /data/nagios/bin/../perl/lib/NRD/Daemon.pm line 23
[2011/09/09 01:37:21] [slave_node_resync] [INFO] Starting
[2011/09/09 01:37:21] [slave_node_resync] [INFO] Only running on HARD state change - currently SOFT
[2011/09/09 01:37:21] [slave_node_resync] [INFO] Finished
[2011/09/09 01:38:19] [nrd] [WARN] Couldn't process packet: No data received at /data/nagios/bin/../perl/lib/NRD/Daemon.pm line 23
[2011/09/09 01:39:19] [nrd] [WARN] Couldn't process packet: No data received at /data/nagios/bin/../perl/lib/NRD/Daemon.pm line 23
There are also a lot of nrd errors like the last two lines above, as well as older warnings (up until a
few hours ago) like:
[import_ndologsd] [WARN] Import of 1315522078.529595, size=94167, took 20.22 seconds > 5 seconds
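To see how slow these imports really are, the warnings can be pulled out of opsviewd.log and turned into numbers. The following is only a minimal Python sketch: the regular expression is inferred from the single sample line above, and the function name parse_slow_import and the field names are made up for illustration, not part of Opsview.

```python
import re

# Pattern inferred from the one sample warning quoted above; the real log
# format may vary, so treat this as an assumption. \s+ tolerates lines that
# a mail client has wrapped.
SLOW_IMPORT = re.compile(
    r"Import of\s+(?P<name>\d+\.\d+),\s+size=(?P<size>\d+),\s+"
    r"took\s+(?P<took>\d+(?:\.\d+)?)\s+seconds\s+>\s+(?P<limit>\d+(?:\.\d+)?)\s+seconds"
)


def parse_slow_import(line):
    """Return a dict describing a slow-import warning, or None if no match."""
    m = SLOW_IMPORT.search(line)
    if m is None:
        return None
    size = int(m.group("size"))
    took = float(m.group("took"))
    return {
        "file": m.group("name"),
        "size_bytes": size,
        "seconds": took,
        "threshold": float(m.group("limit")),
        # Rough throughput; None if the reported duration is zero.
        "bytes_per_second": size / took if took > 0 else None,
    }


sample = ("[import_ndologsd] [WARN] Import of 1315522078.529595, size=94167, "
          "took 20.22 seconds > 5 seconds")
info = parse_slow_import(sample)
print(info)
```

Running this over the whole log would show whether the import time grew gradually or jumped at a particular moment.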
When I run the check manually, the age and the backlog keep increasing:
NDO CRITICAL - oldest ndo file is 3606 seconds old, 713 ndo files backlogged | last_import=3606s;30;60 ndo_file_backlog=713;1000;10000
NDO CRITICAL - oldest ndo file is 3645 seconds old, 721 ndo files backlogged | last_import=3645s;30;60 ndo_file_backlog=721;1000;10000
(...)
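To compare two readings of the check, a small Python sketch like the one below could extract the age and backlog and show the difference. The output layout is assumed from the two lines above, and the helper names parse_ndo_status and backlog_trend are hypothetical. If the age of the oldest file grows by about as much as the time between the two runs, that would suggest the oldest file is not being imported at all, rather than being imported slowly.

```python
import re

# Layout assumed from the two check outputs quoted above.
STATUS = re.compile(
    r"oldest ndo file is\s+(\d+)\s+seconds old,\s+(\d+)\s+ndo files backlogged"
)


def parse_ndo_status(output):
    """Extract (age_seconds, backlog_files) from one line of check output."""
    m = STATUS.search(output)
    if m is None:
        raise ValueError("unrecognised check output: %r" % output)
    return int(m.group(1)), int(m.group(2))


def backlog_trend(first, second):
    """Return how much the age and the backlog changed between two readings."""
    age1, files1 = parse_ndo_status(first)
    age2, files2 = parse_ndo_status(second)
    return {"age_delta": age2 - age1, "files_delta": files2 - files1}


first = ("NDO CRITICAL - oldest ndo file is 3606 seconds old, "
         "713 ndo files backlogged | last_import=3606s;30;60 "
         "ndo_file_backlog=713;1000;10000")
second = ("NDO CRITICAL - oldest ndo file is 3645 seconds old, "
          "721 ndo files backlogged | last_import=3645s;30;60 "
          "ndo_file_backlog=721;1000;10000")
trend = backlog_trend(first, second)
print(trend)  # age grew by 39 seconds, backlog grew by 8 files
```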
Configuration files for nrd:
[nagios@opsview-core01-tn etc]$ more nrd.conf
# NRD configuration - generated by nagconfgen
server_type PreFork
min_servers 4
min_spare_servers 1
max_spare_servers 2
max_servers 12
user nagios
group nagios
background 1
setsid 1
reverse_lookups off
host 127.0.0.1
port 5669
timeout 120
# logging
log_file Log::Log4perl
log_level 2
log4perl_conf /usr/local/nagios/etc/Log4perl.conf
log4perl_logger nrd
pid_file /usr/local/nagios/var/nrd.pid
# access control
cidr_allow 127.0.0.0/8
encryption_method 2
serializer crypt
encrypt_type Blowfish
encrypt_key 40AB67AA-2942-11E0-8BDD-010783C5AAA9
writer resultdir
check_result_path /usr/local/nagios/var/spool/checkresults
batch_results 1
[nagios@opsview-core01-tn etc]$ more /usr/local/nagios/etc/Log4perl.conf
# NOTE: This file will be reverted on an upgrade
# NOTE: more info is held here:
# http://www.perl.com/pub/a/2002/09/11/log4perl.html
# Check below for any special behaviour with DEBUG levels
log4perl.rootLogger=INFO, OPSVIEWD_LOGFILE
# Overrides to specific components
#log4perl.logger.create_and_send_configs=DEBUG
#log4perl.logger.sendcmd2slaves=DEBUG
#log4perl.logger.opsviewd=DEBUG
# Setting import_ndologsd to DEBUG will also copy ndologs into var/ndologs.archive.
# Can take up to 30 seconds to acknowledge. Make sure you remember to revert back
#log4perl.logger.import_ndologsd=DEBUG
#log4perl.logger.import_perfdatarrd=DEBUG
#log4perl.logger.import_ndoconfigend=DEBUG
#log4perl.logger.ndoutils_configdumpend=DEBUG
#log4perl.logger.exec_and_log=DEBUG
#log4perl.logger.import_excel=DEBUG
# You will need to increase the logging at nrd.conf to get debug messages out
#log4perl.logger.nrd=DEBUG
log4perl.appender.OPSVIEWD_LOGFILE=Log::Dispatch::FileRotate
log4perl.appender.OPSVIEWD_LOGFILE.filename=/var/log/opsview/opsviewd.log
log4perl.appender.OPSVIEWD_LOGFILE.mode=append
log4perl.appender.OPSVIEWD_LOGFILE.size=1000000
log4perl.appender.OPSVIEWD_LOGFILE.max=5
log4perl.appender.OPSVIEWD_LOGFILE.recreate=1
log4perl.appender.OPSVIEWD_LOGFILE.layout=PatternLayout
log4perl.appender.OPSVIEWD_LOGFILE.layout.ConversionPattern=[%d] [%c] [%p] %m%n
# Default the SCREEN appender to output to STDERR
log4perl.appender.SCREEN=Log::Log4perl::Appender::Screen
log4perl.appender.SCREEN.layout=Log::Log4perl::Layout::SimpleLayout
[nagios@opsview-core01-tn etc]$
What can I do to troubleshoot and solve this issue?
How can I test the import manually?
Many thanks!
Simon
_______________________________________________
Opsview-users mailing list
http://lists.opsview.org/lists/listinfo/opsview-users