[Nagios-users] scheduled downtime Retention problem

2013-05-29 Thread Cedric Jeanneret
Hello,

I'm having some problem with the retention status with this setup:
nagios-3.4.1-3
Debian Wheezy

Problem:
I schedule downtime for either a host or a service, but it does not survive a 
reload (or restart) of the nagios3 service.

Steps:
- schedule some downtime (for one month, if this helps)
- wait for the icon to appear in the web interface
- restart (or reload) nagios3 service
- scheduled downtime is gone

Some more information:
I checked the retention.dat file; at the end it contains this block, which 
looks completely correct:
servicedowntime {
host_name=HOSTNAME
service_description=SERVICE
downtime_id=1
entry_time=1369832618
start_time=1369832609
end_time=1372518209
triggered_by=0
fixed=1
duration=2685600
is_in_effect=1
author=USER
comment=MY COMMENT
}
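As a sanity check, the retained downtime entries can be listed straight from the retention file; this is just a sketch, and the default path below is an assumption for a Debian nagios3 install:

```shell
# Print every servicedowntime block from the retention file, so the
# retained entries can be compared before and after a reload.
# The default path is an assumed Debian nagios3 location.
retention=${retention:-/var/lib/nagios3/retention.dat}
if [ -r "$retention" ]; then
    awk '/^servicedowntime \{/,/^\}/' "$retention"
fi
```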

There's also a servicecomment block containing a Nagios-generated comment - THIS 
block is shown and taken into account.
I tried to see whether nagios encounters any error while reading its retention 
file, but nothing showed up…

servicecomment {
host_name=HOSTNAME
service_description=SERVICE
entry_type=2
comment_id=4
source=0
persistent=0
entry_time=1369833542
expires=0
expire_time=0
author=(Nagios Process)
comment_data=This service has been scheduled for fixed downtime from 2013-05-29 
15:03:29 to 2013-06-29 17:03:29.  Notifications for the service will not be 
sent out during that time period.
}


Retention configuration:

grep retain nagios.cfg  | grep -v '#'
retain_state_information=1
use_retained_program_state=1
use_retained_scheduling_info=1
retained_host_attribute_mask=0
retained_service_attribute_mask=0
retained_process_host_attribute_mask=0
retained_process_service_attribute_mask=0
retained_contact_host_attribute_mask=0
retained_contact_service_attribute_mask=0

One more data point: acknowledgements are retained as expected…

Any idea?

Thanks a lot for your help!

Cheers,

C.

--
Introducing AppDynamics Lite, a free troubleshooting tool for Java/.NET
Get 100% visibility into your production application - at no cost.
Code-level diagnostics for performance bottlenecks with 2% overhead
Download for free and get started troubleshooting in minutes.
http://p.sf.net/sfu/appdyn_d2d_ap1
___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting 
any issue. 
::: Messages without supporting info will risk being sent to /dev/null

Re: [Nagios-users] scheduled downtime Retention problem

2013-05-29 Thread Cedric Jeanneret
Me again,

Of course, *after* having searched for a while and sent the mail, I 
stumbled upon the official nagios bug:
http://tracker.nagios.org/view.php?id=338

I will check on the Debian tracker whether they can integrate this patch…

Cheers,

C.

On Wed, 29 May 2013 15:23:18 +0200
Cedric Jeanneret nagios...@tengu.ch wrote:

 [quoted original message trimmed]


[Nagios-users] nagios.cmd does not seem to work properly

2010-05-06 Thread Cedric Jeanneret
Hello,

We'd like to use the nagios.cmd pipe to send external commands. According to the 
examples in the documentation, we can do something as simple as:

/bin/printf "[%lu] ACKNOWLEDGE_HOST_PROBLEM;host1;1;1;1;Some One;Some 
Acknowledgement Comment\n" $now > $commandfile

The problem is, it seems it's not taken into account - nothing in nagios.log, no 
message is printed in the shell, and moreover nagios doesn't seem to do anything.

I tested with an error inside the command, and it seems to be parsed:
[1273156151] Warning: Unrecognized external command - 
SCHEDULE_HOST_SVC_CHECKs;fqdn;1273156151

but nothing else happens.

Is there a way to be sure that nagios does what we ask?

The main things are:
- we're using the NSCA addon, so about 80 servers, each running their own 
nagios, send results and status to a single host
- we only tried running this command on the remote hosts - maybe we have to use 
it on the NSCA server?
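For what it's worth, the printf-and-redirect pattern can be wrapped in a tiny helper; the helper name and the pipe path below are illustrative assumptions:

```shell
# submit_cmd: prepend the current UNIX timestamp and append one external
# command line to the Nagios command file given as $1.
submit_cmd() {
    cmdfile=$1
    shift
    now=$(date +%s)
    printf '[%lu] %s\n' "$now" "$*" >> "$cmdfile"
}

# Demo against a temporary regular file; a real setup would point at the
# configured command_file, e.g. /var/lib/nagios3/rw/nagios.cmd (assumed path).
demo=$(mktemp)
submit_cmd "$demo" 'ACKNOWLEDGE_HOST_PROBLEM;host1;1;1;1;Some One;Some Acknowledgement Comment'
cat "$demo"
```

Note that the real nagios.cmd is a named pipe: if no running Nagios process has it open for reading, a write to it blocks instead of failing loudly, which can look exactly like a command being silently ignored.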

Any help is welcome.

Thank you in advance !

Best regards,

C.

-- 
Cédric Jeanneret |  System Administrator
021 619 10 32|  Camptocamp SA
cedric.jeanne...@camptocamp.com  |  PSE-A / EPFL

--
___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting 
any issue. 
::: Messages without supporting info will risk being sent to /dev/null

Re: [Nagios-users] NSCA strange behaviour

2009-12-11 Thread Cedric Jeanneret
Hello,

Problem solved - it was indeed a version problem (minor number): Red Hat 
released a new nsca client/server package, removing a patch, and that made it 
incompatible with the previous version.

See:
http://www.mail-archive.com/nagios-users@lists.sourceforge.net/msg19402.html 
and related patch:
http://cvs.fedoraproject.org/viewvc/EL-5/nsca/nsca-increase_max_plugin_output_length.patch?revision=1.1&view=markup&sortby=log

Best regards,

C.

On Tue, 8 Dec 2009 10:52:50 -0600
Marc Powell m...@ena.com wrote:

 
 On Dec 8, 2009, at 4:40 AM, Cedric Jeanneret wrote:
 
  - Enabling debug for nsca on server01 doesn't show me anything interesting. 
  I just don't see where nsca picks up client22's status, and it keeps 
  saying:
  Warning: The results of host 'client22.domain.lt' are stale by 0d 0h 2m 0s 
  (threshold=0d 0h 6m 0s).  I'm forcing an immediate check of the host.
  
  On another hand, it shows me:
  [1260267958.216051] [016.1] [pid=23191] Check results for service 'Cron 
  service' on host 'client22.domain.lt' are fresh.
 
 What do they show? What you've quoted here doesn't come from NSCA (that I can 
 tell from a simple grep). --
 
 nsca-2.7.2]# grep -r 'are fresh' *
 nsca-2.7.2]# 
 
 I'm sure that comes from nagios via nagios.log, not NSCA. Are you sure you're 
 looking in the right place? It's typically in /var/log/messages. 
 
 How are you running nsca? daemon mode or via inetd? If inetd, is inetd 
 rejecting the connection?
 
 --
 Marc
 
 


-- 
Cédric Jeanneret |  System Administrator
021 619 10 32|  Camptocamp SA
cedric.jeanne...@camptocamp.com  |  PSE-A / EPFL


--
Return on Information:
Google Enterprise Search pays you back
Get the facts.
http://p.sf.net/sfu/google-dev2dev
___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting 
any issue. 
::: Messages without supporting info will risk being sent to /dev/null

Re: [Nagios-users] NSCA strange behaviour

2009-12-09 Thread Cedric Jeanneret
Hello again

I've made some other tests:

- I changed the encryption algorithm on server01 and client22, then restarted 
nagios and nsca on server01 and nagios on client22.
- I let it run like that, checking the logs. As expected, nsca began to output 
lots of errors about the client version and/or encryption method and/or 
password. Nice.
- I put back the right encryption method on server01 but NOT on client22. I 
should have seen a pattern like "Received invalid packet", but nothing.

I forced some status updates from client22: 
for i in $(seq 1000); do 
  echo -n 'host  '; 
  /usr/local/bin/submit_ochp $(hostname -f) UP "Host is up"; 
  sleep 2; 
done

Nothing. TCPDump shows me traffic, in and out for both hosts...

As a last test, I stopped nagios and nsca on server01, removed the nsca.dump, 
objects.cache and retention.dat files, and started nsca and nagios again.

Leaving aside the fact that all my hosts are now pending, client22 doesn't push 
any status... nothing in the nsca logs.

Any other idea? It's like the nsca daemon ignores (without any output) client22's 
queries, and client22 doesn't know about it.

Best regards,

C.

-- 
Cédric Jeanneret |  System Administrator
021 619 10 32|  Camptocamp SA
cedric.jeanne...@camptocamp.com  |  PSE-A / EPFL



[Nagios-users] NSCA strange behaviour

2009-12-08 Thread Cedric Jeanneret
Hello,

I'm having trouble with NSCA.
What we have:

- about 47 passive hosts
- about 220 passive services

Versions: all are Red Hat servers, with:
- NSCA 2.7.2 (latest one)
- Nagios 3.1.2

We have a single nagios aggregator, which collects all NSCA statuses from the 
other hosts.

What's happening:
a host was reinstalled yesterday (say client22), and now the NSCA daemon on the 
aggregator (say server01) no longer seems to collect its data.

What I've done:

- tcpdump on both client22 and server01, both show me traffic between them, on 
NSCA default port (5667)

- checked the iptables rules; all is ok (tcpdump showing the traffic confirms 
it)

- tried to push status by hand from client22 to server01; ALL packets are 
reported as sent successfully ("1 data packet(s) sent to host successfully."). 
I did this with a loop like this:
for i in $(seq 1000); do /usr/local/bin/submit_ochp $(hostname -f) UP 'Host is 
up'; sleep 2; done

- Enabling debug for nsca on server01 doesn't show me anything interesting. I 
just don't see where nsca picks up client22's status, and it keeps saying:
Warning: The results of host 'client22.domain.lt' are stale by 0d 0h 2m 0s 
(threshold=0d 0h 6m 0s).  I'm forcing an immediate check of the host.

On the other hand, it shows me:
[1260267958.216051] [016.1] [pid=23191] Check results for service 'Cron 
service' on host 'client22.domain.lt' are fresh.
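(submit_ochp here is a site-local wrapper script. For readers without it, a minimal stand-in that builds the passive host-check payload send_nsca expects, i.e. tab-separated host name, return code and plugin output, might look like this; the send_nsca binary and config paths are assumptions:)

```shell
# Build a passive host-check result in the format the NSCA daemon expects:
#   <host_name>\t<return_code>\t<plugin_output>\n
# Return code 0 means UP. The send_nsca invocation is commented out
# because its binary and config paths vary per install.
host=$(hostname -f)
payload=$(printf '%s\t%d\t%s' "$host" 0 'Host is up')
printf '%s\n' "$payload"
# printf '%s\n' "$payload" | /usr/sbin/send_nsca -H server01 -c /etc/nagios/send_nsca.cfg
```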


I really don't know where to look for a solution, nor where the real problem 
is. We have another network with about 200 passive hosts and over 350 passive 
services, and it works fine.

The only differences are:
- the working network is debian-only
- the working network's NSCA server doesn't do anything other than act as the 
central nagios server. server01 does some other things, such as syslog and 
collectd... maybe there's a bottleneck there, but I can't be sure about that.

Does anyone have an idea?

Thank you in advance.

Best regards,

C.



-- 
Cédric Jeanneret |  System Administrator
021 619 10 32|  Camptocamp SA
cedric.jeanne...@camptocamp.com  |  PSE-A / EPFL



Re: [Nagios-users] NSCA strange behaviour

2009-12-08 Thread Cedric Jeanneret
Hello,

As far as I can see, only the new one - even though it was just reinstalled 
(client22 was in the nagios config before it was reinstalled).

Best regards,

C.

On Tue, 8 Dec 2009 08:08:03 -0600
Greg Pangrazio pangr...@gmail.com wrote:

 Do all of your clients fail, or just the new one?
 
 Greg Pangrazio
 pangr...@gmail.com
 
 
 
 
 On Tue, Dec 8, 2009 at 4:40 AM, Cedric Jeanneret
 cedric.jeanne...@camptocamp.com wrote:
  [quoted original message trimmed]


-- 
Cédric Jeanneret |  System Administrator
021 619 10 32|  Camptocamp SA
cedric.jeanne...@camptocamp.com  |  PSE-A / EPFL



Re: [Nagios-users] NSCA strange behaviour

2009-12-08 Thread Cedric Jeanneret
Hello again,

In fact, configuration files are deployed via 
puppet (http://reductivelabs.com/trac/puppet/wiki), so all the files are (or 
should be) the same. I'll check, but as puppet runs on every host, they should 
all have the same files.
IP addresses are the same (fixed IPs, fixed ports).

I'll check files once again.

If anyone has another idea...

Thank you.

Best regards,

C.

On Tue, 8 Dec 2009 08:31:59 -0600
Greg Pangrazio pangr...@gmail.com wrote:

 It sounds like there is something that changed with the re-install.
 
 Is the IP address of the system the same?
 
 Did you pick the same encryption type in the nsca config?
 
 Can you diff the nsca config with a working host?
 Greg Pangrazio
 pangr...@gmail.com
 
 
 
 
 
 On Tue, Dec 8, 2009 at 8:14 AM, Cedric Jeanneret
 cedric.jeanne...@camptocamp.com wrote:
  [quoted messages trimmed]


-- 
Cédric Jeanneret |  System Administrator
021 619 10 32|  Camptocamp SA
cedric.jeanne

Re: [Nagios-users] NSCA strange behaviour

2009-12-08 Thread Cedric Jeanneret
I just rsync-ed a complete config from a working host, then:
for file in $(grep -lr working-client *); do
  sed -i 's/working-client/client22/g' "$file"
done

... and it doesn't work any better. Moreover, puppet doesn't want to change 
anything...

I'm stuck... :(

On Tue, 8 Dec 2009 08:31:59 -0600
Greg Pangrazio pangr...@gmail.com wrote:

 [quoted messages trimmed]


-- 
Cédric Jeanneret |  System Administrator
021 619 10 32|  Camptocamp SA
cedric.jeanne...@camptocamp.com  |  PSE-A / EPFL



Re: [Nagios-users] NSCA strange behaviour

2009-12-08 Thread Cedric Jeanneret
Hello Marcel,

well, it's the same openssl version on server01 and client22... the working 
server has an earlier one, but it works there.

NSCA doesn't show me any encryption error (the encryption method and passphrase 
are correct on both ends) :/

Regards,

C.

On Tue, 8 Dec 2009 14:34:43 -0200
Marcel mits...@gmail.com wrote:

 check openssl versions and compatibility.
 
 On Tue, Dec 8, 2009 at 2:13 PM, Cedric Jeanneret 
 cedric.jeanne...@camptocamp.com wrote:
 
   [quoted messages trimmed]

Re: [Nagios-users] NSCA strange behaviour

2009-12-08 Thread Cedric Jeanneret
Hello Marc,

Indeed, the "are fresh" line comes from nagios.log. Ok, noted.

NSCA is running in daemon mode, the nsca port is open in iptables, and 
connections go through it (tcpdump shows them to me, in both directions).

Setting debug=1 in nsca.cfg doesn't seem to add anything to /var/log/messages 
(redhat server). I just see the down hosts passing through ("results for ... are 
stale - forcing immediate check...").

I set up debugging for nagios itself, but it really seems to be a problem at the 
NSCA level.

Regards,

C.

On Tue, 8 Dec 2009 10:52:50 -0600
Marc Powell m...@ena.com wrote:

 
 On Dec 8, 2009, at 4:40 AM, Cedric Jeanneret wrote:
 
  [quoted message trimmed]


-- 
Cédric Jeanneret |  System Administrator
021 619 10 32|  Camptocamp SA
cedric.jeanne...@camptocamp.com  |  PSE-A / EPFL

