Re: [Nagios-users] scheduled downtime Retention problem

2013-05-29 Thread Cedric Jeanneret
Me again,

Of course, *after* having searched for a while and then sent the mail, I 
stumbled upon the official Nagios bug:
http://tracker.nagios.org/view.php?id=338

I will check on the Debian tracker whether they can integrate this patch…

Cheers,

C.


--
Introducing AppDynamics Lite, a free troubleshooting tool for Java/.NET
Get 100% visibility into your production application - at no cost.
Code-level diagnostics for performance bottlenecks with <2% overhead
Download for free and get started troubleshooting in minutes.
http://p.sf.net/sfu/appdyn_d2d_ap1
___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting 
any issue. 
::: Messages without supporting info will risk being sent to /dev/null

[Nagios-users] scheduled downtime Retention problem

2013-05-29 Thread Cedric Jeanneret
Hello,

I'm having a problem with state retention on this setup:
nagios-3.4.1-3
Debian Wheezy

Problem:
I schedule some downtime for either a host or a service, but it is gone 
after a reload (or a restart) of the nagios3 service.

Steps:
- schedule some downtime (for one month, if this helps)
- wait for the icon to appear in the web interface
- restart (or reload) nagios3 service
- scheduled downtime is gone

Some more information:
I checked the retention.dat file; at the end it contains this block, which 
seems completely correct:
servicedowntime {
host_name=HOSTNAME
service_description=SERVICE
downtime_id=1
entry_time=1369832618
start_time=1369832609
end_time=1372518209
triggered_by=0
fixed=1
duration=2685600
is_in_effect=1
author=USER
comment=MY COMMENT
}
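
The values in the block above can be cross-checked with a short shell sketch. It is only an illustration: the helper names `list_downtimes` and `check_duration` are hypothetical, and the block layout is assumed to match the excerpt shown here (a `hostdowntime {` or `servicedowntime {` line, key=value lines, then a closing `}`).

```shell
#!/bin/sh
# Hypothetical helper: print every downtime block found in a retention file.
# Assumes each block opens with "hostdowntime {" or "servicedowntime {"
# and closes with a line holding only "}".
list_downtimes() {
    awk '($1 == "hostdowntime" || $1 == "servicedowntime") && $2 == "{" { inblock = 1 }
         inblock { print }
         inblock && $1 == "}" { inblock = 0 }' "$1"
}

# Hypothetical helper: succeed when duration equals end_time - start_time.
check_duration() {
    start=$1
    end=$2
    duration=$3
    [ $((end - start)) -eq "$duration" ]
}

# Values taken from the servicedowntime block above.
check_duration 1369832609 1372518209 2685600 && echo "duration is consistent"
```

Running `list_downtimes` on a copy of retention.dat taken before and after the restart would show whether the block is still in the file once Nagios has started again.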

There's also a servicecomment block containing the matching Nagios comment. THIS 
block is shown and taken into account.
I tried to see whether Nagios encounters an error while reading its retention 
file, but nothing showed up…

servicecomment {
host_name=HOSTNAME
service_description=SERVICE
entry_type=2
comment_id=4
source=0
persistent=0
entry_time=1369833542
expires=0
expire_time=0
author=(Nagios Process)
comment_data=This service has been scheduled for fixed downtime from 2013-05-29 15:03:29 to 2013-06-29 17:03:29.  Notifications for the service will not be sent out during that time period.
}


Retention configuration:

grep retain nagios.cfg  | grep -v '#'
retain_state_information=1
use_retained_program_state=1
use_retained_scheduling_info=1
retained_host_attribute_mask=0
retained_service_attribute_mask=0
retained_process_host_attribute_mask=0
retained_process_service_attribute_mask=0
retained_contact_host_attribute_mask=0
retained_contact_service_attribute_mask=0

One more piece of information: acknowledgements work as expected…

Any ideas?

Thanks a lot for your help!

Cheers,

C.


[Nagios-users] nagios.cmd does not seem to work properly

2010-05-06 Thread Cedric Jeanneret
Hello,

We'd like to use the nagios.cmd pipe to send some external commands. According 
to the examples in the documentation, we can do something as simple as this:

/bin/printf "[%lu] ACKNOWLEDGE_HOST_PROBLEM;host1;1;1;1;Some One;Some Acknowledgement Comment\n" $now > $commandfile
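
The one-liner above relies on `$now` and `$commandfile` being set beforehand. A minimal sketch of a wrapper that sets them could look like this. The default path is an assumption (use the `command_file` value from nagios.cfg), and the function names `format_command` and `submit_command` are hypothetical.

```shell
#!/bin/sh
# Assumed default location of the command pipe; replace it with the
# command_file value from your own nagios.cfg.
commandfile=${commandfile:-/var/lib/nagios3/rw/nagios.cmd}

# Build one external command line: "[timestamp] COMMAND;arg1;arg2;..."
format_command() {
    now=$(date +%s)
    printf '[%s] %s\n' "$now" "$1"
}

# Write the command only if the target really is a named pipe, so a typo in
# the path does not silently create a regular file that Nagios never reads.
submit_command() {
    if [ ! -p "$commandfile" ]; then
        echo "not a named pipe: $commandfile" >&2
        return 1
    fi
    format_command "$1" > "$commandfile"
}
```

It would be called as `submit_command "ACKNOWLEDGE_HOST_PROBLEM;host1;1;1;1;Some One;Some Acknowledgement Comment"`. The command is read by the Nagios instance that owns that particular pipe, so it has to be run on the host whose Nagios should act on it.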

The problem is that it doesn't seem to be taken into account: nothing is logged 
in nagios.log, no message is printed in the shell, and moreover Nagios doesn't 
seem to do anything.

I tested with an error inside the command, and it does seem to be parsed:
[1273156151] Warning: Unrecognized external command -> SCHEDULE_HOST_SVC_CHECKs;fqdn;1273156151

but nothing else happens.

Is there a way to be sure that Nagios does what we ask?

Some context:
- we're using the NSCA plugin: about 80 servers each run their own Nagios and 
send results and status to a single host
- we tried to run this command only on the remote hosts; maybe we have to run it 
on the NSCA server?

Any help is welcome.

Thank you in advance!

Best regards,

C.

-- 
Cédric Jeanneret |  System Administrator
021 619 10 32|  Camptocamp SA
cedric.jeanne...@camptocamp.com  |  PSE-A / EPFL


Re: [Nagios-users] NSCA strange behaviour

2009-12-11 Thread Cedric Jeanneret
Hello,

Problem solved: it was indeed a version problem (minor number). Red Hat has 
released a new NSCA client/server package that removes a patch, which makes it 
incompatible with the previous version.

See:
http://www.mail-archive.com/nagios-users@lists.sourceforge.net/msg19402.html 
and related patch:
http://cvs.fedoraproject.org/viewvc/EL-5/nsca/nsca-increase_max_plugin_output_length.patch?revision=1.1&view=markup&sortby=log

Best regards,

C.

On Tue, 8 Dec 2009 10:52:50 -0600
Marc Powell  wrote:

> 
> On Dec 8, 2009, at 4:40 AM, Cedric Jeanneret wrote:
> 
> > - Enabling debug for nsca on server01 doesn't show me anything interesting. 
> > I just don't see where nsca picks up client22's status, and it keeps on 
> > saying:
> > Warning: The results of host 'client22.domain.lt' are stale by 0d 0h 2m 0s 
> > (threshold=0d 0h 6m 0s).  I'm forcing an immediate check of the host.
> > 
> > On the other hand, it shows me:
> > [1260267958.216051] [016.1] [pid=23191] Check results for service 'Cron 
> > service' on host 'client22.domain.lt' are fresh.
> 
> What do they show? What you've quoted here doesn't come from NSCA (that I can 
> tell from a simple grep). --
> 
> nsca-2.7.2]# grep -r 'are fresh' *
> nsca-2.7.2]# 
> 
> I'm sure that comes from nagios via nagios.log, not NSCA. Are you sure you're 
> looking in the right place? It's typically in /var/log/messages. 
> 
> How are you running nsca? daemon mode or via inetd? If inetd, is inetd 
> rejecting the connection?
> 
> --
> Marc
> 
> 



Re: [Nagios-users] NSCA strange behaviour

2009-12-09 Thread Cedric Jeanneret
Hello again

I've run some more tests:

- I changed the encryption algorithm on server01 and client22, restarted nagios 
and nsca on server01, and restarted nagios on client22.
- I let it run like that and watched the logs. As expected, nsca began to 
output a lot of errors about client version and/or encryption method and/or 
password. Nice.
- I put back the right encryption method on server01 but NOT on client22. I 
should have seen a pattern like "Received invalid packet", but there was nothing.

I forced some status updates from client22: 
for i in $(seq 1000); do 
  echo -n 'host  '; 
  /usr/local/bin/submit_ochp $(hostname -f) "UP" "Host is up"; 
  sleep 2; 
done

Nothing. tcpdump shows traffic, in and out, on both hosts...

As a last test, I stopped nagios and nsca on server01, removed the nsca.dump, 
objects.cache and retention.dat files, and started nsca and nagios again.

Leaving aside the fact that all my hosts are now "pending", client22 still 
doesn't push any status... nothing in the nsca logs.

Any other ideas? It's as if the nsca daemon ignores client22's submissions 
(without any output), and client22 doesn't know about it.

Best regards,

C.


Re: [Nagios-users] NSCA strange behaviour

2009-12-08 Thread Cedric Jeanneret
Hello Marc,

Indeed, the "are fresh" message comes from nagios.log. OK, done.

NSCA is running in daemon mode, iptables is open for the nsca port, and 
connections can go through it (tcpdump shows this, in both directions).

Setting "debug=1" in nsca.cfg doesn't seem to add anything to /var/log/messages 
(Red Hat server). I just see "down" hosts passing through (results for ... are 
stale - forcing immediate check...).

I enabled debug for nagios itself, but it really seems to be a problem at the 
NSCA level.

Regards,

C.

On Tue, 8 Dec 2009 10:52:50 -0600
Marc Powell  wrote:

> 
> On Dec 8, 2009, at 4:40 AM, Cedric Jeanneret wrote:
> 
> > - Enabling debug for nsca on server01 doesn't show me anything interesting. 
> > I just don't see where nsca picks up client22's status, and it keeps on 
> > saying:
> > Warning: The results of host 'client22.domain.lt' are stale by 0d 0h 2m 0s 
> > (threshold=0d 0h 6m 0s).  I'm forcing an immediate check of the host.
> > 
> > On the other hand, it shows me:
> > [1260267958.216051] [016.1] [pid=23191] Check results for service 'Cron 
> > service' on host 'client22.domain.lt' are fresh.
> 
> What do they show? What you've quoted here doesn't come from NSCA (that I can 
> tell from a simple grep). --
> 
> nsca-2.7.2]# grep -r 'are fresh' *
> nsca-2.7.2]# 
> 
> I'm sure that comes from nagios via nagios.log, not NSCA. Are you sure you're 
> looking in the right place? It's typically in /var/log/messages. 
> 
> How are you running nsca? daemon mode or via inetd? If inetd, is inetd 
> rejecting the connection?
> 
> --
> Marc
> 
> 



Re: [Nagios-users] NSCA strange behaviour

2009-12-08 Thread Cedric Jeanneret
Hello Marcel,

Well, the version is the same on server01 and client22... working-server has an 
earlier one, but it works on that one.

NSCA doesn't show me any encryption error (encryption method and passphrase are 
correct on both ends) :/

Regards,

C.

On Tue, 8 Dec 2009 14:34:43 -0200
Marcel  wrote:

> check openssl versions and compatibility.
> 

Re: [Nagios-users] NSCA strange behaviour

2009-12-08 Thread Cedric Jeanneret
I just rsync-ed a complete config from a working host, then:
for file in $(grep -lr working-client *); do
  sed -i 's/working-client/client22/g' "$file"
done

... and it doesn't work any better. Moreover, puppet doesn't want to change 
anything...

I'm stuck... :(

On Tue, 8 Dec 2009 08:31:59 -0600
Greg Pangrazio  wrote:

> It sounds like there is something that changed with the re-install.
> 
> Is the IP address of the system the same?
> 
> Did you pick the same encryption type in the nsca config?
> 
> Can you diff the nsca config with a working host?
> Greg Pangrazio
> pangr...@gmail.com

Re: [Nagios-users] NSCA strange behaviour

2009-12-08 Thread Cedric Jeanneret
Hello again,

In fact, configuration files are deployed via Puppet 
(http://reductivelabs.com/trac/puppet/wiki), so all files are (or should be) 
the same. I'll check, but as Puppet runs on every host, they should all have 
the same files.
IP addresses are the same (fixed IP, fixed ports).

I'll check files once again.

If anyone has another idea...

Thank you.

Best regards,

C.

On Tue, 8 Dec 2009 08:31:59 -0600
Greg Pangrazio  wrote:

> It sounds like there is something that changed with the re-install.
> 
> Is the IP address of the system the same?
> 
> Did you pick the same encryption type in the nsca config?
> 
> Can you diff the nsca config with a working host?
> Greg Pangrazio
> pangr...@gmail.com

Re: [Nagios-users] NSCA strange behaviour

2009-12-08 Thread Cedric Jeanneret
Hello,

As far as I can see, only the new one fails, even though it was just 
reinstalled (client22 was already in the nagios config before the reinstall).

Best regards,

C.

On Tue, 8 Dec 2009 08:08:03 -0600
Greg Pangrazio  wrote:

> Do all of your clients fail, or just the new one?
> 
> Greg Pangrazio
> pangr...@gmail.com



[Nagios-users] NSCA strange behaviour

2009-12-08 Thread Cedric Jeanneret
Hello,

I'm having trouble with NSCA.
What we have:

- about 47 passive hosts
- about 220 passive services

Versions: all are Red Hat servers, with:
- NSCA 2.7.2 (latest one)
- Nagios 3.1.2

We have a single "nagios aggregator", which collects all NSCA status from the 
other hosts.

What's happening:
a host (say client22) was reinstalled yesterday, and now the NSCA daemon on the 
aggregator (say server01) doesn't seem to collect its data.

What I've done:

- tcpdump on both client22 and server01: both show traffic between them, on 
the NSCA default port (5667)

- checked iptables rules, all is OK (tcpdump showing traffic confirms it)

- tried to push status by hand from client22 to server01; ALL packets are sent 
successfully ("1 data packet(s) sent to host successfully."). I did this with a 
loop like this:
for i in $(seq 1000); do /usr/local/bin/submit_ochp $(hostname -f) UP 'Host is up'; sleep 2; done

- enabling debug for nsca on server01 doesn't show me anything interesting. I 
just don't see where nsca picks up client22's status, and it keeps on saying:
Warning: The results of host 'client22.domain.lt' are stale by 0d 0h 2m 0s (threshold=0d 0h 6m 0s).  I'm forcing an immediate check of the host.

On the other hand, it shows me:
[1260267958.216051] [016.1] [pid=23191] Check results for service 'Cron service' on host 'client22.domain.lt' are fresh.
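
To rule out the local submit_ochp wrapper when pushing results by hand, the passive host result can be built directly. The sketch below assumes the default tab-separated input format of send_nsca for host checks (host name, return code, plugin output); the function names `format_host_result` and `state_to_code` are hypothetical and are not part of NSCA.

```shell
#!/bin/sh
# Format one passive host check result in the tab-separated form read by
# send_nsca: <host_name><TAB><return_code><TAB><plugin_output>
format_host_result() {
    printf '%s\t%s\t%s\n' "$1" "$2" "$3"
}

# Map a host state name to its numeric return code
# (0 = UP, 1 = DOWN, 2 = UNREACHABLE).
state_to_code() {
    case "$1" in
        UP) echo 0 ;;
        DOWN) echo 1 ;;
        UNREACHABLE) echo 2 ;;
        *) return 1 ;;
    esac
}
```

A manual submission would then be `format_host_result "$(hostname -f)" "$(state_to_code UP)" "Host is up" | send_nsca -H server01 -c /etc/nagios/send_nsca.cfg`, where the config path is an assumption to adapt to the local install.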


I really don't know where to look for a solution, nor where the real problem 
is. We have another network with about 200 passive hosts and over 350 passive 
services, and it works fine.

The only differences are:
- the working network is Debian-only
- the working network's NSCA server does nothing other than act as the central 
Nagios server. server01 does some other things, like syslog server and collectd 
server... maybe there's a bottleneck there, but I can't be sure about that.

Does anyone have an idea?

Thank you in advance.

Best regards,

C.


