Hi Simone, I'll try to answer some of your questions; for the others I don't have answers...
1. Opsview Community 3.5.1 runs on Nagios 3.2. I don't know if the import tools work, as I never tried them. You should import the configuration into Opsview and then manage it from the GUI. You shouldn't make changes directly in the Nagios configuration, as they will all be overwritten on the next reload. Remember you have an integrated environment where everything you do should go through the GUI.

2. I currently have 296 hosts and 2746 services without breaking a sweat on the master (one quad-core Xeon 2.33 GHz with 4 GB RAM, 32-bit Debian etch).

3. a) True. They will send the NSCA results back over an SSH tunnel (see the sketch after this list for the general idea).
   b) Never tried it.

4. Never tried it.

5. The master shows up as if it were a normal slave.

6. Sure you can. You just have to make sure they print something when called with the -h option; a minimal skeleton follows below.

Hope this helps.
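To illustrate 3a: the general NSCA-over-SSH pattern looks roughly like the following. This is only a sketch of the idea, not necessarily Opsview's exact mechanism, and the host names and paths are made up.

# On the slave: keep a tunnel open that forwards the local NSCA port
# to the nsca daemon on the master (assumed to listen on 5667).
ssh -f -N -L 5667:localhost:5667 nagios@opsview-master

# Submit a passive check result through the tunnel.
# Fields are tab-separated: host, service, return code, plugin output.
printf "web01\tHTTP\t0\tOK - request took 0.12s\n" | \
    send_nsca -H localhost -p 5667 -c /etc/nagios/send_nsca.cfg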
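And for 6, a minimal skeleton of such a plugin might look like this; the ping is only a placeholder for your real check logic, and the script name is made up:

#!/bin/bash
# check_example.sh - minimal Nagios-style plugin skeleton.
# Nagios exit codes: 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN.

if [ "$1" = "-h" ]; then
    echo "check_example.sh - usage: check_example.sh -H <address>"
    exit 3
fi

while getopts "H:" opt; do
    case "$opt" in
        H) HOST="$OPTARG" ;;
        *) echo "UNKNOWN - bad arguments"; exit 3 ;;
    esac
done

if [ -z "$HOST" ]; then
    echo "UNKNOWN - no host given"
    exit 3
fi

# Placeholder check: is the host answering pings?
if ping -c 1 -W 2 "$HOST" > /dev/null 2>&1; then
    echo "OK - $HOST is reachable"
    exit 0
else
    echo "CRITICAL - $HOST is not reachable"
    exit 2
fi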
Luis Periquito
Operações
[email protected]

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Simone Felici
Sent: Friday, 19 February 2010 16:35
To: [email protected]
Subject: [opsview-users] Opsview: our next step for monitoring evolution?

Hello to all!

I've been working very well with Nagios since 2006. Meanwhile our company has increased the number of monitored devices day by day. Today our monitoring system is composed as follows: one cluster (2 nodes, Xeon, 2 GB RAM) in HA with heartbeat, where only one node is active at a time; the other is there in case the first one fails.

- CentOS 5.4
- Nagios 3.2.0 (Monarch as the web GUI for configuration)

It currently monitors 800 hosts and 2000 services, with the following stats:

Metric                 Min.      Max.       Average
Check Execution Time:  0.00 sec  15.05 sec  2.681 sec
Check Latency:         0.00 sec  12.09 sec  0.785 sec
Percent State Change:  0.00%     12.11%     0.18%

There are some distributed installations that report status back to the core via NSCA. All of this is set up manually, with only the (great) help of Monarch.

In the next months we need to merge in another big monitoring system, which will drive the numbers up a lot. In the end we may have to monitor 2000 hosts and 13,000 services. This could be a problem for my Xeon server and I'll need new hardware, so it could be a good moment to start using a distributed solution to reduce the load on any single machine. The native distributed solution (NSCA) could be good but has two big limitations:

1. the need to maintain duplicate configuration across the different Nagios installations;
2. if one distributed Nagios server goes down, all checks done by that remote server would be considered CRITICAL.

Googling, the first thing I found was DNX, but officially it DOESN'T support a distributed setup with some servers in a DMZ. That is, I have some Nagios slaves that sit behind a firewall and can reach ONLY the devices behind that firewall. The check results must be sent (via NSCA) to the master, which sends out SMS notifications and collects logs (for SLA), and those slaves must check ONLY the devices on their LAN. DNX (officially, and for now) doesn't support selectively assigning devices to individual slaves; checks are load-balanced across all of them.

And here comes Opsview Community Edition! It could help me extend my installation... I hope :)

I also have some questions; I hope someone can help me:

1. Is it compatible AND stable with Nagios 3.2.0? That is, can I import all the configuration AND ALL LOGS into the new system?

2. How many hosts/services could I manage from the central core with this solution? Some production scenario examples, in numbers, would help.

3. Is what I've understood from reading the documentation right? There is one master (or more, in heartbeat active/passive) that collects all the information and sends out notifications (mail, SMS via GSM modem, or whatever I like). It's possible to add two different types of slave servers:
a) single slave installations. Each single slave is handled as a separate datacenter. This is the solution that could be used to monitor devices not directly reachable from the master. Status is sent back to the master with NSCA.
b) multiple slaves in a cluster. They are handled, as above, as a separate datacenter, but the checks in this slave cluster are divided among all the slaves for load balancing as well as high availability. If one slave dies, the others take over the services/hosts to be monitored.

4. What happens if a slave fails? I mean a slave as in point 3a, i.e. a slave without a cluster.

5. Can the master do active checks too, or when there are slaves does it delegate all checks to the slaves?

6. Can I still use our custom plugins? They are bash scripts that perform checks based on Nagios macros (HOSTADDRESS, ...) and give back, as expected, a message and an exit code; very simple. An illustrative command definition follows below.
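For example (the names here are made up, just to show the pattern), one of these plugins is wired into Nagios with a command definition along these lines:

define command{
    command_name    check_custom_tcp
    command_line    $USER1$/check_custom_tcp.sh -H $HOSTADDRESS$ -p $ARG1$
}

A service then calls it as check_custom_tcp!8080 and Nagios expands the macros before running the script.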
I know I've asked maybe a lot, but based on these answers I'll start some tests. Thanks a lot for your help!

Warmest Regards,
Simon

_______________________________________________
Opsview-users mailing list
[email protected]
http://lists.opsview.org/lists/listinfo/opsview-users