Hi,
this is just an attempt to sort a few things out.
I don't know much about Hadoop clusters, but I assume a cluster consists of
several cluster nodes.
Do you check free disk space on the master node only, or on each node
independently?
Is there one entry in hosts.cfg for the world-accessible IP/DNS name of the
Hadoop cluster, and a separate entry for each cluster node?
We use a small Linux web cluster with replicated MySQL databases and
web directories.
For replication we use DRBD, with Pacemaker as the resource manager.
We get alerts for the whole cluster and each cluster node.
So I use two different check_disk alerts. The first, check_linux_drbd0_disk,
checks the replicated volume. Its size and free disk space are the same on
every cluster node.
The second, check_linux_root_disk, checks the real HDD in each cluster node,
that is, the physical disk plugged into that node.
$HOSTADDRESS$:
For check_linux_drbd0_disk it is the active, world-accessible address, for
example www.example.com.
For check_linux_root_disk it is the internal address of each cluster node, for
example clusternode1.internal.com or clusternode2.internal.com.
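To illustrate the addresses above, a hosts.cfg along these lines gives each check its own $HOSTADDRESS$. This is only a sketch: the host_name values and aliases are made up for the example, and it assumes the "linux-server" host template from the Nagios sample configuration is available.

```cfg
# Sketch only: host names and aliases are hypothetical.
# Assumes the "linux-server" template from the sample templates.cfg.

# The active, world-accessible cluster address (used for the DRBD volume check)
define host{
    use        linux-server
    host_name  www.example.com
    alias      Web cluster, active address
    address    www.example.com
}

# One entry per physical cluster node (used for the root disk check)
define host{
    use        linux-server
    host_name  clusternode1
    alias      Cluster node 1
    address    clusternode1.internal.com
}

define host{
    use        linux-server
    host_name  clusternode2
    alias      Cluster node 2
    address    clusternode2.internal.com
}
```

Because every host has a different address, a check that runs through check_nrpe -H $HOSTADDRESS$ reports the disk of that particular machine only.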
The objects/commands.cfg:
define command{
    command_name check_linux_drbd0_disk
    command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -t 60 -p 5666 -n -c check_drbd0
}
define command{
    command_name check_linux_root_disk
    command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -t 60 -p 5666 -n -c check_sda1
}
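The two commands still have to be bound to hosts through service definitions. A minimal sketch of how that could look, assuming hosts named www.example.com, clusternode1 and clusternode2 exist in hosts.cfg (these names and the service descriptions are hypothetical) and that the "generic-service" template from the Nagios sample configuration is available:

```cfg
# Sketch only: host names and service descriptions are hypothetical.
# Assumes the "generic-service" template from the sample templates.cfg.

# Replicated DRBD volume: checked once, via the active cluster address
define service{
    use                  generic-service
    host_name            www.example.com
    service_description  DRBD volume disk space
    check_command        check_linux_drbd0_disk
}

# Physical root disk: checked separately on every cluster node
define service{
    use                  generic-service
    host_name            clusternode1,clusternode2
    service_description  Root disk space
    check_command        check_linux_root_disk
}
```

With this layout, a full root disk on one node should raise an alert for that node only, since each service instance queries NRPE on its own host.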
The /usr/local/nagios/etc/nrpe.cfg on each clusternode:
command[check_drbd0]=/usr/local/nagios/libexec/check_disk -w 15% -c 10% -p /dev/drbd0
command[check_sda1]=/usr/local/nagios/libexec/check_disk -w 15% -c 10% -p /dev/sda1
With this, we get alerts:
Running out of disk space for www.example.com
Running out of disk space for each clusternode
Regards,
Markus.
From: Help [mailto:[email protected]]
On behalf of Natva, Arun Kumar
Sent: Friday, 23 January 2015 23:47
To: [email protected]
Subject: help needed with nagios alert
Hi,
I am using nagios for alerting in our hadoop cluster.
When I set up a check_disk alert on all the nodes in the cluster, we get
emails for all the hosts even though only one of the nodes exceeds the disk
space threshold.
I tried multiple things, but I am unable to figure out why nagios sends alerts
for all hosts instead of just one host. Can you please help?
Regards,
Arun.