Dan Moniz wrote:
Hi all,

I've been testing Ganglia for a while on a cluster of approximately 280 hosts. Approximately 260 of these are of one class -- data hosts -- and another 18-20 are of another class -- compute hosts.

I had numerous issues getting Ganglia to work reliably while it was configured to use multicast. While I'm unsure if these were directly caused by multicast, turning off multicast and moving to a unicast model has improved reliability considerably.

However, I'm still experiencing some issues for which I haven't determined explainable reasons, and would love any feedback anyone has. I have a deadline of this Friday to make a "go/no-go" decision on whether to use Ganglia as my cluster monitoring and reporting package. I would like to use Ganglia rather than another package (given the features Ganglia has and the time invested in it thus far), but only if I can be comfortable about its reliability: either by having explainable reasons for the aberrant behavior and/or for why I should be using a different configuration (which I can then take into account and work around), or by having these issues fixed, and preferably both over time.

1) When starting gmond on all the hosts and gmetad on the monitor host/head node (from a complete shutdown of gmond on all the hosts, a shutdown of gmetad on the monitor host/head node, and a purge of the RRDs on the gmetad host), gmond starts up fine but gmetad seems to lag behind in reporting even when all hosts are up and gmond is running. I've found that initiating a network connection to each of these hosts from the monitor host/head node (e.g. ssh, or simply netcat (nc) to the TCP port gmond listens on) prods gmetad into reporting for them. This seems odd.

i think this is directly related to a known bug in 3.0.0 (which will be fixed in 3.0.1 coming up in the next week or so).

the workaround for right now is to modify the file ./gmond/gmond.c
and recompile.  the change is simple.

at line 1643, change the line from

   return next;

to

   return next < now? now + 1 * APR_USEC_PER_SEC : next;

this basically says that if the next event is in the past, set it to happen 1 sec from now.
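
for context, here's the same idea as a tiny standalone function (the
function name and everything around the return are made up for
illustration -- only the return expression is the actual change):

   /* illustration only, not the real gmond.c context: "next" is the
    * time (in apr microseconds) that the next collection is scheduled
    * for; if that moment has already passed, push it one second into
    * the future so the scheduler never returns an event in the past. */
   #include <apr_time.h>

   static apr_time_t
   clamp_next_collection(apr_time_t next)
   {
      apr_time_t now = apr_time_now();   /* current time in usecs */
      return next < now ? now + 1 * APR_USEC_PER_SEC : next;
   }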

sorry for the hassle.  this issue will be resolved soon.


I would hope that gmetad would start aggregating stats for each host once gmond was back up and running. Having to make a network connection to each host in order to get gmetad to see them is bad, since if gmond were to go down on a host (e.g. because the host itself went down) and come back up, gmetad may not see it until something else connected to it. However, if gmetad doesn't see it come back up, the cluster software I'm using will mark it as down and will intentionally not spawn connections to it. Has anyone else encountered this or a similar problem?

there have been others.  :(

2) Early last week I noticed that three compute hosts stopped reporting in gmetad, though those hosts were physically up and alive on the network and gmond was running. Using nc on the gmond TCP port returned the usual XML feed. Stopping and then starting gmond on these hosts seemed to do the trick, but there is no clear reason why gmetad lost track of them in the first place.

not sure about why this happened but i'm pretty sure it relates to the bug above.


3) Load on the monitor host/head node seems higher than it should be; it hovers around 2.6 - 3.0. While other software is running on this host, shutting down gmetad results in the load falling back down to levels similar to the other compute hosts (the monitor host/head node is currently also a host in the Compute Hosts cluster). Also, in concert with the higher-than-expected load, ssh sessions to the monitor host/head node seem to take a long time to establish. Again, shutting down gmetad seems to alleviate these problems. Neither of these issues prevents work from being done or gmetad from working (in the current configuration), but the load does seem abnormally high and is something of an annoyance.

this is a common issue. gmetad is pretty disk intensive. one workaround is to have gmetad write to a ram-backed filesystem and periodically (via cron) save the data in ram to disk (for long-term storage between reboots). i'm sure there is a user on this list with some great scripts and experience to share on this.
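
a rough sketch of that setup (paths, sizes and the schedule here are
just examples -- the default rrd_rootdir is usually /var/lib/ganglia/rrds,
and the directory has to stay writable by the user gmetad runs as):

   /etc/fstab entry so the RRDs live on a RAM-backed filesystem:

      tmpfs  /var/lib/ganglia/rrds  tmpfs  size=512m,mode=0755  0  0

   crontab entry that copies the in-RAM RRDs to disk every 10 minutes:

      */10 * * * *  rsync -a /var/lib/ganglia/rrds/ /var/lib/ganglia/rrds.disk/

   on boot you'd rsync in the other direction (the disk copy back into
   the tmpfs) before starting gmetad, so history survives a reboot.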



4) Snippets of my current configuration are provided below. I can provide more information if needed. Host names and whatnot are changed, but the particulars are the same. Should I be doing something other than what I am doing below? Anything not specified is left as the default setting.

everything looks right to me. you don't need to specify a udp_recv_channel for gmonds that are not receiving data from other gmonds.
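
for example, a pure leaf host could keep just the globals, cluster and
udp_send_channel sections and drop udp_recv_channel entirely (and
tcp_accept_channel too, if gmetad never polls that host directly):

udp_send_channel {
  host = datahost2020   /* or headnode, for the compute hosts */
  port = 8649
}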

good luck!
-matt



gmetad.conf excerpt:
--------------------

data_source "Compute Hosts" headnode:8649
data_source "Data Hosts" datahost2020:8649
scalable off
gridname "Our Cluster"
authority "http://headnode/ganglia/";
all_trusted on



gmond.conf (for data nodes) excerpt:
------------------------------------

globals {
  setuid = yes
  user = nobody
  cleanup_threshold = 300 /*secs */
}

cluster {
  name = "Data Hosts"
  owner = "Company"
  url = "http://www.example.com/"
}

udp_send_channel {
  host = datahost2020
  port = 8649
}

/* [ We set udp_recv_channel to be 8649 mostly just so
 *   that the host specified in udp_send_channel above
 *   (datahost2020 for the Data Hosts) can receive XDR
 *   from other Data Hosts. ]
 */

udp_recv_channel {
  port = 8649
}

/* [ "timeout = -1" turns on blocking I/O, which should
 *    alleviate XML corruption issues. ]
 */

tcp_accept_channel {
  port = 8649
  timeout = -1
}


The gmond.conf excerpt for my compute hosts is exactly the same as the one shown above for the data hosts, except for the following change, which just specifies the host for the compute hosts to report to; this is the monitor host/head node (i.e. it's also running gmetad and the web frontend):

udp_send_channel {
  host = headnode
  port = 8649
}


One thought I had was to add another layer of gmetad reporting: put a host running gmetad dedicated to the data hosts, another dedicated to the compute hosts, and then *another* independent machine running gmetad which polls both of those cluster-specific gmetad aggregators (a sketch of what the top-level gmetad.conf might look like follows below). This seems like it shouldn't be necessary, though.
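
For example, assuming hypothetical host names for the two mid-level aggregators, and assuming each keeps gmetad's default xml_port of 8651, the top-level gmetad.conf would be roughly:

data_source "Compute Hosts" compute-gmetad:8651
data_source "Data Hosts" data-gmetad:8651
gridname "Our Cluster"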

This is a lot for anyone to read, so if you've gotten this far, thanks for reading! If anyone has any feedback, I'd love to hear from you. I'd really like to make a "go" decision on Ganglia if I can figure out what's causing these issues and work on solutions or workarounds that still let me benefit from the rest of Ganglia's functionality.

Again, thanks in advance!



--
PGP fingerprint 'A7C2 3C2F 8445 AD3C 135E F40B 242A 5984 ACBC 91D3'

   They that can give up essential liberty to obtain a little
      temporary safety deserve neither liberty nor safety.
  --Benjamin Franklin, Historical Review of Pennsylvania, 1759
