>What happens when you have multiple clusters?  Each node needs to know
>which cluster name and multicast address to use.  In a huge
>organisation, it is not feasible for every node to join the same
>multicast group.  In some clusters, not all nodes are on a shared
>multicast segment.  So the default configuration is useful, but it is
>not a solution for everyone.

Right, I can't use the default at all.  My install is far too large and
complicated to configure by hand.  We have many clusters, and hosts move
from one to another over their lifetime.  I don't know if the options
mentioned so far would work for me, but here is what I did.

I wrote a Perl script which allocates ports for new clusters and grids
and auto-generates all gmond and gmetad config files on NFS.  If a
config file's content has not changed, it is not rewritten, so the
timestamp on each file reliably shows the time of last change.
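The rest of the scheme leans on that write-only-if-changed idiom.  A
minimal sketch of it in shell (the real script is Perl; the function
name and paths here are made up for illustration):

```shell
# Write generated config content (from stdin) to $1 only if it differs
# from what is already there, so the file's mtime records the last
# *real* change.  Example path: /nfs/ganglia/conf/cluster-a/gmond.conf
write_if_changed() {
    target=$1
    tmp=$(mktemp) || return 1
    cat > "$tmp"                      # new content arrives on stdin
    if cmp -s "$tmp" "$target" 2>/dev/null; then
        rm -f "$tmp"                  # unchanged: keep old file and mtime
    else
        mv "$tmp" "$target"           # changed (or new): replace it
    fi
}
```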

Clients can all auto-mount this NFS config area on demand.  Shortly
after the regular updates, they run a simple wrapper which looks on NFS
to discover which config file they should be running.  If that file is
newer than their local /etc/gmond.conf, they copy it down and restart
their local gmond; if nothing has changed, no restart occurs.
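The heart of that client wrapper is just a timestamp comparison.
Something like this sketch captures the idea (the NFS layout and the
restart command are placeholders, not my actual paths):

```shell
# Copy the generated gmond.conf from NFS and restart gmond, but only
# if the NFS copy is newer than the local one.  Returns 0 if a restart
# happened, 1 if there was nothing to do.
# Example call:
#   sync_gmond_conf /nfs/ganglia/conf/$(hostname -s)/gmond.conf \
#                   /etc/gmond.conf "/etc/init.d/gmond restart"
sync_gmond_conf() {
    nfs_conf=$1; local_conf=$2; restart_cmd=$3
    if [ -f "$nfs_conf" ] && [ "$nfs_conf" -nt "$local_conf" ]; then
        cp "$nfs_conf" "$local_conf"
        $restart_cmd
        return 0
    fi
    return 1          # no change: leave the running gmond alone
}
```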

Meanwhile, the server runs a similar wrapper around gmetad.  Again, it
restarts only the gmetad processes whose config files have changed.
The automation also gives each gmetad a directory in the RRD tmpfs and
a corresponding copy of the PHP frontend in the htdocs area, so they
all have a working authority URL.  I run several dozen gmetads on one
host to implement our big tree of grids.
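The restart-only-on-change loop over many gmetads can be sketched like
this (the one-conf-per-grid layout and the per-instance stamp files
are illustrative, and the restart command is injected so it could be
anything from an init script to pkill-plus-exec):

```shell
# For each grid's gmetad config under $1, run $3 on the config and
# refresh a stamp file under $2 -- but only when the config is newer
# than its stamp, i.e. has changed since the last restart.
restart_changed_gmetads() {
    conf_dir=$1; stamp_dir=$2; restart_cmd=$3
    for conf in "$conf_dir"/*.conf; do
        [ -f "$conf" ] || continue
        stamp=$stamp_dir/$(basename "$conf" .conf).stamp
        if [ ! -f "$stamp" ] || [ "$conf" -nt "$stamp" ]; then
            $restart_cmd "$conf"      # e.g. stop old gmetad, start new
            touch "$stamp"            # record that this conf is applied
        fi
    done
}
```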

The server can also automatically run mute gmond listeners for smaller
unicasting clusters, though most clusters multicast within their
subnets and are then rolled up by a gmetad.  The server also renames
host directories within the RRD area when hosts move from one cluster
to another, so host history is preserved across cluster moves.
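Since gmetad keeps its RRDs in per-cluster, per-host directories under
rrd_rootdir, the history-preserving rename boils down to moving the
host's directory between cluster directories.  Roughly (paths and
function name are hypothetical):

```shell
# Move a host's RRD directory from one cluster to another so its
# metric history survives the move.  Assumes the usual gmetad layout:
#   <rrd_root>/<cluster>/<host>/<metric>.rrd
move_host_rrds() {
    rrd_root=$1; host=$2; old_cluster=$3; new_cluster=$4
    src=$rrd_root/$old_cluster/$host
    dst=$rrd_root/$new_cluster/$host
    [ -d "$src" ] || return 1         # nothing recorded for this host
    mkdir -p "$rrd_root/$new_cluster"
    mv "$src" "$dst"
}
```

(In practice this wants to happen while the affected gmetad is stopped,
so it does not recreate the old directory mid-move.)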

I could see a web service providing the config files instead of NFS;
that could be more portable.  In fact, it could be hosted on the ganglia
server itself.  The timestamp checks could still work the same.

-twitham

_______________________________________________
Ganglia-developers mailing list
Ganglia-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-developers
