[Ganglia-developers] 2.6.0 snapshot/tour

Matt Massie Thu, 09 Dec 2004 12:28:36 -0800

guys-

i just uploaded a new 2.6.0 snapshot to
http://matt-massie.com/ganglia/ganglia-monitor-core-2.6.0.200412091116.tar.gz

if you look at this snapshot you'll see that gmetad is completely unchanged. all changes currently are focused on gmond.


configuration file

the 2.6.0 configuration file has a completely different format from 2.5.x. dotconf just doesn't cut it anymore since it wasn't written to handle hierarchical information. if you look in ./srclib/confuse you'll see the new configuration file parser. libconfuse is not only more powerful but it is also simpler to use.


all gmond configuration options are found in ./gmond/conf.h.

you'll also see that in ./gmond/conf.h there is a definition for the default configuration. this allows people to deploy gmond without the need to distribute configuration files (although if the default needs to be changed then they will have to compile from source).


to see the default gmond configuration, run:

% ./gmond -t
behavior {
  setuid = no
  user = nobody
}
udp_send_channel {
  ip   = 127.0.0.1
  port = 8649
}
udp_recv_channel {
  port = 8649
}
collection_group {
  name = "cpu_stat"
  metric {
    name = "cpu_user"
    absolute_minimum = 0
    absolute_maximum = 100
  }
  metric {
    name = "cpu_sys"
    absolute_minimum = 0
    absolute_maximum = 100
  }
  metric {
    name = "cpu_idle"
    absolute_minimum = 0
    absolute_maximum = 100
  }
  metric {
    name = "cpu_nice"
    absolute_minimum = 0
    absolute_maximum = 100
  }
}

this is just a simple configuration that i'm using to test gmond. gmond is not functional now. you can define as many collection groups and io channels as your memory/file handle limits will allow.

my main goal with the 2.6.0 release is to simplify the code while adding more features. i really want to lower the barrier to contributing to the ganglia source. you'll see that right now all code for gmond is in ./gmond/gmond.c. the code is simple, clean and commented. in the future we'll probably move some of the code out of gmond.c but it's all in one place right now.

the communication protocol is simple and well defined and found in ./lib/protocol.x (i figured out how to shrink the size of the udp messages but the protocol is still not complete; more on this later).

for developers who are interested, you can open ./gmond/gmond.c for a quick tour... i'll focus on the parts of 2.6.0 that are better than 2.5.x.

when gmond starts it saves its start time (just as before <HOSTNAME GMOND_STARTED="..."/>).

this timestamp is sent in each UDP message header (as a "source instance" number) along with a collection group index number.

for example, say a gmond is started at timestamp 1102592895 and has four collection groups (say for "cpu", "disk", "memory" and "load"). each collection group can collect/send any number of metrics as a group (e.g. "cpu" could collect cpu_user/nice/system/idle).


it would send messages with timestamp and collection group index numbers
like the following:

1102592895    0
1102592895    2
1102592895    3
1102592895    0   /* group 0 data resent */

the receiving gmond doesn't care at all how the remote gmond indexes its data. if it gets a message for a group index it already has, it will just overwrite the old data with the new data.

the timestamp allows the receiving gmond to know when to flush the remote host's metric data. if the timestamp changes, then the receiving gmond knows that the remote gmond was rebooted (because god knows gmond never crashes :)). the receiving gmond will then flush the old metric data (since it won't assume the collection group index numbers match e.g. the remote gmond was started with a completely new collection group list).
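the overwrite and flush rules above can be sketched roughly as follows. this is only an illustration, not the code in ./gmond/gmond.c: the struct names, field names, the fixed-size table and the MAX_GROUPS/MAX_PAYLOAD limits are all assumptions made for the example.

```c
#include <stdint.h>
#include <string.h>

#define MAX_GROUPS  64   /* hypothetical limit for this sketch */
#define MAX_PAYLOAD 256  /* hypothetical limit for this sketch */

/* the two header fields described in the text; the names are assumptions */
struct msg_header {
    uint32_t source_instance;  /* start time of the sending gmond */
    uint32_t group_index;      /* collection group index on the sender */
};

struct group_slot {
    int    in_use;
    size_t len;
    char   data[MAX_PAYLOAD];
};

struct remote_host {
    int               seen;             /* 0 until the first message arrives */
    uint32_t          source_instance;  /* instance the stored data belongs to */
    struct group_slot groups[MAX_GROUPS];
};

/*
 * store one message for a remote host.
 * returns 1 if old data was flushed because the sender's start time changed,
 * 0 if the message was stored without a flush, -1 if it does not fit.
 */
int host_store(struct remote_host *h, const struct msg_header *hdr,
               const void *payload, size_t len)
{
    int flushed = 0;
    struct group_slot *slot;

    if (hdr->group_index >= MAX_GROUPS || len > MAX_PAYLOAD)
        return -1;

    /* a new start time means the old group indexes can't be trusted */
    if (h->seen && h->source_instance != hdr->source_instance) {
        memset(h->groups, 0, sizeof h->groups);
        flushed = 1;
    }
    h->seen = 1;
    h->source_instance = hdr->source_instance;

    /* same index as before: simply overwrite the old data */
    slot = &h->groups[hdr->group_index];
    slot->in_use = 1;
    slot->len = len;
    memcpy(slot->data, payload, len);
    return flushed;
}
```

note that the receiver never interprets the index; it is just a key chosen by the sender, which is why transient data could use any number it likes.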

the index also allows for transient data (e.g. processes) to easily be saved (we might even be able to use the process id as the message index number).

this snapshot also has a working ACL for the UDP channel (although it needs to be worked on a bit). you specify an "allow_ip" and "allow_mask" for any io channel and gmond will ignore data that isn't from the subnet/host specified.
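a subnet/host check of this kind usually boils down to one masked comparison. the sketch below is not taken from the snapshot: the function name acl_allows is hypothetical, it assumes "allow_mask" is written as a dotted-quad netmask, and it only handles IPv4.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/*
 * returns 1 if src_ip falls inside allow_ip/allow_mask, 0 if it does not,
 * and -1 if any of the three strings is not a valid IPv4 dotted quad.
 */
int acl_allows(const char *allow_ip, const char *allow_mask, const char *src_ip)
{
    struct in_addr ip, mask, src;

    if (inet_pton(AF_INET, allow_ip, &ip) != 1 ||
        inet_pton(AF_INET, allow_mask, &mask) != 1 ||
        inet_pton(AF_INET, src_ip, &src) != 1)
        return -1;

    /* all three values are in network byte order, so a bitwise AND is safe */
    return (src.s_addr & mask.s_addr) == (ip.s_addr & mask.s_addr);
}
```

a mask of 255.255.255.255 restricts the channel to a single host, while 255.255.255.0 admits a whole /24.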

if you look in ./lib/protocol.x you'll see that 2.6.0 has a data state enum that explains why the data was sent.


enum gangliaDataState {
  GANGLIA_METRIC_CONSTANT,               /* slope == "zero" */
  GANGLIA_METRIC_TIME_THRESHOLD,         /* slope != "zero" here down */
  GANGLIA_METRIC_VALUE_THRESHOLD,
  GANGLIA_METRIC_WARNING,
  GANGLIA_METRIC_ALERT
};

this enum kills two birds with one stone. we don't need an explicit SLOPE value now (in the past we only used "both" and "zero" meaning the value was volatile or constant).

if a message is sent and the state is GANGLIA_METRIC_CONSTANT, the receiving gmond knows that the value is not volatile while the remote gmond is running (if a reboot of the remote machine occurs this constant can change e.g. a new CPU was added).

if the data state is GANGLIA_METRIC_TIME_THRESHOLD then the receiving gmond knows that the data was sent because a time threshold was passed. a GANGLIA_METRIC_VALUE_THRESHOLD means there is significant movement in value relative to the last message. these are relative states. these relative state announcements would allow gmond to save round-robin database information about metric values very efficiently... (ignore GANGLIA_METRIC_CONSTANT messages, write to disk on GANGLIA_METRIC_VALUE_THRESHOLD/WARNING/ALERT for example).
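to make the receiving side concrete, here is a small sketch of how the state could replace the old SLOPE value and drive the "write to disk" decision. the enum is copied from protocol.x as quoted above; the two function names are hypothetical, and skipping the write for GANGLIA_METRIC_TIME_THRESHOLD is an assumption (the example policy above only names the other states).

```c
enum gangliaDataState {
    GANGLIA_METRIC_CONSTANT,          /* slope == "zero" */
    GANGLIA_METRIC_TIME_THRESHOLD,    /* slope != "zero" here down */
    GANGLIA_METRIC_VALUE_THRESHOLD,
    GANGLIA_METRIC_WARNING,
    GANGLIA_METRIC_ALERT
};

/* recover the old 2.5.x slope string from the data state */
const char *slope_for_state(enum gangliaDataState s)
{
    return s == GANGLIA_METRIC_CONSTANT ? "zero" : "both";
}

/* example policy only: returns 1 if the value should be written to disk */
int should_write_rrd(enum gangliaDataState s)
{
    switch (s) {
    case GANGLIA_METRIC_VALUE_THRESHOLD:
    case GANGLIA_METRIC_WARNING:
    case GANGLIA_METRIC_ALERT:
        return 1;
    case GANGLIA_METRIC_CONSTANT:
    case GANGLIA_METRIC_TIME_THRESHOLD:  /* assumption: value barely moved */
    default:
        return 0;
    }
}
```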

GANGLIA_METRIC_WARNING and GANGLIA_METRIC_ALERT are absolute value states that denote a value pushing near/over defined limits.
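on the sending side, picking a state for a non-constant metric might look like the sketch below. none of this comes from the snapshot: the struct metric_limits, the function classify_sample, the precedence order (alert, warning, value threshold, time threshold) and the use of ">=" against the limits are all assumptions made for illustration. GANGLIA_METRIC_CONSTANT is left out because constants don't go through threshold checks.

```c
enum gangliaDataState {
    GANGLIA_METRIC_CONSTANT,
    GANGLIA_METRIC_TIME_THRESHOLD,
    GANGLIA_METRIC_VALUE_THRESHOLD,
    GANGLIA_METRIC_WARNING,
    GANGLIA_METRIC_ALERT
};

/* hypothetical per-metric limits for this sketch */
struct metric_limits {
    double value_threshold;  /* minimum change that counts as movement */
    double warning;          /* at or above this the value is near its limit */
    double alert;            /* at or above this the value is over its limit */
};

/*
 * decide whether a sample should be sent and why.
 * returns 1 and sets *state if a message should go out, 0 otherwise.
 */
int classify_sample(double value, double last_sent,
                    unsigned secs_since_send, unsigned time_threshold,
                    const struct metric_limits *lim,
                    enum gangliaDataState *state)
{
    double delta = value > last_sent ? value - last_sent : last_sent - value;

    /* absolute states first: they describe the value itself */
    if (value >= lim->alert)   { *state = GANGLIA_METRIC_ALERT;   return 1; }
    if (value >= lim->warning) { *state = GANGLIA_METRIC_WARNING; return 1; }

    /* relative states: movement since the last message, then elapsed time */
    if (delta >= lim->value_threshold) {
        *state = GANGLIA_METRIC_VALUE_THRESHOLD;
        return 1;
    }
    if (secs_since_send >= time_threshold) {
        *state = GANGLIA_METRIC_TIME_THRESHOLD;
        return 1;
    }
    return 0;
}
```

as written, a value that stays over a limit is reported on every sample; a real implementation would probably want to rate-limit that.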

the new protocol definition file would make it trivial to build a real-time warning/alert system for ganglia.

the new message format also allows hosts to act as proxies for other devices (e.g. a host sending SNMP info about a router). (btw, the code makes sure that the ACL applies to the real IP header info.. not the proxy info).

steve's book and libslack have code for doing reliable UDP messaging which i've factored into this message protocol as well. 2.6.0 will likely not have the functionality though.. unless someone wanted to take it on.

the new gmond code is modular enough to allow for different ganglia formats on both the UDP and TCP channels (e.g. xdr, xml, sexp, ldif etc). you'll see that the default protocol is "xdr", which is the only supported protocol at this time for UDP channels. this modular design should allow gmond to collect real-time data from other sources on the network later as well.

ironically, this snapshot doesn't support multicast... i'll be moving the code in over the next few days. did i mention that this new gmond is IPv6 capable?


this new gmond doesn't use threads but rather relies on asynchronous io.
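one common way to serve several io channels from a single thread is to wait on all descriptors at once and handle whichever is ready. the sketch below uses POSIX poll() for that; whether the snapshot uses poll(), select() or something else is not stated here, so treat the mechanism, the function name wait_readable and the MAX_CHANNELS limit as assumptions.

```c
#include <poll.h>

#define MAX_CHANNELS 16  /* hypothetical limit for this sketch */

/*
 * wait up to timeout_ms for any of the descriptors to become readable.
 * returns the index of the first readable descriptor, or -1 on timeout,
 * error, or an nfds value outside 0..MAX_CHANNELS.
 */
int wait_readable(const int *fds, int nfds, int timeout_ms)
{
    struct pollfd pfds[MAX_CHANNELS];
    int i;

    if (nfds < 0 || nfds > MAX_CHANNELS)
        return -1;

    for (i = 0; i < nfds; i++) {
        pfds[i].fd = fds[i];
        pfds[i].events = POLLIN;
        pfds[i].revents = 0;
    }

    if (poll(pfds, (nfds_t)nfds, timeout_ms) <= 0)
        return -1;

    for (i = 0; i < nfds; i++)
        if (pfds[i].revents & POLLIN)
            return i;
    return -1;
}
```

the main loop would then call this with its udp/tcp channel descriptors, read from the one returned, and go back to waiting, with no locking needed since only one thread touches the data.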

this current snapshot isn't functional .. it's just for testing/discussion purposes.


-matt


PGP fingerprint 'A7C2 3C2F 8445 AD3C 135E F40B 242A 5984 ACBC 91D3'

   They that can give up essential liberty to obtain a little
      temporary safety deserve neither liberty nor safety.
  --Benjamin Franklin, Historical Review of Pennsylvania, 1759
