i just uploaded a nearly functional snapshot of 2.6.0 to http://matt-massie.com/ganglia/ganglia-2.6.0.200412151451.tar.gz
what i mean by nearly functional is that i need feedback from the developers before i stitch up and close surgery on 2.6.0. more on that follows.
i've tested this version on linux and cygwin. i'd be surprised if you have any problems compiling it on other platforms (please let me know otherwise).
to try out this snapshot

% gunzip < ganglia-2.6.0.200412151451.tar.gz | tar -xvf -
% cd ganglia-2.6.0.200412151451
% ./configure
% make
% cd gmond

if you want to run this test on a host that is part of a 2.5.x multicast group, you can without messing things up because this gmond doesn't send data (yet). since you already likely have a configuration file, i would just

% gmond -t > gmond.conf

this outputs the default configuration of gmond. you will need to alter the tcp_accept_channel definition so that it does not conflict with your 2.5 gmond. you will also need to alter your udp_recv_channel definition to match your multicast group.
then run gmond with

% ./gmond -d10 -c ./gmond.conf
169.229.48.82 => bytes_out
169.229.48.107 => cpu_system
169.229.48.107 => pkts_in
169.229.48.81 => cpu_system
169.229.48.133 => bytes_in
169.229.48.133 => pkts_out
169.229.48.124 => heartbeat
169.229.48.88 => heartbeat
169.229.48.95 => cpu_user
169.229.48.95 => mem_cached
169.229.48.89 => mem_shared
169.229.48.122 => disk_free
169.229.48.93 => load_five

these messages just let you know what data gmond is receiving on the multicast channel.
you can then connect to your tcp_accept_channel to receive xml.

% telnet localhost 8666    <- or whatever port you run the test on

you'll see that currently only the DTD, <GANGLIA_XML>, <CLUSTER> and <HOST> tags are output. it's simple to add the <METRIC> data but i wanted to get feedback before i continue.
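For anyone who would rather script this check than use telnet, here is a minimal Python sketch. It assumes gmond writes the whole XML document and then closes the connection, and that <HOST> carries a NAME attribute as it does in 2.5.x. The function names fetch_xml and summarize are made up for this illustration, and port 8666 is just the test port from the telnet example.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_xml(host="localhost", port=8666, timeout=5.0):
    """Read everything the TCP channel sends until the peer closes the socket."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(4096)
            if not data:  # empty read means the connection was closed
                break
            chunks.append(data)
    return b"".join(chunks)


def summarize(xml_data):
    """Return {cluster name: [host names]} from a gmond XML dump (str or bytes)."""
    root = ET.fromstring(xml_data)
    summary = {}
    # root.iter() also matches the root itself, so this works whether the
    # document starts at <GANGLIA_XML> or directly at <CLUSTER>.
    for cluster in root.iter("CLUSTER"):
        # NAME on <HOST> is an assumption carried over from the 2.5.x output.
        summary[cluster.get("NAME")] = [h.get("NAME") for h in cluster.iter("HOST")]
    return summary
```

With the snapshot running, summarize(fetch_xml("localhost", 8666)) should list the hosts gmond has heard from on the multicast channel, grouped by cluster name.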
i also wanted to point out that i have started documenting the new gmond.conf file format. i hope that it will help to ease fears that it will be radically different from the old 2.5 format. i'm sure a perl monger could write a converter with very few lines of perl code.
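To give a feel for how small such a converter could be, here is a rough sketch, in Python rather than perl. It assumes the 2.5.x file is one "key value" directive per line, and the old directive names it reads (name, owner, latlong, url, mcast_channel, mcast_port, xml_port) are recalled from 2.5-era configs, so treat them as assumptions to check against your own file. The fallback values come from the 2.5.x defaults shown in the man page EXAMPLE section. The function names are hypothetical.

```python
# Assumed 2.5.x directives that map onto the new cluster section.
CLUSTER_KEYS = ("name", "owner", "latlong", "url")


def parse_old_conf(text):
    """Parse 2.5-style 'key value' lines into a dict, skipping comments and blanks."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # directive without a value; ignored in this sketch
        key, value = parts
        settings[key.lower()] = value.strip().strip('"')
    return settings


def convert(text):
    """Emit a new-style gmond.conf covering the cluster and channel sections."""
    old = parse_old_conf(text)
    out = []
    cluster = [(k, old[k]) for k in CLUSTER_KEYS if k in old]
    if cluster:
        out.append("cluster {")
        out.extend('  %s = "%s"' % kv for kv in cluster)
        out.append("}")
    # Fall back to the 2.5.x defaults when a directive is absent.
    group = old.get("mcast_channel", "239.2.11.71")
    port = old.get("mcast_port", "8649")
    out += ["udp_send_channel {", "  mcast_join = " + group, "  port = " + port, "}"]
    out += ["udp_recv_channel {", "  mcast_join = " + group,
            "  bind = " + group, "  port = " + port, "}"]
    out += ["tcp_accept_channel {", "  port = " + old.get("xml_port", "8649"), "}"]
    return "\n".join(out) + "\n"
```

Calling convert(open("/etc/gmond.conf").read()) (the path is only an example) would return the new-style text as a string. Directives outside the handled set are silently dropped, so a real converter would need to cover the behavior section as well.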
the gmond.conf documentation is at ./gmond/gmond.conf.5 and reads

----------------------------------------------------------------
NAME
    gmond.conf - configuration file for ganglia monitoring daemon (gmond)

DESCRIPTION
    The gmond.conf file is used to configure the ganglia monitoring daemon (gmond) which is part of the Ganglia Distributed Monitoring System.

SECTIONS AND ATTRIBUTES
    All sections and attributes are case-insensitive. For example, name or NAME or Name or NaMe are all equivalent.

    Some sections can be included in the configuration file multiple times and some sections are singular. For example, you can have only one cluster section to define the attributes of the cluster being monitored; however, you can have multiple udp_recv_channel sections to allow gmond to receive messages on multiple UDP channels.

  cluster
    There should only be one cluster section defined. This section controls how gmond reports the attributes of the cluster that it is part of.

    The cluster section has four attributes: name, owner, latlong and url.

    For example,

        cluster {
          name = "Millennium Cluster"
          owner = "UC Berkeley CS Dept."
          latlong = "N37.37 W122.23"
          url = "http://www.millennium.berkeley.edu/"
        }

    The name attribute specifies the name of the cluster of machines. The owner tag specifies the administrators of the cluster. The pair name/owner should be unique to all clusters in the world.

    The latlong attribute is the latitude and longitude GPS coordinates of this cluster on earth. Specified to 1 mile accuracy with two decimal places per axis in decimal.

    The url attribute points to more information on the cluster. Intended to give purpose, owner, administration, and account details for this cluster.

    These directives directly control the XML output of gmond. For example, the cluster configuration example above would translate into the following XML.

        <CLUSTER NAME="Millennium Cluster" OWNER="UC Berkeley CS Dept." LATLONG="N37.37 W122.23" URL="http://www.millennium.berkeley.edu/">
        ...
        </CLUSTER>

  behavior
    The behavior section controls general characteristics of gmond such as whether it should daemonize, what user it should run as, whether it should send/receive data and such.

    The behavior section has seven attributes: daemonize, setuid, user, debug_level, mute, deaf, host_dmax.

    For example,

        behavior {
          daemonize = true
          setuid = true
          user = nobody
          host_dmax = 3600
        }

    The daemonize attribute is a boolean. When true, gmond will daemonize. When false, gmond will run in the foreground.

    The setuid attribute is a boolean. When true, gmond will set its effective UID to the uid of the user specified by the user attribute. When false, gmond will not change its effective user.

    The debug_level is an integer value. When set to zero (0), gmond will run normally. A debug_level greater than zero will result in gmond running in the foreground and outputting debugging information. The higher the debug_level the more verbose the output.

    The mute attribute is a boolean. When true, gmond will not send data regardless of any other configuration directives.

    The deaf attribute is a boolean. When true, gmond will not receive data regardless of any other configuration directives.

    The host_dmax value is an integer with units in seconds. When set to zero (0), gmond will never delete a host from its list even when a remote host has stopped responding. If host_dmax is set to a positive number then gmond will flush a host after it has not heard from it for host_dmax seconds. By the way, dmax means "delete max".

  udp_send_channel
    You can define as many udp_send_channel sections as you like within the limitations of memory and file descriptors. If gmond is configured as mute this section will be ignored.

    The udp_send_channel has a total of five attributes: mcast_join, mcast_if, ip, port and protocol.

    For example, the 2.5.x version gmond would send on the following single channel by default...
        udp_send_channel {
          mcast_join = 239.2.11.71
          port = 8649
          protocol = xdr
        }

    The mcast_join and mcast_if attributes are optional. When specified gmond will create the UDP socket and join the mcast_join multicast group and send data out the interface specified by mcast_if.

    If only an ip and port are specified then gmond will send unicast UDP messages to the hosts specified. You could specify multiple unicast hosts for redundancy as gmond will send UDP messages to all UDP channels.

    For example...

        udp_send_channel {
          ip = 192.168.3.4
          port = 2344
        }
        udp_send_channel {
          ip = 192.168.3.8
          port = 2389
        }

    would configure gmond to send messages to two hosts.

    Currently, the only protocol supported is xdr which is the default.

  udp_recv_channel
    You can specify as many udp_recv_channel sections as you like within the limits of memory and file descriptors. If gmond is configured deaf this section will be ignored.

    The udp_recv_channel section has a total of seven attributes: mcast_join, bind, port, mcast_if, protocol, allow_ip and allow_mask.

    For example, the 2.5.x gmond ran with a single udp receive channel...

        udp_recv_channel {
          mcast_join = 239.2.11.71
          bind = 239.2.11.71
          port = 8649
          protocol = xdr
        }

    The mcast_join and mcast_if should only be used if you want to have this UDP channel receive multicast packets from the multicast group mcast_join on interface mcast_if. If you do not specify multicast attributes then gmond will simply create a UDP server on the specified port.

    You can use the bind attribute to bind to a particular local address.

    Note: for multicast, specifying a bind address that equals the mcast_join address will prevent unicast UDP messages to the same port from being processed.

  tcp_accept_channel
    You can specify as many tcp_accept_channel sections as you like within the limitations of memory and file descriptors. If gmond is configured to be mute, then these sections are ignored.
    The tcp_accept_channel has six attributes: bind, port, interface, protocol, allow_ip and allow_mask.

    For example, 2.5.x gmond would accept connections on a single TCP channel.

        tcp_accept_channel {
          port = 8649
        }

    The bind address is optional and allows you to specify which local address gmond will bind to for this channel.

    The port is an integer that specifies which port to answer requests for data.

    The interface is not implemented at this time (use bind).

  collection_group
    ...

EXAMPLE
    The default behavior for a 2.5.x gmond would be specified as...

        udp_recv_channel {
          mcast_join = 239.2.11.71
          bind = 239.2.11.71
          port = 8649
        }
        udp_send_channel {
          mcast_join = 239.2.11.71
          port = 8649
        }
        tcp_accept_channel {
          port = 8649
        }
----------------------------------------------------------------------

so this all leads me to a question about how to proceed.

option 1: finish up 2.6.0 now.

benefits: we get unicast support, ability to send/recv on multiple channels, solaris kstat in libmetrics, a well-defined xdr protocol description, ipv6 support, code that is cleaner, more modular and ready to have features added to it.
downside: no support for alerts or host proxying. each metric is sent in a single message.. no grouping of metrics on the UDP channel.

solution: i would just have 2.6.0 send its metric info in the old 2.5.x format. we would need to add the solaris and hpux metrics to the end of the current metric list. we break compatibility with 2.5.x on non-linux/freebsd but that is why this will be called 2.6.0 :) if we wanted to add a new message format to 2.6.0 in the future, we will not break compatibility thanks to the protocol definition file. we can add features more easily in the future.
option 2: take the time to build a new xdr communication protocol which allows for host proxying, alerts and metric groups.
downside: takes more time.

i'm leaning toward option 1. i know there is a lot of frustration out there and a solid 2.6.0 release soon would be a good thing.
now that i've heard the opinion of the group about the 2.5.x 2.6.x 3.0.0 madness, i'd like to respond a bit.
i'm sorry. i won't give you a list of excuses why this transition has been so bumpy. growth isn't always painless i guess but it's usually a good thing.
once 2.6.0 is released, i can see more incremental development occurring (i need the pace to slow.. i'm really exhausted). i'd like to have a bugzilla repository or something similar where we can catalog what needs to be done.. what has been done.. etc.
once we have 2.6.x the way we want it.. we can move to a 2.7.0 release where we redo gmetad to use the apache runtime and libconfuse file configuration parser. later.
i'm looking forward to feedback about this snapshot. don't expect it to be perfect but if you read the code in ./gmond/gmond.c you'll see that the layout of the code is very manageable and clean. the code has been tested with valgrind (although i still need to write regression tests later).
if you guys choose option 1, i'll stitch up the final piece of 2.6.0 gmond (collecting/sending metrics and cleanup).
if you guys choose option 2, we need to talk about the message format in ./lib/protocol.x.
i will be on vacation starting tomorrow for a few weeks. i'll be checking my email but i don't expect to do lots of coding.
it's a pleasure working with you guys and i hope you have a good break.
-matt
--
PGP fingerprint 'A7C2 3C2F 8445 AD3C 135E F40B 242A 5984 ACBC 91D3'
They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety. --Benjamin Franklin, Historical Review of Pennsylvania, 1759