i just uploaded a nearly functional snapshot of 2.6.0 to
http://matt-massie.com/ganglia/ganglia-2.6.0.200412151451.tar.gz

what i mean by nearly functional is that i need feedback from the developers before i stitch up and close surgery on 2.6.0. more on that follows.

i've tested this version on linux and cygwin. i'd be surprised if you have any problems compiling it on other platforms (please let me know otherwise).

to try out this snapshot

% gunzip < ganglia-2.6.0.200412151451.tar.gz | tar -xvf -
% cd ganglia-2.6.0.200412151451
% ./configure
% make
% cd gmond

if you want to run this test on a host that is part of a 2.5.x multicast group, you can without messing things up because this gmond doesn't send data (yet). since you already likely have a configuration file, i would just

% ./gmond -t > gmond.conf

this outputs the default configuration of gmond. you will need to alter the tcp_accept_channel definition so it does not conflict with your 2.5 gmond. you will also need to alter your udp_recv_channel definition to match your multicast group.

then run gmond with

% ./gmond -d10 -c ./gmond.conf
169.229.48.82   =>      bytes_out
169.229.48.107  =>      cpu_system
169.229.48.107  =>      pkts_in
169.229.48.81   =>      cpu_system
169.229.48.133  =>      bytes_in
169.229.48.133  =>      pkts_out
169.229.48.124  =>      heartbeat
169.229.48.88   =>      heartbeat
169.229.48.95   =>      cpu_user
169.229.48.95   =>      mem_cached
169.229.48.89   =>      mem_shared
169.229.48.122  =>      disk_free
169.229.48.93   =>      load_five

these messages just let you know what data gmond is receiving on the multicast channel.
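
if you want to check which hosts this gmond is hearing from, a small script can tally the debug lines. this is only a sketch: it assumes the "ip => metric" layout shown above, and tally_metrics is a made-up helper name, not part of ganglia.

```python
from collections import defaultdict


def tally_metrics(lines):
    """Group metric names by sending host from 'ip => metric' lines.

    Hypothetical helper; assumes the debug output layout shown above.
    """
    seen = defaultdict(set)
    for line in lines:
        host, sep, metric = line.partition("=>")
        if not sep:
            continue  # skip anything that is not an 'ip => metric' line
        seen[host.strip()].add(metric.strip())
    return dict(seen)
```

feed it the captured output line by line; the result maps each host address to the set of metric names seen from it.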

you can then connect to your tcp_accept_channel to receive xml.

% telnet localhost 8666 <- or whatever port you run the test on

you'll see that currently only the DTD and the <GANGLIA_XML>, <CLUSTER> and <HOST> tags are output. it's simple to add the <METRIC> data but i wanted to get feedback before i continue.
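
if you'd rather script the check than use telnet, something like the following python sketch should do. it assumes gmond writes the xml and then closes the connection, and that port 8666 is whatever you put in your tcp_accept_channel; the function names are made up for this example.

```python
import socket


def read_until_close(sock, chunk_size=4096):
    """Read from a connected socket until the peer closes it."""
    chunks = []
    while True:
        data = sock.recv(chunk_size)
        if not data:  # an empty read means the peer closed the connection
            break
        chunks.append(data)
    return b"".join(chunks)


def fetch_gmond_xml(host="localhost", port=8666, timeout=5.0):
    """Connect to a tcp_accept_channel and return the XML it sends.

    Assumption: the server closes the connection once the XML is written.
    Undecodable bytes are replaced rather than raising an error.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        return read_until_close(sock).decode("utf-8", errors="replace")
```

calling print(fetch_gmond_xml(port=8666)) against a running test gmond should show the same output as the telnet session.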

i also wanted to point out that i have started documenting the new gmond.conf file format. i hope that it will help to ease fears that it will be radically different from the old 2.5 format. i'm sure a perl monger could write a converter with very few lines of perl code.

the gmond.conf documentation is at ./gmond/gmond.conf.5 and reads

----------------------------------------------------------------
NAME
    gmond.conf - configuration file for ganglia monitoring daemon
    (gmond)

DESCRIPTION
    The gmond.conf file is used to configure the ganglia monitoring
    daemon (gmond) which is part of the Ganglia Distributed Monitoring
    System.

SECTIONS AND ATTRIBUTES
    All sections and attributes are case-insensitive. For example,
    name or NAME or Name or NaMe are all equivalent.

    Some sections can be included in the configuration file multiple
    times and some sections are singular. For example, you can have
    only one cluster section to define the attributes of the cluster
    being monitored; however, you can have multiple udp_recv_channel
    sections to allow gmond to receive messages on multiple UDP
    channels.

  cluster
    There should only be one cluster section defined. This section
    controls how gmond reports the attributes of the cluster that it
    is part of.

    The cluster section has four attributes: name, owner, latlong and
    url.

    For example,

      cluster {
        name = "Millennium Cluster"
        owner = "UC Berkeley CS Dept."
        latlong = "N37.37 W122.23"
        url = "http://www.millennium.berkeley.edu/"
      }

    The name attribute specifies the name of the cluster of machines.
    The owner tag specifies the administrators of the cluster. The
    pair name/owner should be unique to all clusters in the world.

    The latlong attribute is the latitude and longitude GPS
    coordinates of this cluster on earth. It is specified to 1 mile
    accuracy, with two decimal places per axis in decimal.

    The url attribute points to more information on the cluster. It
    is intended to give purpose, owner, administration, and account
    details for this cluster.

    These directives directly control the XML output of gmond. For
    example, the cluster configuration example above would translate
    into the following XML.

      <CLUSTER NAME="Millennium Cluster" OWNER="UC Berkeley CS Dept."
               LATLONG="N37.37 W122.23" URL="http://www.millennium.berkeley.edu/">
      ...
      </CLUSTER>

  behavior
    The behavior section controls general characteristics of gmond
    such as whether it should daemonize, what user it should run as,
    whether it should send/receive data and such. The behavior section
    has seven attributes: daemonize, setuid, user, debug_level, mute,
    deaf, host_dmax.

    For example,

      behavior {
        daemonize = true
        setuid = true
        user = nobody
        host_dmax = 3600
      }

    The daemonize attribute is a boolean. When true, gmond will
    daemonize. When false, gmond will run in the foreground.

    The setuid attribute is a boolean. When true, gmond will set its
    effective UID to the uid of the user specified by the user
    attribute. When false, gmond will not change its effective user.

    The debug_level is an integer value. When set to zero (0), gmond
    will run normally. A debug_level greater than zero will result in
    gmond running in the foreground and outputting debugging
    information. The higher the debug_level the more verbose the
    output.

    The mute attribute is a boolean. When true, gmond will not send
    data regardless of any other configuration directives.

    The deaf attribute is a boolean. When true, gmond will not receive
    data regardless of any other configuration directives.

    The host_dmax value is an integer with units in seconds. When set
    to zero (0), gmond will never delete a host from its list even
    when a remote host has stopped responding. If host_dmax is set to
    a positive number then gmond will flush a host after it has not
    heard from it for host_dmax seconds. By the way, dmax means
    "delete max".

  udp_send_channel
    You can define as many udp_send_channel sections as you like
    within the limitations of memory and file descriptors. If gmond is
    configured as mute this section will be ignored.

    The udp_send_channel has a total of five attributes: mcast_join,
    mcast_if, ip, port and protocol.

    For example, the 2.5.x version gmond would send on the following
    single channel by default...

      udp_send_channel {
        mcast_join = 239.2.11.71
        port       = 8649
        protocol   = xdr
      }

    The mcast_join and mcast_if attributes are optional. When
    specified gmond will create the UDP socket and join the mcast_join
    multicast group and send data out the interface specified by
    mcast_if.

    If only an ip and port are specified then gmond will send unicast
    UDP messages to the hosts specified. You could specify multiple
    unicast hosts for redundancy as gmond will send UDP messages to
    all UDP channels.

    For example...

      udp_send_channel {
        ip = 192.168.3.4
        port = 2344
      }
      udp_send_channel {
        ip = 192.168.3.8
        port = 2389
      }

    would configure gmond to send messages to two hosts.

    Currently, the only protocol supported is xdr which is the
    default.

  udp_recv_channel
    You can specify as many udp_recv_channel sections as you like
    within the limits of memory and file descriptors. If gmond is
    configured deaf this section will be ignored.

    The udp_recv_channel section has a total of seven attributes:
    mcast_join, bind, port, mcast_if, protocol, allow_ip and
    allow_mask.

    For example, the 2.5.x gmond ran with a single udp receive
    channel...

      udp_recv_channel {
        mcast_join = 239.2.11.71
        bind       = 239.2.11.71
        port       = 8649
        protocol   = xdr
      }

    The mcast_join and mcast_if attributes should only be used if you
    want to have this UDP channel receive multicast packets from the
    multicast group mcast_join on interface mcast_if. If you do not specify
    multicast attributes then gmond will simply create a UDP server on
    the specified port.

    You can use the bind attribute to bind to a particular local
    address.

    Note: for multicast, specifying a bind address that equals the
    mcast_join address will prevent unicast UDP messages to the same
    port from being processed.

  tcp_accept_channel
    You can specify as many tcp_accept_channel sections as you like
    within the limitations of memory and file descriptors. If gmond is
    configured to be mute, then these sections are ignored.

    The tcp_accept_channel has six attributes: bind, port, interface,
    protocol, allow_ip and allow_mask.

    For example, 2.5.x gmond would accept connections on a single TCP
    channel.

      tcp_accept_channel {
        port = 8649
      }

    The bind address is optional and allows you to specify which local
    address gmond will bind to for this channel.

    The port is an integer that specifies which port to answer
    requests for data on.

    The interface is not implemented at this time (use bind).

  collection_group
    ...

EXAMPLE
    The default behavior for a 2.5.x gmond would be specified as...

      udp_recv_channel {
        mcast_join = 239.2.11.71
        bind       = 239.2.11.71
        port       = 8649
      }
      udp_send_channel {
        mcast_join = 239.2.11.71
        port       = 8649
      }
      tcp_accept_channel {
        port       = 8649
      }
----------------------------------------------------------------------
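
as an aside on the host_dmax attribute documented above, the rule it describes can be sketched in a few lines of python. this is not the gmond code, just an illustration: expire_hosts and the last_heard dictionary are made-up names, and whether the real comparison is strict or not is my assumption.

```python
import time


def expire_hosts(last_heard, host_dmax, now=None):
    """Drop hosts not heard from for more than host_dmax seconds.

    last_heard maps a host address to the time (in seconds) of its last
    message. A host_dmax of zero means hosts are never deleted.
    Returns the list of hosts that were removed.
    Hypothetical helper, not taken from gmond.
    """
    if host_dmax <= 0:
        return []
    if now is None:
        now = time.time()
    # Assumption: a host is stale once its silence exceeds host_dmax.
    stale = [host for host, seen in last_heard.items() if now - seen > host_dmax]
    for host in stale:
        del last_heard[host]
    return stale
```

with host_dmax = 3600 as in the behavior example, a host that has been silent for more than an hour would be flushed on the next pass, while host_dmax = 0 keeps every host forever.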

so this all leads me to a question about how to proceed.

option 1: finish up 2.6.0 now

benefits: we get unicast support, ability to send/recv on multiple channels, solaris kstat in libmetrics, a well-defined xdr protocol description, ipv6 support, code that is cleaner, more modular and ready to have features added to it.

downside: no support for alerts or host proxying. each metric is sent in a single message.. no grouping of metrics on the UDP channel.

solution: i would just have 2.6.0 send its metric info in the old 2.5.x format. we would need to add the solaris, hpux, etc. metrics to the end of the current metric list. we break compatibility with 2.5.x on non-linux/freebsd but that is why this will be called 2.6.0 :) if we want to add a new message format to 2.6.0 in the future, we will not break compatibility thanks to the protocol definition file. we can add features more easily in the future.

option 2: take the time to build a new xdr communication protocol which allows for host proxying, alerts and metric groups.

downside: takes more time.

i'm leaning toward option 1. i know there is a lot of frustration out there and a solid 2.6.0 release soon would be a good thing.

now that i've heard the opinion of the group about the 2.5.x 2.6.x 3.0.0 madness, i'd like to respond a bit.

i'm sorry.

i won't give you a list of excuses why this transition has been so bumpy. growth isn't always painless i guess but it's usually a good thing.

once 2.6.0 is released, i can see more incremental development occurring (i need the pace to slow.. i'm really exhausted). i'd like to have a bugzilla repository or something similar where we can catalog what needs to be done.. what has been done.. etc.

once we have 2.6.x the way we want it.. we can move to a 2.7.0 release where we redo gmetad to use the apache runtime and the libconfuse configuration file parser. later.

i'm looking forward to feedback about this snapshot. don't expect it to be perfect but if you read the code in ./gmond/gmond.c you'll see that the layout of the code is very manageable and clean. the code has been tested with valgrind (although i still need to write regression tests later).

if you guys choose option 1, i'll stitch up the final piece of 2.6.0 gmond (collecting/sending metrics and cleanup).

if you guys choose option 2, we need to talk about the message format in ./lib/protocol.x.

i will be on vacation starting tomorrow for a few weeks. i'll be checking my email but i don't expect to do lots of coding.

it's a pleasure working with you guys and i hope you have a good break.

-matt

--
PGP fingerprint 'A7C2 3C2F 8445 AD3C 135E F40B 242A 5984 ACBC 91D3'

   They that can give up essential liberty to obtain a little
      temporary safety deserve neither liberty nor safety.
  --Benjamin Franklin, Historical Review of Pennsylvania, 1759
