I wanted to update the list on what I have been doing since the email that I
sent out about a week ago (I have attached the original email).  Briefly, I
have been trying to add some additional data to the gmond XDR packets so that
we can communicate attributes such as an alternate TITLE for the metric or
which GROUP(s) the metric belongs to (or anything else that we dream up in the
future).  What I have done to solve this issue is the following:

Background:
   The XDR packets that gmond sends to communicate metric information have been 
based on the fact that every gmond in the system is well aware of every metric 
supported by gmond.  In other words, gmond has a built-in set of metrics that 
it supports and every gmond knows what those metrics are.  Given that fact, it 
was very easy to assign an identifier to each of those metrics and simply send 
a very small packet that consisted only of the metric id and the corresponding 
value.  No additional data about the metric had to be communicated between 
gmonds.  However, gmetric was the one exception.  In the case of gmetric (and I 
am also including 3.1 module based metrics as gmetrics), the entire metric 
definition had to be passed every time a gmetric was collected.  This resulted 
in a significantly larger packet that had to carry with it the entire metric 
definition.  
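
To make the size difference concrete, here is a rough Python sketch of the two
old-style packets.  It is not the gmond code: the metric ids, the field order
and the example metric "rack_temp" are made up for illustration, and only the
XDR encoding rules (4-byte big-endian integers, strings padded to a multiple
of 4 bytes) are standard.

```python
import struct

def xdr_uint(n):
    """XDR unsigned int: 4 bytes, big-endian."""
    return struct.pack(">I", n)

def xdr_string(s):
    """XDR string: 4-byte length, the bytes, zero padding to a multiple of 4."""
    data = s.encode("ascii")
    return xdr_uint(len(data)) + data + b"\x00" * (-len(data) % 4)

# Hypothetical ids; the real ones are defined in protocol.h.
CPU_USER_ID = 7
GMETRIC_ID = 0

def builtin_packet(metric_id, value):
    # Built-in metric: every gmond already knows the definition,
    # so only the id and the value travel on the wire.
    return xdr_uint(metric_id) + struct.pack(">f", value)

def gmetric_packet(name, type_, value, units, slope, tmax, dmax):
    # gmetric: the whole definition is repeated in every packet.
    return (xdr_uint(GMETRIC_ID) + xdr_string(type_) + xdr_string(name)
            + xdr_string(value) + xdr_string(units)
            + xdr_uint(slope) + xdr_uint(tmax) + xdr_uint(dmax))

small = builtin_packet(CPU_USER_ID, 12.5)
large = gmetric_packet("rack_temp", "float", "21.5", "Celsius", 3, 60, 0)
print(len(small), len(large))  # prints: 8 64
```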

Going forward:
   I have taken the fact that there are two different kinds of gmond packets 
and based a new packet layout around that.  One packet is a data packet that 
carries a metric value and the other is a metadata packet that carries the 
definition of a metric.  The new code proposes to first identify an XDR packet 
as either a data packet or a metadata packet.  If the packet is a data packet, 
it is then further identified by a metric name, host and data type.  A metadata 
packet also carries the metric name and host, but it really needs no further 
identification.  Gmond has been refactored to send both types of XDR packets.  
At start up it will always send a metadata packet for each metric to announce 
the set of metrics a given gmond supports and the complete definition of each 
metric.  This allows all of the receiving gmonds to store the metric 
definitions rather than having to rely on well defined hard coded metric 
definitions.  In addition, this also allows any individual metric to include 
additional metadata to better support the web interface or anything else that 
might need more information about what the metric is or how to display it.  
After the initial metadata packet is sent, refresh metadata packets are only 
sent according to a configurable interval.  For example, the metadata packet 
might only be resent after every 10th data packet.  Data packets are sent 
whenever a metric value needs to be updated, so the actual metric data is 
carried in a much smaller packet rather than the larger metadata packet.
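
The flow described above can be sketched roughly as follows.  This is only an
illustration in Python, not the refactored gmond code: the packet-kind tags,
the field order and the helper names are assumptions of mine, and the value is
fixed to a float to keep it short.

```python
import struct

# Hypothetical packet-kind tags; the real values would live in protocol.h.
PKT_METADATA = 1
PKT_DATA = 2

def xdr_uint(n):
    return struct.pack(">I", n)

def xdr_string(s):
    data = s.encode("ascii")
    return xdr_uint(len(data)) + data + b"\x00" * (-len(data) % 4)

def metadata_packet(host, name, definition):
    """Full metric definition, including any extra key/value metadata."""
    pkt = xdr_uint(PKT_METADATA) + xdr_string(host) + xdr_string(name)
    pkt += xdr_uint(len(definition))
    for key, value in definition.items():
        pkt += xdr_string(key) + xdr_string(value)
    return pkt

def data_packet(host, name, value):
    """Identified by metric name, host and data type; carries only the value."""
    return (xdr_uint(PKT_DATA) + xdr_string(host) + xdr_string(name)
            + xdr_string("float") + struct.pack(">f", value))

def packets_for(host, name, definition, values, meta_interval):
    """Yield what a sender would emit: metadata once at start-up, then again
    after every `meta_interval` data packets."""
    yield metadata_packet(host, name, definition)
    for count, value in enumerate(values, start=1):
        yield data_packet(host, name, value)
        if count % meta_interval == 0:
            yield metadata_packet(host, name, definition)

definition = {"TYPE": "float", "UNITS": "%", "TITLE": "CPU User", "GROUP": "cpu"}
sent = list(packets_for("node1", "cpu_user", definition, [1.0] * 10, 10))
kinds = [struct.unpack(">I", p[:4])[0] for p in sent]
```

With ten values and an interval of 10, `kinds` starts and ends with a metadata
packet and has ten data packets in between, each of them well under half the
size of the metadata packet.
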
   Metric spoofing in this layout cannot be supported in the same way that it 
was before, basically because in the old layout there was a specific "spoof" 
packet that all gmonds had to understand.  Since the XDR packets are no longer 
identified by metric type (i.e. gmetric, spoof, cpu_user, mem_free, etc.) but 
rather by data type, there is no longer a "spoof" packet.  So in my first 
checkin, spoofing will probably be broken.  However, since one of the 
identifying elements of every packet is the host, I am hoping to support the 
spoofing functionality by simply providing a way to override the host 
information that is obtained by a call to apr_gethostbyname(), either by using 
the host found in the packet or by providing a way to add a "SPOOF" key as 
additional metadata for the metric, with the spoof host name as its value.
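
As a rough idea of what that override could look like on the receiving side,
here is a small Python sketch.  Nothing here is implemented yet: the function
name, its arguments and the order of precedence (SPOOF key first, then the
host in the packet, then the looked-up host) are all assumptions, and
`looked_up_host` simply stands in for whatever the host lookup returns.

```python
def effective_host(looked_up_host, packet_host=None, extra_metadata=None):
    """Decide which host a metric should be filed under.

    looked_up_host  -- stand-in for the result of the normal host lookup
    packet_host     -- host name carried inside the packet, if any
    extra_metadata  -- dict of extra metric metadata, may contain "SPOOF"
    """
    extra_metadata = extra_metadata or {}
    if "SPOOF" in extra_metadata:
        # Option 2: an explicit SPOOF key names the host to report as.
        return extra_metadata["SPOOF"]
    if packet_host:
        # Option 1: trust the host found in the packet itself.
        return packet_host
    return looked_up_host

normal = effective_host("node1.example.org")
from_packet = effective_host("node1.example.org", packet_host="storage-array-3")
spoofed = effective_host("node1.example.org", "node1.example.org",
                         {"SPOOF": "ups-rack-12"})
```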

   In addition, this will change the XML output that gmetad reads from gmond by 
adding a new tag called <EXTRA_DATA>.  One or more <EXTRA_DATA> tags will 
contain any extra metadata that has been added to the metric outside of the 
standard attributes that are currently defined in the <METRIC> tag.  For now 
the extra attributes include TITLE, DESCRIPTION and GROUP, but in reality it 
could contain any new attribute.  BTW, this also adds two new configuration 
directives: one called 'send_meta_data_pkt_interval' that belongs in the 
'Globals' block and another called 'title' that goes in the 'Metric' block.
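
For anybody who consumes the XML, a sketch of what reading the new tag might
look like is below (Python, standard library only).  The exact shape of
<EXTRA_DATA> is not final, so the NAME/VAL attributes on each <EXTRA_DATA>
tag are purely my assumption here, as are the helper names.

```python
import xml.etree.ElementTree as ET

def metric_xml(name, value, type_, units, extra):
    """Build a <METRIC> element with one <EXTRA_DATA> child per extra item.
    The NAME/VAL attribute layout of <EXTRA_DATA> is an assumption."""
    metric = ET.Element("METRIC", NAME=name, VAL=value, TYPE=type_, UNITS=units)
    for key, val in extra.items():
        ET.SubElement(metric, "EXTRA_DATA", NAME=key, VAL=val)
    return metric

def read_extra(metric):
    """Collect the extra metadata of a parsed <METRIC> element into a dict."""
    return {e.get("NAME"): e.get("VAL") for e in metric.findall("EXTRA_DATA")}

extra = {"TITLE": "CPU User", "DESCRIPTION": "Percent CPU in user mode",
         "GROUP": "cpu"}
xml_text = ET.tostring(metric_xml("cpu_user", "12.5", "float", "%", extra),
                       encoding="unicode")
parsed = ET.fromstring(xml_text)
```

A reader that does not know about <EXTRA_DATA> can keep using the standard
<METRIC> attributes and ignore the children, which is the main attraction of
putting the extra items in their own tag.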

   This new XDR layout also paves the way for moving all of the builtin metrics 
out as metric modules.  Once that is done, the XDR structures that are defined 
in protocol.h can be cleaned up significantly.  I hope to be able to check in 
this new XDR layout code by the end of this week or the beginning of next week. 
This may destabilize trunk for a little while until the bugs can be worked 
out, but it shouldn't be too terrible.

Any comments?

Brad
--- Begin Message ---
   I took a quick look over the wish-list items that were proposed on the 
mailing list and tried to determine which items would break compatibility and 
therefore must be completed before we release 3.1.0.  I have identified three 
tasks that I am planning to complete and commit to trunk over the next few 
weeks.  These tasks include:

1-* Add TITLE attribute to the XDR data to communicate a human readable name
   There is another task on the wish list which makes this more general which 
is:
   -* Flexible method of adding extra metric metadata.
       We could include extra metadata, not just "alias"/"title".  For example,
       some metrics have a natural minimum and maximum value.  Perhaps coming
       up with an extendable way of encoding metric metadata so future changes
       can be included without losing backward compatibility.
   I would rather implement the more flexible method of adding extra metric 
metadata but I am not really sure how to do that with XDR.  If somebody has a 
good idea of how that could be done with XDR, please let me know.  Otherwise I 
will probably just add the attribute to the existing set of attributes.

2-* Add a GROUP attribute (comma delimited) to the XDR data
    This would allow metrics to declare the category that they belong to. The 
category should be added at the metric definition level within the metric 
module rather than a directive in the .conf file.  Again if there were a more 
flexible way to add extra metric metadata to the XDR package, that would be the 
preferred method.  Short of that, I just plan to add an attribute that would 
hold a comma delimited list of group names that a metric can belong to.  

3-* Modify all byte count metrics to 8 byte integers
   At this point I am assuming that this is one of the issues that is causing 
the 4T limit problem.  For now this is just a temporary fix.  The real fix 
would be to move all of the built in metrics out of gmond itself and implement 
them as C interface modules which define the correct counter size.  If somebody 
wants to tackle porting the built in metrics rather than applying the temporary 
fix now, please feel free and let me know that you are doing it.  Otherwise, I 
will try to take care of at least getting the sizing right and then port the 
metrics sometime later.
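
To illustrate the sizing problem in item 3, here is a tiny Python sketch.  It
only shows the general effect of a 4-byte versus an 8-byte unsigned field; the
helper names are made up, and whether this is really what sits behind the 4T
limit is still just my assumption.

```python
import struct

def pack_u32(n):
    """4-byte unsigned XDR integer; wraps modulo 2**32 like a C uint32_t."""
    return struct.pack(">I", n & 0xFFFFFFFF)

def pack_u64(n):
    """8-byte unsigned XDR hyper integer."""
    return struct.pack(">Q", n)

five_gib = 5 * 2**30  # a byte count that does not fit in 32 bits

wrapped = struct.unpack(">I", pack_u32(five_gib))[0]
intact = struct.unpack(">Q", pack_u64(five_gib))[0]
print(wrapped, intact)  # prints: 1073741824 5368709120
```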


   I have attached a rough compilation of the tasks that were identified 
through the wish list.  This list is not very detailed and should probably be 
used as a jumping off point for adding all of these enhancements into bugzilla. 
 Once in bugzilla, more detail should be added to each enhancement so that we 
can have a good discussion about each one, prioritize them and get them 
implemented.

Brad

Done
------------------
- C module interface as DSO
- mod_python Python module interface
- Dynamically link libraries like expat, apr, libconfuse


GMond To Do
------------------------
- Gmond module repository
-* Add TITLE attribute to the XDR data to communicate a human readable name
- Reimplement the built in metrics as C interface modules
- Implement a perl module interface
- Implement a PHP module interface
- Implement a Ruby module interface
-* Add a GROUP attribute (comma delimited) to the XDR data
    This would allow metrics to declare the category that they belong to. The 
    category should be added at the metric definition level and not in the conf 
    file.
- A cleaner XDR encoding:
    The current encoding scheme embeds too much information about which metrics
    gmond collects.  The encoding scheme should treat all metrics the same: as
    just "a metric".  The encoding should not care if the metric is 
    metric_cpu_speed, metric_swap_total or a user-defined "gmetric" one.
- Metric packing:
    Simply that a UDP packet can contain multiple metrics (using the usual XDR
    stream decoding) up to the size of a UDP packet.  This would help reduce
    the overheads when sending many metric updates concurrently.  It also
    preserves the current gmond behaviour where it sends metric updates in
    a single UDP packet.
- Support for counters (metrics with +ve slope)
    This shouldn't require much work (from memory, make sure the slope-type
    information is preserved and patch gmetad to create RRD files with the
    correct options).  Currently Ganglia doesn't actually support custom
    counter metrics, which is an awkward limitation.
- gmond switching to a non-blocking IO model.
    If there's a large number of metric updates then gmond must process them
    "quickly" or they will be lost.  If this happens whilst gmond is sending XML
    data to gmetad there may be a delay, increasing the risk of metric
    update messages being lost.  Switching to a non-blocking IO model would
    allow gmond to respond preferentially to the incoming UDP messages.
-* Flexible method of adding extra metric metadata.
    We could include extra metadata, not just "alias"/"title".  For example,
    some metrics have a natural minimum and maximum value.  Perhaps coming up
    with an extendable way of encoding metric metadata so future changes can be
    included without losing backwards compatibility.
- Re-organization of RPM packages (libganglia, gmond-python ?)
-* Remove the 4T limit on ganglia metric results
-* Modify all byte count metrics to 8 byte ints

GMetad To Do
------------------------------
- Support for new RRDTool which allows graphs to have dynamic sizes
- Gilad's stacked graphs
- Changing the units of default metrics to their base
    For example, disk_free's base unit should be bytes, not GB (rrdtool will
    automatically append G, M, K, etc.)
- Better support for bigger less frequent updates 
    one packet every 20 seconds per host for all data?
- Multi PB disk limit
- Better on disk RRD perf (tmpfs is an OK workaround)
-* Name RRD directories based on UUID generated by client gmond 
    hash of MAC address? something else? So that renaming hosts, updating DNS or
    hosts files don't result in history for the physical gmond client being
    lost.
- Integration of gexec/authd ?  
- Expand gstat nodelist parameter query options (i.e. return all hosts
with <10% iowait, etc.)
- Interface stats in bits?  Self awareness of interface capability for %
util stats for network.
- Something like a unique per-gmond instance identifier
    To help with multi-homing and DNS issues and so the IP address is no 
    longer the index key. There was discussion of this under the subject 
    "Overriding hostname" on the Ganglia-general list.
- Give some metrics priority and have them updated more frequently in their 
RRDs than others.
- Allow for some sort of in memory RRD (never written to disk) as an 
alternative storage for very extreme cases.
- Let the users manage different IO bound pools for their metrics
    For extreme cases one based on tmpfs. So that they can be tied correctly 
    to the right kind of storage IO capabilities for the frequency needed.
- Add more memory metrics 
    slab, buffers, dirty, writeback, cache_clean (= cached - (dirty +
    writeback)), mapped, free

Web interface
-------------------------------
- Numerous custom graphs enhancements (Alex Balk, Timothy Witham, others)
- Web frontend face lift
- Mouse over result graphs
- Default cluster view uses text-only per host squares 
    loading 1700 little graphs chews too much browser
- Better icons.
    The current highly-compressed JPEG files for the icons look horrible!
    Line-art perhaps suffers worst from JPEG compression artifacts.  Could we
    not use either PNGs or (preferably) SVG?

- Add an option to allow switching to SVG in-line RRDTool graphs.
    This should be pretty easy to add as a config option.  I think support for
    SVG in current browsers is now "good enough".  A half-way modern version of
    RRDTool can generate SVG versions of the graphs, which should look much
    better.

- Have some standard way of describing custom graphs.
    There currently isn't a standard way of producing custom graphs; "custom"
    here means adding support for host-specific and cluster-specific graphs and
    also some framework for describing those custom graphs.  I have a
    solution that (at least) has the merit of both existing and working.
    Perhaps it isn't ideal, but the Ganglia web front-end should provide at
    least some standard hooks if not an actual framework.

- Have the option to switch off displaying all the single-metric graphs.
    If you have ~300 metrics, the little graphs at the page bottom are all but
    useless.  They slow down the loading of the page without adding much
    insight.  (I have a simple patch that allows a user to choose whether they
    want to see these graphs.)

- Fix the pie-chart-generating code.
    The current pie-chart code is a bit ugly and can plot things incorrectly
    under certain circumstances.  There must be some nicer graph plotting
    packages out there...





-------------------------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems?  Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now >> http://get.splunk.com/
_______________________________________________
Ganglia-developers mailing list
Ganglia-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-developers

--- End Message ---