[Ganglia-developers] "drill-down" user interface

ben_england Mon, 10 Aug 2009 10:10:43 -0700

Hi, 

I'm new to Ganglia, but have developed network and system management software 
in the past.


I'd like to extend Ganglia to provide more per-device information than it does 
by default, and have done so with Python plug-ins and a couple of reports in 
/srv/www/htdocs/ganglia/graph.d/. I did this because the default network and 
disk performance reports aggregate statistics. This may be useful sometimes, 
but if there is a performance bottleneck I want to be able to "drill down" and 
see: 

- which if any devices were over-utilized 
- what kind of workload they were being asked to do 

The intent is not to leave these metrics running all the time, which would be 
expensive, but to be able to turn them on when further analysis is needed. I 
defined per-device metrics, but unfortunately they flood the host-level display 
with graphs, making it almost unusable, and I couldn't see a way to select just 
the graphs for a specific device without changing the Ganglia scripts. 

- Can you give me any suggestions on how to make Ganglia do this? There is a 
"context" variable in the .php scripts that seems to indicate the level in the 
cluster hierarchy that we're interested in. Could an additional "device" level 
be added to this, perhaps? I've tried this using an additional context 
parameter in the URL, such as "d=sda" for the /dev/sda device, but haven't got 
it completely working yet. 

- How do I filter out per-device metric graphs from the host-level context 
while making the metrics available to reports? I've added a "context" field to 
the metric descriptor returned by gmond; if this is set, I don't display the 
metric unless the context matches. This prevents the "host" context from 
getting flooded with graphs. 

- Also, is there an easy way to take a set of device utilization statistics in 
a host and aggregate them using MAX instead of AVERAGE to get a per-host 
"max_utilization" metric? I saw __SUMMARY_INFO__ but I'm guessing that this is 
averaged data and it's being averaged the wrong way. For example, I get average 
disk utilization for /dev/sdb across the cluster. Frequently the average 
utilization in a set of devices is misleading, unless the load is very well 
balanced across them, in which case MAX and AVERAGE should be much closer 
anyway. 

- It was easy to add reports at the host level by changing hardcoded report 
lists in 2 places, get_context.php and templates/default/host_view.tpl. Could 
we change these to just look at the list of graph.d/*_report.php files and 
build the list of reports dynamically somehow? host_view.tpl does not have a 
place in it for PHP code, right? Why not name each report after the context 
that it supports, something like "graph.d/host_network_report.php", so the 
generic get_context code could generate a report list by searching for 
graph.d/host_*_report.php? Some of these entries could be softlinks if a .php 
graphing script supports multiple contexts. 
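
As a rough illustration of the naming idea in the last question, here is a 
sketch in Python (not the frontend's PHP, purely to show the logic). The 
function name list_reports and the helper variables are hypothetical and are 
not existing Ganglia code; the only assumption about the layout is the 
"<context>_<name>_report.php" convention proposed above.

```python
import glob
import os


def list_reports(graph_dir, context):
    """Return the report names available for one context.

    Assumes the proposed naming convention:
        <graph_dir>/<context>_<name>_report.php
    e.g. graph.d/host_network_report.php -> "network" for context "host".
    """
    prefix = context + "_"
    suffix = "_report.php"
    pattern = os.path.join(graph_dir, prefix + "*" + suffix)

    names = []
    for path in sorted(glob.glob(pattern)):
        base = os.path.basename(path)
        # Strip the context prefix and the "_report.php" suffix.
        names.append(base[len(prefix):-len(suffix)])
    return names
```

Softlinks in graph.d/ are matched like regular files, so one graphing script 
linked under several context prefixes would show up in each context's list. 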
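
To make the MAX-versus-AVERAGE question concrete, here is a small Python 
sketch with made-up utilization numbers (the device names, the values, and the 
function summarize_utilization are all hypothetical). It only shows why the 
two aggregates diverge when load is unbalanced; it does not use any Ganglia 
API. 

```python
def summarize_utilization(per_device):
    """Summarize per-device utilization (percent) for a single host.

    per_device maps a device name to its utilization percentage.
    """
    values = list(per_device.values())
    busiest = max(per_device, key=per_device.get)
    return {
        "avg_utilization": sum(values) / len(values),
        "max_utilization": max(values),
        "busiest_device": busiest,
    }


# Invented sample: one saturated disk hidden among mostly idle ones.
sample = {"sda": 5.0, "sdb": 95.0, "sdc": 5.0, "sdd": 15.0}
summary = summarize_utilization(sample)
print(summary)
```

With these invented numbers the average is 30% while the maximum is 95%, so 
the averaged view would hide the one device that is close to saturation. 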

Thanks in advance for your advice, 

-Ben England


