On Tue, 2006-01-24 at 16:46 -0500, Rick Mohr wrote:
> On Mon, 23 Jan 2006, Ben Hartshorne wrote:
> 
> <snip>
> > When I go into the page for a single host and click on the 'gmetrics'
> > link, I find that all of my metrics have a record of being received
> > within the last two minutes (my time period).  And yet, their graphs
> > show up empty.
> >
> > Any thoughts?  What logs should I be looking at?
> </snip>
> 
> If I am not mistaken, the values shown on the 'gmetrics' page are just the
> current values extracted from the XML that is retrieved from the gmetad
> process. The graphs, however, come from rrdtool and are generated from the
> data stored in the round-robin database (rrd) files.
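To illustrate the distinction, here is a minimal Python sketch that reads the kind of XML gmetad serves (by default on its xml_port, 8651; check gmetad.conf) and pulls out one host's current metric values. The sample XML, its reduced attribute set and the helper name current_metrics are assumptions for illustration, and TN is taken to mean seconds since the metric was last reported. Nothing here reads the rrd files, which is why fresh values on the 'gmetrics' page do not prove the graphs have data behind them.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified sample of gmetad's XML output (assumed schema;
# real output carries more attributes and many more metrics).
SAMPLE_XML = """<GANGLIA_XML VERSION="3.0.2" SOURCE="gmetad">
  <GRID NAME="unspecified" LOCALTIME="1138138250">
    <CLUSTER NAME="Cluster Name" LOCALTIME="1138138250">
      <HOST NAME="hostname" REPORTED="1138138243" TN="7">
        <METRIC NAME="load_one" VAL="0.42" TYPE="float" UNITS="" TN="12" TMAX="70"/>
        <METRIC NAME="mem_free" VAL="123456" TYPE="uint32" UNITS="KB" TN="30" TMAX="180"/>
      </HOST>
    </CLUSTER>
  </GRID>
</GANGLIA_XML>"""


def current_metrics(xml_text, host_name):
    """Hypothetical helper: {metric_name: (value_string, seconds_since_report)} for one host."""
    root = ET.fromstring(xml_text)
    result = {}
    # iter() walks the whole tree, so the GRID/CLUSTER nesting depth does not matter.
    for host in root.iter("HOST"):
        if host.get("NAME") != host_name:
            continue
        for metric in host.iter("METRIC"):
            result[metric.get("NAME")] = (metric.get("VAL"), int(metric.get("TN")))
    return result


if __name__ == "__main__":
    for name, (val, tn) in sorted(current_metrics(SAMPLE_XML, "hostname").items()):
        print(f"{name} = {val} (reported {tn}s ago)")
```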
> 
> Is it possible these rrd files are missing some information? I have never
> seen it personally, but I suppose a case could arise where gmetad has
> accurate current values but, for some reason, they are not being written
> to the rrd files. You can always run rrdtool by hand to dump the rrd files
> for the metrics that appear to have gaps, then check whether the data is
> actually there.
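A minimal sketch of that check, assuming the file was first converted with `rrdtool dump metric_name.rrd > metric_name.xml`. The sample dump below is hand-written and trimmed to the elements the code reads (real dumps also carry a header, data-source definitions and several RRAs), and nan_rows_per_rra is a hypothetical helper, not part of rrdtool. Gaps show up as rows whose value is NaN.

```python
import math
import xml.etree.ElementTree as ET

# Hand-written stand-in for `rrdtool dump` output, reduced to what is parsed below.
SAMPLE_DUMP = """<rrd>
  <lastupdate>1138138243</lastupdate>
  <rra>
    <cf> AVERAGE </cf>
    <pdp_per_row> 1 </pdp_per_row>
    <database>
      <row><v> 4.2000000000e-01 </v></row>
      <row><v> NaN </v></row>
      <row><v> NaN </v></row>
      <row><v> 3.9000000000e-01 </v></row>
    </database>
  </rra>
</rrd>"""


def nan_rows_per_rra(dump_xml):
    """Hypothetical helper: one (cf, pdp_per_row, nan_rows, total_rows) tuple per RRA."""
    root = ET.fromstring(dump_xml)
    summary = []
    for rra in root.findall("rra"):
        rows = rra.findall("database/row")
        # A row counts as a gap if any of its data-source values is NaN.
        # float() tolerates the surrounding whitespace rrdtool emits.
        nan_rows = sum(
            1 for row in rows
            if any(math.isnan(float(v.text)) for v in row.findall("v"))
        )
        cf = rra.findtext("cf").strip()
        pdp_per_row = int(rra.findtext("pdp_per_row"))
        summary.append((cf, pdp_per_row, nan_rows, len(rows)))
    return summary


if __name__ == "__main__":
    print(nan_rows_per_rra(SAMPLE_DUMP))
```

Against a real file you would pass in the dump text instead, for example nan_rows_per_rra(open("metric_name.xml").read()).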

I have seen gaps sometimes.  They almost always happen when gmetad gets
data from a cluster with exactly the same timestamp as the previous
update.  Look in your system logs for gmetad errors like:

/usr/sbin/gmetad[7695]: RRD_update (/var/lib/ganglia/rrds/Cluster Name/hostname/metric_name.rrd): illegal attempt to update using time 1138138243 when last update time is 1138138243 (minimum one second step)

When gmetad calls the RRD_update function, it will fail if the current
timestamp is the same as or earlier than the last update that was made
to the rrd file.  Older versions of gmetad would bail out and completely
stop that update cycle, which could even cause gaps in unrelated
clusters.  I believe newer versions are more tolerant of this error and
will try to continue, though.
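The failure mode can be modelled in a few lines of Python. This is a toy, not gmetad's or rrdtool's code: FakeRRD and update_cycle are hypothetical names, and the error text only mimics the log line above. It shows why abandoning the cycle on the first rejected update leaves the remaining files without a sample, while skipping the bad file and continuing confines the gap to that one file.

```python
from dataclasses import dataclass, field


@dataclass
class FakeRRD:
    """Hypothetical stand-in for one .rrd file; it only tracks samples and last update."""
    path: str
    last_update: int = 0
    samples: list = field(default_factory=list)

    def update(self, timestamp, value):
        # Same rule as in the log message: each update must be at least
        # one second later than the previous one.
        if timestamp <= self.last_update:
            raise ValueError(
                f"illegal attempt to update using time {timestamp} when last "
                f"update time is {self.last_update} (minimum one second step)"
            )
        self.last_update = timestamp
        self.samples.append((timestamp, value))


def update_cycle(rrds, timestamp, values, stop_on_error):
    """Apply one polling cycle; return the paths whose update was rejected."""
    failed = []
    for rrd in rrds:
        try:
            rrd.update(timestamp, values[rrd.path])
        except ValueError as err:
            failed.append(rrd.path)
            print(f"RRD_update ({rrd.path}): {err}")
            if stop_on_error:
                # Old behaviour: the rest of the cycle is silently skipped,
                # so files later in the list get no sample either.
                break
    return failed


if __name__ == "__main__":
    values = {"a.rrd": 1.0, "b.rrd": 2.0}
    for stop in (True, False):
        a, b = FakeRRD("a.rrd", last_update=100), FakeRRD("b.rrd", last_update=50)
        update_cycle([a, b], 100, values, stop_on_error=stop)
        print(f"stop_on_error={stop}: b.rrd samples = {b.samples}")
```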

> Also, you could use rrdtool to generate the exact same graph that is shown on
> the web page for one of these metrics and write it straight to a file. Then
> you could compare that with the image seen on the web page (to check for the
> unlikely event that the generated image is fine but the web server is
> messing something up).
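A hedged sketch of building that command from Python. The data-source name "sum", the one-hour time range and the line style are assumptions (check the DS name with `rrdtool info`), so the output will not match the web page pixel for pixel; for a true side-by-side comparison, copy the exact rrdtool arguments the web frontend uses. graph_command and render are hypothetical helper names.

```python
import shutil
import subprocess


def graph_command(rrd_path, out_png, start="-3600", ds="sum", cf="AVERAGE"):
    """Hypothetical helper: build an `rrdtool graph` argument list for one metric.

    ds="sum" is an assumption about the data-source name in gmetad's
    per-metric .rrd files; confirm it with `rrdtool info`.
    """
    return [
        "rrdtool", "graph", out_png,
        "--start", start,
        "--end", "now",
        f"DEF:v={rrd_path}:{ds}:{cf}",  # read data source `ds`, consolidated with `cf`
        "LINE2:v#0000FF",               # draw it as a 2-pixel blue line
    ]


def render(rrd_path, out_png):
    """Run the command if rrdtool is on PATH; return True on success."""
    if shutil.which("rrdtool") is None:
        return False
    # Passing a list (no shell) keeps paths with spaces, like "Cluster Name", intact.
    result = subprocess.run(graph_command(rrd_path, out_png),
                            capture_output=True, text=True)
    if result.returncode != 0:
        print(result.stderr)
    return result.returncode == 0


if __name__ == "__main__":
    path = "/var/lib/ganglia/rrds/Cluster Name/hostname/metric_name.rrd"
    print(graph_command(path, "/tmp/metric_name.png"))
```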
> 
> These are just kind of guesses, but maybe one of them will reveal some info.
> 
> -- Rick
> 
> --------------------------
> Rick Mohr
> Systems Developer
> Ohio Supercomputer Center
> 
> _______________________________________________
> Ganglia-general mailing list
> Ganglia-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/ganglia-general
> 
-- 
/------------------------------------------------------------------\
|  Jason A. Smith                          Email:  [EMAIL PROTECTED] |
|  Atlas Computing Facility, Bldg. 510M    Phone:  (631)344-4226   |
|  Brookhaven National Lab, P.O. Box 5000  Fax:    (631)344-7616   |
|  Upton, NY 11973-5000                                            |
\------------------------------------------------------------------/
