We had a similar problem a few weeks ago, except that our gmetad never
seemed to recover.  It was crashing, and had to be restarted manually
almost daily.  I enabled the debug output to syslog, but received no
indication of what was failing -- it just quit!

At the time, we were in the process of consolidating our gmetads to a
single server (we have three clusters being monitored, and each had its
own gmetad and web interface).  Following the migration to the new
server, the problem went away so we never followed up.

The gmetad we had problems with worked reliably for nearly a year before
the problems began.  Once the problem started, it occurred reliably
(nearly every night).  I could reenable the interface if it might help
to resolve a bigger problem.

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]] On Behalf Of Ben Hartshorne
Sent: Monday, January 23, 2006 6:52 PM
To: ganglia-general@lists.sourceforge.net
Subject: [Ganglia-general] intermittent blanks in graphs

Hi,

I have been running ganglia for most of the last year, quite happily.
My hosts are configured to send unicast data to a single gmetad server.

Recently, large portions of the cluster's graphs have been empty.  A sample is
shown at http://cryptio.net/~ben/ganglia/blank_graphs.png  Notice that
not all hosts are missing data (Burgertime, for example, has all the
data there).

I thought it was due to high load, because I first noticed it when the
gmetad server was being hammered by a separate process.  But the server
has long since recovered and the graphs have not; in fact, they have
gotten worse.

I was running 3.0.1, and tried upgrading to 3.0.2 on the off chance it
would fix something, but it did not.  I have since downgraded the webui
because I have made some changes[*] and I don't want to spend the time
to migrate them just now.  :)

When I go into the page for a single host and click on the 'gmetrics'
link, I find that all of my metrics have a record of being received
within the last two minutes (my time period).  And yet, their graphs
show up empty.
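
One way to narrow down the symptom above (metrics arriving, graphs
empty) is to check whether gmetad is still writing to its RRD files.
What follows is only a minimal sketch, not part of Ganglia: the helper
name find_stale_rrds is hypothetical, the 120-second threshold mirrors
the two-minute period mentioned above, and the directory
/var/lib/ganglia/rrds is assumed to be the rrd_rootdir (check your
gmetad.conf if it has been changed).

```python
import os
import time


def find_stale_rrds(rrd_root, max_age_seconds=120, now=None):
    """Return sorted (path, age_in_seconds) pairs for .rrd files under
    rrd_root whose modification time is older than max_age_seconds.

    Hypothetical helper for illustration; it only inspects file mtimes
    and does not read the RRD contents.
    """
    now = time.time() if now is None else now
    stale = []
    for dirpath, _dirnames, filenames in os.walk(rrd_root):
        for name in filenames:
            if not name.endswith(".rrd"):
                continue  # ignore anything that is not an RRD file
            path = os.path.join(dirpath, name)
            age = now - os.path.getmtime(path)
            if age > max_age_seconds:
                stale.append((path, age))
    return sorted(stale)
```

Calling find_stale_rrds("/var/lib/ganglia/rrds") (path assumed, see
above) lists the files that have not been touched recently.  If the
hosts with blank graphs show up there, the data is reaching gmetad but
not being written to disk; if nothing is stale, the problem is more
likely on the graphing side.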

Any thoughts?  What logs should I be looking at?  

I am running on a Fedora Core 3 system, with version 3.0.1 (now 3.0.2).
I don't think I've made any gross changes to the environment within the
last week, which is the time period in which all this annoyance has
started.  The only thing I can say is that the beginning of this
strangeness coincides with a brief (12-hr) period of intense load on the
gmetad server.

Thanks,

-ben

[*] for those interested - I added an 8-hour and 3-day view; I find the
8-hour view the most useful by far.  I also changed the size of the
graphs to fit my 20" screen.  Finally, I added a Disk summary graph, in
addition to the Load, CPU, Memory, and Network.  Is there any interest
in patching these into the source?

-- 
Ben Hartshorne
email: [EMAIL PROTECTED]
http://ben.hartshorne.net

