We had a similar problem a few weeks ago, except that our gmetad never seemed to recover. It kept crashing and had to be restarted manually almost daily. I enabled debug output to syslog but got no indication of what was failing -- it just quit!
At the time, we were in the process of consolidating our gmetads onto a single server (we have three clusters being monitored, and each had its own gmetad and web interface). After the migration to the new server, the problem went away, so we never followed up. The gmetad we had problems with had worked reliably for nearly a year before the problems began. Once the problem started, it occurred reliably (nearly every night). I could re-enable the interface if it might help to resolve a bigger problem.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Ben Hartshorne
Sent: Monday, January 23, 2006 6:52 PM
To: ganglia-general@lists.sourceforge.net
Subject: [Ganglia-general] intermittent blanks in graphs

Hi,

I have been running ganglia for most of the last year, quite happily. My hosts are configured to send unicast data to a single gmetad server.

Recently, large portions of the cluster's graphs have been empty. A sample is shown at:
http://cryptio.net/~ben/ganglia/blank_graphs.png
Notice that not all hosts are missing data (Burgertime, for example, has all of its data).

I thought it was due to high load, because I first noticed it while the gmetad server was being hammered by a separate process. But the server has long since recovered, and the graphs have not; in fact, they have gotten worse.

I was running 3.0.1 and tried upgrading to 3.0.2 on the off chance it would fix something, but it did not. I have since downgraded the web UI because I had made some changes[*] and I don't want to spend the time to migrate them just now. :)

When I go into the page for a single host and click on the 'gmetrics' link, I find that all of my metrics have a record of being received within the last two minutes (my time period). And yet, their graphs show up empty.

Any thoughts? What logs should I be looking at? I am running on a Fedora Core 3 system, with version 3.0.1 (now 3.0.2).
I don't think I've made any gross changes to the environment within the last week, which is the period in which all this annoyance started. The only thing I can say is that the beginning of this strangeness coincides with a brief (12-hour) period of intense load on the gmetad server.

Thanks,
-ben

[*] For those interested: I added an 8-hour and a 3-day view; I find the 8-hour view by far the most useful. I also changed the size of the graphs to fit my 20" screen. Finally, I added a Disk summary graph alongside the Load, CPU, Memory, and Network graphs. Is there any interest in patching these into the source?

--
Ben Hartshorne
email: [EMAIL PROTECTED]
http://ben.hartshorne.net