Thanks for the reply, guys, but I'm not sure this is going to help. I've been restarting my gmonds over and over for a while now, and it's not doing any good. This memory graph has been broken for close to a year...in fact, I don't recall that it's ever worked at all since I first installed Ganglia. It's not something that ever recovers. The graph updates with respect to time, but the lines on it never change...even when I take down a whole cluster of systems.
All the individual graphs for the nodes and clusters are fine...so I don't see how there could be missing data causing this. Isn't the Grid Memory graph just built from the RRD files that already exist? My grid currently consists of 2500+ nodes separated into 22 clusters. Is this just maybe more than it can represent in a single graph? Again, the load graph works just fine...it's only the Grid Memory graph that's not working. Not a big deal at all, but I might eventually look into changing the PHP to not display the graph at all...or get rid of both of the overview graphs altogether.

Thanks again.

Steve Gilbert
Unix Systems Administrator
[EMAIL PROTECTED]

-----Original Message-----
From: Jason A. Smith [mailto:[EMAIL PROTECTED]
Sent: Friday, September 17, 2004 7:15 AM
To: Sean Dilda
Cc: Steve Gilbert; Ganglia General
Subject: Re: [Ganglia-general] webfrontend questions

I believe this happens because gmond's multicast is UDP-based, so there is no guarantee that every packet is received. If you restart all of your gmonds at about the same time, some of the gmonds will probably miss at least some of the flurry of data that is sent out when they all first start up. You should only notice this missing data with metrics that are actually "constant" values or ones with high time/value thresholds. They will eventually be retransmitted after several minutes, so you really only have to wait a little while. You can try to restart the gmonds, but you still might miss some data in that initial flood of traffic. I usually don't worry about it, since Ganglia seems to recover from this missing data after about 10 minutes.

~Jason

On Fri, 2004-09-17 at 07:42, Sean Dilda wrote:
> On Thu, 2004-09-16 at 20:20, Steve Gilbert wrote:
> > A couple of quick questions:
> >
> > gmetad 2.5.6
> > webfrontend 2.5.5
> >
> > 1. My Grid overview memory graph seems to be broken. It shows far,
> > far less than the total it should be. It used to peak out at 4 TB,
> > so I assumed it was just hitting the MAXINT for the system.
> > However, I've recently reinstalled gmetad and the webfrontend on a
> > 64-bit Opteron, and now my graph shows 800 GB of Total In-Core
> > Memory (should be way higher) and nothing at all for Memory
> > Used...only a purple line (Memory Swapped) along the 0 axis. All
> > the individual cluster and node memory graphs seem fine...just
> > this main overview is broken. The Grid Load overview graph is
> > working just fine. Any ideas?
>
> I've gotten that before. It seems that sometimes a (or several)
> gmond(s) will recognize that certain nodes are up, but be missing a
> lot of critical information for them, such as thinking they have 0
> cpus, or no memory used, no load, etc. You can probably see this
> better if you start looking at the ganglia pages for the individual
> nodes. Unfortunately the best fix I've found is repeatedly
> restarting gmetad and all the gmonds until things look right again.

--
/------------------------------------------------------------------\
| Jason A. Smith                          Email: [EMAIL PROTECTED] |
| Atlas Computing Facility, Bldg. 510M    Phone: (631)344-4226     |
| Brookhaven National Lab, P.O. Box 5000  Fax:   (631)344-7616     |
| Upton, NY 11973-5000                                             |
\------------------------------------------------------------------/
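
For reference, the 4 TB ceiling mentioned in the original question matches what a 32-bit counter of kilobytes would give. Below is a minimal sketch of that arithmetic. It assumes per-node memory is reported in KB and that the grid total is accumulated in an unsigned 32-bit value; neither assumption is confirmed anywhere in this thread, and the node count and memory size are made-up illustrative numbers.

```python
# Assumption (not confirmed in the thread): per-node memory is in KB and
# the grid-wide total is accumulated in an unsigned 32-bit integer.
KB = 1024
UINT32_MODULUS = 2 ** 32


def wrapped_total_kb(node_mem_kb):
    """Sum per-node memory (KB) the way a 32-bit unsigned counter would."""
    total = 0
    for mem in node_mem_kb:
        total = (total + mem) % UINT32_MODULUS
    return total


def kb_to_tib(kb):
    """Convert a KB count to TiB."""
    return kb * KB / 2 ** 40


# Hypothetical grid: 2500 nodes with 4 GB each (illustrative only).
nodes = [4 * 1024 * 1024] * 2500
true_total = sum(nodes)
wrapped = wrapped_total_kb(nodes)

print(f"ceiling of a 32-bit KB counter: {kb_to_tib(UINT32_MODULUS):.1f} TiB")
print(f"true total:    {kb_to_tib(true_total):.2f} TiB")
print(f"wrapped total: {kb_to_tib(wrapped):.2f} TiB")
```

This only shows why a ceiling near 4 TB is consistent with the MAXINT guess; it says nothing about the later 800 GB reading on the 64-bit host.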

