RE: [Ganglia-general] webfrontend questions

Steve Gilbert Fri, 24 Sep 2004 10:14:32 -0700

Thanks for the reply, guys, but I'm not sure this is going to help.
I've been restarting my gmonds over and over for a while now, and it's
not doing any good.  This memory graph has been broken for close to a
year...in fact, I don't recall that it's ever worked at all since I
first installed Ganglia.  It's not something that ever recovers.  The
graph updates with respect to time, but the lines on it never
change...even when I take down a whole cluster of systems.

All the individual graphs for the nodes and clusters are fine...so I
don't see how missing data could be causing this.  Isn't the Grid
Memory graph just built from the RRD files that already exist?

My grid currently consists of 2500+ nodes separated into 22 clusters.
Is this just maybe more than it can represent in a single graph?  Again,
the load graph works just fine...it's only the Grid Memory graph that's
not working.  Not a big deal at all, but I might look into just changing
the PHP to not display the graph at all eventually...or get rid of both
of the overview graphs altogether.
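The 4 TB ceiling described in the original question lines up with 32-bit arithmetic if memory totals are counted in kilobytes: 2^32 KB is exactly 4 TiB. The sketch below is a hypothetical illustration. It assumes (this is not confirmed anywhere in this thread) that the grid summary is accumulated in KB in an unsigned 32-bit integer. The node count and per-node memory are made-up values, and `sum_uint32` is a hypothetical helper.

```python
# Hypothetical sketch: assumes memory totals are summed in KB into an
# unsigned 32-bit integer. This is an assumption, not confirmed for gmetad.

KB_PER_TIB = 1024 ** 3      # 1 TiB = 2**40 bytes = 2**30 KB
UINT32_MAX = 2 ** 32 - 1    # largest value an unsigned 32-bit integer holds


def sum_uint32(values_kb):
    """Sum KB values, wrapping modulo 2**32 like an unsigned 32-bit counter."""
    total = 0
    for v in values_kb:
        total = (total + v) % 2 ** 32
    return total


# Largest representable total, expressed in TiB.
print((UINT32_MAX + 1) / KB_PER_TIB)  # 4.0

# Hypothetical grid: 1025 nodes with 4 GiB (4 * 1024**2 KB) each.
# The true total is just over 4 TiB, so the 32-bit sum wraps around
# and reports only the 4 GiB that overflowed the limit.
node_kb = 4 * 1024 ** 2
print(sum_uint32([node_kb] * 1025))  # 4194304
```

Under this assumption, a total that passes the limit would be reported as a small value rather than the real sum.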

Thanks again.

Steve Gilbert
Unix Systems Administrator
[EMAIL PROTECTED]

-----Original Message-----
From: Jason A. Smith [mailto:[EMAIL PROTECTED] 
Sent: Friday, September 17, 2004 7:15 AM
To: Sean Dilda
Cc: Steve Gilbert; Ganglia General
Subject: Re: [Ganglia-general] webfrontend questions

I believe this happens because gmond's multicast is UDP-based, so there
are no guarantees that all packets are always received.  If you restart
all of your gmonds at about the same time, some of the gmonds will
probably miss at least some of the flurry of data that is sent out
when all the gmonds first start up.  You should only notice this missing
data with metrics that are actually "constant" values or ones with
high time/value thresholds.  They will eventually be retransmitted after
several minutes, so you really only have to wait a little while.  You
can try restarting the gmonds, but you still might miss some data in
that initial flood of traffic.  I usually don't worry about it, since
ganglia seems to recover from the missing data after about 10 minutes.

~Jason

On Fri, 2004-09-17 at 07:42, Sean Dilda wrote:
> On Thu, 2004-09-16 at 20:20, Steve Gilbert wrote:
> > A couple of quick questions:
> > 
> >    gmetad 2.5.6
> >    webfrontend 2.5.5
> > 
> > 1. My Grid overview memory graph seems to be broken.  It shows far, 
> > far less than the total it should be.  It used to peak out at 4 TB, 
> > so I assumed it was just hitting the MAXINT for the system.  
> > However, I've recently reinstalled gmetad and the webfrontend on a 
> > 64-bit Opteron, and now my graph shows 800 GB of Total In-Core 
> > Memory (should be way higher) and nothing at all for Memory
> > Used...only a purple line (Memory Swapped) along the 0 axis.  All the
> > individual cluster and node memory graphs seem fine...just this main
> > overview is broken.  The Grid Load overview graph is working just
> > fine.  Any ideas?
> 
> I've gotten that before.  It seems that sometimes a (or several)
> gmond(s) will recognize that certain nodes are up, but be missing a 
> lot of critical information for them, such as thinking they have 0 
> cpus, or no memory used, no load, etc.  You can probably see this 
> better if you start looking at the ganglia pages for the individual
> nodes.
> Unfortunately the best fix I've found is repeatedly restarting gmetad 
> and all the gmonds until things look right again.
--
/------------------------------------------------------------------\
|  Jason A. Smith                          Email:  [EMAIL PROTECTED] |
|  Atlas Computing Facility, Bldg. 510M    Phone:  (631)344-4226   |
|  Brookhaven National Lab, P.O. Box 5000  Fax:    (631)344-7616   |
|  Upton, NY 11973-5000                                            |
\------------------------------------------------------------------/
