Re: [Ganglia-developers] reorganizing clusters

Chuck Simmons Thu, 23 Mar 2006 11:09:48 -0800

The number of cpus does get sorted out, but I don't believe that restarting 'gmond' is a solution. The problem occurs after restarting a number of 'gmond' processes, and the problem is caused because 'gmond' is not reporting the information. Does 'gmond' maintain a timestamp on disk as to when it last reported the number of cpus and insist on waiting sufficiently long to report again? Does the collective distributed memory of the system remember when the number of cpus was last reported but not remember what the last reported value was? Is there any chance that anyone can give me hints to how the code works without me having to read the code and reverse engineer the intent?

I understand that I can group nodes via /etc/gmond.conf. The question is, once I have screwed up the configuration, how do I recover from that screw-up? I have restarted various gmetad's and various gmond's. The grouping is still incorrect. Exactly which gmetad's and gmond's do I have to shut down, and when? And, again, my real question is about understanding how the code works -- how the distributed memory works.

I'd much rather be ignored than have people try to pawn off facile answers on me.


Cheers, Chuck



Bernard Li wrote:

Hi Chuck:

For the first issue - give it time, it should sort itself out. Alternatively, you can find out which node is reporting incorrect information, and restart gmond on it.

For the second issue, you can group nodes in different data_source via the multicast port in /etc/gmond.conf. Use the same port # for nodes you want belonging to the same group.

You'll need to restart gmetad and gmond for the new groupings to take effect.

Cheers,
Bernard
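
One way to find out which node is reporting what is to read the XML dump that gmond writes to any client connecting to its TCP port (8649 by default) and list the hosts and cpu_num values under each cluster. The following is only a rough sketch under that assumption, not part of Ganglia itself; the function names and the host name in the usage note are made up for illustration.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_gmond_xml(host, port=8649, timeout=5.0):
    """Read the whole XML dump gmond sends when a client connects.

    Assumes gmond is accepting TCP connections on `port`
    (8649 is the usual default; adjust to your gmond.conf).
    """
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # gmond closes the connection after the dump
                break
            chunks.append(data)
    return b"".join(chunks)


def summarize_clusters(xml_bytes):
    """Return {cluster name: {host name: cpu_num value or None}}."""
    root = ET.fromstring(xml_bytes)
    summary = {}
    # iter() also finds CLUSTER elements nested inside a GRID element,
    # which is how gmetad wraps them.
    for cluster in root.iter("CLUSTER"):
        hosts = summary.setdefault(cluster.get("NAME"), {})
        for host in cluster.iter("HOST"):
            cpu = None
            for metric in host.iter("METRIC"):
                if metric.get("NAME") == "cpu_num":
                    cpu = metric.get("VAL")
            hosts[host.get("NAME")] = cpu
    return summary
```

For example, summarize_clusters(fetch_gmond_xml("some-staqp-node")) (hypothetical host name) shows whether a gmond in one cluster still lists hosts from another, and which hosts currently have no cpu_num value at all.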
------------------------------------------------------------------------
*From:* [EMAIL PROTECTED] on behalf of Chuck Simmons
*Sent:* Wed 22/03/2006 17:54
*To:* ganglia-developers@lists.sourceforge.net
*Subject:* [Ganglia-developers] reorganizing clusters

I need help understanding two things.

I currently have a grid.  One of the clusters in the grid is named
"staiu" and the "grid" level web page reports that this has 8 hosts
containing 4 cpus.  In actuality, this has 8 hosts each containing 4
cpus, but apparently the hosts are not reporting the current number of
cpus to the front end.  Why not?  I recently restarted gmond on each of
the 8 hosts.

Another cluster is named "staqp05-08" and the "grid" level web page
reports that this has 12 hosts. In actual fact, it only has 4 hosts. The extra 8 hosts are the 8 hosts of the 'staiu' cluster. On the
cluster level page for staqp05-08, the "choose a node" pull down menu
lists the 8 staiu hosts, and the "hosts up" number contains the staiu
hosts, and there are undrawn graphs for each of the staiu hosts in the
"load one" section.  What do I have to do so that the web pages or gmond
daemons or whatever won't think that the staqp cluster contains the
staiu hosts?  What is the specific mechanism that causes this
association to persist despite having shut down all staqp gmond daemons
and both the gmond and gmetad daemons on the web server, simultaneously,
and then starting up that collection of daemons?

Thanks, Chuck

