The number of cpus does get sorted out, but I don't believe that restarting 'gmond' is a solution. The problem occurs after restarting a number of 'gmond' processes, and it arises because 'gmond' is not reporting the information. Does 'gmond' maintain a timestamp on disk as to when it last reported the number of cpus and insist on waiting sufficiently long before reporting again? Does the collective distributed memory of the system remember when the number of cpus was last reported but not remember what the last reported value was? Is there any chance that anyone can give me hints as to how the code works without my having to read the code and reverse engineer the intent?

I understand that I can group nodes via /etc/gmond.conf. The question is, once I have screwed up the configuration, how do I recover from that screw-up? I have restarted various gmetad's and various gmond's. The grouping is still incorrect. Exactly which gmetad's and gmond's do I have to shut down, and when? And, again, my real question is about understanding how the code works -- how the distributed memory works.

I'd much rather be ignored than have people try to pawn off facile answers on me.

Cheers, Chuck



Bernard Li wrote:

Hi Chuck:

For the first issue - give it time, it should sort itself out. Alternatively, you can find out which node is reporting incorrect information, and restart gmond on it.

For the second issue, you can group nodes in different data_source via the multicast port in /etc/gmond.conf. Use the same port # for nodes you want belonging to the same group. You'll need to restart gmetad and gmond for the new groupings to take effect.

Cheers, Bernard
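
The grouping rule above can be checked mechanically before restarting anything. The following is only a sketch, not part of Ganglia: it assumes (as I understand multicast mode) that gmond daemons sharing one multicast address and port hear each other's hosts, so two clusters planned on the same channel would show up merged. The function name and the `plan` layout are hypothetical, and the addresses and ports are example values.

```python
from collections import defaultdict


def clusters_sharing_a_channel(plan):
    """Find multicast channels that more than one cluster is planned to use.

    plan maps a cluster name to the (multicast_address, port) pair that the
    gmond daemons of that cluster are configured with (hypothetical layout).
    Returns {(address, port): [cluster names]} for every shared channel.
    """
    by_channel = defaultdict(set)
    for cluster, channel in plan.items():
        by_channel[channel].add(cluster)
    return {
        channel: sorted(names)
        for channel, names in by_channel.items()
        if len(names) > 1
    }
```

An empty result means every cluster in the plan has its own channel; a non-empty result names the clusters that would need different port numbers.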

------------------------------------------------------------------------
*From:* [EMAIL PROTECTED] on behalf of Chuck Simmons
*Sent:* Wed 22/03/2006 17:54
*To:* ganglia-developers@lists.sourceforge.net
*Subject:* [Ganglia-developers] reorganizing clusters

I need help understanding two things.

I currently have a grid.  One of the clusters in the grid is named
"staiu" and the "grid" level web page reports that this has 8 hosts
containing 4 cpus.  In actuality, this has 8 hosts each containing 4
cpus, but apparently the hosts are not reporting the current number of
cpus to the front end.  Why not?  I recently restarted gmond on each of
the 8 hosts.

Another cluster is named "staqp05-08" and the "grid" level web page
reports that this has 12 hosts.  In actual fact, it only has 4 hosts.
The extra 8 hosts are the 8 hosts of the 'staiu' cluster.  On the
cluster level page for staqp05-08, the "choose a node" pull down menu
lists the 8 staiu hosts, and the "hosts up" number contains the staiu
hosts, and there are undrawn graphs for each of the staiu hosts in the
"load one" section.  What do I have to do so that the web pages or gmond
daemons or whatever won't think that the staqp cluster contains the
staiu hosts?  What is the specific mechanism that causes this
association to persist despite having shut down all staqp gmond daemons
and both the gmond and gmetad daemons on the web server, simultaneously,
and then starting up that collection of daemons?

Thanks, Chuck
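
One way to see what a particular gmond currently believes, independent of the web front end, is to read the XML it writes to any TCP client and list the hosts and their cpu_num values. The sketch below assumes the daemon answers on TCP port 8649 (the usual default, which may differ in a given gmond.conf) and that the dump uses CLUSTER, HOST and METRIC elements with NAME and VAL attributes. The function names are hypothetical.

```python
import socket
import xml.etree.ElementTree as ET
from collections import defaultdict


def fetch_gmond_xml(host, port=8649, timeout=5.0):
    """Read the XML dump that gmond sends to a TCP client before closing."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # gmond closes the connection when the dump is done
                break
            chunks.append(data)
    return b"".join(chunks)


def summarize(xml_data):
    """Return {cluster name: {host name: cpu_num or None}} from a dump."""
    root = ET.fromstring(xml_data)
    summary = defaultdict(dict)
    for cluster in root.iter("CLUSTER"):
        for host in cluster.iter("HOST"):
            cpus = None  # stays None if the host has no cpu_num metric yet
            for metric in host.iter("METRIC"):
                if metric.get("NAME") == "cpu_num":
                    value = metric.get("VAL")
                    cpus = int(value) if value is not None else None
            summary[cluster.get("NAME")][host.get("NAME")] = cpus
    return dict(summary)
```

Calling `summarize(fetch_gmond_xml(name))` for one node of each cluster would show which hosts that daemon has heard from and which of them have not yet reported cpu_num, which narrows down where a stale association is being held.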


_______________________________________________
Ganglia-developers mailing list
Ganglia-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-developers
