Matt,
A lot of network switches won't support multicast out of the box. It's worth speaking to your network folks about this; they would have to make some changes to the port settings to get it working (at least that's what they had to do here). Glad you got it working... by the way, unicast is a better approach if you can do it anyway.
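If you ever want to chase the multicast problem down later, a couple of quick checks can show whether gmond actually joined the group and whether the traffic makes it across the switch (the group and port are taken from your config; eth0 is just a guess, substitute whatever ifconfig shows):

# list the multicast groups each interface has joined;
# 239.2.11.71 should appear once gmond is running
netstat -g
cat /proc/net/igmp

# watch for gmond traffic from the other nodes; if the senders are
# running but nothing shows up here, the switch is dropping it
tcpdump -i eth0 host 239.2.11.71 and port 8649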
Paul


Matt Klaric wrote:

Changing to unicast as described below fixed the problem.
Any ideas why multicast isn't working?  I know multicast is compiled
into the kernel and ifconfig shows that my interface is capable of it. Thanks for all the help.
--Matt


On Mon, 2005-05-23 at 19:17, Ian Cunningham wrote:
Matt Klaric,

It seems as though multicast may not be working on your network. I
would suggest trying unicast if you cannot enable multicast. Your
gmond.conf would look more like this:
cluster {
 name = "Geo Cluster"
}
udp_send_channel {
 host = 192.168.7.10
 port = 8649
}
udp_recv_channel {
 port = 8649
}
tcp_accept_channel {
 port = 8649
}
And then you would remove the udp_recv_channel section on all the other nodes.
This is how I run things and it seems to work ok.
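So on the other nodes the same file, minus the udp_recv_channel block, would look roughly like this:
cluster {
 name = "Geo Cluster"
}
udp_send_channel {
 host = 192.168.7.10
 port = 8649
}
tcp_accept_channel {
 port = 8649
}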

Ian


Matt Klaric wrote:
Thanks for the feedback.  I have modified my configuration as
suggested.  I now have only one of my machines as the data source.  The
problem that I now have is that only one node shows up in the web
interface.  That's why I had previously added each node as a data
source.
Is there something wrong with my configuration?  I've attached the
relevant portions of the files below.
Thanks,
Matt

##############

gmetad.conf:
data_source "g" 192.168.7.10

##############

gmond.conf on 192.168.7.10:
globals {
 setuid = yes
 user = nobody
 cleanup_threshold = 300 /* secs */
}
cluster {
 name = "Geo Cluster"
}
udp_send_channel {
 mcast_join = 239.2.11.71
 port = 8649
}
udp_recv_channel {
 mcast_join = 239.2.11.71
 port = 8649
 bind = 239.2.11.71
}
tcp_accept_channel {
 port = 8649
}

####################

gmond.conf on all others:
globals {
 mute = "no"
 deaf = "yes"
 debug_level = "0"
 setuid = "yes"
 user="nobody"
 gexec = "no"
 host_dmax = "0"
}
cluster {
 name = "Geo Cluster"
}
udp_send_channel {
 mcast_join = 239.2.11.71
 port = 8649
}
udp_recv_channel {
 mcast_join = 239.2.11.71
 port = 8649
 bind = 239.2.11.71
}
tcp_accept_channel {
 port = 8649
}


On Mon, 2005-05-23 at 15:01, Paul Henderson wrote:
The way I would do it is this:

1. Define only one data source.
2. In /etc/gmond.conf on the four systems that are not data sources, set "mute = no" and "deaf = yes" in the global variables section, i.e.:

/* global variables */
globals {
 mute = "no"
 deaf = "yes"
 debug_level = "0"
 setuid = "yes"
 user="nobody"
 gexec = "no"
 host_dmax = "0"
}


Ian Cunningham wrote:

Matt Klaric,

I am seeing this same problem as well. There seems to be a problem with how gmetad computes the summaries for each grid. It appears to reset its count of machines on each processing loop, so when the front end asks for data before the counting has finished, you get incomplete numbers for the grid summary. The odd thing is that the cluster summaries work just fine.

As an aside, I noticed that you are using each machine in your cluster as a data_source. Normally you would have just one machine in the cluster as the data source, plus one or two backup nodes for redundancy. If multicast were working, all your nodes would share information on the multicast channel; this is all defined in your gmond.conf. Using the data in your example, I would suggest that your gmetad config look more like:

data_source "foo" 192.168.7.10 192.168.7.11

This way you do not need to define every node in the cluster in the gmetad config file (as separate clusters).

Ian

Matt Klaric wrote:

I've installed Ganglia v3.0.1 and set up the web interface to gmetad. I've set this up on a small cluster of 5 machines using the default configuration for gmond generated by the command 'gmond -t', and I've put this config file on all the nodes (roughly as sketched after the gmetad.conf below). Then I set up my gmetad.conf file as follows:
data_source "a" 192.168.7.10
data_source "b" 192.168.7.11
data_source "c" 192.168.7.12
data_source "d" 192.168.7.13
data_source "e" 192.168.7.14
gridname "foo"
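In case the exact steps matter, I generated and pushed the gmond config out roughly like this (the /etc/gmond.conf path and the scp step are from memory):

# dump the default gmond configuration and install it locally
gmond -t > /etc/gmond.conf
# copy the same file to the other four nodes, e.g.
scp /etc/gmond.conf 192.168.7.11:/etc/gmond.conf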

When I look at the web interface for Ganglia I notice that the image showing the number of CPUs in the cluster is not accurate. It oscillates up and down over time even though nodes are not being added to or removed from the cluster; it reports anywhere from 8 to 14 CPUs when there are really 20 CPUs in the 5 boxes. (The text to the left of this image does correctly indicate 20 CPUs in 5 hosts.) Additionally, the "Total In-core Memory" shown for the cluster on this interface is lower than the sum of the RAM in all the boxes and also varies over time.

However, if I look at the stats for any one node in the cluster, the values are correct and constant over time.

Has anyone seen these kinds of problems? How have you addressed them?

Thanks,
Matt

