Changing to unicast as described below fixed the problem.
Any ideas why multicast isn't working? I know multicast is compiled
into the kernel and ifconfig shows that my interface is capable of it.
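For reference, here is what I know to check (eth0 below is just a
placeholder for the actual interface; a missing multicast route seems to be
a common culprit):
netstat -g                              # multicast groups the kernel has joined
ip maddr show dev eth0                  # same information via iproute2
route -n                                # is there a route covering 239.0.0.0/8?
route add -host 239.2.11.71 dev eth0    # add one explicitly if it is missing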
Thanks for all the help.
--Matt
On Mon, 2005-05-23 at 19:17, Ian Cunningham wrote:
> Matt Klaric,
>
> It seems as though multicast may not be working on your network. I
> would suggest trying unicast if you cannot enable multicast. Your
> gmond.conf would then look more like this (192.168.7.10 being the node
> that gmetad polls):
> cluster {
> name = "Geo Cluster"
> }
> udp_send_channel {
> host = 192.168.7.10
> port = 8649
> }
> udp_recv_channel {
> port = 8649
> }
> tcp_accept_channel {
> port = 8649
> }
> And then you would remove the udp_recv_channel on all the other nodes;
> a sketch of what that leaves is below. This is how I run things and it
> seems to work OK.
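>
> The gmond.conf on the other four nodes would then look roughly like this
> (same port, still sending to 192.168.7.10):
> cluster {
> name = "Geo Cluster"
> }
> udp_send_channel {
> host = 192.168.7.10
> port = 8649
> }
> tcp_accept_channel {
> port = 8649
> }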
>
> Ian
>
>
> Matt Klaric wrote:
> > Thanks for the feedback. I have modified my configuration as
> > suggested, so now only one of my machines is the data source. The
> > problem now is that only one node shows up in the web interface;
> > that's why I had previously added each node as a data source.
> >
> > Is there something wrong with my configuration? I've attached the
> > relevant portions of the files below.
> >
> > Thanks,
> > Matt
> >
> > ##############
> >
> > gmetad.conf:
> > data_source "g" 192.168.7.10
> >
> > ##############
> >
> > gmond.conf on 192.168.7.10:
> > globals {
> > setuid = yes
> > user = nobody
> > cleanup_threshold = 300 /*secs */
> > }
> > cluster {
> > name = "Geo Cluster"
> > }
> > udp_send_channel {
> > mcast_join = 239.2.11.71
> > port = 8649
> > }
> > udp_recv_channel {
> > mcast_join = 239.2.11.71
> > port = 8649
> > bind = 239.2.11.71
> > }
> > tcp_accept_channel {
> > port = 8649
> > }
> >
> > ####################
> >
> > gmond.conf on all others:
> > globals {
> > mute = "no"
> > deaf = "yes"
> > debug_level = "0"
> > setuid = "yes"
> > user="nobody"
> > gexec = "no"
> > host_dmax = "0"
> > }
> > cluster {
> > name = "Geo Cluster"
> > }
> > udp_send_channel {
> > mcast_join = 239.2.11.71
> > port = 8649
> > }
> > udp_recv_channel {
> > mcast_join = 239.2.11.71
> > port = 8649
> > bind = 239.2.11.71
> > }
> > tcp_accept_channel {
> > port = 8649
> > }
> >
> >
> > On Mon, 2005-05-23 at 15:01, Paul Henderson wrote:
> >
> > > The way I would do it is this:
> > >
> > > Define only one data source in gmetad.conf. Then, in /etc/gmond.conf
> > > on the four systems that are not data sources, set "mute = no" and
> > > "deaf = yes" in the global variables section, i.e.:
> > >
> > > /* global variables */
> > > globals {
> > > mute = "no"
> > > deaf = "yes"
> > > debug_level = "0"
> > > setuid = "yes"
> > > user="nobody"
> > > gexec = "no"
> > > host_dmax = "0"
> > > }
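> > >
> > > On the one system that is the data source, you would leave deaf at its
> > > default of "no" so it still hears the metrics the other four send. A
> > > rough sketch of that node's globals section (same options, only deaf
> > > flipped):
> > >
> > > /* global variables on the data-source node */
> > > globals {
> > > mute = "no"
> > > deaf = "no"
> > > debug_level = "0"
> > > setuid = "yes"
> > > user="nobody"
> > > gexec = "no"
> > > host_dmax = "0"
> > > }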
> > >
> > >
> > > Ian Cunningham wrote:
> > >
> > >
> > > > Matt Klaric,
> > > >
> > > > I am seeing this same problem as well. There seems to be a problem
> > > > with how gmetad computes the summaries for each grid. It appears to
> > > > reset its count of machines on each processing loop, and when the
> > > > front end asks for data before that counting has finished, you get
> > > > incomplete numbers for the grid summary. The odd thing is that
> > > > cluster summaries work just fine.
> > > >
> > > > As an aside, I notice that you are using each machine in your cluster
> > > > as a data_source. Normally you would have just one machine in the
> > > > cluster as the data source, plus one or two backup nodes for
> > > > redundancy. If you were using multicast, all your nodes would share
> > > > information on the multicast channel; this is all defined in your
> > > > gmond.conf. Using the data in your example, I would suggest that your
> > > > gmetad.conf look more like:
> > > >
> > > > data_source "foo" 192.168.7.10 192.168.7.11
> > > >
> > > > This way you do not need to define every node in the cluster in the
> > > > gmetad config file (as separate clusters).
> > > >
> > > > Ian
> > > >
> > > > Matt Klaric wrote:
> > > >
> > > >
> > > > > I've installed Ganglia v3.0.1 and set up the web interface to gmetad.
> > > > > I've set this up on a small cluster of 5 machines using the default
> > > > > gmond configuration generated with 'gmond -t', and I've put that
> > > > > config file on all the nodes.
> > > > > Then I set up my gmetad.conf file as follows:
> > > > > data_source "a" 192.168.7.10
> > > > > data_source "b" 192.168.7.11
> > > > > data_source "c" 192.168.7.12
> > > > > data_source "d" 192.168.7.13
> > > > > data_source "e" 192.168.7.14
> > > > > gridname "foo"
> > > > >
> > > > > When I look at the Ganglia web interface, I notice that the graph
> > > > > showing the number of CPUs in the cluster is not accurate. It
> > > > > oscillates up and down over time even though no nodes are being
> > > > > added to or removed from the cluster, reporting anywhere from 8 to
> > > > > 14 CPUs when there are really 20 CPUs across the 5 boxes. (The text
> > > > > to the left of this graph does correctly indicate 20 CPUs in 5
> > > > > hosts.)
> > > > > Additionally, the "Total In-core Memory" shown for the cluster is
> > > > > lower than the sum of the RAM in all the boxes, and it also varies
> > > > > over time.
> > > > > However, if I look at the stats for any one node in the cluster, the
> > > > > values are correct and constant over time.
> > > > > Has anyone seen these kinds of problems? How have you addressed them?
> > > > >
> > > > > Thanks,
> > > > > Matt
> > > > >