Changing to unicast as described below fixed the problem.
Any ideas why multicast isn't working? I know multicast is compiled
into the kernel and ifconfig shows that my interface is capable of it.
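For reference, here is what I know to check (eth0 below is just a
placeholder for the actual interface; a missing multicast route seems to be
a common culprit):
netstat -g                              # multicast groups the kernel has joined
ip maddr show dev eth0                  # same information via iproute2
route -n                                # is there a route covering 239.0.0.0/8?
route add -host 239.2.11.71 dev eth0    # add one explicitly if it is missing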
Thanks for all the help.
--Matt
On Mon, 2005-05-23 at 19:17, Ian Cunningham wrote:
> Matt Klaric,
>
> It seems as though multicast may not be working on your network. I
> would suggest trying unicast if you cannot enable multicast. Your
> gmond.conf would then look more like this (192.168.7.10 being the node
> that gmetad polls):
> cluster {
> name = "Geo Cluster"
> }
> udp_send_channel {
> host = 192.168.7.10
> port = 8649
> }
> udp_recv_channel {
> port = 8649
> }
> tcp_accept_channel {
> port = 8649
> }
> And then you would remove the udp_recv_channel on all the other nodes;
> a sketch of what that leaves is below. This is how I run things and it
> seems to work OK.
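>
> The gmond.conf on the other four nodes would then look roughly like this
> (same port, still sending to 192.168.7.10):
> cluster {
> name = "Geo Cluster"
> }
> udp_send_channel {
> host = 192.168.7.10
> port = 8649
> }
> tcp_accept_channel {
> port = 8649
> }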
>
> Ian
>
>
> Matt Klaric wrote:
> > Thanks for the feedback. I have modified my configuration as
> > suggested, so now only one of my machines is the data source. The
> > problem now is that only one node shows up in the web interface;
> > that's why I had previously added each node as a data source.
> >
> > Is there something wrong with my configuration? I've attached the
> > relevant portions of the files below.
> >
> > Thanks,
> > Matt
> >
> > ##############
> >
> > gmetad.conf:
> > data_source "g" 192.168.7.10
> >
> > ##############
> >
> > gmond.conf on 192.168.7.10:
> > globals {
> > setuid = yes
> > user = nobody
> > cleanup_threshold = 300 /*secs */
> > }
> > cluster {
> > name = "Geo Cluster"
> > }
> > udp_send_channel {
> > mcast_join = 239.2.11.71
> > port = 8649
> > }
> > udp_recv_channel {
> > mcast_join = 239.2.11.71
> > port = 8649
> > bind = 239.2.11.71
> > }
> > tcp_accept_channel {
> > port = 8649
> > }
> >
> > ####################
> >
> > gmond.conf on all others:
> > globals {
> > mute = "no"
> > deaf = "yes"
> > debug_level = "0"
> > setuid = "yes"
> > user="nobody"
> > gexec = "no"
> > host_dmax = "0"
> > }
> > cluster {
> > name = "Geo Cluster"
> > }
> > udp_send_channel {
> > mcast_join = 239.2.11.71
> > port = 8649
> > }
> > udp_recv_channel {
> > mcast_join = 239.2.11.71
> > port = 8649
> > bind = 239.2.11.71
> > }
> > tcp_accept_channel {
> > port = 8649
> > }
> >
> >
> > On Mon, 2005-05-23 at 15:01, Paul Henderson wrote:
> >
> > > The way I would do it is this:
> > >
> > > Define only one data source in gmetad.conf. Then, in /etc/gmond.conf
> > > on the four systems that are not data sources, set "mute = no" and
> > > "deaf = yes" in the global variables section, i.e.:
> > >
> > > /* global variables */
> > > globals {
> > > mute = "no"
> > > deaf = "yes"
> > > debug_level = "0"
> > > setuid = "yes"
> > > user="nobody"
> > > gexec = "no"
> > > host_dmax = "0"
> > > }
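> > >
> > > On the one system that is the data source, you would leave deaf at its
> > > default of "no" so it still hears the metrics the other four send. A
> > > rough sketch of that node's globals section (same options, only deaf
> > > flipped):
> > >
> > > /* global variables on the data-source node */
> > > globals {
> > > mute = "no"
> > > deaf = "no"
> > > debug_level = "0"
> > > setuid = "yes"
> > > user="nobody"
> > > gexec = "no"
> > > host_dmax = "0"
> > > }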
> > >
> > >
> > > Ian Cunningham wrote:
> > >
> > >
> > > > Matt Klaric,
> > > >
> > > > I am seeing this same problem as well. There seems to be a problem
> > > > with how gmetad computes the summaries for each grid. It appears to
> > > > reset its count of machines on each processing loop, and when the
> > > > front end asks for data before that counting has finished, you get
> > > > incomplete numbers for the grid summary. The odd thing is that
> > > > cluster summaries work just fine.
> > > >
> > > > As an aside, I notice that you are using each machine in your cluster
> > > > as a data_source. Normally you would have just one machine in the
> > > > cluster as the data source, plus one or two backup nodes for
> > > > redundancy. If you were using multicast, all your nodes would share
> > > > information on the multicast channel; this is all defined in your
> > > > gmond.conf. Using the data in your example, I would suggest that your
> > > > gmetad.conf look more like:
> > > >
> > > > data_source "foo" 192.168.7.10 192.168.7.11
> > > >
> > > > This way you do not need to define every node in the cluster in the
> > > > gmetad config file (as separate clusters).
> > > >
> > > > Ian
> > > >
> > > > Matt Klaric wrote:
> > > >
> > > >
> > > > > I've installed Ganglia v3.0.1 and set up the web interface to gmetad.
> > > > > I've set this up on a small cluster of 5 machines using the default
> > > > > gmond configuration generated with 'gmond -t', and I've put that
> > > > > config file on all the nodes.
> > > > > Then I set up my gmetad.conf file as follows:
> > > > > data_source "a" 192.168.7.10
> > > > > data_source "b" 192.168.7.11
> > > > > data_source "c" 192.168.7.12
> > > > > data_source "d" 192.168.7.13
> > > > > data_source "e" 192.168.7.14
> > > > > gridname "foo"
> > > > >
> > > > > When I look at the Ganglia web interface, I notice that the graph
> > > > > showing the number of CPUs in the cluster is not accurate. It
> > > > > oscillates up and down over time even though no nodes are being
> > > > > added to or removed from the cluster, reporting anywhere from 8 to
> > > > > 14 CPUs when there are really 20 CPUs across the 5 boxes. (The text
> > > > > to the left of this graph does correctly indicate 20 CPUs in 5
> > > > > hosts.)
> > > > > Additionally, the "Total In-core Memory" shown for the cluster is
> > > > > lower than the sum of the RAM in all the boxes, and it also varies
> > > > > over time.
> > > > > However, if I look at the stats for any one node in the cluster, the
> > > > > values are correct and constant over time.
> > > > > Has anyone seen these kinds of problems? How have you addressed them?
> > > > >
> > > > > Thanks,
> > > > > Matt
> > > > >