Re: [Ganglia-general] Nodes Showing Down Constantly

David Chin Wed, 17 Dec 2014 08:22:50 -0800

I've had a similar problem in the past. It was fixed by setting, in
gmond.conf:


     globals {
         ...
         send_metadata_interval = 60 /* secs */
     }

because I was not using multicast.
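Dave's rule of thumb can be turned into a quick check. The sketch below is a hypothetical Python helper, `metadata_interval_warning`, and not a real gmond.conf parser. It strips comments with regular expressions and flags a config whose udp_send_channel blocks have no mcast_join (that is, unicast) while send_metadata_interval is 0. It assumes that blocks are not nested and that an absent setting behaves like 0, which is the value `gmond -t` wrote into the config quoted below.

```python
import re
from typing import Optional


def metadata_interval_warning(conf_text: str) -> Optional[str]:
    """Hypothetical heuristic: warn when gmond sends over unicast but
    send_metadata_interval is 0 (or missing, assumed to mean 0)."""
    # Remove /* ... */ block comments and '#' line comments before matching.
    text = re.sub(r"/\*.*?\*/", "", conf_text, flags=re.S)
    text = re.sub(r"#[^\n]*", "", text)

    # Bodies of every udp_send_channel { ... } block (assumed not nested).
    send_blocks = re.findall(r"udp_send_channel\s*\{(.*?)\}", text, flags=re.S)
    uses_multicast = any(re.search(r"\bmcast_join\s*=", b) for b in send_blocks)

    m = re.search(r"\bsend_metadata_interval\s*=\s*(\d+)", text)
    interval = int(m.group(1)) if m else 0  # assumption: absent means 0

    if send_blocks and not uses_multicast and interval == 0:
        return ("unicast udp_send_channel with send_metadata_interval = 0; "
                "metadata will not be resent (e.g. 60 s is suggested)")
    return None
```

Passing the text of a gmond.conf (for example `open("gmond.conf").read()`) returns a warning string for a unicast setup left at 0, and `None` for the multicast config quoted further down, since its udp_send_channel uses mcast_join.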

--Dave

On Wed, Dec 17, 2014 at 11:13 AM, Jared David Baker <jared.ba...@uwyo.edu>
wrote:
>
>  Thanks Vladimir!
>
>
>
> I’ve looked at the link that you provided and the setup is very similar.
> The guide suggested that almost no configuration is needed for a multicast
> setup, which is why I didn’t change much of the initial configuration
> generated by `gmond -t > $PREFIX/etc/gmond.conf`. Here is the top segment
> of the gmond.conf file, which is the same across the entire cluster
> (aggregator and clients):
>
>
>
> --
>
> globals {
>   daemonize = yes
>   setuid = yes
>   user = ganglia
>   debug_level = 0
>   max_udp_msg_len = 1472
>   mute = no
>   deaf = no
>   allow_extra_data = yes
>   host_dmax = 86400 /*secs. Expires (removes from web interface) hosts in 1 day */
>   host_tmax = 20 /*secs */
>   cleanup_threshold = 300 /*secs */
>   gexec = no
>   # By default gmond will use reverse DNS resolution when displaying your hostname
>   # Uncommenting the following value will override that value.
>   # override_hostname = "mywebserver.domain.com"
>   # If you are not using multicast this value should be set to something other than 0.
>   # Otherwise if you restart aggregator gmond you will get empty graphs. 60 seconds is reasonable
>   send_metadata_interval = 0 /*secs */
> }
>
> cluster {
>   name = "<cluster-name>"
>   owner = "<Owner>"
>   latlong = "<Location>"
>   url = "<cluster-ganglia-url>"
> }
>
> host {
>   location = "unspecified"
> }
>
> udp_send_channel {
>   bind_hostname = yes # Highly recommended, soon to be default.
>                        # This option tells gmond to use a source address
>                        # that resolves to the machine's hostname.  Without
>                        # this, the metrics may appear to come from any
>                        # interface and the DNS names associated with
>                        # those IPs will be used to create the RRDs.
>   mcast_join = 239.2.11.71
>   #host = <my_private_host>
>   port = 8649
>   ttl = 1
> }
>
> udp_recv_channel {
>   mcast_join = 239.2.11.71
>   port = 8649
>   bind = 239.2.11.71
>   retry_bind = true
>   # Size of the UDP buffer. If you are handling lots of metrics you really
>   # should bump it up to e.g. 10MB or even higher.
>   # buffer = 10485760
> }
>
> tcp_accept_channel {
>   port = 8649
>   # If you want to gzip XML output
>   gzip_output = no
> }
>
> --
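To check whether the clients' multicast packets actually reach a given node, a small listener can join the group from the udp_recv_channel block above (239.2.11.71, port 8649) and record each distinct sender address. This is a minimal Python sketch; `build_mreq` and `listen` are hypothetical helper names, and `iface_ip` should be the node's private-interface address (with 0.0.0.0 the kernel chooses the interface). Because gmond may already hold port 8649 on that host, the bind can fail while gmond is running, so it is simplest to try it with gmond stopped there.

```python
import socket
import struct
import time
from typing import Set

GROUP = "239.2.11.71"  # mcast_join value from the gmond.conf above
PORT = 8649            # port value from the gmond.conf above


def build_mreq(group: str, iface_ip: str = "0.0.0.0") -> bytes:
    """Pack a struct ip_mreq: multicast group address, then local interface address."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(iface_ip))


def listen(group: str = GROUP, port: int = PORT,
           iface_ip: str = "0.0.0.0", duration: float = 60.0) -> Set[str]:
    """Join the group for `duration` seconds and return every sender IP seen."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    senders: Set[str] = set()
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("", port))
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                        build_mreq(group, iface_ip))
        deadline = time.monotonic() + duration
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            sock.settimeout(remaining)  # never block past the overall deadline
            try:
                _data, (addr, _src_port) = sock.recvfrom(65535)
            except socket.timeout:
                break
            if addr not in senders:
                senders.add(addr)
                print("packet from", addr)
    finally:
        sock.close()
    return senders
```

For example, running `listen(iface_ip="10.0.0.1", duration=60)` (address hypothetical) on the aggregator a few minutes after startup shows whether the clients are still being heard on the multicast group at the moment they are reported as down.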
>
>
>
> I didn’t modify anything in the modules section or the default collection
> groups.
>
>
>
> Also, below is the gmetad.conf file, without comments and sanitized:
>
>
>
> --
>
> data_source "<cluster-name>" 10 localhost <private-aggregator-hostname>
> gridname "<later-grid-name>"
> authority "<cluster-ganglia-url>"
> trusted_hosts 127.0.0.1 <private-aggregator-hostname> <additional-ip-grid-aggregation>
> setuid on
> setuid_username "ganglia"
> case_sensitive_hostnames 0
>
> --
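The data_source line above follows the gmetad.conf form `data_source "name" [polling interval] address1:port address2:port ...`: the interval is optional (15 seconds if omitted, per the stock gmetad.conf comments), each address defaults to port 8649, and the addresses are tried in order as failovers. So Jared's line polls every 10 seconds, trying localhost first and then the private aggregator hostname. The hypothetical Python helper below, `parse_data_source`, is a minimal sketch of that reading; it does not handle IPv6 literals.

```python
import shlex
from dataclasses import dataclass
from typing import List, Tuple

DEFAULT_INTERVAL = 15  # seconds, as described in the stock gmetad.conf comments
DEFAULT_PORT = 8649    # gmond's default tcp_accept_channel port


@dataclass
class DataSource:
    name: str
    interval: int
    sources: List[Tuple[str, int]]  # tried in order; later entries are failovers


def parse_data_source(line: str) -> DataSource:
    tokens = shlex.split(line)  # honours the quoted cluster name
    if len(tokens) < 3 or tokens[0] != "data_source":
        raise ValueError("not a data_source line: " + line)
    name, rest = tokens[1], tokens[2:]

    interval = DEFAULT_INTERVAL
    if rest[0].isdigit():  # optional polling interval in seconds
        interval = int(rest[0])
        rest = rest[1:]
    if not rest:
        raise ValueError("data_source has no addresses: " + line)

    sources = []
    for addr in rest:
        host, sep, port = addr.rpartition(":")
        sources.append((host, int(port)) if sep else (addr, DEFAULT_PORT))
    return DataSource(name, interval, sources)
```

For instance, `parse_data_source('data_source "<cluster-name>" 10 localhost <private-aggregator-hostname>')` yields an interval of 10 and two sources, both on port 8649.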
>
>
>
> If you need additional information, please let me know. Again, thanks for
> the help!
>
>
>
> Regards,
>
>
>
> Jared
>
>
>
>
>
> *From:* Vladimir Vuksan [mailto:vli...@veus.hr]
> *Sent:* Tuesday, December 16, 2014 9:17 PM
> *To:* Jared David Baker; ganglia-general@lists.sourceforge.net
> *Subject:* Re: [Ganglia-general] Nodes Showing Down Constantly
>
>
>
> Hi Jared,
>
> can you review
>
> https://github.com/ganglia/monitor-core/wiki/Ganglia-Quick-Start
>
> and let us know if your setup looks similar. If it does, please post the top
> 100 or so lines of the config from your aggregator and client nodes.
> Sanitize any names or IPs.
>
> Thanks,
>
> Vladimir
>
> On 12/16/2014 10:49 PM, Jared David Baker wrote:
>
>  Hello All,
>
>
>
> I’m new to the Ganglia scene, but I’ve been working on installing it as
> part of a project. I’ve built the latest Ganglia software (3.6.1 at the
> time of writing) from source on CentOS 6.5, and the build seemed to go
> fine, with no major issues that I saw. I am using multicast as the
> send/recv method and have limited the multicast traffic to the cluster’s
> private network interface.
>
>
>
> I start up gmond on the aggregator node (which happens to be the cluster
> master node) and then start the gmond daemons on the client machines. I then
> proceed to start gmetad to view the metrics via a web browser, and for the
> first 3 minutes or so Ganglia appears to work perfectly, reporting the
> correct numbers. However, after about 3 minutes, the web interface and the
> gstat command report that all my client nodes are down. The aggregator
> node remains active. I’ve been unable to determine the root cause of this.
> If I restart the gmond daemon on the aggregator, the client nodes come back
> for approximately 3 minutes again before going into the ‘dead’ state.
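One way to watch this happen is to read the XML that gmond serves on its tcp_accept_channel (port 8649 in the config above) and look at each HOST element's TN (seconds since the host was last heard from) and TMAX attributes. The Python sketch below assumes the rule ganglia-web applies, as I understand it, where a host counts as down once TN exceeds 4 × TMAX (80 seconds with host_tmax = 20); `fetch_gmond_xml` and `host_status` are hypothetical helper names.

```python
import socket
import xml.etree.ElementTree as ET
from typing import List, Tuple


def fetch_gmond_xml(host: str = "localhost", port: int = 8649,
                    timeout: float = 10.0) -> bytes:
    """Read gmond's XML dump from its tcp_accept_channel until it closes the connection."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            chunk = sock.recv(65536)
            if not chunk:
                break
            chunks.append(chunk)
    return b"".join(chunks)


def host_status(xml_bytes: bytes, factor: int = 4) -> List[Tuple[str, int, int, bool]]:
    """Return (name, TN, TMAX, up) per HOST, assuming 'down' means TN > factor * TMAX."""
    root = ET.fromstring(xml_bytes)
    result = []
    for h in root.iter("HOST"):
        tn = int(h.get("TN", "0"))
        tmax = int(h.get("TMAX", "0"))
        up = tmax == 0 or tn <= factor * tmax  # no TMAX reported: cannot judge, treat as up
        result.append((h.get("NAME"), tn, tmax, up))
    return result
```

Calling `host_status(fetch_gmond_xml("localhost"))` on the aggregator every few seconds after a restart shows whether the clients' TN values climb steadily from the start, which would mean their heartbeats have stopped arriving rather than the web front end misreporting them.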
>
>
>
> The configurations are nearly standard, with only minor changes that are
> descriptive rather than operational: I only changed the cluster{} block.
> gmond is installed on a shared filesystem, so the same configuration file
> is used for all nodes. I’m not too worried about using the host{} block to
> describe location right now.
>
>
>
> Does anybody have any helpful pointers and/or suggestions on where to look
> for an issue or a misconfiguration?
>
>
>
> Thanks everyone!
>
>
>
> Jared
>
>
>
>
>  
> ------------------------------------------------------------------------------
>
> Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
>
> from Actuate! Instantly Supercharge Your Business Reports and Dashboards
>
> with Interactivity, Sharing, Native Excel Exports, App Integration & more
>
> Get technology previously reserved for billion-dollar corporations, FREE
>
> http://pubads.g.doubleclick.net/gampad/clk?id=164703151&iu=/4140/ostg.clktrk
>
>
>
>
>  _______________________________________________
>
> Ganglia-general mailing list
>
> Ganglia-general@lists.sourceforge.net
>
> https://lists.sourceforge.net/lists/listinfo/ganglia-general
>
>
>
>
>
>

-- 
David Chin, Ph.D.
david.c...@drexel.edu    Sr. Systems Administrator, URCF, Drexel U.
http://www.drexel.edu/research/urcf/
https://linuxfollies.blogspot.com/
215.221.4747 (mobile)
https://github.com/prehensilecode
