Re: [Ganglia-general] gmetad (3.6.1) suddenly stopped
Based on the "poll() timeout from source" messages, you are either having network connectivity issues polling the gmonds, or the gmonds are down.

Vladimir

On 03/17/2015 12:46 AM, 潇湘居士 wrote:
> Hi,
>
> My gmetad (3.6.1) suddenly stopped, and a long time had passed before I noticed that it was down. Here is the log:
>
> [root@ca5 log]# grep -v RRD_update messages | tail -n 20
> Mar 16 15:30:31 ca5 /usr/sbin/gmetad[28087]: poll() timeout from source 0 for [hb] data source after 0 bytes read
> Mar 16 15:30:46 ca5 /usr/sbin/gmetad[28087]: poll() timeout from source 1 for [hb] data source after 0 bytes read
> Mar 16 15:31:01 ca5 /usr/sbin/gmetad[28087]: poll() timeout from source 0 for [hb] data source after 0 bytes read
> Mar 16 15:54:40 ca5 /usr/sbin/gmetad[28087]: poll() timeout from source 1 for [hb] data source after 0 bytes read
> Mar 16 15:54:48 ca5 /usr/sbin/gmetad[28087]: poll() timeout from source 1 for [dp] data source after 0 bytes read
> Mar 16 15:54:54 ca5 /usr/sbin/gmetad[28087]: poll() timeout from source 1 for [stat] data source after 43261 bytes read
> Mar 16 15:54:56 ca5 /usr/sbin/gmetad[28087]: poll() timeout from source 0 for [hb] data source after 0 bytes read
> Mar 16 15:55:09 ca5 /usr/sbin/gmetad[28087]: poll() timeout from source 0 for [test] data source after 5427 bytes read
> Mar 16 15:55:10 ca5 /usr/sbin/gmetad[28087]: poll() timeout from source 0 for [dp] data source after 11584 bytes read
> Mar 16 15:55:27 ca5 /usr/sbin/gmetad[28087]: poll() timeout from source 1 for [dp] data source after 0 bytes read
> Mar 16 15:55:31 ca5 /usr/sbin/gmetad[28087]: poll() timeout from source 1 for [hb] data source after 0 bytes read
> Mar 16 18:26:22 ca5 last message repeated 2 times
> Mar 16 18:28:23 ca5 kernel: gmetad[28126]: segfault at 3fc22580 rip 003f1320ba5f rsp 59073790 error 4
> Mar 16 19:58:23 ca5 auditd[3025]: Audit daemon rotating log files
> Mar 17 00:54:25 ca5 Server Administrator: Storage Service EventID: 2243 The Patrol Read has stopped.: Controller 0 (PERC H700 Integrated)
> Mar 17 01:05:29 ca5 auditd[3025]: Audit daemon rotating log files
> Mar 17 01:30:04 ca5 auditd[3025]: Audit daemon rotating log files
> Mar 17 02:00:10 ca5 /usr/sbin/gmetad[6865]: data_thread() for [db] failed to contact node 66.160.159.72
> Mar 17 02:06:14 ca5 last message repeated 3 times
> Mar 17 03:13:29 ca5 last message repeated 2 times
>
> My system is RHEL 5.5:
>
> [root@ca5 log]# lsb_release -a
> LSB Version: :core-3.1-amd64:core-3.1-ia32:core-3.1-noarch:graphics-3.1-amd64:graphics-3.1-ia32:graphics-3.1-noarch
> Distributor ID: RedHatEnterpriseServer
> Description: Red Hat Enterprise Linux Server release 5.5 (Tikanga)
> Release: 5.5
> Codename: Tikanga
>
> I don't know why gmetad stopped. Can anyone help me?
>
> ___
> Ganglia-general mailing list
> Ganglia-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/ganglia-general
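
To see which data sources are timing out most often, and whether a crash is recorded alongside them, a log like the one quoted above can be tallied with a short script. This is only a sketch: the function name `summarize` and the two regular expressions are illustrative and are modelled on the exact message wording shown in this thread, so other gmetad versions or syslog formats may need different patterns. Syslog's "last message repeated N times" lines are not counted.

```python
import re
from collections import Counter

# Matches the gmetad timeout lines as they appear in the quoted log.
TIMEOUT_RE = re.compile(
    r"gmetad\[\d+\]: poll\(\) timeout from source (\d+) "
    r"for \[([^\]]+)\] data source after (\d+) bytes read"
)
# Matches the kernel line that records a gmetad crash.
SEGFAULT_RE = re.compile(r"kernel: gmetad\[(\d+)\]: segfault")


def summarize(lines):
    """Return (timeouts per data source, zero-byte timeouts per data source, segfault lines)."""
    per_source = Counter()
    zero_byte = Counter()
    segfaults = []
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            name = match.group(2)
            per_source[name] += 1
            # A timeout after 0 bytes means the gmond never answered at all.
            if int(match.group(3)) == 0:
                zero_byte[name] += 1
        elif SEGFAULT_RE.search(line):
            segfaults.append(line.strip())
    return per_source, zero_byte, segfaults
```

Passing an open file such as /var/log/messages to `summarize` yields the three results. A data source whose timeouts all occur after 0 bytes read points at an unreachable or stopped gmond, while timeouts after a partial read suggest a slow or interrupted transfer.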
Re: [Ganglia-general] Ganglia unable to report remote client
On the machine running gmetad, are there any iptables rules that may be blocking inbound UDP traffic on port 8649?

Vladimir

On 03/17/2015 01:06 AM, Ashish Kumar9 wrote:
> I would appreciate some input on the issue below.
>
> Thanks and Regards,
> Ashish Kumar
>
> From: Ashish Kumar9/India/IBM
> To: ganglia-general@lists.sourceforge.net
> Date: 03/10/2015 01:59 PM
> Subject: Ganglia unable to report remote client
>
> My Ganglia setup reports only one host, the local server on which both gmond and gmetad are installed. It does not show the other clients, even though gmond is configured properly on them. The remote gmond process is running fine.
>
> Debugging done so far:
> 1) I verified that port 8649 is free and not in use, using lsof -i tcp:8649
> 2) /var/log/messages does not show any errors
> 3) All the processes (gmetad, gmond) are running fine
>
> Please suggest steps or tips to debug the issue further.
>
> Server config
>
> gmetad.conf:
>
> data_source "hadoopgpfs" bigdatagpfs01
> gridname "MyHadoopGPFSGrid"
> setuid_username "root"
> case_sensitive_hostnames 0
>
> gmond.conf:
>
> /* This configuration is as close to 2.5.x default behavior as possible
>    The values closely match ./gmond/metric.h definitions in 2.5.x */
> globals {
>   daemonize = yes
>   setuid = no
>   user = root
>   debug_level = 0
>   max_udp_msg_len = 1472
>   mute = no
>   deaf = no
>   allow_extra_data = yes
>   host_dmax = 86400 /*secs. Expires (removes from web interface) hosts in 1 day */
>   host_tmax = 20 /*secs */
>   cleanup_threshold = 300 /*secs */
>   gexec = no
>   # By default gmond will use reverse DNS resolution when displaying your hostname
>   # Uncommenting following value will override that value.
>   # override_hostname = "mywebserver.domain.com"
>   # If you are not using multicast this value should be set to something other than 0.
>   # Otherwise if you restart aggregator gmond you will get empty graphs. 60 seconds is reasonable
>   send_metadata_interval = 30 /*secs */
> }
>
> /*
>  * The cluster attributes specified will be used as part of the CLUSTER
>  * tag that will wrap all hosts collected by this instance.
>  */
> cluster {
>   name = "hadoopgpfs"
>   owner = "unspecified"
>   latlong = "unspecified"
>   url = ""
> }
>
> /* The host section describes attributes of the host, like the location */
> host {
>   location = "unspecified"
> }
>
> /* Feel free to specify as many udp_send_channels as you like. Gmond
>    used to only support having a single channel */
> udp_send_channel {
>   #bind_hostname = yes # Highly recommended, soon to be default.
>                        # This option tells gmond to use a source address
>                        # that resolves to the machine's hostname. Without
>                        # this, the metrics may appear to come from any
>                        # interface and the DNS names associated with
>                        # those IPs will be used to create the RRDs.
>   #mcast_join = 10.241.0.21
>   host = 10.241.0.21
>   port = 8649
>   ttl = 1
> }
>
> /* You can specify as many udp_recv_channels as you like as well. */
> udp_recv_channel {
>   #mcast_join = 239.2.11.71
>   port = 8649
>   #bind = 239.2.11.71
>   #retry_bind = true
>   # Size of the UDP buffer. If you are handling lots of metrics you really
>   # should bump it up to e.g. 10MB or even higher.
>   # buffer = 10485760
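
One way to narrow this down is to ask the aggregator gmond directly which hosts it currently knows about. The sketch below assumes that the gmond named in data_source (bigdatagpfs01 in the gmetad.conf above) has a tcp_accept_channel on port 8649; the quoted gmond.conf is cut off before that section, so this is an assumption, and the function names `fetch_gmond_xml` and `hosts_by_cluster` are purely illustrative. gmond writes its state as XML to any TCP client and then closes the connection, so reading until end of stream is enough.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_gmond_xml(host, port=8649, timeout=5.0):
    """Read the XML dump that gmond sends to a TCP client until it closes the socket."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


def hosts_by_cluster(xml_bytes):
    """Map each CLUSTER name to the sorted HOST names reported inside it."""
    root = ET.fromstring(xml_bytes)
    return {
        cluster.get("NAME"): sorted(h.get("NAME") for h in cluster.iter("HOST"))
        for cluster in root.iter("CLUSTER")
    }
```

Calling `hosts_by_cluster(fetch_gmond_xml("bigdatagpfs01"))` on the gmetad machine would list the hosts per cluster. If only the local host appears, the remote gmonds' UDP packets are probably not reaching the aggregator, which fits the firewall question above. If the remote hosts do appear, the problem is more likely between gmetad and the web frontend.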
Re: [Ganglia-general] Can't aggregate custom metrics
Ayman,

What Ganglia web version are you using? Another thing to try is to append debug=5 to the graph URL to see which rrdtool command is being executed. See if you can run that command by hand on the command line.

Vladimir

On 03/16/2015 06:18 PM, Ayman Al-Shorman wrote:
> Hi All,
>
> I've added around 40 custom metrics using gmetric. I can aggregate graphs on some hosts but not on others. I have tried everything I know, but the graph images are broken. I get this error message:
>
> The image “http://host.com/gweb/graph.php?r=hour&z=xlarge&title=BCC+Load&vl=avg&x=3&n=0&hreg%5B%5D=host594&mreg%5B%5D=sphinx-avg-query-wall&gtype=line&glegend=show&aggregate=1&embed=1&_=1426543062410” cannot be displayed because it contains errors.
>
> The graphs work fine on the host page, though. I would be thankful if anyone can help me.
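
Adding the debug parameter by hand is easy to get wrong on a long aggregate-graph URL, so a small helper can do it. This is a minimal sketch using Python's standard urllib.parse module; the function name `with_debug` is made up for illustration, and it assumes the parameters of the quoted URL are separated by `&`, as in any normal query string.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_debug(url, level=5):
    """Return url with its debug query parameter set to the given level."""
    parts = urlsplit(url)
    # Keep every existing parameter, including repeated ones, but drop any old debug value.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "debug"
    ]
    query.append(("debug", str(level)))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Opening the resulting URL in a browser should show the rrdtool command instead of a broken image, and that command can then be run from a shell on the web server to see the actual error.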