Re: [Ganglia-general] gmetad(3.6.1) suddenly stoped

2015-03-18 Thread Vladimir Vuksan

  
  
Based on the "poll() timeout from source" messages, either gmetad is
  having network connectivity problems polling the gmonds, or the
  gmonds are down.
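
  A quick way to check this from the gmetad host is to open the same kind of TCP connection to each gmond and see how many bytes come back. The sketch below is only an illustration: `probe_gmond` is a made-up helper name, and it assumes each gmond has a tcp_accept_channel on the default port 8649.

```python
import socket


def probe_gmond(host, port=8649, timeout=10.0):
    """Connect to a gmond TCP channel and return how many bytes it sends.

    Raises OSError (socket.timeout is a subclass) if the connection
    cannot be made or the transfer stalls for longer than `timeout`.
    """
    total = 0
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            chunk = sock.recv(65536)
            if not chunk:  # peer closed the connection: the dump is complete
                break
            total += len(chunk)
    return total
```

  Calling `probe_gmond("some-gmond-host")` for each host listed in a data_source line should return a byte count. An exception points at the same network or gmond problem that gmetad is logging.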
  
  Vladimir
  
  On 03/17/2015 12:46 AM, 潇湘居士 wrote:


  
Hi,
  
      My gmetad (3.6.1) suddenly stopped, and a long time had passed
  before I noticed that it was down.
      Here is the log:
     

  [root@ca5 log]# grep -v RRD_update messages | tail -n 20
  Mar 16 15:30:31 ca5 /usr/sbin/gmetad[28087]: poll() timeout from source 0 for [hb] data source after 0 bytes read
  Mar 16 15:30:46 ca5 /usr/sbin/gmetad[28087]: poll() timeout from source 1 for [hb] data source after 0 bytes read
  Mar 16 15:31:01 ca5 /usr/sbin/gmetad[28087]: poll() timeout from source 0 for [hb] data source after 0 bytes read
  Mar 16 15:54:40 ca5 /usr/sbin/gmetad[28087]: poll() timeout from source 1 for [hb] data source after 0 bytes read
  Mar 16 15:54:48 ca5 /usr/sbin/gmetad[28087]: poll() timeout from source 1 for [dp] data source after 0 bytes read
  Mar 16 15:54:54 ca5 /usr/sbin/gmetad[28087]: poll() timeout from source 1 for [stat] data source after 43261 bytes read
  Mar 16 15:54:56 ca5 /usr/sbin/gmetad[28087]: poll() timeout from source 0 for [hb] data source after 0 bytes read
  Mar 16 15:55:09 ca5 /usr/sbin/gmetad[28087]: poll() timeout from source 0 for [test] data source after 5427 bytes read
  Mar 16 15:55:10 ca5 /usr/sbin/gmetad[28087]: poll() timeout from source 0 for [dp] data source after 11584 bytes read
  Mar 16 15:55:27 ca5 /usr/sbin/gmetad[28087]: poll() timeout from source 1 for [dp] data source after 0 bytes read
  Mar 16 15:55:31 ca5 /usr/sbin/gmetad[28087]: poll() timeout from source 1 for [hb] data source after 0 bytes read
  Mar 16 18:26:22 ca5 last message repeated 2 times
  Mar 16 18:28:23 ca5 kernel: gmetad[28126]: segfault at 3fc22580 rip 003f1320ba5f rsp 59073790 error 4
  Mar 16 19:58:23 ca5 auditd[3025]: Audit daemon rotating log files
  Mar 17 00:54:25 ca5 Server Administrator: Storage Service EventID: 2243  The Patrol Read has stopped.:  Controller 0 (PERC H700 Integrated)
  Mar 17 01:05:29 ca5 auditd[3025]: Audit daemon rotating log files
  Mar 17 01:30:04 ca5 auditd[3025]: Audit daemon rotating log files
  Mar 17 02:00:10 ca5 /usr/sbin/gmetad[6865]: data_thread() for [db] failed to contact node 66.160.159.72
  Mar 17 02:06:14 ca5 last message repeated 3 times
  Mar 17 03:13:29 ca5 last message repeated 2 times
     

  
  My system is RHEL 5.5:

  [root@ca5 log]# lsb_release -a
  LSB Version:    :core-3.1-amd64:core-3.1-ia32:core-3.1-noarch:graphics-3.1-amd64:graphics-3.1-ia32:graphics-3.1-noarch
  Distributor ID:    RedHatEnterpriseServer
  Description:    Red Hat Enterprise Linux Server release 5.5 (Tikanga)
  Release:    5.5
  Codename:    Tikanga

  
  I don't know why gmetad stopped. Can anyone help me?

  
  
  
  
  --
Dive into the World of Parallel Programming The Go Parallel Website, sponsored
by Intel and developed in partnership with Slashdot Media, is your hub for all
things parallel software development, from weekly thought leadership blogs to
news, videos, case studies, tutorials and more. Take a look and join the 
conversation now. http://goparallel.sourceforge.net/
  
  
  
  ___
Ganglia-general mailing list
Ganglia-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-general



  



Re: [Ganglia-general] Ganglia unable to report remote client

2015-03-18 Thread Vladimir Vuksan

  
  
On the machine running gmetad, are there
  any iptables rules that may be blocking inbound UDP traffic on
  port 8649?
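
  One way to tell whether the UDP packets are arriving: gmond answers a TCP connection on its tcp_accept_channel with an XML dump of every host it has heard from. If the remote clients are missing from the dump on the aggregating gmond, their packets are not getting through. Below is a rough sketch for listing the hosts in such a dump. The function name and the sample host `node02` are made up for illustration, and it assumes the usual GANGLIA_XML / CLUSTER / HOST layout with NAME attributes.

```python
import xml.etree.ElementTree as ET


def hosts_by_cluster(xml_data):
    """Map each CLUSTER name in a gmond XML dump to its sorted HOST names.

    `xml_data` may be bytes (as read from the socket) or str.
    """
    root = ET.fromstring(xml_data)
    return {
        cluster.get("NAME"): sorted(h.get("NAME") for h in cluster.iter("HOST"))
        for cluster in root.iter("CLUSTER")
    }


# Hypothetical, heavily trimmed dump; a real one also carries METRIC elements.
sample = (
    b'<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>'
    b'<GANGLIA_XML VERSION="3.6.1" SOURCE="gmond">'
    b'<CLUSTER NAME="hadoopgpfs">'
    b'<HOST NAME="node02"/><HOST NAME="bigdatagpfs01"/>'
    b'</CLUSTER></GANGLIA_XML>'
)
print(hosts_by_cluster(sample))
```

  The dump itself can be captured with any TCP client pointed at port 8649 of the aggregator (assuming the default tcp_accept_channel port).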
  
  Vladimir
  
  On 03/17/2015 01:06 AM, Ashish Kumar9 wrote:

I would appreciate some input on the issue below.
  
  
  Thanks and Regards,
Ashish Kumar
  
  
  
  From:     Ashish Kumar9/India/IBM
  To:       ganglia-general@lists.sourceforge.net
  Date:     03/10/2015 01:59 PM
  Subject:  Ganglia unable to report remote client
  
  
  
  My Ganglia setup reports only one host: the local server on which
  both gmond and gmetad are installed. The other clients do not show up
  even though gmond is configured properly on them.
  
  
  The remote gmond process is running fine.
  
  
  Debugging done so far:
  
  1) I verified with lsof -i tcp:8649 that port 8649 is free and not in use.
  
  2) /var/log/messages does not show any errors.
  
  3) All the processes (gmetad, gmond) are running fine.
  
  
  Please suggest steps or tips to debug the issue further.
  
  
  Server config
  
  
  gmetad.conf :
  
  
  data_source "hadoopgpfs" bigdatagpfs01
  gridname "MyHadoopGPFSGrid"
  setuid_username "root"
  case_sensitive_hostnames 0
  
  
  
  
  gmond.conf :
  
  /* This configuration is as close to 2.5.x default behavior as possible
     The values closely match ./gmond/metric.h definitions in 2.5.x */
  globals {
    daemonize = yes
    setuid = no
    user = root
    debug_level = 0
    max_udp_msg_len = 1472
    mute = no
    deaf = no
    allow_extra_data = yes
    host_dmax = 86400 /*secs. Expires (removes from web interface) hosts in 1 day */
    host_tmax = 20 /*secs */
    cleanup_threshold = 300 /*secs */
    gexec = no
    # By default gmond will use reverse DNS resolution when displaying your hostname
    # Uncommeting following value will override that value.
    # override_hostname = "mywebserver.domain.com"
    # If you are not using multicast this value should be set to something other than 0.
    # Otherwise if you restart aggregator gmond you will get empty graphs. 60 seconds is reasonable
    send_metadata_interval = 30 /*secs */

  }

  /*
   * The cluster attributes specified will be used as part of the CLUSTER
   * tag that will wrap all hosts collected by this instance.
   */
  cluster {
    name = "hadoopgpfs"
    owner = "unspecified"
    latlong = "unspecified"
    url = ""
  }

  /* The host section describes attributes of the host, like the location */
  host {
    location = "unspecified"
  }

  /* Feel free to specify as many udp_send_channels as you like.  Gmond
     used to only support having a single channel */
  udp_send_channel {
    #bind_hostname = yes # Highly recommended, soon to be default.
                         # This option tells gmond to use a source address
                         # that resolves to the machine's hostname.  Without
                         # this, the metrics may appear to come from any
                         # interface and the DNS names associated with
                         # those IPs will be used to create the RRDs.
    #mcast_join = 10.241.0.21
    host = 10.241.0.21
    port = 8649
    ttl = 1
  }

  /* You can specify as many udp_recv_channels as you like as well. */
  udp_recv_channel {
    #mcast_join = 239.2.11.71
    port = 8649
    #bind = 239.2.11.71
    #retry_bind = true
    # Size of the UDP buffer. If you are handling lots of metrics you really
    # should bump it up to e.g. 10MB or even higher.
    # buffer = 10485760

Re: [Ganglia-general] Can't aggregate custom metrics

2015-03-18 Thread Vladimir Vuksan

  
  
Ayman,
  
  What Ganglia web version are you using?
  
  Another thing to try is to append debug=5 to the graph URL to see
  which rrdtool command is being executed. See if you can run that
  command by hand on the command line.
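
  If editing the long graph URL by hand is error-prone, something like the following can add the parameter. This is only a sketch: `with_debug` is a made-up helper, the example URL is a shortened stand-in, and the debug level is simply the value suggested above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_debug(url, level=5):
    """Return `url` with a debug=<level> query parameter set.

    Any existing debug parameter is replaced; other parameters keep
    their order and are re-encoded.
    """
    parts = urlsplit(url)
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "debug"
    ]
    query.append(("debug", str(level)))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Shortened, hypothetical graph URL used purely as an example.
print(with_debug("http://host.com/gweb/graph.php?r=hour&hreg%5B%5D=host594&aggregate=1"))
```

  Opening the resulting URL in a browser, or fetching it with curl, should then show the extra debug output instead of a broken image.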
  
  Vladimir
  
  On 03/16/2015 06:18 PM, Ayman Al-Shorman wrote:


  
  Hi All,

I've added around 40 custom metrics using gmetric.

I can aggregate graphs on some hosts but not on others. I tried
everything I know, but the graph images are broken.

I got the error message "The image
“http://host.com/gweb/graph.php?r=hour&z=xlarge&title=BCC+Load&vl=avg&x=3&n=0&hreg%5B%5D=host594&mreg%5B%5D=sphinx-avg-query-wall&gtype=line&glegend=show&aggregate=1&embed=1&_=1426543062410”
cannot be displayed because it contains errors."

But the graphs are working fine on the host page.

I would be thankful if anyone can help me.
  


  

