Re: [Ganglia-general] Unable to see nodes

2012-07-30 Thread Eric Dow
Hi Bernard,

Good call. Switching to unicast works perfectly. Thanks for the suggestion.

Eric
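
P.S. For anyone who finds this thread later: the change is roughly what the
docs describe for unicast (the head-node address below is a placeholder, not
my real one). Each compute node sends straight to the head node instead of
joining the multicast group, and the head node keeps a plain receive channel:

/* compute nodes: udp_send_channel points at the head node
   instead of mcast_join */
udp_send_channel {
  host = 10.0.0.1
  port = 8649
}

/* head node: receive channel no longer needs mcast_join/bind */
udp_recv_channel {
  port = 8649
}

One unicast gotcha worth noting: with send_metadata_interval = 0 the nodes
only announce their metric metadata once, so it is worth bumping it to
something like 30 so the head node picks everything up again after a gmond
restart.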

On 07/29/2012 04:13 PM, Bernard Li wrote:
> Hi Eric:
>
> Just try setting up unicast between the head node and one of the other
> hosts, and see if that works.
>
> Regards,
>
> Bernard
>
> On Sun, Jul 29, 2012 at 9:20 AM, Eric Dow  wrote:
>> Thanks for the quick response.
>>
>> I am configured to run in multicast (the default). My head node and all
>> compute nodes are connected to a single router. I don't know if there is a
>> way to make sure that multicast is set up properly. One thing that I checked
>> was this (from the ganglia FAQ):
>>
>> Confirm that UDP connections can be established between the gmetad and
>> gmond (or gmond and other gmonds, for multicast purposes) by running nc -u -l
>> 8653 on the host in question, then echo "hello" | nc -u <hostname> 8653 from
>> the gmetad or another gmond.
>>
>> This worked. I saw "hello" appear on my compute node when I did this.
>>
>> My gmetad.conf is the same as the standard file provided with ganglia. The
>> only modification is the data_source line:
>>
>> data_source "ACDL Cluster" localhost
>>
>> Finally, here's my gmond.conf:
>>
>> /* This configuration is as close to 2.5.x default behavior as possible
>> The values closely match ./gmond/metric.h definitions in 2.5.x */
>> globals {
>>daemonize = yes
>>setuid = yes
>>user = nobody
>>debug_level = 0
>>max_udp_msg_len = 1472
>>mute = no
>>deaf = no
>>allow_extra_data = yes
>>host_dmax = 86400 /*secs. Expires (removes from web interface) hosts in 1
>> day */
>>host_tmax = 20 /*secs */
>>cleanup_threshold = 300 /*secs */
>>gexec = no
>>send_metadata_interval = 0 /*secs */
>> }
>>
>> /*
>>   * The cluster attributes specified will be used as part of the <CLUSTER>
>>   * tag that will wrap all hosts collected by this instance.
>>   */
>> cluster {
>>name = "ACDL Cluster"
>>owner = "ACDL"
>>latlong = "unspecified"
>>url = "unspecified"
>> }
>>
>> /* The host section describes attributes of the host, like the location */
>> host {
>>location = "ACDL"
>> }
>>
>> /* Feel free to specify as many udp_send_channels as you like.  Gmond
>> used to only support having a single channel */
>> udp_send_channel {
>>#bind_hostname = yes # Highly recommended, soon to be default.
>> # This option tells gmond to use a source address
>> # that resolves to the machine's hostname.  Without
>> # this, the metrics may appear to come from any
>> # interface and the DNS names associated with
>> # those IPs will be used to create the RRDs.
>>mcast_join = 239.2.11.71
>>port = 8649
>>ttl = 1
>> }
>>
>> /* You can specify as many udp_recv_channels as you like as well. */
>> udp_recv_channel {
>>mcast_join = 239.2.11.71
>>port = 8649
>>bind = 239.2.11.71
>>retry_bind = true
>> }
>>
>> /* You can specify as many tcp_accept_channels as you like to share
>> an xml description of the state of the cluster */
>> tcp_accept_channel {
>>port = 8649
>> }
>>
>> /* Channel to receive sFlow datagrams */
>> #udp_recv_channel {
>> #  port = 6343
>> #}
>>
>> /* Optional sFlow settings */
>> #sflow {
>> # udp_port = 6343
>> # accept_vm_metrics = yes
>> # accept_jvm_metrics = yes
>> # multiple_jvm_instances = no
>> # accept_http_metrics = yes
>> # multiple_http_instances = no
>> # accept_memcache_metrics = yes
>> # multiple_memcache_instances = no
>> #}
>>
>> /* Each metrics module that is referenced by gmond must be specified and
>> loaded. If the module has been statically linked with gmond, it does
>> not require a load path. However all dynamically loadable modules must
>> include a load path. */
>> modules {
>>module {
>>  name = "core_metrics"
>>}
>>module {
>>  name = "cpu_module"
>>  path = "modcpu.so"
>>}
>>module {
>>  name = "disk_module"
>>  path = "moddisk.so"
>>}
>>module {
>>  name = "load_module"
>>  path = "modload.so"
>>}
>>module {
>>  name = "mem_module"
>>  path = "modmem.so"
>>}
>>module {
>>  name = "net_module"
>>  path = "modnet.so"
>>}
>>module {
>>  name = "proc_module"
>>  path = "modproc.so"
>>}
>>module {
>>  name = "sys_module"
>>  path = "modsys.so"
>>}
>> }
>>
>> /* The old internal 2.5.x metric array has been replaced by the following
>> collection_group directives.  What follows is the default behavior for
>> collecting and sending metrics that is as close to 2.5.x behavior as
>> possible. */
>>
>> /* This collection group will cause a heartbeat (or beacon) to be sent every
>> 20 seconds.  In the heartbeat is the GMOND_STARTED data which expresses
>> the age of the running gmond. */
>> collection_group {
>>collect_once = yes
>>time_threshold = 20
>>metric {
>>  name = "heartbeat"
>>}
>> }

Re: [Ganglia-general] Unable to see nodes

2012-07-29 Thread Bernard Li
Hi Eric:

Just try setting up unicast between the head node and one of the other
hosts, and see if that works.
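
Once the pair is talking, a quick way to confirm it (assuming the default
tcp_accept_channel on 8649) is to pull the XML straight from the head
node's gmond and look for the other host:

  nc localhost 8649 | grep '<HOST NAME'

Run on the head node, that should print a HOST element for the second
host as well, not just for the head node itself.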

Regards,

Bernard

On Sun, Jul 29, 2012 at 9:20 AM, Eric Dow  wrote:
> Thanks for the quick response.
>
> I am configured to run in multicast (the default). My head node and all
> compute nodes are connected to a single router. I don't know if there is a
> way to make sure that multicast is set up properly. One thing that I checked
> was this (from the ganglia FAQ):
>
> Confirm that UDP connections can be established between the gmetad and
> gmond (or gmond and other gmonds, for multicast purposes) by running nc -u -l
> 8653 on the host in question, then echo "hello" | nc -u <hostname> 8653 from
> the gmetad or another gmond.
>
> This worked. I saw "hello" appear on my compute node when I did this.
>
> My gmetad.conf is the same as the standard file provided with ganglia. The
> only modification is the data_source line:
>
> data_source "ACDL Cluster" localhost
>
> Finally, here's my gmond.conf:
>
> /* This configuration is as close to 2.5.x default behavior as possible
>The values closely match ./gmond/metric.h definitions in 2.5.x */
> globals {
>   daemonize = yes
>   setuid = yes
>   user = nobody
>   debug_level = 0
>   max_udp_msg_len = 1472
>   mute = no
>   deaf = no
>   allow_extra_data = yes
>   host_dmax = 86400 /*secs. Expires (removes from web interface) hosts in 1
> day */
>   host_tmax = 20 /*secs */
>   cleanup_threshold = 300 /*secs */
>   gexec = no
>   send_metadata_interval = 0 /*secs */
> }
>
> /*
>  * The cluster attributes specified will be used as part of the <CLUSTER>
>  * tag that will wrap all hosts collected by this instance.
>  */
> cluster {
>   name = "ACDL Cluster"
>   owner = "ACDL"
>   latlong = "unspecified"
>   url = "unspecified"
> }
>
> /* The host section describes attributes of the host, like the location */
> host {
>   location = "ACDL"
> }
>
> /* Feel free to specify as many udp_send_channels as you like.  Gmond
>used to only support having a single channel */
> udp_send_channel {
>   #bind_hostname = yes # Highly recommended, soon to be default.
># This option tells gmond to use a source address
># that resolves to the machine's hostname.  Without
># this, the metrics may appear to come from any
># interface and the DNS names associated with
># those IPs will be used to create the RRDs.
>   mcast_join = 239.2.11.71
>   port = 8649
>   ttl = 1
> }
>
> /* You can specify as many udp_recv_channels as you like as well. */
> udp_recv_channel {
>   mcast_join = 239.2.11.71
>   port = 8649
>   bind = 239.2.11.71
>   retry_bind = true
> }
>
> /* You can specify as many tcp_accept_channels as you like to share
>an xml description of the state of the cluster */
> tcp_accept_channel {
>   port = 8649
> }
>
> /* Channel to receive sFlow datagrams */
> #udp_recv_channel {
> #  port = 6343
> #}
>
> /* Optional sFlow settings */
> #sflow {
> # udp_port = 6343
> # accept_vm_metrics = yes
> # accept_jvm_metrics = yes
> # multiple_jvm_instances = no
> # accept_http_metrics = yes
> # multiple_http_instances = no
> # accept_memcache_metrics = yes
> # multiple_memcache_instances = no
> #}
>
> /* Each metrics module that is referenced by gmond must be specified and
>loaded. If the module has been statically linked with gmond, it does
>not require a load path. However all dynamically loadable modules must
>include a load path. */
> modules {
>   module {
> name = "core_metrics"
>   }
>   module {
> name = "cpu_module"
> path = "modcpu.so"
>   }
>   module {
> name = "disk_module"
> path = "moddisk.so"
>   }
>   module {
> name = "load_module"
> path = "modload.so"
>   }
>   module {
> name = "mem_module"
> path = "modmem.so"
>   }
>   module {
> name = "net_module"
> path = "modnet.so"
>   }
>   module {
> name = "proc_module"
> path = "modproc.so"
>   }
>   module {
> name = "sys_module"
> path = "modsys.so"
>   }
> }
>
> /* The old internal 2.5.x metric array has been replaced by the following
>collection_group directives.  What follows is the default behavior for
>collecting and sending metrics that is as close to 2.5.x behavior as
>possible. */
>
> /* This collection group will cause a heartbeat (or beacon) to be sent every
>20 seconds.  In the heartbeat is the GMOND_STARTED data which expresses
>the age of the running gmond. */
> collection_group {
>   collect_once = yes
>   time_threshold = 20
>   metric {
> name = "heartbeat"
>   }
> }
>
> /* This collection group will send general info about this host every
>1200 secs.
>This information doesn't change between reboots and is only collected
>once. */
> collection_group {
>   collect_once = yes
>   time_threshold = 1200
>   metric {
> name = "cpu_num"
> title = "CPU Count"
>   }
>   metric {
> name = "cpu_speed"
> title = "CPU Speed"
>   }

Re: [Ganglia-general] Unable to see nodes

2012-07-29 Thread Eric Dow

Thanks for the quick response.

I am configured to run in multicast (the default). My head node and all 
compute nodes are connected to a single router. I don't know if there is 
a way to make sure that multicast is set up properly. One thing that I 
checked was this (from the ganglia FAQ):


Confirm that UDP connections can be established between the gmetad and
gmond (or gmond and other gmonds, for multicast purposes) by running
nc -u -l 8653 on the host in question, then echo "hello" | nc -u <hostname>
8653 from the gmetad or another gmond.


This worked. I saw "hello" appear on my compute node when I did this.
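
(Note that this only shows plain unicast UDP getting through on 8653; it
doesn't exercise multicast itself. One way to check the multicast side is
to confirm each gmond has actually joined the group, e.g.

  netstat -gn | grep 239.2.11.71
or
  ip maddr show dev eth0

where eth0 stands in for whatever interface the nodes actually use.)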

My gmetad.conf is the same as the standard file provided with ganglia. 
The only modification is the data_source line:


data_source "ACDL Cluster" localhost
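
(Side note on the format: data_source also accepts an optional polling
interval in seconds followed by one or more host[:port] entries, which
gmetad treats as redundant sources for the same cluster and polls in
order until one answers. With made-up hostnames that would look like:

  data_source "ACDL Cluster" 15 localhost:8649 node01.example.com:8649

Each listed host only has to answer on its tcp_accept_channel port.)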

Finally, here's my gmond.conf:

/* This configuration is as close to 2.5.x default behavior as possible
   The values closely match ./gmond/metric.h definitions in 2.5.x */
globals {
  daemonize = yes
  setuid = yes
  user = nobody
  debug_level = 0
  max_udp_msg_len = 1472
  mute = no
  deaf = no
  allow_extra_data = yes
  host_dmax = 86400 /*secs. Expires (removes from web interface) hosts in 1 day */
  host_tmax = 20 /*secs */
  cleanup_threshold = 300 /*secs */
  gexec = no
  send_metadata_interval = 0 /*secs */
}

/*
 * The cluster attributes specified will be used as part of the <CLUSTER>
 * tag that will wrap all hosts collected by this instance.
 */
cluster {
  name = "ACDL Cluster"
  owner = "ACDL"
  latlong = "unspecified"
  url = "unspecified"
}

/* The host section describes attributes of the host, like the location */
host {
  location = "ACDL"
}

/* Feel free to specify as many udp_send_channels as you like. Gmond
   used to only support having a single channel */
udp_send_channel {
  #bind_hostname = yes # Highly recommended, soon to be default.
   # This option tells gmond to use a source address
   # that resolves to the machine's hostname.  Without
   # this, the metrics may appear to come from any
   # interface and the DNS names associated with
   # those IPs will be used to create the RRDs.
  mcast_join = 239.2.11.71
  port = 8649
  ttl = 1
}

/* You can specify as many udp_recv_channels as you like as well. */
udp_recv_channel {
  mcast_join = 239.2.11.71
  port = 8649
  bind = 239.2.11.71
  retry_bind = true
}

/* You can specify as many tcp_accept_channels as you like to share
   an xml description of the state of the cluster */
tcp_accept_channel {
  port = 8649
}

/* Channel to receive sFlow datagrams */
#udp_recv_channel {
#  port = 6343
#}

/* Optional sFlow settings */
#sflow {
# udp_port = 6343
# accept_vm_metrics = yes
# accept_jvm_metrics = yes
# multiple_jvm_instances = no
# accept_http_metrics = yes
# multiple_http_instances = no
# accept_memcache_metrics = yes
# multiple_memcache_instances = no
#}

/* Each metrics module that is referenced by gmond must be specified and
   loaded. If the module has been statically linked with gmond, it does
   not require a load path. However all dynamically loadable modules must
   include a load path. */
modules {
  module {
name = "core_metrics"
  }
  module {
name = "cpu_module"
path = "modcpu.so"
  }
  module {
name = "disk_module"
path = "moddisk.so"
  }
  module {
name = "load_module"
path = "modload.so"
  }
  module {
name = "mem_module"
path = "modmem.so"
  }
  module {
name = "net_module"
path = "modnet.so"
  }
  module {
name = "proc_module"
path = "modproc.so"
  }
  module {
name = "sys_module"
path = "modsys.so"
  }
}

/* The old internal 2.5.x metric array has been replaced by the following
   collection_group directives.  What follows is the default behavior for
   collecting and sending metrics that is as close to 2.5.x behavior as
   possible. */

/* This collection group will cause a heartbeat (or beacon) to be sent every
   20 seconds.  In the heartbeat is the GMOND_STARTED data which expresses
   the age of the running gmond. */
collection_group {
  collect_once = yes
  time_threshold = 20
  metric {
name = "heartbeat"
  }
}

/* This collection group will send general info about this host every
   1200 secs.
   This information doesn't change between reboots and is only collected
   once. */
collection_group {
  collect_once = yes
  time_threshold = 1200
  metric {
name = "cpu_num"
title = "CPU Count"
  }
  metric {
name = "cpu_speed"
title = "CPU Speed"
  }
  metric {
name = "mem_total"
title = "Memory Total"
  }
  /* Should this be here? Swap can be added/removed between reboots. */
  metric {
name = "swap_total"
title = "Swap Space Total"
  }
  metric {
name = "boottime"
title = "Last Boot Time"
  }
  metric {
name = "machine_type"
title = "Machine Type"
  }
  metric {
name = "os_name"
title = "Operating System"
  }
  metric {
name = "os_release"

Re: [Ganglia-general] Unable to see nodes

2012-07-28 Thread Aaron Nichols
On Sat, Jul 28, 2012 at 11:53 AM, Eric Dow  wrote:

>  Hi,
>
> I am running a cluster with Ubuntu 12.04 and I installed ganglia 3.4.0
> from the tarball provided on the site.
>
> I installed gmetad and gmond onto my head node, and gmond onto each of the
> compute nodes. I am able to see my head node just fine from the ganglia
> interface, and can view the various metrics for the head node. However, I
> cannot see any of the nodes, even though I am running gmond on all of them.
> The gmond.conf file is identical on the head node and all the nodes.
>

This sounds like you are running multicast - can you confirm that you're
running in multicast mode? If you are running in unicast mode then the
config for all the nodes probably shouldn't be identical. If you aren't
sure then yes, you should post your gmond.conf.
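
(One concrete way the configs differ in unicast: the leaf nodes only need
to send, so their globals can set

  deaf = yes

while the collector's gmond has to stay deaf = no and keep a
udp_recv_channel, otherwise it will ignore everything the other nodes
send it.)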

It would also be helpful to see what you have configured in gmetad for your
clusters.


Re: [Ganglia-general] Unable to see nodes

2012-07-28 Thread Bernard Li
Hi Eric:

By default gmond is configured to run in multicast mode.  If you can
only see your head node, chances are multicast does not work in your
environment.

When you nc to port 8651, you should see XML of not just your head
node, but other hosts and their corresponding metrics as well.

And 8651 is the correct gmetad non-interactive port; I'm not sure
who put 8650 there, because that's wrong.
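
A quick way to see what gmetad itself has collected (assuming the default
xml_port of 8651 on the head node):

  nc localhost 8651 | grep '<HOST NAME'

which should print one line per host gmetad currently knows about.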

Try setting up unicast and see if that works.

Cheers,

Bernard

On Sat, Jul 28, 2012 at 10:53 AM, Eric Dow  wrote:
> Hi,
>
> I am running a cluster with Ubuntu 12.04 and I installed ganglia 3.4.0 from
> the tarball provided on the site.
>
> I installed gmetad and gmond onto my head node, and gmond onto each of the
> compute nodes. I am able to see my head node just fine from the ganglia
> interface, and can view the various metrics for the head node. However, I
> cannot see any of the nodes, even though I am running gmond on all of them.
> The gmond.conf file is identical on the head node and all the nodes.
>
> I tried all the debug steps found here
> http://sourceforge.net/apps/trac/ganglia/wiki/FAQ and everything works as it
> should. The only difference was that I needed to use nc <head node> 8651 rather
> than 8650 to see the XML data from the head node. Other than that, everything
> seems to be working.
>
> I can post my gmond.conf file if that will help, or any other info for that
> matter. I think ganglia is amazing and would really like to get it up and
> running. Any suggestions would be appreciated.
>
> Thanks
>



[Ganglia-general] Unable to see nodes

2012-07-28 Thread Eric Dow

Hi,

I am running a cluster with Ubuntu 12.04 and I installed ganglia 3.4.0
from the tarball provided on the site.


I installed gmetad and gmond onto my head node, and gmond onto each of 
the compute nodes. I am able to see my head node just fine from the 
ganglia interface, and can view the various metrics for the head node. 
However, I cannot see any of the nodes, even though I am running gmond 
on all of them. The gmond.conf file is identical on the head node and 
all the nodes.


I tried all the debug steps found here 
http://sourceforge.net/apps/trac/ganglia/wiki/FAQ and everything works 
as it should. The only difference was that I needed to use nc <head node>
8651 rather than 8650 to see the XML data from the head node. Other
than that, everything seems to be working.
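
(Two other quick sanity checks: ss -ulnp | grep 8649 on each node confirms
that gmond is actually listening on its UDP receive port, and
sudo tcpdump -ni eth0 udp port 8649 on the head node shows whether the
compute nodes' metric packets are arriving at all; eth0 here is just a
placeholder for the real interface.)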


I can post my gmond.conf file if that will help, or any other info for 
that matter. I think ganglia is amazing and would really like to get it 
up and running. Any suggestions would be appreciated.


Thanks