Jeff,

Gmetad doesn't treat the nodes in the data_source line the way you're 
thinking it does. Gmetad assumes every host listed there holds the full set 
of data for the cluster, and it only polls the second one if the first can't 
be contacted. If you want the two nodes to be in the same cluster, you have 
to configure the gmond on each host to join the same multicast group (or set 
up unicast).
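
To make the failover behaviour concrete, here is a minimal Python sketch of 
the idea. It is not gmetad's actual code: poll_data_source and the fetch 
callable are hypothetical names used only for illustration.

```python
from typing import Callable, Iterable, Optional


def poll_data_source(hosts: Iterable[str],
                     fetch: Callable[[str], str]) -> Optional[str]:
    """Return the cluster XML from the first host that answers.

    Later hosts are treated purely as fallbacks: they are contacted only
    when every earlier host fails, and their data is never merged.
    """
    for host in hosts:
        try:
            return fetch(host)   # first success wins, stop here
        except OSError:
            continue             # host unreachable, try the next one
    return None                  # nobody answered
```

With the data_source line from your message, 10.1.0.1:8649 would only ever 
be asked if 10.1.0.250 were down, so the compute node has to show up in the 
master gmond's own data.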

On both nodes, you should have a section of your gmond.conf that looks like this:

udp_send_channel {
  mcast_join = 239.2.11.71
  port = 8649
  ttl = 64
}

udp_recv_channel {
  mcast_join = 239.2.11.71
  port = 8649
  bind = 239.2.11.71
}

Make sure your switches support multicast. If not, use unicast:

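# on both hosts
udp_send_channel {
  host = 10.1.0.250
  port = 8649
}

# on "master" node
udp_recv_channel {
  port = 8649
}

With unicast, also set this in the globals section on both hosts, so that 
metric metadata is resent periodically and the master can recover it after 
a gmond restart:

  send_metadata_interval = 30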

If you're going to scale, unicast works better anyway.
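
To check the result, you can read the XML that gmond serves on its TCP port 
(your lsof output shows it listening on *:8649) and list the hosts it knows 
about. A rough Python sketch follows; the function names are mine, and it 
assumes the dump contains HOST elements with a NAME attribute.

```python
import socket
import xml.etree.ElementTree as ET


def read_gmond_xml(host="127.0.0.1", port=8649, timeout=5.0):
    """Connect to gmond's TCP channel and read the XML dump until EOF."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:          # gmond closes the socket after the dump
                break
            chunks.append(data)
    return b"".join(chunks)


def host_names(xml_bytes):
    """Return the sorted NAME attribute of every HOST element in the dump."""
    root = ET.fromstring(xml_bytes)
    return sorted(h.get("NAME") for h in root.iter("HOST"))
```

Running print(host_names(read_gmond_xml("10.1.0.250"))) on the master should 
list both nodes once the two gmonds share a channel. If only the master 
appears, the compute node's packets are not reaching it.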

Hope that helps,

Jonah


On Jul 21, 2012, at 8:13 AM, Jeff Layton wrote:

> Jesse,
> 
> [Note - I changed groups to ganglia-general since this isn't
> a developers issue - just a silly user issue].
> 
> I must admit that networking is one of my weak areas but here
> are some relevant sections of output from netstat and lsof
> on the master node.
> 
> Netstat:
> 
> [root@test1 ~]# netstat | more
> Active Internet connections (w/o servers)
> Proto Recv-Q Send-Q Local Address               Foreign Address             
> State      
> tcp        0      0 192.168.1.250:53687         ord08s06-in-f15.1e100:https 
> ESTABLISHED 
> tcp        0      0 10.1.0.250:shell            n0001:48199                 
> ESTABLISHED 
> tcp        0      0 192.168.1.250:41461         ord08s08-in-f21.1e100:https 
> ESTABLISHED 
> tcp        0      0 192.168.1.250:41476         ord08s08-in-f21.1e100:https 
> ESTABLISHED 
> tcp        0      0 10.1.0.250:8649             10.1.0.250:49899            
> TIME_WAIT   
> tcp        0      0 10.1.0.250:nfs              n0001:imaps                 
> ESTABLISHED 
> tcp        0      0 192.168.1.250:50191         den03s05-in-f16.1e100:https 
> TIME_WAIT   
> tcp        0      0 10.1.0.250:8649             10.1.0.250:49905            
> TIME_WAIT   
> tcp        0      0 192.168.1.250:52133         ord08s07-in-f14.1e100:https 
> ESTABLISHED 
> tcp        0      0 192.168.1.250:50500         ord08s09-in-f22.1e100:https 
> ESTABLISHED 
> tcp        0      0 10.1.0.250:8649             10.1.0.250:49904            
> TIME_WAIT   
> tcp        0      0 10.1.0.250:8649             10.1.0.250:49903            
> TIME_WAIT   
> tcp        0      0 10.1.0.250:43479            n0001:ssh                   
> ESTABLISHED 
> tcp        0      0 192.168.1.250:52134         ord08s07-in-f14.1e100:https 
> TIME_WAIT   
> tcp        0      0 192.168.1.250:53686         ord08s06-in-f15.1e100:https 
> ESTABLISHED 
> udp        0      0 192.168.1.250:52035         239.2.11.71:8649            
> ESTABLISHED 
> 
> 
> lsof output:
> [root@test1 ~]# lsof | grep gmond
> gmond      1948      nobody  cwd       DIR                8,2      4096       
>    2 /
> gmond      1948      nobody  rtd       DIR                8,2      4096       
>    2 /
> gmond      1948      nobody  txt       REG                8,2    111475    
> 2527491 /usr/sbin/gmond
> gmond      1948      nobody  mem       REG                8,2    161084    
> 1318670 /lib/libexpat.so.1.5.2
> gmond      1948      nobody  mem       REG                8,2     43916    
> 2527474 /usr/lib/libconfuse.so.0.0.0
> gmond      1948      nobody  mem       REG                8,2     67920    
> 3165062 /usr/lib/ganglia/modcpu.so
> gmond      1948      nobody  mem       REG                8,2     67461    
> 3165065 /usr/lib/ganglia/modload.so
> gmond      1948      nobody  mem       REG                8,2    131044    
> 1318655 /lib/libpthread-2.12.so
> gmond      1948      nobody  mem       REG                8,2   1876580    
> 1318624 /lib/libc-2.12.so
> gmond      1948      nobody  mem       REG                8,2    113908    
> 1318641 /lib/libnsl-2.12.so
> gmond      1948      nobody  mem       REG                8,2     67515    
> 3165068 /usr/lib/ganglia/modnet.so
> gmond      1948      nobody  mem       REG                8,2    190604    
> 1318668 /lib/libpcre.so.0.0.1
> gmond      1948      nobody  mem       REG                8,2    142480    
> 1318667 /lib/ld-2.12.so
> gmond      1948      nobody  mem       REG                8,2     67469    
> 3165063 /usr/lib/ganglia/moddisk.so
> gmond      1948      nobody  mem       REG                8,2     58704    
> 1318647 /lib/libnss_files-2.12.so
> gmond      1948      nobody  mem       REG                8,2     67885    
> 3165066 /usr/lib/ganglia/modmem.so
> gmond      1948      nobody  mem       REG                8,2    184012    
> 2527486 /usr/lib/libapr-1.so.0.3.9
> gmond      1948      nobody  mem       REG                8,2     67760    
> 3165070 /usr/lib/ganglia/modsys.so
> gmond      1948      nobody  mem       REG                8,2     38376    
> 1318635 /lib/libcrypt-2.12.so
> gmond      1948      nobody  mem       REG                8,2    103384    
> 1318657 /lib/libresolv-2.12.so
> gmond      1948      nobody  mem       REG                8,2     15200    
> 1318672 /lib/libuuid.so.1.3.0
> gmond      1948      nobody  mem       REG                8,2     25592    
> 1318645 /lib/libnss_dns-2.12.so
> gmond      1948      nobody  mem       REG                8,2    300676    
> 1318666 /lib/libfreebl3.so
> gmond      1948      nobody  mem       REG                8,2     17892    
> 1318637 /lib/libdl-2.12.so
> gmond      1948      nobody  mem       REG                8,2     24382    
> 3165067 /usr/lib/ganglia/modmulticpu.so
> gmond      1948      nobody  mem       REG                8,2    202573    
> 2527488 /usr/lib/libganglia-3.4.0.so.0.0.0
> gmond      1948      nobody  mem       REG                8,2     67397    
> 3165069 /usr/lib/ganglia/modproc.so
> gmond      1948      nobody  mem       REG                8,2     26048    
> 2646130 /usr/lib/gconv/gconv-modules.cache
> gmond      1948      nobody  mem       REG                8,2  99158704    
> 2520564 /usr/lib/locale/locale-archive
> gmond      1948      nobody    0r      CHR                1,3       0t0       
> 3652 /dev/null
> gmond      1948      nobody    1w      CHR                1,3       0t0       
> 3652 /dev/null
> gmond      1948      nobody    2w      CHR                1,3       0t0       
> 3652 /dev/null
> gmond      1948      nobody    3u      REG                0,9         0       
> 3650 anon_inode
> gmond      1948      nobody    4u     IPv4              11948       0t0       
>  UDP 239.2.11.71:8649 
> gmond      1948      nobody    5u     IPv4              11952       0t0       
>  TCP *:8649 (LISTEN)
> gmond      1948      nobody    6u     IPv4              11955       0t0       
>  UDP 192.168.1.250:52035->239.2.11.71:8649
> 
> 
> 
> To me it looks correct except I'm worried about that last bit from
> lsof with 192.168.1.250 pointing to the multicast address. But again,
> I'm not entirely sure what I'm looking at :)
> 
> Thanks!
> 
> Jeff
> 
> 
> 
>> What does netstat or lsof say about gmond interface binding?
>> 
>> (in haste, sorry for the brevity)
>> 
>> On Sat, Jul 21, 2012 at 10:50 AM, Jeff Layton <layto...@att.net> wrote:
>>> Good morning,
>>> 
>>> Apologies for the simple question. I've got a simple cluster
>>> with a master node and one compute node. I installed the
>>> latest Ganglia on the master node (3.4.0) - libganglia,
>>> ganglia-gmond, ganglia-metad, ganglia-web (3.5.1). I can
>>> use ganglia-web to "see" the master node with no problems.
>>> 
>>> I'm using Warewulf for the compute node and I installed
>>> libganglia and ganglia-gmond in the VNFS and rebooted the
>>> node. When the node comes back up, I tested ganglia via
>>> "gstat -all" on the compute node and it seems to work
>>> correctly.
>>> 
>>> However, ganglia-web doesn't display anything for the compute
>>> node even though I've added it to the data_source line in
>>> gmetad.conf:
>>> 
>>> 
>>> data_source "my cluster" 10.1.0.250 10.1.0.1:8649
>>> 
>>> 
>>> I also checked if the master node could access the data from
>>> the compute node via "gstat -all" and I only get data from the
>>> master node (i.e. no compute node).
>>> 
>>> I checked the Ethernet interfaces on both nodes and both
>>> are listed as MULTICAST. iptables on the master node and the
>>> compute node are off and the services are not running (checked
>>> that 3 times).
>>> 
>>> There is a simple Netgear GigE switch between the nodes
>>> (unmanaged). I don't think that's a problem.
>>> 
>>> One thing I think is interesting is that the master node has
>>> eth0 with an IP of 192.168.1.250 which is to the outside world
>>> and eth1 is 10.1.0.250 which is the cluster network. The compute
>>> node has eth0 as 10.1.0.1. But when I go to http://localhost/ganglia
>>> I can only access the master node as 192.168.1.250, not
>>> 10.1.0.250 (i.e. the list of nodes is only 192.168.1.250).
>>> 
>>> Otherwise I can log in to the compute node, ping it, etc. It works
>>> fine but somehow I'm missing a configuration piece for ganglia.
>>> 
>>> TIA!
>>> 
>>> Jeff
>>> 
>>> 
>>> 
>>> ------------------------------------------------------------------------------
>>> Live Security Virtual Conference
>>> Exclusive live event will cover all the ways today's security and
>>> threat landscape has changed and how IT managers can respond. Discussions
>>> will include endpoint security, mobile security and the latest in malware
>>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
>>> _______________________________________________
>>> Ganglia-developers mailing list
>>> ganglia-develop...@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/ganglia-developers
>>> 
>> 
>> 
> 
> Ganglia-general mailing list
> Ganglia-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/ganglia-general
