Jeff,
Gmetad doesn't treat the nodes in the data_source line the way you're
thinking it does. Gmetad assumes every host listed for a cluster holds the
full set of data for that cluster, and it only polls the second one if the
first can't be contacted. If you want the two nodes to be in the same cluster,
you have to configure the gmond on each host to join the same multicast group
(or set up unicast).
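Here is a rough model of that failover behavior, in case it helps. It is only a sketch, not gmetad's actual code: parse_data_source and poll_cluster are made-up names, and fetch stands in for whatever opens the TCP connection to gmond.

```python
import shlex

DEFAULT_PORT = 8649  # gmond's default TCP accept port


def parse_data_source(line):
    """Split a gmetad.conf data_source line into (cluster name, [(host, port), ...])."""
    tokens = shlex.split(line)
    if len(tokens) < 3 or tokens[0] != "data_source":
        raise ValueError("not a data_source line: %r" % line)
    name, rest = tokens[1], tokens[2:]
    # An optional polling interval (in seconds) may precede the host list.
    if rest[0].isdigit():
        rest = rest[1:]
    hosts = []
    for item in rest:
        host, sep, port = item.partition(":")
        hosts.append((host, int(port) if sep else DEFAULT_PORT))
    return name, hosts


def poll_cluster(hosts, fetch):
    """Return data from the FIRST host that answers; later hosts are only fallbacks."""
    for host, port in hosts:
        try:
            return (host, port), fetch(host, port)
        except OSError:
            continue  # unreachable, so try the next redundant source
    raise OSError("no host in the data_source line could be contacted")
```

With Jeff's line, 10.1.0.1 would only ever be polled if 10.1.0.250 stopped answering, which is why listing the compute node there does not add it to the cluster.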
On both nodes, you should have a section of your gmond.conf that looks like this:
udp_send_channel {
  mcast_join = 239.2.11.71
  port = 8649
  ttl = 64
}
udp_recv_channel {
  mcast_join = 239.2.11.71
  port = 8649
  bind = 239.2.11.71
}
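Once both gmonds share a channel, the XML that each one serves on its TCP accept port (8649 in Jeff's lsof output) should list both hosts. A quick way to check is sketched below. The function names are mine, and it assumes gmond accepts TCP connections from wherever you run it.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_gmond_xml(host, port=8649, timeout=5.0):
    """Read the XML dump that gmond writes to any client connecting on its TCP port."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # gmond closes the connection when the dump is complete
                break
            chunks.append(data)
    return b"".join(chunks)


def list_hosts(xml_bytes):
    """Return (NAME, IP) for every HOST element in a gmond XML dump."""
    root = ET.fromstring(xml_bytes)
    return [(h.get("NAME"), h.get("IP")) for h in root.iter("HOST")]
```

Running list_hosts(fetch_gmond_xml("10.1.0.250")) on the master should return an entry for each node once they are in the same cluster. If only the master shows up, its gmond is not receiving the compute node's packets.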
Make sure your switches support multicast. If not, use unicast:
# on both hosts
udp_send_channel {
  host = 10.1.0.250
  port = 8649
}
# on the "master" node only
udp_recv_channel {
  port = 8649
}
# on both hosts, inside the globals { } section
send_metadata_interval = 30
If you're going to scale, unicast works better anyway.
Hope that helps,
Jonah
On Jul 21, 2012, at 8:13 AM, Jeff Layton wrote:
> Jesse,
>
> [Note - I changed groups to ganglia-general since this isn't
> a developers issue - just a silly user issue].
>
> I must admit that networking is one of my weak areas, but here
> are some relevant sections of output from netstat and lsof
> on the master node.
>
> Netstat:
>
> [root@test1 ~]# netstat | more
> Active Internet connections (w/o servers)
> Proto Recv-Q Send-Q Local Address         Foreign Address               State
> tcp        0      0 192.168.1.250:53687   ord08s06-in-f15.1e100:https   ESTABLISHED
> tcp        0      0 10.1.0.250:shell      n0001:48199                   ESTABLISHED
> tcp        0      0 192.168.1.250:41461   ord08s08-in-f21.1e100:https   ESTABLISHED
> tcp        0      0 192.168.1.250:41476   ord08s08-in-f21.1e100:https   ESTABLISHED
> tcp        0      0 10.1.0.250:8649       10.1.0.250:49899              TIME_WAIT
> tcp        0      0 10.1.0.250:nfs        n0001:imaps                   ESTABLISHED
> tcp        0      0 192.168.1.250:50191   den03s05-in-f16.1e100:https   TIME_WAIT
> tcp        0      0 10.1.0.250:8649       10.1.0.250:49905              TIME_WAIT
> tcp        0      0 192.168.1.250:52133   ord08s07-in-f14.1e100:https   ESTABLISHED
> tcp        0      0 192.168.1.250:50500   ord08s09-in-f22.1e100:https   ESTABLISHED
> tcp        0      0 10.1.0.250:8649       10.1.0.250:49904              TIME_WAIT
> tcp        0      0 10.1.0.250:8649       10.1.0.250:49903              TIME_WAIT
> tcp        0      0 10.1.0.250:43479      n0001:ssh                     ESTABLISHED
> tcp        0      0 192.168.1.250:52134   ord08s07-in-f14.1e100:https   TIME_WAIT
> tcp        0      0 192.168.1.250:53686   ord08s06-in-f15.1e100:https   ESTABLISHED
> udp        0      0 192.168.1.250:52035   239.2.11.71:8649              ESTABLISHED
>
>
> lsof output:
> [root@test1 ~]# lsof | grep gmond
> gmond 1948 nobody cwd DIR 8,2 4096 2 /
> gmond 1948 nobody rtd DIR 8,2 4096 2 /
> gmond 1948 nobody txt REG 8,2 111475 2527491 /usr/sbin/gmond
> gmond 1948 nobody mem REG 8,2 161084 1318670 /lib/libexpat.so.1.5.2
> gmond 1948 nobody mem REG 8,2 43916 2527474 /usr/lib/libconfuse.so.0.0.0
> gmond 1948 nobody mem REG 8,2 67920 3165062 /usr/lib/ganglia/modcpu.so
> gmond 1948 nobody mem REG 8,2 67461 3165065 /usr/lib/ganglia/modload.so
> gmond 1948 nobody mem REG 8,2 131044 1318655 /lib/libpthread-2.12.so
> gmond 1948 nobody mem REG 8,2 1876580 1318624 /lib/libc-2.12.so
> gmond 1948 nobody mem REG 8,2 113908 1318641 /lib/libnsl-2.12.so
> gmond 1948 nobody mem REG 8,2 67515 3165068 /usr/lib/ganglia/modnet.so
> gmond 1948 nobody mem REG 8,2 190604 1318668 /lib/libpcre.so.0.0.1
> gmond 1948 nobody mem REG 8,2 142480 1318667 /lib/ld-2.12.so
> gmond 1948 nobody mem REG 8,2 67469 3165063 /usr/lib/ganglia/moddisk.so
> gmond 1948 nobody mem REG 8,2 58704 1318647 /lib/libnss_files-2.12.so
> gmond 1948 nobody mem REG 8,2 67885 3165066 /usr/lib/ganglia/modmem.so
> gmond 1948 nobody mem REG 8,2 184012 2527486 /usr/lib/libapr-1.so.0.3.9
> gmond 1948 nobody mem REG 8,2 67760 3165070 /usr/lib/ganglia/modsys.so
> gmond 1948 nobody mem REG 8,2 38376 1318635 /lib/libcrypt-2.12.so
> gmond 1948 nobody mem REG 8,2 103384 1318657 /lib/libresolv-2.12.so
> gmond 1948 nobody mem REG 8,2 15200 1318672 /lib/libuuid.so.1.3.0
> gmond 1948 nobody mem REG 8,2 25592 1318645 /lib/libnss_dns-2.12.so
> gmond 1948 nobody mem REG 8,2 300676 1318666 /lib/libfreebl3.so
> gmond 1948 nobody mem REG 8,2 17892 1318637 /lib/libdl-2.12.so
> gmond 1948 nobody mem REG 8,2 24382 3165067 /usr/lib/ganglia/modmulticpu.so
> gmond 1948 nobody mem REG 8,2 202573 2527488 /usr/lib/libganglia-3.4.0.so.0.0.0
> gmond 1948 nobody mem REG 8,2 67397 3165069 /usr/lib/ganglia/modproc.so
> gmond 1948 nobody mem REG 8,2 26048 2646130 /usr/lib/gconv/gconv-modules.cache
> gmond 1948 nobody mem REG 8,2 99158704 2520564 /usr/lib/locale/locale-archive
> gmond 1948 nobody 0r CHR 1,3 0t0 3652 /dev/null
> gmond 1948 nobody 1w CHR 1,3 0t0 3652 /dev/null
> gmond 1948 nobody 2w CHR 1,3 0t0 3652 /dev/null
> gmond 1948 nobody 3u REG 0,9 0 3650 anon_inode
> gmond 1948 nobody 4u IPv4 11948 0t0 UDP 239.2.11.71:8649
> gmond 1948 nobody 5u IPv4 11952 0t0 TCP *:8649 (LISTEN)
> gmond 1948 nobody 6u IPv4 11955 0t0 UDP 192.168.1.250:52035->239.2.11.71:8649
>
>
>
> To me it looks correct, except I'm worried about that last bit from
> lsof with 192.168.1.250 pointing to the multicast address. But again,
> I'm not entirely sure what I'm looking at :)
>
> Thanks!
>
> Jeff
>
>
>
>> What does netstat or lsof say about gmond interface binding?
>>
>> (in haste, sorry for the brevity)
>>
>> On Sat, Jul 21, 2012 at 10:50 AM, Jeff Layton <layto...@att.net> wrote:
>>> Good morning,
>>>
>>> Apologies for the simple question. I've got a simple cluster
>>> with a master node and one compute node. I installed the
>>> latest Ganglia on the master node (3.4.0) - libganglia,
>>> ganglia-gmond, ganglia-metad, ganglia-web (3.5.1). I can
>>> use ganglia-web to "see" the master node with no problems.
>>>
>>> I'm using Warewulf for the compute node and I installed
>>> libganglia and ganglia-gmond in the VNFS and rebooted the
>>> node. When the node comes back up, I tested ganglia via
>>> "gstat -all" on the compute node and it seems to work
>>> correctly.
>>>
>>> However, ganglia-web doesn't display anything for the compute
>>> node even though I've added it to the data_source line in
>>> gmetad.conf:
>>>
>>>
>>> data_source "my cluster" 10.1.0.250 10.1.0.1:8649
>>>
>>>
>>> I also checked if the master node could access the data from
>>> the compute node via "gstat -all" and I only get data from the
>>> master node (i.e. no compute node).
>>>
>>> I checked the Ethernet interfaces on both nodes and both
>>> are listed as MULTICAST. iptables on the master node and the
>>> compute node is off and the services are not running (checked
>>> that 3 times).
>>>
>>> There is a simple Netgear GigE switch between the nodes
>>> (unmanaged). I don't think that's a problem.
>>>
>>> One thing I think is interesting is that the master node has
>>> eth0 with an IP of 192.168.1.250 which is to the outside world
>>> and eth1 is 10.1.0.250 which is the cluster network. The compute
>>> node has eth0 as 10.1.0.1. But when I go to http://localhost/ganglia
>>> I can only access the master node as 192.168.1.250, not
>>> 10.1.0.250 (i.e. the list of nodes is only 192.168.1.250).
>>>
>>> Otherwise I can log in to the compute node, ping it, etc. It works
>>> fine, but somehow I'm missing a configuration piece for ganglia.
>>>
>>> TIA!
>>>
>>> Jeff
>>>
>>>
>>>
>>> ------------------------------------------------------------------------------
>>> Live Security Virtual Conference
>>> Exclusive live event will cover all the ways today's security and
>>> threat landscape has changed and how IT managers can respond. Discussions
>>> will include endpoint security, mobile security and the latest in malware
>>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
>>> _______________________________________________
>>> Ganglia-developers mailing list
>>> ganglia-develop...@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/ganglia-developers
>>>
>>
>>
>
_______________________________________________
Ganglia-general mailing list
Ganglia-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-general