Jesse,

[Note - I changed groups to ganglia-general since this isn't
a developers issue - just a silly user issue].

I must admit that networking is one of my weak areas, but here
are some relevant sections of output from netstat and lsof
on the master node.

Netstat:

[root@test1 ~]# netstat | more
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address        Foreign Address              State
tcp        0      0 192.168.1.250:53687  ord08s06-in-f15.1e100:https  ESTABLISHED
tcp        0      0 10.1.0.250:shell     n0001:48199                  ESTABLISHED
tcp        0      0 192.168.1.250:41461  ord08s08-in-f21.1e100:https  ESTABLISHED
tcp        0      0 192.168.1.250:41476  ord08s08-in-f21.1e100:https  ESTABLISHED
tcp        0      0 10.1.0.250:8649      10.1.0.250:49899             TIME_WAIT
tcp        0      0 10.1.0.250:nfs       n0001:imaps                  ESTABLISHED
tcp        0      0 192.168.1.250:50191  den03s05-in-f16.1e100:https  TIME_WAIT
tcp        0      0 10.1.0.250:8649      10.1.0.250:49905             TIME_WAIT
tcp        0      0 192.168.1.250:52133  ord08s07-in-f14.1e100:https  ESTABLISHED
tcp        0      0 192.168.1.250:50500  ord08s09-in-f22.1e100:https  ESTABLISHED
tcp        0      0 10.1.0.250:8649      10.1.0.250:49904             TIME_WAIT
tcp        0      0 10.1.0.250:8649      10.1.0.250:49903             TIME_WAIT
tcp        0      0 10.1.0.250:43479     n0001:ssh                    ESTABLISHED
tcp        0      0 192.168.1.250:52134  ord08s07-in-f14.1e100:https  TIME_WAIT
tcp        0      0 192.168.1.250:53686  ord08s06-in-f15.1e100:https  ESTABLISHED
udp        0      0 192.168.1.250:52035  239.2.11.71:8649             ESTABLISHED


lsof output:
[root@test1 ~]# lsof | grep gmond
gmond 1948 nobody cwd  DIR  8,2     4096       2 /
gmond 1948 nobody rtd  DIR  8,2     4096       2 /
gmond 1948 nobody txt  REG  8,2   111475 2527491 /usr/sbin/gmond
gmond 1948 nobody mem  REG  8,2   161084 1318670 /lib/libexpat.so.1.5.2
gmond 1948 nobody mem  REG  8,2    43916 2527474 /usr/lib/libconfuse.so.0.0.0
gmond 1948 nobody mem  REG  8,2    67920 3165062 /usr/lib/ganglia/modcpu.so
gmond 1948 nobody mem  REG  8,2    67461 3165065 /usr/lib/ganglia/modload.so
gmond 1948 nobody mem  REG  8,2   131044 1318655 /lib/libpthread-2.12.so
gmond 1948 nobody mem  REG  8,2  1876580 1318624 /lib/libc-2.12.so
gmond 1948 nobody mem  REG  8,2   113908 1318641 /lib/libnsl-2.12.so
gmond 1948 nobody mem  REG  8,2    67515 3165068 /usr/lib/ganglia/modnet.so
gmond 1948 nobody mem  REG  8,2   190604 1318668 /lib/libpcre.so.0.0.1
gmond 1948 nobody mem  REG  8,2   142480 1318667 /lib/ld-2.12.so
gmond 1948 nobody mem  REG  8,2    67469 3165063 /usr/lib/ganglia/moddisk.so
gmond 1948 nobody mem  REG  8,2    58704 1318647 /lib/libnss_files-2.12.so
gmond 1948 nobody mem  REG  8,2    67885 3165066 /usr/lib/ganglia/modmem.so
gmond 1948 nobody mem  REG  8,2   184012 2527486 /usr/lib/libapr-1.so.0.3.9
gmond 1948 nobody mem  REG  8,2    67760 3165070 /usr/lib/ganglia/modsys.so
gmond 1948 nobody mem  REG  8,2    38376 1318635 /lib/libcrypt-2.12.so
gmond 1948 nobody mem  REG  8,2   103384 1318657 /lib/libresolv-2.12.so
gmond 1948 nobody mem  REG  8,2    15200 1318672 /lib/libuuid.so.1.3.0
gmond 1948 nobody mem  REG  8,2    25592 1318645 /lib/libnss_dns-2.12.so
gmond 1948 nobody mem  REG  8,2   300676 1318666 /lib/libfreebl3.so
gmond 1948 nobody mem  REG  8,2    17892 1318637 /lib/libdl-2.12.so
gmond 1948 nobody mem  REG  8,2    24382 3165067 /usr/lib/ganglia/modmulticpu.so
gmond 1948 nobody mem  REG  8,2   202573 2527488 /usr/lib/libganglia-3.4.0.so.0.0.0
gmond 1948 nobody mem  REG  8,2    67397 3165069 /usr/lib/ganglia/modproc.so
gmond 1948 nobody mem  REG  8,2    26048 2646130 /usr/lib/gconv/gconv-modules.cache
gmond 1948 nobody mem  REG  8,2 99158704 2520564 /usr/lib/locale/locale-archive
gmond 1948 nobody 0r   CHR  1,3      0t0    3652 /dev/null
gmond 1948 nobody 1w   CHR  1,3      0t0    3652 /dev/null
gmond 1948 nobody 2w   CHR  1,3      0t0    3652 /dev/null
gmond 1948 nobody 3u   REG  0,9        0    3650 anon_inode
gmond 1948 nobody 4u   IPv4 11948    0t0 UDP 239.2.11.71:8649
gmond 1948 nobody 5u   IPv4 11952    0t0 TCP *:8649 (LISTEN)
gmond 1948 nobody 6u   IPv4 11955    0t0 UDP 192.168.1.250:52035->239.2.11.71:8649



To me it looks correct, except I'm worried about that last bit from
lsof, with 192.168.1.250 pointing to the multicast address. But again,
I'm not entirely sure what I'm looking at :)
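
For anyone who wants to pull that one line out of a long lsof listing, here is a minimal Python sketch. It assumes the lsof NAME column has the "local->remote" form shown above; the function name multicast_senders is made up for illustration. It only reports which local address each gmond UDP socket uses when it sends to a multicast group and says nothing about whether that address is the right one.

```python
import ipaddress


def multicast_senders(lsof_text):
    """Return (local_ip, group_ip) pairs for UDP sockets connected to a multicast group.

    Assumes lsof lines where the token after "UDP" looks like
    "local_ip:port->remote_ip:port" (IPv4 only).
    """
    pairs = []
    for line in lsof_text.splitlines():
        fields = line.split()
        if "UDP" not in fields:
            continue
        idx = fields.index("UDP")
        if idx + 1 >= len(fields):
            continue
        name = fields[idx + 1]
        if "->" not in name:
            # A bound-only socket such as "239.2.11.71:8649" has no peer.
            continue
        local, remote = name.split("->", 1)
        local_ip = local.rsplit(":", 1)[0]
        remote_ip = remote.rsplit(":", 1)[0]
        try:
            if ipaddress.ip_address(remote_ip).is_multicast:
                pairs.append((local_ip, remote_ip))
        except ValueError:
            # Hostname or wildcard instead of an IP literal; skip it.
            continue
    return pairs


sample = """\
gmond 1948 nobody 4u IPv4 11948 0t0 UDP 239.2.11.71:8649
gmond 1948 nobody 5u IPv4 11952 0t0 TCP *:8649 (LISTEN)
gmond 1948 nobody 6u IPv4 11955 0t0 UDP 192.168.1.250:52035->239.2.11.71:8649
"""

if __name__ == "__main__":
    for local_ip, group in multicast_senders(sample):
        print(f"gmond sends to {group} from {local_ip}")
```

With the three socket lines above, this reports 192.168.1.250 as the sending address, i.e. the outside-facing interface rather than the 10.1.0.x cluster network.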

Thanks!

Jeff



What does netstat or lsof say about gmond interface binding?

(in haste, sorry for the brevity)

On Sat, Jul 21, 2012 at 10:50 AM, Jeff Layton<layto...@att.net>  wrote:
Good morning,

Apologies for the simple question. I've got a simple cluster
with a master node and one compute node. I installed the
latest Ganglia on the master node (3.4.0) - libganglia,
ganglia-gmond, ganglia-metad, ganglia-web (3.5.1). I can
use ganglia-web to "see" the master node with no problems.

I'm using Warewulf for the compute node and I installed
libganglia and ganglia-gmond in the VNFS and rebooted the
node. When the node comes back up, I tested ganglia via
"gstat -all" on the compute node and it seems to work
correctly.

However, ganglia-web doesn't display anything for the compute
node even though I've added it to the data_source line in
gmetad.conf:


data_source "my cluster" 10.1.0.250 10.1.0.1:8649
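
As a rough illustration of how that line breaks down, here is a small Python sketch. It assumes the usual gmetad.conf layout of a quoted cluster name, an optional polling interval in seconds, then one or more host[:port] entries with 8649 as the default port; parse_data_source is a made-up helper, not part of Ganglia, and it does not handle IPv6 literals.

```python
import shlex

DEFAULT_GMOND_PORT = 8649  # gmond's default port, used when none is given


def parse_data_source(line):
    """Split a gmetad.conf data_source line into (name, interval, hosts)."""
    tokens = shlex.split(line)
    if len(tokens) < 3 or tokens[0] != "data_source":
        raise ValueError("not a data_source line: %r" % line)
    name = tokens[1]
    rest = tokens[2:]
    interval = None
    # An all-digit token right after the name is the optional polling interval.
    if rest[0].isdigit():
        interval = int(rest[0])
        rest = rest[1:]
    hosts = []
    for item in rest:
        host, sep, port = item.partition(":")
        hosts.append((host, int(port) if sep else DEFAULT_GMOND_PORT))
    return name, interval, hosts


if __name__ == "__main__":
    print(parse_data_source('data_source "my cluster" 10.1.0.250 10.1.0.1:8649'))
```

Both entries end up on port 8649 and belong to the same cluster name, so the line names two gmond daemons that gmetad can poll for "my cluster".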


I also checked if the master node could access the data from
the compute node via "gstat -all" and I only get data from the
master node (i.e. no compute node).

I checked the Ethernet interfaces on both nodes and both
are listed as MULTICAST. iptables on the master node and the
compute node is off and the service is not running (checked
that 3 times).

There is a simple Netgear GigE switch between the nodes
(unmanaged). I don't think that's a problem.

One thing I think is interesting is that the master node has
eth0 with an IP of 192.168.1.250 which is to the outside world
and eth1 is 10.1.0.250 which is the cluster network. The compute
node has eth0 as 10.1.0.1. But when I go to http://localhost/ganglia
I can only access the master node as 192.168.1.250, not
10.1.0.250 (i.e. the list of nodes is only 192.168.1.250).
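
On a dual-homed box like this, one quick way to see which local address the kernel would choose for traffic to a given destination is the UDP "connect" trick sketched below in Python. Connecting a UDP socket sends no packets; it only asks the routing table for a source address. The function name source_ip_for is made up, and the result reflects the OS default only, so a daemon that explicitly selects an interface may behave differently.

```python
import socket


def source_ip_for(dest_ip, port=8649):
    """Return the local IPv4 address the kernel would use to reach dest_ip.

    Uses connect() on a UDP socket, which consults the routing table
    without transmitting anything.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.connect((dest_ip, port))
        return s.getsockname()[0]


if __name__ == "__main__":
    # 239.2.11.71 is the multicast group seen in the lsof output;
    # 10.1.0.1 is the compute node.
    for dest in ("239.2.11.71", "10.1.0.1"):
        try:
            print(dest, "->", source_ip_for(dest))
        except OSError as exc:
            print(dest, "-> no route:", exc)
```

If the multicast group resolves to 192.168.1.250 while the compute node resolves to 10.1.0.250, multicast is leaving through eth0 instead of the cluster network.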

Otherwise I can log in to the compute node, ping it, etc. It works
fine, but somehow I'm missing a configuration piece for ganglia.

TIA!

Jeff



------------------------------------------------------------------------------
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and
threat landscape has changed and how IT managers can respond. Discussions
will include endpoint security, mobile security and the latest in malware
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
_______________________________________________
Ganglia-developers mailing list
ganglia-develop...@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-developers




_______________________________________________
Ganglia-general mailing list
Ganglia-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-general
