[Ganglia-general] Error 1 sending the modular data

2012-08-13 Thread Chris Burroughs
So for background, my original problem is that load_one will not be
updated by gmetad for a period of over 600 seconds (an arbitrary timeout
signifying that gmond/the host is probably down).  It occurs a few
times/day across hundreds of hosts, and often occurs near midnight
localtime. This *appears* to correlate with messages along the lines of
the following (I didn't see anything else suspicious in syslog):

Aug 12 23:53:26 adq82 /usr/sbin/gmond[28637]: Error 1 sending the modular data for entropy_avail#012
Aug 12 23:59:00 adq82 /usr/sbin/gmond[28637]: Error 1 sending the modular data for mem_cached#012
Aug 12 23:59:10 adq82 /usr/sbin/gmond[28637]: Error 1 sending the modular data for diskstat_sda_write_bytes_per_sec#012


Since it occurs infrequently, running in debug mode on every server is
not a good option.  But false positives that keep people from sleeping
are bad. First of all, does a correlation between these messages and all
metrics not reporting for a period of time make sense?  If not, what
should I be looking at?
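
A minimal sketch of how the correlation could be checked offline,
assuming the syslog lines have the unwrapped single-line form of the
messages quoted above. The names parse_send_errors and
errors_per_host_minute are hypothetical helpers, not part of Ganglia,
and the year has to be supplied because syslog timestamps omit it. Note
that %b parsing depends on the locale of the machine running the script.

```python
import re
from collections import Counter
from datetime import datetime

# Matches one unwrapped syslog line of the form quoted above, e.g.
#   Aug 12 23:53:26 adq82 /usr/sbin/gmond[28637]: Error 1 sending the
#   modular data for entropy_avail#012
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+"
    r"\S*gmond\[\d+\]:\s+Error\s+(?P<code>\d+)\s+"
    r"sending the modular data for\s+(?P<metric>[^#\s]+)"
)


def parse_send_errors(lines, year):
    """Return (timestamp, host, error_code, metric) for each matching line."""
    events = []
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # not a gmond send error, ignore
        ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
        events.append((ts, m["host"], int(m["code"]), m["metric"]))
    return events


def errors_per_host_minute(events):
    """Count errors per (host, minute) so bursts can be lined up with gaps."""
    return Counter(
        (host, ts.replace(second=0)) for ts, host, _code, _metric in events
    )
```

The per-minute counts can then be compared by hand against the times at
which load_one stopped updating for the same host.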

Second, if this is anything other than a red herring, I'm totally
confused about how to debug it. Even if debug were enabled, the debug
message [1] doesn't seem to include any additional information.  Also, 1
seems like it could be two different errors [2] [3].

System information:
 - gmond 3.4.0
 - centos6
 - using send channels

[1] https://github.com/ganglia/monitor-core/blob/release/3.4/gmond/gmond.c#L2735
[2] https://github.com/ganglia/monitor-core/blob/release/3.4/lib/libgmond.c#L575
[3] https://github.com/ganglia/monitor-core/blob/release/3.4/lib/libgmond.c#L517


--
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and 
threat landscape has changed and how IT managers can respond. Discussions 
will include endpoint security, mobile security and the latest in malware 
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
___
Ganglia-general mailing list
Ganglia-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-general


[Ganglia-general] gmetad xml generation time

2012-08-13 Thread Chris Burroughs
I have a process that periodically polls gmetad (builds models of some
metrics, alerts if things don't look right).  To reduce the number of
variables, I set up a dedicated gmetad on the same host as the poller and
set write_rrds off.  Unless I'm missing something, the only thing it
should be doing is polling gmond and responding to my polls.

Polling localhost currently has a mean around 2000 ms, with a stddev
around 30.  I've seen higher outliers, but right now I'm just trying to
figure out if that's normal.  2000 ms to send a request over the
loopback interface *seems* like a lot.  But I don't really have
anything to compare it to.  Is that normal?
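
A minimal sketch of one way to take this measurement, assuming gmetad
listens on its default xml_port of 8651 and writes the whole XML dump
before closing the connection. time_xml_dump is a hypothetical helper
name. It records time to first byte separately from total time, which
may help tell apart time spent before gmetad starts answering from time
spent transferring the dump itself.

```python
import socket
import time


def time_xml_dump(host="127.0.0.1", port=8651, timeout=30.0):
    """Connect, read until the peer closes, and return timing details.

    The default port is an assumption (gmetad's usual xml_port); pass the
    value from your own gmetad.conf if it differs.
    """
    start = time.monotonic()
    first_byte = None
    total = 0
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            chunk = sock.recv(65536)
            if not chunk:  # peer closed the connection: dump is complete
                break
            if first_byte is None:
                first_byte = time.monotonic() - start
            total += len(chunk)
    elapsed = time.monotonic() - start
    return {"first_byte_s": first_byte, "total_s": elapsed, "bytes": total}


if __name__ == "__main__":
    print(time_xml_dump())
```

Running it several times in a row gives a rough distribution to compare
against the poller's own numbers.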


Info:
 - ganglia 3.4.0
 - centos5 [1]
 - xml size: 25 MiB
 - hosts:  300
 - metrics:  9k
 - unique host-metric pairs:  80k

[1] Same setup in another DC with centos6, so I don't think it's that.



[Ganglia-general] CPU Steal not logging yet enabled

2012-08-13 Thread Zuhaib Siddique
Hello,

I have enabled CPU Steal monitoring on my AWS EC2 servers by adding the
following to my gmond.conf:

  metric {
    name = cpu_steal
    value_threshold = 1.0
    title = CPU steal
  }

but I am not getting any CPU steal information.  Any idea what I am
missing? Version is:

  Ganglia Web Frontend version 3.4.2
  Ganglia Web Backend (gmetad) version 3.3.7
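
A minimal sketch for checking, independently of Ganglia, whether the
kernel on the instance reports any steal time at all. It assumes the
Linux /proc/stat layout, where the aggregate "cpu" line lists user,
nice, system, idle, iowait, irq, softirq and steal in that order.
steal_percent is a hypothetical helper name, not a Ganglia function.

```python
def steal_percent(before, after):
    """Steal share between two aggregate 'cpu' lines from /proc/stat.

    Returns a percentage, or None if the kernel exposes no steal column.
    """
    def fields(line):
        parts = line.split()
        if not parts or parts[0] != "cpu":
            raise ValueError("expected the aggregate 'cpu' line")
        return [int(x) for x in parts[1:]]

    a, b = fields(before), fields(after)
    if len(a) < 8 or len(b) < 8:
        return None  # no steal column on this kernel
    # Only the first eight columns are summed: guest time (columns 9
    # and 10, where present) is already counted inside user and nice.
    total = sum(b[:8]) - sum(a[:8])
    if total <= 0:
        return 0.0
    return 100.0 * (b[7] - a[7]) / total


if __name__ == "__main__":
    import time

    def read_cpu_line():
        with open("/proc/stat") as f:
            return f.readline()

    first = read_cpu_line()
    time.sleep(5)
    print(steal_percent(first, read_cpu_line()))
```

If this prints 0.0 over a long enough window, there may simply be no
steal for gmond to report on that host.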

Thanks
Zuhaib