[Ganglia-developers] send_metadata_interval

2011-01-07 Thread Bernard Li
Hi all:

Ganglia 3.1 introduced the new configuration option
send_metadata_interval in gmond.conf.  It is set to 0 by default, and
users running unicast must set it to a sane number; otherwise, if
gmonds are restarted, hosts may appear to be offline (this is
documented in the release notes).  A bug has already been filed:

http://bugzilla.ganglia.info/cgi-bin/bugzilla/show_bug.cgi?id=242
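
To illustrate the symptom, here is a toy model in Python. It is not
Ganglia code: the Receiver class, the simulate() helper and the rule
that a receiver drops values for metrics it has no metadata for are
all assumptions made for this sketch, as is reading an interval of 0
as "send metadata once at startup only".

```python
# Toy model, NOT Ganglia code. Assumption: a receiver only accepts a
# metric value if it has already seen metadata for that metric.
class Receiver:
    def __init__(self):
        self.known = set()   # metric names for which metadata was seen
        self.values = {}     # last accepted value per metric

    def on_metadata(self, name):
        self.known.add(name)

    def on_value(self, name, value):
        if name in self.known:
            self.values[name] = value
            return True
        return False         # dropped: no metadata for this metric


def simulate(send_metadata_interval, restart_at, duration):
    """Seconds after a receiver restart until a value is accepted again.

    Returns None if no value is accepted within `duration` seconds.
    Assumption: an interval of 0 means metadata is sent once at t=0 only.
    """
    rx = Receiver()
    for t in range(duration):
        if t == restart_at:
            rx = Receiver()  # a restart loses all receiver state
        periodic = (send_metadata_interval > 0
                    and t % send_metadata_interval == 0)
        if t == 0 or periodic:
            rx.on_metadata("load_one")
        accepted = rx.on_value("load_one", 0.5)
        if t >= restart_at and accepted:
            return t - restart_at
    return None


if __name__ == "__main__":
    print(simulate(0, restart_at=45, duration=300))
    print(simulate(30, restart_at=45, duration=300))
```

In this model the sender with interval 0 never recovers after the
receiver restarts, while the sender with interval 30 recovers at the
next metadata send.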

Recently a lot of users have been hitting this issue, and Vladimir
recommends that we just set a sane number as the default and be done
with it, since we end up spending a lot of time on IRC and the mailing
list solving the same problem over and over again.

Since there have been some commits to the 3.1 branch since tagging
3.1.7, I propose we just copy the 3.1.7 tag, update
send_metadata_interval in the configuration file and release that as
3.1.8.

This is not the normal procedure for making a release, so I'd like to
get some feedback from other developers.

BTW I am thinking of setting send_metadata_interval to 30 seconds.
Also, does anybody know if this setting affects multicast setups in
any way?

Thanks,

Bernard

___
Ganglia-developers mailing list
Ganglia-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-developers


Re: [Ganglia-developers] send_metadata_interval

2011-01-07 Thread Jesse Becker
On Fri, Jan 7, 2011 at 15:25, Bernard Li bern...@vanhpc.org wrote:
> Hi all:
>
> Since the release of Ganglia 3.1, we have introduced the new
> configuration option send_metadata_interval in gmond.conf.  This is
> set to 0 by default and the user must set this to a sane number if
> using unicast otherwise if gmonds are restarted, hosts may appear to
> be offline (this is documented in the release notes).  A bug has
> already been filed:
>
> http://bugzilla.ganglia.info/cgi-bin/bugzilla/show_bug.cgi?id=242
>
> We recently have a lot of users having this issue and Vladimir
> recommend that we just set a sane number as the default and be done
> with it, since we end up spending a lot of time on IRC/mailing-list to
> solve the same problem over and over again.
>
> Since there have been some commits to the 3.1 branch since tagging
> 3.1.7, I propose we just copy 3.1.7 tag, update the send_meta_data
> interval in the configuration file and release that as 3.1.8.
>
> This is not the normal procedure for making a release, so I'd like to
> get some feedback from other developers.
>
> BTW I am thinking of setting send_metadata_interval to 30 seconds.
> Also, does anybody know if this setting affects multicast setups in
> any way?

I think that it's fine to set this to a non-zero value, but I wonder
if every 30 seconds is too frequent.  I did a quick check of the
actual packets that are sent, specifically the metadata packets.
I haven't been able to really delve into the code to figure out exactly
what's going on (this part of the code isn't terribly transparent to
me), but I *think* that they are really large: on the order of several
KB when fully assembled, as compared to less than 100-120 bytes for a
typical metric packet.  I think that size will increase with the
number of metrics stored, since each one must be described in full XML
each time.

The reason for the large size is that an entire XML description of the
metrics appears to be sent each time.  Metadata packets also appear to
go over TCP, not UDP.

My testing was pretty simple:
1) set up a gmond (from SVN, well after 3.1 came out) in unicast mode.
2) set 'send_metadata_interval' to 1
3) disable all modules, except for 'mod_core'
4) remove all collection groups.
5) start gmond, and run tcpdump.

On a large cluster, with lots of metrics per host, I can see problems
if the metadata packets are sent too frequently.  I have hosts that
send well over 300 metrics (lots of CPU cores makes for lots of
metrics...).  Each of these needs to be described in the metadata
packets.
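
As a rough back-of-the-envelope sketch of that concern, the Python
snippet below estimates the average metadata traffic arriving at a
single collector for a few candidate intervals.  The host count
(1000) and the per-host metadata size (4 KB, standing in for "several
KB") are hypothetical numbers chosen for illustration, not
measurements, and metadata_bandwidth() is a helper made up for this
sketch.

```python
def metadata_bandwidth(hosts, bytes_per_host, interval_s):
    """Average metadata bytes per second arriving at one collector.

    Assumption: every host resends its full metadata once per interval,
    and an interval of 0 means metadata is not resent periodically.
    """
    if interval_s <= 0:
        return 0.0
    return hosts * bytes_per_host / interval_s


HOSTS = 1000               # hypothetical cluster size
BYTES_PER_HOST = 4 * 1024  # hypothetical stand-in for "several KB"

if __name__ == "__main__":
    for interval in (30, 300, 600):
        rate = metadata_bandwidth(HOSTS, BYTES_PER_HOST, interval)
        print(f"{interval:>4}s interval: {rate / 1024:.1f} KiB/s")
```

Under these assumptions the traffic scales inversely with the
interval, so going from 30 to 300 seconds cuts the average metadata
load by a factor of ten.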

So I think that setting a non-zero default is fine.  But I think that
something like 300 or 600 seconds would be preferable.


-- 
Jesse Becker
