Brad,

Thanks for the feedback. My comments are in-line.

On Oct 7, 2010, at 12:27 PM, Brad Nicholes wrote:
 
> Sorry to jump into this thread so late but I thought that I would throw my 2 
> cents in.  
> 
> I finally got a chance to take a look at the code.  I was able to compile it 
> but ran into some C99 issues with variable declarations.  Once I got the code 
> to compile, I was able to take a closer look at what it was doing.  From what 
> I could tell, it looks like the sflow integration is based around reading XDR 
> packets from an sflow agent and turning them into gmond spoofing metrics.  My 
> first question after seeing this is why does this code have to be built into 
> gmond.c?  Why can't it just do the same thing in a module that would be 
> plugged into gmond?  
> 
> The reason why I ask this is because we went to a lot of work to pull all of 
> the metric gathering out of gmond and into modules (including all of the 
> standard metrics).  One of the main reasons for this is so that metric 
> gathering could be pluggable without having to affect the gmond code itself.  
> That way if a bug was ever found and fixed for a specific metric, we 
> wouldn't have to re-release all of Ganglia just for one metric fix.  Also, 
> modules give the user the ability to customize each gmond agent to conform to 
> the specific needs of the node where gmond is running.  Regarding sflow, it 
> seems that in order to integrate the sflow metrics into the Ganglia 
> monitoring system, only a single gmond node needs to be configured to gather 
> the sflow metrics.  All of the other gmond agents can continue to be 
> configured and run as they were.  Given that, it would make more sense to 
> integrate sflow as a module that could be loaded under a single gmond agent 
> rather than replacing all the gmond agents or even upgrading just a single 
> agent.  It would also seem to follow the way that other metric modules and 
> spoofing modules have been implemented as well.


I am not very familiar with gmond modules, but it looks like they are designed 
around a polling model and are used to retrieve metrics from the server that 
the particular instance of gmond is running on:
1. a module is loaded in the modules section of the gmond.conf file and 
registers a set of metrics it can provide
2. metrics are then included in collection_group sections and polled at the 
specified intervals

With sFlow, the counters are being pushed by remote servers. There may be 
hundreds of sFlow agents sending XDR packets to the single gmond instance. Our 
code acts as a gateway, translating the metrics from the remote hosts and 
presenting them as if they had arrived in the form of Ganglia XDR datagrams 
from remote gmond instances. This function needs to be part of the main 
datagram processing loop, and I don't see a way for a module to inject code 
into that loop. Is there one?
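
To make the push model concrete, here is a rough Python sketch of the dispatch I have in mind. It is not the gmond.c code: the function names, the "kind" argument, and the dict-based payloads are all hypothetical stand-ins for decoded datagrams.

```python
# Hypothetical sketch only: none of these names come from gmond.c.

def translate_sflow_sample(agent_ip, counters):
    """Turn one sFlow agent's counters into gmond-style spoofed metrics."""
    return [
        {"spoof_host": agent_ip, "name": name, "value": value}
        for name, value in sorted(counters.items())
    ]


def process_datagram(kind, source_ip, payload, store):
    """One step of the main receive loop: dispatch on the datagram type."""
    if kind == "ganglia":
        # Native Ganglia datagram: the payload already describes one metric.
        store.setdefault(source_ip, {})[payload["name"]] = payload["value"]
    elif kind == "sflow":
        # Pushed sFlow counters: translate them, then store them as if a
        # remote gmond instance had sent them.
        for metric in translate_sflow_sample(source_ip, payload):
            host = store.setdefault(metric["spoof_host"], {})
            host[metric["name"]] = metric["value"]
    return store
```

Nothing here is polled: every metric is stored in response to an arriving datagram, which is why the translation sits in the receive loop rather than in a collection_group.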

We do of course plan to limit the changes to gmond.c by moving most of the code 
to a separate sflow.c file, leaving just the minimal changes in gmond.c.  We 
also plan to address the issues with the C99 variable declarations.  Thanks for 
pointing that out.

> Implementing the sflow integration as a module would also allow it to change 
> whenever a newer version of sflow is released or whenever the sflow spec or 
> transport changes.  A user could simply upgrade his ganglia sflow module and 
> be up to date with the latest spec without having to wait for the Ganglia 
> project to re-release ganglia.

The sFlow version 5 specification hasn't changed since July 2004. The sFlow 
version 5 protocol sends TLV data containing XDR encoded structures, making it 
extensible. However, once an sFlow structure is published by sFlow.org, it is 
immutable. The structures we are decoding are part of the recent sFlow Host 
Structures specification and will not change:
http://www.sflow.org/sflow_host.txt
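
As a simplified illustration of why TLV framing makes this safe, here is a Python sketch of a record walker. It assumes each record is a 32-bit tag, a 32-bit byte length, and then the value, all big-endian as in XDR. The function name and the tag numbers in the usage note are placeholders, not values from the specification, and this is not the gmond decoder itself.

```python
import struct


def walk_records(buf, handlers):
    """Walk (tag, length, value) records; decode known tags, skip the rest."""
    decoded = []
    offset = 0
    while offset + 8 <= len(buf):
        # Assumed header layout: two big-endian unsigned 32-bit integers.
        tag, length = struct.unpack_from(">II", buf, offset)
        offset += 8
        if offset + length > len(buf):
            break  # truncated record: stop rather than read past the end
        handler = handlers.get(tag)
        if handler is not None:
            decoded.append(handler(buf[offset:offset + length]))
        # Known or unknown, the length says where the next record starts.
        offset += length
    return decoded
```

For example, with a handler registered only for placeholder tag 1, a buffer holding a record with tag 99 followed by a record with tag 1 yields just the decoded tag 1 value. The tag 99 record is stepped over using its length alone.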

The gmond sFlow decoder skips over structures it is not interested in, so it 
won't be affected by future sFlow extensions. Converting additional sFlow 
structures into Ganglia metrics would take some extra coding (decoding the 
structure, calculating counter deltas, defining metadata, etc.), but it is 
relatively straightforward. The current code lays down a framework for 
supporting additional sFlow metrics in the future.
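
The counter delta step is the only part with a subtlety, so here is a small Python sketch of it. It assumes the counters are unsigned and wrap at a fixed width (32 bits by default). The function names are mine, not from the patch.

```python
def counter_delta(previous, current, bits=32):
    """Difference between two samples of a wrapping unsigned counter."""
    # Modular arithmetic gives the right answer across a single wrap.
    return (current - previous) % (1 << bits)


def rate_per_second(previous, current, seconds, bits=32):
    """Convert two counter samples into a per-second rate."""
    if seconds <= 0:
        raise ValueError("samples must be taken at increasing times")
    return counter_delta(previous, current, bits) / seconds
```

Note that a counter reset (for example, an agent restart) looks the same as a wrap to this arithmetic, so a real decoder would need some way to detect and discard that case.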

> 
> Anyway, the more that I am learning about sflow and what it does especially 
> in relation to Ganglia and what it does, this all seems like a really cool 
> idea.  I am looking forward to seeing this integration done especially if it 
> is through a pluggable module.

We were very excited to see how easily the data propagated into the Ganglia UI. 
The sFlow standard leverages the work that the Ganglia community has done to 
define a core set of portable metrics, making sFlow and Ganglia a good fit.

Peter
_______________________________________________
Ganglia-developers mailing list
Ganglia-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-developers