Here is the patch I was referring to.  It allows you to put something like this 
in your gmond.conf:

sflow {
  null_int = 0
  null_float = 0.0
}

and then if a field like cpu_nice is missing (as in the Windows hsflowd) we'll 
submit 0.0 instead of leaving it out.  This is a work-around for the problem 
where the RRD does not even appear when cpu_nice is missing.
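
To make the substitution concrete, here is a rough sketch of the idea in C.  It 
is not the code from the patch: the struct and function names are hypothetical, 
and it assumes that a missing field arrives as all-ones for integers (as in the 
SFLOW_OK_COUNTER64 macro in sflow.c) and as -1 for floats.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical in-memory mirror of the proposed sflow { } options. */
typedef struct {
    bool     null_int_set;    /* true if null_int appeared in gmond.conf */
    uint32_t null_int;        /* value to submit for a missing integer field */
    bool     null_float_set;  /* true if null_float appeared in gmond.conf */
    float    null_float;      /* value to submit for a missing float field */
} sflow_null_conf_t;

/* Assumption: the agent marks a missing field with these sentinel values. */
#define FIELD_OK_UINT32(v) ((v) != (uint32_t)-1)
#define FIELD_OK_FLOAT(v)  ((v) != (float)-1)

/* Returns true if a value should be submitted, and stores it in *out.
 * Returns false if the field is missing and no null value is configured,
 * in which case the metric is left out (the old behaviour). */
static bool resolve_float(const sflow_null_conf_t *conf, float field, float *out)
{
    if (FIELD_OK_FLOAT(field)) {
        *out = field;
        return true;
    }
    if (conf->null_float_set) {
        *out = conf->null_float;
        return true;
    }
    return false;
}

/* Same logic for 32-bit integer fields. */
static bool resolve_uint32(const sflow_null_conf_t *conf, uint32_t field, uint32_t *out)
{
    if (FIELD_OK_UINT32(field)) {
        *out = field;
        return true;
    }
    if (conf->null_int_set) {
        *out = conf->null_int;
        return true;
    }
    return false;
}
```

With null_float configured, a missing cpu_nice resolves to 0.0 and is submitted, 
so the RRD gets created; without it, the metric is skipped as before.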

You can also add another setting "accept_all_physical = yes",  like this:

sflow {
  null_int = 0
  null_float = 0.0
  accept_all_physical = yes
}

and now the extra metrics that are defined in host-sflow but not in libmetrics 
are accepted too.  These include some useful ones like the number of context 
switches,  the number of pages swapped in/out, network errors and drops, more 
info on disk reads and writes, and so on.  The UI seems to do a good job of 
just adding these RRDs to the page (so perhaps it would even be safe to make 
"yes" the default here?)

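The effect of accept_all_physical can be pictured as a simple filter over the 
metric table.  The sketch below is only an illustration, not the patch itself: 
the table, its entries and the function name are hypothetical, and the two 
"extra" metric names are made up for the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical metric table entry. */
typedef struct {
    const char *name;
    bool        in_libmetrics;  /* true if libmetrics also defines this metric */
} metric_def_t;

/* Illustrative entries only; the real list lives in the sFlow code. */
static const metric_def_t metric_defs[] = {
    { "cpu_nice", true  },
    { "load_one", true  },
    { "contexts", false },  /* hypothetical name for context switches */
    { "swap_in",  false },  /* hypothetical name for pages swapped in */
};

/* A metric is accepted if libmetrics knows it, or if the
 * accept_all_physical option is switched on.  Unknown names are rejected. */
static bool metric_accepted(const char *name, bool accept_all_physical)
{
    size_t n = sizeof(metric_defs) / sizeof(metric_defs[0]);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(metric_defs[i].name, name) == 0)
            return metric_defs[i].in_libmetrics || accept_all_physical;
    }
    return false;
}
```
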
I'm still skipping over the VM fields,  and don't have the option to ignore the 
sFlow hostname field yet,  but placeholder boolean options "accept_all_virtual" 
and "accept_hostname" are defined.  There is also "udp_port" in case you want 
to designate a non-standard port as the sFlow port (though it still has to 
appear in a udp_recv_channel section elsewhere).
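
The constraint on the port could be checked along these lines.  This is a 
sketch under assumptions, not code from the patch: the function names are 
hypothetical, and it assumes gmond recognizes an sFlow datagram purely by the 
port of the receive channel it arrived on.  6343 is the registered sFlow port.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SFLOW_DEFAULT_PORT 6343  /* IANA-registered sFlow port */

/* True if the configured sFlow port is one of the ports gmond listens on.
 * recv_ports holds the ports of all configured receive channels. */
static bool sflow_port_is_received(uint16_t sflow_port,
                                   const uint16_t *recv_ports, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (recv_ports[i] == sflow_port)
            return true;
    }
    return false;
}

/* Assumed dispatch rule: a datagram is treated as sFlow only when it
 * arrives on the channel whose port matches the sFlow port. */
static bool datagram_is_sflow(uint16_t arrival_port, uint16_t sflow_port)
{
    return arrival_port == sflow_port;
}
```

If udp_port were set to a port with no matching receive channel, the first 
check would fail and no sFlow datagrams would ever reach the decoder.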

I didn't edit gmond/conf.pod yet.  I figured that could happen once there is 
consensus on these options.

Thoughts?

Regards,
Neil

Attachment: sflow_20110211.patch
Description: Binary data



On Feb 11, 2011, at 2:26 PM, Neil McKee wrote:

> It might be helpful to have commit privileges for minor bugfixes,  yes.   
> Although I wouldn't use it for bigger changes that require consensus.
> 
> I'm working on a bigger patch that will add an sflow { } configuration block 
> so we can have options to control the following:
> 
> - the sflow udp port
> - whether to submit 0s rather than leave undefined metrics out altogether (as 
> a workaround for that issue with the CPU/mem charts for Windows)
> - whether to ignore the hostname that comes with host-sflow so that 
> gmond will do its own reverse lookup (to get the FQDN instead)
> - whether to submit the extra metrics that host-sflow sends for physical 
> servers
> - whether to submit the extra metrics that host-sflow sends for virtual 
> servers
> 
> Should have it ready for review later this afternoon.
> 
> Once the sFlow HTTP and memcached counter-blocks are finalized then we could 
> quickly add support for those metrics too.  More input on those would be very 
> welcome:
> http://blog.sflow.com/search/label/HTTP
> http://blog.sflow.com/search/label/Memcache
> 
> Regards,
> Neil
> 
> 
> 
> 
> 
> On Feb 11, 2011, at 12:56 PM, Bernard Li wrote:
> 
>> Hi Neil:
>> 
>> Argh, Gmail always messes up inline patches...  anyways, I've fixed it
>> manually and checked it in:
>> 
>> https://sourceforge.net/apps/trac/ganglia/changeset/2473
>> 
>> Do you want commit rights to our SVN repo so that you could fix this
>> yourself in the future? :)
>> 
>> Thanks,
>> 
>> Bernard
>> 
>> On Thu, Feb 10, 2011 at 9:34 PM, Neil McKee <neil.mc...@inmon.com> wrote:
>>> I didn't realize that "gmond_started" was meant to go with every 
>>> heartbeat-message.  Below is a patch.
>>> 
>>> Neil
>>> 
>>> [root@ganglia gmond]# svn diff sflow.c
>>> Index: sflow.c
>>> ===================================================================
>>> --- sflow.c     (revision 2471)
>>> +++ sflow.c     (working copy)
>>> @@ -398,7 +398,7 @@
>>> #define SFLOW_OK_COUNTER64(field) (field != (uint64_t)-1)
>>> 
>>>  /* always send a heartbeat */
>>> -  process_sflow_uint32(hostdata, SFLOW_M_heartbeat, 0);
>>> +  process_sflow_uint32(hostdata, SFLOW_M_heartbeat, (apr_time_as_msec(now) - x.uptime_mS) / 1000);
>>> 
>>>  if(offset_HID) {
>>>    /* sumbit the system fields that we already extracted above */
>>> 
>>> 
>>> 
>>> 
>>> On Feb 10, 2011, at 5:38 PM, Bernard Li wrote:
>>> 
>>>> P.S. It looks like GMOND_STARTED=0 for the Windows host -- should I file a 
>>>> bug?
>>>> 
>>>> Thanks,
>>>> 
>>>> Bernard
>>>> 
>>>> On Thu, Feb 10, 2011 at 5:21 PM, Bernard Li <bern...@vanhpc.org> wrote:
>>>>> Hi Neil:
>>>>> 
>>>>> On Thu, Feb 10, 2011 at 5:05 PM, Neil McKee <neil.mc...@inmon.com> wrote:
>>>>> 
>>>>>> That's odd,  the CPU and MEM charts are working OK for me.  It's just 
>>>>>> the load-avg that is missing (which causes the host to be marked "down").
>>>>> 
>>>>> You'll likely need to nuke your old rrd files to see the error.
>>>>> 
>>>>> For Windows, mem_shared, mem_buffers and cpu_nice are missing, and
>>>>> they are needed by the respective mem_ and cpu_ reports.  I guess we
>>>>> could fix the frontend code to only include them if they exist.  Do
>>>>> these metrics exist in Windows-land?
>>>>> 
>>>>>> How frequently would we need to poll the Processor Queue Length to get a 
>>>>>> reasonable load-average estimate?  If we could get away with only doing 
>>>>>> it every few seconds then it might be worthwhile (perhaps a random delay 
>>>>>> would be appropriate) but I suspect you might have to poll much faster 
>>>>>> than that to get something worthwhile?
>>>>> 
>>>>> I will defer to Nick for answering that question :-)
>>>>> 
>>>>> Cheers,
>>>>> 
>>>>> Bernard
>>>>> 
>>> 
>>> 
> 

_______________________________________________
Ganglia-developers mailing list
Ganglia-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-developers
