I checked the sFlow feed, and it looks like the sanity checks for 32-bit
rollover and impossible counter deltas are already present in the hsflowd code
(host-sflow.sourceforge.net, src/Linux/readNioCounters.c), at least for the
Linux and FreeBSD ports. We should add those checks to the Windows port.
It is always better to clean things up at the source if you can.

That makes it less urgent to add the same sanity checks at the receiver end
(monitor-core/gmond/sflow.c). Sanity checks in too many places could cause
headaches down the line (e.g. when we all have 10Tbps links).

I apologize if this is too much information about a feature that is only
available if you compile the Ganglia trunk from source, but for the record:

(1). The 32-bit rollover problem is handled in hsflowd by polling faster 
internally (every 3 seconds).  This accumulates 64-bit versions of the counters 
which are then pushed out at the normal polling frequency (typically 20 
seconds).   If the code detects that the kernel counters are already 64-bit,  
then it turns off the 3-second polling.

(2). The impossible-counter-delta sanity checks in hsflowd depend on whether 
the field is 32-bit or 64-bit.   The upper limit for a 32-bit counter delta is 
0x7FFFFFFF (about 2e9) and for a 64-bit counter it is 1e13.  These checks are 
applied to the frames and bytes counters,  but if either check fails then the 
sequence number is reset for the whole counter-block -- which invalidates all 
the counter-deltas for that polling-interval.  In other words,  if the bytes_in 
counter jumps crazily then we won't believe the frames, errors or drops 
counters either.

Looking at libmetrics/linux/metrics.c, it does seem that compiling with
-DREMOVE_BOGUS_SPIKES will do more or less the same as (2).

Neil




On Mar 30, 2011, at 5:56 PM, Bernard Li wrote:

> Hi all:
> 
> On Tue, Mar 29, 2011 at 11:30 AM, Vladimir Vuksan <vli...@veus.hr> wrote:
> 
>> I see it all the time :-(. According to Bernard this is due to problem
>> with some of the Broadcom cards. Perhaps Bernard can offer more insight.
> 
> Some old threads which describe the issue in more detail:
> 
> http://www.mail-archive.com/ganglia-general@lists.sourceforge.net/msg04463.html
> http://www.mail-archive.com/ganglia-general@lists.sourceforge.net/msg04245.html
> 
> I see two solutions to this problem:
> 
> 1) If this is indeed a driver issue, we should check to see if newer
> kernels can fix that. Perhaps Vladimir could look into this.
> 
> 2) It would probably be a good thing to implement sanity check.  I
> think Neil is looking into implementing this for the sflow
> integration.  Perhaps this could be extended for gmond data as well.
> 
> To help resolve this issue, I would suggest that we:
> 
> 1) File a bug at bugzilla.ganglia.info
> 2) For all those affected, add comments to the bug providing the
> network driver model, module used, kernel version, OS version etc.
> 
> Thanks!
> 
> Bernard
> 
> ------------------------------------------------------------------------------
> Create and publish websites with WebMatrix
> Use the most popular FREE web apps or write code yourself; 
> WebMatrix provides all the features you need to develop and 
> publish your website. http://p.sf.net/sfu/ms-webmatrix-sf
> _______________________________________________
> Ganglia-general mailing list
> Ganglia-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/ganglia-general

