I checked the sFlow feed, and it looks like the sanity checks for 32-bit rollover and impossible-counter-delta are already present in the hsflowd code (host-sflow.sourceforge.net src/Linux/readNioCounters.c), at least for the Linux and FreeBSD ports. We should add those checks to the Windows port. It is always better to clean things up at the source if you can.

That makes it less urgent to add the same sanity checks at the receiver end (monitor-core/gmond/sflow.c). Sanity checks in too many places could cause headaches down the line (e.g. when we all have 10Tbps links).

I apologize if this is too much information about a feature that is only available if you compile the Ganglia trunk from sources, but for the record:

(1) The 32-bit rollover problem is handled in hsflowd by polling faster internally (every 3 seconds). This accumulates 64-bit versions of the counters, which are then pushed out at the normal polling frequency (typically 20 seconds). If the code detects that the kernel counters are already 64-bit, then it turns off the 3-second polling.

(2) The impossible-counter-delta sanity checks in hsflowd depend on whether the field is 32-bit or 64-bit. The upper limit for a 32-bit counter delta is 0x7FFFFFFF (about 2e9) and for a 64-bit counter it is 1e13. These checks are applied to the frames and bytes counters, but if either check fails then the sequence number is reset for the whole counter-block -- which invalidates all the counter-deltas for that polling-interval. In other words, if the bytes_in counter jumps crazily then we won't believe the frames, errors or drops counters either.

Looking at libmetrics/linux/metrics.c, it does seem that compiling with -DREMOVE_BOGUS_SPIKES will do more or less the same as (2).

Neil

On Mar 30, 2011, at 5:56 PM, Bernard Li wrote:

> Hi all:
>
> On Tue, Mar 29, 2011 at 11:30 AM, Vladimir Vuksan <vli...@veus.hr> wrote:
>
>> I see it all the time :-(. According to Bernard this is due to problem
>> with some of the Broadcom cards. Perhaps Bernard can offer more insight.
>
> Some old threads which describe the issue in more detail:
>
> http://www.mail-archive.com/ganglia-general@lists.sourceforge.net/msg04463.html
> http://www.mail-archive.com/ganglia-general@lists.sourceforge.net/msg04245.html
>
> I see two solutions to this problem:
>
> 1) If this is indeed a driver issue, we should check to see if newer
> kernels can fix that. Perhaps Vladimir could look into this.
>
> 2) It would probably be a good thing to implement sanity check. I
> think Neil is looking into implementing this for the sflow
> integration. Perhaps this could be extended for gmond data as well.
>
> To help resolve this issue, I would suggest that we:
>
> 1) File a bug at bugzilla.ganglia.info
> 2) For all those affected, add comments to the bug providing the
> network driver model, module used, kernel version, OS version etc.
>
> Thanks!
>
> Bernard
>
> _______________________________________________
> Ganglia-general mailing list
> Ganglia-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/ganglia-general