Earlier this week I set up a new ganglia 3.6.0 system for our software
development infrastructure servers.  Aside from some minor issues with
the web UI (which I'll discuss in another thread), it has been working
smoothly and I already have actionable data on how the
servers should be reconfigured for better performance.

However, yesterday I added three 32-bit x86 Linux build farm clusters
to the mix, and now network metrics show erratic and erroneous traffic
spikes of Petabytes or even Exabytes per second.

These machines do a lot of I/O, and I theorized that this was related
to the network counters wrapping around.  Looking into the code, I found this
in libmetrics/linux/metrics.c:

    /* receive */
    rbi = strtostat(p, &p ,10);
    if ( rbi >= ns->rbi ) {
       l_bytes_in += rbi - ns->rbi;
    } else {
       debug_msg("update_ifdata(%s) - Overflow in rbi: %"PRI_STAT" -> %"PRI_STAT,
                 caller,ns->rbi,rbi);
       l_bytes_in += STAT_MAX - ns->rbi + rbi;
    }
    ns->rbi = rbi;

The problem is that because HAVE_STRTOULL is defined, stat_t,
STAT_MAX, PRI_STAT, and strtostat() are defined to be or use 64-bit
values, even though the network counters read from /proc/net/dev
on these machines are 32 bits wide.  When a 32-bit counter wraps,
the overflow branch therefore adds an increment close to 2^64 bytes
instead of the true delta.
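
To make the arithmetic concrete, here is a small standalone sketch (not
ganglia code; the function names are made up for illustration, and I am
assuming STAT_MAX equals the 64-bit maximum).  It compares the delta
computed by the overflow branch above with the delta you get when the
wrap point is taken to be 2^32:

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the overflow branch quoted above, assuming STAT_MAX is
   UINT64_MAX (an assumption; check the actual definition). */
static uint64_t delta_assuming_64bit(uint64_t prev, uint64_t cur)
{
    if (cur >= prev)
        return cur - prev;
    return UINT64_MAX - prev + cur;
}

/* Same idea, but treating the counter as wrapping at 2^32.
   The "+ 1" accounts for the step from UINT32_MAX to 0. */
static uint64_t delta_assuming_32bit(uint64_t prev, uint64_t cur)
{
    /* Both samples must fit in 32 bits for this to make sense. */
    assert(prev <= UINT32_MAX && cur <= UINT32_MAX);
    if (cur >= prev)
        return cur - prev;
    return (uint64_t)UINT32_MAX - prev + cur + 1;
}
```

With prev = 4294967000 and cur = 1000 (a counter that wrapped just after
the previous sample), the 32-bit version gives 1296 bytes, while the
64-bit version gives 18446744069414585615.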

What's the best way to fix this so that the fix can be integrated into ganglia?

It appears that these types/#defines are only used for network
stats, so we could simply change the conditional used to select
ullong vs. ulong from HAVE_STRTOULL to something else.  In fact,
from the git history, it appears that these stats were all
32 bits wide until a change made about a year ago.
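
A rough sketch of what a different conditional might look like follows.
Everything named here is hypothetical: NET_COUNTERS_64BIT, net_stat_t,
NET_STAT_MAX, strtonetstat() and net_stat_delta() do not exist in
ganglia, and the sketch assumes that when the macro is not defined the
counters have the same width as unsigned long on the host:

```c
#include <limits.h>
#include <stdlib.h>

/* NET_COUNTERS_64BIT is a hypothetical configure-time macro, set only
   when the interface counters are known to be 64 bits wide. */
#ifdef NET_COUNTERS_64BIT
typedef unsigned long long net_stat_t;
#define NET_STAT_MAX ULLONG_MAX
#define strtonetstat(nptr, endptr, base) strtoull(nptr, endptr, base)
#else
/* Assumption: counters are as wide as unsigned long on this host. */
typedef unsigned long net_stat_t;
#define NET_STAT_MAX ULONG_MAX
#define strtonetstat(nptr, endptr, base) strtoul(nptr, endptr, base)
#endif

/* If the type is exactly as wide as the counter, unsigned subtraction
   is modulo NET_STAT_MAX + 1, so a single wrap needs no special case. */
static net_stat_t net_stat_delta(net_stat_t prev, net_stat_t cur)
{
    return cur - prev;
}
```

The point of the sketch is only that the type should follow the counter
width rather than the availability of strtoull(); how that width would
best be detected is an open question.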

    --jtc

-- 
J.T. Conklin

_______________________________________________
Ganglia-developers mailing list
Ganglia-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-developers