I just installed Linux kernel 2.6.0 and noticed that the network statistics
given by gmond 2.5.5 were incorrect. A few of my servers send far more
traffic than they receive, and ganglia is really useful for seeing total
outgoing bandwidth and that kind of thing. But with the /proc/net/dev output
from kernel 2.6.0, the ganglia graphs were suddenly showing small values for
both incoming and outgoing traffic.

Well, it turns out that there are four places in the machine file linux.c
that parse proc_net_dev, and each of them assumes that the loopback is the
first interface listed. That seems to be correct for 2.4.x, but when I boot
with 2.6.0, proc_net_dev shows ``eth0'' and ``eth1'' first and then ``lo''
last.

I made some tweaks so that gmond only skips that first interface line if it
really contains the interface label 'lo:'.
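
To make the idea concrete, here is a minimal sketch of order-independent
parsing. It is not the code from linux.c or from my patch: the helper names
is_loopback_line and sum_bytes_in are made up for illustration, and it
assumes the usual /proc/net/dev layout of two header lines followed by one
"name: counters..." line per interface.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from linux.c): returns 1 if this
 * /proc/net/dev line describes the loopback interface. */
static int is_loopback_line(const char *line)
{
    /* Interface names are right-aligned, so skip leading blanks. */
    while (*line == ' ' || *line == '\t')
        line++;
    return strncmp(line, "lo:", 3) == 0;
}

/* Hypothetical helper: sum the received-bytes column (the first number
 * after the colon) over every non-loopback interface, wherever "lo"
 * appears. `buf` holds the full text of /proc/net/dev. */
static unsigned long long sum_bytes_in(const char *buf)
{
    unsigned long long total = 0;
    int lineno = 0;
    const char *p = buf;

    while (*p) {
        const char *eol = strchr(p, '\n');
        size_t len = eol ? (size_t)(eol - p) : strlen(p);

        lineno++;
        /* Assumption: the first two lines are column headers. */
        if (lineno > 2 && !is_loopback_line(p)) {
            const char *colon = memchr(p, ':', len);
            unsigned long long v;

            /* Works for both "eth0:1234" and "eth0: 1234". */
            if (colon && sscanf(colon + 1, "%llu", &v) == 1)
                total += v;
        }
        p += len;
        if (*p == '\n')
            p++;
    }
    return total;
}
```

Because the loopback line is recognised by its label rather than by its
position, the same function gives the same total whether ``lo'' is listed
first (2.4.x) or last (2.6.0).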

Also, the while loop in each of those proc_net_dev functions already had an
extra check, an if statement meant to detect the characters 'l' and 'o' in
the interface label so that the loopback line can be skipped. But from what
I could tell there was an off-by-one error in the comparison, so the check
never matched. That did no harm on earlier Linux kernels, since the loopback
interface had already been skipped as the first line. I fixed the off-by-one
error and also added more parentheses to make the character comparisons
unambiguous.
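
As an illustration of the kind of bug I mean, here is a small sketch. It is
hypothetical: the function names and the exact indices are my own, and the
real expressions in linux.c may look different. The point is only that
starting the comparison one character too late means the test can never
match a "lo:" label.

```c
/* Hypothetical helper: advance past the blanks that right-align the
 * interface name in /proc/net/dev. */
static const char *skip_blanks(const char *p)
{
    while (*p == ' ')
        p++;
    return p;
}

/* Off by one: the comparison starts at the second character of the
 * label, so for "lo:" it tests 'o' against 'l' and never matches. */
static int is_lo_buggy(const char *line)
{
    const char *p = skip_blanks(line);
    return (p[0] != '\0') && (p[1] == 'l') && (p[2] == 'o');
}

/* Corrected indices, with each comparison parenthesised. Short-circuit
 * evaluation keeps the reads inside the string. */
static int is_lo_fixed(const char *line)
{
    const char *p = skip_blanks(line);
    return (p[0] == 'l') && (p[1] == 'o') && (p[2] == ':');
}
```

With 2.4.x the buggy version went unnoticed because the loopback line was
dropped before the check ever ran; with 2.6.0 the check is the only thing
standing between ``lo'' and the totals.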

Anyhow, it might be worth looking over those functions again and coming up
with a clean patch to include in the next release of ganglia. I've
attached a patch file in case my explanation of what I spotted in the if
statement wasn't clear. I tested the patch against a freshly uncompressed
ganglia-2.5.5.tar.gz file; I just had to explicitly specify 'linux.c' to the
patch command when I passed the diff file through it.

Lester

Attachment: gmond-2.5.5-linux-kernel-2.6.0-proc_net_dev-fix.diff
Description: Binary data
