I just installed Linux kernel 2.6.0 and noticed that the network statistics reported by gmond 2.5.5 were incorrect. A few of my servers send far more traffic out of their interfaces than they receive, and ganglia is really useful for watching things like total outgoing bandwidth. But with the /proc/net/dev output from kernel 2.6.0, the ganglia graphs suddenly showed both incoming and outgoing traffic as small values.
Well, it turns out that there are four places in the machine file linux.c that parse proc_net_dev, and they all assume that the loopback interface is the first one listed. That seems to hold for 2.4.x, but when I boot 2.6.0, proc_net_dev lists 'eth0' and 'eth1' first and 'lo' last. I made some tweaks so that gmond only skips that first line if it really contains the interface characters 'lo:'.
Also, each of those proc_net_dev functions has a while loop with an extra if statement that checks whether the characters 'l' and 'o' form the interface label on the line, so that the loopback line can be skipped. From what I could tell, there was an off-by-one error in that comparison, so the check never worked. That didn't matter for earlier Linux kernels, since the loopback interface was already skipped as the first line. I fixed the off-by-one error and also added more parentheses to make the character comparisons unambiguous to the compiler.
Anyhow, it might be worth looking over those functions again and coming up with a clean patch to include in the next release of ganglia. I've attached a patch file in case my explanation of what I spotted in the if statement wasn't clear. I tested the patch against a freshly unpacked ganglia-2.5.5.tar.gz; I just had to explicitly pass 'linux.c' to the patch command when I fed the diff file through it.
Lester
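For illustration, a position-independent loopback check along the lines described above might look roughly like this. It is a minimal sketch, not the attached patch and not gmond's actual linux.c code: the helper names is_loopback_line() and sum_net_bytes() are hypothetical, and it assumes the usual /proc/net/dev layout (two header lines, then one "name: counters" line per interface, with received bytes as the first counter and transmitted bytes as the ninth). It compares the whole name before the colon, so 'lo' is skipped whether it comes first (2.4.x) or last (2.6.0), and it parses from the colon rather than splitting on whitespace because a large counter can end up directly against the colon (e.g. "eth1:2000").

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if a /proc/net/dev data line belongs to
 * the loopback interface "lo", 0 otherwise. The whole name is compared,
 * so the result does not depend on where "lo" appears in the file. */
static int is_loopback_line(const char *line)
{
    const char *name = line;
    const char *colon;

    while (*name == ' ' || *name == '\t')   /* names are right-justified */
        name++;
    colon = strchr(name, ':');
    if (colon == NULL)
        return 0;                            /* header lines have no ':' */
    return (colon - name == 2) && (strncmp(name, "lo", 2) == 0);
}

/* Hypothetical helper: sums received and transmitted bytes over every
 * non-loopback interface in an in-memory copy of /proc/net/dev.
 * Returns the number of interfaces that were counted. */
static int sum_net_bytes(const char *buf,
                         unsigned long long *rx_total,
                         unsigned long long *tx_total)
{
    char line[512];
    int counted = 0;

    *rx_total = 0;
    *tx_total = 0;
    while (*buf != '\0') {
        const char *eol = strchr(buf, '\n');
        size_t full = eol ? (size_t)(eol - buf) : strlen(buf);
        size_t len = full < sizeof line ? full : sizeof line - 1;
        const char *colon;
        unsigned long long rx, tx;

        /* Copy one line so sscanf cannot run on into the next line. */
        memcpy(line, buf, len);
        line[len] = '\0';
        buf += full;
        if (*buf == '\n')
            buf++;

        colon = strchr(line, ':');
        if (colon == NULL || is_loopback_line(line))
            continue;                        /* header line or "lo" */

        /* Counter 1 after the colon is rx bytes, counter 9 is tx bytes. */
        if (sscanf(colon + 1,
                   "%llu %*llu %*llu %*llu %*llu %*llu %*llu %*llu %llu",
                   &rx, &tx) == 2) {
            *rx_total += rx;
            *tx_total += tx;
            counted++;
        }
    }
    return counted;
}
```

Called on the contents of /proc/net/dev, sum_net_bytes() fills in the byte totals for all non-loopback interfaces. Matching the full interface name, instead of testing single characters at fixed offsets, leaves no room for the kind of off-by-one described above.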
gmond-2.5.5-linux-kernel-2.6.0-proc_net_dev-fix.diff