Jason A. Smith wrote:
> I have a few questions about ganglia development.  I am using ganglia on
> RedHat 7.x i386 here.

> 1.  I tried the new network metrics that are in the latest cvs version
> of the monitoring-core module and was wondering what happens on a dual
> NIC computer, is just the first interface counted or does it total all
> interfaces when calculating bytes/pkts_in/out?  I assume this number is
> the rate between measurement intervals, correct?

I should field this since I wrote it. :)

The network code scans /proc/net/dev and ignores the loopback adapter (at the moment I don't remember whether the code simply skips the first line, which is typically loopback, or whether it actually looks for a token beginning with "lo").

The traffic stats for the remaining interfaces are summed. Every time the metric is collected, the raw counters are saved and timestamped, so the reported values *should* in fact be bytes-per-second and packets-per-second: the delta since the previous sample divided by the elapsed time.
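Roughly, the pattern looks like this. This is a minimal sketch of the approach described above, not the actual gmond source; the field positions assume the 2.4-era /proc/net/dev layout, and the helper names are made up:

#include <stdio.h>
#include <string.h>
#include <time.h>

struct net_sample {
    double bytes_in, bytes_out, pkts_in, pkts_out;
    time_t stamp;
};

static struct net_sample last_net;          /* cached previous sample */

static int read_net_sample(struct net_sample *s)
{
    char line[512];
    FILE *fp = fopen("/proc/net/dev", "r");

    if (!fp)
        return -1;
    memset(s, 0, sizeof(*s));
    s->stamp = time(NULL);

    while (fgets(line, sizeof(line), fp)) {
        char *name = line, *colon = strchr(line, ':');
        unsigned long rb, rp, tb, tp, skip;

        if (!colon)
            continue;                       /* the two header lines */
        while (*name == ' ')
            name++;
        if (strncmp(name, "lo", 2) == 0)
            continue;                       /* ignore the loopback adapter */
        if (sscanf(colon + 1,
                   "%lu %lu %lu %lu %lu %lu %lu %lu %lu %lu",
                   &rb, &rp, &skip, &skip, &skip, &skip, &skip, &skip,
                   &tb, &tp) == 10) {
            s->bytes_in  += rb;  s->pkts_in  += rp;
            s->bytes_out += tb;  s->pkts_out += tp;
        }
    }
    fclose(fp);
    return 0;
}

/* bytes_in: delta against the cached sample over the elapsed seconds;
 * bytes_out/pkts_in/pkts_out work the same way. */
double metric_bytes_in(void)
{
    struct net_sample now;
    double rate = 0.0, dt;

    if (read_net_sample(&now) < 0)
        return 0.0;
    dt = difftime(now.stamp, last_net.stamp);
    if (last_net.stamp != 0 && dt > 0)
        rate = (now.bytes_in - last_net.bytes_in) / dt;
    last_net = now;
    return rate;
}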

> 2. I thought I saw a message about adding disk i/o metrics to gmond.

Yeah, that was probably me. :)

> Is this currently in development and will it be in the upcoming 2.5.0
> release?

I've implemented something like it on Solaris, but those metrics are gathered through a different interface (namely, "not procfs"). Most of the metrics in linux.c are gathered by comparing a cached, timestamped snapshot of a /proc text file against its current contents (the same pattern as the network sketch above), so it could be adapted to new metrics very easily.

> We have a set of scripts that we use to collect some extra
> data and publish it through gmetric.  Some of the extra parameters we
> collect are: # of established tcp connections, various disk i/o stats
> from iostat like read/write rates, average wait time and service time
> for requests.  Parameters like this would be useful to include in the
> monitoring core.  I haven't seen a TODO list, do you have a set of
> parameters that will eventually be included in the core?

My fileservers are all running Solaris, so I don't have as much interest in porting that to Linux. However, iostat apparently gets a lot of its data from /proc/stat and /proc/partitions (especially /proc/partitions), so new metrics could certainly be added to monitor that stuff.
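If someone wants to take a crack at it, a hypothetical sketch in the same cache-and-diff style might look like the following. I'm assuming the 2.4 /proc/partitions layout with block statistics compiled in (major minor #blocks name rio rmerge rsect ruse wio wmerge wsect ...), which is kernel-dependent, so check the columns against your own box before trusting them:

#include <ctype.h>
#include <stdio.h>
#include <string.h>

struct disk_sample {
    double rsect, wsect;        /* sectors read / written, summed over disks */
};

static int read_disk_sample(struct disk_sample *s)
{
    char line[512], name[64];
    unsigned long major, minor, blocks;
    unsigned long rio, rmerge, rsect, ruse, wio, wmerge, wsect;
    FILE *fp = fopen("/proc/partitions", "r");

    if (!fp)
        return -1;
    memset(s, 0, sizeof(*s));

    while (fgets(line, sizeof(line), fp)) {
        if (sscanf(line, "%lu %lu %lu %63s %lu %lu %lu %lu %lu %lu %lu",
                   &major, &minor, &blocks, name,
                   &rio, &rmerge, &rsect, &ruse,
                   &wio, &wmerge, &wsect) != 11)
            continue;   /* header, blank line, or no block stats compiled in */
        if (isdigit((unsigned char)name[strlen(name) - 1]))
            continue;   /* crude: skip partition rows, count whole disks once */
        s->rsect += rsect;
        s->wsect += wsect;
    }
    fclose(fp);
    return 0;
}

Timestamp each sample and diff it against the previous one, exactly as the network code does, and you have read/write sectors-per-second.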

The main problem with adding new internal metrics is that metric names are converted to numbers via a compiled-in metric hash before they're broadcast. So if you add five metrics and your monitoring core multicasts "number_of_cokes_left_in_vending_machine" as internal metric #31, then another Linux (or Solaris, or IRIX, or ...) monitoring core that hears it but hasn't had the same hacks applied will map it to something else entirely ("number_of_sniper_kills"), or discard it completely if it exceeds the number of entries in its hash.
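To make the failure mode concrete, here is a toy illustration. It is not gmond's actual tables, hash, or wire format, and it uses a much smaller key space than the real #31 example:

#include <stdio.h>

static const char *sender_metrics[] = {
    "cpu_user", "cpu_system", "bytes_in", "bytes_out",
    "number_of_cokes_left_in_vending_machine",   /* locally added, key 4 */
};

static const char *receiver_metrics[] = {
    "cpu_user", "cpu_system", "bytes_in", "bytes_out",
    "number_of_sniper_kills",                    /* different hack, key 4 */
};

#define NELEM(a) (sizeof(a) / sizeof((a)[0]))

int main(void)
{
    unsigned key = 4;   /* what the sender multicasts for its new metric */

    printf("sender meant:      %s\n", sender_metrics[key]);
    if (key < NELEM(receiver_metrics))
        printf("receiver decodes:  %s\n", receiver_metrics[key]);
    else
        printf("receiver discards: key %u out of range\n", key);
    return 0;
}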

If you check the archives you'll see that I made a big fuss about this a few weeks back. No resolution on that yet. :)

The monitoring core is (IMO) at the point in its maturity where a lot of people are starting to use it and find it useful, so I expect the Linux version will probably lead the way in terms of which metrics the majority of users want. The feeling I get is that once 2.x becomes fairly stable, we will start to see talk of development on 3.0, which I imagine will have a radically different architecture, based on some of the musings I've seen on this list.

> 3.  What is the long term development plan for gmetad?  Will it always
> remain a perl script or will it eventually be rewritten in C?  I think I
> saw an earlier message about the known problems with gmetad hanging or
> dying because of network problems or hosts not responding.  Are there
> any ideas on a way to solve this problem?

The argument for gmetad being a perl script goes something like this:

"It's perl, it's flexible, and it deals with lots of strings."

Which I can't really find fault with. And it runs like a dream on my monitoring box (also Solaris), the caveats quoted above notwithstanding. I haven't heard of anyone *on Linux* having these problems so far, so it might be related to the Solaris implementation of perl's netcode/Socket.pm/who-knows-what.

There is definitely talk of rewriting it in C. But, to paraphrase a Utah Phillips story, we have a rule around this here list that the person who complains about the code gets to write the replacement. Which is why I am always very careful to say, "Good lord, this is a big ugly Perl script! GOOD THOUGH!"

:)

So when can we expect your CVS checkin of gmetad.c?
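For whoever does pick it up: the usual cure for the hanging-on-dead-hosts problem in a C rewrite is bounded I/O, e.g. a non-blocking connect() raced against select(). This is only a sketch under my own assumptions, not gmetad's actual design; the function name is made up, and 8649 is just the default gmond port mentioned elsewhere in this thread:

#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>

/* Returns a connected fd, or -1 on error or timeout. */
int connect_with_timeout(const char *ip, unsigned short port, int seconds)
{
    struct sockaddr_in sa;
    struct timeval tv = { seconds, 0 };
    fd_set wfds;
    int err = 0;
    socklen_t len = sizeof(err);
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_port = htons(port);          /* e.g. 8649 */
    sa.sin_addr.s_addr = inet_addr(ip);

    fcntl(fd, F_SETFL, O_NONBLOCK);     /* never block inside connect() */

    if (connect(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0 &&
        errno != EINPROGRESS)
        goto fail;

    FD_ZERO(&wfds);
    FD_SET(fd, &wfds);
    if (select(fd + 1, NULL, &wfds, NULL, &tv) <= 0)
        goto fail;                      /* timed out or error: move on */

    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0 || err != 0)
        goto fail;

    return fd;                          /* caller reads the XML dump here */
fail:
    close(fd);
    return -1;
}

A matching read timeout (select() before each read(), or SO_RCVTIMEO on the socket) covers the other failure mode, where a host accepts the connection and then goes quiet.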

> 4.  Just a warning:  Have you ever run gmond on hosts that are using
> iptables for local firewalling?  I have tried it here and think there is
> a bug with the iptables handling of multicast packets.  I put in a rule
> to accept packets for 239.2.11.71 on port 8649, but several minutes
> after starting the iptables firewall, the host stops receiving the
> multicast packets from other hosts; it only sees the multicast packets
> it sends out.  Then after stopping iptables it takes a minute or two
> before it starts seeing them from other hosts again.  I verified this by
> watching the packets with tcpdump on two hosts.  I don't have this
> problem if I use an equivalent ipchains rule instead of iptables.

That sounds pretty wild, although I can't say I'm totally shocked that iptables messes with multicast. The monitoring core, as of CVS, has switched to libdnet for its network library needs... have you tried a recent CVS build in the same situation versus two 2.4.x builds?

And it could also be the kernel... are we having fun yet?

All of the work I'm doing is far, far inside a firewall so the question, to me, is unfortunately academic ... but maybe I've given you some ideas. (or given someone else reading this archived post some ideas?)

> PS. Are suggestions or patches welcome if I have some ideas on
> improvements with gmetad or its webfrontend?

I sure hope so. Nobody on this list seems to mind talking out design ideas, either.


