David Wolfskill wrote:
At $work, I've been trying to gather information on "interesting
patterns" of resource consumption during moderately long-running (5- to
8-hour) tasks; the hosts in question usually run FreeBSD 6.2, though
there's the occasional more recent 6.x, as well as a bit of 7-STABLE.

I wanted to have a low impact on the system being measured (of course),
and I was unwilling to require that a system being measured have any
software installed on it other than base FreeBSD.  (Yes, that means I
didn't assume Perl, though in practice each machine in this environment
does have it.)

I also wanted the data to be transferred reasonably securely, even if
part of that transit was over facilities over which I had no control.
(Some of the machines being measured happen to be on a different
continent from me.)

So I cobbled up a Perl script to run on a data-gathering machine (that
one was mine, so I could require that it had any software I wanted on
it); it acts (if you will) as a "shepherd," watching over child
processes, one of which is created for each host to be measured.

A given child process copies over a shell script to the remote machine,
then redirects STDOUT to append to a file on the data-gathering machine,
and exec()s ssh(1), telling it to run the shell script on the remote
machine.
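
In shell terms, what each child does amounts to roughly the following
(the real code is Perl, and the script name, paths, and argument list
here are invented for illustration):

    # copy the collector script over, then replace this process with
    # ssh(1), appending whatever the remote script prints to a per-host
    # file here on the data-gathering machine
    host=somehost.example.com
    scp gather.sh "$host":/tmp/gather.sh
    exec ssh "$host" /bin/sh /tmp/gather.sh 300 kern.cp_time vm.loadavg \
        >> "/data/$host.out"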

The shell script fabricates a string (depending on the arguments with
which it was invoked), then sits in a loop:

* eval the string
* sleep for the amount of time remaining

indefinitely.  (In practice, the usual nominal time between successive
eval()s is 5 minutes.  I have recently been doing some experiments at a
10-second interval.)
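
In sketch form, the remote script looks something like this (simplified;
the argument handling and the output format here are for illustration
rather than being the real thing):

    #!/bin/sh
    # usage (for this sketch): gather.sh interval oid [oid ...]
    interval=$1; shift
    cmd="sysctl -n $*"                   # the "fabricated string"

    while :; do
        start=$(date +%s)
        # one line per sample: timestamp, then the OID values
        echo "$start $(eval "$cmd" | tr '\n' ' ')"
        end=$(date +%s)
        rest=$((interval - (end - start)))
        [ "$rest" -gt 0 ] && sleep "$rest"
    done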

Periodically, back on the data-gathering machine, a couple of different
things happen:

* The "shepherd" script wakes up and checks the mtime on the file for
  each per-host process (to see if it's been updated "sufficiently
  recently").  Acttually, it first checks the file that lists the hosts
  to watch; if its mtime has changed, it's re-read, and the list of
  hosts is modified as appropriate.  Anyway, if a given per-host file is
  "too old," the corresponding child process is killed.  The the
  script runs through the list of hosts that should be checked,
  creating a per-host process for each one for which that's necessary.

  There's a fair amount of detail I'm eliding (such as limited
  exponential backoff for unresponsive hosts).

  In practice, this runs every 2 minutes at the moment.

* There's a cron(8)-initiated make(1) process that runs, reading the
  files created by the per-host processes and writing to a corresponding
  RRD.  (I cobbled up a Perl script to do this; there's a rough sketch
  of what it boils down to just after this list.)
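
For what it's worth, that RRD-updating step boils down to little more
than one rrdtool(1) invocation per parsed sample; roughly (the actual
code is Perl, and the file name and data-source layout here are just
for illustration):

    # one update per sample line; the variables hold whatever was parsed
    # out of the per-host file for that timestamp
    rrdtool update "/data/$host-cp_time.rrd" \
        "$stamp:$user:$nice:$sys:$intr:$idle"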

While I tried to externalize a fair amount of this -- e.g., the list of
sysctl(8) OIDs to use is read from an external file -- it turns out that
certain types of change are a bit ... painful.  In particular, adding a
new "data source" to the RRD qualifies (as "painful").

I recently modified the scripts involved to allow them to also be used
to gather per-NIC statistics (via invocation of "netstat -nibf inet").
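
Since that output is multi-column, there's some parsing to do on the
gathering side; something along these lines is enough to pull out the
per-interface byte counters, locating the columns by header name so the
exact layout doesn't matter (an illustration only -- the real code is
Perl):

    # interface name plus input/output byte counters
    netstat -nibf inet | awk '
        NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
        { print $(col["Name"]), $(col["Ibytes"]), $(col["Obytes"]) }'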

I'm about to implement that change over the weekend, so it occurred to
me that this might be a good time to add some more sysctl(8) OIDs.

So I'm asking for suggestions -- ideally, for OIDs that are fairly
easily parseable.  (I started out limited to OIDs that were presented
as a single numeric value per line, then figured out how to handle
kern.cp_time (which is an ordered quintuple); later I figured out how
to cope with vm.loadavg (which is an ordered triplet ... surrounded by
curly braces).  I don't currently have logic to cope with anything more
complicated than those.)
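
For what it's worth, both of those flatten easily enough from the
command line; just as an illustration, not what my scripts literally
do:

    # kern.cp_time is five counters: user nice system interrupt idle
    sysctl -n kern.cp_time | \
        awk '{ print "user=" $1, "nice=" $2, "sys=" $3, "intr=" $4, "idle=" $5 }'

    # vm.loadavg wraps its triplet in braces; deleting them leaves just
    # the three load averages
    sysctl -n vm.loadavg | tr -d '{}'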

Here's a list of the OIDs I'm currently using:

-------- Snip ---------


I admit that I don't know what several of those actually mean: I figured
I'd capture what I can, then try to make sense of it.  It's very easy to
ignore data that I've captured, but don't need; it's a little harder to take
appropriate corrective action if I determine that there was some
information I should have captured, but didn't.  :-}

Still, if something's in there that's just silly, I wouldn't mind knowing
about it.  :-)

Thanks!

Peace,
david


You may be interested in some software that I've written over the last 5 years or so called FreePDB. It's written in Perl and requires an XML library to be installed. This sort of breaks your first requirement, but I'll describe it anyway.

I schedule a program to run regularly with cron. The program reads some configuration data from an XML file telling it what needs to be collected (and what mechanisms to use to collect it) and issues the necessary commands (sysctl is definitely one of the possibilities) and spits out rows into one or more text files.

In your case, I expect you would transfer the text files over to a central system (the logger just creates a new file if someone steals the old one), where another program loads the text files into database tables.

Graphing support includes the possibility to extract data into an rrd file, as well as driving gnuplot or some Perl GD::Graph stuff, or even hooking up Excel with ODBC from a Windows box and using the graph wizard.

Anyway, I just thought I'd mention it since it might save you some work.

It can be found at freepdb.sourceforge.net. It definitely runs on FreeBSD, including 7.0 (I recently upgraded a 4.7 machine, but before that it ran there quite nicely).

I'm just cleaning up a new release that includes a choice of database systems and a few performance/usability improvements. As they say in the classics, "If you don't see what you need, just ask".

Regards,

Brian