On Fri, 29 May 2015, Simon Hobson wrote:
> Peter Valdemar Mørch wrote:
>> Looking at average and standard deviation is a possibility, but most of
>> my users (and I) really have no good intuitive feeling for what standard
>> deviation really "means".
> +1, I don't either
I recommend "Full House", by Stephen Jay Gould, or other essays of his.
Summary of one of his most well-known explanations: Why are there no more
.400 hitters in baseball? Has the average quality of batters gone down, or
the average quality of pitchers gone up, or has some change to the rules
made batting harder in general? No, none of those. What has happened is
that the variability of batting has shrunk. So there is less distance
between the very top batters and the rest of the (major league, already a
select group) batters.
Standard deviation is a measure of variability; I think of it as the range
around the expected value within which about 68% of observations fall by
random chance alone (assuming roughly normally distributed data). A value
well outside that range is more likely to differ from the expected value
because of some real cause.
If Babe Ruth bats .300 in 1915 and .320 in 1916 (I am making up these
numbers), you would not think it was a big deal, because a .020 difference
in batting average is pretty small compared to the standard deviation of
player batting averages at the time. Whereas if David Ortiz bats .300 in
2015 and .320 in 2016, you might be justified in thinking this is the
result of something he is doing differently, because the .020 difference is
big compared to the standard deviation of player batting averages in 2015.
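
To make that comparison concrete, here is a minimal Perl sketch. The two lists of league batting averages are invented for illustration (just like the numbers above), and the helper names mean_sd and change_in_sds are hypothetical. The only point is that the same .020 change amounts to a different number of standard deviations depending on how spread out the league is.

```perl
use strict;
use warnings;

# Return (mean, sample standard deviation) of a list of numbers.
sub mean_sd {
    my @x    = @_;
    my $n    = @x;
    my $mean = 0;
    $mean += $_ / $n for @x;
    my $ss = 0;
    $ss += ($_ - $mean) ** 2 for @x;
    return ($mean, sqrt($ss / ($n - 1)));
}

# Express a change in one player's average in units of the league SD.
sub change_in_sds {
    my ($change, @league) = @_;
    my (undef, $sd) = mean_sd(@league);
    return $change / $sd;
}

# Invented league averages: a wide spread (early era) and a narrow one (modern era).
my @league_early  = (0.200, 0.230, 0.260, 0.290, 0.320, 0.350);
my @league_modern = (0.255, 0.260, 0.265, 0.270, 0.275, 0.280);

printf "early era:  %.2f SDs\n", change_in_sds(0.020, @league_early);
printf "modern era: %.2f SDs\n", change_in_sds(0.020, @league_modern);
```

With these made-up lists the change is well under one standard deviation in the wide-spread league and more than two in the narrow one.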
Anyway, I wanted to respond to the OP with a script I wrote, attached. The
documentation is very scanty, but you never know when something will be
useful to someone.
- Alex Aminoff
BaseSpace.net
National Bureau of Economic Research (nber.org)

#!/usr/bin/perl
=head1 NAME
sdna -- Statistical Detection of Network Aberrance
=head1 SYNOPSIS
# as a cron job, every 10 minutes
sdna --query --read --quiet
# command line
sdna --grep nonzeroerrors switch1 switch2
sdna --read
=head1 OPTIONS
--debug   Print debug output
--query   Query all targets and save collected data to RRD files
          in the RRD directory
--grep    Grep mode. Collections of include and exclude regexps are
          hard coded. Implies --query.
--read    Read RRD files, calculate stats, display most aberrant values
--quiet   In read mode, display no output unless network aberrance is
          above a threshold.
--config  Config file to read
=head1 DESCRIPTION
sdna is intended to run periodically from cron. It calls snmpbulkwalk
to collect all SNMP values from each target IP address, and stores
each in a RRD (Round-Robin Database) file.
sdna makes use of RRDTool's Aberrant Behavior Detection
functionality. For each value, we derive an estimate of how aberrant
that value is, which is basically a Z value, or the number of standard
deviations out from our estimated mean for the value.
Then, we aggregate all the aberrances of all the values to get a grand
estimate of how unusual or aberrant the current state of the network
as a whole is. If greater than a threshold, we send an alert to an
operator.
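
As a rough illustration of the two-level idea above, here is a minimal sketch. It is not the script's own code: the Z scores are computed from a hand-supplied mean and standard deviation rather than from RRDTool's Aberrant Behavior Detection, the variable names and numbers in %observed are invented, and the "proportion above threshold" aggregate is an assumption based on the comments on $eachthreshold and $masterthreshold.

```perl
use strict;
use warnings;

my $eachthreshold   = 2;    # Z score at which one value counts as aberrant
my $masterthreshold = 0.1;  # proportion of aberrant values that triggers an alarm

# Hypothetical inputs: current value, predicted mean and standard deviation
# for each SNMP variable (names and numbers are invented).
my %observed = (
    'sw1:ifInOctets.1'  => { value => 1050, mean => 1000, sd => 100 },
    'sw1:ifInErrors.1'  => { value => 40,   mean => 2,    sd => 3   },
    'sw1:ifOutOctets.1' => { value => 980,  mean => 1000, sd => 50  },
    'sw2:ifInOctets.7'  => { value => 510,  mean => 500,  sd => 20  },
);

# Z score: number of standard deviations away from the predicted mean.
sub zscore {
    my ($v) = @_;
    return 0 if $v->{sd} == 0;    # no recorded variability, treat as normal
    return abs($v->{value} - $v->{mean}) / $v->{sd};
}

my %z;
$z{$_} = zscore($observed{$_}) for keys %observed;

# Aggregate: proportion of variables whose Z score reaches the per-value threshold.
my @aberrant   = grep { $z{$_} >= $eachthreshold } keys %z;
my $proportion = scalar(@aberrant) / scalar(keys %z);

# Most deviant variables first, as in the command-line listing.
for my $name (sort { $z{$b} <=> $z{$a} } keys %z) {
    printf "%-20s Z = %.2f\n", $name, $z{$name};
}
print "ALERT: network looks aberrant\n" if $proportion > $masterthreshold;
```

Counting the proportion of values over a threshold, rather than averaging the Z scores, keeps one wildly deviant counter from dominating the network-wide estimate; whether that matches the script's actual aggregate is an assumption.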
sdna can also be used from the command line to produce a list of the
most deviant SNMP variables across the entire network. This might be
used to find which switch port a misbehaving device is on.
A feature of this system is that we try to be agnostic about what each
SNMP variable represents. It does not matter if it is bandwidth or
packet loss or the speed of the link - all we care about is how
different it is from its predicted value based on history. In practice
we cannot quite be pure about this; see $SKIP_PATTERN.
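
One way such a skip filter might be applied, as a sketch only: the sample lines below are invented stand-ins for snmpbulkwalk output, and the shortened pattern is a subset of the script's $SKIP_PATTERN. How the script itself applies the pattern is not shown here.

```perl
use strict;
use warnings;

# Subset of the script's $SKIP_PATTERN, shortened for the example.
my $skip = qr/(SNMPv2-SMI::mib-2|SNMPv2-MIB::snmp|IP-MIB::ipNetToMediaIfIndex)/;

# Invented lines in the general "OID = TYPE: value" shape.
my @lines = (
    'IF-MIB::ifInOctets.1 = Counter32: 123456',
    'SNMPv2-MIB::snmpInPkts.0 = Counter32: 789',
    'IF-MIB::ifInErrors.1 = Counter32: 0',
    'IP-MIB::ipNetToMediaIfIndex.1.10.0.0.1 = INTEGER: 1',
);

# Keep only the variables that do not match the skip pattern.
my @kept = grep { $_ !~ $skip } @lines;
print "$_\n" for @kept;
```

Only the two IF-MIB lines survive the filter; everything else is treated as a variable whose history is not worth tracking.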
=head1 SEE ALSO
L,L,L
=head1 AUTHOR
Alex Aminoff, alex_amin...@alum.mit.edu
=head1 COPYRIGHT
Copyright 2013, shared by National Bureau of Economic Research
and Alexander Aminoff
=cut
use Getopt::Long;
use RRDs;
my %byshortcut = (
nonzeroerrors => [ [ qr/Error/o,1],
[ qr/: 0/o,0],
],
);
#my $DIR = '/homes/nber/aminoff/DUMPHERE/nbersnmpdata/';
my $DIR = '/var/db/sdna/';
my $SKIP_PATTERN =
qr/(SNMPv2-SMI::mib-2|SNMPv2-SMI::transmission|SNMPv2-MIB::snmp|IP-MIB::ipNetToMediaIfIndex|66\.251\.7|198\.71\.[67])/o;
my $debug = 0;
my $grep = '';
my $eachthreshold = 2; # threshold Z score to be counted as aberrant
my $masterthreshold = .1; # threshold proportion of aberrant tests for alarm
my ($query,$read,$quiet) = (0,0,0);
my $config = '';
my $nofork = 0;
GetOptions('query' => \$query,
'read' => \$read,
'grep=s' => \$grep,
'debug+' => \$debug,
'quiet' => \$quiet,
'nofork' => \$nofork,
'config=s' => \$config,
);
if (! $query && ! $read && ! $grep) {
# default operation
$