John Conner wrote:
> Thanks a lot, Sven!
>
> Still fairly new to rrdtool and never used the "updatev" option, gonna
> check it right now.
>
> Do you have any documents handy on how you implement this? if you do,
> could you point me the link?

Sure, here are some quick notes on how to set up aberrant behaviour
detection for a data value. My example is based on actual monitoring of
a network link with fairly strong periodic behaviour; that is, you can
easily identify a repeating (daily) pattern in the traffic graph.

This is the rrdtool create command I use. I've put a few comments above
it (a comment cannot follow a line-continuation backslash):

    # --step 120: sample every 2 minutes
    # DS lines: maintain counters for packets and bytes, inbound and outbound
    # AVERAGE RRAs, in order: day, week, month and year graphs
    # HWPREDICT and the RRAs after it: detailed notes below
    rrdtool create network-uplink.rrd \
        --start 1166600000 \
        --step 120 \
        DS:pktsin:DERIVE:180:0:4294967295 \
        DS:bytesin:DERIVE:180:0:4294967295 \
        DS:pktsout:DERIVE:180:0:4294967295 \
        DS:bytesout:DERIVE:180:0:4294967295 \
        RRA:AVERAGE:0.5:1:840 \
        RRA:AVERAGE:0.5:15:384 \
        RRA:AVERAGE:0.5:60:384 \
        RRA:AVERAGE:0.5:720:400 \
        RRA:HWPREDICT:1440:0.05:0.0035:720:6 \
        RRA:SEASONAL:720:0.01:5 \
        RRA:DEVSEASONAL:720:0.01:5 \
        RRA:DEVPREDICT:1440:7 \
        RRA:FAILURES:1440:5:25:7

About the last RRAs:

- HWPREDICT is set up to use a seasonal period of 720 datapoints. 720
  datapoints at 2-minute intervals is exactly 24 hours, i.e. the
  traffic pattern repeats every day. You might want to use an entire
  week as the seasonal period, depending on your patterns.

- The alpha, beta and gamma values are not all that easy to tune
  properly to your data source, in my opinion. I've chosen fairly
  generic values, based on those found in
  http://cricket.sourceforge.net/aberrant/rrd_hw.htm

- The last field of HWPREDICT is 6, of SEASONAL 5, and so on. This is
  the rra-num index number, and was not entirely easy to figure out
  based on the documentation, which states "The rra-num argument is the
  1-based index in the order of RRA creation (that is, the order they
  appear in the create command)." It simply refers to the index number
  of the RRAs, counting from 1 (this includes *all* RRAs, AVERAGE too!)
  HWPREDICT should refer to the SEASONAL index, SEASONAL to HWPREDICT,
  DEVSEASONAL to HWPREDICT, DEVPREDICT to DEVSEASONAL and FAILURES to
  DEVSEASONAL. In the create command above, the four AVERAGE RRAs are
  numbers 1 to 4, so HWPREDICT is 5, SEASONAL 6, DEVSEASONAL 7,
  DEVPREDICT 8 and FAILURES 9.

The next step, which I would have worked out easily had I read the
documentation properly, is to adjust the positive and negative
confidence band factors. The default is 2, which I find a bit too
unforgiving for my scenario. To adjust it to 5, run:

    rrdtool tune network-uplink.rrd --deltapos 5 --deltaneg 5

Here's how I draw the daily graph for the inbound byte counter. Note
that the dev_lower and dev_upper CDEFs use 5 as the scaling factor when
graphing, the same as in the tune command:

    rrdtool graph \
        daily.png \
        --font LEGEND:7 \
        --font UNIT:7 \
        --font AXIS:7 \
        --base 1024 \
        -l 0 -r \
        -w 400 \
        -h 125 \
        --start end-100800 \
        -E \
        --title "Network traffic, by day" \
        --vertical-label "Bytes/sec" \
        --x-grid "HOUR:1:DAY:1:HOUR:4:0:%H:%M" \
        \
        DEF:a_avg=network-uplink.rrd:bytesin:AVERAGE \
        DEF:a_pred=network-uplink.rrd:bytesin:HWPREDICT \
        DEF:a_dev=network-uplink.rrd:bytesin:DEVPREDICT \
        DEF:a_fail=network-uplink.rrd:bytesin:FAILURES \
        \
        CDEF:a_normavg=a_avg \
        CDEF:dev_lower=a_pred,a_dev,5,*,- \
        CDEF:dev_upper=a_pred,a_dev,5,*,+ \
        CDEF:dev_area=dev_upper,dev_lower,- \
        \
        VDEF:a_last=a_avg,LAST \
        VDEF:a_average=a_avg,AVERAGE \
        \
        AREA:dev_lower#ffffff \
        AREA:dev_area#ccffcc::STACK \
        TICK:a_fail#ff9999:1.0 \
        LINE1:dev_lower#66ff66 \
        LINE1:dev_upper#66ff66 \
        LINE3:a_pred#66ff66 \
        LINE1:a_normavg#666699 \
        COMMENT:"Current\:" GPRINT:a_last:%6.2lf \
        COMMENT:"Average\:" GPRINT:a_average:%6.2lf \
        COMMENT:"\n" \
        COMMENT:"Last update\: `date \"+%Y-%m-%d %H\\:%M\\:%S %Z\"`"\\r

Next, to actually have it report aberrant behaviour in real time, as
opposed to post-mortem, you'll need a wrapper script to run
'rrdtool updatev' and parse the output. There are probably fancy
bindings in perl for this, or some other graceful way of doing it.
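
For illustration, a wrapper of that kind could look roughly like the
Python sketch below. It is not the script used for the setup in this
message, only a minimal sketch under some assumptions: the function
names find_failures and update_and_check are made up, and the output
line format shown in the comment is an assumption about what
'rrdtool updatev' prints, so check it against the output of your own
version before relying on it.

```python
import re
import subprocess

# Assumed shape of one 'rrdtool updatev' output line (verify on your version):
#   [1166600120]RRA[FAILURES][1]DS[bytesin] = 1.0000000000e+00
LINE_RE = re.compile(
    r"^\[\s*(?P<ts>\d+)\]RRA\[(?P<cf>\w+)\]\[\d+\]"
    r"DS\[(?P<ds>[^\]]+)\]\s*=\s*(?P<value>\S+)"
)


def find_failures(updatev_output):
    """Return (timestamp, ds_name) pairs whose FAILURES value is above 0."""
    hits = []
    for line in updatev_output.splitlines():
        m = LINE_RE.match(line.strip())
        if m is None or m.group("cf") != "FAILURES":
            continue  # status lines and other RRAs are ignored
        try:
            value = float(m.group("value"))
        except ValueError:
            continue  # unparseable value, skip it
        if value > 0.0:  # NaN compares False, so unknown values are skipped
            hits.append((int(m.group("ts")), m.group("ds")))
    return hits


def update_and_check(rrd_file, update_arg):
    """Run 'rrdtool updatev' once and return any failures it reports."""
    result = subprocess.run(
        ["rrdtool", "updatev", rrd_file, update_arg],
        check=True, capture_output=True, text=True,
    )
    return find_failures(result.stdout)
```

A poller would call update_and_check("network-uplink.rrd", values) with
the usual update string (timestamp and the four counter values) each
time it samples the link, and raise an alert when the returned list is
not empty.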
My way is a quick Python script that parses the output looking for
'FAILURES' and then checks whether the corresponding value is greater
than 0.0.

Well, that's pretty much it. Good luck!

Sven

>> I use the aberrant behaviour detection in rrdtool and I find
>> it quite handy. To detect problems, I use the 'rrdtool updatev'
>> command, which will output FAILURE=1.0 (different syntax), if
>> it detects failures. FAILURE=0.0 if not. In other words, I parse
>> the output of the command, and trigger alerts based on it. You
>> should probably implement a wrapper around the parsing/alarming,
>> so that you won't get flooded with mails/SMS messages every five
>> minutes while a deviation is happening.
>>
>> Sven

_______________________________________________
rrd-users mailing list
rrd-users@lists.oetiker.ch
https://lists.oetiker.ch/cgi-bin/listinfo/rrd-users