Giuseppe Lavagetto has uploaded a new change for review. https://gerrit.wikimedia.org/r/133692
Change subject: icinga: Fix anomaly detection checks
......................................................................

icinga: Fix anomaly detection checks

Changed the check for reqstats.5xx to use the raw data rather than its
ratio to total requests: the ratio, although formally more correct, is
much more unstable than the base metric.

Also, check_graphite now requests the Holt-Winters anomaly bands with a
confidence delta of 5, which has proven much better at detecting real
anomalies while producing fewer false positives.

Change-Id: I8b06b0b139b292e57b97417c0f880ec660fbbf2d
Signed-off-by: Giuseppe Lavagetto <glavage...@wikimedia.org>
---
M files/icinga/check_graphite
M manifests/role/graphite.pp
2 files changed, 3 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.wikimedia.org:29418/operations/puppet refs/changes/92/133692/1

diff --git a/files/icinga/check_graphite b/files/icinga/check_graphite
index f3ac06d..cb5cfe2 100755
--- a/files/icinga/check_graphite
+++ b/files/icinga/check_graphite
@@ -316,7 +316,7 @@
         for target in self.targets:
             self.params.append(('target', target))
             self.params.append(
-                ('target', 'holtWintersConfidenceBands(%s)' % target))
+                ('target', 'holtWintersConfidenceBands(%s, 5)' % target))
         self.check_window = args.check_window
         self.warn = args.warn
         self.crit = args.crit
@@ -423,7 +423,7 @@
     check_threshold my.beloved.metric --from -20m \
     --threshold 100 --over -C 10 -W 5

-    Check if a metric has exceeded its holter-winters confidence bands 5% of the
+    Check if a metric has exceeded its holt-winters confidence bands 5% of the
     times over the last 500 checks

     ./check_graphyte.py --url http://some-graphite-host \
diff --git a/manifests/role/graphite.pp b/manifests/role/graphite.pp
index 2246027..59a057d 100644
--- a/manifests/role/graphite.pp
+++ b/manifests/role/graphite.pp
@@ -200,7 +200,7 @@
     # if 10% of the last 100 checks is out of forecasted bounds
     monitor_graphite_anomaly {'requests_error_ratio':
         description => 'HTTP error ratio anomaly detection',
-        metric => 'divideSeries(reqstats.5xx,reqstats.requests)',
+        metric => 'reqstats.5xx',
         warning => 5,
         critical => 10,
         check_window => 100,

--
To view, visit https://gerrit.wikimedia.org/r/133692

Gerrit-MessageType: newchange
Gerrit-Change-Id: I8b06b0b139b292e57b97417c0f880ec660fbbf2d
Gerrit-PatchSet: 1
Gerrit-Project: operations/puppet
Gerrit-Branch: production
Gerrit-Owner: Giuseppe Lavagetto <glavage...@wikimedia.org>

_______________________________________________
MediaWiki-commits mailing list
MediaWiki-commits@lists.wikimedia.org
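The anomaly check this change tunes can be illustrated with a small sketch. This is a hypothetical illustration, not the code in check_graphite: it assumes the Holt-Winters forecast and deviation series are already available as plain lists (Graphite computes them server-side), and the helper names confidence_bands, percent_outside and anomaly_status are invented here. Graphite's holtWintersConfidenceBands(series, delta) places the bands at forecast +/- delta * deviation, with delta defaulting to 3; the sketch shows how raising delta to 5 widens the bands so that fewer points in the check window count as anomalous, mirroring the warning => 5, critical => 10, check_window => 100 settings above.

```python
from typing import List, Optional, Sequence, Tuple


def confidence_bands(
    forecast: Sequence[float], deviation: Sequence[float], delta: float = 3.0
) -> Tuple[List[float], List[float]]:
    """Hypothetical helper: bands at forecast +/- delta * deviation.

    Assumes forecast and deviation are precomputed Holt-Winters series.
    """
    lower = [f - delta * d for f, d in zip(forecast, deviation)]
    upper = [f + delta * d for f, d in zip(forecast, deviation)]
    return lower, upper


def percent_outside(
    values: Sequence[Optional[float]],
    lower: Sequence[float],
    upper: Sequence[float],
    check_window: int,
) -> float:
    """Percentage of the last `check_window` datapoints outside the bands."""
    points = list(zip(values, lower, upper))[-check_window:]
    # Skip missing datapoints (Graphite returns None for gaps).
    points = [p for p in points if p[0] is not None]
    if not points:
        return 0.0
    outside = sum(1 for v, lo, hi in points if v < lo or v > hi)
    return 100.0 * outside / len(points)


def anomaly_status(pct: float, warn: float, crit: float) -> str:
    """Map the out-of-band percentage to an Icinga-style state.

    Assumes inclusive thresholds; critical takes precedence over warning.
    """
    if pct >= crit:
        return "CRITICAL"
    if pct >= warn:
        return "WARNING"
    return "OK"


if __name__ == "__main__":
    # Synthetic data: flat forecast of 100 with deviation 2; the last ten
    # points include eight mild spikes (107) and two large ones (115).
    forecast = [100.0] * 100
    deviation = [2.0] * 100
    values = [100.0] * 90 + [107.0] * 8 + [115.0] * 2
    for delta in (3, 5):
        lower, upper = confidence_bands(forecast, deviation, delta)
        pct = percent_outside(values, lower, upper, check_window=100)
        # delta=3: 10.0% -> CRITICAL; delta=5: 2.0% -> OK
        print(f"delta={delta}: {pct}% -> {anomaly_status(pct, warn=5, crit=10)}")
```

With delta 3 the mild spikes at 107 fall outside the [94, 106] band and push the window to the critical threshold, while with delta 5 only the two large spikes leave the [90, 110] band.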