Giuseppe Lavagetto has uploaded a new change for review.

  https://gerrit.wikimedia.org/r/133692

Change subject: icinga: Fix anomaly detection checks
......................................................................

icinga: Fix anomaly detection checks

Changed the check for reqstats.5xx to use the raw data rather than its
ratio to total requests: the ratio is formally more correct, but it is
much more unstable than the base check. Also, check_graphite now
requests the Holt-Winters confidence bands with a delta of 5, which has
proven to detect real anomalies more reliably, with fewer false
positives.

Change-Id: I8b06b0b139b292e57b97417c0f880ec660fbbf2d
Signed-off-by: Giuseppe Lavagetto <glavage...@wikimedia.org>
---
M files/icinga/check_graphite
M manifests/role/graphite.pp
2 files changed, 3 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.wikimedia.org:29418/operations/puppet refs/changes/92/133692/1

diff --git a/files/icinga/check_graphite b/files/icinga/check_graphite
index f3ac06d..cb5cfe2 100755
--- a/files/icinga/check_graphite
+++ b/files/icinga/check_graphite
@@ -316,7 +316,7 @@
         for target in self.targets:
             self.params.append(('target', target))
             self.params.append(
-                ('target', 'holtWintersConfidenceBands(%s)' % target))
+                ('target', 'holtWintersConfidenceBands(%s, 5)' % target))
         self.check_window = args.check_window
         self.warn = args.warn
         self.crit = args.crit
@@ -423,7 +423,7 @@
            check_threshold my.beloved.metric  --from -20m \
            --threshold 100 --over -C 10 -W 5
 
-    Check if a metric has exceeded its holter-winters confidence bands 5% of the
+    Check if a metric has exceeded its holt-winters confidence bands 5% of the
     times over the last 500 checks
 
     ./check_graphyte.py --url http://some-graphite-host  \
diff --git a/manifests/role/graphite.pp b/manifests/role/graphite.pp
index 2246027..59a057d 100644
--- a/manifests/role/graphite.pp
+++ b/manifests/role/graphite.pp
@@ -200,7 +200,7 @@
         # if 10% of the last 100 checks is out of forecasted bounds
         monitor_graphite_anomaly {'requests_error_ratio':
             description  => 'HTTP error ratio anomaly detection',
-            metric       => 'divideSeries(reqstats.5xx,reqstats.requests)',
+            metric       => 'reqstats.5xx',
             warning      => 5,
             critical     => 10,
             check_window => 100,
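
For readers unfamiliar with the check, the following is a minimal sketch of
the logic this change touches, not the actual check_graphite code. It assumes
the check asks Graphite for the metric together with
holtWintersConfidenceBands(metric, 5), then compares the percentage of recent
datapoints outside the bands against the warning and critical values. The
function names, the 'format' parameter choice, and the percentage
interpretation of the thresholds are assumptions made for illustration.

```python
def build_anomaly_params(metric, delta=5):
    """Build render-API query parameters (hypothetical helper).

    Requests the raw metric plus its Holt-Winters confidence bands,
    using the wider delta introduced by this change.
    """
    return [
        ('format', 'json'),
        ('target', metric),
        ('target', 'holtWintersConfidenceBands(%s, %d)' % (metric, delta)),
    ]


def percent_outside_bands(values, lower, upper, check_window=100):
    """Percentage of the last check_window points outside [lower, upper].

    Points where the value or either band is missing (None) are skipped.
    """
    recent = list(zip(values, lower, upper))[-check_window:]
    usable = [(v, lo, hi) for v, lo, hi in recent if None not in (v, lo, hi)]
    if not usable:
        return 0.0
    outside = sum(1 for v, lo, hi in usable if v < lo or v > hi)
    return 100.0 * outside / len(usable)


def classify(percent, warning=5, critical=10):
    """Map a percentage to a state (assumes thresholds are percentages)."""
    if percent >= critical:
        return 'CRITICAL'
    if percent >= warning:
        return 'WARNING'
    return 'OK'
```

With the values from graphite.pp (warning 5, critical 10, check_window 100),
a window in which 6 of 100 datapoints leave the bands would be classified as
WARNING under these assumptions.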

-- 
To view, visit https://gerrit.wikimedia.org/r/133692
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I8b06b0b139b292e57b97417c0f880ec660fbbf2d
Gerrit-PatchSet: 1
Gerrit-Project: operations/puppet
Gerrit-Branch: production
Gerrit-Owner: Giuseppe Lavagetto <glavage...@wikimedia.org>
