RobH has uploaded a new change for review.

  https://gerrit.wikimedia.org/r/267052

Change subject: Add alert for elasticsearch 50th percentile prefix search time
......................................................................

Add alert for elasticsearch 50th percentile prefix search time


Prefix search typically takes around 10-30 ms. If the 50th percentile
reaches 75 ms (warning) or 150 ms (critical), something is almost
certainly wrong and should be addressed.

Requires adding discovery-ale...@lists.wikimedia.org to the private
puppet repo containing alerting email addresses (contacts.cfg)

Bug: T124542

This reverts commit f416ea1f678d48004dc205c9c3611d992e7a7207.

Change-Id: I3cc858db262edb88eed5c0ea409e974565a52707
---
M manifests/site.pp
M modules/nagios_common/files/contactgroups.cfg
A modules/role/manifests/elasticsearch/alerts.pp
3 files changed, 17 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.wikimedia.org:29418/operations/puppet refs/changes/52/267052/1

diff --git a/manifests/site.pp b/manifests/site.pp
index 4c17989..532dec9 100644
--- a/manifests/site.pp
+++ b/manifests/site.pp
@@ -1170,7 +1170,7 @@
 
 # Primary graphite machines
 node 'graphite1001.eqiad.wmnet' {
-    role graphite::production, statsdlb, performance, graphite::alerts, restbase::alerts, graphite::alerts::reqstats
+    role graphite::production, statsdlb, performance, graphite::alerts, restbase::alerts, graphite::alerts::reqstats, elasticsearch::alerts
     include standard
 }
 
diff --git a/modules/nagios_common/files/contactgroups.cfg b/modules/nagios_common/files/contactgroups.cfg
index 3de6f46..8df55d9 100644
--- a/modules/nagios_common/files/contactgroups.cfg
+++ b/modules/nagios_common/files/contactgroups.cfg
@@ -64,3 +64,8 @@
     contactgroup_name   wdqs-admins
     members             smalyshev
 }
+
+define contactgroup {
+    contactgroup_name   team-discovery
+    members             discovery-alerts
+}
diff --git a/modules/role/manifests/elasticsearch/alerts.pp b/modules/role/manifests/elasticsearch/alerts.pp
new file mode 100644
index 0000000..0f86ea5
--- /dev/null
+++ b/modules/role/manifests/elasticsearch/alerts.pp
@@ -0,0 +1,11 @@
+class role::elasticsearch::alerts {
+    monitoring::graphite_threshold { 'prefix_search_50th_percentile':
+        description    => 'Prefix search 50th percentile latency',
+        metric         => 'transformNull(MediaWiki.CirrusSearch.requestTimeMs.prefix.p50, 0)',
+        from           => '10min',
+        warning        => '75',
+        critical       => '150',
+        percentage     => '20',
+        contact_group  => 'team-discovery',
+    }
+}
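
The check above combines a time window (from), two thresholds (warning,
critical) and a percentage. A rough sketch of how such a check might
classify a window of datapoints is shown below. It is an illustration
only, not the actual Nagios plugin: the function name classify_latency is
hypothetical, and it assumes that percentage means "the share of
datapoints in the window that exceed the threshold", that the comparison
is strictly greater-than, and that missing datapoints count as 0 (which
is what transformNull(..., 0) does in the metric expression).

```python
from typing import Optional, Sequence


def classify_latency(
    datapoints: Sequence[Optional[float]],
    warning: float = 75.0,
    critical: float = 150.0,
    percentage: float = 20.0,
) -> str:
    """Classify a window of p50 latencies (in ms) as OK/WARNING/CRITICAL.

    Assumption: the alert fires when at least `percentage` percent of the
    datapoints in the window exceed a threshold.
    """
    # Mirror transformNull(..., 0): treat missing datapoints as 0.
    values = [0.0 if v is None else float(v) for v in datapoints]
    if not values:
        # No data at all in the window; nothing to evaluate.
        return "UNKNOWN"

    def share_above(threshold: float) -> float:
        # Percentage of datapoints strictly above the threshold.
        return 100.0 * sum(1 for v in values if v > threshold) / len(values)

    # Check the more severe threshold first.
    if share_above(critical) >= percentage:
        return "CRITICAL"
    if share_above(warning) >= percentage:
        return "WARNING"
    return "OK"
```

Under these assumptions, a single slow datapoint in a ten-point window
(10%) would not trigger the alert, while two datapoints above 75 ms (20%)
would raise a warning.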

-- 
To view, visit https://gerrit.wikimedia.org/r/267052
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I3cc858db262edb88eed5c0ea409e974565a52707
Gerrit-PatchSet: 1
Gerrit-Project: operations/puppet
Gerrit-Branch: production
Gerrit-Owner: RobH <r...@wikimedia.org>

_______________________________________________
MediaWiki-commits mailing list
MediaWiki-commits@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits
