RobH has submitted this change and it was merged.

Change subject: Add alert for elasticsearch 50th percentile prefix search time
......................................................................
Add alert for elasticsearch 50th percentile prefix search time

Typically prefix search is around 10-30ms. If it hits 75 or 150ms
there is almost certainly something wrong that should be addressed.

Requires adding discovery-ale...@lists.wikimedia.org to the private
puppet repo containing alerting email addresses (contacts.cfg)

Bug: T124542
Change-Id: I9c7b79f7af221c0d32ba1c6baa39c55f1bc92d8d
---
M manifests/site.pp
M modules/nagios_common/files/contactgroups.cfg
A modules/role/manifests/elasticsearch/alerts.pp
3 files changed, 17 insertions(+), 1 deletion(-)

Approvals:
  RobH: Looks good to me, approved
  jenkins-bot: Verified

diff --git a/manifests/site.pp b/manifests/site.pp
index 4c17989..532dec9 100644
--- a/manifests/site.pp
+++ b/manifests/site.pp
@@ -1170,7 +1170,7 @@
 # Primary graphite machines
 node 'graphite1001.eqiad.wmnet' {
-    role graphite::production, statsdlb, performance, graphite::alerts, restbase::alerts, graphite::alerts::reqstats
+    role graphite::production, statsdlb, performance, graphite::alerts, restbase::alerts, graphite::alerts::reqstats, elasticsearch::alerts
     include standard
 }
diff --git a/modules/nagios_common/files/contactgroups.cfg b/modules/nagios_common/files/contactgroups.cfg
index 3de6f46..8df55d9 100644
--- a/modules/nagios_common/files/contactgroups.cfg
+++ b/modules/nagios_common/files/contactgroups.cfg
@@ -64,3 +64,8 @@
     contactgroup_name wdqs-admins
     members smalyshev
 }
+
+define contactgroup {
+    contactgroup_name team-discovery
+    members discovery-alerts
+}
diff --git a/modules/role/manifests/elasticsearch/alerts.pp b/modules/role/manifests/elasticsearch/alerts.pp
new file mode 100644
index 0000000..0f86ea5
--- /dev/null
+++ b/modules/role/manifests/elasticsearch/alerts.pp
@@ -0,0 +1,11 @@
+class role::elasticsearch::alerts {
+    monitoring::graphite_threshold { 'prefix_search_50th_percentile':
+        description   => 'Prefix search 50th percentile latency',
+        metric        => 'transformNull(MediaWiki.CirrusSearch.requestTimeMs.prefix.p50, 0)',
+        from          => '10min',
+        warning       => '75',
+        critical      => '150',
+        percentage    => '20',
+        contact_group => 'team-discovery',
+    }
+}

--
To view, visit https://gerrit.wikimedia.org/r/265942

Gerrit-MessageType: merged
Gerrit-Change-Id: I9c7b79f7af221c0d32ba1c6baa39c55f1bc92d8d
Gerrit-PatchSet: 5
Gerrit-Project: operations/puppet
Gerrit-Branch: production
Gerrit-Owner: EBernhardson <ebernhard...@wikimedia.org>
Gerrit-Reviewer: Giuseppe Lavagetto <glavage...@wikimedia.org>
Gerrit-Reviewer: RobH <r...@wikimedia.org>
Gerrit-Reviewer: jenkins-bot <>
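
For readers unfamiliar with how the parameters in alerts.pp might interact, here is a minimal Python sketch. It is not the actual check plugin. It assumes the check takes the datapoints from the last 10 minutes, replaces missing values with 0 (as transformNull(..., 0) does), and alerts when at least 20% of those points exceed a threshold. The function names, the UNKNOWN result for an empty window, and the inclusive comparison are assumptions made for illustration.

```python
from typing import List, Optional, Sequence

# Values taken from the alerts.pp resource in this change.
WARNING_MS = 75.0
CRITICAL_MS = 150.0
PERCENTAGE = 20.0


def transform_null(points: Sequence[Optional[float]], default: float = 0.0) -> List[float]:
    """Replace missing datapoints with a default, like Graphite's transformNull()."""
    return [default if p is None else float(p) for p in points]


def percent_over(values: Sequence[float], limit: float) -> float:
    """Percentage of datapoints strictly above the limit."""
    return 100.0 * sum(1 for v in values if v > limit) / len(values)


def classify(
    points: Sequence[Optional[float]],
    warning: float = WARNING_MS,
    critical: float = CRITICAL_MS,
    percentage: float = PERCENTAGE,
) -> str:
    """Hypothetical evaluation of one 10-minute window of p50 latencies (ms)."""
    values = transform_null(points)
    if not values:
        return "UNKNOWN"  # assumption: no data in the window
    # Assumption: the alert fires when the share of points over the
    # threshold reaches the configured percentage (inclusive).
    if percent_over(values, critical) >= percentage:
        return "CRITICAL"
    if percent_over(values, warning) >= percentage:
        return "WARNING"
    return "OK"


if __name__ == "__main__":
    normal = [12, 18, None, 25, 30, 22, 15, 19, 28, 11]
    slow = [20, 90, 25, 95, 30, 88, 22, 18, 27, 24]
    print(classify(normal), classify(slow))
```

Under these assumptions, a single slow datapoint in the window does not trigger the alert, which matches the intent of setting percentage to 20 rather than alerting on any one spike.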