EBernhardson has uploaded a new change for review.
https://gerrit.wikimedia.org/r/259443
Change subject: [elasticsearch] Collect cluster health stats about shard movement
......................................................................
[elasticsearch] Collect cluster health stats about shard movement
Adding statistics for relocating/initializing/unassigned shards should
give us better insight into when the cluster drops a node and how it
recovers afterwards. I would have expected this to be uncommon, but we
have dropped a node twice in the last 3 days and need better monitoring
of what happens.
Bug: T117284
Change-Id: I69788c5455115b5aa54167facfcb2dd83954e0bc
---
M modules/elasticsearch/files/monitor/wmfelastic.py
1 file changed, 26 insertions(+), 2 deletions(-)
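As a rough illustration of the lookup described in the commit message, the sketch below picks the four shard-movement counters out of an already-decoded _cluster/health response. The helper name extract_health_metrics and the sample response values are hypothetical and are not part of wmfelastic.py. The sketch returns the miss count rather than incrementing self.errors, only so that it runs standalone.

```python
# Hypothetical standalone helper, not part of wmfelastic.py.
# The metric names are the ones listed in health_metrics in the patch.
HEALTH_METRICS = [
    "delayed_unassigned_shards",
    "unassigned_shards",
    "initializing_shards",
    "relocating_shards",
]


def extract_health_metrics(chealth, wanted=HEALTH_METRICS):
    """Return (metrics, missing) for the wanted keys of a health response.

    chealth is assumed to be the JSON body of _cluster/health, already
    decoded into a dict. missing counts the wanted keys that were absent.
    """
    metrics = {}
    missing = 0
    for name in wanted:
        try:
            metrics[name] = chealth[name]
        except KeyError:
            # A field may be absent from the response; count it and move on.
            missing += 1
    return metrics, missing


# Made-up sample response, for illustration only.
sample = {
    "status": "yellow",
    "relocating_shards": 2,
    "initializing_shards": 4,
    "unassigned_shards": 11,
}
metrics, missing = extract_health_metrics(sample)
print(sorted(metrics.items()), missing)
```

Keys that are not in the wanted list, such as status here, are ignored, so only numeric counters would reach publish().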
git pull ssh://gerrit.wikimedia.org:29418/operations/puppet refs/changes/43/259443/1
diff --git a/modules/elasticsearch/files/monitor/wmfelastic.py b/modules/elasticsearch/files/monitor/wmfelastic.py
index 15aa081..8a5d8f2 100644
--- a/modules/elasticsearch/files/monitor/wmfelastic.py
+++ b/modules/elasticsearch/files/monitor/wmfelastic.py
@@ -37,8 +37,18 @@
self.endpoints = {
'node': '_nodes/_local/stats',
- 'cluster': '_cluster/stats',
+ 'cluster_stats': '_cluster/stats',
+ 'cluster_health': '_cluster/health',
}
+
+ # Metrics provided at cluster level
+ # _cluster/health
+ self.health_metrics = [
+ "delayed_unassigned_shards",
+ "unassigned_shards",
+ "initializing_shards",
+ "relocating_shards",
+ ]
# Metrics provided at cluster level
# _cluster/stats
@@ -179,8 +189,19 @@
def dict_path(self, m, sep='.'):
return m.split(sep)
+ def cluster_health(self):
+ chealth = self._get(self.endpoints['cluster_health'])
+ gmetrics = {}
+ for metric in self.health_metrics:
+ try:
+ gmetrics[metric] = chealth[metric]
+ except KeyError:
+ # Metric missing from the health response; count it and move on
+ self.errors += 1
+ return gmetrics
+
def cluster_stats(self):
- cstats = self._get(self.endpoints['cluster'])
+ cstats = self._get(self.endpoints['cluster_stats'])
gmetrics = {}
for m in self.cluster_metrics:
depth = self.dict_path(m)
@@ -223,6 +244,9 @@
cluster_stats = self.cluster_stats()
for metric, value in cluster_stats.iteritems():
self.publish(metric, value)
+ health_stats = self.cluster_health()
+ for metric, value in health_stats.iteritems():
+ self.publish(metric, value)
# Remaining fall under the hostname context
self.config['path_prefix'] = self.o_prefix
--
To view, visit https://gerrit.wikimedia.org/r/259443
Gerrit-MessageType: newchange
Gerrit-Change-Id: I69788c5455115b5aa54167facfcb2dd83954e0bc
Gerrit-PatchSet: 1
Gerrit-Project: operations/puppet
Gerrit-Branch: production
Gerrit-Owner: EBernhardson <[email protected]>