[jira] [Updated] (KUDU-1506) Add Consensus "follower lag" metrics
[ https://issues.apache.org/jira/browse/KUDU-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Henke updated KUDU-1506:
------------------------------
    Target Version/s:   (was: 1.5.0)

> Add Consensus "follower lag" metrics
> ------------------------------------
>
>                 Key: KUDU-1506
>                 URL: https://issues.apache.org/jira/browse/KUDU-1506
>             Project: Kudu
>          Issue Type: New Feature
>          Components: consensus, metrics, supportability
>    Affects Versions: 0.9.0
>            Reporter: Mike Percy
>            Priority: Major
>
> It would be useful to have metrics that measured the lag time between leader
> WAL writes and follower WAL writes. Imagine if a node on a cluster had a very
> slow disk or was extremely overloaded. That node may constantly be falling
> behind and/or remote bootstrapping. It would help to be able to monitor for
> nodes that were constantly very far behind the leader (high seconds or
> minutes) so that administrators could take a look at these slow machines and
> either remove them from the cluster or fix the underlying issues.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Updated] (KUDU-1506) Add Consensus "follower lag" metrics
[ https://issues.apache.org/jira/browse/KUDU-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Henke updated KUDU-1506:
------------------------------
    Component/s: supportability

> Add Consensus "follower lag" metrics
> ------------------------------------