[jira] [Updated] (KUDU-1506) Add Consensus "follower lag" metrics

2018-02-16 Thread Grant Henke (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-1506:
--
Target Version/s:   (was: 1.5.0)

> Add Consensus "follower lag" metrics
> 
>
> Key: KUDU-1506
> URL: https://issues.apache.org/jira/browse/KUDU-1506
> Project: Kudu
> Issue Type: New Feature
> Components: consensus, metrics, supportability
> Affects Versions: 0.9.0
> Reporter: Mike Percy
> Priority: Major
>
> It would be useful to have metrics that measure the lag time between leader 
> WAL writes and follower WAL writes. Consider a node in a cluster that has a 
> very slow disk or is extremely overloaded. That node may constantly fall 
> behind, repeatedly remote bootstrap, or both. Being able to monitor for nodes 
> that are consistently far behind the leader (by many seconds or minutes) 
> would let administrators investigate these slow machines and either remove 
> them from the cluster or fix the underlying issues.
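
As a rough illustration of the metric requested above, the sketch below computes a per-follower lag in seconds and flags followers that exceed a threshold. It is not Kudu code: the function names, the dictionary of leader WAL write times keyed by operation index, and the 30-second default threshold are all assumptions made for this example. Lag is taken here to be the age of the oldest operation the leader has written that the follower has not yet written.

```python
from typing import Dict, List


def follower_lag_seconds(
    leader_write_times: Dict[int, float],
    follower_last_index: int,
    now_s: float,
) -> float:
    """Return how far (in seconds) a follower trails the leader.

    leader_write_times maps an operation index to the time (in seconds)
    at which the leader wrote that operation to its WAL. This layout is
    hypothetical and chosen only for the example.
    """
    # Operations the leader has written but the follower has not.
    pending = [
        written_at
        for index, written_at in leader_write_times.items()
        if index > follower_last_index
    ]
    if not pending:
        return 0.0  # Follower is fully caught up.
    # Lag is the age of the oldest operation still missing on the follower.
    return max(0.0, now_s - min(pending))


def lagging_followers(
    leader_write_times: Dict[int, float],
    follower_indexes: Dict[str, int],
    now_s: float,
    threshold_s: float = 30.0,  # Assumed alerting threshold.
) -> List[str]:
    """Return the names of followers whose lag exceeds threshold_s."""
    return sorted(
        name
        for name, last_index in follower_indexes.items()
        if follower_lag_seconds(leader_write_times, last_index, now_s) > threshold_s
    )
```

An administrator-facing check could call `lagging_followers` periodically and alert on any non-empty result. A production metric would more likely be exported as a gauge or histogram per replica rather than computed from a full map of write times.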



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KUDU-1506) Add Consensus "follower lag" metrics

2018-02-16 Thread Grant Henke (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-1506:
--
Component/s: supportability



