[ https://issues.apache.org/jira/browse/HBASE-25881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17350243#comment-17350243 ]
Anoop Sam John commented on HBASE-25881:
----------------------------------------

Do you see this as a 1.x-only issue, or is it applicable to all versions? I feel it is the latter. Looking forward to the patch.

> Create a chore to update age related metrics.
> ---------------------------------------------
>
> Key: HBASE-25881
> URL: https://issues.apache.org/jira/browse/HBASE-25881
> Project: HBase
> Issue Type: Improvement
> Components: Replication
> Reporter: Rushabh Shah
> Assignee: Rahul Kumar
> Priority: Major
>
> We had a case where the logRoller and ReplicationShipper threads were stuck for a
> day because some other thread was holding the lock.
> We did not roll the WAL for 1 day, and we did not ship any edits for 1 day.
> Still, the oldestWalAge and age of last ship metrics did not spike as they
> should have.
> The way we calculate any age-related metric is to take the difference between
> the current time and the time at which an event happened, and add that value to
> the metrics framework. We lose the event timestamp at that point.
> If the thread populating the metric is stuck, we carry forward the same value
> forever. This makes it look like there is no problem in the system. In this case
> the oldestWalAge metric was stuck at a value of 809 and the age of last ship
> metric was 0 the whole time, so no PD alert was fired.
> From Andrew Purtell:
> We have the Chore/ScheduledChore framework. We could be making more use of
> it. Much of this is legacy, from before Chore was formalized as it is today.
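The failure mode in the issue and the chore-based fix it proposes can be sketched in Java. This is a minimal sketch under stated assumptions, not the HBase patch. The class and method names (`AgeMetricChoreSketch`, `AgeTracker`, `Gauge`, `onEvent`, `refresh`, `startChore`) are hypothetical. A plain `ScheduledExecutorService` stands in for HBase's `ScheduledChore`/`ChoreService` so the example runs without HBase on the classpath.

The key idea is that the tracker keeps the raw event timestamp instead of only the precomputed age. A separate periodic task can then recompute the age even while the worker thread is stuck.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

public class AgeMetricChoreSketch {

  /** Hypothetical stand-in for a metrics gauge; not an HBase or Hadoop metrics class. */
  public static final class Gauge {
    private final AtomicLong value = new AtomicLong();

    public void set(long v) { value.set(v); }

    public long get() { return value.get(); }
  }

  /**
   * Hypothetical tracker that keeps the raw event timestamp rather than only a
   * precomputed age, so any thread can recompute the age later.
   */
  public static final class AgeTracker {
    private final LongSupplier clockMillis;
    private final AtomicLong lastEventMillis;
    private final Gauge ageGauge = new Gauge();

    public AgeTracker(LongSupplier clockMillis) {
      this.clockMillis = clockMillis;
      this.lastEventMillis = new AtomicLong(clockMillis.getAsLong());
    }

    /** Called by the worker thread (e.g. a shipper) each time the event happens. */
    public void onEvent() {
      lastEventMillis.set(clockMillis.getAsLong());
      refresh();
    }

    /** Recomputes the age from the stored timestamp; safe to call from the chore thread. */
    public void refresh() {
      // Read the timestamp before the clock so that, with a monotonic clock,
      // a concurrent onEvent() cannot make the computed age negative.
      long last = lastEventMillis.get();
      ageGauge.set(clockMillis.getAsLong() - last);
    }

    /** The value a metrics sink would report. */
    public long age() { return ageGauge.get(); }
  }

  /** Runs tracker.refresh() on a fixed period; a stand-in for an HBase ScheduledChore. */
  public static ScheduledExecutorService startChore(AgeTracker tracker, long periodMillis) {
    ScheduledExecutorService exec = Executors.newSingleThreadScheduledExecutor(r -> {
      Thread t = new Thread(r, "age-metrics-chore");
      t.setDaemon(true); // do not keep the JVM alive just for metrics
      return t;
    });
    exec.scheduleAtFixedRate(tracker::refresh, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    return exec;
  }

  public static void main(String[] args) {
    // Manual clock so the example is deterministic.
    AtomicLong now = new AtomicLong(1_000L);
    AgeTracker tracker = new AgeTracker(now::get);

    tracker.onEvent();          // event at t=1000, age 0
    now.addAndGet(86_400_000L); // one day passes while the worker is stuck
    System.out.println("before chore run: " + tracker.age()); // 0 (stale)
    tracker.refresh();          // what each chore run does
    System.out.println("after chore run: " + tracker.age());  // 86400000
  }
}
```

Running `main` prints a stale age of 0 before the refresh and 86400000 (one day in milliseconds) after it. The stale 0 corresponds to the stuck "age of last ship" value reported in the issue. In HBase itself, the periodic refresh would presumably live in a `ScheduledChore` subclass whose `chore()` method does the equivalent of `refresh()`.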