[ 
https://issues.apache.org/jira/browse/HBASE-25881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17350243#comment-17350243
 ] 

Anoop Sam John commented on HBASE-25881:
----------------------------------------

Do you see this as a 1.x-only issue, or is it applicable to all versions?  I 
think the latter.  Looking forward to the patch.

> Create a chore to update age related metrics.
> ---------------------------------------------
>
>                 Key: HBASE-25881
>                 URL: https://issues.apache.org/jira/browse/HBASE-25881
>             Project: HBase
>          Issue Type: Improvement
>          Components: Replication
>            Reporter: Rushabh Shah
>            Assignee: Rahul Kumar
>            Priority: Major
>
> We had a case where the logRoller and ReplicationShipper threads were stuck 
> for a day because another thread was holding the lock.
> We did not roll the WAL for a day and we did not ship any edits for a day.
> Still, the oldestWalAge and age of last ship metrics did not spike as they 
> should have.
> The way we calculate any age-related metric is to take the difference between 
> the current time and the time at which the event happened, and push that 
> difference to the metrics framework. We lose the event timestamp at that point.
> If the thread populating the metric is stuck, the same value is carried 
> forward forever, which makes it look like there is no problem in the system. 
> In this case the oldestWalAge metric was stuck at 809 and the age of last 
> ship metric was 0 the whole time, so no PD alert was fired.
> From Andrew Purtell:
> We have the Chore/ScheduledChore framework. We could be making more use of 
> it. Much of this is legacy, before Chore was formalized as it is today.
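
To make the failure mode in the description concrete, here is a minimal sketch 
of the idea: keep the event timestamp and let an independent periodic task 
recompute the age, so the value keeps growing even when the worker thread is 
stuck. This is not the patch for this issue. The class AgeMetricSketch and all 
of its method names are hypothetical, and a plain ScheduledExecutorService 
stands in for HBase's Chore/ScheduledChore framework.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/** Hypothetical sketch: store the event timestamp, derive the age on each refresh. */
class AgeMetricSketch {
  private final LongSupplier clock;        // injectable clock, e.g. System::currentTimeMillis
  private final AtomicLong lastEventTs;    // time of the last event (e.g. last ship)
  private final AtomicLong publishedAgeMs = new AtomicLong(); // value a metrics system would read

  AgeMetricSketch(LongSupplier clock) {
    this.clock = clock;
    this.lastEventTs = new AtomicLong(clock.getAsLong());
  }

  /** Called by the worker thread (shipper, log roller) when the event happens. */
  void recordEvent() {
    lastEventTs.set(clock.getAsLong());
  }

  /** Called by an independent periodic task; does not depend on the worker making progress. */
  void refresh() {
    publishedAgeMs.set(clock.getAsLong() - lastEventTs.get());
  }

  long getPublishedAgeMs() {
    return publishedAgeMs.get();
  }

  /** Stand-in for a chore: refreshes the metric every periodMs on its own daemon thread. */
  static ScheduledExecutorService startRefresher(AgeMetricSketch metric, long periodMs) {
    ScheduledExecutorService pool = Executors.newSingleThreadScheduledExecutor(r -> {
      Thread t = new Thread(r, "age-metric-refresher");
      t.setDaemon(true);
      return t;
    });
    pool.scheduleAtFixedRate(metric::refresh, periodMs, periodMs, TimeUnit.MILLISECONDS);
    return pool;
  }
}
```

With this shape, a stuck worker simply stops calling recordEvent(), while the 
refresher keeps increasing the published age, which is what an alert on the 
metric needs in order to fire.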



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
