[ https://issues.apache.org/jira/browse/HBASE-11143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995242#comment-13995242 ]
Lars Hofhansl edited comment on HBASE-11143 at 5/12/14 5:22 PM:
----------------------------------------------------------------

+1, and can we get the new metric in 0.96+?

was (Author: jdcryans):
+1, and can we get the new metric in 0.96+?

> ageOfLastShippedOp metric is confusing
> --------------------------------------
>
>                 Key: HBASE-11143
>                 URL: https://issues.apache.org/jira/browse/HBASE-11143
>             Project: HBase
>          Issue Type: Bug
>          Components: Replication
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.94.20
>
>         Attachments: 11143-0.94-v2.txt, 11143-0.94.txt
>
>
> We are trying to report on replication lag and find that there is no good
> single metric to do that.
> ageOfLastShippedOp is close, but unfortunately it is increased even when
> there is nothing to ship on a particular RegionServer.
> I would like to discuss a few options here:
> Add a new metric: replicationQueueTime (or something) with the above meaning.
> I.e. if we have something to ship we set it to the age of the last shipped
> edit; if we fail we keep incrementing that age (just like we do now). But if
> there is nothing to replicate we set the timestamp to the current time (and
> hence the metric is reported as close to 0).
> Alternatively we could change the meaning of ageOfLastShippedOp to do that.
> That might lead to surprises, but the current behavior is clearly weird
> when there is nothing to replicate.
> Comments? [~jdcryans], [~stack].
> If the approach sounds good, I'll make a patch for all branches.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
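
For readers following the thread, the three cases described for the proposed replicationQueueTime metric can be sketched roughly as below. This is only an illustration of the semantics as worded in the issue description, not the attached patch: the class name, method names, and the choice to pass the current time in as a parameter are all assumptions made for the example.

```java
// Hypothetical sketch of the proposed metric semantics; not HBase code.
final class ReplicationQueueTimeSketch {
    // Timestamp (ms) the age is measured against; hypothetical field.
    private long lastTimestampMs;
    // Value that would be reported as the metric.
    private long ageMs;

    ReplicationQueueTimeSketch(long nowMs) {
        this.lastTimestampMs = nowMs;
        this.ageMs = 0L;
    }

    // Case 1: something was shipped, so report the age of that edit.
    void editShipped(long editWriteTimeMs, long nowMs) {
        lastTimestampMs = editWriteTimeMs;
        ageMs = nowMs - lastTimestampMs;
    }

    // Case 2: shipping failed, so the age keeps growing against the
    // last recorded timestamp.
    void shipFailed(long nowMs) {
        ageMs = nowMs - lastTimestampMs;
    }

    // Case 3: nothing to replicate, so reset the timestamp to "now",
    // which makes the reported age 0.
    void nothingToShip(long nowMs) {
        lastTimestampMs = nowMs;
        ageMs = 0L;
    }

    long getAgeMs() {
        return ageMs;
    }
}
```

The point of the third case is that an idle RegionServer no longer reports an ever-growing age, so a large value should only show up when edits are actually queued or failing to ship.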