Viraj Jasani created HBASE-29263:
------------------------------------

             Summary: Metrics for long running procedures
                 Key: HBASE-29263
                 URL: https://issues.apache.org/jira/browse/HBASE-29263
             Project: HBase
          Issue Type: Improvement
    Affects Versions: 2.6.2, 2.5.11, 3.0.0-beta-1
            Reporter: Viraj Jasani


As of today, the procedure metrics we have include:
 * SubmittedCount: Counter
 * Time: Histogram
 * FailedCount: Counter

While the SubmittedCount is updated when the given procedure is submitted for 
execution, the Time histogram and FailedCount metrics are updated upon the 
termination of the procedures.

With recent incidents like HBASE-29251, we have realized that we don't have 
metrics to indicate long running or stuck procedures on which we can create 
alerts.

The purpose of this Jira is to introduce metrics for long running procedures. 
One possible way to introduce such a metric is a chore that periodically 
checks how many procedures are currently executing and have been running 
longer than a configurable time threshold.
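
As a rough illustration of the check such a chore might perform, here is a 
minimal sketch in plain Java. It is not taken from HBase: the class name, the 
method name, and the map of procedure id to start time are all hypothetical, 
and a real implementation would read the running procedures from the 
procedure executor and publish the count as a gauge.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of HBase: counts running procedures whose
// elapsed time exceeds a configurable threshold.
final class LongRunningProcedureGauge {

    private LongRunningProcedureGauge() {
    }

    /**
     * @param startTimesMs assumed map of procedure id to start time (epoch millis)
     *                     for procedures that are still executing
     * @param nowMs        current time in epoch millis
     * @param thresholdMs  configurable duration after which a procedure is
     *                     considered long running
     * @return number of procedures running strictly longer than thresholdMs
     */
    static long countLongRunning(Map<Long, Long> startTimesMs, long nowMs, long thresholdMs) {
        long count = 0;
        for (long startMs : startTimesMs.values()) {
            if (nowMs - startMs > thresholdMs) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Map<Long, Long> running = new HashMap<>();
        running.put(1L, 10_000L); // running for 90s at nowMs = 100_000
        running.put(2L, 70_000L); // running for 30s
        running.put(3L, 40_000L); // running for exactly 60s, not over the threshold
        // A periodic chore would call this on each run and set a gauge to the result.
        System.out.println(countLongRunning(running, 100_000L, 60_000L));
    }
}
```

A periodic task (for example one built on HBase's ScheduledChore, or a plain 
ScheduledExecutorService) could call countLongRunning on each run and expose 
the result as a gauge, which an alert can then compare against zero.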



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
