Viraj Jasani created HBASE-29263:
------------------------------------
Summary: Metrics for long running procedures
Key: HBASE-29263
URL: https://issues.apache.org/jira/browse/HBASE-29263
Project: HBase
Issue Type: Improvement
Affects Versions: 2.6.2, 2.5.11, 3.0.0-beta-1
Reporter: Viraj Jasani
As of today, the procedure metrics we have include:
* SubmittedCount: Counter
* Time: Histogram
* FailedCount: Counter
SubmittedCount is updated when a procedure is submitted for execution,
whereas the Time histogram and FailedCount are updated only when the
procedure terminates.
Recent incidents such as HBASE-29251 have shown that we have no metrics that
indicate long-running or stuck procedures, and therefore nothing on which to
build alerts.
The purpose of this Jira is to introduce metrics for long running procedures.
One possible way to introduce such a metric is a chore that periodically
checks how many procedures are currently executing and have been running
longer than a configurable duration.
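
To make the chore idea concrete, here is a rough, standalone sketch in plain
Java. It does not use HBase's actual procedure or metrics classes. The class
name LongRunningProcedureGauge, its methods, and the threshold and period
parameters are all hypothetical and only illustrate the counting logic: track
when each procedure started, and periodically count those still running past
the threshold.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: none of these names exist in HBase.
public class LongRunningProcedureGauge {
  // procId -> start timestamp in milliseconds, for procedures still executing
  private final Map<Long, Long> runningSince = new ConcurrentHashMap<>();
  // Assumed to come from configuration in a real implementation
  private final long thresholdMillis;
  // Latest value computed by the periodic check; this is what would be exported
  private final AtomicLong longRunningCount = new AtomicLong();

  public LongRunningProcedureGauge(long thresholdMillis) {
    this.thresholdMillis = thresholdMillis;
  }

  // Record that a procedure began executing at the given time
  public void onStart(long procId, long nowMillis) {
    runningSince.put(procId, nowMillis);
  }

  // Forget a procedure once it terminates (success or failure)
  public void onFinish(long procId) {
    runningSince.remove(procId);
  }

  // The periodic check: count procedures running longer than the threshold
  public long refresh(long nowMillis) {
    long count = runningSince.values().stream()
        .filter(start -> nowMillis - start > thresholdMillis)
        .count();
    longRunningCount.set(count);
    return count;
  }

  public long getLongRunningCount() {
    return longRunningCount.get();
  }

  // Run refresh() at a fixed period on a daemon thread, like a chore would
  public ScheduledExecutorService schedule(long periodMillis) {
    ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor(r -> {
      Thread t = new Thread(r, "long-running-procedure-chore");
      t.setDaemon(true);
      return t;
    });
    ses.scheduleAtFixedRate(() -> refresh(System.currentTimeMillis()),
        periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    return ses;
  }
}
```

Unlike the Time histogram, a gauge computed this way changes while a
procedure is still executing, so an alert could fire on a stuck procedure
before it terminates.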