Viraj Jasani created HBASE-29263:
------------------------------------

             Summary: Metrics for long running procedures
                 Key: HBASE-29263
                 URL: https://issues.apache.org/jira/browse/HBASE-29263
             Project: HBase
          Issue Type: Improvement
    Affects Versions: 2.6.2, 2.5.11, 3.0.0-beta-1
            Reporter: Viraj Jasani

As of today, we have the following procedure metrics:
* SubmittedCount: Counter
* Time: Histogram
* FailedCount: Counter

SubmittedCount is updated when a procedure is submitted for execution, whereas the Time histogram and FailedCount are updated only when the procedure terminates. Recent incidents such as HBASE-29251 showed that we have no metrics that indicate long-running or stuck procedures, and therefore nothing on which to create alerts.

The purpose of this Jira is to introduce metrics for long-running procedures. One possible approach is a chore that periodically checks how many procedures are currently executing and have exceeded a configurable time threshold.
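
The chore idea described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not a proposed patch: the class `LongRunningProcedureTracker`, its methods, and the threshold parameter are hypothetical names, and the sketch uses a plain `ScheduledExecutorService` in place of HBase's own chore and procedure executor classes.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical sketch: tracks start times of running procedures and exposes
 * a gauge of how many have been running longer than a configurable threshold.
 * None of these names come from the HBase code base.
 */
final class LongRunningProcedureTracker {
  // procId -> start timestamp in milliseconds
  private final Map<Long, Long> startTimes = new ConcurrentHashMap<>();
  // Assumed to be read from configuration in a real implementation.
  private final long thresholdMillis;
  // Last value computed by the periodic check; would back a metrics gauge.
  private final AtomicLong longRunningCount = new AtomicLong();

  LongRunningProcedureTracker(long thresholdMillis) {
    this.thresholdMillis = thresholdMillis;
  }

  /** Called when a procedure begins executing. */
  void onStart(long procId, long nowMillis) {
    startTimes.put(procId, nowMillis);
  }

  /** Called when a procedure terminates, successfully or not. */
  void onFinish(long procId) {
    startTimes.remove(procId);
  }

  /** Body of the periodic chore: recount procedures over the threshold. */
  long refresh(long nowMillis) {
    long count = startTimes.values().stream()
        .filter(start -> nowMillis - start > thresholdMillis)
        .count();
    longRunningCount.set(count);
    return count;
  }

  /** Value an alerting system would read. */
  long getLongRunningCount() {
    return longRunningCount.get();
  }

  /** Runs refresh() at a fixed period on the supplied scheduler. */
  ScheduledFuture<?> schedule(ScheduledExecutorService scheduler, long periodSeconds) {
    return scheduler.scheduleAtFixedRate(
        () -> refresh(System.currentTimeMillis()),
        periodSeconds, periodSeconds, TimeUnit.SECONDS);
  }
}
```

Because the count is recomputed on every run rather than incremented, it drops back on its own once a stuck procedure finishes, so an alert could be defined on the gauge staying above zero for some period.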