Ravi Kishore Valeti created HBASE-28834:
-------------------------------------------
Summary: Procedure queues & PE pool metrics
Key: HBASE-28834
URL: https://issues.apache.org/jira/browse/HBASE-28834
Project: HBase
Issue Type: Improvement
Reporter: Ravi Kishore Valeti
While investigating a production incident, we observed that some procedures
were created but never executed until an HMaster failover.
- master-2 was active & rs-1 was holding meta
- 18:40 - About 80 RSs were reported as crashed; SCPs were created & were
being executed
- 19:51, balancer decided to move Meta region to another RS -> TRSP created ->
Meta region went offline
- 19:52, RS carrying meta crashed -> SCP created
- 19:52 - Both the TRSP & the SCP appeared stuck (not executing); there were
no further logs related to these procedures
- 21:09 - Master failed over from master-2 to master-3
- Procs were loaded from store & attached.
- 21:17 - The TRSP for meta completed and meta came back online.
I will post the logs shortly.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)