Prathyusha created HBASE-28522:
----------------------------------
Summary: UNASSIGN proc indefinitely stuck on dead rs
Key: HBASE-28522
URL: https://issues.apache.org/jira/browse/HBASE-28522
Project: HBase
Issue Type: Improvement
Components: proc-v2
Reporter: Prathyusha
We noticed the following scenario in production:
a DisableTableProc and an SCP were triggered at almost the same time
2024-03-16 17:59:23,014 INFO [PEWorker-11] procedure.DisableTableProcedure -
Set <TABLE_NAME> to state=DISABLING
2024-03-16 17:59:15,243 INFO [PEWorker-26] procedure.ServerCrashProcedure -
Start pid=21592440, state=RUNNABLE:SERVER_CRASH_START, locked=true;
ServerCrashProcedure
<regionserver>, splitWal=true, meta=false
DisableTableProc creates UNASSIGN procs while the ASSIGNs of the SCP have not
yet completed
{{2024-03-16 17:59:23,003 DEBUG [PEWorker-40] procedure2.ProcedureExecutor -
LOCK_EVENT_WAIT pid=21594220, ppid=21592440,
state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE;
TransitRegionStateProcedure table=<TABLE_NAME>, region=<regionhash>, ASSIGN}}
The UNASSIGN created by DisableTableProc is stuck on the dead regionserver, and
we had to manually bypass the UNASSIGN of {{DisableTableProc}} and then do the
ASSIGN.
If the UNASSIGN procedure can break out of its retry loop when there is an SCP
for that server, no manual intervention is needed.
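
The proposed check could look roughly like the toy model below. It is only a sketch of the idea, not HBase code: the class UnassignRetrySketch, the CrashTracker helper, the Decision enum and the method onCloseFailure are all hypothetical names made up for illustration, and the real change would have to live in the proc-v2 / assignment code.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model of the proposal. Every name here is hypothetical, not an HBase API.
class UnassignRetrySketch {

    /** What the UNASSIGN step decides after a failed attempt to close the region. */
    enum Decision { RETRY_WITH_BACKOFF, YIELD_TO_SCP }

    /** Hypothetical stand-in for the master's knowledge of servers with an SCP in flight. */
    static final class CrashTracker {
        private final Set<String> serversWithScp = new HashSet<>();

        void scpStarted(String server) { serversWithScp.add(server); }

        void scpFinished(String server) { serversWithScp.remove(server); }

        boolean hasScp(String server) { return serversWithScp.contains(server); }
    }

    /**
     * Decide whether to keep retrying the close against targetServer.
     * If an SCP already exists for that server, retrying can never succeed,
     * so the procedure stops retrying and leaves the region to the SCP.
     */
    static Decision onCloseFailure(String targetServer, CrashTracker tracker) {
        if (tracker.hasScp(targetServer)) {
            return Decision.YIELD_TO_SCP;
        }
        return Decision.RETRY_WITH_BACKOFF;
    }

    public static void main(String[] args) {
        CrashTracker tracker = new CrashTracker();
        tracker.scpStarted("dead-rs"); // placeholder server name

        System.out.println(onCloseFailure("dead-rs", tracker)); // SCP exists for this server
        System.out.println(onCloseFailure("live-rs", tracker)); // no SCP, normal retry path
    }
}
```

In this model the decision is taken on every failed attempt, so an SCP that starts after the UNASSIGN was scheduled is still noticed on the next retry.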
--
This message was sent by Atlassian Jira
(v8.20.10#820010)