Prathyusha created HBASE-28522:
----------------------------------

             Summary: UNASSIGN proc indefinitely stuck on dead rs
                 Key: HBASE-28522
                 URL: https://issues.apache.org/jira/browse/HBASE-28522
             Project: HBase
          Issue Type: Improvement
          Components: proc-v2
            Reporter: Prathyusha


One scenario we noticed in production:

A DisableTableProc and an SCP (ServerCrashProcedure) were triggered at almost the same time.

2024-03-16 17:59:23,014 INFO [PEWorker-11] procedure.DisableTableProcedure - Set <TABLE_NAME> to state=DISABLING

2024-03-16 17:59:15,243 INFO [PEWorker-26] procedure.ServerCrashProcedure - Start pid=21592440, state=RUNNABLE:SERVER_CRASH_START, locked=true; ServerCrashProcedure <regionserver>, splitWal=true, meta=false

DisableTableProc creates UNASSIGN procs while the ASSIGNs of the SCP have not yet
completed:

{{2024-03-16 17:59:23,003 DEBUG [PEWorker-40] procedure2.ProcedureExecutor - LOCK_EVENT_WAIT pid=21594220, ppid=21592440, state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE; TransitRegionStateProcedure table=<TABLE_NAME>, region=<regionhash>, ASSIGN}}

The UNASSIGN created by DisableTableProc is stuck on the dead regionserver, and
we had to manually bypass the UNASSIGN of the DisableTableProc and then do the
ASSIGN.

If the UNASSIGN procedure stops retrying when there is an SCP for that server,
the loop is broken and no manual intervention is needed.
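
The proposed check could look roughly like the sketch below. It is only a toy model of the decision, not HBase code: UnassignRetrySketch, Action, nextAction, serversWithActiveScp and the server name strings are all hypothetical names made up for illustration, and how the real procedure would learn that an SCP exists for its target server is left open.

```java
import java.util.Set;

// Hypothetical model of the proposal; none of these names are HBase APIs.
public class UnassignRetrySketch {

    // What an UNASSIGN could do after a close attempt on its target server fails.
    enum Action { RETRY_CLOSE, YIELD_TO_SCP }

    /**
     * Decide the next step for an UNASSIGN whose close request failed.
     *
     * @param targetServer         server the region was last seen on (assumed string form)
     * @param serversWithActiveScp assumed view of servers that currently have an SCP
     */
    static Action nextAction(String targetServer, Set<String> serversWithActiveScp) {
        // If an SCP already covers the server, retrying the close cannot succeed,
        // so stop retrying and let the SCP handle the region.
        if (serversWithActiveScp.contains(targetServer)) {
            return Action.YIELD_TO_SCP;
        }
        // Otherwise the failure may be transient, so keep retrying.
        return Action.RETRY_CLOSE;
    }

    public static void main(String[] args) {
        // Made-up server names standing in for <regionserver>.
        Set<String> scpServers = Set.of("rs-dead.example.com");
        System.out.println(nextAction("rs-dead.example.com", scpServers));
        System.out.println(nextAction("rs-live.example.com", scpServers));
    }
}
```

In this sketch the retry loop ends as soon as the target server is known to have an SCP, which is the point at which the manual bypass was needed in the scenario above.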



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
