[ https://issues.apache.org/jira/browse/HBASE-28522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Duo Zhang resolved HBASE-28522.
-------------------------------
Fix Version/s: 2.7.0, 3.0.0-beta-2, 2.6.1, 2.5.11
Hadoop Flags: Reviewed
Assignee: Duo Zhang (was: Prathyusha)
Resolution: Fixed
Pushed to all active branches.
Thanks all for helping and reviewing!
> UNASSIGN proc indefinitely stuck on dead rs
> -------------------------------------------
>
> Key: HBASE-28522
> URL: https://issues.apache.org/jira/browse/HBASE-28522
> Project: HBase
> Issue Type: Improvement
> Components: proc-v2, Region Assignment
> Reporter: Prathyusha
> Assignee: Duo Zhang
> Priority: Critical
> Labels: pull-request-available
> Fix For: 2.7.0, 3.0.0-beta-2, 2.6.1, 2.5.11
>
> Attachments: timeline.jpg
>
>
> One scenario we noticed in production:
> a DisableTableProc and an SCP were triggered at almost the same time
> 2024-03-16 17:59:23,014 INFO [PEWorker-11] procedure.DisableTableProcedure - Set <TABLE_NAME> to state=DISABLING
> 2024-03-16 17:59:15,243 INFO [PEWorker-26] procedure.ServerCrashProcedure - Start pid=21592440, state=RUNNABLE:SERVER_CRASH_START, locked=true; ServerCrashProcedure <regionserver>, splitWal=true, meta=false
> DisableTableProc creates UNASSIGN procs, and at this time the ASSIGNs of the SCP
> are not yet completed
> {{2024-03-16 17:59:23,003 DEBUG [PEWorker-40] procedure2.ProcedureExecutor - LOCK_EVENT_WAIT pid=21594220, ppid=21592440, state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE; TransitRegionStateProcedure table=<TABLE_NAME>, region=<regionhash>, ASSIGN}}
> The UNASSIGN created by DisableTableProc got stuck on the dead regionserver, and
> we had to manually bypass the UNASSIGN of DisableTableProc and then do the ASSIGN.
> If the UNASSIGN procedure could break out of its retry loop when there is an SCP
> for that server, would we avoid the manual intervention? At the least, could
> DisableTableProc go to a rollback state?
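
The suggestion in the description can be illustrated with a minimal sketch. It is not HBase code and not the fix that was committed for this issue. The class, enum, and method names (`UnassignRetrySketch`, `Action`, `decide`) and the server names are hypothetical. It models only the decision the reporter asks for: before retrying a close request against a regionserver, check whether an SCP already exists for that server and, if so, stop retrying.

```java
import java.util.Set;

// Hypothetical model, not HBase code: decides what an UNASSIGN should do
// when the regionserver it targets is not responding.
public class UnassignRetrySketch {

    enum Action { RETRY_CLOSE, YIELD_TO_SCP }

    // serversWithScp is an assumed input: the set of servers for which a
    // crash procedure has already been scheduled.
    static Action decide(String targetServer, Set<String> serversWithScp) {
        if (serversWithScp.contains(targetServer)) {
            // The server is known to be dead, so retrying the close cannot succeed.
            return Action.YIELD_TO_SCP;
        }
        // No crash procedure exists; the failure may be transient, so keep retrying.
        return Action.RETRY_CLOSE;
    }

    public static void main(String[] args) {
        Set<String> serversWithScp = Set.of("rs-dead");
        System.out.println(decide("rs-dead", serversWithScp));
        System.out.println(decide("rs-live", serversWithScp));
    }
}
```

In this sketch the check runs on every retry attempt, so an UNASSIGN that started before the crash was detected would still stop retrying once the SCP is registered.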