[ 
https://issues.apache.org/jira/browse/HBASE-28690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17869873#comment-17869873
 ] 

Aman Poonia commented on HBASE-28690:
-------------------------------------


{noformat}
 I was also curious apart from cluster shutdown, in what cases regionNode will 
not have the procedure. Why didn't we throw exception in cases apart from 
cluster shutdown? 
{noformat}


This is to preserve idempotency. Suppose the RS reports to the master that the 
proc has finished and the master updates the state, but before the master can 
respond, a second report for the same proc arrives from the RS (for example, a 
retry caused by some network issue). We don't want to fail that second report, 
because we know we have already cleared the proc from regionNode. I hope this 
clears up the doubt. 
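
To make the retry case concrete, here is a minimal sketch. It is not the HBase 
code: the class IdempotentReportSketch, its map, and the attach method are 
hypothetical stand-ins for regionNode and the procedure attached to it. It only 
shows why a report that finds no attached procedure returns false instead of 
throwing.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for the master-side bookkeeping; not HBase code.
class IdempotentReportSketch {
  // Region name -> id of the procedure currently attached to that region.
  private final Map<String, Long> attachedProc = new HashMap<>();

  void attach(String region, long procId) {
    attachedProc.put(region, procId);
  }

  // Returns true if a procedure was attached and has now been cleared,
  // false if nothing was attached (for example, a retried report).
  boolean reportTransition(String region, long procId) {
    Long attached = attachedProc.remove(region);
    if (attached == null) {
      // Duplicate or late report: log it and carry on, do not fail the RPC.
      System.out.println("No matching procedure for " + region + ", pid=" + procId);
      return false;
    }
    return true;
  }

  public static void main(String[] args) {
    IdempotentReportSketch master = new IdempotentReportSketch();
    master.attach("888a715d", 16276470L);
    // The first report clears the attached procedure.
    System.out.println(master.reportTransition("888a715d", 16276470L));
    // A retry of the same report is tolerated and simply returns false.
    System.out.println(master.reportTransition("888a715d", 16276470L));
  }
}
```

The second call is the retried report described above: the state was already 
cleared by the first call, so the sketch logs and returns false rather than 
raising an error.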

 

> Aborting Active HMaster is not rejecting reportRegionStateTransition if 
> procedure is initialised by next Active master
> ----------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-28690
>                 URL: https://issues.apache.org/jira/browse/HBASE-28690
>             Project: HBase
>          Issue Type: Bug
>          Components: proc-v2
>    Affects Versions: 2.5.8
>            Reporter: Umesh Kumar Kumawat
>            Assignee: Umesh Kumar Kumawat
>            Priority: Major
>              Labels: pull-request-available
>
> A CloseRegionProcedure on the master asks the RS to close the region. After 
> closing the region, the RS reports the RegionStateTransition back 
> ([here|https://github.com/apache/hbase/blob/d1015a68ed9f94d74668abd37edefd32f5e9305b/hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterRpcServices.java#L1853]).
>  On receiving the report, the master checks if regionNode has any procedure 
> assigned to it 
> ([code|https://github.com/apache/hbase/blob/d1015a68ed9f94d74668abd37edefd32f5e9305b/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java#L1294]).
>  
>  
> {code:java}
>   private boolean reportTransition(RegionStateNode regionNode, ServerStateNode serverNode,
>     TransitionCode state, long seqId, long procId) throws IOException {
>     ServerName serverName = serverNode.getServerName();
>     TransitRegionStateProcedure proc = regionNode.getProcedure();
>     if (proc == null) {
>       return false;
>     }
>     proc.reportTransition(master.getMasterProcedureExecutor().getEnvironment(), regionNode,
>       serverName, state, seqId, procId);
>     return true;
>   }
> {code}
> If regionNode doesn't have any procedure, the master just logs it and does 
> not return an error to the RPC caller. 
>  
> Consider a master failover in which the new active master has only just 
> initialized the TRSP and CloseRegionProcedure. The aborting master now holds 
> stale data. If the transition report reaches the aborting master, it is not 
> rejected, and the procedure on the new active master gets stuck. 
>  
> *Logs for more understanding* 
> active master server4-1 failing
> {noformat}
> 2024-06-20 04:45:05,576 ERROR 
> [iority.RWQ.Fifo.write.handler=3,queue=0,port=61000] master.HMaster - ***** 
> ABORTING master server4-1,61000,1715413775736: Failed to record region server 
> as started *****{noformat}
> *logs of new active master server5-1*
>  
> {noformat}
> 2024-06-20 04:49:28,893 DEBUG [aster/server5-1:61000:becomeActiveMaster] 
> assignment.RegionStateStore - Load hbase:meta entry 
> region=888a715d5926adbb89c985d8967f40d4, regionState=OPEN, 
> lastHost=server1-119,61020,1717560166420, 
> regionLocation=server1-119,61020,1717560166420, openSeqNum=34892620
> 2024-06-20 04:49:51,886 INFO [PEWorker-22] procedure2.ProcedureExecutor - 
> Initialized subprocedures=[{pid=16276416, ppid=16276108, 
> state=RUNNABLE:REGION_STATE_TRANSITION_CLOSE; TransitRegionStateProcedure 
> table=RIMBS.UPLOADER_JOB_DETAILS, region=888a715d5926adbb89c985d8967f40d4, 
> UNASSIGN}]  (on server5-1)
> 2024-06-20 04:49:52,022 INFO [PEWorker-40] procedure2.ProcedureExecutor - 
> Initialized subprocedures=[{pid=16276470, ppid=16276416, state=RUNNABLE; 
> CloseRegionProcedure 888a715d5926adbb89c985d8967f40d4, 
> server=server1-119,61020,1717560166420}] (on server5-1){noformat}
>  
> *RS logs for closing* 
> {noformat}
> 2024-06-20 04:49:52,267 INFO [_REGION-regionserver/server1-119:61020-2] 
> handler.UnassignRegionHandler - Close 888a715d5926adbb89c985d8967f40d4
> 2024-06-20 04:49:52,267 DEBUG [_REGION-regionserver/server1-119:61020-2] 
> regionserver.HRegion - Closing 888a715d5926adbb89c985d8967f40d4, disabling 
> compactions & flushes
> 2024-06-20 04:49:52,354 INFO [_REGION-regionserver/server1-119:61020-2] 
> regionserver.HRegion - Closed 
> TABLE,KW\x00na240-app1-16\x00/Events-120620231740\x00MARKER-Events,1702619592612.888a715d5926adbb89c985d8967f40d4.
> {noformat}
> *Logs of report on aborting active HMaster*
> {noformat}
> 2024-06-20 04:49:52,355 WARN 
> [iority.RWQ.Fifo.write.handler=1,queue=0,port=61000] 
> assignment.AssignmentManager - No matching procedure found for 
> server1-119,61020,1717560166420 transition on state=OPEN, 
> location=server1-119,61020,1717560166420, table=RIMBS.UPLOADER_JOB_DETAILS, 
> region=888a715d5926adbb89c985d8967f40d4 to CLOSED ( host = server4-1 , 
> hbaseMasterLogFile){noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
