Umesh Kumar Kumawat created HBASE-28690:
-------------------------------------------

             Summary: Aborting Active HMaster is not rejecting 
reportRegionStateTransition if procedure is initialised by next Active master
                 Key: HBASE-28690
                 URL: https://issues.apache.org/jira/browse/HBASE-28690
             Project: HBase
          Issue Type: Bug
          Components: proc-v2
    Affects Versions: 2.5.8
            Reporter: Umesh Kumar Kumawat


A {{CloseRegionProcedure}} on the master requests the RS to close the region, and after closing the region the RS reports the RegionStateTransition back ([here|https://github.com/apache/hbase/blob/d1015a68ed9f94d74668abd37edefd32f5e9305b/hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterRpcServices.java#L1853]). On receiving the report, the master checks whether the {{regionNode}} has any procedure assigned to it ([code|https://github.com/apache/hbase/blob/d1015a68ed9f94d74668abd37edefd32f5e9305b/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java#L1294]).
 

 
{code:java}
  private boolean reportTransition(RegionStateNode regionNode, ServerStateNode serverNode,
    TransitionCode state, long seqId, long procId) throws IOException {
    ServerName serverName = serverNode.getServerName();
    TransitRegionStateProcedure proc = regionNode.getProcedure();
    if (proc == null) {
      return false;
    }
    proc.reportTransition(master.getMasterProcedureExecutor().getEnvironment(), regionNode,
      serverName, state, seqId, procId);
    return true;
  }
{code}
If {{regionNode}} doesn't have any procedure, the master just logs a warning and doesn't return any error to the RPC caller.

 

Consider a case where a master failover is in progress and the TRSP and CloseRegionProcedure were initialized only by the new active master. The aborting master now holds stale data. If the transition report reaches the aborting master, that master does not reject it, and this causes the procedure to get stuck.
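
The scenario above can be illustrated with a toy model. This is only a sketch: {{ToyMaster}}, {{reportTransitionLenient}}, {{reportTransitionStrict}} and {{deliverReport}} are hypothetical names, not HBase classes or methods, and the sketch assumes that the RS retries a rejected report against the next active master, which this report does not confirm.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

/** Toy model only; none of these types are the real HBase classes. */
public class ReportGuardSketch {

  /** Hypothetical stand-in for one master's in-memory view of region procedures. */
  static class ToyMaster {
    private final Map<String, Long> procIdByRegion = new HashMap<>();
    private volatile boolean aborting;

    void attachProcedure(String region, long procId) {
      procIdByRegion.put(region, procId);
    }

    void abort() {
      aborting = true;
    }

    /** Behaviour described in this issue: a missing procedure is not an error. */
    boolean reportTransitionLenient(String region) {
      return procIdByRegion.containsKey(region);
    }

    /** Possible stricter variant: refuse every report once this master is aborting. */
    boolean reportTransitionStrict(String region) throws IOException {
      if (aborting) {
        throw new IOException("master is aborting; rejecting report for " + region);
      }
      return procIdByRegion.containsKey(region);
    }
  }

  /** Simulates an RS that retries on the next master only when the first one rejects. */
  static String deliverReport(String region, ToyMaster first, ToyMaster second) {
    try {
      // No exception means the RS considers the report delivered, matched or not.
      return first.reportTransitionStrict(region) ? "handled-by-first" : "dropped-by-first";
    } catch (IOException rejected) {
      // Rejected: fall through and retry against the second master.
    }
    try {
      return second.reportTransitionStrict(region) ? "handled-by-second" : "dropped-by-second";
    } catch (IOException rejectedAgain) {
      return "rejected-by-both";
    }
  }

  public static void main(String[] args) {
    String region = "888a715d5926adbb89c985d8967f40d4";
    ToyMaster oldMaster = new ToyMaster();   // never saw the new procedure
    ToyMaster newMaster = new ToyMaster();
    newMaster.attachProcedure(region, 16276470L);

    // The old master is not flagged as aborting, so the report is swallowed.
    System.out.println(deliverReport(region, oldMaster, newMaster));

    // Once the old master rejects while aborting, the report reaches the new master.
    oldMaster.abort();
    System.out.println(deliverReport(region, oldMaster, newMaster));
  }
}
```

In this model the first call reports {{dropped-by-first}} and the second {{handled-by-second}}: the report only reaches the master that owns the procedure when the aborting master answers with an error instead of silently returning.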

 

Logs for reference:

{{2024-06-20 04:45:05,576 ERROR [iority.RWQ.Fifo.write.handler=3,queue=0,port=61000] master.HMaster - ***** ABORTING master server4-1,61000,1715413775736: Failed to record region server as started *****}}

{{2024-06-20 04:49:28,893 DEBUG [aster/server5-1:61000:becomeActiveMaster] assignment.RegionStateStore - Load hbase:meta entry region=888a715d5926adbb89c985d8967f40d4, regionState=OPEN, lastHost=server1-119,61020,1717560166420, regionLocation=server1-119,61020,1717560166420, openSeqNum=34892620}}

{{2024-06-20 04:49:51,886 INFO [PEWorker-22] procedure2.ProcedureExecutor - Initialized subprocedures=[\{pid={*}16276416{*}, ppid=16276108, state=RUNNABLE:REGION_STATE_TRANSITION_CLOSE; TransitRegionStateProcedure table=RIMBS.UPLOADER_JOB_DETAILS, region=888a715d5926adbb89c985d8967f40d4, UNASSIGN}] ({*}on server5-1{*})}}

{{2024-06-20 04:49:52,022 INFO [PEWorker-40] procedure2.ProcedureExecutor - Initialized subprocedures=[\{pid=16276470, ppid=16276416, state=RUNNABLE; CloseRegionProcedure 888a715d5926adbb89c985d8967f40d4, server=server1-119,61020,1717560166420}] ({*}on server5-1{*})}}

 

*RS logs for closing* 

{{2024-06-20 04:49:52,267 INFO [_REGION-regionserver/server1-119:61020-2] handler.UnassignRegionHandler - Close 888a715d5926adbb89c985d8967f40d4}}

{{2024-06-20 04:49:52,267 DEBUG [_REGION-regionserver/server1-119:61020-2] regionserver.HRegion - Closing 888a715d5926adbb89c985d8967f40d4, disabling compactions & flushes}}

{{2024-06-20 04:49:52,354 INFO [_REGION-regionserver/server1-119:61020-2] regionserver.HRegion - Closed TABLE,KW\x00na240-app1-16\x00/Events-120620231740\x00MARKER-Events,1702619592612.888a715d5926adbb89c985d8967f40d4.}}

 

*Logs on the aborting active HMaster*

{{2024-06-20 04:49:52,355 WARN [iority.RWQ.Fifo.write.handler=1,queue=0,port=61000] assignment.AssignmentManager - {*}No matching procedure found{*} for server1-119,61020,1717560166420 transition on state=OPEN, location=server1-119,61020,1717560166420, table=RIMBS.UPLOADER_JOB_DETAILS, region=888a715d5926adbb89c985d8967f40d4 to CLOSED}} (host = server{*}4-1{*}, {*}hbaseMasterLogFile{*})




--
This message was sent by Atlassian Jira
(v8.20.10#820010)
