Umesh Kumar Kumawat created HBASE-28690: -------------------------------------------
Summary: Aborting Active HMaster is not rejecting reportRegionStateTransition if procedure is initialised by next Active master
Key: HBASE-28690
URL: https://issues.apache.org/jira/browse/HBASE-28690
Project: HBase
Issue Type: Bug
Components: proc-v2
Affects Versions: 2.5.8
Reporter: Umesh Kumar Kumawat

A {{CloseRegionProcedure}} on the master requests the RS to close the region. After closing the region, the RS reports the RegionStateTransition back to the master ([here|https://github.com/apache/hbase/blob/d1015a68ed9f94d74668abd37edefd32f5e9305b/hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterRpcServices.java#L1853]). On receiving the report, the master checks whether the {{regionNode}} has a procedure attached to it ([code|https://github.com/apache/hbase/blob/d1015a68ed9f94d74668abd37edefd32f5e9305b/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java#L1294]).

{code:java}
private boolean reportTransition(RegionStateNode regionNode, ServerStateNode serverNode,
    TransitionCode state, long seqId, long procId) throws IOException {
  ServerName serverName = serverNode.getServerName();
  TransitRegionStateProcedure proc = regionNode.getProcedure();
  if (proc == null) {
    return false;
  }
  proc.reportTransition(master.getMasterProcedureExecutor().getEnvironment(), regionNode,
    serverName, state, seqId, procId);
  return true;
}
{code}

If the regionNode doesn't have any procedure, the master just logs it and doesn't throw any error back to the RPC caller.

Consider a master failover in which the TRSP and CloseRegionProcedure were initialized only by the new active master. The aborting master then holds stale data. If the transition report arrives at the aborting master, the master does not reject it, and this causes the procedure to get stuck.
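One possible direction, sketched here only to illustrate the behaviour being asked for: a master that knows it is aborting fails the report instead of returning quietly. This is a toy model, not HBase code and not a proposed patch. {{MasterSketch}}, {{attachProcedure}}, {{abort}} and the single-argument {{reportTransition}} are hypothetical names invented for this example; the real check would have to live somewhere in the master RPC path and would likely look different.

```java
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Toy model of the report path; none of these types are HBase classes. */
class StaleMasterReportSketch {

  /** Hypothetical stand-in for a master that tracks one procedure id per region. */
  static final class MasterSketch {
    private final Map<String, Long> procedureByRegion = new ConcurrentHashMap<>();
    private volatile boolean aborting;

    void attachProcedure(String region, long procId) {
      procedureByRegion.put(region, procId);
    }

    void abort() {
      aborting = true;
    }

    /**
     * Returns true if a procedure is attached to the region, false otherwise.
     * Once this master is aborting it throws instead, so the caller sees a
     * failure rather than a silently dropped report.
     */
    boolean reportTransition(String region) throws IOException {
      if (aborting) {
        throw new IOException("Master is aborting, rejecting transition report for " + region);
      }
      return procedureByRegion.containsKey(region);
    }
  }

  public static void main(String[] args) throws IOException {
    String region = "888a715d5926adbb89c985d8967f40d4";

    // The new active master is the only one that knows about the close procedure.
    MasterSketch newActive = new MasterSketch();
    newActive.attachProcedure(region, 16276470L);

    // The old master never saw that procedure and is already aborting.
    MasterSketch oldActive = new MasterSketch();
    oldActive.abort();

    try {
      oldActive.reportTransition(region);
      System.out.println("old master accepted the report");
    } catch (IOException e) {
      System.out.println("old master rejected the report: " + e.getMessage());
    }
    System.out.println("new master handled the report: " + newActive.reportTransition(region));
  }
}
```

In this sketch the exception is what would give the reporting side a chance to notice the failure and send the report to the new active master. Whether the RS actually retries on such a failure is an assumption here and would need to be checked against the RS code.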
Logs for more understanding

2024-06-20 04:45:05,576 ERROR [iority.RWQ.Fifo.write.handler=3,queue=0,port=61000] master.HMaster - ***** ABORTING master server4-1,61000,1715413775736: Failed to record region server as started *****
2024-06-20 04:49:28,893 DEBUG [aster/server5-1:61000:becomeActiveMaster] assignment.RegionStateStore - Load hbase:meta entry region=888a715d5926adbb89c985d8967f40d4, regionState=OPEN, lastHost=server1-119,61020,1717560166420, regionLocation=server1-119,61020,1717560166420, openSeqNum=34892620
2024-06-20 04:49:51,886 INFO [PEWorker-22] procedure2.ProcedureExecutor - Initialized subprocedures=[{pid=16276416, ppid=16276108, state=RUNNABLE:REGION_STATE_TRANSITION_CLOSE; TransitRegionStateProcedure table=RIMBS.UPLOADER_JOB_DETAILS, region=888a715d5926adbb89c985d8967f40d4, UNASSIGN}] (*on server5-1*)
2024-06-20 04:49:52,022 INFO [PEWorker-40] procedure2.ProcedureExecutor - Initialized subprocedures=[{pid=16276470, ppid=16276416, state=RUNNABLE; CloseRegionProcedure 888a715d5926adbb89c985d8967f40d4, server=server1-119,61020,1717560166420}] (*on server5-1*)

*RS logs for closing*

2024-06-20 04:49:52,267 INFO [_REGION-regionserver/server1-119:61020-2] handler.UnassignRegionHandler - Close 888a715d5926adbb89c985d8967f40d4
2024-06-20 04:49:52,267 DEBUG [_REGION-regionserver/server1-119:61020-2] regionserver.HRegion - Closing 888a715d5926adbb89c985d8967f40d4, disabling compactions & flushes
2024-06-20 04:49:52,354 INFO [_REGION-regionserver/server1-119:61020-2] regionserver.HRegion - Closed TABLE,KW\x00na240-app1-16\x00/Events-120620231740\x00MARKER-Events,1702619592612.888a715d5926adbb89c985d8967f40d4.

*Logs on aborting active HMaster*

2024-06-20 04:49:52,355 WARN [iority.RWQ.Fifo.write.handler=1,queue=0,port=61000] assignment.AssignmentManager - No matching procedure found for server1-119,61020,1717560166420 transition on state=OPEN, location=server1-119,61020,1717560166420, table=RIMBS.UPLOADER_JOB_DETAILS, region=888a715d5926adbb89c985d8967f40d4 to CLOSED (host = *server4-1*, *hbaseMasterLogFile*)

--
This message was sent by Atlassian Jira
(v8.20.10#820010)