Umesh Kumar Kumawat created HBASE-28690:
-------------------------------------------
Summary: Aborting Active HMaster is not rejecting
reportRegionStateTransition if procedure is initialised by next Active master
Key: HBASE-28690
URL: https://issues.apache.org/jira/browse/HBASE-28690
Project: HBase
Issue Type: Bug
Components: proc-v2
Affects Versions: 2.5.8
Reporter: Umesh Kumar Kumawat
A {{CloseRegionProcedure}} on the master requests the RS to close the region.
After closing the region, the RS reports a RegionStateTransition back
([here|https://github.com/apache/hbase/blob/d1015a68ed9f94d74668abd37edefd32f5e9305b/hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterRpcServices.java#L1853]).
On receiving the report, the master checks whether the {{regionNode}} has any
procedure assigned to it
([code|https://github.com/apache/hbase/blob/d1015a68ed9f94d74668abd37edefd32f5e9305b/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java#L1294]).
{code:java}
private boolean reportTransition(RegionStateNode regionNode, ServerStateNode serverNode,
  TransitionCode state, long seqId, long procId) throws IOException {
  ServerName serverName = serverNode.getServerName();
  TransitRegionStateProcedure proc = regionNode.getProcedure();
  if (proc == null) {
    return false;
  }
  proc.reportTransition(master.getMasterProcedureExecutor().getEnvironment(), regionNode,
    serverName, state, seqId, procId);
  return true;
}
{code}
If {{regionNode}} has no procedure attached, the master only logs a warning and
does not return any error to the RPC caller.
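The behaviour described above can be illustrated with a small stand-alone model. This is only a sketch: {{SilentReportModel}}, its nested {{RegionNode}}, and {{handleRpc}} are hypothetical names invented for illustration, not HBase classes, and the warning text merely imitates the log line quoted in this issue.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified stand-in for the master-side handling; NOT the real HBase classes. */
class SilentReportModel {
  /** Hypothetical stand-in for RegionStateNode: holds at most one attached procedure id. */
  static final class RegionNode {
    Long procId; // null means no procedure is attached on this master
  }

  /** Collected warnings, standing in for the master log. */
  final List<String> warnings = new ArrayList<>();

  /** Mirrors the quoted method: false when no procedure is attached. */
  boolean reportTransition(RegionNode node, String state) {
    if (node.procId == null) {
      return false;
    }
    return true;
  }

  /** Assumed caller behaviour: log a warning, but still answer the RPC normally. */
  boolean handleRpc(RegionNode node, String serverName, String state) {
    if (!reportTransition(node, state)) {
      warnings.add("No matching procedure found for " + serverName + " transition to " + state);
    }
    return true; // the reporting RegionServer sees success either way
  }
}
```

In this model the reporting side cannot tell an accepted report from one that was only logged, which is the gap this issue describes.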
Consider a master failover in which the new active master has only just
initialized the TRSP and the CloseRegionProcedure. The aborting master now holds
stale state and knows nothing about these procedures. If the transition report
reaches the aborting master, it is accepted without error instead of being
rejected, so the report never reaches the new active master and the procedure
gets stuck.
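One possible direction, sketched here under stated assumptions, is for a master that knows it is aborting to fail the report so that the caller sees an error rather than a silent success. {{RejectingReportModel}}, the {{aborting}} flag, and {{handleReport}} are hypothetical names for illustration only; the actual fix in HBase may check a different condition and throw a different exception type.

```java
import java.io.IOException;

/** Illustrative model only; NOT the real HBase implementation or the committed fix. */
class RejectingReportModel {
  /** Hypothetical stand-in for RegionStateNode. */
  static final class RegionNode {
    Long procId; // null means no procedure is attached on this master
  }

  /** Hypothetical flag set once this master has started aborting. */
  private volatile boolean aborting;

  void abort() {
    aborting = true;
  }

  /**
   * Returns true when a matching procedure exists on this master.
   * Throws instead of answering when this master's view may be stale.
   */
  boolean handleReport(RegionNode node, String state) throws IOException {
    if (aborting) {
      // Assumption: an error response lets the reporter retry against the new active master.
      throw new IOException("Master is aborting; rejecting transition to " + state);
    }
    return node.procId != null;
  }
}
```

The design choice in this sketch is to fail loudly: an exception on the RPC path gives the reporting side a signal to act on, whereas a logged warning on the aborting master does not.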
Logs for reference:
{{2024-06-20 04:45:05,576 ERROR [iority.RWQ.Fifo.write.handler=3,queue=0,port=61000] master.HMaster - ***** ABORTING master server4-1,61000,1715413775736: Failed to record region server as started *****}}

{{2024-06-20 04:49:28,893 DEBUG [aster/server5-1:61000:becomeActiveMaster] assignment.RegionStateStore - Load hbase:meta entry region=888a715d5926adbb89c985d8967f40d4, regionState=OPEN, lastHost=server1-119,61020,1717560166420, regionLocation=server1-119,61020,1717560166420, openSeqNum=34892620}}

{{2024-06-20 04:49:51,886 INFO [PEWorker-22] procedure2.ProcedureExecutor - Initialized subprocedures=[\{pid=*16276416*, ppid=16276108, state=RUNNABLE:REGION_STATE_TRANSITION_CLOSE; TransitRegionStateProcedure table=RIMBS.UPLOADER_JOB_DETAILS, region=888a715d5926adbb89c985d8967f40d4, UNASSIGN}]}} (*on server5-1*)

{{2024-06-20 04:49:52,022 INFO [PEWorker-40] procedure2.ProcedureExecutor - Initialized subprocedures=[\{pid=16276470, ppid=16276416, state=RUNNABLE; CloseRegionProcedure 888a715d5926adbb89c985d8967f40d4, server=server1-119,61020,1717560166420}]}} (*on server5-1*)
*RS logs for closing*
{{2024-06-20 04:49:52,267 INFO [_REGION-regionserver/server1-119:61020-2] handler.UnassignRegionHandler - Close 888a715d5926adbb89c985d8967f40d4}}

{{2024-06-20 04:49:52,267 DEBUG [_REGION-regionserver/server1-119:61020-2] regionserver.HRegion - Closing 888a715d5926adbb89c985d8967f40d4, disabling compactions & flushes}}

{{2024-06-20 04:49:52,354 INFO [_REGION-regionserver/server1-119:61020-2] regionserver.HRegion - Closed TABLE,KW\x00na240-app1-16\x00/Events-120620231740\x00MARKER-Events,1702619592612.888a715d5926adbb89c985d8967f40d4.}}
*Logs on the aborting active HMaster*
{{2024-06-20 04:49:52,355 WARN [iority.RWQ.Fifo.write.handler=1,queue=0,port=61000] assignment.AssignmentManager - *No matching procedure found* for server1-119,61020,1717560166420 transition on state=OPEN, location=server1-119,61020,1717560166420, table=RIMBS.UPLOADER_JOB_DETAILS, region=888a715d5926adbb89c985d8967f40d4 to CLOSED}} (host = *server4-1*, *hbaseMasterLogFile*)
--
This message was sent by Atlassian Jira
(v8.20.10#820010)