[ 
https://issues.apache.org/jira/browse/HBASE-29660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chaijunjie reassigned HBASE-29660:
----------------------------------

    Assignee: chaijunjie

> submittedRegionProcedures data leak in HRegionServer when region open failed.
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-29660
>                 URL: https://issues.apache.org/jira/browse/HBASE-29660
>             Project: HBase
>          Issue Type: Bug
>          Components: Region Assignment
>    Affects Versions: 4.0.0-alpha-1
>            Reporter: chaijunjie
>            Assignee: chaijunjie
>            Priority: Major
>              Labels: pull-request-available
>
> HRegionServer tracks region procedures in two structures, a cache and a map:
> https://github.com/apache/hbase/blob/e57552521e2c228e8ec8d09ebef4657ba8a2b1d9/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java#L268C3-L268C97
> and
> https://github.com/apache/hbase/blob/e57552521e2c228e8ec8d09ebef4657ba8a2b1d9/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java#L273
> When the RS is asked to submit a region procedure it has already submitted,
> it ignores the request:
> https://github.com/apache/hbase/blob/e57552521e2c228e8ec8d09ebef4657ba8a2b1d9/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java#L3596C1-L3596C47
> However, executedRegionProcedures is a cache object that cleans itself up,
> while submittedRegionProcedures does not. When a region open fails on the
> RS, the handler just returns, see
> https://github.com/apache/hbase/blob/e57552521e2c228e8ec8d09ebef4657ba8a2b1d9/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/handler/AssignRegionHandler.java#L146
> As a result, rs.finishRegionProcedure is never called on this RS for this
> region.
> In our case, the "MasterData" directory lost data in HDFS. I tried to fix it
> by recreating the master region (restarting HMaster), but after that some
> regions could not be opened (triggered by the balancer/SCP), and I just
> found these logs. We then had to restart many RegionServers.
> I think we could call finishRegionProcedure in the cleanUpAndReportFailure
> method after the failure has been reported to the master successfully. We
> could also make submittedRegionProcedures a cache object instead of a map,
> to avoid the leak.
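
The leak described above can be illustrated with a small standalone sketch.
It is not the HBase code: the class ProcedureTrackerSketch, its methods, and
the cleanUpOnFailure flag are hypothetical, and it assumes the tracking map is
keyed by a numeric procedure id. It also leaves out the executed-procedure
cache and models only the submitted map.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical, simplified model of the submitted-procedure tracking
// described in the issue. Not the actual HRegionServer implementation.
class ProcedureTrackerSketch {
    // Stand-in for submittedRegionProcedures: a plain map that never expires.
    private final ConcurrentMap<Long, Long> submitted = new ConcurrentHashMap<>();

    // Returns false when the same id is already tracked (request is ignored).
    boolean submit(long procId) {
        return submitted.putIfAbsent(procId, procId) == null;
    }

    // Stand-in for finishRegionProcedure: drops the id from the map.
    void finish(long procId) {
        submitted.remove(procId);
    }

    // Simulates handling an open request. Returns false if it was ignored
    // as a duplicate, true if it was accepted (whether or not the open
    // itself succeeded).
    boolean openRegion(long procId, boolean openSucceeds, boolean cleanUpOnFailure) {
        if (!submit(procId)) {
            return false; // duplicate id: ignored
        }
        if (!openSucceeds) {
            // Failure path: without the proposed cleanup the id stays forever.
            if (cleanUpOnFailure) {
                finish(procId);
            }
            return true;
        }
        finish(procId);
        return true;
    }

    int pending() {
        return submitted.size();
    }
}
```

With cleanUpOnFailure set to false, a failed open leaves the id in the map and
a later request with the same id is ignored. With it set to true, which
corresponds to the first proposal in the description, the later request is
accepted.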



--
This message was sent by Atlassian Jira
(v8.20.10#820010)