[ 
https://issues.apache.org/jira/browse/HBASE-23958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17065191#comment-17065191
 ] 

Nick Dimiduk commented on HBASE-23958:
--------------------------------------

[~ram_krish] give this a spin with the latest branch-2 or branch-2.3. 
HBASE-23984 fixes a minor accounting bug in RIT tracking in the master.

> Balancer keeps balancing indefinitely 
> --------------------------------------
>
>                 Key: HBASE-23958
>                 URL: https://issues.apache.org/jira/browse/HBASE-23958
>             Project: HBase
>          Issue Type: Bug
>          Components: Region Assignment
>    Affects Versions: 2.0.2
>            Reporter: ramkrishna.s.vasudevan
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Major
>             Fix For: 2.3.0
>
>
> Before raising this issue, I am not sure whether this has already been fixed,
> directly or indirectly, in later versions of HBase.
> The steps are:
> 1) Create a cluster and create some tables (assume we have RS 1, 2, 3, 4 and
> 5).
> 2) After the tables were created and some operations were done, the cluster
> was restarted. Due to this, some regions are in RIT. The RIT in progress was
> to be assigned to RS 3.
> 3) After the cluster comes back, RS 3 and 4 are stopped. (RS 3 will have a
> newer timestamp.)
> 4) Now the master that comes up sees there are some RITs in place and tries to
> load the entries to process the procedures again. As part of this, the
> RegionStateStore is populated with the old RS 3 hostname (older timestamp).
> This adds to the ServerStateNode, creating an RS 3 with the old timestamp as
> one server.
> 5) Now, after the master restarts and all regions are assigned, the balancer
> tries indefinitely to balance regions to RS 3 (the old-timestamp server),
> thinking it is part of the cluster.
> 6) The other problem is that the MoveProcedure has RS 3 (with the old
> timestamp) as its target, but the AM realizes that it is a down server and
> moves the region to one of the active servers. But this is not recorded
> anywhere.
> I will continue to check the latest code to see whether this case is valid.
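
For readers following the steps above, here is a minimal Java sketch of why the
old-timestamp RS 3 looks like a separate server, and how such an entry could be
filtered out of a list of balancer targets. It does not use HBase classes and
is not the change made in HBASE-23984. ServerId, dropStale, the host names, the
port, and the start codes are all hypothetical names and values chosen for
illustration. The only HBase fact assumed is that a server identity combines
host, port, and start timestamp.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class StaleServerSketch {

    /** Hypothetical stand-in for a server identity: host, port, start timestamp. */
    static final class ServerId {
        final String host;
        final int port;
        final long startCode;

        ServerId(String host, int port, long startCode) {
            this.host = host;
            this.port = port;
            this.startCode = startCode;
        }

        String hostAndPort() {
            return host + ":" + port;
        }

        @Override
        public String toString() {
            return host + "," + port + "," + startCode;
        }
    }

    /**
     * Keeps only candidates whose start code matches the live instance
     * registered for the same host:port. An entry left over from before a
     * restart (same host:port, different start code) is dropped.
     */
    static List<ServerId> dropStale(List<ServerId> candidates, List<ServerId> live) {
        Map<String, Long> liveStartCodes = new HashMap<>();
        for (ServerId s : live) {
            liveStartCodes.put(s.hostAndPort(), s.startCode);
        }
        List<ServerId> result = new ArrayList<>();
        for (ServerId c : candidates) {
            Long liveCode = liveStartCodes.get(c.hostAndPort());
            if (liveCode != null && liveCode == c.startCode) {
                result.add(c);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Illustrative values only: RS 3 restarted, so it has a newer start code.
        ServerId rs1 = new ServerId("rs1.example.com", 16020, 1000L);
        ServerId oldRs3 = new ServerId("rs3.example.com", 16020, 1000L);
        ServerId newRs3 = new ServerId("rs3.example.com", 16020, 2000L);

        // The stale RS 3 entry was loaded from stored region state (step 4).
        List<ServerId> candidates = Arrays.asList(rs1, oldRs3, newRs3);
        List<ServerId> live = Arrays.asList(rs1, newRs3);

        for (ServerId target : dropStale(candidates, live)) {
            System.out.println("valid balancer target: " + target);
        }
    }
}
```

The sketch compares start codes per host:port rather than whole identities, so
a server that restarted is kept while its pre-restart entry is dropped.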



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
