[ 
https://issues.apache.org/jira/browse/HBASE-26433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Viraj Jasani resolved HBASE-26433.
----------------------------------
    Hadoop Flags: Reviewed
      Resolution: Fixed

Thanks for the reviews [~apurtell] [~gjacoby] [~dmanning].

> Rollback from ZK-less to ZK-based assignment could produce inconsistent state 
> - doubly assigned regions
> -------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-26433
>                 URL: https://issues.apache.org/jira/browse/HBASE-26433
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 1.7.1
>            Reporter: Viraj Jasani
>            Assignee: Viraj Jasani
>            Priority: Major
>             Fix For: 1.7.2
>
>
> By enabling config hbase.assignment.usezk.migrating, we initiate the 
> transition of HBase 1.x cluster from default ZK-based region assignment to 
> ZK-less region assignments. Once the migration is enabled, any subsequent 
> region transition is going to add two additional CQs in meta: info:sn and 
> info:state. The workflow that adds new CQs to meta should be the only 
> workflow that reads them (unless multiple workflows need to coordinate), 
> but that is not the case here. The code that reads info:sn and info:state 
> to rebuild user region states in the RegionStateStore data structure has a 
> hidden bug: it does not restrict the read to ZK-less region assignment.
>
> What are the effects?
> After enabling ZK-less migration, if we revert it, info:state and 
> info:sn are not reverted. Moreover, the new active master rebuilds the 
> region states in memory and uses this info. If all regions have consistent 
> info:sn values (i.e. consistent with info:server and info:serverstartcode), 
> nothing goes wrong, and this is the likely outcome when we revert the config 
> with a rolling restart of masters. However, after this config revert, if any 
> region moves, only info:server and info:serverstartcode get updated, while 
> info:sn and info:state keep their old values. Because of the missing 
> condition, a subsequent active master restart would rebuild region states 
> and assign regions as per info:sn, but those regions are already OPEN on 
> info:server, hence we get doubly assigned regions.
>
> We need a two-part fix for this:
>  1. Guard reading of info:sn and info:state with proper conditions.
>  2. Once active master init is complete, if ZK-based region assignment is 
>     enabled and the redundant CQs (info:sn and info:state) are present in 
>     meta, delete them all.
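
For readers of the archive, the two parts of the fix described above can be
sketched roughly as follows. This is a minimal illustration, not the committed
patch: the class MetaRowSketch, its methods, the in-memory Map standing in for
a meta row, and the zkLessAssignment flag are all hypothetical, and the way
the fallback location string is built is a simplification. Only the four
column qualifiers come from the issue description.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a single meta row; this is not the HBase API.
final class MetaRowSketch {
    // Column qualifiers named in the issue description.
    static final String SERVER = "info:server";
    static final String START_CODE = "info:serverstartcode";
    static final String SN = "info:sn";
    static final String STATE = "info:state";

    // Part 1: consult info:sn only when ZK-less assignment is in use.
    static String resolveServer(Map<String, String> row, boolean zkLessAssignment) {
        if (zkLessAssignment && row.containsKey(SN)) {
            return row.get(SN);
        }
        // ZK-based assignment: rely only on info:server and info:serverstartcode.
        // Joining them with a comma is a simplification for this sketch.
        return row.get(SERVER) + "," + row.get(START_CODE);
    }

    // Part 2: under ZK-based assignment, delete the leftover qualifiers.
    // Returns true if at least one qualifier was removed.
    static boolean cleanupRedundantColumns(Map<String, String> row, boolean zkLessAssignment) {
        if (zkLessAssignment) {
            return false; // the columns are still in use, keep them
        }
        boolean removedSn = row.remove(SN) != null;
        boolean removedState = row.remove(STATE) != null;
        return removedSn || removedState;
    }

    public static void main(String[] args) {
        Map<String, String> row = new HashMap<>();
        // The region moved to rs2 after the revert; info:sn still names rs1.
        row.put(SERVER, "rs2.example.com:16020");
        row.put(START_CODE, "200");
        row.put(SN, "rs1.example.com,16020,100");
        row.put(STATE, "OPEN");

        // With ZK-based assignment the stale info:sn value is ignored.
        System.out.println(resolveServer(row, false));
        cleanupRedundantColumns(row, false);
        System.out.println(row.keySet());
    }
}
```

Without the guard in resolveServer, the stale info:sn value would be returned
for a region that is already open on the server recorded in info:server, which
is the double-assignment scenario from the description.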



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
