[ https://issues.apache.org/jira/browse/HBASE-26433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Viraj Jasani resolved HBASE-26433.
----------------------------------
Hadoop Flags: Reviewed
Resolution: Fixed
Thanks for the reviews [~apurtell] [~gjacoby] [~dmanning].
> Rollback from ZK-less to ZK-based assignment could produce inconsistent state - doubly assigned regions
> -------------------------------------------------------------------------------------------------------
>
> Key: HBASE-26433
> URL: https://issues.apache.org/jira/browse/HBASE-26433
> Project: HBase
> Issue Type: Bug
> Affects Versions: 1.7.1
> Reporter: Viraj Jasani
> Assignee: Viraj Jasani
> Priority: Major
> Fix For: 1.7.2
>
>
> By enabling config {_}hbase.assignment.usezk.migrating{_}, we initiate the
> transition of an HBase 1.x cluster from the default ZK-based region
> assignment to ZK-less region assignment. Once the migration is enabled, every
> subsequent region transition adds two additional CQs to meta: info:sn and
> info:state. The workflow that adds new CQs to meta should be the only
> workflow that reads them (unless it requires coordination among multiple
> workflows); however, that is not the case here. Reading info:sn and
> info:state to rebuild user region states in the RegionStateStore data
> structure is a hidden bug, because that read is not restricted to ZK-less
> region assignment.
> What are the effects?
> After enabling ZK-less migration, if we revert it, info:state and info:sn
> are not reverted. Moreover, a new active master rebuilds the region states in
> memory using this info. As long as every region's info:sn value is
> consistent with its info:server and info:serverstartcode, nothing goes
> wrong, and this is the likely outcome when we revert the config with a
> rolling restart of masters. However, if any region moves after this config
> revert, only info:server and info:serverstartcode are updated, while info:sn
> and info:state keep their old values. Because of the missing condition, the
> next active master restart rebuilds region states and assigns regions as per
> info:sn, but those regions are already OPEN on the server recorded in
> info:server; hence we get doubly assigned regions.
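A minimal Java sketch of the failure sequence described above, under explicit assumptions: a meta row is modeled as a plain Map<String, String>, the value formats (host:port in info:server, host,port,startcode in info:sn) are simplified, and the helpers recordOpen, rebuildLocationUnguarded and serverFromLegacyColumns are hypothetical names, not HBase APIs. It shows how a rebuild that prefers info:sn whenever it is present returns a stale server once a region has moved under ZK-based assignment.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of one hbase:meta row; not HBase's real API.
class DoubleAssignSketch {
    // Write path for a region transition. When zkLessWrites is true (migration
    // enabled) the two extra CQs are written; otherwise only the legacy ones.
    static void recordOpen(Map<String, String> row, String host, int port,
                           long startCode, boolean zkLessWrites) {
        row.put("info:server", host + ":" + port);
        row.put("info:serverstartcode", Long.toString(startCode));
        if (zkLessWrites) {
            row.put("info:sn", host + "," + port + "," + startCode);
            row.put("info:state", "OPEN");
        }
    }

    // Server derived from info:server + info:serverstartcode.
    static String serverFromLegacyColumns(Map<String, String> row) {
        String[] hostPort = row.get("info:server").split(":");
        return hostPort[0] + "," + hostPort[1] + "," + row.get("info:serverstartcode");
    }

    // Unguarded rebuild (the bug): uses info:sn whenever it is present,
    // regardless of which assignment mode is active.
    static String rebuildLocationUnguarded(Map<String, String> row) {
        String sn = row.get("info:sn");
        return sn != null ? sn : serverFromLegacyColumns(row);
    }

    public static void main(String[] args) {
        Map<String, String> row = new HashMap<>();
        // 1. Migration enabled: region opens on rs1 and info:sn is written.
        recordOpen(row, "rs1", 16020, 1000L, true);
        // 2. Config reverted, region moves to rs2 under ZK-based assignment:
        //    only info:server and info:serverstartcode change.
        recordOpen(row, "rs2", 16020, 2000L, false);

        String actual = serverFromLegacyColumns(row);
        String rebuilt = rebuildLocationUnguarded(row);
        System.out.println("open on:     " + actual);   // rs2,16020,2000
        System.out.println("master uses: " + rebuilt);  // rs1,16020,1000
        System.out.println("info:sn is stale: " + !actual.equals(rebuilt)); // true
    }
}
```

The mismatch between the two servers is what leads a restarted master to assign a region that is already OPEN elsewhere.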
> We need a two-part fix for this:
> # Guard the reading of info:sn and info:state with proper conditions, so that
> they are used only for ZK-less region assignment.
> # Once active master initialization is complete, if ZK-based region
> assignment is enabled and the redundant CQs (info:sn and info:state) are
> present in meta, delete them all.
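A minimal Java sketch of the two-part fix, not the committed patch. It assumes a meta row modeled as a plain Map<String, String>; the boolean useZkAssignment stands in for however the master determines that ZK-based assignment is active, and rebuildLocation and cleanupRedundantColumns are hypothetical names. Part 1 reads info:sn only when ZK-based assignment is off; part 2 removes info:sn and info:state when it is on.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the two-part fix; names are illustrative, not HBase's.
class ZkRollbackFixSketch {
    static final String SN = "info:sn";
    static final String STATE = "info:state";

    // Part 1: consult info:sn only when ZK-less assignment is in use;
    // otherwise derive the server from info:server + info:serverstartcode.
    static String rebuildLocation(Map<String, String> row, boolean useZkAssignment) {
        if (!useZkAssignment && row.containsKey(SN)) {
            return row.get(SN);
        }
        String[] hostPort = row.get("info:server").split(":");
        return hostPort[0] + "," + hostPort[1] + "," + row.get("info:serverstartcode");
    }

    // Part 2: after master init, drop the redundant CQs if ZK-based
    // assignment is enabled. Returns true if the row was modified.
    static boolean cleanupRedundantColumns(Map<String, String> row, boolean useZkAssignment) {
        if (!useZkAssignment) {
            return false;
        }
        boolean hadSn = row.remove(SN) != null;
        boolean hadState = row.remove(STATE) != null;
        return hadSn || hadState;
    }

    public static void main(String[] args) {
        // Row left behind after a rollback: stale info:sn points at rs1,
        // while the region is actually open on rs2.
        Map<String, String> row = new HashMap<>();
        row.put("info:server", "rs2:16020");
        row.put("info:serverstartcode", "2000");
        row.put(SN, "rs1,16020,1000");
        row.put(STATE, "OPEN");

        System.out.println(rebuildLocation(row, true));         // rs2,16020,2000
        System.out.println(cleanupRedundantColumns(row, true)); // true
        System.out.println(row.containsKey(SN));                // false
    }
}
```

In this sketch the cleanup only removes columns, so running it again on an already-cleaned row returns false and changes nothing.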