[ https://issues.apache.org/jira/browse/HBASE-26433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Viraj Jasani resolved HBASE-26433.
----------------------------------
    Hadoop Flags: Reviewed
      Resolution: Fixed

Thanks for the reviews [~apurtell] [~gjacoby] [~dmanning].

> Rollback from ZK-less to ZK-based assignment could produce inconsistent state - doubly assigned regions
> -------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-26433
>                 URL: https://issues.apache.org/jira/browse/HBASE-26433
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 1.7.1
>            Reporter: Viraj Jasani
>            Assignee: Viraj Jasani
>            Priority: Major
>             Fix For: 1.7.2
>
>
> By enabling the config hbase.assignment.usezk.migrating, we initiate the
> transition of an HBase 1.x cluster from the default ZK-based region
> assignment to ZK-less region assignment. Once the migration is enabled, any
> subsequent region transition adds two additional CQs in meta: info:sn and
> info:state. The workflow that adds new CQs in meta should be the only
> workflow reading them (unless it requires coordination among multiple
> workflows); however, that is not the case here. Reading info:sn and
> info:state to rebuild user region states in the RegionStateStore data
> structure is a hidden bug because the read is not restricted to ZK-less
> region assignment.
>
> What are the effects?
> After enabling the ZK-less migration, if we revert it, info:state and
> info:sn are not reverted. Moreover, the new active master rebuilds the region
> states in memory and uses this info. So if all regions have consistent
> info:sn values (i.e. consistent with info:server and info:serverstartcode),
> nothing goes wrong, and this is the likely outcome when we revert the config
> with a rolling restart of masters. However, after this config revert, if any
> region moves, only info:server and info:serverstartcode get updated, while
> info:sn and info:state keep their old values.
> Because of the missing condition, a subsequent active master restart would
> try to rebuild region states and assign regions as per info:sn, but those
> regions are already OPEN on info:server, hence we get doubly assigned
> regions.
>
> We need a two-part fix for this:
> # Guard reading of info:sn and info:state with proper conditions.
> # Once active master init is complete, if ZK-based region assignment is
> enabled and redundant CQs are present in meta (info:sn and info:state),
> delete them all.
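
The failure mode and the two-part fix described above can be illustrated with a small, self-contained sketch. This is not HBase code: the class `MetaRowGuardSketch`, its method names, and the `zkBasedAssignment` flag are hypothetical, a meta row is modelled as a plain map from qualifier to value, and the server strings are simplified labels. Only the four qualifier names come from the issue description.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of one hbase:meta row (hypothetical, not the actual patch). */
final class MetaRowGuardSketch {

    // Column qualifiers named in the issue description.
    static final String SERVER = "info:server";
    static final String START_CODE = "info:serverstartcode";
    static final String SN = "info:sn";
    static final String STATE = "info:state";

    /** Location derived from the columns that ZK-based assignment keeps current. */
    static String locationFromServerColumns(Map<String, String> row) {
        return row.get(SERVER) + "," + row.get(START_CODE);
    }

    /** Unguarded read: trusts info:sn whenever it exists, whatever the mode. */
    static String locationUnguarded(Map<String, String> row) {
        String sn = row.get(SN);
        return sn != null ? sn : locationFromServerColumns(row);
    }

    /** Part 1: consult info:sn only when ZK-less assignment is in use. */
    static String locationGuarded(Map<String, String> row, boolean zkBasedAssignment) {
        String sn = row.get(SN);
        if (!zkBasedAssignment && sn != null) {
            return sn;
        }
        return locationFromServerColumns(row);
    }

    /** Part 2: under ZK-based assignment, delete the leftover qualifiers. */
    static boolean removeStaleQualifiers(Map<String, String> row, boolean zkBasedAssignment) {
        if (!zkBasedAssignment) {
            return false; // ZK-less mode still needs these columns
        }
        boolean removedSn = row.remove(SN) != null;
        boolean removedState = row.remove(STATE) != null;
        return removedSn || removedState;
    }

    /** Row after the config rollback followed by a region move from rs1 to rs2. */
    static Map<String, String> rowAfterRollbackAndMove() {
        Map<String, String> row = new HashMap<>();
        row.put(SERVER, "rs2:16020");   // updated by the move
        row.put(START_CODE, "200");     // updated by the move
        row.put(SN, "rs1:16020,100");   // stale: written during ZK-less migration
        row.put(STATE, "OPEN");         // stale: written during ZK-less migration
        return row;
    }

    public static void main(String[] args) {
        Map<String, String> row = rowAfterRollbackAndMove();
        System.out.println("unguarded read: " + locationUnguarded(row));
        System.out.println("guarded read:   " + locationGuarded(row, true));
        System.out.println("cleanup ran:    " + removeStaleQualifiers(row, true));
        System.out.println("info:sn left:   " + row.containsKey(SN));
    }
}
```

In this model the unguarded read returns the stale info:sn server while the region is actually open on the server recorded in info:server, which is the double-assignment scenario. The guarded read ignores info:sn under ZK-based assignment, and the cleanup removes both leftover qualifiers so a later read cannot pick them up.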