[ https://issues.apache.org/jira/browse/HBASE-24603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bharath Vissapragada updated HBASE-24603: ----------------------------------------- Resolution: Fixed Status: Resolved (was: Patch Available) > Zookeeper sync() call is async > ------------------------------ > > Key: HBASE-24603 > URL: https://issues.apache.org/jira/browse/HBASE-24603 > Project: HBase > Issue Type: Improvement > Components: master, regionserver > Affects Versions: 3.0.0-alpha-1, 2.3.0, 1.7.0 > Reporter: Bharath Vissapragada > Assignee: Bharath Vissapragada > Priority: Critical > Fix For: 3.0.0-alpha-1, 2.3.1, 1.7.0, 2.4.0 > > > Here is the method that does a sync() of lagging followers with leader in the > quorum. We rely on this to see a consistent snapshot of ZK data from multiple > clients. However the problem is that the underlying sync() call is actually > asynchronous since we are passing a 'null' call back. See the ZK API > [doc|https://zookeeper.apache.org/doc/r3.5.7/apidocs/zookeeper-server/index.html] > for details. The end-result is that sync() doesn't guarantee that it has > happened by the time it returns. > {noformat} > /** > * Forces a synchronization of this ZooKeeper client connection. > * <p> > * Executing this method before running other methods will ensure that the > * subsequent operations are up-to-date and consistent as of the time that > * the sync is complete. > * <p> > * This is used for compareAndSwap type operations where we need to read the > * data of an existing node and delete or transition that node, utilizing > the > * previously read version and data. We want to ensure that the version > read > * is up-to-date from when we begin the operation. > */ > public void sync(String path) throws KeeperException { > this.recoverableZooKeeper.sync(path, null, null); > } > {noformat} > We rely on this heavily (at least in the older branches that do ZK based > region assignment). In branch-1 we saw weird "BadVersionException" exceptions > in RITs because of the inconsistent view of the ZK snapshot. It could > manifest differently in other branches. Either way, this is something we need > to fix. -- This message was sent by Atlassian Jira (v8.3.4#803005)