[ 
https://issues.apache.org/jira/browse/HBASE-24603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharath Vissapragada updated HBASE-24603:
-----------------------------------------
    Description: 
Here is the method that does a sync() of lagging followers with leader in the 
quorum. We rely on this to see a consistent snapshot of ZK data from multiple 
clients. However the problem is that the underlying sync() call is actually 
asynchronous since we are passing a 'null' call back.  See the ZK API 
[doc|https://zookeeper.apache.org/doc/r3.5.7/apidocs/zookeeper-server/index.html]
 for details. The end-result is that sync() doesn't guarantee that it has 
happened by the time it returns.

{noformat}
  /**
   * Forces a synchronization of this ZooKeeper client connection.
   * <p>
   * Executing this method before running other methods will ensure that the
   * subsequent operations are up-to-date and consistent as of the time that
   * the sync is complete.
   * <p>
   * This is used for compareAndSwap type operations where we need to read the
   * data of an existing node and delete or transition that node, utilizing the
   * previously read version and data.  We want to ensure that the version read
   * is up-to-date from when we begin the operation.
   */
  public void sync(String path) throws KeeperException {
    this.recoverableZooKeeper.sync(path, null, null);
  }
{noformat}

We rely on this heavily (at least in the older branches that do ZK based region 
assignment). In branch-1 we saw weird "BadVersionException" exceptions in RITs 
because of the inconsistent view of the ZK snapshot. It could manifest 
differently in other branches. Either way, this is something we need to fix.

  was:
Here is the method that does as sync() of lagging followers with leaders in the 
quorum. We rely on this to see a consistent snapshot of ZK data from multiple 
clients. However the problem is that the underlying sync() call is actually 
async() since are passing a 'null' call back.  See the ZK API 
[doc|https://zookeeper.apache.org/doc/r3.5.7/apidocs/zookeeper-server/index.html]
 for details. The end-result is that sync() doesn't guarantee that it has 
happened by the time it returns.

{noformat}
  /**
   * Forces a synchronization of this ZooKeeper client connection.
   * <p>
   * Executing this method before running other methods will ensure that the
   * subsequent operations are up-to-date and consistent as of the time that
   * the sync is complete.
   * <p>
   * This is used for compareAndSwap type operations where we need to read the
   * data of an existing node and delete or transition that node, utilizing the
   * previously read version and data.  We want to ensure that the version read
   * is up-to-date from when we begin the operation.
   */
  public void sync(String path) throws KeeperException {
    this.recoverableZooKeeper.sync(path, null, null);
  }
{noformat}

We rely on this heavily (at least in the older branches that do ZK based region 
assignment). In branch-1 we saw weird "BadVersionException" exceptions in RITs 
because of the inconsistent view of the ZK snapshot. It could manifest 
differently in other branches. Either way, this is something we need to fix.


> Zookeeper sync() call is async
> ------------------------------
>
>                 Key: HBASE-24603
>                 URL: https://issues.apache.org/jira/browse/HBASE-24603
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, regionserver
>    Affects Versions: 3.0.0-alpha-1, 2.3.0, 1.7.0
>            Reporter: Bharath Vissapragada
>            Assignee: Bharath Vissapragada
>            Priority: Critical
>
> Here is the method that does a sync() of lagging followers with leader in the 
> quorum. We rely on this to see a consistent snapshot of ZK data from multiple 
> clients. However the problem is that the underlying sync() call is actually 
> asynchronous since we are passing a 'null' call back.  See the ZK API 
> [doc|https://zookeeper.apache.org/doc/r3.5.7/apidocs/zookeeper-server/index.html]
>  for details. The end-result is that sync() doesn't guarantee that it has 
> happened by the time it returns.
> {noformat}
>   /**
>    * Forces a synchronization of this ZooKeeper client connection.
>    * <p>
>    * Executing this method before running other methods will ensure that the
>    * subsequent operations are up-to-date and consistent as of the time that
>    * the sync is complete.
>    * <p>
>    * This is used for compareAndSwap type operations where we need to read the
>    * data of an existing node and delete or transition that node, utilizing 
> the
>    * previously read version and data.  We want to ensure that the version 
> read
>    * is up-to-date from when we begin the operation.
>    */
>   public void sync(String path) throws KeeperException {
>     this.recoverableZooKeeper.sync(path, null, null);
>   }
> {noformat}
> We rely on this heavily (at least in the older branches that do ZK based 
> region assignment). In branch-1 we saw weird "BadVersionException" exceptions 
> in RITs because of the inconsistent view of the ZK snapshot. It could 
> manifest differently in other branches. Either way, this is something we need 
> to fix.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to