[
https://issues.apache.org/jira/browse/ZOOKEEPER-3713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated ZOOKEEPER-3713:
--------------------------------------
Labels: pull-request-available (was: )
> ReadOnlyZooKeeperServer should not expose the uninitialized ZKDatabase to
> client during the snapshot loading.
> -------------------------------------------------------------------------------------------------------------
>
> Key: ZOOKEEPER-3713
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3713
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.6.0, 3.4.14, 3.5.6
> Reporter: Pierre Yin
> Priority: Major
> Labels: pull-request-available
>
> A Follower/Observer may load a snapshot from disk or from the leader in some
> scenarios. During the snapshot loading, the follower/observer may lose its
> connection to the leader if the network breaks. In the current design, the
> follower/observer switches into ReadOnly mode immediately when the
> connection to the leader is lost, so it may become a
> ReadOnlyZooKeeperServer before the ZKDatabase has finished loading the
> snapshot. The time window between the follower/observer's successful switch
> to ReadOnly mode and the completion of the ZKDatabase's snapshot loading is
> unsafe.
> The unsafe window may confuse Curator's NodeCache. If NodeCache's underlying
> reconnection hits the unsafe window, it may get a NoNode KeeperException for
> the specified path and clear the NodeCache. Once the unsafe window has
> elapsed, NodeCache can see the data again.
> This behavior is not correct. From the client's view, it gets a null value
> for a short period when the server ensemble's network is broken. Curator
> NodeCache is often used as a configuration source. Returning null is
> confusing and introduces logic errors in configuration scenarios.
> I think the better behavior is to reject all reconnections during
> the unsafe window. NodeCache still keeps the old data when a reconnection is
> rejected, which is the sensible outcome.
> I will send my patch later. I hope someone can help review it.
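
The proposed behavior can be illustrated with a minimal, self-contained Java
sketch. This is not the reporter's patch and does not use ZooKeeper's real
classes: ReadOnlyGateSketch, ConnectResult, loadSnapshot, connect, and read are
all hypothetical names chosen for illustration. The sketch only assumes that
the server can track whether its database has finished loading, and that it
refuses client sessions until then instead of answering from an empty database.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for a read-only server; NOT ZooKeeper's actual class. */
class ReadOnlyGateSketch {

    /** Hypothetical outcome of a client connection attempt. */
    enum ConnectResult { ACCEPTED, REJECTED_NOT_READY }

    // Toy replacement for the in-memory database: path -> data.
    private final Map<String, String> database = new ConcurrentHashMap<>();

    // volatile so a connection-handling thread sees the flag set by a loader thread.
    private volatile boolean databaseInitialized = false;

    /** Simulates snapshot loading; the ready flag is set only after all data is in place. */
    void loadSnapshot(Map<String, String> snapshot) {
        database.putAll(snapshot);
        databaseInitialized = true;
    }

    /** Rejects connection attempts that arrive during the unsafe window. */
    ConnectResult connect() {
        return databaseInitialized
                ? ConnectResult.ACCEPTED
                : ConnectResult.REJECTED_NOT_READY;
    }

    /** Reads a path; refuses to answer from a database that is not fully loaded. */
    Optional<String> read(String path) {
        if (!databaseInitialized) {
            throw new IllegalStateException("database not initialized");
        }
        return Optional.ofNullable(database.get(path));
    }

    public static void main(String[] args) {
        ReadOnlyGateSketch server = new ReadOnlyGateSketch();

        // A reconnect that lands inside the unsafe window is refused.
        System.out.println(server.connect());

        // Once the snapshot is loaded, clients are accepted and see real data.
        server.loadSnapshot(Map.of("/config", "v1"));
        System.out.println(server.connect());
        System.out.println(server.read("/config").orElse("<no node>"));
    }
}
```

With a gate of this kind, a client that reconnects too early gets a failed
connection rather than a misleading "no node" answer, so a cache on the client
side has no reason to discard the data it already holds.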
--
This message was sent by Atlassian Jira
(v8.3.4#803005)