[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ZOOKEEPER-3713:
--------------------------------------
    Labels: pull-request-available  (was: )

> ReadOnlyZooKeeperServer should not expose the uninitialized ZKDatabase to 
> client during the snapshot loading.
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-3713
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3713
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.6.0, 3.4.14, 3.5.6
>            Reporter: Pierre Yin
>            Priority: Major
>              Labels: pull-request-available
>
> A follower or observer may load a snapshot from disk or from the leader in 
> some scenarios. If the network breaks while the snapshot is loading, the 
> follower/observer loses its connection to the leader. In the current 
> design, a follower/observer switches into read-only mode as soon as the 
> connection to the leader is lost, so it may become a 
> ReadOnlyZooKeeperServer before the snapshot has been fully loaded into the 
> ZKDatabase. The window between the successful switch to read-only mode and 
> the completion of the ZKDatabase snapshot load is unsafe. 
> The unsafe window can confuse Curator's NodeCache. If NodeCache's 
> underlying reconnection hits the unsafe window, it may get a NoNode 
> KeeperException for the specified path and clear the NodeCache. Once the 
> unsafe window has elapsed, NodeCache can see the data again.
> This behavior is not correct. From the client's point of view, it gets a 
> null value for a short period when the server ensemble's network is 
> broken. Curator's NodeCache is often used as a configuration source. 
> Returning null is confusing and introduces logic errors in configuration 
> scenarios.
> I think the better behavior is to reject all reconnection attempts during 
> the unsafe window. NodeCache then keeps the old data when a reconnection 
> is rejected, which makes sense.
> I will send my patch later. I hope someone can help review it.
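
The proposed behavior can be illustrated with a minimal, self-contained 
sketch. It is not the actual patch and does not use ZooKeeper's classes: 
SketchDatabase, SketchReadOnlyServer, and ConnectionRejectedException are 
hypothetical names invented for illustration. The only idea it shows is 
that a read-only server rejects client connections until the snapshot has 
been fully loaded, instead of answering from an empty database.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;

public class ReadOnlyGateSketch {

    /** Hypothetical stand-in for a database that is filled by a snapshot load. */
    static final class SketchDatabase {
        private final Map<String, String> nodes = new ConcurrentHashMap<>();
        private final AtomicBoolean initialized = new AtomicBoolean(false);

        /** Copies the snapshot in, then marks the database as initialized. */
        void loadSnapshot(Map<String, String> snapshot) {
            nodes.putAll(snapshot);
            initialized.set(true);
        }

        boolean isInitialized() {
            return initialized.get();
        }

        Optional<String> read(String path) {
            return Optional.ofNullable(nodes.get(path));
        }
    }

    /** Hypothetical error signalling that the server is not ready for clients. */
    static final class ConnectionRejectedException extends Exception {
        ConnectionRejectedException(String message) {
            super(message);
        }
    }

    /** Hypothetical read-only server that gates connections on database state. */
    static final class SketchReadOnlyServer {
        private final SketchDatabase db;

        SketchReadOnlyServer(SketchDatabase db) {
            this.db = db;
        }

        /**
         * Rejects the connection during the unsafe window. Without this check
         * the read would return an empty result, which a client would
         * interpret as "node does not exist".
         */
        Optional<String> connectAndRead(String path) throws ConnectionRejectedException {
            if (!db.isInitialized()) {
                throw new ConnectionRejectedException("database not initialized yet");
            }
            return db.read(path);
        }
    }

    public static void main(String[] args) {
        SketchDatabase db = new SketchDatabase();
        SketchReadOnlyServer server = new SketchReadOnlyServer(db);

        try {
            server.connectAndRead("/config");
        } catch (ConnectionRejectedException e) {
            // The client keeps its previously cached value in this case.
            System.out.println("rejected: " + e.getMessage());
        }

        db.loadSnapshot(Map.of("/config", "v1"));

        try {
            System.out.println("read: " + server.connectAndRead("/config").orElse("<none>"));
        } catch (ConnectionRejectedException e) {
            System.out.println("unexpected rejection: " + e.getMessage());
        }
    }
}
```

With a gate like this, a client that reconnects during the unsafe window 
sees a failed connection rather than a missing node, so a cache on the 
client side has no reason to discard its old data.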



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
