[ https://issues.apache.org/jira/browse/CURATOR-311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15815503#comment-15815503 ]
ASF GitHub Bot commented on CURATOR-311:
----------------------------------------
Github user oza commented on a diff in the pull request:
https://github.com/apache/curator/pull/193#discussion_r95409077
--- Diff:
curator-recipes/src/main/java/org/apache/curator/framework/recipes/shared/SharedValue.java
---
@@ -75,14 +75,41 @@ public void process(WatchedEvent event) throws Exception
@Override
public void stateChanged(CuratorFramework client, ConnectionState newState)
{
+ handleStateChange(newState);
notifyListenerOfStateChanged(newState);
}
};
+ private void handleStateChange(ConnectionState newState) {
+ // LOST: close should be called from user-defined listener
+ // CONNECTED: nothing to do
+ switch ( newState )
+ {
+ case SUSPENDED:
+ state.compareAndSet(State.STARTED, State.STALE);
+ break;
+
+ case RECONNECTED:
+ // update currentValue
+ try {
+ state.compareAndSet(State.STALE, State.STARTING);
+ startInternal();
+ state.compareAndSet(State.STARTING, State.STARTED);
+ } catch (Exception e) {
+ ThreadUtils.checkInterrupted(e);
+ log.error("Could not re-start instances after reconnection", e);
--- End diff --
I would like to know whether this error handling is correct. Please let me
know if you know a better way.
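One possible direction, as a minimal sketch only: check the result of the STALE -> STARTING compareAndSet before refreshing, and roll the state back to STALE if the refresh fails so that a later RECONNECTED can retry. The class below is a hypothetical stand-in, not Curator's SharedValue; CachedValueSketch, Conn, and the reader supplier are names invented for illustration, and the supplier stands in for the read that startInternal() would perform.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical stand-in for the recipe; this is NOT Curator's SharedValue.
class CachedValueSketch {
    enum State { STARTED, STALE, STARTING }
    enum Conn { CONNECTED, SUSPENDED, RECONNECTED, LOST }

    private final AtomicReference<State> state = new AtomicReference<>(State.STARTED);
    private final Supplier<String> reader; // assumption: stands in for a read from ZooKeeper
    private volatile String currentValue;

    CachedValueSketch(Supplier<String> reader) {
        this.reader = reader;
        this.currentValue = reader.get();
    }

    void handleStateChange(Conn newState) {
        switch (newState) {
            case SUSPENDED:
                // The cached value can no longer be trusted.
                state.compareAndSet(State.STARTED, State.STALE);
                break;
            case RECONNECTED:
                // Refresh only if this thread wins the STALE -> STARTING transition.
                if (!state.compareAndSet(State.STALE, State.STARTING)) {
                    break;
                }
                try {
                    currentValue = reader.get();
                    state.compareAndSet(State.STARTING, State.STARTED);
                } catch (RuntimeException e) {
                    // Roll back so that a later RECONNECTED can retry the refresh.
                    state.compareAndSet(State.STARTING, State.STALE);
                }
                break;
            default:
                // CONNECTED and LOST: nothing to do in this sketch.
                break;
        }
    }

    State state() { return state.get(); }
    String value() { return currentValue; }
}
```

With this shape a failed refresh leaves the instance marked STALE rather than stuck in STARTING; whether that fits the intended semantics of the recipe is a separate question.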
> SharedValue could hold stale data when quorum membership changes
> ----------------------------------------------------------------
>
> Key: CURATOR-311
> URL: https://issues.apache.org/jira/browse/CURATOR-311
> Project: Apache Curator
> Issue Type: Bug
> Components: Recipes
> Affects Versions: 3.1.0
> Environment: Linux
> Reporter: Jian Fang
>
> We run a ZooKeeper 3.5.1-alpha quorum on EC2 instances, and the quorum
> membership can change: for example, a peer may be replaced by a new EC2
> instance after an instance termination. We use Apache Curator 3.1.0 as the
> ZooKeeper client. During testing, we found that SharedValue could hold stale
> data during and after the replacement of a peer, which led to a system
> failure.
> We looked into the SharedValue code. It seems to always return the value from
> an in-memory reference variable, and that value is only updated by a watcher.
> If the watch is lost for any reason, the value never gets a chance to be
> updated again.
>
> As a workaround, we added a connection state listener that forces SharedValue
> to call readValue(), i.e., to read the data from ZooKeeper directly, whenever
> the connection state changes to RECONNECTED.
> It would be great if this issue could be fixed in Curator directly.
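The failure mode and the workaround described in the issue can be modelled roughly as follows. This is a self-contained sketch under stated assumptions, not the reporter's code and not Curator's API: WatchedCache, RefreshOnReconnect, Conn, and the store supplier are hypothetical names, and the supplier stands in for a direct read from ZooKeeper.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical model of a value that is served from memory and refreshed
// only when a watch fires. It does not use any Curator classes.
class WatchedCache {
    private final Supplier<String> store; // assumption: stands in for ZooKeeper
    private final AtomicReference<String> cached = new AtomicReference<>();

    WatchedCache(Supplier<String> store) {
        this.store = store;
        readValue();
    }

    // Called when the watch fires; if the watch is lost, this is never called.
    void onWatchFired() { readValue(); }

    // Reads the backing store directly and replaces the in-memory copy.
    void readValue() { cached.set(store.get()); }

    // Always answers from memory, never from the store.
    String getValue() { return cached.get(); }
}

// Hypothetical connection state listener that forces a re-read on RECONNECTED.
class RefreshOnReconnect {
    enum Conn { CONNECTED, SUSPENDED, RECONNECTED, LOST }

    private final WatchedCache cache;

    RefreshOnReconnect(WatchedCache cache) { this.cache = cache; }

    void stateChanged(Conn newState) {
        if (newState == Conn.RECONNECTED) {
            cache.readValue();
        }
    }
}
```

In this model, a change made to the store while the watch is lost stays invisible to getValue() until stateChanged(Conn.RECONNECTED) triggers the direct read.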