[jira] [Commented] (CURATOR-311) SharedValue could hold stall data when quourm membership changes

ASF GitHub Bot (JIRA) Tue, 10 Jan 2017 09:02:49 -0800

    [ https://issues.apache.org/jira/browse/CURATOR-311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15815503#comment-15815503 ]


ASF GitHub Bot commented on CURATOR-311:
----------------------------------------

Github user oza commented on a diff in the pull request:

    https://github.com/apache/curator/pull/193#discussion_r95409077
  
    --- Diff: curator-recipes/src/main/java/org/apache/curator/framework/recipes/shared/SharedValue.java ---
    @@ -75,14 +75,41 @@ public void process(WatchedEvent event) throws Exception
             @Override
             public void stateChanged(CuratorFramework client, ConnectionState newState)
             {
    +            handleStateChange(newState);
                 notifyListenerOfStateChanged(newState);
             }
         };
     
    +    private void handleStateChange(ConnectionState newState) {
    +        // LOST: close should be called from user-defined listener
    +        // CONNECTED: nothing to do
    +        switch ( newState )
    +        {
    +            case SUSPENDED:
    +                state.compareAndSet(State.STARTED, State.STALE);
    +                break;
    +
    +            case RECONNECTED:
    +                // update currentValue
    +                try {
    +                    state.compareAndSet(State.STALE, State.STARTING);
    +                    startInternal();
    +                    state.compareAndSet(State.STARTING, State.STARTED);
    +                } catch (Exception e) {
    +                    ThreadUtils.checkInterrupted(e);
    +                    log.error("Could not re-start instances after reconnection", e);
    --- End diff --
    
    I would like to know whether this error handling is correct. Please let me 
know if you know a better way.


> SharedValue could hold stall data when quourm membership changes
> ----------------------------------------------------------------
>
>                 Key: CURATOR-311
>                 URL: https://issues.apache.org/jira/browse/CURATOR-311
>             Project: Apache Curator
>          Issue Type: Bug
>          Components: Recipes
>    Affects Versions: 3.1.0
>         Environment: Linux
>            Reporter: Jian Fang
>
> We run a ZooKeeper 3.5.1-alpha quorum on EC2 instances, and the quorum 
> membership can change: for example, a peer may be replaced by a new EC2 
> instance after an EC2 instance termination. We use Apache Curator 3.1.0 as 
> the ZooKeeper client. During our testing, we found that the SharedValue data 
> structure could hold stale data during and after the replacement of a peer, 
> which led to a system failure. 
> We looked into the SharedValue code. It seems that it always returns the 
> value from an in-memory reference variable, and that value is only updated 
> by a watcher. If the watch is lost for any reason, the value never gets a 
> chance to be updated again.
>  
> As a workaround, we added a connection state listener that forces 
> SharedValue to call readValue(), i.e., to read the data from ZooKeeper 
> directly, whenever the connection state changes to RECONNECTED.
> It would be great if this issue could be fixed in Curator directly.
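
To make the failure mode in the description easier to follow, here is a minimal, dependency-free sketch of it. Nothing in it is Curator code: FakeStore, CachedValue, and ConnState are hypothetical stand-ins for the ZooKeeper node, SharedValue, and Curator's ConnectionState, and the one-shot watch is a deliberately simplified model. It only illustrates why a cached value that is refreshed solely by a watcher can go stale, and how re-reading on RECONNECTED (the reporter's workaround) recovers.

```java
class StaleValueSketch {
    /** Hypothetical enum mirroring the ConnectionState names used in this thread. */
    enum ConnState { CONNECTED, SUSPENDED, RECONNECTED, LOST }

    /** Hypothetical stand-in for a ZooKeeper node with a single one-shot watch. */
    static final class FakeStore {
        private String data;
        private Runnable watch;

        FakeStore(String initial) { data = initial; }

        /** Returns the current data and registers a one-shot watch. */
        String read(Runnable newWatch) {
            watch = newWatch;
            return data;
        }

        /** Updates the data and fires the registered watch, if any, exactly once. */
        void write(String value) {
            data = value;
            Runnable w = watch;
            watch = null;
            if (w != null) {
                w.run();
            }
        }

        /** Simulates the watch being lost, e.g. while a quorum peer is replaced. */
        void dropWatch() { watch = null; }
    }

    /** Hypothetical stand-in for SharedValue: serves reads from an in-memory copy. */
    static final class CachedValue {
        private final FakeStore store;
        private volatile String current;

        CachedValue(FakeStore store) {
            this.store = store;
            readValue();
        }

        /** Re-reads from the store and re-registers the watch. */
        void readValue() { current = store.read(this::readValue); }

        /** Never touches the store; returns whatever was cached last. */
        String getValue() { return current; }

        /** The workaround from the description: force a re-read on RECONNECTED. */
        void stateChanged(ConnState newState) {
            if (newState == ConnState.RECONNECTED) {
                readValue();
            }
        }
    }

    public static void main(String[] args) {
        FakeStore store = new FakeStore("v1");
        CachedValue cached = new CachedValue(store);

        store.write("v2");                      // watch fires, cache follows
        store.dropWatch();                      // watch is lost
        store.write("v3");                      // nobody is notified
        System.out.println(cached.getValue());  // prints "v2" (stale)

        cached.stateChanged(ConnState.RECONNECTED);
        System.out.println(cached.getValue());  // prints "v3"
    }
}
```

In this model the cache never recovers on its own once the watch is gone, which matches the behaviour described above; whether the real watch is lost in the same way during a quorum change is an assumption of the sketch, not something it demonstrates.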



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
