[
https://issues.apache.org/jira/browse/CURATOR-311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15818098#comment-15818098
]
ASF GitHub Bot commented on CURATOR-311:
----------------------------------------
Github user oza commented on the issue:
https://github.com/apache/curator/pull/193
@Randgalt I was able to create a test case that reproduces the problem by
injecting an emulated faulty watcher instead of using a pseudo cluster. Could
you check it?
> SharedValue could hold stale data when quorum membership changes
> ----------------------------------------------------------------
>
> Key: CURATOR-311
> URL: https://issues.apache.org/jira/browse/CURATOR-311
> Project: Apache Curator
> Issue Type: Bug
> Components: Recipes
> Affects Versions: 3.1.0
> Environment: Linux
> Reporter: Jian Fang
>
> We run a ZooKeeper 3.5.1-alpha quorum on EC2 instances, and the quorum
> membership can change: for example, a peer may be replaced by a new EC2
> instance after an EC2 instance termination. We use Apache Curator 3.1.0 as the
> ZooKeeper client. During our testing, we found that the SharedValue data
> structure could hold stale data during and after the replacement of a peer,
> which led to a system failure.
> We looked into the SharedValue code. It seems to always return the value from
> an in-memory reference variable, and that value is only updated by a watcher.
> If the watch is lost for any reason, the value never gets a chance to be
> updated again.
>
> As a workaround, we have added a connection state listener that forces
> SharedValue to call readValue(), i.e., to read the data from ZooKeeper
> directly, whenever the connection state changes to RECONNECTED.
> It would be great if this issue could be fixed in Curator directly.
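
The failure mode and the workaround described above can be sketched with a small, self-contained simulation. This is only an illustration under stated assumptions, not Curator's implementation: FakeNode, LinkState, CachedValue and StaleValueDemo are hypothetical names invented for this sketch, and no Curator or ZooKeeper classes are used. CachedValue models a value that is served from memory and refreshed only when a watch fires; onStateChanged models the reporter's listener that re-reads on RECONNECTED.

```java
// Hypothetical stand-in for the ZooKeeper node; not a Curator or ZooKeeper class.
class FakeNode {
    private volatile String data;

    FakeNode(String initial) { this.data = initial; }

    String read() { return data; }

    void write(String newData) { this.data = newData; }
}

// Hypothetical connection states, named after the states mentioned in the report.
enum LinkState { CONNECTED, SUSPENDED, LOST, RECONNECTED }

// Hypothetical model of a watcher-updated cache, mirroring the behavior
// the report attributes to SharedValue.
class CachedValue {
    private final FakeNode node;
    private volatile String cached;

    CachedValue(FakeNode node) {
        this.node = node;
        this.cached = node.read();
    }

    // Always served from the in-memory copy, never from the node.
    String getValue() { return cached; }

    // Invoked only when a watch fires; if the watch is lost, this never runs.
    void onWatchFired() { readValue(); }

    // Reads the current data directly from the node and refreshes the cache.
    void readValue() { cached = node.read(); }

    // The workaround: force a direct read when the connection comes back.
    void onStateChanged(LinkState newState) {
        if (newState == LinkState.RECONNECTED) {
            readValue();
        }
    }
}

class StaleValueDemo {
    public static void main(String[] args) {
        FakeNode node = new FakeNode("v1");
        CachedValue value = new CachedValue(node);

        // The node changes while the watch is lost, so onWatchFired() is never called.
        node.write("v2");
        System.out.println("after lost watch: " + value.getValue());

        // The listener sees RECONNECTED and forces a direct read.
        value.onStateChanged(LinkState.RECONNECTED);
        System.out.println("after reconnect:  " + value.getValue());
    }
}
```

In this sketch the cached value stays at its old content until the RECONNECTED transition triggers readValue(). How the reporter actually wired the listener into SharedValue is not shown in the issue, so that part remains an assumption.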
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)