[
https://issues.apache.org/jira/browse/ZOOKEEPER-1777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13786083#comment-13786083
]
Flavio Junqueira commented on ZOOKEEPER-1777:
---------------------------------------------
Ok, thanks for the clarification. Here are my current thoughts then. In the A,
B, C scenario above, I would have expected A to truncate its history and
adopt the state of B and C, losing some transactions. But, according to your
observations, A is not getting some of the new transactions of B and C, which
does not match my expectation, so that is the only thing I believe we would
need to fix, if anything.
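
To make that expectation concrete, here is a minimal sketch of "truncate and adopt". It is a toy model, not ZooKeeper's actual synchronization code: the function name truncate_and_sync, the list-of-zxids log representation, and the sample zxids are all assumptions made for illustration.

```python
def truncate_and_sync(follower_log, quorum_log):
    """Toy model of a follower adopting the quorum's history.

    Logs are lists of (epoch, counter) zxids in log order.
    Returns (new_follower_log, dropped_txns).
    """
    # Length of the longest common prefix of the two histories.
    common = 0
    for mine, theirs in zip(follower_log, quorum_log):
        if mine != theirs:
            break
        common += 1

    dropped = follower_log[common:]          # txns only the follower had
    new_log = follower_log[:common]          # truncate to the shared prefix
    new_log = new_log + quorum_log[common:]  # then replay what the quorum has
    return new_log, dropped


# Hypothetical histories: A accepted two extra txns in epoch 1,
# while B and C moved on to epoch 2 without them.
a_log = [(1, 1), (1, 2), (1, 3), (1, 4)]
bc_log = [(1, 1), (1, 2), (2, 1)]

new_a, lost = truncate_and_sync(a_log, bc_log)
print(new_a)  # [(1, 1), (1, 2), (2, 1)]
print(lost)   # [(1, 3), (1, 4)]
```

In this model A ends up with exactly the history of B and C. The reported behavior would correspond to A keeping a history that lacks some of the transactions B and C hold.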
The thing that throws me is that in any case your data is broken! I'm not
exactly sure why it matters to have a consistent ensemble with a broken state.
One point you made is that we could try to detect the problem so that A decides
not to make further progress. Unfortunately, A can't determine whether the
transactions that it has, and that B and C lack, were committed or not. The txn
log only contains accepted txns, with no record of whether a quorum
acknowledged them. A cannot distinguish this run from one in which A is the
only one that has accepted these txns.
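
The point about A's log can be sketched as follows. This is a simplified model under stated assumptions, not ZooKeeper code: the simulate helper, the single hard-coded zxid, and the majority rule for a three-server ensemble are illustrative only.

```python
def simulate(acceptors, ensemble=("A", "B", "C")):
    """Toy run in which one proposal is accepted by the given servers.

    Returns (logs, committed): each server's log of accepted zxids, and
    whether a majority of the ensemble accepted the proposal.
    """
    zxid = (1, 1)  # hypothetical (epoch, counter) of the proposal
    logs = {server: [] for server in ensemble}
    for server in acceptors:
        logs[server].append(zxid)  # the log records acceptance only
    committed = len(acceptors) > len(ensemble) // 2
    return logs, committed


# Run 1: A and B accept, so the txn is committed by a quorum.
quorum_logs, quorum_committed = simulate(["A", "B"])
# Run 2: only A accepts, so the txn is never committed.
solo_logs, solo_committed = simulate(["A"])

print(quorum_committed, solo_committed)       # True False
print(quorum_logs["A"] == solo_logs["A"])     # True
```

The two runs differ in outcome, yet A's local log is identical in both, so A cannot use its own log to decide whether to stop making progress.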
Given that we are discussing an invalid scenario, I'd like to propose that we
lower the severity of this issue to major or critical. This means that it is
not a blocker for 3.4.6, but we could still try to get this in if we can
converge on a solution.
> Missing ephemeral nodes in one of the members of the ensemble
> -------------------------------------------------------------
>
> Key: ZOOKEEPER-1777
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1777
> Project: ZooKeeper
> Issue Type: Bug
> Components: quorum
> Affects Versions: 3.4.5
> Environment: Linux, Java 1.7
> Reporter: Germán Blanco
> Assignee: Germán Blanco
> Priority: Blocker
> Fix For: 3.4.6, 3.5.0
>
> Attachments: logs_trunk.tar.gz, snaps.tar, ZOOKEEPER-1777.tar.gz
>
>
> In a 3-server ensemble, one of the followers doesn't see some of the
> ephemeral nodes that are present in the leader and the other follower.
> The 8 nodes missing from "the follower that is not ok" were created at the
> end of epoch 1; the ensemble is now running in epoch 2.