[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13804344#comment-13804344
 ] 

Germán Blanco commented on ZOOKEEPER-1777:
------------------------------------------

I have received a different suggestion that has less impact. The idea would be 
to reserve some bits of the zxid for a sanity check (e.g. 12 bits).
That means that the zxid will roll over more often, but the remaining space for 
zxid+epoch (51 bits) should still last for more than one hundred years. 
The sanity check value will be calculated by the leader when it increments the 
zxid, and it can be e.g. a random number.
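
To make the bit layout concrete, here is a minimal sketch of how such a zxid
could be packed. Everything in it is an assumption for illustration: the class
and method names are hypothetical, the check value is placed in the low 12
bits, the sign bit is left unused (which is one way to arrive at 63 - 12 = 51
bits), and epoch and counter are treated as a single 51-bit sequence. It is
not ZooKeeper's actual zxid format.

```java
import java.security.SecureRandom;

/** Hypothetical zxid layout: [sign bit unused][51-bit sequence][12-bit check]. */
final class CheckedZxid {
    static final int CHECK_BITS = 12;                       // bits reserved for the sanity check
    static final long CHECK_MASK = (1L << CHECK_BITS) - 1;  // 0xFFF
    static final int SEQ_BITS = 51;                         // bits left for epoch + counter
    static final long MAX_SEQ = (1L << SEQ_BITS) - 1;

    private static final SecureRandom RANDOM = new SecureRandom();

    /** Combines a sequence number and a check value into one 64-bit zxid. */
    static long pack(long seq, int check) {
        if (seq < 0 || seq > MAX_SEQ) {
            throw new IllegalArgumentException("sequence out of range (rollover needed)");
        }
        if (check < 0 || check > CHECK_MASK) {
            throw new IllegalArgumentException("check value out of range");
        }
        return (seq << CHECK_BITS) | check;
    }

    /** What the leader would do when increasing the zxid: bump the sequence, draw a new check. */
    static long next(long previous) {
        return pack(seqOf(previous) + 1, RANDOM.nextInt(1 << CHECK_BITS));
    }

    static long seqOf(long zxid) {
        return zxid >>> CHECK_BITS;
    }

    static int checkOf(long zxid) {
        return (int) (zxid & CHECK_MASK);
    }
}
```

With the sequence in the high bits, a later zxid still compares greater than
an earlier one regardless of the random check value, so ordering by zxid is
preserved in this layout.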
When a follower connects to a leader, or a client connects to a server, the 
leader or server will only check whether it sees this zxid in its 
transaction history. If it is not there, then a warning is logged and either a 
snapshot is sent to the follower or the client connection is closed.
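
The connection-time check described above could look roughly like the sketch
below. The names (ZxidHistoryCheck, PeerType, Action) are hypothetical and do
not correspond to existing ZooKeeper classes, and the history is assumed to be
a simple in-memory set of zxids, whereas a real server would consult its
transaction log.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch of the "is this zxid in my history?" check. */
final class ZxidHistoryCheck {
    enum PeerType { FOLLOWER, CLIENT }
    enum Action { ACCEPT, SEND_SNAPSHOT, CLOSE_CONNECTION }

    // Assumption: every zxid (including its check bits) seen by this server.
    private final Set<Long> history = new HashSet<>();

    /** Records a zxid as part of this server's transaction history. */
    void record(long zxid) {
        history.add(zxid);
    }

    /** Decides what to do with a peer that reports lastSeenZxid on connect. */
    Action onConnect(PeerType type, long lastSeenZxid) {
        if (history.contains(lastSeenZxid)) {
            return Action.ACCEPT;
        }
        // The peer's history has diverged from ours: warn, then resync or reject.
        System.err.println("WARN: zxid 0x" + Long.toHexString(lastSeenZxid)
                + " not found in local transaction history");
        return type == PeerType.FOLLOWER ? Action.SEND_SNAPSHOT : Action.CLOSE_CONNECTION;
    }
}
```

Because the check bits are part of the zxid, a peer whose last transaction has
the same epoch and counter but came from a different history would, with high
probability, present a value that is not in the set and fail the lookup.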
As far as I can see, there is no need to modify any protocol or storage for 
this, and most likely the biggest impact will be on the test cases. However, if 
this is a configuration option, it will also be possible to disable the check 
and so avoid failures in some of the test cases.
Any comments or opinions?

> Missing ephemeral nodes in one of the members of the ensemble
> -------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1777
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1777
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.4.5
>         Environment: Linux, Java 1.7
>            Reporter: Germán Blanco
>            Assignee: Germán Blanco
>            Priority: Critical
>             Fix For: 3.5.0
>
>         Attachments: logs_trunk.tar.gz, snaps.tar, ZOOKEEPER-1777-3.4.patch, 
> ZOOKEEPER-1777.patch, ZOOKEEPER-1777.patch, ZOOKEEPER-1777.tar.gz
>
>
> In a 3-server ensemble, one of the followers doesn't see part of the 
> ephemeral nodes that are present in the leader and the other follower. 
> The 8 missing nodes in "the follower that is not ok" were created at the end 
> of epoch 1; the ensemble is now running in epoch 2.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
