[jira] [Updated] (ZOOKEEPER-1777) Missing ephemeral nodes in one of the members of the ensemble
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-1777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tamas Penzes updated ZOOKEEPER-1777:
------------------------------------
    Fix Version/s:     (was: 3.5.5)

> Missing ephemeral nodes in one of the members of the ensemble
> -------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1777
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1777
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.4.5
>         Environment: Linux, Java 1.7
>            Reporter: Germán Blanco
>            Assignee: Germán Blanco
>            Priority: Critical
>             Fix For: 3.6.0
>
>         Attachments: ZOOKEEPER-1777-3.4.patch, ZOOKEEPER-1777.patch,
> ZOOKEEPER-1777.patch, ZOOKEEPER-1777.tar.gz, logs_trunk.tar.gz, snaps.tar
>
>
> In a 3-servers ensemble, one of the followers doesn't see part of the
> ephemeral nodes that are present in the leader and the other follower.
> The 8 missing nodes in "the follower that is not ok" were created in the end
> of epoch 1, the ensemble is running in epoch 2.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Updated] (ZOOKEEPER-1777) Missing ephemeral nodes in one of the members of the ensemble
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-1777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tamas Penzes updated ZOOKEEPER-1777:
------------------------------------
    Priority: Major  (was: Critical)

> Missing ephemeral nodes in one of the members of the ensemble
> -------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1777
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1777
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.4.5
>         Environment: Linux, Java 1.7
>            Reporter: Germán Blanco
>            Assignee: Germán Blanco
>            Priority: Major
>             Fix For: 3.6.0
>
>         Attachments: ZOOKEEPER-1777-3.4.patch, ZOOKEEPER-1777.patch,
> ZOOKEEPER-1777.patch, ZOOKEEPER-1777.tar.gz, logs_trunk.tar.gz, snaps.tar
>
>
> In a 3-servers ensemble, one of the followers doesn't see part of the
> ephemeral nodes that are present in the leader and the other follower.
> The 8 missing nodes in "the follower that is not ok" were created in the end
> of epoch 1, the ensemble is running in epoch 2.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Updated] (ZOOKEEPER-1777) Missing ephemeral nodes in one of the members of the ensemble
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-1777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Han updated ZOOKEEPER-1777:
-----------------------------------
    Fix Version/s:     (was: 3.5.3)
                   3.5.4

> Missing ephemeral nodes in one of the members of the ensemble
> -------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1777
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1777
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.4.5
>         Environment: Linux, Java 1.7
>            Reporter: Germán Blanco
>            Assignee: Germán Blanco
>            Priority: Critical
>             Fix For: 3.5.4, 3.6.0
>
>         Attachments: logs_trunk.tar.gz, snaps.tar, ZOOKEEPER-1777-3.4.patch,
> ZOOKEEPER-1777.patch, ZOOKEEPER-1777.patch, ZOOKEEPER-1777.tar.gz
>
>
> In a 3-servers ensemble, one of the followers doesn't see part of the
> ephemeral nodes that are present in the leader and the other follower.
> The 8 missing nodes in "the follower that is not ok" were created in the end
> of epoch 1, the ensemble is running in epoch 2.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[jira] [Updated] (ZOOKEEPER-1777) Missing ephemeral nodes in one of the members of the ensemble
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-1777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Nauroth updated ZOOKEEPER-1777:
-------------------------------------
    Fix Version/s:     (was: 3.5.2)
                   3.5.3

> Missing ephemeral nodes in one of the members of the ensemble
> -------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1777
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1777
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.4.5
>         Environment: Linux, Java 1.7
>            Reporter: Germán Blanco
>            Assignee: Germán Blanco
>            Priority: Critical
>             Fix For: 3.6.0, 3.5.3
>
>         Attachments: ZOOKEEPER-1777-3.4.patch, ZOOKEEPER-1777.patch,
> ZOOKEEPER-1777.patch, ZOOKEEPER-1777.tar.gz, logs_trunk.tar.gz, snaps.tar
>
>
> In a 3-servers ensemble, one of the followers doesn't see part of the
> ephemeral nodes that are present in the leader and the other follower.
> The 8 missing nodes in "the follower that is not ok" were created in the end
> of epoch 1, the ensemble is running in epoch 2.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (ZOOKEEPER-1777) Missing ephemeral nodes in one of the members of the ensemble
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-1777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Germán Blanco updated ZOOKEEPER-1777:
-------------------------------------
    Fix Version/s:     (was: 3.4.6)

> Missing ephemeral nodes in one of the members of the ensemble
> -------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1777
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1777
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.4.5
>         Environment: Linux, Java 1.7
>            Reporter: Germán Blanco
>            Assignee: Germán Blanco
>            Priority: Critical
>             Fix For: 3.5.0
>
>         Attachments: logs_trunk.tar.gz, snaps.tar, ZOOKEEPER-1777-3.4.patch,
> ZOOKEEPER-1777.patch, ZOOKEEPER-1777.patch, ZOOKEEPER-1777.tar.gz
>
>
> In a 3-servers ensemble, one of the followers doesn't see part of the
> ephemeral nodes that are present in the leader and the other follower.
> The 8 missing nodes in "the follower that is not ok" were created in the end
> of epoch 1, the ensemble is running in epoch 2.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Updated] (ZOOKEEPER-1777) Missing ephemeral nodes in one of the members of the ensemble
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-1777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Flavio Junqueira updated ZOOKEEPER-1777:
----------------------------------------
    Priority: Critical  (was: Blocker)

> Missing ephemeral nodes in one of the members of the ensemble
> -------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1777
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1777
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.4.5
>         Environment: Linux, Java 1.7
>            Reporter: Germán Blanco
>            Assignee: Germán Blanco
>            Priority: Critical
>             Fix For: 3.4.6, 3.5.0
>
>         Attachments: logs_trunk.tar.gz, snaps.tar, ZOOKEEPER-1777-3.4.patch,
> ZOOKEEPER-1777.patch, ZOOKEEPER-1777.patch, ZOOKEEPER-1777.tar.gz
>
>
> In a 3-servers ensemble, one of the followers doesn't see part of the
> ephemeral nodes that are present in the leader and the other follower.
> The 8 missing nodes in "the follower that is not ok" were created in the end
> of epoch 1, the ensemble is running in epoch 2.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Updated] (ZOOKEEPER-1777) Missing ephemeral nodes in one of the members of the ensemble
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-1777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Germán Blanco updated ZOOKEEPER-1777:
-------------------------------------
    Attachment: ZOOKEEPER-1777.patch

The updated version of the patch for trunk seems to pass the regression tests. However, it still lacks a specific test for the error reported in this JIRA. It is not intended to be the final version; it is there to help with the discussion of the proposal.

> Missing ephemeral nodes in one of the members of the ensemble
> -------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1777
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1777
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.4.5
>         Environment: Linux, Java 1.7
>            Reporter: Germán Blanco
>            Assignee: Germán Blanco
>            Priority: Blocker
>             Fix For: 3.4.6, 3.5.0
>
>         Attachments: logs_trunk.tar.gz, snaps.tar, ZOOKEEPER-1777-3.4.patch,
> ZOOKEEPER-1777.patch, ZOOKEEPER-1777.patch, ZOOKEEPER-1777.tar.gz
>
>
> In a 3-servers ensemble, one of the followers doesn't see part of the
> ephemeral nodes that are present in the leader and the other follower.
> The 8 missing nodes in "the follower that is not ok" were created in the end
> of epoch 1, the ensemble is running in epoch 2.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Updated] (ZOOKEEPER-1777) Missing ephemeral nodes in one of the members of the ensemble
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-1777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Germán Blanco updated ZOOKEEPER-1777:
-------------------------------------
    Attachment: ZOOKEEPER-1777.patch

For my case there is a simple solution: since our snapshots are very small, we have already applied a patch that forces snapshot synchronization and avoids the problem. In any case, the severity was changed by Patrick Hunt; you may want to check with him in case you haven't done so already.

The attached patch proposes a fix in which an incremental hash, which should be unique for each transaction history, is associated with each transaction. This hash is sent to the Leader (only if the Leader supports it). The Leader then sends a snapshot if the hash doesn't match its history for the same transaction. At least this was the intention of the change :-). I only had time to check the patch for 3.4, and at least it passes the regression tests. Reviews and comments will be much appreciated.

> Missing ephemeral nodes in one of the members of the ensemble
> -------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1777
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1777
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.4.5
>         Environment: Linux, Java 1.7
>            Reporter: Germán Blanco
>            Assignee: Germán Blanco
>            Priority: Blocker
>             Fix For: 3.4.6, 3.5.0
>
>         Attachments: logs_trunk.tar.gz, snaps.tar, ZOOKEEPER-1777-3.4.patch,
> ZOOKEEPER-1777.patch, ZOOKEEPER-1777.tar.gz
>
>
> In a 3-servers ensemble, one of the followers doesn't see part of the
> ephemeral nodes that are present in the leader and the other follower.
> The 8 missing nodes in "the follower that is not ok" were created in the end
> of epoch 1, the ensemble is running in epoch 2.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
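The incremental-hash idea proposed in the message above can be illustrated with a minimal sketch. This is not the code from ZOOKEEPER-1777.patch: the function names, the choice of SHA-256, the all-zero seed, and the "SNAP"/"NORMAL" labels are assumptions made only for illustration. Each transaction's hash chains the previous hash with the transaction itself, so two servers that hold the same zxid but reached it through different histories end up with different hashes.

```python
import hashlib

SEED = b"\x00" * 32  # hypothetical starting value for an empty history


def next_history_hash(prev_hash: bytes, zxid: int, payload: bytes) -> bytes:
    """Chain the previous history hash with the next transaction."""
    h = hashlib.sha256()
    h.update(prev_hash)
    h.update(zxid.to_bytes(8, "big"))
    h.update(payload)
    return h.digest()


def history_hashes(txns):
    """Map each zxid to the hash of the history up to and including it."""
    hashes = {}
    current = SEED
    for zxid, payload in txns:
        current = next_history_hash(current, zxid, payload)
        hashes[zxid] = current
    return hashes


def leader_sync_mode(leader_hashes, learner_zxid, learner_hash):
    """Leader side: fall back to a snapshot when the histories differ."""
    if leader_hashes.get(learner_zxid) != learner_hash:
        return "SNAP"
    return "NORMAL"  # hypothetical label for the existing sync path


# Hypothetical logs: same zxids, but transaction 3 differs between servers.
leader_log = [(1, b"create /1"), (2, b"create /2"), (3, b"create /3bis")]
learner_log = [(1, b"create /1"), (2, b"create /2"), (3, b"create /3")]

leader = history_hashes(leader_log)
learner = history_hashes(learner_log)
print(leader_sync_mode(leader, 2, learner[2]))  # NORMAL
print(leader_sync_mode(leader, 3, learner[3]))  # SNAP
```

Because every hash depends on all earlier ones, a divergence at any point in the history also changes every later hash, which is what lets the leader detect it from the last transaction alone.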
[jira] [Updated] (ZOOKEEPER-1777) Missing ephemeral nodes in one of the members of the ensemble
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-1777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Germán Blanco updated ZOOKEEPER-1777:
-------------------------------------
    Attachment: ZOOKEEPER-1777-3.4.patch

> Missing ephemeral nodes in one of the members of the ensemble
> -------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1777
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1777
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.4.5
>         Environment: Linux, Java 1.7
>            Reporter: Germán Blanco
>            Assignee: Germán Blanco
>            Priority: Blocker
>             Fix For: 3.4.6, 3.5.0
>
>         Attachments: logs_trunk.tar.gz, snaps.tar, ZOOKEEPER-1777-3.4.patch,
> ZOOKEEPER-1777.tar.gz
>
>
> In a 3-servers ensemble, one of the followers doesn't see part of the
> ephemeral nodes that are present in the leader and the other follower.
> The 8 missing nodes in "the follower that is not ok" were created in the end
> of epoch 1, the ensemble is running in epoch 2.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Updated] (ZOOKEEPER-1777) Missing ephemeral nodes in one of the members of the ensemble
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-1777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Germán Blanco updated ZOOKEEPER-1777:
-------------------------------------
    Attachment: logs_trunk.tar.gz

It does occur in trunk also. Logs are attached in the file logs_trunk.tar.gz. However, I did see TRUNC used for synchronization a couple of times, and also a message about being unable to send TRUNC because of different epochs and sending a snapshot instead. So it was a bit harder to reproduce.

This is the data in server A:
[Fbc, Cbc, 6a, 4a, 7bc, 5a, 8bc, 3, 2, 1, 9a, 9bc, 7a, 8a, zookeeper, Abc, 6bc, Bbc, Dbc, Ebc]

This is the data in server B:
[Fbc, Cbc, 7bc, 5bc, 8bc, 4bc, 3, 2, 1, 9bc, zookeeper, Abc, 6bc, Bbc, Dbc, Ebc]

I am working on the patch that sends information about the last transaction from the learner to the leader. That means that synchronization via snapshot will only happen when this problem occurs. Personally I don't see any other way to solve this; please tell me if you do.

> Missing ephemeral nodes in one of the members of the ensemble
> -------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1777
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1777
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.4.5
>         Environment: Linux, Java 1.7
>            Reporter: Germán Blanco
>            Assignee: Germán Blanco
>            Priority: Blocker
>             Fix For: 3.4.6, 3.5.0
>
>         Attachments: logs_trunk.tar.gz, snaps.tar, ZOOKEEPER-1777.tar.gz
>
>
> In a 3-servers ensemble, one of the followers doesn't see part of the
> ephemeral nodes that are present in the leader and the other follower.
> The 8 missing nodes in "the follower that is not ok" were created in the end
> of epoch 1, the ensemble is running in epoch 2.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
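The two node listings in the message above are easier to compare as a set difference. The short sketch below only restates the lists given for server A and server B; the variable names are illustrative.

```python
# Node lists copied from the message above.
server_a = ["Fbc", "Cbc", "6a", "4a", "7bc", "5a", "8bc", "3", "2", "1",
            "9a", "9bc", "7a", "8a", "zookeeper", "Abc", "6bc", "Bbc",
            "Dbc", "Ebc"]
server_b = ["Fbc", "Cbc", "7bc", "5bc", "8bc", "4bc", "3", "2", "1",
            "9bc", "zookeeper", "Abc", "6bc", "Bbc", "Dbc", "Ebc"]

# Nodes present on one server but absent on the other.
only_on_a = sorted(set(server_a) - set(server_b))
only_on_b = sorted(set(server_b) - set(server_a))

print(only_on_a)  # ['4a', '5a', '6a', '7a', '8a', '9a']
print(only_on_b)  # ['4bc', '5bc']
```

Server A holds six nodes that B lacks, and B holds two that A lacks, so neither data tree is a subset of the other.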
[jira] [Updated] (ZOOKEEPER-1777) Missing ephemeral nodes in one of the members of the ensemble
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-1777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Germán Blanco updated ZOOKEEPER-1777:
-------------------------------------
    Attachment: ZOOKEEPER-1777.tar.gz

Thanks a lot, Flavio and Thawan, for looking into this!

I thought A does not get a TRUNC because B and C are already at a zxid that is higher than a9, which is the highest zxid that A has seen. I thought a TRUNC is only sent if the leader has a lower zxid than the incoming learner.

The logs and data dir for this case are attached now.

This is the resulting data in the wrong follower:
[3, 2, 1, 6, zookeeper, 5, 5bis, 4]

And this is the resulting data in the leader and the other follower:
[3, 2, 1, 4bis, 6, zookeeper, 5bis]

I am not saying that this is an error in the protocol. I am only saying that I see it as a problem, and a small modification of the protocol is one of the solutions. Another solution would be adding an option to force SNAP synchronization, and there are very likely more.

> Missing ephemeral nodes in one of the members of the ensemble
> -------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1777
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1777
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.4.5
>         Environment: Linux, Java 1.7
>            Reporter: Germán Blanco
>            Assignee: Germán Blanco
>            Priority: Blocker
>             Fix For: 3.4.6, 3.5.0
>
>         Attachments: snaps.tar, ZOOKEEPER-1777.tar.gz
>
>
> In a 3-servers ensemble, one of the followers doesn't see part of the
> ephemeral nodes that are present in the leader and the other follower.
> The 8 missing nodes in "the follower that is not ok" were created in the end
> of epoch 1, the ensemble is running in epoch 2.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
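The TRUNC rule as the message above understands it can be written as a one-line check. This is a simplified model of that stated understanding, not ZooKeeper's actual synchronization code; the function name and the numeric zxids are hypothetical.

```python
def leader_sends_trunc(leader_last_zxid: int, learner_last_zxid: int) -> bool:
    """Rule as stated in the message: TRUNC only when the learner is ahead."""
    return learner_last_zxid > leader_last_zxid


# Hypothetical values: A stopped at a lower zxid than the new leader reached.
A_LAST_ZXID = 0xA9
LEADER_LAST_ZXID = 0xB0

print(leader_sends_trunc(LEADER_LAST_ZXID, A_LAST_ZXID))  # False
```

Under this rule a comparison of zxids alone never asks A to truncate, even if A's last transactions differ from what the leader holds at the same zxids, which is the situation the message describes.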
[jira] [Updated] (ZOOKEEPER-1777) Missing ephemeral nodes in one of the members of the ensemble
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-1777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated ZOOKEEPER-1777:
------------------------------------
    Priority: Blocker  (was: Major)

> Missing ephemeral nodes in one of the members of the ensemble
> -------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1777
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1777
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.4.5
>         Environment: Linux, Java 1.7
>            Reporter: Germán Blanco
>            Assignee: Germán Blanco
>            Priority: Blocker
>             Fix For: 3.4.6, 3.5.0
>
>         Attachments: snaps.tar
>
>
> In a 3-servers ensemble, one of the followers doesn't see part of the
> ephemeral nodes that are present in the leader and the other follower.
> The 8 missing nodes in "the follower that is not ok" were created in the end
> of epoch 1, the ensemble is running in epoch 2.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Updated] (ZOOKEEPER-1777) Missing ephemeral nodes in one of the members of the ensemble
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-1777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Germán Blanco updated ZOOKEEPER-1777:
-------------------------------------
    Attachment: snaps.tar

Snapshots of the three members of the ZooKeeper ensemble.

The 8 missing nodes in "the follower that is not ok" were created at the end of epoch 1:
< cZxid = 0x01007d
...
< cZxid = 0x0100a9

while the complete list is:
...
cZxid = 0x01007b
cZxid = 0x01007d
...
cZxid = 0x0100a9
cZxid = 0x020004
...

4 of the 6 ephemeral owners of these nodes have made modifications during epoch 2, which makes me think that this problem might not be related to session expiration, but more likely to synchronization after leader election.
Even though some of the missing znodes were modified in epoch 2, "the follower that is not ok" didn't use this event to notice that something was wrong and, e.g., restart and synchronize via snapshot.

> Missing ephemeral nodes in one of the members of the ensemble
> -------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1777
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1777
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.4.5
>         Environment: Linux, Java 1.7
>            Reporter: Germán Blanco
>            Assignee: Germán Blanco
>             Fix For: 3.4.6, 3.5.0
>
>         Attachments: snaps.tar
>
>
> In a 3-servers ensemble, one of the followers doesn't see part of the
> ephemeral nodes that are present in the leader and the other follower.
> The 8 missing nodes in "the follower that is not ok" were created in the end
> of epoch 1, the ensemble is running in epoch 2.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
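The message above reasons about which epoch each cZxid belongs to. A ZooKeeper zxid is a 64-bit number with the epoch in the high 32 bits and a counter in the low 32 bits. The sketch below shows that split; the helper names are illustrative, and the values are hypothetical full-width zxids built in the code rather than the cZxid strings as printed in the message.

```python
def make_zxid(epoch: int, counter: int) -> int:
    """Pack an epoch and a counter into a 64-bit zxid."""
    return (epoch << 32) | counter


def split_zxid(zxid: int):
    """Return (epoch, counter) for a 64-bit zxid."""
    return zxid >> 32, zxid & 0xFFFFFFFF


# Hypothetical values: a late transaction in epoch 1, an early one in epoch 2.
late_epoch_1 = make_zxid(1, 0xA9)
early_epoch_2 = make_zxid(2, 0x04)

print(split_zxid(late_epoch_1))      # (1, 169)
print(split_zxid(early_epoch_2))     # (2, 4)
print(early_epoch_2 > late_epoch_1)  # True
```

Since the epoch occupies the high bits, any transaction from epoch 2 compares greater than every transaction from epoch 1, regardless of the counter.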