[jira] [Commented] (ZOOKEEPER-2172) Cluster crashes when reconfig a new node as a participant
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643954#comment-14643954 ] Hitoshi Mitake commented on ZOOKEEPER-2172: ---
But the attached logs (at DEBUG level) don't contain any messages from QuorumPeer.updateServerState(). Perhaps the leader's shutdown process is stopping the QuorumPeer main thread?

Cluster crashes when reconfig a new node as a participant - Key: ZOOKEEPER-2172 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2172 Project: ZooKeeper Issue Type: Bug Components: leaderElection, quorum, server Affects Versions: 3.5.0 Environment: Ubuntu 12.04 + java 7 Reporter: Ziyou Wang Priority: Critical Attachments: ZOOKEEPER-2172.patch, history.txt, node-1.log, node-2.log, node-3.log, zoo-1.log, zoo-2-1.log, zoo-2-2.log, zoo-2-3.log, zoo-2.log, zoo-2212-1.log, zoo-2212-2.log, zoo-2212-3.log, zoo-3-1.log, zoo-3-2.log, zoo-3-3.log, zoo-3.log, zoo-4-1.log, zoo-4-2.log, zoo-4-3.log, zoo.cfg.dynamic.1005d, zoo.cfg.dynamic.next, zookeeper-1.log, zookeeper-1.out, zookeeper-2.log, zookeeper-2.out, zookeeper-3.log, zookeeper-3.out

The operations are quite simple: start three zk servers one by one, then reconfig the cluster to add each new one as a participant. When I add the third one, the zk cluster may enter a weird state from which it cannot recover.

I found “2015-04-20 12:53:48,236 [myid:1] - INFO [ProcessThread(sid:1 cport:-1)::PrepRequestProcessor@547] - Incremental reconfig” in the node-1 log, so the first node received the reconfig command at 12:53:48. Later, it logged “2015-04-20 12:53:52,230 [myid:1] - ERROR [LearnerHandler-/10.0.0.2:55890:LearnerHandler@580] - Unexpected exception causing shutdown while sock still open” and “2015-04-20 12:53:52,231 [myid:1] - WARN [LearnerHandler-/10.0.0.2:55890:LearnerHandler@595] - *** GOODBYE /10.0.0.2:55890 ”. From then on, the first and second nodes rejected all client connections, and the third node did not join the cluster as a participant. The whole cluster was down.

When the problem happened, all three nodes were using the same dynamic config file, zoo.cfg.dynamic.1005d, which contained only the first two nodes. However, the node-1 directory also held another, unused dynamic config file, zoo.cfg.dynamic.next, which already contained all three nodes. When I extended the waiting time between starting the third node and reconfiguring the cluster, the problem did not reappear, so it is most likely a race condition.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (ZOOKEEPER-2172) Cluster crashes when reconfig a new node as a participant
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitoshi Mitake updated ZOOKEEPER-2172: -- Attachment: ZOOKEEPER-2172.patch
Hi [~ziyouw], I found a somewhat strange code path:
1. At the end of Leader.shutdown(), the leader tries to remove all learner handlers inside synchronized (learners). The loop calls LearnerHandler.shutdown().
2. LearnerHandler.shutdown() calls leader.removeLearnerHandler().
3. Leader.removeLearnerHandler() also locks the Leader member learners with synchronized.
It seems that this sequence can cause a deadlock. In the attached patch, I removed synchronized(learners) from removeLearnerHandler(). Could you test it in your environment? The target version is 3.5.0.
[jira] [Commented] (ZOOKEEPER-2172) Cluster crashes when reconfig a new node as a participant
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643952#comment-14643952 ] Hitoshi Mitake commented on ZOOKEEPER-2172: ---
Sorry, synchronized is reentrant, so the patch must be wrong... please ignore it.
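The reentrancy point can be illustrated with a small, hypothetical model of the chain from the earlier comment (Leader.shutdown() holding the learners lock, then LearnerHandler.shutdown(), then Leader.removeLearnerHandler() locking learners again). The class and method names below only mirror ZooKeeper's and are not its actual code. Because Java intrinsic locks are reentrant, the thread that already holds the monitor of learners re-enters the nested synchronized block without blocking. This sketch covers only the single-thread case; it does not rule out a deadlock that involves other threads.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the Leader/LearnerHandler shutdown chain.
// Names mirror the comment above; this is not ZooKeeper's actual code.
class ReentrantShutdownModel {
    // Shared list guarded by its own monitor, as in "synchronized (learners)".
    final List<String> learners = new ArrayList<>();

    // Models Leader.removeLearnerHandler(): takes the same lock again.
    void removeLearnerHandler(String handler) {
        synchronized (learners) {
            learners.remove(handler);
        }
    }

    // Models LearnerHandler.shutdown(): calls back into the leader.
    void shutdownHandler(String handler) {
        removeLearnerHandler(handler);
    }

    // Models the tail of Leader.shutdown(): holds the lock while shutting down
    // every handler. Java monitors are reentrant, so the nested synchronized
    // block in removeLearnerHandler() does not block this thread.
    void shutdownAll() {
        synchronized (learners) {
            // Iterate over a copy so the nested remove() does not cause a
            // ConcurrentModificationException on the list being traversed.
            for (String handler : new ArrayList<>(learners)) {
                shutdownHandler(handler);
            }
        }
    }

    public static void main(String[] args) {
        ReentrantShutdownModel leader = new ReentrantShutdownModel();
        leader.learners.add("handler-2");
        leader.learners.add("handler-3");
        leader.shutdownAll(); // returns normally: no self-deadlock
        System.out.println("remaining learners: " + leader.learners.size());
    }
}
```

Running main prints "remaining learners: 0", showing that the same thread passed through both synchronized blocks.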
[jira] [Commented] (ZOOKEEPER-2172) Cluster crashes when reconfig a new node as a participant
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14642415#comment-14642415 ] Hitoshi Mitake commented on ZOOKEEPER-2172: ---
Hi [~ziyouw], could you check whether my understanding is correct? IIUC, your situation is like this:
1. server 1 boots
2. server 2 boots
3. a client issues a reconfig to server 1
4. server 2 tries to sync with server 1 with Learner.syncWithLeader()
5. server 3 boots
6. a client issues a reconfig to server 1 (the reconfig requests in 3 and 6 overlap)
If this is correct, I should be able to reproduce the situation with earthquake.
[jira] [Commented] (ZOOKEEPER-2172) Cluster crashes when reconfig a new node as a participant
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14642437#comment-14642437 ] Hitoshi Mitake commented on ZOOKEEPER-2172: --- [~ziyouw] BTW, if it is possible, could you share your dockerfile for your testing?
[jira] [Commented] (ZOOKEEPER-2172) Cluster crashes when reconfig a new node as a participant
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14642436#comment-14642436 ] Hitoshi Mitake commented on ZOOKEEPER-2172: --- [~ziyouw] BTW, if it is possible, could you share your dockerfile for your testing?
[jira] [Commented] (ZOOKEEPER-2172) Cluster crashes when reconfig a new node as a participant
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14642435#comment-14642435 ] Hitoshi Mitake commented on ZOOKEEPER-2172: --- [~ziyouw] BTW, if it is possible, could you share your dockerfile for your testing?
[jira] [Commented] (ZOOKEEPER-2172) Cluster crashes when reconfig a new node as a participant
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14642439#comment-14642439 ] Hitoshi Mitake commented on ZOOKEEPER-2172: --- Sorry for the trouble caused by the duplicated replies...
[jira] [Commented] (ZOOKEEPER-2172) Cluster crashes when reconfig a new node as a participant
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636344#comment-14636344 ] Hitoshi Mitake commented on ZOOKEEPER-2172: ---
Hi [~shralex], thanks for your reply. As you pointed out, the crash is caused in the inspection layer (written in byteman). Sorry for the trouble. But the NullPointerException is a little odd. The exception is caused by a byteman script like this:

RULE quorum packet receive in Follower
CLASS Learner
METHOD readPacket
HELPER net.osrg.earthquake.PBEQHelper
BIND argMap = new java.util.HashMap()
AT EXIT
IF $# == 1
DO argMap.put("quorumPacket", org.apache.zookeeper.server.quorum.LearnerHandler.packetToString($1));
   eventFuncReturn("Learner.readPacket", argMap);
ENDRULE

IIUC, the QuorumPacket should never be null in the follower. I'll look into the problem.
[jira] [Updated] (ZOOKEEPER-2172) Cluster crashes when reconfig a new node as a participant
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitoshi Mitake updated ZOOKEEPER-2172: -- Attachment: zookeeper-3.out zookeeper-2.out zookeeper-1.out history.txt
[jira] [Commented] (ZOOKEEPER-2172) Cluster crashes when reconfig a new node as a participant
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634903#comment-14634903 ] Hitoshi Mitake commented on ZOOKEEPER-2172: ---
Hi [~ziyouw], it seems that I could reproduce this problem: just adding new servers one by one with reconfig makes the ensemble reject every client request. (Of course, I may have misunderstood something.)
I used our distributed systems debugger, [earthquake|https://github.com/osrg/earthquake]. It uses byteman to inspect the execution of the debuggee (the ZooKeeper server in this case) and tries to trigger corner-case situations that are hard to produce in ordinary testing by reordering the inspected method calls and returns. We are preparing a Docker image so that you can easily reproduce the problem in your environment; please wait for a while.
I'm analyzing the problem and would like to post the root cause and a patch, but it may take some time because I'm new to ZooKeeper. In the meantime, I attached the logs (zookeeper-1.out, zookeeper-2.out, zookeeper-3.out) and the history of the ensemble (history.txt). The logs seem to be similar to yours. The history is in an earthquake-specific format, so it won't be easy to read, but I think you can roughly follow the event sequence (it is just a sequence of method calls and returns plus their stack traces). I'd appreciate any comments.
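The comment describes earthquake as finding corner cases by reordering inspected method calls and returns. As a generic illustration only (this is not earthquake's implementation), the sketch below enumerates every interleaving of two per-thread event sequences while keeping each sequence's own order. The event names are hypothetical labels loosely based on the reconfig scenario discussed in this issue. For sequences of length n and m there are C(n+m, n) such schedules, which suggests why an ordinary test run, which observes only one of them, can miss a race.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Generic illustration only: enumerate all interleavings of two event
// sequences that preserve each sequence's internal order. This is not
// earthquake's implementation; the event names are hypothetical.
class Interleavings {
    static List<List<String>> interleave(List<String> a, List<String> b) {
        List<List<String>> out = new ArrayList<>();
        build(a, 0, b, 0, new ArrayList<String>(), out);
        return out;
    }

    private static void build(List<String> a, int i, List<String> b, int j,
                              List<String> prefix, List<List<String>> out) {
        if (i == a.size() && j == b.size()) {
            out.add(new ArrayList<>(prefix)); // one complete schedule
            return;
        }
        if (i < a.size()) { // next event is taken from sequence a
            prefix.add(a.get(i));
            build(a, i + 1, b, j, prefix, out);
            prefix.remove(prefix.size() - 1); // backtrack (remove by index)
        }
        if (j < b.size()) { // next event is taken from sequence b
            prefix.add(b.get(j));
            build(a, i, b, j + 1, prefix, out);
            prefix.remove(prefix.size() - 1);
        }
    }

    public static void main(String[] args) {
        List<String> server2 = Arrays.asList("s2:syncWithLeader", "s2:ack");
        List<String> client = Arrays.asList("client:reconfig");
        for (List<String> schedule : interleave(server2, client)) {
            System.out.println(schedule);
        }
    }
}
```

With two events on one side and one on the other, this yields C(3, 1) = 3 schedules; a real tool would additionally have to control the actual thread scheduling, which this sketch does not attempt.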
[jira] [Created] (ZOOKEEPER-2233) Invalid description in the comment of LearnerHandler.syncFollower()
Hitoshi Mitake created ZOOKEEPER-2233: - Summary: Invalid description in the comment of LearnerHandler.syncFollower() Key: ZOOKEEPER-2233 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2233 Project: ZooKeeper Issue Type: Improvement Reporter: Hitoshi Mitake Assignee: Hitoshi Mitake Priority: Trivial
LearnerHandler.syncFollower() has the following comment: When leader election is completed, the leader will set its lastProcessedZxid to be (epoch < 32). There will be no txn associated with this zxid. However, IIUC, the expression epoch < 32 (comparison) should be epoch << 32 (bitshift). The error is trivial, but it was a little confusing for me, so I'd like to fix it.
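A ZooKeeper zxid packs the leader epoch into the high 32 bits and a per-epoch counter into the low 32 bits, so the zxid the comment describes is the epoch shifted left by 32 with a zero counter. In Java, epoch < 32 is a boolean and could not even be stored in a long zxid. The standalone sketch below shows the arithmetic; ZooKeeper ships similar helpers in org.apache.zookeeper.server.util.ZxidUtils, but this class is an illustrative re-implementation, not that code.

```java
// Standalone sketch of zxid arithmetic: high 32 bits = epoch,
// low 32 bits = counter. Modeled on, but not copied from, ZooKeeper's ZxidUtils.
final class ZxidMath {
    private ZxidMath() {}

    static long makeZxid(long epoch, long counter) {
        // Shift the epoch into the high word; mask the counter to 32 bits.
        return (epoch << 32L) | (counter & 0xffffffffL);
    }

    static long epochOf(long zxid) {
        return zxid >> 32L;
    }

    static long counterOf(long zxid) {
        return zxid & 0xffffffffL;
    }

    public static void main(String[] args) {
        long epoch = 5;
        long zxid = makeZxid(epoch, 0); // the value "epoch << 32" denotes
        System.out.println(Long.toHexString(zxid)); // prints 500000000
        System.out.println(epochOf(zxid) + " " + counterOf(zxid)); // prints 5 0
    }
}
```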
[jira] [Updated] (ZOOKEEPER-2233) Invalid description in the comment of LearnerHandler.syncFollower()
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitoshi Mitake updated ZOOKEEPER-2233: -- Attachment: ZOOKEEPER-2233.patch
[jira] [Commented] (ZOOKEEPER-2205) Log type of unexpected quorum packet in learner handler loop
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14574101#comment-14574101 ] Hitoshi Mitake commented on ZOOKEEPER-2205: --- Hi, The problem in the Observer class is similar to, but not directly related to, this issue. Could you open a separate issue and attach your patch there? Log type of unexpected quorum packet in learner handler loop Key: ZOOKEEPER-2205 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2205 Project: ZooKeeper Issue Type: Improvement Components: server Reporter: Hitoshi Mitake Assignee: Hitoshi Mitake Priority: Trivial Attachments: ZOOKEEPER-2205-v2.patch, ZOOKEEPER-2205-v3.patch, ZOOKEEPER-2205-v4.patch, ZOOKEEPER-2205.patch The current learner handler loop doesn't log anything when it receives an unexpected type of quorum packet from a learner. This patch makes the learner handler loop log the packet type for defensive purposes, which should make debugging and troubleshooting a little easier.
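A minimal sketch of the change ZOOKEEPER-2205 describes: a packet-dispatch loop whose `default` branch logs the unexpected type instead of silently ignoring it. `Packet` and the numeric type codes are hypothetical stand-ins for `QuorumPacket` and the real packet-type constants, and `java.util.logging` is used only to keep the example self-contained (ZooKeeper itself logs through SLF4J). Per the v2 attachment note, the actual patch logs via `packetToString()` rather than the raw number shown here.

```java
import java.util.logging.Logger;

public class LearnerHandlerLoopSketch {
    private static final Logger LOG =
            Logger.getLogger(LearnerHandlerLoopSketch.class.getName());

    // Hypothetical type codes, for illustration only.
    public static final int REQUEST = 1;
    public static final int ACK = 3;
    public static final int PING = 5;

    /** Hypothetical stand-in for QuorumPacket; only the type field matters here. */
    public static final class Packet {
        final int type;

        public Packet(int type) {
            this.type = type;
        }
    }

    /** Handles one packet from a learner; returns false if its type was unexpected. */
    public static boolean handle(Packet p) {
        switch (p.type) {
            case REQUEST:
            case ACK:
            case PING:
                // Real processing would happen here.
                return true;
            default:
                // Before the change this path was silent; now the offending type is logged.
                LOG.warning("unexpected quorum packet, type: " + p.type);
                return false;
        }
    }

    public static void main(String[] args) {
        handle(new Packet(ACK)); // handled, nothing logged
        handle(new Packet(99));  // logs a WARNING containing "type: 99"
    }
}
```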
[jira] [Updated] (ZOOKEEPER-2205) Log type of unexpected quorum packet in learner handler loop
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitoshi Mitake updated ZOOKEEPER-2205: -- Attachment: ZOOKEEPER-2205-v4.patch Log type of unexpected quorum packet in learner handler loop Key: ZOOKEEPER-2205 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2205 Project: ZooKeeper Issue Type: Improvement Components: server Reporter: Hitoshi Mitake Assignee: Hitoshi Mitake Priority: Trivial Attachments: ZOOKEEPER-2205-v2.patch, ZOOKEEPER-2205-v3.patch, ZOOKEEPER-2205-v4.patch, ZOOKEEPER-2205.patch The current learner handler loop doesn't log anything when it receives an unexpected type of quorum packet from a learner. This patch makes the learner handler loop log the packet type for defensive purposes, which should make debugging and troubleshooting a little easier.
[jira] [Updated] (ZOOKEEPER-2207) Enhance error logs with LearnerHandler.packetToString()
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitoshi Mitake updated ZOOKEEPER-2207: -- Attachment: ZOOKEEPER-2207-v2.patch Enhance error logs with LearnerHandler.packetToString() --- Key: ZOOKEEPER-2207 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2207 Project: ZooKeeper Issue Type: Improvement Components: server Reporter: Hitoshi Mitake Assignee: Hitoshi Mitake Priority: Trivial Attachments: ZOOKEEPER-2207-v2.patch, ZOOKEEPER-2207.patch This patch enhances error logs related to unexpected types of QuorumPacket with LearnerHandler.packetToString().
[jira] [Commented] (ZOOKEEPER-2205) Log type of unexpected quorum packet in learner handler loop
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14574102#comment-14574102 ] Hitoshi Mitake commented on ZOOKEEPER-2205: --- Hi [~rgs], Thanks for your review! I attached a v4 patch based on your comments. Log type of unexpected quorum packet in learner handler loop Key: ZOOKEEPER-2205 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2205 Project: ZooKeeper Issue Type: Improvement Components: server Reporter: Hitoshi Mitake Assignee: Hitoshi Mitake Priority: Trivial Attachments: ZOOKEEPER-2205-v2.patch, ZOOKEEPER-2205-v3.patch, ZOOKEEPER-2205-v4.patch, ZOOKEEPER-2205.patch The current learner handler loop doesn't log anything when it receives an unexpected type of quorum packet from a learner. This patch makes the learner handler loop log the packet type for defensive purposes, which should make debugging and troubleshooting a little easier.
[jira] [Commented] (ZOOKEEPER-2207) Enhance error logs with LearnerHandler.packetToString()
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14574109#comment-14574109 ] Hitoshi Mitake commented on ZOOKEEPER-2207: --- Hi [~rgs], Thanks for your review! I attached a v2 patch based on your comments. BTW, I fixed the unconditional return branch problem in the v4 patch of ZOOKEEPER-2205 (https://issues.apache.org/jira/browse/ZOOKEEPER-2205). Should I also remove the return branch in ZOOKEEPER-2207? If so, I'll update the patches for both 2205 and 2207. Enhance error logs with LearnerHandler.packetToString() --- Key: ZOOKEEPER-2207 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2207 Project: ZooKeeper Issue Type: Improvement Components: server Reporter: Hitoshi Mitake Assignee: Hitoshi Mitake Priority: Trivial Attachments: ZOOKEEPER-2207-v2.patch, ZOOKEEPER-2207.patch This patch enhances error logs related to unexpected types of QuorumPacket with LearnerHandler.packetToString().
[jira] [Commented] (ZOOKEEPER-2207) Enhance error logs with LearnerHandler.packetToString()
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14575441#comment-14575441 ] Hitoshi Mitake commented on ZOOKEEPER-2207: --- Thanks, [~rgs]! Enhance error logs with LearnerHandler.packetToString() --- Key: ZOOKEEPER-2207 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2207 Project: ZooKeeper Issue Type: Improvement Components: server Affects Versions: 3.5.0 Reporter: Hitoshi Mitake Assignee: Hitoshi Mitake Priority: Trivial Fix For: 3.5.1, 3.6.0 Attachments: ZOOKEEPER-2207-v2.patch, ZOOKEEPER-2207.patch This patch enhances error logs related to unexpected types of QuorumPacket with LearnerHandler.packetToString().
[jira] [Commented] (ZOOKEEPER-2206) Add missing packet types to LearnerHandler.packetToString()
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14575443#comment-14575443 ] Hitoshi Mitake commented on ZOOKEEPER-2206: --- Thanks, [~rgs]! Add missing packet types to LearnerHandler.packetToString() --- Key: ZOOKEEPER-2206 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2206 Project: ZooKeeper Issue Type: Improvement Components: server Affects Versions: 3.5.0 Reporter: Hitoshi Mitake Assignee: Hitoshi Mitake Priority: Trivial Fix For: 3.5.1, 3.6.0 Attachments: ZOOKEEPER-2206.patch packetToString() is a method for obtaining a string representation of a QuorumPacket, but it does not handle some QuorumPacket types. This patch adds the missing types and improves the method for friendlier logging.
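To illustrate what ZOOKEEPER-2206 is about, here is a hypothetical, table-driven helper in the spirit of `packetToString()`. The names and numeric codes are assumptions intended to mirror the packet-type constants in ZooKeeper's `Leader` class; verify them against the source tree before relying on them. The point is only that any type missing from the table falls through to an opaque `UNKNOWN<n>` string, which is what makes such log lines less useful.

```java
import java.util.HashMap;
import java.util.Map;

public class PacketTypeNames {
    // Assumed type codes, intended to mirror ZooKeeper's Leader constants (verify before use).
    private static final Map<Integer, String> NAMES = new HashMap<>();

    static {
        NAMES.put(1, "REQUEST");
        NAMES.put(2, "PROPOSAL");
        NAMES.put(3, "ACK");
        NAMES.put(4, "COMMIT");
        NAMES.put(5, "PING");
        NAMES.put(6, "REVALIDATE");
        NAMES.put(7, "SYNC");
        NAMES.put(8, "INFORM");
        NAMES.put(9, "COMMITANDACTIVATE");
        NAMES.put(10, "NEWLEADER");
        NAMES.put(11, "FOLLOWERINFO");
        NAMES.put(12, "UPTODATE");
        NAMES.put(13, "DIFF");
        NAMES.put(14, "TRUNC");
        NAMES.put(15, "SNAP");
        NAMES.put(16, "OBSERVERINFO");
        NAMES.put(17, "LEADERINFO");
        NAMES.put(18, "ACKEPOCH");
        NAMES.put(19, "INFORMANDACTIVATE");
    }

    /** Returns a readable name for a packet type, or UNKNOWN<n> for codes missing from the table. */
    public static String typeName(int type) {
        String name = NAMES.get(type);
        return name != null ? name : "UNKNOWN" + type;
    }

    public static void main(String[] args) {
        System.out.println(typeName(5));  // PING
        System.out.println(typeName(18)); // ACKEPOCH
        System.out.println(typeName(42)); // UNKNOWN42
    }
}
```

A lookup table keeps each name next to its code, so a newly added packet type only needs one extra entry; a `switch` with one `case` per type, closer in shape to a hand-written `packetToString()`, would work equally well.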
[jira] [Commented] (ZOOKEEPER-2205) Log type of unexpected quorum packet in learner handler loop
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14575439#comment-14575439 ] Hitoshi Mitake commented on ZOOKEEPER-2205: --- Thanks for merging, [~rgs]! Log type of unexpected quorum packet in learner handler loop Key: ZOOKEEPER-2205 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2205 Project: ZooKeeper Issue Type: Improvement Components: server Affects Versions: 3.4.6, 3.5.0 Reporter: Hitoshi Mitake Assignee: Hitoshi Mitake Priority: Trivial Fix For: 3.4.7, 3.5.1, 3.6.0 Attachments: ZOOKEEPER-2205-v2.patch, ZOOKEEPER-2205-v3.patch, ZOOKEEPER-2205-v4.patch, ZOOKEEPER-2205.patch The current learner handler loop doesn't log anything when it receives an unexpected type of quorum packet from a learner. This patch makes the learner handler loop log the packet type for defensive purposes, which should make debugging and troubleshooting a little easier.
[jira] [Commented] (ZOOKEEPER-2194) Let DataNode.getChildren() return an unmodifiable view of its children set
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573058#comment-14573058 ] Hitoshi Mitake commented on ZOOKEEPER-2194: --- Hi [~cnauroth], Thanks a lot for your explanation! Now I understand both the rules and the current situation of the ZooKeeper community. I'll wait for comments from the committers. Let DataNode.getChildren() return an unmodifiable view of its children set -- Key: ZOOKEEPER-2194 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2194 Project: ZooKeeper Issue Type: Improvement Components: server Reporter: Hitoshi Mitake Assignee: Hitoshi Mitake Priority: Trivial Attachments: ZOOKEEPER-2194-v2.patch, ZOOKEEPER-2194.patch The current DataNode.getChildren() directly returns a reference to its private member, children. However, that member should only be modified through addChild() and removeChild(), and callers of getChildren() shouldn't modify it directly. To prevent direct modification by callers, this patch makes getChildren() return an unmodifiable view of its children set. If a caller tries to modify it directly, a runtime exception will be thrown.
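A minimal sketch of the technique ZOOKEEPER-2194 proposes, using the JDK's `Collections.unmodifiableSet()`. `DataNodeSketch` is a hypothetical stand-in for `DataNode`, not the actual patch. The returned view is read-only but live: writes through it throw `UnsupportedOperationException`, while changes made via `addChild()`/`removeChild()` stay visible to anyone holding the view.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class DataNodeSketch {
    // Private mutable state; only addChild()/removeChild() should change it.
    private final Set<String> children = new HashSet<>();

    public synchronized boolean addChild(String child) {
        return children.add(child);
    }

    public synchronized boolean removeChild(String child) {
        return children.remove(child);
    }

    // Returns a read-only view instead of the internal set itself.
    public synchronized Set<String> getChildren() {
        return Collections.unmodifiableSet(children);
    }

    public static void main(String[] args) {
        DataNodeSketch node = new DataNodeSketch();
        node.addChild("a");
        Set<String> view = node.getChildren();
        try {
            view.add("b"); // direct modification by a caller
        } catch (UnsupportedOperationException e) {
            System.out.println("rejected: " + e.getClass().getSimpleName());
        }
        node.addChild("c");
        System.out.println(view.size()); // 2: the view reflects changes made via addChild()
    }
}
```

Because the view is not a snapshot, a caller iterating it while another thread calls `addChild()` can still hit a `ConcurrentModificationException`; returning a copy would avoid that at the cost of an allocation per call.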
[jira] [Commented] (ZOOKEEPER-2194) Let DataNode.getChildren() return an unmodifiable view of its children set
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573139#comment-14573139 ] Hitoshi Mitake commented on ZOOKEEPER-2194: --- Hi [~rgs], Thanks a lot for your review and merging! Let DataNode.getChildren() return an unmodifiable view of its children set -- Key: ZOOKEEPER-2194 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2194 Project: ZooKeeper Issue Type: Improvement Components: server Affects Versions: 3.4.6, 3.5.0 Reporter: Hitoshi Mitake Assignee: Hitoshi Mitake Priority: Trivial Fix For: 3.4.7, 3.5.1, 3.6.0 Attachments: ZOOKEEPER-2194-v2.patch, ZOOKEEPER-2194.patch The current DataNode.getChildren() directly returns a reference to its private member, children. However, that member should only be modified through addChild() and removeChild(), and callers of getChildren() shouldn't modify it directly. To prevent direct modification by callers, this patch makes getChildren() return an unmodifiable view of its children set. If a caller tries to modify it directly, a runtime exception will be thrown.
[jira] [Created] (ZOOKEEPER-2206) Add missing packet types to LearnerHandler.packetToString()
Hitoshi Mitake created ZOOKEEPER-2206: - Summary: Add missing packet types to LearnerHandler.packetToString() Key: ZOOKEEPER-2206 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2206 Project: ZooKeeper Issue Type: Improvement Reporter: Hitoshi Mitake Priority: Trivial Attachments: ZOOKEEPER-2206.patch packetToString() is a method for obtaining a string representation of a QuorumPacket, but it does not handle some QuorumPacket types. This patch adds the missing types and improves the method for friendlier logging.
[jira] [Updated] (ZOOKEEPER-2206) Add missing packet types to LearnerHandler.packetToString()
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitoshi Mitake updated ZOOKEEPER-2206: -- Attachment: ZOOKEEPER-2206.patch Add missing packet types to LearnerHandler.packetToString() --- Key: ZOOKEEPER-2206 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2206 Project: ZooKeeper Issue Type: Improvement Reporter: Hitoshi Mitake Priority: Trivial Attachments: ZOOKEEPER-2206.patch packetToString() is a method for obtaining a string representation of a QuorumPacket, but it does not handle some QuorumPacket types. This patch adds the missing types and improves the method for friendlier logging.
[jira] [Updated] (ZOOKEEPER-2205) Log type of unexpected quorum packet in learner handler loop
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitoshi Mitake updated ZOOKEEPER-2205: -- Summary: Log type of unexpected quorum packet in learner handler loop (was: Log type of unexpected quorum packet in learner loop) Log type of unexpected quorum packet in learner handler loop Key: ZOOKEEPER-2205 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2205 Project: ZooKeeper Issue Type: Improvement Components: server Reporter: Hitoshi Mitake Assignee: Hitoshi Mitake Priority: Trivial Attachments: ZOOKEEPER-2205.patch The current learner loop doesn't log anything when it receives an unexpected type of quorum packet from the leader. This patch makes the learner loop log the packet type for defensive purposes, which should make debugging and troubleshooting a little easier.
[jira] [Updated] (ZOOKEEPER-2205) Log type of unexpected quorum packet in learner handler loop
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitoshi Mitake updated ZOOKEEPER-2205: -- Description: The current learner handler loop doesn't log anything when it receives an unexpected type of quorum packet from a learner. This patch makes the learner handler loop log the packet type for defensive purposes, which should make debugging and troubleshooting a little easier. was: The current learner loop doesn't log anything when it receives an unexpected type of quorum packet from the leader. This patch makes the learner loop log the packet type for defensive purposes, which should make debugging and troubleshooting a little easier. Log type of unexpected quorum packet in learner handler loop Key: ZOOKEEPER-2205 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2205 Project: ZooKeeper Issue Type: Improvement Components: server Reporter: Hitoshi Mitake Assignee: Hitoshi Mitake Priority: Trivial Attachments: ZOOKEEPER-2205.patch The current learner handler loop doesn't log anything when it receives an unexpected type of quorum packet from a learner. This patch makes the learner handler loop log the packet type for defensive purposes, which should make debugging and troubleshooting a little easier.
[jira] [Updated] (ZOOKEEPER-2205) Log type of unexpected quorum packet in learner handler loop
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitoshi Mitake updated ZOOKEEPER-2205: -- Attachment: ZOOKEEPER-2205-v2.patch Version 2: uses packetToString() for friendlier logging. Log type of unexpected quorum packet in learner handler loop Key: ZOOKEEPER-2205 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2205 Project: ZooKeeper Issue Type: Improvement Components: server Reporter: Hitoshi Mitake Assignee: Hitoshi Mitake Priority: Trivial Attachments: ZOOKEEPER-2205-v2.patch, ZOOKEEPER-2205.patch The current learner handler loop doesn't log anything when it receives an unexpected type of quorum packet from a learner. This patch makes the learner handler loop log the packet type for defensive purposes, which should make debugging and troubleshooting a little easier.
[jira] [Updated] (ZOOKEEPER-2206) Add missing packet types to LearnerHandler.packetToString()
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitoshi Mitake updated ZOOKEEPER-2206: -- Component/s: server Add missing packet types to LearnerHandler.packetToString() --- Key: ZOOKEEPER-2206 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2206 Project: ZooKeeper Issue Type: Improvement Components: server Reporter: Hitoshi Mitake Assignee: Hitoshi Mitake Priority: Trivial Attachments: ZOOKEEPER-2206.patch packetToString() is a method for obtaining a string representation of a QuorumPacket, but it does not handle some QuorumPacket types. This patch adds the missing types and improves the method for friendlier logging.
[jira] [Created] (ZOOKEEPER-2207) Enhance error logs with LearnerHandler.packetToString()
Hitoshi Mitake created ZOOKEEPER-2207: - Summary: Enhance error logs with LearnerHandler.packetToString() Key: ZOOKEEPER-2207 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2207 Project: ZooKeeper Issue Type: Improvement Components: server Reporter: Hitoshi Mitake Priority: Trivial Attachments: ZOOKEEPER-2207.patch This patch enhances error logs related to unexpected types of QuorumPacket with LearnerHandler.packetToString().
[jira] [Updated] (ZOOKEEPER-2207) Enhance error logs with LearnerHandler.packetToString()
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitoshi Mitake updated ZOOKEEPER-2207: -- Attachment: ZOOKEEPER-2207.patch Enhance error logs with LearnerHandler.packetToString() --- Key: ZOOKEEPER-2207 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2207 Project: ZooKeeper Issue Type: Improvement Components: server Reporter: Hitoshi Mitake Priority: Trivial Attachments: ZOOKEEPER-2207.patch This patch enhances error logs related to unexpected types of QuorumPacket with LearnerHandler.packetToString().
[jira] [Assigned] (ZOOKEEPER-2207) Enhance error logs with LearnerHandler.packetToString()
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitoshi Mitake reassigned ZOOKEEPER-2207: - Assignee: Hitoshi Mitake Enhance error logs with LearnerHandler.packetToString() --- Key: ZOOKEEPER-2207 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2207 Project: ZooKeeper Issue Type: Improvement Components: server Reporter: Hitoshi Mitake Assignee: Hitoshi Mitake Priority: Trivial Attachments: ZOOKEEPER-2207.patch This patch enhances error logs related to unexpected types of QuorumPacket with LearnerHandler.packetToString().
[jira] [Assigned] (ZOOKEEPER-2206) Add missing packet types to LearnerHandler.packetToString()
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitoshi Mitake reassigned ZOOKEEPER-2206: - Assignee: Hitoshi Mitake Add missing packet types to LearnerHandler.packetToString() --- Key: ZOOKEEPER-2206 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2206 Project: ZooKeeper Issue Type: Improvement Components: server Reporter: Hitoshi Mitake Assignee: Hitoshi Mitake Priority: Trivial Attachments: ZOOKEEPER-2206.patch packetToString() is a method for obtaining a string representation of a QuorumPacket, but it does not handle some QuorumPacket types. This patch adds the missing types and improves the method for friendlier logging.
[jira] [Commented] (ZOOKEEPER-2205) Log type of unexpected quorum packet in learner handler loop
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572339#comment-14572339 ] Hitoshi Mitake commented on ZOOKEEPER-2205: --- I found the following branch at the top of packetToString(): {code} if (true) return null; {code} Is there any reason for disabling the method? The branch seems to have existed since the "Initial import" commit. Log type of unexpected quorum packet in learner handler loop Key: ZOOKEEPER-2205 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2205 Project: ZooKeeper Issue Type: Improvement Components: server Reporter: Hitoshi Mitake Assignee: Hitoshi Mitake Priority: Trivial Attachments: ZOOKEEPER-2205-v2.patch, ZOOKEEPER-2205.patch The current learner handler loop doesn't log anything when it receives an unexpected type of quorum packet from a learner. This patch makes the learner handler loop log the packet type for defensive purposes, which should make debugging and troubleshooting a little easier.
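Why `if (true) return null;` compiles at all: the Java Language Specification deliberately exempts `if` statements from its unreachable-code check (to allow conditional-compilation idioms), so the statements after such a branch still compile but can never run. The standalone sketch below reuses the method name purely for illustration; it shows the practical effect on logging: every call yields `null`, so a log line built from it prints the literal text `null`.

```java
public class DeadBranchDemo {
    // Mimics the branch found at the top of packetToString(): everything after it is dead code.
    static String packetToString(int type) {
        if (true) return null; // legal: the JLS reachability rules treat `if` specially
        return "type=" + type; // compiles, but never executes
    }

    public static void main(String[] args) {
        // String concatenation turns the null into the text "null".
        System.out.println("unexpected packet: " + packetToString(42)); // unexpected packet: null
    }
}
```

Replacing the same guard with `while (true) return null;` would instead be rejected by the compiler, since the exemption applies only to `if`.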
[jira] [Commented] (ZOOKEEPER-2194) Let DataNode.getChildren() return an unmodifiable view of its children set
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572357#comment-14572357 ] Hitoshi Mitake commented on ZOOKEEPER-2194: --- Hi [~cnauroth], To get the patch merged, should I just wait, or is there something I should do? I'm very new to the ZooKeeper community, so I'd just like to know the required procedure. I'm not in a hurry at all :) Let DataNode.getChildren() return an unmodifiable view of its children set -- Key: ZOOKEEPER-2194 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2194 Project: ZooKeeper Issue Type: Improvement Components: server Reporter: Hitoshi Mitake Assignee: Hitoshi Mitake Priority: Trivial Attachments: ZOOKEEPER-2194-v2.patch, ZOOKEEPER-2194.patch The current DataNode.getChildren() directly returns a reference to its private member, children. However, that member should only be modified through addChild() and removeChild(), and callers of getChildren() shouldn't modify it directly. To prevent direct modification by callers, this patch makes getChildren() return an unmodifiable view of its children set. If a caller tries to modify it directly, a runtime exception will be thrown.
[jira] [Updated] (ZOOKEEPER-2205) Log type of unexpected quorum packet in learner loop
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitoshi Mitake updated ZOOKEEPER-2205: -- Issue Type: Improvement (was: Bug) Log type of unexpected quorum packet in learner loop Key: ZOOKEEPER-2205 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2205 Project: ZooKeeper Issue Type: Improvement Components: server Reporter: Hitoshi Mitake Assignee: Hitoshi Mitake Priority: Trivial Attachments: ZOOKEEPER-2205.patch The current learner loop doesn't log anything when it receives an unexpected type of quorum packet from the leader. This patch makes the learner loop log the packet type for defensive purposes, which should make debugging and troubleshooting a little easier.
[jira] [Updated] (ZOOKEEPER-2205) Log type of unexpected quorum packet in learner loop
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitoshi Mitake updated ZOOKEEPER-2205: -- Attachment: ZOOKEEPER-2205.patch Log type of unexpected quorum packet in learner loop Key: ZOOKEEPER-2205 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2205 Project: ZooKeeper Issue Type: Bug Components: server Reporter: Hitoshi Mitake Assignee: Hitoshi Mitake Priority: Trivial Attachments: ZOOKEEPER-2205.patch The current learner loop doesn't log anything when it receives an unexpected type of quorum packet from the leader. This patch makes the learner loop log the packet type for defensive purposes, which should make debugging and troubleshooting a little easier.
[jira] [Created] (ZOOKEEPER-2205) Log type of unexpected quorum packet in learner loop
Hitoshi Mitake created ZOOKEEPER-2205: - Summary: Log type of unexpected quorum packet in learner loop Key: ZOOKEEPER-2205 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2205 Project: ZooKeeper Issue Type: Bug Components: server Reporter: Hitoshi Mitake Assignee: Hitoshi Mitake Priority: Trivial The current learner loop doesn't log anything when it receives an unexpected type of quorum packet from the leader. This patch makes the learner loop log the packet type for defensive purposes, which should make debugging and troubleshooting a little easier.
[jira] [Commented] (ZOOKEEPER-2193) reconfig command completes even if parameter is wrong obviously
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14557955#comment-14557955 ] Hitoshi Mitake commented on ZOOKEEPER-2193: --- Hi [~Yasuhito Fukuda], IIUC, addresses can also be duplicated across different purposes, e.g. the clientAddr of a new node could equal the electionAddr of an existing node. Since each server has three addresses (quorum, election, and client), checking for duplicates would require 9 comparisons per node pair, I think. reconfig command completes even if parameter is wrong obviously --- Key: ZOOKEEPER-2193 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2193 Project: ZooKeeper Issue Type: Bug Components: leaderElection, server Affects Versions: 3.5.0 Environment: CentOS7 + Java7 Reporter: Yasuhito Fukuda Attachments: ZOOKEEPER-2193.patch We confirmed that reconfig completes even when its parameter is obviously wrong. Refer to the following. - Ensemble consists of four nodes
{noformat}
[zk: vm-101:2181(CONNECTED) 0] config
server.1=192.168.100.101:2888:3888:participant
server.2=192.168.100.102:2888:3888:participant
server.3=192.168.100.103:2888:3888:participant
server.4=192.168.100.104:2888:3888:participant
version=1
{noformat}
- Add a node with the reconfig command
{noformat}
[zk: vm-101:2181(CONNECTED) 9] reconfig -add server.5=192.168.100.104:2888:3888:participant;0.0.0.0:2181
Committed new configuration:
server.1=192.168.100.101:2888:3888:participant
server.2=192.168.100.102:2888:3888:participant
server.3=192.168.100.103:2888:3888:participant
server.4=192.168.100.104:2888:3888:participant
server.5=192.168.100.104:2888:3888:participant;0.0.0.0:2181
version=30007
{noformat}
The IP addresses (and ports) of server.4 and server.5 are duplicates. In this state, leader election will not work properly, and the ensemble will presumably end up in an undesirable state. I think reconfig needs parameter validation.
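The count of 9 comes from comparing each of a new server's three addresses (quorum, election, client) with each of an existing server's three. A minimal sketch of that check follows; `ServerAddrs` and `findConflicts` are hypothetical names, and comparing plain `host:port` strings is a simplification (real validation would also have to deal with hostname resolution and wildcard addresses such as `0.0.0.0`).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ReconfigAddressCheck {
    /** Hypothetical holder for one server's addresses as "host:port" strings (client may be null). */
    public static final class ServerAddrs {
        final long sid;
        final List<String> addrs; // quorum, election, client

        public ServerAddrs(long sid, String quorum, String election, String client) {
            this.sid = sid;
            this.addrs = Arrays.asList(quorum, election, client);
        }
    }

    /** Compares all 3 x 3 = 9 address pairs between the new server and each existing server. */
    public static List<String> findConflicts(ServerAddrs added, List<ServerAddrs> existing) {
        List<String> conflicts = new ArrayList<>();
        for (ServerAddrs other : existing) {
            for (String a : added.addrs) {
                for (String b : other.addrs) {
                    if (a != null && a.equals(b)) {
                        conflicts.add("server." + added.sid + " " + a + " clashes with server." + other.sid);
                    }
                }
            }
        }
        return conflicts;
    }

    public static void main(String[] args) {
        // The ensemble and reconfig request from the report (no client addresses given for servers 1-4).
        List<ServerAddrs> existing = Arrays.asList(
                new ServerAddrs(1, "192.168.100.101:2888", "192.168.100.101:3888", null),
                new ServerAddrs(2, "192.168.100.102:2888", "192.168.100.102:3888", null),
                new ServerAddrs(3, "192.168.100.103:2888", "192.168.100.103:3888", null),
                new ServerAddrs(4, "192.168.100.104:2888", "192.168.100.104:3888", null));
        ServerAddrs server5 =
                new ServerAddrs(5, "192.168.100.104:2888", "192.168.100.104:3888", "0.0.0.0:2181");
        findConflicts(server5, existing).forEach(System.out::println);
        // server.5 192.168.100.104:2888 clashes with server.4
        // server.5 192.168.100.104:3888 clashes with server.4
    }
}
```

Comparing every pair, rather than only quorum-to-quorum and election-to-election, is what catches the cross-purpose case raised above, such as a new client address that equals an existing election address.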
[jira] [Commented] (ZOOKEEPER-2194) Let DataNode.getChildren() return an unmodifiable view of its children set
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14555757#comment-14555757 ] Hitoshi Mitake commented on ZOOKEEPER-2194: --- Thanks for submitting the test run! I'll ask the committers on the ZooKeeper mailing list to list me as a contributor. Let DataNode.getChildren() return an unmodifiable view of its children set -- Key: ZOOKEEPER-2194 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2194 Project: ZooKeeper Issue Type: Improvement Components: server Reporter: Hitoshi Mitake Priority: Trivial Attachments: ZOOKEEPER-2194-v2.patch, ZOOKEEPER-2194.patch The current DataNode.getChildren() directly returns a reference to its private member, children. However, that member should only be modified through addChild() and removeChild(), and callers of getChildren() shouldn't modify it directly. To prevent direct modification by callers, this patch makes getChildren() return an unmodifiable view of its children set. If a caller tries to modify it directly, a runtime exception will be thrown.
[jira] [Updated] (ZOOKEEPER-2194) Let DataNode.getChildren() return an unmodifiable view of its children set
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitoshi Mitake updated ZOOKEEPER-2194: -- Attachment: ZOOKEEPER-2194.patch Let DataNode.getChildren() return an unmodifiable view of its children set -- Key: ZOOKEEPER-2194 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2194 Project: ZooKeeper Issue Type: Improvement Reporter: Hitoshi Mitake Priority: Trivial Attachments: ZOOKEEPER-2194.patch The current DataNode.getChildren() directly returns a reference to its private member, children. However, that member should only be modified through addChild() and removeChild(), and callers of getChildren() shouldn't modify it directly. To prevent direct modification by callers, this patch makes getChildren() return an unmodifiable view of its children set. If a caller tries to modify it directly, a runtime exception will be thrown.
[jira] [Created] (ZOOKEEPER-2194) Let DataNode.getChildren() return an unmodifiable view of its children set
Hitoshi Mitake created ZOOKEEPER-2194: - Summary: Let DataNode.getChildren() return an unmodifiable view of its children set Key: ZOOKEEPER-2194 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2194 Project: ZooKeeper Issue Type: Improvement Reporter: Hitoshi Mitake Priority: Trivial The current DataNode.getChildren() directly returns a reference to its private member, children. However, that member should only be modified through addChild() and removeChild(), and callers of getChildren() shouldn't modify it directly. To prevent direct modification by callers, this patch makes getChildren() return an unmodifiable view of its children set. If a caller tries to modify it directly, a runtime exception will be thrown.
[jira] [Updated] (ZOOKEEPER-2194) Let DataNode.getChildren() return an unmodifiable view of its children set
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitoshi Mitake updated ZOOKEEPER-2194: -- Attachment: ZOOKEEPER-2194-v2.patch Version 2, modified based on the comments from [~cnauroth]. Let DataNode.getChildren() return an unmodifiable view of its children set -- Key: ZOOKEEPER-2194 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2194 Project: ZooKeeper Issue Type: Improvement Reporter: Hitoshi Mitake Priority: Trivial Attachments: ZOOKEEPER-2194-v2.patch, ZOOKEEPER-2194.patch The current DataNode.getChildren() directly returns a reference to its private member, children. However, that member should only be modified through addChild() and removeChild(), and callers of getChildren() shouldn't modify it directly. To prevent direct modification by callers, this patch makes getChildren() return an unmodifiable view of its children set. If a caller tries to modify it directly, a runtime exception will be thrown.
[jira] [Commented] (ZOOKEEPER-2194) Let DataNode.getChildren() return an unmodifiable view of its children set
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14555401#comment-14555401 ] Hitoshi Mitake commented on ZOOKEEPER-2194: --- Hi [~cnauroth], thanks for your reply. I'll fix the style of the conditional branch and follow your instructions for generating the patch in v2. Thanks a lot for your review! Let DataNode.getChildren() return an unmodifiable view of its children set -- Key: ZOOKEEPER-2194 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2194 Project: ZooKeeper Issue Type: Improvement Reporter: Hitoshi Mitake Priority: Trivial Attachments: ZOOKEEPER-2194.patch The current DataNode.getChildren() directly returns a reference to its private member, children. However, that member should only be modified through addChild() and removeChild(), and callers of getChildren() shouldn't modify it directly. To prevent direct modification by callers, this patch makes getChildren() return an unmodifiable view of its children set. If a caller tries to modify it directly, a runtime exception will be thrown.