[GitHub] zookeeper issue #465: ZOOKEEPER-2930: Leader cannot be elected due to networ...
Github user JiangJiafu commented on the issue: https://github.com/apache/zookeeper/pull/465 Will this patch be merged into the 3.4 branch? ---
[GitHub] zookeeper pull request #403: Zookeeper 2923
Github user JiangJiafu closed the pull request at: https://github.com/apache/zookeeper/pull/403 ---
[GitHub] zookeeper pull request #408: ZOOKEEPER-2923
GitHub user JiangJiafu opened a pull request:

    https://github.com/apache/zookeeper/pull/408

    ZOOKEEPER-2923

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/JiangJiafu/zookeeper ZOOKEEPER-2923

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/zookeeper/pull/408.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #408

commit b6eb491d7dd1bd8a4db93813bb62c6abf4efe31e
Author: Jiafu Jiang <jiangjiafu1...@gmail.com>
Date: 2017-10-26T10:05:05Z

    ZOOKEEPER-2923

---
[GitHub] zookeeper pull request #403: Zookeeper 2923
GitHub user JiangJiafu opened a pull request:

    https://github.com/apache/zookeeper/pull/403

    Zookeeper 2923

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/JiangJiafu/zookeeper ZOOKEEPER-2923

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/zookeeper/pull/403.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #403

commit 489cec9b78b21a2a241eeab18ddfb968758b2e67
Author: JiangJiafu <jiangjiafu1...@gmail.com>
Date: 2017-02-13T11:36:44Z

    ZOOKEEPER-2691: recreateSocketAddresses may recreate the unreachable IP address

commit 31700c45030cca2d702fe0279443cd3f3b46a2b0
Author: JiangJiafu <jiangjiafu1...@gmail.com>
Date: 2017-02-14T02:00:13Z

    ZOOKEEPER-2691: recreateSocketAddresses may recreate the unreachable IP address

commit 5c1bf6bd452e8237cb0bb9d871f3d0b3d08e0de2
Author: JiangJiafu <jiangjiafu1...@gmail.com>
Date: 2017-02-18T01:46:10Z

    ZOOKEEPER-2691: recreateSocketAddresses may recreate the unreachable IP address

commit f4999a6df12d6ff42ea92596facb58d11695ba25
Author: JiangJiafu <jiangjiafu1...@gmail.com>
Date: 2017-05-11T11:07:07Z

    Merge branch 'branch-3.4' of https://github.com/apache/zookeeper into ZOOKEEPER-2691

commit aa7b63d047450d6d1189860d08a3bc16d3ca4243
Author: Jiang Jiafu <jiangjiafu1...@gmail.com>
Date: 2017-05-12T03:22:17Z

    ZOOKEEPER-2691

commit e2589df9630fd0310c5a39275a25632b27c50a1a
Author: Jiang Jiafu <jiangjiafu1...@gmail.com>
Date: 2017-05-12T03:35:03Z

    ZOOKEEPER-2691

commit eeb07f9b385d5e0161919f874466f837aaed3f99
Author: Jiang Jiafu <jiangjiafu1...@gmail.com>
Date: 2017-05-12T06:02:51Z

    ZOOKEEPER-2691

commit c366949d6325cdc61aed59be30e37fe743186575
Author: Jiang Jiafu <jiangjiafu1...@gmail.com>
Date: 2017-05-12T09:29:51Z

    ZOOKEEPER-2691

commit 6139f533af4f3b513bd713746449f147503168e0
Author: Jiang Jiafu <jiangjiafu1...@gmail.com>
Date: 2017-05-12T09:47:31Z

    ZOOKEEPER-2691

commit 3ac65ead39fad4f8d9f26365e1bc73f83889f11e
Author: Jiang Jiafu <jiangjiafu1...@gmail.com>
Date: 2017-05-13T03:41:52Z

    ZOOKEEPER-2774

commit b67c3730936aab8c5de38f0d56dd344e1d5a2a6a
Author: JiangJiafu <jiangjiafu1...@gmail.com>
Date: 2017-05-13T07:56:01Z

    Merge pull request #1 from apache/branch-3.4

    Branch 3.4

commit fbb52bc7156520ea70f8274a3ac9d46ea84d48b2
Author: Jiang Jiafu <jiangjiafu1...@gmail.com>
Date: 2017-05-13T07:57:59Z

    Merge remote-tracking branch 'remotes/origin/ZOOKEEPER-2774' into sugon

commit 92d134e84f501003fc01a2d0018c0dc8406444fd
Author: Jiang Jiafu <jiangjiafu1...@gmail.com>
Date: 2017-05-13T07:58:23Z

    Merge remote-tracking branch 'remotes/origin/ZOOKEEPER-2691' into sugon

commit dc2924977fb2d5527810581012c3144b4dc11632
Author: JiangJiafu <jiangjiafu1...@gmail.com>
Date: 2017-05-16T01:14:11Z

    Update ZKUtil.java

commit 172e35153e7fa226fdffa41bd0f353ee6377098d
Author: JiangJiafu <jiangjiafu1...@gmail.com>
Date: 2017-05-16T01:15:21Z

    Update ZKUtil.java

commit 5afbd4eb6a0d97c20df89d3c307f5092d19db5e1
Author: JiangJiafu <jiangjiafu1...@gmail.com>
Date: 2017-05-16T01:16:34Z

    Update ZKUtil.java

commit 6913c0175065a0d88384d38016b52fdbfe78bac2
Author: JiangJiafu <jiangjiafu1...@gmail.com>
Date: 2017-05-16T01:59:04Z

    Merge pull request #3 from JiangJiafu/ZOOKEEPER-2774

    Zookeeper 2774

commit 61b5644f680ebc8ddcc5af1c9aee932a038a390b
Author: Jiang Jiafu <jiangjiafu1...@gmail.com>
Date: 2017-05-17T07:51:18Z

    ZOOKEEPER-2691

commit 61f5b20b5b86fd68901016b7c9715033fc88ef5c
Author: Jiang Jiafu <jiangjiafu1...@gmail.com>
Date: 2017-05-17T07:59:26Z

    Remove unnecessary change.

commit 5afa15d3fd2f9e8eee5b57165b058c6fe015ead5
Author: Jiang Jiafu <jiangjiafu1...@gmail.com>
Date: 2017-05-19T01:14:31Z

    Merge branch 'branch-3.4' of https://github.com/apache/zookeeper into HEAD

commit f9bfd1de0371adf5ea9a2e98c8df64ec9e161e51
Author: Jiang Jiafu <jiangjiafu1...@gmail.com>
Date: 2017-05-19T01:31:31Z

    Merge remote-tracking branch 'origin/branch-3.4' into sugon

    Conflicts:
        src/java/main/org/apache/zookeeper/ZKUtil.java
        src/java/test/org/apache/zookeeper/test/ReadOnlyModeTest.java

commit 712925098704c8a77a3ff504c08ca0e89f17ea9b
Author: Jiang Jiafu <jiangjiafu1...@gmail.com>
Date: 2017-05-19T01:32:58Z

    Merge branch 'sugon' of https://github.com/JiangJiafu/zookeeper into sugon

commit 4b265f8a9952df555be53b4d3a4e0d5a6bae8eb9
Author: JiangJiafu <jiangjiafu1...@gmail.com>
Date: 2017-05-19T01:37:10Z

    Merge pull request #4 from JiangJiafu/ZOOKEEPER-2691

    Zookeeper 2691

commit a96d46bba8217b01884e37044d143ccad0ebf20a
Author: Jiang J
[GitHub] zookeeper pull request #112: ZOOKEEPER-2355:Ephemeral node is never deleted ...
Github user JiangJiafu commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/112#discussion_r121320033

--- Diff: src/java/test/org/apache/zookeeper/server/quorum/EphemeralNodeDeletionTest.java ---
@@ -0,0 +1,222 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.zookeeper.server.quorum;
+
+import static org.apache.zookeeper.test.ClientBase.CONNECTION_TIMEOUT;
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertNotNull;
+import static org.junit.Assert.assertNull;
+
+import java.io.IOException;
+import java.net.SocketTimeoutException;
+
+import org.apache.zookeeper.CreateMode;
+import org.apache.zookeeper.PortAssignment;
+import org.apache.zookeeper.ZooDefs.Ids;
+import org.apache.zookeeper.ZooKeeper;
+import org.apache.zookeeper.data.Stat;
+import org.apache.zookeeper.server.persistence.FileTxnSnapLog;
+import org.apache.zookeeper.server.quorum.QuorumPeer.ServerState;
+import org.apache.zookeeper.test.ClientBase;
+import org.apache.zookeeper.test.ClientBase.CountdownWatcher;
+import org.junit.After;
+import org.junit.Assert;
+import org.junit.Test;
+
+public class EphemeralNodeDeletionTest extends QuorumPeerTestBase {
+    private static int SERVER_COUNT = 3;
+    private MainThread[] mt = new MainThread[SERVER_COUNT];
+
+    /**
+     * Test case for https://issues.apache.org/jira/browse/ZOOKEEPER-2355.
+     * ZooKeeper ephemeral node is never deleted if follower fail while reading
+     * the proposal packet.
+     */
+
+    @Test(timeout = 12)
+    public void testEphemeralNodeDeletion() throws Exception {
+        final int clientPorts[] = new int[SERVER_COUNT];
+        StringBuilder sb = new StringBuilder();
+        String server;
+
+        for (int i = 0; i < SERVER_COUNT; i++) {
+            clientPorts[i] = PortAssignment.unique();
+            server = "server." + i + "=127.0.0.1:" + PortAssignment.unique()
+                    + ":" + PortAssignment.unique() + ":participant;127.0.0.1:"
+                    + clientPorts[i];
+            sb.append(server + "\n");
+        }
+        String currentQuorumCfgSection = sb.toString();
+        // start all the servers
+        for (int i = 0; i < SERVER_COUNT; i++) {
+            mt[i] = new MainThread(i, clientPorts[i], currentQuorumCfgSection,
+                    false) {
+                @Override
+                public TestQPMain getTestQPMain() {
+                    return new MockTestQPMain();
+                }
+            };
+            mt[i].start();
+        }
+
+        // ensure all servers started
+        for (int i = 0; i < SERVER_COUNT; i++) {
+            Assert.assertTrue("waiting for server " + i + " being up",
+                    ClientBase.waitForServerUp("127.0.0.1:" + clientPorts[i],
+                            CONNECTION_TIMEOUT));
+        }
+
+        CountdownWatcher watch = new CountdownWatcher();
+        ZooKeeper zk = new ZooKeeper("127.0.0.1:" + clientPorts[1],
+                ClientBase.CONNECTION_TIMEOUT, watch);
+        watch.waitForConnected(ClientBase.CONNECTION_TIMEOUT);
+
+        /**
+         * now the problem scenario starts
+         */
+
+        Stat firstEphemeralNode = new Stat();
+
+        // 1: create ephemeral node
+        String nodePath = "/e1";
+        zk.create(nodePath, "1".getBytes(), Ids.OPEN_ACL_UNSAFE,
+                CreateMode.EPHEMERAL, firstEphemeralNode);
+        assertEquals("Current session and ephemeral owner should be same",
+                zk.getSessionId(), firstEphemeralNode.getEphemeralOwner());
+
+        // 2: inject network probl
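For readers who want to see the configuration the test generates, here is a standalone sketch of the same string-building loop. It is not part of the PR: the class and method names are hypothetical, and `basePort` replaces `PortAssignment.unique()` with a plain sequential counter (an assumption) so the output is deterministic. Ports are allocated in the same order as in the test: client port first, then the quorum port, then the election port.

```java
public class QuorumConfigSketch {
    /**
     * Builds one "server.N=host:quorumPort:electionPort:participant;host:clientPort"
     * line per server, following the loop in EphemeralNodeDeletionTest.
     * Ports come from a sequential counter starting at basePort, a hypothetical
     * stand-in for PortAssignment.unique(). Client ports are written into clientPorts.
     */
    static String buildQuorumCfg(int serverCount, int basePort, int[] clientPorts) {
        int next = basePort;
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < serverCount; i++) {
            clientPorts[i] = next++;     // client port is allocated first
            int quorumPort = next++;     // then the quorum (follower-to-leader) port
            int electionPort = next++;   // then the leader-election port
            sb.append("server.").append(i)
              .append("=127.0.0.1:").append(quorumPort)
              .append(':').append(electionPort)
              .append(":participant;127.0.0.1:").append(clientPorts[i])
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int[] clientPorts = new int[3];
        System.out.print(buildQuorumCfg(3, 11221, clientPorts));
    }
}
```

With a base port of 11221 the first line is `server.0=127.0.0.1:11222:11223:participant;127.0.0.1:11221`; the part after `;` is the client address, which is why the test connects to `clientPorts[i]` rather than to the quorum ports.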
[jira] [Commented] (ZOOKEEPER-2800) zookeeper ephemeral node not deleted after server restart and consistency is not hold
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16044171#comment-16044171 ]

JiangJiafu commented on ZOOKEEPER-2800:
---

I believe this problem is the same as ZOOKEEPER-2355; thank you for the reminder, [~rakeshr]. I will apply the patch provided in ZOOKEEPER-2355 and see whether the problem happens again. I hope the patch works.

> zookeeper ephemeral node not deleted after server restart and consistency is
> not hold
>
> Key: ZOOKEEPER-2800
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2800
> Project: ZooKeeper
> Issue Type: Bug
> Components: quorum
> Affects Versions: 3.4.11
> Environment: Centos6.5 java8
> Reporter: JiangJiafu
> Priority: Critical
> Attachments: zoo.cfg, zookeeper2.out, zookeeper3.out, zookeeper.out
>
> I deployed a ZooKeeper cluster with three nodes:
> ofs_zk1:30.0.0.72
> ofs_zk2:30.0.0.73
> ofs_zk3:30.0.0.99
> On 2017-06-02, I used the C ZooKeeper client to create some ephemeral sequential nodes:
> /adm_election/rolemgr/rolemgr08,
> /adm_election/rolemgr/rolemgr11,
> /adm_election/rolemgr/rolemgr12,
> with a session timeout of 2 ms.
> Then I restarted ofs_zk1 and ofs_zk2.
> On 2017-06-05, I found that these ephemeral nodes still existed on ofs_zk1.
> I can check the nodes with the zkCli.sh get command on ofs_zk1.
> But these nodes do not exist on ofs_zk2 and ofs_zk3.
> Isn't that odd?
> I have uploaded the whole deploy directory of the three nodes to:
> https://pan.baidu.com/s/1miohiCo
> The log is printed in log/zookeeper.out.
> The log of ofs_zk3 is too large, so I only include its first 1000 lines.
> Since I found this problem a little late, some snapshots and logs may have been deleted.
> I hope someone can help find the reason.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[jira] [Commented] (ZOOKEEPER-2355) Ephemeral node is never deleted if follower fails while reading the proposal packet
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16044155#comment-16044155 ]

JiangJiafu commented on ZOOKEEPER-2355:
---

Can this bug be fixed in 3.4.11? As far as I know, consistency is the most important property of ZooKeeper, so I think this bug should have a higher priority than many others. I hope it can be fixed soon.

> Ephemeral node is never deleted if follower fails while reading the proposal
> packet
>
> Key: ZOOKEEPER-2355
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2355
> Project: ZooKeeper
> Issue Type: Bug
> Components: quorum, server
> Reporter: Mohammad Arshad
> Assignee: Mohammad Arshad
> Priority: Critical
> Fix For: 3.5.4, 3.6.0
> Attachments: ZOOKEEPER-2355-01.patch, ZOOKEEPER-2355-02.patch,
> ZOOKEEPER-2355-03.patch, ZOOKEEPER-2355-04.patch, ZOOKEEPER-2355-05.patch
>
> A ZooKeeper ephemeral node is never deleted if a follower fails while reading the
> proposal packet.
> The scenario is as follows:
> # Configure a three-node ZooKeeper cluster; let's say the nodes are A, B and C. Start all of them, and assume A is the leader and B and C are followers.
> # Connect to any of the servers and create the ephemeral node /e1.
> # Close the session; the ephemeral node /e1 will go for deletion.
> # While it is receiving the delete proposal, make follower B fail with {{SocketTimeoutException}}. We need to do this to reproduce the scenario; in a production environment it happens because of a network fault.
> # Remove the fault and check that the faulted follower is connected to the quorum again.
> # Connect to any of the servers and create the same ephemeral node /e1; the create succeeds.
> # Close the session; the ephemeral node /e1 will go for deletion.
> # {color:red}/e1 is not deleted from the faulted follower B. It should have been deleted, since it was created again with another session.{color}
> # {color:green}/e1 is deleted from leader A and the other follower C.{color}
[jira] [Commented] (ZOOKEEPER-2800) zookeeper ephemeral node not deleted after server restart and consistency is not hold
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16044091#comment-16044091 ]

JiangJiafu commented on ZOOKEEPER-2800:
---

I found that the first time the follower tried to reconnect to the leader, it sent peerLastZxid 0x13748 to the leader and began to sync the log from 0x13749, but the sync failed because of a network disconnection. The second time the follower tried to reconnect, it sent peerLastZxid 0x1385c, so the log entries 0x13749 ~ 0x1385c are missing on the follower!
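The size of the gap described above can be checked with a little arithmetic. A zxid packs the leader epoch into its high 32 bits and a per-epoch counter into its low 32 bits, so within one epoch consecutive proposals differ by 1. The sketch below (hypothetical class and method names, not ZooKeeper code) assumes both reported zxids belong to the same epoch, checks that assumption, and counts the zxids in the range the follower skipped.

```java
public class ZxidGap {
    // High 32 bits of a zxid: the leader epoch.
    static long epoch(long zxid) {
        return zxid >>> 32;
    }

    // Low 32 bits of a zxid: the per-epoch proposal counter.
    static long counter(long zxid) {
        return zxid & 0xffffffffL;
    }

    /**
     * Number of zxids strictly after firstReported and up to and including
     * secondReported, i.e. the range the follower claimed but never synced.
     * Only meaningful when both zxids are in the same epoch.
     */
    static long missingCount(long firstReported, long secondReported) {
        if (epoch(firstReported) != epoch(secondReported)) {
            throw new IllegalArgumentException("zxids are from different epochs");
        }
        return counter(secondReported) - counter(firstReported);
    }

    public static void main(String[] args) {
        long first = 0x13748L;  // peerLastZxid sent on the first reconnect
        long second = 0x1385cL; // peerLastZxid sent on the second reconnect
        System.out.printf("zxids 0x%x ~ 0x%x: %d%n",
                first + 1, second, missingCount(first, second));
    }
}
```

Running it prints `zxids 0x13749 ~ 0x1385c: 276`, so the skipped range covers 276 proposals, any of which (a closeSession, for instance) would then be absent from the follower's data tree.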
[jira] [Commented] (ZOOKEEPER-2800) zookeeper ephemeral node not deleted after server restart and consistency is not hold
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16044030#comment-16044030 ]

JiangJiafu commented on ZOOKEEPER-2800:
---

I had a quick look at ZOOKEEPER-2355, and I am not quite sure these are the same problem. But from the log I can see that zk1 (the problem node) did lose its connection to the leader while writing data, and then many transactions were lost as well (including the closeSession transaction).
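Why would losing the closeSession transaction leave nodes behind on only one server? A toy in-memory model makes the mechanism concrete. It is not ZooKeeper's actual `DataTree`, and all names and the session id are hypothetical: each ephemeral node records its owning session, and applying that session's closeSession removes all of them. A replica that never applies the transaction keeps the nodes, and in this model nothing else ever removes them.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class EphemeralReplicaToy {
    // Toy replica state: ephemeral node path -> id of the session that owns it.
    private final Map<String, Long> ephemerals = new HashMap<>();

    void createEphemeral(String path, long sessionId) {
        ephemerals.put(path, sessionId);
    }

    // Applying a closeSession transaction deletes every ephemeral node the session owns.
    void closeSession(long sessionId) {
        ephemerals.values().removeIf(owner -> owner == sessionId);
    }

    Set<String> paths() {
        return new TreeSet<>(ephemerals.keySet());
    }

    public static void main(String[] args) {
        long session = 0x1L; // hypothetical session id
        EphemeralReplicaToy upToDate = new EphemeralReplicaToy();
        EphemeralReplicaToy lagging = new EphemeralReplicaToy();
        upToDate.createEphemeral("/adm_election/rolemgr/rolemgr08", session);
        lagging.createEphemeral("/adm_election/rolemgr/rolemgr08", session);

        // Only the up-to-date replica applies closeSession; the lagging one never
        // receives it, which mirrors what the comment above reports for zk1.
        upToDate.closeSession(session);

        System.out.println("up to date: " + upToDate.paths());
        System.out.println("lagging:    " + lagging.paths());
    }
}
```

The up-to-date replica ends with an empty set while the lagging one still lists `/adm_election/rolemgr/rolemgr08`, the same shape of divergence as in the report.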
[jira] [Commented] (ZOOKEEPER-2800) zookeeper ephemeral node not deleted after server restart and consistency is not hold
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043987#comment-16043987 ]

JiangJiafu commented on ZOOKEEPER-2800:
---

In the latest occurrence, I found that zk3 (the leader) detected that the node's session had expired, and then zk2 and zk3 deleted the node, but the transaction was never applied on zk1!
[jira] [Commented] (ZOOKEEPER-2800) zookeeper ephemeral node not deleted after server restart and consistency is not hold
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043968#comment-16043968 ]

JiangJiafu commented on ZOOKEEPER-2800:
---

I think this must be a bug, because the problem has happened again in my environment.
[jira] [Updated] (ZOOKEEPER-2800) zookeeper ephemeral node not deleted after server restart and consistency is not hold
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] JiangJiafu updated ZOOKEEPER-2800: -- Attachment: zookeeper3.out zookeeper log of ofs_zk3
[jira] [Updated] (ZOOKEEPER-2800) zookeeper ephemeral node not deleted after server restart and consistency is not hold
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] JiangJiafu updated ZOOKEEPER-2800: -- Attachment: zookeeper2.out
[jira] [Updated] (ZOOKEEPER-2800) zookeeper ephemeral node not deleted after server restart and consistency is not hold
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] JiangJiafu updated ZOOKEEPER-2800: -- Attachment: zookeeper.out
[jira] [Updated] (ZOOKEEPER-2800) zookeeper ephemeral node not deleted after server restart and consistency is not hold
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] JiangJiafu updated ZOOKEEPER-2800: -- Attachment: zoo.cfg
[jira] [Updated] (ZOOKEEPER-2800) zookeeper ephemeral node not deleted after server restart and consistency is not hold
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

JiangJiafu updated ZOOKEEPER-2800:
---

Description:
I deployed a ZooKeeper cluster with three nodes:
ofs_zk1:30.0.0.72
ofs_zk2:30.0.0.73
ofs_zk3:30.0.0.99
On 2017-06-02, I used the C ZooKeeper client to create some ephemeral sequential nodes:
/adm_election/rolemgr/rolemgr08,
/adm_election/rolemgr/rolemgr11,
/adm_election/rolemgr/rolemgr12,
with a session timeout of 2 ms.
Then I restarted ofs_zk1 and ofs_zk2.
On 2017-06-05, I found that these ephemeral nodes still existed on ofs_zk1.
I can check the nodes with the zkCli.sh get command on ofs_zk1.
But these nodes do not exist on ofs_zk2 and ofs_zk3.
Isn't that odd?
I have uploaded the whole deploy directory of the three nodes to:
https://pan.baidu.com/s/1miohiCo
The log is printed in log/zookeeper.out.
The log of ofs_zk3 is too large, so I only include its first 1000 lines.
Since I found this problem a little late, some snapshots and logs may have been deleted.
I hope someone can help find the reason.

was:
I deployed a ZooKeeper cluster with three nodes:
ofs_zk1:30.0.0.72
ofs_zk2:30.0.0.73
ofs_zk3:30.0.0.99
On 2017-06-02, I used the C ZooKeeper client to create some ephemeral sequential nodes:
/adm_election/rolemgr/rolemgr08,
/adm_election/rolemgr/rolemgr11,
/adm_election/rolemgr/rolemgr12,
with a session timeout of 2 ms.
Then I restarted ofs_zk1 and ofs_zk2.
On 2017-06-05, I found that the ephemeral nodes still existed on ofs_zk1.
I can check the nodes with the zkCli.sh get command on ofs_zk1.
But these nodes do not exist on ofs_zk2 and ofs_zk3.
Isn't that odd?
I have uploaded the whole deploy directory of the three nodes to:
https://pan.baidu.com/s/1miohiCo
The log is printed in log/zookeeper.out.
The log of ofs_zk3 is too large, so I only include its first 1000 lines.
Since I found this problem a little late, some snapshots and logs may have been deleted.
I hope someone can help find the reason.
[jira] [Created] (ZOOKEEPER-2800) zookeeper ephemeral node not deleted after server restart and consistency is not hold
JiangJiafu created ZOOKEEPER-2800:
---

Summary: zookeeper ephemeral node not deleted after server restart and consistency is not hold
Key: ZOOKEEPER-2800
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2800
Project: ZooKeeper
Issue Type: Bug
Components: quorum
Affects Versions: 3.4.11
Environment: Centos6.5 java8
Reporter: JiangJiafu
Priority: Critical

I deployed a ZooKeeper cluster with three nodes:
ofs_zk1:30.0.0.72
ofs_zk2:30.0.0.73
ofs_zk3:30.0.0.99
On 2017-06-02, I used the C ZooKeeper client to create some ephemeral sequential nodes:
/adm_election/rolemgr/rolemgr08,
/adm_election/rolemgr/rolemgr11,
/adm_election/rolemgr/rolemgr12,
with a session timeout of 2 ms.
Then I restarted ofs_zk1 and ofs_zk2.
On 2017-06-05, I found that the ephemeral nodes still existed on ofs_zk1.
I can check the nodes with the zkCli.sh get command on ofs_zk1.
But these nodes do not exist on ofs_zk2 and ofs_zk3.
Isn't that odd?
I have uploaded the whole deploy directory of the three nodes to:
https://pan.baidu.com/s/1miohiCo
The log is printed in log/zookeeper.out.
The log of ofs_zk3 is too large, so I only include its first 1000 lines.
Since I found this problem a little late, some snapshots and logs may have been deleted.
I hope someone can help find the reason.
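The check the reporter did by hand (running `zkCli.sh get` against each server) amounts to comparing each server's local view of the tree. Below is a hedged sketch of that comparison: given the child names each server reports for a path, list every name that is missing from at least one server, together with the servers that still have it. The class name and the sample listings are hypothetical and only mirror the shape of the report. In practice the listings could be gathered by connecting a client to one server at a time and calling `ZooKeeper.getChildren(path, false)`, since ordinary reads are answered from the connected server's local state.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class ReplicaDivergence {
    /**
     * For every child name that is missing from at least one server,
     * returns the set of servers that still have it.
     */
    static Map<String, Set<String>> divergentPaths(Map<String, Set<String>> pathsByServer) {
        Set<String> allPaths = new TreeSet<>();
        for (Set<String> paths : pathsByServer.values()) {
            allPaths.addAll(paths);
        }
        Map<String, Set<String>> result = new TreeMap<>();
        for (String path : allPaths) {
            Set<String> holders = new TreeSet<>();
            for (Map.Entry<String, Set<String>> e : pathsByServer.entrySet()) {
                if (e.getValue().contains(path)) {
                    holders.add(e.getKey());
                }
            }
            // A path held by every server is consistent; anything else diverges.
            if (holders.size() < pathsByServer.size()) {
                result.put(path, holders);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical children of /adm_election/rolemgr as seen by each server.
        Map<String, Set<String>> listings = new TreeMap<>();
        listings.put("ofs_zk1", new TreeSet<>(Arrays.asList("rolemgr08", "rolemgr11", "rolemgr12")));
        listings.put("ofs_zk2", new TreeSet<>());
        listings.put("ofs_zk3", new TreeSet<>());
        System.out.println(divergentPaths(listings));
    }
}
```

For the sample data every listed node is reported as held only by `ofs_zk1`, which is the inconsistency described in the issue.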
[GitHub] zookeeper pull request #269: Sugon
Github user JiangJiafu closed the pull request at: https://github.com/apache/zookeeper/pull/269 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] zookeeper pull request #269: Sugon
GitHub user JiangJiafu opened a pull request: https://github.com/apache/zookeeper/pull/269
Sugon
You can merge this pull request into a Git repository by running: $ git pull https://github.com/JiangJiafu/zookeeper sugon
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/zookeeper/pull/269.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #269
commit 489cec9b78b21a2a241eeab18ddfb968758b2e67 Author: JiangJiafu <jiangjiafu1...@gmail.com> Date: 2017-02-13T11:36:44Z ZOOKEEPER-2691: recreateSocketAddresses may recreate the unreachable IP address
commit 31700c45030cca2d702fe0279443cd3f3b46a2b0 Author: JiangJiafu <jiangjiafu1...@gmail.com> Date: 2017-02-14T02:00:13Z ZOOKEEPER-2691: recreateSocketAddresses may recreate the unreachable IP address
commit 5c1bf6bd452e8237cb0bb9d871f3d0b3d08e0de2 Author: JiangJiafu <jiangjiafu1...@gmail.com> Date: 2017-02-18T01:46:10Z ZOOKEEPER-2691: recreateSocketAddresses may recreate the unreachable IP address
commit f4999a6df12d6ff42ea92596facb58d11695ba25 Author: JiangJiafu <jiangjiafu1...@gmail.com> Date: 2017-05-11T11:07:07Z Merge branch 'branch-3.4' of https://github.com/apache/zookeeper into ZOOKEEPER-2691
commit aa7b63d047450d6d1189860d08a3bc16d3ca4243 Author: Jiang Jiafu <jiangjiafu1...@gmail.com> Date: 2017-05-12T03:22:17Z ZOOKEEPER-2691
commit e2589df9630fd0310c5a39275a25632b27c50a1a Author: Jiang Jiafu <jiangjiafu1...@gmail.com> Date: 2017-05-12T03:35:03Z ZOOKEEPER-2691
commit eeb07f9b385d5e0161919f874466f837aaed3f99 Author: Jiang Jiafu <jiangjiafu1...@gmail.com> Date: 2017-05-12T06:02:51Z ZOOKEEPER-2691
commit c366949d6325cdc61aed59be30e37fe743186575 Author: Jiang Jiafu <jiangjiafu1...@gmail.com> Date: 2017-05-12T09:29:51Z ZOOKEEPER-2691
commit 6139f533af4f3b513bd713746449f147503168e0 Author: Jiang Jiafu <jiangjiafu1...@gmail.com> Date: 2017-05-12T09:47:31Z ZOOKEEPER-2691
commit 3ac65ead39fad4f8d9f26365e1bc73f83889f11e Author: Jiang Jiafu <jiangjiafu1...@gmail.com> Date: 2017-05-13T03:41:52Z ZOOKEEPER-2774
commit b67c3730936aab8c5de38f0d56dd344e1d5a2a6a Author: JiangJiafu <jiangjiafu1...@gmail.com> Date: 2017-05-13T07:56:01Z Merge pull request #1 from apache/branch-3.4 Branch 3.4
commit fbb52bc7156520ea70f8274a3ac9d46ea84d48b2 Author: Jiang Jiafu <jiangjiafu1...@gmail.com> Date: 2017-05-13T07:57:59Z Merge remote-tracking branch 'remotes/origin/ZOOKEEPER-2774' into sugon
commit 92d134e84f501003fc01a2d0018c0dc8406444fd Author: Jiang Jiafu <jiangjiafu1...@gmail.com> Date: 2017-05-13T07:58:23Z Merge remote-tracking branch 'remotes/origin/ZOOKEEPER-2691' into sugon
commit dc2924977fb2d5527810581012c3144b4dc11632 Author: JiangJiafu <jiangjiafu1...@gmail.com> Date: 2017-05-16T01:14:11Z Update ZKUtil.java
commit 172e35153e7fa226fdffa41bd0f353ee6377098d Author: JiangJiafu <jiangjiafu1...@gmail.com> Date: 2017-05-16T01:15:21Z Update ZKUtil.java
commit 5afbd4eb6a0d97c20df89d3c307f5092d19db5e1 Author: JiangJiafu <jiangjiafu1...@gmail.com> Date: 2017-05-16T01:16:34Z Update ZKUtil.java
commit 6913c0175065a0d88384d38016b52fdbfe78bac2 Author: JiangJiafu <jiangjiafu1...@gmail.com> Date: 2017-05-16T01:59:04Z Merge pull request #3 from JiangJiafu/ZOOKEEPER-2774 Zookeeper 2774
commit 61b5644f680ebc8ddcc5af1c9aee932a038a390b Author: Jiang Jiafu <jiangjiafu1...@gmail.com> Date: 2017-05-17T07:51:18Z ZOOKEEPER-2691
commit 61f5b20b5b86fd68901016b7c9715033fc88ef5c Author: Jiang Jiafu <jiangjiafu1...@gmail.com> Date: 2017-05-17T07:59:26Z Remove unnecessary change.
commit 5afa15d3fd2f9e8eee5b57165b058c6fe015ead5 Author: Jiang Jiafu <jiangjiafu1...@gmail.com> Date: 2017-05-19T01:14:31Z Merge branch 'branch-3.4' of https://github.com/apache/zookeeper into HEAD
commit f9bfd1de0371adf5ea9a2e98c8df64ec9e161e51 Author: Jiang Jiafu <jiangjiafu1...@gmail.com> Date: 2017-05-19T01:31:31Z Merge remote-tracking branch 'origin/branch-3.4' into sugon Conflicts: src/java/main/org/apache/zookeeper/ZKUtil.java src/java/test/org/apache/zookeeper/test/ReadOnlyModeTest.java
commit 712925098704c8a77a3ff504c08ca0e89f17ea9b Author: Jiang Jiafu <jiangjiafu1...@gmail.com> Date: 2017-05-19T01:32:58Z Merge branch 'sugon' of https://github.com/JiangJiafu/zookeeper into sugon
commit 4b265f8a9952df555be53b4d3a4e0d5a6bae8eb9 Author: JiangJiafu <jiangjiafu1...@gmail.com> Date: 2017-05-19T01:37:10Z Merge pull request #4 from JiangJiafu/ZOOKEEPER-2691 Zookeeper 2691
---
[GitHub] zookeeper pull request #261: Branch 3.4
Github user JiangJiafu closed the pull request at: https://github.com/apache/zookeeper/pull/261 ---
[GitHub] zookeeper pull request #173: ZOOKEEPER-2691: recreateSocketAddresses may rec...
Github user JiangJiafu closed the pull request at: https://github.com/apache/zookeeper/pull/173 ---
[GitHub] zookeeper issue #261: Branch 3.4
Github user JiangJiafu commented on the issue: https://github.com/apache/zookeeper/pull/261 merge ---
[GitHub] zookeeper pull request #261: Branch 3.4
GitHub user JiangJiafu opened a pull request: https://github.com/apache/zookeeper/pull/261
Branch 3.4
You can merge this pull request into a Git repository by running: $ git pull https://github.com/JiangJiafu/zookeeper branch-3.4
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/zookeeper/pull/261.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #261
commit b67c3730936aab8c5de38f0d56dd344e1d5a2a6a Author: JiangJiafu <jiangjiafu1...@gmail.com> Date: 2017-05-13T07:56:01Z Merge pull request #1 from apache/branch-3.4 Branch 3.4
commit 5afa15d3fd2f9e8eee5b57165b058c6fe015ead5 Author: Jiang Jiafu <jiangjiafu1...@gmail.com> Date: 2017-05-19T01:14:31Z Merge branch 'branch-3.4' of https://github.com/apache/zookeeper into HEAD
---
[jira] [Commented] (ZOOKEEPER-2691) recreateSocketAddresses may recreate the unreachable IP address
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16018735#comment-16018735 ] JiangJiafu commented on ZOOKEEPER-2691: --- Can this patch be merged?
> recreateSocketAddresses may recreate the unreachable IP address
>
> Key: ZOOKEEPER-2691
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2691
> Project: ZooKeeper
> Issue Type: Bug
> Affects Versions: 3.4.8, 3.4.9, 3.4.10, 3.5.0, 3.5.1, 3.5.2, 3.4.11
> Environment: Centos6.5
> Java8
> ZooKeeper3.4.8
> Reporter: JiangJiafu
> Priority: Minor
>
> QuorumPeer$QuorumServer.recreateSocketAddresses() is used to resolve the hostname to a new IP address (InetAddress) when any exception happens on the socket. This is very useful when a hostname resolves to more than one IP address.
> The problem is that the Java API InetAddress.getByName(String hostname) always returns the first IP address when the hostname resolves to more than one, and that first address may be permanently unreachable. For example, suppose a machine has two network interfaces, eth0 with ip1 and eth1 with ip2, and the mapping between the hostname and these addresses is set in /etc/hosts. When I bring eth0 down with the command "ifdown eth0", InetAddress.getByName(String hostname) still returns ip1, which stays unreachable.
> So I think it would be better to check each IP address with InetAddress.isReachable(int timeout) and choose a reachable one.
> I have modified the ZooKeeper source code and tested the new code in my own environment; it works well when I take network interfaces down with the "ifdown" command.
> The original code is:
> {code:title=QuorumPeer.java|borderStyle=solid}
> public void recreateSocketAddresses() {
>     InetAddress address = null;
>     try {
>         address = InetAddress.getByName(this.hostname);
>         LOG.info("Resolved hostname: {} to address: {}", this.hostname, address);
>         this.addr = new InetSocketAddress(address, this.port);
>         if (this.electionPort > 0){
>             this.electionAddr = new InetSocketAddress(address, this.electionPort);
>         }
>     } catch (UnknownHostException ex) {
>         LOG.warn("Failed to resolve address: {}", this.hostname, ex);
>         // Have we succeeded in the past?
>         if (this.addr != null) {
>             // Yes, previously the lookup succeeded. Leave things as they are
>             return;
>         }
>         // The hostname has never resolved. Create our InetSocketAddress(es) as unresolved
>         this.addr = InetSocketAddress.createUnresolved(this.hostname, this.port);
>         if (this.electionPort > 0){
>             this.electionAddr = InetSocketAddress.createUnresolved(this.hostname, this.electionPort);
>         }
>     }
> }
> {code}
> After my modification:
> {code:title=QuorumPeer.java|borderStyle=solid}
> public void recreateSocketAddresses() {
>     InetAddress address = null;
>     try {
>         address = getReachableAddress(this.hostname);
>         LOG.info("Resolved hostname: {} to address: {}", this.hostname, address);
>         this.addr = new InetSocketAddress(address, this.port);
>         if (this.electionPort > 0){
>             this.electionAddr = new InetSocketAddress(address, this.electionPort);
>         }
>     } catch (UnknownHostException ex) {
>         LOG.warn("Failed to resolve address: {}", this.hostname, ex);
>         // Have we succeeded in the past?
>         if (this.addr != null) {
>             // Yes, previously the lookup succeeded. Leave things as they are
>             return;
>         }
>         // The hostname has never resolved. Create our InetSocketAddress(es) as unresolved
>         this.addr = InetSocketAddress.createUnresolved(this.hostname, this.port);
>         if (this.electionPort > 0){
>             this.electionAddr = InetSocketAddress.createUnresolved(this.ho
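The modified code calls getReachableAddress(this.hostname), but the helper itself is not shown in the issue text. Below is a minimal sketch of how such a helper could work, assuming it resolves every address with InetAddress.getAllByName(), probes each with InetAddress.isReachable(int timeout), and falls back to the first resolved address (normally the one getByName() would return) when none answers. The class name ReachableAddressResolver and the 1000 ms default timeout are hypothetical; this is not the code from the patch.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Hypothetical helper sketching the getReachableAddress() idea from ZOOKEEPER-2691. */
final class ReachableAddressResolver {
    /** Assumed probe timeout per address; not taken from the actual patch. */
    static final int DEFAULT_TIMEOUT_MS = 1000;

    private ReachableAddressResolver() {
    }

    /**
     * Resolves every address of hostname and returns the first one that answers
     * isReachable() within timeoutMs. If none answers, falls back to the first
     * resolved address, i.e. the old getByName() behaviour.
     */
    static InetAddress getReachableAddress(String hostname, int timeoutMs)
            throws UnknownHostException {
        // Throws UnknownHostException if the name cannot be resolved at all,
        // so the caller's existing catch block still applies.
        InetAddress[] candidates = InetAddress.getAllByName(hostname);
        for (InetAddress candidate : candidates) {
            try {
                if (candidate.isReachable(timeoutMs)) {
                    return candidate;
                }
            } catch (IOException e) {
                // Treat a failed probe as "not reachable" and try the next address.
            }
        }
        return candidates[0];
    }

    static InetAddress getReachableAddress(String hostname) throws UnknownHostException {
        return getReachableAddress(hostname, DEFAULT_TIMEOUT_MS);
    }
}
```

One caveat on this design: according to the InetAddress Javadoc, isReachable() typically uses ICMP echo only when the process has the privilege to send it and otherwise tries a TCP connection to port 7 (echo), so on many hosts it can report a live address as unreachable. Probing the actual quorum or election port may be a more reliable check.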
[jira] [Created] (ZOOKEEPER-2788) The define of MAX_CONNECTION_ATTEMPTS in QuorumCnxManager.java seems useless, should it be removed?
JiangJiafu created ZOOKEEPER-2788:
Summary: The define of MAX_CONNECTION_ATTEMPTS in QuorumCnxManager.java seems useless, should it be removed?
Key: ZOOKEEPER-2788
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2788
Project: ZooKeeper
Issue Type: Improvement
Components: leaderElection, quorum
Affects Versions: 3.4.10, 3.4.11
Reporter: JiangJiafu
Priority: Minor

The constant MAX_CONNECTION_ATTEMPTS defined in QuorumCnxManager.java seems to be unused. Should it be removed?
[GitHub] zookeeper issue #253: ZOOKEEPER-2774
Github user JiangJiafu commented on the issue: https://github.com/apache/zookeeper/pull/253 Thanks for your work @hanm @afine. ---
[GitHub] zookeeper pull request #253: ZOOKEEPER-2774
Github user JiangJiafu closed the pull request at: https://github.com/apache/zookeeper/pull/253 ---
[GitHub] zookeeper issue #253: ZOOKEEPER-2774
Github user JiangJiafu commented on the issue: https://github.com/apache/zookeeper/pull/253 It seems the unit tests pass now. I'm not sure what the remaining problem is. @hanm ---
[jira] [Commented] (ZOOKEEPER-2783) follower disconnects and cannot reconnect
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16015134#comment-16015134 ] JiangJiafu commented on ZOOKEEPER-2783: --- I am not quite sure, but could this be the same problem as ZOOKEEPER-2701?
> follower disconnects and cannot reconnect
>
> Key: ZOOKEEPER-2783
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2783
> Project: ZooKeeper
> Issue Type: Bug
> Components: leaderElection
> Affects Versions: 3.4.10
> Environment: centos 7, AWS EC2
> Reporter: Ben Sherman
> Attachments: fail3.log, fail5.log
>
> We have a 5-node cluster running 3.4.10 (we saw this in .8 and .9 as well), and sometimes a node gets a read timeout, drops all its connections and tries to re-establish itself in the quorum. It can usually do this in a few seconds, but last night it took almost 15 minutes to reconnect.
> These are 5 servers in AWS, and we've tried tuning the timeouts, but they are exceeding any reasonable timeout and still failing.
> In the attached logs, 5 is a follower and 3 is the leader. 5 loses connectivity at 11:21:34. 3 sees the disconnect at the same moment.
> 5 tries to re-establish the quorum, but cannot do it until the connections to the other servers expire at 11:37:02. After the connections are re-established, 5 connects immediately.
> At 11:41:08, the operator restarted the server, and it reconnected normally.
> I suspect there is a problem with stale connections to the rest of the quorum - the other services on this box (monitoring, puppet) were fine and able to establish new connections with no problems.
> I posed this problem to the zookeeper-users list and was asked to open a ticket.
[GitHub] zookeeper issue #253: ZOOKEEPER-2774
Github user JiangJiafu commented on the issue: https://github.com/apache/zookeeper/pull/253 The code has been changed according to your advice. @hanm ---
[GitHub] zookeeper issue #253: ZOOKEEPER-2774
Github user JiangJiafu commented on the issue: https://github.com/apache/zookeeper/pull/253 @hanm, do you mean that every test source file that uses System.currentTimeMillis() should be changed? Or should I only change these two files: ClientBase.java and QuorumPeerMainTest.java? ---
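The reason to avoid System.currentTimeMillis() in test wait loops is that a wall-clock adjustment can cut a timeout short or stretch it arbitrarily. Below is a minimal sketch of a deadline loop based on System.nanoTime(), which is monotonic within one JVM. The class TestWaits and its waitFor() method are hypothetical and are not the helpers in ClientBase.java or QuorumPeerMainTest.java.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical test helper: polls a condition until it holds or a timeout elapses. */
final class TestWaits {
    private TestWaits() {
    }

    /**
     * Returns true as soon as condition holds, or false once timeoutMs of
     * elapsed time has passed. Elapsed time comes from System.nanoTime(),
     * so changing the system clock does not move the deadline.
     */
    static boolean waitFor(BooleanSupplier condition, long timeoutMs, long pollMs)
            throws InterruptedException {
        final long start = System.nanoTime();
        final long timeoutNanos = timeoutMs * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            // Only differences between nanoTime() readings are meaningful.
            if (System.nanoTime() - start >= timeoutNanos) {
                return false;
            }
            Thread.sleep(pollMs);
        }
    }
}
```

The loop compares the difference `System.nanoTime() - start` against the timeout rather than storing an absolute deadline, because nanoTime() has an arbitrary origin and comparing differences stays correct even if the counter wraps.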
[GitHub] zookeeper issue #173: ZOOKEEPER-2691: recreateSocketAddresses may recreate t...
Github user JiangJiafu commented on the issue: https://github.com/apache/zookeeper/pull/173 Please have a look at the new code, thank you. ---
[jira] [Commented] (ZOOKEEPER-1167) C api lacks synchronous version of sync() call.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16013402#comment-16013402 ] JiangJiafu commented on ZOOKEEPER-1167: --- I have read all the comments above, but I don't understand the point. In what scenarios would this bug cause a problem? It also seems this bug is not going to be fixed in the 3.4.x versions; why not?
> C api lacks synchronous version of sync() call.
>
> Key: ZOOKEEPER-1167
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1167
> Project: ZooKeeper
> Issue Type: Bug
> Components: c client
> Affects Versions: 3.3.3, 3.4.3, 3.5.0
> Reporter: Nicholas Harteau
> Assignee: Marshall McMullen
> Fix For: 3.5.4, 3.6.0
> Attachments: ZOOKEEPER-1167.patch
>
> Reading through the source, the C API implements zoo_async(), which is the zookeeper sync() method implemented in the multithreaded/asynchronous C API.
> It doesn't implement anything equivalent in the non-multithreaded API.
> I'm not sure whether this was an oversight or intentional, but it means that the non-multithreaded API can't guarantee consistent client views on critical reads.
> The zkperl bindings depend on the synchronous, non-multithreaded API, so they also can't call sync() currently.
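In general, a synchronous call can be layered on an asynchronous one by issuing the request and blocking until its completion callback fires. The Java sketch below shows that pattern with CountDownLatch, for illustration only: the AsyncSync interface, the SyncWrapper class and the integer result code are hypothetical stand-ins, not the ZooKeeper C or Java client API. It assumes the callback runs on a different thread. In a single-threaded client, where completions are only dispatched when the application itself drives the event loop, blocking like this would never see the callback, which is presumably part of why a synchronous sync() is harder to provide in the non-multithreaded C API.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.IntConsumer;

/** Hypothetical async API: onComplete receives a result code later, on another thread. */
interface AsyncSync {
    void syncAsync(String path, IntConsumer onComplete);
}

/** Sketch of a blocking wrapper around an asynchronous "sync" call. */
final class SyncWrapper {
    private SyncWrapper() {
    }

    static int syncBlocking(AsyncSync client, String path, long timeoutMs)
            throws InterruptedException, TimeoutException {
        CountDownLatch done = new CountDownLatch(1);
        int[] rc = new int[1];
        client.syncAsync(path, result -> {
            rc[0] = result;     // written before countDown()...
            done.countDown();   // ...so it is visible once await() returns true
        });
        if (!done.await(timeoutMs, TimeUnit.MILLISECONDS)) {
            throw new TimeoutException("sync(" + path + ") did not complete in " + timeoutMs + " ms");
        }
        return rc[0];
    }
}
```

The timeout matters: without it, a lost callback would block the caller forever.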
[jira] [Updated] (ZOOKEEPER-2691) recreateSocketAddresses may recreate the unreachable IP address
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] JiangJiafu updated ZOOKEEPER-2691: -- Affects Version/s: 3.4.11 > recreateSocketAddresses may recreate the unreachable IP address
[GitHub] zookeeper issue #253: ZOOKEEPER-2774
Github user JiangJiafu commented on the issue: https://github.com/apache/zookeeper/pull/253 Thanks for your review work. @afine ---
[GitHub] zookeeper issue #253: ZOOKEEPER-2774
Github user JiangJiafu commented on the issue: https://github.com/apache/zookeeper/pull/253 Ported the code from ZOOKEEPER-1366 to branch-3.4. ---
[GitHub] zookeeper pull request #253: ZOOKEEPER-2774
GitHub user JiangJiafu opened a pull request: https://github.com/apache/zookeeper/pull/253
ZOOKEEPER-2774
You can merge this pull request into a Git repository by running: $ git pull https://github.com/JiangJiafu/zookeeper ZOOKEEPER-2774
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/zookeeper/pull/253.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #253
commit 3ac65ead39fad4f8d9f26365e1bc73f83889f11e Author: Jiang Jiafu <jiangjiafu1...@gmail.com> Date: 2017-05-13T03:41:52Z ZOOKEEPER-2774
---
[jira] [Updated] (ZOOKEEPER-2701) Timeout for RecvWorker is too long
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] JiangJiafu updated ZOOKEEPER-2701: -- Affects Version/s: 3.4.9 3.4.10
> Timeout for RecvWorker is too long
>
> Key: ZOOKEEPER-2701
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2701
> Project: ZooKeeper
> Issue Type: Bug
> Affects Versions: 3.4.8, 3.4.9, 3.4.10
> Environment: Centos6.5
> ZooKeeper 3.4.8
> Reporter: JiangJiafu
> Priority: Minor
>
> Environment:
> I deployed ZooKeeper in a cluster of three nodes. Each node has three network interfaces (eth0, eth1, eth2).
> Hostnames are used instead of IP addresses in zoo.cfg, and quorumListenOnAllIPs=true.
> Problem:
> I started three ZooKeeper servers (node A, node B and node C) one by one; when leader election finished, node B was the leader.
> Then I shut down one network interface of node A with the command "ifdown eth0". The ZooKeeper server on node A loses its connections to node B and node C. In my test, it took about 20 minutes for the ZooKeeper server on node A to notice this and call QuorumServer.recreateSocketAddresses() to resolve the hostname again.
> Reading the source code, I found this in
> {code:title=QuorumCnxManager.java:|borderStyle=solid}
> class RecvWorker extends ZooKeeperThread {
>     Long sid;
>     Socket sock;
>     volatile boolean running = true;
>     final DataInputStream din;
>     final SendWorker sw;
>
>     RecvWorker(Socket sock, DataInputStream din, Long sid, SendWorker sw) {
>         super("RecvWorker:" + sid);
>         this.sid = sid;
>         this.sock = sock;
>         this.sw = sw;
>         this.din = din;
>         try {
>             // OK to wait until socket disconnects while reading.
>             sock.setSoTimeout(0);
>         } catch (IOException e) {
>             LOG.error("Error while accessing socket for " + sid, e);
>             closeSocket(sock);
>             running = false;
>         }
>     }
>     ...
> }
> {code}
> I notice that the socket timeout (SO_TIMEOUT) is set to 0, which means an infinite wait, in the RecvWorker constructor. I think this is reasonable when the IP address of a ZooKeeper server never changes, but since the IP address of a ZooKeeper server may change, perhaps we should set a timeout here.
> I am not entirely sure this is really a problem.
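The trade-off described above can be shown with plain sockets: with setSoTimeout(0) a blocking read() on a silent peer never returns, while a positive timeout makes it throw java.net.SocketTimeoutException so the reader gets a chance to close and re-resolve. Below is a self-contained sketch using a loopback ServerSocket whose accepted side never writes. The class ReadTimeoutDemo and the timeout values are hypothetical; this is neither the RecvWorker code nor a proposed patch.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

/** Hypothetical demo: a read on a silent peer with a positive SO_TIMEOUT. */
final class ReadTimeoutDemo {
    private ReadTimeoutDemo() {
    }

    /**
     * Connects to a loopback server that accepts but never sends, then reads
     * with the given SO_TIMEOUT. Returns true if read() timed out.
     * Only pass positive values: 0 means "wait forever" and would hang here.
     */
    static boolean readTimesOut(int timeoutMs) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, server.getLocalPort());
             Socket accepted = server.accept()) {  // peer side stays silent
            client.setSoTimeout(timeoutMs);
            InputStream in = client.getInputStream();
            try {
                in.read();  // blocks: nothing is ever written by the peer
                return false;
            } catch (SocketTimeoutException e) {
                // The socket is still valid here; the caller decides whether to close it.
                return true;
            }
        }
    }
}
```

A timeout alone cannot tell a dead peer from an idle one, so a value chosen for RecvWorker would have to exceed the longest period a healthy connection may legitimately stay quiet.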
[GitHub] zookeeper issue #173: ZOOKEEPER-2691: recreateSocketAddresses may recreate t...
Github user JiangJiafu commented on the issue: https://github.com/apache/zookeeper/pull/173 @hanm I have updated the documentation; please review the code and the documentation. Thank you. ---
[GitHub] zookeeper issue #173: ZOOKEEPER-2691: recreateSocketAddresses may recreate t...
Github user JiangJiafu commented on the issue: https://github.com/apache/zookeeper/pull/173 @hanm Hi, I have modified the code according to your advice, except for the second point: "Documentation (see Abe's comment)" ---
[GitHub] zookeeper issue #173: ZOOKEEPER-2691: recreateSocketAddresses may recreate t...
Github user JiangJiafu commented on the issue: https://github.com/apache/zookeeper/pull/173 Sorry for asking, but how do I update the documentation? Should I modify the HTML files in the docs directory? ---
[jira] [Commented] (ZOOKEEPER-2774) Ephemeral znode will not be removed when sesstion timeout, if the system time of ZooKeeper node changes unexpectedly.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16007441#comment-16007441 ] JiangJiafu commented on ZOOKEEPER-2774: --- OK.
> Ephemeral znode will not be removed when sesstion timeout, if the system time of ZooKeeper node changes unexpectedly.
>
> Key: ZOOKEEPER-2774
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2774
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.4.8, 3.4.9, 3.4.10
> Environment: Centos6.5
> Reporter: JiangJiafu
>
> 1. Deploy a ZooKeeper cluster with one node.
> 2. Create an ephemeral znode.
> 3. Change the system time of the ZooKeeper node to an earlier point.
> 4. Disconnect the client from the ZooKeeper server.
> The ephemeral znode then persists for a long time, even after the session has timed out.
> Reading the ZooKeeper source code, I found this code in SessionTrackerImpl.java:
> {code:title=SessionTrackerImpl.java|borderStyle=solid}
> @Override
> synchronized public void run() {
>     try {
>         while (running) {
>             currentTime = System.currentTimeMillis();
>             if (nextExpirationTime > currentTime) {
>                 this.wait(nextExpirationTime - currentTime);
>                 continue;
>             }
>             SessionSet set;
>             set = sessionSets.remove(nextExpirationTime);
>             if (set != null) {
>                 for (SessionImpl s : set.sessions) {
>                     setSessionClosing(s.sessionId);
>                     expirer.expire(s);
>                 }
>             }
>             nextExpirationTime += expirationInterval;
>         }
>     } catch (InterruptedException e) {
>         handleException(this.getName(), e);
>     }
>     LOG.info("SessionTrackerImpl exited loop!");
> }
> {code}
> I think it may be better to use System.nanoTime() rather than System.currentTimeMillis(), because the latter can be changed manually or automatically by an NTP client.
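The suggestion above can be sketched outside ZooKeeper. The class below is a hypothetical, simplified bucketed expiry tracker, not SessionTrackerImpl: it reads time through an injectable LongSupplier that, in production, would return System.nanoTime() / 1_000_000, so setting the system clock back does not postpone expiration. The bucket rounding rule, the method names and the class name MonotonicExpiryTracker are all assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.function.LongSupplier;

/** Hypothetical, simplified session-expiry tracker driven by a monotonic clock. */
final class MonotonicExpiryTracker {
    private final LongSupplier elapsedMillis;  // e.g. () -> System.nanoTime() / 1_000_000
    private final long expirationInterval;
    private final TreeMap<Long, Set<Long>> buckets = new TreeMap<>();  // deadline -> session ids
    private final Map<Long, Long> bucketOf = new HashMap<>();          // session id -> deadline

    MonotonicExpiryTracker(LongSupplier elapsedMillis, long expirationInterval) {
        this.elapsedMillis = elapsedMillis;
        this.expirationInterval = expirationInterval;
    }

    /** Assumed rule: round a deadline up to the next bucket boundary. */
    private long roundToInterval(long time) {
        return (time / expirationInterval + 1) * expirationInterval;
    }

    /** Records activity: (re)places the session in the bucket for now + timeoutMs. */
    void touch(long sessionId, long timeoutMs) {
        long deadline = roundToInterval(elapsedMillis.getAsLong() + timeoutMs);
        Long previous = bucketOf.put(sessionId, deadline);
        if (previous != null && previous != deadline) {
            Set<Long> old = buckets.get(previous);
            if (old != null) {
                old.remove(sessionId);
                if (old.isEmpty()) {
                    buckets.remove(previous);
                }
            }
        }
        buckets.computeIfAbsent(deadline, k -> new HashSet<>()).add(sessionId);
    }

    /** Removes and returns every session whose bucket deadline has passed. */
    List<Long> expireDue() {
        long now = elapsedMillis.getAsLong();
        List<Long> expired = new ArrayList<>();
        while (!buckets.isEmpty() && buckets.firstKey() <= now) {
            for (Long id : buckets.pollFirstEntry().getValue()) {
                bucketOf.remove(id);
                expired.add(id);
            }
        }
        return expired;
    }
}
```

Because nanoTime() has an arbitrary origin and only differences are meaningful, deadlines computed this way are valid only inside the JVM that produced them and cannot be persisted or compared between servers. The injectable clock also makes the expiry logic testable without sleeping.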
[GitHub] zookeeper pull request #250: Branch 3.4
Github user JiangJiafu closed the pull request at: https://github.com/apache/zookeeper/pull/250 ---
[GitHub] zookeeper pull request #250: Branch 3.4
GitHub user JiangJiafu opened a pull request: https://github.com/apache/zookeeper/pull/250
Branch 3.4
You can merge this pull request into a Git repository by running: $ git pull https://github.com/apache/zookeeper branch-3.4
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/zookeeper/pull/250.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #250
commit fb01942d1ffa59e64bbc874abb00b9494a7d42ad Author: Flavio Paiva Junqueira <f...@apache.org> Date: 2013-09-02T21:47:42Z ZOOKEEPER-1379. 'printwatches, redo, history and connect '. client commands always print usage. This is not necessary (edward via fpj) git-svn-id: https://svn.apache.org/repos/asf/zookeeper/branches/branch-3.4@1519521 13f79535-47bb-0310-9956-ffa450edef68
commit 32593f56c1c084a20f3a76a39b70555607ea8982 Author: Flavio Paiva Junqueira <f...@apache.org> Date: 2013-09-03T11:20:03Z ZOOKEEPER-1670: zookeeper should set a default value for SERVER_JVMFLAGS and CLIENT_JVMFLAGS so that memory usage is controlled (Arpit Gupta via fpj) git-svn-id: https://svn.apache.org/repos/asf/zookeeper/branches/branch-3.4@1519650 13f79535-47bb-0310-9956-ffa450edef68
commit 33550d6ce719b2cedfd0fa8650c4311dd9077d97 Author: Flavio Paiva Junqueira <f...@apache.org> Date: 2013-09-05T20:46:44Z ZOOKEEPER-1448: Node+Quota creation in transaction log can crash leader startup (Botond Hejj via fpj) git-svn-id: https://svn.apache.org/repos/asf/zookeeper/branches/branch-3.4@1520418 13f79535-47bb-0310-9956-ffa450edef68
commit 579c7a4736ee03183e6cba25d7d20dedb6f892dc Author: Camille Fournier <cami...@apache.org> Date: 2013-09-11T20:42:24Z ZOOKEEPER-1664. Kerberos auth doesn't work with native platform GSS integration. (Boaz Kelmer via camille) git-svn-id: https://svn.apache.org/repos/asf/zookeeper/branches/branch-3.4@1522028 13f79535-47bb-0310-9956-ffa450edef68
commit a69b56620a12ab4be2147dab6b45a92c85bca1f2 Author: Michi Mutsuzaki <mic...@apache.org> Date: 2013-09-12T17:26:40Z ZOOKEEPER-1750. Race condition producing NPE in NIOServerCnxn.toString (Rakesh R via michim) git-svn-id: https://svn.apache.org/repos/asf/zookeeper/branches/branch-3.4@1522673 13f79535-47bb-0310-9956-ffa450edef68
commit 2ad419efc40456c3aa25b1cba0624c39d0b49c7f Author: Flavio Paiva Junqueira <f...@apache.org> Date: 2013-09-17T22:44:54Z ZOOKEEPER-1754. Read-only server allows to create znode (Rakesh R via fpj) git-svn-id: https://svn.apache.org/repos/asf/zookeeper/branches/branch-3.4@1524247 13f79535-47bb-0310-9956-ffa450edef68
commit 973665c48a176227a681998587f1bd14f0ea9578 Author: Mahadev Konar <maha...@apache.org> Date: 2013-09-18T02:06:28Z ZOOKEEPER-1751. ClientCnxn#run could miss the second ping or connection get dropped before a ping. (Jeffrey Zhong via mahadev) git-svn-id: https://svn.apache.org/repos/asf/zookeeper/branches/branch-3.4@1524274 13f79535-47bb-0310-9956-ffa450edef68
commit a8df46db76ee725411ff4774debf8a6352d12c6a Author: Flavio Paiva Junqueira <f...@apache.org> Date: 2013-09-18T10:24:00Z ZOOKEEPER-1657. Increased CPU usage by unnecessary SASL checks (Philip K. Warren via fpj) git-svn-id: https://svn.apache.org/repos/asf/zookeeper/branches/branch-3.4@1524355 13f79535-47bb-0310-9956-ffa450edef68
commit d1eff7a27be156a5b4a37bd6aff4da446dd3813e Author: Flavio Paiva Junqueira <f...@apache.org> Date: 2013-09-18T12:33:45Z ZOOKEEPER-1753. ClientCnxn is not properly releasing the resources, which are used to ping RwServer (Rakesh R via fpj) git-svn-id: https://svn.apache.org/repos/asf/zookeeper/branches/branch-3.4@1524387 13f79535-47bb-0310-9956-ffa450edef68
commit e3a488b11e4e0e8a124c7387dad98ecd286ee1eb Author: Flavio Paiva Junqueira <f...@apache.org> Date: 2013-09-25T21:44:11Z ZOOKEEPER-1096.
Leader communication should listen on specified IP, not wildcard address (Jared Cantwell, German Blanco via fpj) git-svn-id: https://svn.apache.org/repos/asf/zookeeper/branches/branch-3.4@1526313 13f79535-47bb-0310-9956-ffa450edef68 commit dc3267b0ec40abdd67833a86e7a6051e0b468a4d Author: Mahadev Konar <maha...@apache.org> Date: 2013-09-26T00:03:59Z ZOOKEEPER-1696. Fail to run zookeeper client on Weblogic application server. (Jeffrey Zhong via mahadev) git-svn-id: https://svn.apache.org/repos/asf/zookeeper/branches/branch-3.4@1526338 13f79535-47bb-0310-9956-ffa450edef68 commit a8fe3f3be3b77273950d9e10bc2c51aadd1902dc Author: Flavio Paiva Junqueira <f...@apache.org> Date: 2013-09-26T12:37:12Z ZOOKEEPER-87. Follower does not shut itself down if its too far b
[jira] [Updated] (ZOOKEEPER-2691) recreateSocketAddresses may recreate the unreachable IP address
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] JiangJiafu updated ZOOKEEPER-2691: -- Affects Version/s: 3.4.9 3.4.10 3.5.0 3.5.1 3.5.2 > recreateSocketAddresses may recreate the unreachable IP address > --- > > Key: ZOOKEEPER-2691 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2691 > Project: ZooKeeper > Issue Type: Bug >Affects Versions: 3.4.8, 3.4.9, 3.4.10, 3.5.0, 3.5.1, 3.5.2 > Environment: Centos6.5 > Java8 > ZooKeeper3.4.8 >Reporter: JiangJiafu >Priority: Minor > > QuorumPeer$QuorumServer.recreateSocketAddresses() is used to resolve the > hostname to a new IP address (InetAddress) when any exception happens to the > socket. This is very useful when a hostname can be resolved to more than > one IP address. > But the problem is that the Java API InetAddress.getByName(String hostname) > always returns the first IP address when the hostname resolves to more > than one IP address, and the first IP address may be unreachable forever. For > example, suppose a machine has two network interfaces, eth0 with ip1 and eth1 with > ip2, and the mapping between the hostname and the IP addresses is > set in /etc/hosts. When I "close" eth0 with the command "ifdown eth0", > InetAddress.getByName(String hostname) still returns ip1, which is > unreachable forever. > So I think it would be better to check each IP address with > InetAddress.isReachable(int) and choose a reachable one. > I have modified the ZooKeeper source code and tested the new code in my own > environment, and it works well when I bring down network > interfaces with the "ifdown" command.
> The original code is: > {code:title=QuorumPeer.java|borderStyle=solid} > public void recreateSocketAddresses() { > InetAddress address = null; > try { > address = InetAddress.getByName(this.hostname); > LOG.info("Resolved hostname: {} to address: {}", > this.hostname, address); > this.addr = new InetSocketAddress(address, this.port); > if (this.electionPort > 0){ > this.electionAddr = new InetSocketAddress(address, > this.electionPort); > } > } catch (UnknownHostException ex) { > LOG.warn("Failed to resolve address: {}", this.hostname, ex); > // Have we succeeded in the past? > if (this.addr != null) { > // Yes, previously the lookup succeeded. Leave things as > they are > return; > } > // The hostname has never resolved. Create our > InetSocketAddress(es) as unresolved > this.addr = InetSocketAddress.createUnresolved(this.hostname, > this.port); > if (this.electionPort > 0){ > this.electionAddr = > InetSocketAddress.createUnresolved(this.hostname, > > this.electionPort); > } > } > } > {code} > After my modification: > {code:title=QuorumPeer.java|borderStyle=solid} > public void recreateSocketAddresses() { > InetAddress address = null; > try { > address = getReachableAddress(this.hostname); > LOG.info("Resolved hostname: {} to address: {}", > this.hostname, address); > this.addr = new InetSocketAddress(address, this.port); > if (this.electionPort > 0){ > this.electionAddr = new InetSocketAddress(address, > this.electionPort); > } > } catch (UnknownHostException ex) { > LOG.warn("Failed to resolve address: {}", this.hostname, ex); > // Have we succeeded in the past? > if (this.addr != null) { > // Yes, previously the lookup succeeded. Leave things as > they are > return; > } > // The hostname has never resolved. Create our > InetSocketAddress(es) as unresolved > this.addr = InetSocketAddress.createUnresolved(this.hostname, > this.port); > if (this.electionPort > 0){ >
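A quick way to check the behaviour described above on a given node is to compare InetAddress.getByName with InetAddress.getAllByName for the same hostname. The class below is a hypothetical helper written for illustration, not part of ZooKeeper. In the OpenJDK sources we are aware of, getByName(host) simply returns getAllByName(host)[0], which is why the first resolved address keeps being chosen even when it is down.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical diagnostic helper (not ZooKeeper code).
class ResolveAll {
    // Every address the resolver returns for hostname, in resolver order.
    static InetAddress[] all(String hostname) throws UnknownHostException {
        return InetAddress.getAllByName(hostname);
    }

    // True when InetAddress.getByName picked the first entry of getAllByName,
    // i.e. the address recreateSocketAddresses() would keep using.
    static boolean getByNameReturnsFirst(String hostname) throws UnknownHostException {
        return InetAddress.getByName(hostname).equals(all(hostname)[0]);
    }
}
```

Calling ResolveAll.all with the quorum hostname on an affected node shows whether the resolver actually returns more than one address; if it returns only one, choosing among addresses cannot help.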
[jira] [Commented] (ZOOKEEPER-2774) Ephemeral znode will not be removed when sesstion timeout, if the system time of ZooKeeper node changes unexpectedly.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16005821#comment-16005821 ] JiangJiafu commented on ZOOKEEPER-2774: --- Is this problem planned to be fixed in 3.4.x? > Ephemeral znode will not be removed when sesstion timeout, if the system time > of ZooKeeper node changes unexpectedly. > - > > Key: ZOOKEEPER-2774 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2774 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.4.8, 3.4.9, 3.4.10 > Environment: Centos6.5 >Reporter: JiangJiafu > > 1. Deploy a ZooKeeper cluster with one node. > 2. Create an ephemeral znode. > 3. Change the system time of the ZooKeeper node to an earlier point. > 4. Disconnect the client from the ZooKeeper server. > Then the ephemeral znode continues to exist for a long time, even after the session timeout has elapsed. > I have read the ZooKeeper source code and found the following code in > SessionTrackerImpl.java: > {code:title=SessionTrackerImpl.java|borderStyle=solid} > @Override > synchronized public void run() { > try { > while (running) { > currentTime = System.currentTimeMillis(); > if (nextExpirationTime > currentTime) { > this.wait(nextExpirationTime - currentTime); > continue; > } > SessionSet set; > set = sessionSets.remove(nextExpirationTime); > if (set != null) { > for (SessionImpl s : set.sessions) { > setSessionClosing(s.sessionId); > expirer.expire(s); > } > } > nextExpirationTime += expirationInterval; > } > } catch (InterruptedException e) { > handleException(this.getName(), e); > } > LOG.info("SessionTrackerImpl exited loop!"); > } > {code} > I think it may be better to use System.nanoTime() rather than > System.currentTimeMillis(), because the latter can be changed manually or > automatically by an NTP client. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
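The suggestion to use System.nanoTime() can be sketched as a small deadline helper. MonotonicDeadline is a hypothetical class, not ZooKeeper code, and it assumes the caller waits in a loop like the run() method quoted above. Two details matter: nanoTime values should only be compared by subtraction (the System.nanoTime Javadoc notes that the origin is arbitrary and values may wrap), and the remaining time is rounded up because Object.wait(0) waits indefinitely.

```java
import java.util.function.LongSupplier;

// Hypothetical sketch (not ZooKeeper code): a deadline measured on a monotonic clock.
class MonotonicDeadline {
    private final LongSupplier nanoClock;   // System::nanoTime in production, a fake in tests
    private final long deadlineNanos;

    MonotonicDeadline(long timeoutMs, LongSupplier nanoClock) {
        this.nanoClock = nanoClock;
        this.deadlineNanos = nanoClock.getAsLong() + timeoutMs * 1_000_000L;
    }

    MonotonicDeadline(long timeoutMs) {
        this(timeoutMs, System::nanoTime);
    }

    // Compare by subtraction so that wrap-around of nanoTime values is harmless.
    boolean expired() {
        return deadlineNanos - nanoClock.getAsLong() <= 0;
    }

    // Milliseconds left, rounded UP so a caller never passes 0 to Object.wait
    // (which would mean "wait forever") while time actually remains.
    long remainingMillis() {
        long remainingNanos = deadlineNanos - nanoClock.getAsLong();
        return remainingNanos <= 0 ? 0 : (remainingNanos + 999_999L) / 1_000_000L;
    }
}
```

In a loop modelled on run(), a caller would check expired() first and otherwise call this.wait(deadline.remainingMillis()), so setting the system clock back would no longer stretch the wait.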
[GitHub] zookeeper issue #173: ZOOKEEPER-2691: recreateSocketAddresses may recreate t...
Github user JiangJiafu commented on the issue: https://github.com/apache/zookeeper/pull/173 @hanm Hi, may I ask when this problem will be fixed? And will it be fixed in the 3.4.x (stable) line? Thanks. ---
[jira] [Created] (ZOOKEEPER-2774) Ephemeral znode will not be removed when sesstion timeout, if the system time of ZooKeeper node changes unexpectedly.
JiangJiafu created ZOOKEEPER-2774: - Summary: Ephemeral znode will not be removed when sesstion timeout, if the system time of ZooKeeper node changes unexpectedly. Key: ZOOKEEPER-2774 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2774 Project: ZooKeeper Issue Type: Bug Components: server Affects Versions: 3.4.10, 3.4.9, 3.4.8 Environment: Centos6.5 Reporter: JiangJiafu 1. Deploy a ZooKeeper cluster with one node. 2. Create an ephemeral znode. 3. Change the system time of the ZooKeeper node to an earlier point. 4. Disconnect the client from the ZooKeeper server. Then the ephemeral znode continues to exist for a long time, even after the session timeout has elapsed. I have read the ZooKeeper source code and found the following code in SessionTrackerImpl.java: {code:title=SessionTrackerImpl.java|borderStyle=solid} @Override synchronized public void run() { try { while (running) { currentTime = System.currentTimeMillis(); if (nextExpirationTime > currentTime) { this.wait(nextExpirationTime - currentTime); continue; } SessionSet set; set = sessionSets.remove(nextExpirationTime); if (set != null) { for (SessionImpl s : set.sessions) { setSessionClosing(s.sessionId); expirer.expire(s); } } nextExpirationTime += expirationInterval; } } catch (InterruptedException e) { handleException(this.getName(), e); } LOG.info("SessionTrackerImpl exited loop!"); } {code} I think it may be better to use System.nanoTime() rather than System.currentTimeMillis(), because the latter can be changed manually or automatically by an NTP client.
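A fuller sketch of the same idea keeps the bucketing seen in SessionTrackerImpl (sessions grouped by an expiration time aligned to expirationInterval) but measures time as milliseconds elapsed on System.nanoTime() since the tracker started. MonotonicSessionBuckets, touch and pollExpired are hypothetical names. The clock is injected as a LongSupplier so it can be faked in tests, and the rounding rule (round up to the next interval boundary) is an assumption rather than ZooKeeper's exact formula.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

// Hypothetical sketch, not ZooKeeper's SessionTrackerImpl: expiry buckets keyed by
// milliseconds elapsed on a monotonic clock instead of wall-clock time.
class MonotonicSessionBuckets {
    private final LongSupplier nanoClock;   // System::nanoTime in production
    private final long startNanos;
    private final long intervalMs;
    private final TreeMap<Long, Set<Long>> buckets = new TreeMap<>(); // expireAt -> session ids
    private final Map<Long, Long> bucketOf = new HashMap<>();         // session id -> expireAt

    MonotonicSessionBuckets(long intervalMs, LongSupplier nanoClock) {
        this.intervalMs = intervalMs;
        this.nanoClock = nanoClock;
        this.startNanos = nanoClock.getAsLong();
    }

    // Milliseconds since construction; unaffected by changes to the system time.
    long elapsedMs() {
        return TimeUnit.NANOSECONDS.toMillis(nanoClock.getAsLong() - startNanos);
    }

    // (Re)schedules a session to expire timeoutMs from now, rounded up to an interval boundary.
    long touch(long sessionId, long timeoutMs) {
        long target = elapsedMs() + timeoutMs;
        long expireAt = ((target + intervalMs - 1) / intervalMs) * intervalMs;
        Long old = bucketOf.put(sessionId, expireAt);
        if (old != null) {
            Set<Long> oldSet = buckets.get(old);
            oldSet.remove(sessionId);
            if (oldSet.isEmpty()) {
                buckets.remove(old);
            }
        }
        buckets.computeIfAbsent(expireAt, k -> new HashSet<>()).add(sessionId);
        return expireAt;
    }

    // Removes and returns every session whose bucket boundary has been reached.
    List<Long> pollExpired() {
        long now = elapsedMs();
        List<Long> expired = new ArrayList<>();
        while (!buckets.isEmpty() && buckets.firstKey() <= now) {
            for (Long id : buckets.pollFirstEntry().getValue()) {
                bucketOf.remove(id);
                expired.add(id);
            }
        }
        return expired;
    }
}
```

Because the bucket keys never depend on System.currentTimeMillis(), moving the system time backwards or forwards does not change when a session expires; only real elapsed time does. Thread safety is left out for brevity.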
[jira] [Created] (ZOOKEEPER-2701) Timeout for RecvWorker is too long
JiangJiafu created ZOOKEEPER-2701: - Summary: Timeout for RecvWorker is too long Key: ZOOKEEPER-2701 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2701 Project: ZooKeeper Issue Type: Bug Affects Versions: 3.4.8 Environment: Centos6.5 ZooKeeper 3.4.8 Reporter: JiangJiafu Priority: Minor Environment: I deployed ZooKeeper on a cluster of three nodes. Each node has three network interfaces (eth0, eth1, eth2). Hostnames are used instead of IP addresses in zoo.cfg, and quorumListenOnAllIPs=true. Problem: I started the three ZooKeeper servers (node A, node B and node C) one by one; when leader election finished, node B was the leader. Then I shut down one network interface of node A with the command "ifdown eth0". The ZooKeeper server on node A lost its connection to node B and node C. In my test, it took about 20 minutes for the ZooKeeper server on node A to notice this and call QuorumServer.recreateSocketAddresses to resolve the hostname again. Reading the source code, I found this in {code:title=QuorumCnxManager.java:|borderStyle=solid} class RecvWorker extends ZooKeeperThread { Long sid; Socket sock; volatile boolean running = true; final DataInputStream din; final SendWorker sw; RecvWorker(Socket sock, DataInputStream din, Long sid, SendWorker sw) { super("RecvWorker:" + sid); this.sid = sid; this.sock = sock; this.sw = sw; this.din = din; try { // OK to wait until socket disconnects while reading. sock.setSoTimeout(0); } catch (IOException e) { LOG.error("Error while accessing socket for " + sid, e); closeSocket(sock); running = false; } } ... } {code} I noticed that the socket timeout (SO_TIMEOUT) is set to 0, which means no timeout, in the RecvWorker constructor. I think this is reasonable when the IP address of a ZooKeeper server never changes, but since the IP address of a ZooKeeper server may change, it may be better to set a timeout here. I am not entirely sure this is really a problem.
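If a read timeout were used instead of setSoTimeout(0), the receive path would have to tell "no data yet" apart from a broken connection. The sketch below shows the basic mechanism with java.net.Socket: a blocking read on a socket with SO_TIMEOUT set throws SocketTimeoutException, and the socket remains valid afterwards. TimedRecv and readIntOrNull are hypothetical names, not ZooKeeper code, and what a caller should do after a timeout (for example, re-resolve the peer's hostname) is left as an assumption.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical sketch (not ZooKeeper code): one bounded read on a quorum-style socket.
class TimedRecv {
    // Reads one int from din. Returns null if nothing arrived within timeoutMs;
    // other I/O errors (peer closed, connection reset) still propagate as IOException.
    static Integer readIntOrNull(Socket sock, DataInputStream din, int timeoutMs) throws IOException {
        sock.setSoTimeout(timeoutMs); // 0 would mean "block forever", as in RecvWorker
        try {
            return din.readInt();
        } catch (SocketTimeoutException e) {
            return null; // no data yet; the socket itself is still usable
        }
    }
}
```

Two caveats apply. If the timeout fires after some, but not all, of the four bytes needed by readInt() have arrived, those bytes are consumed and the stream is out of step, so a real implementation would have to treat such a timeout as fatal or buffer partial reads. Also, a quorum connection may be legitimately idle, so a timeout on its own does not prove the peer is gone.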
[GitHub] zookeeper issue #173: ZOOKEEPER-2691: recreateSocketAddresses may recreate t...
Github user JiangJiafu commented on the issue: https://github.com/apache/zookeeper/pull/173 @rakeshadr Thank you for your response. Once you have decided how to fix this problem, please let me know. ---
[GitHub] zookeeper issue #173: ZOOKEEPER-2691: recreateSocketAddresses may recreate t...
Github user JiangJiafu commented on the issue: https://github.com/apache/zookeeper/pull/173 @hanm Yes, I hope this can be fixed in 3.4.10, since 3.5.x is still in alpha. ---
[GitHub] zookeeper issue #173: ZOOKEEPER-2691: recreateSocketAddresses may recreate t...
Github user JiangJiafu commented on the issue: https://github.com/apache/zookeeper/pull/173 @eribeiro Thank you for your review. ---
[GitHub] zookeeper issue #173: ZOOKEEPER-2691: recreateSocketAddresses may recreate t...
Github user JiangJiafu commented on the issue: https://github.com/apache/zookeeper/pull/173 Hi @hanm, do you mean that you will merge ZOOKEEPER-2691 and ZOOKEEPER-2184 to solve the problem later? If so, will the problem be solved in ZooKeeper 3.4.10? ---
[GitHub] zookeeper issue #173: ZOOKEEPER-2691: recreateSocketAddresses may recreate t...
Github user JiangJiafu commented on the issue: https://github.com/apache/zookeeper/pull/173 I have modified the source code according to your advice; please review it. @hanm ---
[GitHub] zookeeper pull request #173: ZOOKEEPER-2691: recreateSocketAddresses may rec...
Github user JiangJiafu commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/173#discussion_r100940145 --- Diff: src/java/main/org/apache/zookeeper/server/quorum/QuorumPeer.java --- @@ -159,7 +159,7 @@ public QuorumServer(long id, String hostname, public void recreateSocketAddresses() { InetAddress address = null; try { -address = InetAddress.getByName(this.hostname); +address = getReachableAddress(this.hostname, 2000); --- End diff -- Thanks for your advice; I will send another pull request later. ---
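The diff in this review changes the call to a two-argument form, getReachableAddress(this.hostname, 2000). The thread does not say what the second argument means; the sketch below assumes it is a total probing budget in milliseconds, so that a hostname with several dead addresses cannot stall recreateSocketAddresses() for several timeouts in a row. It is a standalone illustration, not the code that was eventually merged.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch. ASSUMPTION: timeoutMs is the total budget for probing all addresses.
class ReachableAddress {
    static InetAddress getReachableAddress(String hostname, int timeoutMs) throws UnknownHostException {
        InetAddress[] addresses = InetAddress.getAllByName(hostname);
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        for (InetAddress a : addresses) {
            long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
            if (remainingMs <= 0) {
                break; // budget used up; stop probing
            }
            try {
                // Each probe gets at most what is left of the overall budget.
                if (a.isReachable((int) remainingMs)) {
                    return a;
                }
            } catch (IOException e) {
                // A network error while probing is treated as "unreachable".
            }
        }
        // Nothing answered in time: keep the old behaviour and use the first address.
        return addresses[0];
    }
}
```

InetAddress.isReachable(int) uses ICMP echo when the process has the privilege to do so and otherwise tries a TCP connection to port 7 (echo) on the target, so a firewall can make a healthy host look unreachable; the fallback to addresses[0] keeps the pre-patch behaviour in that case.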
[GitHub] zookeeper pull request #173: ZOOKEEPER-2691: recreateSocketAddresses may rec...
GitHub user JiangJiafu opened a pull request: https://github.com/apache/zookeeper/pull/173 ZOOKEEPER-2691: recreateSocketAddresses may recreate the unreachable IP address You can merge this pull request into a Git repository by running: $ git pull https://github.com/JiangJiafu/zookeeper ZOOKEEPER-2691 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/zookeeper/pull/173.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #173 commit 489cec9b78b21a2a241eeab18ddfb968758b2e67 Author: JiangJiafu <jiangjiafu1...@gmail.com> Date: 2017-02-13T11:36:44Z ZOOKEEPER-2691: recreateSocketAddresses may recreate the unreachable IP address ---
[GitHub] zookeeper pull request #172: Zookeeper 2691
Github user JiangJiafu closed the pull request at: https://github.com/apache/zookeeper/pull/172 ---
[GitHub] zookeeper pull request #172: Zookeeper 2691
GitHub user JiangJiafu opened a pull request: https://github.com/apache/zookeeper/pull/172 Zookeeper 2691 You can merge this pull request into a Git repository by running: $ git pull https://github.com/JiangJiafu/zookeeper ZOOKEEPER-2691 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/zookeeper/pull/172.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #172 commit e179d718709afc61e4acfa0247679cd8e34684d0 Author: Mahadev Konar <maha...@apache.org> Date: 2012-12-17T07:12:12Z ZOOKEEPER-1578. org.apache.zookeeper.server.quorum.Zab1_0Test failed due to hard code with 33556 port. (Li Ping via mahadev) git-svn-id: https://svn.apache.org/repos/asf/zookeeper/branches/branch-3.4@1422771 13f79535-47bb-0310-9956-ffa450edef68 commit 70a01502f040ce102dfeb82eaff0505c376d95ff Author: Patrick D. Hunt <ph...@apache.org> Date: 2012-12-19T08:04:33Z ZOOKEEPER-1334. Zookeeper 3.4.x is not OSGi compliant - MANIFEST.MF is flawed (Claus Ibsen via phunt) git-svn-id: https://svn.apache.org/repos/asf/zookeeper/branches/branch-3.4@1423780 13f79535-47bb-0310-9956-ffa450edef68 commit af4d8207f372eb3b6e7a929017c4cdd035fae11c Author: Camille Fournier <cami...@apache.org> Date: 2012-12-31T01:44:18Z ZOOKEEPER-1535. ZK Shell/Cli re-executes last command on exit (Edward Ribeiro via camille) git-svn-id: https://svn.apache.org/repos/asf/zookeeper/branches/branch-3.4@1427035 13f79535-47bb-0310-9956-ffa450edef68 commit 055e8e71767e287c4073efdbdfef944a7225ec5d Author: Flavio Paiva Junqueira <f...@apache.org> Date: 2013-01-18T14:28:48Z ZOOKEEPER-1324. Remove Duplicate NEWLEADER packets from the Leader to the Follower. (thawan, fpj via fpj) git-svn-id: https://svn.apache.org/repos/asf/zookeeper/branches/branch-3.4@1435159 13f79535-47bb-0310-9956-ffa450edef68 commit b200af37a7ddeae44bf74fcc96829625bf926a44 Author: Patrick D. 
Hunt <ph...@apache.org> Date: 2013-01-25T01:34:19Z ZOOKEEPER-1495. ZK client hangs when using a function not available on the server. (Skye W-M via phunt) git-svn-id: https://svn.apache.org/repos/asf/zookeeper/branches/branch-3.4@1438291 13f79535-47bb-0310-9956-ffa450edef68 commit 094d6a8a7741e816222b7d5582a83f9ff84c67e4 Author: Patrick D. Hunt <ph...@apache.org> Date: 2013-01-25T07:15:19Z ZOOKEEPER-1615. minor typos in ZooKeeper Programmer's Guide web page (Evan Zacks via phunt) git-svn-id: https://svn.apache.org/repos/asf/zookeeper/branches/branch-3.4@1438362 13f79535-47bb-0310-9956-ffa450edef68 commit 6386376b9bc5ff0d3e92f49cba3010a0f39f1226 Author: Patrick D. Hunt <ph...@apache.org> Date: 2013-01-25T07:25:42Z ZOOKEEPER-1613. The documentation still points to 2008 in the copyright notice (Edward Ribeiro via phunt) git-svn-id: https://svn.apache.org/repos/asf/zookeeper/branches/branch-3.4@1438367 13f79535-47bb-0310-9956-ffa450edef68 commit 2bbba7b876dab45f8d3690cc37942f220cf2 Author: Patrick D. Hunt <ph...@apache.org> Date: 2013-02-03T06:41:11Z ZOOKEEPER-1562. Memory leaks in zoo_multi API (Deepak Jagtap via phunt) git-svn-id: https://svn.apache.org/repos/asf/zookeeper/branches/branch-3.4@1441863 13f79535-47bb-0310-9956-ffa450edef68 commit da74b8df26a4c80f405bd2ba37cac4dc7a594b4a Author: Patrick D. Hunt <ph...@apache.org> Date: 2013-02-16T00:50:18Z ZOOKEEPER-1645. ZooKeeper OSGi package imports not complete (Arnoud Glimmerveen via phunt) git-svn-id: https://svn.apache.org/repos/asf/zookeeper/branches/branch-3.4@1446829 13f79535-47bb-0310-9956-ffa450edef68 commit 36724d557a18fa9ee47873b18f6e963516930dbd Author: Patrick D. Hunt <ph...@apache.org> Date: 2013-02-19T07:55:58Z ZOOKEEPER-1648. Fix WatcherTest in JDK7 (Thawan Kooburat via phunt) git-svn-id: https://svn.apache.org/repos/asf/zookeeper/branches/branch-3.4@1447614 13f79535-47bb-0310-9956-ffa450edef68 commit 7b4847231e4d890b7c4b256f2995993b821d697d Author: Patrick D. 
Hunt <ph...@apache.org> Date: 2013-02-19T08:18:42Z ZOOKEEPER-1606. intermittent failures in ZkDatabaseCorruptionTest on jenkins (lixiaofeng via phunt) git-svn-id: https://svn.apache.org/repos/asf/zookeeper/branches/branch-3.4@1447619 13f79535-47bb-0310-9956-ffa450edef68 commit 85f8053448871b9777e54e546a968f9c4d19ebbc Author: Patrick D. Hunt <ph...@apache.org> Date: 2013-02-19T08:28:54Z ZOOKEEPER-1647. OSGi package import/export changes not applied to bin-jar (Arnoud Glimmerveen via phunt) git-svn-id: https://svn.apache.org/repos/asf/zookeeper/branches/branch-3.4@1447622 13f79535-47bb-0310-9956-ffa450edef68 commit b8306fa5a9e16da6981a2fdf2df059149687c7e4 Author: Michi Mutsuzaki <mic...@apache.org> Date: 2013-0
[jira] [Updated] (ZOOKEEPER-2691) recreateSocketAddresses may recreate the unreachable IP address
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] JiangJiafu updated ZOOKEEPER-2691: -- Description: QuorumPeer$QuorumServer.recreateSocketAddresses() is used to resolve the hostname to a new IP address (InetAddress) when any exception happens to the socket. This is very useful when a hostname can be resolved to more than one IP address. But the problem is that the Java API InetAddress.getByName(String hostname) always returns the first IP address when the hostname resolves to more than one IP address, and the first IP address may be unreachable forever. For example, suppose a machine has two network interfaces, eth0 with ip1 and eth1 with ip2, and the mapping between the hostname and the IP addresses is set in /etc/hosts. When I "close" eth0 with the command "ifdown eth0", InetAddress.getByName(String hostname) still returns ip1, which is unreachable forever. So I think it would be better to check each IP address with InetAddress.isReachable(int) and choose a reachable one. I have modified the ZooKeeper source code and tested the new code in my own environment, and it works well when I bring down network interfaces with the "ifdown" command. The original code is: {code:title=QuorumPeer.java|borderStyle=solid} public void recreateSocketAddresses() { InetAddress address = null; try { address = InetAddress.getByName(this.hostname); LOG.info("Resolved hostname: {} to address: {}", this.hostname, address); this.addr = new InetSocketAddress(address, this.port); if (this.electionPort > 0){ this.electionAddr = new InetSocketAddress(address, this.electionPort); } } catch (UnknownHostException ex) { LOG.warn("Failed to resolve address: {}", this.hostname, ex); // Have we succeeded in the past? if (this.addr != null) { // Yes, previously the lookup succeeded. Leave things as they are return; } // The hostname has never resolved.
Create our InetSocketAddress(es) as unresolved this.addr = InetSocketAddress.createUnresolved(this.hostname, this.port); if (this.electionPort > 0){ this.electionAddr = InetSocketAddress.createUnresolved(this.hostname, this.electionPort); } } } {code} After my modification: {code:title=QuorumPeer.java|borderStyle=solid} public void recreateSocketAddresses() { InetAddress address = null; try { address = getReachableAddress(this.hostname); LOG.info("Resolved hostname: {} to address: {}", this.hostname, address); this.addr = new InetSocketAddress(address, this.port); if (this.electionPort > 0){ this.electionAddr = new InetSocketAddress(address, this.electionPort); } } catch (UnknownHostException ex) { LOG.warn("Failed to resolve address: {}", this.hostname, ex); // Have we succeeded in the past? if (this.addr != null) { // Yes, previously the lookup succeeded. Leave things as they are return; } // The hostname has never resolved. Create our InetSocketAddress(es) as unresolved this.addr = InetSocketAddress.createUnresolved(this.hostname, this.port); if (this.electionPort > 0){ this.electionAddr = InetSocketAddress.createUnresolved(this.hostname, this.electionPort); } } } public InetAddress getReachableAddress(String hostname) throws UnknownHostException { InetAddress[] addresses = InetAddress.getAllByName(hostname); for (InetAddress a : addresses) { try { if (a.isReachable(5000)) { return a; } } catch (IOException e) { LOG.warn("IP address {} is unreachable", a); } } // All the IP address is unreachable, just return the first one. return addresses[0]; } {code} was: The QuorumPeer$QuorumServer.recreateSocketAddress() is used to resolved the hostname to a new IP address(InetAddress) when any exception happens to the socket. It will be very useful when a hostname can be res
[jira] [Created] (ZOOKEEPER-2691) recreateSocketAddresses may recreate the unreachable IP address
JiangJiafu created ZOOKEEPER-2691: - Summary: recreateSocketAddresses may recreate the unreachable IP address Key: ZOOKEEPER-2691 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2691 Project: ZooKeeper Issue Type: Bug Affects Versions: 3.4.8 Environment: Centos6.5 Java8 ZooKeeper3.4.8 Reporter: JiangJiafu Priority: Minor QuorumPeer$QuorumServer.recreateSocketAddresses() is used to resolve the hostname to a new IP address (InetAddress) when any exception happens to the socket. This is very useful when a hostname can be resolved to more than one IP address. But the problem is that the Java API InetAddress.getByName(String hostname) always returns the first IP address when the hostname resolves to more than one IP address, and the first IP address may be unreachable forever. So I think it would be better to check each IP address with InetAddress.isReachable(int) and choose a reachable one. I have modified the ZooKeeper source code and tested the new code in my own environment, and it works well when I bring down network interfaces with the "ifdown" command. The original code is: {quote} public void recreateSocketAddresses() { InetAddress address = null; try { address = InetAddress.getByName(this.hostname); LOG.info("Resolved hostname: {} to address: {}", this.hostname, address); this.addr = new InetSocketAddress(address, this.port); if (this.electionPort > 0){ this.electionAddr = new InetSocketAddress(address, this.electionPort); } } catch (UnknownHostException ex) { LOG.warn("Failed to resolve address: {}", this.hostname, ex); // Have we succeeded in the past? if (this.addr != null) { // Yes, previously the lookup succeeded. Leave things as they are return; } // The hostname has never resolved.
Create our InetSocketAddress(es) as unresolved this.addr = InetSocketAddress.createUnresolved(this.hostname, this.port); if (this.electionPort > 0){ this.electionAddr = InetSocketAddress.createUnresolved(this.hostname, this.electionPort); } } } {quote} After my modification: {quote} public void recreateSocketAddresses() { InetAddress address = null; try { address = getReachableAddress(this.hostname); LOG.info("Resolved hostname: {} to address: {}", this.hostname, address); this.addr = new InetSocketAddress(address, this.port); if (this.electionPort > 0){ this.electionAddr = new InetSocketAddress(address, this.electionPort); } } catch (UnknownHostException ex) { LOG.warn("Failed to resolve address: {}", this.hostname, ex); // Have we succeeded in the past? if (this.addr != null) { // Yes, previously the lookup succeeded. Leave things as they are return; } // The hostname has never resolved. Create our InetSocketAddress(es) as unresolved this.addr = InetSocketAddress.createUnresolved(this.hostname, this.port); if (this.electionPort > 0){ this.electionAddr = InetSocketAddress.createUnresolved(this.hostname, this.electionPort); } } } public InetAddress getReachableAddress(String hostname) throws UnknownHostException { InetAddress[] addresses = InetAddress.getAllByName(hostname); for (InetAddress a : addresses) { try { if (a.isReachable(5000)) { return a; } } catch (IOException e) { LOG.warn("IP address {} is unreachable", a); } } // All the IP address is unreachable, just return the first one. return addresses[0]; } {quote}
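With the getReachableAddress shown in this issue, each unreachable address can cost up to 5000 ms because the probes run one after another, so a hostname with N dead addresses may block for roughly N times 5 seconds before a live one is tried. One possible alternative, sketched below under the assumption that a few short-lived threads per lookup are acceptable, is to probe all addresses concurrently with ExecutorService.invokeAny, which returns the result of a task that completed without throwing. ParallelProbe and firstReachable are hypothetical names, not ZooKeeper code.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch (not ZooKeeper code): probe every resolved address at once.
class ParallelProbe {
    static InetAddress firstReachable(String hostname, int timeoutMs) throws UnknownHostException {
        return firstReachable(InetAddress.getAllByName(hostname), timeoutMs);
    }

    // addresses must be non-empty (getAllByName never returns an empty array).
    static InetAddress firstReachable(InetAddress[] addresses, int timeoutMs) {
        if (addresses.length == 1) {
            return addresses[0]; // nothing to choose between
        }
        List<Callable<InetAddress>> probes = new ArrayList<>();
        for (InetAddress a : addresses) {
            probes.add(() -> {
                if (a.isReachable(timeoutMs)) {
                    return a;
                }
                // Throwing marks this task as failed, so invokeAny ignores its result.
                throw new IllegalStateException(a + " is unreachable");
            });
        }
        ExecutorService pool = Executors.newFixedThreadPool(addresses.length);
        try {
            // The extra second is slack for thread start-up on top of one probe timeout.
            return pool.invokeAny(probes, timeoutMs + 1000L, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return addresses[0];
        } catch (ExecutionException | TimeoutException e) {
            // Every probe failed or the budget ran out: fall back to the first address,
            // as the patch in this issue does.
            return addresses[0];
        } finally {
            // Asks running probes to stop; a blocked isReachable may only return at its own timeout.
            pool.shutdownNow();
        }
    }
}
```

invokeAny does not guarantee that the earliest-listed reachable address wins, so any preference order from the resolver is lost; if that order matters, probing sequentially as in the original patch is the simpler choice.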