Arvind Narain created TRAFODION-1897:
----------------------------------------

             Summary: dcscheck may fail if one of the nodes in zookeeper quorum is down
                 Key: TRAFODION-1897
                 URL: https://issues.apache.org/jira/browse/TRAFODION-1897
             Project: Apache Trafodion
          Issue Type: Bug
          Components: connectivity-dcs
    Affects Versions: any
            Reporter: Arvind Narain

Reported by Joshua Liu
===================

During recent HA testing, when one ZooKeeper node is down, dcscheck may fail with an error like:

    Exception in thread "main" org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /trafodion/dcs/master
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
        at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1468)
        at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1496)
        at org.apache.zookeeper.ZooKeeperMain.processZKCmd(ZooKeeperMain.java:725)
        at org.apache.zookeeper.ZooKeeperMain.processCmd(ZooKeeperMain.java:593)
        at org.apache.zookeeper.ZooKeeperMain.executeLine(ZooKeeperMain.java:365)
        at org.apache.zookeeper.ZooKeeperMain.run(ZooKeeperMain.java:323)
        at org.apache.zookeeper.ZooKeeperMain.main(ZooKeeperMain.java:282)

My environment:

1. Trafodion nodes: centosha-[3-6]
2. ZooKeeper nodes: centosha-2, centosha-5, centosha-6
3. If I take down node centosha-6, dcscheck gives the error. If I take down centosha-5 instead, the error does not appear.

After checking the code, we found this line:

    echo "ls $dcsznode"|$DCS_INSTALL_DIR/bin/dcs zkcli > $dcstmp

Every time I ran dcs zkcli manually, it tried to connect to the ZooKeeper server on node centosha-6. Even when this node is down, 'dcs zkcli' still tries to connect to it:

    [trafodion@centosha-3 bin]$ dcs zkcli
    Connecting to centosha-6.novalocal:2181
    Welcome to ZooKeeper!
    JLine support is enabled
    [zk: centosha-6.novalocal:2181(CONNECTING) 0] ls /
    Exception in thread "main" org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
        at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1468)
        at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1496)
        at org.apache.zookeeper.ZooKeeperMain.processZKCmd(ZooKeeperMain.java:725)
        at org.apache.zookeeper.ZooKeeperMain.processCmd(ZooKeeperMain.java:593)
        at org.apache.zookeeper.ZooKeeperMain.executeLine(ZooKeeperMain.java:365)
        at org.apache.zookeeper.ZooKeeperMain.run(ZooKeeperMain.java:323)
        at org.apache.zookeeper.ZooKeeperMain.main(ZooKeeperMain.java:282)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
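
A possible workaround sketch, not the project's fix: since the failure above comes from always connecting to a single quorum member, a caller could probe the quorum members in turn and use the first one that accepts a TCP connection. The helper names `probe_zk` and `first_live_zk` are hypothetical, the host name centosha-2.novalocal is assumed by analogy with centosha-6.novalocal, and port 2181 is taken from the session above. The probe relies on bash's /dev/tcp and the coreutils `timeout` command.

```shell
# Hypothetical helper: succeeds if a TCP connection to host $1, port $2
# can be opened within 2 seconds. It only checks that the port accepts
# connections, not that the server is serving requests.
probe_zk() {
  timeout 2 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Hypothetical helper: prints the first host:port argument whose server
# answers the probe; returns non-zero if none of them does.
first_live_zk() {
  for hp in "$@"; do
    if probe_zk "${hp%:*}" "${hp##*:}"; then
      echo "$hp"
      return 0
    fi
  done
  return 1
}

# Example call (not executed here):
#   first_live_zk centosha-6.novalocal:2181 centosha-2.novalocal:2181
```

The selected server could then be handed to the ZooKeeper command-line client, which accepts a `-server host:port` option. Whether `dcs zkcli` forwards that option is not confirmed by this report and would need checking.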