Dear Guys: We recently compiled Impala in our development environment, and when we run the compiled Impala we hit the following problem.
Problem: Impala runs successfully as long as we do not reboot the machine. After a reboot, however, we cannot restart the Impala processes. We have tried many machines, and the problem occurs on every one of them. We have struggled with this for a long time, but it still does not work. We are wondering whether you can help us solve the problem. The environment and error messages are as follows.

Environment:
OS:
    Distributor ID: CentOS
    Description: CentOS Linux release 7.2.1511 (Core)
    Release: 7.2.1511
    Codename: Core
Kernel: Linux version 3.10.0-327.28.2.el7.x86_64
Impala version: cdh5-trunk

1. We start Impala using ${IMPALA_HOME}/testdata/bin/run-all.sh and get the following output:

[root@localhost rtap-on-impala]# ${IMPALA_HOME}/testdata/bin/run-all.sh
Killing running services...
Starting all cluster services...
 --> Starting mini-DFS cluster
Stopping kms
Stopping llama
Stopping yarn
Stopping hdfs
Starting hdfs (Web UI - http://localhost:5070)
....Namenode started
Starting yarn (Web UI - http://localhost:8088)
Starting llama (Web UI - http://localhost:1501)
Starting kms (Web UI - http://localhost:16000)
The cluster is running
 --> Starting HBase
localhost: starting zookeeper, logging to /home/linxiaoyong/impala_new/rtap-on-impala/impala/cluster_logs/hbase/hbase-root-zookeeper-localhost.localdomain.out
starting master, logging to /home/linxiaoyong/impala_new/rtap-on-impala/impala/cluster_logs/hbase/hbase-root-master-localhost.localdomain.out
16/09/28 17:15:52 INFO util.VersionInfo: HBase 1.2.0-cdh5.8.0-SNAPSHOT
16/09/28 17:15:52 INFO util.VersionInfo: Source code repository file:///var/lib/jenkins/workspace/generic-binary-tarball-and-maven-deploy/CDH5-Packaging-HBase-2016-02-24_17-14-20/hbase-1.2.0-cdh5.8.0-SNAPSHOT revision=Unknown
16/09/28 17:15:52 INFO util.VersionInfo: Compiled by jenkins on Wed Feb 24 17:26:12 PST 2016
16/09/28 17:15:52 INFO util.VersionInfo: From source with checksum 2c2f0626ababf9b47e88728c663df5c7
Waiting for HBase Master
...........................Failure
Hbase master did NOT write /hbase/rs in 30.4s
Error in /home/linxiaoyong/impala_new/rtap-on-impala/impala/testdata/bin/run-hbase.sh at line 87: ${CLUSTER_BIN}/wait-for-hbase-master.py
Error in /home/linxiaoyong/impala_new/rtap-on-impala/impala/testdata/bin/run-all.sh at line 48: tee ${IMPALA_TEST_CLUSTER_LOG_DIR}/run-hbase.log

2. We opened cluster_logs/hbase/hbase-root-master-localhost.localdomain.out in Vim. It contains the following errors:

16/09/28 17:16:10 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
16/09/28 17:16:10 WARN zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
16/09/28 17:16:11 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/0:0:0:0:0:0:0:1:2181. Will not attempt to authenticate using SASL (unknown error)
16/09/28 17:16:11 WARN zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
16/09/28 17:16:11 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
16/09/28 17:16:11 ERROR zookeeper.RecoverableZooKeeper: ZooKeeper create failed after 4 attempts
16/09/28 17:16:11 WARN zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
16/09/28 17:16:11 ERROR master.HMasterCommandLine: Master exiting
java.lang.RuntimeException: Failed construction of Master: class org.apache.hadoop.hbase.master.HMaster.
    at org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:2428)
    at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:232)
    at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:138)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:126)
    at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2438)
Caused by: org.apache.hadoop.hbase.ZooKeeperConnectionException: master:600000x0, quorum=localhost:2181, baseZNode=/hbase Unexpected KeeperException creating base node
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.createBaseZNodes(ZooKeeperWatcher.java:206)
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:187)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.<init>(HRegionServer.java:590)
    at org.apache.hadoop.hbase.master.HMaster.<init>(HMaster.java:375)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
    at org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:2421)
    ... 5 more

I used “jps” to list the running Java processes:

[root@localhost rtap-on-impala]# jps
26528 LlamaAMMain
25921 NodeManager
25186 DataNode
25890 NodeManager
29188 Jps
25221 DataNode
25864 NodeManager
25162 DataNode
26635 Bootstrap
14194 -- process information unavailable
25246 NameNode
25950 ResourceManager
27423 HQuorumPeer
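
The master log reports "Connection refused" on localhost:2181 while jps still lists an HQuorumPeer process, so one thing that may be worth checking is whether anything is actually accepting connections on that port. The following is only a minimal sketch, assuming Python 3 is available on the machine. The helper name port_is_open is hypothetical and is not part of Impala, HBase, or ZooKeeper; the port number is taken from "quorum=localhost:2181" in the log above.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds.

    Hypothetical helper for illustration only.
    """
    try:
        # create_connection resolves the host and tries each returned address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # The HBase master log shows attempts over both IPv4 and IPv6 loopback,
    # so probe both addresses as well as the "localhost" name.
    for host in ("127.0.0.1", "::1", "localhost"):
        state = "open" if port_is_open(host, 2181) else "refused or unreachable"
        print("{}:2181 -> {}".format(host, state))
```

If every address is reported as refused even though HQuorumPeer is running, the ZooKeeper log (hbase-root-zookeeper-localhost.localdomain.out, named in the startup output above) is probably the next place to look.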