Nikita Konovalov created AMBARI-12235:
-----------------------------------------
Summary: NameNode HA blueprint-based provisioning fails
Key: AMBARI-12235
URL: https://issues.apache.org/jira/browse/AMBARI-12235
Project: Ambari
Issue Type: Bug
Components: ambari-server
Affects Versions: 2.0.1
Environment: 1 host with Ambari
2 hosts with NameNodes and ZKFC
3 hosts with ZooKeeper and JournalNodes
1 host with ResourceManager, TimelineServer and Oozie
3 hosts with NodeManagers and DataNodes
All hosts have MetricsMonitor and a set of Clients installed.
Reporter: Nikita Konovalov
When the cluster is started, both NameNode start steps complete successfully,
but both NameNodes remain in standby mode. The Timeline Server then fails to
start because it cannot write to HDFS, and the cluster start operation
eventually fails with a timeout.
Automatic failover is enabled. The ZooKeeper quorum and the journal shared
edits dirs are configured.
Setting "dfs_ha_initial_namenode_active" and "dfs_ha_initial_namenode_standby"
did not help: both NameNodes still end up in standby.
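For anyone reproducing this, the "both in standby" condition can be confirmed without the Ambari UI by reading each NameNode's JMX status. The following is only a minimal sketch: it assumes the NameNode web endpoint is reachable on port 50070 (the usual Hadoop 2.x default, not stated in this report) and that the NameNodeStatus bean exposes a "State" attribute. The function names are illustrative, not part of Ambari or Hadoop.

```python
import json
from urllib.request import urlopen

# JMX bean a Hadoop 2.x NameNode publishes with its HA state.
STATUS_QUERY = "Hadoop:service=NameNode,name=NameNodeStatus"


def parse_ha_state(jmx_payload: str) -> str:
    """Return the 'State' attribute (e.g. 'active' or 'standby') from a JMX reply."""
    beans = json.loads(jmx_payload).get("beans", [])
    if not beans or "State" not in beans[0]:
        raise ValueError("NameNodeStatus bean not found in JMX reply")
    return beans[0]["State"]


def fetch_ha_state(host: str, port: int = 50070) -> str:
    """Query one NameNode over HTTP. The port is an assumption, adjust as needed."""
    url = f"http://{host}:{port}/jmx?qry={STATUS_QUERY}"
    with urlopen(url, timeout=10) as reply:
        return parse_ha_state(reply.read().decode("utf-8"))


def no_active_namenode(states: dict) -> bool:
    """True when none of the given NameNodes reports the 'active' state."""
    return all(state != "active" for state in states.values())
```

With the two NameNode host names filled in, something like `no_active_namenode({h: fetch_ha_state(h) for h in hosts})` would return True for the situation described here.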
The following error occurs on both NameNode hosts:
2015-07-01 06:54:34,262 WARN ha.EditLogTailer (EditLogTailer.java:triggerActiveLogRoll(274)) - Unable to trigger a roll of the active NN
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category JOURNAL is not supported in state standby
    at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:87)
    at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1722)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1362)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:6636)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:1001)
    at org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:142)
    at org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:12025)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)
    at org.apache.hadoop.ipc.Client.call(Client.java:1469)
    at org.apache.hadoop.ipc.Client.call(Client.java:1400)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
    at com.sun.proxy.$Proxy20.rollEditLog(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolTranslatorPB.rollEditLog(NamenodeProtocolTranslatorPB.java:148)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.triggerActiveLogRoll(EditLogTailer.java:271)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.access$600(EditLogTailer.java:61)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:313)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:282)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:299)
    at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:412)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:295)
2015-07-01 06:55:00,060 INFO ipc.Server (Server.java:run(2053)) - IPC Server handler 4 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.getFileInfo from 10.50.1.50:50642 Call#0 Retry#14: org.apache.hadoop.ipc.StandbyException: Operation category READ is not supported in state standby
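Both entries above are rejections by a NameNode in standby state, one for a JOURNAL operation and one for a READ. To see which operation categories are being refused across a longer NameNode log, a small scan such as the one below may help. It is a sketch that relies only on the message wording shown in this report; the function name is made up for illustration.

```python
import re
from collections import Counter

# Matches the StandbyException message text exactly as it appears in the log above.
REJECTED = re.compile(r"Operation category (\w+) is not supported in state (\w+)")


def rejected_operations(log_text: str) -> Counter:
    """Count (operation category, NameNode state) pairs found in the log text."""
    return Counter(REJECTED.findall(log_text))
```

Passing the contents of a NameNode log file to `rejected_operations` gives a per-category count, which makes it easy to tell whether clients (READ, WRITE) or the other NameNode (JOURNAL) are the ones being turned away.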
The ZooKeeper nodes log the following exception:
Connection broken for id 3, my id = 2, error =
java.net.SocketException: Connection reset
    at java.net.SocketInputStream.read(SocketInputStream.java:196)
    at java.net.SocketInputStream.read(SocketInputStream.java:122)
    at java.net.SocketInputStream.read(SocketInputStream.java:210)
    at java.io.DataInputStream.readInt(DataInputStream.java:387)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:767)