Nikita Konovalov created AMBARI-12235:
-----------------------------------------

             Summary: NameNode HA blueprint-based provisioning fails
                 Key: AMBARI-12235
                 URL: https://issues.apache.org/jira/browse/AMBARI-12235
             Project: Ambari
          Issue Type: Bug
          Components: ambari-server
    Affects Versions: 2.0.1
         Environment: 1 host with Ambari
2 hosts with NameNodes and ZKFC
3 hosts with ZooKeeper and JournalNodes
1 host with ResourceManager, TimelineServer and Oozie
3 hosts with NodeManagers and DataNodes 

All hosts have MetricsMonitor and a set of Clients installed.
            Reporter: Nikita Konovalov


While starting the cluster, both NameNode start steps complete successfully, 
but both NameNodes remain in standby mode. The Timeline Server then fails to 
start because it is unable to write to HDFS.
The start operation fails with a timeout.

Automatic failover is enabled. The ZooKeeper quorum and the JournalNode shared 
edits directories are configured.
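
As a rough sanity check for the configuration described above, the sketch below looks for the standard Hadoop HA properties in a blueprint document. The property names are the usual Hadoop ones; the function name, the list of required keys, and every host name in the example are illustrative assumptions, not values taken from the failing cluster.

```python
# Minimal sketch: report HA-related properties that a blueprint does not set.
# REQUIRED is an assumed, non-exhaustive list of standard Hadoop HA keys.
REQUIRED = {
    "hdfs-site": [
        "dfs.nameservices",
        "dfs.ha.automatic-failover.enabled",
        "dfs.namenode.shared.edits.dir",
    ],
    "core-site": ["ha.zookeeper.quorum"],
}


def missing_ha_properties(blueprint):
    """Return 'config-type/property' strings that are absent or empty."""
    merged = {}
    for entry in blueprint.get("configurations", []):
        for config_type, body in entry.items():
            # Accept both {"properties": {...}} and a flat property map.
            props = body.get("properties", body)
            merged.setdefault(config_type, {}).update(props)

    missing = []
    for config_type, names in REQUIRED.items():
        for name in names:
            if not merged.get(config_type, {}).get(name):
                missing.append(f"{config_type}/{name}")
    return missing


if __name__ == "__main__":
    # Hypothetical blueprint fragment; host names are placeholders.
    example = {
        "configurations": [
            {"hdfs-site": {"properties": {
                "dfs.nameservices": "mycluster",
                "dfs.ha.automatic-failover.enabled": "true",
                "dfs.namenode.shared.edits.dir":
                    "qjournal://jn1.example.com:8485;jn2.example.com:8485;"
                    "jn3.example.com:8485/mycluster",
            }}},
            {"core-site": {"properties": {
                "ha.zookeeper.quorum":
                    "zk1.example.com:2181,zk2.example.com:2181,"
                    "zk3.example.com:2181",
            }}},
        ]
    }
    print(missing_ha_properties(example))
```

An empty result only shows that the keys are present; it does not prove that the values are correct for the cluster.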

Setting "dfs_ha_initial_namenode_active" and "dfs_ha_initial_namenode_standby" 
did not help. Both NameNodes still end up in standby.
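
One way to confirm the "both in standby" state from a cluster host is to ask each NameNode for its HA state with `hdfs haadmin -getServiceState`. The sketch below wraps that command; the NameNode IDs `nn1` and `nn2` are assumed placeholders for whatever `dfs.ha.namenodes.<nameservice>` lists, and the helper names are made up for illustration.

```python
import subprocess


def get_service_state(namenode_id):
    """Ask HDFS for the HA state of one NameNode ('active' or 'standby').

    Requires the `hdfs` CLI on PATH and a client configuration that
    knows the nameservice.
    """
    result = subprocess.run(
        ["hdfs", "haadmin", "-getServiceState", namenode_id],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def diagnose(states):
    """Summarize a {namenode_id: state} mapping."""
    active = sorted(nn for nn, state in states.items() if state == "active")
    if len(active) == 1:
        return f"ok: {active[0]} is active"
    if not active:
        return "no active NameNode"
    return "more than one active NameNode: " + ", ".join(active)


if __name__ == "__main__":
    # 'nn1' and 'nn2' are hypothetical IDs; substitute the configured ones.
    states = {nn: get_service_state(nn) for nn in ("nn1", "nn2")}
    print(states, "->", diagnose(states))
```

In the situation reported here, `diagnose` would return "no active NameNode" for both hosts.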

The following error occurs on both NameNode hosts:
2015-07-01 06:54:34,262 WARN  ha.EditLogTailer (EditLogTailer.java:triggerActiveLogRoll(274)) - Unable to trigger a roll of the active NN
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category JOURNAL is not supported in state standby
        at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:87)
        at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1722)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1362)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:6636)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:1001)
        at org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:142)
        at org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:12025)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)

        at org.apache.hadoop.ipc.Client.call(Client.java:1469)
        at org.apache.hadoop.ipc.Client.call(Client.java:1400)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
        at com.sun.proxy.$Proxy20.rollEditLog(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolTranslatorPB.rollEditLog(NamenodeProtocolTranslatorPB.java:148)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.triggerActiveLogRoll(EditLogTailer.java:271)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.access$600(EditLogTailer.java:61)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:313)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:282)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:299)
        at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:412)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:295)
2015-07-01 06:55:00,060 INFO  ipc.Server (Server.java:run(2053)) - IPC Server handler 4 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.getFileInfo from 10.50.1.50:50642 Call#0 Retry#14: org.apache.hadoop.ipc.StandbyException: Operation category READ is not supported in state standby
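
To see at a glance which operation categories a standby NameNode is rejecting, the log can be scanned for the StandbyException message shown above. This is only a sketch: the function name is made up, and the regular expression assumes the message wording seen in this excerpt.

```python
import re
from collections import Counter

# Matches the StandbyException text as it appears in the excerpt above.
STANDBY_RE = re.compile(
    r"Operation category (\w+) is not supported in state (\w+)"
)


def count_rejected_operations(log_text):
    """Count (operation category, HA state) pairs found in NameNode log text."""
    return Counter(STANDBY_RE.findall(log_text))


if __name__ == "__main__":
    sample = (
        "org.apache.hadoop.ipc.RemoteException("
        "org.apache.hadoop.ipc.StandbyException): "
        "Operation category JOURNAL is not supported in state standby\n"
        "org.apache.hadoop.ipc.StandbyException: "
        "Operation category READ is not supported in state standby\n"
    )
    for (category, state), count in count_rejected_operations(sample).items():
        print(f"{category} rejected in state {state}: {count}")
```

Here JOURNAL rejections come from the other NameNode trying to trigger an edit log roll, while READ rejections come from clients such as the one at 10.50.1.50.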


The ZooKeeper nodes log the following exception:

Connection broken for id 3, my id = 2, error = 
java.net.SocketException: Connection reset
        at java.net.SocketInputStream.read(SocketInputStream.java:196)
        at java.net.SocketInputStream.read(SocketInputStream.java:122)
        at java.net.SocketInputStream.read(SocketInputStream.java:210)
        at java.io.DataInputStream.readInt(DataInputStream.java:387)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:767)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
