[ https://issues.apache.org/jira/browse/AMBARI-13861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15009112#comment-15009112 ]
Aravindan Vijayan commented on AMBARI-13861:
--------------------------------------------

Thanks [~dgrinenko]

> hdfs balancer via ambari fails to run after HDP upgrade with NN HA enabled
> --------------------------------------------------------------------------
>
>                 Key: AMBARI-13861
>                 URL: https://issues.apache.org/jira/browse/AMBARI-13861
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-server
>    Affects Versions: 2.1.3
>            Reporter: Dmytro Grinenko
>            Assignee: Dmitry Lysnichenko
>            Priority: Critical
>             Fix For: 2.1.3
>
>
> Ran hdfs balancer via ambari on a cluster that had HA enabled and it failed.
> {code}
> Starting balancer with threshold = 10
> Executing command ambari-sudo.sh su hdfs -l -s /bin/bash -c 'export
> PATH='"'"'/usr/sbin:/sbin:/usr/lib/ambari-server/*:/usr/sbin:/sbin:/usr/lib/ambari-server/*:/usr/lib64/qt-3.3/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin:/var/lib/ambari-agent:/var/lib/ambari-agent:/usr/hdp/current/hadoop-client/bin'"'"'
> ; hdfs --config /usr/hdp/current/hadoop-client/conf balancer -threshold 10'
> 2015-10-06 23:33:27,059 - Execute['ambari-sudo.sh su hdfs -l -s /bin/bash -c
> 'export
> PATH='"'"'/usr/sbin:/sbin:/usr/lib/ambari-server/*:/usr/sbin:/sbin:/usr/lib/ambari-server/*:/usr/lib64/qt-3.3/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin:/var/lib/ambari-agent:/var/lib/ambari-agent:/usr/hdp/current/hadoop-client/bin'"'"'
> ; hdfs --config /usr/hdp/current/hadoop-client/conf balancer -threshold
> 10''] {'logoutput': False, 'on_new_line': handle_new_line}
> [balancer] 15/10/06 23:33:29 INFO balancer.Balancer: Using a threshold of 10.0
> [balancer] 15/10/06 23:33:29 INFO balancer.Balancer: namenodes =
> [hdfs://pre-prod-poc-1.novalocal:8020, hdfs://pre-prod-hdp-2-3]
> [balancer] 15/10/06 23:33:29 INFO balancer.Balancer: parameters =
> Balancer.Parameters [BalancingPolicy.Node, threshold = 10.0, max idle
> iteration = 5, #excluded nodes = 0, #included nodes = 0, #source nodes = 0,
> run during upgrade = false]
> [balancer] 15/10/06 23:33:29 INFO balancer.Balancer: included nodes = []
> [balancer] 15/10/06 23:33:29 INFO balancer.Balancer: excluded nodes = []
> [balancer] 15/10/06 23:33:29 INFO balancer.Balancer: source nodes = []
> [balancer] Time Stamp Iteration# Bytes Already Moved Bytes
> Left To Move Bytes Being Moved[balancer]
> [balancer]
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException):
> Operation category READ is not supported in state standby
> at
> org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:87)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1872)
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1306)
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getServerDefaults(FSNamesystem.java:1618)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getServerDefaults(NameNodeRpcServer.java:595)
> at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getServerDefaults(ClientNamenodeProtocolServerSideTranslatorPB.java:383)
> at
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(Proto[balancer]
> bufRpcEngine.java:616)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2137)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2133)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2131)
> . Exiting ...[balancer]
> [balancer] Oct 6, 2015 11:33:31 PM [balancer] [balancer] Balancing took
> 2.281 seconds[balancer]
> {code}
> If you look at the log it looks like we are adding a namenode to the list
> which is in standby. Should we not be using just the name service?
> {code}
> [balancer] 15/10/06 23:33:29 INFO balancer.Balancer: namenodes =
> [hdfs://pre-prod-poc-1.novalocal:8020, hdfs://pre-prod-hdp-2-3]
> [balancer] 15/10/06 23:33:29 INFO balancer.Balancer: parameters =
> Balancer.Parameters
> {code}
> {code}
> [root@pre-prod-poc-1 hive-testbench]# ambari-server --hash
> 226dfd1c6136f859fc42dd18e7090a9346f0f745
> root@pre-prod-poc-1 hive-testbench]# rpm -qa | grep ambari
> ambari-metrics-hadoop-sink-2.1.2-370.x86_64
> ambari-server-2.1.2-370.x86_64
> ambari-metrics-monitor-2.1.2-370.x86_64
> ambari-agent-2.1.2-370.x86_64
> [root@pre-prod-poc-1 hive-testbench]#
> {code}



--
This message was sent by Atlassian JIRA (v6.3.4#6332)
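
The quoted report observes that the balancer's `namenodes = [...]` line mixes a `host:port` entry with what looks like a nameservice entry. A minimal Python sketch of how that line could be checked when triaging similar logs is shown here. The helper names (`parse_namenodes`, `split_by_port`) are hypothetical and not part of Ambari or Hadoop, and treating "no explicit port" as "probably a logical nameservice" is only a heuristic assumption, not something the log confirms.

```python
import re
from urllib.parse import urlparse

# Matches the balancer's "namenodes = [ ... ]" entry, even when the list
# was wrapped onto the following line of the log.
NAMENODES_RE = re.compile(r"namenodes\s*=\s*\[(.*?)\]", re.DOTALL)


def parse_namenodes(log_text):
    """Return the URIs listed on the first 'namenodes = [...]' line, if any."""
    match = NAMENODES_RE.search(log_text)
    if not match:
        return []
    return [item.strip() for item in match.group(1).split(",") if item.strip()]


def split_by_port(uris):
    """Split URIs into (entries with an explicit port, entries without one).

    Assumption: a port-less hdfs:// URI is likely a logical nameservice,
    while a host:port URI points at one specific NameNode.
    """
    with_port, without_port = [], []
    for uri in uris:
        target = with_port if urlparse(uri).port is not None else without_port
        target.append(uri)
    return with_port, without_port


# Log excerpt copied from the report, with the email quote markers removed.
log = """[balancer] 15/10/06 23:33:29 INFO balancer.Balancer: namenodes =
[hdfs://pre-prod-poc-1.novalocal:8020, hdfs://pre-prod-hdp-2-3]"""

with_port, without_port = split_by_port(parse_namenodes(log))
print("explicit host:port entries:", with_port)
print("port-less entries (possibly a nameservice):", without_port)
```

When both lists are non-empty, as with the excerpt above, the balancer was probably handed a specific NameNode address in addition to the nameservice, which matches the situation the reporter describes.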