[ 
https://issues.apache.org/jira/browse/AMBARI-3368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13780392#comment-13780392
 ] 

Siddharth Wagle commented on AMBARI-3368:
-----------------------------------------

Upon further investigation, we found that the DFS client first tries to connect
to the original NN and, when that connection times out, tries the other NN.
This will slow down jobs that run after a failover.

{code}
[root@ambari-nn-ha-2 data]# time su - hdfs -c 'hadoop --config /etc/hadoop/conf fs -chown hcat /user/hcat'
13/09/24 14:09:48 DEBUG retry.RetryInvocationHandler: Exception while invoking getFileInfo of class ClientNamenodeProtocolTranslatorPB. Trying to fail over immediately.
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby
        at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:87)
        at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1496)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1029)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:3269)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(Na
{code}

Time:
{code}
real    0m3.996s
user    0m2.697s
sys     0m0.147s
{code}
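
To illustrate why the ordering of NameNodes matters after a failover, here is a
rough simulation of the client behaviour described above. It is only a sketch,
not the actual DFS client code: the names FakeNameNode, StandbyError,
ConnectTimeout and call_with_failover are made up for this example, and the
timeout_cost and standby_cost values are arbitrary placeholders, not measured
or configured Hadoop values.

```python
# Illustrative simulation only; this is NOT the Hadoop client implementation.
from dataclasses import dataclass


class StandbyError(Exception):
    """Stands in for a NameNode rejecting a call because it is in standby state."""


class ConnectTimeout(Exception):
    """Stands in for a connection attempt that timed out."""


@dataclass
class FakeNameNode:
    name: str
    state: str  # "active", "standby", or "down"

    def get_file_info(self, path):
        if self.state == "down":
            raise ConnectTimeout(self.name)
        if self.state == "standby":
            raise StandbyError(self.name)
        return {"path": path, "served_by": self.name}


def call_with_failover(namenodes, path, timeout_cost=20.0, standby_cost=0.1):
    """Try each NameNode in configured order.

    Returns (result, simulated seconds lost before the call succeeded).
    Both cost values are hypothetical and exist only to show the effect.
    """
    lost = 0.0
    for nn in namenodes:
        try:
            return nn.get_file_info(path), lost
        except ConnectTimeout:
            lost += timeout_cost   # waited for the connection to time out
        except StandbyError:
            lost += standby_cost   # rejected quickly, fail over immediately
    raise RuntimeError("no active NameNode reachable")


if __name__ == "__main__":
    # The originally active NN is listed first but is now standby.
    nns = [FakeNameNode("nn1", "standby"), FakeNameNode("nn2", "active")]
    result, lost = call_with_failover(nns, "/user/hcat")
    print(result["served_by"], lost)
```

In this sketch every call pays the penalty for the first NameNode again,
because nothing remembers which NameNode answered last time. That is the
per-command overhead seen in the timing above.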
                
> NameNode start hangs with HA config'd
> -------------------------------------
>
>                 Key: AMBARI-3368
>                 URL: https://issues.apache.org/jira/browse/AMBARI-3368
>             Project: Ambari
>          Issue Type: Bug
>          Components: agent
>    Affects Versions: 1.4.1
>            Reporter: Siddharth Wagle
>            Assignee: Siddharth Wagle
>             Fix For: 1.4.1
>
>
> After configuring NameNode HA, I found that starting a NameNode hangs and
> fails with "Puppet has been killed due to timeout".
> 1) Install cluster
> 2) Enable NameNode HA
> 3) Stop standby NameNode on Hosts details page
> 4) Stop active NameNode on Hosts details page
> 5) Start NameNode on Hosts details page
> 6) The start hangs and stops at 35% complete. Then, after about 10 minutes,
> Puppet is killed due to timeout.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
