[ https://issues.apache.org/jira/browse/AMBARI-8244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14385052#comment-14385052 ]
Yusaku Sako edited comment on AMBARI-8244 at 3/28/15 2:00 AM:
--------------------------------------------------------------

The above Hadoop QA failure is unrelated to this patch.

When I tested AMBARI-8244.6.patch end-to-end on a real cluster, I found the following:
* Cluster installation and service checks are fine.
* Post-install, I was able to move the NameNode from c6401 to c6402, and things were OK after the move.
* When NameNode HA was enabled, things seemed to go smoothly. dfs.namenode.rpc-address still points to c6402, but *hadoop fs* commands that read/write HDFS ran fine after the NameNode on c6402 was shut down.
* The HDFS service check failed because it ran "hadoop dfsadmin -safemode get" with the "-fs" parameter, which targets a specific NameNode based on dfs.namenode.rpc-address (this is due to changes in the patch); if that NameNode is down, the command obviously fails. So we need to remove this property when NameNode HA is enabled.
* However, that might not be the full story. *hadoop dfsadmin -safemode get* caused issues even when run from the command line without the "-fs" parameter; it still tried to reach the NameNode that is down rather than the other one. I hand-edited hdfs-site.xml to remove dfs.namenode.rpc-address, but "hadoop dfsadmin -safemode get" still tries to connect to the NameNode that is down.
{code}
safemode: Call From c6401.ambari.apache.org/192.168.64.101 to c6402.ambari.apache.org:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
{code}
What's confusing is that the "

was (Author: u39kun):
The above Hadoop QA failure is unrelated to this patch.

When I tested AMBARI-8244.6.patch end-to-end on a real cluster, I found the following:
* Cluster installation and service checks are fine.
* Post-install, I was able to move the NameNode from c6401 to c6402, and things were OK after the move.
* When NameNode HA was enabled, things seemed to go smoothly. dfs.namenode.rpc-address still points to c6402, but it worked fine after the NameNode on c6402 was shut down; *hadoop fs* commands that read/write HDFS ran fine.
* The HDFS service check failed because it ran "hadoop dfsadmin -safemode get" with the "-fs" parameter, which targets a specific NameNode based on dfs.namenode.rpc-address (this is due to changes in the patch); if that NameNode is down, the command obviously fails. So we need to remove this property when NameNode HA is enabled.
* However, that might not be the full story. *hadoop dfsadmin -safemode get* caused issues even when run from the command line without the "-fs" parameter; it still tried to reach the NameNode that is down rather than the other one. I hand-edited hdfs-site.xml to remove dfs.namenode.rpc-address, but "hadoop dfsadmin -safemode get" still tries to connect to the NameNode that is down.
{code}
safemode: Call From c6401.ambari.apache.org/192.168.64.101 to c6402.ambari.apache.org:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
{code}
What's confusing is that the "
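To connect the dots on the failover behavior above: an HDFS client only fails over when it resolves a logical nameservice (dfs.nameservices plus per-NameNode dfs.namenode.rpc-address.<ns>.<nn> keys); a bare dfs.namenode.rpc-address, or an explicit "-fs host:port", pins it to a single NameNode. Below is a minimal sketch of the kind of decision the service check could make, assuming plain dicts of core-site/hdfs-site properties and an illustrative helper name; this is not code from the patch.
{code}
# Sketch only (not from the patch): choose the 'hadoop dfsadmin' target from
# plain dicts of core-site/hdfs-site properties. The helper name and calling
# convention are assumptions for illustration.
def dfsadmin_fs_arg(core_site, hdfs_site):
    """Return a '-fs <uri>' fragment for 'hadoop dfsadmin', or '' when the
    HA client configuration should pick the active NameNode itself."""
    # With NameNode HA, a logical nameservice (dfs.nameservices) plus
    # dfs.namenode.rpc-address.<ns>.<nn> keys drive client-side failover;
    # passing an explicit -fs host:port would pin the command to one NameNode.
    if hdfs_site.get('dfs.nameservices'):
        return ''
    # Non-HA: prefer the explicit RPC address so the command still works when
    # fs.defaultFS points at a non-HDFS file system (e.g. a cloud store).
    rpc_address = hdfs_site.get('dfs.namenode.rpc-address')
    if rpc_address:
        return '-fs hdfs://%s' % rpc_address
    # Otherwise fall back to fs.defaultFS, but only if it is really HDFS.
    default_fs = core_site.get('fs.defaultFS', '')
    if default_fs.startswith('hdfs://'):
        return '-fs %s' % default_fs
    return ''
{code}
With dfs.nameservices set, a helper like this returns an empty string and leaves failover to the client, which matches the suggestion above to stop passing an explicit NameNode once HA is enabled.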
> Ambari HDP 2.0.6+ stacks do not work with fs.defaultFS not being hdfs
> ----------------------------------------------------------------------
>
>                 Key: AMBARI-8244
>                 URL: https://issues.apache.org/jira/browse/AMBARI-8244
>             Project: Ambari
>          Issue Type: Bug
>          Components: stacks
>    Affects Versions: 2.0.0
>            Reporter: Ivan Mitic
>            Assignee: Ivan Mitic
>              Labels: HDP
>             Fix For: 2.1.0
>
>         Attachments: AMBARI-8244.2.patch, AMBARI-8244.3.patch, AMBARI-8244.4.patch, AMBARI-8244.5.patch, AMBARI-8244.6.patch, AMBARI-8244.patch
>
>
> Right now, changing the default file system does not work with the HDP 2.0.6+ stacks. Given that it might be common to run HDP against some other file system in the cloud, adding support for this will be super useful. One alternative is a separate stack definition for other file systems; however, since I noticed only two minor bugs that need fixing to support this, I would rather extend the existing code.
> Bugs:
> - One issue is in the Nagios install scripts, where it is assumed that fs.defaultFS contains the NameNode port number.
> - Another issue is in the HDFS install scripts, where the {{hadoop dfsadmin}} command only works when HDFS is the default file system.
> The fix for both places is to extract the NameNode address/port from {{dfs.namenode.rpc-address}} if one is defined and use it instead of relying on {{fs.defaultFS}}.
> I haven't included any tests yet (this is my first Ambari patch, so I'm not sure what is appropriate; please comment).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
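As a reading aid for the fix the description proposes (preferring {{dfs.namenode.rpc-address}} over {{fs.defaultFS}} when deriving the NameNode address/port), here is a hedged sketch; the function name, dict-based inputs, and example URIs are illustrative and not taken from the attached patches.
{code}
# Illustrative only: derive the NameNode host/port for scripts that previously
# assumed fs.defaultFS is always of the form 'hdfs://host:port'.
def namenode_host_port(core_site, hdfs_site, default_port=8020):
    # Prefer the explicit RPC address, e.g. 'c6401.ambari.apache.org:8020'.
    address = hdfs_site.get('dfs.namenode.rpc-address')
    if not address:
        # Fall back to fs.defaultFS; if the default file system is not HDFS
        # (for example a cloud object store), there is no NameNode to return.
        default_fs = core_site.get('fs.defaultFS', '')
        if not default_fs.startswith('hdfs://'):
            return None
        address = default_fs[len('hdfs://'):].rstrip('/')
    host, _, port = address.partition(':')
    return host, int(port) if port else default_port

# Example: a cluster whose default file system is not HDFS still resolves the
# NameNode through dfs.namenode.rpc-address.
print(namenode_host_port(
    {'fs.defaultFS': 'wasb://data@example.blob.core.windows.net'},
    {'dfs.namenode.rpc-address': 'c6401.ambari.apache.org:8020'}))
# ('c6401.ambari.apache.org', 8020)
{code}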