Re: Problem with Ambari 1.7 recognizing hosts running CentOS 6
I am following the startup procedure listed here: https://cwiki.apache.org/confluence/display/AMBARI/Install+Ambari+1.7.0+from+Public+Repositories. I am trying to install HDP 2.1. Please note that these install instructions mention nothing about iptables or SELinux. I am using CentOS 6 machines.

I tried changing the hosts file on all the machines to this:

    127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
    ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
    xxx.xxx.200.144 datanode10.localdomain.com
    xxx.xxx.200.107 datanode01.localdomain.com
    xxx.xxx.200.143 namenode.localdomain.com

(I am just trying to hide my IP addresses.) The error is the same.

David Novogrodsky
david.novogrod...@gmail.com
http://www.linkedin.com/in/davidnovogrodsky

On Wed, Dec 17, 2014 at 9:05 AM, David Novogrodsky david.novogrod...@gmail.com wrote:

I hope this clears up the confusion. First, this is what the hosts file looks like:

    127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
    ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
    192.168.200.144 datanode10.localdomain.com
    192.168.200.143 namenode.localdomain.com namenode
    192.168.200.107 datanode01.localdomain.com

I have restarted all the nodes in the cluster, several times.
I can reach all nodes from all nodes using ping and their fully qualified domain names.
I can reach the data nodes from the name node using password-less ssh.
I have changed the names of the machines to match their names in the hosts files on each machine.
I have checked the /etc/ambari-agent/conf/ambari-agent.ini file.

David Novogrodsky
david.novogrod...@gmail.com
http://www.linkedin.com/in/davidnovogrodsky

On Tue, Dec 16, 2014 at 10:20 PM, Devopam Mittra devo...@gmail.com wrote:

Forgive me if I sound rude, but please re-read the installation instructions properly; it should help in your case.

1. Have a sound naming convention for all your boxes.
   e.g.: namenode01.localdomain, datanode01.localdomain, datanode0N.localdomain. This will help you a lot with future expansion and maintenance of your cluster.
2. Do not, by any means, tamper with the /etc/hosts entries for 127.0.0.1 and ::1. Leave them with the localhost names only. This keeps normal operations on the box working; otherwise every internal lookup by OS functions will run into issues.
3. If you have DHCP plus a very good DNS server in place, then okay. Otherwise, assign static IPs to your machines and create one entry for each box, with its FQDN and static IP address, replicated on ALL the boxes.
4. Set up keyless ssh login for root, or for any other uniform local user that you want to use to manage Ambari + Hadoop.
5. Confirm that the namenode and the Ambari server machines (in case they are different for you) can talk to ALL the machines using a keyless login for the universal user you created in the steps above.

Hope the above will help you to sort out the issue in a single go.

regards
Dev

On Tue, Dec 16, 2014 at 11:45 PM, David Novogrodsky david.novogrod...@gmail.com wrote:

There is nothing simply done in Ambari. :) After changing the name of this computer and restarting the namenode, Ambari does not recognize any node. The main error I am wondering about is this:

    INFO 2014-12-16 12:02:29,669 main.py:233 - Connecting to Ambari server at https://namenode.localdomain:8440 (98.124.198.1)
    INFO 2014-12-16 12:02:29,670 NetUtil.py:48 - Connecting to https://namenode.localdomain:8440/ca
    WARNING 2014-12-16 12:02:29,718 NetUtil.py:71 - Failed to connect to https://namenode.localdomain:8440/ca due to [Errno 111] Connection refused
    WARNING 2014-12-16 12:02:29,719 NetUtil.py:92 - Server at https://namenode.localdomain:8440 is not reachable, sleeping for 10 seconds...
    ', None)

Why is Ambari using namenode.localdomain to connect?
I am running Ambari on this node; I am running Ambari on the namenode of this cluster. The hosts file (/etc/hosts) for this computer is this:

    127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
    ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
    192.168.200.144 localhost.datanode10
    192.168.200.107 localhost.datanode01
    192.168.200.143 namenode.localdomain.com namenode

The Ambari wizard said I needed to use fully qualified domain names. What follows is a detailed excerpt of the registration log. I get this error in the registration log for namenode.localdomain.com:

    --
    ==
    Creating target directory...
    ==
    Command start time 2014-12-16 12:02:18
    Connection to namenode.localdomain.com closed.
    SSH command execution finished
    host=namenode.localdomain.com, exitcode=0
    Command end time
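The hosts-file advice in this message (keep the 127.0.0.1 and ::1 lines for localhost names only, and give each box one FQDN entry on its static IP) can be checked mechanically. The following is a minimal sketch in Python 3, not part of Ambari; the function name lint_hosts and the rule that any name whose first label starts with "localhost" counts as a localhost name are assumptions made for illustration.

```python
import ipaddress


def lint_hosts(text):
    """Return a list of warnings for /etc/hosts-style text (hypothetical helper)."""
    warnings = []
    seen = {}  # non-loopback name -> first IP it was mapped to
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        ip_text, names = fields[0], fields[1:]
        try:
            ip = ipaddress.ip_address(ip_text)
        except ValueError:
            warnings.append(f"line {lineno}: {ip_text!r} is not an IP address")
            continue
        for name in names:
            # Assumption: a first label starting with "localhost" marks a local name.
            is_local_name = name.split(".")[0].startswith("localhost")
            if ip.is_loopback and not is_local_name:
                warnings.append(f"line {lineno}: {name} is mapped to loopback {ip}")
            if not ip.is_loopback:
                if is_local_name:
                    warnings.append(
                        f"line {lineno}: {name} looks like a localhost name but maps to {ip}"
                    )
                first = seen.setdefault(name, ip)
                if first != ip:
                    warnings.append(f"line {lineno}: {name} is also mapped to {first}")
    return warnings


SAMPLE = """\
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.200.144 localhost.datanode10
192.168.200.107 localhost.datanode01
192.168.200.143 namenode.localdomain.com namenode
"""

for warning in lint_hosts(SAMPLE):
    print(warning)
```

Run against the hosts file quoted in this message, the sketch flags the two localhost.datanodeNN entries and leaves the namenode entry alone.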
Re: Problem with Ambari 1.7 recognizing hosts running CentOS 6
The error from the registration log is as follows: == Running setup agent script... == Agent log at: /var/log/ambari-agent/ambari- agent.log (WARNING 2014-12-17 10:43:08,349 NetUtil.py:92 - Server at https://namenode . localdomain.com.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode:8440 is not reachable, sleeping for 10 seconds... David Novogrodsky david.novogrod...@gmail.com http://www.linkedin.com/in/davidnovogrodsky
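A server name like the one in this log can be rejected before any connection attempt. The sketch below (Python 3) checks a name against the usual DNS limits of 253 characters overall and 63 characters per label, and also flags a label that repeats several times in a row. The function name hostname_problems and the "more than two repeats" threshold are assumptions for illustration; the sketch does not explain how the name ended up this way.

```python
def hostname_problems(name):
    """Return a list of reasons why `name` is unlikely to be a usable hostname."""
    problems = []
    trimmed = name.rstrip(".")  # ignore a trailing root dot
    labels = trimmed.split(".")

    if len(trimmed) > 253:
        problems.append(f"name is {len(trimmed)} characters long (limit is 253)")
    if any(len(label) == 0 or len(label) > 63 for label in labels):
        problems.append("name has an empty label or a label over 63 characters")

    # Find the longest run of identical consecutive labels.
    run, longest, repeated = 1, 1, None
    for prev, cur in zip(labels, labels[1:]):
        run = run + 1 if cur == prev else 1
        if run > longest:
            longest, repeated = run, cur
    if longest > 2:  # assumption: more than two repeats is suspicious
        problems.append(f"label {repeated!r} repeats {longest} times in a row")
    return problems


good = "namenode.localdomain.com"
bad = good + ".namenode" * 30  # illustrative value, built like the name in the log

print(hostname_problems(good))
print(hostname_problems(bad))
```

The first call reports no problems; the second reports both the length and the repeated label.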
Re: Problem with Ambari 1.7 recognizing hosts running CentOS 6
I am having problems adding more information to this post:

    Delivery to the following recipient failed permanently: user@ambari.apache.org
    Technical details of permanent failure: Google tried to deliver your message, but it was rejected by the server for the recipient domain ambari.apache.org by mx1.eu.apache.org.[192.87.106.230].
    The error that the other server returned was: 552 spam score (6.3) exceeded threshold (HTML_MESSAGE,LONGWORDS,RCVD_IN_DNSWL_LOW,SPF_PASS,SPOOF_COM2OTH,WEIRD_PORT

David Novogrodsky
david.novogrod...@gmail.com
http://www.linkedin.com/in/davidnovogrodsky
Re: Problem with Ambari 1.7 recognizing hosts running CentOS 6
Hi David,

Try sending in plain/text, not HTML.
Re: Problem with Ambari 1.7 recognizing hosts running CentOS 6
Please forgive me if I am sending this twice: I am having a problem with Ambari not recognizing nodes on a network. The cluster is using CentOS 6. I am trying to install HDP 2.1. I have the following values in my hosts file: 127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4 ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6 192.168.200.144 datanode10.localdomain.com 192.168.200.143 namenode.localdomain.com 192.168.200.107 datanode01.localdomain.com When I try to connect from the namenode.localdomain.com to datanode10.localdomain.com i get this error in the registration log: == Running setup agent script... DJN...expected_host not defined here DJN:bootstrap.py ...expected_host is: datanode10.localdomain.com == Agent out at: /var/log/ambari-agent/ambari-agent.out Agent log at: /var/log/ambari-agent/ambari-agent.log (WARNING 2014-12-17 16:22:50,380 NetUtil.py:92 - Server at https://namenode.localdomain.com.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode:8440 is not reachable, sleeping for 10 seconds... 
INFO 2014-12-17 16:23:00,390 NetUtil.py:48 - Connecting to https://namenode.localdomain.com.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode:8440/ca WARNING 2014-12-17 16:23:00,391 NetUtil.py:71 - Failed to connect to https://namenode.localdomain.com.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode:8440/ca due to [Errno -2] Name or service not known ... Connection to datanode10.localdomain.com closed. SSH command execution finished host=datanode10.localdomain.com, exitcode=0 Command end time 2014-12-17 16:23:26 datanode10.localdomain.com What follows is more detail. 
I also made some changes to the /usr/lib/python2.6/site-packages/ambari_server/bootstrap.py file:

    def run(self):
        sshcommand = ["ssh",
                      "-o", "ConnectTimeOut=60",
                      "-o", "StrictHostKeyChecking=no",
                      "-o", "BatchMode=yes",
                      "-tt",  # Should prevent "tput: No value for $TERM and no -T specified" warning
                      "-i", self.sshkey_file,
                      self.user + "@" + self.host,
                      self.command]
        if DEBUG:
            self.host_log.write("Running ssh command " + ' '.join(sshcommand))
        self.host_log.write("==")
        # Added: also log the host, user, key file and command being run
        self.host_log.write("\nCommand start time " + datetime.now().strftime('%Y-%m-%d %H:%M:%S') + " " + self.host + " " + self.user + " " + self.sshkey_file + " " + self.command)
        #self.host_log.write("djn:BOOTSTRAP the value is: " + self.host)
        sshstat = subprocess.Popen(sshcommand, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        log = sshstat.communicate()
        errorMsg = log[1]
        if self.errorMessage and sshstat.returncode != 0:
            errorMsg = self.errorMessage + "\n" + errorMsg
        log = log[0] + "\n" + errorMsg
        self.host_log.write(log)
        self.host_log.write("SSH command execution finished")
        self.host_log.write("host=" + self.host + ", exitcode=" + str(sshstat.returncode))
        self.host_log.write("Command end time " + datetime.now().strftime('%Y-%m-%d %H:%M:%S') + " " + self.host)
        return {"exitstatus": sshstat.returncode, "log": log, "errormsg": errorMsg}

I added some information to the host_log output. The information includes self.host, self.user, self.sshkey_file and so on.

When I run the web front end I get two different results. First I will detail the connection to namenode.localdomain.com; second I will detail the connection to datanode10.localdomain.com.

The connection to namenode.localdomain.com is successful. Here is the important part of the registration log:

    ==
    Running setup agent script...
    DJN...expected_host not defined here
    DJN:bootstrap.py ...expected_host is: namenode.localdomain.com
    ==
    Command start time 2014-12-17 16:23:17 namenode.localdomain.com root /var/run/ambari-server/bootstrap/25/sshKey sudo python /var/lib/ambari-agent/data/tmp/setupAgent1418854996.py namenode.localdomain.com DEV namenode.localdomain.com 1.7.0 8080
    Verifying Python version compatibility...
    Using python /usr/bin/python2.6
    Found ambari-agent
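The agent logs in this thread show two different failures when contacting the server on port 8440: "[Errno 111] Connection refused" (the name resolved, but nothing accepted the connection) and "[Errno -2] Name or service not known" (the name did not resolve at all). One earlier log also shows namenode.localdomain resolving to 98.124.198.1 rather than to the 192.168.200.143 address listed in the hosts file. A small probe run from an agent host can separate these cases. This is a minimal sketch in Python 3 using only the standard socket module; the function name probe and the status strings are made up for illustration, and it is not how the Ambari agent itself performs the check.

```python
import socket


def probe(host, port, timeout=3.0):
    """Resolve `host`, then try a TCP connection; return (status, ip)."""
    try:
        ip = socket.gethostbyname(host)  # IPv4 lookup via hosts file / DNS
    except socket.gaierror:
        return ("lookup-failed", None)   # comparable to [Errno -2] in the logs
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return ("ok", ip)
    except ConnectionRefusedError:
        return ("refused", ip)           # comparable to [Errno 111] in the logs
    except OSError:
        return ("unreachable", ip)       # timeouts, no route, and similar


if __name__ == "__main__":
    # Host name and port taken from the logs in this thread.
    status, ip = probe("namenode.localdomain.com", 8440)
    print(status, ip)
```

Comparing the returned IP with the address expected from /etc/hosts shows whether the lookup is going to the hosts file or somewhere else.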
Re: Problem with Ambari 1.7 recognizing hosts running CentOS 6
May I suggest you simply do an ssh -l keylessusername using both the previous and the new FQDNs that you have defined, to verify which one is in effect and accessible? Also, since you changed the FQDN, you may wish to simply reboot the cluster once, just to make sure that the new ones are in place. After the reboot you will most probably need to redo the ssh keyless pairing.

regards
Devopam

On Tue, Dec 16, 2014 at 4:32 AM, David Novogrodsky david.novogrod...@gmail.com wrote:

The changes I am making in the hosts file are not being picked up by the installation scripts of Ambari. I was told I could make changes to the hosts file and that Ambari would see them. I have checked the /etc/ambari-agent/conf/ambari-agent.ini file and the changes I made to the hosts file are not showing up in that file. Where is Ambari getting the names for the other nodes in the cluster?

Here are the changes I made to the hosts file on the host for the name node:

    127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
    ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
    192.168.200.144 datanode10.localdomain
    192.168.200.107 datanode01.localdomain
    192.168.200.143 namenode.localdomain namenode

Since I made these changes Ambari cannot discover any of the nodes in the network. None of them. I have not made these changes to the other nodes because I do not want to change them until I can see Ambari discover the host it is sitting on.

Regarding the commands you mentioned, here are the results:

    [root@localhost conf]# hostname -f
    hostname: Unknown host
    [root@localhost conf]# hostname
    localhost.namenode
    [root@localhost conf]# python -c 'import socket; print socket.getfqdn()'
    localhost.namenode

localhost.namenode was the name I used for this host during the installation of CentOS. I thought you said I could make changes to the hosts file and the installation scripts would recognize them?
From the Confirm Hosts page I am getting the following errors.

For connecting to the name node:

    STDOUT: {'exitstatus': 1, 'log': Host registration aborted. Ambari Agent host cannot reach Ambari Server 'localhost.namenode:8080'. Please check the network connectivity between the Ambari Agent host and the Ambari Server}

For connecting to datanode10:

    INFO 2014-12-15 16:42:33,348 DataCleaner.py:36 - Data cleanup thread started
    ERROR 2014-12-15 16:42:33,349 main.py:137 - Ambari agent machine hostname (localhost.datanode10) does not match expected ambari server hostname (datanode10.localdomain). Aborting registration. Please check hostname, hostname -f and /etc/hosts file to confirm your hostname is setup correctly
    ', None)

I am getting a similar error when trying to get to datanode01. Please note I used the following names for the datanodes when I installed CentOS:

    datanode10 -- localhost.datanode10
    datanode01 -- localhost.datanode01

David Novogrodsky
david.novogrod...@gmail.com
http://www.linkedin.com/in/davidnovogrodsky

On Mon, Dec 15, 2014 at 11:50 AM, Yusaku Sako yus...@hortonworks.com wrote:

Did you change the FQDNs like I proposed, like namenode.localdomain, rather than localhost.namenode? Did you ensure that the 3 commands returned the results as shown? Can each host resolve all the other hosts by name?

If you want to get a cluster up and running on VMs, the best bet is to use: https://cwiki.apache.org/confluence/display/AMBARI/Quick+Start+Guide
This sets up all /etc/hosts and other settings in the way you want. Then you can see how these VMs are being set up and mimic that on your VMs if you'd rather set them up from scratch.

I hope this helps.
Yusaku

On Mon, Dec 15, 2014 at 8:18 AM, David Novogrodsky david.novogrod...@gmail.com wrote:

OK, I removed the multiple instances of localhost.namenode. It now only appears on one line in the hosts file. The main Ambari server still cannot see the data nodes nor the node Ambari is on.
Ambari is on the namenode. When I run the install, the install program cannot connect to any node in the network. Also, I tried running /etc/init.d/network restart on one of the nodes, datanode10 (a virtual machine). Now that node cannot connect to the internet. I would like to send you the information, but I am having problems sending the document from the virtual machine. I do not have a DNS server. These machines have hardwired IP addresses and names in the hosts file. Did running /etc/init.d/network restart break the connection?

David Novogrodsky
david.novogrod...@gmail.com
http://www.linkedin.com/in/davidnovogrodsky

On Sat, Dec 13, 2014 at 12:46 AM, Yusaku Sako yus...@hortonworks.com wrote:

You can just make the changes in /etc/hosts. You might also change /etc/sysconfig/network and run /etc/init.d/network restart. Then make sure that running the 3 commands returns the expected results.
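The registration error quoted in this thread ("Ambari agent machine hostname (localhost.datanode10) does not match expected ambari server hostname (datanode10.localdomain)") comes down to comparing the name a host reports for itself with the name entered in the wizard. The sketch below reproduces that comparison in Python 3 so it can be run on each node before registering. The function names are hypothetical, and treating the comparison as case-insensitive with an optional trailing dot is an assumption; the agent's own check is not shown in this thread.

```python
import socket


def normalize(name):
    """Lower-case a hostname and drop surrounding whitespace and a trailing dot."""
    return name.strip().rstrip(".").lower()


def hostname_matches(expected, actual=None):
    """Compare the expected FQDN with what this machine reports for itself.

    When `actual` is omitted, socket.getfqdn() supplies the local name, the
    same call used in the python -c check quoted earlier in the thread.
    """
    if actual is None:
        actual = socket.getfqdn()
    return normalize(expected) == normalize(actual)


# Values taken from the error message quoted in this thread.
print(hostname_matches("datanode10.localdomain", "localhost.datanode10"))
# Run on the node itself, without the second argument, to test the live name.
print(hostname_matches("namenode.localdomain.com"))
```

A False result on a node means its hostname, hostname -f and /etc/hosts settings still disagree with the name given to Ambari.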