Re: Problem with Ambari 1.7 recognizing hosts running CentOS 6

2014-12-17 Thread David Novogrodsky
I am following the startup procedure listed here:
https://cwiki.apache.org/confluence/display/AMBARI/Install+Ambari+1.7.0+from+Public+Repositories.
I am trying to install HDP 2.1.  Please note that these install
instructions mention nothing about iptables or selinux.  I am using CentOS
6 machines.

I tried changing the hosts file on all the machines to this:
127.0.0.1   localhost localhost.localdomain localhost4
localhost4.localdomain4
::1 localhost localhost.localdomain localhost6
localhost6.localdomain6
xxx.xxx.200.144 datanode10.localdomain.com
xxx.xxx.200.107 datanode01.localdomain.com
xxx.xxx.200.143 namenode.localdomain.com


(just trying to hide my IP addresses)  The error is the same.


David Novogrodsky
david.novogrod...@gmail.com
http://www.linkedin.com/in/davidnovogrodsky

On Wed, Dec 17, 2014 at 9:05 AM, David Novogrodsky 
david.novogrod...@gmail.com wrote:

 I hope this clears up the confusion:

 First, this is what the hosts file looks like:
 127.0.0.1   localhost localhost.localdomain localhost4
 localhost4.localdomain4
 ::1 localhost localhost.localdomain localhost6
 localhost6.localdomain6
 192.168.200.144 datanode10.localdomain.com
 192.168.200.143 namenode.localdomain.com namenode
 192.168.200.107 datanode01.localdomain.com

 I have re-started all the nodes in the cluster. Several times.
 I can reach all nodes from all nodes using ping and their fully qualified
 domain names.
 I can reach the data nodes from the name node using password-less ssh.
 I have changed the name of the machines to match their names in the hosts
 files on each machine.
 I have checked the /etc/ambari-agent/conf/ambari-agent.ini file.

 David Novogrodsky
 david.novogrod...@gmail.com
 http://www.linkedin.com/in/davidnovogrodsky

 On Tue, Dec 16, 2014 at 10:20 PM, Devopam Mittra devo...@gmail.com
 wrote:

 Forgive me if I sound rude, but please re-read the installation
 instructions properly; it should help in your case.

 1. Have a sound naming convention for all your boxes, e.g.
 namenode01.localdomain, datanode01.localdomain, datanode0N.localdomain.
 This will help you a lot with future expansion and maintenance of your
 cluster.
 2. Do not, by any means, tamper with the /etc/hosts entries for 127.0.0.1
 and ::1. Let them carry the localhost keyword only. Changing them creates
 issues for every internal lookup the OS performs, so leaving them alone
 keeps normal operations on your box working.
 3. If you have DHCP plus a very good DNS server in place, then okay.
 Otherwise, assign static IPs to your machines and create one entry for each
 box with the FQDN and static IP address, replicated on ALL the boxes.
 4. Set up keyless ssh login for root or any other uniform local user that
 you want to use to manage Ambari + Hadoop.
 5. Confirm that the namenode and the Ambari server machines (in case they
 are different for you) can talk to ALL the machines using a keyless login
 for that universal user you created in the steps above.
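
The /etc/hosts points in the list above (2 and 3) can be checked mechanically. What follows is a minimal Python 3 sketch, not an Ambari tool: the names parse_hosts and lint_hosts, and the rule that loopback lines may only carry the stock CentOS localhost names, are assumptions made for illustration.

```python
import re

# Illustrative assumption: only the stock CentOS loopback names are allowed
# on the 127.0.0.1 and ::1 lines.
LOOPBACK_ADDRESSES = {"127.0.0.1", "::1"}
STOCK_LOOPBACK_NAME = re.compile(r"^localhost[46]?(\.localdomain[46]?)?$")


def parse_hosts(text):
    """Return a list of (ip, [names]) pairs from /etc/hosts-style text."""
    entries = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        fields = line.split()
        entries.append((fields[0], fields[1:]))
    return entries


def lint_hosts(text, expected_fqdns):
    """Return a list of human-readable problems found in the hosts text."""
    problems = []
    mapped_names = set()
    for ip, names in parse_hosts(text):
        if ip in LOOPBACK_ADDRESSES:
            # Point 2: loopback lines should carry localhost names only.
            extra = [n for n in names if not STOCK_LOOPBACK_NAME.match(n)]
            if extra:
                problems.append("loopback %s also maps %s" % (ip, ", ".join(extra)))
        else:
            mapped_names.update(names)
    # Point 3: every box needs one entry carrying its FQDN.
    for fqdn in expected_fqdns:
        if fqdn not in mapped_names:
            problems.append("no entry for %s" % fqdn)
    return problems
```

Running lint_hosts on the contents of /etc/hosts from each box, with the same list of expected FQDNs, gives a quick way to confirm the file really is replicated consistently.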

 hope the above will help you to sort out the issue in a single go.

 regards
 Dev



 On Tue, Dec 16, 2014 at 11:45 PM, David Novogrodsky 
 david.novogrod...@gmail.com wrote:

 There is nothing simply done in Ambari.  :)

 After changing the name of this computer and restarting the namenode,
 Ambari does not recognize any node.  The main error I am wondering about is
 this:
 INFO 2014-12-16 12:02:29,669 main.py:233 - Connecting to Ambari server
 at https://namenode.localdomain:8440 (98.124.198.1)
 INFO 2014-12-16 12:02:29,670 NetUtil.py:48 - Connecting to
 https://namenode.localdomain:8440/ca
 WARNING 2014-12-16 12:02:29,718 NetUtil.py:71 - Failed to connect to
 https://namenode.localdomain:8440/ca due to [Errno 111] Connection
 refused
 WARNING 2014-12-16 12:02:29,719 NetUtil.py:92 - Server at
 https://namenode.localdomain:8440 is not reachable, sleeping for 10
 seconds...
 ', None)
 Why is Ambari using namenode.localdomain to connect?
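
One place to look for the answer is the agent configuration: as far as I know, the agent builds the server URL from the hostname and url_port keys in the [server] section of /etc/ambari-agent/conf/ambari-agent.ini. The Python 3 sketch below reads those two keys. The function name server_endpoint is hypothetical, and the fallback to 8440 is an assumption taken from the port shown in the log above.

```python
import configparser


def server_endpoint(ini_text):
    """Return (hostname, port) from the [server] section of agent config text."""
    # interpolation=None so a literal '%' in any value is not treated specially.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    host = parser.get("server", "hostname")
    # Assumption: fall back to 8440, the port seen in the agent log.
    port = parser.getint("server", "url_port", fallback=8440)
    return host, port
```

To use it on an agent host, pass in the text of the ini file, for example server_endpoint(open("/etc/ambari-agent/conf/ambari-agent.ini").read()). If the hostname it reports is not the name in /etc/hosts, the agent will keep trying the old name no matter what the hosts file says.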

 I am running Ambari on this node; I am running Ambari on the namenode of
 this cluster.  The hosts file (/etc/hosts) for this computer is this:

 127.0.0.1   localhost localhost.localdomain localhost4
 localhost4.localdomain4
 ::1 localhost localhost.localdomain localhost6
 localhost6.localdomain6
 192.168.200.144 localhost.datanode10
 192.168.200.107 localhost.datanode01
 192.168.200.143 namenode.localdomain.com namenode

 The Ambari wizard said I needed to use fully qualified domain names, so

 What follows is a detailed log of the registration log.  I get this
 error in the registration log for namenode.localdomain.com:
 --
 ==
 Creating target directory...
 ==

 Command start time 2014-12-16 12:02:18

 Connection to namenode.localdomain.com closed.
 SSH command execution finished
 host=namenode.localdomain.com, exitcode=0
 Command end time 

Re: Problem with Ambari 1.7 recognizing hosts running CentOS 6

2014-12-17 Thread David Novogrodsky
The error from the registration log is as follows:
==
Running setup agent script...
==
Agent log at: /var/log/ambari-agent/ambari-
agent.log
(WARNING 2014-12-17 10:43:08,349 NetUtil.py:92 - Server at https://namenode
.
localdomain.com.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode:8440
is not reachable, sleeping for 10 seconds...

David Novogrodsky
david.novogrod...@gmail.com
http://www.linkedin.com/in/davidnovogrodsky


Re: Problem with Ambari 1.7 recognizing hosts running CentOS 6

2014-12-17 Thread David Novogrodsky
I am having problems adding more information to this post:
Delivery to the following recipient failed permanently:

 user@ambari.apache.org

Technical details of permanent failure:
Google tried to deliver your message, but it was rejected by the server for
the recipient domain ambari.apache.org by
mx1.eu.apache.org.[192.87.106.230].

The error that the other server returned was:
552 spam score (6.3) exceeded threshold
(HTML_MESSAGE,LONGWORDS,RCVD_IN_DNSWL_LOW,SPF_PASS,SPOOF_COM2OTH,WEIRD_PORT

David Novogrodsky
david.novogrod...@gmail.com
http://www.linkedin.com/in/davidnovogrodsky


Re: Problem with Ambari 1.7 recognizing hosts running CentOS 6

2014-12-17 Thread Jeff Sposetti
Hi David, Try sending in plain/text, not HTML.



Re: Problem with Ambari 1.7 recognizing hosts running CentOS 6

2014-12-17 Thread David Novogrodsky
Please forgive me if I am sending this twice:

I am having a problem with Ambari not recognizing nodes on a network.
The cluster is using CentOS 6.  I am trying to install HDP 2.1.  I
have the following values in my hosts file:

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.200.144 datanode10.localdomain.com
192.168.200.143 namenode.localdomain.com
192.168.200.107 datanode01.localdomain.com

When I try to connect from namenode.localdomain.com to
datanode10.localdomain.com, I get this error in the registration log:

==
Running setup agent script...
DJN...expected_host not defined here
DJN:bootstrap.py ...expected_host is: datanode10.localdomain.com
==

Agent out at: /var/log/ambari-agent/ambari-agent.out
Agent log at: /var/log/ambari-agent/ambari-agent.log
(WARNING 2014-12-17 16:22:50,380 NetUtil.py:92 - Server at
https://namenode.localdomain.com.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode:8440
is not reachable, sleeping for 10 seconds...
INFO 2014-12-17 16:23:00,390 NetUtil.py:48 - Connecting to
https://namenode.localdomain.com.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode:8440/ca
WARNING 2014-12-17 16:23:00,391 NetUtil.py:71 - Failed to connect to
https://namenode.localdomain.com.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode:8440/ca
due to [Errno -2] Name or service not known
...
Connection to datanode10.localdomain.com closed.
SSH command execution finished
host=datanode10.localdomain.com, exitcode=0
Command end time 2014-12-17 16:23:26  datanode10.localdomain.com
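
The server name in the log above has picked up a long run of repeated .namenode labels, which is consistent with the "[Errno -2] Name or service not known" failure: no such name can be resolved. Below is a small Python 3 sketch that flags this kind of malformed name. It only detects the symptom and says nothing about where the suffix came from; the function name hostname_problems is hypothetical, and the limits are the usual DNS ones (253 characters per name, 63 per label).

```python
MAX_NAME_LENGTH = 253   # longest hostname DNS allows, in characters
MAX_LABEL_LENGTH = 63   # longest single dot-separated label


def hostname_problems(name):
    """Return a list of reasons why a hostname looks malformed."""
    problems = []
    trimmed = name.rstrip(".")
    labels = trimmed.split(".")
    if len(trimmed) > MAX_NAME_LENGTH:
        problems.append("name longer than %d characters" % MAX_NAME_LENGTH)
    if any(len(label) == 0 or len(label) > MAX_LABEL_LENGTH for label in labels):
        problems.append("empty or over-long label")
    # Count neighbouring labels that are identical, e.g. "...namenode.namenode".
    repeats = sum(1 for a, b in zip(labels, labels[1:]) if a == b)
    if repeats:
        problems.append("label repeated back-to-back %d times" % repeats)
    return problems
```

Feeding it the server name copied from the agent log should report both the length and the repetition, while namenode.localdomain.com on its own comes back clean.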


What follows is more detail.

I also made some changes to the
/usr/lib/python2.6/site-packages/ambari_server/bootstrap.py file:

  def run(self):
    # Build the ssh invocation used to run self.command on the target host.
    sshcommand = ["ssh",
                  "-o", "ConnectTimeOut=60",
                  "-o", "StrictHostKeyChecking=no",
                  "-o", "BatchMode=yes",
                  "-tt", # Should prevent "tput: No value for $TERM and no -T specified" warning
                  "-i", self.sshkey_file,
                  self.user + "@" + self.host, self.command]
    if DEBUG:
      self.host_log.write("Running ssh command " + ' '.join(sshcommand))
    self.host_log.write("==")
    # Added for debugging: log host, user, key file and command with the start time.
    self.host_log.write("\nCommand start time " +
                        datetime.now().strftime('%Y-%m-%d %H:%M:%S') +
                        "  " + self.host + "  " + self.user +
                        "  " + self.sshkey_file + "  " + self.command)
    #self.host_log.write("djn:BOOTSTRAP the value is:" + self.host)
    sshstat = subprocess.Popen(sshcommand, stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE)
    log = sshstat.communicate()
    errorMsg = log[1]
    if self.errorMessage and sshstat.returncode != 0:
      errorMsg = self.errorMessage + "\n" + errorMsg
    log = log[0] + "\n" + errorMsg
    self.host_log.write(log)
    self.host_log.write("SSH command execution finished")
    self.host_log.write("host=" + self.host + ", exitcode=" +
                        str(sshstat.returncode))
    # Added for debugging: append the host to the end-time line.
    self.host_log.write("Command end time " +
                        datetime.now().strftime('%Y-%m-%d %H:%M:%S') +
                        "  " + self.host)
    return {"exitstatus": sshstat.returncode, "log": log, "errormsg": errorMsg}

I added some information to the host_log output.  The information
includes self.host, self.user, self.sshkey_file and so on.

When I run the web front end I get two different results.  First, I
will detail the connection to namenode.localdomain.com.  Second, I
will detail the connection to datanode10.localdomain.com.

The connection to namenode.localdomain.com is successful.  Here is
the important part of the registration log:

==
Running setup agent script...
DJN...expected_host not defined here
DJN:bootstrap.py ...expected_host is: namenode.localdomain.com
==
Command start time 2014-12-17 16:23:17  namenode.localdomain.com  root
 /var/run/ambari-server/bootstrap/25/sshKey  sudo python
/var/lib/ambari-agent/data/tmp/setupAgent1418854996.py
namenode.localdomain.com DEV namenode.localdomain.com 1.7.0 8080
Verifying Python version compatibility...
Using python  /usr/bin/python2.6
Found ambari-agent 

Re: Problem with Ambari 1.7 recognizing hosts running CentOS 6

2014-12-15 Thread Devopam Mittra
May I suggest you simply do an ssh -l keylessusername using both the previous
and the new FQDNs that you have defined, to verify which one is in effect
and accessible?
Also, since you changed the FQDN, you may wish to simply reboot the cluster
once, just to make sure that the new ones are in place.
It might happen that after the reboot you will need to redo the ssh keyless
pairing once again (most probably).
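
The check suggested above can be scripted so that the old and new FQDNs are tried in one pass. This is a minimal Python 3 sketch under stated assumptions: the helper names ssh_probe_command and probe are hypothetical, the 10-second timeout is arbitrary, and running hostname -f on the remote side is just one convenient way to see which name the box reports. BatchMode=yes makes ssh fail instead of prompting, so a host where keyless login is not working shows up as a non-zero exit code.

```python
import subprocess


def ssh_probe_command(user, host):
    """Build an ssh command that fails rather than prompting for a password."""
    return ["ssh",
            "-o", "BatchMode=yes",
            "-o", "ConnectTimeout=10",
            "-l", user, host,
            "hostname", "-f"]


def probe(user, hosts, run=subprocess.run):
    """Return {host: (exit_code, remote_output)} for each name tried."""
    results = {}
    for host in hosts:
        done = run(ssh_probe_command(user, host),
                   stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                   universal_newlines=True)
        results[host] = (done.returncode, done.stdout.strip())
    return results
```

For example, probe("root", ["localhost.namenode", "namenode.localdomain"]) would show which of the two names is actually reachable with the keyless user.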

regards
Devopam


On Tue, Dec 16, 2014 at 4:32 AM, David Novogrodsky 
david.novogrod...@gmail.com wrote:

 The changes I am making in the hosts file are not being picked up by the
 installation scripts of Ambari.  I was told I could make changes to the
 hosts file and that Ambari would see them.  I have
 checked the /etc/ambari-agent/conf/ambari-agent.ini file and the changes I
 made to the hosts file are not showing up in that file.  Where is Ambari
 getting the names for the other nodes in the cluster?

 Here are the changes I made to the hosts file on the host for the name
 node:
 127.0.0.1   localhost localhost.localdomain localhost4
 localhost4.localdomain4
 ::1 localhost localhost.localdomain localhost6
 localhost6.localdomain6
 192.168.200.144 datanode10.localdomain
 192.168.200.107 datanode01.localdomain
 192.168.200.143 namenode.localdomain namenode

 Since I made these changes Ambari can not discover any of the nodes in the
 network.  None of them.

 I have not made these changes to the other nodes because I do not want to
 make changes to the other nodes until I can see Ambari discover the host it
 is sitting upon.

 Regarding the commands you mentioned, here are the results:
 [root@localhost conf]# hostname -f
 hostname: Unknown host
 [root@localhost conf]# hostname
 localhost.namenode
 [root@localhost conf]#  python -c 'import socket; print socket.getfqdn()'
 localhost.namenode

 localhost.namenode was the name I used for this host during the
 installation of CentOS.   I thought you said I could make changes to the
 hosts file and the installation scripts would recognize them?
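
The commands above all report the name the machine itself believes it has, which is separate from what /etc/hosts says about other machines. A minimal Python 3 sketch of the same comparison follows. The function name name_report is hypothetical, and comparing case-insensitively is an assumption rather than a statement about how Ambari compares names.

```python
import socket


def name_report(expected_fqdn, hostname=None, fqdn=None):
    """Compare what this box calls itself with the name Ambari is given."""
    # Default to live values; explicit arguments allow checking recorded output.
    hostname = socket.gethostname() if hostname is None else hostname
    fqdn = socket.getfqdn() if fqdn is None else fqdn
    return {
        "hostname": hostname,
        "fqdn": fqdn,
        "fqdn_matches": fqdn.lower() == expected_fqdn.lower(),
    }
```

With the output shown above, name_report("namenode.localdomain", "localhost.namenode", "localhost.namenode") reports a mismatch, which lines up with the registration errors that follow.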

 From the Confirm Hosts page I am getting the following errors:
 for connecting to the name node

 STDOUT: {'exitstatus': 1, 'log': Host registration aborted. Ambari Agent host
 cannot reach Ambari Server 'localhost.namenode:8080'. Please check the network
 connectivity between the Ambari Agent host and the Ambari Server}

 for connecting to the datanode10

 INFO 2014-12-15 16:42:33,348 DataCleaner.py:36 - Data cleanup thread started
 ERROR 2014-12-15 16:42:33,349 main.py:137 - Ambari agent machine hostname
  (localhost.datanode10) does not match expected ambari server hostname
 (datanode10.localdomain). Aborting registration. Please check hostname,
 hostname -f and /etc/hosts file to confirm your hostname is setup correctly
 ', None)

 I am getting a similar error when trying to get to datanode01.  Please
 note I used the following domain names for the following datanodes when I
 installed CentOS:
 datanode10 -- localhost.datanode10
 datanode01 -- localhost.datanode01





 David Novogrodsky
 david.novogrod...@gmail.com
 http://www.linkedin.com/in/davidnovogrodsky

 On Mon, Dec 15, 2014 at 11:50 AM, Yusaku Sako yus...@hortonworks.com
 wrote:

 Did you change the FQDNs like I proposed, like namenode.localdomain,
 rather than localhost.namenode?
 Did you ensure that the 3 commands returned the results as shown?
 Can each host resolve all the other hosts by name?
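
The last question can be answered on each host with a few lines of Python 3. This is a sketch only: resolve_all is a hypothetical helper, and the resolver argument exists so the function can be exercised without touching the network.

```python
import socket


def resolve_all(names, resolver=socket.gethostbyname):
    """Return {name: ip or failure note} for every host name given."""
    results = {}
    for name in names:
        try:
            results[name] = resolver(name)
        except OSError as error:  # socket.gaierror is a subclass of OSError
            results[name] = "unresolved: %s" % error
    return results
```

Running resolve_all with the full list of cluster FQDNs on every box should give the same addresses everywhere; any "unresolved" entry points at a hosts file that is missing a line.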

 If you want to get a cluster up and running on VMs, the best bet is to
 use:
 https://cwiki.apache.org/confluence/display/AMBARI/Quick+Start+Guide

 This sets up all /etc/hosts and other settings in the way you want.
 Then you can see how these VMs are being set up and mimic on your VMs if
 you'd rather set them up from scratch.

 I hope this helps.
 Yusaku


 On Mon, Dec 15, 2014 at 8:18 AM, David Novogrodsky 
 david.novogrod...@gmail.com wrote:

 Ok, I removed the multiple instances of localhost.namenode.  It now
 only appears on one line in the hosts file.

 The main ambari server still cannot see the data nodes nor the node
 Ambari is on.  Ambari is on the namenode.  When I run the install, the
 install program can not connect to any node in the network.

 Also I tried running /etc/init.d/network restart on one of the nodes,
 datanode10 (a virtual machine).  Now that node cannot connect to the
 internet.  I would like to send you the information but I am having
 problems sending the document from the virtual machine.

 I do not have a DNS.  These machines have hardwired IP addresses and
 names in the hosts file. Did running /etc/init.d/network restart break the
 connection?


 David Novogrodsky
 david.novogrod...@gmail.com
 http://www.linkedin.com/in/davidnovogrodsky

 On Sat, Dec 13, 2014 at 12:46 AM, Yusaku Sako yus...@hortonworks.com
 wrote:

 You can just make the changes in /etc/hosts.  You might also
 change /etc/sysconfig/network and run /etc/init.d/network restart.

 Then make sure that running the 3 commands returns the expected results.
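
On CentOS 6 the persistent host name lives in the HOSTNAME= line of /etc/sysconfig/network. The Python 3 sketch below shows the edit as a pure text transformation. The function name set_hostname_line is hypothetical, and writing the result back to the file (as root) and restarting the network service are left to the reader.

```python
def set_hostname_line(network_text, fqdn):
    """Return /etc/sysconfig/network text with HOSTNAME set to fqdn."""
    lines = network_text.splitlines()
    replaced = False
    for index, line in enumerate(lines):
        if line.strip().startswith("HOSTNAME="):
            lines[index] = "HOSTNAME=" + fqdn
            replaced = True
    if not replaced:
        # No HOSTNAME line yet, so add one at the end.
        lines.append("HOSTNAME=" + fqdn)
    return "\n".join(lines) + "\n"
```

Keeping the change as a function over text makes it easy to preview the new file contents before anything on disk is touched.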