Based on your deploy error log:
"3": {
"nodeReport": {
"PUPPET_KICK_FAILED": [],
"PUPPET_OPERATION_FAILED": [
"vbaby3.cloud.eb",
"vbaby5.cloud.eb",
"vbaby4.cloud.eb",
"vbaby2.cloud.eb",
"vbaby6.cloud.eb",
"vbaby1.cloud.eb"
],
"PUPPET_OPERATION_TIMEDOUT": [
"vbaby5.cloud.eb",
"vbaby4.cloud.eb",
"vbaby2.cloud.eb",
"vbaby6.cloud.eb",
"vbaby1.cloud.eb"
],
Five nodes timed out, which means that either the puppet agent is not running
on them or they cannot communicate with the master. Try doing a
puppet kick --ping to them from the master.
The one node that failed without timing out (vbaby3.cloud.eb) failed at:
"\"Mon Aug 13 11:54:17 +0800 2012 /Stage[1]/Hdp::Pre_install_pkgs/Hdp::Exec[yum
install $pre_installed_pkgs]/Exec[yum install $pre_installed_pkgs]/returns
(err): change from notrun to 0 failed: yum install -y hadoop hadoop-libhdfs
hadoop-native hadoop-pipes hadoop-sbin hadoop-lzo hadoop hadoop-libhdfs
hadoop-native hadoop-pipes hadoop-sbin hadoop-lzo hdp_mon_dashboard
ganglia-gmond-3.2.0 gweb hdp_mon_ganglia_addons snappy snappy-devel returned 1
instead of one of [0] at /etc/puppet/agent/modules/hdp/manifests/init.pp:265\"",
It seems like yum install failed on the host. Try running the command manually
and see what the error is.
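
To pick out the node that failed without timing out, the two host lists in
the node report above can be compared. The following is only a minimal
Python sketch and not part of ambari: the function and variable names are
made up for illustration, and the host lists are copied by hand from the
report.

```python
# Host lists copied by hand from the "nodeReport" section of the deploy error log.
failed = [
    "vbaby3.cloud.eb", "vbaby5.cloud.eb", "vbaby4.cloud.eb",
    "vbaby2.cloud.eb", "vbaby6.cloud.eb", "vbaby1.cloud.eb",
]
timed_out = [
    "vbaby5.cloud.eb", "vbaby4.cloud.eb", "vbaby2.cloud.eb",
    "vbaby6.cloud.eb", "vbaby1.cloud.eb",
]


def failed_without_timeout(failed_hosts, timed_out_hosts):
    """Return the hosts that reported a failure but did not time out, sorted."""
    return sorted(set(failed_hosts) - set(timed_out_hosts))


only_failed = failed_without_timeout(failed, timed_out)
print(only_failed)  # prints ['vbaby3.cloud.eb']
```

The remaining host is the one where the puppet run actually got far enough
to report the yum error, so that is the host to run the command on by hand.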
-- Hitesh
On Aug 13, 2012, at 2:28 AM, xu peng wrote:
> Hi Hitesh :
>
> It's me again.
>
> Following your advice, I reinstalled the ambari server. But deploying the
> cluster and uninstalling the cluster both failed again. I really don't know why.
>
> I attached an archive which contains the logs of all the nodes in
> my cluster (/var/log/puppet_*.log, /var/log/puppet/*.log,
> /var/log/yum.log, /var/log/hmc/hmc.log). vbaby3.cloud.eb is the
> ambari server.
>
> The attachments DeployError and UninstallError are the logs provided by the
> ambari website on failure, and the attachment DeployingDetails.jpg shows
> the deploy details of my cluster.
>
>
> Thanks again for your patience! I look forward to your reply.
>
> Xupeng
>
> On Sat, Aug 11, 2012 at 10:56 PM, Hitesh Shah <[email protected]> wrote:
>> For uninstall failures, you will need to do a couple of things. Depending on
>> where the uninstall failed, you may have to manually do a killall java on
>> all the nodes to kill any missed processes. If you want to start with a
>> complete clean install, you should also delete the hadoop dir in the mount
>> points you selected during the previous install so that the new fresh
>> install does not face errors when it tries to re-format hdfs.
>>
>> After that, simply uninstall and re-install the ambari rpm, and that should
>> allow you to re-create a fresh cluster.
>>
>> -- Hitesh
>>
>> On Aug 11, 2012, at 2:34 AM, xu peng wrote:
>>
>>> Hi Hitesh :
>>>
>>> Thanks a lot for your reply.
>>>
>>> I solved this problem; it was a silly mistake. Someone had changed the
>>> owner of the "/" dir, and according to the error log, pdsh needs "/" to
>>> be owned by root to proceed.
>>>
>>> After changing the owner of "/" to root, the problem was solved. Thank you
>>> again for your reply.
>>>
>>> I have another question. I had an uninstall failure, and there is no
>>> button on the website for me to roll back, so I don't know what to do
>>> about that. What should I do now to reinstall hadoop?
>>>
>>> Thanks
>>>
>>> On Fri, Aug 10, 2012 at 10:55 PM, Hitesh Shah <[email protected]>
>>> wrote:
>>>> Hi
>>>>
>>>> Currently, the ambari installer requires everything to be run as root. It
>>>> does not detect that the user is not root and use sudo either on the
>>>> master or on the agent nodes.
>>>> Furthermore, it seems like it is failing when trying to use pdsh to make
>>>> remote calls to the host list that you passed in due to the errors
>>>> mentioned in your script. This could be due to how it was installed but I
>>>> am not sure.
>>>>
>>>> Could you switch to become root and run any simple command on all hosts
>>>> using pdsh? If you want to reference exactly how ambari uses pdsh, you can
>>>> look into /usr/share/hmc/php/frontend/commandUtils.php
>>>>
>>>> thanks
>>>> -- Hitesh
>>>>
>>>> On Aug 9, 2012, at 9:04 PM, xu peng wrote:
>>>>
>>>>> According to the error log, is there something wrong with my account?
>>>>>
>>>>> I installed all the dependency modules and ambari as the user
>>>>> "ambari" instead of root. I added the user "ambari" to /etc/sudoers
>>>>> with no password.
>>>>>
>>>>> On Fri, Aug 10, 2012 at 11:49 AM, xu peng <[email protected]> wrote:
>>>>>> There is no 100.log file in the /var/log/hmc dir, only a 55.log file (55
>>>>>> is the highest version number).
>>>>>>
>>>>>> The content of 55.log is :
>>>>>> pdsh@vbaby1: module path "/usr/lib64/pdsh" insecure.
>>>>>> pdsh@vbaby1: "/": Owner not root, current uid, or pdsh executable owner
>>>>>> pdsh@vbaby1: Couldn't load any pdsh modules
>>>>>>
>>>>>> Thanks ~
>>>>>>
>>>>>>
>>>>>> On Fri, Aug 10, 2012 at 11:36 AM, Hitesh Shah <[email protected]>
>>>>>> wrote:
>>>>>>> Sorry - my mistake. The last txn mentioned is 100 so please look for
>>>>>>> the 100.log file.
>>>>>>>
>>>>>>> -- Hitesh
>>>>>>>
>>>>>>>
>>>>>>> On Aug 9, 2012, at 8:34 PM, Hitesh Shah wrote:
>>>>>>>
>>>>>>>> Thanks - will take a look and get back to you.
>>>>>>>>
>>>>>>>> Could you also look at /var/log/hmc/hmc.txn.55.log and see if there
>>>>>>>> are any errors in it?
>>>>>>>>
>>>>>>>> -- Hitesh.
>>>>>>>>
>>>>>>>> On Aug 9, 2012, at 8:00 PM, xu peng wrote:
>>>>>>>>
>>>>>>>>> Hi Hitesh :
>>>>>>>>>
>>>>>>>>> Thanks a lot for your reply. I have tried all your suggestions on my
>>>>>>>>> ambari server, and the results are below.
>>>>>>>>>
>>>>>>>>> 1. I can confirm that the hosts.txt file is empty after the failure at
>>>>>>>>> the step of finding reachable nodes.
>>>>>>>>> 2. I tried creating the hostdetails file on both win7 and redhat; both
>>>>>>>>> attempts failed. (Please see the attached hostdetails file.)
>>>>>>>>> 3. I removed the logging re-directs and ran the .sh script. The script
>>>>>>>>> seems to work: it prints the hostname to the console and
>>>>>>>>> generates a file (whose content is "0") in the same dir. (Please see
>>>>>>>>> the attached result and my .sh script.)
>>>>>>>>> 4. I attached the hmc.log and error_log too. Hope this helps ~
>>>>>>>>>
>>>>>>>>> Thanks ~
>>>>>>>>> Xupeng
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, Aug 10, 2012 at 12:24 AM, Hitesh Shah
>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>> Xupeng, can you confirm that the hosts.txt file at
>>>>>>>>>> /var/run/hmc/clusters/EBHadoop/hosts.txt is empty?
>>>>>>>>>>
>>>>>>>>>> Also, can you ensure that the hostdetails file that you upload does
>>>>>>>>>> not have any special characters that may be creating problems for
>>>>>>>>>> the parsing layer?
>>>>>>>>>>
>>>>>>>>>> In the same dir, there should be an ssh.sh script. Can you create a
>>>>>>>>>> copy of it, edit to remove the logging re-directs to files and run
>>>>>>>>>> the script manually from command-line ( it takes in a hostname as
>>>>>>>>>> the argument ) ? The output of that should show you as to what is
>>>>>>>>>> going wrong.
>>>>>>>>>>
>>>>>>>>>> Also, please look at /var/log/hmc/hmc.log and httpd/error_log to see
>>>>>>>>>> if there are any errors being logged which may shed more light on
>>>>>>>>>> the issue.
>>>>>>>>>>
>>>>>>>>>> thanks
>>>>>>>>>> -- Hitesh
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Aug 9, 2012, at 9:11 AM, Artem Ervits wrote:
>>>>>>>>>>
>>>>>>>>>>> Which file are you supplying in the step? Hostdetail.txt or hosts?
>>>>>>>>>>>
>>>>>>>>>>> From: xupeng.bupt [mailto:[email protected]]
>>>>>>>>>>> Sent: Thursday, August 09, 2012 11:33 AM
>>>>>>>>>>> To: ambari-user
>>>>>>>>>>> Subject: Re: RE: Problem when setting up hadoop cluster step 2
>>>>>>>>>>>
>>>>>>>>>>> Thank you for your reply ~
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I made only one hostdetail.txt file, which contains the names of all
>>>>>>>>>>> servers. I submitted this file on the website, but I still have
>>>>>>>>>>> the same problem: it fails at the step of finding reachable nodes.
>>>>>>>>>>>
>>>>>>>>>>> The error log is : "
>>>>>>>>>>> [ERROR][sequentialScriptExecutor][sequentialScriptRunner.php:272][]:
>>>>>>>>>>> Encountered total failure in transaction 100 while running cmd:
>>>>>>>>>>> /usr/bin/php ./addNodes/findSshableNodes.php with args: EBHadoop
>>>>>>>>>>> root
>>>>>>>>>>> 35 100 36 /var/run/hmc/clusters/EBHadoop/hosts.txt
>>>>>>>>>>> "
>>>>>>>>>>>
>>>>>>>>>>> And my hostdetail.txt file is :"
>>>>>>>>>>> vbaby2.cloud.eb
>>>>>>>>>>> vbaby3.cloud.eb
>>>>>>>>>>> vbaby4.cloud.eb
>>>>>>>>>>> vbaby5.cloud.eb
>>>>>>>>>>> vbaby6.cloud.eb
>>>>>>>>>>> "
>>>>>>>>>>> Thank you very much ~
>>>>>>>>>>>
>>>>>>>>>>> 2012-08-09
>>>>>>>>>>> xupeng.bupt
>>>>>>>>>>> From: Artem Ervits
>>>>>>>>>>> Sent: 2012-08-09 22:16:53
>>>>>>>>>>> To: [email protected]
>>>>>>>>>>> Cc:
>>>>>>>>>>> Subject: RE: Problem when setting up hadoop cluster step 2
>>>>>>>>>>> The installer requires a hosts file which I believe you called
>>>>>>>>>>> hostdetail. Make sure it's the same file. You also mention a
>>>>>>>>>>> hosts.txt and host.txt. You only need one file with the names of
>>>>>>>>>>> all servers.
>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>> From: xu peng [mailto:[email protected]]
>>>>>>>>>>> Sent: Thursday, August 09, 2012 2:02 AM
>>>>>>>>>>> To: [email protected]
>>>>>>>>>>> Subject: Problem when setting up hadoop cluster step 2
>>>>>>>>>>> Hi everyone :
>>>>>>>>>>> I am trying to use ambari to set up a hadoop cluster, but I
>>>>>>>>>>> encountered a problem on step 2. I already set up password-less
>>>>>>>>>>> ssh, and I created a hostdetail.txt file.
>>>>>>>>>>> The problem is that the file
>>>>>>>>>>> "/var/run/hmc/clusters/EBHadoop/hosts.txt" is empty, no matter how
>>>>>>>>>>> many times I submit the host.txt file on the website, and I really
>>>>>>>>>>> don't know why.
>>>>>>>>>>> {
>>>>>>>>>>> Here is the log file : [2012:08:09
>>>>>>>>>>> 05:17:56][ERROR][sequentialScriptExecutor][sequentialScriptRunner.php:272][]:
>>>>>>>>>>> Encountered total failure in transaction 100 while running cmd:
>>>>>>>>>>> /usr/bin/php ./addNodes/findSshableNodes.php with args: EBHadoop
>>>>>>>>>>> root
>>>>>>>>>>> 35 100 36 /var/run/hmc/clusters/EBHadoop/hosts.txt
>>>>>>>>>>> and my host.txt looks like this (vbaby1.cloud.eb is the master node):
>>>>>>>>>>> vbaby2.cloud.eb
>>>>>>>>>>> vbaby3.cloud.eb
>>>>>>>>>>> vbaby4.cloud.eb
>>>>>>>>>>> vbaby5.cloud.eb
>>>>>>>>>>> vbaby6.cloud.eb
>>>>>>>>>>> }
>>>>>>>>>>> Can anyone help me and tell me what I am doing wrong?
>>>>>>>>>>> Thank you very much ~!
>>>>>>>>>>> This electronic message is intended to be for the use only of the
>>>>>>>>>>> named recipient, and may contain information that is confidential
>>>>>>>>>>> or privileged. If you are not the intended recipient, you are
>>>>>>>>>>> hereby notified that any disclosure, copying, distribution or use
>>>>>>>>>>> of the contents of this message is strictly prohibited. If you have
>>>>>>>>>>> received this message in error or are not the named recipient,
>>>>>>>>>>> please notify us immediately by contacting the sender at the
>>>>>>>>>>> electronic mail address noted above, and delete and destroy all
>>>>>>>>>>> copies of this message. Thank you.
>>>>>>>>>>
>>>>>>>>> <hmcLog.txt><hostdetails.txt><httpdLog.txt><ssh1.sh><ssh1_result.jpg>
>>>>>>>>
>>>>>>>
>>>>
>>
> <DeployError1_2012.8.13.txt><log.rar><DeployingDetails.jpg><UninstallError1_2012.8.13.txt>