Re: Problem running example (wrong IP address)
After talking with the Vagrant community, I decided I was being too clever in trying to run the datanodes on a separate subnet from the master node. I changed my configuration to put all three hosts on the same subnet, and everything works just as expected. Thanks for all your help and input.

On Mon, Sep 28, 2015 at 9:07 AM, Daniel Watrous wrote:
> Vinay,
>
> There is no gateway to 51.*. These are IP addresses that I set in my
> Vagrantfile for VirtualBox as part of a private network:
> master.vm.network "private_network", ip: 192.168.51.4
> [...]
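For anyone following along, the single-subnet fix described above can be sketched in a Vagrantfile. This is a minimal sketch under stated assumptions: the box name and the datanode IPs/hostnames are illustrative (only the master's `private_network` line appears in the thread).

```ruby
# Minimal Vagrantfile sketch with all three hosts on one subnet
# (192.168.51.0/24), as described above. The box name and the
# datanode IPs/hostnames are assumptions, not taken from the thread.
Vagrant.configure("2") do |config|
  config.vm.box = "ubuntu/trusty64"

  hosts = {
    "hadoop-master" => "192.168.51.4",
    "hadoop-data1"  => "192.168.51.5",
    "hadoop-data2"  => "192.168.51.6",
  }

  hosts.each do |name, ip|
    config.vm.define name do |node|
      node.vm.hostname = name
      # VirtualBox host-only network; fixed IPs so the nodes always
      # find each other at the same addresses.
      node.vm.network "private_network", ip: ip
    end
  end
end
```

With all guests on one host-only network, traffic between them is switched directly rather than routed through the host, so no gateway can rewrite the source address.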
Re: Problem running example (wrong IP address)
Vinay,

There is no gateway to 51.*. These are IP addresses that I set in my Vagrantfile for VirtualBox as part of a private network:

master.vm.network "private_network", ip: 192.168.51.4

This allows me to spin up all the hosts for my cluster automatically and know that they always have the same IP addresses.

From hadoop-data1 (192.168.52.4) I have unrestricted access to hadoop-master (192.168.51.4):

hadoop@hadoop-data1:~$ ifconfig
eth1      Link encap:Ethernet  HWaddr 08:00:27:b9:55:25
          inet addr:192.168.52.4  Bcast:192.168.52.255  Mask:255.255.255.0
hadoop@hadoop-data1:~$ ping hadoop-master
PING hadoop-master (192.168.51.4) 56(84) bytes of data.
64 bytes from hadoop-master (192.168.51.4): icmp_seq=1 ttl=63 time=3.13 ms
64 bytes from hadoop-master (192.168.51.4): icmp_seq=2 ttl=63 time=2.72 ms

I'm not sure I understand exactly what you're asking for, but from the master I can run this:

vagrant@hadoop-master:~$ sudo netstat -tnulp | grep 54310
tcp        0      0 0.0.0.0:54310           0.0.0.0:*               LISTEN      22944/java

I understand what you're saying about a gateway often existing at that address for a subnet. I'm not familiar enough with Vagrant to answer this right now, but I will put in a question there.

I can also change the other two IP addresses to be on the same 51. subnet. I may try that next.

On Mon, Sep 28, 2015 at 8:33 AM, Vinayakumar B wrote:
> 192.168.51.1 might be the gateway for the 51.* subnet, right?
>
> Can you verify whether connections from outside the 51 subnet to the 51.4
> machine show the other subnet's IP as the remote IP?
>
> You can create any connection; it need not be namenode-datanode.
>
> For example, a connection from the 192.168.52.4 DN to the 192.168.51.4
> namenode should result in the following, when checked using the netstat
> command on the namenode machine: "netstat -tnulp | grep "
>
> The output should be something like this:
>
> tcp        0      0 192.168.51.4:54310      192.168.52.4:32567      LISTEN      -
>
> If the foreign IP is listed as 192.168.51.1 instead of 192.168.52.4, then
> the gateway is not passing the original client IP forward; it is
> re-creating connections with its own IP. In that case the problem is with
> the gateway.
>
> It's just a guess; reality could be different.
>
> Please check and let me know.
>
> -Vinay
>
> [...]
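Vinay's check above boils down to: for a connection on the namenode, is the foreign address the datanode's real IP, or the gateway's? A small sketch of that test against a sample netstat line (the sample port numbers are hypothetical, not from the thread):

```ruby
require "ipaddr"

# Given one `netstat -tn` line for a connection on the namenode, return
# the foreign (remote) IP so it can be compared against the datanode's
# expected subnet. Columns: proto recv-q send-q local-addr foreign-addr state
def foreign_ip(netstat_line)
  netstat_line.split[4].split(":").first
end

line = "tcp        0      0 192.168.51.4:54310      192.168.52.4:32567      ESTABLISHED"
ip = foreign_ip(line)
datanode_subnet = IPAddr.new("192.168.52.0/24")

if datanode_subnet.include?(ip)
  puts "foreign IP #{ip} is the datanode's own address; the route passes it through"
else
  puts "foreign IP #{ip} is NOT in the datanode subnet; a gateway may be rewriting it"
end
```

If the foreign address came back as 192.168.51.1, that would support Vinay's gateway theory; here it is the datanode's own 192.168.52.4.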
Re: Problem running example (wrong IP address)
Thanks to Namikaze for pointing out that I should have sent the namenode log as a pastebin:

http://pastebin.com/u33bBbgu

On Mon, Sep 28, 2015 at 8:02 AM, Daniel Watrous wrote:
> I have posted the namenode logs here:
> https://gist.github.com/dwatrous/dafaa7695698f36a5d93
>
> [...]
Re: Problem running example (wrong IP address)
I have posted the namenode logs here:
https://gist.github.com/dwatrous/dafaa7695698f36a5d93

Thanks for all the help.

On Sun, Sep 27, 2015 at 10:28 AM, Brahma Reddy Battula <brahmareddy.batt...@hotmail.com> wrote:
> Thanks for sharing the logs.
>
> The problem is interesting. Can you please post the namenode logs and the
> dual-IP configuration? (I'm thinking there is a problem with the gateway
> when sending requests from the 52.1 segment to the 51.1 segment.)
>
> Thanks And Regards
> Brahma Reddy Battula
>
> [...]
Re: Problem running example (wrong IP address)
hadoop-master http://pastebin.com/yVF8vCYS
hadoop-data1 http://pastebin.com/xMEdf01e
hadoop-data2 http://pastebin.com/prqd02eZ

On Fri, Sep 25, 2015 at 11:53 AM, Brahma Reddy Battula <brahmareddy.batt...@hotmail.com> wrote:
> Sorry, I am not able to access the logs. Could you please post to pastebin,
> or attach the 192.168.51.6 DN logs (as your query is about the different
> IP) and the namenode logs here?
>
> Thanks And Regards
> Brahma Reddy Battula
>
> [...]
Re: Problem running example (wrong IP address)
Brahma,

Thanks for the reply. I'll keep this conversation here in the user list. The /etc/hosts file is identical on all three nodes:

hadoop@hadoop-data1:~$ cat /etc/hosts
127.0.0.1 localhost
192.168.51.4 hadoop-master
192.168.52.4 hadoop-data1
192.168.52.6 hadoop-data2

hadoop@hadoop-data2:~$ cat /etc/hosts
127.0.0.1 localhost
192.168.51.4 hadoop-master
192.168.52.4 hadoop-data1
192.168.52.6 hadoop-data2

hadoop@hadoop-master:~$ cat /etc/hosts
127.0.0.1 localhost
192.168.51.4 hadoop-master
192.168.52.4 hadoop-data1
192.168.52.6 hadoop-data2

Here are the startup logs for all three nodes:
https://gist.github.com/dwatrous/7241bb804a9be8f9303f
https://gist.github.com/dwatrous/bcd85cda23d6eca3a68b
https://gist.github.com/dwatrous/922c4f773aded0137fa3

Thanks for your help.

On Fri, Sep 25, 2015 at 10:33 AM, Brahma Reddy Battula <brahmareddy.batt...@huawei.com> wrote:
> It seems the DN started on three machines and failed on
> hadoop-data1 (192.168.52.4).
>
> 192.168.51.6 : giving its IP as 192.168.51.1. Can you please check the
> /etc/hosts file of 192.168.51.6? (192.168.51.1 might be configured in
> /etc/hosts.)
>
> 192.168.52.4 : datanode startup might have failed (you can check this
> node's logs).
>
> 192.168.51.4 : datanode startup succeeded; this is the master node.
>
> Thanks & Regards
> Brahma Reddy Battula
>
> [...]
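Since identical /etc/hosts files on every node are the key evidence here, one way to compare each node's view mechanically is to parse the hosts-file text into a hostname-to-IP map. A small sketch, fed the same content as the files quoted above:

```ruby
# Sketch: parse /etc/hosts-style text into a hostname => IP map so the
# view from each node can be compared. The sample content mirrors the
# files quoted above.
def parse_hosts(text)
  text.each_line.with_object({}) do |line, map|
    fields = line.split
    next if fields.empty? || fields[0].start_with?("#")
    ip = fields.shift
    fields.each { |name| map[name] = ip }   # a line may list several aliases
  end
end

hosts_text = <<~HOSTS
  127.0.0.1 localhost
  192.168.51.4 hadoop-master
  192.168.52.4 hadoop-data1
  192.168.52.6 hadoop-data2
HOSTS

map = parse_hosts(hosts_text)
puts map["hadoop-master"]  # => 192.168.51.4
```

Running this against each node's file (e.g. via `parse_hosts(File.read("/etc/hosts"))`) and diffing the maps would confirm the claim that all three nodes resolve the cluster hostnames identically.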
Re: Problem running example (wrong IP address)
I'm still stuck on this and posted it to stackoverflow:

http://stackoverflow.com/questions/32785256/hadoop-datanode-binds-wrong-ip-address

Thanks,
Daniel

On Fri, Sep 25, 2015 at 8:28 AM, Daniel Watrous wrote:
> I could really use some help here. As you can see from the output below,
> the two attached datanodes are identified with a non-existent IP address.
> [...]
Re: Problem running example (wrong IP address)
I could really use some help here. As you can see from the output below, the two attached datanodes are identified with a non-existent IP address. Can someone tell me how that gets selected, or how to explicitly set it? Also, why are both datanodes shown under the same name/IP?

hadoop@hadoop-master:~$ hdfs dfsadmin -report
Configured Capacity: 84482326528 (78.68 GB)
Present Capacity: 75745546240 (70.54 GB)
DFS Remaining: 75744862208 (70.54 GB)
DFS Used: 684032 (668 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0

-------------------------------------------------
Live datanodes (2):

Name: 192.168.51.1:50010 (192.168.51.1)
Hostname: hadoop-data1
Decommission Status : Normal
Configured Capacity: 42241163264 (39.34 GB)
DFS Used: 303104 (296 KB)
Non DFS Used: 4302479360 (4.01 GB)
DFS Remaining: 37938380800 (35.33 GB)
DFS Used%: 0.00%
DFS Remaining%: 89.81%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Fri Sep 25 13:25:37 UTC 2015

Name: 192.168.51.4:50010 (hadoop-master)
Hostname: hadoop-master
Decommission Status : Normal
Configured Capacity: 42241163264 (39.34 GB)
DFS Used: 380928 (372 KB)
Non DFS Used: 4434300928 (4.13 GB)
DFS Remaining: 37806481408 (35.21 GB)
DFS Used%: 0.00%
DFS Remaining%: 89.50%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Fri Sep 25 13:25:38 UTC 2015

On Thu, Sep 24, 2015 at 5:05 PM, Daniel Watrous wrote:
> The IP address is clearly wrong, but I'm not sure how it gets set. Can
> someone tell me how to configure it to choose a valid IP address?
> [...]
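The anomaly in a report like the one above can be spotted mechanically by pulling out each datanode's Name and Hostname and flagging entries registered under a .1 address, which on a VirtualBox host-only network is conventionally the host/gateway rather than a guest. A sketch, fed a trimmed sample of the report text (the flagging heuristic is my assumption, not something from the thread):

```ruby
# Sketch: extract Name/Hostname pairs from `hdfs dfsadmin -report` output
# and flag datanodes registered under a .1 address, which on a VirtualBox
# host-only network is normally the host/gateway, not a guest.
def datanodes(report)
  report.scan(/^Name: (\S+?):\d+ .*?\nHostname: (\S+)/).map do |ip, host|
    { ip: ip, hostname: host, suspect: ip.end_with?(".1") }
  end
end

report = <<~REPORT
  Name: 192.168.51.1:50010 (192.168.51.1)
  Hostname: hadoop-data1
  Name: 192.168.51.4:50010 (hadoop-master)
  Hostname: hadoop-master
REPORT

datanodes(report).each do |dn|
  flag = dn[:suspect] ? "  <-- likely the host-only gateway, not a guest" : ""
  puts "#{dn[:hostname]} registered as #{dn[:ip]}#{flag}"
end
```

Against the sample above, hadoop-data1's entry is flagged: it registered as 192.168.51.1, an address that belongs to no guest in this cluster.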
Re: Problem running example (wrong IP address)
The IP address is clearly wrong, but I'm not sure how it gets set. Can someone tell me how to configure it to choose a valid IP address?
Re: Problem running example (wrong IP address)
I just noticed that both datanodes appear to have chosen that IP address and bound that port for HDFS communication.

http://screencast.com/t/OQNbrWFF

Any idea why this would be? Is there some way to specify which IP/hostname should be used for that?
Problem running example (wrong IP address)
When I try to run a map reduce example, I get the following error:

hadoop@hadoop-master:~$ hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar pi 10 30
Number of Maps  = 10
Samples per Map = 30
15/09/24 20:04:28 INFO hdfs.DFSClient: Exception in createBlockOutputStream
java.io.IOException: Got error, status message , ack with firstBadLink as 192.168.51.1:50010
        at org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:140)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1334)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1237)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:449)
15/09/24 20:04:28 INFO hdfs.DFSClient: Abandoning BP-852923283-127.0.1.1-1443119668806:blk_1073741825_1001
15/09/24 20:04:28 INFO hdfs.DFSClient: Excluding datanode DatanodeInfoWithStorage[192.168.51.1:50010,DS-45f6e06d-752e-41e8-ac25-ca88bce80d00,DISK]
15/09/24 20:04:28 WARN hdfs.DFSClient: Slow waitForAckedSeqno took 65357ms (threshold=3ms)
Wrote input for Map #0

I'm not sure why it's trying to access 192.168.51.1:50010, which isn't even a valid IP address in my setup.

Daniel
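[For the record: the resolution posted at the top of this thread was to put every VM on one private_network subnet, so the VirtualBox host-side address (x.x.51.1) no longer sits between the nodes. A minimal Vagrantfile sketch of that layout follows; the box name and the two datanode IPs are assumptions for illustration — only the master's 192.168.51.4 comes from the thread.]

```ruby
# Illustrative Vagrantfile fragment: all three nodes on the same
# host-only subnet, so no VirtualBox gateway (192.168.51.1) sits
# between the datanodes and the namenode.
Vagrant.configure("2") do |config|
  config.vm.box = "ubuntu/trusty64"   # box name is an assumption

  { "hadoop-master" => "192.168.51.4",   # from the thread
    "hadoop-data1"  => "192.168.51.5",   # illustrative
    "hadoop-data2"  => "192.168.51.6"    # illustrative
  }.each do |name, ip|
    config.vm.define name do |node|
      node.vm.hostname = name
      node.vm.network "private_network", ip: ip
    end
  end
end
```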
Re: Datanodes not connecting to the cluster
Phew! I finally added the property below to yarn-site.xml:

<property>
  <name>yarn.resourcemanager.bind-host</name>
  <value>0.0.0.0</value>
</property>

I now see the datanodes, but not at the same time. Under datanodes in operation I see the master and either one of the other datanodes. Is that typical behavior? Perhaps it's switching between them for redundancy?

Daniel
Re: Datanodes not connecting to the cluster
I'm making a little progress here.

I added the following properties to hdfs-site.xml:

<property>
  <name>dfs.namenode.rpc-bind-host</name>
  <value>0.0.0.0</value>
</property>
<property>
  <name>dfs.namenode.servicerpc-bind-host</name>
  <value>0.0.0.0</value>
</property>

I can now connect to hadoop-master:

hadoop@hadoop-data1:~$ telnet hadoop-master 54310
Trying 192.168.51.4...
Connected to hadoop-master.
Escape character is '^]'.

BUT I'm now getting the error below. I'm confused that it's trying to connect to 192.168.51.1 because that's not even a valid IP in my installation.

2015-09-24 19:40:08,821 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for Block pool BP-852923283-127.0.1.1-1443119668806 (Datanode Uuid null) service to hadoop-master/192.168.51.4:54310 Datanode denied communication with namenode because hostname cannot be resolved (ip=192.168.51.1, hostname=192.168.51.1): DatanodeRegistration(0.0.0.0:50010, datanodeUuid=cd8107ac-f1dd-4c33-a054-2dee95cc4a16, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-56;cid=CID-8e9d0e10-5744-448b-a45a-87644364e714;nsid=1441606918;c=0)

Any idea what's happening here?
Re: Datanodes not connecting to the cluster
In a further test, I tried connecting to the NameNode from hadoop-master (where it's running) using both the hostname and the IP address.

vagrant@hadoop-master:~/src$ telnet 192.168.51.4 54310
Trying 192.168.51.4...
telnet: Unable to connect to remote host: Connection refused
vagrant@hadoop-master:~/src$ telnet localhost 54310
Trying 127.0.0.1...
telnet: Unable to connect to remote host: Connection refused
vagrant@hadoop-master:~/src$ telnet hadoop-master 54310
Trying 127.0.1.1...
Connected to hadoop-master.
Escape character is '^]'.

As you can see, the IP address and localhost connections are refused, but the hostname connection succeeds. Is there some way to configure the namenode to accept connections from all hosts?
Re: Datanodes not connecting to the cluster
On one of the datanodes I have found the following warning:

2015-09-24 18:40:17,639 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: hadoop-master/192.168.51.4:54310

On my master node I see that the process is running and has bound that port:

vagrant@hadoop-master:~/src$ sudo lsof -i :54310
COMMAND  PID   USER  FD   TYPE DEVICE SIZE/OFF NODE NAME
java    7480 hadoop 202u  IPv4  26931      0t0  TCP hadoop-master:54310 (LISTEN)
java    7480 hadoop 212u  IPv4  28758      0t0  TCP hadoop-master:54310->localhost:47226 (ESTABLISHED)
java    7651 hadoop 238u  IPv4  28247      0t0  TCP localhost:47226->hadoop-master:54310 (ESTABLISHED)

hadoop@hadoop-master:~$ jps
7856 SecondaryNameNode
7651 DataNode
7480 NameNode
8106 Jps

I don't appear to have any firewall rules interfering with traffic:

vagrant@hadoop-master:~/src$ sudo iptables --list
Chain INPUT (policy ACCEPT)
target     prot opt source               destination

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

The iptables --list output is identical on hadoop-data1. I also show a process attempting to connect to hadoop-master:

vagrant@hadoop-data1:~$ sudo lsof -i :54310
COMMAND  PID   USER  FD   TYPE DEVICE SIZE/OFF NODE NAME
java    3823 hadoop 238u  IPv4  19304      0t0  TCP hadoop-data1:45600->hadoop-master:54310 (SYN_SENT)

I am confused by the hostname/IP:port notation.

All help appreciated.

Daniel
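[A diagnostic sketch tying the lsof output above to the telnet results elsewhere in this thread: if the NameNode RPC port is bound via the 127.0.1.1 alias Ubuntu puts in /etc/hosts rather than on 0.0.0.0, connections from other machines are refused. The sample socket lines below are illustrative, shaped like the lsof output above, not captured from a real host.]

```shell
# Classify how a daemon port is bound, given netstat/lsof-style socket
# lines. A wildcard (0.0.0.0) bind is reachable from any interface; a
# bind to a specific address such as 127.0.1.1 is not reachable remotely.
samples='tcp 0 0 127.0.1.1:54310 0.0.0.0:* LISTEN 7480/java
tcp 0 0 0.0.0.0:50010 0.0.0.0:* LISTEN 7651/java'

bind_check=$(printf '%s\n' "$samples" | awk '$4 ~ /:54310$/ {
    split($4, a, ":")
    if (a[1] == "0.0.0.0") print "wildcard bind: reachable from any interface"
    else print "bound to " a[1] " only: remote connections will be refused"
}')
echo "$bind_check"
```

On a live node the same check would be fed from `sudo netstat -tnlp` instead of the sample text.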
Datanodes not connecting to the cluster
I have a multi-node cluster with two datanodes. After running start-dfs.sh, I show the following processes running:

hadoop@hadoop-master:~$ $HADOOP_HOME/sbin/slaves.sh jps
hadoop-master: 10933 DataNode
hadoop-master: 10759 NameNode
hadoop-master: 11145 SecondaryNameNode
hadoop-master: 11567 Jps
hadoop-data1: 5186 Jps
hadoop-data1: 5059 DataNode
hadoop-data2: 5180 Jps
hadoop-data2: 5053 DataNode

However, the other two DataNodes aren't visible: http://screencast.com/t/icsLnXXDk

Where can I look for clues?
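[The place that eventually yields the clue in this thread is the NameNode log under $HADOOP_HOME/logs. A small sketch of fishing the failing IP out of a registration warning; the sample log line is copied from a later message in this thread, and the log filename pattern (hadoop-hadoop-namenode-hadoop-master.log) is also from the thread.]

```shell
# Pull the offending IP out of a NameNode registration warning.
# On a real node, grep the log under $HADOOP_HOME/logs instead of
# this sample line borrowed from a later message.
logline='2015-09-23 19:56:27,798 WARN org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Unresolved datanode registration: hostname cannot be resolved (ip=192.168.51.1, hostname=192.168.51.1)'
bad_ip=$(printf '%s\n' "$logline" | grep -oE 'ip=[0-9.]+' | head -n1 | cut -d= -f2)
echo "$bad_ip"
```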
Re: Help troubleshooting multi-cluster setup
I was able to get the jobs submitting to the cluster by adding the following property to mapred-site.xml:

<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>

I also had to add the following properties to yarn-site.xml:

<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>

I'm still not sure why the datanodes don't show up in the nodes view. Is the idea that a data node is only used for HDFS and YARN doesn't schedule jobs there? If so, how can I add additional compute hosts? What are those called?
Re: Help troubleshooting multi-cluster setup
I'm not sure if this is related, but I'm seeing some errors in hadoop-hadoop-namenode-hadoop-master.log:

2015-09-23 19:56:27,798 WARN org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Unresolved datanode registration: hostname cannot be resolved (ip=192.168.51.1, hostname=192.168.51.1)
2015-09-23 19:56:27,800 INFO org.apache.hadoop.ipc.Server: IPC Server handler 6 on 54310, call org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.registerDatanode from 192.168.51.1:54554 Call#373 Retry#0
org.apache.hadoop.hdfs.server.protocol.DisallowedDatanodeException: Datanode denied communication with namenode because hostname cannot be resolved (ip=192.168.51.1, hostname=192.168.51.1): DatanodeRegistration(0.0.0.0:50010, datanodeUuid=8a5d90c8-b909-46d3-80ec-2a3a8f1fe904, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-56;cid=CID-bc60d031-11b0-4eb5-8f9b-da0f8a069ea6;nsid=1223814533;c=0)
        at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.registerDatanode(DatanodeManager.java:863)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.registerDatanode(FSNamesystem.java:4529)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.registerDatanode(NameNodeRpcServer.java:1279)
        at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.registerDatanode(DatanodeProtocolServerSideTranslatorPB.java:95)
        at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:28539)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)

I don't have a server with the IP 192.168.51.1 and I don't think I'm referencing that anywhere. Is there some reason that it's trying to register that host as a datanode?
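[The DisallowedDatanodeException above is the NameNode refusing a datanode whose source IP does not reverse-resolve to a hostname. Fixing name resolution (or, as this thread concludes, the subnet layout) is the real cure, but for completeness the check itself can be relaxed. This is a sketch assuming Hadoop 2.x; verify the property against your version's hdfs-default.xml before relying on it.]

<!-- hdfs-site.xml (sketch, Hadoop 2.x): skip the reverse-DNS check during
     datanode registration. A workaround for the exception above, not a fix
     for broken name resolution. -->
<property>
  <name>dfs.namenode.datanode.registration.ip-hostname-check</name>
  <value>false</value>
</property>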
Re: Hadoop alternate SSH key
That's exactly what I needed. I've posted the answer to my stackoverflow question.

On Wed, Sep 23, 2015 at 1:48 PM, Varun Vasudev wrote:

> Hi Daniel,
>
> Have you tried setting the HADOOP_SSH_OPTS environment variable? Take a
> look at sbin/slaves.sh in your hadoop installation.
>
> -Varun
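[For anyone landing on this thread later, Varun's suggestion amounts to something like the following in hadoop-env.sh; slaves.sh passes $HADOOP_SSH_OPTS to every ssh invocation it makes. The key path below is an assumption for illustration, not taken from the original posts.]

```shell
# hadoop-env.sh sketch: make Hadoop's control scripts (slaves.sh and the
# start/stop scripts that call it) use a specific private key.
# The key path is illustrative.
export HADOOP_SSH_OPTS="-i /home/hadoop/.ssh/hadoop_cluster_rsa -o StrictHostKeyChecking=no"
echo "$HADOOP_SSH_OPTS"
```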
Help troubleshooting multi-cluster setup
Hi,

I have deployed a multi-node cluster with one master and two data nodes. Here's what jps shows:

hadoop@hadoop-master:~$ jps
24641 SecondaryNameNode
24435 DataNode
24261 NameNode
24791 ResourceManager
25483 Jps
24940 NodeManager

hadoop@hadoop-data1:~$ jps
15556 DataNode
16198 NodeManager
16399 Jps

hadoop@hadoop-data2:~$ jps
16418 Jps
15575 DataNode
16216 NodeManager

When I open the web console, I only see one node running: http://screencast.com/t/E6yehRvUbt

Where are the other two nodes? Why don't they show up?

Next I run one of the example scripts:

hadoop@hadoop-master:~$ hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar pi 10 30
Number of Maps  = 10
Samples per Map = 30
Wrote input for Map #0
Wrote input for Map #1
...
Job Finished in 2.956 seconds
Estimated value of Pi is 3.14146667

I can't see this anywhere in the web interface. I thought it might show in the Applications sub-menu. Should I be able to see this? It appears to run successfully.

Daniel
Hadoop alternate SSH key
Hi,

I'm interested in specifying the key which Hadoop should use when communicating with other nodes in the cluster. I posted the question on stackoverflow: http://stackoverflow.com/questions/32527474/hadoop-alternate-ssh-key

Is there someone here that can answer this?

Daniel