I ran into this trouble again. This time, formatting the namenode didn't
help. So, I changed the directories where the metadata and the data were
being stored. That made it work.
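
In case it is useful, here is roughly what that change looks like (just a
sketch; the directory paths below are placeholders, not the ones I actually
used):

  # In conf/hadoop-site.xml, point dfs.name.dir (namenode metadata) and
  # dfs.data.dir (datanode blocks) at fresh directories, for example:
  #   dfs.name.dir  -> /home/hadoop/dfs/name
  #   dfs.data.dir  -> /home/hadoop/dfs/data
  mkdir -p /home/hadoop/dfs/name /home/hadoop/dfs/data
  bin/hadoop namenode -format   # reformat against the new name directory
  bin/start-dfs.sh              # restart HDFS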

You might want to check this out at your end too.

Amandeep

PS: I don't have an explanation for how and why this made it work.


Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz


On Sat, Feb 7, 2009 at 9:06 AM, jason hadoop <jason.had...@gmail.com> wrote:

> On your master machine, use the netstat command to determine what ports and
> addresses the namenode process is listening on.
>
> On the datanode machines, examine the log files to verify that the
> datanode has attempted to connect to the namenode IP address on one of
> those ports, and was successful.
>
> The common ports used for the datanode -> namenode rendezvous are 50010,
> 54320, and 8020, depending on your Hadoop version.
>
> If the datanodes have been started and the connection to the namenode
> failed, there will be a log message with a socket error, indicating what
> host and port the datanode used to attempt to communicate with the
> namenode.
> Verify that that IP address is correct for your namenode and reachable
> from the datanode host (for multi-homed machines this can be an issue),
> and that the port listed is one of the TCP ports that the namenode
> process is listening on.
>
> For Linux, you can use the command
> *netstat -a -t -n -p | grep java | grep LISTEN*
> to determine the IP addresses, ports, and pids of the Java processes that
> are listening for TCP socket connections,
>
> and the jps command from the bin directory of your Java installation to
> determine the pid of the namenode.
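>
> For example, something along these lines on the master (only a sketch;
> replace <namenode-pid> below with whatever pid jps reports for the
> NameNode process):
>
>   jps                                      # prints lines like "<pid> NameNode"
>   netstat -a -t -n -p | grep <namenode-pid> | grep LISTEN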
>
> On Sat, Feb 7, 2009 at 6:27 AM, shefali pawar <shefal...@rediffmail.com> wrote:
>
> > Hi,
> >
> > No, not yet. We are still struggling! If you find the solution, please
> > let me know.
> >
> > Shefali
> >
> > On Sat, 07 Feb 2009 02:56:15 +0530  wrote
> > >I had to change the master on my running cluster and ended up with the
> > >same problem. Were you able to fix it at your end?
> > >
> > >Amandeep
> > >
> > >
> > >Amandeep Khurana
> > >Computer Science Graduate Student
> > >University of California, Santa Cruz
> > >
> > >
> > >On Thu, Feb 5, 2009 at 8:46 AM, shefali pawar wrote:
> > >
> > >> Hi,
> > >>
> > >> I do not think that the firewall is blocking the port because it has
> > >> been turned off on both the computers! And also, since it is a random
> > >> port number, I do not think it should create a problem.
> > >>
> > >> I do not understand what is going wrong!
> > >>
> > >> Shefali
> > >>
> > >> On Wed, 04 Feb 2009 23:23:04 +0530  wrote
> > >> >I'm not certain that the firewall is your problem, but if that port is
> > >> >blocked on your master you should open it to let communication through.
> > >> >Here is one website that might be relevant:
> > >> >
> > >> >http://stackoverflow.com/questions/255077/open-ports-under-fedora-core-8-for-vmware-server
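> > >> >
> > >> >Just as an illustrative sketch (adjust to your setup), opening that port
> > >> >on Fedora would look something like this:
> > >> >
> > >> >  iptables -I INPUT -p tcp --dport 54310 -j ACCEPT
> > >> >  service iptables save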
> > >> >
> > >> >but again, this may not be your problem.
> > >> >
> > >> >John
> > >> >
> > >> >On Wed, Feb 4, 2009 at 12:46 PM, shefali pawar wrote:
> > >> >
> > >> >> Hi,
> > >> >>
> > >> >> I will have to check. I can do that tomorrow in college. But if that
> > >> >> is the case, what should I do?
> > >> >>
> > >> >> Should I change the port number and try again?
> > >> >>
> > >> >> Shefali
> > >> >>
> > >> >> On Wed, 04 Feb 2009 S D wrote:
> > >> >>
> > >> >> >Shefali,
> > >> >> >
> > >> >> >Is your firewall blocking port 54310 on the master?
> > >> >> >
> > >> >> >John
> > >> >> >
> > >> >> >On Wed, Feb 4, 2009 at 12:34 PM, shefali pawar wrote:
> > >> >> >
> > >> >> > > Hi,
> > >> >> > >
> > >> >> > > I am trying to set up a two-node cluster using Hadoop 0.19.0, with
> > >> >> > > 1 master (which should also work as a slave) and 1 slave node.
> > >> >> > >
> > >> >> > > But while running bin/start-dfs.sh the datanode is not starting on
> > >> >> > > the slave. I had read the previous mails on the list, but nothing
> > >> >> > > seems to be working in this case. I am getting the following error
> > >> >> > > in the hadoop-root-datanode-slave log file while running the
> > >> >> > > command bin/start-dfs.sh =>
> > >> >> > >
> > >> >> > > 2009-02-03 13:00:27,516 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
> > >> >> > > /************************************************************
> > >> >> > > STARTUP_MSG: Starting DataNode
> > >> >> > > STARTUP_MSG:  host = slave/172.16.0.32
> > >> >> > > STARTUP_MSG:  args = []
> > >> >> > > STARTUP_MSG:  version = 0.19.0
> > >> >> > > STARTUP_MSG:  build = https://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.19 -r 713890; compiled by 'ndaley' on Fri Nov 14 03:12:29 UTC 2008
> > >> >> > > ************************************************************/
> > >> >> > > 2009-02-03 13:00:28,725 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/172.16.0.46:54310. Already tried 0 time(s).
> > >> >> > > 2009-02-03 13:00:29,726 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/172.16.0.46:54310. Already tried 1 time(s).
> > >> >> > > 2009-02-03 13:00:30,727 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/172.16.0.46:54310. Already tried 2 time(s).
> > >> >> > > 2009-02-03 13:00:31,728 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/172.16.0.46:54310. Already tried 3 time(s).
> > >> >> > > 2009-02-03 13:00:32,729 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/172.16.0.46:54310. Already tried 4 time(s).
> > >> >> > > 2009-02-03 13:00:33,730 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/172.16.0.46:54310. Already tried 5 time(s).
> > >> >> > > 2009-02-03 13:00:34,731 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/172.16.0.46:54310. Already tried 6 time(s).
> > >> >> > > 2009-02-03 13:00:35,732 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/172.16.0.46:54310. Already tried 7 time(s).
> > >> >> > > 2009-02-03 13:00:36,733 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/172.16.0.46:54310. Already tried 8 time(s).
> > >> >> > > 2009-02-03 13:00:37,734 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/172.16.0.46:54310. Already tried 9 time(s).
> > >> >> > > 2009-02-03 13:00:37,738 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Call to master/172.16.0.46:54310 failed on local exception: No route to host
> > >> >> > >        at org.apache.hadoop.ipc.Client.call(Client.java:699)
> > >> >> > >        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
> > >> >> > >        at $Proxy4.getProtocolVersion(Unknown Source)
> > >> >> > >        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:319)
> > >> >> > >        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:306)
> > >> >> > >        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:343)
> > >> >> > >        at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:288)
> > >> >> > >        at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:258)
> > >> >> > >        at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:205)
> > >> >> > >        at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1199)
> > >> >> > >        at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1154)
> > >> >> > >        at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1162)
> > >> >> > >        at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1284)
> > >> >> > > Caused by: java.net.NoRouteToHostException: No route to host
> > >> >> > >        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> > >> >> > >        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
> > >> >> > >        at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:100)
> > >> >> > >        at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:299)
> > >> >> > >        at org.apache.hadoop.ipc.Client$Connection.access$1700(Client.java:176)
> > >> >> > >        at org.apache.hadoop.ipc.Client.getConnection(Client.java:772)
> > >> >> > >        at org.apache.hadoop.ipc.Client.call(Client.java:685)
> > >> >> > >        ... 12 more
> > >> >> > >
> > >> >> > > 2009-02-03 13:00:37,739 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
> > >> >> > > /************************************************************
> > >> >> > > SHUTDOWN_MSG: Shutting down DataNode at slave/172.16.0.32
> > >> >> > > ************************************************************/
> > >> >> > >
> > >> >> > >
> > >> >> > > Also, the pseudo-distributed operation is working on both the
> > >> >> > > machines. And I am able to ssh from 'master to master' and
> > >> >> > > 'master to slave' via a password-less ssh login. I do not think
> > >> >> > > there is any problem with the network because cross-pinging is
> > >> >> > > working fine.
> > >> >> > >
> > >> >> > > I am working on Linux (Fedora 8).
> > >> >> > >
> > >> >> > > The following is the configuration which I am using:
> > >> >> > >
> > >> >> > > On master and slave, /conf/masters looks like this:
> > >> >> > >
> > >> >> > >  master
> > >> >> > >
> > >> >> > > On master and slave, /conf/slaves looks like this:
> > >> >> > >
> > >> >> > >  master
> > >> >> > >  slave
> > >> >> > >
> > >> >> > > On both machines, conf/hadoop-site.xml looks like this:
> > >> >> > >
> > >> >> > > <configuration>
> > >> >> > >  <property>
> > >> >> > >   <name>fs.default.name</name>
> > >> >> > >   <value>hdfs://master:54310</value>
> > >> >> > >   <description>The name of the default file system.  A URI whose
> > >> >> > >   scheme and authority determine the FileSystem implementation.  The
> > >> >> > >   uri's scheme determines the config property (fs.SCHEME.impl) naming
> > >> >> > >   the FileSystem implementation class.  The uri's authority is used to
> > >> >> > >   determine the host, port, etc. for a filesystem.</description>
> > >> >> > >  </property>
> > >> >> > >
> > >> >> > >  <property>
> > >> >> > >   <name>mapred.job.tracker</name>
> > >> >> > >   <value>master:54311</value>
> > >> >> > >   <description>The host and port that the MapReduce job tracker runs
> > >> >> > >   at.  If "local", then jobs are run in-process as a single map
> > >> >> > >   and reduce task.</description>
> > >> >> > >  </property>
> > >> >> > >
> > >> >> > >  <property>
> > >> >> > >   <name>dfs.replication</name>
> > >> >> > >   <value>2</value>
> > >> >> > >   <description>Default block replication.
> > >> >> > >   The actual number of replications can be specified when the file is
> > >> >> > >   created. The default is used if replication is not specified in
> > >> >> > >   create time.</description>
> > >> >> > >  </property>
> > >> >> > > </configuration>
> > >> >> > >
> > >> >> > >
> > >> >> > >
> > >> >> > > The namenode is formatted successfully by running
> > >> >> > >
> > >> >> > > "bin/hadoop namenode -format"
> > >> >> > >
> > >> >> > > on the master node.
> > >> >> > >
> > >> >> > > I am new to Hadoop and I do not know what is going wrong.
> > >> >> > >
> > >> >> > > Any help will be appreciated.
> > >> >> > >
> > >> >> > > Thanking you in advance
> > >> >> > >
> > >> >> > > Shefali Pawar
> > >> >> > > Pune, India
> > >> >> > >
> > >> >>
> > >> >>
> > >> >>
> > >> >
> > >>
> > >
> >
>
