On Sunday 10 August 2008 08:02:52 Dhruba Borthakur wrote:
> In almost all Hadoop configurations, all host names can be specified
> as IP addresses. So, in your hadoop-site.xml, please specify the IP
> address of the namenode (instead of its hostname).
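[For reference, a minimal sketch of what that change looks like in hadoop-site.xml. The address and port below are the examples used later in this thread (namenode reachable at 189.11.131.172, RPC port 9000), not a real deployment; `fs.default.name` is the property name used by Hadoop releases of that era.]

```xml
<configuration>
  <!-- Point clients and datanodes at the namenode's routable IP
       rather than a hostname that resolves to an internal address
       such as 10.1.1.5. -->
  <property>
    <name>fs.default.name</name>
    <value>hdfs://189.11.131.172:9000</value>
  </property>
</configuration>
```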
Which is not necessarily a solution for complicated situations, e.g. where a port is bound to a different IP address than the one used to access it. Basically NAT ;) I mention this because one common environment for Hadoop, EC2, does exactly that ;)

Andreas

> -dhruba
>
> 2008/8/8 Lucas Nazário dos Santos <[EMAIL PROTECTED]>:
> > Thanks Andreas. I'll try it.
> >
> > On Fri, Aug 8, 2008 at 5:47 PM, Andreas Kostyrka <[EMAIL PROTECTED]> wrote:
> > > On Friday 08 August 2008 15:43:46 Lucas Nazário dos Santos wrote:
> > > > You are completely right. It's not safe at all. But this is what I
> > > > have for now: two computers distributed across the Internet. I would
> > > > really appreciate it if anyone could give me a hint on how to
> > > > configure the namenode's IP in a datanode. As I could identify in the
> > > > log files, the datanode keeps trying to connect to the IP 10.1.1.5,
> > > > which is the internal IP of the namenode. I just need a way to say to
> > > > the datanode "Hey, could you instead connect to the IP 172.1.23.2"?
> > >
> > > Your only bet is to set it up in a VPNed environment. That would make
> > > it security-wise okay too.
> > >
> > > Andreas
> > >
> > > > Lucas
> > > >
> > > > On Fri, Aug 8, 2008 at 10:25 AM, Lukáš Vlček <[EMAIL PROTECTED]> wrote:
> > > > > Hi,
> > > > >
> > > > > I am not an expert on Hadoop configuration, but is this safe? As
> > > > > far as I understand, the IP address is public and the connection
> > > > > to the datanode port is not secured. Am I correct?
> > > > >
> > > > > Lukas
> > > > >
> > > > > On Fri, Aug 8, 2008 at 8:35 AM, Lucas Nazário dos Santos
> > > > > <[EMAIL PROTECTED]> wrote:
> > > > > > Hello again,
> > > > > >
> > > > > > In fact I can get the cluster up and running with two nodes in
> > > > > > different LANs. The problem appears when executing a job.
> > > > > >
> > > > > > As you can see in the piece of log below, the datanode tries to
> > > > > > communicate with the namenode using the IP 10.1.1.5. The issue
> > > > > > is that the datanode should be using a valid IP, and not
> > > > > > 10.1.1.5.
> > > > > >
> > > > > > Is there a way of manually configuring the datanode with the
> > > > > > namenode's IP, so I can change from 10.1.1.5 to, say,
> > > > > > 189.11.131.172?
> > > > > >
> > > > > > Thanks,
> > > > > > Lucas
> > > > > >
> > > > > > 2008-08-08 02:34:23,335 INFO org.apache.hadoop.mapred.TaskTracker: TaskTracker up at: localhost/127.0.0.1:60394
> > > > > > 2008-08-08 02:34:23,335 INFO org.apache.hadoop.mapred.TaskTracker: Starting tracker tracker_localhost:localhost/127.0.0.1:60394
> > > > > > 2008-08-08 02:34:23,589 INFO org.apache.hadoop.mapred.TaskTracker: Starting thread: Map-events fetcher for all reduce tasks on tracker_localhost:localhost/127.0.0.1:60394
> > > > > > 2008-08-08 03:06:43,239 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction: task_200808080234_0001_m_000000_0
> > > > > > 2008-08-08 03:07:43,989 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /10.1.1.5:9000. Already tried 1 time(s).
> > > > > > 2008-08-08 03:08:44,999 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /10.1.1.5:9000. Already tried 2 time(s).
> > > > > > 2008-08-08 03:09:45,999 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /10.1.1.5:9000. Already tried 3 time(s).
> > > > > > 2008-08-08 03:10:47,009 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /10.1.1.5:9000. Already tried 4 time(s).
> > > > > > 2008-08-08 03:11:48,009 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /10.1.1.5:9000. Already tried 5 time(s).
> > > > > > 2008-08-08 03:12:49,026 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /10.1.1.5:9000. Already tried 6 time(s).
> > > > > > 2008-08-08 03:13:50,036 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /10.1.1.5:9000. Already tried 7 time(s).
> > > > > > 2008-08-08 03:14:51,046 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /10.1.1.5:9000. Already tried 8 time(s).
> > > > > > 2008-08-08 03:15:52,056 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /10.1.1.5:9000. Already tried 9 time(s).
> > > > > > 2008-08-08 03:16:53,066 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /10.1.1.5:9000. Already tried 10 time(s).
> > > > > > 2008-08-08 03:17:54,077 WARN org.apache.hadoop.mapred.TaskTracker: Error initializing task_200808080234_0001_m_000000_0:
> > > > > > java.net.SocketTimeoutException
> > > > > >     at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:109)
> > > > > >     at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:174)
> > > > > >     at org.apache.hadoop.ipc.Client.getConnection(Client.java:623)
> > > > > >     at org.apache.hadoop.ipc.Client.call(Client.java:546)
> > > > > >     at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:212)
> > > > > >     at org.apache.hadoop.dfs.$Proxy5.getProtocolVersion(Unknown Source)
> > > > > >     at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:313)
> > > > > >     at org.apache.hadoop.dfs.DFSClient.createRPCNamenode(DFSClient.java:102)
> > > > > >     at org.apache.hadoop.dfs.DFSClient.<init>(DFSClient.java:178)
> > > > > >     at org.apache.hadoop.dfs.DistributedFileSystem.initialize(DistributedFileSystem.java:68)
> > > > > >     at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1280)
> > > > > >     at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:56)
> > > > > >     at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1291)
> > > > > >     at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:203)
> > > > > >     at org.apache.hadoop.fs.FileSystem.getNamed(FileSystem.java:152)
> > > > > >     at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:670)
> > > > > >     at org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:1274)
> > > > > >     at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:915)
> > > > > >     at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:1310)
> > > > > >     at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:2251)
> > > > > >
> > > > > > On Fri, Aug 8, 2008 at 12:16 AM, Lucas Nazário dos Santos
> > > > > > <[EMAIL PROTECTED]> wrote:
> > > > > > > Hello,
> > > > > > >
> > > > > > > Can someone point out the extra tasks that need to be
> > > > > > > performed in order to set up a cluster where nodes are spread
> > > > > > > over the Internet, in different LANs?
> > > > > > >
> > > > > > > Do I need to free any datanode/namenode ports? How do I get
> > > > > > > the datanodes to know the valid namenode IP, and not something
> > > > > > > like 10.1.1.1?
> > > > > > >
> > > > > > > Any help is appreciated.
> > > > > > >
> > > > > > > Lucas
> > > > >
> > > > > --
> > > > > http://blog.lukas-vlcek.com/