Re: Setting up a Hadoop cluster where nodes are spread over the Internet
Thanks Andreas. I'll try it.

On Fri, Aug 8, 2008 at 5:47 PM, Andreas Kostyrka <[EMAIL PROTECTED]> wrote:

> On Friday 08 August 2008 15:43:46 Lucas Nazário dos Santos wrote:
> > You are completely right. It's not safe at all. But this is what I have
> > for now: two computers distributed across the Internet. I would really
> > appreciate it if anyone could give me a pointer on how to configure the
> > namenode's IP in a datanode. As I could see in the log files, the
> > datanode keeps trying to connect to the IP 10.1.1.5, which is the
> > internal IP of the namenode. I just need a way to tell the datanode
> > "Hey, could you instead connect to the IP 172.1.23.2"?
>
> Your only bet is to set it up in a VPNed environment. That would make it
> okay security-wise too.
>
> Andreas
>
> [...]
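Andreas's VPN suggestion could be realized with, for example, a static-key OpenVPN point-to-point tunnel between the two machines. The sketch below is illustrative only: the tunnel addresses, file paths, and interface names are assumptions, not taken from the thread; only 189.11.131.172 (the namenode's public IP mentioned in the thread) is.

```text
# Generate a shared key once and copy it to both machines:
#   openvpn --genkey --secret /etc/openvpn/static.key

# Namenode side, e.g. /etc/openvpn/p2p.conf (addresses illustrative):
dev tun
ifconfig 10.8.0.1 10.8.0.2      # local / remote tunnel addresses
secret /etc/openvpn/static.key

# Datanode side:
dev tun
remote 189.11.131.172           # namenode's public IP
ifconfig 10.8.0.2 10.8.0.1
secret /etc/openvpn/static.key
```

With such a tunnel up, both machines' Hadoop configs can use the tunnel addresses (e.g. 10.8.0.1 for the namenode), which sidesteps the private/public IP mismatch and also encrypts the otherwise unsecured datanode traffic that Lukáš raised.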
Re: Setting up a Hadoop cluster where nodes are spread over the Internet
You are completely right. It's not safe at all. But this is what I have for
now: two computers distributed across the Internet. I would really appreciate
it if anyone could give me a pointer on how to configure the namenode's IP in
a datanode. As I could see in the log files, the datanode keeps trying to
connect to the IP 10.1.1.5, which is the internal IP of the namenode. I just
need a way to tell the datanode "Hey, could you instead connect to the IP
172.1.23.2"?

Lucas

On Fri, Aug 8, 2008 at 10:25 AM, Lukáš Vlček <[EMAIL PROTECTED]> wrote:

> Hi,
>
> I am not an expert on Hadoop configuration, but is this safe? As far as I
> understand, the IP address is public and the connection to the datanode
> port is not secured. Am I correct?
>
> Lukas
>
> On Fri, Aug 8, 2008 at 8:35 AM, Lucas Nazário dos Santos
> <[EMAIL PROTECTED]> wrote:
>
> > Hello again,
> >
> > In fact I can get the cluster up and running with two nodes in different
> > LANs. The problem appears when executing a job.
> >
> > As you can see in the piece of log below, the datanode tries to
> > communicate with the namenode using the IP 10.1.1.5. The issue is that
> > the datanode should be using a valid IP, and not 10.1.1.5.
> >
> > Is there a way of manually configuring the datanode with the namenode's
> > IP, so I can change from 10.1.1.5 to, say, 189.11.131.172?
> >
> > Thanks,
> > Lucas
> >
> > [...]
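For reference, on Hadoop releases of that vintage the address a worker dials for the namenode is taken from fs.default.name in conf/hadoop-site.xml, and the jobtracker address from mapred.job.tracker. A sketch of the worker-side override, using the public IP asked about above (the ports shown are the ones from this thread, not mandated defaults):

```xml
<!-- conf/hadoop-site.xml on the datanode/tasktracker machine -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://172.1.23.2:9000</value>  <!-- namenode's public IP -->
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>172.1.23.2:9001</value>  <!-- jobtracker, if co-located with the namenode -->
  </property>
</configuration>
```

Note that this only controls where the worker connects from its side; if the namenode sits behind NAT, its ports still have to be forwarded to it, and the addresses it advertises back may still be internal ones.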
Re: Setting up a Hadoop cluster where nodes are spread over the Internet
Hello again,

In fact I can get the cluster up and running with two nodes in different
LANs. The problem appears when executing a job.

As you can see in the piece of log below, the datanode tries to communicate
with the namenode using the IP 10.1.1.5. The issue is that the datanode
should be using a valid IP, and not 10.1.1.5.

Is there a way of manually configuring the datanode with the namenode's IP,
so I can change from 10.1.1.5 to, say, 189.11.131.172?

Thanks,
Lucas


2008-08-08 02:34:23,335 INFO org.apache.hadoop.mapred.TaskTracker: TaskTracker up at: localhost/127.0.0.1:60394
2008-08-08 02:34:23,335 INFO org.apache.hadoop.mapred.TaskTracker: Starting tracker tracker_localhost:localhost/127.0.0.1:60394
2008-08-08 02:34:23,589 INFO org.apache.hadoop.mapred.TaskTracker: Starting thread: Map-events fetcher for all reduce tasks on tracker_localhost:localhost/127.0.0.1:60394
2008-08-08 03:06:43,239 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction: task_200808080234_0001_m_00_0
2008-08-08 03:07:43,989 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /10.1.1.5:9000. Already tried 1 time(s).
2008-08-08 03:08:44,999 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /10.1.1.5:9000. Already tried 2 time(s).
2008-08-08 03:09:45,999 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /10.1.1.5:9000. Already tried 3 time(s).
2008-08-08 03:10:47,009 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /10.1.1.5:9000. Already tried 4 time(s).
2008-08-08 03:11:48,009 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /10.1.1.5:9000. Already tried 5 time(s).
2008-08-08 03:12:49,026 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /10.1.1.5:9000. Already tried 6 time(s).
2008-08-08 03:13:50,036 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /10.1.1.5:9000. Already tried 7 time(s).
2008-08-08 03:14:51,046 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /10.1.1.5:9000. Already tried 8 time(s).
2008-08-08 03:15:52,056 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /10.1.1.5:9000. Already tried 9 time(s).
2008-08-08 03:16:53,066 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /10.1.1.5:9000. Already tried 10 time(s).
2008-08-08 03:17:54,077 WARN org.apache.hadoop.mapred.TaskTracker: Error initializing task_200808080234_0001_m_00_0:
java.net.SocketTimeoutException
        at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:109)
        at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:174)
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:623)
        at org.apache.hadoop.ipc.Client.call(Client.java:546)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:212)
        at org.apache.hadoop.dfs.$Proxy5.getProtocolVersion(Unknown Source)
        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:313)
        at org.apache.hadoop.dfs.DFSClient.createRPCNamenode(DFSClient.java:102)
        at org.apache.hadoop.dfs.DFSClient.<init>(DFSClient.java:178)
        at org.apache.hadoop.dfs.DistributedFileSystem.initialize(DistributedFileSystem.java:68)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1280)
        at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:56)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1291)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:203)
        at org.apache.hadoop.fs.FileSystem.getNamed(FileSystem.java:152)
        at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:670)
        at org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:1274)
        at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:915)
        at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:1310)
        at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:2251)

On Fri, Aug 8, 2008 at 12:16 AM, Lucas Nazário dos Santos
<[EMAIL PROTECTED]> wrote:

> Hello,
>
> Can someone point out the extra tasks that need to be performed in order
> to set up a cluster where nodes are spread over the Internet, in
> different LANs?
>
> Do I need to free any datanode/namenode ports? How do I get the datanodes
> to know the valid namenode IP, and not something like 10.1.1.1?
>
> Any help is appreciated.
>
> Lucas
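The repeated retries against /10.1.1.5:9000 in the log suggest the worker is resolving the master's hostname to the master's internal LAN address. One blunt workaround in a two-machine setup like this is to pin that name to the public IP in /etc/hosts on the worker. The hostname below is hypothetical; only the IP comes from the thread:

```text
# /etc/hosts on the worker ("master" is a hypothetical hostname
# standing in for whatever name the Hadoop configs use)
189.11.131.172   master
```

The same kind of mapping would be needed in the other direction for any connection the namenode makes back to the datanode, which is why the VPN approach suggested elsewhere in the thread is the more robust fix.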
Setting up a Hadoop cluster where nodes are spread over the Internet
Hello,

Can someone point out the extra tasks that need to be performed in order to
set up a cluster where nodes are spread over the Internet, in different
LANs?

Do I need to free any datanode/namenode ports? How do I get the datanodes to
know the valid namenode IP, and not something like 10.1.1.1?

Any help is appreciated.

Lucas
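Before digging into Hadoop configuration, it can help to verify that the daemon ports are reachable across the Internet at all with a plain TCP probe. A small sketch; the host and the port list are examples drawn from this thread and from typical defaults of that era, not an exhaustive list:

```python
import socket

def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, unreachable hosts
        return False

# Example usage (would probe the public IP mentioned in the thread):
#   for port in (9000,    # namenode RPC (fs.default.name in this thread)
#                9001,    # jobtracker RPC (typical mapred.job.tracker value)
#                50010,   # datanode data transfer (common default)
#                50070):  # namenode web UI (common default)
#       print(port, can_connect("189.11.131.172", port))
```

Any port that reports unreachable from the remote machine needs to be forwarded or opened on the firewall/NAT in front of the corresponding daemon.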