Re: Setting up a Hadoop cluster where nodes are spread over the Internet

2008-08-08 Thread Lucas Nazário dos Santos
Thanks Andreas. I'll try it.


On Fri, Aug 8, 2008 at 5:47 PM, Andreas Kostyrka <[EMAIL PROTECTED]>wrote:

> On Friday 08 August 2008 15:43:46 Lucas Nazário dos Santos wrote:
> > You are completely right. It's not safe at all. But this is what I have
> > for now: two computers distributed across the Internet. I would really
> > appreciate it if anyone could give me a hint on how to configure the
> > namenode's IP in a datanode. As far as I can tell from the log files, the
> > datanode keeps trying to connect to the IP 10.1.1.5, which is the internal
> > IP of the namenode. I just need a way to say to the datanode "Hey, could
> > you instead connect to the IP 172.1.23.2"?
>
> Your only bet is to set it up in a VPNed environment. That would also make
> it okay security-wise.
>
> Andreas
>
> >
> > Lucas
> >
> > On Fri, Aug 8, 2008 at 10:25 AM, Lukáš Vlček <[EMAIL PROTECTED]> wrote:
> > > Hi,
> > >
> > > I am not an expert on Hadoop configuration, but is this safe? As far as
> > > I understand, the IP address is public and the connection to the
> > > datanode port is not secured. Am I correct?
> > >
> > > Lukas
> > >
> > > On Fri, Aug 8, 2008 at 8:35 AM, Lucas Nazário dos Santos <
> > > [EMAIL PROTECTED]> wrote:
> > > > Hello again,
> > > >
> > > > In fact I can get the cluster up and running with two nodes in
> > > > different LANs. The problem appears when executing a job.
> > > >
> > > > As you can see in the piece of log below, the datanode tries to
> > > > communicate with the namenode using the IP 10.1.1.5. The issue is that
> > > > the datanode should be using a valid IP, and not 10.1.1.5.
> > > >
> > > > Is there a way of manually configuring the datanode with the
> > > > namenode's IP, so I can change from 10.1.1.5 to, say, 189.11.131.172?
> > > >
> > > > Thanks,
> > > > Lucas
> > > >
> > > >
> > > > 2008-08-08 02:34:23,335 INFO org.apache.hadoop.mapred.TaskTracker:
> > > > TaskTracker up at: localhost/127.0.0.1:60394
> > > > 2008-08-08 02:34:23,335 INFO org.apache.hadoop.mapred.TaskTracker:
> > > > Starting tracker tracker_localhost:localhost/127.0.0.1:60394
> > > > 2008-08-08 02:34:23,589 INFO org.apache.hadoop.mapred.TaskTracker:
> > > > Starting thread: Map-events fetcher for all reduce tasks on
> > > > tracker_localhost:localhost/127.0.0.1:60394
> > > > 2008-08-08 03:06:43,239 INFO org.apache.hadoop.mapred.TaskTracker:
> > > > LaunchTaskAction: task_200808080234_0001_m_00_0
> > > > 2008-08-08 03:07:43,989 INFO org.apache.hadoop.ipc.Client: Retrying
> > > > connect to server: /10.1.1.5:9000. Already tried 1 time(s).
> > > > 2008-08-08 03:08:44,999 INFO org.apache.hadoop.ipc.Client: Retrying
> > > > connect to server: /10.1.1.5:9000. Already tried 2 time(s).
> > > > 2008-08-08 03:09:45,999 INFO org.apache.hadoop.ipc.Client: Retrying
> > > > connect to server: /10.1.1.5:9000. Already tried 3 time(s).
> > > > 2008-08-08 03:10:47,009 INFO org.apache.hadoop.ipc.Client: Retrying
> > > > connect to server: /10.1.1.5:9000. Already tried 4 time(s).
> > > > 2008-08-08 03:11:48,009 INFO org.apache.hadoop.ipc.Client: Retrying
> > > > connect to server: /10.1.1.5:9000. Already tried 5 time(s).
> > > > 2008-08-08 03:12:49,026 INFO org.apache.hadoop.ipc.Client: Retrying
> > > > connect to server: /10.1.1.5:9000. Already tried 6 time(s).
> > > > 2008-08-08 03:13:50,036 INFO org.apache.hadoop.ipc.Client: Retrying
> > > > connect to server: /10.1.1.5:9000. Already tried 7 time(s).
> > > > 2008-08-08 03:14:51,046 INFO org.apache.hadoop.ipc.Client: Retrying
> > > > connect to server: /10.1.1.5:9000. Already tried 8 time(s).
> > > > 2008-08-08 03:15:52,056 INFO org.apache.hadoop.ipc.Client: Retrying
>

Re: Setting up a Hadoop cluster where nodes are spread over the Internet

2008-08-08 Thread Lucas Nazário dos Santos
You are completely right. It's not safe at all. But this is what I have for
now: two computers distributed across the Internet. I would really
appreciate it if anyone could give me a hint on how to configure the
namenode's IP in a datanode. As far as I can tell from the log files, the
datanode keeps trying to connect to the IP 10.1.1.5, which is the internal
IP of the namenode. I just need a way to say to the datanode "Hey, could you
instead connect to the IP 172.1.23.2"?

Lucas
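
For illustration, a minimal sketch of the kind of change being asked about
here, assuming the namenode's public address is 189.11.131.172 and that it
serves its RPC on port 9000 (the port shown in the logs below). Property
names are from the 0.1x-era hadoop-site.xml, so check them against
hadoop-default.xml for your release:

<?xml version="1.0"?>
<!-- hadoop-site.xml on the datanode/tasktracker machine (sketch only). -->
<configuration>
  <!-- Point the DFS client at the namenode's public address instead of the
       internal 10.1.1.5 one. Address and port are taken from this thread;
       substitute your own. -->
  <property>
    <name>fs.default.name</name>
    <value>hdfs://189.11.131.172:9000</value>
  </property>
</configuration>

Note that the namenode also has to be reachable on that address from outside
its own LAN, which usually means a port-forwarding rule on its router or, as
Andreas suggests elsewhere in the thread, a VPN between the two machines.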


On Fri, Aug 8, 2008 at 10:25 AM, Lukáš Vlček <[EMAIL PROTECTED]> wrote:

> Hi,
>
> I am not an expert on Hadoop configuration, but is this safe? As far as I
> understand, the IP address is public and the connection to the datanode
> port is not secured. Am I correct?
>
> Lukas
>
> On Fri, Aug 8, 2008 at 8:35 AM, Lucas Nazário dos Santos <
> [EMAIL PROTECTED]> wrote:
>
> > Hello again,
> >
> > In fact I can get the cluster up and running with two nodes in different
> > LANs. The problem appears when executing a job.
> >
> > As you can see in the piece of log below, the datanode tries to
> > communicate with the namenode using the IP 10.1.1.5. The issue is that
> > the datanode should be using a valid IP, and not 10.1.1.5.
> >
> > Is there a way of manually configuring the datanode with the namenode's
> > IP, so I can change from 10.1.1.5 to, say, 189.11.131.172?
> >
> > Thanks,
> > Lucas
> >
> >
> > 2008-08-08 02:34:23,335 INFO org.apache.hadoop.mapred.TaskTracker:
> > TaskTracker up at: localhost/127.0.0.1:60394
> > 2008-08-08 02:34:23,335 INFO org.apache.hadoop.mapred.TaskTracker:
> > Starting tracker tracker_localhost:localhost/127.0.0.1:60394
> > 2008-08-08 02:34:23,589 INFO org.apache.hadoop.mapred.TaskTracker:
> > Starting thread: Map-events fetcher for all reduce tasks on
> > tracker_localhost:localhost/127.0.0.1:60394
> > 2008-08-08 03:06:43,239 INFO org.apache.hadoop.mapred.TaskTracker:
> > LaunchTaskAction: task_200808080234_0001_m_00_0
> > 2008-08-08 03:07:43,989 INFO org.apache.hadoop.ipc.Client: Retrying
> > connect to server: /10.1.1.5:9000. Already tried 1 time(s).
> > 2008-08-08 03:08:44,999 INFO org.apache.hadoop.ipc.Client: Retrying
> > connect to server: /10.1.1.5:9000. Already tried 2 time(s).
> > 2008-08-08 03:09:45,999 INFO org.apache.hadoop.ipc.Client: Retrying
> > connect to server: /10.1.1.5:9000. Already tried 3 time(s).
> > 2008-08-08 03:10:47,009 INFO org.apache.hadoop.ipc.Client: Retrying
> > connect to server: /10.1.1.5:9000. Already tried 4 time(s).
> > 2008-08-08 03:11:48,009 INFO org.apache.hadoop.ipc.Client: Retrying
> > connect to server: /10.1.1.5:9000. Already tried 5 time(s).
> > 2008-08-08 03:12:49,026 INFO org.apache.hadoop.ipc.Client: Retrying
> > connect to server: /10.1.1.5:9000. Already tried 6 time(s).
> > 2008-08-08 03:13:50,036 INFO org.apache.hadoop.ipc.Client: Retrying
> > connect to server: /10.1.1.5:9000. Already tried 7 time(s).
> > 2008-08-08 03:14:51,046 INFO org.apache.hadoop.ipc.Client: Retrying
> > connect to server: /10.1.1.5:9000. Already tried 8 time(s).
> > 2008-08-08 03:15:52,056 INFO org.apache.hadoop.ipc.Client: Retrying
> > connect to server: /10.1.1.5:9000. Already tried 9 time(s).
> > 2008-08-08 03:16:53,066 INFO org.apache.hadoop.ipc.Client: Retrying
> > connect to server: /10.1.1.5:9000. Already tried 10 time(s).
> > 2008-08-08 03:17:54,077 WARN org.apache.hadoop.mapred.TaskTracker: Error
> > initializing task_200808080234_0001_m_00_0:
> > java.net.SocketTimeoutException
> >     at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:109)
> >     at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:174)
> >     at org.apache.hadoop.ipc.Client.getConnection(Client.java:623)
> >     at org.apache.hadoop.ipc.Client.call(Client.java:546)
> >     at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:212)
> >     at org.apache.hadoop.dfs.$Proxy5.getProtocolVersion(Unknown Source)
> >     at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:313)
> >     at org.apache.hadoop.dfs.DFSClient.createRPCNamenode(DFSClient.java:102)
> >     at org.apache.hadoop.dfs.DFSClient.<init>(DFSClient.java:178)
> >     at org.apache.hadoop.dfs.DistributedFileSystem.initialize(DistributedFileSystem.java:68)
> >     at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1280)
> >     at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:56)
> >     at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1291)
> >     a

Re: Setting up a Hadoop cluster where nodes are spread over the Internet

2008-08-07 Thread Lucas Nazário dos Santos
Hello again,

In fact I can get the cluster up and running with two nodes in different
LANs. The problem appears when executing a job.

As you can see in the piece of log below, the datanode tries to communicate
with the namenode using the IP 10.1.1.5. The issue is that the datanode
should be using a valid IP, and not 10.1.1.5.

Is there a way of manually configuring the datanode with the namenode's IP,
so I can change from 10.1.1.5 to, say, 189.11.131.172?

Thanks,
Lucas


2008-08-08 02:34:23,335 INFO org.apache.hadoop.mapred.TaskTracker:
TaskTracker up at: localhost/127.0.0.1:60394
2008-08-08 02:34:23,335 INFO org.apache.hadoop.mapred.TaskTracker: Starting
tracker tracker_localhost:localhost/127.0.0.1:60394
2008-08-08 02:34:23,589 INFO org.apache.hadoop.mapred.TaskTracker: Starting
thread: Map-events fetcher for all reduce tasks on
tracker_localhost:localhost/127.0.0.1:60394
2008-08-08 03:06:43,239 INFO org.apache.hadoop.mapred.TaskTracker:
LaunchTaskAction: task_200808080234_0001_m_00_0
2008-08-08 03:07:43,989 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: /10.1.1.5:9000. Already tried 1 time(s).
2008-08-08 03:08:44,999 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: /10.1.1.5:9000. Already tried 2 time(s).
2008-08-08 03:09:45,999 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: /10.1.1.5:9000. Already tried 3 time(s).
2008-08-08 03:10:47,009 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: /10.1.1.5:9000. Already tried 4 time(s).
2008-08-08 03:11:48,009 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: /10.1.1.5:9000. Already tried 5 time(s).
2008-08-08 03:12:49,026 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: /10.1.1.5:9000. Already tried 6 time(s).
2008-08-08 03:13:50,036 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: /10.1.1.5:9000. Already tried 7 time(s).
2008-08-08 03:14:51,046 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: /10.1.1.5:9000. Already tried 8 time(s).
2008-08-08 03:15:52,056 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: /10.1.1.5:9000. Already tried 9 time(s).
2008-08-08 03:16:53,066 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: /10.1.1.5:9000. Already tried 10 time(s).
2008-08-08 03:17:54,077 WARN org.apache.hadoop.mapred.TaskTracker: Error
initializing task_200808080234_0001_m_00_0:
java.net.SocketTimeoutException
    at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:109)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:174)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:623)
    at org.apache.hadoop.ipc.Client.call(Client.java:546)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:212)
    at org.apache.hadoop.dfs.$Proxy5.getProtocolVersion(Unknown Source)
    at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:313)
    at org.apache.hadoop.dfs.DFSClient.createRPCNamenode(DFSClient.java:102)
    at org.apache.hadoop.dfs.DFSClient.<init>(DFSClient.java:178)
    at org.apache.hadoop.dfs.DistributedFileSystem.initialize(DistributedFileSystem.java:68)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1280)
    at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:56)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1291)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:203)
    at org.apache.hadoop.fs.FileSystem.getNamed(FileSystem.java:152)
    at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:670)
    at org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:1274)
    at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:915)
    at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:1310)
    at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:2251)
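
The trace above is the DFS client inside the TaskTracker timing out while
localizing the job: it retries the namenode RPC address ten times and then
gives up, so no amount of retrying helps until the worker is handed a
routable address. Besides fs.default.name, two worker-side settings tend to
matter when the machines sit behind different NATs. The snippet below is only
a sketch: the addresses are made up for illustration, 9001 is an assumed
JobTracker port, and slave.host.name in particular should be checked against
hadoop-default.xml for your release:

<?xml version="1.0"?>
<!-- hadoop-site.xml on the worker (sketch; hypothetical addresses). -->
<configuration>
  <!-- JobTracker address as seen from the worker's network: the master's
       public address, with 9001 as an assumed JobTracker RPC port. -->
  <property>
    <name>mapred.job.tracker</name>
    <value>189.11.131.172:9001</value>
  </property>
  <!-- Address this worker advertises back to the master, so the master does
       not try to contact it on a private 10.x address in return.
       203.0.113.10 is a placeholder. -->
  <property>
    <name>slave.host.name</name>
    <value>203.0.113.10</value>
  </property>
</configuration>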



On Fri, Aug 8, 2008 at 12:16 AM, Lucas Nazário dos Santos <
[EMAIL PROTECTED]> wrote:

> Hello,
>
> Can someone point out what extra tasks need to be performed in order to set
> up a cluster where the nodes are spread over the Internet, in different
> LANs?
>
> Do I need to open any datanode/namenode ports? How do I get the datanodes
> to know the namenode's valid (public) IP, and not something like 10.1.1.1?
>
> Any help is appreciated.
>
> Lucas
>


Setting up a Hadoop cluster where nodes are spread over the Internet

2008-08-07 Thread Lucas Nazário dos Santos
Hello,

Can someone point out what extra tasks need to be performed in order to set
up a cluster where the nodes are spread over the Internet, in different
LANs?

Do I need to open any datanode/namenode ports? How do I get the datanodes to
know the namenode's valid (public) IP, and not something like 10.1.1.1?

Any help is appreciated.

Lucas
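
On the ports question, a rough sketch of what typically has to be reachable
across the two LANs for a setup like this, assuming the default port numbers
of that era (verify against hadoop-default.xml for your release): the
namenode RPC port from fs.default.name (9000 in this thread), the JobTracker
RPC port from mapred.job.tracker, the datanode data-transfer port (50010 by
default), and optionally the web UIs (50070 for the namenode, 50030 for the
JobTracker). Pinning the datanode port explicitly keeps the forwarding rule
predictable; the snippet below is an assumption-level example, not something
taken from this thread:

<?xml version="1.0"?>
<!-- hadoop-site.xml on the datanode (sketch; default-style values). -->
<configuration>
  <!-- Bind the data-transfer service on all interfaces at a fixed port so a
       single NAT/firewall rule can forward it to this machine. -->
  <property>
    <name>dfs.datanode.address</name>
    <value>0.0.0.0:50010</value>
  </property>
</configuration>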