Re: Hadoop topology.script.file.name Form

2009-03-18 Thread Vadim Zaliva
I just got around to configuring this in my hadoop-0.18.3 install, so I
can share my working topology script.
The documentation is a bit confusing on this matter, so I hope this is helpful.

The script is called by the namenode when datanodes first connect to it. It
is passed the IP address of a datanode as a parameter. I do not
particularly like hardcoded data in Python code; perhaps the script
could read this information from a separate configuration file.

Here is my script:

#!/usr/bin/env python

'''
This script is used by Hadoop to determine network/rack topology.  It
should be specified in hadoop-site.xml via the topology.script.file.name
property:

<property>
  <name>topology.script.file.name</name>
  <value>/home/hadoop/topology.py</value>
</property>
'''

import sys

# Rack returned for any address that is not listed in RACK_MAP.
DEFAULT_RACK = '/default/rack0'

# Maps each datanode IP address to its rack ID.
RACK_MAP = { '208.94.2.10' : '/datacenter1/rack0',
             '1.2.3.4' : '/datacenter1/rack0',
             '1.2.3.5' : '/datacenter1/rack0',
             '1.2.3.6' : '/datacenter1/rack0',

             '10.2.3.4' : '/datacenter2/rack0'
}

if len(sys.argv) == 1:
    print(DEFAULT_RACK)
else:
    # Print one rack ID per argument, space-separated, in argument order.
    print(" ".join([RACK_MAP.get(i, DEFAULT_RACK) for i in sys.argv[1:]]))
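
A minimal sketch of the configuration-file idea mentioned above; it is not part of the original script. It assumes Python 3 and a hypothetical data file, /home/hadoop/topology.data, holding one "address rack" pair per line. The file name, its format, and the helper names load_rack_map and resolve are all illustrative assumptions.

```python
#!/usr/bin/env python3
import sys

DEFAULT_RACK = '/default/rack0'


def load_rack_map(lines):
    """Parse 'address rack' pairs, one per line; '#' starts a comment."""
    rack_map = {}
    for line in lines:
        line = line.split('#', 1)[0].strip()
        if not line:
            continue  # blank line or comment-only line
        parts = line.split()
        if len(parts) != 2:
            continue  # skip malformed lines rather than fail
        address, rack = parts
        rack_map[address] = rack
    return rack_map


def resolve(hosts, rack_map, default=DEFAULT_RACK):
    """Return one rack ID per host, in the same order as the input."""
    return [rack_map.get(h, default) for h in hosts]


def main(argv, map_path='/home/hadoop/topology.data'):
    # map_path is a hypothetical location, chosen only for this sketch.
    try:
        with open(map_path) as f:
            rack_map = load_rack_map(f)
    except IOError:
        rack_map = {}  # no data file: every host gets the default rack
    if len(argv) == 1:
        print(DEFAULT_RACK)
    else:
        print(' '.join(resolve(argv[1:], rack_map)))


if __name__ == '__main__':
    main(sys.argv)
```

With a line such as "1.2.3.4 /datacenter1/rack0" in the data file, running the script with 1.2.3.4 as its argument should print that rack; addresses that are not listed fall back to DEFAULT_RACK.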


RE: topology.script.file.name

2008-07-03 Thread Yunhong Gu1



This is my "script", which is actually a C++ program:

#include <iostream>
#include <string>

using namespace std;

int main(int argc, char** argv)
{
   for (int i = 1; i < argc; i++)
   {
      string dn = argv[i];

      if (dn.substr(0, 5) == "rack1")
         cout << "/rack1";
      else if (dn.substr(0, 5) == "rack2")
         cout << "/rack2";
      else if (dn.substr(0, 3) == "192")
         cout << "/rack1";
      else if (dn.substr(0, 2) == "10")
         cout << "/rack2";
      else
         cout << "/rack0";

      cout << " ";
   }

   return 1;
}

I compiled the program as mydns. It accepts any number of IPs and prints
/rack0, /rack1, or /rack2 for each one, all on a single line.


e.g.,
./mydns 192.168.0.1 10.0.0.1
/rack1 /rack2

(I tried other output formats, such as one rack ID per line, which
didn't help.)


I edited hadoop-site.xml and added this:

<property>
  <name>topology.script.file.name</name>
  <value>/home/my/hadoop-0.17.0/mydns</value>
</property>


The program is located at /home/my/hadoop-0.17.0.

My understanding is that "mydns" should be called by 
ScriptBasedMapping.java.


I added some output to file in the mydns program and I can verify that it 
is actually being called, with an input parameter something like 
"192.168.0.1 192.168.0.10 10.0.0.5".


However, when I run ./bin/hadoop fsck, it still tells me that there is
only one rack in the system, and MapReduce programs fail immediately
because of some "topology initialization error" (I can't find the exact
text any more).


Thanks
Yunhong
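
One thing that may be worth checking here, offered as a guess rather than a confirmed diagnosis: the program above ends with "return 1", so it exits with a nonzero status, and a caller that treats a nonzero status as failure would discard its output. My understanding is that Hadoop runs the topology script as a child process and behaves this way, but I have not verified that against the 0.17 source. The Python 3 sketch below runs a topology command the way a caller might and reports a nonzero exit status, a mismatch between the number of arguments and the number of rack IDs, or a rack ID without a leading "/". The function name check_topology_script is hypothetical.

```python
import subprocess
import sys


def check_topology_script(cmd, hosts):
    """Run cmd with hosts as arguments; return a list of problems found."""
    proc = subprocess.run(
        list(cmd) + list(hosts),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    problems = []
    if proc.returncode != 0:
        problems.append('exit status %d (expected 0)' % proc.returncode)
    # Rack IDs are whitespace-separated, so spaces or newlines both work here.
    racks = proc.stdout.split()
    if len(racks) != len(hosts):
        problems.append('got %d rack IDs for %d hosts' % (len(racks), len(hosts)))
    for rack in racks:
        if not rack.startswith('/'):
            problems.append('rack ID %r does not start with "/"' % rack)
    return problems


if __name__ == '__main__' and len(sys.argv) > 2:
    # Usage: check.py /path/to/script host1 [host2 ...]
    for problem in check_topology_script([sys.argv[1]], sys.argv[2:]):
        print(problem)
```

An empty list means none of the three checks fired. For the mydns program as posted, the exit-status check would be expected to fire.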


RE: topology.script.file.name

2008-07-03 Thread Devaraj Das
This is strange. If you don't mind, pls send the script to me.




topology.script.file.name

2008-07-02 Thread Yunhong Gu1



Hello,

I have been trying to figure out how to configure rack awareness. I have
written a script that reads a list of IPs or host names and returns the
same number of rack IDs.


This is my script running:

$./mydns 192.168.1.1 192.168.2.1
/rack0 /rack1

I set topology.script.file.name to the path of this script. I
verified that the script was called by Hadoop and I could see the input
(up to 21 IPs in my case).


However, it seems the output of my script is not correct and Hadoop cannot
use it to get the correct topology (only 1 rack is found by Hadoop no 
matter how I change the format of the output).


Please advise if you know how to do this.

Thanks
Yunhong


RE: Hadoop topology.script.file.name Form

2008-06-27 Thread Yunhong Gu1


I guess what we need is an example of the "script", where to put it,
and what exactly to fill in as the value of the "topology.script.file.name"
entry.


So, I wrote a program called "mydns". I can run the program

./mydns node1.rack1.yahoo.com

It prints "/rack1" to the screen.

Is this correct? Where should I put this program? What should I put in the
configuration file?


Thanks
Yunhong


RE: Hadoop topology.script.file.name Form

2008-06-09 Thread Devaraj Das
This documentation is for the earlier versions. In 0.17 the way in which
racks are dealt with has changed.




Re: Hadoop topology.script.file.name Form

2008-06-08 Thread Yang Chen
Rack Awareness

Typically, large Hadoop clusters are arranged in *racks*, and network traffic
between nodes within the same rack is much more desirable than
network traffic across racks. In addition, the Namenode tries to place
replicas of a block on multiple racks for improved fault tolerance. Hadoop
lets cluster administrators decide which *rack* a node belongs to
through the configuration variable dfs.network.script. When this script is
configured, each node runs the script to determine its *rackid*. A default
installation assumes all nodes belong to the same rack. This feature and its
configuration are further described in the PDF
<http://issues.apache.org/jira/secure/attachment/12345251/Rack_aware_HDFS_proposal.pdf>
attached to HADOOP-692 <http://issues.apache.org/jira/browse/HADOOP-692>.

Hope this will be helpful.



YC



RE: Hadoop topology.script.file.name Form

2008-06-08 Thread Devaraj Das
Hi Iver,
The implementation of the script depends on your setup. The main thing is
that it should accept a list of IP addresses and DNS names and give back
a rack ID for each. There is a one-to-one correspondence between what you
pass in and what you get back. To get the rack ID, the script could deduce
it from the IP address, query some service (similar to the way DNS works,
or some similar mechanism), or, in the extreme case, read a file that has
the mapping from IP address to rack ID.
Thanks,
Devaraj.
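
As a rough illustration of the first option described above, deducing the rack from the IP address, here is a Python 3 sketch. The subnets, the rack names, and the fallback /default-rack are made-up placeholders, and the names rack_for and resolve are hypothetical. Host names that are not IP literals simply get the fallback here; a real setup would need its own rule for them.

```python
#!/usr/bin/env python3
import ipaddress
import sys

# Placeholder subnet-to-rack table; replace with the real network layout.
SUBNET_RACKS = [
    (ipaddress.ip_network('192.168.1.0/24'), '/rack1'),
    (ipaddress.ip_network('192.168.2.0/24'), '/rack2'),
]
DEFAULT_RACK = '/default-rack'


def rack_for(host):
    """Return the rack of the first subnet that contains host."""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return DEFAULT_RACK  # not an IP literal, e.g. a DNS name
    for network, rack in SUBNET_RACKS:
        if addr in network:
            return rack
    return DEFAULT_RACK


def resolve(hosts):
    """One rack ID per input, in input order (one-to-one correspondence)."""
    return [rack_for(h) for h in hosts]


if __name__ == '__main__':
    # Print all rack IDs on one line, separated by spaces.
    print(' '.join(resolve(sys.argv[1:])))
```

Because resolve returns exactly one entry per argument, the output always lines up with the input, including for addresses that match no subnet.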




Hadoop topology.script.file.name Form

2008-06-05 Thread 田超
hi,

I want to set up a Hadoop cluster, and I want to make the cluster rack
aware. But I can't find any documentation about the form of
topology.script.file.name.

Could anybody give me an example of the form of topology.script.file.name?

thanks a lot.

iver

2008-06-06