Re: Hadoop topology.script.file.name Form
I just got around to configuring this in my hadoop-0.18.3 install, and I can share my working topology script. The documentation is a bit confusing on this matter, so I hope this is helpful. The script is called by the namenode as datanodes first connect to it; it is passed the IP address of a datanode as a parameter. I do not particularly like hardcoded data in Python code; perhaps the script could read this information from a separate configuration file. Here is my script:

#!/usr/bin/env python
'''
This script is used by hadoop to determine network/rack topology. It
should be specified in hadoop-site.xml via the topology.script.file.name
property:

  <property>
    <name>topology.script.file.name</name>
    <value>/home/hadoop/topology.py</value>
  </property>
'''

import sys
from string import join

DEFAULT_RACK = '/default/rack0'

RACK_MAP = { '208.94.2.10' : '/datacenter1/rack0',
             '1.2.3.4'     : '/datacenter1/rack0',
             '1.2.3.5'     : '/datacenter1/rack0',
             '1.2.3.6'     : '/datacenter1/rack0',
             '10.2.3.4'    : '/datacenter2/rack0' }

if len(sys.argv) == 1:
    print DEFAULT_RACK
else:
    print join([RACK_MAP.get(i, DEFAULT_RACK) for i in sys.argv[1:]], " ")
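Since I mentioned reading the map from a separate configuration file: a rough sketch of that variant could look like the following. This is untested; the map-file format and the function names (`load_rack_map`, `resolve`) are just made up for illustration.

```python
import sys

DEFAULT_RACK = '/default/rack0'

def load_rack_map(path):
    # Each line of the (hypothetical) map file holds "IP rack-id",
    # e.g. "1.2.3.4 /datacenter1/rack0"; malformed lines are skipped.
    rack_map = {}
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) == 2:
                rack_map[parts[0]] = parts[1]
    return rack_map

def resolve(rack_map, names):
    # Same contract as the script above: one rack id per argument,
    # space-separated, falling back to DEFAULT_RACK for unknown hosts.
    if not names:
        return DEFAULT_RACK
    return ' '.join(rack_map.get(n, DEFAULT_RACK) for n in names)
```

The namenode would invoke it the same way, so the entry point would just be something like `print(resolve(load_rack_map('/home/hadoop/rack_map.txt'), sys.argv[1:]))` (path hypothetical).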
RE: topology.script.file.name
This is my "script", which is actually a C++ program:

#include <iostream>
#include <string>

using namespace std;

int main(int argc, char** argv)
{
   for (int i = 1; i < argc; i++) {
      string dn = argv[i];
      if (dn.substr(0, 5) == "rack1")
         cout << "/rack1";
      else if (dn.substr(0, 5) == "rack2")
         cout << "/rack2";
      else if (dn.substr(0, 3) == "192")
         cout << "/rack1";
      else if (dn.substr(0, 2) == "10")
         cout << "/rack2";
      else
         cout << "/rack0";
      cout << " ";
   }

   return 1;
}

I compiled the program as mydns. It can accept many IPs and print /rack0, /rack1, or /rack2 in a row, e.g.:

./mydns 192.168.0.1 10.0.0.1
/rack1 /rack2

(I tried other possible output formats, like each rack ID on its own row, which didn't help.) I configured hadoop-site.xml and added this:

  <property>
    <name>topology.script.file.name</name>
    <value>/home/my/hadoop-0.17.0/mydns</value>
  </property>

The program is located at /home/my/hadoop-0.17.0. My understanding is that mydns should be called by ScriptBasedMapping.java. I added some output-to-file logging in the mydns program, and I can verify that it is actually being called, with an input parameter something like "192.168.0.1 192.168.0.10 10.0.0.5".

However, when I ran ./bin/hadoop fsck, it still told me that there is only one rack in the system, and the MapReduce program immediately failed because of some "topology initialization error" (I couldn't find the exact text any more).

Thanks
Yunhong

On Thu, 3 Jul 2008, Devaraj Das wrote:

> This is strange. If you don't mind, pls send the script to me.
>
> -Original Message-
> From: Yunhong Gu1 [mailto:[EMAIL PROTECTED]
> Sent: Thursday, July 03, 2008 9:49 AM
> To: core-user@hadoop.apache.org
> Subject: topology.script.file.name
>
> Hello,
>
> I have been trying to figure out how to configure rack awareness. I
> have written a script that reads a list of IPs or host names and
> returns a list of rack IDs of the same number.
>
> This is my script running:
>
> $ ./mydns 192.168.1.1 192.168.2.1
> /rack0 /rack1
>
> I specified the path of this script to topology.script.file.name. I
> verified that this script was called by Hadoop and I could see the
> input (up to 21 IPs in my case).
>
> However, it seems the output of my script is not correct and Hadoop
> cannot use it to get the correct topology (only 1 rack is found by
> Hadoop no matter how I change the format of the output).
>
> Please advise if you know how to do this.
>
> Thanks
> Yunhong
RE: topology.script.file.name
This is strange. If you don't mind, pls send the script to me.

> -Original Message-
> From: Yunhong Gu1 [mailto:[EMAIL PROTECTED]
> Sent: Thursday, July 03, 2008 9:49 AM
> To: core-user@hadoop.apache.org
> Subject: topology.script.file.name
>
> Hello,
>
> I have been trying to figure out how to configure rack awareness. I
> have written a script that reads a list of IPs or host names and
> returns a list of rack IDs of the same number.
>
> This is my script running:
>
> $ ./mydns 192.168.1.1 192.168.2.1
> /rack0 /rack1
>
> I specified the path of this script to topology.script.file.name. I
> verified that this script was called by Hadoop and I could see the
> input (up to 21 IPs in my case).
>
> However, it seems the output of my script is not correct and Hadoop
> cannot use it to get the correct topology (only 1 rack is found by
> Hadoop no matter how I change the format of the output).
>
> Please advise if you know how to do this.
>
> Thanks
> Yunhong
topology.script.file.name
Hello,

I have been trying to figure out how to configure rack awareness. I have written a script that reads a list of IPs or host names and returns a list of rack IDs of the same number.

This is my script running:

$ ./mydns 192.168.1.1 192.168.2.1
/rack0 /rack1

I specified the path of this script to topology.script.file.name. I verified that this script was called by Hadoop and I could see the input (up to 21 IPs in my case).

However, it seems the output of my script is not correct and Hadoop cannot use it to get the correct topology (only 1 rack is found by Hadoop no matter how I change the format of the output).

Please advise if you know how to do this.

Thanks
Yunhong
RE: Hadoop topology.script.file.name Form
I guess what we need is an example of the "script", where to put it, and what exactly to fill in as the value of the topology.script.file.name entry.

So, I wrote a program called "mydns". I can run the program:

./mydns node1.rack1.yahoo.com

It prints "/rack1" to the screen. Is this correct? Where do I put this program? What do I fill in in the configuration file?

Thanks
Yunhong

On Mon, 9 Jun 2008, Devaraj Das wrote:

> This documentation is for the earlier versions. In 0.17 the way in
> which racks are dealt with has changed.
>
> -Original Message-
> From: Yang Chen [mailto:[EMAIL PROTECTED]
> Sent: Sunday, June 08, 2008 8:06 PM
> To: core-user@hadoop.apache.org
> Subject: Re: Hadoop topology.script.file.name Form
>
> Rack Awareness
>
> Typically large Hadoop clusters are arranged in *racks*, and network
> traffic between different nodes within the same rack is much more
> desirable than network traffic across the racks. In addition, the
> Namenode tries to place replicas of a block on multiple racks for
> improved fault tolerance. Hadoop lets the cluster administrators
> decide which *rack* a node belongs to through the configuration
> variable dfs.network.script. When this script is configured, each node
> runs the script to determine its *rackid*. A default installation
> assumes all the nodes belong to the same rack. This feature and
> configuration is further described in the PDF
> <http://issues.apache.org/jira/secure/attachment/12345251/Rack_aware_HDFS_proposal.pdf>
> attached to HADOOP-692 <http://issues.apache.org/jira/browse/HADOOP-692>.
>
> Hope this will be helpful.
>
> YC
>
> On Sun, Jun 8, 2008 at 9:53 PM, Devaraj Das <[EMAIL PROTECTED]> wrote:
>
> > Hi Iver,
> > The implementation of the script depends on your setup. The main
> > thing is that it should be able to accept a bunch of IP addresses
> > and DNS names and give back the rackIDs for each. It is a one-to-one
> > correspondence between what you pass and what you get back. For
> > getting the rackID the script could deduce it from the IP address,
> > or query some service (similar to the way DNS works, or some similar
> > mechanism), or, in the extreme case, read a file that has the
> > mapping from IP address to rackId.
> > Thanks,
> > Devaraj.
> >
> > > -Original Message-
> > > From: ?? [mailto:[EMAIL PROTECTED]
> > > Sent: Friday, June 06, 2008 8:13 AM
> > > To: core-user
> > > Subject: Hadoop topology.script.file.name Form
> > >
> > > hi,
> > >
> > > I want to set up a hadoop cluster, and I want to make the cluster
> > > rack aware. But I can't find any document about the form of
> > > topology.script.file.name.
> > >
> > > Could anybody give me an example of the form of
> > > topology.script.file.name?
> > >
> > > thanks a lot.
> > >
> > > iver
> > > 2008-06-06
RE: Hadoop topology.script.file.name Form
This documentation is for the earlier versions. In 0.17 the way in which racks are dealt with has changed.

> -Original Message-
> From: Yang Chen [mailto:[EMAIL PROTECTED]
> Sent: Sunday, June 08, 2008 8:06 PM
> To: core-user@hadoop.apache.org
> Subject: Re: Hadoop topology.script.file.name Form
>
> Rack Awareness
>
> Typically large Hadoop clusters are arranged in *racks*, and network
> traffic between different nodes within the same rack is much more
> desirable than network traffic across the racks. In addition, the
> Namenode tries to place replicas of a block on multiple racks for
> improved fault tolerance. Hadoop lets the cluster administrators
> decide which *rack* a node belongs to through the configuration
> variable dfs.network.script. When this script is configured, each node
> runs the script to determine its *rackid*. A default installation
> assumes all the nodes belong to the same rack. This feature and
> configuration is further described in the PDF
> <http://issues.apache.org/jira/secure/attachment/12345251/Rack_aware_HDFS_proposal.pdf>
> attached to HADOOP-692 <http://issues.apache.org/jira/browse/HADOOP-692>.
>
> Hope this will be helpful.
>
> YC
>
> On Sun, Jun 8, 2008 at 9:53 PM, Devaraj Das <[EMAIL PROTECTED]> wrote:
>
> > Hi Iver,
> > The implementation of the script depends on your setup. The main
> > thing is that it should be able to accept a bunch of IP addresses
> > and DNS names and give back the rackIDs for each. It is a one-to-one
> > correspondence between what you pass and what you get back. For
> > getting the rackID the script could deduce it from the IP address,
> > or query some service (similar to the way DNS works, or some similar
> > mechanism), or, in the extreme case, read a file that has the
> > mapping from IP address to rackId.
> > Thanks,
> > Devaraj.
> >
> > > -Original Message-
> > > From: ?? [mailto:[EMAIL PROTECTED]
> > > Sent: Friday, June 06, 2008 8:13 AM
> > > To: core-user
> > > Subject: Hadoop topology.script.file.name Form
> > >
> > > hi,
> > >
> > > I want to set up a hadoop cluster, and I want to make the cluster
> > > rack aware. But I can't find any document about the form of
> > > topology.script.file.name.
> > >
> > > Could anybody give me an example of the form of
> > > topology.script.file.name?
> > >
> > > thanks a lot.
> > >
> > > iver
> > > 2008-06-06
Re: Hadoop topology.script.file.name Form
Rack Awareness

Typically large Hadoop clusters are arranged in *racks*, and network traffic between different nodes within the same rack is much more desirable than network traffic across the racks. In addition, the Namenode tries to place replicas of a block on multiple racks for improved fault tolerance. Hadoop lets the cluster administrators decide which *rack* a node belongs to through the configuration variable dfs.network.script. When this script is configured, each node runs the script to determine its *rackid*. A default installation assumes all the nodes belong to the same rack. This feature and configuration is further described in the PDF <http://issues.apache.org/jira/secure/attachment/12345251/Rack_aware_HDFS_proposal.pdf> attached to HADOOP-692 <http://issues.apache.org/jira/browse/HADOOP-692>.

Hope this will be helpful.

YC

On Sun, Jun 8, 2008 at 9:53 PM, Devaraj Das <[EMAIL PROTECTED]> wrote:

> Hi Iver,
> The implementation of the script depends on your setup. The main thing
> is that it should be able to accept a bunch of IP addresses and DNS
> names and give back the rackIDs for each. It is a one-to-one
> correspondence between what you pass and what you get back. For
> getting the rackID the script could deduce it from the IP address, or
> query some service (similar to the way DNS works, or some similar
> mechanism), or, in the extreme case, read a file that has the mapping
> from IP address to rackId.
> Thanks,
> Devaraj.
>
> > -Original Message-
> > From: ?? [mailto:[EMAIL PROTECTED]
> > Sent: Friday, June 06, 2008 8:13 AM
> > To: core-user
> > Subject: Hadoop topology.script.file.name Form
> >
> > hi,
> >
> > I want to set up a hadoop cluster, and I want to make the cluster
> > rack aware. But I can't find any document about the form of
> > topology.script.file.name.
> >
> > Could anybody give me an example of the form of
> > topology.script.file.name?
> >
> > thanks a lot.
> >
> > iver
> > 2008-06-06
RE: Hadoop topology.script.file.name Form
Hi Iver,

The implementation of the script depends on your setup. The main thing is that it should be able to accept a bunch of IP addresses and DNS names and give back the rackIDs for each. It is a one-to-one correspondence between what you pass and what you get back. For getting the rackID the script could deduce it from the IP address, or query some service (similar to the way DNS works, or some similar mechanism), or, in the extreme case, read a file that has the mapping from IP address to rackId.

Thanks,
Devaraj.

> -Original Message-
> From: ?? [mailto:[EMAIL PROTECTED]
> Sent: Friday, June 06, 2008 8:13 AM
> To: core-user
> Subject: Hadoop topology.script.file.name Form
>
> hi,
>
> I want to set up a hadoop cluster, and I want to make the cluster
> rack aware. But I can't find any document about the form of
> topology.script.file.name.
>
> Could anybody give me an example of the form of
> topology.script.file.name?
>
> thanks a lot.
>
> iver
> 2008-06-06
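The "deduce it from the IP address" option above could be sketched like this. The addressing scheme here (rack number encoded in the third octet of the IPv4 address) and the function names are purely illustrative assumptions, not anything Hadoop prescribes:

```python
DEFAULT_RACK = '/default-rack'

def rack_for(name):
    # Assumed scheme (illustrative only): the third octet of an IPv4
    # address encodes the rack number; hostnames and anything else
    # fall back to the default rack.
    octets = name.split('.')
    if len(octets) == 4 and all(o.isdigit() for o in octets):
        return '/rack' + octets[2]
    return DEFAULT_RACK

def map_all(args):
    # One rack id per input, in order -- the one-to-one correspondence
    # described above.
    return ' '.join(rack_for(a) for a in args)
```

Invoked with the arguments Hadoop passes, `map_all(['10.0.1.1', '10.0.2.1'])` would print a rack per address in order.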
Hadoop topology.script.file.name Form
hi,

I want to set up a hadoop cluster, and I want to make the cluster rack aware. But I can't find any document about the form of topology.script.file.name.

Could anybody give me an example of the form of topology.script.file.name?

thanks a lot.

iver
2008-06-06