This is my "script", which is actually a C++ program:
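#include <iostream>
#include <string>
using namespace std;

// Print one rack ID per host name/IP given on the command line,
// all on one line, separated by spaces.
int main(int argc, char** argv)
{
    for (int i = 1; i < argc; i++)
    {
        string dn = argv[i];
        if (dn.substr(0, 5) == "rack1")
            cout << "/rack1";
        else if (dn.substr(0, 5) == "rack2")
            cout << "/rack2";
        else if (dn.substr(0, 3) == "192")
            cout << "/rack1";
        else if (dn.substr(0, 2) == "10")
            cout << "/rack2";
        else
            cout << "/rack0";
        cout << " ";
    }
    cout << endl;
    // Exit with status 0: a caller such as Hadoop may treat a nonzero
    // status as a failed lookup and ignore the output.
    return 0;
}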
I compiled the program as mydns. It accepts any number of IPs and prints
one of /rack0, /rack1, or /rack2 for each of them, all on a single line.
For example:
./mydns 192.168.0.1 10.0.0.1
/rack1 /rack2
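The prefix rules in mydns can be pulled out into a plain function so they can be checked without involving Hadoop at all. This is a minimal sketch: `rackFor` is a hypothetical name, and the rules are copied unchanged from the program above (note that the "10" prefix also matches addresses such as 100.1.2.3).

```cpp
#include <string>

// Same prefix rules as mydns, as a testable function.
// rackFor is a hypothetical helper name, not part of Hadoop.
std::string rackFor(const std::string& name)
{
    // compare(0, n, s) == 0 means "name starts with s"; it only looks at
    // the first min(n, name.size()) characters, so short names are safe.
    if (name.compare(0, 5, "rack1") == 0) return "/rack1";
    if (name.compare(0, 5, "rack2") == 0) return "/rack2";
    if (name.compare(0, 3, "192") == 0)   return "/rack1";
    if (name.compare(0, 2, "10") == 0)    return "/rack2";
    return "/rack0";
}
```

Calling it from a small main over argv reproduces the mydns output, while the function itself can be unit-tested directly.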
(I also tried other output formats, such as one rack ID per line, but
that didn't help.)
I added the following to hadoop-site.xml:
<property>
<name>topology.script.file.name</name>
<value>/home/my/hadoop-0.17.0/mydns</value>
</property>
The program is located at /home/my/hadoop-0.17.0.
My understanding is that "mydns" should be called by
ScriptBasedMapping.java.
I made the mydns program log its arguments to a file, and I can verify
that it is actually being called, with input such as
"192.168.0.1 192.168.0.10 10.0.0.5".
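The logged input does not show whether those addresses arrive as separate arguments or as one space-separated argument. A program that splits every argument on whitespace handles both cases. This is a sketch under that assumption; `collectNames` is a hypothetical helper, not part of Hadoop.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Collect host names/IPs from argv, splitting each argument on whitespace,
// so "a b" passed as one argument yields the same names as "a" "b".
std::vector<std::string> collectNames(int argc, char** argv)
{
    std::vector<std::string> names;
    for (int i = 1; i < argc; ++i)
    {
        std::istringstream in(argv[i]);
        std::string token;
        while (in >> token)   // operator>> skips leading whitespace
            names.push_back(token);
    }
    return names;
}
```

The script would then print exactly one rack ID per entry of the returned vector, which keeps the output count equal to the number of names received.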
However, when I run ./bin/hadoop fsck, it still reports only one rack in
the system, and MapReduce programs fail immediately with some kind of
"topology initialization error" (I can't find the exact text any more).
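One thing worth ruling out is the exit status: a caller that runs the script as a subprocess may treat a nonzero status (for example, `return 1;` from main) as an error and ignore whatever was printed. The sketch below runs a command through the shell, captures its stdout, and returns its exit status, so both can be inspected outside Hadoop. It is POSIX-only (popen/pclose), and `runScript` is a hypothetical helper name.

```cpp
#include <cstdio>
#include <string>
#include <sys/wait.h>

// Run a shell command, capture its stdout into `out`, and return its exit
// status, or -1 if it could not be started or did not exit normally.
int runScript(const std::string& cmd, std::string& out)
{
    out.clear();
    FILE* pipe = popen(cmd.c_str(), "r");
    if (pipe == nullptr)
        return -1;
    char buf[256];
    std::size_t n;
    while ((n = std::fread(buf, 1, sizeof buf, pipe)) > 0)
        out.append(buf, n);
    int status = pclose(pipe);
    if (status == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

For example, `runScript("./mydns 192.168.0.1 10.0.0.1", out)` shows the printed rack IDs in `out` together with the status the program exits with.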
Thanks
Yunhong
On Thu, 3 Jul 2008, Devaraj Das wrote:
This is strange. If you don't mind, please send the script to me.
-----Original Message-----
From: Yunhong Gu1 [mailto:[EMAIL PROTECTED]
Sent: Thursday, July 03, 2008 9:49 AM
To: core-user@hadoop.apache.org
Subject: topology.script.file.name
Hello,
I have been trying to figure out how to configure rack
awareness. I have written a script that reads a list of IPs
or host names and returns the same number of rack IDs, one per input.
Here is my script running:
$./mydns 192.168.1.1 192.168.2.1
/rack0 /rack1
I set topology.script.file.name to the path of this script.
I verified that Hadoop calls the script, and I could see its
input (up to 21 IPs in my case).
However, the output of my script seems to be wrong: Hadoop
cannot use it to build the correct topology and finds only one
rack, no matter how I change the format of the output.
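The contract described above (one rack ID per input name) can be checked mechanically before blaming the format. This sketch assumes the caller splits the script's stdout on any whitespace and expects each rack ID to be a path beginning with "/"; both are assumptions, and `validRackOutput` is a hypothetical helper name.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Return true if `output` holds exactly `expectedCount` whitespace-separated
// tokens and every token starts with '/' (e.g. "/rack1").
bool validRackOutput(const std::string& output, std::size_t expectedCount)
{
    std::istringstream in(output);
    std::string id;
    std::size_t count = 0;
    while (in >> id)
    {
        if (id[0] != '/')   // tokens produced by >> are never empty
            return false;
        ++count;
    }
    return count == expectedCount;
}
```

Under these assumptions, spaces versus one ID per line make no difference, so if the check passes, the format is probably not the problem and the exit status or argument handling is worth a look instead.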
Please advise if you know how to do this.
Thanks
Yunhong