This is my "script", which is actually a C++ program:

#include <iostream>
#include <string>

using namespace std;

int main(int argc, char** argv)
{
   for (int i = 1; i < argc; i ++ )
   {
      string dn = argv[i];

      if (dn.substr(0, 5) == "rack1")
         cout << "/rack1";
      else if (dn.substr(0, 5) == "rack2")
         cout << "/rack2";
      else if (dn.substr(0, 3) == "192")
         cout << "/rack1";
      else if (dn.substr(0, 2) == "10")
         cout << "/rack2";
      else
         cout << "/rack0";

      cout << " ";
   }

   return 1;
}
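
One thing worth checking in the program above is the exit status: main ends with "return 1", and a non-zero status conventionally signals failure to whatever launched the process. If Hadoop's script-based mapping treats a failed command as "no mapping available" (an assumption about this Hadoop version, not something confirmed in this thread), the printed rack IDs would be discarded even though the script is being called. Below is a minimal sketch of the same prefix rules with a zero exit status. The helper names startsWith, rackFor, resolveRacks and run are hypothetical and introduced only for illustration.

```cpp
#include <cstddef>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// True if s begins with prefix (safe for strings shorter than the prefix).
bool startsWith(const std::string& s, const std::string& prefix)
{
   return s.size() >= prefix.size() && s.compare(0, prefix.size(), prefix) == 0;
}

// Map one host name or IP to a rack ID, using the same rules as the original.
std::string rackFor(const std::string& dn)
{
   if (startsWith(dn, "rack1")) return "/rack1";
   if (startsWith(dn, "rack2")) return "/rack2";
   if (startsWith(dn, "192"))   return "/rack1";
   if (startsWith(dn, "10"))    return "/rack2";
   return "/rack0";
}

// Produce one rack ID per input, separated by single spaces.
std::string resolveRacks(const std::vector<std::string>& hosts)
{
   std::ostringstream out;
   for (std::size_t i = 0; i < hosts.size(); ++i)
   {
      if (i > 0) out << ' ';
      out << rackFor(hosts[i]);
   }
   return out.str();
}

// Body of the program: resolve argv[1..argc-1], print the result,
// and return 0 so the caller sees a successful exit status.
int run(int argc, char** argv, std::ostream& out)
{
   std::vector<std::string> hosts;
   for (int i = 1; i < argc; ++i)
      hosts.push_back(argv[i]);

   out << resolveRacks(hosts) << '\n';
   return 0;
}
```

To build this as mydns, main would only need to forward its arguments, i.e. return run(argc, argv, std::cout). The exit status can then be checked from the shell with "echo $?" right after running ./mydns by hand.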

I compiled the program as mydns. It accepts any number of IPs or host names and prints one of /rack0, /rack1, or /rack2 for each, all on one line.

e.g.,
./mydns 192.168.0.1 10.0.0.1
/rack1 /rack2

(I tried other output formats, such as one rack ID per line, which didn't help.)

I configured hadoop-site.xml and added this:
<property>
  <name>topology.script.file.name</name>
  <value>/home/my/hadoop-0.17.0/mydns</value>
</property>

The program is located at /home/my/hadoop-0.17.0.

My understanding is that "mydns" should be called by ScriptBasedMapping.java.

I added some logging to a file in the mydns program and verified that it is actually being called, with input parameters such as "192.168.0.1 192.168.0.10 10.0.0.5".

However, when I ran ./bin/hadoop fsck, it still told me that there is only one rack in the system, and a MapReduce program fails immediately because of some "topology initialization error" (I could not find the exact text any more).

Thanks
Yunhong


On Thu, 3 Jul 2008, Devaraj Das wrote:

This is strange. If you don't mind, pls send the script to me.

-----Original Message-----
From: Yunhong Gu1 [mailto:[EMAIL PROTECTED]
Sent: Thursday, July 03, 2008 9:49 AM
To: core-user@hadoop.apache.org
Subject: topology.script.file.name



Hello,

I have been trying to figure out how to configure rack
awareness. I have written a script that reads a list of IPs
or host names and returns a list of rack IDs of the same length.

This is my script running:

$./mydns 192.168.1.1 192.168.2.1
/rack0 /rack1
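
The contract described above (one rack ID returned per IP or host name passed in) can be checked mechanically before pointing Hadoop at the script. The following is a rough sketch, not Hadoop's own validation: splitTokens and looksLikeValidMapping are hypothetical names, and the requirement that each rack ID start with "/" is an assumption based on the rack IDs shown in this thread.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Split script output on any whitespace (spaces, tabs, newlines).
std::vector<std::string> splitTokens(const std::string& text)
{
   std::istringstream in(text);
   std::vector<std::string> tokens;
   std::string tok;
   while (in >> tok)
      tokens.push_back(tok);
   return tokens;
}

// Hypothetical sanity check: exactly one rack ID per input,
// and every rack ID begins with '/' (assumed convention).
bool looksLikeValidMapping(std::size_t numInputs, const std::string& output)
{
   const std::vector<std::string> racks = splitTokens(output);
   if (racks.size() != numInputs)
      return false;
   for (const std::string& r : racks)
      if (r.empty() || r[0] != '/')
         return false;
   return true;
}
```

For example, looksLikeValidMapping(2, "/rack0 /rack1") holds, while output with a missing entry or a rack ID lacking the leading slash is rejected. Because the split is on any whitespace, one rack ID per line passes the same check.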

I specified the path of this script to
topology.script.file.name. I verified that this script was
called by Hadoop and I could see the input (up to 21 IPs in my case).

However, it seems the output of my script is not correct and
Hadoop cannot use it to get the correct topology (only 1 rack
is found by Hadoop no matter how I change the format of the output).

Please advise if you know how to do this.

Thanks
Yunhong


