I just got around to configuring this in my hadoop-0.18.3 install, and I
can share my working topology script.
The documentation is a bit confusing on this matter, so I hope this is helpful.

The script is called by the namenode when datanodes first connect to it.
It is passed the IP address of a datanode as a parameter. I do not
particularly like hardcoded data in Python code; perhaps the script
could read this information from a separate configuration file.

Here is my script:

#!/usr/bin/env python

'''
This script is used by Hadoop to determine network/rack topology.  It
should be specified in hadoop-site.xml via the
topology.script.file.name property.

<property>
 <name>topology.script.file.name</name>
 <value>/home/hadoop/topology.py</value>
</property>
'''

import sys

# Rack reported for any address that is not listed in RACK_MAP.
DEFAULT_RACK = '/default/rack0'

# Maps a datanode IP address to its rack path.
RACK_MAP = {
    '208.94.2.10': '/datacenter1/rack0',
    '1.2.3.4': '/datacenter1/rack0',
    '1.2.3.5': '/datacenter1/rack0',
    '1.2.3.6': '/datacenter1/rack0',

    '10.2.3.4': '/datacenter2/rack0',
}

if len(sys.argv) == 1:
    print(DEFAULT_RACK)
else:
    # Print one rack per address argument, space-separated, in argument order.
    print(" ".join([RACK_MAP.get(i, DEFAULT_RACK) for i in sys.argv[1:]]))
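
As for moving the hardcoded map out of the script, here is a rough sketch
of one way it might be done. It is untested against a real cluster, and
everything in it is an assumption for illustration: the file path
/home/hadoop/topology.data, the "<ip> <rack>" one-pair-per-line file
format, and the helper names load_rack_map and resolve are all made up
here and are not anything Hadoop itself defines.

```python
#!/usr/bin/env python
import sys

DEFAULT_RACK = '/default/rack0'

# Hypothetical location of the mapping file; adjust to taste.
TOPOLOGY_FILE = '/home/hadoop/topology.data'


def load_rack_map(path):
    """Read '<ip> <rack>' pairs, one per line, into a dict.

    Blank lines and anything after '#' are ignored. If the file cannot
    be opened, an empty map is returned, so every address falls back to
    DEFAULT_RACK.
    """
    rack_map = {}
    try:
        f = open(path)
    except IOError:
        return rack_map
    try:
        for line in f:
            line = line.split('#', 1)[0].strip()
            if not line:
                continue
            parts = line.split()
            if len(parts) != 2:
                continue  # skip malformed lines rather than fail
            rack_map[parts[0]] = parts[1]
    finally:
        f.close()
    return rack_map


def resolve(addresses, rack_map):
    """Return the space-separated racks for the given addresses, in order."""
    return " ".join([rack_map.get(a, DEFAULT_RACK) for a in addresses])


if __name__ == '__main__':
    if len(sys.argv) == 1:
        print(DEFAULT_RACK)
    else:
        print(resolve(sys.argv[1:], load_rack_map(TOPOLOGY_FILE)))
```

With this layout, adding a datanode would only mean adding a line such as
"1.2.3.7 /datacenter1/rack0" to the data file, with no change to the
script itself.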
