I just got around to configuring this in my hadoop-0.18.3 install, so I can share my working topology script. The documentation is a bit confusing on this matter, so I hope this will be helpful.
The script is called by the namenode when datanodes first connect to it. It is passed the IP address of a datanode as a parameter (the script below also accepts several addresses at once and prints one rack per address). I do not particularly like hardcoded data in Python code; perhaps the script could read this information from a separate configuration file. Here is my script:

    #!/usr/bin/env python
    '''
    This script is used by Hadoop to determine network/rack topology.
    It should be specified in hadoop-site.xml via the
    topology.script.file.name property:

    <property>
      <name>topology.script.file.name</name>
      <value>/home/hadoop/topology.py</value>
    </property>
    '''

    import sys

    DEFAULT_RACK = '/default/rack0'

    # Maps datanode IP address -> rack path.
    RACK_MAP = {
        '208.94.2.10': '/datacenter1/rack0',
        '1.2.3.4': '/datacenter1/rack0',
        '1.2.3.5': '/datacenter1/rack0',
        '1.2.3.6': '/datacenter1/rack0',
        '10.2.3.4': '/datacenter2/rack0',
    }

    if len(sys.argv) == 1:
        # No addresses given: fall back to the default rack.
        print(DEFAULT_RACK)
    else:
        # One rack per argument, space-separated, in argument order.
        print(' '.join([RACK_MAP.get(i, DEFAULT_RACK) for i in sys.argv[1:]]))
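On the hardcoded data: here is a rough sketch of how the mapping could be read from a separate file instead. This is not something I have running in production, just an illustration. The file location (`/home/hadoop/topology.data`), the file format (one `IP rack` pair per line, `#` for comments), and the helper names `load_rack_map` and `resolve` are all my own assumptions, not anything Hadoop prescribes.

```python
#!/usr/bin/env python
"""Sketch: topology script that reads its rack map from a file.

Assumed (hypothetical) file format, one mapping per line:
    1.2.3.4 /datacenter1/rack0
Blank lines and lines starting with '#' are ignored.
"""
import sys

DEFAULT_RACK = '/default/rack0'
# Hypothetical location; adjust to your own install.
MAP_FILE = '/home/hadoop/topology.data'


def load_rack_map(path):
    """Return a dict of IP -> rack parsed from path (empty if unreadable)."""
    rack_map = {}
    try:
        with open(path) as f:
            for line in f:
                fields = line.split()
                # Skip blank lines, comments, and lines without a rack.
                if len(fields) < 2 or fields[0].startswith('#'):
                    continue
                rack_map[fields[0]] = fields[1]
    except IOError:
        # Missing or unreadable file: every host gets the default rack.
        pass
    return rack_map


def resolve(hosts, rack_map):
    """Return the space-separated racks for hosts, in the same order."""
    return ' '.join(rack_map.get(h, DEFAULT_RACK) for h in hosts)


if __name__ == '__main__':
    hosts = sys.argv[1:]
    if not hosts:
        print(DEFAULT_RACK)
    else:
        print(resolve(hosts, load_rack_map(MAP_FILE)))
```

Falling back to the default rack when the file is missing is a deliberate choice in this sketch: the script still prints an answer for every address it is given, so a broken data file degrades to "everything in one rack" rather than to no output at all.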