Hadoop identifies the data nodes in your cluster by name and executes
your script with the data node as an argument. Your script is expected
to print the name of the rack on which that node is located.

The script you referenced takes the node name as an argument ($1),
searches a separate file for that node in the left column, and prints
the value in the second column if it finds a match.

If you were to use this script, you would just create the topology
file, listing all your nodes by name or IP on the left and the rack
each one is in on the right.
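
To make the lookup concrete, here is a minimal Python sketch of the same
idea. It is not the script from the wiki. The file name topology.data,
the whitespace-separated two-column layout, and the "/default-rack"
fallback are all assumptions chosen for illustration.

```python
#!/usr/bin/env python3
"""Sketch of a rack lookup script; file name and format are assumptions."""
import sys

TOPOLOGY_FILE = "topology.data"  # hypothetical path: "<node> <rack>" per line
DEFAULT_RACK = "/default-rack"   # assumed fallback for nodes not in the file


def load_topology(lines):
    """Build a {node: rack} dict from lines of the form '<node> <rack>'."""
    table = {}
    for line in lines:
        fields = line.split()
        # Skip blank lines, comment lines, and rows without a rack column.
        if len(fields) < 2 or fields[0].startswith("#"):
            continue
        table[fields[0]] = fields[1]
    return table


def resolve(nodes, table, default=DEFAULT_RACK):
    """Return one rack per node, in the same order as the arguments."""
    return [table.get(node, default) for node in nodes]


def main(argv):
    try:
        with open(TOPOLOGY_FILE) as handle:
            table = load_topology(handle)
    except OSError:
        table = {}  # no readable file: every node gets DEFAULT_RACK
    # Answer every argument, in case more than one node is passed per call.
    print(" ".join(resolve(argv, table)))


if __name__ == "__main__":
    main(sys.argv[1:])
```

With a topology.data containing rows such as "node1.example.com /rack1"
and "10.1.2.7 /rack2" (made-up names), running the script with
node1.example.com as its argument would print /rack1, and any node not
listed would fall back to the default rack.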

On Wed, Mar 17, 2010 at 11:34 PM, Mag Gam <magaw...@gmail.com> wrote:
> Well,  I didn't really solve the problem. Now I have even more questions.
>
> I came across this script,
> http://wiki.apache.org/hadoop/topology_rack_awareness_scripts
>
> but it makes no sense to me! Can someone please try to explain what
> it's trying to do?
>
>
> MikeThomas:
>
> Your script isn't working for me. I think there are some syntax
> errors. Is this how it's supposed to look: http://pastebin.ca/1844287
>
> thanks
>
>
>
> On Thu, Mar 4, 2010 at 10:30 PM, Jeff Hammerbacher <ham...@cloudera.com> wrote:
>> Hey Mag,
>>
>> Glad you have solved the problem. I've created a JIRA ticket to improve the
>> existing documentation: https://issues.apache.org/jira/browse/HADOOP-6616.
>> If you have some time, it would be useful to hear what could be added to the
>> existing documentation that would have helped you figure this out sooner.
>>
>> Thanks,
>> Jeff
>>
>> On Thu, Mar 4, 2010 at 3:39 PM, Mag Gam <magaw...@gmail.com> wrote:
>>
>>> Thanks everyone for explaining this to me instead of giving me RTFM!
>>>
>>> I will play around with it and see how far I get.
>>>
>>>
>>>
>>> On Thu, Mar 4, 2010 at 9:21 AM, Steve Loughran <ste...@apache.org> wrote:
>>> > Allen Wittenauer wrote:
>>> >>
>>> >> On 3/3/10 5:01 PM, "Mag Gam" <magaw...@gmail.com> wrote:
>>> >>
>>> >>> Thanks Alan! Your presentation is very nice!
>>> >>
>>> >> Thanks. :)
>>> >>
>>> >>> "If you don't provide a script for rack awareness, it treats every
>>> >>> node as if it was its own rack". I am using the default settings and
>>> >>> the report still says only 1 rack.
>>> >>
>>> >> Let's take a different approach to convince you. :)
>>> >>
>>> >> Think about the question:  Is there a difference between all nodes in one
>>> >> rack vs. every node acting as a lone rack?
>>> >>
>>> >> The answer is no, there isn't any difference.  In both cases, all copies of
>>> >> the blocks can go to pretty much any node. When a MR job runs, every node
>>> >> would either be considered 'off rack' or 'rack-local'.
>>> >>
>>> >> So there is no difference.
>>> >>
>>> >>
>>> >>> Do you mind sharing a script with us on how you determine a rack? and
>>> >>> a sample <configuration> </configuration> syntax?
>>> >>
>>> >> Michael has already posted his, so I'll skip this one. :)
>>> >>
>>> >
>>> > Think Mag probably wanted a shell script.
>>> >
>>> > Mag, give your machines IPv4 addresses that map to rack number. 10.1.1.* for
>>> > rack one, 10.1.2.* for rack 2, etc. Then just filter out the IP address by
>>> > the top bytes, returning "10.1.1" for everything in rack one, "10.1.2" for
>>> > rack 2; Hadoop will be happy
>>> >
>>>
>>
>
