Re: rack awareness help

Christopher Tubbs Thu, 18 Mar 2010 22:01:02 -0700

No, you only specify the script on the namenode. A single script there
maps every node to its rack, so you could do something like:


#!/bin/bash
# rack_decider.sh
# Prints one rack name per argument; Hadoop may pass several nodes in one call.

for node in "$@"; do
  case "$node" in
    server1.mydomain|192.168.0.1) echo rack1 ;;
    server2.mydomain|192.168.0.2) echo rack1 ;;
    server3.mydomain|192.168.0.3) echo rack2 ;;
    server4.mydomain|192.168.0.4) echo rack2 ;;
    *) echo unknown_rack ;;   # node not listed above
  esac
done
# EOF

Of course, this is by far the most basic script you could have (I'm
not sure why it wasn't offered as an example instead of a more
complicated one).
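
For more than a handful of nodes, the table-driven approach described
further down the thread may be easier to maintain. The following is only a
minimal sketch of that idea, not the wiki script itself. It assumes a
hypothetical two-column file called topology.data (node name or IP on the
left, rack on the right) sitting next to the script; the file name, the
TOPOLOGY_FILE variable, the lookup_rack function, and the unknown_rack
fallback are all illustrative choices, not something Hadoop prescribes.

```shell
#!/bin/bash
# rack_lookup.sh (hypothetical name)
# Looks up each argument in a two-column topology file and prints its rack.
# Assumed file format, one node per line:  <hostname-or-ip> <rack>

# Assumption: topology.data lives next to this script unless TOPOLOGY_FILE is set.
TOPOLOGY_FILE="${TOPOLOGY_FILE:-$(dirname "$0")/topology.data}"

lookup_rack() {
  local node="$1" name rack
  # The "|| [ -n "$name" ]" part keeps a final line that lacks a trailing newline.
  while read -r name rack _ || [ -n "$name" ]; do
    if [ "$name" = "$node" ]; then
      echo "$rack"
      return 0
    fi
  done < "$TOPOLOGY_FILE"
  # Node not found in the file (or file unreadable): fall back to a catch-all rack.
  echo unknown_rack
}

# Print one rack per argument, in the order the arguments were given.
for node in "$@"; do
  lookup_rack "$node"
done
```

With a topology.data containing the line "server1.mydomain rack1", running
"./rack_lookup.sh server1.mydomain" would print that node's rack, and adding
a node becomes a one-line change to the data file rather than an edit to the
script.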

On Thu, Mar 18, 2010 at 8:41 PM, Mag Gam <magaw...@gmail.com> wrote:
> Chris:
>
> This clears up my questions a lot! Thank you.
>
> So, if I have 4 data servers and I want 2 racks, I can do this:
>
> #!/bin/bash
> #rack1.sh
> echo rack1
>
> #!/bin/bash
> #rack2.sh
> echo rack2
>
>
> So, I can do this for 2 servers
>
>
> <property>
>  <name>topology.script.file.name</name>
>  <value>rack1.sh</value>
> </property>
>
> And for the other 2 servers, I can do this:
>
>
> <property>
>  <name>topology.script.file.name</name>
>  <value>rack2.sh</value>
> </property>
>
>
> correct?
>
>
> On Thu, Mar 18, 2010 at 3:15 AM, Christopher Tubbs <ctubb...@gmail.com> wrote:
>> Hadoop will identify data nodes in your cluster by name and execute
>> your script with the data node as an argument. The expected output of
>> your script is the name of the rack on which it is located.
>>
>> The script you referenced takes the node name as an argument ($1), and
>> crawls through a separate file looking up that node in the left
>> column, and printing the value in the second column if it finds it.
>>
>> If you were to use this script, you would just create the topology
>> file that lists all your nodes by name/ip on the left and the rack
>> they are in on the right.
>>
>> On Wed, Mar 17, 2010 at 11:34 PM, Mag Gam <magaw...@gmail.com> wrote:
>>> Well, I didn't really solve the problem. Now I have even more questions.
>>>
>>> I came across this script,
>>> http://wiki.apache.org/hadoop/topology_rack_awareness_scripts
>>>
>>> but it makes no sense to me! Can someone please try to explain what
>>> it's trying to do?
>>>
>>>
>>> MikeThomas:
>>>
>>> Your script isn't working for me. I think there are some syntax
>>> errors. Is this how it's supposed to look: http://pastebin.ca/1844287
>>>
>>> thanks
>>>
>>>
>>>
>>> On Thu, Mar 4, 2010 at 10:30 PM, Jeff Hammerbacher <ham...@cloudera.com> wrote:
>>>> Hey Mag,
>>>>
>>>> Glad you have solved the problem. I've created a JIRA ticket to improve the
>>>> existing documentation: https://issues.apache.org/jira/browse/HADOOP-6616.
>>>> If you have some time, it would be useful to hear what could be added to the
>>>> existing documentation that would have helped you figure this out sooner.
>>>>
>>>> Thanks,
>>>> Jeff
>>>>
>>>> On Thu, Mar 4, 2010 at 3:39 PM, Mag Gam <magaw...@gmail.com> wrote:
>>>>
>>>>> Thanks everyone for explaining this to me instead of giving me RTFM!
>>>>>
>>>>> I will play around with it and see how far I get.
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Mar 4, 2010 at 9:21 AM, Steve Loughran <ste...@apache.org> wrote:
>>>>> > Allen Wittenauer wrote:
>>>>> >>
>>>>> >> On 3/3/10 5:01 PM, "Mag Gam" <magaw...@gmail.com> wrote:
>>>>> >>
>>>>> >>> Thanks Allen! Your presentation is very nice!
>>>>> >>
>>>>> >> Thanks. :)
>>>>> >>
>>>>> >>> "If you don't provide a script for rack awareness, it treats every
>>>>> >>> node as if it was its own rack". I am using the default settings and
>>>>> >>> the report still says only 1 rack.
>>>>> >>
>>>>> >> Let's take a different approach to convince you. :)
>>>>> >>
>>>>> >> Think about the question: Is there a difference between all nodes in one
>>>>> >> rack vs. every node acting as a lone rack?
>>>>> >>
>>>>> >> The answer is no, there isn't any difference. In both cases, all copies of
>>>>> >> the blocks can go to pretty much any node. When a MR job runs, every node
>>>>> >> would either be considered 'off rack' or 'rack-local'.
>>>>> >>
>>>>> >> So there is no difference.
>>>>> >>
>>>>> >>
>>>>> >>> Do you mind sharing a script with us on how you determine a rack? and
>>>>> >>> a sample <configuration> </configuration> syntax?
>>>>> >>
>>>>> >> Michael has already posted his, so I'll skip this one. :)
>>>>> >>
>>>>> >
>>>>> > Think Mag probably wanted a shell script.
>>>>> >
>>>>> > Mag, give your machines IPv4 addresses that map to rack number. 10.1.1.* for
>>>>> > rack one, 10.1.2.* for rack 2, etc. Then just filter out the IP address by
>>>>> > the top bytes, returning "10.1.1" for everything in rack one, "10.1.2" for
>>>>> > rack 2; Hadoop will be happy
>>>>> >
>>>>>
>>>>
>>>
>>
>
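
Steve Loughran's suggestion quoted above (number the machines so that the
IP prefix identifies the rack, then return that prefix) can be sketched
roughly as follows. This is an illustration under stated assumptions, not a
script anyone in the thread posted: the script name, the rack_for function,
and the unknown_rack fallback for arguments that are not dotted-quad IPv4
addresses are all hypothetical.

```shell
#!/bin/bash
# rack_by_subnet.sh (hypothetical name)
# Uses the first three octets of each IPv4 address as the rack name,
# e.g. 10.1.1.7 -> 10.1.1, as suggested in the thread.

rack_for() {
  local addr="$1"
  case "$addr" in
    # Anything other than digits and single dots (e.g. a host name) is rejected.
    *[!0-9.]* | *..* | .* | *.) echo unknown_rack ;;
    # More than four dot-separated fields is not an IPv4 address.
    *.*.*.*.*) echo unknown_rack ;;
    # Exactly four fields: drop the last octet and print the rest.
    *.*.*.*) echo "${addr%.*}" ;;
    # Fewer than four fields.
    *) echo unknown_rack ;;
  esac
}

# Print one rack per argument, in the order the arguments were given.
for node in "$@"; do
  rack_for "$node"
done
```

The appeal of this design is that no per-node table has to be maintained;
the cost is that it only works if the address plan really does follow the
racks, and any node addressed by name rather than by IP ends up in the
fallback rack.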
