Re: rack awareness help

2010-03-20 Thread Michael Thomas

On 03/20/2010 05:55 AM, Mag Gam wrote:

Thanks!

I managed to write a script which will give me rack01, rack02
depending on the ip address.

Now, how would I check how many racks there are in my cluster? That's
not documented anywhere.


'hadoop fsck /' will report the # of racks at the end:

[...]
 Number of data-nodes:  163
 Number of racks:   12

You can verify the IP-to-rack mappings with 'hadoop dfsadmin -report':
[...]
Name: 10.3.255.144:50010
Rack: /Rack12
[...]
Name: 10.3.255.62:50010
Rack: /Rack16

--Mike


My intention, after I get all this working, is to submit a bug report
for the documentation team so they can fix this.



On Fri, Mar 19, 2010 at 10:13 AM, Allen Wittenauer
  wrote:




On 3/19/10 4:32 AM, "Mag Gam"  wrote:


Thanks everyone. I think everyone can agree that this part of the
documentation is lacking for hadoop.

Can someone please provide me a use case, for example:

#server 1
Input>  script.sh
Output>  rack01

#server 2
Input>  script.sh
Output>  rack02


I think you have it in your head that the NameNode asks the DataNode what
rack it is.  This is completely backwards.  The DataNode has *no* concept of
what a rack is.  It is purely a storage process.  There isn't much logic in
it at all.

The topology script is *ONLY* run by the NameNode and JobTracker processes.
That's it.  It is not run on the compute nodes.  That setting is completely
*ignored* by the DataNode and TaskTracker processes.

So to rewrite your use case:

# NameNode
Input>  server 1
Output>  rack01

# NameNode
Input>  server 2
Output>  rack02


Is this how it's supposed to work? I am bad with bash, so I am trying to
understand the logic so I can implement it in another language such
as Tcl.



The program logic is :

Input ->  IP address or Hostname
Output ->  /racklocation

That's it.  There is nothing fancy going on here.









Re: rack awareness help

2010-03-20 Thread Christopher Tubbs
I think hadoop's fsck will report the number of racks.

On Sat, Mar 20, 2010 at 8:55 AM, Mag Gam  wrote:
> Thanks!
>
> I managed to write a script which will give me rack01, rack02
> depending on the ip address.
>
> Now, how would I check how many racks there are in my cluster? That
> not documented anywhere.
>
> My intention is after I get all this working is to submit a bug report
> for the documentation team so they can fix this.
>
>
>
> On Fri, Mar 19, 2010 at 10:13 AM, Allen Wittenauer
>  wrote:
>>
>>
>>
>> On 3/19/10 4:32 AM, "Mag Gam"  wrote:
>>
>>> Thanks everyone. I think everyone can agree that this part of the
>>> documentation is lacking for hadoop.
>>>
>>> Can someone please provide be a use case, for example:
>>>
>>> #server 1
>>> Input > script.sh
>>> Output > rack01
>>>
>>> #server 2
>>> Input > script.sh
>>> Output > rack02
>>
>> I think you have it in your head that the NameNode asks the DataNode what
>> rack it is.  This is completely backwards.  The DataNode has *no* concept of
>> what a rack is.  It is purely a storage process.  There isn't much logic in
>> it at all.
>>
>> The topology script is *ONLY* run by the NameNode and JobTracker processes.
>> That's it.  It is not run on the compute nodes.  That setting is completely
>> *ignored* by the DataNode and TaskTracker processes.
>>
>> So to rewrite your use case:
>>
>> # NameNode
>> Input > server 1
>> Output > rack01
>>
>> # NameNode
>> Input > server 2
>> Output > rack02
>>
>>> Is this how its supposed to work? I am bad with bash so I am trying to
>>> understand the logic so I can implement it with another language such
>>> as tcl
>>
>>
>> The program logic is :
>>
>> Input -> IP address or Hostname
>> Output -> /racklocation
>>
>> That's it.  There is nothing fancy going on here.
>>
>>
>


Re: rack awareness help

2010-03-20 Thread Mag Gam
Thanks!

I managed to write a script which will give me rack01, rack02
depending on the ip address.

Now, how would I check how many racks there are in my cluster? That's
not documented anywhere.

My intention, after I get all this working, is to submit a bug report
for the documentation team so they can fix this.



On Fri, Mar 19, 2010 at 10:13 AM, Allen Wittenauer
 wrote:
>
>
>
> On 3/19/10 4:32 AM, "Mag Gam"  wrote:
>
>> Thanks everyone. I think everyone can agree that this part of the
>> documentation is lacking for hadoop.
>>
>> Can someone please provide be a use case, for example:
>>
>> #server 1
>> Input > script.sh
>> Output > rack01
>>
>> #server 2
>> Input > script.sh
>> Output > rack02
>
> I think you have it in your head that the NameNode asks the DataNode what
> rack it is.  This is completely backwards.  The DataNode has *no* concept of
> what a rack is.  It is purely a storage process.  There isn't much logic in
> it at all.
>
> The topology script is *ONLY* run by the NameNode and JobTracker processes.
> That's it.  It is not run on the compute nodes.  That setting is completely
> *ignored* by the DataNode and TaskTracker processes.
>
> So to rewrite your use case:
>
> # NameNode
> Input > server 1
> Output > rack01
>
> # NameNode
> Input > server 2
> Output > rack02
>
>> Is this how its supposed to work? I am bad with bash so I am trying to
>> understand the logic so I can implement it with another language such
>> as tcl
>
>
> The program logic is :
>
> Input -> IP address or Hostname
> Output -> /racklocation
>
> That's it.  There is nothing fancy going on here.
>
>


Re: rack awareness help

2010-03-19 Thread Allen Wittenauer



On 3/19/10 4:32 AM, "Mag Gam"  wrote:

> Thanks everyone. I think everyone can agree that this part of the
> documentation is lacking for hadoop.
>
> Can someone please provide be a use case, for example:
> 
> #server 1
> Input > script.sh
> Output > rack01
> 
> #server 2
> Input > script.sh
> Output > rack02

I think you have it in your head that the NameNode asks the DataNode what
rack it is.  This is completely backwards.  The DataNode has *no* concept of
what a rack is.  It is purely a storage process.  There isn't much logic in
it at all.

The topology script is *ONLY* run by the NameNode and JobTracker processes.
That's it.  It is not run on the compute nodes.  That setting is completely
*ignored* by the DataNode and TaskTracker processes.

So to rewrite your use case:

# NameNode 
Input > server 1
Output > rack01

# NameNode
Input > server 2
Output > rack02

> Is this how its supposed to work? I am bad with bash so I am trying to
> understand the logic so I can implement it with another language such
> as tcl


The program logic is :

Input -> IP address or Hostname
Output -> /racklocation

That's it.  There is nothing fancy going on here.  
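
A minimal sketch of such a script (purely illustrative: it assumes the
NameNode passes dotted-quad IP addresses and, hypothetically, that the
third octet encodes the rack) could be:

#!/bin/sh
# Sketch only: map 10.x.<rack>.y addresses to /rack<rack>.
# The NameNode may pass several addresses at once, so loop over all args.
for node in "$@" ; do
    rack=`echo "$node" | awk -F. '{print $3}'`
    if [ -z "$rack" ] ; then
        echo /default-rack
    else
        echo /rack"$rack"
    fi
done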



Re: rack awareness help

2010-03-19 Thread Christopher Tubbs
More like the following (shown with the bash prompt). You could type
this for testing. However, ultimately, hadoop itself will actually be
executing this script and reading its output.

$ ./script.sh server1.mydomain
rack1
$ ./script.sh server2.mydomain
rack1
$ ./script.sh server3.mydomain
rack2
$ ./script.sh server4.mydomain
rack2

On Fri, Mar 19, 2010 at 7:32 AM, Mag Gam  wrote:
> Thanks everyone. I think everyone can agree that this part of the
> documentation is lacking for hadoop.
>
> Can someone please provide be a use case, for example:
>
> #server 1
> Input > script.sh
> Output > rack01
>
> #server 2
> Input > script.sh
> Output > rack02
>
>
> Is this how its supposed to work? I am bad with bash so I am trying to
> understand the logic so I can implement it with another language such
> as tcl
>
>
> On Fri, Mar 19, 2010 at 1:00 AM, Christopher Tubbs  wrote:
>> You only specify the script on the namenode.
>> So, you could do something like:
>>
>> #!/bin/bash
>> #rack_decider.sh
>>
>> if [ $1 = "server1.mydomain" -o $1 = "192.168.0.1" ] ; then
>>  echo rack1
>> elif [ $1 = "server2.mydomain" -o $1 = "192.168.0.2" ] ; then
>>  echo rack1
>> elif [ $1 = "server3.mydomain" -o $1 = "192.168.0.3" ] ; then
>>  echo rack2
>> elif [ $1 = "server4.mydomain" -o $1 = "192.168.0.4" ] ; then
>>  echo rack2
>> else
>>  echo unknown_rack
>> fi
>> # EOF
>>
>> Of course, this is by far the most basic script you could have (I'm
>> not sure why it wasn't offered as an example instead of a more
>> complicated one).
>>
>> On Thu, Mar 18, 2010 at 8:41 PM, Mag Gam  wrote:
>>> Chris:
>>>
>>> This clears up my questions a lot! Thankyou.
>>>
>>> So, if I have 4 data servers and I want 2 racks. I can do this
>>>
>>> #!/bin/bash
>>> #rack1.sh
>>> echo rack1
>>>
>>> #bin/bash
>>> #rack2.sh
>>> echo rack2
>>>
>>>
>>> So, I can do this for 2 servers
>>>
>>>
>>> 
>>>  topology.script.file.name
>>>  rack1.sh
>>> 
>>>
>>> And for the other 2 servers, I can do this:
>>>
>>>
>>> 
>>>  topology.script.file.name
>>>  rack2.sh
>>> 
>>>
>>>
>>> correct?
>>>
>>>
>>> On Thu, Mar 18, 2010 at 3:15 AM, Christopher Tubbs  
>>> wrote:
 Hadoop will identify data nodes in your cluster by name and execute
 your script with the data node as an argument. The expected output of
 your script is the name of the rack on which it is located.

 The script you referenced takes the node name as an argument ($1), and
 crawls through a separate file looking up that node in the left
 column, and printing the value in the second column if it finds it.

 If you were to use this script, you would just create the topology
 file that lists all your nodes by name/ip on the left and the rack
 they are in on the right.

 On Wed, Mar 17, 2010 at 11:34 PM, Mag Gam  wrote:
> Well,  I didn't really solve the problem. Now I have even more questions.
>
> I came across this script,
> http://wiki.apache.org/hadoop/topology_rack_awareness_scripts
>
> but it makes no sense to me! Can someone please try to explain what
> its trying to do?
>
>
> MikeThomas:
>
> Your script isn't working for me. I think there are some syntax
> errors. Is this how its supposed to look: http://pastebin.ca/1844287
>
> thanks
>
>
>
> On Thu, Mar 4, 2010 at 10:30 PM, Jeff Hammerbacher  
> wrote:
>> Hey Mag,
>>
>> Glad you have solved the problem. I've created a JIRA ticket to improve 
>> the
>> existing documentation: 
>> https://issues.apache.org/jira/browse/HADOOP-6616.
>> If you have some time, it would be useful to hear what could be added to 
>> the
>> existing documentation that would have helped you figure this out sooner.
>>
>> Thanks,
>> Jeff
>>
>> On Thu, Mar 4, 2010 at 3:39 PM, Mag Gam  wrote:
>>
>>> Thanks everyone for explaining this to me instead of giving me RTFM!
>>>
>>> I will play around with it and see how far I get.
>>>
>>>
>>>
>>> On Thu, Mar 4, 2010 at 9:21 AM, Steve Loughran  
>>> wrote:
>>> > Allen Wittenauer wrote:
>>> >>
>>> >> On 3/3/10 5:01 PM, "Mag Gam"  wrote:
>>> >>
>>> >>> Thanks Alan! Your presentation is very nice!
>>> >>
>>> >> Thanks. :)
>>> >>
>>> >>> "If you don't provide a script for rack awareness, it treats every
>>> >>> node as if it was its own rack". I am using the default settings and
>>> >>> the report still says only 1 rack.
>>> >>
>>> >> Let's take a different approach to convince you. :)
>>> >>
>>> >> Think about the question:  Is there a difference between all nodes in
>>> one
>>> >> rack vs. every node acting as a lone rack?
>>> >>
>>> >> The answer is no, there isn't any difference.  In both cases, all 
>>> >> copies
>>> >> of
>>> >> the blocks can go to pretty much any node. When a MR job runs, every
>>> node
>>> >> would either be considered 'off rack' or 'rack-local'.

Re: rack awareness help

2010-03-19 Thread Mag Gam
Thanks everyone. I think everyone can agree that this part of the
documentation is lacking for hadoop.

Can someone please provide me a use case, for example:

#server 1
Input > script.sh
Output > rack01

#server 2
Input > script.sh
Output > rack02


Is this how it's supposed to work? I am bad with bash, so I am trying to
understand the logic so I can implement it in another language such
as Tcl.


On Fri, Mar 19, 2010 at 1:00 AM, Christopher Tubbs  wrote:
> You only specify the script on the namenode.
> So, you could do something like:
>
> #!/bin/bash
> #rack_decider.sh
>
> if [ $1 = "server1.mydomain" -o $1 = "192.168.0.1" ] ; then
>  echo rack1
> elif [ $1 = "server2.mydomain" -o $1 = "192.168.0.2" ] ; then
>  echo rack1
> elif [ $1 = "server3.mydomain" -o $1 = "192.168.0.3" ] ; then
>  echo rack2
> elif [ $1 = "server4.mydomain" -o $1 = "192.168.0.4" ] ; then
>  echo rack2
> else
>  echo unknown_rack
> fi
> # EOF
>
> Of course, this is by far the most basic script you could have (I'm
> not sure why it wasn't offered as an example instead of a more
> complicated one).
>
> On Thu, Mar 18, 2010 at 8:41 PM, Mag Gam  wrote:
>> Chris:
>>
>> This clears up my questions a lot! Thankyou.
>>
>> So, if I have 4 data servers and I want 2 racks. I can do this
>>
>> #!/bin/bash
>> #rack1.sh
>> echo rack1
>>
>> #bin/bash
>> #rack2.sh
>> echo rack2
>>
>>
>> So, I can do this for 2 servers
>>
>>
>> 
>>  topology.script.file.name
>>  rack1.sh
>> 
>>
>> And for the other 2 servers, I can do this:
>>
>>
>> 
>>  topology.script.file.name
>>  rack2.sh
>> 
>>
>>
>> correct?
>>
>>
>> On Thu, Mar 18, 2010 at 3:15 AM, Christopher Tubbs  
>> wrote:
>>> Hadoop will identify data nodes in your cluster by name and execute
>>> your script with the data node as an argument. The expected output of
>>> your script is the name of the rack on which it is located.
>>>
>>> The script you referenced takes the node name as an argument ($1), and
>>> crawls through a separate file looking up that node in the left
>>> column, and printing the value in the second column if it finds it.
>>>
>>> If you were to use this script, you would just create the topology
>>> file that lists all your nodes by name/ip on the left and the rack
>>> they are in on the right.
>>>
>>> On Wed, Mar 17, 2010 at 11:34 PM, Mag Gam  wrote:
 Well,  I didn't really solve the problem. Now I have even more questions.

 I came across this script,
 http://wiki.apache.org/hadoop/topology_rack_awareness_scripts

 but it makes no sense to me! Can someone please try to explain what
 its trying to do?


 MikeThomas:

 Your script isn't working for me. I think there are some syntax
 errors. Is this how its supposed to look: http://pastebin.ca/1844287

 thanks



 On Thu, Mar 4, 2010 at 10:30 PM, Jeff Hammerbacher  
 wrote:
> Hey Mag,
>
> Glad you have solved the problem. I've created a JIRA ticket to improve 
> the
> existing documentation: https://issues.apache.org/jira/browse/HADOOP-6616.
> If you have some time, it would be useful to hear what could be added to 
> the
> existing documentation that would have helped you figure this out sooner.
>
> Thanks,
> Jeff
>
> On Thu, Mar 4, 2010 at 3:39 PM, Mag Gam  wrote:
>
>> Thanks everyone for explaining this to me instead of giving me RTFM!
>>
>> I will play around with it and see how far I get.
>>
>>
>>
>> On Thu, Mar 4, 2010 at 9:21 AM, Steve Loughran  wrote:
>> > Allen Wittenauer wrote:
>> >>
>> >> On 3/3/10 5:01 PM, "Mag Gam"  wrote:
>> >>
>> >>> Thanks Alan! Your presentation is very nice!
>> >>
>> >> Thanks. :)
>> >>
>> >>> "If you don't provide a script for rack awareness, it treats every
>> >>> node as if it was its own rack". I am using the default settings and
>> >>> the report still says only 1 rack.
>> >>
>> >> Let's take a different approach to convince you. :)
>> >>
>> >> Think about the question:  Is there a difference between all nodes in
>> one
>> >> rack vs. every node acting as a lone rack?
>> >>
>> >> The answer is no, there isn't any difference.  In both cases, all 
>> >> copies
>> >> of
>> >> the blocks can go to pretty much any node. When a MR job runs, every
>> node
>> >> would either be considered 'off rack' or 'rack-local'.
>> >>
>> >> So there is no difference.
>> >>
>> >>
>> >>> Do you mind sharing a script with us on how you determine a rack? and
>> >>> a sample   syntax?
>> >>
>> >> Michael has already posted his, so I'll skip this one. :)
>> >>
>> >
>> > Think Mag probably wanted a shell script.
>> >
>> > Mag, give your machines IPv4 addresses that map to rack number. 
>> > 10.1.1.*
>> for
>> > rack one, 10.1.2.* for rack 2, etc. Then just filter out the IP address
>> by
>> > the

Re: rack awareness help

2010-03-18 Thread Christopher Tubbs
You only specify the script on the namenode.
So, you could do something like:

#!/bin/bash
#rack_decider.sh

if [ $1 = "server1.mydomain" -o $1 = "192.168.0.1" ] ; then
  echo rack1
elif [ $1 = "server2.mydomain" -o $1 = "192.168.0.2" ] ; then
  echo rack1
elif [ $1 = "server3.mydomain" -o $1 = "192.168.0.3" ] ; then
  echo rack2
elif [ $1 = "server4.mydomain" -o $1 = "192.168.0.4" ] ; then
  echo rack2
else
  echo unknown_rack
fi
# EOF

Of course, this is by far the most basic script you could have (I'm
not sure why it wasn't offered as an example instead of a more
complicated one).
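
To wire it in, you would point topology.script.file.name in the
namenode's hadoop-site.xml at the script (the path below is just an
example) and make sure it is executable:

<property>
  <name>topology.script.file.name</name>
  <value>/path/to/rack_decider.sh</value>
</property>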

On Thu, Mar 18, 2010 at 8:41 PM, Mag Gam  wrote:
> Chris:
>
> This clears up my questions a lot! Thankyou.
>
> So, if I have 4 data servers and I want 2 racks. I can do this
>
> #!/bin/bash
> #rack1.sh
> echo rack1
>
> #bin/bash
> #rack2.sh
> echo rack2
>
>
> So, I can do this for 2 servers
>
>
> 
>  topology.script.file.name
>  rack1.sh
> 
>
> And for the other 2 servers, I can do this:
>
>
> 
>  topology.script.file.name
>  rack2.sh
> 
>
>
> correct?
>
>
> On Thu, Mar 18, 2010 at 3:15 AM, Christopher Tubbs  wrote:
>> Hadoop will identify data nodes in your cluster by name and execute
>> your script with the data node as an argument. The expected output of
>> your script is the name of the rack on which it is located.
>>
>> The script you referenced takes the node name as an argument ($1), and
>> crawls through a separate file looking up that node in the left
>> column, and printing the value in the second column if it finds it.
>>
>> If you were to use this script, you would just create the topology
>> file that lists all your nodes by name/ip on the left and the rack
>> they are in on the right.
>>
>> On Wed, Mar 17, 2010 at 11:34 PM, Mag Gam  wrote:
>>> Well,  I didn't really solve the problem. Now I have even more questions.
>>>
>>> I came across this script,
>>> http://wiki.apache.org/hadoop/topology_rack_awareness_scripts
>>>
>>> but it makes no sense to me! Can someone please try to explain what
>>> its trying to do?
>>>
>>>
>>> MikeThomas:
>>>
>>> Your script isn't working for me. I think there are some syntax
>>> errors. Is this how its supposed to look: http://pastebin.ca/1844287
>>>
>>> thanks
>>>
>>>
>>>
>>> On Thu, Mar 4, 2010 at 10:30 PM, Jeff Hammerbacher  
>>> wrote:
 Hey Mag,

 Glad you have solved the problem. I've created a JIRA ticket to improve the
 existing documentation: https://issues.apache.org/jira/browse/HADOOP-6616.
 If you have some time, it would be useful to hear what could be added to 
 the
 existing documentation that would have helped you figure this out sooner.

 Thanks,
 Jeff

 On Thu, Mar 4, 2010 at 3:39 PM, Mag Gam  wrote:

> Thanks everyone for explaining this to me instead of giving me RTFM!
>
> I will play around with it and see how far I get.
>
>
>
> On Thu, Mar 4, 2010 at 9:21 AM, Steve Loughran  wrote:
> > Allen Wittenauer wrote:
> >>
> >> On 3/3/10 5:01 PM, "Mag Gam"  wrote:
> >>
> >>> Thanks Alan! Your presentation is very nice!
> >>
> >> Thanks. :)
> >>
> >>> "If you don't provide a script for rack awareness, it treats every
> >>> node as if it was its own rack". I am using the default settings and
> >>> the report still says only 1 rack.
> >>
> >> Let's take a different approach to convince you. :)
> >>
> >> Think about the question:  Is there a difference between all nodes in
> one
> >> rack vs. every node acting as a lone rack?
> >>
> >> The answer is no, there isn't any difference.  In both cases, all 
> >> copies
> >> of
> >> the blocks can go to pretty much any node. When a MR job runs, every
> node
> >> would either be considered 'off rack' or 'rack-local'.
> >>
> >> So there is no difference.
> >>
> >>
> >>> Do you mind sharing a script with us on how you determine a rack? and
> >>> a sample   syntax?
> >>
> >> Michael has already posted his, so I'll skip this one. :)
> >>
> >
> > Think Mag probably wanted a shell script.
> >
> > Mag, give your machines IPv4 addresses that map to rack number. 10.1.1.*
> for
> > rack one, 10.1.2.* for rack 2, etc. Then just filter out the IP address
> by
> > the top bytes, returning "10.1.1" for everything in rack one, "10.1.2"
> for
> > rack 2; Hadoop will be happy
> >
>

>>>
>>
>


Re: rack awareness help

2010-03-18 Thread Michael Thomas

On 03/18/2010 06:21 PM, Michael Thomas wrote:

On 03/17/2010 08:34 PM, Mag Gam wrote:

Well, I didn't really solve the problem. Now I have even more questions.

I came across this script,
http://wiki.apache.org/hadoop/topology_rack_awareness_scripts

but it makes no sense to me! Can someone please try to explain what
its trying to do?


MikeThomas:

Your script isn't working for me. I think there are some syntax
errors. Is this how its supposed to look: http://pastebin.ca/1844287


Not quite. A couple of lines got incorrectly wrapped. It should look
like this:

http://pastebin.ca/1845286


One more miswrapped line.  I hate auto-wrapping...

http://pastebin.ca/1845290

--Mike



--Mike


On Thu, Mar 4, 2010 at 10:30 PM, Jeff
Hammerbacher wrote:

Hey Mag,

Glad you have solved the problem. I've created a JIRA ticket to
improve the
existing documentation:
https://issues.apache.org/jira/browse/HADOOP-6616.
If you have some time, it would be useful to hear what could be added
to the
existing documentation that would have helped you figure this out
sooner.

Thanks,
Jeff

On Thu, Mar 4, 2010 at 3:39 PM, Mag Gam wrote:


Thanks everyone for explaining this to me instead of giving me RTFM!

I will play around with it and see how far I get.



On Thu, Mar 4, 2010 at 9:21 AM, Steve Loughran
wrote:

Allen Wittenauer wrote:


On 3/3/10 5:01 PM, "Mag Gam" wrote:


Thanks Alan! Your presentation is very nice!


Thanks. :)


"If you don't provide a script for rack awareness, it treats every
node as if it was its own rack". I am using the default settings and
the report still says only 1 rack.


Let's take a different approach to convince you. :)

Think about the question: Is there a difference between all nodes in

one

rack vs. every node acting as a lone rack?

The answer is no, there isn't any difference. In both cases, all
copies
of
the blocks can go to pretty much any node. When a MR job runs, every

node

would either be considered 'off rack' or 'rack-local'.

So there is no difference.



Do you mind sharing a script with us on how you determine a rack?
and
a sample  syntax?


Michael has already posted his, so I'll skip this one. :)



Think Mag probably wanted a shell script.

Mag, give your machines IPv4 addresses that map to rack number.
10.1.1.*

for

rack one, 10.1.2.* for rack 2, etc. Then just filter out the IP
address

by

the top bytes, returning "10.1.1" for everything in rack one, "10.1.2"

for

rack 2; Hadoop will be happy















Re: rack awareness help

2010-03-18 Thread Michael Thomas

On 03/17/2010 08:34 PM, Mag Gam wrote:

Well,  I didn't really solve the problem. Now I have even more questions.

I came across this script,
http://wiki.apache.org/hadoop/topology_rack_awareness_scripts

but it makes no sense to me! Can someone please try to explain what
its trying to do?


MikeThomas:

Your script isn't working for me. I think there are some syntax
errors. Is this how its supposed to look: http://pastebin.ca/1844287


Not quite.  A couple of lines got incorrectly wrapped.  It should look 
like this:


http://pastebin.ca/1845286

--Mike


On Thu, Mar 4, 2010 at 10:30 PM, Jeff Hammerbacher  wrote:

Hey Mag,

Glad you have solved the problem. I've created a JIRA ticket to improve the
existing documentation: https://issues.apache.org/jira/browse/HADOOP-6616.
If you have some time, it would be useful to hear what could be added to the
existing documentation that would have helped you figure this out sooner.

Thanks,
Jeff

On Thu, Mar 4, 2010 at 3:39 PM, Mag Gam  wrote:


Thanks everyone for explaining this to me instead of giving me RTFM!

I will play around with it and see how far I get.



On Thu, Mar 4, 2010 at 9:21 AM, Steve Loughran  wrote:

Allen Wittenauer wrote:


On 3/3/10 5:01 PM, "Mag Gam"  wrote:


Thanks Alan! Your presentation is very nice!


Thanks. :)


"If you don't provide a script for rack awareness, it treats every
node as if it was its own rack". I am using the default settings and
the report still says only 1 rack.


Let's take a different approach to convince you. :)

Think about the question:  Is there a difference between all nodes in

one

rack vs. every node acting as a lone rack?

The answer is no, there isn't any difference.  In both cases, all copies
of
the blocks can go to pretty much any node. When a MR job runs, every

node

would either be considered 'off rack' or 'rack-local'.

So there is no difference.



Do you mind sharing a script with us on how you determine a rack? and
a samplesyntax?


Michael has already posted his, so I'll skip this one. :)



Think Mag probably wanted a shell script.

Mag, give your machines IPv4 addresses that map to rack number. 10.1.1.*

for

rack one, 10.1.2.* for rack 2, etc. Then just filter out the IP address

by

the top bytes, returning "10.1.1" for everything in rack one, "10.1.2"

for

rack 2; Hadoop will be happy












Re: rack awareness help

2010-03-18 Thread Michael Thomas

On 03/18/2010 05:41 PM, Mag Gam wrote:

Chris:

This clears up my questions a lot! Thankyou.

So, if I have 4 data servers and I want 2 racks. I can do this

#!/bin/bash
#rack1.sh
echo rack1

#!/bin/bash
#rack2.sh
echo rack2


So, I can do this for 2 servers

<property>
  <name>topology.script.file.name</name>
  <value>rack1.sh</value>
</property>

And for the other 2 servers, I can do this:

<property>
  <name>topology.script.file.name</name>
  <value>rack2.sh</value>
</property>


correct?


Incorrect.  You only specify a single topology.script.file.name on the 
namenode.  This attribute is ignored on the datanodes.


Also be aware that the namenode invokes the script with the _ip address_ 
of each datanode, not the hostname of each datanode.  If you need to 
convert the ip address to a hostname in order to determine which rack 
the datanode is on, then you have to put that conversion logic in your 
topology.script.file.name.
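
As a rough sketch (hypothetical naming scheme, assuming hostnames of the
form nodeNN.rackM.example), that conversion could look like:

#!/bin/sh
# Sketch only: reverse-resolve each IP the namenode passes, then take the
# rack label from the second component of the hostname.
for ip in "$@" ; do
    host=`host "$ip" | awk '{print $NF}' | sed 's/\.$//'`
    rack=`echo "$host" | awk -F. '{print $2}'`
    echo /"${rack:-default-rack}"
done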


--Mike


On Thu, Mar 18, 2010 at 3:15 AM, Christopher Tubbs  wrote:

Hadoop will identify data nodes in your cluster by name and execute
your script with the data node as an argument. The expected output of
your script is the name of the rack on which it is located.

The script you referenced takes the node name as an argument ($1), and
crawls through a separate file looking up that node in the left
column, and printing the value in the second column if it finds it.

If you were to use this script, you would just create the topology
file that lists all your nodes by name/ip on the left and the rack
they are in on the right.

On Wed, Mar 17, 2010 at 11:34 PM, Mag Gam  wrote:

Well,  I didn't really solve the problem. Now I have even more questions.

I came across this script,
http://wiki.apache.org/hadoop/topology_rack_awareness_scripts

but it makes no sense to me! Can someone please try to explain what
its trying to do?


MikeThomas:

Your script isn't working for me. I think there are some syntax
errors. Is this how its supposed to look: http://pastebin.ca/1844287

thanks



On Thu, Mar 4, 2010 at 10:30 PM, Jeff Hammerbacher  wrote:

Hey Mag,

Glad you have solved the problem. I've created a JIRA ticket to improve the
existing documentation: https://issues.apache.org/jira/browse/HADOOP-6616.
If you have some time, it would be useful to hear what could be added to the
existing documentation that would have helped you figure this out sooner.

Thanks,
Jeff

On Thu, Mar 4, 2010 at 3:39 PM, Mag Gam  wrote:


Thanks everyone for explaining this to me instead of giving me RTFM!

I will play around with it and see how far I get.



On Thu, Mar 4, 2010 at 9:21 AM, Steve Loughran  wrote:

Allen Wittenauer wrote:


On 3/3/10 5:01 PM, "Mag Gam"  wrote:


Thanks Alan! Your presentation is very nice!


Thanks. :)


"If you don't provide a script for rack awareness, it treats every
node as if it was its own rack". I am using the default settings and
the report still says only 1 rack.


Let's take a different approach to convince you. :)

Think about the question:  Is there a difference between all nodes in

one

rack vs. every node acting as a lone rack?

The answer is no, there isn't any difference.  In both cases, all copies
of
the blocks can go to pretty much any node. When a MR job runs, every

node

would either be considered 'off rack' or 'rack-local'.

So there is no difference.



Do you mind sharing a script with us on how you determine a rack? and
a samplesyntax?


Michael has already posted his, so I'll skip this one. :)



Think Mag probably wanted a shell script.

Mag, give your machines IPv4 addresses that map to rack number. 10.1.1.*

for

rack one, 10.1.2.* for rack 2, etc. Then just filter out the IP address

by

the top bytes, returning "10.1.1" for everything in rack one, "10.1.2"

for

rack 2; Hadoop will be happy
















Re: rack awareness help

2010-03-18 Thread Mag Gam
Chris:

This clears up my questions a lot! Thank you.

So, if I have 4 data servers and I want 2 racks, I can do this:

#!/bin/bash
#rack1.sh
echo rack1

#!/bin/bash
#rack2.sh
echo rack2


So, I can do this for 2 servers:

<property>
  <name>topology.script.file.name</name>
  <value>rack1.sh</value>
</property>

And for the other 2 servers, I can do this:

<property>
  <name>topology.script.file.name</name>
  <value>rack2.sh</value>
</property>


correct?


On Thu, Mar 18, 2010 at 3:15 AM, Christopher Tubbs  wrote:
> Hadoop will identify data nodes in your cluster by name and execute
> your script with the data node as an argument. The expected output of
> your script is the name of the rack on which it is located.
>
> The script you referenced takes the node name as an argument ($1), and
> crawls through a separate file looking up that node in the left
> column, and printing the value in the second column if it finds it.
>
> If you were to use this script, you would just create the topology
> file that lists all your nodes by name/ip on the left and the rack
> they are in on the right.
>
> On Wed, Mar 17, 2010 at 11:34 PM, Mag Gam  wrote:
>> Well,  I didn't really solve the problem. Now I have even more questions.
>>
>> I came across this script,
>> http://wiki.apache.org/hadoop/topology_rack_awareness_scripts
>>
>> but it makes no sense to me! Can someone please try to explain what
>> its trying to do?
>>
>>
>> MikeThomas:
>>
>> Your script isn't working for me. I think there are some syntax
>> errors. Is this how its supposed to look: http://pastebin.ca/1844287
>>
>> thanks
>>
>>
>>
>> On Thu, Mar 4, 2010 at 10:30 PM, Jeff Hammerbacher  
>> wrote:
>>> Hey Mag,
>>>
>>> Glad you have solved the problem. I've created a JIRA ticket to improve the
>>> existing documentation: https://issues.apache.org/jira/browse/HADOOP-6616.
>>> If you have some time, it would be useful to hear what could be added to the
>>> existing documentation that would have helped you figure this out sooner.
>>>
>>> Thanks,
>>> Jeff
>>>
>>> On Thu, Mar 4, 2010 at 3:39 PM, Mag Gam  wrote:
>>>
 Thanks everyone for explaining this to me instead of giving me RTFM!

 I will play around with it and see how far I get.



 On Thu, Mar 4, 2010 at 9:21 AM, Steve Loughran  wrote:
 > Allen Wittenauer wrote:
 >>
 >> On 3/3/10 5:01 PM, "Mag Gam"  wrote:
 >>
 >>> Thanks Alan! Your presentation is very nice!
 >>
 >> Thanks. :)
 >>
 >>> "If you don't provide a script for rack awareness, it treats every
 >>> node as if it was its own rack". I am using the default settings and
 >>> the report still says only 1 rack.
 >>
 >> Let's take a different approach to convince you. :)
 >>
 >> Think about the question:  Is there a difference between all nodes in
 one
 >> rack vs. every node acting as a lone rack?
 >>
 >> The answer is no, there isn't any difference.  In both cases, all copies
 >> of
 >> the blocks can go to pretty much any node. When a MR job runs, every
 node
 >> would either be considered 'off rack' or 'rack-local'.
 >>
 >> So there is no difference.
 >>
 >>
 >>> Do you mind sharing a script with us on how you determine a rack? and
 >>> a sample   syntax?
 >>
 >> Michael has already posted his, so I'll skip this one. :)
 >>
 >
 > Think Mag probably wanted a shell script.
 >
 > Mag, give your machines IPv4 addresses that map to rack number. 10.1.1.*
 for
 > rack one, 10.1.2.* for rack 2, etc. Then just filter out the IP address
 by
 > the top bytes, returning "10.1.1" for everything in rack one, "10.1.2"
 for
 > rack 2; Hadoop will be happy
 >

>>>
>>
>


Re: rack awareness help

2010-03-18 Thread Christopher Tubbs
Hadoop will identify data nodes in your cluster by name and execute
your script with the data node as an argument. The expected output of
your script is the name of the rack on which it is located.

The script you referenced takes the node name as an argument ($1), and
crawls through a separate file looking up that node in the left
column, and printing the value in the second column if it finds it.

If you were to use this script, you would just create the topology
file that lists all your nodes by name/ip on the left and the rack
they are in on the right.
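
For example (hostnames, addresses, and rack names here are only
placeholders), that topology data file is just two columns, node on the
left and rack on the right:

node01.example.com    /rack1
10.10.1.11            /rack1
node02.example.com    /rack2
10.10.2.11            /rack2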

On Wed, Mar 17, 2010 at 11:34 PM, Mag Gam  wrote:
> Well,  I didn't really solve the problem. Now I have even more questions.
>
> I came across this script,
> http://wiki.apache.org/hadoop/topology_rack_awareness_scripts
>
> but it makes no sense to me! Can someone please try to explain what
> its trying to do?
>
>
> MikeThomas:
>
> Your script isn't working for me. I think there are some syntax
> errors. Is this how its supposed to look: http://pastebin.ca/1844287
>
> thanks
>
>
>
> On Thu, Mar 4, 2010 at 10:30 PM, Jeff Hammerbacher  
> wrote:
>> Hey Mag,
>>
>> Glad you have solved the problem. I've created a JIRA ticket to improve the
>> existing documentation: https://issues.apache.org/jira/browse/HADOOP-6616.
>> If you have some time, it would be useful to hear what could be added to the
>> existing documentation that would have helped you figure this out sooner.
>>
>> Thanks,
>> Jeff
>>
>> On Thu, Mar 4, 2010 at 3:39 PM, Mag Gam  wrote:
>>
>>> Thanks everyone for explaining this to me instead of giving me RTFM!
>>>
>>> I will play around with it and see how far I get.
>>>
>>>
>>>
>>> On Thu, Mar 4, 2010 at 9:21 AM, Steve Loughran  wrote:
>>> > Allen Wittenauer wrote:
>>> >>
>>> >> On 3/3/10 5:01 PM, "Mag Gam"  wrote:
>>> >>
>>> >>> Thanks Alan! Your presentation is very nice!
>>> >>
>>> >> Thanks. :)
>>> >>
>>> >>> "If you don't provide a script for rack awareness, it treats every
>>> >>> node as if it was its own rack". I am using the default settings and
>>> >>> the report still says only 1 rack.
>>> >>
>>> >> Let's take a different approach to convince you. :)
>>> >>
>>> >> Think about the question:  Is there a difference between all nodes in
>>> one
>>> >> rack vs. every node acting as a lone rack?
>>> >>
>>> >> The answer is no, there isn't any difference.  In both cases, all copies
>>> >> of
>>> >> the blocks can go to pretty much any node. When a MR job runs, every
>>> node
>>> >> would either be considered 'off rack' or 'rack-local'.
>>> >>
>>> >> So there is no difference.
>>> >>
>>> >>
>>> >>> Do you mind sharing a script with us on how you determine a rack? and
>>> >>> a sample   syntax?
>>> >>
>>> >> Michael has already posted his, so I'll skip this one. :)
>>> >>
>>> >
>>> > Think Mag probably wanted a shell script.
>>> >
>>> > Mag, give your machines IPv4 addresses that map to rack number. 10.1.1.*
>>> for
>>> > rack one, 10.1.2.* for rack 2, etc. Then just filter out the IP address
>>> by
>>> > the top bytes, returning "10.1.1" for everything in rack one, "10.1.2"
>>> for
>>> > rack 2; Hadoop will be happy
>>> >
>>>
>>
>


Re: rack awareness help

2010-03-17 Thread Mag Gam
Well,  I didn't really solve the problem. Now I have even more questions.

I came across this script,
http://wiki.apache.org/hadoop/topology_rack_awareness_scripts

but it makes no sense to me! Can someone please try to explain what
its trying to do?


MikeThomas:

Your script isn't working for me. I think there are some syntax
errors. Is this how its supposed to look: http://pastebin.ca/1844287

thanks



On Thu, Mar 4, 2010 at 10:30 PM, Jeff Hammerbacher  wrote:
> Hey Mag,
>
> Glad you have solved the problem. I've created a JIRA ticket to improve the
> existing documentation: https://issues.apache.org/jira/browse/HADOOP-6616.
> If you have some time, it would be useful to hear what could be added to the
> existing documentation that would have helped you figure this out sooner.
>
> Thanks,
> Jeff
>
> On Thu, Mar 4, 2010 at 3:39 PM, Mag Gam  wrote:
>
>> Thanks everyone for explaining this to me instead of giving me RTFM!
>>
>> I will play around with it and see how far I get.
>>
>>
>>
>> On Thu, Mar 4, 2010 at 9:21 AM, Steve Loughran  wrote:
>> > Allen Wittenauer wrote:
>> >>
>> >> On 3/3/10 5:01 PM, "Mag Gam"  wrote:
>> >>
>> >>> Thanks Alan! Your presentation is very nice!
>> >>
>> >> Thanks. :)
>> >>
>> >>> "If you don't provide a script for rack awareness, it treats every
>> >>> node as if it was its own rack". I am using the default settings and
>> >>> the report still says only 1 rack.
>> >>
>> >> Let's take a different approach to convince you. :)
>> >>
>> >> Think about the question:  Is there a difference between all nodes in
>> one
>> >> rack vs. every node acting as a lone rack?
>> >>
>> >> The answer is no, there isn't any difference.  In both cases, all copies
>> >> of
>> >> the blocks can go to pretty much any node. When a MR job runs, every
>> node
>> >> would either be considered 'off rack' or 'rack-local'.
>> >>
>> >> So there is no difference.
>> >>
>> >>
>> >>> Do you mind sharing a script with us on how you determine a rack? and
>> >>> a sample   syntax?
>> >>
>> >> Michael has already posted his, so I'll skip this one. :)
>> >>
>> >
>> > Think Mag probably wanted a shell script.
>> >
>> > Mag, give your machines IPv4 addresses that map to rack number. 10.1.1.*
>> for
>> > rack one, 10.1.2.* for rack 2, etc. Then just filter out the IP address
>> by
>> > the top bytes, returning "10.1.1" for everything in rack one, "10.1.2"
>> for
>> > rack 2; Hadoop will be happy
>> >
>>
>


Re: rack awareness help

2010-03-04 Thread Jeff Hammerbacher
Hey Mag,

Glad you have solved the problem. I've created a JIRA ticket to improve the
existing documentation: https://issues.apache.org/jira/browse/HADOOP-6616.
If you have some time, it would be useful to hear what could be added to the
existing documentation that would have helped you figure this out sooner.

Thanks,
Jeff

On Thu, Mar 4, 2010 at 3:39 PM, Mag Gam  wrote:

> Thanks everyone for explaining this to me instead of giving me RTFM!
>
> I will play around with it and see how far I get.
>
>
>
> On Thu, Mar 4, 2010 at 9:21 AM, Steve Loughran  wrote:
> > Allen Wittenauer wrote:
> >>
> >> On 3/3/10 5:01 PM, "Mag Gam"  wrote:
> >>
> >>> Thanks Alan! Your presentation is very nice!
> >>
> >> Thanks. :)
> >>
> >>> "If you don't provide a script for rack awareness, it treats every
> >>> node as if it was its own rack". I am using the default settings and
> >>> the report still says only 1 rack.
> >>
> >> Let's take a different approach to convince you. :)
> >>
> >> Think about the question:  Is there a difference between all nodes in
> one
> >> rack vs. every node acting as a lone rack?
> >>
> >> The answer is no, there isn't any difference.  In both cases, all copies
> >> of
> >> the blocks can go to pretty much any node. When a MR job runs, every
> node
> >> would either be considered 'off rack' or 'rack-local'.
> >>
> >> So there is no difference.
> >>
> >>
> >>> Do you mind sharing a script with us on how you determine a rack? and
> >>> a sample   syntax?
> >>
> >> Michael has already posted his, so I'll skip this one. :)
> >>
> >
> > Think Mag probably wanted a shell script.
> >
> > Mag, give your machines IPv4 addresses that map to rack number. 10.1.1.*
> for
> > rack one, 10.1.2.* for rack 2, etc. Then just filter out the IP address
> by
> > the top bytes, returning "10.1.1" for everything in rack one, "10.1.2"
> for
> > rack 2; Hadoop will be happy
> >
>


Re: rack awareness help

2010-03-04 Thread Mag Gam
Thanks everyone for explaining this to me instead of giving me RTFM!

I will play around with it and see how far I get.



On Thu, Mar 4, 2010 at 9:21 AM, Steve Loughran  wrote:
> Allen Wittenauer wrote:
>>
>> On 3/3/10 5:01 PM, "Mag Gam"  wrote:
>>
>>> Thanks Alan! Your presentation is very nice!
>>
>> Thanks. :)
>>
>>> "If you don't provide a script for rack awareness, it treats every
>>> node as if it was its own rack". I am using the default settings and
>>> the report still says only 1 rack.
>>
>> Let's take a different approach to convince you. :)
>>
>> Think about the question:  Is there a difference between all nodes in one
>> rack vs. every node acting as a lone rack?
>>
>> The answer is no, there isn't any difference.  In both cases, all copies
>> of
>> the blocks can go to pretty much any node. When a MR job runs, every node
>> would either be considered 'off rack' or 'rack-local'.
>>
>> So there is no difference.
>>
>>
>>> Do you mind sharing a script with us on how you determine a rack? and
>>> a sample   syntax?
>>
>> Michael has already posted his, so I'll skip this one. :)
>>
>
> Think Mag probably wanted a shell script.
>
> Mag, give your machines IPv4 addresses that map to rack number. 10.1.1.* for
> rack one, 10.1.2.* for rack 2, etc. Then just filter out the IP address by
> the top bytes, returning "10.1.1" for everything in rack one, "10.1.2" for
> rack 2; Hadoop will be happy
>


Re: rack awareness help

2010-03-04 Thread Steve Loughran

Allen Wittenauer wrote:

On 3/3/10 5:01 PM, "Mag Gam"  wrote:


Thanks Alan! Your presentation is very nice!


Thanks. :)


"If you don't provide a script for rack awareness, it treats every
node as if it was its own rack". I am using the default settings and
the report still says only 1 rack.


Let's take a different approach to convince you. :)

Think about the question:  Is there a difference between all nodes in one
rack vs. every node acting as a lone rack?

The answer is no, there isn't any difference.  In both cases, all copies of
the blocks can go to pretty much any node. When a MR job runs, every node
would either be considered 'off rack' or 'rack-local'.

So there is no difference.



Do you mind sharing a script with us on how you determine a rack? and
a sample   syntax?


Michael has already posted his, so I'll skip this one. :)



Think Mag probably wanted a shell script.

Mag, give your machines IPv4 addresses that map to rack number. 10.1.1.* 
for rack one, 10.1.2.* for rack 2, etc. Then just filter out the IP 
address by the top bytes, returning "10.1.1" for everything in rack one, 
"10.1.2" for rack 2; Hadoop will be happy


Re: rack awareness help

2010-03-03 Thread Allen Wittenauer
On 3/3/10 5:01 PM, "Mag Gam"  wrote:

> Thanks Alan! Your presentation is very nice!

Thanks. :)

> "If you don't provide a script for rack awareness, it treats every
> node as if it was its own rack". I am using the default settings and
> the report still says only 1 rack.

Let's take a different approach to convince you. :)

Think about the question:  Is there a difference between all nodes in one
rack vs. every node acting as a lone rack?

The answer is no, there isn't any difference.  In both cases, all copies of
the blocks can go to pretty much any node. When a MR job runs, every node
would either be considered 'off rack' or 'rack-local'.

So there is no difference.


> Do you mind sharing a script with us on how you determine a rack? and
> a sample   syntax?

Michael has already posted his, so I'll skip this one. :)



Re: rack awareness help

2010-03-03 Thread Michael Thomas
Hi,

From what I've observed, if you don't provide a script for rack
awareness, hadoop treats all nodes as if they belong to the same rack.

Here is the section in hadoop-site.xml where we define our rack
awareness script:

<property>
  <name>topology.script.file.name</name>
  <value>/usr/bin/ip-to-rack.sh</value>
</property>


This is the ip-to-rack.sh script that we use on our cluster.  It assumes
that the reverse lookup of the IP returns a hostname of the form
'compute-x-y', where 'x' is the rack id.  This type of naming is common
for Rocks-managed clusters:

#!/bin/sh

# The default rule assumes that the nodes are connected to the PDU and switch
# located in the same rack.  Only the exceptions need to be
# explicitly listed here.
for ip in $@ ; do
    hostname=`nslookup $ip | grep "name =" | awk '{print $4}' | sed -e 's/\.local\.$//' `
    case $hostname in
        compute-14-3) rack="/Rack15" ;;
        *)
            rack=`echo $hostname | sed -e 's/^[a-z]*-\([0-9]*\)-[0-9]*.*/\/Rack\1/'`
            ;;
    esac
    echo $rack
done
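
As a quick sanity check (the address and hostname here are made up), you
can run it by hand; an IP that reverse-resolves to compute-15-3.local
would come back as:

$ /usr/bin/ip-to-rack.sh 10.1.15.3
/Rack15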

Hope this helps,

--Mike

On 03/03/2010 05:01 PM, Mag Gam wrote:
> Thanks Alan! Your presentation is very nice!
> 
> "If you don't provide a script for rack awareness, it treats every
> node as if it was its own rack". I am using the default settings and
> the report still says only 1 rack.
> 
> 
> Do you mind sharing a script with us on how you determine a rack? and
> a sample   syntax?
> 
> TIA
> 
> 
> 
> 
> On Wed, Mar 3, 2010 at 11:57 AM, Allen Wittenauer
>  wrote:
>>
>>
>>
>> On 3/3/10 4:11 AM, "Mag Gam"  wrote:
>>> An example would be very helpful. There is only 1 paragraph about this
>>> but its far too important not to have an example or two.
>>
>> I covered this in my preso to apachecon last year:
>>
>> http://wiki.apache.org/hadoop/HadoopPresentations?action=AttachFile&do=view&;
>> target=aw-apachecon-eu-2009.pdf
>>
>> aka
>>
>> http://bit.ly/d3UU4A
>>
>> You might find the example in/out helpful.  No code, but it is (seriously)
>> trivial to write.
>>
>>
>>






Re: rack awareness help

2010-03-03 Thread Mag Gam
Thanks Alan! Your presentation is very nice!

"If you don't provide a script for rack awareness, it treats every
node as if it was its own rack". I am using the default settings and
the report still says only 1 rack.


Do you mind sharing a script with us on how you determine a rack, and
a sample syntax?

TIA




On Wed, Mar 3, 2010 at 11:57 AM, Allen Wittenauer
 wrote:
>
>
>
> On 3/3/10 4:11 AM, "Mag Gam"  wrote:
>> An example would be very helpful. There is only 1 paragraph about this
>> but its far too important not to have an example or two.
>
> I covered this in my preso to apachecon last year:
>
> http://wiki.apache.org/hadoop/HadoopPresentations?action=AttachFile&do=view&;
> target=aw-apachecon-eu-2009.pdf
>
> aka
>
> http://bit.ly/d3UU4A
>
> You might find the example in/out helpful.  No code, but it is (seriously)
> trivial to write.
>
>
>


Re: rack awareness help

2010-03-03 Thread Neil Bliss
On Wed, Mar 3, 2010 at 8:57 AM, Allen Wittenauer
wrote:

> On 3/3/10 4:11 AM, "Mag Gam"  wrote:
> > An example would be very helpful. There is only 1 paragraph about this
> > but its far too important not to have an example or two.
>
> I covered this in my preso to apachecon last year:
>
>
> http://wiki.apache.org/hadoop/HadoopPresentations?action=AttachFile&do=view&;
> target=aw-apachecon-eu-2009.pdf
>
> aka
>
> http://bit.ly/d3UU4A
>
> You might find the example in/out helpful.  No code, but it is (seriously)
> trivial to write.
>

So in the "rack awareness" output, are there meaningful distance metrics
that should be provided, or is it simply a "local rack vs. non-local rack"
determination?

thanks,

Neil


Re: rack awareness help

2010-03-03 Thread Allen Wittenauer



On 3/3/10 4:11 AM, "Mag Gam"  wrote:
> An example would be very helpful. There is only 1 paragraph about this
> but its far too important not to have an example or two.

I covered this in my preso to apachecon last year:

http://wiki.apache.org/hadoop/HadoopPresentations?action=AttachFile&do=view&;
target=aw-apachecon-eu-2009.pdf

aka

http://bit.ly/d3UU4A

You might find the example in/out helpful.  No code, but it is (seriously)
trivial to write.




Re: rack awareness help

2010-03-03 Thread Allen Wittenauer



On 3/2/10 7:43 PM, "Mag Gam"  wrote:

> I have a 5 slave servers and I would like to be rackaware meaning each
> server represents 1 rack resulting in 5 racks. I have looked around
> for examples online but could not find anything concrete. Can someone
> please show me an example on how to set this up?


You're in luck.

If you don't provide a script for rack awareness, it treats every node as if
it was its own rack.




Re: rack awareness help

2010-03-03 Thread Mag Gam
hello jeff:

yes, I have looked at this page many times.

I guess I am a bit confused with this line, "The default
implementation of the same runs a script/command configured using
topology.script.file.name. If topology.script.file.name is not set,
the rack id /default-rack is returned for any passed IP address."

Do I have to write a script which will convert my IP address into a
unique rack? Once we get the unique id, HDFS will write according to
the id?

An example would be very helpful. There is only 1 paragraph about this,
but it's far too important not to have an example or two.





On Wed, Mar 3, 2010 at 12:37 AM, Jeff Hammerbacher  wrote:
> Hey,
>
> Have you looked at
> http://hadoop.apache.org/common/docs/r0.20.0/cluster_setup.html#Hadoop+Rack+Awareness?
> If there's something lacking in that documentation, perhaps you can submit a
> patch with a more useful explanation.
>
> Regards,
> Jeff
>
> On Tue, Mar 2, 2010 at 7:43 PM, Mag Gam  wrote:
>
>> I have a 5 slave servers and I would like to be rackaware meaning each
>> server represents 1 rack resulting in 5 racks. I have looked around
>> for examples online but could not find anything concrete. Can someone
>> please show me an example on how to set this up?
>>
>> TIA
>>
>


Re: rack awareness help

2010-03-02 Thread Jeff Hammerbacher
Hey,

Have you looked at
http://hadoop.apache.org/common/docs/r0.20.0/cluster_setup.html#Hadoop+Rack+Awareness?
If there's something lacking in that documentation, perhaps you can submit a
patch with a more useful explanation.

Regards,
Jeff

On Tue, Mar 2, 2010 at 7:43 PM, Mag Gam  wrote:

> I have a 5 slave servers and I would like to be rackaware meaning each
> server represents 1 rack resulting in 5 racks. I have looked around
> for examples online but could not find anything concrete. Can someone
> please show me an example on how to set this up?
>
> TIA
>