Hi Girish

Lemme try answering your queries

1. For multiple nodes I understand I should add the URL of the secondary nodes 
in the slaves.xml. Am I correct?

Bejoy: AFAIK it is not an XML file - the slave hostnames go into the plain-text 
conf/slaves file on the master, one per line, and you also need to add entries 
for those hostnames in /etc/hosts on each node so they resolve.
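As a sketch (the hostnames and IPs here are made up), the two files could look 
like this:

```
# conf/slaves -- on the master; one slave hostname per line (plain text, not XML)
slave1
slave2

# /etc/hosts -- on every node, so the hostnames resolve
192.168.1.10   master
192.168.1.11   slave1
192.168.1.12   slave2
```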

2. What should be installed on the secondary nodes for executing a job/task?

Bejoy: In small clusters you typically run the NameNode and JobTracker on one 
node, the SecondaryNameNode on another node, and a DataNode and TaskTracker on 
all the other nodes. Installing Hadoop itself on every node is enough; your job 
code does not need to be installed on them (see the next answer).

3. I understand I can set the map/reduce classes as a jar to the Job - through 
the JobConf - so does this mean I need not really install/copy my map/reduce 
code on all the secondary nodes?

Bejoy: There is no difference in submitting jobs compared to a pseudo-distributed 
setup. The MapReduce framework distributes the job jar and any other required 
files to the nodes that run tasks, so you do not copy your map/reduce code to 
the slaves yourself. It is better to have a dedicated client node for launching 
jobs.
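A minimal old-API driver sketch showing how the job jar gets attached through 
JobConf (the driver class name and input/output paths are hypothetical; the 
identity mapper/reducer are just placeholders for your own classes):

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.IdentityReducer;

public class MyJobDriver {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(MyJobDriver.class);
        conf.setJobName("my-job");

        // Locates the jar that contains this class; the framework copies it
        // into HDFS and ships it to every node that runs a task, so nothing
        // has to be pre-installed on the slave nodes.
        conf.setJarByClass(MyJobDriver.class);

        // Placeholders -- substitute your own Mapper/Reducer classes here.
        conf.setMapperClass(IdentityMapper.class);
        conf.setReducerClass(IdentityReducer.class);
        conf.setOutputKeyClass(LongWritable.class);
        conf.setOutputValueClass(Text.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}
```

You would then launch it from the client node with something like 
`hadoop jar myjob.jar MyJobDriver /input /output`; since this needs a running 
cluster, treat it as a sketch rather than a tested program.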

4. How do I route the data to these nodes? Is it required for the Map Reduce to 
execute on the machines which has the data stored (DFS)?

Bejoy: The MR framework takes care of this. When scheduling map tasks the 
JobTracker considers data locality: it prefers a TaskTracker on the node that 
holds the HDFS block being read (node-local), then falls back to rack-local, 
then to any free node. You do not route data yourself.


Regards
Bejoy KS

Sent from handheld, please excuse typos.

-----Original Message-----
From: Girish Ravi <giri...@srmtech.com>
Date: Tue, 12 Jun 2012 06:55:26 
To: mapreduce-user@hadoop.apache.org
Reply-To: mapreduce-user@hadoop.apache.org
Subject: Map/Reduce | Multiple node configuration

Hello Team,

I have started learning about Hadoop MapReduce and was able to set up a 
single-node cluster execution environment.

I want to now extend this to a multi node environment.
I have the following questions and it would be very helpful if somebody can help:
1. For multiple nodes I understand I should add the URL of the secondary nodes 
in the slaves.xml. Am I correct?
2. What should be installed on the secondary nodes for executing a job/task?
3. I understand I can set the map/reduce classes as a jar to the Job - through 
the JobConf - so does this mean I need not really install/copy my map/reduce 
code on all the secondary nodes?
4. How do I route the data to these nodes? Is it required for the Map Reduce to 
execute on the machines which has the data stored (DFS)?

Any samples for doing this would help. Suggestions are welcome.

Regards
Girish
Ph: +91-9916212114
