Well, Map/Reduce and Hadoop by definition run maps in parallel.  I think
you're interested in the following two configuration settings:

mapred.tasktracker.map.tasks.maximum
mapred.tasktracker.reduce.tasks.maximum

These go in hadoop-site.xml and set the maximum number of map and reduce tasks
that run concurrently on each tasktracker (node).  Learn more here:

<http://hadoop.apache.org/core/docs/current/cluster_setup.html#Configuring+the+Hadoop+Daemons>

The total of map slots + reduce slots should be slightly above the number of
cores per node.  So if you have 8 cores per node, setting the map maximum to 6
and the reduce maximum to 4 (10 slots total) would probably be good.
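For example, the hadoop-site.xml entries for an 8-core node might look like
this (the 6/4 values are just the illustrative numbers above; tune them for
your own workload):

```xml
<!-- hadoop-site.xml: per-tasktracker concurrency limits -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>6</value>  <!-- max map tasks running at once on this node -->
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>4</value>  <!-- max reduce tasks running at once on this node -->
</property>
```

Note these are per-tasktracker settings, so you need to restart the
tasktrackers after changing them.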

Hope this helps.

Alex

On Thu, Dec 4, 2008 at 6:42 AM, Aayush Garg <[EMAIL PROTECTED]> wrote:

> Hi,
>
> I am having a 5 node cluster for hadoop usage. All nodes are multi-core.
> I am running a shell command in Map function of my program and this shell
> command takes one file as an input. Many of such files are copied in the
> HDFS.
>
> So in summary map function will run a command like ./run <file1>
> <outputfile1>
>
> Could you please suggest an optimized way to do this, e.g. whether I can use
> the multi-core processing of the nodes and run many such maps in parallel.
>
> Thanks,
> Aayush
>