[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Kimball updated MAPREDUCE-1367:
-------------------------------------

    Attachment: MAPREDUCE-1367.patch

Attaching a patch that implements this improvement. This patch includes a test 
case which launches 6 mappers concurrently; these mappers run on a variety of 
schedules (some are faster, some are slower) in an attempt to suss out any race 
conditions that might develop.

The level of parallelism is controlled by a new parameter: 
{{mapreduce.local.map.tasks.maximum}}. This defaults to 1, so behavior is 
unchanged when the parameter is not set.
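
As a rough illustration of the idea (this is not the code in the attached patch), a bounded thread pool can cap how many map tasks run at once. The class name {{ParallelMapSketch}}, the method {{runMaps}}, and the sleep-based stand-in tasks are all hypothetical; only the notion of a "maximum concurrent map tasks" setting that defaults to 1 comes from the description above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: not LocalJobRunner itself, just the bounded-pool idea.
public class ParallelMapSketch {

    /**
     * Runs numTasks stand-in "map tasks" on at most maxParallel threads
     * and returns the highest number of tasks observed running at once.
     */
    static int runMaps(int numTasks, int maxParallel) {
        // Treat values below 1 as 1, mirroring a default of one task at a time.
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, maxParallel));
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        List<Future<?>> futures = new ArrayList<>();
        try {
            for (int i = 0; i < numTasks; i++) {
                // Varied schedules: some stand-in tasks are faster, some slower.
                final long sleepMs = 10L * (i % 3 + 1);
                futures.add(pool.submit(() -> {
                    int now = running.incrementAndGet();
                    peak.accumulateAndGet(now, Math::max);
                    try {
                        Thread.sleep(sleepMs); // placeholder for real map work
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        running.decrementAndGet();
                    }
                }));
            }
            // Wait for every task so that failures surface to the caller.
            for (Future<?> f : futures) {
                f.get();
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("map task failed", e);
        } finally {
            pool.shutdown();
        }
        return peak.get();
    }
}
```

With {{maxParallel}} set to 1 the tasks run strictly one after another, which corresponds to the unchanged default behavior; with a larger value the pool never runs more than that many tasks at the same time.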

I also tested this by running the 'pi' example from the command line:
{code}
bin/hadoop jar hadoop-mapred-examples-0.22.0-SNAPSHOT.jar pi \
  -D mapreduce.jobtracker.address=local \
  -D mapreduce.local.map.tasks.maximum=2 \
  20 5000000
{code}

With {{mapreduce.local.map.tasks.maximum}} set to 1, this takes 13.5 seconds on 
my machine. With it set to 2 or above (I have two cores), the runtime drops to 
8.5 seconds.

> LocalJobRunner should support parallel mapper execution
> -------------------------------------------------------
>
>                 Key: MAPREDUCE-1367
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1367
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Aaron Kimball
>            Assignee: Aaron Kimball
>         Attachments: MAPREDUCE-1367.patch
>
>
> The LocalJobRunner currently supports only a single execution thread. Given 
> the prevalence of multi-core CPUs, it makes sense to allow users to run 
> multiple tasks in parallel for improved performance on small (local-only) 
> jobs.

