[ https://issues.apache.org/jira/browse/MAPREDUCE-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Aaron Kimball updated MAPREDUCE-1367:
-------------------------------------
    Attachment: MAPREDUCE-1367.patch

Attaching a patch that implements this improvement.

This patch includes a test case which launches 6 mappers concurrently; these mappers run on a variety of schedules (some are faster, some are slower) in an attempt to suss out any race conditions that might develop.

The level of parallelism is controlled by a new parameter: {{mapreduce.local.map.tasks.maximum}}. This defaults to 1, so behavior is unchanged when the parameter is not set.

I also tested this by running the 'pi' example from the command line:

{code}
bin/hadoop jar hadoop-mapred-examples-0.22.0-SNAPSHOT.jar pi -D mapreduce.jobtracker.address=local -D mapreduce.local.map.tasks.maximum=2 20 5000000
{code}

With {{mapreduce.local.map.tasks.maximum}} set to 1, this takes 13.5 seconds on my machine. With it set to 2 or above (I have two cores), the runtime drops to 8.5 seconds.

> LocalJobRunner should support parallel mapper execution
> -------------------------------------------------------
>
>                 Key: MAPREDUCE-1367
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1367
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Aaron Kimball
>            Assignee: Aaron Kimball
>         Attachments: MAPREDUCE-1367.patch
>
>
> The LocalJobRunner currently supports only a single execution thread. Given
> the prevalence of multi-core CPUs, it makes sense to allow users to run
> multiple tasks in parallel for improved performance on small (local-only)
> jobs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
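
For readers who want a concrete picture of the idea in this update, the following is a minimal sketch of capping local task parallelism with a fixed-size thread pool. It is not the code in MAPREDUCE-1367.patch and it uses no Hadoop classes. The class name LocalParallelSketch, the method runMapTasks, and the RunStats holder are all hypothetical names invented for this illustration. The maxParallel argument stands in for the value of the parallelism parameter described above, and the sleeping tasks only simulate mappers that finish on different schedules.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; this is NOT the code in MAPREDUCE-1367.patch.
class LocalParallelSketch {

    /** Outcome of one simulated run: tasks finished and the peak concurrency observed. */
    static final class RunStats {
        final int completed;
        final int peakConcurrency;

        RunStats(int completed, int peakConcurrency) {
            this.completed = completed;
            this.peakConcurrency = peakConcurrency;
        }
    }

    /**
     * Runs numTasks simulated map tasks with at most maxParallel running at once.
     * maxParallel plays the role of the parallelism setting (assumed default: 1).
     */
    static RunStats runMapTasks(int numTasks, int maxParallel) throws Exception {
        if (maxParallel < 1) {
            throw new IllegalArgumentException("maxParallel must be >= 1");
        }
        // A fixed-size pool never runs more than maxParallel tasks concurrently.
        ExecutorService pool = Executors.newFixedThreadPool(maxParallel);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        AtomicInteger completed = new AtomicInteger();
        List<Future<?>> futures = new ArrayList<>();
        try {
            for (int i = 0; i < numTasks; i++) {
                // Mix faster and slower tasks, loosely mirroring the test described above.
                final long sleepMillis = 10L * (1 + i % 3);
                futures.add(pool.submit(() -> {
                    int now = running.incrementAndGet();
                    peak.accumulateAndGet(now, Math::max);
                    try {
                        Thread.sleep(sleepMillis); // stand-in for real map work
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    } finally {
                        running.decrementAndGet();
                    }
                    completed.incrementAndGet();
                }));
            }
            // Wait for every task; get() rethrows any task failure.
            for (Future<?> f : futures) {
                f.get();
            }
        } finally {
            pool.shutdown();
        }
        return new RunStats(completed.get(), peak.get());
    }

    public static void main(String[] args) throws Exception {
        for (int max : new int[] {1, 2}) {
            long start = System.nanoTime();
            RunStats stats = runMapTasks(6, max);
            long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
            System.out.println("maxParallel=" + max
                    + " completed=" + stats.completed
                    + " peak=" + stats.peakConcurrency
                    + " elapsedMs=" + elapsedMillis);
        }
    }
}
```

With maxParallel set to 1 the tasks run strictly one after another, which corresponds to the single-threaded default. Larger values let tasks overlap up to the cap, so any speedup depends on the number of available cores, as the timings reported above suggest.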