Parallelize job initialization
------------------------------
Key: HADOOP-4664
URL: https://issues.apache.org/jira/browse/HADOOP-4664
Project: Hadoop Core
Issue Type: Improvement
Components: mapred
Reporter: Matei Zaharia
The job init thread currently initializes one job at a time. However, job
initialization is a lengthy and partly IO-bound process: all of the job's block
locations must be resolved through the namenode, and a map of them must be
built. This can take tens of seconds. As a result, when many small jobs are
queued up, the cluster sometimes initializes jobs too slowly to achieve full
utilization. It would be better to have a pool of threads that initializes
multiple jobs in parallel. One thing to be careful of, however, is that these
threads must not cause deadlocks or hold locks for too long.
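A minimal sketch of the proposed thread pool follows. It is written in Java because Hadoop is, but every name in it (`ParallelJobInitializer`, `InitializableJob`, `initialize()`, `schedulerLock`) is hypothetical and does not reflect the actual JobTracker code. The slow namenode work is simulated with a sleep. The sketch illustrates the lock discipline the issue asks for: the IO-bound initialization runs with no shared lock held, and the shared lock is taken only briefly to publish each result.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: initialize queued jobs on a fixed-size thread pool. */
public class ParallelJobInitializer {

    /** Hypothetical stand-in for a job whose initialization is slow and IO-bound. */
    public interface InitializableJob {
        String id();
        /** Assumed to resolve block locations via the namenode and build a map. */
        void initialize() throws Exception;
    }

    private final ExecutorService pool;
    // Stands in for the scheduler's shared lock; guards both lists below.
    private final Object schedulerLock = new Object();
    private final List<String> runnableJobs = new ArrayList<>();
    private final List<String> failedJobs = new ArrayList<>();

    public ParallelJobInitializer(int numThreads) {
        this.pool = Executors.newFixedThreadPool(numThreads);
    }

    /** Queues a job for initialization and returns immediately. */
    public void submit(InitializableJob job) {
        pool.execute(() -> {
            boolean ok;
            try {
                // Slow, IO-bound work runs WITHOUT holding schedulerLock,
                // so other init threads and the scheduler are not blocked.
                job.initialize();
                ok = true;
            } catch (Exception e) {
                ok = false;
            }
            // Hold the shared lock only long enough to publish the outcome.
            synchronized (schedulerLock) {
                if (ok) {
                    runnableJobs.add(job.id());
                } else {
                    failedJobs.add(job.id());
                }
            }
        });
    }

    public List<String> runnableJobs() {
        synchronized (schedulerLock) {
            return new ArrayList<>(runnableJobs);
        }
    }

    public List<String> failedJobs() {
        synchronized (schedulerLock) {
            return new ArrayList<>(failedJobs);
        }
    }

    /** Stops accepting jobs and waits for in-flight initializations to finish. */
    public void shutdown() throws InterruptedException {
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }

    public static void main(String[] args) throws InterruptedException {
        ParallelJobInitializer initializer = new ParallelJobInitializer(4);
        for (int i = 0; i < 8; i++) {
            final String jobId = "job_" + i;
            initializer.submit(new InitializableJob() {
                public String id() { return jobId; }
                // Simulates ~200 ms of IO-bound work (e.g. namenode lookups).
                public void initialize() throws Exception { Thread.sleep(200); }
            });
        }
        initializer.shutdown();
        System.out.println("initialized: " + initializer.runnableJobs().size()
                + ", failed: " + initializer.failedJobs().size());
    }
}
```

With four threads, the eight simulated jobs finish in roughly two rounds instead of eight. The pool size also bounds how many jobs query the namenode at once; choosing that bound is left open here.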