Parallelize job initialization
------------------------------

                 Key: HADOOP-4664
                 URL: https://issues.apache.org/jira/browse/HADOOP-4664
             Project: Hadoop Core
          Issue Type: Improvement
          Components: mapred
            Reporter: Matei Zaharia


The job init thread currently initializes one job at a time. However, this is a 
lengthy and partly I/O-bound process, because all of the job's block locations 
must be resolved through the namenode and a map of them must be built. 
It can take tens of seconds. As a result, when many small jobs are queued up, 
the cluster sometimes initializes jobs too slowly to reach full utilization. 
It would be better to have a pool of threads that initialize multiple jobs in 
parallel. One thing to be careful of, however, is that these threads must not 
cause deadlocks or hold locks for too long.
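
To make the proposal concrete, here is a minimal sketch of the idea in plain 
Java, not the actual JobTracker code. The class ParallelJobInit, the methods 
initJob and initAll, and the 50 ms sleep are all hypothetical placeholders; 
the sleep stands in for the slow namenode lookups. The point it illustrates is 
that the slow work runs with no lock held, and a lock is taken only briefly to 
publish the result.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelJobInit {

    // Hypothetical stand-in for the slow, partly I/O-bound step
    // (resolving block locations and building the map of them).
    static String initJob(String jobId) throws InterruptedException {
        Thread.sleep(50);
        return jobId + ":initialized";
    }

    // Initializes all jobs using a fixed-size pool of worker threads.
    static List<String> initAll(List<String> jobIds, int numThreads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        final Object lock = new Object();
        final List<String> ready = new ArrayList<>();
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (String jobId : jobIds) {
                pending.add(pool.submit(() -> {
                    String result = initJob(jobId); // slow work, no lock held
                    synchronized (lock) {           // lock held only to publish
                        ready.add(result);
                    }
                    return null;
                }));
            }
            for (Future<?> f : pending) {
                f.get(); // waits for the task and rethrows any failure
            }
        } finally {
            pool.shutdown();
        }
        return ready;
    }

    public static void main(String[] args) throws Exception {
        List<String> ready = initAll(Arrays.asList("job_1", "job_2", "job_3", "job_4"), 2);
        System.out.println(ready.size() + " jobs initialized");
    }
}
```

With a pool of N threads, up to N jobs are initialized at once, so several 
small jobs no longer queue behind a single slow one. Jobs may finish in a 
different order than they were submitted, which a real scheduler would have 
to account for.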

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
