[ 
https://issues.apache.org/jira/browse/HADOOP-4664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12648085#action_12648085
 ] 

matei edited comment on HADOOP-4664 at 11/16/08 10:12 PM:
------------------------------------------------------------------

In some initial testing of this patch on a jobtracker with many old history 
files, I found that the lock in JobHistory on getJobHistoryFileName and 
recoverJobHistoryFile caused most of the threads to block while one thread 
listed the directory, so there was no improvement. However, Amar Kamat 
explained that HADOOP-4372 will help solve this issue. I'll wait for that 
before trying to modify things myself. The patch provided here should still 
help when the job init phase is limited more by CPU than by history file 
scanning and creation.

      was (Author: matei):
    In some initial testing of this patch on a job with a lot of old history 
files, I found that the lock in JobHistory on getJobHistoryFileName and 
recoverJobHistoryFile was causing most of the threads to block while one thread 
listed the directory, leading to no improvement. However, Amar Kamat explained 
that HADOOP-4372 will help solve this issue. I'll wait on that before trying to 
modify things myself. The patch provided here should still help when the job 
init phase is limited more by CPU than by the history file scanning and 
creation.
  
> Parallelize job initialization
> ------------------------------
>
>                 Key: HADOOP-4664
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4664
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Matei Zaharia
>         Attachments: parallel-job-init-v1.patch
>
>
> The job init thread currently initializes one job at a time. This is a 
> lengthy and partly IO-bound process, because all of the job's block 
> locations need to be resolved through the namenode and a map of them needs to 
> be built; it can take tens of seconds. As a result, when many small jobs are 
> queued up, the cluster sometimes initializes jobs too slowly to reach full 
> utilization. It would be better to have a pool of threads that initialize 
> multiple jobs in parallel. One thing to be careful of, however, is not 
> causing deadlocks or holding locks for too long in these threads.
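
The idea in the description above (a pool of threads that initialize several 
jobs at once, without holding a shared lock during the slow part) can be 
sketched roughly as follows. This is only an illustration under stated 
assumptions, not the code in parallel-job-init-v1.patch: the ParallelJobInit 
and Job classes, the initAll method, and the simulated 50 ms delay are all 
hypothetical stand-ins, and the sketch uses the standard 
java.util.concurrent executor API with lambda syntax from later Java versions 
rather than anything from the Hadoop code base.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelJobInit {
    /** Hypothetical stand-in for a queued job; not a Hadoop class. */
    static class Job {
        final String id;
        volatile boolean initialized = false;

        Job(String id) { this.id = id; }

        /** Simulates the slow, partly IO-bound init step. */
        void init() throws InterruptedException {
            Thread.sleep(50); // assumption: placeholder for namenode lookups
            initialized = true;
        }
    }

    // Shared state, guarded by its own monitor and touched only briefly.
    private static final List<String> readyJobIds = new ArrayList<>();

    /** Initializes all jobs on a fixed-size pool; returns how many finished. */
    static int initAll(List<Job> jobs, int numThreads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (Job job : jobs) {
                pending.add(pool.submit(() -> {
                    job.init(); // slow work runs with no shared lock held
                    synchronized (readyJobIds) {
                        readyJobIds.add(job.id); // short critical section
                    }
                    return null;
                }));
            }
            int done = 0;
            for (Future<?> f : pending) {
                f.get(); // rethrows any failure from the worker thread
                done++;
            }
            return done;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Job> jobs = new ArrayList<>();
        for (int i = 0; i < 8; i++) {
            jobs.add(new Job("job_" + i));
        }
        System.out.println("initialized " + initAll(jobs, 4) + " jobs");
    }
}
```

The point of the sketch is the lock scope: each worker does its slow 
initialization outside any shared monitor and takes the lock only to record 
the result, so one slow job does not stall the other threads.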

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
