[ 
https://issues.apache.org/jira/browse/HADOOP-2119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12580500#action_12580500
 ] 

Devaraj Das commented on HADOOP-2119:
-------------------------------------

Some comments. 
1) Remove the default-node cache and use a separate list for non-local maps 
(running and non-running). On a cache miss you would then fall through to this 
list instead of the array, and the list can be updated as you go (remove items 
and add them to an equivalent list for running maps, etc.).
2) Maintain a mapping from each level to the set of nodes at that level (except 
level 0). On a cache miss, when you look for something to run, you should look 
at the TIPs in the topmost-level cache (if the max cache level is 2, that means 
all racks). 
3) Change the JobInProgress code to use proper terminology (caches, lists, 
etc.).
4) TIPs that don't have locations should be added to a special list instead of 
the default-node cache (see point 1).
5) Change the signature of findNewCachedTask to take the level instead of a 
boolean. Also, I think it would be better to call the method findTaskFromList, 
since it caters to both maps and reduces, and reduces really don't have a cache.
6) The getCurrentTime call should be moved to a place where it is made exactly 
once per findTask.
7) I don't think it is that important to move tasks to the back of the list in 
the case of speculative tasks.
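
To make points 1, 2 and 4 concrete, here is a rough sketch of the lookup order 
being suggested. It is not taken from the patch: the class MapTaskCaches, its 
fields and its methods are hypothetical names, tasks are plain strings standing 
in for TIPs, and node names are assumed to be unique across levels.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch only; none of these names come from the actual patch. */
class MapTaskCaches {
    private final int maxLevel;
    // Node name (host at level 0, rack at level 1, ...) -> runnable tasks cached there.
    private final Map<String, Set<String>> cache = new HashMap<>();
    // Point 2: level -> nodes seen at that level (level 0 is deliberately not tracked).
    private final Map<Integer, Set<String>> nodesAtLevel = new HashMap<>();
    // Points 1 and 4: tasks without locations live in their own list, not a default-node cache.
    private final Set<String> nonLocalMaps = new LinkedHashSet<>();
    // Tasks that have been handed out.
    private final Set<String> runningMaps = new LinkedHashSet<>();

    MapTaskCaches(int maxLevel) {
        this.maxLevel = maxLevel;
    }

    /** path lists one location of the task from host (level 0) upward, e.g. [host1, rack1]. */
    void addTask(String task, List<String> path) {
        if (path.isEmpty()) {
            nonLocalMaps.add(task);
            return;
        }
        for (int level = 0; level < maxLevel && level < path.size(); level++) {
            String node = path.get(level);
            cache.computeIfAbsent(node, k -> new LinkedHashSet<>()).add(task);
            if (level > 0) {
                nodesAtLevel.computeIfAbsent(level, k -> new LinkedHashSet<>()).add(node);
            }
        }
    }

    /** Returns a task for the tracker at trackerPath, or null if nothing is runnable. */
    String findTask(List<String> trackerPath) {
        // Closest first: the tracker's own host cache, then its rack cache, and so on.
        for (int level = 0; level < maxLevel && level < trackerPath.size(); level++) {
            String task = firstOf(cache.get(trackerPath.get(level)));
            if (task != null) {
                return markRunning(task);
            }
        }
        // Cache miss: scan the caches of every node at the topmost level (all racks if maxLevel is 2).
        for (String node : nodesAtLevel.getOrDefault(maxLevel - 1, Collections.emptySet())) {
            String task = firstOf(cache.get(node));
            if (task != null) {
                return markRunning(task);
            }
        }
        // Last resort: tasks that have no location information at all.
        String task = firstOf(nonLocalMaps);
        return task == null ? null : markRunning(task);
    }

    boolean isRunning(String task) {
        return runningMaps.contains(task);
    }

    private static String firstOf(Set<String> tasks) {
        return (tasks == null || tasks.isEmpty()) ? null : tasks.iterator().next();
    }

    // Removes the task from every cache and list it appears in, then records it as running.
    // A real implementation would likely avoid this full scan; it is kept simple here.
    private String markRunning(String task) {
        for (Set<String> tasks : cache.values()) {
            tasks.remove(task);
        }
        nonLocalMaps.remove(task);
        runningMaps.add(task);
        return task;
    }
}
```

With this layout a tracker is offered a host-local task first, then a 
rack-local one, then anything cached under any rack, and only then a task from 
the non-local list, so no default-node entry is needed.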


> JobTracker becomes non-responsive if the task trackers finish task too fast
> ---------------------------------------------------------------------------
>
>                 Key: HADOOP-2119
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2119
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.16.0
>            Reporter: Runping Qi
>            Assignee: Amar Kamat
>            Priority: Critical
>             Fix For: 0.17.0
>
>         Attachments: HADOOP-2119-v4.1.patch, hadoop-2119.patch, 
> hadoop-jobtracker-thread-dump.txt
>
>
> I ran a job with 0 reducers on a cluster with 390 nodes.
> The mappers ran very fast.
> The jobtracker lagged behind on committing completed mapper tasks.
> The number of running mappers displayed on the web UI kept getting bigger.
> The job tracker eventually stopped responding to the web UI.
> No progress was reported afterwards.
> The job tracker was running on a separate node.
> The job tracker process consumed 100% CPU, with a VM size of 1.01g (reaching 
> the heap space limit).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
