[
https://issues.apache.org/jira/browse/HADOOP-2119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12580500#action_12580500
]
Devaraj Das commented on HADOOP-2119:
-------------------------------------
Some comments.
1) Remove the default-node and instead use separate lists for non-local
running and non-running maps. On a cache miss you would then hit a list that
you can also update (remove items and add them to an equivalent list for
running maps, etc.) instead of falling back to the array.
2) Maintain a mapping from each level to the set of nodes in that level (except
level 0). When you look for something to run on a cache miss, look at the TIPs
in the topmost-level cache (if the max cache level is 2, that means all
racks).
3) Change the JobInProgress code to use proper terminology (caches, lists,
etc.).
4) TIPs that don't have locations get added to a special list instead of the
default-node cache (point 1)
5) Change the signature of findNewCachedTask to take the level instead of a
boolean. Also, I think it'd be better to name the method findTaskFromList,
since it caters to both maps and reduces, and reduces really don't have a cache.
6) getCurrentTime should be moved out to a place where it is called exactly
once per findTask
7) I don't think it is that important to move tasks to the back of the list in
case of speculative tasks.
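
To make points 1, 2, 4, 5 and 6 concrete, here is a rough sketch of how the
bookkeeping could look. It is only an illustration under assumptions, not the
code in the patch: the class LocalityTaskIndex, the Tip stand-in and all of
their fields are hypothetical names; only findTask and findTaskFromList are
taken from the comments above, and their signatures here are guesses.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of the suggested bookkeeping; not the JobInProgress code. */
class LocalityTaskIndex {
    /** Minimal stand-in for a TIP; only the fields this sketch needs. */
    static final class Tip {
        final String id;
        final List<String[]> locations; // each entry is {host, rack}; empty if no location
        boolean running;
        long startTime;
        Tip(String id, List<String[]> locations) {
            this.id = id;
            this.locations = locations;
        }
    }

    private final int maxLevel;
    // Point 2: one cache per level, keyed by the node name at that level.
    private final List<Map<String, Set<Tip>>> cacheByLevel = new ArrayList<>();
    // Points 1 and 4: TIPs without locations live in plain lists, not a default-node cache.
    private final List<Tip> nonLocalPending = new LinkedList<>();
    private final List<Tip> nonLocalRunning = new LinkedList<>();

    LocalityTaskIndex(int maxLevel) {
        this.maxLevel = maxLevel;
        for (int i = 0; i < maxLevel; i++) {
            cacheByLevel.add(new HashMap<>());
        }
    }

    void add(Tip tip) {
        if (tip.locations.isEmpty()) {
            nonLocalPending.add(tip);
            return;
        }
        for (String[] path : tip.locations) {
            for (int level = 0; level < maxLevel && level < path.length; level++) {
                cacheByLevel.get(level)
                        .computeIfAbsent(path[level], k -> new LinkedHashSet<>())
                        .add(tip);
            }
        }
    }

    /** trackerPath is {host, rack} of the task tracker asking for work. */
    Tip findTask(String[] trackerPath) {
        long now = System.currentTimeMillis(); // point 6: read the clock once per findTask
        for (int level = 0; level < maxLevel && level < trackerPath.length; level++) {
            Tip tip = findTaskFromList(cacheByLevel.get(level).get(trackerPath[level]), now);
            if (tip != null) {
                return tip;
            }
        }
        // Cache miss: scan every node of the topmost level (all racks when maxLevel is 2).
        if (maxLevel > 0) {
            for (Set<Tip> candidates : cacheByLevel.get(maxLevel - 1).values()) {
                Tip tip = findTaskFromList(candidates, now);
                if (tip != null) {
                    return tip;
                }
            }
        }
        Tip tip = findTaskFromList(nonLocalPending, now);
        if (tip != null) {
            nonLocalRunning.add(tip); // move it to the equivalent list for running maps
        }
        return tip;
    }

    /** Point 5: one helper for any candidate collection, whichever level it came from. */
    private Tip findTaskFromList(Collection<Tip> candidates, long now) {
        if (candidates == null) {
            return null;
        }
        for (Iterator<Tip> it = candidates.iterator(); it.hasNext();) {
            Tip tip = it.next();
            it.remove(); // either a stale entry that is already running, or about to start
            if (!tip.running) {
                tip.running = true;
                tip.startTime = now;
                return tip;
            }
        }
        return null;
    }
}
```

In this sketch a TIP that is started from one level's cache stays in the other
levels' caches and is dropped lazily the next time those entries are scanned,
so no lookup has to walk an array of all TIPs.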
> JobTracker becomes non-responsive if the task trackers finish task too fast
> ---------------------------------------------------------------------------
>
> Key: HADOOP-2119
> URL: https://issues.apache.org/jira/browse/HADOOP-2119
> Project: Hadoop Core
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.16.0
> Reporter: Runping Qi
> Assignee: Amar Kamat
> Priority: Critical
> Fix For: 0.17.0
>
> Attachments: HADOOP-2119-v4.1.patch, hadoop-2119.patch,
> hadoop-jobtracker-thread-dump.txt
>
>
> I ran a job with 0 reducers on a cluster with 390 nodes.
> The mappers ran very fast.
> The jobtracker lagged behind on committing completed mapper tasks.
> The number of running mappers displayed on the web UI kept getting bigger.
> The job tracker eventually stopped responding to the web UI.
> No progress is reported afterwards.
> Job tracker is running on a separate node.
> The job tracker process consumed 100% CPU, with a VM size of 1.01g (it
> reached the heap space limit).