[
https://issues.apache.org/jira/browse/HADOOP-5632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696289#action_12696289
]
Khaled Elmeleegy commented on HADOOP-5632:
------------------------------------------
Methodology: I ran an experiment studying slot utilization in a Hadoop
cluster under heavy load. For the workload, I used three large web-scan jobs
from gridmix with uncompressed input. On my experimental setup, the map
runtime was about 20 seconds. Each job had 16,000 maps. The block size was
configured to be 128MB. I used Hadoop 0.20. Each node was configured with 3 map
slots and 2 reduce slots. I varied the inter-heartbeat interval and measured
the corresponding slot utilization. To measure slot utilization, I plotted a
timeline of the number of slots in use on one of the workers (tasktrackers),
using that tasktracker's log. One tasktracker should be representative of the
rest, since I am using a single cluster and a single type of job, and each job
has more maps than the total number of slots available.
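The measurement above can be sketched as a time-weighted average over the slot
timeline. This is a hypothetical illustration (not Hadoop code): the class name
and sample layout are my own, assuming (timestamp, slots-in-use) samples
reconstructed from the tasktracker log.

```java
// Hypothetical sketch (not Hadoop code): given a timeline of
// (timestampMillis, slotsInUse) samples reconstructed from a
// tasktracker log, compute time-weighted slot utilization.
public class SlotUtilization {
    // Each sample's slotsInUse value holds from its timestamp
    // until the next sample's timestamp.
    static double utilization(long[] timesMs, int[] slotsInUse, int slotCapacity) {
        long busySlotMs = 0;
        long totalMs = timesMs[timesMs.length - 1] - timesMs[0];
        for (int i = 0; i < timesMs.length - 1; i++) {
            long span = timesMs[i + 1] - timesMs[i];
            busySlotMs += span * slotsInUse[i];
        }
        return (double) busySlotMs / (totalMs * slotCapacity);
    }

    public static void main(String[] args) {
        // 3 map slots; usage dips while slots wait for the next heartbeat.
        long[] t = {0, 10_000, 20_000, 30_000};
        int[] used = {3, 1, 3, 3}; // the final sample's value is unused
        System.out.println(utilization(t, used, 3)); // fraction in [0, 1]
    }
}
```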
Results:
For a 10-second inter-heartbeat interval, slot utilization was 62.4%.
For a 1-second inter-heartbeat interval, slot utilization was 93.7%.
For a 100-millisecond inter-heartbeat interval, slot utilization was 99%.
Comments: For a fixed inter-heartbeat interval, the longer the map runtime,
the higher the utilization.
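A first-order model is consistent with both trends. This is my own
back-of-envelope, not part of the experiment: assume a freed slot waits on
average half a heartbeat interval h before the jobtracker can hand it a new
task of runtime t, giving U ≈ t / (t + h/2). It overestimates the measured
numbers (it ignores scheduling latency), but matches their shape.

```java
// Back-of-envelope model (an assumption, not from the experiment):
// a freed slot idles on average h/2 before being refilled, so
//   U ≈ t / (t + h/2)
// where t is the task runtime and h the inter-heartbeat interval.
public class HeartbeatModel {
    static double utilization(double taskRuntimeSec, double heartbeatSec) {
        return taskRuntimeSec / (taskRuntimeSec + heartbeatSec / 2.0);
    }

    public static void main(String[] args) {
        double t = 20.0; // ~20 s map runtime from the experiment
        for (double h : new double[]{10.0, 1.0, 0.1}) {
            System.out.printf("h=%.1fs -> U=%.1f%%%n",
                    h, 100 * utilization(t, h));
        }
    }
}
```

For t = 20 s the model predicts 80%, 97.6%, and 99.8% for the three intervals,
and for fixed h it rises with t, matching the comment above.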
Explanation:
Reducing the inter-heartbeat interval to these minuscule values is
unacceptable, as in a big cluster the jobtracker would be overwhelmed with
heartbeats. What should happen instead is for tasktrackers to send a heartbeat
to the jobtracker asking for a new task as soon as a slot becomes open, rather
than waiting for the next periodic heartbeat. This doesn't happen today
precisely because it could overload the jobtracker with heartbeats.
I found that not only is the heartbeat handler at the jobtracker protected by
a giant lock, which prevents the jobtracker from utilizing more than one CPU
on the machine it's running on, but also that the time spent holding this lock
is not constant: it is proportional to the load on the system. For example,
the heartbeat handler in the jobtracker calls functions like getTasksToKill().
This function scans the list of tasks on the tasktracker to check whether any
should be killed. This list can be arbitrarily long: it tracks the maps of
running jobs that have run, or are still running, on this tasktracker. This is
not scalable.
I observed in the above experiment that the runtime of a single heartbeat
handler can reach 3 ms, about half of which is spent in getTasksToKill().
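One way to avoid the per-heartbeat scan is to maintain an index of pending
kills instead of rescanning every task. The sketch below is purely
illustrative (these are not Hadoop's actual classes or method signatures): it
assumes kills are marked when a job or task dies, so the heartbeat path only
drains a small pending set rather than walking the whole task list under the
giant lock.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration (not Hadoop's actual code): keep a
// per-tracker set of attempts already marked for killing, so a
// heartbeat drains O(#pending kills) work instead of scanning
// O(#tasks) under a global lock.
public class KillIndex {
    private final ConcurrentHashMap<String, Set<String>> toKillByTracker =
            new ConcurrentHashMap<>();

    // Called (rarely) when a job is killed or a task must die.
    void markForKill(String tracker, String attemptId) {
        toKillByTracker
                .computeIfAbsent(tracker, k -> ConcurrentHashMap.newKeySet())
                .add(attemptId);
    }

    // Called on every heartbeat: atomically take the pending kills.
    Set<String> getTasksToKill(String tracker) {
        Set<String> pending = toKillByTracker.remove(tracker);
        return pending == null ? Set.of() : pending;
    }
}
```

Because ConcurrentHashMap handles its own synchronization, this path also
needs no giant lock, which speaks to the finer-grained-locking point below.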
Solution:
1. Break the heartbeat handler into two kinds of handlers: a lightweight one,
invoked on demand by the tasktracker whenever a slot opens up, and a
heavyweight duty-cycle handler, called periodically at a fairly low frequency
to do periodic work like killing tasks and jobs on the tasktracker. As soon as
a slot opens up at a tasktracker, it goes directly to the jobtracker asking
for a new task, invoking the lightweight handler. Periodically, tasktrackers
call the jobtracker for a duty cycle.
2. Use finer-grained locking in the heartbeat handler so the jobtracker can
utilize more CPUs on its machine to help with the heavy load.
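The two-handler split in point 1 can be sketched as follows. This is a
simplified illustration of the idea, not the attached patch: the stub
interface, class, and method names are invented for the example. The periodic
heavy path runs on a timer regardless of slot churn, while freeing a slot
triggers an immediate out-of-band request.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Sketch of the two-handler idea (not the attached patch): the
// tasktracker pings the jobtracker the moment a slot frees up
// (lightweight path), while a low-frequency periodic "duty cycle"
// heartbeat carries the heavy bookkeeping (kill lists, status, ...).
public class TwoTierHeartbeat {
    interface JobTrackerStub {
        void lightweightAsk(String tracker); // just ask for a new task
        void dutyCycle(String tracker);      // heavyweight periodic work
    }

    private final JobTrackerStub jt;
    private final String tracker;
    private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r);
                t.setDaemon(true); // don't keep the JVM alive
                return t;
            });

    TwoTierHeartbeat(JobTrackerStub jt, String tracker, long dutyCycleSec) {
        this.jt = jt;
        this.tracker = tracker;
        // Heavy handler fires at a fixed low frequency.
        timer.scheduleAtFixedRate(() -> jt.dutyCycle(tracker),
                dutyCycleSec, dutyCycleSec, TimeUnit.SECONDS);
    }

    // Called by the tasktracker as soon as a map/reduce slot opens.
    void onSlotFreed() {
        jt.lightweightAsk(tracker); // no waiting for the periodic beat
    }
}
```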
Attached is a patch that creates these lightweight and heavyweight heartbeat
handlers. Using this patch, I was able to get 99+% slot utilization and reduce
the heartbeat runtime to 1 ms.
To do:
1. The lightweight handler can get lighter. Currently, it does two main
things: processHeartbeat(..) and assignTasks(..). processHeartbeat(..) takes
more than half a ms, and I think it's not needed in the lightweight handler;
the only thing required there is task-completion notification, so the rest
should be stripped out. This could cut the lightweight handler's runtime by
another half.
2. Revisit the locking of the heartbeat handler.
> Jobtracker leaves tasktrackers underutilized
> --------------------------------------------
>
> Key: HADOOP-5632
> URL: https://issues.apache.org/jira/browse/HADOOP-5632
> Project: Hadoop Core
> Issue Type: Improvement
> Components: mapred
> Affects Versions: 0.18.0, 0.18.1, 0.18.2, 0.18.3, 0.19.0, 0.19.1, 0.20.0
> Environment: 2x HT 2.8GHz Intel Xeon, 3GB RAM, 4x 250GB HD linux
> boxes, 100 node cluster
> Reporter: Khaled Elmeleegy
> Fix For: 0.20.0
>
>
> For some workloads, the jobtracker doesn't keep all the slots utilized even
> under heavy load.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.