[
https://issues.apache.org/jira/browse/HADOOP-2560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12557821#action_12557821
]
Doug Cutting commented on HADOOP-2560:
--------------------------------------
> Much simpler to make the late binding decision to bundle them.
The algorithm I outlined above could be done incrementally, rather than all
up-front (a rough sketch in code follows this list):
- N is the desired splits/task
- build <node, split*> map for the job inputs
- when a node asks for a task, pop up to N splits off its list to form a task
- if a node has no more splits, pop splits from other nodes
- as each split is popped, remove it from other map entries
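A rough sketch of that incremental assignment, for illustration only; the class
and method names below are made up, not existing Hadoop code:

// Rough sketch only: SplitBundler and its members are illustrative names.
import java.util.*;

class SplitBundler<S> {                                    // S = an input split
  private final int n;                                     // N: desired splits per task
  private final Map<String, Deque<S>> byNode = new HashMap<>();  // <node, split*> map
  private final Set<S> unassigned = new HashSet<>();       // splits not yet handed out

  SplitBundler(int n, Map<String, List<S>> nodeToSplits) {
    this.n = n;
    for (Map.Entry<String, List<S>> e : nodeToSplits.entrySet()) {
      byNode.put(e.getKey(), new ArrayDeque<>(e.getValue()));
      unassigned.addAll(e.getValue());
    }
  }

  /** Called when 'node' asks for a task: pop up to N splits, local ones first. */
  List<S> nextTask(String node) {
    List<S> task = new ArrayList<>(n);
    take(byNode.getOrDefault(node, new ArrayDeque<>()), task);
    if (task.size() < n) {                    // node has no more splits: pop from other nodes
      for (Deque<S> q : byNode.values()) {
        take(q, task);
        if (task.size() == n) break;
      }
    }
    return task;
  }

  // Pop splits off one node's list; the 'unassigned' check is the lazy form of
  // "as each split is popped, remove it from other map entries".
  private void take(Deque<S> q, List<S> task) {
    while (task.size() < n && !q.isEmpty()) {
      S s = q.pop();
      if (unassigned.remove(s)) {
        task.add(s);
      }
    }
  }
}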
This is essentially the existing algorithm, except that we allocate more than
one split per task. In fact, the existing algorithm handles lots of other
subtle cases like speculative execution, task failure, etc. So the best way to
implement this is probably to reuse the existing algorithm, invoking it
multiple times per task.
Earlier I'd spoken of implementing this up front, when constructing splits. But
if it's done this way, then we needn't actually change public APIs or
InputFormats. Tasks could simply be changed internally to execute a list of
splits rather than a single split.
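For illustration, a minimal sketch of what that internal change might look
like, using simplified stand-ins for the task-side interfaces; none of these
names are the actual Hadoop classes:

// Schematic only: simplified stand-ins, showing a task that internally loops
// over several splits' record readers instead of one.
import java.io.IOException;
import java.util.List;

interface SimpleRecordReader<K, V> {
  boolean next(K key, V value) throws IOException;   // fills key/value, false at end
  void close() throws IOException;
}

interface SimpleMapper<K, V> {
  void map(K key, V value) throws IOException;
}

class MultiSplitTask<K, V> {
  /** Externally this is still one task; internally it drains one reader per split. */
  void run(List<SimpleRecordReader<K, V>> readers, SimpleMapper<K, V> mapper,
           K key, V value) throws IOException {
    for (SimpleRecordReader<K, V> in : readers) {     // one reader per bundled split
      try {
        while (in.next(key, value)) {
          mapper.map(key, value);
        }
      } finally {
        in.close();
      }
    }
  }
}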
> Combining multiple input blocks into one mapper
> -----------------------------------------------
>
> Key: HADOOP-2560
> URL: https://issues.apache.org/jira/browse/HADOOP-2560
> Project: Hadoop
> Issue Type: Bug
> Reporter: Runping Qi
>
> Currently, an input split contains a contiguous chunk of an input file,
> which by default corresponds to a DFS block.
> This may lead to a large number of mapper tasks if the input data is large,
> which causes the following problems:
> 1. Shuffling cost: since the framework has to move M * R map output segments
> to the nodes running reducers, a larger M means a higher shuffling cost.
> 2. High JVM initialization overhead
> 3. Disk fragmentation: a larger number of map output files means lower read
> throughput when accessing them.
> Ideally, you want to keep the number of mappers to no more than 16 times the
> number of nodes in the cluster.
> To achieve that, we can increase the input split size. However, if a split
> spans more than one DFS block, you lose the data locality scheduling
> benefits.
> One way to address this problem is to combine multiple input blocks from the
> same rack into one split.
> If on average we combine B blocks into one split, then we reduce the number
> of mappers by a factor of B.
> Since all the blocks for one mapper share a rack, we can still benefit from
> rack-aware scheduling.
> Thoughts?
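As a concrete illustration of the grouping proposed in the description above,
a minimal sketch assuming a precomputed rack-to-block map; all names here are
hypothetical, not Hadoop code:

// Hypothetical sketch: pack up to b blocks from the same rack into each split.
import java.util.*;

class RackAwareSplitter {
  static <Block> List<List<Block>> makeSplits(Map<String, List<Block>> blocksByRack, int b) {
    List<List<Block>> splits = new ArrayList<>();
    for (List<Block> rackBlocks : blocksByRack.values()) {
      for (int i = 0; i < rackBlocks.size(); i += b) {
        // every split draws from a single rack, so rack-aware scheduling still applies
        splits.add(new ArrayList<>(rackBlocks.subList(i, Math.min(i + b, rackBlocks.size()))));
      }
    }
    return splits;   // roughly totalBlocks / b splits instead of one per block
  }
}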