[ 
https://issues.apache.org/jira/browse/HADOOP-2560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12557412#action_12557412
 ] 

Doug Cutting commented on HADOOP-2560:
--------------------------------------

> It is not going to work to combine splits statically because block replicas 
> are not co-resident.

If the number of blocks in the job input is hugely greater than the number of 
nodes, then it should be easy to find nodes that have a large number of blocks 
locally, and group the blocks thusly into tasks.  If a task fails, then the 
re-execution might not be local, but most tasks don't fail, and we can arrange 
things so that the first node a task is assigned to has all its blocks.  Or am 
i missing something?

Consider the following algorithm:
- build <node, block*> and <block, node*> maps for the job input files
- N is the desired blocks/task
- for (node : nodes) pop N blocks off each nodes list and add it to the list of 
tasks
- as each block is popped, also remove it from all other node's lists, using 
the other map to accelerate this
- repeat until nodes have fewer than N blocks, then emit tasks with fewer than 
N blocks as the tail of the job

Wouldn't that work?



> Combining multiple input blocks into one mapper
> -----------------------------------------------
>
>                 Key: HADOOP-2560
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2560
>             Project: Hadoop
>          Issue Type: Bug
>            Reporter: Runping Qi
>
> Currently, an input split contains a consecutive chunk of input file, which 
> by default, corresponding to a DFS block.
> This may lead to a large number of mapper tasks if the input data is large. 
> This leads to the following problems:
> 1. Shuffling cost: since the framework has to move M * R map output segments 
> to the nodes running reducers, 
> larger M means larger shuffling cost.
> 2. High JVM initialization overhead
> 3. Disk fragmentation: larger number of map output files means lower read 
> throughput for accessing them.
> Ideally, you want to keep the number of mappers to no more than 16 times the 
> number of  nodes in the cluster.
> To achive that, we can increase the input split size. However, if a split 
> span over more than one dfs block,
> you lose the data locality scheduling benefits.
> One way to address this problem is to combine multiple input blocks with the 
> same rack into one split.
> If in average we combine B blocks into one split, then we will reduce the 
> number of mappers by a factor of B.
> Since all the blocks for one mapper share a rack, thus we can benefit from 
> rack-aware scheduling.
> Thoughts?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to