[ https://issues.apache.org/jira/browse/HADOOP-2560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12557412#action_12557412 ]
Doug Cutting commented on HADOOP-2560:
--------------------------------------

> It is not going to work to combine splits statically because block replicas
> are not co-resident.

If the number of blocks in the job input is hugely greater than the number of nodes, then it should be easy to find nodes that each have a large number of blocks locally and to group those blocks into tasks. If a task fails, its re-execution might not be local, but most tasks don't fail, and we can arrange things so that the first node a task is assigned to has all of its blocks. Or am I missing something?

Consider the following algorithm:

- build <node, block*> and <block, node*> maps for the job input files
- let N be the desired number of blocks per task
- for (node : nodes), pop N blocks off that node's list and add them as one task to the list of tasks
- as each block is popped, also remove it from every other node's list, using the <block, node*> map to accelerate this
- repeat until all nodes have fewer than N blocks, then emit the tasks with fewer than N blocks as the tail of the job

Wouldn't that work? (A rough Java sketch of this grouping follows after the quoted issue below.)

> Combining multiple input blocks into one mapper
> -----------------------------------------------
>
>                 Key: HADOOP-2560
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2560
>             Project: Hadoop
>          Issue Type: Bug
>            Reporter: Runping Qi
>
> Currently, an input split contains a consecutive chunk of an input file, which by default corresponds to a DFS block.
> This may lead to a large number of mapper tasks if the input data is large, which causes the following problems:
> 1. Shuffling cost: since the framework has to move M * R map output segments to the nodes running reducers, a larger M means a larger shuffling cost.
> 2. High JVM initialization overhead.
> 3. Disk fragmentation: a larger number of map output files means lower read throughput when accessing them.
> Ideally, you want to keep the number of mappers to no more than 16 times the number of nodes in the cluster.
> To achieve that, we can increase the input split size. However, if a split spans more than one DFS block, you lose the data-locality scheduling benefits.
> One way to address this problem is to combine multiple input blocks from the same rack into one split. If on average we combine B blocks into one split, we reduce the number of mappers by a factor of B.
> Since all the blocks for one mapper share a rack, we can still benefit from rack-aware scheduling.
> Thoughts?
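
For what it's worth, here is a minimal sketch of that greedy grouping in Java. It assumes blocks and nodes are identified by plain strings and that the <block, node*> map has already been built from the input files' block locations (e.g. from the NameNode); the GreedyBlockGrouper class and its group() method are made-up names for illustration, not part of any existing Hadoop API.

import java.util.*;

/**
 * Sketch of the greedy grouping described in the comment above: repeatedly
 * pop N co-resident blocks off a node's list to form one task, removing each
 * popped block from every other node's list as we go.
 */
public class GreedyBlockGrouper {

    /**
     * @param blockToNodes the <block, node*> map (block id -> hosts holding a replica)
     * @param n            desired number of blocks per task
     * @return a list of tasks, each task being the list of block ids one mapper reads
     */
    public static List<List<String>> group(Map<String, List<String>> blockToNodes, int n) {
        // Build the <node, block*> map from the <block, node*> map.
        Map<String, LinkedHashSet<String>> nodeToBlocks = new HashMap<>();
        for (Map.Entry<String, List<String>> e : blockToNodes.entrySet()) {
            for (String node : e.getValue()) {
                nodeToBlocks.computeIfAbsent(node, k -> new LinkedHashSet<>()).add(e.getKey());
            }
        }

        List<List<String>> tasks = new ArrayList<>();
        boolean emitted = true;
        // Keep sweeping over the nodes while some node still holds at least n blocks.
        while (emitted) {
            emitted = false;
            for (LinkedHashSet<String> blocks : nodeToBlocks.values()) {
                if (blocks.size() < n) continue;
                List<String> task = new ArrayList<>(n);
                Iterator<String> it = blocks.iterator();
                while (it.hasNext() && task.size() < n) {
                    String block = it.next();
                    it.remove();
                    task.add(block);
                    // Remove the block from every other node's list, using the
                    // <block, node*> map to find those lists quickly.
                    for (String other : blockToNodes.get(block)) {
                        Set<String> otherBlocks = nodeToBlocks.get(other);
                        if (otherBlocks != blocks) otherBlocks.remove(block);
                    }
                }
                tasks.add(task);
                emitted = true;
            }
        }

        // Tail of the job: every node now holds fewer than n blocks; emit what is
        // left, making sure each remaining block is assigned exactly once.
        Set<String> assigned = new HashSet<>();
        for (LinkedHashSet<String> blocks : nodeToBlocks.values()) {
            List<String> task = new ArrayList<>();
            for (String block : blocks) {
                if (assigned.add(block)) task.add(block);
            }
            if (!task.isEmpty()) tasks.add(task);
        }
        return tasks;
    }
}

Each popped block is removed from at most as many lists as it has replicas, so the whole pass is roughly O(blocks * replication factor) and should stay cheap even for very large inputs.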