[ https://issues.apache.org/jira/browse/HADOOP-2560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12557396#action_12557396 ]
eric baldeschwieler commented on HADOOP-2560:
---------------------------------------------

A) This is important because it could lead to big throughput gains and seek reductions.

B) It is not going to work to combine splits statically, because block replicas are not co-resident. That would lead to a huge performance hit due to loss of locality. I think we will need to invest in more complexity to get the desired performance improvement here.

My gut is that we should do this dynamically in the task trackers. That would let us do it when we are seeing good IO throughput. The map driver could simply request a new split after each input finishes. The TT would keep a small number of candidate splits locally and decide, after each map completes a split, whether to hand it another one. None of the public interfaces would need to change. We would need to change the JT quite a bit to manage maps publishing split collections, but that seems fairly straightforward.

We could realize a huge performance gain on simple scanning jobs that process input quickly. We could also see good shuffle improvements. This would interact with speculative execution in undesirable ways... something to watch out for.

There is a whole class of collation optimizations here. The fact that we sort early may make a lot of them harder... ugh. A related idea Runping and I discussed: if you have multiple spills in a map (combined or unchanged map), there is no point collating the spills if the reduce partitions are relatively large (say 1MB). We could just make each spill an output to the reduces. Even if they are small, it would be more efficient to collate in larger units than within a single map, but that starts really broadening the design space...

Could static split combinations work at all? Yes, I think they might if we produced only a small number and executed them early, but this would reduce the possible gain we could get.
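The dynamic scheme sketched above (TT keeps a small pool of candidate splits and hands the map another one only while throughput looks good) could look roughly like this. This is a minimal illustrative sketch; `SplitFeeder`, `nextSplit`, and the throughput threshold are hypothetical names, not part of any Hadoop interface:

```java
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.Deque;

// Hypothetical sketch of the "pull another split" idea: the tracker keeps
// a small local queue of candidate splits and, after each split finishes,
// decides whether to feed the same map task another one based on the
// observed read rate. All names here are illustrative, not Hadoop API.
public class SplitFeeder {
    private final Deque<String> candidates = new ArrayDeque<>();
    private final double minBytesPerSec;

    public SplitFeeder(Collection<String> splits, double minBytesPerSec) {
        this.candidates.addAll(splits);
        this.minBytesPerSec = minBytesPerSec;
    }

    /**
     * Called when a map finishes a split. Returns the next candidate split,
     * or null if throughput was too low (or the queue is empty), in which
     * case the JT would schedule a fresh task as it does today.
     */
    public String nextSplit(long bytesRead, long elapsedMillis) {
        double rate = elapsedMillis > 0 ? bytesRead * 1000.0 / elapsedMillis : 0.0;
        if (rate < minBytesPerSec || candidates.isEmpty()) {
            return null;
        }
        return candidates.poll();
    }
}
```

Because the decision is local to the tracker, public interfaces stay unchanged; only the JT would need to publish split collections instead of single splits.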
> Combining multiple input blocks into one mapper
> -----------------------------------------------
>
>                 Key: HADOOP-2560
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2560
>             Project: Hadoop
>          Issue Type: Bug
>            Reporter: Runping Qi
>
> Currently, an input split contains a consecutive chunk of an input file, which by default corresponds to a DFS block.
> This may lead to a large number of mapper tasks if the input data is large, which causes the following problems:
> 1. Shuffling cost: since the framework has to move M * R map output segments to the nodes running reducers, a larger M means a larger shuffling cost.
> 2. High JVM initialization overhead.
> 3. Disk fragmentation: a larger number of map output files means lower read throughput when accessing them.
> Ideally, you want to keep the number of mappers to no more than 16 times the number of nodes in the cluster.
> To achieve that, we can increase the input split size. However, if a split spans more than one DFS block, you lose the data-locality scheduling benefits.
> One way to address this problem is to combine multiple input blocks within the same rack into one split.
> If on average we combine B blocks into one split, then we reduce the number of mappers by a factor of B.
> Since all the blocks for one mapper share a rack, we can still benefit from rack-aware scheduling.
>
> Thoughts?

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
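The rack-based combining proposed in the issue can be sketched as follows: group the input blocks by rack, then pack each rack's blocks into splits of up to B blocks. This is a minimal illustration with hypothetical names (`RackAwareSplitter`, `Block`, `CombinedSplit`), not the Hadoop API:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: pack DFS blocks that share a rack into combined
// splits of up to maxBlocksPerSplit blocks each, so every split stays
// rack-local. Names are illustrative, not the Hadoop API.
public class RackAwareSplitter {
    record Block(String path, long offset, String rack) {}
    record CombinedSplit(List<Block> blocks, String rack) {}

    static List<CombinedSplit> combine(List<Block> blocks, int maxBlocksPerSplit) {
        // Bucket blocks by rack, preserving input order.
        Map<String, List<Block>> byRack = new LinkedHashMap<>();
        for (Block b : blocks) {
            byRack.computeIfAbsent(b.rack(), r -> new ArrayList<>()).add(b);
        }
        // Within each rack, cut the bucket into chunks of at most
        // maxBlocksPerSplit blocks; each chunk becomes one split.
        List<CombinedSplit> splits = new ArrayList<>();
        for (Map.Entry<String, List<Block>> e : byRack.entrySet()) {
            List<Block> group = e.getValue();
            for (int i = 0; i < group.size(); i += maxBlocksPerSplit) {
                int end = Math.min(i + maxBlocksPerSplit, group.size());
                splits.add(new CombinedSplit(new ArrayList<>(group.subList(i, end)), e.getKey()));
            }
        }
        return splits;
    }
}
```

With B blocks per split this yields roughly M/B map tasks, and every block in a split shares a rack, so rack-aware scheduling still applies.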