[ https://issues.apache.org/jira/browse/HADOOP-2560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12557396#action_12557396 ]

eric baldeschwieler commented on HADOOP-2560:
---------------------------------------------

A) This is important because it could lead to big throughput gains and seek 
reductions.

B) Combining splits statically is not going to work, because block replicas 
are not co-resident: the replicas of distinct blocks are placed 
independently, so no single node is likely to hold every block of a 
combined split.  This would lead to a huge performance hit due to loss of 
locality.  I think we will need to invest in more complexity to get the 
desired performance improvement here.

My gut is that we should do this dynamically in the task trackers.  This 
would let us do it only when we are seeing good IO throughput.  The map 
driver could simply request a new split after each input finishes.  The TT 
would keep a small pool of candidate splits locally and decide, after each 
map completes a split, whether to hand it another one.  None of the public 
interfaces would need to change.
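
Very roughly, the TT-side decision might look like the sketch below.  None 
of these names exist today -- SplitPool, MIN_BYTES_PER_SEC, and the 
throughput argument are all invented for illustration:

    import java.util.LinkedList;
    import java.util.Queue;

    import org.apache.hadoop.mapred.InputSplit;

    // Sketch of a per-TaskTracker pool of locally cached candidate splits.
    class SplitPool {
      // Invented threshold: only chain splits while IO looks healthy.
      private static final double MIN_BYTES_PER_SEC = 20.0 * 1024 * 1024;

      private final Queue<InputSplit> candidates =
          new LinkedList<InputSplit>();

      synchronized void offer(InputSplit split) {
        candidates.add(split);
      }

      // Called when a map finishes its current split.  Returns another
      // local split while throughput is good, else null so the task
      // exits exactly as it does today.
      synchronized InputSplit next(double observedBytesPerSec) {
        if (observedBytesPerSec < MIN_BYTES_PER_SEC
            || candidates.isEmpty()) {
          return null;
        }
        return candidates.poll();
      }
    }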

We would need to change the JT quite a bit to manage maps publishing split 
collections, but it seems fairly straightforward.  We could realize a huge 
performance gain on simple scanning jobs that process input quickly.  We could 
also see good shuffle improvements.

This would interact with speculative execution in undesirable ways... 
something to watch out for.  There is a whole class of collation 
optimizations here.  The fact that we are sorting early may make a lot of 
them harder...  ugh.

A related idea Runping and I discussed: if a map (combined or not) produces 
multiple spills, there is no point collating the spills when the reduce 
partitions are relatively large (say 1MB).  We could just serve each spill 
directly as an output to the reduces.  Even when the partitions are small, 
it would be more efficient to collate in larger units than within a single 
map, but that starts really broadening the design space...
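
The collate-or-not decision is just arithmetic over sizes we already know 
at spill time.  A sketch (threshold and all names are hypothetical, 
matching the "say 1MB" above):

    // Sketch: skip the spill merge when each reduce's slice is already
    // large enough to read efficiently.  All names are invented.
    class SpillPolicy {
      static final long MIN_PARTITION_BYTES = 1L << 20;  // ~1MB

      static boolean shouldCollateSpills(long mapOutputBytes,
                                         int numSpills,
                                         int numReducePartitions) {
        if (numSpills <= 1) {
          return false;  // a single spill is already the final output
        }
        long avgPartitionBytes = mapOutputBytes / numReducePartitions;
        // Large partitions: ship each spill directly to the reduces and
        // save a read+sort+write pass over the whole map output.
        return avgPartitionBytes < MIN_PARTITION_BYTES;
      }
    }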

Could static split combination work at all?  Yes, I think it might if we 
produced only a small number of combined splits and executed them early, 
but that would reduce the possible gain.

> Combining multiple input blocks into one mapper
> -----------------------------------------------
>
>                 Key: HADOOP-2560
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2560
>             Project: Hadoop
>          Issue Type: Bug
>            Reporter: Runping Qi
>
> Currently, an input split contains a consecutive chunk of an input file, 
> which, by default, corresponds to a DFS block.
> This may lead to a large number of mapper tasks if the input data is 
> large, which causes the following problems:
> 1. Shuffling cost: since the framework has to move M * R map output 
> segments to the nodes running reducers, a larger M means a larger 
> shuffling cost.
> 2. High JVM initialization overhead.
> 3. Disk fragmentation: a larger number of map output files means lower 
> read throughput when accessing them.
> Ideally, you want to keep the number of mappers to no more than 16 times 
> the number of nodes in the cluster.
> To achieve that, we can increase the input split size. However, if a 
> split spans more than one DFS block, you lose the data locality 
> scheduling benefits.
> One way to address this problem is to combine multiple input blocks from 
> the same rack into one split.
> If on average we combine B blocks into one split, we reduce the number of 
> mappers by a factor of B.
> Since all the blocks for one mapper share a rack, we can still benefit 
> from rack-aware scheduling.
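> For concreteness (illustrative numbers only): with 64MB blocks, a 1TB 
> input gives M = 16,384 maps; at R = 100 reduces that is about 1.6 million 
> map output segments to shuffle. Combining B = 16 rack-local blocks per 
> split cuts M to 1,024 and the segment count to roughly 102,000.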
> Thoughts?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
