[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13563988#comment-13563988
 ] 

Jerry Chen commented on MAPREDUCE-4961:
---------------------------------------

[~asokan]
Thanks a lot for your review and suggestion. I have thought about it, and it 
can be done that way.

The problem, however, is that if MergeManager does not provide an interface 
method for the local merge case, other merge manager implementations will 
still find it hard to benefit from the MergeManager abstraction. For example, 
with the HashMergeManager I am considering, I need to call into 
HashMergeManager for the local merge because its merge process differs from 
the static Merger.merge. So the HashShuffle would still need to handle this 
case specially in runLocal(). Although this is not a big issue, the purpose of 
the MergeManager interface is to provide an abstraction layer for Shuffle to use.
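
The idea above can be sketched roughly as follows. This is a minimal, 
self-contained illustration and not the actual patch: the SketchMergeManager 
interface, its mergeLocal method, the LocalShuffleSketch class, and the use of 
List<String> for records are all assumptions made for the example. The real 
MergeManager, Merger.merge, and RawKeyValueIterator in Hadoop have different 
signatures.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for the MergeManager interface.
interface SketchMergeManager {
    // Assumed method for the local case: the map outputs are already on the
    // local machine, so there is no copy phase, only a merge.
    List<String> mergeLocal(List<List<String>> localMapOutputs);
}

// Stand-in for the default behaviour: produce keys in sorted order.
class SortMergeManager implements SketchMergeManager {
    @Override
    public List<String> mergeLocal(List<List<String>> localMapOutputs) {
        List<String> merged = new ArrayList<>();
        for (List<String> output : localMapOutputs) {
            merged.addAll(output);
        }
        Collections.sort(merged);
        return merged;
    }
}

// Stand-in for a hash-based manager: group equal keys without sorting.
class HashMergeManager implements SketchMergeManager {
    @Override
    public List<String> mergeLocal(List<List<String>> localMapOutputs) {
        // LinkedHashMap keeps first-seen key order, so no sort is involved.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (List<String> output : localMapOutputs) {
            for (String key : output) {
                counts.merge(key, 1, Integer::sum);
            }
        }
        List<String> grouped = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            for (int i = 0; i < entry.getValue(); i++) {
                grouped.add(entry.getKey());
            }
        }
        return grouped;
    }
}

// Stand-in for a Shuffle whose local path delegates to the merge manager
// instead of calling a static merge directly.
class LocalShuffleSketch {
    private final SketchMergeManager merger;

    LocalShuffleSketch(SketchMergeManager merger) {
        this.merger = merger;
    }

    List<String> runLocal(List<List<String>> localMapOutputs) {
        return merger.mergeLocal(localMapOutputs);
    }

    public static void main(String[] args) {
        List<List<String>> outputs = new ArrayList<>();
        outputs.add(List.of("b", "a"));
        outputs.add(List.of("c", "b"));
        System.out.println(new LocalShuffleSketch(new SortMergeManager()).runLocal(outputs));
        System.out.println(new LocalShuffleSketch(new HashMergeManager()).runLocal(outputs));
    }
}
```

With the local merge behind the interface, runLocal() in the sketch needs no 
special case per implementation; swapping the manager is enough to change how 
the merge is done.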

That said, I do not insist on changing MergeManager. If you think the above 
reasoning makes sense, let's keep the change in MergeManager; otherwise, let's 
take your approach. Please let me know what you think.

Jerry


                
> Map reduce running local should also go through ShuffleConsumerPlugin for 
> enabling different MergeManager implementations
> -------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4961
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4961
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: trunk
>            Reporter: Jerry Chen
>            Assignee: Jerry Chen
>         Attachments: MAPREDUCE-4961.patch
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> MAPREDUCE-4049 provides the ability to plug in a Shuffle, and MAPREDUCE-4080 
> extends Shuffle so that it can provide different MergeManager implementations. 
> While using these pluggable features, I found that when a MapReduce job is 
> running locally, a RawKeyValueIterator is returned directly from a static 
> call to Merger.merge, which breaks the assumption that the Shuffle may provide 
> different merge methods, even though there is no copy phase in this situation.
> The use case is that I am implementing a hash-based MergeManager, which does 
> not need a sort on the map side. However, when the job runs locally, the 
> hash-based MergeManager has no chance to be used because the code goes 
> directly to Merger.merge. This makes the pluggable Shuffle and MergeManager 
> incomplete.
> So we need to move the code that calls Merger.merge from ReduceTask to the 
> ShuffleConsumerPlugin implementation, so that the Shuffle implementation can 
> decide how to do the merge and return the corresponding iterator.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
