[ 
https://issues.apache.org/jira/browse/MRUNIT-88?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427579#comment-13427579
 ] 

Dave Beech commented on MRUNIT-88:
----------------------------------

As I've been looking at grouping comparators and working on the "shuffle" 
method in MRUNIT-127, I've had an idea about how we could test partitioners 
under the current framework. 

In the MapReduceDriver, as outputs come from the mapper we could apply a 
user-specified partitioner and then organise the map outputs into numbered 
buckets which represent the reduce slots. Then we would apply the sorting / 
grouping logic currently found in "shuffle" to each bucket and call reduce 
appropriately. 
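
A rough sketch of the idea, in plain Java so that it stands alone without 
Hadoop or MRUnit on the classpath. Everything named here 
(PartitionedShuffleSketch, SimplePartitioner, partitionAndShuffle) is 
hypothetical and only for illustration; it is not existing MRUnit code, and 
the real "shuffle" method may well differ in detail.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

public class PartitionedShuffleSketch {

    /** Hypothetical stand-in for a partitioner: maps a key to a bucket index. */
    public interface SimplePartitioner<K> {
        int getPartition(K key, int numPartitions);
    }

    /**
     * Splits map outputs into numbered buckets (one per simulated reduce slot),
     * then sorts and groups each bucket. The result holds, per bucket, the
     * (first key of group, values) pairs in the order reduce would be called.
     */
    public static <K, V> List<List<Map.Entry<K, List<V>>>> partitionAndShuffle(
            List<Map.Entry<K, V>> mapOutputs,
            SimplePartitioner<K> partitioner,
            Comparator<K> sortComparator,
            Comparator<K> groupComparator,
            int numPartitions) {

        // 1. Assign every map output to a bucket chosen by the partitioner.
        List<List<Map.Entry<K, V>>> buckets = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            buckets.add(new ArrayList<>());
        }
        for (Map.Entry<K, V> output : mapOutputs) {
            int index = partitioner.getPartition(output.getKey(), numPartitions);
            buckets.get(index).add(output);
        }

        List<List<Map.Entry<K, List<V>>>> result = new ArrayList<>();
        for (List<Map.Entry<K, V>> bucket : buckets) {
            // 2. Sort the bucket by key (List.sort is stable).
            bucket.sort((a, b) -> sortComparator.compare(a.getKey(), b.getKey()));

            // 3. Merge adjacent keys that the grouping comparator treats as equal.
            List<Map.Entry<K, List<V>>> groups = new ArrayList<>();
            for (Map.Entry<K, V> output : bucket) {
                Map.Entry<K, List<V>> last =
                        groups.isEmpty() ? null : groups.get(groups.size() - 1);
                if (last != null
                        && groupComparator.compare(last.getKey(), output.getKey()) == 0) {
                    last.getValue().add(output.getValue());
                } else {
                    List<V> values = new ArrayList<>();
                    values.add(output.getValue());
                    groups.add(new SimpleEntry<>(output.getKey(), values));
                }
            }
            result.add(groups);
        }
        return result;
    }
}
```

Each inner list would then correspond to one run of the reducer, so a test 
could check both which bucket a key landed in and which values were grouped 
together. 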

This would allow a more realistic test of MapReduce techniques like secondary 
sort, which require the grouping and sorting comparators plus the partitioner 
to all be correct and to work together. 
                
> MRUnit should support custom partitioners, comparator, and groupComparator
> --------------------------------------------------------------------------
>
>                 Key: MRUNIT-88
>                 URL: https://issues.apache.org/jira/browse/MRUNIT-88
>             Project: MRUnit
>          Issue Type: Improvement
>            Reporter: Matthew Rathbone
>              Labels: partitioners
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> We're building something that essentially does a secondary sort. To test 
> it, we need to be able to specify comparators and partitioners.
> Example:
> The following two tuple keys: (id1, source1), (id1, source2)
> should be grouped together based on the first value of the tuple, and their 
> records should end up in the same reducer.
> To do this we have our own custom partitioner / comparator. This is what we 
> need to test through the whole pipeline, in this way:
> MapReduceDriver.setPartitioner(p)
> MapReduceDriver.setGroupComparator(c)
> I'm not familiar enough with the MRUnit code to add this easily, but I 
> suspect it would be pretty quick to do.
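
For reference, a minimal sketch of the kind of key, partitioner and 
comparators the description above implies, again in plain Java rather than 
against the Hadoop interfaces. The names (SecondarySortKeySketch, TupleKey, 
partition, SORT, GROUP) are assumptions made for illustration and do not 
come from the reporter's code.

```java
import java.util.Comparator;

public class SecondarySortKeySketch {

    /** Composite key (id, source), matching the tuple example in the issue. */
    public static final class TupleKey {
        public final String id;
        public final String source;

        public TupleKey(String id, String source) {
            this.id = id;
            this.source = source;
        }
    }

    /** Partitions on id only, so every source for one id reaches the same reducer. */
    public static int partition(TupleKey key, int numPartitions) {
        // Mask the sign bit so the result is never negative.
        return (key.id.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    /** Sort order: by id first, then by source (the "secondary" part). */
    public static final Comparator<TupleKey> SORT = (a, b) -> {
        int byId = a.id.compareTo(b.id);
        return byId != 0 ? byId : a.source.compareTo(b.source);
    };

    /** Grouping: keys with the same id belong to the same reduce call. */
    public static final Comparator<TupleKey> GROUP = (a, b) -> a.id.compareTo(b.id);
}
```

With these, (id1, source1) and (id1, source2) get the same partition number 
and compare as equal under GROUP, while SORT still orders them by source. 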


        
