[
https://issues.apache.org/jira/browse/MRUNIT-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13426149#comment-13426149
]
Dave Beech commented on MRUNIT-127:
-----------------------------------
Consider the following keys sorted in natural order: A1, A2, B1, B2, C1.
If you have a grouping comparator that only considers the first character, you
get 3 reduce groups: (A1, A2), (B1, B2) and (C1). This is correct behaviour.
But if the keys are instead 1A, 2A, 1B, 2B, 1C and you still want to group
according to whether it's an A, B or C, you may think you can implement a
grouping comparator which only considers the second character. This doesn't
work, because when the keys are sorted in natural order you get 1A, 1B, 1C,
2A, 2B and the keys of each group are not contiguous. So MapReduce will give
you 5 reduce groups, one per key, but MRUnit will still group these into 3.
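To make the difference concrete, here is a small standalone sketch in plain
Java with no Hadoop or MRUnit dependencies. It is only a model of the two
behaviours described above, not the actual code of either project. The class
GroupingDemo and its methods contiguousGroups and nonContiguousGroups are
hypothetical names made up for this illustration. The first method compares
each sorted key only with the key before it, which is roughly what the reduce
side does. The second collects every key that the grouping comparator
considers equal, wherever it falls in the sort order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration class; not part of Hadoop or MRUnit.
final class GroupingDemo {

    static final Comparator<String> NATURAL = Comparator.naturalOrder();
    static final Comparator<String> BY_FIRST_CHAR =
            (a, b) -> Character.compare(a.charAt(0), b.charAt(0));
    static final Comparator<String> BY_SECOND_CHAR =
            (a, b) -> Character.compare(a.charAt(1), b.charAt(1));

    // Simplified model of MapReduce grouping: sort the keys, then start a
    // new group whenever the grouping comparator reports that the current
    // key differs from the previous one. Only adjacent keys are compared.
    static List<List<String>> contiguousGroups(List<String> keys,
                                               Comparator<String> sort,
                                               Comparator<String> grouping) {
        List<String> sorted = new ArrayList<>(keys);
        sorted.sort(sort);
        List<List<String>> groups = new ArrayList<>();
        String previous = null;
        for (String key : sorted) {
            if (previous == null || grouping.compare(previous, key) != 0) {
                groups.add(new ArrayList<>());
            }
            groups.get(groups.size() - 1).add(key);
            previous = key;
        }
        return groups;
    }

    // Assumed model of grouping that ignores contiguity: each key joins the
    // first existing group whose leading key compares equal to it.
    static List<List<String>> nonContiguousGroups(List<String> keys,
                                                  Comparator<String> sort,
                                                  Comparator<String> grouping) {
        List<String> sorted = new ArrayList<>(keys);
        sorted.sort(sort);
        List<List<String>> groups = new ArrayList<>();
        for (String key : sorted) {
            List<String> target = null;
            for (List<String> group : groups) {
                if (grouping.compare(group.get(0), key) == 0) {
                    target = group;
                    break;
                }
            }
            if (target == null) {
                target = new ArrayList<>();
                groups.add(target);
            }
            target.add(key);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> good = Arrays.asList("A1", "A2", "B1", "B2", "C1");
        List<String> bad = Arrays.asList("1A", "2A", "1B", "2B", "1C");

        // Grouping on the first character agrees with the sort order: 3 groups.
        System.out.println(contiguousGroups(good, NATURAL, BY_FIRST_CHAR));

        // Grouping on the second character does not: 5 groups versus 3.
        System.out.println(contiguousGroups(bad, NATURAL, BY_SECOND_CHAR));
        System.out.println(nonContiguousGroups(bad, NATURAL, BY_SECOND_CHAR));
    }
}
```

In this model the two methods agree whenever the sort order keeps each group's
keys adjacent, and disagree as soon as it does not. A test that only checks
the second style of grouping cannot detect a sort comparator and grouping
comparator that are inconsistent with each other.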
> Key grouping with GroupingComparators is not consistent with MapReduce
> behaviour
> --------------------------------------------------------------------------------
>
> Key: MRUNIT-127
> URL: https://issues.apache.org/jira/browse/MRUNIT-127
> Project: MRUnit
> Issue Type: Bug
> Affects Versions: 0.9.0
> Reporter: Dave Beech
>
> In Hadoop MapReduce, for a set of keys to be properly grouped together by a
> grouping comparator into a reduce call, they need to be in a contiguous range
> when sorted by the key's ordering comparator.
> MRUnit does not impose this requirement, so if the user's grouping and
> sorting comparator logic is incorrect, their tests may pass and give the
> expected result even though the outcome would be different when run as a true
> MapReduce job.
> (see below for further explanation)