[ https://issues.apache.org/jira/browse/CRUNCH-528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Brandon Vargo updated CRUNCH-528:
---------------------------------
Attachment: 0001-Pair-Fix-comparison-for-large-hash-codes.patch
A patch against 06688d5 (current master).
> Pair: Integer overflow during comparison causes inconsistent sort.
> ------------------------------------------------------------------
>
> Key: CRUNCH-528
> URL: https://issues.apache.org/jira/browse/CRUNCH-528
> Project: Crunch
> Issue Type: Bug
> Components: Core
> Reporter: Brandon Vargo
> Assignee: Josh Wills
> Priority: Minor
> Attachments: 0001-Pair-Fix-comparison-for-large-hash-codes.patch
>
>
> Pair uses the hash code of each value for comparison if the values are not
> themselves comparable. If the hash code values are too large, then the values
> will wrap when doing subtraction. This results in a comparison function that
> is not transitive.
> Among other things, this breaks Joins that use the in-memory pipeline,
> since the in-memory shuffler uses a TreeMap if the key type is Comparable.
> Since the key in a join is a Pair of the original key and a join tag, the key
> is always comparable. With a non-transitive comparison function, it is
> possible for the two join tags of the original key to sort differently,
> resulting in the two join tags not being adjacent for the original key. This
> results in either the cross product erroneously producing no values in
> the case of an inner join, since the two join tags are not adjacent, or null
> values appearing when they should not in the case of an outer join.
> As a workaround, ensure that the key used in a Join is comparable.
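To illustrate the overflow described above, here is a minimal sketch in Java. It does not reproduce the actual Pair source or the attached patch; the class and method names (HashCompareSketch, compareBySubtraction, compareSafely) are hypothetical, and Integer.compare is shown only as one common overflow-safe alternative to subtraction.

```java
public class HashCompareSketch {
    // Hypothetical stand-in for a subtraction-based comparison of hash codes.
    // The int subtraction wraps when the operands are more than 2^31 - 1 apart.
    static int compareBySubtraction(int h1, int h2) {
        return h1 - h2;
    }

    // Overflow-safe alternative: Integer.compare never subtracts its arguments.
    static int compareSafely(int h1, int h2) {
        return Integer.compare(h1, h2);
    }

    public static void main(String[] args) {
        // Three illustrative hash codes, chosen so that a - c overflows.
        int a = -2_000_000_000;
        int b = 0;
        int c = 2_000_000_000;

        // a < b and b < c, so transitivity requires a < c.
        System.out.println("a vs b: " + compareBySubtraction(a, b)); // negative
        System.out.println("b vs c: " + compareBySubtraction(b, c)); // negative
        // a - c is -4,000,000,000 mathematically, which wraps to a positive int,
        // so the subtraction-based comparison reports a > c.
        System.out.println("a vs c: " + compareBySubtraction(a, c)); // positive

        // The safe comparison keeps the ordering consistent.
        System.out.println("safe a vs c: " + compareSafely(a, c)); // negative
    }
}
```

With these three values the subtraction-based comparison claims a < b, b < c, and a > c at the same time, which is the kind of inconsistency that can place equal keys in different parts of a TreeMap.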