Brandon Vargo created CRUNCH-528:
------------------------------------
Summary: Pair: Integer overflow during comparison causes
inconsistent sort.
Key: CRUNCH-528
URL: https://issues.apache.org/jira/browse/CRUNCH-528
Project: Crunch
Issue Type: Bug
Components: Core
Reporter: Brandon Vargo
Assignee: Josh Wills
Priority: Minor
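Pair uses the hash code of each value for comparison if the values are not
themselves Comparable. The comparison subtracts one hash code from the other,
so if the two hash codes are far enough apart (for example, a large positive
and a large negative value), the difference overflows the int range and wraps
around. This results in a comparison function that is not transitive.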
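The wrap-around can be reproduced with three plain int values standing in for
hash codes. This is a minimal sketch, not the actual Pair code: the class and
method names are hypothetical, and the subtraction-based comparison is assumed
from the description above. Integer.compare is shown as one overflow-safe
alternative from the standard library.

```java
final class HashCompareSketch {
    // Assumed shape of the problematic fallback: compare by subtracting hash codes.
    static int compareBySubtraction(int h1, int h2) {
        return h1 - h2; // wraps silently when the true difference exceeds the int range
    }

    // Overflow-safe alternative: never computes the difference.
    static int compareSafely(int h1, int h2) {
        return Integer.compare(h1, h2);
    }

    public static void main(String[] args) {
        // Hypothetical hash codes chosen so that a - c overflows.
        int a = -2_000_000_000;
        int b = 0;
        int c = 2_000_000_000;

        System.out.println(compareBySubtraction(a, b) < 0); // true:  a < b
        System.out.println(compareBySubtraction(b, c) < 0); // true:  b < c
        System.out.println(compareBySubtraction(a, c) < 0); // false: a > c, so not transitive
        System.out.println(compareSafely(a, c) < 0);        // true:  a < c as expected
    }
}
```

Here a sorts before b and b before c, yet a sorts after c, because the true
difference a - c (-4,000,000,000) does not fit in an int and wraps to a
positive value.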
Among other things, this breaks joins in the in-memory pipeline, since the
in-memory shuffler uses a TreeMap if the key type is Comparable. Since the key
in a join is a Pair of the original key and a join tag, the key is always
Comparable. With a non-transitive comparison function, it is possible for the
two join tags of the original key to sort differently, so that the two entries
for the same original key are not adjacent. For an inner join, the cross
product then erroneously produces no values; for an outer join, null values
appear when they should not.
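The effect on a TreeMap can be sketched with plain Integer keys. This is an
illustration only: it does not use Crunch's shuffler or Pair, the class and
method names are hypothetical, and the subtraction-based comparator is an
assumed stand-in for the hash code fallback.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

final class TreeMapOrderSketch {
    // Inserts three fixed keys and returns them in the map's iteration order.
    static List<Integer> iterationOrder(Comparator<Integer> cmp) {
        TreeMap<Integer, String> map = new TreeMap<>(cmp);
        map.put(-2_000_000_000, "a");
        map.put(0, "b");
        map.put(2_000_000_000, "c");
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        // Assumed stand-in for the overflowing comparison.
        Comparator<Integer> bySubtraction = (x, y) -> x - y;
        // Overflow-safe comparison for reference.
        Comparator<Integer> safe = Integer::compare;

        System.out.println(iterationOrder(bySubtraction)); // [2000000000, -2000000000, 0]
        System.out.println(iterationOrder(safe));          // [-2000000000, 0, 2000000000]
    }
}
```

With the overflowing comparator the largest key iterates first. Code that
relies on the map's ordering, such as a join that expects related keys to be
adjacent, can no longer trust it.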
As a workaround, ensure that the key used in a Join is itself Comparable, so
that Pair compares the keys directly instead of falling back to hash codes.