Hi all,
I have a problem when using Array[Byte] in RDD operations.
When I join two different RDDs of type [(Long, Array[Byte])], I obtain wrong
results. But if I convert the byte arrays to integers and join two different
RDDs of type [(Long, Integer)], then the results are correct. Any ideas?
Are the keys that you're joining on the byte arrays themselves? If so,
that's not likely to work because of how Java computes arrays' hashCodes;
see https://issues.apache.org/jira/browse/SPARK-597. If this turns out to
be the problem, we should look into strengthening the checks for array-type
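
To illustrate the hashCode point above: Java arrays inherit identity-based
equals and hashCode from Object, so two arrays with the same contents are
different keys. Below is a minimal plain-Java sketch (no Spark involved); the
class and method names are made up for the illustration, and wrapping the
bytes in a ByteBuffer is just one way to get content-based comparison.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class; not part of Spark.
class ArrayKeyDemo {
    // Uses a raw byte[] as the map key, then looks up a distinct array
    // holding the same bytes. Arrays compare by reference, so this misses.
    static boolean lookupByRawArray() {
        Map<byte[], String> map = new HashMap<>();
        map.put(new byte[] {1, 2, 3}, "value");
        return map.containsKey(new byte[] {1, 2, 3});
    }

    // Wraps the bytes in a ByteBuffer, whose equals and hashCode depend on
    // the remaining contents, so a content-equal key is found.
    static boolean lookupByWrappedKey() {
        Map<ByteBuffer, String> map = new HashMap<>();
        map.put(ByteBuffer.wrap(new byte[] {1, 2, 3}), "value");
        return map.containsKey(ByteBuffer.wrap(new byte[] {1, 2, 3}));
    }

    public static void main(String[] args) {
        System.out.println(lookupByRawArray());   // false
        System.out.println(lookupByWrappedKey()); // true
    }
}
```

The same reasoning applies to any hash-based grouping of keys, which is why
array keys tend to produce unmatched rows in a join.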
Hi Josh,
Thanks for the answer.
No, in my case the byte arrays are the values. I use the indexes generated by
zipWithIndex as the keys (I swap each pair to put the index first).
However, if I clone the byte arrays before joining the RDDs, it seems to fix my
problem (but I'm not sure why).
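
One possible reason cloning helps, offered only as a guess since nothing in
this thread confirms it: if whatever produces the records fills the same
underlying buffer each time, every stored value ends up pointing at the same
array, and copying breaks that sharing. The plain-Java sketch below shows the
effect in isolation; the reused buffer and the names are assumptions for the
illustration, not a description of what Spark does here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; models a producer that reuses one buffer.
class BufferReuseDemo {
    // Writes 1, 2, 3 into the same one-byte buffer and stores either the
    // buffer itself (copy == false) or a clone of it (copy == true).
    static List<byte[]> collect(boolean copy) {
        byte[] shared = new byte[1];
        List<byte[]> out = new ArrayList<>();
        for (byte i = 1; i <= 3; i++) {
            shared[0] = i;
            out.add(copy ? shared.clone() : shared);
        }
        return out;
    }

    public static void main(String[] args) {
        // Without cloning, every element aliases the buffer's last contents.
        System.out.println(collect(false).get(0)[0]); // 3
        // With cloning, each element keeps the value it had when stored.
        System.out.println(collect(true).get(0)[0]);  // 1
    }
}
```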
--