Problem with RDD of (Long, Array[Byte])

2015-12-03 Thread Hervé Yviquel
Hi all, I have a problem when using Array[Byte] in RDD operations. When I join two different RDDs of type [(Long, Array[Byte])], I obtain wrong results... But if I convert the byte arrays to integers and join two different RDDs of type [(Long, Integer)], then the results are correct... Any idea?

Re: Problem with RDD of (Long, Array[Byte])

2015-12-03 Thread Josh Rosen
Are the keys that you're joining on the byte arrays themselves? If so, that's not likely to work, because Java computes an array's hashCode (and equals) from the array's identity rather than its contents; see https://issues.apache.org/jira/browse/SPARK-597. If this turns out to be the problem, we should look into strengthening the checks for array-type
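To illustrate the hashCode point, here is a minimal plain-Scala sketch (no Spark involved; `ArrayKeyDemo` is a hypothetical name). It shows that two arrays with identical contents are different keys to any hash-based grouping, and that wrapping each array in a `Seq` restores content-based equality. Spark's hash partitioning relies on the same `hashCode`/`equals` contract, so the same behaviour would apply to array keys in a `join`.

```scala
object ArrayKeyDemo {
  val a: Array[Byte] = Array[Byte](1, 2, 3)
  val b: Array[Byte] = Array[Byte](1, 2, 3) // same contents, different object

  // JVM arrays inherit equals/hashCode from java.lang.Object, so they are
  // compared by reference, not by contents.
  val sameByEquals: Boolean = a == b              // false
  val sameByContents: Boolean = a.sameElements(b) // true

  val pairs: Seq[(Array[Byte], String)] = Seq(a -> "left", b -> "right")

  // Grouping on the raw arrays yields two keys instead of one.
  val rawGroups: Int = pairs.groupBy(_._1).size
  // Wrapping each array in a Seq gives structural equals/hashCode.
  val wrappedGroups: Int = pairs.groupBy(_._1.toSeq).size

  def main(args: Array[String]): Unit = {
    println(s"a == b: $sameByEquals, sameElements: $sameByContents")
    println(s"groups by raw array: $rawGroups, by toSeq: $wrappedGroups")
  }
}
```

Running `main` prints `a == b: false, sameElements: true` followed by `groups by raw array: 2, by toSeq: 1`.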

Re: Problem with RDD of (Long, Array[Byte])

2015-12-03 Thread Hervé Yviquel
Hi Josh, thanks for the answer. No, in my case the byte arrays are the values... I use the indexes generated by zipWithIndex as the keys (I swap each pair to put the index first). However, if I clone the byte arrays before joining the RDDs, it seems to fix my problem (but I'm not sure why).
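The thread does not say where the byte arrays come from, so the following is only one plausible explanation of why cloning helps, assuming the arrays are produced by a source that reuses a single mutable buffer for every record. Spark's documentation for `SparkContext.hadoopFile` warns about this for Hadoop `Writable` objects (the RecordReader reuses one object per record) and recommends copying records with a `map` before caching, aggregating or shuffling them. The sketch simulates the effect in plain Scala without Spark; `BufferReuseDemo`, `reusingReader` and `indexed` are hypothetical names.

```scala
object BufferReuseDemo {
  // Hypothetical record source: returns the SAME mutable array for every
  // record and overwrites its contents, mimicking a reader that reuses
  // one buffer per record.
  def reusingReader(records: Seq[Array[Byte]], width: Int): Iterator[Array[Byte]] = {
    val buf = new Array[Byte](width)
    records.iterator.map { r =>
      System.arraycopy(r, 0, buf, 0, width)
      buf
    }
  }

  // Index records like zipWithIndex followed by a swap: (index, bytes).
  def indexed(it: Iterator[Array[Byte]]): Vector[(Long, Array[Byte])] =
    it.zipWithIndex.map { case (bytes, i) => (i.toLong, bytes) }.toVector

  val input: Seq[Array[Byte]] =
    Seq(Array[Byte](1, 1), Array[Byte](2, 2), Array[Byte](3, 3))

  def run(): (Vector[(Long, Array[Byte])], Vector[(Long, Array[Byte])]) = {
    // Holding on to the records keeps three references to one buffer,
    // so every value ends up showing the last record's bytes.
    val aliased = indexed(reusingReader(input, 2))
    // Cloning each array as soon as it is read gives every record its own copy.
    val copied = indexed(reusingReader(input, 2).map(_.clone()))
    (aliased, copied)
  }

  def main(args: Array[String]): Unit = {
    val (aliased, copied) = run()
    def show(v: Vector[(Long, Array[Byte])]): String =
      v.map { case (i, a) => s"$i -> ${a.mkString(",")}" }.mkString("; ")
    println(s"without clone: ${show(aliased)}")
    println(s"with clone:    ${show(copied)}")
  }
}
```

Without the clone every index maps to `3,3`; with it the indexes map to `1,1`, `2,2` and `3,3`. If this is the cause, the RDD equivalent would be a copying `map` before the join, e.g. `rdd.map { case (k, v) => (k, v.clone()) }`.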