Hi Josh,

Thanks for the answer.
No, in my case the byte arrays are the values. I use the indexes generated by 
zipWithIndex as the keys (I swap each pair so that the index comes first).
However, cloning the byte arrays before joining the RDDs seems to fix my 
problem, though I'm not sure why.
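
One possible explanation, which nothing in this thread confirms, is that the
reader hands back the same backing array for every record, so any step that
buffers records without copying ends up holding many references to one array.
The sketch below is plain Scala with no Spark involved; `records` is a
hypothetical stand-in for such a reader, and `decode` mirrors the
little-endian ByteBuffer decoding used in the quoted code. It only illustrates
the aliasing effect and why cloning would remove it; it does not reproduce
Spark's actual behaviour.

```scala
import java.nio.{ByteBuffer, ByteOrder}

object BufferReuseSketch {
  // Hypothetical reader: writes every record into ONE shared 4-byte buffer
  // and returns that same array each time (an assumption, not Spark code).
  def records(values: Seq[Int]): Iterator[Array[Byte]] = {
    val shared = new Array[Byte](4)
    values.iterator.map { v =>
      ByteBuffer.wrap(shared).order(ByteOrder.LITTLE_ENDIAN).putInt(v)
      shared
    }
  }

  // Same decoding as in the quoted code: 4 little-endian bytes to an Int.
  def decode(bytes: Array[Byte]): Int =
    ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).getInt()

  def main(args: Array[String]): Unit = {
    val input = Seq(1, 3, 5)

    // Buffered without copying: three references to the same array,
    // so every element decodes to whatever was written last.
    val aliased = records(input).toVector.map(decode)

    // Cloned as each record is produced: every element owns its bytes.
    val cloned = records(input).map(_.clone()).toVector.map(decode)

    println(s"aliased: $aliased")
    println(s"cloned:  $cloned")
  }
}
```

If that assumption holds, the corresponding step in the quoted code would
presumably be a mapValues(_.clone()) on each RDD before the join, which
matches the workaround described above.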

-- R.V



> On 3 Dec 2015, at 15:51, Josh Rosen <joshro...@databricks.com> wrote:
> 
> Are the keys that you're joining on the byte arrays themselves? If so, that's 
> not likely to work because of how Java computes arrays' hashCodes; see 
> https://issues.apache.org/jira/browse/SPARK-597. If this turns out to be 
> the problem, we should look into strengthening the checks for array-type keys 
> in order to detect and fail fast for this join() case.
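
To illustrate the hashCode remark: JVM arrays inherit identity-based equals
and hashCode from Object, so two arrays with identical contents do not match
as hash keys. The sketch below is plain Scala without Spark; the object and
method names are hypothetical, and converting the array with toSeq is just one
way to get content-based keys, not something the thread prescribes.

```scala
object ArrayKeySketch {
  // Looks up `probe` in a map keyed by the raw array (identity semantics).
  def lookupByArray(stored: Array[Byte], probe: Array[Byte]): Boolean =
    Map(stored -> "v").contains(probe)

  // Looks up `probe` in a map keyed by a Seq view of the bytes
  // (content-based equals and hashCode).
  def lookupBySeq(stored: Array[Byte], probe: Array[Byte]): Boolean =
    Map(stored.toSeq -> "v").contains(probe.toSeq)

  def main(args: Array[String]): Unit = {
    val a = Array[Byte](1, 0, 0, 0)
    val b = Array[Byte](1, 0, 0, 0)

    println(a == b)               // reference comparison on arrays
    println(a.sameElements(b))    // element-by-element comparison
    println(java.util.Arrays.hashCode(a) == java.util.Arrays.hashCode(b))
    println(lookupByArray(a, b))  // identical contents, different objects
    println(lookupBySeq(a, b))    // identical contents, content-based key
  }
}
```

Here lookupByArray(a, b) is false while lookupBySeq(a, b) is true, which is
why a join keyed directly on Array[Byte] would not be expected to match equal
byte sequences.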
> 
> On Thu, Dec 3, 2015 at 8:58 AM, Hervé Yviquel <ellde...@gmail.com> wrote:
> Hi all,
> 
> I have a problem when using Array[Byte] in RDD operations.
> When I join two different RDDs of type [(Long, Array[Byte])], I obtain wrong 
> results. But if I convert each byte array to an integer and join two 
> different RDDs of type [(Long, Integer)], the results are correct. Any 
> idea?
> 
> ----------
> The code:
> 
> import java.nio.ByteBuffer
> 
> // Read fixed-length 4-byte records and key each one by its index.
> val byteRDD0 = sc.binaryRecords(path_arg0, 4).zipWithIndex.map{x => (x._2, x._1)}
> val byteRDD1 = sc.binaryRecords(path_arg1, 4).zipWithIndex.map{x => (x._2, x._1)}
> 
> // Print every record decoded as a little-endian Int.
> byteRDD0.foreach{x => println("BYTE0 " + x._1 + "=> " +
>   ByteBuffer.wrap(x._2).order(java.nio.ByteOrder.LITTLE_ENDIAN).getInt())}
> byteRDD1.foreach{x => println("BYTE1 " + x._1 + "=> " +
>   ByteBuffer.wrap(x._2).order(java.nio.ByteOrder.LITTLE_ENDIAN).getInt())}
> 
> // Same data with the values decoded to Int before the join.
> val intRDD0 = byteRDD0.mapValues{x =>
>   ByteBuffer.wrap(x).order(java.nio.ByteOrder.LITTLE_ENDIAN).getInt()}
> val intRDD1 = byteRDD1.mapValues{x =>
>   ByteBuffer.wrap(x).order(java.nio.ByteOrder.LITTLE_ENDIAN).getInt()}
> 
> // Join on the index with byte-array values, decoding only when printing.
> val byteJOIN = byteRDD0.join(byteRDD1)
> byteJOIN.foreach{x => println("BYTEJOIN " + x._1 + "=> " +
>   ByteBuffer.wrap(x._2._1).order(java.nio.ByteOrder.LITTLE_ENDIAN).getInt() + " - " +
>   ByteBuffer.wrap(x._2._2).order(java.nio.ByteOrder.LITTLE_ENDIAN).getInt())}
> 
> // Join on the index with Int values.
> val intJOIN = intRDD0.join(intRDD1)
> intJOIN.foreach{x => println("INTJOIN " + x._1 + "=> " + x._2._1 + " - " + x._2._2)}
> 
> 
> ----------
> stdout:
> 
> BYTE0 0=> 1
> BYTE0 1=> 3
> BYTE0 2=> 5
> BYTE0 3=> 7
> BYTE0 4=> 9
> BYTE0 5=> 11
> BYTE0 6=> 13
> BYTE0 7=> 15
> BYTE0 8=> 17
> BYTE0 9=> 19
> BYTE0 10=> 21
> BYTE0 11=> 23
> BYTE0 12=> 25
> BYTE0 13=> 27
> BYTE0 14=> 29
> BYTE0 15=> 31
> BYTE0 16=> 33
> BYTE0 17=> 35
> BYTE0 18=> 37
> BYTE0 19=> 39
> BYTE0 20=> 41
> BYTE0 21=> 43
> BYTE0 22=> 45
> BYTE0 23=> 47
> BYTE0 24=> 49
> BYTE0 25=> 51
> BYTE0 26=> 53
> BYTE0 27=> 55
> BYTE0 28=> 57
> BYTE0 29=> 59
> BYTE1 0=> 0
> BYTE1 1=> 1
> BYTE1 2=> 2
> BYTE1 3=> 3
> BYTE1 4=> 4
> BYTE1 5=> 5
> BYTE1 6=> 6
> BYTE1 7=> 7
> BYTE1 8=> 8
> BYTE1 9=> 9
> BYTE1 10=> 10
> BYTE1 11=> 11
> BYTE1 12=> 12
> BYTE1 13=> 13
> BYTE1 14=> 14
> BYTE1 15=> 15
> BYTE1 16=> 16
> BYTE1 17=> 17
> BYTE1 18=> 18
> BYTE1 19=> 19
> BYTE1 20=> 20
> BYTE1 21=> 21
> BYTE1 22=> 22
> BYTE1 23=> 23
> BYTE1 24=> 24
> BYTE1 25=> 25
> BYTE1 26=> 26
> BYTE1 27=> 27
> BYTE1 28=> 28
> BYTE1 29=> 29
> BYTEJOIN 13=> 1 - 0
> BYTEJOIN 19=> 1 - 0
> BYTEJOIN 15=> 1 - 0
> BYTEJOIN 4=> 1 - 0
> BYTEJOIN 21=> 1 - 0
> BYTEJOIN 16=> 1 - 0
> BYTEJOIN 22=> 1 - 0
> BYTEJOIN 25=> 1 - 0
> BYTEJOIN 28=> 1 - 0
> BYTEJOIN 29=> 1 - 0
> BYTEJOIN 11=> 1 - 0
> BYTEJOIN 14=> 1 - 0
> BYTEJOIN 27=> 1 - 0
> BYTEJOIN 0=> 1 - 0
> BYTEJOIN 24=> 1 - 0
> BYTEJOIN 23=> 1 - 0
> BYTEJOIN 1=> 1 - 0
> BYTEJOIN 6=> 1 - 0
> BYTEJOIN 17=> 1 - 0
> BYTEJOIN 3=> 1 - 0
> BYTEJOIN 7=> 1 - 0
> BYTEJOIN 9=> 1 - 0
> BYTEJOIN 8=> 1 - 0
> BYTEJOIN 12=> 1 - 0
> BYTEJOIN 18=> 1 - 0
> BYTEJOIN 20=> 1 - 0
> BYTEJOIN 26=> 1 - 0
> BYTEJOIN 10=> 1 - 0
> BYTEJOIN 5=> 1 - 0
> BYTEJOIN 2=> 1 - 0
> INTJOIN 13=> 27 - 13
> INTJOIN 19=> 39 - 19
> INTJOIN 15=> 31 - 15
> INTJOIN 4=> 9 - 4
> INTJOIN 21=> 43 - 21
> INTJOIN 16=> 33 - 16
> INTJOIN 22=> 45 - 22
> INTJOIN 25=> 51 - 25
> INTJOIN 28=> 57 - 28
> INTJOIN 29=> 59 - 29
> INTJOIN 11=> 23 - 11
> INTJOIN 14=> 29 - 14
> INTJOIN 27=> 55 - 27
> INTJOIN 0=> 1 - 0
> INTJOIN 24=> 49 - 24
> INTJOIN 23=> 47 - 23
> INTJOIN 1=> 3 - 1
> INTJOIN 6=> 13 - 6
> INTJOIN 17=> 35 - 17
> INTJOIN 3=> 7 - 3
> INTJOIN 7=> 15 - 7
> INTJOIN 9=> 19 - 9
> INTJOIN 8=> 17 - 8
> INTJOIN 12=> 25 - 12
> INTJOIN 18=> 37 - 18
> INTJOIN 20=> 41 - 20
> INTJOIN 26=> 53 - 26
> INTJOIN 10=> 21 - 10
> INTJOIN 5=> 11 - 5
> INTJOIN 2=> 5 - 2
> 
> 
> Thanks,
> Hervé