Hi Josh,

Thanks for the answer. No, in my case the byte arrays are the values. I use the indexes generated by zipWithIndex as the keys (I swap each pair so that the index comes first). However, cloning the byte arrays before joining the RDDs seems to fix my problem, though I'm not sure why.
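One possible explanation, not confirmed in this thread, is buffer reuse: Hadoop record readers commonly reuse the same underlying object for every record, so if the arrays returned by binaryRecords share one backing buffer, anything that holds on to several records at once would see the same bytes in all of them. The sketch below illustrates only that general mechanism in plain Scala, without Spark. The names BufferReuseSketch, reusingReader, withoutClone and withClone are hypothetical, and the sketch is not meant to reproduce the exact output quoted below.

```scala
import java.nio.{ByteBuffer, ByteOrder}

object BufferReuseSketch {
  // Hypothetical stand-in for a record reader that fills ONE shared buffer
  // for every record instead of allocating a fresh array each time.
  def reusingReader(values: Seq[Int]): Iterator[Array[Byte]] = {
    val shared = new Array[Byte](4)
    values.iterator.map { v =>
      ByteBuffer.wrap(shared).order(ByteOrder.LITTLE_ENDIAN).putInt(v)
      shared // the same array instance is handed out for each record
    }
  }

  // Decode a 4-byte little-endian record, as in the code quoted below.
  def decode(bytes: Array[Byte]): Int =
    ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).getInt()

  // Buffering the records without copying: every element of the Vector
  // is a reference to the same array, so all decoded values are equal.
  def withoutClone(values: Seq[Int]): Seq[Int] =
    reusingReader(values).toVector.map(decode)

  // Copying each record before it is buffered keeps the values distinct.
  def withClone(values: Seq[Int]): Seq[Int] =
    reusingReader(values).map(_.clone()).toVector.map(decode)

  def main(args: Array[String]): Unit = {
    val input = Seq(1, 3, 5, 7)
    println("without clone: " + withoutClone(input).mkString(", "))
    println("with clone:    " + withClone(input).mkString(", "))
  }
}
```

If this is indeed the cause, the Spark-side equivalent of withClone would be something like mapValues(_.clone()) applied right after binaryRecords and before the join, which matches the workaround described above.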

-- R.V

> On 3 Dec 2015, at 15:51, Josh Rosen <joshro...@databricks.com> wrote:
>
> Are the keys that you're joining on the byte arrays themselves? If so, that's
> not likely to work because of how Java computes arrays' hashCodes; see
> https://issues.apache.org/jira/browse/SPARK-597. If this turns out to be
> the problem, we should look into strengthening the checks for array-type keys
> in order to detect and fail fast for this join() case.
>
> On Thu, Dec 3, 2015 at 8:58 AM, Hervé Yviquel <ellde...@gmail.com> wrote:
> Hi all,
>
> I have a problem when using Array[Byte] in RDD operations.
> When I join two different RDDs of type [(Long, Array[Byte])], I obtain wrong
> results... But if I convert the byte arrays to integers and join two
> different RDDs of type [(Long, Integer)], then the results are correct... Any
> ideas?
>
> ----------
> The code:
>
> import java.nio.ByteBuffer
>
> // Read 4-byte records and key each one by its index
> val byteRDD0 = sc.binaryRecords(path_arg0, 4).zipWithIndex.map{x => (x._2, x._1)}
> val byteRDD1 = sc.binaryRecords(path_arg1, 4).zipWithIndex.map{x => (x._2, x._1)}
>
> byteRDD0.foreach{x => println("BYTE0 " + x._1 + "=> " + ByteBuffer.wrap(x._2).order(java.nio.ByteOrder.LITTLE_ENDIAN).getInt())}
> byteRDD1.foreach{x => println("BYTE1 " + x._1 + "=> " + ByteBuffer.wrap(x._2).order(java.nio.ByteOrder.LITTLE_ENDIAN).getInt())}
>
> // Same data with each record decoded to an Int before joining
> val intRDD0 = byteRDD0.mapValues{x => ByteBuffer.wrap(x).order(java.nio.ByteOrder.LITTLE_ENDIAN).getInt()}
> val intRDD1 = byteRDD1.mapValues{x => ByteBuffer.wrap(x).order(java.nio.ByteOrder.LITTLE_ENDIAN).getInt()}
>
> // Join on the index, with byte-array values
> val byteJOIN = byteRDD0.join(byteRDD1)
> byteJOIN.foreach{x => println("BYTEJOIN " + x._1 + "=> " + ByteBuffer.wrap(x._2._1).order(java.nio.ByteOrder.LITTLE_ENDIAN).getInt() + " - " + ByteBuffer.wrap(x._2._2).order(java.nio.ByteOrder.LITTLE_ENDIAN).getInt())}
>
> // Join on the index, with Int values
> val intJOIN = intRDD0.join(intRDD1)
> intJOIN.foreach{x => println("INTJOIN " + x._1 + "=> " + x._2._1 + " - " + x._2._2)}
>
> ----------
> stdout:
>
> BYTE0 0=> 1
> BYTE0 1=> 3
> BYTE0 2=> 5
> BYTE0 3=> 7
> BYTE0 4=> 9
> BYTE0 5=> 11
> BYTE0 6=> 13
> BYTE0 7=> 15
> BYTE0 8=> 17
> BYTE0 9=> 19
> BYTE0 10=> 21
> BYTE0 11=> 23
> BYTE0 12=> 25
> BYTE0 13=> 27
> BYTE0 14=> 29
> BYTE0 15=> 31
> BYTE0 16=> 33
> BYTE0 17=> 35
> BYTE0 18=> 37
> BYTE0 19=> 39
> BYTE0 20=> 41
> BYTE0 21=> 43
> BYTE0 22=> 45
> BYTE0 23=> 47
> BYTE0 24=> 49
> BYTE0 25=> 51
> BYTE0 26=> 53
> BYTE0 27=> 55
> BYTE0 28=> 57
> BYTE0 29=> 59
> BYTE1 0=> 0
> BYTE1 1=> 1
> BYTE1 2=> 2
> BYTE1 3=> 3
> BYTE1 4=> 4
> BYTE1 5=> 5
> BYTE1 6=> 6
> BYTE1 7=> 7
> BYTE1 8=> 8
> BYTE1 9=> 9
> BYTE1 10=> 10
> BYTE1 11=> 11
> BYTE1 12=> 12
> BYTE1 13=> 13
> BYTE1 14=> 14
> BYTE1 15=> 15
> BYTE1 16=> 16
> BYTE1 17=> 17
> BYTE1 18=> 18
> BYTE1 19=> 19
> BYTE1 20=> 20
> BYTE1 21=> 21
> BYTE1 22=> 22
> BYTE1 23=> 23
> BYTE1 24=> 24
> BYTE1 25=> 25
> BYTE1 26=> 26
> BYTE1 27=> 27
> BYTE1 28=> 28
> BYTE1 29=> 29
> BYTEJOIN 13=> 1 - 0
> BYTEJOIN 19=> 1 - 0
> BYTEJOIN 15=> 1 - 0
> BYTEJOIN 4=> 1 - 0
> BYTEJOIN 21=> 1 - 0
> BYTEJOIN 16=> 1 - 0
> BYTEJOIN 22=> 1 - 0
> BYTEJOIN 25=> 1 - 0
> BYTEJOIN 28=> 1 - 0
> BYTEJOIN 29=> 1 - 0
> BYTEJOIN 11=> 1 - 0
> BYTEJOIN 14=> 1 - 0
> BYTEJOIN 27=> 1 - 0
> BYTEJOIN 0=> 1 - 0
> BYTEJOIN 24=> 1 - 0
> BYTEJOIN 23=> 1 - 0
> BYTEJOIN 1=> 1 - 0
> BYTEJOIN 6=> 1 - 0
> BYTEJOIN 17=> 1 - 0
> BYTEJOIN 3=> 1 - 0
> BYTEJOIN 7=> 1 - 0
> BYTEJOIN 9=> 1 - 0
> BYTEJOIN 8=> 1 - 0
> BYTEJOIN 12=> 1 - 0
> BYTEJOIN 18=> 1 - 0
> BYTEJOIN 20=> 1 - 0
> BYTEJOIN 26=> 1 - 0
> BYTEJOIN 10=> 1 - 0
> BYTEJOIN 5=> 1 - 0
> BYTEJOIN 2=> 1 - 0
> INTJOIN 13=> 27 - 13
> INTJOIN 19=> 39 - 19
> INTJOIN 15=> 31 - 15
> INTJOIN 4=> 9 - 4
> INTJOIN 21=> 43 - 21
> INTJOIN 16=> 33 - 16
> INTJOIN 22=> 45 - 22
> INTJOIN 25=> 51 - 25
> INTJOIN 28=> 57 - 28
> INTJOIN 29=> 59 - 29
> INTJOIN 11=> 23 - 11
> INTJOIN 14=> 29 - 14
> INTJOIN 27=> 55 - 27
> INTJOIN 0=> 1 - 0
> INTJOIN 24=> 49 - 24
> INTJOIN 23=> 47 - 23
> INTJOIN 1=> 3 - 1
> INTJOIN 6=> 13 - 6
> INTJOIN 17=> 35 - 17
> INTJOIN 3=> 7 - 3
> INTJOIN 7=> 15 - 7
> INTJOIN 9=> 19 - 9
> INTJOIN 8=> 17 - 8
> INTJOIN 12=> 25 - 12
> INTJOIN 18=> 37 - 18
> INTJOIN 20=> 41 - 20
> INTJOIN 26=> 53 - 26
> INTJOIN 10=> 21 - 10
> INTJOIN 5=> 11 - 5
> INTJOIN 2=> 5 - 2
>
> Thanks,
> Hervé
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org