Dear all,
I'm facing the following problem and I can't figure out how to solve it. I need to join two RDDs in order to find their intersection. Each element of the first RDD is an image, encoded as a base64 string, paired with its image ID. Each element of the second RDD is a geometric primitive (a rectangle), also paired with an image ID. My goal is to draw these primitives on the corresponding image, so my first attempt is to join images and primitives by image ID and then do the drawing. But when I call primitives.join(images), I get the following error:

java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:2367)
    at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:130)
    at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:114)
    at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:535)
    at java.lang.StringBuilder.append(StringBuilder.java:204)
    at java.io.ObjectInputStream$BlockDataInputStream.readUTFSpan(ObjectInputStream.java:3143)
    at java.io.ObjectInputStream$BlockDataInputStream.readUTFBody(ObjectInputStream.java:3051)
    at java.io.ObjectInputStream$BlockDataInputStream.readLongUTF(ObjectInputStream.java:3034)
    at java.io.ObjectInputStream.readString(ObjectInputStream.java:1642)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1341)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:63)
    at org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:125)
    at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)
    at org.apache.spark.storage.BlockManager$LazyProxyIterator$1.hasNext(BlockManager.scala:1031)
    at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
    at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:30)
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
    at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:158)

I noticed that sometimes, if I change the partitioning of the images RDD with coalesce, I can get it working. What am I doing wrong?

Cheers,
Jaonary
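For reference, here is a minimal Scala sketch of the join-then-draw pipeline described above, using the Spark RDD API. It assumes the images are keyed as (imageId, base64String) and, as a hypothetical encoding, that each rectangle is an (x, y, width, height) tuple; the sample data and the string-based stand-in for drawing are illustrative placeholders, not the original code. It shows the shape of what primitives.join(images) returns: one record per (rectangle, image) pair sharing an ID.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // pair-RDD implicits (join), required on Spark versions before 1.3
import org.apache.spark.rdd.RDD

// Local context for the sketch; a real job would typically get its config from spark-submit.
val sc = new SparkContext(
  new SparkConf().setAppName("join-sketch").setMaster("local[2]"))

// (imageId, base64-encoded image). Short placeholder strings stand in for real images.
val images: RDD[(String, String)] = sc.parallelize(Seq(
  ("img1", "aGVsbG8="),
  ("img2", "d29ybGQ=")))

// (imageId, rectangle). The (x, y, width, height) tuple encoding is an assumption.
val primitives: RDD[(String, (Int, Int, Int, Int))] = sc.parallelize(Seq(
  ("img1", (0, 0, 10, 10)),
  ("img1", (5, 5, 20, 20)),
  ("img2", (1, 2, 3, 4))))

// join emits one (imageId, (rectangle, base64)) record per matching pair,
// so an image with N rectangles is paired with its base64 string in N records.
val joined: RDD[(String, ((Int, Int, Int, Int), String))] = primitives.join(images)

// Hypothetical drawing step: it only describes which rectangle goes on which image.
// The closure references nothing outside its own parameters, so it serializes cleanly.
val drawn: Array[(String, String)] = joined
  .map { case (id, (rect, img)) => (id, s"${img.take(8)}... + rect $rect") }
  .collect()

sc.stop()
drawn.sortBy(_._1).foreach(println)
```

This is meant to run as a standalone script or program with Spark on the classpath; inside spark-shell, drop the SparkContext construction and the sc.stop() call and reuse the shell's existing sc, since only one SparkContext may be active per JVM.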