Thank you for your response. Unfortunately I cannot share a thread dump. What are you looking for exactly?
Here is the list of the 50 biggest objects (retained size order, descending):

java.util.concurrent.ArrayBlockingQueue#
java.lang.Object[]#
org.apache.spark.streaming.receiver.BlockGenerator$Block#
scala.collection.mutable.ArrayBuffer#
java.lang.Object[]#
org.apache.spark.streaming.receiver.BlockGenerator#
scala.collection.mutable.ArrayBuffer#
java.lang.Object[]#
scala.concurrent.forkjoin.ForkJoinPool#
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue[]#
org.apache.spark.streaming.receiver.BlockGenerator$Block#
scala.collection.mutable.ArrayBuffer#
java.lang.Object[]#
org.apache.spark.streaming.receiver.BlockGenerator$Block#
scala.collection.mutable.ArrayBuffer#
java.lang.Object[]#
org.apache.spark.storage.MemoryStore#
java.util.LinkedHashMap#
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue#
scala.concurrent.forkjoin.ForkJoinTask[]#
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue#
scala.concurrent.forkjoin.ForkJoinTask[]#
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue#
scala.concurrent.forkjoin.ForkJoinTask[]#
org.apache.spark.streaming.receiver.BlockGenerator$Block#
scala.collection.mutable.ArrayBuffer#
java.lang.Object[]#
org.apache.spark.streaming.receiver.BlockGenerator$Block#
scala.collection.mutable.ArrayBuffer#
java.lang.Object[]#
org.apache.spark.streaming.receiver.BlockGenerator$Block#
scala.collection.mutable.ArrayBuffer#
java.lang.Object[]#
org.apache.spark.streaming.receiver.BlockGenerator$Block#
scala.collection.mutable.ArrayBuffer#
java.lang.Object[]#
org.apache.spark.streaming.receiver.BlockGenerator$Block#
scala.collection.mutable.ArrayBuffer#
java.lang.Object[]#
org.apache.spark.streaming.receiver.BlockGenerator$Block#
scala.collection.mutable.ArrayBuffer#
java.lang.Object[]#
org.apache.spark.streaming.receiver.BlockGenerator$Block#
scala.collection.mutable.ArrayBuffer#
java.lang.Object[]#
scala.collection.Iterator$$anon$
org.apache.spark.InterruptibleIterator#
scala.collection.IndexedSeqLike$Elements#
scala.collection.mutable.ArrayOps$ofRef#
java.lang.Object[]#

On Thu, Feb 4, 2016 at 7:14 PM, Shixiong(Ryan) Zhu <shixi...@databricks.com> wrote:

> Hey Udo,
>
> mapWithState usually uses much more memory than updateStateByKey since it
> caches the states in memory.
>
> However, from your description, it looks like BlockGenerator cannot push
> data into BlockManager; there may be something wrong in BlockGenerator.
> Could you share the top 50 objects in the heap dump and the thread dump?
>
> On Wed, Feb 3, 2016 at 9:52 AM, Udo Fholl <udofholl1...@gmail.com> wrote:
>
>> Hi all,
>>
>> I recently migrated from 'updateStateByKey' to 'mapWithState' and now I
>> see a huge increase in memory. Most of it is a massive "BlockGenerator"
>> (which points to a massive "ArrayBlockingQueue" that in turn points to a
>> huge "Object[]").
>>
>> I'm pretty sure it has to do with my code, but I barely changed anything
>> in the code; I just adapted the function.
>>
>> Did anyone run into this?
>>
>> Best regards,
>> Udo.
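
For context, the change was essentially of this shape. This is a simplified sketch, not the actual job: the socket source, the Int counter state, and the checkpoint path are placeholders.

// Simplified sketch of the updateStateByKey -> mapWithState migration.
// The source, state type, and checkpoint path are placeholders; the real
// job differs.
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, State, StateSpec, StreamingContext}

object MigrationSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("mapWithState-sketch")
    val ssc = new StreamingContext(conf, Seconds(10))
    ssc.checkpoint("/tmp/checkpoint") // both APIs require checkpointing

    val pairs = ssc.socketTextStream("localhost", 9999).map(w => (w, 1))

    // Before: updateStateByKey receives all new values for a key per batch
    // and rebuilds the whole state RDD every interval.
    val before = pairs.updateStateByKey[Int] { (newValues, running) =>
      Some(running.getOrElse(0) + newValues.sum)
    }

    // After: mapWithState is invoked once per record and mutates a State
    // object; the state store stays cached on the executors, which is why
    // it can hold noticeably more memory than updateStateByKey.
    val mappingFn = (key: String, value: Option[Int], state: State[Int]) => {
      val sum = state.getOption.getOrElse(0) + value.getOrElse(0)
      state.update(sum)
      (key, sum)
    }
    val after = pairs.mapWithState(StateSpec.function(mappingFn))

    after.print() // `before` is shown only for contrast
    ssc.start()
    ssc.awaitTermination()
  }
}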