Thank you for your response. Unfortunately I cannot share a thread dump. What are you looking for exactly?
Here is the list of the 50 biggest objects (retained size order, descending):

java.util.concurrent.ArrayBlockingQueue#
java.lang.Object[]#
org.apache.spark.streaming.receiver.BlockGenerator$Block#
scala.collection.mutable.ArrayBuffer#
java.lang.Object[]#
org.apache.spark.streaming.receiver.BlockGenerator#
scala.collection.mutable.ArrayBuffer#
java.lang.Object[]#
scala.concurrent.forkjoin.ForkJoinPool#
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue[]#
org.apache.spark.streaming.receiver.BlockGenerator$Block#
scala.collection.mutable.ArrayBuffer#
java.lang.Object[]#
org.apache.spark.streaming.receiver.BlockGenerator$Block#
scala.collection.mutable.ArrayBuffer#
java.lang.Object[]#
org.apache.spark.storage.MemoryStore#
java.util.LinkedHashMap#
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue#
scala.concurrent.forkjoin.ForkJoinTask[]#
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue#
scala.concurrent.forkjoin.ForkJoinTask[]#
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue#
scala.concurrent.forkjoin.ForkJoinTask[]#
org.apache.spark.streaming.receiver.BlockGenerator$Block#
scala.collection.mutable.ArrayBuffer#
java.lang.Object[]#
org.apache.spark.streaming.receiver.BlockGenerator$Block#
scala.collection.mutable.ArrayBuffer#
java.lang.Object[]#
org.apache.spark.streaming.receiver.BlockGenerator$Block#
scala.collection.mutable.ArrayBuffer#
java.lang.Object[]#
org.apache.spark.streaming.receiver.BlockGenerator$Block#
scala.collection.mutable.ArrayBuffer#
java.lang.Object[]#
org.apache.spark.streaming.receiver.BlockGenerator$Block#
scala.collection.mutable.ArrayBuffer#
java.lang.Object[]#
org.apache.spark.streaming.receiver.BlockGenerator$Block#
scala.collection.mutable.ArrayBuffer#
java.lang.Object[]#
org.apache.spark.streaming.receiver.BlockGenerator$Block#
scala.collection.mutable.ArrayBuffer#
java.lang.Object[]#
scala.collection.Iterator$$anon$
org.apache.spark.InterruptibleIterator#
scala.collection.IndexedSeqLike$Elements#
scala.collection.mutable.ArrayOps$ofRef#
java.lang.Object[]#

On Thu, Feb 4, 2016 at 7:14 PM, Shixiong(Ryan) Zhu <shixi...@databricks.com> wrote:

> Hey Udo,
>
> mapWithState usually uses much more memory than updateStateByKey since it
> caches the states in memory.
>
> However, from your description, it looks like BlockGenerator cannot push
> data into BlockManager; there may be something wrong in BlockGenerator.
> Could you share the top 50 objects in the heap dump and the thread dump?
>
> On Wed, Feb 3, 2016 at 9:52 AM, Udo Fholl <udofholl1...@gmail.com> wrote:
>
>> Hi all,
>>
>> I recently migrated from 'updateStateByKey' to 'mapWithState' and now I
>> see a huge increase in memory. Most of it is a massive "BlockGenerator"
>> (which points to a massive "ArrayBlockingQueue" that in turn points to a
>> huge "Object[]").
>>
>> I'm pretty sure it has to do with my code, but I barely changed anything
>> in the code; I just adapted the function.
>>
>> Did anyone run into this?
>>
>> Best regards,
>> Udo.
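
For context, the change was essentially of this shape. This is a simplified sketch, not the actual job: the socket source, the Int counter state, and the checkpoint path are placeholders.

// Simplified sketch of the updateStateByKey -> mapWithState migration.
// The source, state type, and checkpoint path are placeholders; the real
// job differs.
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, State, StateSpec, StreamingContext}

object MigrationSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("mapWithState-sketch")
    val ssc = new StreamingContext(conf, Seconds(10))
    ssc.checkpoint("/tmp/checkpoint") // both APIs require checkpointing

    val pairs = ssc.socketTextStream("localhost", 9999).map(w => (w, 1))

    // Before: updateStateByKey receives all new values for a key per batch
    // and rebuilds the whole state RDD every interval.
    val before = pairs.updateStateByKey[Int] { (newValues, running) =>
      Some(running.getOrElse(0) + newValues.sum)
    }

    // After: mapWithState is invoked once per record and mutates a State
    // object; the state store stays cached on the executors, which is why
    // it can hold noticeably more memory than updateStateByKey.
    val mappingFn = (key: String, value: Option[Int], state: State[Int]) => {
      val sum = state.getOption.getOrElse(0) + value.getOrElse(0)
      state.update(sum)
      (key, sum)
    }
    val after = pairs.mapWithState(StateSpec.function(mappingFn))

    after.print() // `before` is shown only for contrast
    ssc.start()
    ssc.awaitTermination()
  }
}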