streaming in 1.6.0 slower than 1.5.1

2016-01-28 Thread Jesse F Chen


I ran the same streaming application (compiled individually for 1.5.1 and
1.6.0) that processes 5-second tweet batches.

I noticed two things:

1. A roughly 10% throughput regression in 1.6.0 vs 1.5.1

 Spark v1.6.0: 1,564 tweets/s
 Spark v1.5.1: 1,747 tweets/s

2. 1.6.0 streaming seems to have a memory leak.

In 1.6.0, processing time gradually increases and eventually exceeds the
5-second batch interval, so batches start to queue up.
In 1.5.1 there is no such slowdown. The chart below shows the increasing
scheduling delay in 1.6.0:




I captured heap dumps in both versions and compared them. I noticed the
Byte base class is using 50X more space in 1.5.1.

Here are some top classes in heap histogram and references.

Heap Histogram: All Classes (excluding platform)

1.6.0 Streaming
Class                           Instance Count     Total Size
class [B                                  8453  3,227,649,599
class [C                                 44682      4,255,502
class java.lang.reflect.Method            9059      1,177,670

1.5.1 Streaming
Class                           Instance Count     Total Size
class [B                                  5095     62,938,466
class [C                                130482     12,844,182
class java.lang.String                  130171      1,562,052


References by Type

1.6.0 Streaming: class [B [0x640039e38]
1.5.1 Streaming: class [B [0x6c020bb08]

Referrers by Type

1.6.0 Streaming
Class                                Count
java.nio.HeapByteBuffer               3239
sun.security.util.DerInputBuffer      1233
sun.security.util.ObjectIdentifier     620
[Ljava.lang.Object;                    408

1.5.1 Streaming
Class                                Count
sun.security.util.DerInputBuffer      1233
sun.security.util.ObjectIdentifier     620
[[B                                    397
java.lang.reflect.Method               326




The total size by class B is 3GB in 1.5.1 and only 60MB in 1.6.0.
The java.nio.HeapByteBuffer referrer class did not show up among the top
referrers in 1.5.1.
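
One plausible reading of the referrer table is that a heap-allocated
java.nio.ByteBuffer keeps its contents in an ordinary byte[], so every such
buffer that stays reachable also keeps a [B instance alive. The sketch below
only illustrates that relationship with the standard java.nio API. The class
name BufferBacking and the 1 KB size are made up for the example, and the
concrete buffer class name it prints is a JDK implementation detail.

```java
import java.nio.ByteBuffer;

public class BufferBacking {
    // Returns the array that backs a heap (non-direct) buffer.
    static byte[] backingArray(ByteBuffer buf) {
        if (!buf.hasArray()) {
            throw new IllegalArgumentException("buffer is not array-backed");
        }
        return buf.array();
    }

    public static void main(String[] args) {
        // allocate() creates a heap buffer; allocateDirect() would not be array-backed.
        ByteBuffer buf = ByteBuffer.allocate(1024);
        byte[] backing = backingArray(buf);

        // Concrete buffer class (implementation detail of the JDK in use).
        System.out.println(buf.getClass().getName());
        // JVM-internal name of the backing array's class.
        System.out.println(backing.getClass().getName());
        System.out.println(backing.length);
    }
}
```

This says nothing about which Spark component holds the buffers; it only
shows why HeapByteBuffer can appear as a referrer of [B in a heap dump.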

I have also placed jstack output for 1.5.1 and 1.6.0 online. You can get
them here:

https://ibm.box.com/sparkstreaming-jstack160
https://ibm.box.com/sparkstreaming-jstack151

Jesse







Re: streaming in 1.6.0 slower than 1.5.1

2016-01-28 Thread Ted Yu
bq. The total size by class B is 3GB in 1.5.1 and only 60MB in 1.6.0.

From the information you posted, it seems the above is backwards.

BTW [B is byte[], not class B.

FYI
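
Both points can be checked with a few lines of Java. This is a minimal
sketch: the class name HistogramCheck is made up, and the two sizes are
copied from the posted histogram on the assumption that its column labels
(1.6.0 on the left, 1.5.1 on the right) are correct.

```java
public class HistogramCheck {
    // Ratio of two total sizes given in bytes.
    static double ratio(long larger, long smaller) {
        return (double) larger / smaller;
    }

    public static void main(String[] args) {
        // JVM-internal names of array classes, as they appear in heap histograms.
        System.out.println(byte[].class.getName());    // [B
        System.out.println(char[].class.getName());    // [C
        System.out.println(byte[][].class.getName());  // [[B
        System.out.println(Object[].class.getName());  // [Ljava.lang.Object;

        // byte[] totals from the posted histogram (column labels assumed correct).
        long size160 = 3_227_649_599L; // listed under "1.6.0 Streaming"
        long size151 = 62_938_466L;    // listed under "1.5.1 Streaming"
        System.out.println("1.6.0 / 1.5.1 byte[] size ratio: " + ratio(size160, size151));
    }
}
```

Read that way, byte[] takes roughly 51 times more space in the 1.6.0 dump
than in the 1.5.1 dump, which is the reverse of the quoted sentence.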



Re: streaming in 1.6.0 slower than 1.5.1

2016-01-28 Thread Shixiong(Ryan) Zhu
Hey Jesse,

Could you share which operators you are using?

As for the heap dump, it may not be a real memory leak. Since batches started
to queue up, memory usage is expected to increase.
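
To illustrate the reasoning, here is a deliberately simplified model and not
Spark's actual scheduler: batches become available every 5 seconds and are
processed one at a time, in order. Once processing time exceeds the interval,
scheduling delay grows and more batches are pending at once, so more input
data has to be kept around. The class name BatchQueueSim and the processing
times are hypothetical.

```java
public class BatchQueueSim {
    // Finish time (ms) of each batch when batches run one at a time, in order,
    // and batch i becomes available at i * intervalMs.
    static long[] finishTimes(long intervalMs, long[] processingMs) {
        long[] finish = new long[processingMs.length];
        long freeAt = 0; // time at which the previous batch finishes
        for (int i = 0; i < processingMs.length; i++) {
            long start = Math.max(i * intervalMs, freeAt);
            freeAt = start + processingMs[i];
            finish[i] = freeAt;
        }
        return finish;
    }

    // Scheduling delay of batch i: time between becoming available and starting.
    static long schedulingDelay(long intervalMs, long[] processingMs, long[] finish, int i) {
        return finish[i] - processingMs[i] - i * intervalMs;
    }

    // Earlier batches not yet finished when batch i becomes available.
    // In this model their input stays in memory until they complete.
    static int pendingBefore(long intervalMs, long[] finish, int i) {
        int pending = 0;
        for (int j = 0; j < i; j++) {
            if (finish[j] > i * intervalMs) {
                pending++;
            }
        }
        return pending;
    }

    public static void main(String[] args) {
        long intervalMs = 5000; // 5-second batches, as in the original post
        long[] processingMs = new long[12];
        for (int i = 0; i < processingMs.length; i++) {
            processingMs[i] = 4000 + 300L * i; // hypothetical gradual slowdown
        }
        long[] finish = finishTimes(intervalMs, processingMs);
        for (int i = 0; i < finish.length; i++) {
            System.out.println("batch " + i
                + ": processing=" + processingMs[i] + " ms"
                + ", delay=" + schedulingDelay(intervalMs, processingMs, finish, i) + " ms"
                + ", pending=" + pendingBefore(intervalMs, finish, i));
        }
    }
}
```

In this model a growing heap is a consequence of the slowdown rather than its
cause, so a heap dump taken while batches are queued cannot by itself
distinguish a leak from a backlog.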
