streaming in 1.6.0 slower than 1.5.1
I ran the same streaming application (compiled separately for 1.5.1 and 1.6.0) that processes 5-second tweet batches.

I noticed two things:

1. A 10% regression in 1.6.0 vs 1.5.1:

   Spark v1.6.0: 1,564 tweets/s
   Spark v1.5.1: 1,747 tweets/s

2. 1.6.0 streaming seems to have a memory leak.

   In 1.6.0, processing time gradually increases and eventually exceeds 5 seconds, so batches start to queue up. In 1.5.1 there is no such slowdown. See the chart below for the increasing scheduling delay in 1.6:

   [Chart: scheduling delay over time in 1.6.0]

I captured heap dumps in both versions and compared them. I noticed the Byte base class is using 50X more space in 1.5.1.

Here are some top classes in the heap histogram and their references.

Heap Histogram, All Classes (excluding platform)

1.6.0 Streaming
  Class                      Instance Count      Total Size
  [B                                   8453   3,227,649,599
  [C                                  44682       4,255,502
  java.lang.reflect.Method             9059       1,177,670

1.5.1 Streaming
  Class                      Instance Count      Total Size
  [B                                   5095      62,938,466
  [C                                 130482      12,844,182
  java.lang.String                   130171       1,562,052

References by Type

  1.6.0: class [B [0x640039e38]
  1.5.1: class [B [0x6c020bb08]

Referrers by Type

1.6.0 Streaming
  Class                                 Count
  java.nio.HeapByteBuffer                3239
  sun.security.util.DerInputBuffer       1233
  sun.security.util.ObjectIdentifier      620
  [Ljava.lang.Object;                     408

1.5.1 Streaming
  Class                                 Count
  sun.security.util.DerInputBuffer       1233
  sun.security.util.ObjectIdentifier      620
  [[B                                     397
  java.lang.reflect.Method                326

The total size by class B is 3GB in 1.5.1 and only 60MB in 1.6.0.
The java.nio.HeapByteBuffer referencing class did not show up among the top referrers in 1.5.1.

I have also placed jstack output for 1.5.1 and 1.6.0 online. You can get them here:

https://ibm.box.com/sparkstreaming-jstack160
https://ibm.box.com/sparkstreaming-jstack151

Jesse
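As a quick sanity check on the posted figures, the following Python sketch recomputes the throughput change and the ratio of the two [B total sizes. It is not part of the original application; the variable and function names are illustrative, and the numbers are copied from the post.

```python
# Figures copied from the post; names are illustrative only.
tweets_per_sec = {"1.5.1": 1747, "1.6.0": 1564}
byte_array_bytes = {"1.5.1": 62_938_466, "1.6.0": 3_227_649_599}


def relative_drop(old, new):
    """Fractional decrease going from old to new."""
    return (old - new) / old


def size_ratio(larger, smaller):
    """How many times larger one total size is than the other."""
    return larger / smaller


drop = relative_drop(tweets_per_sec["1.5.1"], tweets_per_sec["1.6.0"])
ratio = size_ratio(byte_array_bytes["1.6.0"], byte_array_bytes["1.5.1"])

print(f"throughput drop in 1.6.0: {drop:.1%}")
print(f"[B total size, 1.6.0 / 1.5.1: {ratio:.1f}x")
```

With the histogram values as posted, the larger [B total belongs to the 1.6.0 column.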
Re: streaming in 1.6.0 slower than 1.5.1
> The total size by class B is 3GB in 1.5.1 and only 60MB in 1.6.0.

From the information you posted, it seems the above is backwards.

BTW [B is byte[], not class B.

FYI
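For readers unfamiliar with the names in the histogram: the JVM reports array classes using type descriptors, so [B is byte[], [C is char[], [[B is byte[][], and [Ljava.lang.Object; is java.lang.Object[]. A minimal Python sketch of that mapping follows; the helper name decode_jvm_array_name is hypothetical and not part of any heap-analysis tool.

```python
# JVM primitive type codes as they appear in array class names.
PRIMITIVES = {
    "B": "byte", "C": "char", "D": "double", "F": "float",
    "I": "int", "J": "long", "S": "short", "Z": "boolean",
}


def decode_jvm_array_name(name):
    """Turn a JVM array class name such as '[B' into Java source form.

    Names without a leading '[' are returned unchanged.
    """
    dims = len(name) - len(name.lstrip("["))  # one '[' per array dimension
    if dims == 0:
        return name
    element = name[dims:]
    if element.startswith("L") and element.endswith(";"):
        base = element[1:-1]          # object type: strip 'L' and ';'
    else:
        base = PRIMITIVES[element]    # single-letter primitive code
    return base + "[]" * dims


for raw in ["[B", "[C", "[[B", "[Ljava.lang.Object;", "java.lang.String"]:
    print(raw, "->", decode_jvm_array_name(raw))
```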
Re: streaming in 1.6.0 slower than 1.5.1
Hey Jesse,

Could you provide the operators you are using?

As for the heap dump, it may not be a real memory leak. Since batches started to queue up, memory usage is expected to increase.
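The point that a backlog alone can explain rising memory use can be illustrated with a toy model. The sketch below is not Spark's scheduler: it assumes batches become ready once per interval and are processed strictly one at a time, and the function name scheduling_delays is hypothetical. Once processing time exceeds the batch interval, each batch waits longer than the previous one, so more received data sits in memory awaiting processing.

```python
def scheduling_delays(batch_interval, processing_times):
    """Toy model: how long each batch waits before processing starts.

    Assumes batch i becomes ready at (i + 1) * batch_interval and that
    batches are processed sequentially, one at a time.
    """
    delays = []
    free_at = 0  # time at which the previous batch finishes
    for i, proc in enumerate(processing_times):
        ready = (i + 1) * batch_interval
        start = max(ready, free_at)   # wait if the previous batch is still running
        delays.append(start - ready)
        free_at = start + proc
    return delays


# 5-second batches: 4 s of work keeps up, 6 s of work falls behind.
print(scheduling_delays(5, [4, 4, 4, 4]))
print(scheduling_delays(5, [6, 6, 6, 6]))
```

In the second call the delay grows by one second per batch, which is the pattern a steadily increasing scheduling delay would show.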