[ https://issues.apache.org/jira/browse/SPARK-13288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15157725#comment-15157725 ]
JESSE CHEN commented on SPARK-13288:
------------------------------------

The code is simple, but I can't share the data due to legal restrictions. It basically computes sentiment scores for tweets from a Twitter stream:

----
// tweets is the input DStream from the Kafka receiver; getSentimentScore
// returns a (positive, negative) score pair for a tweet body.
val sentiTweets = tweets.map(tweet => (tweet, getSentimentScore(sentiMap, tweet.body)))
val numTweets = tweets.count()
val posThreshold = 0.25
val negThreshold = -0.25
// Classify each tweet as +1 (positive), -1 (negative), or 0 (neutral), then
// keep incremental per-class counts over a sliding window.
val sentiCount = sentiTweets
  .map(t => t._2._1 - t._2._2)
  .map(score => if (score > posThreshold) 1 else if (score < negThreshold) -1 else 0)
  .map(score => (score, 1))
  .reduceByKeyAndWindow(_ + _, _ - _, Seconds(ReduceWindow * 60), Seconds(BatchWindow))
sentiCount.print(5)
numTweets.map(count => "Batch rate: " + count / BatchWindow + " tweets per second")
  .print(1)
----

(A sketch of the getSentimentScore helper, for illustration, appears after the quoted issue below.)

The same code ran on both 1.5.1 and 1.6.0. Spark 1.6 introduced the automatic (unified) memory management feature, which I think may have an impact on this. I am going to enable the legacy setting:

  spark.storage.memoryFraction  0.6  (deprecated)
    Read only if spark.memory.useLegacyMode is enabled. Fraction of the Java
    heap to use for Spark's memory cache. This should not be larger than the
    "old" generation of objects in the JVM, which by default is given 0.6 of
    the heap, but you can increase it if you configure your own old-generation
    size.

I will see if this changes the behavior. Also, if the heap dumps are not helpful, what do you recommend I collect for you? DEBUG traces would change the timing, so I am not sure how helpful they would be... I need some ideas, please.

Jesse
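For reference, a minimal sketch of how the legacy memory settings mentioned above could be wired into the driver. This assumes a plain SparkConf-based setup; the app name is a placeholder, and none of this is taken from the reporter's actual job:

----
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Hypothetical driver setup; the two memory settings are the point here.
val conf = new SparkConf()
  .setAppName("TweetSentimentStreaming")       // placeholder app name
  .set("spark.memory.useLegacyMode", "true")   // fall back to the pre-1.6 memory manager
  .set("spark.storage.memoryFraction", "0.6")  // honored only in legacy mode
val ssc = new StreamingContext(conf, Seconds(5)) // 5-second batches, as in the report
----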
> [1.6.0] Memory leak in Spark streaming
> --------------------------------------
>
>                 Key: SPARK-13288
>                 URL: https://issues.apache.org/jira/browse/SPARK-13288
>             Project: Spark
>          Issue Type: Bug
>          Components: Streaming
>    Affects Versions: 1.6.0
>         Environment: Bare metal cluster
>                      RHEL 6.6
>            Reporter: JESSE CHEN
>              Labels: streaming
>
> Streaming in 1.6 seems to have a memory leak.
> Running the same streaming app in Spark 1.5.1 and 1.6.0, all things equal,
> 1.6.0 showed gradually increasing processing time.
> The app is simple: 1 Kafka receiver of a tweet stream and 20 executors
> processing the tweets in 5-second batches.
> Spark 1.5.1 handled this smoothly and did not show increasing processing time
> in the 40-minute test; 1.6.0 showed increasing time about 8 minutes into the
> test. Please see the chart here:
> https://ibm.box.com/s/7q4ulik70iwtvyfhoj1dcl4nc469b116
> I captured heap dumps in the two versions and compared them. I noticed byte
> arrays (class [B) using about 50X more space in 1.6.0.
> Here are some top classes in the heap histogram and references.
>
> Heap Histogram (all classes, excluding platform)
>
>   1.6.0 Streaming                                        1.5.1 Streaming
>   Class                       Instances     Total Size   Class                   Instances  Total Size
>   class [B                         8453  3,227,649,599   class [B                     5095  62,938,466
>   class [C                        44682      4,255,502   class [C                   130482  12,844,182
>   class java.lang.reflect.Method   9059      1,177,670   class java.lang.String     130171   1,562,052
>
> References by Type
>
>   1.6.0: class [B [0x640039e38]                          1.5.1: class [B [0x6c020bb08]
>
> Referrers by Type
>
>   1.6.0 Class                        Count               1.5.1 Class                         Count
>   java.nio.HeapByteBuffer             3239               sun.security.util.DerInputBuffer     1233
>   sun.security.util.DerInputBuffer    1233               sun.security.util.ObjectIdentifier    620
>   sun.security.util.ObjectIdentifier   620               [[B                                   397
>   [Ljava.lang.Object;                  408               java.lang.reflect.Method              326
> ----
> The total size of class [B is about 3 GB in 1.6.0 but only about 60 MB in 1.5.1.
> The java.nio.HeapByteBuffer referrer class did not show up near the top in 1.5.1.
> I have also placed jstack output for 1.5.1 and 1.6.0 online; you can get them here:
> https://ibm.box.com/sparkstreaming-jstack160
> https://ibm.box.com/sparkstreaming-jstack151
> Jesse
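The snippet in the comment above calls getSentimentScore and sentiMap without defining them. A minimal hypothetical sketch, assuming sentiMap maps lower-cased words to (positive, negative) weight pairs; this is an illustration only, not the reporter's implementation:

----
// Hypothetical helper: the original implementation is not shown in the report.
// Assumes sentiMap maps lower-cased words to (positiveWeight, negativeWeight).
def getSentimentScore(sentiMap: Map[String, (Double, Double)],
                      body: String): (Double, Double) = {
  val words = body.toLowerCase.split("\\s+")
  // Sum the positive and negative weights of each word found in the lexicon.
  words.foldLeft((0.0, 0.0)) { case ((pos, neg), word) =>
    sentiMap.get(word) match {
      case Some((p, n)) => (pos + p, neg + n)
      case None         => (pos, neg)
    }
  }
}
----

With this shape, t._2._1 - t._2._2 in the streaming job is the net (positive minus negative) score that gets compared against the ±0.25 thresholds.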