[ https://issues.apache.org/jira/browse/SPARK-17380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen reopened SPARK-17380:
-------------------------------

> Spark streaming with a multi shard Kinesis freezes after several days (memory/resource leak?)
> ---------------------------------------------------------------------------------------------
>
>                 Key: SPARK-17380
>                 URL: https://issues.apache.org/jira/browse/SPARK-17380
>             Project: Spark
>          Issue Type: Bug
>          Components: DStreams
>    Affects Versions: 2.0.0
>            Reporter: Xeto
>             Fix For: 2.0.2
>
>         Attachments: exec_Leak_Hunter.zip, memory-after-freeze.png, memory.png
>
>
> Running Spark Streaming 2.0.0 on AWS EMR 5.0.0, consuming from Kinesis (125 shards).
> Used memory keeps growing over time according to Ganglia.
> The application works properly for about 3.5 days, until all free memory has been used.
> Then micro-batches start queuing up but none is served. Spark freezes.
> Ganglia shows some memory being freed, but it does not help the job recover.
> Is it a memory/resource leak?
> The job uses backpressure and Kryo.
> The code uses mapToPair(), groupByKey(), flatMap(), persist(StorageLevel.MEMORY_AND_DISK_SER_2()) and repartition(19), then stores to S3 using foreachRDD().
> Cluster size: 20 machines
> Spark configuration:
> spark.executor.extraJavaOptions -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70 -XX:PermSize=256M -XX:MaxPermSize=256M -XX:OnOutOfMemoryError='kill -9 %p'
> spark.driver.extraJavaOptions -Dspark.driver.log.level=INFO -XX:+UseConcMarkSweepGC -XX:PermSize=256M -XX:MaxPermSize=256M -XX:OnOutOfMemoryError='kill -9 %p'
> spark.master yarn-cluster
> spark.executor.instances 19
> spark.executor.cores 7
> spark.executor.memory 7500M
> spark.driver.memory 7500M
> spark.default.parallelism 133
> spark.yarn.executor.memoryOverhead 2950
> spark.yarn.driver.memoryOverhead 2950
> spark.eventLog.enabled false
> spark.eventLog.dir hdfs:///spark-logs/
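For readers trying to reproduce the setup, below is a minimal sketch (Java, Spark 2.0 DStream API with spark-streaming-kinesis-asl) that has the same shape as the pipeline described above: a Kinesis receiver, mapToPair(), groupByKey(), flatMap(), persist(StorageLevel.MEMORY_AND_DISK_SER_2()), repartition(19) and a foreachRDD() write to S3. The application/stream names, region, batch interval, key extraction and output formatting are placeholders and are not taken from the actual job; only the overall structure follows the description in the report.

{code:java}
import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream;

import org.apache.spark.SparkConf;
import org.apache.spark.storage.StorageLevel;
import org.apache.spark.streaming.Duration;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kinesis.KinesisUtils;

import scala.Tuple2;

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class KinesisPipelineSketch {

  public static void main(String[] args) throws InterruptedException {
    SparkConf conf = new SparkConf()
        .setAppName("kinesis-pipeline-sketch")
        // Backpressure and Kryo, as mentioned in the report.
        .set("spark.streaming.backpressure.enabled", "true")
        .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer");

    // Batch interval is not stated in the report; 60 s is an arbitrary placeholder.
    JavaStreamingContext jssc = new JavaStreamingContext(conf, new Duration(60000));

    // App name, stream name, endpoint and region are placeholders.
    JavaDStream<byte[]> records = KinesisUtils.createStream(
        jssc, "example-app", "example-stream",
        "https://kinesis.us-east-1.amazonaws.com", "us-east-1",
        InitialPositionInStream.LATEST, new Duration(60000),
        StorageLevel.MEMORY_AND_DISK_SER_2());

    // mapToPair -> groupByKey -> flatMap, as described in the report.
    JavaPairDStream<String, byte[]> keyed =
        records.mapToPair(r -> new Tuple2<String, byte[]>(extractKey(r), r));
    JavaDStream<String> flattened =
        keyed.groupByKey().flatMap(group -> toOutputLines(group));

    // Persist with replicated serialized storage, repartition to 19, write to S3.
    flattened.persist(StorageLevel.MEMORY_AND_DISK_SER_2())
        .repartition(19)
        .foreachRDD(rdd ->
            rdd.saveAsTextFile("s3://example-bucket/output/" + System.currentTimeMillis()));

    jssc.start();
    jssc.awaitTermination();
  }

  // Placeholder key extraction; the real logic is not part of the report.
  private static String extractKey(byte[] record) {
    return Integer.toString(record.length % 19);
  }

  // Placeholder per-group output; the real logic is not part of the report.
  private static Iterator<String> toOutputLines(Tuple2<String, Iterable<byte[]>> group) {
    List<String> out = new ArrayList<>();
    for (byte[] r : group._2()) {
      out.add(group._1() + "\t" + r.length);
    }
    return out.iterator();
  }
}
{code}

Note that both the receiver storage level and the explicit persist() in the description use MEMORY_AND_DISK_SER_2, i.e. two serialized replicas per block, which is the configuration under which the memory growth was observed.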