Does the issue only happen when you have no traffic on the topic?

Have you profiled to see what's using heap space?
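
Since the heap looks healthy but the container's physical memory keeps
growing, it may also be worth looking at native (off-heap) allocations, not
just the heap. A rough sketch of what I'd try, assuming you can reach the
driver/executor JVM on the node (<pid> below is a placeholder for the
container's process id; the flags and tools are standard JDK 8, nothing
Spark-specific beyond the extraJavaOptions settings):

  # enable JVM native memory tracking when submitting (small overhead):
  #   --conf "spark.driver.extraJavaOptions=-XX:NativeMemoryTracking=summary"
  #   --conf "spark.executor.extraJavaOptions=-XX:NativeMemoryTracking=summary"

  # then, on the node running the container:
  jcmd <pid> VM.native_memory summary   # heap vs. metaspace, threads, GC, internal
  jmap -histo:live <pid> | head -n 30   # live-object histogram of the heap itself

If the native-memory summary stays flat while the container's resident size
keeps climbing, the growth is probably coming from allocations the JVM does
not track (e.g. native libraries), which at least narrows the search.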


On Mon, Jul 13, 2015 at 1:05 PM, Apoorva Sareen <apoorva.sar...@gmail.com>
wrote:

> Hi,
>
> I am running Spark Streaming 1.4.0 on YARN (Apache distribution 2.6.0)
> with Java 1.8.0_45 and the Kafka direct stream. I am also using Spark with
> Scala 2.11 support.
>
> The issue I am seeing is that both the driver and executor containers
> gradually increase their physical memory usage until YARN kills the
> containers. I have configured up to 192 MB of heap and 384 MB of off-heap
> space for my driver, but it eventually runs out.
>
> The heap memory appears to be fine, with regular GC cycles. No
> OutOfMemoryError is ever encountered in any of these runs.
>
> In fact, I am not generating any traffic on the Kafka topics, yet this
> still happens. Here is the code I am using:
>
> import kafka.serializer.StringDecoder
> import org.apache.spark.SparkConf
> import org.apache.spark.streaming.{Seconds, StreamingContext}
> import org.apache.spark.streaming.kafka.KafkaUtils
>
> object SimpleSparkStreaming extends App {
>
>   val conf = new SparkConf()
>   val ssc = new StreamingContext(conf, Seconds(conf.getLong("spark.batch.window.size", 1L)))
>   ssc.checkpoint("checkpoint")
>
>   val topics = Set(conf.get("spark.kafka.topic.name"))
>   val kafkaParams = Map[String, String]("metadata.broker.list" -> conf.get("spark.kafka.broker.list"))
>
>   // receiver-less (direct) Kafka stream; each batch just prints the message values
>   val kafkaStream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](ssc, kafkaParams, topics)
>   kafkaStream.foreachRDD(rdd => {
>     rdd.foreach(x => {
>       println(x._2)
>     })
>   })
>   kafkaStream.print()
>
>   ssc.start()
>   ssc.awaitTermination()
> }
>
> I am running this on CentOS 7. The command used for spark-submit is the
> following:
>
> ./bin/spark-submit --class com.rasa.cloud.prototype.spark.SimpleSparkStreaming \
> --conf spark.yarn.executor.memoryOverhead=256 \
> --conf spark.yarn.driver.memoryOverhead=384 \
> --conf spark.kafka.topic.name=test \
> --conf spark.kafka.broker.list=172.31.45.218:9092 \
> --conf spark.batch.window.size=1 \
> --conf spark.app.name="Simple Spark Kafka application" \
> --master yarn-cluster \
> --num-executors 1 \
> --driver-memory 192m \
> --executor-memory 128m \
> --executor-cores 1 \
> /home/centos/spark-poc/target/lib/spark-streaming-prototype-0.0.1-SNAPSHOT.jar
>
> Any help is greatly appreciated.
>
> Regards,
>
> Apoorva
>
