Re: [SQL] Memory leak with spark streaming and spark sql in spark 1.5.1

2015-10-15 Thread Shixiong Zhu
Thanks for reporting it, Terry. I submitted a PR to fix it:
https://github.com/apache/spark/pull/9132

Best Regards,
Shixiong Zhu

2015-10-15 2:39 GMT+08:00 Reynold Xin:

> +dev list
>
> On Wed, Oct 14, 2015 at 1:07 AM, Terry Hoo wrote:
>
>> All,
>>
>> Has anyone run into a memory leak with Spark Streaming and Spark SQL in
>> Spark 1.5.1? Memory usage grows continuously when I run this simple
>> sample:
>>
>> val sc = new SparkContext(conf)
>> val sqlContext = new HiveContext(sc)
>> import sqlContext.implicits._
>> val ssc = new StreamingContext(sc, Seconds(1))
>> val s1 = ssc.socketTextStream("localhost", ).map(x =>
>> (x,1)).reduceByKey((x : Int, y : Int) => x + y)
>> s1.print
>> s1.foreachRDD(rdd => {
>>   rdd.foreach(_ => Unit)
>>   sqlContext.createDataFrame(rdd).registerTempTable("A")
>>   sqlContext.sql("""select * from A""").show(1)
>> })
>>
>> After dumping the Java heap, I can see about 22K entries in
>> SQLListener._stageIdToStageMetrics after 2 hours of running (the other
>> maps in this SQLListener have about 1K entries). Is this a leak in
>> SQLListener?
>>
>> Thanks!
>> Terry
>>
>
>


Re: [SQL] Memory leak with spark streaming and spark sql in spark 1.5.1

2015-10-14 Thread Reynold Xin
+dev list

On Wed, Oct 14, 2015 at 1:07 AM, Terry Hoo wrote:

> All,
>
> Has anyone run into a memory leak with Spark Streaming and Spark SQL in
> Spark 1.5.1? Memory usage grows continuously when I run this simple
> sample:
>
> val sc = new SparkContext(conf)
> val sqlContext = new HiveContext(sc)
> import sqlContext.implicits._
> val ssc = new StreamingContext(sc, Seconds(1))
> val s1 = ssc.socketTextStream("localhost", ).map(x =>
> (x,1)).reduceByKey((x : Int, y : Int) => x + y)
> s1.print
> s1.foreachRDD(rdd => {
>   rdd.foreach(_ => Unit)
>   sqlContext.createDataFrame(rdd).registerTempTable("A")
>   sqlContext.sql("""select * from A""").show(1)
> })
>
> After dumping the Java heap, I can see about 22K entries in
> SQLListener._stageIdToStageMetrics after 2 hours of running (the other
> maps in this SQLListener have about 1K entries). Is this a leak in
> SQLListener?
>
> Thanks!
> Terry
>


[SQL] Memory leak with spark streaming and spark sql in spark 1.5.1

2015-10-14 Thread Terry Hoo
All,

Has anyone run into a memory leak with Spark Streaming and Spark SQL in
Spark 1.5.1? Memory usage grows continuously when I run this simple
sample:

val sc = new SparkContext(conf)
val sqlContext = new HiveContext(sc)
import sqlContext.implicits._
val ssc = new StreamingContext(sc, Seconds(1))
val s1 = ssc.socketTextStream("localhost", ).map(x =>
(x,1)).reduceByKey((x : Int, y : Int) => x + y)
s1.print
s1.foreachRDD(rdd => {
  rdd.foreach(_ => Unit)
  sqlContext.createDataFrame(rdd).registerTempTable("A")
  sqlContext.sql("""select * from A""").show(1)
})
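
The growth described above can also be watched from inside the job, before
taking a heap dump. The following is a minimal sketch, not part of Spark:
`HeapProbe`, `usedMiB`, and `report` are hypothetical names introduced here,
and the only calls used are the standard `java.lang.Runtime` methods.

```scala
// Hypothetical helper, not part of Spark: samples JVM heap usage so that
// growth from one batch to the next becomes visible in the driver log.
object HeapProbe {
  private val MiB = 1024L * 1024L

  /** Heap currently in use, in MiB, as reported by the JVM runtime. */
  def usedMiB(): Long = {
    val rt = Runtime.getRuntime
    (rt.totalMemory() - rt.freeMemory()) / MiB
  }

  /** Formats one log line; `batchTimeMs` is any timestamp the caller has. */
  def report(batchTimeMs: Long): String =
    s"batch=$batchTimeMs usedHeapMiB=${usedMiB()}"
}
```

One possible use is to call `println(HeapProbe.report(System.currentTimeMillis()))`
at the end of the `foreachRDD` body. The figure includes garbage that has not
been collected yet, so only a trend over many batches is meaningful; a heap
dump is still needed to see which objects are retained.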

After dumping the Java heap, I can see about 22K entries in
SQLListener._stageIdToStageMetrics after 2 hours of running (the other
maps in this SQLListener have about 1K entries). Is this a leak in
SQLListener?

Thanks!
Terry
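
For readers following the thread, here is a toy model of how one map in a
listener can keep growing while its sibling maps stay bounded. It is a sketch
under stated assumptions, not Spark's SQLListener: `ToyListener` and its
methods are hypothetical, and whether this is the mechanism addressed by the
PR linked in the thread should be checked against the PR itself. The model
assumes every stage is recorded, but entries are removed only for stages
owned by an execution that gets trimmed.

```scala
import scala.collection.mutable

// Toy model only; class and method names are hypothetical, not Spark's.
class ToyListener(retainedExecutions: Int) {
  // Insertion-ordered, so `head` is always the oldest execution.
  private val executionToStages = mutable.LinkedHashMap.empty[Long, Seq[Int]]
  val stageMetrics = mutable.HashMap.empty[Int, Long]

  def executionCount: Int = executionToStages.size

  /** Every stage is recorded, whether or not an execution owns it. */
  def onStageSubmitted(stageId: Int): Unit =
    stageMetrics.update(stageId, 0L)

  /** Registers an execution and trims the oldest ones beyond the limit. */
  def onExecutionEnd(executionId: Long, stageIds: Seq[Int]): Unit = {
    executionToStages.update(executionId, stageIds)
    while (executionToStages.size > retainedExecutions) {
      val (oldest, stages) = executionToStages.head
      executionToStages.remove(oldest)
      // Only stages owned by a trimmed execution are ever removed.
      stages.foreach(id => stageMetrics.remove(id))
    }
  }
}
```

In the sample from the thread, each batch runs one plain RDD job
(`rdd.foreach`) and one SQL query. Feeding the toy model the same pattern,
one unowned stage and one owned stage per batch, leaves the execution map
capped at `retainedExecutions` while `stageMetrics` grows by one entry per
batch.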