It seems we hit the same issue.

There was a bug in 1.5.1 about a memory leak, but I am using 1.6.1.

Here is the link about the bug in 1.5.1:
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark

At 2016-05-12 23:10:43, "Simon Schiff [via Apache Spark User List]" 
<ml-node+s1001560n2694...@n3.nabble.com> wrote:
I read with Spark Streaming from a port. The incoming data consists of key and value
pairs. Then I call foreachRDD on each window. There I create a Dataset from the window
and run some SQL queries on it. On the result I only call show, to see the content. It
works well, but the memory usage keeps increasing, and when it reaches the maximum
nothing works anymore. When I give it more memory, the program runs somewhat longer, but
the problem persists. Because I run a program which writes to the port, I can control
exactly how much data Spark has to process: when I write one key and value pair every
millisecond, the problem is the same as when I write only one key and value pair per
second to the port.

When I don't create a Dataset in the foreachRDD and only count the elements in the RDD,
then everything works fine. I also use groupBy and agg functions in the queries.
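
For reference, a minimal sketch of the pattern described above, assuming Scala and the Spark 1.6 streaming/SQL APIs. The host, port, window size, the Record case class, and the sum aggregation are placeholders, not the exact code from the original program, and a plain SQLContext is used here just to keep the sketch self-contained even though the thread is about HiveContext:

import org.apache.spark.SparkConf
import org.apache.spark.sql.SQLContext
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingAggSketch {
  // Simple schema for the incoming "key,value" pairs (illustrative).
  case class Record(key: String, value: Long)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("streaming-agg-sketch")
    val ssc  = new StreamingContext(conf, Seconds(1))

    // Read "key,value" lines from a socket; host and port are placeholders.
    val lines = ssc.socketTextStream("localhost", 9999)
    val pairs = lines.map(_.split(",")).map(a => Record(a(0), a(1).toLong))

    // For each window, build a DataFrame and run a groupBy/agg query,
    // then only show() the result -- the pattern reported to grow in memory.
    pairs.window(Seconds(10)).foreachRDD { rdd =>
      val sqlContext = SQLContext.getOrCreate(rdd.sparkContext)
      import sqlContext.implicits._
      val df = rdd.toDF()
      df.groupBy("key").agg(Map("value" -> "sum")).show()
    }

    ssc.start()
    ssc.awaitTermination()
  }
}

The relevant part is the DataFrame creation and SQL-style aggregation inside foreachRDD; replacing that body with a plain rdd.count() corresponds to the case described above as working fine.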




