It seems we hit the same issue.
There was a bug in 1.5.1 about a memory leak, but I am using 1.6.1. Here is the link about the bug in 1.5.1: https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark

At 2016-05-12 23:10:43, "Simon Schiff [via Apache Spark User List]" <ml-node+s1001560n2694...@n3.nabble.com> wrote:

I read from a port with Spark Streaming. The incoming data consists of key/value pairs. I then call foreachRDD on each window, create a Dataset from the window, and run some SQL queries on it. On the result I only call show, to inspect the content. This works, but memory usage keeps increasing; once it reaches the maximum, nothing works anymore. With more memory the program runs somewhat longer, but the problem persists. Because I run a program that writes to the port, I can control exactly how much data Spark has to process. The problem is the same whether I write one key/value pair every millisecond or only one per second. When I don't create a Dataset in foreachRDD and only count the elements in the RDD, everything works fine. I also use groupBy and agg functions in the queries.
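For reference, here is a minimal sketch of the setup described above, using the Spark 1.6 Scala APIs. The host, port, column names, and the particular aggregation are illustrative assumptions, not details from the original post:

    import org.apache.spark.SparkConf
    import org.apache.spark.sql.SQLContext
    import org.apache.spark.sql.functions.sum
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object StreamingLeakRepro {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("StreamingLeakRepro").setMaster("local[2]")
        val ssc = new StreamingContext(conf, Seconds(1))

        // Read "key,value" lines from a socket (host/port are placeholders).
        val lines = ssc.socketTextStream("localhost", 9999)
        val pairs = lines.map { line =>
          val Array(k, v) = line.split(",", 2)
          (k, v.toLong) // assumes well-formed numeric values
        }

        pairs.foreachRDD { rdd =>
          // A new Dataset/DataFrame is created per batch, as in the post.
          val sqlContext = SQLContext.getOrCreate(rdd.sparkContext)
          import sqlContext.implicits._
          val df = rdd.toDF("key", "value")
          // groupBy/agg query; show() only prints the result to the console.
          df.groupBy("key").agg(sum("value")).show()
        }

        ssc.start()
        ssc.awaitTermination()
      }
    }

Per the post, replacing the foreachRDD body with a plain rdd.count() (never touching the SQLContext) does not show the memory growth.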