Could you give more information on the operations you are using? A code
outline would help.

And what do you mean by "Spark Driver receiver events"? If the driver is
receiving the events, how are they being sent to the executors?

BTW, for memory usage, I strongly recommend running jmap -histo:live <pid>
against the driver to see which types of objects account for most of the
memory.
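
If attaching jmap to the driver is awkward, a rough complement is to log heap
usage from inside the driver JVM and watch the trend over a few hours. The
sketch below uses only the standard java.lang.management API. HeapSampler is
a made-up name for illustration, not part of Spark, and the sample count and
interval are arbitrary. The "used" figure includes garbage that has not been
collected yet, so it shows a trend only; the histogram is still what tells
you which objects are accumulating.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.util.Arrays;

// Hypothetical helper (not a Spark class): samples the heap usage of the
// JVM it runs in, e.g. from a background thread in the driver.
final class HeapSampler {
    private static final MemoryMXBean MEMORY = ManagementFactory.getMemoryMXBean();

    // Bytes of heap currently in use, including not-yet-collected garbage.
    static long usedHeapBytes() {
        return MEMORY.getHeapMemoryUsage().getUsed();
    }

    // Takes `count` samples, sleeping `intervalMillis` between them.
    // If interrupted, restores the interrupt flag and returns what it has.
    static long[] sample(int count, long intervalMillis) {
        long[] samples = new long[count];
        for (int i = 0; i < count; i++) {
            samples[i] = usedHeapBytes();
            if (i < count - 1) {
                try {
                    Thread.sleep(intervalMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return Arrays.copyOf(samples, i + 1);
                }
            }
        }
        return samples;
    }

    public static void main(String[] args) {
        // Assumed values: five samples, one second apart.
        for (long used : sample(5, 1000L)) {
            System.out.printf("heap used: %d MB%n", used / (1024 * 1024));
        }
    }
}
```

If the logged values keep climbing even across full GCs, that is a good time
to take the jmap histogram and compare it with one taken shortly after
startup.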

TD

On Tue, Jun 30, 2015 at 9:48 AM, easyonthemayo <neil.m...@velocityww.com>
wrote:

> I have a Spark program which exhibits increasing resource usage. Spark
> Streaming (https://spark.apache.org/streaming/) is used to provide the data
> source. The Spark Driver class receives "events" by querying a MongoDB in a
> custom JavaReceiverInputDStream. These events are then transformed via
> mapToPair(), which creates tuples mapping an id to each event. The stream is
> partitioned and we run a groupByKey(). Finally the events are processed by
> foreachRDD().
>
> Running it for several hours on a standalone cluster, a clear trend emerges
> of both CPU and heap memory usage increasing. This occurs even if the data
> source offers no events, so there is no actual processing to perform.
> Similarly, omitting the bulk of processing code within foreachRDD() does not
> eliminate the problem.
>
> I've tried eliminating steps in the process to identify the culprit, and it
> looks like it's the partitioning step that prompts the CPU usage to increase
> over time.
>
> Has anyone else experienced this sort of behaviour?
