Re: why spark driver program is creating so many threads? How can I limit this number?

2016-11-01 Thread kant kodali
Looks like upgrading to Spark 2.0.1 fixed it! The thread count I now see in cat /proc/pid/status is about 84, as opposed to about 1000 within the span of 2 minutes on Spark 2.0.0.
On Tue, Nov 1, 2016 at 11:40 AM, Shixiong(Ryan) Zhu wrote:
> Yes, try 2.0.1!
>
> On Tue, Nov 1, 2016

Re: why spark driver program is creating so many threads? How can I limit this number?

2016-11-01 Thread Shixiong(Ryan) Zhu
Yes, try 2.0.1!
On Tue, Nov 1, 2016 at 11:25 AM, kant kodali wrote:
> AH!!! Got it! Should I use 2.0.1 then ? I don't see 2.1.0
>
> On Tue, Nov 1, 2016 at 10:14 AM, Shixiong(Ryan) Zhu <shixi...@databricks.com> wrote:
>
>> Dstream "Window" uses "union" to combine multiple

Re: why spark driver program is creating so many threads? How can I limit this number?

2016-11-01 Thread kant kodali
AH!!! Got it! Should I use 2.0.1 then? I don't see 2.1.0.
On Tue, Nov 1, 2016 at 10:14 AM, Shixiong(Ryan) Zhu wrote:
> Dstream "Window" uses "union" to combine multiple RDDs in one window into a single RDD.
>
> On Tue, Nov 1, 2016 at 2:59 AM kant kodali

Re: why spark driver program is creating so many threads? How can I limit this number?

2016-11-01 Thread Shixiong(Ryan) Zhu
DStream "window" uses "union" to combine the multiple RDDs in one window into a single RDD.
On Tue, Nov 1, 2016 at 2:59 AM kant kodali wrote:
> @Sean It looks like this problem can happen with other RDD's as well. Not just unionRDD
>
> On Tue, Nov 1, 2016 at 2:52 AM, kant

Re: why spark driver program is creating so many threads? How can I limit this number?

2016-11-01 Thread kant kodali
@Sean It looks like this problem can happen with other RDDs as well, not just unionRDD.
On Tue, Nov 1, 2016 at 2:52 AM, kant kodali wrote:
> Hi Sean,
>
> The comments seem very relevant although I am not sure if this pull request https://github.com/apache/spark/pull/14985

Re: why spark driver program is creating so many threads? How can I limit this number?

2016-11-01 Thread kant kodali
Hi Sean, The comments seem very relevant, although I am not sure whether this pull request https://github.com/apache/spark/pull/14985 would fix my issue. I am not sure what unionRDD.scala has to do with my error (I don't know much about the Spark code base). Do I ever use unionRDD.scala when I

Re: why spark driver program is creating so many threads? How can I limit this number?

2016-11-01 Thread Sean Owen
Possibly https://issues.apache.org/jira/browse/SPARK-17396 ?
On Tue, Nov 1, 2016 at 2:11 AM kant kodali wrote:
> Hi Ryan,
>
> I think you are right. This may not be related to the Receiver. I have attached jstack dump here. I do a simple MapToPair and reduceByKey and I
>

Re: why spark driver program is creating so many threads? How can I limit this number?

2016-11-01 Thread kant kodali
This question looks very similar to mine but I don't see any answer. http://markmail.org/message/kkxhi5jjtwyadzxt
On Mon, Oct 31, 2016 at 11:24 PM, kant kodali wrote:
> Here is a UI of my thread dump.
>
> http://fastthread.io/my-thread-report.jsp?p=c2hhcmVkLzIwMTYv
>

Re: why spark driver program is creating so many threads? How can I limit this number?

2016-11-01 Thread kant kodali
Here is a UI of my thread dump. http://fastthread.io/my-thread-report.jsp?p=c2hhcmVkLzIwMTYvMTEvMS8tLWpzdGFja19kdW1wX3dpbmRvd19pbnRlcnZhbF8xbWluX2JhdGNoX2ludGVydmFsXzFzLnR4dC0tNi0xNy00Ng==
On Mon, Oct 31, 2016 at 7:10 PM, kant kodali wrote:
> Hi Ryan,
>
> I think you are

Re: why spark driver program is creating so many threads? How can I limit this number?

2016-10-31 Thread Shixiong(Ryan) Zhu
If some threads are leaking, you should be able to see the number of threads increasing over time. You can just dump the threads after 1-2 hours.
On Mon, Oct 31, 2016 at 12:59 PM, kant kodali wrote:
> yes I can certainly use jstack but it requires 4 to 5 hours for me to
>

Re: why spark driver program is creating so many threads? How can I limit this number?

2016-10-31 Thread kant kodali
Yes, I can certainly use jstack, but it takes me 4 to 5 hours to reproduce the error, so I will get back to you as early as possible. Thanks a lot!
On Mon, Oct 31, 2016 at 12:41 PM, Shixiong(Ryan) Zhu <shixi...@databricks.com> wrote:
> Then it should not be a Receiver issue. Could you use `jstack`

Re: why spark driver program is creating so many threads? How can I limit this number?

2016-10-31 Thread Shixiong(Ryan) Zhu
Then it should not be a Receiver issue. Could you use `jstack` to find out the names of the leaking threads?
On Mon, Oct 31, 2016 at 12:35 PM, kant kodali wrote:
> Hi Ryan,
>
> It happens on the driver side and I am running on a client mode (not the cluster mode).
>
> Thanks!
>
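For readers who want a quick in-process complement to a `jstack` dump, here is a minimal sketch that groups live JVM threads by name pattern, so that a leaking pool (for example names like "ForkJoinPool-50-worker-11" from the error reported in this thread) shows up as one steadily growing bucket. The class name `ThreadCensus` and the digit-masking rule are illustrative assumptions, not anything from Spark; the sketch only uses `Thread.getAllStackTraces()` from the JDK.

```java
import java.util.Map;
import java.util.TreeMap;

public class ThreadCensus {
    // Replace every run of digits with '#', so that numbered threads from
    // the same pool collapse into a single bucket.
    public static String normalize(String threadName) {
        return threadName.replaceAll("\\d+", "#");
    }

    // Count the live threads of this JVM per normalized name.
    public static Map<String, Integer> countByName() {
        Map<String, Integer> counts = new TreeMap<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            counts.merge(normalize(t.getName()), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Print one line per bucket: count, then the normalized name.
        countByName().forEach((name, n) -> System.out.println(n + "\t" + name));
    }
}
```

Calling `countByName()` periodically from the driver and comparing successive snapshots should make it fairly obvious which bucket keeps growing; a `jstack` dump of the same process remains the more complete source, since it also shows the stack of each thread.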

Re: why spark driver program is creating so many threads? How can I limit this number?

2016-10-31 Thread kant kodali
Hi Ryan, It happens on the driver side, and I am running in client mode (not cluster mode). Thanks!
On Mon, Oct 31, 2016 at 12:32 PM, Shixiong(Ryan) Zhu <shixi...@databricks.com> wrote:
> Sorry, there is a typo in my previous email: this may **not** be the root cause if the leak

Re: why spark driver program is creating so many threads? How can I limit this number?

2016-10-31 Thread Shixiong(Ryan) Zhu
Sorry, there is a typo in my previous email: this may **not** be the root cause if the leaked threads are on the driver side. Does it happen in the driver or in the executors?
On Mon, Oct 31, 2016 at 12:20 PM, kant kodali wrote:
> Hi Ryan,
>
> Ahh My Receiver.onStop method is

Re: why spark driver program is creating so many threads? How can I limit this number?

2016-10-31 Thread Sean Owen
This is more of a Java question. You don't 'clean up' threads but rather rearchitect your app so that you don't create long-running threads that never terminate. Consider also using an Executor instead of manually creating threads.
On Mon, Oct 31, 2016 at 7:20 PM kant kodali wrote:
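The Executor suggestion above could look roughly like the following sketch. It is a generic JDK example, not code from the original poster's program: the class name `BoundedWorkers`, the pool size, and the trivial counter task are all assumptions chosen for illustration. The point is that the number of threads is capped by the pool size no matter how many tasks are submitted, and that the pool is shut down explicitly so its threads terminate.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedWorkers {
    // Runs `tasks` small jobs on at most `poolSize` threads and returns
    // how many of them completed.
    public static int runAll(int tasks, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        AtomicInteger done = new AtomicInteger();
        try {
            for (int i = 0; i < tasks; i++) {
                // Each job reuses one of the pool's threads instead of
                // creating a new Thread per job.
                pool.execute(() -> done.incrementAndGet());
            }
        } finally {
            // Stop accepting new work; already queued jobs still run.
            pool.shutdown();
        }
        try {
            // Wait for the queue to drain, then force-stop as a fallback.
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                pool.shutdownNow();
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
        return done.get();
    }

    public static void main(String[] args) {
        System.out.println(runAll(1000, 4));
    }
}
```

With a fixed pool, a burst of work queues up behind the existing threads rather than spawning new ones, which is the property that matters when the thread count itself is the problem.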

Re: why spark driver program is creating so many threads? How can I limit this number?

2016-10-31 Thread kant kodali
Hi Ryan, Ahh, my Receiver.onStop method is currently empty. 1) I have a hard time seeing why the receiver would crash so many times within a span of 4 to 5 hours, but anyway I understand I should still clean up in onStop. 2) How do I clean up those threads? The documentation here

Re: why spark driver program is creating so many threads? How can I limit this number?

2016-10-31 Thread Shixiong(Ryan) Zhu
So in your code, each Receiver will start a new thread. Did you stop the receiver properly in `Receiver.onStop`? Otherwise, you may leak threads after a receiver crashes and is restarted by Spark. However, this may be the root cause since the leaked threads are on the driver side. Could you use
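To make the onStart/onStop point concrete, here is a minimal sketch of the lifecycle pattern in plain Java. It deliberately does not extend Spark's `Receiver`, so that it runs without Spark on the classpath; the class name `StoppableWorker`, the thread name "demo-receiver", and the sleep-based loop are stand-ins for the real receiving logic. In an actual Spark receiver the loop would typically check `isStopped()` instead of a private flag.

```java
public class StoppableWorker {
    private volatile boolean stopped = true;
    private Thread worker;

    // Analogous to Receiver.onStart(): spawn exactly one worker thread.
    public synchronized void onStart() {
        stopped = false;
        worker = new Thread(this::receiveLoop, "demo-receiver");
        worker.setDaemon(true);
        worker.start();
    }

    // Analogous to Receiver.onStop(): signal the thread and wait for it,
    // so that a later restart does not leave the old thread behind.
    public synchronized void onStop() {
        stopped = true;
        if (worker == null) {
            return;
        }
        worker.interrupt();
        try {
            worker.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        worker = null;
    }

    // Placeholder for the real receive loop; exits when asked to stop.
    private void receiveLoop() {
        while (!stopped) {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                return;
            }
        }
    }

    // Simulates repeated restarts and reports how many worker threads
    // are still alive afterwards.
    public static long liveAfterRestarts(int cycles) {
        StoppableWorker r = new StoppableWorker();
        for (int i = 0; i < cycles; i++) {
            r.onStart();
            r.onStop();
        }
        return Thread.getAllStackTraces().keySet().stream()
                .filter(t -> t.getName().equals("demo-receiver"))
                .count();
    }

    public static void main(String[] args) {
        System.out.println(liveAfterRestarts(20));
    }
}
```

If `onStop` were left empty, as described in this thread, every restart would leave one more "demo-receiver" thread running, which is the kind of leak being asked about here.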

Re: why spark driver program is creating so many threads? How can I limit this number?

2016-10-31 Thread kant kodali
I am also under the assumption that the *onStart* function of the Receiver is called only once by Spark. Please correct me if I am wrong.
On Mon, Oct 31, 2016 at 11:35 AM, kant kodali wrote:
> My driver program runs a spark streaming job. And it spawns a thread by itself

Re: why spark driver program is creating so many threads? How can I limit this number?

2016-10-31 Thread kant kodali
My driver program runs a Spark Streaming job, and it spawns a thread by itself only in the *onStart()* function below. Other than that, it doesn't spawn any other threads. It only calls the mapToPair, reduceByKey, foreachRDD, and collect functions.
public class NSQReceiver extends Receiver { private

Re: why spark driver program is creating so many threads? How can I limit this number?

2016-10-31 Thread Jakob Odersky
> how do I tell my spark driver program to not create so many?
This may depend on your driver program. Do you spawn any threads in it? Could you share some more information on the driver program, the Spark version, and your environment? It would greatly help others to help you.
On Mon, Oct 31, 2016

Re: why spark driver program is creating so many threads? How can I limit this number?

2016-10-31 Thread kant kodali
The source of my problem is actually that I am running into the following error. This error seems to happen after running my driver program for 4 hours.
"Exception in thread "ForkJoinPool-50-worker-11"
Exception in thread "dag-scheduler-event-loop"
Exception in thread "ForkJoinPool-50-worker-13"

Re: why spark driver program is creating so many threads? How can I limit this number?

2016-10-31 Thread Sean Owen
ps -L [pid] is what shows threads. I am not sure your command is counting what you think it is. My shell process has about a hundred threads, and I can't imagine why one would have thousands unless your app spawned them.
On Mon, Oct 31, 2016 at 10:20 AM kant kodali wrote:
> when I

why spark driver program is creating so many threads? How can I limit this number?

2016-10-31 Thread kant kodali
When I run `ps -elfT | grep "spark-driver-program.jar" | wc -l`, the result is around 32K. Why does it create so many threads, and how can I limit this?