Yes, try 2.0.1! On Tue, Nov 1, 2016 at 11:25 AM, kant kodali <kanth...@gmail.com> wrote:
> AH!!! Got it! Should I use 2.0.1 then ? I don't see 2.1.0 > > On Tue, Nov 1, 2016 at 10:14 AM, Shixiong(Ryan) Zhu < > shixi...@databricks.com> wrote: > >> Dstream "Window" uses "union" to combine multiple RDDs in one window into >> a single RDD. >> >> On Tue, Nov 1, 2016 at 2:59 AM kant kodali <kanth...@gmail.com> wrote: >> >>> @Sean It looks like this problem can happen with other RDD's as well. >>> Not just unionRDD >>> >>> On Tue, Nov 1, 2016 at 2:52 AM, kant kodali <kanth...@gmail.com> wrote: >>> >>> Hi Sean, >>> >>> The comments seem very relevant although I am not sure if this pull >>> request https://github.com/apache/spark/pull/14985 would fix my issue? >>> I am not sure what unionRDD.scala has anything to do with my error (I don't >>> know much about spark code base). Do I ever use unionRDD.scala when I call >>> mapToPair or ReduceByKey or forEachRDD? This error is very easy to >>> reproduce you actually don't need to ingest any data to spark streaming >>> job. Just have one simple transformation consists of mapToPair, reduceByKey >>> and forEachRDD and have the window interval of 1min and batch interval of >>> one one second and simple call ssc.awaitTermination() and watch the >>> Thread Count go up significantly. >>> >>> I do think that using a fixed size executor service would probably be a >>> safer approach. One could leverage ForJoinPool if they think they could >>> benefit a lot from the work-steal algorithm and doubly ended queues in the >>> ForkJoinPool. >>> >>> Thanks! >>> >>> >>> >>> >>> On Tue, Nov 1, 2016 at 2:19 AM, Sean Owen <so...@cloudera.com> wrote: >>> >>> Possibly https://issues.apache.org/jira/browse/SPARK-17396 ? >>> >>> On Tue, Nov 1, 2016 at 2:11 AM kant kodali <kanth...@gmail.com> wrote: >>> >>> Hi Ryan, >>> >>> I think you are right. This may not be related to the Receiver. I have >>> attached jstack dump here. I do a simple MapToPair and reduceByKey and I >>> have a window Interval of 1 minute (60000ms) and batch interval of 1s ( >>> 1000) This is generating lot of threads atleast 5 to 8 threads per >>> second and the total number of threads is monotonically increasing. So just >>> for tweaking purpose I changed my window interval to 1min (60000ms) and >>> batch interval of 10s (10000) this looked lot better but still not >>> ideal at very least it is not monotonic anymore (It goes up and down). Now >>> my question really is how do I tune such that my number of threads are >>> optimal while satisfying the window Interval of 1 minute (60000ms) and >>> batch interval of 1s (1000) ? >>> >>> This jstack dump is taken after running my spark driver program for 2 >>> mins and there are about 1000 threads. >>> >>> Thanks! >>> >>> >>> >>> >