Looks like upgrading to Spark 2.0.1 fixed it! The thread count I see in
cat /proc/<pid>/status now stays around 84, as opposed to roughly 1000
within a span of 2 minutes on Spark 2.0.0.
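
In case it helps anyone else watching for this: a portable way to sample the
live thread count from inside the driver JVM (a quick sketch; the helper
class name is made up), instead of reading /proc:

    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadMXBean;

    public final class ThreadCountMonitor {
        // Call once from the driver's main(): logs the JVM's live thread
        // count every 5 seconds from a daemon thread.
        public static void start() {
            ThreadMXBean mx = ManagementFactory.getThreadMXBean();
            Thread t = new Thread(() -> {
                while (!Thread.currentThread().isInterrupted()) {
                    System.out.println("live threads: " + mx.getThreadCount());
                    try {
                        Thread.sleep(5000);
                    } catch (InterruptedException e) {
                        return;
                    }
                }
            });
            t.setDaemon(true);
            t.start();
        }
    }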

On Tue, Nov 1, 2016 at 11:40 AM, Shixiong(Ryan) Zhu <shixi...@databricks.com> wrote:

> Yes, try 2.0.1!
>
> On Tue, Nov 1, 2016 at 11:25 AM, kant kodali <kanth...@gmail.com> wrote:
>
>> Ah, got it! Should I use 2.0.1 then? I don't see 2.1.0.
>>
>> On Tue, Nov 1, 2016 at 10:14 AM, Shixiong(Ryan) Zhu <shixi...@databricks.com> wrote:
>>
>>> DStream "window" uses "union" to combine the multiple RDDs in one window
>>> into a single RDD.
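>>>
>>> Roughly (a simplified illustration of the idea in the Java API, not
>>> Spark's actual code): the RDD backing one window is the union of the
>>> per-batch RDDs that fall inside that window, so a 1-minute window over
>>> 1-second batches combines about 60 RDDs:
>>>
>>>     import java.util.List;
>>>     import org.apache.spark.api.java.JavaRDD;
>>>
>>>     class WindowSketch {
>>>         // Illustration only: fold all batch RDDs in a window into one
>>>         // RDD by repeated union.
>>>         static JavaRDD<String> windowRdd(List<JavaRDD<String>> batchRdds) {
>>>             JavaRDD<String> windowed = batchRdds.get(0);
>>>             for (int i = 1; i < batchRdds.size(); i++) {
>>>                 windowed = windowed.union(batchRdds.get(i));
>>>             }
>>>             return windowed;
>>>         }
>>>     }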
>>>
>>> On Tue, Nov 1, 2016 at 2:59 AM kant kodali <kanth...@gmail.com> wrote:
>>>
>>>> @Sean It looks like this problem can happen with other RDDs as well,
>>>> not just UnionRDD.
>>>>
>>>> On Tue, Nov 1, 2016 at 2:52 AM, kant kodali <kanth...@gmail.com> wrote:
>>>>
>>>> Hi Sean,
>>>>
>>>> The comments seem very relevant, although I am not sure whether this pull
>>>> request https://github.com/apache/spark/pull/14985 would fix my issue. I
>>>> also don't see what UnionRDD.scala has to do with my error (I don't know
>>>> much about the Spark code base). Do I ever use UnionRDD.scala when I call
>>>> mapToPair, reduceByKey, or foreachRDD? This error is very easy to
>>>> reproduce; you don't even need to ingest any data into the Spark
>>>> streaming job. Just build one simple pipeline consisting of mapToPair,
>>>> reduceByKey, and foreachRDD, use a window interval of 1 minute and a
>>>> batch interval of one second, simply call ssc.awaitTermination(), and
>>>> watch the thread count go up significantly. A minimal sketch follows.
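>>>>
>>>> Something along these lines (a sketch only; the class name is made up,
>>>> and the empty queueStream source is just a convenient way to run the
>>>> job without ingesting anything):
>>>>
>>>>     import java.util.LinkedList;
>>>>     import java.util.Queue;
>>>>     import org.apache.spark.SparkConf;
>>>>     import org.apache.spark.api.java.JavaRDD;
>>>>     import org.apache.spark.streaming.Durations;
>>>>     import org.apache.spark.streaming.api.java.JavaDStream;
>>>>     import org.apache.spark.streaming.api.java.JavaStreamingContext;
>>>>     import scala.Tuple2;
>>>>
>>>>     public class ThreadLeakRepro {
>>>>         public static void main(String[] args) throws InterruptedException {
>>>>             SparkConf conf = new SparkConf()
>>>>                 .setMaster("local[2]").setAppName("thread-leak-repro");
>>>>             // 1-second batch interval
>>>>             JavaStreamingContext ssc =
>>>>                 new JavaStreamingContext(conf, Durations.seconds(1));
>>>>
>>>>             // Empty queue stream: the job receives no data at all.
>>>>             Queue<JavaRDD<String>> queue = new LinkedList<>();
>>>>             JavaDStream<String> lines = ssc.queueStream(queue);
>>>>
>>>>             lines.mapToPair(s -> new Tuple2<>(s, 1))
>>>>                  .reduceByKey(Integer::sum)
>>>>                  .window(Durations.minutes(1)) // 1-min window, 1s batches
>>>>                  .foreachRDD(rdd ->
>>>>                      System.out.println("records: " + rdd.count()));
>>>>
>>>>             ssc.start();
>>>>             ssc.awaitTermination();
>>>>         }
>>>>     }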
>>>>
>>>> I do think that using a fixed-size executor service would probably be a
>>>> safer approach. One could still use a ForkJoinPool if the workload would
>>>> benefit a lot from the work-stealing algorithm and the double-ended
>>>> queues (deques) in the ForkJoinPool; a sketch of both options is below.
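>>>>
>>>> For example (a sketch only; the class name and the pool size of 8 are
>>>> arbitrary):
>>>>
>>>>     import java.util.concurrent.ExecutorService;
>>>>     import java.util.concurrent.Executors;
>>>>     import java.util.concurrent.ForkJoinPool;
>>>>
>>>>     public class PoolChoices {
>>>>         public static void main(String[] args) {
>>>>             // Fixed-size pool: the thread count is bounded by
>>>>             // construction, so it cannot grow without limit.
>>>>             ExecutorService fixed = Executors.newFixedThreadPool(8);
>>>>
>>>>             // Work-stealing alternative: each worker owns a deque of
>>>>             // tasks, and idle workers steal from the tails of busy
>>>>             // workers' deques.
>>>>             ForkJoinPool forkJoin = new ForkJoinPool(8);
>>>>
>>>>             fixed.shutdown();
>>>>             forkJoin.shutdown();
>>>>         }
>>>>     }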
>>>>
>>>> Thanks!
>>>>
>>>>
>>>> On Tue, Nov 1, 2016 at 2:19 AM, Sean Owen <so...@cloudera.com> wrote:
>>>>
>>>> Possibly https://issues.apache.org/jira/browse/SPARK-17396 ?
>>>>
>>>> On Tue, Nov 1, 2016 at 2:11 AM kant kodali <kanth...@gmail.com> wrote:
>>>>
>>>> Hi Ryan,
>>>>
>>>> I think you are right; this may not be related to the Receiver. I have
>>>> attached a jstack dump here. I do a simple mapToPair and reduceByKey, and
>>>> I have a window interval of 1 minute (60000 ms) and a batch interval of
>>>> 1 s (1000 ms). This generates a lot of threads, at least 5 to 8 per
>>>> second, and the total number of threads increases monotonically. Just for
>>>> tweaking purposes I kept the window interval at 1 minute (60000 ms) and
>>>> changed the batch interval to 10 s (10000 ms); this looked a lot better,
>>>> but it is still not ideal, although at the very least the count is no
>>>> longer monotonic (it goes up and down). Now my question really is: how do
>>>> I tune things so that the number of threads is optimal while still
>>>> satisfying a window interval of 1 minute (60000 ms) and a batch interval
>>>> of 1 s (1000 ms)?
>>>>
>>>> This jstack dump was taken after running my Spark driver program for 2
>>>> minutes; at that point there are about 1000 threads.
>>>>
>>>> Thanks!
>>>>
>>>>
>>
>
