Well, won't the code in the lambda execute on multiple threads in the worker,
since it has to process many records? I just want a single copy of SomeClass
instantiated per thread, rather than one per record being processed. That is
what triggered this thought in the first place.
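
As a rough sketch of the per-thread idea (plain Java, not Spark code): the
ThreadLocal is declared once as a static field rather than inside the lambda,
so each pool thread builds its own SomeClass on first use and reuses it for
every later record. Everything here is an assumption for illustration: the
class name PerThreadInstanceSketch, the stand-in SomeClass with its counter,
the HOLDER field, and the fixed thread pool that plays the role of a worker's
task threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

class PerThreadInstanceSketch {

    // Hypothetical stand-in for SomeClass; counts how many instances are built.
    static class SomeClass {
        static final AtomicInteger CREATED = new AtomicInteger();

        SomeClass() {
            CREATED.incrementAndGet();
        }

        int process(int record) {
            return record * 2;
        }
    }

    // Declared once as a static field, so repeated calls on the same thread
    // see the same SomeClass instance instead of a new one per record.
    private static final ThreadLocal<SomeClass> HOLDER =
            ThreadLocal.withInitial(SomeClass::new);

    // What the body of the map lambda would do for one record.
    static int mapRecord(int record) {
        return HOLDER.get().process(record);
    }

    // Simulates a worker: runs `records` tasks on a pool of `threads` threads
    // and returns how many SomeClass instances were created along the way.
    static int runOnPool(int threads, int records) {
        SomeClass.CREATED.set(0);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (int i = 0; i < records; i++) {
                final int record = i;
                Callable<Integer> task = () -> mapRecord(record);
                results.add(pool.submit(task));
            }
            for (Future<Integer> result : results) {
                result.get(); // wait for every record to be processed
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return SomeClass.CREATED.get();
    }

    public static void main(String[] args) {
        int created = runOnPool(4, 1000);
        System.out.println("instances created: " + created + " (at most 4)");
    }
}
```

With this layout, calling HOLDER.remove() after every record would throw the
instance away and defeat the reuse, so the remove would have to happen only
once a thread is done with SomeClass for good.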

Thanks
NB


On Fri, Jan 29, 2016 at 5:09 PM, Shixiong(Ryan) Zhu <shixi...@databricks.com
> wrote:

> It looks weird. Why don't you just pass "new SomeClass()" to "somefunc"?
> You don't need a ThreadLocal if your code doesn't use multiple threads.
>
> On Fri, Jan 29, 2016 at 4:39 PM, N B <nb.nos...@gmail.com> wrote:
>
>> Fixed a typo in the code to avoid any confusion.... Please comment on the
>> code below...
>>
>> dstream.map(p -> {
>>     // ThreadLocal whose initial value is a fresh SomeClass
>>     ThreadLocal<SomeClass> d = new ThreadLocal<SomeClass>() {
>>         @Override
>>         public SomeClass initialValue() { return new SomeClass(); }
>>     };
>>     somefunc(p, d.get());
>>     d.remove();
>>     return p;
>> });
>>
>> On Fri, Jan 29, 2016 at 4:32 PM, N B <nb.nos...@gmail.com> wrote:
>>
>>> So this use of ThreadLocal will be inside the code of a function
>>> executing on the workers i.e. within a call from one of the lambdas. Would
>>> it just look like this then:
>>>
>>> dstream.map( p -> { ThreadLocal<Data> d = new ThreadLocal<>() {
>>>          public SomeClass initialValue() { return new SomeClass(); }
>>>     };
>>>     somefunc(p, d.get());
>>>     d.remove();
>>>     return p;
>>> });
>>>
>>> Will this make sure that all threads inside the worker clean up the
>>> ThreadLocal once they are done with processing this task?
>>>
>>> Thanks
>>> NB
>>>
>>>
>>> On Fri, Jan 29, 2016 at 1:00 PM, Shixiong(Ryan) Zhu <
>>> shixi...@databricks.com> wrote:
>>>
>>>> Spark Streaming uses thread pools, so you need to remove the ThreadLocal
>>>> value when it's no longer used.
>>>>
>>>> On Fri, Jan 29, 2016 at 12:55 PM, N B <nb.nos...@gmail.com> wrote:
>>>>
>>>>> Thanks for the response, Ryan. So I would say that this is in fact the
>>>>> purpose of a ThreadLocal, i.e. to keep a copy of the variable for as long
>>>>> as the thread lives. My concern is around the use of thread pools and
>>>>> whether Spark Streaming will internally create many threads that rotate
>>>>> between tasks on purpose, thereby holding onto ThreadLocals that may
>>>>> never be used again.
>>>>>
>>>>> Thanks
>>>>>
>>>>> On Fri, Jan 29, 2016 at 12:12 PM, Shixiong(Ryan) Zhu <
>>>>> shixi...@databricks.com> wrote:
>>>>>
>>>>>> Of course. If you use a ThreadLocal in a long-lived thread and forget
>>>>>> to remove it, it's definitely a memory leak.
>>>>>>
>>>>>> On Thu, Jan 28, 2016 at 9:31 PM, N B <nb.nos...@gmail.com> wrote:
>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> Does anyone know if there are any potential pitfalls associated with
>>>>>>> using ThreadLocal variables in a Spark Streaming application? One
>>>>>>> thing I have seen mentioned in the context of app servers that use
>>>>>>> thread pools is that ThreadLocals can leak memory. Could this happen
>>>>>>> in Spark Streaming also?
>>>>>>>
>>>>>>> Thanks
>>>>>>> Nikunj
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
