So this use of ThreadLocal will be inside the code of a function executing
on the workers, i.e. within a call from one of the lambdas. Would it just
look like this, then?

dstream.map(p -> {
    ThreadLocal<SomeClass> d = new ThreadLocal<SomeClass>() {
        @Override
        protected SomeClass initialValue() { return new SomeClass(); }
    };
    somefunc(p, d.get());
    d.remove();   // clear this thread's copy before returning
    return p;
});

Will this make sure that all threads inside the worker clean up the
ThreadLocal once they are done with processing this task?
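
To make the question concrete, here is a minimal sketch outside of Spark of
the cleanup pattern I have in mind. Everything in it is an assumption for
illustration only: the class ThreadLocalCleanup, the Scratch type standing in
for SomeClass, the process() method standing in for the lambda body, and a
one-thread executor standing in for the worker's thread pool. It keeps a
single static ThreadLocal and removes the value in a finally block, so the
pooled thread holds nothing once a record is done, even if processing throws.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class ThreadLocalCleanup {
    // Counts how many Scratch instances get created (illustration only).
    static final AtomicInteger CREATED = new AtomicInteger();

    // Hypothetical stand-in for SomeClass.
    static final class Scratch {
        final StringBuilder buf = new StringBuilder();
        Scratch() { CREATED.incrementAndGet(); }
    }

    // One static ThreadLocal; each thread that calls get() gets its own Scratch.
    static final ThreadLocal<Scratch> SCRATCH = ThreadLocal.withInitial(Scratch::new);

    // Hypothetical per-record function, playing the role of the map lambda body.
    static String process(String p) {
        try {
            Scratch s = SCRATCH.get();          // created lazily for this thread
            s.buf.append(p).append('!');
            return s.buf.toString();
        } finally {
            SCRATCH.remove();                   // always clear, even on exceptions
        }
    }

    public static void main(String[] args) throws Exception {
        // A single pooled thread is reused for every record, like a long-lived worker thread.
        ExecutorService pool = Executors.newFixedThreadPool(1);
        try {
            for (String p : new String[] {"a", "b", "c"}) {
                Callable<String> task = () -> process(p);
                System.out.println(pool.submit(task).get());
            }
        } finally {
            pool.shutdown();
        }
        System.out.println("instances created: " + CREATED.get());
    }
}
```

One thing this sketch makes visible is the trade-off: because remove() runs
after every record, the same pooled thread builds a fresh Scratch each time,
so nothing is cached between records. If the object is expensive to create,
it may be better to do the get/remove once per batch of records rather than
once per record.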

Thanks
NB


On Fri, Jan 29, 2016 at 1:00 PM, Shixiong(Ryan) Zhu <shixi...@databricks.com> wrote:

> Spark Streaming uses thread pools, so you need to remove the ThreadLocal
> value when it's no longer used.
>
> On Fri, Jan 29, 2016 at 12:55 PM, N B <nb.nos...@gmail.com> wrote:
>
>> Thanks for the response Ryan. So I would say that it is in fact the
>> purpose of a ThreadLocal i.e. to have a copy of the variable as long as the
>> thread lives. I guess my concern is around usage of threadpools and whether
>> Spark streaming will internally create many threads that rotate between
>> tasks on purpose thereby holding onto ThreadLocals that may actually never
>> be used again.
>>
>> Thanks
>>
>> On Fri, Jan 29, 2016 at 12:12 PM, Shixiong(Ryan) Zhu <
>> shixi...@databricks.com> wrote:
>>
>>> Of course. If you use a ThreadLocal in a long-lived thread and forget to
>>> remove it, it's definitely a memory leak.
>>>
>>> On Thu, Jan 28, 2016 at 9:31 PM, N B <nb.nos...@gmail.com> wrote:
>>>
>>>> Hello,
>>>>
>>>> Does anyone know if there are any potential pitfalls associated with
>>>> using ThreadLocal variables in a Spark streaming application? One thing I
>>>> have seen mentioned in the context of app servers that use thread pools is
>>>> that ThreadLocals can leak memory. Could this happen in Spark streaming
>>>> also?
>>>>
>>>> Thanks
>>>> Nikunj
>>>>
>>>>
>>>
>>
>
