So this use of ThreadLocal will be inside the code of a function executing on the workers, i.e. within a call from one of the lambdas. Would it just look like this, then?

    dstream.map(p -> {
        // Type parameter matches what initialValue() returns.
        ThreadLocal<SomeClass> d = new ThreadLocal<SomeClass>() {
            @Override
            public SomeClass initialValue() { return new SomeClass(); }
        };
        somefunc(p, d.get());
        d.remove();
        return p;
    });

Will this make sure that all threads inside the worker clean up the ThreadLocal once they are done with processing this task?

Thanks
NB

On Fri, Jan 29, 2016 at 1:00 PM, Shixiong(Ryan) Zhu <shixi...@databricks.com> wrote:

> Spark Streaming uses threadpools so you need to remove ThreadLocal when
> it's not used.
>
> On Fri, Jan 29, 2016 at 12:55 PM, N B <nb.nos...@gmail.com> wrote:
>
>> Thanks for the response Ryan. So I would say that it is in fact the
>> purpose of a ThreadLocal, i.e. to have a copy of the variable as long as the
>> thread lives. I guess my concern is around usage of threadpools and whether
>> Spark streaming will internally create many threads that rotate between
>> tasks on purpose, thereby holding onto ThreadLocals that may actually never
>> be used again.
>>
>> Thanks
>>
>> On Fri, Jan 29, 2016 at 12:12 PM, Shixiong(Ryan) Zhu <
>> shixi...@databricks.com> wrote:
>>
>>> Of course. If you use a ThreadLocal in a long living thread and forget to
>>> remove it, it's definitely a memory leak.
>>>
>>> On Thu, Jan 28, 2016 at 9:31 PM, N B <nb.nos...@gmail.com> wrote:
>>>
>>>> Hello,
>>>>
>>>> Does anyone know if there are any potential pitfalls associated with
>>>> using ThreadLocal variables in a Spark streaming application? One thing I
>>>> have seen mentioned in the context of app servers that use thread pools is
>>>> that ThreadLocals can leak memory. Could this happen in Spark streaming
>>>> also?
>>>>
>>>> Thanks
>>>> Nikunj
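
For reference, the advice in this thread (remove the ThreadLocal value when the task is done, because pooled threads outlive the task) can be sketched without Spark. This is only an illustration under stated assumptions, not how Spark itself manages its threads: the class ThreadLocalCleanupSketch, the Scratch type (standing in for SomeClass), and the process and runTasks methods are hypothetical names, and a plain single-thread ExecutorService stands in for the worker's thread pool. Two choices differ from the snippet in the question: the ThreadLocal is a single static field rather than a new object per record, and remove() sits in a finally block so it also runs when processing throws.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class ThreadLocalCleanupSketch {
    // Counts how many times the per-thread value was created (for illustration only).
    static final AtomicInteger CREATED = new AtomicInteger();

    // Hypothetical per-thread scratch object standing in for SomeClass.
    static final class Scratch {
        final StringBuilder buf = new StringBuilder();
    }

    // One shared ThreadLocal; each thread lazily gets its own Scratch.
    static final ThreadLocal<Scratch> SCRATCH = ThreadLocal.withInitial(() -> {
        CREATED.incrementAndGet();
        return new Scratch();
    });

    // Hypothetical per-record work: use the thread's Scratch, then always remove it.
    static int process(String record) {
        try {
            Scratch s = SCRATCH.get();
            s.buf.setLength(0);
            s.buf.append(record).append('!');
            return s.buf.length();
        } finally {
            // Runs even if the work above throws, so the pooled thread
            // does not keep a reference to Scratch after the task ends.
            SCRATCH.remove();
        }
    }

    // Submits n records to a single pooled thread and sums the results.
    static int runTasks(int n) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        try {
            int total = 0;
            for (int i = 0; i < n; i++) {
                final String rec = "r" + i;
                Callable<Integer> task = () -> process(rec);
                Future<Integer> f = pool.submit(task);
                total += f.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        int total = runTasks(3);
        System.out.println("total=" + total + " created=" + CREATED.get());
    }
}
```

Because remove() runs after every record, the same pooled thread builds a fresh Scratch for each record it handles. That avoids the leak at the cost of losing reuse across records; if the object is expensive to build, cleaning up once per batch of records rather than once per record is the usual trade-off to consider.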