Well won't the code in lambda execute inside multiple threads in the worker because it has to process many records? I would just want to have a single copy of SomeClass instantiated per thread rather than once per each record being processed. That was what triggered this thought anyways.
Thanks NB On Fri, Jan 29, 2016 at 5:09 PM, Shixiong(Ryan) Zhu <shixi...@databricks.com > wrote: > It looks weird. Why don't you just pass "new SomeClass()" to "somefunc"? > You don't need to use ThreadLocal if there are no multiple threads in your > codes. > > On Fri, Jan 29, 2016 at 4:39 PM, N B <nb.nos...@gmail.com> wrote: > >> Fixed a typo in the code to avoid any confusion.... Please comment on the >> code below... >> >> dstream.map( p -> { ThreadLocal<SomeClass> d = new ThreadLocal<>() { >> public SomeClass initialValue() { return new SomeClass(); } >> }; >> somefunc(p, d.get()); >> d.remove(); >> return p; >> }; ); >> >> On Fri, Jan 29, 2016 at 4:32 PM, N B <nb.nos...@gmail.com> wrote: >> >>> So this use of ThreadLocal will be inside the code of a function >>> executing on the workers i.e. within a call from one of the lambdas. Would >>> it just look like this then: >>> >>> dstream.map( p -> { ThreadLocal<Data> d = new ThreadLocal<>() { >>> public SomeClass initialValue() { return new SomeClass(); } >>> }; >>> somefunc(p, d.get()); >>> d.remove(); >>> return p; >>> }; ); >>> >>> Will this make sure that all threads inside the worker clean up the >>> ThreadLocal once they are done with processing this task? >>> >>> Thanks >>> NB >>> >>> >>> On Fri, Jan 29, 2016 at 1:00 PM, Shixiong(Ryan) Zhu < >>> shixi...@databricks.com> wrote: >>> >>>> Spark Streaming uses threadpools so you need to remove ThreadLocal when >>>> it's not used. >>>> >>>> On Fri, Jan 29, 2016 at 12:55 PM, N B <nb.nos...@gmail.com> wrote: >>>> >>>>> Thanks for the response Ryan. So I would say that it is in fact the >>>>> purpose of a ThreadLocal i.e. to have a copy of the variable as long as >>>>> the >>>>> thread lives. I guess my concern is around usage of threadpools and >>>>> whether >>>>> Spark streaming will internally create many threads that rotate between >>>>> tasks on purpose thereby holding onto ThreadLocals that may actually never >>>>> be used again. >>>>> >>>>> Thanks >>>>> >>>>> On Fri, Jan 29, 2016 at 12:12 PM, Shixiong(Ryan) Zhu < >>>>> shixi...@databricks.com> wrote: >>>>> >>>>>> Of cause. If you use a ThreadLocal in a long living thread and forget >>>>>> to remove it, it's definitely a memory leak. >>>>>> >>>>>> On Thu, Jan 28, 2016 at 9:31 PM, N B <nb.nos...@gmail.com> wrote: >>>>>> >>>>>>> Hello, >>>>>>> >>>>>>> Does anyone know if there are any potential pitfalls associated with >>>>>>> using ThreadLocal variables in a Spark streaming application? One >>>>>>> things I >>>>>>> have seen mentioned in the context of app servers that use thread pools >>>>>>> is >>>>>>> that ThreadLocals can leak memory. Could this happen in Spark streaming >>>>>>> also? >>>>>>> >>>>>>> Thanks >>>>>>> Nikunj >>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> >