Hi Sean,

Could you please elaborate on how this can be done on a per-partition basis?
Regards,
Sumit Chawla

On Thu, Oct 27, 2016 at 7:44 AM, Walter Rakoff <walter.rak...@gmail.com> wrote:

> Thanks for the info, Sean.
>
> I'm initializing them in a singleton, but Scala objects are evaluated
> lazily, so it gets initialized only when the first task runs (and makes
> use of the object). The plan is to also start a background thread in the
> object that does periodic cache refreshes. I'm trying to see if this
> init can be done right when the executor is created.
>
> Btw, this is for a Spark Streaming app, so doing this per partition
> during each batch isn't ideal. I'd like to keep them (connections &
> cache) across batches.
>
> Finally, how do I set up the shutdown hook on an executor? Except for
> operations on RDDs, everything else is executed in the driver. All I can
> think of is something like this:
>
>     sc.makeRDD((1 until sc.defaultParallelism), sc.defaultParallelism)
>       .foreachPartition(_ => sys.ShutdownHookThread { Singleton.DoCleanup() })
>
> Walt
>
> On Thu, Oct 27, 2016 at 3:05 AM, Sean Owen <so...@cloudera.com> wrote:
>
>> Init is easy -- initialize them in your singleton.
>> Shutdown is harder; a shutdown hook is probably the only reliable way
>> to go.
>> Global state is not ideal in Spark. Consider initializing things like
>> connections per partition, and open/close them with the lifecycle of a
>> computation on a partition instead.
>>
>> On Wed, Oct 26, 2016 at 9:27 PM, Walter Rakoff <walter.rak...@gmail.com>
>> wrote:
>>
>>> Hello,
>>>
>>> Is there a way I can add an init() call when an executor is created?
>>> I'd like to initialize a few connections that are part of my singleton
>>> object. Preferably this happens before it runs the first task.
>>> On the same lines, how can I provide a shutdown hook that cleans up
>>> these connections on termination?
>>>
>>> Thanks,
>>> Walt
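[Editor's note: a minimal sketch of the per-partition pattern Sean describes. The `Connection` class, its `send`/`close` methods, and the resource being managed are placeholders, not anything named in the thread; the point is only the open/use/close lifecycle inside `foreachPartition`, which runs on the executor once per partition.]

```scala
import org.apache.spark.rdd.RDD

// Hypothetical resource standing in for a DB/cache client.
class Connection {
  def send(record: String): Unit = println(record)
  def close(): Unit = ()
}

def writePartitions(rdd: RDD[String]): Unit = {
  rdd.foreachPartition { records =>
    // This closure executes on the executor, once per partition.
    val conn = new Connection()      // open at the start of the partition
    try {
      records.foreach(conn.send)     // reuse the connection for every record
    } finally {
      conn.close()                   // always close when the partition finishes
    }
  }
}
```

The trade-off raised in the thread still applies: in a streaming app this opens and closes the connection for every partition of every batch, which is why a per-executor singleton (possibly backed by a connection pool) can be preferable when setup is expensive.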
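[Editor's note: a sketch of the singleton approach Walter describes, under the assumptions in the thread. `Singleton` and `DoCleanup` come from Walter's mail; the `Connection` class is a placeholder. The lazy val initializes on the first task that touches the object in each executor JVM, and the shutdown hook is registered at that same moment, so cleanup is set up on the executor itself rather than scheduled from the driver.]

```scala
object Singleton {
  // Hypothetical resource standing in for a DB/cache client.
  class Connection {
    def close(): Unit = ()
  }

  // Evaluated lazily: runs on each executor JVM the first time a task
  // uses Singleton.conn, not when the executor starts.
  lazy val conn: Connection = {
    val c = new Connection()
    // Register JVM-level cleanup once, at init time on this executor.
    sys.addShutdownHook { DoCleanup() }
    c
  }

  def DoCleanup(): Unit = conn.close()
}
```

To force the lazy init eagerly, the `sc.makeRDD(...).foreachPartition(...)` trick from the thread can touch `Singleton.conn` once per executor before the real streaming job starts, though it does not strictly guarantee every executor gets a partition.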