Thanks for the info Sean.

I'm initializing them in a singleton, but Scala objects are evaluated
lazily, so the singleton gets initialized only when the first task that
uses it runs.
The plan is to also start a background thread in the object that
periodically refreshes the cache.
I'm trying to see if this init can be done right when the executor is created.
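Roughly what I have in mind — a sketch with placeholder names (the real connection and cache types are different), showing the lazy init plus a daemon refresh thread:

```scala
import java.util.concurrent.{Executors, TimeUnit}
import scala.collection.concurrent.TrieMap

// Hypothetical singleton: because Scala objects are lazy, this body runs
// only when the first task on the executor references the object.
object ConnectionSingleton {
  lazy val cache: TrieMap[String, String] = {
    val m = TrieMap.empty[String, String]
    m.put("loaded-at", System.currentTimeMillis.toString)

    // Daemon thread so it never blocks executor JVM shutdown.
    val scheduler = Executors.newSingleThreadScheduledExecutor { r =>
      val t = new Thread(r, "cache-refresher")
      t.setDaemon(true)
      t
    }
    // Periodic cache refresh; the real refresh would re-query the source.
    scheduler.scheduleAtFixedRate(
      () => m.put("refreshed-at", System.currentTimeMillis.toString),
      1, 1, TimeUnit.MINUTES)
    m
  }
}
```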

Btw, this is for a Spark Streaming app, so doing this per partition during
each batch isn't ideal.
I'd like to keep them (connections & cache) across batches.

Finally, how do I set up the shutdown hook on an executor? Except for
operations on RDDs, everything else is executed in the driver.
All I can think of is something like this (running a task on every
partition so the hook gets registered on each executor):
    sc.makeRDD(1 until sc.defaultParallelism, sc.defaultParallelism)
      .foreachPartition(_ => sys.addShutdownHook(Singleton.DoCleanup()))
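Alternatively, a sketch of registering the hook from inside the singleton itself (again with placeholder names), so the first task that touches the object on an executor also installs the cleanup for that JVM, and no dummy RDD is needed:

```scala
object Singleton {
  @volatile private[this] var cleaned = false

  // Runs once per executor JVM, when the object is first referenced:
  // installs the cleanup as a JVM shutdown hook.
  sys.addShutdownHook(DoCleanup())

  // Idempotent, so calling it manually and via the hook is safe.
  def DoCleanup(): Unit = synchronized {
    if (!cleaned) {
      // close connections, stop the refresh thread, flush the cache, etc.
      cleaned = true
    }
  }

  def isCleaned: Boolean = cleaned
}
```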

Walt

On Thu, Oct 27, 2016 at 3:05 AM, Sean Owen <so...@cloudera.com> wrote:

> Init is easy -- initialize them in your singleton.
> Shutdown is harder; a shutdown hook is probably the only reliable way to
> go.
> Global state is not ideal in Spark. Consider initializing things like
> connections per partition, and open/close them with the lifecycle of a
> computation on a partition instead.
>
> On Wed, Oct 26, 2016 at 9:27 PM Walter rakoff <walter.rak...@gmail.com>
> wrote:
>
>> Hello,
>>
>> Is there a way I can add an init() call when an executor is created? I'd
>> like to initialize a few connections that are part of my singleton object.
>> Preferably this happens before it runs the first task.
>> Along the same lines, how can I provide a shutdown hook that cleans up
>> these connections on termination?
>>
>> Thanks
>> Walt
>>
>
