Hi Sean

Could you please elaborate on how this can be done on a per-partition
basis?

Regards
Sumit Chawla


On Thu, Oct 27, 2016 at 7:44 AM, Walter rakoff <walter.rak...@gmail.com>
wrote:

> Thanks for the info Sean.
>
> I'm initializing them in a singleton, but Scala objects are evaluated
> lazily, so the object gets initialized only when the first task runs
> (and makes use of the object).
> The plan is to also start a background thread in the object that does a
> periodic cache refresh.
> I'm trying to see if this init can be done right when the executor is
> created.
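> A minimal sketch of that lazy-singleton pattern (ConnectionSingleton and
> the refresh logic are placeholders, not our real code): the object body,
> including the daemon refresh thread, runs only on first access, i.e. when
> the first task on each executor JVM touches the object.

```scala
// Sketch: lazy singleton init plus a periodic background refresh.
// All names here are illustrative placeholders.
object ConnectionSingleton {
  @volatile var refreshCount = 0

  // Simulated connection setup; a real app would open clients here.
  println("initializing connections...")

  // Daemon thread so it never blocks JVM/executor shutdown.
  private val refresher = new Thread(new Runnable {
    def run(): Unit = while (true) {
      Thread.sleep(60000) // refresh interval
      refreshCount += 1   // refresh caches here
    }
  })
  refresher.setDaemon(true)
  refresher.start()

  def get(): Int = refreshCount
}
```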
>
> Btw, this is for a Spark Streaming app, so doing this per partition during
> each batch isn't ideal.
> I'd like to keep them (connections & cache) across batches.
>
> Finally, how do I set up the shutdown hook on an executor? Except for
> operations on RDDs, everything else is executed on the driver.
> All I can think of is something like this:
>     sc.makeRDD(1 until sc.defaultParallelism, sc.defaultParallelism)
>        .foreachPartition(_ => sys.addShutdownHook { Singleton.DoCleanup() })
>
> Walt
>
> On Thu, Oct 27, 2016 at 3:05 AM, Sean Owen <so...@cloudera.com> wrote:
>
>> Init is easy -- initialize them in your singleton.
>> Shutdown is harder; a shutdown hook is probably the only reliable way to
>> go.
>> Global state is not ideal in Spark. Consider initializing things like
>> connections per partition, and open/close them with the lifecycle of a
>> computation on a partition instead.
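>> A sketch of that per-partition lifecycle: Spark calls the function passed
>> to foreachPartition once per partition with that partition's iterator, so
>> the connection is opened and closed within one task. MyClient is a stand-in
>> for a real connection type, and the Iterator here simulates a partition.

```scala
// Placeholder client type simulating a real connection.
class MyClient {
  var closed = false
  def send(x: Int): Unit = require(!closed, "client already closed")
  def close(): Unit = closed = true
}

// Open once per partition, close in finally so the connection's lifetime
// matches the partition computation. Returns the number of records sent.
def processPartition(iter: Iterator[Int]): Int = {
  val client = new MyClient
  var n = 0
  try iter.foreach { x => client.send(x); n += 1 }
  finally client.close()
  n
}

// In Spark this would be: rdd.foreachPartition(processPartition)
processPartition(Iterator(1, 2, 3))
```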
>>
>> On Wed, Oct 26, 2016 at 9:27 PM Walter rakoff <walter.rak...@gmail.com>
>> wrote:
>>
>>> Hello,
>>>
>>> Is there a way I can add an init() call when an executor is created? I'd
>>> like to initialize a few connections that are part of my singleton object.
>>> Preferably this happens before it runs the first task.
>>> Along the same lines, how can I provide a shutdown hook that cleans up
>>> these connections on termination?
>>>
>>> Thanks
>>> Walt
>>>
>>
>
