Re: Difference between setup() and activate()

Vlad Rozov Mon, 07 Aug 2017 08:53:51 -0700

There is no contract or guaranteed delay SLA between activate() and
beginWindow() either. The only guarantee is that setup() will be called
once in an operator/port/component lifetime and activate() will be
called after setup() but before beginWindow(). The order in which setup()
is called for an operator and its ports is not a contract and may
change in the future.


Thank you,


Vlad

On 8/4/17 20:01, Bhupesh Chawda wrote:

There's one more reason why you might need to initialize your transients in
activate rather than setup. If you are working with POJOs and depend on the
class of a tuple via a configured port attribute (TUPLE_CLASS), then
this class will only be available in the activate method, and not in
setup.
The order of calls is operator setup -> port setup (where the TUPLE_CLASS
attribute is populated) -> operator activate (class available to work with).
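
A minimal sketch of the call order described above, written against
hypothetical stand-in classes rather than the real Apex API. PortSketch,
PojoOperatorSketch, PersonTuple and deploy() are names invented for this
illustration, and the tupleClass field only plays the role of the
TUPLE_CLASS attribute. It shows why reflection on the tuple class works in
activate but not in setup, assuming the current (non-contractual) order.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

// Sample POJO tuple type used only for this illustration.
class PersonTuple {
    private String name;
    private int age;
}

// Hypothetical stand-in for an input port; NOT the real Apex port class.
class PortSketch {
    Class<?> tupleClass; // plays the role of the TUPLE_CLASS attribute

    void setup(Class<?> configuredTupleClass) {
        this.tupleClass = configuredTupleClass;
    }
}

// Hypothetical stand-in for an operator that works with POJO tuples.
class PojoOperatorSketch {
    final PortSketch input = new PortSketch();
    boolean classKnownInSetup;
    transient List<String> fieldNames;

    void setup() {
        // Operator setup runs before port setup in this sketch,
        // so the tuple class is still unknown at this point.
        classKnownInSetup = (input.tupleClass != null);
    }

    void activate() {
        // The port has been set up by now, so the tuple class can be inspected.
        fieldNames = new ArrayList<>();
        for (Field f : input.tupleClass.getDeclaredFields()) {
            fieldNames.add(f.getName());
        }
    }

    // Drives the calls in the order: operator setup, port setup, operator activate.
    static PojoOperatorSketch deploy(Class<?> tupleClass) {
        PojoOperatorSketch op = new PojoOperatorSketch();
        op.setup();
        op.input.setup(tupleClass);
        op.activate();
        return op;
    }

    public static void main(String[] args) {
        PojoOperatorSketch op = deploy(PersonTuple.class);
        System.out.println("class known in setup: " + op.classKnownInSetup);
        System.out.println("fields resolved in activate: " + op.fieldNames);
    }
}
```

Since the relative order of operator and port setup is not guaranteed,
resolving anything that depends on the tuple class in activate is the choice
that does not rely on that order.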

~ Bhupesh


On Aug 5, 2017 06:01, "Pramod Immaneni" <[email protected]> wrote:

It may not end up making a big difference, but it prevents any worker
threads you may have from starting off ahead of processing, resulting in
unnecessary queue build-up right from the start. This might still be the
case later in the processing, but if starting off on the wrong foot can be
avoided, why not start the right way? Today the difference may be a few
hundred milliseconds, but that is not a contract and the interval of
execution between setup and activate could be higher in the future.

Thanks

On Fri, Aug 4, 2017 at 4:39 AM, Vlad Rozov <[email protected]> wrote:

This recommendation to use activate() over setup() is questionable with
the introduction of back pressure. In a distributed streaming
application, operators need to handle downstream downtime, differences
between upstream and downstream throughput, busy output ports and back
pressure. A difference of a few hundred milliseconds between setup() and
activate() is not something that I would be concerned about as an operator
developer once the above conditions are handled.

Thank you,

Vlad


On 8/3/17 15:37, Pramod Immaneni wrote:

Yes, activate is called closer to the start of tuple processing as far as
Apex is concerned. So if you are doing things like writing an input operator
that does asynchronous processing, where you will start receiving data as
soon as you open a connection to your external source, it is better to do
that in activate to reduce latency and buffer build-up.
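
As a rough sketch of that pattern (not the real Apex interfaces), the
hypothetical AsyncInputSketch class below creates its buffer in setup() but
only starts the reader thread in activate(), so nothing accumulates before
processing is about to begin. The class name, the fake "msg-N" data source
and the method bodies are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical stand-in for an asynchronous input operator; NOT the Apex API.
class AsyncInputSketch {
    transient BlockingQueue<String> buffer;
    transient Thread reader;

    // Cheap, passive initialization: no data starts flowing yet.
    void setup() {
        buffer = new LinkedBlockingQueue<>();
    }

    // "Open the connection" only here, close to the first window.
    void activate() {
        reader = new Thread(() -> {
            long n = 0;
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    buffer.put("msg-" + n++); // pretend this came from an external source
                    Thread.sleep(1);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // exit the loop on shutdown
            }
        }, "reader");
        reader.setDaemon(true);
        reader.start();
    }

    // Drains whatever the reader has buffered so far.
    List<String> emitTuples() {
        List<String> batch = new ArrayList<>();
        buffer.drainTo(batch);
        return batch;
    }

    // Stops the reader and waits for it to finish.
    void deactivate() {
        reader.interrupt();
        try {
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        AsyncInputSketch op = new AsyncInputSketch();
        op.setup();
        System.out.println("buffered after setup: " + op.buffer.size());
        op.activate();
        Thread.sleep(50);
        System.out.println("emitted some tuples: " + !op.emitTuples().isEmpty());
        op.deactivate();
    }
}
```

Had the thread been started in setup(), every message produced between
setup() and the first window would already be sitting in the buffer.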

On Thu, Aug 3, 2017 at 3:07 PM, Vlad Rozov <[email protected]> wrote:

Correct, both setup() and activate() are called when an operator is
restored from a checkpoint. When an operator is restored from a checkpoint,
it is considered to be a new instance/deployment of the operator with its
state reset to the checkpoint. In this case Apex core gives the operator a
chance to initialize transient fields in either setup() or activate().

I am not aware of any use case where the platform will go through an
activate/deactivate cycle without setup/teardown, but such a code path may be
introduced in the future (for example, it may be used to manage an input
operator with a high emit rate). It is better not to make any assumptions
about how many times activate/deactivate may be called.

Currently the main difference between setup() and activate() is described
in the javadoc for ActivationListener:

    An example of where one would consider implementing ActivationListener
    is an input operator which wants to consume a high throughput stream.
    Since there is typically at least a few hundreds of milliseconds between
    the time the setup method is called and the first window, you would want
    to place the code to activate the stream inside activate instead of
    setup.


My recommendation is to use setup() to initialize transient fields unless
you need to deal with the above case.
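
To illustrate the recommendation, here is a minimal sketch of a transient
field being rebuilt in setup() after a restore. It uses a hypothetical
CountingOperatorSketch class and plain Java serialization as a stand-in for
checkpointing; the actual serialization mechanism used by Apex is not
modelled here, and none of these names come from the Apex API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an operator; NOT the real com.datatorrent.api.Operator.
class CountingOperatorSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    long tuplesSeen;                   // part of the checkpointed state
    transient Map<String, Long> cache; // not checkpointed; rebuilt in setup()

    // Stand-in for setup(): runs on first deployment and again after a restore.
    void setup() {
        cache = new HashMap<>();
    }

    void process(String key) {
        tuplesSeen++;
        cache.merge(key, 1L, Long::sum);
    }

    // Simulates checkpoint + restore with Java serialization (an assumption
    // made for this sketch only).
    static CountingOperatorSketch checkpointAndRestore(CountingOperatorSketch op) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(op);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (CountingOperatorSketch) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        CountingOperatorSketch op = new CountingOperatorSketch();
        op.setup();
        op.process("a");
        op.process("a");

        CountingOperatorSketch restored = checkpointAndRestore(op);
        System.out.println("checkpointed count: " + restored.tuplesSeen);
        System.out.println("transient lost: " + (restored.cache == null));

        restored.setup(); // the restored instance is treated as a new deployment
        restored.process("b");
        System.out.println("cache after setup: " + restored.cache);
    }
}
```

The restored instance keeps its checkpointed counter but has no cache until
setup() runs again, which is why setup() is a sufficient place for this kind
of initialization.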

Thank you,

Vlad


On 8/2/17 13:31, Ananth G wrote:

Hello Vlad,

Thanks for your response.

Do you refer to restoring from a checkpoint as serialize/deserialize
cycles?

Yes.

In case of restoring from a checkpoint (deserialization) setup() is a
part of a redeployment request, AFAIK.

This sounds a bit in contradiction to the response from Sanjay in the
mail thread below. I tried to quickly glance at the apex-core code and it
looks like both are being called (perhaps I am entirely wrong on this, as
it was only a quick scan). I was referring to the code in
StreamingContainer.java in the engine package and the method called
deploy().


Please see ActivationListener javadoc for details when it is necessary to
use activate() vs setup().

I had to raise this question in the mail after going through the
javadoc. The javadoc is a bit cryptic in this scenario of
serialize/deserialize. Also, the javadoc is not clear as to what is
meant by activate/deactivate being called multiple times whereas setup is
called once in the lifetime of the operator. If setup is called once in the
lifetime of an operator per the javadoc, does it mean once in the lifetime
of the JVM instantiating it via the constructor, or once across the
deserialize cycles of the passivated operator state? If it is once across
all passivated instances of the operator, then setup() would not be called
multiple times and hence is not a great location for initializing transient
variables? If setup() is called across deserialize cycles, then I find it
more confusing as to why we need setup() and activate() methods having
almost the same functionality.

Thoughts?


Regards,
Ananth


On 1 Aug 2017, at 3:38 am, Vlad Rozov <[email protected]> wrote:

Do you refer to restoring from a checkpoint as serialize/deserialize
cycles? There are no calls to setup/teardown and/or activate/deactivate
during checkpointing/serialization. In case of restoring from a checkpoint
(deserialization) setup() is a part of a redeployment request, AFAIK. The
best answer to question 3 is "it depends". In most cases using setup() to
resolve all transient fields is as good as doing that in activate(). Please
see ActivationListener javadoc for details when it is necessary to use
activate() vs setup().

Thank you,

Vlad

On 7/29/17 19:58, Sanjay Pujare wrote:

The Javadoc comment for
com.datatorrent.api.Operator.ActivationListener<CONTEXT> (in
https://github.com/apache/apex-core/blob/master/api/src/main/java/com/datatorrent/api/Operator.java)
should hopefully answer your questions.

Specifically:

1. No, setup() is called only once in the entire lifetime
(http://apex.apache.org/docs/apex/operator_development/#setup-call)

2. Yes. When an operator is "activated" - first time in its life or
reactivation after a failover - activate() is called before the first
beginWindow() is called.

3. Yes.


On Sun, Jul 30, 2017 at 12:18 AM, Ananth G <[email protected]> wrote:

Hello All,

I was looking at the documentation and could not get a clear distinction
between the behaviours of setup() and activate() in scenarios where an
operator is passivated (e.g. application shutdown, repartition use cases)
and then brought back to life again. Could someone from the community advise
me on the following questions?

1. Is setup() called in these scenarios (serialize/deserialize cycles) as
well?

2. I am assuming activate() is called in these scenarios? The javadoc
for activation states that activate() can be called multiple times
(without explicitly stating why) and my assumption is that it is because of
these scenarios.

3. If setup() is only called once during the lifetime of an operator, is
it fair to assume that activate() is the best place to resolve all of the
transient fields of an operator?


Regards,
Ananth
