Why restrict the control tuples to input operators?

On Sat, Jun 25, 2016 at 9:07 AM Amol Kekre <[email protected]> wrote:

> David,
> We should avoid control tuples within a window by simply restricting them
> through the API. This can be done by calling something like "sendControlTuple"
> between windows, notably in input operators.
>
> Thks
> Amol
>
>
> On Sat, Jun 25, 2016 at 7:32 AM, Munagala Ramanath <[email protected]>
> wrote:
>
> > What would the API look like for option 1? Another operator callback
> > called controlTuple(), or does the operator code have to check each
> > incoming tuple to see whether it is data or control?
> >
> > Ram
> >
> > On Fri, Jun 24, 2016 at 11:42 PM, David Yan <[email protected]>
> > wrote:
> >
> > > It looks like option 1 is preferred by the community. But let me elaborate
> > > on why I brought up the option of piggybacking on BEGIN_WINDOW and
> > > END_WINDOW.
> > >
> > > Option 2 implicitly enforces that the operations related to the custom
> > > control tuple be done at the streaming window boundary.
> > >
> > > For most operations, it makes sense to have that enforcement. Option 1
> > > opens the door to the possibility of sending and handling control tuples
> > > within a window, thus imposing a challenge of ensuring idempotency. In
> > > fact, allowing that would make idempotency extremely difficult to achieve.
> > >
> > > David
> > >
> > > On Fri, Jun 24, 2016 at 4:38 PM, Vlad Rozov <[email protected]>
> > > wrote:
> > >
> > > > +1 for option 1.
> > > >
> > > > Thank you,
> > > >
> > > > Vlad
> > > >
> > > >
> > > > On 6/24/16 14:35, Bright Chen wrote:
> > > >
> > > >> +1
> > > > >> It can also help to shut down the application gracefully.
> > > >> Bright
> > > >>
> > > > >> On Jun 24, 2016, at 1:35 PM, Siyuan Hua <[email protected]> wrote:
> > > >>>
> > > >>> +1
> > > >>>
> > > > >>> I think it's good to have custom control tuples, and I prefer option 1.
> > > > >>>
> > > > >>> Also, I think we should consider a couple of different callbacks, which
> > > > >>> could be operator level (triggered when an operator receives a control
> > > > >>> tuple) or DAG level (triggered when a control tuple flows through the
> > > > >>> whole DAG).
> > > >>>
> > > >>> Regards,
> > > >>> Siyuan
> > > >>>
> > > >>>
> > > >>>
> > > >>>
> > > > >>> On Fri, Jun 24, 2016 at 12:42 PM, David Yan <[email protected]>
> > > > >>> wrote:
> > > >>>
> > > > >>>> My initial thinking is that the custom control tuples, just like the
> > > > >>>> existing control tuples, will only be generated from the input operators
> > > > >>>> and will be propagated downstream to all operators in the DAG. So the NxM
> > > > >>>> partitioning scenario works just like how other control tuples work, i.e.
> > > > >>>> the callback will not be called unless all ports have received the control
> > > > >>>> tuple for a particular window. This creates a little bit of complication
> > > > >>>> with multiple input operators though.
> > > >>>>
> > > >>>> David
> > > >>>>
> > > >>>>
> > > > >>>> On Fri, Jun 24, 2016 at 12:03 PM, Tushar Gosavi <[email protected]>
> > > > >>>> wrote:
> > > >>>>
> > > > >>>>> +1 for the feature
> > > > >>>>>
> > > > >>>>> I am in favor of option 1, but we may need a helper method to avoid a
> > > > >>>>> compiler error on typed ports, as calling port.emit(controlTuple) will
> > > > >>>>> be an error if the types of the control tuple and the port do not match.
> > > > >>>>> Alternatively, we could add a new method to the outputPort object,
> > > > >>>>> emitControlTuple(ControlTuple).
> > > > >>>>>
> > > > >>>>> Can you give an example of piggybacking a tuple on the current
> > > > >>>>> BEGIN_WINDOW and END_WINDOW control tuples?
> > > > >>>>>
> > > > >>>>> In the case of NxM partitioning, each downstream operator will receive N
> > > > >>>>> control tuples. Will it call the user handler N times for each downstream
> > > > >>>>> operator, or just once?
> > > >>>>>
> > > >>>>> Regards,
> > > >>>>> - Tushar.
> > > >>>>>
> > > >>>>>
> > > >>>>>
> > > > >>>>> On Fri, Jun 24, 2016 at 11:52 PM, David Yan <[email protected]>
> > > > >>>>> wrote:
> > > >>>>
> > > > >>>>>> Hi all,
> > > > >>>>>>
> > > > >>>>>> I would like to propose a new feature to the Apex core engine -- the
> > > > >>>>>> support of custom control tuples. Currently, we have control tuples such
> > > > >>>>>> as BEGIN_WINDOW, END_WINDOW, CHECKPOINT, and so on, but we don't have the
> > > > >>>>>> support for applications to insert their own control tuples. The current
> > > > >>>>>> way to get around this is to use data tuples and have a separate port
> > > > >>>>>> for such tuples that sends tuples to all partitions of the downstream
> > > > >>>>>> operators, which is not exactly developer friendly.
> > > >>>>>>
> > > > >>>>>> We have already seen a number of use cases that can use this feature:
> > > >>>>>>
> > > > >>>>>> 1) Batch support: We need to tell all operators of the physical DAG when
> > > > >>>>>> a batch starts and ends, so the operators can do whatever is needed upon
> > > > >>>>>> the start or the end of a batch.
> > > >>>>>>
> > > > >>>>>> 2) Watermark: To support the concepts of event time windowing, the
> > > > >>>>>> watermark control tuple is needed to tell which windows should be
> > > > >>>>>> considered late.
> > > >>>>>>
> > > > >>>>>> 3) Changing operator properties: We do have the support of changing
> > > > >>>>>> operator properties on the fly, but with a custom control tuple, the
> > > > >>>>>> command to change operator properties can be window aligned for all
> > > > >>>>>> partitions and also across the DAG.
> > > >>>>>>
> > > > >>>>>> 4) Recording tuples: Like changing operator properties, we do have this
> > > > >>>>>> support now but only at the individual physical operator level, and
> > > > >>>>>> without control of which window to record tuples for. With a custom
> > > > >>>>>> control tuple, because a control tuple must belong to a window, all
> > > > >>>>>> operators in the DAG can start (and stop) recording for the same windows.
> > > >>>>>>
> > > >>>>>> I can think of two options to achieve this:
> > > >>>>>>
> > > > >>>>>> 1) A new custom control tuple type that takes the user's serializable
> > > > >>>>>> object.
> > > > >>>>>>
> > > > >>>>>> 2) Piggyback on the current BEGIN_WINDOW and END_WINDOW control tuples.
> > > >>>>>>
> > > >>>>>> Please provide your feedback. Thank you.
> > > >>>>>>
> > > >>>>>> David
> > > >>>>>>
> > > >>>>>
> > > >
> > >
> >
>
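
For reference, the NxM behaviour David describes in the quoted thread (the callback is not called until every upstream partition has delivered the control tuple for a given window) could be sketched roughly as below. This is only an illustration under stated assumptions: ControlTupleAligner, onControlTuple and the callback shape are hypothetical names that do not exist in Apex, and the sketch assumes each upstream partition sends at most one custom control tuple per window, all carrying the same payload.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;

/**
 * Hypothetical helper (not an Apex class): fires a user callback once per
 * window, after all N upstream partitions have delivered their control tuple.
 */
final class ControlTupleAligner<T> {
    private final int upstreamCount;
    private final BiConsumer<Long, T> callback;
    // For each window id, the set of upstream partitions seen so far.
    private final Map<Long, BitSet> seenByWindow = new HashMap<>();

    ControlTupleAligner(int upstreamCount, BiConsumer<Long, T> callback) {
        if (upstreamCount <= 0) {
            throw new IllegalArgumentException("upstreamCount must be positive");
        }
        this.upstreamCount = upstreamCount;
        this.callback = callback;
    }

    /** Records a control tuple from one upstream partition for one window. */
    void onControlTuple(int upstreamIndex, long windowId, T payload) {
        if (upstreamIndex < 0 || upstreamIndex >= upstreamCount) {
            throw new IllegalArgumentException("bad upstream index: " + upstreamIndex);
        }
        BitSet seen = seenByWindow.computeIfAbsent(windowId, id -> new BitSet(upstreamCount));
        seen.set(upstreamIndex);
        // Invoke the handler only when every upstream partition has reported.
        if (seen.cardinality() == upstreamCount) {
            seenByWindow.remove(windowId);
            callback.accept(windowId, payload);
        }
    }

    public static void main(String[] args) {
        ControlTupleAligner<String> aligner = new ControlTupleAligner<>(
                2, (window, payload) -> System.out.println("window " + window + ": " + payload));
        aligner.onControlTuple(0, 5L, "BATCH_END"); // first partition: no callback yet
        aligner.onControlTuple(1, 5L, "BATCH_END"); // second partition: callback fires
    }
}
```

With this shape, Tushar's question would be answered "just once": a downstream operator fed by N upstream partitions sees the handler invoked a single time per window rather than N times. How multiple input operators would be reconciled is left open here, as it is in the thread.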
