Re: [DISCUSSION] Custom Control Tuples

Munagala Ramanath Sat, 25 Jun 2016 07:33:11 -0700

What would the API look like for option 1 ? Another operator callback
called controlTuple() or does the operator code have to check each
incoming tuple to see if it was data or control ?


Ram

On Fri, Jun 24, 2016 at 11:42 PM, David Yan <[email protected]> wrote:

> It looks like option 1 is preferred by the community. But let me elaborate
> why I brought up the option of piggy backing BEGIN and END_WINDOW
>
> Option 2 implicitly enforces that the operations related to the custom
> control tuple be done at the streaming window boundary.
>
> For most operations, it makes sense to have that enforcement. Option 1
> opens the door to the possibility of sending and handling control tuples
> within a window, thus imposing a challenge of ensuring idempotency. In
> fact, allowing that would make idempotency extremely difficult to achieve.
>
> David
>
> On Fri, Jun 24, 2016 at 4:38 PM, Vlad Rozov <[email protected]>
> wrote:
>
> > +1 for option 1.
> >
> > Thank you,
> >
> > Vlad
> >
> >
> > On 6/24/16 14:35, Bright Chen wrote:
> >
> >> +1
> >> It also can help to Shutdown the application gracefully.
> >> Bright
> >>
> >> On Jun 24, 2016, at 1:35 PM, Siyuan Hua <[email protected]> wrote:
> >>>
> >>> +1
> >>>
> >>> I think it's good to have custom control tuple and I prefer the 1
> option.
> >>>
> >>> Also I think we should think about couple different callbacks, that
> could
> >>> be operator level(triggered when an operator receives an control tuple)
> >>> or
> >>> dag level(triggered when control tuple flow over the whole dag)
> >>>
> >>> Regards,
> >>> Siyuan
> >>>
> >>>
> >>>
> >>>
> >>> On Fri, Jun 24, 2016 at 12:42 PM, David Yan <[email protected]>
> >>> wrote:
> >>>
> >>> My initial thinking is that the custom control tuples, just like the
> >>>> existing control tuples, will only be generated from the input
> operators
> >>>> and will be propagated downstream to all operators in the DAG. So the
> >>>> NxM
> >>>> partitioning scenario works just like how other control tuples work,
> >>>> i.e.
> >>>> the callback will not be called unless all ports have received the
> >>>> control
> >>>> tuple for a particular window. This creates a little bit of
> complication
> >>>> with multiple input operators though.
> >>>>
> >>>> David
> >>>>
> >>>>
> >>>> On Fri, Jun 24, 2016 at 12:03 PM, Tushar Gosavi <
> [email protected]
> >>>> >
> >>>> wrote:
> >>>>
> >>>> +1 for the feature
> >>>>>
> >>>>> I am in favor of option 1, but we may need an helper method to avoid
> >>>>> compiler error on typed port, as calling port.emit(controlTuple) will
> >>>>> be an error if type of control tuple and port does not match. or new
> >>>>> method in outputPort object , emitControlTuple(ControlTuple).
> >>>>>
> >>>>> Can you give example of piggy backing tuple with current BEGIN_WINDOW
> >>>>> and END_WINDOW control tuples?
> >>>>>
> >>>>> In case of NxM partitioning, each downstream operator will receive N
> >>>>> control tuples. will it call user handler N times for each downstream
> >>>>> operator or just once.
> >>>>>
> >>>>> Regards,
> >>>>> - Tushar.
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Fri, Jun 24, 2016 at 11:52 PM, David Yan <[email protected]>
> >>>>>
> >>>> wrote:
> >>>>
> >>>>> Hi all,
> >>>>>>
> >>>>>> I would like to propose a new feature to the Apex core engine -- the
> >>>>>> support of custom control tuples. Currently, we have control tuples
> >>>>>>
> >>>>> such
> >>>>
> >>>>> as
> >>>>>
> >>>>>> BEGIN_WINDOW, END_WINDOW, CHECKPOINT, and so on, but we don't have
> the
> >>>>>> support for applications to insert their own control tuples. The way
> >>>>>> currently to get around this is to use data tuples and have a
> separate
> >>>>>>
> >>>>> port
> >>>>>
> >>>>>> for such tuples that sends tuples to all partitions of the
> downstream
> >>>>>> operators, which is not exactly developer friendly.
> >>>>>>
> >>>>>> We have already seen a number of use cases that can use this
> feature:
> >>>>>>
> >>>>>> 1) Batch support: We need to tell all operators of the physical DAG
> >>>>>>
> >>>>> when
> >>>>
> >>>>> a
> >>>>>
> >>>>>> batch starts and ends, so the operators can do whatever that is
> needed
> >>>>>>
> >>>>> upon
> >>>>>
> >>>>>> the start or the end of a batch.
> >>>>>>
> >>>>>> 2) Watermark: To support the concepts of event time windowing, the
> >>>>>> watermark control tuple is needed to tell which windows should be
> >>>>>> considered late.
> >>>>>>
> >>>>>> 3) Changing operator properties: We do have the support of changing
> >>>>>> operator properties on the fly, but with a custom control tuple, the
> >>>>>> command to change operator properties can be window aligned for all
> >>>>>> partitions and also across the DAG.
> >>>>>>
> >>>>>> 4) Recording tuples: Like changing operator properties, we do have
> >>>>>> this
> >>>>>> support now but only at the individual physical operator level, and
> >>>>>>
> >>>>> without
> >>>>>
> >>>>>> control of which window to record tuples for. With a custom control
> >>>>>>
> >>>>> tuple,
> >>>>>
> >>>>>> because a control tuple must belong to a window, all operators in
> the
> >>>>>>
> >>>>> DAG
> >>>>
> >>>>> can start (and stop) recording for the same windows.
> >>>>>>
> >>>>>> I can think of two options to achieve this:
> >>>>>>
> >>>>>> 1) new custom control tuple type that takes user's serializable
> >>>>>> object.
> >>>>>>
> >>>>>> 2) piggy back the current BEGIN_WINDOW and END_WINDOW control
> tuples.
> >>>>>>
> >>>>>> Please provide your feedback. Thank you.
> >>>>>>
> >>>>>> David
> >>>>>>
> >>>>>
> >
>

Re: [DISCUSSION] Custom Control Tuples

Reply via email to