What would the API look like for option 1 ? Another operator callback called controlTuple() or does the operator code have to check each incoming tuple to see if it was data or control ?
Ram On Fri, Jun 24, 2016 at 11:42 PM, David Yan <[email protected]> wrote: > It looks like option 1 is preferred by the community. But let me elaborate > why I brought up the option of piggy backing BEGIN and END_WINDOW > > Option 2 implicitly enforces that the operations related to the custom > control tuple be done at the streaming window boundary. > > For most operations, it makes sense to have that enforcement. Option 1 > opens the door to the possibility of sending and handling control tuples > within a window, thus imposing a challenge of ensuring idempotency. In > fact, allowing that would make idempotency extremely difficult to achieve. > > David > > On Fri, Jun 24, 2016 at 4:38 PM, Vlad Rozov <[email protected]> > wrote: > > > +1 for option 1. > > > > Thank you, > > > > Vlad > > > > > > On 6/24/16 14:35, Bright Chen wrote: > > > >> +1 > >> It also can help to Shutdown the application gracefully. > >> Bright > >> > >> On Jun 24, 2016, at 1:35 PM, Siyuan Hua <[email protected]> wrote: > >>> > >>> +1 > >>> > >>> I think it's good to have custom control tuple and I prefer the 1 > option. > >>> > >>> Also I think we should think about couple different callbacks, that > could > >>> be operator level(triggered when an operator receives an control tuple) > >>> or > >>> dag level(triggered when control tuple flow over the whole dag) > >>> > >>> Regards, > >>> Siyuan > >>> > >>> > >>> > >>> > >>> On Fri, Jun 24, 2016 at 12:42 PM, David Yan <[email protected]> > >>> wrote: > >>> > >>> My initial thinking is that the custom control tuples, just like the > >>>> existing control tuples, will only be generated from the input > operators > >>>> and will be propagated downstream to all operators in the DAG. So the > >>>> NxM > >>>> partitioning scenario works just like how other control tuples work, > >>>> i.e. > >>>> the callback will not be called unless all ports have received the > >>>> control > >>>> tuple for a particular window. This creates a little bit of > complication > >>>> with multiple input operators though. > >>>> > >>>> David > >>>> > >>>> > >>>> On Fri, Jun 24, 2016 at 12:03 PM, Tushar Gosavi < > [email protected] > >>>> > > >>>> wrote: > >>>> > >>>> +1 for the feature > >>>>> > >>>>> I am in favor of option 1, but we may need an helper method to avoid > >>>>> compiler error on typed port, as calling port.emit(controlTuple) will > >>>>> be an error if type of control tuple and port does not match. or new > >>>>> method in outputPort object , emitControlTuple(ControlTuple). > >>>>> > >>>>> Can you give example of piggy backing tuple with current BEGIN_WINDOW > >>>>> and END_WINDOW control tuples? > >>>>> > >>>>> In case of NxM partitioning, each downstream operator will receive N > >>>>> control tuples. will it call user handler N times for each downstream > >>>>> operator or just once. > >>>>> > >>>>> Regards, > >>>>> - Tushar. > >>>>> > >>>>> > >>>>> > >>>>> On Fri, Jun 24, 2016 at 11:52 PM, David Yan <[email protected]> > >>>>> > >>>> wrote: > >>>> > >>>>> Hi all, > >>>>>> > >>>>>> I would like to propose a new feature to the Apex core engine -- the > >>>>>> support of custom control tuples. Currently, we have control tuples > >>>>>> > >>>>> such > >>>> > >>>>> as > >>>>> > >>>>>> BEGIN_WINDOW, END_WINDOW, CHECKPOINT, and so on, but we don't have > the > >>>>>> support for applications to insert their own control tuples. The way > >>>>>> currently to get around this is to use data tuples and have a > separate > >>>>>> > >>>>> port > >>>>> > >>>>>> for such tuples that sends tuples to all partitions of the > downstream > >>>>>> operators, which is not exactly developer friendly. > >>>>>> > >>>>>> We have already seen a number of use cases that can use this > feature: > >>>>>> > >>>>>> 1) Batch support: We need to tell all operators of the physical DAG > >>>>>> > >>>>> when > >>>> > >>>>> a > >>>>> > >>>>>> batch starts and ends, so the operators can do whatever that is > needed > >>>>>> > >>>>> upon > >>>>> > >>>>>> the start or the end of a batch. > >>>>>> > >>>>>> 2) Watermark: To support the concepts of event time windowing, the > >>>>>> watermark control tuple is needed to tell which windows should be > >>>>>> considered late. > >>>>>> > >>>>>> 3) Changing operator properties: We do have the support of changing > >>>>>> operator properties on the fly, but with a custom control tuple, the > >>>>>> command to change operator properties can be window aligned for all > >>>>>> partitions and also across the DAG. > >>>>>> > >>>>>> 4) Recording tuples: Like changing operator properties, we do have > >>>>>> this > >>>>>> support now but only at the individual physical operator level, and > >>>>>> > >>>>> without > >>>>> > >>>>>> control of which window to record tuples for. With a custom control > >>>>>> > >>>>> tuple, > >>>>> > >>>>>> because a control tuple must belong to a window, all operators in > the > >>>>>> > >>>>> DAG > >>>> > >>>>> can start (and stop) recording for the same windows. > >>>>>> > >>>>>> I can think of two options to achieve this: > >>>>>> > >>>>>> 1) new custom control tuple type that takes user's serializable > >>>>>> object. > >>>>>> > >>>>>> 2) piggy back the current BEGIN_WINDOW and END_WINDOW control > tuples. > >>>>>> > >>>>>> Please provide your feedback. Thank you. > >>>>>> > >>>>>> David > >>>>>> > >>>>> > > >
