Why restrict the control tuples to input operators?

On Sat, Jun 25, 2016 at 9:07 AM Amol Kekre <[email protected]> wrote:
> David,
> We should avoid control tuples within a window by simply restricting it
> through the API. This can be done by calling something like
> "sendControlTuple" between windows, notably in input operators.
>
> Thks
> Amol
>
>
> On Sat, Jun 25, 2016 at 7:32 AM, Munagala Ramanath <[email protected]>
> wrote:
>
> > What would the API look like for option 1? Another operator callback
> > called controlTuple(), or does the operator code have to check each
> > incoming tuple to see if it was data or control?
> >
> > Ram
> >
> > On Fri, Jun 24, 2016 at 11:42 PM, David Yan <[email protected]> wrote:
> >
> > > It looks like option 1 is preferred by the community. But let me
> > > elaborate why I brought up the option of piggybacking on
> > > BEGIN_WINDOW and END_WINDOW.
> > >
> > > Option 2 implicitly enforces that the operations related to the
> > > custom control tuple be done at the streaming window boundary.
> > >
> > > For most operations, it makes sense to have that enforcement.
> > > Option 1 opens the door to the possibility of sending and handling
> > > control tuples within a window, thus imposing a challenge of
> > > ensuring idempotency. In fact, allowing that would make idempotency
> > > extremely difficult to achieve.
> > >
> > > David
> > >
> > > On Fri, Jun 24, 2016 at 4:38 PM, Vlad Rozov <[email protected]>
> > > wrote:
> > >
> > > > +1 for option 1.
> > > >
> > > > Thank you,
> > > >
> > > > Vlad
> > > >
> > > >
> > > > On 6/24/16 14:35, Bright Chen wrote:
> > > >
> > > >> +1
> > > >> It can also help to shut down the application gracefully.
> > > >> Bright
> > > >>
> > > >> On Jun 24, 2016, at 1:35 PM, Siyuan Hua <[email protected]> wrote:
> > > >>>
> > > >>> +1
> > > >>>
> > > >>> I think it's good to have custom control tuples, and I prefer
> > > >>> option 1.
> > > >>>
> > > >>> Also, I think we should think about a couple of different
> > > >>> callbacks, which could be operator level (triggered when an
> > > >>> operator receives a control tuple) or DAG level (triggered when
> > > >>> a control tuple flows over the whole DAG).
> > > >>>
> > > >>> Regards,
> > > >>> Siyuan
> > > >>>
> > > >>>
> > > >>> On Fri, Jun 24, 2016 at 12:42 PM, David Yan <[email protected]>
> > > >>> wrote:
> > > >>>
> > > >>>> My initial thinking is that the custom control tuples, just
> > > >>>> like the existing control tuples, will only be generated from
> > > >>>> the input operators and will be propagated downstream to all
> > > >>>> operators in the DAG. So the NxM partitioning scenario works
> > > >>>> just like how other control tuples work, i.e. the callback will
> > > >>>> not be called unless all ports have received the control tuple
> > > >>>> for a particular window. This creates a little bit of
> > > >>>> complication with multiple input operators though.
> > > >>>>
> > > >>>> David
> > > >>>>
> > > >>>>
> > > >>>> On Fri, Jun 24, 2016 at 12:03 PM, Tushar Gosavi
> > > >>>> <[email protected]> wrote:
> > > >>>>
> > > >>>>> +1 for the feature
> > > >>>>>
> > > >>>>> I am in favor of option 1, but we may need a helper method to
> > > >>>>> avoid a compiler error on typed ports, as calling
> > > >>>>> port.emit(controlTuple) will be an error if the types of the
> > > >>>>> control tuple and the port do not match. Or a new method on
> > > >>>>> the outputPort object, emitControlTuple(ControlTuple).
> > > >>>>>
> > > >>>>> Can you give an example of piggybacking a tuple on the current
> > > >>>>> BEGIN_WINDOW and END_WINDOW control tuples?
> > > >>>>>
> > > >>>>> In case of NxM partitioning, each downstream operator will
> > > >>>>> receive N control tuples. Will it call the user handler N
> > > >>>>> times for each downstream operator, or just once?
> > > >>>>>
> > > >>>>> Regards,
> > > >>>>> - Tushar.
> > > >>>>>
> > > >>>>>
> > > >>>>> On Fri, Jun 24, 2016 at 11:52 PM, David Yan <[email protected]>
> > > >>>>> wrote:
> > > >>>>>
> > > >>>>>> Hi all,
> > > >>>>>>
> > > >>>>>> I would like to propose a new feature to the Apex core
> > > >>>>>> engine -- the support of custom control tuples. Currently, we
> > > >>>>>> have control tuples such as BEGIN_WINDOW, END_WINDOW,
> > > >>>>>> CHECKPOINT, and so on, but we don't have the support for
> > > >>>>>> applications to insert their own control tuples. The current
> > > >>>>>> way to get around this is to use data tuples and have a
> > > >>>>>> separate port for such tuples that sends tuples to all
> > > >>>>>> partitions of the downstream operators, which is not exactly
> > > >>>>>> developer friendly.
> > > >>>>>>
> > > >>>>>> We have already seen a number of use cases that can use this
> > > >>>>>> feature:
> > > >>>>>>
> > > >>>>>> 1) Batch support: We need to tell all operators of the
> > > >>>>>> physical DAG when a batch starts and ends, so the operators
> > > >>>>>> can do whatever is needed upon the start or the end of a
> > > >>>>>> batch.
> > > >>>>>>
> > > >>>>>> 2) Watermark: To support the concepts of event time
> > > >>>>>> windowing, the watermark control tuple is needed to tell
> > > >>>>>> which windows should be considered late.
> > > >>>>>>
> > > >>>>>> 3) Changing operator properties: We do have the support of
> > > >>>>>> changing operator properties on the fly, but with a custom
> > > >>>>>> control tuple, the command to change operator properties can
> > > >>>>>> be window aligned for all partitions and also across the DAG.
> > > >>>>>>
> > > >>>>>> 4) Recording tuples: Like changing operator properties, we do
> > > >>>>>> have this support now but only at the individual physical
> > > >>>>>> operator level, and without control of which window to record
> > > >>>>>> tuples for. With a custom control tuple, because a control
> > > >>>>>> tuple must belong to a window, all operators in the DAG can
> > > >>>>>> start (and stop) recording for the same windows.
> > > >>>>>>
> > > >>>>>> I can think of two options to achieve this:
> > > >>>>>>
> > > >>>>>> 1) A new custom control tuple type that takes the user's
> > > >>>>>> serializable object.
> > > >>>>>>
> > > >>>>>> 2) Piggyback on the current BEGIN_WINDOW and END_WINDOW
> > > >>>>>> control tuples.
> > > >>>>>>
> > > >>>>>> Please provide your feedback. Thank you.
> > > >>>>>>
> > > >>>>>> David
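
For reference, the NxM behaviour discussed in the thread (the callback is not called until every upstream port has delivered the control tuple for a given window) could be sketched roughly as below. This is only an illustration under assumptions: ControlTupleAligner and receive() are hypothetical names, not part of the Apex API, and the sketch assumes each upstream partition sends at most one custom control tuple per window.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical helper (not an Apex class): tracks which upstream
 * partitions have delivered a custom control tuple for each window.
 */
final class ControlTupleAligner {
    private final int upstreamCount;
    // windowId -> ids of upstream partitions seen so far for that window
    private final Map<Long, Set<Integer>> seen = new HashMap<>();

    ControlTupleAligner(int upstreamCount) {
        if (upstreamCount <= 0) {
            throw new IllegalArgumentException("upstreamCount must be positive");
        }
        this.upstreamCount = upstreamCount;
    }

    /**
     * Records a control tuple from one upstream partition.
     * Returns true only when the last missing partition for this
     * window has arrived, i.e. when the user callback should fire.
     */
    boolean receive(int upstreamId, long windowId) {
        if (upstreamId < 0 || upstreamId >= upstreamCount) {
            throw new IllegalArgumentException("unknown upstream: " + upstreamId);
        }
        Set<Integer> arrived = seen.computeIfAbsent(windowId, k -> new HashSet<>());
        arrived.add(upstreamId);
        if (arrived.size() < upstreamCount) {
            return false; // still waiting for other partitions
        }
        seen.remove(windowId); // window is complete; drop its bookkeeping
        return true;
    }
}
```

With N = 3 upstream partitions, receive() returns false for the first two arrivals in a window and true for the third, so a handler guarded by it would run once per downstream operator rather than N times.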
