Atri,

A concept of "transactional window" is needed for some applications that interact with external systems. A number of Malhar operators support it today. For example, a JDBC operator might perform all operations within a transaction that begins with the first write in a window and is committed in endWindow. The engine provides the callbacks; the operator implements the transaction based on the capabilities of the external system. Note that this does not imply batching. It merely speaks to transaction demarcation.
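To make the demarcation concrete, here is a minimal sketch in Python with sqlite3 standing in for the external system. This is not the Malhar JDBC operator: the class WindowedWriter, its method names, and the events table are all hypothetical. It only shows that each tuple is written as it arrives, while the transaction boundary follows the window.

```python
import sqlite3


class WindowedWriter:
    """Hypothetical operator-like class (not the Malhar API)."""

    def __init__(self, conn):
        self.conn = conn
        self.window_id = None
        self.in_txn = False
        conn.execute(
            "CREATE TABLE IF NOT EXISTS events (window_id INTEGER, value TEXT)"
        )

    def begin_window(self, window_id):
        # Only remember the window; no transaction is opened yet.
        self.window_id = window_id

    def process(self, value):
        # The first write in the window opens the transaction.
        if not self.in_txn:
            self.conn.execute("BEGIN")
            self.in_txn = True
        # The tuple is written right away, not held until end_window.
        self.conn.execute(
            "INSERT INTO events VALUES (?, ?)", (self.window_id, value)
        )

    def end_window(self):
        # The end of the window commits whatever was written in it.
        if self.in_txn:
            self.conn.execute("COMMIT")
            self.in_txn = False

    def abort(self):
        # Stand-in for a failure in the middle of a window.
        if self.in_txn:
            self.conn.execute("ROLLBACK")
            self.in_txn = False


def demo():
    # isolation_level=None lets us issue BEGIN/COMMIT explicitly.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    writer = WindowedWriter(conn)

    writer.begin_window(1)
    writer.process("a")
    writer.process("b")
    writer.end_window()

    writer.begin_window(2)
    writer.process("c")
    writer.abort()  # window 2 never reaches end_window

    return conn.execute(
        "SELECT window_id, value FROM events ORDER BY rowid"
    ).fetchall()


if __name__ == "__main__":
    print(demo())
```

In this sketch only the rows of window 1 survive, since window 2 is rolled back before its end_window.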
But this is only part of the work needed to make the operator "transactional". Windows can be reprocessed, depending on the processing semantics. When a container goes down, the operator is reset to the recovery checkpoint and reprocesses the windows from the checkpoint up to the point where the failure occurred. Unless the processing done by the operator is idempotent, this leads to incorrect results. For example, if the operation was "UPDATE sometable SET count = count + 1", we would double count. One technique to deal with this is to maintain the windowId as part of the state that gets committed to the external system. On replay, the operator can then skip a window if it finds that the window was already processed. Of course, this requires that the upstream operators also deliver the tuples in an idempotent manner on a window replay.

Thomas

On Fri, Aug 28, 2015 at 2:14 PM, Chetan Narsude <[email protected]> wrote:

> Atri,
>
> BEGIN_WINDOW and END_WINDOW control events demarcate the transaction. We do not hold the first event after BEGIN_WINDOW hostage until the END_WINDOW is received. This allows us to provide almost zero latency at the per-tuple level. This is one of the differentiating paradigms for Apex.
>
> If we do it otherwise, the platform degrades to micro-batch processing mode. More details about it here:
>
> https://www.datatorrent.com/real-time-event-stream-processing-what-are-your-choices/
>
> Let me know if this answers your question or if I misunderstood the question.
>
> --
> Chetan
>
> On Fri, Aug 28, 2015 at 1:37 PM, Atri Sharma <[email protected]> wrote:
>
> > Team,
> >
> > Does it make sense to have functionality for an all-or-nothing transactional system for windows? With future functionality to have dynamic operators, I feel it makes sense to allow data from an entire window to be processed or none of the data to be sent.
> >
> > I am not sure if window batching in its current form is a logical implementation of this feature.
> >
> > Thoughts?
> >
> > Regards,
> >
> > Atri
