Hi guys,

I just create a ticket for windowed operators. It is the prerequisite for
High-level API and Beam integration.

As per our recent several discussions in the community. A group of Windowed
Operators that delivers the window semantic follows the google Data Flow
model(https://cloud.google.com/dataflow/) is very important.
The operators should be designed and implemented in a way for
High-level API
Beam translation
Easy to use with other popular operator

Hierarchy of the operators,
The windowed operators should cover all possible transformations that
require window, and batch processing is also considered as special window
called global window

                   +-------------------+
       +---------> |  WindowedOperator | <--------+
       |           +--------+----------+          |
       |                    ^      ^--------------------------------+
       |                    |                     |                 |
       |                    |                     |                 |
+------+--------+    +------+------+      +-------+-----+    +------+-----+
|CombineOperator|    |GroupOperator|      |KeyedOperator|    |JoinOperator|
+---------------+    +-------------+      +------+------+    +-----+------+
                                   +---------^   ^                 ^
                                   |             |                 |
                          +--------+---+   +-----+----+       +----+----+
                          |KeyedCombine|   |KeyedGroup|       | CoGroup |
                          +------------+   +----------+       +---------+


Combine operation includes all operations that combine all tuples in one
window into one or small number of tuples, Group operation group all tuples
in one window, Join and CoGroup are used to join and group tuples from
different inputs.

Components:
Window Component, it includes configuration, window state that should be
checkpointed, etc. It should support NonMergibleWindow(fixed or slide)
MergibleWindow(Session)

 Trigger:
It should support early trigger, late trigger. It should support
customizable trigger behaviour

Outside components:
Watermark generator, can be plugged into input source to generate watermark
Tuple util components, The operator should be only working on Tuples with
schema. It can handle either predefined tuple type or give a declarative
API to describe the user defined tuple class

Most component API should be reused in High-Level API

This is the umbrella ticket, separate tickets would be created for
different components and operators respectively

Reply via email to