[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15313339#comment-15313339
 ] 

Thomas Weise commented on APEXMALHAR-2085:
------------------------------------------

The windowed operator base will consist of multiple pluggable components. We 
will need to be a PoC to better demarcate those and ensure they fit the use 
cases (aggregation, inner/outer join, topN/sort etc.)

Based on our learning with dimension computation, we should be able to fork off 
the pluggable storage fairly soon. This should be based on the spillable data 
structures with mock/in-memory (for testing), Managed state or HDHT as backend:

 - Review spillable DS interfaces
 - PoC with in-memory impl
 - Spillable implementation for HDHT
 - Spillable impl for ManagedState

Windowed operator base class
  - identify components to cover the various scenarios
  - accumulating vs discarding
  - session windows
  - usage for count, join

Further discussion:
 - watermark handling
 - should we use the Beam API windowing classes directly


> Implement Windowed Operators
> ----------------------------
>
>                 Key: APEXMALHAR-2085
>                 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2085
>             Project: Apache Apex Malhar
>          Issue Type: New Feature
>            Reporter: Siyuan Hua
>            Assignee: Siyuan Hua
>
> As per our recent several discussions in the community. A group of Windowed 
> Operators that delivers the window semantic follows the google Data Flow 
> model(https://cloud.google.com/dataflow/) is very important. 
> The operators should be designed and implemented in a way for 
> High-level API
> Beam translation
> Easy to use with other popular operator
> {panel:title=Operator Hierarchy}
> Hierarchy of the operators,
> The windowed operators should cover all possible transformations that require 
> window, and batch processing is also considered as special window called 
> global window
> {code}
>                    +-------------------+
>        +---------> |  WindowedOperator | <--------+
>        |           +--------+----------+          |
>        |                    ^      ^--------------------------------+
>        |                    |                     |                 |
>        |                    |                     |                 |
> +------+--------+    +------+------+      +-------+-----+    +------+-----+
> |CombineOperator|    |GroupOperator|      |KeyedOperator|    |JoinOperator|
> +---------------+    +-------------+      +------+------+    +-----+------+
>                                    +---------^   ^                 ^
>                                    |             |                 |
>                           +--------+---+   +-----+----+       +----+----+
>                           |KeyedCombine|   |KeyedGroup|       | CoGroup |
>                           +------------+   +----------+       +---------+
> {code}
> Combine operation includes all operations that combine all tuples in one 
> window into one or small number of tuples, Group operation group all tuples 
> in one window, Join and CoGroup are used to join and group tuples from 
> different inputs.
> {panel}
> {panel:title=Components}
> * Window Component
> It includes configuration, window state that should be checkpointed, etc. It 
> should support NonMergibleWindow(fixed or slide) MergibleWindow(Session)
> * Trigger
> It should support early trigger, late trigger with customizable trigger 
> behaviour 
> * Other related components:
> ** Watermark generator, can be plugged into input source to generate watermark
> ** Tuple schema support:
> It should handle either predefined tuple type or give a declarative API to 
> describe the user defined tuple class
> {panel}
> Most component API should be reused in High-Level API
> This is the umbrella ticket, separate tickets would be created for different 
> components and operators respectively 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to