[
https://issues.apache.org/jira/browse/APEXMALHAR-2085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15319642#comment-15319642
]
Siyuan Hua commented on APEXMALHAR-2085:
----------------------------------------
There are pros and cons to mixing the Beam and Guava APIs with our own API.
Pros:
Trigger and Window definitions can be very complicated, so using an existing API
could save us some work for now.
Cons:
1. More dependencies mean a steeper learning curve. An Apex-only user needs to
read both the Apex and the Beam/Guava documentation to know how things work, and
in particular has to switch between different websites to read the javadoc.
2. Every API in Beam and Guava is designed for its own purpose, and Apex has no
control over it. Any semantic change in the future could have unpredictable
consequences.
3. There is no clear boundary for which parts of the Beam API can be used. In
particular, the Window and Trigger classes are tightly coupled with other APIs
in Beam, so it is very difficult and confusing to use the Window/Trigger API on
its own, apart from the rest of the Beam API set.
I think we should stay with our own API, though there might be some redundancy.
To me, the Apex API (especially the abstract API) is an agreement between the
Apex platform and the Apex developer. Putting a third-party library into our API
is like bringing another party into the agreement. The complexity would grow
exponentially and become unpredictable. Having dependencies on the classpath and
using them internally, or even using them as parameters in the API (for example,
a Guava cache object), is totally fine.
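
To make the distinction concrete, here is a minimal sketch of "own the API, hide
the library". All names (KeyedCounter, MapBackedCounter) are hypothetical and are
not part of Apex or Malhar. A JDK map stands in for a third-party store such as
a Guava cache, purely so the sketch compiles without extra dependencies.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical project-owned API type: callers only ever depend on this.
interface KeyedCounter {
    long increment(String key);

    long get(String key);
}

// Implementation detail. The backing store could later be swapped for a
// third-party cache without touching KeyedCounter or its callers.
// Assumption: a JDK ConcurrentHashMap stands in for that library here.
final class MapBackedCounter implements KeyedCounter {
    private final Map<String, Long> store = new ConcurrentHashMap<>();

    @Override
    public long increment(String key) {
        // merge() inserts 1 for a new key, otherwise adds 1 to the old value.
        return store.merge(key, 1L, Long::sum);
    }

    @Override
    public long get(String key) {
        return store.getOrDefault(key, 0L);
    }
}
```

Because no third-party type appears in KeyedCounter, a semantic change in the
library would only affect the one implementation class.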
Just my thoughts. :)
> Implement Windowed Operators
> ----------------------------
>
> Key: APEXMALHAR-2085
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2085
> Project: Apache Apex Malhar
> Issue Type: New Feature
> Reporter: Siyuan Hua
> Assignee: David Yan
>
> As per several recent discussions in the community, a group of Windowed
> Operators that deliver window semantics following the Google Dataflow
> model (https://cloud.google.com/dataflow/) is very important.
> The operators should be designed and implemented to support:
> * High-level API
> * Beam translation
> * Easy to use with other popular operators
> {panel:title=Operator Hierarchy}
> Hierarchy of the operators:
> The windowed operators should cover all possible transformations that require
> a window. Batch processing is also treated as a special window called the
> global window.
> {code}
>                    +-------------------+
>        +---------> |  WindowedOperator | <--------+
>        |           +--------+----------+          |
>        |                    ^      ^--------------------------------+
>        |                    |                     |                 |
>        |                    |                     |                 |
> +------+--------+    +------+------+      +-------+-----+    +------+-----+
> |CombineOperator|    |GroupOperator|      |KeyedOperator|    |JoinOperator|
> +---------------+    +-------------+      +------+------+    +-----+------+
>                                   +---------^    ^                 ^
>                                   |              |                 |
>                          +--------+---+    +-----+----+       +----+----+
>                          |KeyedCombine|    |KeyedGroup|       | CoGroup |
>                          +------------+    +----------+       +---------+
> {code}
> A Combine operation includes all operations that combine all tuples in one
> window into one or a small number of tuples. A Group operation groups all
> tuples in one window. Join and CoGroup are used to join and group tuples from
> different inputs.
> {panel}
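
The difference between Combine and Group described above can be sketched as
follows. This is only an illustration under stated assumptions: WindowedValue
and WindowedOps are hypothetical names, not Malhar classes, and each tuple is
assumed to be already assigned to a window identified by its start time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical tuple that has already been assigned to a window.
final class WindowedValue {
    final long windowStart;
    final long value;

    WindowedValue(long windowStart, long value) {
        this.windowStart = windowStart;
        this.value = value;
    }
}

final class WindowedOps {
    // Combine: reduce all tuples of a window to a single result (here, a sum).
    static Map<Long, Long> combineSum(List<WindowedValue> tuples) {
        Map<Long, Long> out = new TreeMap<>();
        for (WindowedValue t : tuples) {
            out.merge(t.windowStart, t.value, Long::sum);
        }
        return out;
    }

    // Group: keep every tuple of a window, collected together per window.
    static Map<Long, List<Long>> group(List<WindowedValue> tuples) {
        Map<Long, List<Long>> out = new TreeMap<>();
        for (WindowedValue t : tuples) {
            out.computeIfAbsent(t.windowStart, k -> new ArrayList<>()).add(t.value);
        }
        return out;
    }
}
```

A keyed variant would use the pair (key, window) as the map key instead of the
window alone.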
> {panel:title=Components}
> * Window Component
> It includes configuration, window state that should be checkpointed, etc. It
> should support NonMergibleWindow (fixed or sliding) and MergibleWindow (session).
> * Trigger
> It should support early triggers and late triggers with customizable trigger
> behaviour.
> * Other related components:
> ** Watermark generator, which can be plugged into an input source to generate
> watermarks
> ** Tuple schema support:
> It should handle either a predefined tuple type or give a declarative API to
> describe the user-defined tuple class
> {panel}
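
For the non-mergeable windows listed above, window assignment can be computed
from the event timestamp alone. The sketch below shows one common way to do it.
WindowAssign and its methods are hypothetical names, and timestamps, sizes and
slides are assumed to be in the same unit (for example milliseconds). Session
windows are not covered because they require merging state.

```java
import java.util.ArrayList;
import java.util.List;

final class WindowAssign {
    // Start of the fixed window [start, start + size) that contains ts.
    // floorMod keeps the result correct for negative timestamps too.
    static long fixedWindowStart(long ts, long size) {
        return ts - Math.floorMod(ts, size);
    }

    // Starts of all sliding windows [start, start + size) that contain ts,
    // where a new window begins every `slide` units. Latest window first.
    static List<Long> slidingWindowStarts(long ts, long size, long slide) {
        List<Long> starts = new ArrayList<>();
        long latest = ts - Math.floorMod(ts, slide);
        for (long start = latest; start > ts - size; start -= slide) {
            starts.add(start);
        }
        return starts;
    }
}
```

With a size of 10 and a slide of 5, a tuple at timestamp 25 belongs to the
windows starting at 25 and 20, while a fixed window of size 10 places it in the
window starting at 20.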
> Most component APIs should be reused in the High-Level API.
> This is the umbrella ticket. Separate tickets will be created for the
> different components and operators.