[ https://issues.apache.org/jira/browse/APEXMALHAR-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15557076#comment-15557076 ]
Thomas Weise commented on APEXMALHAR-2283:
------------------------------------------
The exactly-once output logic is suspect. Why does it use the same key
(appId+operatorId) for all messages? Why does it track extra window state in
the operator? And why does it rely on the hashcode of the object? In cases
where the application can provide a unique message id, it should also be
possible to use that id as the key. It should be possible to do the dedup with
the state stored in Kafka alone.
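To illustrate the kind of dedup meant here, the following is a minimal sketch, not the operator's actual code. It assumes each message carries an application-provided unique id that is used as the message key, and it replaces Kafka with an in-memory append-only log so the example is self-contained. All names (KafkaDedupSketch, InMemoryLog, DedupWriter, recover) are hypothetical. The offset at which the recovery scan starts is assumed to be known to the caller; how a real operator would obtain it is not covered by this comment.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class KafkaDedupSketch {

  /** One written message: key plus payload (stand-in for a Kafka record). */
  static final class Entry {
    final String key;
    final String value;

    Entry(String key, String value) {
      this.key = key;
      this.value = value;
    }
  }

  /** In-memory stand-in for a single topic partition (append-only log). */
  static final class InMemoryLog {
    private final List<Entry> entries = new ArrayList<>();

    long endOffset() {
      return entries.size();
    }

    void append(Entry e) {
      entries.add(e);
    }

    List<Entry> readFrom(long offset) {
      return new ArrayList<>(entries.subList((int) offset, entries.size()));
    }
  }

  /** Writer that skips replayed messages using only what is already in the log. */
  static final class DedupWriter {
    private final InMemoryLog log;
    private final Set<String> alreadyWritten = new HashSet<>();

    DedupWriter(InMemoryLog log) {
      this.log = log;
    }

    /** After a restart: collect the ids written at or after the given offset. */
    void recover(long fromOffset) {
      alreadyWritten.clear();
      for (Entry e : log.readFrom(fromOffset)) {
        alreadyWritten.add(e.key);
      }
    }

    /** Returns true if the message was written, false if it was skipped as a duplicate. */
    boolean write(String messageId, String value) {
      // Assumes ids are unique, so each recovered id is replayed at most once.
      if (alreadyWritten.remove(messageId)) {
        return false;
      }
      log.append(new Entry(messageId, value));
      return true;
    }
  }

  public static void main(String[] args) {
    InMemoryLog log = new InMemoryLog();
    long scanStart = log.endOffset();

    DedupWriter writer = new DedupWriter(log);
    writer.write("m1", "a");
    writer.write("m2", "b");

    // Simulated failure: a fresh writer replays the same tuples plus a new one.
    DedupWriter restarted = new DedupWriter(log);
    restarted.recover(scanStart);
    restarted.write("m1", "a");
    restarted.write("m2", "b");
    restarted.write("m3", "c");

    System.out.println(log.endOffset()); // prints 3
  }
}
```

The point of the sketch is that the set of already-written ids is rebuilt from the log itself, so no per-window bookkeeping and no object hashcode is involved.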
The operator is also not easy to extend. We tried to implement output to a
topic that depends on the tuple and found ourselves stuck with some private
methods and unfriendly hooks.
The operator needs a redesign and a good example.
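As a rough illustration of what friendlier hooks could look like, here is a small sketch in which every per-tuple decision (topic, key, value) is a protected method that a subclass can override. It is not the existing operator's API. The class and method names (AbstractOutputSketch, topicFor, keyFor, valueFor, RegionRoutingOutput, Order) are hypothetical, and sending is replaced by collecting messages in a list.

```java
import java.util.ArrayList;
import java.util.List;

final class ExtensibleOutputSketch {

  /** Destination-tagged message (stand-in for a Kafka record). */
  static final class Outgoing {
    final String topic;
    final String key;
    final String value;

    Outgoing(String topic, String key, String value) {
      this.topic = topic;
      this.key = key;
      this.value = value;
    }
  }

  /** Hypothetical base class: per-tuple decisions are protected hooks. */
  abstract static class AbstractOutputSketch<T> {
    private final String defaultTopic;
    private final List<Outgoing> sent = new ArrayList<>();

    protected AbstractOutputSketch(String defaultTopic) {
      this.defaultTopic = defaultTopic;
    }

    /** Override to choose the topic per tuple; defaults to the configured topic. */
    protected String topicFor(T tuple) {
      return defaultTopic;
    }

    /** Subclasses supply the message key, for example a unique message id. */
    protected abstract String keyFor(T tuple);

    /** Override to control how the tuple is turned into the message value. */
    protected String valueFor(T tuple) {
      return String.valueOf(tuple);
    }

    /** Builds the message through the hooks and "sends" it (here: records it). */
    public final void process(T tuple) {
      sent.add(new Outgoing(topicFor(tuple), keyFor(tuple), valueFor(tuple)));
    }

    List<Outgoing> sent() {
      return sent;
    }
  }

  /** Example tuple type. */
  static final class Order {
    final String id;
    final String region;

    Order(String id, String region) {
      this.id = id;
      this.region = region;
    }
  }

  /** Routes each order to a topic derived from the tuple itself. */
  static final class RegionRoutingOutput extends AbstractOutputSketch<Order> {
    RegionRoutingOutput() {
      super("orders");
    }

    @Override
    protected String topicFor(Order order) {
      return "orders-" + order.region;
    }

    @Override
    protected String keyFor(Order order) {
      return order.id;
    }

    @Override
    protected String valueFor(Order order) {
      return order.id + "," + order.region;
    }
  }

  public static void main(String[] args) {
    RegionRoutingOutput out = new RegionRoutingOutput();
    out.process(new Order("o1", "eu"));
    Outgoing first = out.sent().get(0);
    System.out.println(first.topic + " " + first.key);
  }
}
```

With this shape, output to a topic that depends on the tuple needs only an override of topicFor, without touching the send path.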
> Refactor kafka output operator
> ------------------------------
>
> Key: APEXMALHAR-2283
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2283
> Project: Apache Apex Malhar
> Issue Type: Improvement
> Reporter: Siyuan Hua
> Assignee: Siyuan Hua
>
> The abstract Kafka output operator needs to be refactored:
> 1. Set some mandatory properties at the operator level instead of the Kafka
> property level.
> 2. Add more documentation and examples.
> 3. Find a standard way to achieve exactly-once in both 0.8 and 0.9.
> More will be added while working on the ticket.