[ https://issues.apache.org/jira/browse/APEXMALHAR-2076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15278982#comment-15278982 ]
bright chen commented on APEXMALHAR-2076:
-----------------------------------------
Solution to avoid duplicate tuples
Assumptions:
- The values of incoming tuples are not duplicated across all operator
partitions (at least within the same window).
- One Kafka partition can be written by multiple operator partitions at the
same time.
- The Kafka partition is decided by the tuple value itself (it does not
depend on the operator partition).
Notes:
- The order of tuples may change on replay.
- On replay, a tuple may go to a different operator partition, for example
if the upstream operator failed.
Implementation: for each Kafka partition, load the minimum last window ID and
the minimum offset of the last window across all operator partitions, then
read the tuples from Kafka starting at this minimum offset. When processing a
tuple: if its window ID is less than the minimum last window ID, ignore it; if
its window ID equals the loaded minimum window ID and the tuple equals any of
the loaded tuples, ignore it; otherwise, send it to Kafka.
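A minimal Java sketch of this rule for a single Kafka partition might look like the following. The class `ReplayDeduplicator`, its nested `PartitionState`, and all method names are hypothetical illustrations, not part of Malhar's `AbstractExactlyOnceKafkaOutputOperator`. It assumes that each operator partition has persisted its last window ID and the Kafka offset where that window's writes began, and that the caller has already read the tuples from Kafka starting at the minimum offset (the Kafka consumer code is not shown). Because the already-written tuples are kept in a set, the check does not depend on the order in which tuples are replayed.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the replay de-duplication rule for ONE Kafka partition.
// All names are illustrative assumptions, not Malhar's actual API.
class ReplayDeduplicator<T> {

  // State each operator partition is ASSUMED to have persisted for this Kafka partition.
  static final class PartitionState {
    final long lastWindowId;     // last window this operator partition wrote
    final long lastWindowOffset; // Kafka offset at which that window's writes began

    PartitionState(long lastWindowId, long lastWindowOffset) {
      this.lastWindowId = lastWindowId;
      this.lastWindowOffset = lastWindowOffset;
    }
  }

  // Minimum last window ID across all operator partitions (throws if the list is empty).
  static long minLastWindowId(List<PartitionState> states) {
    return states.stream().mapToLong(s -> s.lastWindowId).min().getAsLong();
  }

  // Minimum start offset of the last window across all operator partitions;
  // the caller is assumed to re-read tuples from Kafka starting at this offset.
  static long minLastWindowOffset(List<PartitionState> states) {
    return states.stream().mapToLong(s -> s.lastWindowOffset).min().getAsLong();
  }

  private final long minLastWindowId;
  private final Set<T> loadedTuples; // tuples already in Kafka, read from the minimum offset

  ReplayDeduplicator(long minLastWindowId, Collection<T> loadedTuples) {
    this.minLastWindowId = minLastWindowId;
    this.loadedTuples = new HashSet<>(loadedTuples);
  }

  // Returns true if the tuple should be sent to Kafka, false if it must be ignored.
  boolean shouldSend(long windowId, T tuple) {
    if (windowId < minLastWindowId) {
      return false; // window precedes every operator partition's last window
    }
    if (windowId == minLastWindowId && loadedTuples.contains(tuple)) {
      return false; // already written; a set lookup is independent of replay order
    }
    return true;
  }
}
```

The set lookup relies on the tuple type implementing `equals` and `hashCode` consistently. This is also where the first assumption matters: two distinct tuples with equal values in the minimum window would be treated as one.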
> AbstractExactlyOnceKafkaOutputOperator didn't handle the orderless of tuples
> in a window
> ----------------------------------------------------------------------------------------
>
> Key: APEXMALHAR-2076
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2076
> Project: Apache Apex Malhar
> Issue Type: Bug
> Reporter: bright chen
> Assignee: bright chen
>
> The order of the tuples in the same window is not guaranteed on replay.
> AbstractExactlyOnceKafkaOutputOperator's logic assumes that the replayed
> tuples have the same order.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)