[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15278982#comment-15278982
 ] 

bright chen commented on APEXMALHAR-2076:
-----------------------------------------

Proposed solution for avoiding duplicate tuples
Assumptions: 
  - The values of incoming tuples are not duplicated (at least within the same 
window) across all operator partitions.
  - One Kafka partition can be written to by multiple operator partitions at 
the same time.
  - The Kafka partition is determined by the tuple value itself (it does not 
depend on the operator partition).
 
Notes: 
  - The order of the data can change on replay.
  - The data can go to a different operator partition on replay, for example 
if the upstream operator failed.
   
Implementation: for each Kafka partition, load the minimum last window id and 
the minimum offset of that last window across all operator partitions. Then 
load the tuples from Kafka starting at this minimum offset. When processing a 
tuple, if its window id is less than the minimum last window id, ignore the 
tuple. If its window id equals the loaded minimum window id and the tuple 
equals any of the loaded tuples, ignore it. Otherwise, send it to Kafka.
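
A minimal sketch of the per-tuple decision described above, for illustration 
only. The class name ReplayFilter and the method shouldSend are hypothetical 
and not part of Malhar. It assumes the minimum last window id and the tuples 
already read back from Kafka have been loaded elsewhere, and that tuples 
implement equals() and hashCode().

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of Malhar: decides whether a replayed tuple
// should still be written to Kafka for one Kafka partition.
class ReplayFilter<T> {
    // Minimum "last window id" across all operator partitions (assumed loaded elsewhere).
    private final long minLastWindowId;
    // Tuples read back from Kafka starting at the minimum offset (assumed loaded elsewhere).
    private final Set<T> loadedTuples;

    ReplayFilter(long minLastWindowId, Set<T> loadedTuples) {
        this.minLastWindowId = minLastWindowId;
        this.loadedTuples = new HashSet<>(loadedTuples);
    }

    boolean shouldSend(long windowId, T tuple) {
        if (windowId < minLastWindowId) {
            // Window is older than the minimum last window: ignore the tuple.
            return false;
        }
        if (windowId == minLastWindowId && loadedTuples.contains(tuple)) {
            // Same window and the tuple is already in Kafka: ignore it.
            return false;
        }
        // Otherwise the tuple is sent to Kafka.
        return true;
    }
}
```

Because the check is a set lookup rather than a positional comparison, it does 
not rely on replayed tuples arriving in the same order. It does rely on the 
first assumption above (no duplicate tuple values within a window).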

> AbstractExactlyOnceKafkaOutputOperator didn't handle the orderless of tuples 
> in a window
> ----------------------------------------------------------------------------------------
>
>                 Key: APEXMALHAR-2076
>                 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2076
>             Project: Apache Apex Malhar
>          Issue Type: Bug
>            Reporter: bright chen
>            Assignee: bright chen
>
> The order of the tuples within the same window is not guaranteed on replay. 
> AbstractExactlyOnceKafkaOutputOperator's logic assumes that the replayed 
> tuples arrive in the same order.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
