[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15276796#comment-15276796
 ] 

bright chen commented on APEXMALHAR-2076:
-----------------------------------------

Several problems came up in discussion with Siyuan, Vlad and Pramod. Some of 
them already have a solution, but others do not have a good solution yet. 
The following only concerns the last window, since replay of earlier windows 
can simply be ignored by comparing window ids.

1. Problem: Tuple value duplication. For example, the tuples of the whole 
window are "a, a, a, b, b, b, c, c, c", and the operator crashed just before 
writing the last "b". On replay, the tuples may arrive in a different order, 
for example "a, b, c, a, b, c, a, b, c".
Solution: This case can be handled by caching the tuples already written (or 
loading them from Kafka) into a map of value to count. For this case, the map 
is (a=>3, b=>2). On replay, decrease the count if the map has an entry for the 
value, and write to Kafka if it does not. So, for the replayed tuples 
("a, b, c, a, b, c, a, b, c"), the behavior should be as follows:
input tuple   behavior                          map after handling the tuple
a             decrease count                    (a=>2, b=>2)
b             decrease count                    (a=>2, b=>1)
c             write to Kafka                    (a=>2, b=>1)
a             decrease count                    (a=>1, b=>1)
b             decrease count and remove entry   (a=>1)
c             write to Kafka                    (a=>1)
a             decrease count and remove entry   ()
b             write to Kafka                    ()
c             write to Kafka                    ()
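
The count-map idea above can be sketched roughly as follows. This is only an 
illustration, not the operator's actual code: the class name 
ReplayDeduplicator and its methods are hypothetical, tuples are plain Strings, 
and the tuples already written for the crashed window are assumed to be 
available as a list (from a cache or read back from Kafka). Writing to Kafka 
is simulated by collecting the tuples into a list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Malhar: tracks how many copies of each
// value were already written for the last (crashed) window.
class ReplayDeduplicator {
    // value -> number of already-written copies not yet matched by a replayed tuple
    private final Map<String, Integer> alreadyWritten = new HashMap<>();

    ReplayDeduplicator(List<String> tuplesAlreadyWritten) {
        for (String tuple : tuplesAlreadyWritten) {
            alreadyWritten.merge(tuple, 1, Integer::sum);
        }
    }

    // Returns true if the replayed tuple still has to be written to Kafka.
    boolean shouldWrite(String tuple) {
        Integer count = alreadyWritten.get(tuple);
        if (count == null) {
            return true;                           // no entry: not written before the crash
        }
        if (count == 1) {
            alreadyWritten.remove(tuple);          // decrease count and remove entry
        } else {
            alreadyWritten.put(tuple, count - 1);  // decrease count
        }
        return false;
    }

    // Stand-in for the real producer: returns the tuples that would be written.
    List<String> replay(List<String> replayedTuples) {
        List<String> toWrite = new ArrayList<>();
        for (String tuple : replayedTuples) {
            if (shouldWrite(tuple)) {
                toWrite.add(tuple);
            }
        }
        return toWrite;
    }

    public static void main(String[] args) {
        // Before the crash "a, a, a, b, b" had been written, i.e. (a=>3, b=>2).
        ReplayDeduplicator dedup =
            new ReplayDeduplicator(Arrays.asList("a", "a", "a", "b", "b"));
        List<String> written = dedup.replay(
            Arrays.asList("a", "b", "c", "a", "b", "c", "a", "b", "c"));
        System.out.println(written); // prints [c, c, b, c]
    }
}
```

With the example window, only one "b" and three "c" are written during 
replay, which together with the tuples written before the crash gives exactly 
three of each value.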


> AbstractExactlyOnceKafkaOutputOperator didn't handle the orderless of tuples 
> in a window
> ----------------------------------------------------------------------------------------
>
>                 Key: APEXMALHAR-2076
>                 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2076
>             Project: Apache Apex Malhar
>          Issue Type: Bug
>            Reporter: bright chen
>            Assignee: bright chen
>
> The order of the tuples in the same window is not guaranteed in replay. 
> AbstractExactlyOnceKafkaOutputOperator's logic assumes the replayed tuples 
> have the same order.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
