[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15601112#comment-15601112
 ] 

ASF GitHub Bot commented on APEXMALHAR-2309:
--------------------------------------------

Github user francisf closed the pull request at:

    https://github.com/apache/apex-malhar/pull/464


> TimeBasedDedupOperator marks new tuples as duplicates if expired tuples exist
> -----------------------------------------------------------------------------
>
>                 Key: APEXMALHAR-2309
>                 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2309
>             Project: Apache Apex Malhar
>          Issue Type: Bug
>    Affects Versions: 3.5.0
>            Reporter: Francis Fernandes
>            Assignee: Francis Fernandes
>
> The deduper marks valid tuples outside the expiry window as duplicates. 
> Consider the following configuration (number of buckets = 1):
> {code}
>   <property>
>     <name>dt.application.DedupTestApp.operator.Deduper.prop.expireBefore</name>
>     <value>10</value>
>   </property>
>   <property>
>     <name>dt.application.DedupTestApp.operator.Deduper.prop.bucketSpan</name>
>     <value>10</value>
>   </property>
> {code}
> The data piped in is:
> {code}
> "10",1474614305000,"Test"
> "11",1474614315000,"Test"
> "10",1474614325000,"Test"
> {code}
> The 3rd tuple is valid since it is outside the expiry window. But it is 
> marked as a duplicate because the first tuple, although expired, is 
> still present in Bucket.flash.
> The issue happens when the expiry duration is less than the checkpointing 
> duration.
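
The behaviour described above can be reproduced with a small stand-alone
model. This is only a sketch, not the Malhar operator and not the change made
in the pull request: the class DedupExpirySketch, its checkExpiry flag and the
isDuplicate method are hypothetical names, a HashMap stands in for the single
in-memory bucket, and expireBefore is assumed to be given in seconds while the
tuple timestamps are in milliseconds. Late tuples and checkpointing are not
modelled.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of one in-memory bucket. It only illustrates
// the reported symptom; it is not the TimeBasedDedupOperator implementation.
class DedupExpirySketch {
    // key -> event time (ms) of the tuple currently held in the bucket
    private final Map<String, Long> bucket = new HashMap<>();
    private final long expireBeforeMillis;
    private final boolean checkExpiry;
    private long latestTime = Long.MIN_VALUE;

    // expireBeforeSeconds is assumed to be in seconds, as in the issue's config.
    DedupExpirySketch(long expireBeforeSeconds, boolean checkExpiry) {
        this.expireBeforeMillis = expireBeforeSeconds * 1000L;
        this.checkExpiry = checkExpiry;
    }

    // Returns true if the tuple is reported as a duplicate.
    boolean isDuplicate(String key, long eventTime) {
        latestTime = Math.max(latestTime, eventTime);
        long expiryPoint = latestTime - expireBeforeMillis;
        Long stored = bucket.get(key);
        // Without the expiry check, any entry still in the bucket counts as a
        // match, even if its event time is already before the expiry point.
        boolean storedIsLive =
            stored != null && (!checkExpiry || stored >= expiryPoint);
        if (storedIsLive) {
            return true;
        }
        bucket.put(key, eventTime); // first sighting, or replaces an expired entry
        return false;
    }

    public static void main(String[] args) {
        String[] keys = {"10", "11", "10"};
        long[] times = {1474614305000L, 1474614315000L, 1474614325000L};
        for (boolean check : new boolean[] {false, true}) {
            DedupExpirySketch dedup = new DedupExpirySketch(10, check);
            for (int i = 0; i < keys.length; i++) {
                System.out.println("checkExpiry=" + check + " key=" + keys[i]
                    + " duplicate=" + dedup.isDuplicate(keys[i], times[i]));
            }
        }
    }
}
```

With checkExpiry set to false the third tuple is reported as a duplicate, which
matches the behaviour in this issue. With checkExpiry set to true the stored
entry for key "10" (event time 1474614305000) lies before the expiry point
1474614315000, so the third tuple is treated as unique.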



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
