[ https://issues.apache.org/jira/browse/APEXMALHAR-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15601112#comment-15601112 ]
ASF GitHub Bot commented on APEXMALHAR-2309:
--------------------------------------------

Github user francisf closed the pull request at:

    https://github.com/apache/apex-malhar/pull/464

> TimeBasedDedupOperator marks new tuples as duplicates if expired tuples exist
> -----------------------------------------------------------------------------
>
>                 Key: APEXMALHAR-2309
>                 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2309
>             Project: Apache Apex Malhar
>          Issue Type: Bug
>    Affects Versions: 3.5.0
>            Reporter: Francis Fernandes
>            Assignee: Francis Fernandes
>
> The deduper marks valid tuples outside the expiry window as duplicates.
> Consider the following configuration (number of buckets = 1):
> {code}
> <property>
>   <name>dt.application.DedupTestApp.operator.Deduper.prop.expireBefore</name>
>   <value>10</value>
> </property>
> <property>
>   <name>dt.application.DedupTestApp.operator.Deduper.prop.bucketSpan</name>
>   <value>10</value>
> </property>
> {code}
> The data piped in is:
> {code}
> "10",1474614305000,"Test"
> "11",1474614315000,"Test"
> "10",1474614325000,"Test"
> {code}
> The 3rd tuple is valid since it is outside the expiry window. But it is
> marked as a duplicate because the first tuple, although expired, is
> still present in Bucket.flash.
> The issue happens when the expiry duration is less than the checkpointing
> duration.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
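
For readers following the report, here is a rough sketch of the behaviour it expects, using the three tuples above. It is not the Malhar code: the class ExpiryAwareDedupSketch and its isDuplicate method are hypothetical, the in-memory map only stands in for whatever the operator keeps in memory, and it assumes expireBefore is given in seconds while tuple times are in milliseconds. The idea is that an entry older than the expiry point is treated as absent even when it has not yet been purged.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative only; a hypothetical stand-in, not the TimeBasedDedupOperator implementation. */
final class ExpiryAwareDedupSketch {
    private final long expireBeforeMillis;
    // key -> event time (ms) of the most recently accepted tuple with that key
    private final Map<String, Long> seen = new HashMap<>();
    private long latestTimeMillis = Long.MIN_VALUE;

    // Assumption: expireBefore is configured in seconds.
    ExpiryAwareDedupSketch(long expireBeforeSeconds) {
        this.expireBeforeMillis = expireBeforeSeconds * 1000L;
    }

    /** Returns true if a non-expired tuple with the same key has already been seen. */
    boolean isDuplicate(String key, long eventTimeMillis) {
        latestTimeMillis = Math.max(latestTimeMillis, eventTimeMillis);
        long expiryPoint = latestTimeMillis - expireBeforeMillis;

        Long previous = seen.get(key);
        // A stored entry at or before the expiry point is ignored,
        // even though it is still physically present in memory.
        boolean live = previous != null && previous > expiryPoint;
        if (live) {
            return true;
        }
        seen.put(key, eventTimeMillis);
        return false;
    }

    public static void main(String[] args) {
        ExpiryAwareDedupSketch dedup = new ExpiryAwareDedupSketch(10);
        System.out.println(dedup.isDuplicate("10", 1474614305000L));
        System.out.println(dedup.isDuplicate("11", 1474614315000L));
        // 20 s after the first "10", so the first entry is outside the 10 s window.
        System.out.println(dedup.isDuplicate("10", 1474614325000L));
    }
}
```

With a 10 second window, the third tuple arrives 20 seconds after the first one with the same key, so the sketch reports it as unique. Whether the boundary comparison should be strict, and how tuples that are themselves already expired should be routed, are left out here and would follow the operator's own semantics.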