Francis Fernandes created APEXMALHAR-2309:
---------------------------------------------

             Summary: TimeBasedDedupOperator marks new tuples as duplicates if 
expired tuples exist
                 Key: APEXMALHAR-2309
                 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2309
             Project: Apache Apex Malhar
          Issue Type: Bug
    Affects Versions: 3.5.0
            Reporter: Francis Fernandes
            Assignee: Francis Fernandes


The deduper marks valid tuples outside the expiry window as duplicates. 

Consider the following configuration (number of buckets = 1):
{code}
  <property>
    <name>dt.application.DedupTestApp.operator.Deduper.prop.expireBefore</name>
    <value>10</value>
  </property>
  <property>
    <name>dt.application.DedupTestApp.operator.Deduper.prop.bucketSpan</name>
    <value>10</value>
  </property>
{code}

The data piped in is:
{code}
"10",1474614305000,"Test"
"11",1474614315000,"Test"
"10",1474614325000,"Test"
{code}

The third tuple is valid because the first tuple with the same key is already 
outside the expiry window. It is nevertheless marked as a duplicate because the 
first tuple, although expired, is still present in Bucket.flash.

The issue occurs when the expiry duration is less than the checkpointing 
duration.
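
The behaviour can be illustrated with a small standalone model. This is only a sketch and not the Malhar code: the class DedupExpirySketch, its map, and the checkExpiryOnLookup flag are hypothetical. It assumes expireBefore is given in seconds, that the expiry point is the latest tuple time seen minus expireBefore, and that a tuple exactly at the expiry point is not yet expired. With the flag off, an expired entry that is still held in memory makes the third tuple look like a duplicate. With the flag on, the lookup ignores entries older than the expiry point.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the in-memory part of a single bucket; not Malhar code.
class DedupExpirySketch {
    // expireBefore = 10, assumed to be in seconds
    static final long EXPIRE_BEFORE_MS = 10_000L;

    // key -> event time of the tuple last accepted as unique
    private final Map<String, Long> inMemory = new HashMap<>();
    private long latestTime = Long.MIN_VALUE;
    private final boolean checkExpiryOnLookup;

    DedupExpirySketch(boolean checkExpiryOnLookup) {
        this.checkExpiryOnLookup = checkExpiryOnLookup;
    }

    boolean isDuplicate(String key, long eventTime) {
        latestTime = Math.max(latestTime, eventTime);
        // Assumption: entries older than this point count as expired.
        long expiryPoint = latestTime - EXPIRE_BEFORE_MS;
        Long seen = inMemory.get(key);
        // Without the expiry check, any entry still in memory counts as a match.
        boolean duplicate = seen != null
                && (!checkExpiryOnLookup || seen >= expiryPoint);
        if (!duplicate) {
            inMemory.put(key, eventTime);
        }
        return duplicate;
    }

    public static void main(String[] args) {
        String[] keys = {"10", "11", "10"};
        long[] times = {1474614305000L, 1474614315000L, 1474614325000L};
        for (boolean check : new boolean[] {false, true}) {
            DedupExpirySketch dedup = new DedupExpirySketch(check);
            for (int i = 0; i < keys.length; i++) {
                System.out.println("checkExpiryOnLookup=" + check
                        + " key=" + keys[i]
                        + " duplicate=" + dedup.isDuplicate(keys[i], times[i]));
            }
        }
    }
}
```

For the three tuples above, the model without the expiry check reports the third tuple as a duplicate, while the model with the check reports it as unique. Whether the actual fix belongs in the lookup or in clearing expired entries earlier is left open here.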



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
