Hi All,

I am working on adding a de-duplication operator to the Malhar library, based
on the managed state APIs. I will work from the existing JIRA ticket -
https://issues.apache.org/jira/browse/APEXMALHAR-1701 - and from the initial
pull request for an AbstractDeduper here:
https://github.com/apache/apex-malhar/pull/260/files

I am planning to include the following features in the first version:
1. Time-based de-duplication. Assumption: the Tuple_Key -> Tuple_Time
correlation holds.
2. An option to maintain the order of incoming tuples.
3. Duplicate and Expired ports to emit duplicate and expired tuples
respectively.
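
To make features 1 and 3 concrete, here is a rough sketch in plain Java. It
is only an illustration, not the proposed operator: it does not use the Apex
operator API or managed state. The class name, the expiryMillis parameter, the
in-memory map, and the three lists standing in for output ports are all
assumptions made for this example. A tuple is treated as expired when its time
falls more than expiryMillis behind the latest tuple time seen so far.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only: not the Malhar operator and not the managed state API.
class TimeBasedDeduperSketch {
    enum Decision { UNIQUE, DUPLICATE, EXPIRED }

    private final long expiryMillis;                         // assumed config parameter
    private final Map<String, Long> seen = new HashMap<>();  // key -> tuple time
    private long latestTime = Long.MIN_VALUE;                // latest tuple time seen

    // Lists stand in for the unique, duplicate and expired output ports.
    final List<String> unique = new ArrayList<>();
    final List<String> duplicate = new ArrayList<>();
    final List<String> expired = new ArrayList<>();

    TimeBasedDeduperSketch(long expiryMillis) {
        this.expiryMillis = expiryMillis;
    }

    Decision process(String key, long tupleTime) {
        latestTime = Math.max(latestTime, tupleTime);
        long expiryPoint = latestTime - expiryMillis;

        // Tuples older than the expiry point are not checked for duplicates.
        if (tupleTime < expiryPoint) {
            expired.add(key);
            return Decision.EXPIRED;
        }

        // Forget keys whose recorded time has fallen behind the expiry point.
        seen.values().removeIf(t -> t < expiryPoint);

        if (seen.containsKey(key)) {
            duplicate.add(key);
            return Decision.DUPLICATE;
        }

        seen.put(key, tupleTime);
        unique.add(key);
        return Decision.UNIQUE;
    }
}
```

Scanning the whole map on every tuple is done here only to keep the sketch
short; bucketing keys by time would avoid that scan.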

Thanks.

~ Bhupesh
