It is not the latest and greatest, but here is an Akka Streams GraphStage
implementation for deduplication:
https://squbs.readthedocs.io/en/latest/deduplicate/

Everything happens in memory, so you need to watch for memory growth and
potentially pass a custom registry that cleans itself up after a while.
The source code is at
https://github.com/paypal/squbs/blob/master/squbs-ext/src/main/scala/org/squbs/streams/Deduplicate.scala
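
To make the "self-cleaning registry" idea concrete, here is a rough sketch in
plain Java. It is not the squbs API: the class name ExpiringDedupRegistry, the
firstSeen method, and the TTL parameter are all made up for illustration, and
it assumes the caller passes non-decreasing timestamps. How you would plug
something like this into the Deduplicate stage depends on the registry type
that stage expects, so check the docs linked above.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: remembers each key for ttlMillis, then forgets it. */
final class ExpiringDedupRegistry<K> {
    private final long ttlMillis;
    // Keys are only inserted when absent, so iteration order is first-seen
    // order (assuming non-decreasing timestamps); the oldest entry is first.
    private final LinkedHashMap<K, Long> firstSeenAt = new LinkedHashMap<>();

    ExpiringDedupRegistry(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    /** Returns true if the key was not seen within the last ttlMillis. */
    boolean firstSeen(K key, long nowMillis) {
        evictExpired(nowMillis);
        if (firstSeenAt.containsKey(key)) {
            return false; // duplicate inside the retention window
        }
        firstSeenAt.put(key, nowMillis);
        return true;
    }

    /** Number of keys currently retained in memory. */
    int size() {
        return firstSeenAt.size();
    }

    // Drop entries whose age has reached ttlMillis; stop at the first
    // entry that is still fresh, since everything after it is newer.
    private void evictExpired(long nowMillis) {
        Iterator<Map.Entry<K, Long>> it = firstSeenAt.entrySet().iterator();
        while (it.hasNext()) {
            if (nowMillis - it.next().getValue() < ttlMillis) {
                break;
            }
            it.remove();
        }
    }
}
```

A stream stage would call firstSeen(key, now) per element and emit the element
only when it returns true. The trade-off is the usual one: memory stays bounded
by the TTL, but a duplicate that arrives later than the TTL is let through
again, so the TTL has to cover how late your messages can be.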

Thanks,
Anil

On Fri, Jun 23, 2017 at 11:42 AM, Shiva Ramagopal <tr.s...@gmail.com> wrote:

> Hi,
>
> I'm looking for the latest and greatest techniques and thoughts in stream
> deduplication and would love to know if anyone here has done this at scale.
> Specifically, I'm looking for deduping that also handles late-arriving
> messages.
>
> In the past few days of my search, I've mostly come across ideas and
> implementations like
>
> - Batching the stream based on time windows (non-overlapping) and deduping
> within the batch
> - Possible improvements on the above technique using overlapping time
> windows
> - HDFS-specific cases where a stream is consumed (pretty batchy), written
> to HDFS and deduped there
>
> My source is Kafka, if that helps.
>
> Thanks
> Shiv
>
> --
> >>>>>>>>>> Read the docs: http://akka.io/docs/
> >>>>>>>>>> Check the FAQ: http://doc.akka.io/docs/akka/current/additional/faq.html
> >>>>>>>>>> Search the archives: https://groups.google.com/group/akka-user
> ---
> You received this message because you are subscribed to the Google Groups
> "Akka User List" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to akka-user+unsubscr...@googlegroups.com.
> To post to this group, send email to akka-user@googlegroups.com.
> Visit this group at https://groups.google.com/group/akka-user.
> For more options, visit https://groups.google.com/d/optout.
>
