Thanks Matthias, For #1 this is something we have done already, but it can obviously take a while to catch up for large topics (or may not catch up at all if the volume is large).
For #2, this is what I am wondering about, if anyone knows of any implementations that provide an option like this under the hood of the consumer, some sort of SynchronizedConsumer. I can see doing it on a per-client basis, but it would be great to abstract it out into a general approach. I'll think about this one more. Thanks On Tue, Jul 2, 2019 at 11:27 PM Matthias J. Sax <matth...@confluent.io> wrote: > I think you can only achieve this, if > > 1) you don't use two clients, but only one client that reads both > partitions > > or > > 2) let both clients exchange data about their time progress > > > -Matthias > > > On 7/2/19 6:01 PM, Adam Bellemare wrote: > > Hi All > > > > The use-case is pretty simple. Lets say we have a history of events with > > the following: > > key=userId, value = (timestamp, productId) > > > > and we want to remap it to (just as we would with an internal topic): > > key=productId, value=(original_timestamp, userId) > > > > Now, say I have 30 days of backlog, and 2 partitions for the input > topic. I > > spin up two instances and let them process the data from the start of > time, > > but one instance is only half as powerful (less CPU, Mem, etc), such that > > instance 0 processes X events / sec which instance 1 processes x/2 events > > /sec. > > > > My question is: Are there *any* clients, kafka streams, spark, flink, etc > > or otherwise, that would allow these two consumers to remain in sync > *according > > to their timestamps*? I don't want to see events with original_timestamp > of > > today (from instance 0) interleaved with events from 15 days ago (from > the > > underpowered instance 1). Yes, I do realize this would bring my > throughput > > down, but I am looking for any existing tech that would effectively say > *"cap > > the time difference of events coming out of this repartition processor at > > 60 seconds max"* > > > > Currently, I am not aware of ANY open source solutions that do this for > > Kafka, but if someone has heard otherwise I would love to know. > > Alternately, perhaps this could lead to a KIP. > > > > Thanks > > Adam > > > >