Thanks Eno! My intention is to reprocess all the data from the beginning, so we'll reset the application as documented in the Confluent blog. We don't want to keep the previous results; in fact, we want to overwrite them. Kafka Connect will happily replace all records in our sink database.
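For reference, my understanding of that reset boils down to two steps: run the application reset tool against the cluster, then wipe the local state before restarting. A minimal sketch, assuming the 0.10.1+ API; the application id, topic, broker address, and topology are placeholders for our actual setup:

    // Step 1 (with all app instances stopped): run the reset tool, e.g.
    //   bin/kafka-streams-application-reset.sh --application-id my-agg-app \
    //       --input-topics my-input-topic --bootstrap-servers broker:9092
    // Step 2: wipe local state before restarting, so each instance starts clean.
    import java.util.Properties;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.kstream.KStreamBuilder;

    KStreamBuilder builder = new KStreamBuilder();
    // ... define the topology on builder ...
    Properties props = new Properties();  // application.id, bootstrap.servers, ...
    KafkaStreams streams = new KafkaStreams(builder, props);
    streams.cleanUp();   // deletes this instance's local state directory
    streams.start();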
So I'll reset the streams app, then change the window maintain duration to 6 months until the application has caught up on fresh messages, and then restart the application with the original window maintain duration (without a reset this time). Let's hope Kafka Streams detects the duration change and drops the old windows immediately? (A sketch of the one-line change is at the bottom of this mail.)

2017-01-12 17:06 GMT+01:00 Eno Thereska <eno.there...@gmail.com>:

> Hi Nicolas,
>
> I've seen your previous message thread too. I think your best bet for now
> is to increase the window maintain duration to 6 months.
>
> If you change your application logic, e.g., by changing the duration, the
> semantics of the change wouldn't immediately be clear, and it's worth
> clarifying them. For example, would the intention be to reprocess all the
> data from the beginning? Or to start where you left off (in which case the
> fact that the original processing went over data that is 6 months old
> would not be relevant, since you'd start from where you left off the
> second time)? Right now we support a limited way to reprocess the data by
> effectively resetting a streams application
> (https://www.confluent.io/blog/data-reprocessing-with-kafka-streams-resetting-a-streams-application/).
> I wouldn't recommend using that if you want to keep the results of the
> previous run, though.
>
> Eno
>
> > On 12 Jan 2017, at 09:15, Nicolas Fouché <nfou...@onfocus.io> wrote:
> >
> > Hi.
> >
> > I'd like to re-consume 6-month-old data with Kafka Streams.
> >
> > My current topology can't, because it defines aggregations with window
> > maintain durations of 3 days:
> > TimeWindows.of(ONE_HOUR_MILLIS).until(THREE_DAYS_MILLIS)
> >
> > As discovered (and shared [1]) a few months ago, consuming a record
> > older than 3 days will mess up my aggregates. How do you deal with
> > this? Do you temporarily raise the window maintain durations until all
> > records are consumed? Do you always run your topologies with long
> > durations, like a year? I have no idea what the impact on RAM and disk
> > would be, but I guess RocksDB would cry a little.
> >
> > Final question: if I raise the duration to 6 months, consume my
> > records, and then set the duration back to 3 days, would the old
> > aggregates be automatically destroyed?
> >
> > [1] http://mail-archives.apache.org/mod_mbox/kafka-users/201610.mbox/%3ccabqkjkj42n7z4bxjdkrdyz_kmpunh738uxvm7gy24dnkx+r...@mail.gmail.com%3e
> >
> > Thanks,
> > Nicolas
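For completeness, the duration change described above is a one-line change in the topology. A minimal sketch against the 0.10.x DSL; the stream, the store name, and the SIX_MONTHS_MILLIS constant are placeholders of mine, and until() would be set back to THREE_DAYS_MILLIS for normal runs:

    import java.util.concurrent.TimeUnit;
    import org.apache.kafka.streams.kstream.KTable;
    import org.apache.kafka.streams.kstream.TimeWindows;
    import org.apache.kafka.streams.kstream.Windowed;

    long ONE_HOUR_MILLIS   = TimeUnit.HOURS.toMillis(1);
    long SIX_MONTHS_MILLIS = TimeUnit.DAYS.toMillis(180);  // reprocessing runs
    long THREE_DAYS_MILLIS = TimeUnit.DAYS.toMillis(3);    // normal runs

    // Only the until() value changes between runs: it controls how long old
    // windows are maintained, i.e. how old a record may be and still land in
    // a live window instead of being dropped.
    KTable<Windowed<String>, Long> counts = stream
        .groupByKey()
        .count(TimeWindows.of(ONE_HOUR_MILLIS).until(SIX_MONTHS_MILLIS),
               "hourly-counts");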