[ https://issues.apache.org/jira/browse/KAFKA-3596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Henry Cai updated KAFKA-3596:
-----------------------------
Description:
Currently in Kafka Streams, window expiration in RocksDB is triggered by the
insertion of new events. If a window is created at T0 with a 10-minute
retention, then as soon as a new record arrives with event timestamp
T0 + 10 min + 1, we expire that window (remove it) from RocksDB.
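A minimal Python sketch of the insertion-triggered behavior described above. It assumes a simplified in-memory model in which timestamps are milliseconds. `EventTimeWindowStore`, `put`, and `observed_time` are hypothetical names, not the Kafka Streams or RocksDB API.

```python
from dataclasses import dataclass, field

RETENTION_MS = 10 * 60 * 1000  # 10-minute retention, as in the example above


@dataclass
class EventTimeWindowStore:
    """Hypothetical model of insertion-triggered expiration (not the real store)."""
    retention_ms: int
    windows: dict = field(default_factory=dict)  # window start (ms) -> list of values
    observed_time: int = -1                      # largest event timestamp seen so far

    def put(self, window_start: int, value, event_ts: int) -> None:
        # Each insert advances the observed time using only this record's own timestamp.
        self.observed_time = max(self.observed_time, event_ts)
        # Expire every window whose start + retention is earlier than the observed time.
        cutoff = self.observed_time - self.retention_ms
        for start in [s for s in self.windows if s < cutoff]:
            del self.windows[start]
        # Records for windows that are already past retention are dropped.
        if window_start >= cutoff:
            self.windows.setdefault(window_start, []).append(value)


store = EventTimeWindowStore(RETENTION_MS)
store.put(0, "a", event_ts=0)
# A single record stamped just past T0 + 10 min (e.g. from a producer with a skewed
# clock) is enough to drop the T0 window, no matter how little real time has passed.
store.put(RETENTION_MS + 1, "b", event_ts=RETENTION_MS + 1)
print(sorted(store.windows))  # [600001]
```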
In the real world, it is very common to see events arriving with future
timestamps (or out-of-order events with large time gaps between them), so
retiring a window based on a single event's timestamp is dangerous. I think we
need to consider at least both the event's timestamp and the elapsed
server/stream time.
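Here is one possible reading of this proposal, sketched in Python under stated assumptions. A window is expired only once two conditions both hold: the largest observed event timestamp has passed the window's retention, and the wall-clock time elapsed since the window was first written has also exceeded it. `DualClockWindowStore`, `expire`, and the injectable `now_ms` clock are hypothetical names. This is not how Kafka Streams implements retention.

```python
import time
from typing import Callable, Dict, List


class DualClockWindowStore:
    """Hypothetical store: a window expires only when BOTH event time and
    elapsed wall-clock time have moved past its retention."""

    def __init__(self, retention_ms: int,
                 now_ms: Callable[[], int] = lambda: int(time.time() * 1000)):
        self.retention_ms = retention_ms
        self.now_ms = now_ms                   # injectable clock, handy for tests
        self.windows: Dict[int, List[object]] = {}
        self.created_at: Dict[int, int] = {}   # window start -> wall-clock ms when first written
        self.observed_time = -1                # largest event timestamp seen so far

    def _expired(self, start: int, now: int) -> bool:
        event_time_passed = start + self.retention_ms < self.observed_time
        wall_clock_passed = now - self.created_at[start] > self.retention_ms
        return event_time_passed and wall_clock_passed

    def expire(self) -> None:
        # Can be called on every insert or, independently, from a periodic timer.
        now = self.now_ms()
        for start in [s for s in self.windows if self._expired(s, now)]:
            del self.windows[start]
            del self.created_at[start]

    def put(self, window_start: int, value: object, event_ts: int) -> None:
        self.observed_time = max(self.observed_time, event_ts)
        self.expire()
        self.windows.setdefault(window_start, []).append(value)
        self.created_at.setdefault(window_start, self.now_ms())


fake_now = [0]                                   # manually advanced clock for the demo
store = DualClockWindowStore(10 * 60 * 1000, now_ms=lambda: fake_now[0])
store.put(0, "a", event_ts=0)
fake_now[0] = 1_000                              # only one second of real time passes
store.put(600_001, "b", event_ts=600_001)        # future-stamped record
print(sorted(store.windows))                     # [0, 600001]: the T0 window survives
fake_now[0] = 600_001                            # real time has now caught up as well
store.expire()
print(sorted(store.windows))                     # [600001]
```

Because `expire()` is separate from `put()`, it could be driven by elapsed time alone rather than only by inserts. In this simplified model, a late record for an already-expired window simply recreates that window.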
was:
Currently, state store replication always goes through a compacted Kafka topic.
For some state stores, e.g. JoinWindow, there are no duplicates in the store, so
there is little benefit in using a compacted topic.
The problem with a compacted topic is that its records can stay on the Kafka
broker forever. In my use case the key is ad_id, which increments all the time
and is unbounded, so I am worried that the broker disk space used by that topic
will grow without limit.
I think we either need the ability to purge compacted records on the broker, or
a way to specify a different compaction option for state store replication.
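A small Python simulation can make the disk-space concern concrete. It uses simplified models of the two Kafka topic cleanup policies: `cleanup.policy=compact`, and `cleanup.policy=delete` with `retention.ms`. `compact` and `delete_by_time` are hypothetical helpers, not broker code, and the one-hour retention is an arbitrary assumption. With an ever-increasing ad_id key, every record has a distinct key, so compaction removes nothing.

```python
from typing import Dict, List, Tuple

Record = Tuple[int, int, str]  # (timestamp_ms, key, value)


def compact(log: List[Record]) -> Dict[int, Record]:
    """Model of cleanup.policy=compact: keep only the latest record per key, forever."""
    latest: Dict[int, Record] = {}
    for rec in log:
        latest[rec[1]] = rec
    return latest


def delete_by_time(log: List[Record], now_ms: int, retention_ms: int) -> List[Record]:
    """Model of cleanup.policy=delete: drop records older than retention_ms."""
    return [rec for rec in log if now_ms - rec[0] <= retention_ms]


RETENTION_MS = 60 * 60 * 1000  # hypothetical one-hour retention

for hours in (1, 2, 4):
    now = hours * 3_600_000
    # One record per second, each carrying a fresh, ever-increasing ad_id key.
    log = [(t, ad_id, "join-state") for ad_id, t in enumerate(range(0, now, 1000))]
    print(hours, len(compact(log)), len(delete_by_time(log, now, RETENTION_MS)))
# 1 3600 3600
# 2 7200 3600
# 4 14400 3600
```

The compacted log grows linearly with the number of distinct keys. Time-based deletion stays bounded by the retention window.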
> Kafka Streams: Window expiration needs to consider more than event time
> -----------------------------------------------------------------------
>
> Key: KAFKA-3596
> URL: https://issues.apache.org/jira/browse/KAFKA-3596
> Project: Kafka
> Issue Type: Improvement
> Components: streams
> Affects Versions: 0.10.1.0
> Reporter: Henry Cai
> Assignee: Guozhang Wang
> Priority: Minor
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)