[ https://issues.apache.org/jira/browse/KAFKA-7934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Guozhang Wang updated KAFKA-7934:
---------------------------------
    Labels: new-streams-runtime-should-fix  (was: )

> Optimize restore for windowed and session stores
> ------------------------------------------------
>
>                 Key: KAFKA-7934
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7934
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: Matthias J. Sax
>            Priority: Major
>              Labels: new-streams-runtime-should-fix
>
> During state restore of window/session stores, the changelog topic is scanned
> from the oldest entry to the newest entry. This happens on a
> record-by-record basis or in record batches.
> During this process, new segments are created as time advances (based on
> the timestamps of the records being restored). However, depending on
> the retention time, we might expire some of these segments again later in
> the restore process. This is wasteful. Because retention time is based on
> the largest timestamp per partition, it is possible to compute a bound
> separating live from expired segments upfront (assuming that we know the
> largest timestamp). This way, during restore, we could avoid creating
> segments that would be expired later anyway and skip over all corresponding
> records.
> The problem is that we don't know the largest timestamp per partition. Maybe
> the broker timestamp index could help to provide an approximation of this
> value.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
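
To make the proposed bound concrete, here is a minimal sketch of the idea in the description. It is not the Kafka Streams implementation: the class and method names are hypothetical, segments are assumed to have a fixed width with IDs computed as timestamp / interval, timestamps are assumed non-negative, and the largest timestamp per partition is assumed to be known already (which, as the description notes, is the open problem).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; names and the segment-ID formula are assumptions.
final class RestoreBoundSketch {

    // Segment that a record timestamp falls into, for fixed-width segments.
    static long segmentId(long timestampMs, long segmentIntervalMs) {
        return timestampMs / segmentIntervalMs;
    }

    // Oldest segment that is still within retention, given the largest
    // timestamp seen on the partition.
    static long minLiveSegment(long maxTimestampMs, long retentionMs, long segmentIntervalMs) {
        long minLiveTimestampMs = Math.max(0L, maxTimestampMs - retentionMs);
        return segmentId(minLiveTimestampMs, segmentIntervalMs);
    }

    // Returns the timestamps of the records that would actually be restored;
    // records belonging to already-expired segments are skipped.
    static List<Long> liveTimestamps(long[] changelogTimestampsMs, long maxTimestampMs,
                                     long retentionMs, long segmentIntervalMs) {
        long minLive = minLiveSegment(maxTimestampMs, retentionMs, segmentIntervalMs);
        List<Long> kept = new ArrayList<>();
        for (long ts : changelogTimestampsMs) {
            if (segmentId(ts, segmentIntervalMs) >= minLive) {
                kept.add(ts);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        long[] timestamps = {5, 50, 120, 180, 250, 300};
        // Largest timestamp 300, retention 100, segment interval 50.
        System.out.println(liveTimestamps(timestamps, 300L, 100L, 50L));
    }
}
```

With these made-up numbers, the oldest live timestamp is 200, so the oldest live segment is segment 4. Only the records with timestamps 250 and 300 would be restored, and no segments would be created for the four older records.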