[ https://issues.apache.org/jira/browse/KAFKA-10391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
A. Sophie Blee-Goldman updated KAFKA-10391:
-------------------------------------------
    Affects Version/s: 2.7.0

> Streams should overwrite checkpoint excluding corrupted partitions
> ------------------------------------------------------------------
>
>                 Key: KAFKA-10391
>                 URL: https://issues.apache.org/jira/browse/KAFKA-10391
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 2.7.0
>            Reporter: Guozhang Wang
>            Assignee: Guozhang Wang
>            Priority: Major
>             Fix For: 2.7.0
>
>
> While working on https://issues.apache.org/jira/browse/KAFKA-9450 I
> discovered another bug in Streams: when some partitions are corrupted due to
> out-of-range offsets, we treat the owning task as corrupted, close it as
> dirty, and then revive it. However, we forget to overwrite the checkpoint
> file with those out-of-range partitions excluded, which is what would let
> them be re-bootstrapped from the new log-start offset. As a result, when the
> task is revived, it still loads the old offset, starts from there, and gets
> the out-of-range exception again. This may cause
> {{StreamsUpgradeTest.test_app_upgrade}} to be flaky.
> We do not see this often because in the past we always deleted the checkpoint
> file after loading it, and we usually only see the out-of-range exception at
> the beginning of restoration, not during restoration.
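
The fix described above amounts to dropping the corrupted partitions from the checkpointed offsets before the checkpoint file is rewritten. Below is a minimal sketch of that filtering step only. It is not the actual Kafka Streams patch: the class name, the method name excludeCorrupted, and the use of plain strings in place of TopicPartition are all assumptions made for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration; not code from the Kafka Streams repository.
public class CheckpointFilterSketch {

    // Returns a copy of the checkpointed offsets without the corrupted
    // partitions. A partition missing from the checkpoint has no stored
    // offset, so a revived task would restore it from the start of the
    // changelog instead of reloading the stale, out-of-range offset.
    public static Map<String, Long> excludeCorrupted(Map<String, Long> checkpointed,
                                                     Set<String> corrupted) {
        Map<String, Long> result = new HashMap<>();
        for (Map.Entry<String, Long> entry : checkpointed.entrySet()) {
            if (!corrupted.contains(entry.getKey())) {
                result.put(entry.getKey(), entry.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Partition names are made up; strings stand in for TopicPartition.
        Map<String, Long> checkpointed = new HashMap<>();
        checkpointed.put("store-changelog-0", 120L);
        checkpointed.put("store-changelog-1", 45L);

        // Assume partition 1 hit an out-of-range offset.
        Set<String> corrupted = Collections.singleton("store-changelog-1");

        // This filtered map is what would be written back to the checkpoint file.
        Map<String, Long> toWrite = excludeCorrupted(checkpointed, corrupted);
        System.out.println(toWrite);
    }
}
```

The sketch leaves the input map untouched and returns a new one, so the caller can still log or inspect the stale offsets before overwriting the file.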