[ https://issues.apache.org/jira/browse/KAFKA-10391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Guozhang Wang resolved KAFKA-10391.
-----------------------------------
Fix Version/s: 2.7.0
Resolution: Fixed
> Streams should overwrite checkpoint excluding corrupted partitions
> ------------------------------------------------------------------
>
> Key: KAFKA-10391
> URL: https://issues.apache.org/jira/browse/KAFKA-10391
> Project: Kafka
> Issue Type: Bug
> Components: streams
> Reporter: Guozhang Wang
> Assignee: Guozhang Wang
> Priority: Major
> Fix For: 2.7.0
>
>
> While working on https://issues.apache.org/jira/browse/KAFKA-9450 I
> discovered another bug in Streams: when some partitions are corrupted because
> their offsets are out of range, we treat the task as corrupted, close it as
> dirty, and then revive it. However, we forget to overwrite the checkpoint file
> so that it excludes those out-of-range partitions, which would let them be
> re-bootstrapped from the new log-start offset. Hence, when the task is
> revived, it still loads the old offset, starts from there, and hits the
> out-of-range exception again. This may cause
> {{StreamsUpgradeTest.test_app_upgrade}} to be flaky.
> We do not see this often because in the past we always deleted the checkpoint
> file after loading it, and we usually only see the out-of-range exception at
> the beginning of restoration, not during it.
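
The intended behaviour can be illustrated with a minimal sketch. This is not the actual Streams patch: the class name CheckpointSketch, the method excludeCorrupted, and the use of plain strings for partition names are all assumptions made for illustration. It only shows the idea of rewriting the checkpointed offsets without the corrupted partitions, so that a revived task finds no stale offset for them and restores from the log-start offset instead.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Kafka Streams.
final class CheckpointSketch {

    // Returns a copy of the checkpointed offsets with every corrupted
    // partition removed. The input map is left unmodified.
    static Map<String, Long> excludeCorrupted(Map<String, Long> checkpointed,
                                              Set<String> corrupted) {
        Map<String, Long> retained = new HashMap<>(checkpointed);
        retained.keySet().removeAll(corrupted);
        return retained;
    }

    public static void main(String[] args) {
        // Assumed partition names and offsets, for demonstration only.
        Map<String, Long> checkpointed = new HashMap<>();
        checkpointed.put("store-changelog-0", 120L);
        checkpointed.put("store-changelog-1", 45L);

        // Partition 1 hit an out-of-range offset and is considered corrupted.
        Set<String> corrupted = Collections.singleton("store-changelog-1");

        // The result is what would be written back as the new checkpoint.
        System.out.println(excludeCorrupted(checkpointed, corrupted));
    }
}
```

In this sketch the offset for the corrupted partition is dropped before the checkpoint is written back, while offsets for healthy partitions are kept so they do not have to be restored from scratch.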