Guozhang Wang created KAFKA-10391:
-------------------------------------
Summary: Streams should overwrite checkpoint excluding corrupted
partitions
Key: KAFKA-10391
URL: https://issues.apache.org/jira/browse/KAFKA-10391
Project: Kafka
Issue Type: Bug
Components: streams
Reporter: Guozhang Wang
Assignee: Guozhang Wang
While working on https://issues.apache.org/jira/browse/KAFKA-9450 I discovered
another bug in Streams: when some partitions are corrupted because their offsets
are out of range, we treat the task as corrupted, close it as dirty, and then
revive it. However, we forget to overwrite the checkpoint file with those
out-of-range partitions excluded, which is what would let them be
re-bootstrapped from the new log-start offset. Hence, when the task is revived,
it still loads the old offset, starts from there, and gets the out-of-range
exception again. This may cause {{StreamsUpgradeTest.test_app_upgrade}} to be
flaky.
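To illustrate the intended behavior, here is a minimal sketch of filtering the
checkpointed offsets before they are written back. It is not the actual Streams
code: the class {{CheckpointFilter}}, the method {{excludeCorrupted}}, and the
use of plain strings as partition names are all hypothetical, chosen only to
keep the example self-contained.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

class CheckpointFilter {
    // Hypothetical helper, not part of Kafka Streams: returns the offsets that
    // should be written back to the checkpoint file, dropping every changelog
    // partition that was reported as corrupted (offset out of range).
    static Map<String, Long> excludeCorrupted(Map<String, Long> checkpointed,
                                              Set<String> corruptedPartitions) {
        // Copy first so the caller's map is left untouched.
        Map<String, Long> remaining = new HashMap<>(checkpointed);
        remaining.keySet().removeAll(corruptedPartitions);
        return remaining;
    }

    public static void main(String[] args) {
        // Made-up partition names and offsets, for illustration only.
        Map<String, Long> checkpointed = new HashMap<>();
        checkpointed.put("app-store-changelog-0", 1200L);
        checkpointed.put("app-store-changelog-1", 45L);

        Set<String> corrupted = Set.of("app-store-changelog-1");
        System.out.println(excludeCorrupted(checkpointed, corrupted));
    }
}
```

With no checkpointed offset left for the corrupted partition, a revived task
would have nothing stale to load for it and could restore it from the new
log-start offset, while the healthy partitions keep their progress.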
We do not see this often because in the past we always deleted the checkpoint
file after loading it, and we usually only see the out-of-range exception at
the beginning of restoration, not during it.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)