Balaji Rao created KAFKA-14624:
----------------------------------
Summary: State restoration is broken with standby tasks and
cache-enabled stores in processor API
Key: KAFKA-14624
URL: https://issues.apache.org/jira/browse/KAFKA-14624
Project: Kafka
Issue Type: Bug
Components: streams
Affects Versions: 3.3.1
Reporter: Balaji Rao
I found that cache-enabled state stores in PAPI with standby tasks sometimes
returns stale data when a partition moves from one app instance to another and
back. [Here's|https://github.com/balajirrao/kafka-streams-multi-runner] a small
project that I used to reproduce the issue.
I dug around a bit and it seems like it's a bug in standby task state
restoration when caching is enabled. If a partition moves from instance 1 to 2
and then back to instance 1, since the `CachingKeyValueStore` doesn't register
a restore callback, it can return potentially stale data for non-dirty keys.
I could fix the issue by modifying the `CachingKeyValueStore` to register a
restore callback in which the cache restored keys are added to the cache. Is
this fix in the right direction?
{code:java}
// register the store
context.register(
root,
(RecordBatchingStateRestoreCallback) records -> {
for (final ConsumerRecord<byte[], byte[]> record : records)
{
put(Bytes.wrap(record.key()), record.value());
}
}
);
{code}
I would like to contribute a fix, if I can get some help!
--
This message was sent by Atlassian Jira
(v8.20.10#820010)