[ https://issues.apache.org/jira/browse/KAFKA-9846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Randall Hauch updated KAFKA-9846:
---------------------------------
Fix Version/s: 2.7.0 (was: 2.6.0)
Since this is not a blocker issue, as part of the 2.6.0 release process I'm
changing the fix version to `2.7.0`. If this is incorrect, please respond and
discuss on the "[DISCUSS] Apache Kafka 2.6.0 release" discussion mailing list
thread.
> Race condition can lead to severe lag underestimate for active tasks
> --------------------------------------------------------------------
>
> Key: KAFKA-9846
> URL: https://issues.apache.org/jira/browse/KAFKA-9846
> Project: Kafka
> Issue Type: Bug
> Components: streams
> Affects Versions: 2.5.0
> Reporter: Sophie Blee-Goldman
> Priority: Critical
> Fix For: 2.7.0
>
>
> In KIP-535 we added the ability to query still-restoring and standby tasks.
> To give users control over how out of date the data they fetch can be, we
> added an API to KafkaStreams that fetches the end offsets for all changelog
> partitions and computes the lag for each local state store.
> During this lag computation, we check whether an active task is in RESTORING
> and calculate the actual lag if so. If not, we assume it's in RUNNING and
> return a lag of zero. However, tasks may be in other states besides running
> and restoring; notably they first pass through the CREATED state before
> getting to RESTORING. A CREATED task may happen to be caught-up to the end
> offset, but in many cases it is likely to be lagging or even completely
> uninitialized.
> This introduces a race condition where users may be led to believe that a
> task has zero lag and is "safe" to query even with the strictest correctness
> guarantees, while the task is actually lagging by some unknown amount.
> During transfer of ownership of the task between different threads on the
> same machine, tasks can actually spend a while in CREATED while the new owner
> waits to acquire the task directory lock. So, this race condition may not be
> particularly rare in multi-threaded Streams applications.
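
To make the reported behaviour concrete, here is a minimal, self-contained sketch of the lag check described in the issue. It is not the Kafka Streams source: the class `LagSketch`, the enum `TaskState`, and both methods are hypothetical names invented for illustration, and a CREATED task is assumed to have restored nothing yet (offset 0). `naiveLag` mirrors the logic as described (anything that is not RESTORING is treated as RUNNING), while `conservativeLag` shows one possible alternative that reports zero lag only for RUNNING tasks. It should not be read as the fix that was actually applied.

```java
// Hypothetical model of the lag check described in KAFKA-9846.
// None of these names come from the Kafka Streams code base.
final class LagSketch {

    // Simplified task lifecycle: a task passes through CREATED before RESTORING.
    enum TaskState { CREATED, RESTORING, RUNNING }

    // Behaviour as described in the issue: only RESTORING tasks get a real
    // lag computation; every other state is assumed to be RUNNING.
    static long naiveLag(TaskState state, long restoredOffset, long endOffset) {
        if (state == TaskState.RESTORING) {
            return Math.max(0L, endOffset - restoredOffset);
        }
        return 0L; // CREATED falls through here and is reported as caught up
    }

    // One possible alternative (an assumption, not the actual fix):
    // report zero lag only when the task is known to be RUNNING.
    static long conservativeLag(TaskState state, long restoredOffset, long endOffset) {
        if (state == TaskState.RUNNING) {
            return 0L;
        }
        return Math.max(0L, endOffset - restoredOffset);
    }

    public static void main(String[] args) {
        long endOffset = 1_000L;  // changelog end offset (made-up value)
        long restoredOffset = 0L; // assumed position of a task that has restored nothing

        // A CREATED task waiting on the task directory lock:
        System.out.println(naiveLag(TaskState.CREATED, restoredOffset, endOffset));        // prints 0
        System.out.println(conservativeLag(TaskState.CREATED, restoredOffset, endOffset)); // prints 1000
    }
}
```

With the naive check, a caller that only accepts stores with zero lag would treat the CREATED task as safe to query, which is the underestimate the issue describes.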