Nicholas Telford created KAFKA-20711:
----------------------------------------
Summary: Streams task restore-remaining-records metric invalid
under EOS
Key: KAFKA-20711
URL: https://issues.apache.org/jira/browse/KAFKA-20711
Project: Kafka
Issue Type: Bug
Components: streams
Affects Versions: 4.3.0
Reporter: Nicholas Telford
Assignee: Nicholas Telford
The Kafka Streams Task-level metric {{restore-remaining-records}} is intended
to track the total number of records that still need to be restored.
When the application runs under EOS, this metric is inaccurate: it never drops
to 0 for fully restored tasks, and it consistently reports values substantially
higher than the true number of records remaining.
The root cause is that the metric is initialized with the total number of records
to restore, derived as {{logEndOffset - committedOffset}} using a
READ_UNCOMMITTED consumer.
This offset range naturally includes uncommitted records and transaction
markers, in addition to the actual records to restore.
When decrementing the metric during restore, we decrement by the actual number
of (committed) records that were restored. Since this excludes uncommitted
records and transaction markers, the total decrement never reaches the value the
metric was initialized with, leaving a residual equal to the number of
uncommitted records and transaction markers in the restored offset range.
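To make the arithmetic concrete, here is a minimal, hypothetical Java model of the accounting described above; it is not Kafka Streams code. It assumes each list index is an offset, that offsets are contiguous (no compaction gaps), and that only records from committed transactions are handed to the restore loop. The names {{RestoreRemainingRecordsSketch}}, {{Kind}}, {{exampleLog}} and {{remainingAfterRestore}} are invented for illustration.

```java
import java.util.List;

// Hypothetical, simplified model of the metric accounting; not Kafka Streams code.
public class RestoreRemainingRecordsSketch {

    // What occupies each offset of the changelog partition in this toy model.
    enum Kind { COMMITTED_RECORD, UNCOMMITTED_RECORD, TXN_MARKER }

    // Assumption: list index == offset, and offsets are contiguous (no compaction gaps).
    static List<Kind> exampleLog() {
        return List.of(
            Kind.COMMITTED_RECORD, Kind.COMMITTED_RECORD, Kind.COMMITTED_RECORD, // 0-2: committed txn
            Kind.TXN_MARKER,                                                    // 3: commit marker
            Kind.UNCOMMITTED_RECORD, Kind.UNCOMMITTED_RECORD,                   // 4-5: aborted txn
            Kind.TXN_MARKER,                                                    // 6: abort marker
            Kind.COMMITTED_RECORD, Kind.COMMITTED_RECORD,                       // 7-8: committed txn
            Kind.TXN_MARKER                                                     // 9: commit marker
        );
    }

    // Returns the metric value left once every offset up to the log end has been consumed.
    static long remainingAfterRestore(List<Kind> log, long committedOffset) {
        long logEndOffset = log.size();
        // Initialized from the raw offset range, as described in the issue.
        long remaining = logEndOffset - committedOffset;
        for (long offset = committedOffset; offset < logEndOffset; offset++) {
            // Only committed data records reach the restore loop, so only they
            // decrement the metric; markers and uncommitted records never do.
            if (log.get((int) offset) == Kind.COMMITTED_RECORD) {
                remaining--;
            }
        }
        return remaining;
    }

    public static void main(String[] args) {
        // 10 offsets in range, but only 5 committed records are restored.
        System.out.println("remaining after full restore: "
            + remainingAfterRestore(exampleLog(), 0));
    }
}
```

Running it prints a residual of 5 for a fully restored task: the three transaction markers plus the two aborted records, i.e. exactly the entries counted by the offset difference but never seen by the restore loop.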
I have a fix that I will raise a PR for.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)