[ https://issues.apache.org/jira/browse/KAFKA-13370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17444638#comment-17444638 ]
Randall Hauch edited comment on KAFKA-13370 at 11/16/21, 5:01 PM:
------------------------------------------------------------------

I reverted the change (https://github.com/apache/kafka/pull/9642) that caused this regression in the following branches:
* `2.8` for inclusion in a future 2.8.2 release
* `3.0` for inclusion in a future 3.0.1 release
* `3.1` for inclusion in the upcoming 3.1.0 release
* `trunk` for inclusion in the next major/minor release (e.g., 3.2.0 or 4.0.0)

> Offset commit failure percentage metric is not computed correctly (regression)
> ------------------------------------------------------------------------------
>
>                 Key: KAFKA-13370
>                 URL: https://issues.apache.org/jira/browse/KAFKA-13370
>             Project: Kafka
>          Issue Type: Bug
>          Components: KafkaConnect, metrics
>    Affects Versions: 2.8.0
>         Environment: Confluent Platform Helm Chart (v6.2.0)
>            Reporter: Vincent Giroux
>            Assignee: Luke Chen
>            Priority: Blocker
>             Fix For: 3.1.0, 3.0.1, 2.8.2
>
>
> There seems to have been a regression, starting in version 2.8.0, in the way the offset-commit-* metrics are calculated for *source* Kafka Connect connectors.
> Before this version, any timeout or interruption while trying to commit offsets for source connectors (e.g. MM2 MirrorSourceConnector) was correctly flagged as an offset commit failure (i.e., the *offset-commit-failure-percentage* metric would be non-zero). Since version 2.8.0, these errors are counted as successes.
> After digging through the code, the commit that introduced this bug appears to be this one:
> [https://github.com/apache/kafka/commit/047ad654da7903f3903760b0e6a6a58648ca7715]
> I believe it might be a mistake to remove the boolean *success* argument from the *recordCommit* method of the *WorkerTask* class (the argument was deemed redundant because of the Throwable *error* argument) and to rely only on the presence of a non-null error to decide whether a commit succeeded or failed. In the *commitOffsets* method of the *WorkerSourceTask* class, there are multiple cases where an exception object is either not available or is not passed to the *recordCommitFailure* method, e.g.:
> * *Timeout #1*: [https://github.com/apache/kafka/blob/2.8/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSourceTask.java#L519]
> * *Timeout #2*: [https://github.com/apache/kafka/blob/2.8/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSourceTask.java#L584]
> * *Interruption*: [https://github.com/apache/kafka/blob/2.8/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSourceTask.java#L529]
> * *Unserializable offset*: [https://github.com/apache/kafka/blob/2.8/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSourceTask.java#L562]
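A minimal Java sketch of the failure mode described in the report, under stated assumptions: `CommitMetricsSketch` and its `inferSuccessFromError` flag are hypothetical and heavily simplified, not the real `WorkerTask`/`WorkerSourceTask` code. The method names `recordCommit`, `recordCommitSuccess` and `recordCommitFailure` only mirror the names used in the report. The sketch shows that once success is inferred from `error == null`, a failure path with no exception object to pass (such as a timeout) is counted as a success.

```java
// Hypothetical, simplified model of the commit bookkeeping discussed in KAFKA-13370.
// This is NOT the actual org.apache.kafka.connect.runtime.WorkerTask implementation.
public class CommitMetricsSketch {
    // false = behaviour before the change (explicit success flag),
    // true  = behaviour after the change (success inferred from error == null)
    private final boolean inferSuccessFromError;
    private int attempts = 0;
    private int failures = 0;

    CommitMetricsSketch(boolean inferSuccessFromError) {
        this.inferSuccessFromError = inferSuccessFromError;
    }

    // Simplified analogue of recordCommit: counts one attempt and decides whether it failed.
    private void recordCommit(long durationMs, boolean success, Throwable error) {
        attempts++;
        boolean failed = inferSuccessFromError ? error != null : !success;
        if (failed) {
            failures++;
        }
    }

    void recordCommitSuccess(long durationMs) {
        recordCommit(durationMs, true, null);
    }

    // Callers may pass a null error, e.g. on a timeout where no exception object exists.
    void recordCommitFailure(long durationMs, Throwable error) {
        recordCommit(durationMs, false, error);
    }

    // Lifetime failure ratio in percent; this sketch does not model any metric sampling window.
    double failurePercentage() {
        return attempts == 0 ? 0.0 : 100.0 * failures / attempts;
    }

    public static void main(String[] args) {
        for (boolean infer : new boolean[] {false, true}) {
            CommitMetricsSketch metrics = new CommitMetricsSketch(infer);
            metrics.recordCommitSuccess(10L);
            metrics.recordCommitFailure(5000L, null); // failed commit with no exception object
            System.out.println((infer ? "inferred" : "explicit") + ": " + metrics.failurePercentage());
        }
    }
}
```

For the same sequence of one successful commit and one failed commit without an exception object, `main` prints `explicit: 50.0` for the explicit-flag variant and `inferred: 0.0` for the inferred variant, which is the symptom the report describes.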