[
https://issues.apache.org/jira/browse/KAFKA-17493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17881001#comment-17881001
]
Sagar Rao edited comment on KAFKA-17493 at 9/11/24 3:49 PM:
------------------------------------------------------------
[~dajac] , [~ChrisEgerton] I took a look at the logs for
[testGetSinkConnectorOffsets|https://ge.apache.org/scans/tests?search.rootProjectNames=kafka&search.startTimeMax=1725681599999&search.startTimeMin=1724731200000&search.tags=trunk&search.timeZoneId=America%2FNew_York&tests.container=org.apache.kafka.connect.integration.OffsetsApiIntegrationTest&tests.sortField=FLAKY&tests.test=testGetSinkConnectorOffsets()].
I noticed a couple of differences that may contribute to the flakiness
(not totally sure at this point):
1) In the passing run, the test spins up a new Connect cluster. When that
happens, I see
[verifyClusterReadiness|https://github.com/apache/kafka/blob/trunk/connect/runtime/src/test/java/org/apache/kafka/connect/util/clusters/EmbeddedKafkaCluster.java#L181]
getting triggered, which checks that the Kafka cluster is ready and that an
Admin client is able to perform admin operations. In the failing run, I don't
see that check; instead we reuse an existing Connect cluster as per
[this|https://github.com/apache/kafka/blob/trunk/connect/runtime/src/test/java/org/apache/kafka/connect/integration/OffsetsApiIntegrationTest.java#L128-L149].
2) In the failing run, the connector comes up properly up to this point, but
it appears to me that it gets stuck when trying to read the offsets using the
Admin client
[here|https://github.com/apache/kafka/blob/trunk/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/Worker.java#L1234-L1252].
I see the same line in the stack trace as well:
```
at org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
at org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
at org.junit.jupiter.api.AssertTrue.failNotTrue(AssertTrue.java:63)
at org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:36)
at org.junit.jupiter.api.Assertions.assertTrue(Assertions.java:214)
at org.apache.kafka.test.TestUtils.lambda$waitForCondition$3(TestUtils.java:397)
at org.apache.kafka.test.TestUtils.retryOnExceptionWithTimeout(TestUtils.java:445)
at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:394)
at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:378)
at org.apache.kafka.connect.integration.OffsetsApiIntegrationTest.verifyExpectedSinkConnectorOffsets(OffsetsApiIntegrationTest.java:999)
at org.apache.kafka.connect.integration.OffsetsApiIntegrationTest.getAndVerifySinkConnectorOffsets(OffsetsApiIntegrationTest.java:226)
at org.apache.kafka.connect.integration.OffsetsApiIntegrationTest.testGetSinkConnectorOffsets(OffsetsApiIntegrationTest.java:173)
at java.lang.reflect.Method.invoke(Method.java:569)
at java.util.ArrayList.forEach(ArrayList.java:1511)
at java.util.ArrayList.forEach(ArrayList.java:1511)
```
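For context on reading that trace: the frames shown are all in the test's polling helper and JUnit's assertion code, so the trace mostly says that the expected condition never became true before the timeout. A minimal sketch of that polling pattern is below. The class and method here are hypothetical stand-ins written with only the JDK, not the actual `TestUtils` implementation, and the timeout values are made up for illustration.

```java
import java.util.function.BooleanSupplier;

// Hypothetical stand-in for a polling test helper; NOT the real TestUtils code.
class PollingSketch {

    // Re-evaluates `condition` every `pollMs` until it returns true or
    // `timeoutMs` has elapsed. On timeout it throws an AssertionError, so the
    // failure surfaces in the helper rather than in the code being polled.
    static void waitForCondition(BooleanSupplier condition, long timeoutMs, long pollMs, String message) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (true) {
            if (condition.getAsBoolean()) {
                return;
            }
            if (System.currentTimeMillis() >= deadline) {
                throw new AssertionError(message);
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and fail the wait.
                Thread.currentThread().interrupt();
                throw new AssertionError("Interrupted while waiting: " + message);
            }
        }
    }

    public static void main(String[] args) {
        // A condition that never becomes true, standing in for an offsets
        // read that never yields the expected result.
        try {
            waitForCondition(() -> false, 200, 20, "Sink connector offsets were not as expected in time");
        } catch (AssertionError e) {
            System.out.println("Timed out: " + e.getMessage());
        }
    }
}
```

With a helper shaped like this, a slow or stuck read and a read that returns the wrong offsets would produce the same assertion failure, which may be why the trace alone does not narrow things down.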
We are trying to use the AdminClient to read the sink connector offsets
[here|https://github.com/apache/kafka/blob/trunk/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/Worker.java#L1234-L1252].
There's not much indication in the logs as to why this is happening.
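
To make the difference in point 1 concrete, here is a rough model of a reuse pattern in which the readiness check runs only when a cluster is first created. Everything in it (`ClusterReuseSketch`, `FakeCluster`, the string config key) is a hypothetical illustration using only the JDK; it is not the code in `OffsetsApiIntegrationTest` or `EmbeddedKafkaCluster`, and it only assumes the behaviour seen in the logs.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of cluster reuse across test cases; names are illustrative.
class ClusterReuseSketch {

    // Stand-in for an embedded cluster that counts how often readiness was verified.
    static class FakeCluster {
        int readinessChecks = 0;

        void verifyReadiness() {
            readinessChecks++;
        }
    }

    private final Map<String, FakeCluster> clustersByConfig = new HashMap<>();

    // Returns the cached cluster for this config if one exists. Readiness is
    // verified only on the creation path, so a reused cluster is handed back
    // without being re-checked.
    FakeCluster getOrCreate(String configKey) {
        return clustersByConfig.computeIfAbsent(configKey, key -> {
            FakeCluster cluster = new FakeCluster();
            cluster.verifyReadiness();
            return cluster;
        });
    }

    public static void main(String[] args) {
        ClusterReuseSketch sketch = new ClusterReuseSketch();
        FakeCluster first = sketch.getOrCreate("default-config");
        FakeCluster second = sketch.getOrCreate("default-config");
        System.out.println("Same instance reused: " + (first == second));
        System.out.println("Readiness checks run: " + first.readinessChecks);
    }
}
```

In this model the second caller gets a cluster whose readiness was last verified during an earlier test case. Whether that actually matters for the failing runs is still an open question.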
> Sink connector-related OffsetsApiIntegrationTest suite test cases failing
> more frequently with new consumer/group coordinator
> -----------------------------------------------------------------------------------------------------------------------------
>
> Key: KAFKA-17493
> URL: https://issues.apache.org/jira/browse/KAFKA-17493
> Project: Kafka
> Issue Type: Test
> Components: connect, consumer, group-coordinator
> Reporter: Chris Egerton
> Priority: Major
>
> We recently updated trunk to use the new KIP-848 consumer/group coordinator
> by default, which appears to have led to an uptick in flakiness for the
> OffsetsApiIntegrationTest suite for Connect (specifically, the test cases
> that use sink connectors, which makes sense since they're the type of
> connector that uses a consumer group under the hood).
> Gradle Enterprise shows that in the week before that update was made, the
> test suite had a flakiness rate of about 4%
> (https://ge.apache.org/scans/tests?search.rootProjectNames=kafka&search.startTimeMax=1724558400000&search.startTimeMin=1723953600000&search.tags=trunk&search.timeZoneId=America%2FNew_York&tests.container=org.apache.kafka.connect.integration.*&tests.sortField=FLAKY),
> and in the week and a half since, the flakiness rate has jumped to 17%
> (https://ge.apache.org/scans/tests?search.rootProjectNames=kafka&search.startTimeMax=1725681599999&search.startTimeMin=1724731200000&search.tags=trunk&search.timeZoneId=America%2FNew_York&tests.container=org.apache.kafka.connect.integration.*&tests.sortField=FLAKY).