Re: [PR] MINOR: Fix flaky test ConnectWorkerIntegrationTest::testReconfigureConnectorWithFailingTaskConfigs [kafka]

2024-06-11 Thread via GitHub


C0urante merged PR #16273:
URL: https://github.com/apache/kafka/pull/16273


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] MINOR: Fix flaky test ConnectWorkerIntegrationTest::testReconfigureConnectorWithFailingTaskConfigs [kafka]

2024-06-11 Thread via GitHub


C0urante commented on PR #16273:
URL: https://github.com/apache/kafka/pull/16273#issuecomment-2160373057

   Are you seeing these failures in CI, running locally in a normal 
environment, or running locally in a special environment? I don't think we need 
to worry about them if they aren't cropping up in the wild.



Re: [PR] MINOR: Fix flaky test ConnectWorkerIntegrationTest::testReconfigureConnectorWithFailingTaskConfigs [kafka]

2024-06-10 Thread via GitHub


gharris1727 commented on PR #16273:
URL: https://github.com/apache/kafka/pull/16273#issuecomment-2159746582

   I still see some remaining flakiness in this test.
   
   It fails ~10% of the time at 30% CPU, rising to ~70% of the time at 15% CPU.
   The failures are mostly this one:
   ```
   caught: org.apache.kafka.connect.errors.DataException: Insufficient records committed by connector simple-connector in 300 millis. Records expected=8, actual=0
   at org.apache.kafka.connect.integration.ConnectorHandle.awaitCommits(ConnectorHandle.java:213)
   at org.apache.kafka.connect.integration.ConnectWorkerIntegrationTest.testReconfigureConnectorWithFailingTaskConfigs(ConnectWorkerIntegrationTest.java:1292)
   at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   ```
   with a handful of these two:
   ```
   caught: java.lang.AssertionError: Connector tasks were not restarted in time
   at org.junit.Assert.fail(Assert.java:89)
   at org.junit.Assert.assertTrue(Assert.java:42)
   at org.apache.kafka.connect.integration.ConnectWorkerIntegrationTest.testReconfigureConnectorWithFailingTaskConfigs(ConnectWorkerIntegrationTest.java:1310)
   ```
   ```
   caught: org.apache.kafka.connect.errors.DataException: Insufficient records committed by connector simple-connector in 300 millis. Records expected=1, actual=0
   at org.apache.kafka.connect.integration.ConnectorHandle.awaitCommits(ConnectorHandle.java:213)
   at org.apache.kafka.connect.integration.ConnectWorkerIntegrationTest.testReconfigureConnectorWithFailingTaskConfigs(ConnectWorkerIntegrationTest.java:1317)
   ```
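Two of the three traces above come from `ConnectorHandle.awaitCommits`, i.e. from a time-bounded wait for a commit count. A minimal sketch of how such a wait can produce a message shaped like `expected=8, actual=0`, assuming a `CountDownLatch`-based helper. `CommitAwaiter` and `tryAwait` are hypothetical names, `IllegalStateException` stands in for Connect's `DataException` to keep the sketch dependency-free, and this is not the real `ConnectorHandle` implementation. If commits are delayed (for example on a CPU-starved machine), the latch can time out before enough commits arrive.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class AwaitCommitsSketch {
    // Hypothetical helper; loosely modeled on the failure message in the traces above.
    static final class CommitAwaiter {
        private final int expected;
        private final CountDownLatch latch;

        CommitAwaiter(int expected) {
            this.expected = expected;
            this.latch = new CountDownLatch(expected);
        }

        // Called once per observed offset commit.
        void onCommit() {
            latch.countDown();
        }

        // Throws if fewer than `expected` commits arrive within timeoutMs.
        void awaitCommits(long timeoutMs) throws InterruptedException {
            if (!latch.await(timeoutMs, TimeUnit.MILLISECONDS)) {
                long actual = expected - latch.getCount();
                throw new IllegalStateException(String.format(
                        "Insufficient records committed in %d millis. Records expected=%d, actual=%d",
                        timeoutMs, expected, actual));
            }
        }
    }

    // Delivers `delivered` commits, then waits; returns the failure message, or null on success.
    public static String tryAwait(int expected, int delivered, long timeoutMs) {
        CommitAwaiter awaiter = new CommitAwaiter(expected);
        for (int i = 0; i < delivered; i++) {
            awaiter.onCommit();
        }
        try {
            awaiter.awaitCommits(timeoutMs);
            return null;
        } catch (IllegalStateException e) {
            return e.getMessage();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        }
    }

    public static void main(String[] args) {
        System.out.println(tryAwait(8, 0, 50)); // times out: no commits were delivered
        System.out.println(tryAwait(1, 1, 50)); // succeeds immediately, prints "null"
    }
}
```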
   
   I'll look into this more tomorrow if you need some more info.



[PR] MINOR: Fix flaky test ConnectWorkerIntegrationTest::testReconfigureConnectorWithFailingTaskConfigs [kafka]

2024-06-10 Thread via GitHub


C0urante opened a new pull request, #16273:
URL: https://github.com/apache/kafka/pull/16273

   This test has been flaky since it was merged to trunk. To date, there have 
been 566 successful runs and 8 flaky failures (see [Gradle Enterprise 
analysis](https://ge.apache.org/scans/tests?search.relativeStartTime=P28D=kafka=America%2FNew_York=org.apache.kafka.connect.integration.ConnectWorkerIntegrationTest=Wzhd=testReconfigureConnectorWithFailingTaskConfigs)).
   
   One possible cause is that we establish the expectation on the number of
offset commits (two per task) before reconfiguring the connector, while the
test assumes that those commits will take place after the tasks have been
restarted. In rare cases, the commits may already have taken place before the
tasks are restarted, which causes an assertion failure with the message
"java.lang.AssertionError: Source connector should have published at least one
record to new Kafka topic after being reconfigured".
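To illustrate the ordering problem described above, here is a minimal, deterministic sketch that uses a plain `CountDownLatch` as the commit expectation. The class, its methods, and the single-task setup are hypothetical simplifications, not the actual test or `ConnectorHandle` code. It only shows how commits made by a task still running the old configuration can satisfy an expectation that was registered too early, leaving the new topic empty.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

public class EarlyExpectationRace {
    // Hypothetical stand-in for the topic named in the *updated* connector config.
    static final List<String> newTopicRecords = new ArrayList<>();

    // Simulates one offset commit by a task. Only a task already running the
    // updated config has produced anything to the new topic.
    static void commitOffsets(CountDownLatch expectedCommits, boolean runningNewConfig) {
        if (runningNewConfig) {
            newTopicRecords.add("record");
        }
        expectedCommits.countDown();
    }

    // Returns the number of records in the new topic at the moment the
    // prematurely registered expectation is satisfied.
    public static int simulateEarlyExpectation() {
        newTopicRecords.clear();
        // Expectation registered BEFORE reconfiguring: two commits for one task.
        CountDownLatch expectedCommits = new CountDownLatch(2);
        // ... the reconfiguration would be requested here ...
        // Rare interleaving: the task still running the old config commits
        // twice before the worker restarts it.
        commitOffsets(expectedCommits, false);
        commitOffsets(expectedCommits, false);
        if (expectedCommits.getCount() != 0) {
            throw new IllegalStateException("expectation should already be satisfied");
        }
        // The wait "succeeds", yet nothing has reached the new topic, so a
        // follow-up assertion on the new topic would fail.
        return newTopicRecords.size();
    }

    public static void main(String[] args) {
        System.out.println("records in new topic = " + simulateEarlyExpectation());
    }
}
```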
   
   This patch should resolve those failures by establishing the expected number
of offset commits _after_ the connector has been reconfigured and its tasks
have been restarted, which should guarantee that the counted offset commits are
performed by tasks running the updated connector configuration.
   
   In addition, the number of expected offset commits is reduced to one, since
a single commit is all we need in order to expect at least one record to be
present in the new Kafka topic.
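A minimal sketch of the fixed ordering, assuming a hypothetical `CommitTracker` in which each call to `expectCommits` replaces the previous latch, so commits only count toward the most recently registered expectation. All names here are illustrative simplifications, not the real test or `ConnectorHandle` API. Commits from the old-config task land before any expectation exists and are ignored; the single expected commit can then only come from the restarted task, which has already written to the new topic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

public class ExpectationAfterRestart {
    // Hypothetical tracker: each expectCommits call replaces the previous latch.
    static final class CommitTracker {
        private CountDownLatch latch = new CountDownLatch(0); // no expectation yet

        void expectCommits(int n) {
            latch = new CountDownLatch(n);
        }

        void onCommit() {
            latch.countDown(); // no-op once the count is already zero
        }

        long remaining() {
            return latch.getCount();
        }
    }

    // Hypothetical stand-in for the topic named in the updated connector config.
    static final List<String> newTopicRecords = new ArrayList<>();

    // One offset commit by a task; only tasks on the updated config write to the new topic first.
    static void commitOffsets(CommitTracker tracker, boolean runningNewConfig) {
        if (runningNewConfig) {
            newTopicRecords.add("record");
        }
        tracker.onCommit();
    }

    // Returns how many records are in the new topic once the expectation is satisfied.
    public static int simulateFixedOrdering() {
        newTopicRecords.clear();
        CommitTracker tracker = new CommitTracker();
        // The old-config task may still commit before it is restarted; nothing is expected yet.
        commitOffsets(tracker, false);
        commitOffsets(tracker, false);
        // ... reconfigure the connector and wait for its tasks to restart ...
        // Register the expectation only now, and only for a single commit.
        tracker.expectCommits(1);
        // The restarted task, running the new config, produces a record and commits.
        commitOffsets(tracker, true);
        if (tracker.remaining() != 0) {
            throw new IllegalStateException("expected commit was not observed");
        }
        return newTopicRecords.size();
    }

    public static void main(String[] args) {
        System.out.println("records in new topic = " + simulateFixedOrdering());
    }
}
```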
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   

