C0urante opened a new pull request, #16306: URL: https://github.com/apache/kafka/pull/16306
Similar to https://github.com/apache/kafka/pull/16286.

This test is pretty flaky and has failed on 7% of all trunk builds in the last 90 days (see [Gradle Enterprise](https://ge.apache.org/scans/tests?search.startTimeMax=1718207141545&search.startTimeMin=1710388800000&search.tags=trunk&search.timeZoneId=America%2FNew_York&tests.container=org.apache.kafka.connect.integration.ExactlyOnceSourceIntegrationTest&tests.sortField=FLAKY)).

Part of this test includes bringing up a separate Kafka cluster that is targeted by a source connector. We do not currently wait on the successful startup of that Kafka cluster before starting that connector, and we do not wait on the successful startup of the connector and its tasks before waiting for the connector to produce records within a bounded timeout.

By adding assertions that the separate Kafka cluster and the connector+tasks are healthy before waiting for the connector to produce records, we accomplish two things:

- We reduce the chance of flaky failures by allowing more time for the more resource-intensive operations to complete (5 minutes for Kafka cluster startup and 2 minutes for connector+tasks startup, vs. 30 seconds for record production)
- We also provide more granularity into possible causes of failure; if the separate Kafka cluster or the connector+tasks fail to start, tests should report that failure directly, instead of simply reporting that not enough records were produced in time

Although there is a decent chance that this change will reduce flakiness for the affected test, the second benefit (more informative failure messages) is IMO significant enough that a close examination of logs for failed builds, multiple CI runs with this change, or other time-consuming efforts is not warranted.
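For illustration, the staged-wait idea described above can be sketched in plain Java. This is only a rough sketch and not the code in this PR: `StagedStartupChecks`, `waitForCondition`, `runStagedChecks`, and the three `BooleanSupplier` health checks are hypothetical names, and the real test uses the Connect integration test utilities. Only the three timeout values come from the description.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

public class StagedStartupChecks {
    // Timeouts taken from the PR description.
    static final Duration CLUSTER_STARTUP_TIMEOUT = Duration.ofMinutes(5);
    static final Duration CONNECTOR_STARTUP_TIMEOUT = Duration.ofMinutes(2);
    static final Duration RECORD_PRODUCTION_TIMEOUT = Duration.ofSeconds(30);

    // Hypothetical helper: polls the condition until it holds or the timeout
    // elapses, then fails with a message naming the stage that did not finish.
    static void waitForCondition(BooleanSupplier condition, Duration timeout, String failureMessage) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                throw new AssertionError(failureMessage);
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new AssertionError("Interrupted while waiting: " + failureMessage, e);
            }
        }
    }

    // Each stage gets its own timeout and its own failure message, so a slow
    // cluster or connector startup is reported as such rather than as
    // "not enough records were produced in time".
    static void runStagedChecks(BooleanSupplier clusterUp,
                                BooleanSupplier connectorAndTasksRunning,
                                BooleanSupplier enoughRecordsProduced,
                                Duration clusterTimeout,
                                Duration connectorTimeout,
                                Duration recordTimeout) {
        waitForCondition(clusterUp, clusterTimeout,
                "Separate Kafka cluster did not start in time");
        waitForCondition(connectorAndTasksRunning, connectorTimeout,
                "Connector and tasks did not start in time");
        waitForCondition(enoughRecordsProduced, recordTimeout,
                "Connector did not produce enough records in time");
    }

    public static void main(String[] args) {
        // Stub conditions stand in for real health checks in this sketch.
        runStagedChecks(() -> true, () -> true, () -> true,
                CLUSTER_STARTUP_TIMEOUT, CONNECTOR_STARTUP_TIMEOUT, RECORD_PRODUCTION_TIMEOUT);
        System.out.println("all stages passed");
    }
}
```

Because the stages run in order, the first stage to time out determines the failure message, which is the "more granularity" benefit described in the second bullet.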
### Committer Checklist (excluded from commit message)
- [ ] Verify design and implementation
- [ ] Verify test coverage and CI build status
- [ ] Verify documentation (including upgrade notes)