C0urante opened a new pull request, #16306:
URL: https://github.com/apache/kafka/pull/16306

   Similar to https://github.com/apache/kafka/pull/16286.
   
   This test is pretty flaky and has failed on 7% of all trunk builds in the 
last 90 days (see [Gradle 
Enterprise](https://ge.apache.org/scans/tests?search.startTimeMax=1718207141545&search.startTimeMin=1710388800000&search.tags=trunk&search.timeZoneId=America%2FNew_York&tests.container=org.apache.kafka.connect.integration.ExactlyOnceSourceIntegrationTest&tests.sortField=FLAKY)).
   
   Part of this test includes bringing up a separate Kafka cluster that is 
targeted by a source connector. We do not currently wait on the successful 
startup of that Kafka cluster before starting that connector, and we do not 
wait on the successful startup of the connector and its tasks before waiting 
for the connector to produce records within a bounded timeout.
   
   By adding assertions that the separate Kafka cluster and the connector+tasks 
are healthy before waiting for the connector to produce records, we accomplish 
two things:
   - We reduce the chance of flaky failures by allowing more time to pass for 
more resource-intensive operations to complete (5 minutes for Kafka cluster 
startup and 2 minutes for connector+tasks startup, vs. 30 seconds for record 
production)
   - We also provide more granular insight into possible causes of failure; if the 
separate Kafka cluster or the connector+tasks fail to start, tests should 
report that failure directly, instead of simply reporting that not enough 
records were produced in time
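
To illustrate the staged checks described above, here is a minimal sketch. It is not the code from this PR: `StagedStartupChecks`, `TestEnvironment`, `waitFor`, and `verifyStages` are hypothetical names, and the three predicates stand in for whatever health checks the real test uses. Only the timeouts (5 minutes, 2 minutes, 30 seconds) come from the description.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

class StagedStartupChecks {

    // Hypothetical stand-in for the resources the test depends on.
    public interface TestEnvironment {
        boolean clusterIsUp();
        boolean connectorAndTasksAreRunning();
        long recordsProduced();
    }

    // Polls the condition until it holds or the timeout elapses.
    // On timeout, fails with a message naming the stage that did not complete.
    public static void waitFor(BooleanSupplier condition, Duration timeout, String failureMessage) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                throw new AssertionError(failureMessage);
            }
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new AssertionError("Interrupted while waiting: " + failureMessage);
            }
        }
    }

    // Checks each stage in order, giving the slower stages longer timeouts,
    // so a startup failure is reported as such rather than as missing records.
    public static void verifyStages(TestEnvironment env, long minRecords) {
        waitFor(env::clusterIsUp, Duration.ofMinutes(5),
                "Separate Kafka cluster did not start in time");
        waitFor(env::connectorAndTasksAreRunning, Duration.ofMinutes(2),
                "Connector and tasks did not start in time");
        waitFor(() -> env.recordsProduced() >= minRecords, Duration.ofSeconds(30),
                "Not enough records were produced in time");
    }

    public static void main(String[] args) {
        // Simulated environment in which every stage is already healthy.
        TestEnvironment alwaysHealthy = new TestEnvironment() {
            public boolean clusterIsUp() { return true; }
            public boolean connectorAndTasksAreRunning() { return true; }
            public long recordsProduced() { return 10; }
        };
        verifyStages(alwaysHealthy, 10);
        System.out.println("all stages healthy");
    }
}
```

Because each stage fails with its own message, a build where the cluster never came up would report that directly instead of a generic record-count timeout.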
   
   Although there is a decent chance that this change will reduce flakiness for 
the affected test, the second benefit (more informative failure messages) is 
IMO significant enough on its own that a close examination of logs for failed 
builds, multiple CI runs with this change, or other time-consuming efforts are 
not warranted.
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
