[ https://issues.apache.org/jira/browse/KAFKA-10286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Randall Hauch resolved KAFKA-10286.
-----------------------------------
    Reviewer: Randall Hauch
  Resolution: Fixed

> Connect system tests should wait for workers to join group
> ----------------------------------------------------------
>
>                 Key: KAFKA-10286
>                 URL: https://issues.apache.org/jira/browse/KAFKA-10286
>             Project: Kafka
>          Issue Type: Test
>          Components: KafkaConnect
>    Affects Versions: 2.6.0
>            Reporter: Greg Harris
>            Assignee: Greg Harris
>            Priority: Minor
>             Fix For: 2.3.2, 2.6.0, 2.4.2, 2.5.1, 2.7.0
>
>
> There are a few flaky test failures for {{connect_distributed_test}} in
> which one of the workers does not join the group quickly, and the test fails
> in the following manner:
> # The test starts each of the Connect workers and waits for their REST APIs
> to become available
> # All workers start up, complete plugin scanning, and start their REST API
> # At least one worker kicks off an asynchronous job to join the group that
> hangs for an as-yet-unknown reason (30s timeout)
> # The test continues without all of the members joined
> # The test makes a call to the REST API that it expects to succeed, and gets
> an error
> # The test fails without the worker ever joining the group
> Instead of allowing the test to fail in this manner, we could wait for each
> worker to join the group, using the existing 60s startup timeout. This change
> would go into effect for all system tests using the
> {{ConnectDistributedService}}, currently just {{connect_distributed_test}}
> and {{connect_rest_test}}.
> Alternatively, we could retry the operation that failed, or ensure that we use
> a known-good worker to continue the test, but these would require more
> involved code changes. The existing wait-for-startup logic is the most
> natural place to fix this issue.
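
For illustration only, the proposed wait-for-join behavior could look roughly like the sketch below. This is not the code from the actual fix: `wait_until`, `start_workers`, `rest_ready`, and `has_joined` are hypothetical names, and how group membership is detected (log message, REST response, or otherwise) is left as an assumption behind the `has_joined` callable. Only the 60s startup timeout comes from the issue description.

```python
import time


def wait_until(condition, timeout_sec, backoff_sec=0.1, err_msg=""):
    """Poll condition() until it returns True or timeout_sec elapses."""
    deadline = time.monotonic() + timeout_sec
    while True:
        if condition():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(err_msg)
        time.sleep(backoff_sec)


def start_workers(workers, rest_ready, has_joined, startup_timeout_sec=60):
    """Block until every worker serves its REST API AND has joined the group.

    rest_ready and has_joined are hypothetical callables that take a worker
    and return a bool; the membership check itself is an assumption here.
    """
    for worker in workers:
        # Existing behavior described in the issue: wait for the REST API.
        wait_until(lambda: rest_ready(worker), startup_timeout_sec,
                   err_msg="REST API not available on %s" % worker)
        # Proposed addition: also wait for group membership before the test
        # proceeds, so a slow join fails at startup with a clear message.
        wait_until(lambda: has_joined(worker), startup_timeout_sec,
                   err_msg="%s did not join the group" % worker)
```

With this shape, a worker that never joins causes a timeout during startup rather than an unexpected REST error later in the test.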