[ https://issues.apache.org/jira/browse/KAFKA-6577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16371989#comment-16371989 ]
Randall Hauch commented on KAFKA-6577:
--------------------------------------

See KAFKA-6578 for a change to catch and log all runtime exceptions.

> Connect standalone SASL file source and sink test fails without explanation
> ---------------------------------------------------------------------------
>
>                 Key: KAFKA-6577
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6577
>             Project: Kafka
>          Issue Type: Bug
>          Components: KafkaConnect, system tests
>    Affects Versions: 1.1.0
>            Reporter: Randall Hauch
>            Assignee: Randall Hauch
>            Priority: Blocker
>             Fix For: 1.1.0
>
>
> The
> {{tests/kafkatest/tests/connect/connect_test.py::ConnectStandaloneFileTest.test_file_source_and_sink}}
> test is failing with the SASL configuration without a sufficient
> explanation. During the test, the Connect worker fails to start, but the
> Connect log contains no useful information.
> There are actually several things compounding to cause the failure and make
> the problem difficult to understand.
> First, the
> {{tests/kafkatest/tests/connect/templates/connect_standalone.properties}}
> template adds the broker's security configuration only with the "producer."
> and "consumer." prefixes, and does not also add it without a prefix. The
> worker uses the AdminClient to connect to the broker to get the Kafka
> cluster ID and to manage the three internal topics, and the AdminClient is
> configured via top-level properties. Because the SASL test requires that all
> clients connect using SASL, the lack of top-level broker security configs
> meant the AdminClient was attempting and failing to connect to the broker.
> This is corrected by adding the broker's security configuration to the
> Connect worker configuration file at the top level. (This was already being
> done in the {{connect_distributed.properties}} file.)
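The first fix can be illustrated with a small sketch. The helper name `worker_security_props` is hypothetical (it is not part of the kafkatest templates), and the SASL values are placeholder assumptions. It only shows that the same security settings need to appear at the top level, where the AdminClient reads them, as well as under the "producer." and "consumer." prefixes.

```python
def worker_security_props(security):
    """Return Connect worker properties with the given security settings
    applied at the top level and under the producer./consumer. prefixes.

    Hypothetical helper for illustration; not the actual template logic.
    """
    props = {}
    # The empty prefix yields the top-level entries used by the AdminClient.
    for prefix in ("", "producer.", "consumer."):
        for key, value in security.items():
            props[prefix + key] = value
    return props


# Placeholder SASL settings (assumed values, for illustration only).
sasl = {
    "security.protocol": "SASL_PLAINTEXT",
    "sasl.mechanism": "GSSAPI",
}

# Print the result in .properties form.
for key, value in sorted(worker_security_props(sasl).items()):
    print(f"{key}={value}")
```

Before the fix, the standalone template effectively produced only the prefixed entries, so the worker's AdminClient had no SASL settings at all.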
> Second, the default {{request.timeout.ms}} for the AdminClient (and the
> other clients) is 120 seconds, so the AdminClient was retrying for 120
> seconds before it would give up and throw an error. However, the test was
> only waiting for 60 seconds before determining that the service failed to
> start. This can be corrected by setting {{request.timeout.ms=10000}} in the
> Connect worker configurations (both distributed and standalone).
> Third, the Connect workers were recently changed to look up the Kafka
> cluster ID before starting the herder. This is unlike the older uses of the
> AdminClient to find and manage the internal topics, where a failure to
> connect was not necessarily logged correctly but was nevertheless skipped
> over, relying upon broker auto-topic creation to create the internal
> topics. (This may be why the test did not fail prior to the recent change
> to always require a successful AdminClient connection.) Although the worker
> never got this far in its startup process, the fact that we missed such an
> error since the prior releases means that failure to connect with the
> AdminClient was not being properly reported.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
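The timing mismatch in the second point of the issue description can be checked with simple arithmetic. The sketch below is a simplification: it assumes the client gives up after exactly one {{request.timeout.ms}} interval, and the function name is hypothetical.

```python
def failure_visible_before_deadline(request_timeout_ms, startup_wait_s):
    """Return True if the client would give up (and could report an error)
    before the test stops waiting for the service to start.

    Simplifying assumption: the client fails after one request timeout.
    """
    return request_timeout_ms / 1000.0 < startup_wait_s


# Default timeout (120 s) against the test's 60 s startup wait.
print(failure_visible_before_deadline(120000, 60))  # False

# Proposed request.timeout.ms=10000 against the same 60 s wait.
print(failure_visible_before_deadline(10000, 60))   # True
```

With the default, the test declares the service failed before the AdminClient has reported anything, which matches the empty Connect log described in the issue.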