[
https://issues.apache.org/jira/browse/KAFKA-6577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16371989#comment-16371989
]
Randall Hauch commented on KAFKA-6577:
--------------------------------------
See KAFKA-6578 for a change to catch and log all runtime exceptions.
> Connect standalone SASL file source and sink test fails without explanation
> ---------------------------------------------------------------------------
>
> Key: KAFKA-6577
> URL: https://issues.apache.org/jira/browse/KAFKA-6577
> Project: Kafka
> Issue Type: Bug
> Components: KafkaConnect, system tests
> Affects Versions: 1.1.0
> Reporter: Randall Hauch
> Assignee: Randall Hauch
> Priority: Blocker
> Fix For: 1.1.0
>
>
> The
> {{tests/kafkatest/tests/connect/connect_test.py::ConnectStandaloneFileTest.test_file_source_and_sink}}
> test is failing with the SASL configuration without a sufficient
> explanation. During the test, the Connect worker fails to start, but the
> Connect log contains no useful information.
> There are actually several things compounding to cause the failure and make
> it difficult to understand the problem.
> First, the
> {{tests/kafkatest/tests/connect/templates/connect_standalone.properties}} file
> only adds the broker's security configuration with the "producer." and
> "consumer." prefixes, and does not also add it without a prefix. The worker uses
> the AdminClient to connect to the broker to get the Kafka cluster ID and to
> manage the three internal topics, and the AdminClient is configured via
> top-level properties. Because the SASL test requires that all clients connect
> using SASL, the lack of top-level security configs meant the AdminClient was
> attempting and failing to connect to the broker. This is corrected by adding
> the broker's security configuration to the Connect worker configuration file
> at the top level. (This was already being done in the
> {{connect_distributed.properties}} file.)
> Second, the default {{request.timeout.ms}} for the AdminClient (and the other
> clients) is 120 seconds, so the AdminClient was retrying for 120 seconds
> before it would give up and throw an error. However, the test was only
> waiting for 60 seconds before determining that the service failed to start.
> This can be corrected by setting {{request.timeout.ms=10000}} in the Connect
> worker configurations (both distributed and standalone).
> Third, the Connect workers were recently changed to look up the Kafka cluster
> ID before starting the herder. This is unlike the older uses of the
> AdminClient to find and manage the internal topics, where failure to connect
> was not necessarily logged correctly but nevertheless still skipped over,
> relying upon broker auto-topic creation to create the internal topics. (This
> may be why the test did not fail prior to the recent change to always require
> a successful AdminClient connection.) Although the worker never got this far
> in its startup process, the fact that we missed such an error in prior
> releases means that failure to connect with the AdminClient was not being
> properly reported.
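
The first two fixes in the quoted description can be sketched in Python, the
language of the system tests. This is only an illustration under stated
assumptions: render_worker_config and fails_before_test_gives_up are
hypothetical helpers, not part of kafkatest, and the SASL_PLAINTEXT and GSSAPI
values are assumed, since the description does not say which protocol or
mechanism the test uses. The sketch emits the security settings at the top
level as well as under the "producer." and "consumer." prefixes, and checks
that the request timeout expires before the test's startup wait does.

```python
# Hypothetical helpers for illustration only; not part of kafkatest.

def render_worker_config(security, request_timeout_ms=10000):
    """Return Connect worker properties text.

    Security settings are written three times: with no prefix (read by the
    worker's AdminClient) and with the producer. and consumer. prefixes.
    """
    lines = ["request.timeout.ms=%d" % request_timeout_ms]
    for prefix in ("", "producer.", "consumer."):
        for key, value in sorted(security.items()):
            lines.append("%s%s=%s" % (prefix, key, value))
    return "\n".join(lines)


def fails_before_test_gives_up(request_timeout_ms, startup_wait_sec):
    """True if a client that cannot connect errors out before the test's
    startup wait expires, so the error can reach the log in time."""
    return request_timeout_ms < startup_wait_sec * 1000


# Assumed values; the real test derives these from the broker's settings.
security = {
    "security.protocol": "SASL_PLAINTEXT",
    "sasl.mechanism": "GSSAPI",
}

config = render_worker_config(security)
print(config)
```

With the figures from the description, a 120-second timeout outlasts the
60-second startup wait, while request.timeout.ms=10000 fails well inside it.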