[ 
https://issues.apache.org/jira/browse/KAFKA-6577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16371989#comment-16371989
 ] 

Randall Hauch commented on KAFKA-6577:
--------------------------------------

See KAFKA-6578 for a change to catch and log all runtime exceptions.

> Connect standalone SASL file source and sink test fails without explanation
> ---------------------------------------------------------------------------
>
>                 Key: KAFKA-6577
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6577
>             Project: Kafka
>          Issue Type: Bug
>          Components: KafkaConnect, system tests
>    Affects Versions: 1.1.0
>            Reporter: Randall Hauch
>            Assignee: Randall Hauch
>            Priority: Blocker
>             Fix For: 1.1.0
>
>
> The 
> {{tests/kafkatest/tests/connect/connect_test.py::ConnectStandaloneFileTest.test_file_source_and_sink}}
>  test is failing with the SASL configuration without a sufficient 
> explanation. During the test, the Connect worker fails to start, but the 
> Connect log contains no useful information.
> There are actually several things compounding to cause the failure and make it 
> difficult to understand the problem.
> First, the 
> {{tests/kafkatest/tests/connect/templates/connect_standalone.properties}} is 
> only adding in the broker's security configuration with the "producer." and 
> "consumer." prefixes, but is not adding them with no prefix. The worker uses 
> the AdminClient to connect to the broker to get the Kafka cluster ID and to 
> manage the three internal topics, and the AdminClient is configured via 
> top-level properties. Because the SASL test requires that all clients connect 
> using SASL, the lack of top-level security configs meant the AdminClient was 
> attempting and failing to connect to the broker. This is corrected by adding 
> the broker's security configuration to the Connect worker configuration file 
> at the top-level. (This was already being done in the 
> {{connect_distributed.properties}} file.)
> Second, the default {{request.timeout.ms}} for the AdminClient (and the other 
> clients) is 120 seconds, so the AdminClient was retrying for 120 seconds 
> before it would give up and throw an error. However, the test was only 
> waiting for 60 seconds before determining that the service failed to start. 
> This can be corrected by setting {{request.timeout.ms=10000}} in the Connect 
> worker configurations (both distributed and standalone).
> Third, the Connect workers were recently changed to look up the Kafka cluster 
> ID before starting the herder. This is unlike the older uses of the 
> AdminClient to find and manage the internal topics, where failure to connect 
> was not necessarily logged correctly but nevertheless still skipped over, 
> relying upon broker auto-topic creation to create the internal topics. (This 
> may be why the test did not fail prior to the recent change to always require 
> a successful AdminClient connection.) Although the worker never got this far 
> in its startup process, the fact that we missed such an error since the prior 
> releases means that failure to connect with the AdminClient was not being 
> properly reported.
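
As a rough illustration of the first two points in the description, the sketch below renders the broker's security settings at the top level as well as under the "producer." and "consumer." prefixes, and appends the shorter request timeout. The helper name `render_worker_security_props` and the sample SASL values are assumptions made for this example; the actual system tests generate the worker file from a template rather than from code like this.

```python
# Hypothetical helper for illustration only; not part of the Kafka system tests.
def render_worker_security_props(security_props, request_timeout_ms=10000):
    """Return Connect worker property lines with security settings at every level."""
    lines = []
    # The empty prefix yields the top-level entries that the worker's
    # AdminClient reads; the other two configure the producer and consumer.
    for prefix in ("", "producer.", "consumer."):
        for key, value in security_props.items():
            lines.append(f"{prefix}{key}={value}")
    # Keep the client timeout below the test's 60-second startup wait.
    lines.append(f"request.timeout.ms={request_timeout_ms}")
    return lines


example = render_worker_security_props({
    "security.protocol": "SASL_PLAINTEXT",  # assumed value for illustration
    "sasl.mechanism": "GSSAPI",             # assumed value for illustration
})
print("\n".join(example))
```

With the settings present at the top level, the AdminClient should pick up the same SASL configuration as the producer and consumer, and a connection failure would surface within the 10-second timeout instead of outlasting the test's startup wait.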



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
