[ https://issues.apache.org/jira/browse/KAFKA-8304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16829849#comment-16829849 ]
ASF GitHub Bot commented on KAFKA-8304: --------------------------------------- C0urante commented on pull request #6651: KAFKA-8304: Fix registration of Connect REST extensions URL: https://github.com/apache/kafka/pull/6651 [Jira](https://issues.apache.org/jira/browse/KAFKA-8304) Currently, a REST extension can cause deadlock if it requests the list of connectors from its `ConnectRestExtensionContext.clusterState()` in its `ConnectRestExtension.register(...)` method. This is because the `ConnectClusterState` implementation is backed by a `HerderProvider` that, at that time, has no associated `Herder` instance, and since that `Herder` is given to the `HerderProvider` later by the same thread, deadlock occurs until the call to `HerderProvider.get()` made by the `ConnectClusterStateImpl` times out. At this point, startup fails and the Connect worker dies. The changes here separate `RestServer.start(...)` into two separate methods. The first, `RestServer.initializeServer()`, starts the Jetty server and binds to a port, which ensure the accuracy of the `RestServer.advertisedUrl()` method that is used later on by both the `ConnectStandalone` and `ConnectDistributed` classes to determine the worker ID. The second, `RestServer.start(Herder herder, Plugins plugins)` actually creates the Connect REST resources (`RootResource`, `ConnectorsResource`, etc.) and registers all REST extensions. Since these changes make `HerderProvider` obsolete and it is not part of any public API, that interface is also removed. This approach ensures that the Connect REST interface is started only when all of its REST extensions have been successfully registered, which is important for security use cases where request filters are installed and parts of the Connect REST API are subject to authentication/authorization. Between the call to `RestServer.intializeServer()` and `RestServer.start(...)`, any requests made to the worker will result in a 404 response. This is slightly less desirable than the current behavior, which is to block on requests until the herder is up and running, but shouldn't be too much of an issue. A new integration test is added that verifies that a call to `ConnectRestExtensionContext.clusterState().connectors()` succeeds. This test fails on the current `trunk` branch, and succeeds with the changes involved in this PR. ### Committer Checklist (excluded from commit message) - [ ] Verify design and implementation - [ ] Verify test coverage and CI build status - [ ] Verify documentation (including upgrade notes) ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Connect susceptible to deadlock while registering REST extensions > ----------------------------------------------------------------- > > Key: KAFKA-8304 > URL: https://issues.apache.org/jira/browse/KAFKA-8304 > Project: Kafka > Issue Type: Bug > Components: KafkaConnect > Affects Versions: 2.2.0, 2.1.1, 2.0.2, 2.3.0, 2.1.2, 2.2.1 > Reporter: Chris Egerton > Assignee: Chris Egerton > Priority: Blocker > > As part of KAFKA-7503, the {{ConnectClusterStateImpl}} class was altered to > use a {{HerderProvider}} instance instead of a {{Herder}}. However, the > Connect {{RestServer}} registers REST extensions before a herder is given to > that {{HerderProvider}}, so any extensions that invoke, e.g., > {{ConnectClusterState.connector()}} in their {{register(...)}} method end up > in a deadlock that eventually causes Connect startup to fail with the error > message "Timed out waiting for herder to be initialized." > If possible, the {{HerderProvider}} used for {{ConnectClusterStateImpl}} > instances given to REST extensions should be supplied with a {{Herder}} > before those extensions are registered. If that isn't feasible, another > option could be to install Connect REST extensions on a separate thread so > that they don't block the Connect startup process and eventual call of > {{HerderProvider.setHerder(...)}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005)