[ 
https://issues.apache.org/jira/browse/KAFKA-8304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16829849#comment-16829849
 ] 

ASF GitHub Bot commented on KAFKA-8304:
---------------------------------------

C0urante commented on pull request #6651: KAFKA-8304: Fix registration of 
Connect REST extensions
URL: https://github.com/apache/kafka/pull/6651
 
 
   [Jira](https://issues.apache.org/jira/browse/KAFKA-8304)
   
   Currently, a REST extension can cause deadlock if it requests the list of 
connectors from its `ConnectRestExtensionContext.clusterState()` in its 
`ConnectRestExtension.register(...)` method. This is because the 
`ConnectClusterState` implementation is backed by a `HerderProvider` that, at 
that time, has no associated `Herder` instance, and since that `Herder` is 
given to the `HerderProvider` later by the same thread, deadlock occurs until 
the call to `HerderProvider.get()` made by the `ConnectClusterStateImpl` times 
out. At this point, startup fails and the Connect worker dies.
   
   The changes here separate `RestServer.start(...)` into two separate methods. 
The first, `RestServer.initializeServer()`, starts the Jetty server and binds 
to a port, which ensure the accuracy of the `RestServer.advertisedUrl()` method 
that is used later on by both the `ConnectStandalone` and `ConnectDistributed` 
classes to determine the worker ID. The second, `RestServer.start(Herder 
herder, Plugins plugins)` actually creates the Connect REST resources 
(`RootResource`, `ConnectorsResource`, etc.) and registers all REST extensions.
   
   Since these changes make `HerderProvider` obsolete and it is not part of any 
public API, that interface is also removed.
   
   This approach ensures that the Connect REST interface is started only when 
all of its REST extensions have been successfully registered, which is 
important for security use cases where request filters are installed and parts 
of the Connect REST API are subject to authentication/authorization.
   
   Between the call to `RestServer.intializeServer()` and 
`RestServer.start(...)`, any requests made to the worker will result in a 404 
response. This is slightly less desirable than the current behavior, which is 
to block on requests until the herder is up and running, but shouldn't be too 
much of an issue.
   
   A new integration test is added that verifies that a call to 
`ConnectRestExtensionContext.clusterState().connectors()` succeeds. This test 
fails on the current `trunk` branch, and succeeds with the changes involved in 
this PR.
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Connect susceptible to deadlock while registering REST extensions
> -----------------------------------------------------------------
>
>                 Key: KAFKA-8304
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8304
>             Project: Kafka
>          Issue Type: Bug
>          Components: KafkaConnect
>    Affects Versions: 2.2.0, 2.1.1, 2.0.2, 2.3.0, 2.1.2, 2.2.1
>            Reporter: Chris Egerton
>            Assignee: Chris Egerton
>            Priority: Blocker
>
> As part of KAFKA-7503, the {{ConnectClusterStateImpl}} class was altered to 
> use a {{HerderProvider}} instance instead of a {{Herder}}. However, the 
> Connect {{RestServer}} registers REST extensions before a herder is given to 
> that {{HerderProvider}}, so any extensions that invoke, e.g., 
> {{ConnectClusterState.connector()}} in their {{register(...)}} method end up 
> in a deadlock that eventually causes Connect startup to fail with the error 
> message "Timed out waiting for herder to be initialized."
> If possible, the {{HerderProvider}} used for {{ConnectClusterStateImpl}} 
> instances given to REST extensions should be supplied with a {{Herder}} 
> before those extensions are registered. If that isn't feasible, another 
> option could be to install Connect REST extensions on a separate thread so 
> that they don't block the Connect startup process and eventual call of 
> {{HerderProvider.setHerder(...)}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to