[ https://issues.apache.org/jira/browse/KAFKA-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Randall Hauch updated KAFKA-10816:
----------------------------------
    Description: 
There are a few ways to accurately detect whether a Connect worker is 
*completely* ready to process all REST requests:

# Wait for {{Herder started}} in the Connect worker logs.
# Use the REST API to issue a request that completes only after the herder has 
started, such as {{GET /connectors/{name}/}} or {{GET /connectors/{name}/status}}.
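
A minimal Python sketch of technique 2, polling the connector status resource until the worker answers. It assumes the worker listens on the default {{http://localhost:8083}} and that a connector named {{my-connector}} (a hypothetical name) already exists. It treats only an HTTP 200 as "ready" and retries on connection errors and on any other status code until a deadline passes.

```python
import time
import urllib.error
import urllib.request

# Assumptions: default REST listener and a hypothetical, pre-existing connector.
BASE_URL = "http://localhost:8083"
CONNECTOR = "my-connector"


def http_status(url: str, timeout: float = 10.0) -> int:
    """GET url and return the HTTP status code; raises OSError if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # A 4xx/5xx response still means the REST server answered.
        return err.code


def wait_until_ready(fetch, url: str, deadline_s: float = 120.0,
                     interval_s: float = 2.0) -> bool:
    """Poll url until fetch(url) returns 200 or the deadline passes."""
    deadline = time.monotonic() + deadline_s
    while time.monotonic() < deadline:
        try:
            if fetch(url) == 200:
                return True
        except OSError:
            # Connection refused, reset, or timed out: not ready yet.
            pass
        time.sleep(interval_s)
    return False


if __name__ == "__main__":
    status_url = f"{BASE_URL}/connectors/{CONNECTOR}/status"
    print("ready" if wait_until_ready(http_status, status_url) else "timed out")
```

The fetch function is passed in so the polling logic can be exercised without a running worker.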

Other techniques detect earlier startup states, but none of them guarantees that 
the worker has completely started and can process all REST requests:

* {{GET /}} indicates when the REST server has started, but this may happen 
before the worker has started completely and successfully.
* {{GET /connectors}} indicates when the REST server has started, but this may 
happen before the worker has started completely and successfully. Also, for a 
distributed Connect worker, this request may return a stale list of connectors 
if the worker has not yet read to the end of the internal config topic. It may 
even return a response while the worker is having trouble reading from the 
internal config topic.
* {{GET /connector-plugins}} indicates when the REST server has started, but 
this may happen before the worker has started completely and successfully.
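
A minimal Python sketch of why these checks are weaker: a response to {{GET /}} only shows that the REST server is listening, so a probe has to consult a request that completes after the herder starts before reporting readiness. The {{WorkerState}} names, the default port and the connector name are assumptions for illustration; {{fetch}} is any callable that returns a status code or raises {{OSError}}.

```python
import urllib.error
import urllib.request
from enum import Enum


class WorkerState(Enum):
    DOWN = "REST server not reachable"
    REST_UP = "REST server up, but the herder may not have started"
    READY = "herder started; REST requests can be served"


def http_status(url: str, timeout: float = 10.0) -> int:
    """GET url and return the status code; raises OSError if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def classify(fetch, base_url: str, connector: str) -> WorkerState:
    """Distinguish 'REST server up' from 'worker ready'."""
    try:
        # Any answer from GET / only proves the REST server is listening.
        fetch(f"{base_url}/")
    except OSError:
        return WorkerState.DOWN
    try:
        # The per-connector status request completes only after the herder starts.
        if fetch(f"{base_url}/connectors/{connector}/status") == 200:
            return WorkerState.READY
    except OSError:
        pass
    return WorkerState.REST_UP


if __name__ == "__main__":
    # "my-connector" is a hypothetical, pre-existing connector name.
    print(classify(http_status, "http://localhost:8083", "my-connector"))
```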

The Connect REST API should have an endpoint that can be used simply and 
obviously as a readiness probe. This could be a new resource (e.g., 
{{GET /status}}), but it would only work on newer Connect runtimes, and existing 
tooling, installations, and examples would have to be modified to use it where 
it is available.
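
A sketch of the kind of change existing tooling would need if a new resource were added: try the proposed {{GET /status}} first and fall back to a request that works on older runtimes. {{GET /status}} is only a proposal here, and treating a 404 as "older runtime without the resource" is an assumption, as is the fallback connector name.

```python
def is_ready(fetch, base_url: str, fallback_connector: str) -> bool:
    """Prefer the proposed GET /status; fall back on runtimes that lack it.

    fetch(url) returns an HTTP status code or raises OSError.
    """
    try:
        code = fetch(f"{base_url}/status")  # hypothetical resource
        if code != 404:
            return code == 200
        # Assumed older runtime: use a request that completes only after the
        # herder has started.
        return fetch(f"{base_url}/connectors/{fallback_connector}/status") == 200
    except OSError:
        # Unreachable worker: not ready.
        return False
```

The fallback branch is exactly the extra logic every client would have to carry while older and newer runtimes coexist.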

Alternatively, we could make sure that the existing resources (e.g., {{GET /}} 
or {{GET /connectors}}) wait for the herder to start completely; this wouldn't 
require a KIP, and clients would not need different techniques for newer and 
older Connect runtimes. (Whether we backport this is a separate question, since 
it's debatable whether the behavior of the existing REST resources is truly a 
bug.)
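
A toy Python model of the alternative: existing resources block until the herder signals that it has started. The real worker is written in Java; {{HerderGate}} is a hypothetical name, and returning 503 when the wait times out is an assumption, not something this issue specifies.

```python
import threading


class HerderGate:
    """Toy model: REST handlers wait until the herder has fully started."""

    def __init__(self) -> None:
        self._started = threading.Event()

    def mark_started(self) -> None:
        # Called once the herder has finished starting up.
        self._started.set()

    def handle(self, handler, wait_s: float = 30.0):
        """Run handler only after startup; otherwise report 'not ready'."""
        if not self._started.wait(timeout=wait_s):
            return 503, "worker still starting"
        return 200, handler()
```

With this behavior, an unchanged probe against {{GET /}} or {{GET /connectors}} would only succeed once the worker is ready, on any runtime that includes the change.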

  was:
There are a few ways to accurately detect whether a Connect worker is 
*completely* ready to process all REST requests:

# Wait for `Herder started` in the Connect worker logs
# Use the REST API to issue a request that will be completed only after the 
herder has started, such as `GET /connectors/{name}/` or `GET 
/connectors/{name}/status`.

Other techniques can be used to detect other startup states, though none of 
these will guarantee that the worker has indeed completely started up and can 
process all REST requests:

* `GET /` can be used to know when the REST server has started, but this may be 
before the worker has started completely and successfully.
* `GET /connectors` can be used to know when the REST server has started, but 
this may be before the worker has started completely and successfully. And, for 
the distributed Connect worker, this may actually return an older list of 
connectors if the worker hasn't yet completely read through the internal config 
topic. It's also possible that this request returns even if the worker is 
having trouble reading from the internal config topic.
* `GET /connector-plugins` can be used to know when the REST server has 
started, but this may be before the worker has started completely and 
successfully.

The Connect REST API should have an endpoint that more obviously and more 
simply can be used as a readiness probe. This could be a new resource (e.g., 
`GET /status`), though this would only work on newer Connect runtimes, and 
existing tooling, installations, and examples would have to be modified to take 
advantage of this feature (if it exists). 

Alternatively, we could make sure that the existing resources (e.g., `GET /` or 
`GET /connectors`) wait for the herder to start completely; this wouldn't 
require a KIP and it would not require clients to use different techniques for 
newer and older Connect runtimes. (Whether or not we back port this is another 
question altogether, since it's debatable whether the behavior of the existing 
REST resources is truly a bug.)


> Connect REST API should have a resource that can be used as a readiness probe
> -----------------------------------------------------------------------------
>
>                 Key: KAFKA-10816
>                 URL: https://issues.apache.org/jira/browse/KAFKA-10816
>             Project: Kafka
>          Issue Type: Improvement
>          Components: KafkaConnect
>            Reporter: Randall Hauch
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
