C0urante commented on PR #15915:
URL: https://github.com/apache/kafka/pull/15915#issuecomment-2148092416

   Even under heavy load, ten seconds is an extremely long time for logger 
levels to take effect. I think something else may be wrong.
   
   It looks like the startup check for distributed workers could be 
insufficient. By default, we wait to see if a worker's REST API is initialized, 
which is done by querying the `/connectors` endpoint (see 
[here](https://github.com/apache/kafka/blob/55d38efcc5505a5a1bddb08ba05f4d923f8050f9/tests/kafkatest/services/connect.py#L115)).
 However, as was noted in https://github.com/apache/kafka/pull/15249, that 
check does little beyond ensuring that a worker has a valid config 
and has initialized its REST server. Is it possible that the failures you've 
seen occurred because workers in the cluster were still starting up when 
we issued the logging level adjustment request and waited for it to take effect?
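
To illustrate the suspected race, here is a minimal, self-contained sketch. Everything in it (`FakeWorker`, `listen_ready`, `join_ready`, `wait_until`) is a hypothetical stand-in written for this comment, not code from `kafkatest`. It assumes that a listen-style check only observes the REST server, while a join-style check also waits for the worker to join the cluster.

```python
import time
from dataclasses import dataclass


@dataclass
class FakeWorker:
    """Hypothetical stand-in for a Connect worker; not part of kafkatest."""
    rest_up: bool = False       # REST server answers requests such as GET /connectors
    joined_group: bool = False  # worker has finished joining the cluster


def listen_ready(worker):
    # Assumed shape of a listen-style check: the REST API responds.
    return worker.rest_up


def join_ready(worker):
    # Assumed shape of a join-style check: the worker has joined the group.
    return worker.joined_group


def wait_until(condition, timeout_sec=1.0, backoff_sec=0.01):
    """Poll condition() until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout_sec
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(backoff_sec)
    return condition()


# A worker whose REST server is up but which is still joining the cluster.
worker = FakeWorker(rest_up=True, joined_group=False)
listen_ok = wait_until(lambda: listen_ready(worker), timeout_sec=0.1)
join_ok = wait_until(lambda: join_ready(worker), timeout_sec=0.1)
```

In this sketch the listen-style check passes while the join-style check does not, which is the window in which a cluster-wide request could be issued too early.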
   
   If so, we can first try changing the startup mode for this test from 
`STARTUP_MODE_LISTEN` (the default) to `STARTUP_MODE_JOIN`, which should give 
stronger guarantees about worker readiness. And as a follow-up, there's 
[KIP-1017](https://cwiki.apache.org/confluence/display/KAFKA/KIP-1017%3A+Health+check+endpoint+for+Kafka+Connect),
 which can be used in situations like this to avoid having to use hacks like 
parsing log files or checking for nonexistent connectors to determine a 
worker's health and readiness.
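
As a rough sketch of what a readiness wait built on such an endpoint might look like: the helper names below are hypothetical, and the `/health` path and the meaning of a 200 response are assumptions based on my reading of the KIP, so please check them against the KIP text before relying on them.

```python
import time
import urllib.error
import urllib.request


def http_status(url, timeout_sec=5.0):
    """Return the HTTP status code for a GET on url (hypothetical helper)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_sec) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Non-2xx responses are raised as HTTPError; report their status code.
        return err.code


def wait_for_healthy(fetch_status, timeout_sec=60.0, backoff_sec=1.0, sleep=time.sleep):
    """Poll fetch_status() until it returns 200 or the timeout expires."""
    deadline = time.monotonic() + timeout_sec
    while True:
        try:
            if fetch_status() == 200:
                return True
        except OSError:
            # Connection refused or similar: the REST server is not up yet.
            pass
        if time.monotonic() >= deadline:
            return False
        sleep(backoff_sec)
```

A caller might use it as `wait_for_healthy(lambda: http_status("http://localhost:8083/health"))`, where the host and port are placeholders. Passing the status fetcher in as a callable keeps the polling loop testable without a running worker.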


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.